bigfunctions > upsert
upsert¶
Call or Deploy upsert
?
✅ You can call this upsert
bigfunction directly from your Google Cloud Project (no install required).
- This
upsert
function is deployed inbigfunctions
GCP project in 39 datasets for all of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may have a function not found error). - Function is public, so it can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions --> Read Getting Started. This is particularly useful if you want to create private functions (for example calling your internal APIs).
- For any question or difficulties, please read Getting Started.
- Found a bug? Please raise an issue here
Public BigFunctions Datasets are like:
Region | Dataset |
---|---|
eu |
bigfunctions.eu |
us |
bigfunctions.us |
europe-west1 |
bigfunctions.europe_west1 |
asia-east1 |
bigfunctions.asia_east1 |
... | ... |
Description¶
Signature
upsert(query_or_table_or_view, destination_table, insertion_mode, primary_keys, recency_field)
Description
Merges query_or_table_or_view
into the destination_table
.
A record is identified by its primary_keys
. A unique combination of those fields is a unique record.
Before the merging operation, the records are identified and deduplicated according to the primary_keys
.
If recency_field
is filled then the last record version is kept else it is chosen arbitrarily.
Param | Possible values |
---|---|
query_or_table_or_view |
Can be a fully qualified table or view (<project-id>.)?<dataset_id>.<table_or_view_name> . Can also be a plain query in BigQuery Standard SQL. |
destination_table |
Must be a fully qualified table (<project-id>.)?<dataset_id>.<table_or_view_name> . |
insertion_mode |
Three insertion mode are available:
|
primary_keys |
Combination of field identifying a record. If primary_keys = [] , every row will be considered as a unique record. |
recency_field |
Orderable field (ie. timestamp , integer , ...) to identify the relative frechness of a record version. |
Examples¶
1. Merge tables in delta mode
call bigfunctions.eu.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');
call bigfunctions.us.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');
call bigfunctions.europe_west1.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');
2. Merge from query in full
call bigfunctions.eu.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
call bigfunctions.us.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
call bigfunctions.europe_west1.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
Use cases¶
Let's illustrate the upsert
function with a concrete use case: managing a product catalog in BigQuery.
Scenario: You have a BigQuery table called product_catalog
that stores information about your products. You receive regular updates about product information from various sources, and you need to efficiently update your product_catalog
table with these changes.
Table Schema (product_catalog):
product_id
(STRING): Unique identifier for each product (primary key)name
(STRING): Product nameprice
(NUMERIC): Product pricedescription
(STRING): Product descriptionlast_updated
(TIMESTAMP): Timestamp indicating the last update time
Update Data: You receive a new batch of product updates in another table or as the result of a query. This data may contain new products, updates to existing products, or even information about products that need to be removed.
Use Case Examples:
1. Delta Update (Insert and Update):
You want to insert new products and update existing ones based on the latest information. You use the delta
insertion mode and the last_updated
field to determine the most recent record.
CALL bigfunctions.<your-region>.upsert(
'dataset_id.product_updates', -- Source table with updates
'dataset_id.product_catalog', -- Destination table
'delta', -- Insertion mode
['product_id'], -- Primary key
'last_updated' -- Recency field
);
This will:
- Insert: Any new products (based on
product_id
) found inproduct_updates
that are not present inproduct_catalog
. - Update: For products with matching
product_id
in both tables, the values inproduct_catalog
will be updated with the values fromproduct_updates
if thelast_updated
timestamp inproduct_updates
is more recent.
2. Full Merge (Insert, Update, and Delete):
You want to perform a complete synchronization of your product catalog. This means inserting new products, updating existing ones, and deleting products that are no longer present in the source data. You use the full
insertion mode.
CALL bigfunctions.<your-region>.upsert(
-- Query that selects active products from a larger dataset
'SELECT * FROM dataset_id.all_products WHERE active = TRUE',
'dataset_id.product_catalog', -- Destination table
'full', -- Insertion mode
['product_id'], -- Primary key
'last_updated' -- Recency field
);
This will:
- Insert: New products.
- Update: Existing products with more recent data.
- Delete: Products present in
product_catalog
but not returned by the source query (meaning they are no longer active).
3. Insert Only:
If you only want to insert new products without updating existing ones:
CALL bigfunctions.<your-region>.upsert(
'dataset_id.new_products', -- Source table with new products
'dataset_id.product_catalog', -- Destination table
'insert_only', -- Insertion mode
['product_id'], -- Primary key
NULL -- No recency field needed for insert only
);
These examples demonstrate how the upsert
function simplifies the process of merging data into a BigQuery table, handling various update scenarios with a single function call. Remember to replace <your-region>
with the appropriate BigQuery region (e.g., us
, eu
, us-central1
).