upsert¶

upsert(query_or_table_or_view, destination_table, insertion_mode, primary_keys, recency_field)

Description¶

Merges query_or_table_or_view into the destination_table.

A record is identified by its primary_keys. A unique combination of those fields is a unique record. Before the merging operation, the records are identified and deduplicated according to the primary_keys. If recency_field is filled then the last record version is kept else it is chosen arbitrarily.

Param	Possible values
`query_or_table_or_view`	Can be a fully qualified table or view `(<project-id>.)?<dataset_id>.<table_or_view_name>`. Can also be a plain query in BigQuery Standard SQL.
`destination_table`	Must be a fully qualified table `(<project-id>.)?<dataset_id>.<table_or_view_name>`.
`insertion_mode`	Three insertion mode are available: `"insert_only"`: existing records in `query_or_table_or_view` and not existing in `destination_table` are inserted. Deletion and update are not possible. `"delta"`: same as `insert_only` with the updatable records. Records existing both in `query_or_table_or_view` and in `destination_table` are updated. If `recency_field` is filled, only the most recent version from source and destination is kept. `"full"`: same as `delta` with the deletable records. Records not existing in `query_or_table_or_view` and existing in `destination_table` are deleted.
`primary_keys`	Combination of field identifying a record. If `primary_keys = []`, every row will be considered as a unique record.
`recency_field`	Orderable field (ie. `timestamp`, `integer`, ...) to identify the relative frechness of a record version.

Usage¶

Call or Deploy upsert ?

Call upsert directly

The easiest way to use bigfunctions

upsert function is deployed in 39 public datasets for all of the 39 BigQuery regions.
It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
(You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)

Public BigFunctions Datasets

Region	Dataset
`eu`	`bigfunctions.eu`
`us`	`bigfunctions.us`
`europe-west1`	`bigfunctions.europe_west1`
`asia-east1`	`bigfunctions.asia_east1`
...	...

Deploy upsert in your project

Why deploy?

You may prefer to deploy upsert in your own project to build and manage your own catalog of functions.
This is particularly useful if you want to create private functions (for example calling your internal APIs).
Get started by reading the framework page

Deployment

upsert function can be deployed with:

pip install bigfunctions
bigfun get upsert
bigfun deploy upsert

Examples¶

1. Merge tables in delta mode

EUUSeurope-west1

call bigfunctions.eu.upsert("dataset_id.source_table_or_view", "dataset_id.destination_table", "delta", ['id'], "timestamp_field");

call bigfunctions.us.upsert("dataset_id.source_table_or_view", "dataset_id.destination_table", "delta", ['id'], "timestamp_field");

call bigfunctions.europe_west1.upsert("dataset_id.source_table_or_view", "dataset_id.destination_table", "delta", ['id'], "timestamp_field");

2. Merge from query in full

EUUSeurope-west1

call bigfunctions.eu.upsert("select * from dataset_id.source_table_or_view where filter_field = true", "dataset_id.destination_table", "full", ['id'], null);

call bigfunctions.us.upsert("select * from dataset_id.source_table_or_view where filter_field = true", "dataset_id.destination_table", "full", ['id'], null);

call bigfunctions.europe_west1.upsert("select * from dataset_id.source_table_or_view where filter_field = true", "dataset_id.destination_table", "full", ['id'], null);

Use cases¶

Let's illustrate the upsert function with a concrete use case: managing a product catalog in BigQuery.

Scenario: You have a BigQuery table called product_catalog that stores information about your products. You receive regular updates about product information from various sources, and you need to efficiently update your product_catalog table with these changes.

Table Schema (product_catalog):

product_id (STRING): Unique identifier for each product (primary key)
name (STRING): Product name
price (NUMERIC): Product price
description (STRING): Product description
last_updated (TIMESTAMP): Timestamp indicating the last update time

Update Data: You receive a new batch of product updates in another table or as the result of a query. This data may contain new products, updates to existing products, or even information about products that need to be removed.

Use Case Examples:

1. Delta Update (Insert and Update):

You want to insert new products and update existing ones based on the latest information. You use the delta insertion mode and the last_updated field to determine the most recent record.

CALL bigfunctions.<your-region>.upsert(
    'dataset_id.product_updates',  -- Source table with updates
    'dataset_id.product_catalog', -- Destination table
    'delta',                      -- Insertion mode
    ['product_id'],              -- Primary key
    'last_updated'               -- Recency field
);

This will:

Insert: Any new products (based on product_id) found in product_updates that are not present in product_catalog.
Update: For products with matching product_id in both tables, the values in product_catalog will be updated with the values from product_updates if the last_updated timestamp in product_updates is more recent.

2. Full Merge (Insert, Update, and Delete):

You want to perform a complete synchronization of your product catalog. This means inserting new products, updating existing ones, and deleting products that are no longer present in the source data. You use the full insertion mode.

CALL bigfunctions.<your-region>.upsert(
    -- Query that selects active products from a larger dataset
    'SELECT * FROM dataset_id.all_products WHERE active = TRUE',
    'dataset_id.product_catalog', -- Destination table
    'full',                      -- Insertion mode
    ['product_id'],              -- Primary key
    'last_updated'               -- Recency field
);

This will:

Insert: New products.
Update: Existing products with more recent data.
Delete: Products present in product_catalog but not returned by the source query (meaning they are no longer active).

3. Insert Only:

If you only want to insert new products without updating existing ones:

CALL bigfunctions.<your-region>.upsert(
    'dataset_id.new_products',    -- Source table with new products
    'dataset_id.product_catalog', -- Destination table
    'insert_only',                -- Insertion mode
    ['product_id'],              -- Primary key
    NULL                         -- No recency field needed for insert only
);

These examples demonstrate how the upsert function simplifies the process of merging data into a BigQuery table, handling various update scenarios with a single function call. Remember to replace <your-region> with the appropriate BigQuery region (e.g., us, eu, us-central1).

Need help or Found a bug?

Get help using upsert

The community can help! Engage the conversation on Slack

We also provide professional suppport.

Report a bug about upsert

If the function does not work as expected, please

report a bug so that it can be improved.
or open the discussion with the community on Slack.

We also provide professional suppport.

Show your by adding a on