Skip to content

upsert

upsert(query_or_table_or_view, destination_table, insertion_mode, primary_keys, recency_field)

Description

Merges query_or_table_or_view into the destination_table.

A record is identified by its primary_keys. A unique combination of those fields is a unique record. Before the merging operation, the records are identified and deduplicated according to the primary_keys. If recency_field is filled then the last record version is kept else it is chosen arbitrarily.

Param Possible values
query_or_table_or_view Can be a fully qualified table or view (<project-id>.)?<dataset_id>.<table_or_view_name>.
Can also be a plain query in BigQuery Standard SQL.
destination_table Must be a fully qualified table (<project-id>.)?<dataset_id>.<table_or_view_name>.
insertion_mode Three insertion mode are available:
  • "insert_only": existing records in query_or_table_or_view and not existing in destination_table are inserted. Deletion and update are not possible.
  • "delta": same as insert_only with the updatable records. Records existing both in query_or_table_or_view and in destination_table are updated. If recency_field is filled, only the most recent version from source and destination is kept.
  • "full": same as delta with the deletable records. Records not existing in query_or_table_or_view and existing in destination_table are deleted.
primary_keys Combination of field identifying a record. If primary_keys = [], every row will be considered as a unique record.
recency_field Orderable field (ie. timestamp, integer, ...) to identify the relative frechness of a record version.

Examples

Call or Deploy upsert ?
Call upsert directly

The easiest way to use bigfunctions

  • upsert function is deployed in 39 public datasets for all of the 39 BigQuery regions.
  • It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
  • (You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)

Public BigFunctions Datasets

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...
Deploy upsert in your project

Why deploy?

  • You may prefer to deploy upsert in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

upsert function can be deployed with:

pip install bigfunctions
bigfun get upsert
bigfun deploy upsert

1. Merge tables in delta mode

call bigfunctions.eu.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');
call bigfunctions.us.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');
call bigfunctions.europe_west1.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');

2. Merge from query in full

call bigfunctions.eu.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
call bigfunctions.us.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
call bigfunctions.europe_west1.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
Need help or Found a bug using upsert?
Get help using upsert

The community can help! Engage the conversation on Slack

We also provide professional suppport.

Report a bug about upsert

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

We also provide professional suppport.

Use cases

Let's illustrate the upsert function with a concrete use case: managing a product catalog in BigQuery.

Scenario: You have a BigQuery table called product_catalog that stores information about your products. You receive regular updates about product information from various sources, and you need to efficiently update your product_catalog table with these changes.

Table Schema (product_catalog):

  • product_id (STRING): Unique identifier for each product (primary key)
  • name (STRING): Product name
  • price (NUMERIC): Product price
  • description (STRING): Product description
  • last_updated (TIMESTAMP): Timestamp indicating the last update time

Update Data: You receive a new batch of product updates in another table or as the result of a query. This data may contain new products, updates to existing products, or even information about products that need to be removed.

Use Case Examples:

1. Delta Update (Insert and Update):

You want to insert new products and update existing ones based on the latest information. You use the delta insertion mode and the last_updated field to determine the most recent record.

CALL bigfunctions.<your-region>.upsert(
    'dataset_id.product_updates',  -- Source table with updates
    'dataset_id.product_catalog', -- Destination table
    'delta',                      -- Insertion mode
    ['product_id'],              -- Primary key
    'last_updated'               -- Recency field
);

This will:

  • Insert: Any new products (based on product_id) found in product_updates that are not present in product_catalog.
  • Update: For products with matching product_id in both tables, the values in product_catalog will be updated with the values from product_updates if the last_updated timestamp in product_updates is more recent.

2. Full Merge (Insert, Update, and Delete):

You want to perform a complete synchronization of your product catalog. This means inserting new products, updating existing ones, and deleting products that are no longer present in the source data. You use the full insertion mode.

CALL bigfunctions.<your-region>.upsert(
    -- Query that selects active products from a larger dataset
    'SELECT * FROM dataset_id.all_products WHERE active = TRUE',
    'dataset_id.product_catalog', -- Destination table
    'full',                      -- Insertion mode
    ['product_id'],              -- Primary key
    'last_updated'               -- Recency field
);

This will:

  • Insert: New products.
  • Update: Existing products with more recent data.
  • Delete: Products present in product_catalog but not returned by the source query (meaning they are no longer active).

3. Insert Only:

If you only want to insert new products without updating existing ones:

CALL bigfunctions.<your-region>.upsert(
    'dataset_id.new_products',    -- Source table with new products
    'dataset_id.product_catalog', -- Destination table
    'insert_only',                -- Insertion mode
    ['product_id'],              -- Primary key
    NULL                         -- No recency field needed for insert only
);

These examples demonstrate how the upsert function simplifies the process of merging data into a BigQuery table, handling various update scenarios with a single function call. Remember to replace <your-region> with the appropriate BigQuery region (e.g., us, eu, us-central1).

Spread the word!

BigFunctions is fully open-source. Help make it a success by spreading the word!

Share on Add a on