Skip to content

bigfunctions > upsert

upsert

Call or Deploy upsert ?

✅ You can call this upsert bigfunction directly from your Google Cloud Project (no install required).

  • This upsert function is deployed in bigfunctions GCP project in 39 datasets for all of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may have a function not found error).
  • Function is public, so it can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions --> Read Getting Started. This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • For any question or difficulties, please read Getting Started.
  • Found a bug? Please raise an issue here

Public BigFunctions Datasets are like:

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...

Description

Signature

upsert(query_or_table_or_view, destination_table, insertion_mode, primary_keys, recency_field)

Description

Merges query_or_table_or_view into the destination_table.

A record is identified by its primary_keys. A unique combination of those fields is a unique record. Before the merging operation, the records are identified and deduplicated according to the primary_keys. If recency_field is filled then the last record version is kept else it is chosen arbitrarily.

Param Possible values
query_or_table_or_view Can be a fully qualified table or view (<project-id>.)?<dataset_id>.<table_or_view_name>.
Can also be a plain query in BigQuery Standard SQL.
destination_table Must be a fully qualified table (<project-id>.)?<dataset_id>.<table_or_view_name>.
insertion_mode Three insertion mode are available:
  • "insert_only": existing records in query_or_table_or_view and not existing in destination_table are inserted. Deletion and update are not possible.
  • "delta": same as insert_only with the updatable records. Records existing both in query_or_table_or_view and in destination_table are updated. If recency_field is filled, only the most recent version from source and destination is kept.
  • "full": same as delta with the deletable records. Records not existing in query_or_table_or_view and existing in destination_table are deleted.
primary_keys Combination of field identifying a record. If primary_keys = [], every row will be considered as a unique record.
recency_field Orderable field (ie. timestamp, integer, ...) to identify the relative frechness of a record version.

Examples

1. Merge tables in delta mode

call bigfunctions.eu.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');
call bigfunctions.us.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');
call bigfunctions.europe_west1.upsert('dataset_id.source_table_or_view', 'dataset_id.destination_table', 'delta', ['id'], 'timestamp_field');

2. Merge from query in full

call bigfunctions.eu.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
call bigfunctions.us.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);
call bigfunctions.europe_west1.upsert('select * from dataset_id.source_table_or_view where filter_field = true', 'dataset_id.destination_table', 'full', ['id'], null);

Use cases

Let's illustrate the upsert function with a concrete use case: managing a product catalog in BigQuery.

Scenario: You have a BigQuery table called product_catalog that stores information about your products. You receive regular updates about product information from various sources, and you need to efficiently update your product_catalog table with these changes.

Table Schema (product_catalog):

  • product_id (STRING): Unique identifier for each product (primary key)
  • name (STRING): Product name
  • price (NUMERIC): Product price
  • description (STRING): Product description
  • last_updated (TIMESTAMP): Timestamp indicating the last update time

Update Data: You receive a new batch of product updates in another table or as the result of a query. This data may contain new products, updates to existing products, or even information about products that need to be removed.

Use Case Examples:

1. Delta Update (Insert and Update):

You want to insert new products and update existing ones based on the latest information. You use the delta insertion mode and the last_updated field to determine the most recent record.

CALL bigfunctions.<your-region>.upsert(
    'dataset_id.product_updates',  -- Source table with updates
    'dataset_id.product_catalog', -- Destination table
    'delta',                      -- Insertion mode
    ['product_id'],              -- Primary key
    'last_updated'               -- Recency field
);

This will:

  • Insert: Any new products (based on product_id) found in product_updates that are not present in product_catalog.
  • Update: For products with matching product_id in both tables, the values in product_catalog will be updated with the values from product_updates if the last_updated timestamp in product_updates is more recent.

2. Full Merge (Insert, Update, and Delete):

You want to perform a complete synchronization of your product catalog. This means inserting new products, updating existing ones, and deleting products that are no longer present in the source data. You use the full insertion mode.

CALL bigfunctions.<your-region>.upsert(
    -- Query that selects active products from a larger dataset
    'SELECT * FROM dataset_id.all_products WHERE active = TRUE',
    'dataset_id.product_catalog', -- Destination table
    'full',                      -- Insertion mode
    ['product_id'],              -- Primary key
    'last_updated'               -- Recency field
);

This will:

  • Insert: New products.
  • Update: Existing products with more recent data.
  • Delete: Products present in product_catalog but not returned by the source query (meaning they are no longer active).

3. Insert Only:

If you only want to insert new products without updating existing ones:

CALL bigfunctions.<your-region>.upsert(
    'dataset_id.new_products',    -- Source table with new products
    'dataset_id.product_catalog', -- Destination table
    'insert_only',                -- Insertion mode
    ['product_id'],              -- Primary key
    NULL                         -- No recency field needed for insert only
);

These examples demonstrate how the upsert function simplifies the process of merging data into a BigQuery table, handling various update scenarios with a single function call. Remember to replace <your-region> with the appropriate BigQuery region (e.g., us, eu, us-central1).