
load_api_data_into_temp_dataset

Call or Deploy load_api_data_into_temp_dataset?

✅ You can call this load_api_data_into_temp_dataset bigfunction directly from your Google Cloud Project (no install required).

  • This load_api_data_into_temp_dataset function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your own datasets (otherwise you may get a function not found error).
  • The function is public, so anyone can call it. Just copy / paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework

Public BigFunctions Datasets:

Region         Dataset
eu             bigfunctions.eu
us             bigfunctions.us
europe-west1   bigfunctions.europe_west1
asia-east1     bigfunctions.asia_east1
...            ...

Signature

load_api_data_into_temp_dataset(source, source_config, streams, state)

Description

Load data from 250+ sources using Airbyte Python Connectors.

  • The function creates a temporary dataset, accessible only to you, in the bigfunctions project.
  • Airbyte Serverless extracts data from source (one of the 250+ Airbyte Python Connectors available on PyPI) using source_config (the source configuration, in the YAML format expected by Airbyte Serverless).
  • It creates one table per stream (a stream is like a resource type) in the dataset, plus one table _airbyte_logs for logs and one table _airbyte_states for states.
  • If you provide a state, only data that is new since that state is loaded.
  • While running, connector logs are appended to the _airbyte_logs table.
  • Examples below explain how to set the arguments.

Examples

1. Show valid sources for the source argument by setting source to null

You can then copy one of these sources to use as the source argument.

select bigfunctions.eu.load_api_data_into_temp_dataset(null, null, null, null)
select bigfunctions.us.load_api_data_into_temp_dataset(null, null, null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset(null, null, null, null)
+--------------------------------------------------------------------------------------------------------------------------+
| destination_dataset                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------+
| # AVAILABLE SOURCES

airbyte-source-activecampaign==0.1.10
airbyte-source-adjust==0.1.11
airbyte-source-aha==0.3.10
...
 |
+--------------------------------------------------------------------------------------------------------------------------+

2. Show a source_config sample in the expected format by setting source_config to null.

You can then copy the result, modify it, and provide it as the source_config argument.

select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
+--------------------------------------------------------------------------------------------------------------------------+
| destination_dataset                                                                                                        |
+--------------------------------------------------------------------------------------------------------------------------+
| # SOURCE CONFIG

dataset_name: # REQUIRED | string | The Name of the final table to replicate this file into (should include letters, numbers dash and underscores only).
format: "csv" # REQUIRED | string | The Format of the file which should be replicated (Warning: some formats may be experimental, please refer to the docs).
reader_options: # OPTIONAL | string | This should be a string in JSON format. It depends on the chosen file format to provide additional options and tune its behavior. | Examples: {}, {"sep": " "}, {"sep": " ", "header": 0, "names": ["column1", "column2"] }
url: # REQUIRED | string | The URL path to access the file which should be replicated. | Examples: https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv, gs://my-google-bucket/data.csv, s3://gdelt-open-data/events/20190914.export.csv
provider:
  ## -------- Pick one valid structure among the examples below: --------
  storage: "HTTPS" # REQUIRED | string
  user_agent: # OPTIONAL | boolean | Add User-Agent to request
  ...
 |
+--------------------------------------------------------------------------------------------------------------------------+

3. Encrypt Secrets!

You usually have to put secrets, such as API keys, in source_config.

We strongly advise NOT TO write your secrets in source_config in plain text.

Otherwise, they will be stored in plain text in your BigQuery logs for months.

Instead, use the encryption snippet on this function's documentation page to generate an encrypted version of your secrets, which you can safely copy into source_config as shown in the example below. This public bigfunction (deployed on the bigfunctions GCP project) will be able to decrypt them, but no one else can.

select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null)
+---------------------+
| destination_dataset |
+---------------------+
| ...                 |
+---------------------+

4. Show available streams by setting the streams argument to null.

You can then copy one or several of these streams (separated by commas, as sketched after this example) to use as the streams argument.

select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null)
+---------------------------------+
| destination_dataset             |
+---------------------------------+
| # AVAILABLE STREAMS

my_stream
 |
+---------------------------------+
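
If a source exposes several streams, pass them as a single comma-separated string. A purely illustrative sketch (the source name, the source_config, and the orders and customers streams are hypothetical placeholders, not output of the call above):

select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-some-api==1.0.0', '''
  ...source config...
  '''
  , 'orders,customers', null)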

5. Extract and load my_stream into the temporary dataset.

select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', null)
+------------------------------------+
| destination_dataset                |
+------------------------------------+
| bigfunctions.temp__dkzodskslfdkdl  |
+------------------------------------+
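
You can then query the tables created in the returned temporary dataset. A minimal sketch, reusing the dataset name returned above (the real name differs on each run):

select * from `bigfunctions.temp__dkzodskslfdkdl.my_stream` limit 100
select * from `bigfunctions.temp__dkzodskslfdkdl._airbyte_logs` limit 100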

6. Provide a state to load only data that is new since that state.

select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , 'tickets', {...})
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , 'tickets', {...})
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , 'tickets', {...})
+---------------------+
| destination_dataset |
+---------------------+
| ...                 |
+---------------------+
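
The state to provide is typically the most recent one written by a previous run into the _airbyte_states table of the destination dataset. A minimal sketch, assuming (as in the use case further below) that this table exposes state and emitted_at columns; replace YOUR_TEMP_DATASET with the dataset returned by the initial load:

declare latest_state string;
set latest_state = (
  select state from `YOUR_TEMP_DATASET._airbyte_states` order by emitted_at desc limit 1
);
select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , 'tickets', latest_state)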

Need help using load_api_data_into_temp_dataset?

The community can help! Join the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using load_api_data_into_temp_dataset?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

For professional support, don't hesitate to chat with us.

Use cases

Let's say you're a data analyst working for an e-commerce company and you want to analyze customer feedback from your Zendesk Support instance. Here's how load_api_data_into_temp_dataset could help:

  1. Discover Available Connectors and Configuration:

You start by checking if a Zendesk connector exists and what configuration parameters it requires:

SELECT bigfunctions.us.load_api_data_into_temp_dataset(null, null, null, null);

This would list all available Airbyte connectors, including (hopefully) airbyte-source-zendesk-support. Then, you'd run:

SELECT bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', null, null, null);  -- Replace with actual version

This provides a sample source_config YAML showing the required fields like credentials.access_token.

  2. Encrypt your Zendesk API Token:

Use the provided code snippet to encrypt your Zendesk access token. This crucial step protects your sensitive information. Replace kdoekdswlxzapdldpzlfpfd in the example with your actual encrypted token.

  3. Load Zendesk Data to a Temporary Dataset:

Now, load data from the 'tickets' stream (assuming you are interested in support tickets) into a temporary BigQuery dataset:

SELECT bigfunctions.us.load_api_data_into_temp_dataset(
    'airbyte-source-zendesk-support==2.6.10',  -- Replace with actual version
    '''
      credentials:
        access_token: ENCRYPTED_SECRET(YOUR_ENCRYPTED_TOKEN)
      start_date: '2023-01-01T00:00:00Z' # Optional: pull data from a specific date
    ''',
    'tickets', -- Specify the 'tickets' stream
    null  -- Initial load, no state provided
);
This creates a temporary dataset (the name is returned by the function) containing a table named tickets with your Zendesk ticket data, as well as _airbyte_logs and _airbyte_states tables.

  4. Incremental Loads:

After the initial load, you can perform incremental updates by retrieving the latest state from the _airbyte_states table and using it in subsequent calls. This ensures you only pull new or updated ticket data. Example:

-- Get the latest state from the previous run and store it in a variable
-- (a state looks like: '{"tickets": {"cutoff_time": "2023-10-27T12:00:00Z"}}')
DECLARE latest_state STRING;
SET latest_state = (
  SELECT state FROM `YOUR_TEMP_DATASET._airbyte_states` ORDER BY emitted_at DESC LIMIT 1
);


SELECT bigfunctions.us.load_api_data_into_temp_dataset(
    'airbyte-source-zendesk-support==2.6.10', -- Replace with actual version
    '''
      credentials:
        access_token: ENCRYPTED_SECRET(YOUR_ENCRYPTED_TOKEN)
    ''',
    'tickets',
    latest_state
);
  5. Analyze Data:

Finally, query the temporary dataset to analyze your Zendesk ticket data directly within BigQuery.
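
A minimal sketch of such a query (replace YOUR_TEMP_DATASET with the dataset name returned by the function; available columns depend on the connector's output schema):

SELECT *
FROM `YOUR_TEMP_DATASET.tickets`
LIMIT 100;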

This use case demonstrates how load_api_data_into_temp_dataset simplifies data ingestion from external APIs like Zendesk into BigQuery, while prioritizing security and enabling incremental updates. This approach can be applied to other data sources supported by Airbyte connectors.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
