load_api_data_into_temp_dataset¶
Call or Deploy load_api_data_into_temp_dataset?
✅ You can call this load_api_data_into_temp_dataset bigfunction directly from your Google Cloud Project (no install required).
- This load_api_data_into_temp_dataset function is deployed in the bigfunctions GCP project, in 39 datasets covering all 39 BigQuery regions. You need to use the dataset in the same region as your own datasets (otherwise you may get a "function not found" error).
- The function is public, so it can be called by anyone. Just copy/paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, calling your internal APIs). Discover the framework
Public BigFunctions Datasets:
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
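If you are unsure which region to pick, you can look up a dataset's location in INFORMATION_SCHEMA before choosing the matching bigfunctions dataset. A minimal sketch, where your_project and your_dataset are placeholders to replace:
select schema_name, location
from `your_project`.INFORMATION_SCHEMA.SCHEMATA
where schema_name = 'your_dataset'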
Description¶
Signature
load_api_data_into_temp_dataset(source, source_config, streams, state)
Description
Load data from 250+ sources using Airbyte Python Connectors.
- The function creates a temporary dataset, accessible only to you, in the bigfunctions project.
- Airbyte Serverless will extract data from source (one of the 250+ Airbyte Python Connectors available on PyPI) using source_config (the source configuration, in the YAML format expected by Airbyte Serverless).
- It will create one table per stream (a stream is like a resource type) in the dataset, plus one table _airbyte_logs for logs and one table _airbyte_states for states.
- If you provide a state, only new data from that state is loaded.
- While running, connector logs are appended to the _airbyte_logs table (see the query sketch below).
- The examples below explain how to set the arguments.
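For instance, you can follow a running load by querying the logs table of the returned dataset. A minimal sketch, assuming the hypothetical dataset name temp__dkzodskslfdkdl used in the examples below:
select *
from `bigfunctions.temp__dkzodskslfdkdl._airbyte_logs`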
Examples¶
1. Show valid sources for the source argument by setting source to null.
You can then copy one of these sources for the source argument.
select bigfunctions.eu.load_api_data_into_temp_dataset(null, null, null, null)
select bigfunctions.us.load_api_data_into_temp_dataset(null, null, null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset(null, null, null, null)
+--------------------------------------------------------------------------------------------------------------------------+
| destination_dataset |
+--------------------------------------------------------------------------------------------------------------------------+
| # AVAILABLE SOURCES
airbyte-source-activecampaign==0.1.10
airbyte-source-adjust==0.1.11
airbyte-source-aha==0.3.10
...
|
+--------------------------------------------------------------------------------------------------------------------------+
2. Show a source_config sample at the expected format by setting source_config to null.
You can then copy the result, modify it, and provide it as the source_config argument.
select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
+--------------------------------------------------------------------------------------------------------------------------+
| destination_dataset |
+--------------------------------------------------------------------------------------------------------------------------+
| # SOURCE CONFIG
dataset_name: # REQUIRED | string | The Name of the final table to replicate this file into (should include letters, numbers dash and underscores only).
format: "csv" # REQUIRED | string | The Format of the file which should be replicated (Warning: some formats may be experimental, please refer to the docs).
reader_options: # OPTIONAL | string | This should be a string in JSON format. It depends on the chosen file format to provide additional options and tune its behavior. | Examples: {}, {"sep": " "}, {"sep": " ", "header": 0, "names": ["column1", "column2"] }
url: # REQUIRED | string | The URL path to access the file which should be replicated. | Examples: https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv, gs://my-google-bucket/data.csv, s3://gdelt-open-data/events/20190914.export.csv
provider:
  ## -------- Pick one valid structure among the examples below: --------
  storage: "HTTPS" # REQUIRED | string
  user_agent: # OPTIONAL | boolean | Add User-Agent to request
...
|
+--------------------------------------------------------------------------------------------------------------------------+
3. Encrypt Secrets!
You usually have to write secrets, such as API keys, in source_config.
We strongly advise NOT TO write your secrets in source_config in plain text.
Otherwise, they will be stored in plain text in your BigQuery logs for months.
Instead, generate an encrypted version of your secrets (an encryption tool is provided on this function's documentation page) and copy it safely into source_config, as shown in the example below.
This public bigfunction (deployed on the bigfunctions GCP project) will be able to decrypt them.
But no one else can.
select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null)
+---------------------+
| destination_dataset |
+---------------------+
| ... |
+---------------------+
4. Show available streams by setting the streams argument to null.
You can then copy one or several of these streams (separated by commas) for the streams argument.
select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
provider:
  storage: "HTTPS"
'''
, null, null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
provider:
  storage: "HTTPS"
'''
, null, null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
provider:
  storage: "HTTPS"
'''
, null, null)
+---------------------------------+
| destination_dataset |
+---------------------------------+
| # AVAILABLE STREAMS
my_stream
|
+---------------------------------+
5. Extract and load my_stream into the temporary dataset.
select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
provider:
  storage: "HTTPS"
'''
, 'my_stream', null)
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
provider:
  storage: "HTTPS"
'''
, 'my_stream', null)
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
provider:
  storage: "HTTPS"
'''
, 'my_stream', null)
+------------------------------------+
| destination_dataset |
+------------------------------------+
| bigfunctions.temp__dkzodskslfdkdl |
+------------------------------------+
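Once the load finishes, you can query the stream table in the returned dataset directly. A sketch, reusing the hypothetical dataset name from the output above:
select *
from `bigfunctions.temp__dkzodskslfdkdl.my_stream`
limit 10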
6. Provide a state to load only new data (since this state)
select bigfunctions.eu.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, 'tickets', '{...}')
select bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, 'tickets', '{...}')
select bigfunctions.europe_west1.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, 'tickets', '{...}')
+---------------------+
| destination_dataset |
+---------------------+
| ... |
+---------------------+
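To build the state argument for such an incremental call, you can read the most recent row of the _airbyte_states table written by a previous run. A sketch, assuming the hypothetical dataset name from example 5 and the state and emitted_at columns used in the use case below:
select state
from `bigfunctions.temp__dkzodskslfdkdl._airbyte_states`
order by emitted_at desc
limit 1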
Need help using load_api_data_into_temp_dataset?
The community can help! Engage the conversation on Slack.
For professional support, don't hesitate to chat with us.
Found a bug using load_api_data_into_temp_dataset?
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open the discussion with the community on Slack.
For professional support, don't hesitate to chat with us.
Use cases¶
Let's say you're a data analyst working for an e-commerce company and you want to analyze customer feedback from your Zendesk Support instance. Here's how load_api_data_into_temp_dataset could help:
- Discover Available Connectors and Configuration:
You start by checking if a Zendesk connector exists and what configuration parameters it requires:
SELECT bigfunctions.us.load_api_data_into_temp_dataset(null, null, null, null);
This would list all available Airbyte connectors, including (hopefully) airbyte-source-zendesk-support. Then, you'd run:
SELECT bigfunctions.us.load_api_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', null, null, null); -- Replace with actual version
This provides a sample source_config YAML showing the required fields, like credentials.access_token.
- Encrypt your Zendesk API Token:
Use the encryption tool mentioned above to encrypt your Zendesk access token. This crucial step protects your sensitive information. Replace kdoekdswlxzapdldpzlfpfd in the example with your actual encrypted token.
- Load Zendesk Data to a Temporary Dataset:
Now, load data from the 'tickets' stream (assuming you are interested in support tickets) into a temporary BigQuery dataset:
SELECT bigfunctions.us.load_api_data_into_temp_dataset(
'airbyte-source-zendesk-support==2.6.10', -- Replace with actual version
'''
credentials:
  access_token: ENCRYPTED_SECRET(YOUR_ENCRYPTED_TOKEN)
start_date: '2023-01-01T00:00:00Z' # Optional: pull data from a specific date (YAML comment, since this is inside the config string)
''',
'tickets', -- Specify the 'tickets' stream
null -- Initial load, no state provided
);
This creates a temporary dataset containing a table tickets with your Zendesk ticket data, as well as _airbyte_logs and _airbyte_states tables.
- Incremental Loads:
After the initial load, you can perform incremental updates by retrieving the latest state from the _airbyte_states table and using it in subsequent calls. This ensures you only pull new or updated ticket data. Example:
DECLARE latest_state STRING; -- In a BigQuery script, DECLARE must come before other statements
-- Get the latest state (replace YOUR_TEMP_DATASET with the dataset returned by the initial load)
SET latest_state = (SELECT state FROM `YOUR_TEMP_DATASET._airbyte_states` ORDER BY emitted_at DESC LIMIT 1);
SELECT bigfunctions.us.load_api_data_into_temp_dataset(
'airbyte-source-zendesk-support==2.6.10', -- Replace with actual version
'''
credentials:
  access_token: ENCRYPTED_SECRET(YOUR_ENCRYPTED_TOKEN)
''',
'tickets',
latest_state
);
- Analyze Data:
Finally, query the temporary dataset to analyze your Zendesk ticket data directly within BigQuery.
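For example, a hypothetical analysis query; created_at is an assumed column, as the actual columns depend on the Zendesk tickets stream schema:
-- Hypothetical: daily ticket volume (adjust column names to the actual stream schema)
SELECT DATE(created_at) AS day, COUNT(*) AS nb_tickets
FROM `YOUR_TEMP_DATASET.tickets`
GROUP BY day
ORDER BY day;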
This use case demonstrates how load_api_data_into_temp_dataset simplifies data ingestion from external APIs like Zendesk into BigQuery, while prioritizing security and enabling incremental updates. The same approach applies to any other data source supported by Airbyte connectors.
Spread the word¶
BigFunctions is fully open-source. Help make it a success by spreading the word!