load_api_data¶
Call or Deploy `load_api_data`?

✅ You can call this `load_api_data` bigfunction directly from your Google Cloud Project (no install required).

- This `load_api_data` function is deployed in the `bigfunctions` GCP project in 39 datasets, one for each of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may get a "function not found" error).
- The function is public, so it can be called by anyone. Just copy/paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, functions calling your internal APIs). Discover the framework.
Public BigFunctions Datasets:

| Region | Dataset |
|---|---|
| eu | bigfunctions.eu |
| us | bigfunctions.us |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1 | bigfunctions.asia_east1 |
| ... | ... |
Description¶
Signature
load_api_data(source, source_config, streams, destination_dataset)
Description
Load data from 250+ sources using Airbyte Python Connectors.

- The function creates a temporary dataset, accessible only to you, in the `bigfunctions` project.
- Airbyte Serverless extracts data from `source` (one of the 250+ Airbyte Python Connectors available on PyPI) using `source_config` (the source configuration in the YAML format expected by Airbyte Serverless).
- It creates one table per stream (a stream is like a resource type) in the dataset, plus one table `_airbyte_logs` for logs and one table `_airbyte_states` for states.
- The data is then moved from the temporary dataset and appended to `destination_dataset`. The tables of the temporary dataset are then deleted.
- If you call this function several times, it first reads the latest state from the `destination_dataset._airbyte_states` table so that only new data is extracted and loaded.
- The examples below explain how to set the arguments.
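For instance, after a first successful load you can inspect the saved states directly in your destination dataset. This is a sketch with placeholder names (`your_project.your_dataset`); the exact column layout of `_airbyte_states` depends on the Airbyte Serverless version, hence the generic `select *`:

```sql
-- Inspect the Airbyte states saved in your destination dataset.
-- `your_project.your_dataset` is a placeholder: replace it with your own
-- project and dataset before running.
select *
from `your_project.your_dataset._airbyte_states`
limit 10;
```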
Examples¶
1. Show valid sources for the `source` argument by setting `source` to `null`.

You can then copy one of these sources for the `source` argument.
call bigfunctions.eu.load_api_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_api_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_api_data(null, null, null, null);
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| # AVAILABLE SOURCES |
| |
| airbyte-source-activecampaign==0.1.10 |
| airbyte-source-adjust==0.1.11 |
| airbyte-source-aha==0.3.10 |
| ... |
+----------------------------------------+
2. Show a `source_config` sample at the expected format by setting `source_config` to `null`.

You can then copy the result, modify it, and provide it as the `source_config` argument.
call bigfunctions.eu.load_api_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_api_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_api_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
+----------------------------------------------------+
| result |
+----------------------------------------------------+
| # SOURCE CONFIG                                    |
| dataset_name: # REQUIRED | string | The Name of... |
| format: "csv" # REQUIRED | string | The Format ... |
| reader_options: # OPTIONAL | string | This shou... |
| url: # REQUIRED | string | The URL path to acce... |
| provider: |
| ## -------- Pick one valid structure among th... |
| storage: "HTTPS" # REQUIRED | string |
| user_agent: # OPTIONAL | boolean | Add User-A... |
| ... |
+----------------------------------------------------+
3. Encrypt your secrets!

You usually have to write secrets such as API keys in `source_config`. We strongly advise NOT to write your secrets in `source_config` in plain text; otherwise, they will be stored in plain text in your BigQuery logs for months. Instead, you can use the encryption snippet provided in the documentation to generate an encrypted version of your secrets, which you can safely copy into `source_config` as shown in the example below. This public bigfunction (deployed in the `bigfunctions` GCP project) will be able to decrypt them, but no one else can.
call bigfunctions.eu.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
...
4. Show available streams by setting the `streams` argument to `null`.

You can then copy one or several of these streams (separated by commas) for the `streams` argument.
call bigfunctions.eu.load_api_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_api_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_api_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| # AVAILABLE STREAMS |
| |
| my_stream |
+----------------------------------------+
5. Extract and load `my_stream` into `your_project.your_dataset`.
call bigfunctions.eu.load_api_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
call bigfunctions.us.load_api_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
call bigfunctions.europe_west1.load_api_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| ok |
+----------------------------------------+
Need help using `load_api_data`?

The community can help! Engage the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using `load_api_data`?

If the function does not work as expected, please:
- report a bug so that it can be improved,
- or open the discussion with the community on Slack.

For professional support, don't hesitate to chat with us.
Use cases¶
Let's say you want to analyze customer feedback from your Zendesk Support instance in BigQuery. You can use the `load_api_data` function to achieve this without manual data extraction and uploads.

1. Identify the Source and Explore Configuration:
- Source: `airbyte-source-zendesk-support==2.6.10` (or a later compatible version)

2. Generate an Encrypted Secret for your Zendesk Access Token:
- Follow the instructions in the documentation to encrypt your Zendesk access token. This ensures your credentials aren't exposed in logs. Let's assume the encrypted secret is `ENCRYPTED_SECRET(your_encrypted_token)`.

3. Determine Available Streams:
- Call `load_api_data` with `streams` set to `null` to see what data Zendesk makes available:
call bigfunctions.us.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(your_encrypted_token)
''', null, null);
select * from bigfunction_result;
This returns the available streams, such as `tickets`, `users`, `organizations`, etc.

4. Select the Desired Streams and Destination:
- Decide which streams you need (e.g., `tickets`, `users`).
- Choose your BigQuery destination dataset, for example: `your_project.your_zendesk_data`.
5. Load the Data:
- Call `load_api_data` with the correct parameters:
call bigfunctions.us.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(your_encrypted_token)
''', 'tickets,users', 'your_project.your_zendesk_data');
select * from bigfunction_result;
This will:
- Create temporary tables within the `bigfunctions` project.
- Extract data from the `tickets` and `users` streams in Zendesk.
- Load the extracted data into the temporary tables.
- Move the data from the temporary tables to your specified destination dataset (`your_project.your_zendesk_data`).
- Clean up the temporary tables and resources.

Result: You now have Zendesk ticket and user data in your BigQuery dataset, ready for analysis. Subsequent calls will incrementally load new or updated data based on the state saved in the `_airbyte_states` table.
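Once loaded, the data can be queried like any other BigQuery table. As a sketch only: Airbyte raw tables commonly store each record as JSON in an `_airbyte_data` column alongside an `_airbyte_extracted_at` timestamp, but both column names are assumptions here; verify them against your actual table schema before adapting this query:

```sql
-- Count loaded Zendesk tickets per extraction day.
-- `your_project.your_zendesk_data` is a placeholder, and
-- `_airbyte_extracted_at` is an assumed column name based on the usual
-- Airbyte raw-table layout: check your table schema first.
select
  date(_airbyte_extracted_at) as extraction_date,
  count(*) as nb_tickets
from `your_project.your_zendesk_data.tickets`
group by extraction_date
order by extraction_date;
```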
Key Improvements over other Methods:
- Simplified Data Integration: No need to build custom ETL pipelines or manage infrastructure.
- Wide Connector Support: Access data from 250+ sources through Airbyte.
- Incremental Loads: Avoids redundant data processing by loading only new or changed data.
- Secure Credential Handling: Encryption protects sensitive information.
- Serverless: Leverages BigQuery's serverless architecture for scalability and cost-efficiency.
This example showcases how `load_api_data` streamlines data ingestion from external APIs into BigQuery, enabling efficient data analysis and reporting. You can adapt this approach to integrate data from various other sources supported by Airbyte connectors.
Spread the word¶
BigFunctions is fully open-source. Help make it a success by spreading the word!