
load_api_data

Call or Deploy load_api_data?

✅ You can call this load_api_data bigfunction directly from your Google Cloud Project (no install required).

  • This load_api_data function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset located in the same region as your data (otherwise you may get a "function not found" error).
  • The function is public, so anyone can call it. Just copy/paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, ones calling your internal APIs). Discover the framework.

Public BigFunctions Datasets:

+--------------+---------------------------+
| Region       | Dataset                   |
+--------------+---------------------------+
| eu           | bigfunctions.eu           |
| us           | bigfunctions.us           |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1   | bigfunctions.asia_east1   |
| ...          | ...                       |
+--------------+---------------------------+

Signature

load_api_data(source, source_config, streams, destination_dataset)

Description

Load data from 250+ sources using Airbyte Python Connectors.

  • The function creates a temporary dataset, accessible only to you, in the bigfunctions project.
  • Airbyte Serverless will extract data from source (one of the 250+ Airbyte Python Connectors available on PyPI) using source_config (the source configuration, in the YAML format expected by Airbyte Serverless).
  • It will create one table per stream (a stream is like a resource type) in that dataset, plus one table _airbyte_logs for logs and one table _airbyte_states for states.
  • The data is then moved from the temporary dataset and appended to destination_dataset. The tables of the temporary dataset are then deleted.
  • If you call this function several times, it will start by reading the latest state from the destination_dataset._airbyte_states table so that it only extracts and loads new data.
  • Examples below explain how to set the arguments.
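
As an illustration of this incremental behavior, here is a minimal sketch of how you could inspect those two tables yourself (your_project.your_dataset is a placeholder for a destination dataset that has already received at least one load; the exact schemas depend on the Airbyte Serverless version):

-- Saved states: the next call resumes from the latest of these rows.
select * from your_project.your_dataset._airbyte_states limit 10;
-- Extraction logs of past runs.
select * from your_project.your_dataset._airbyte_logs limit 10;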

Examples

All examples below use the eu dataset. Replace bigfunctions.eu with the dataset of your region (see the table above).

1. Show valid sources for the source argument by setting source to null.

You can then copy one of these sources to use as the source argument.

call bigfunctions.eu.load_api_data(null, null, null, null);
select * from bigfunction_result;

+----------------------------------------+
|                 result                 |
+----------------------------------------+
| # AVAILABLE SOURCES                    |
|                                        |
| airbyte-source-activecampaign==0.1.10  |
| airbyte-source-adjust==0.1.11          |
| airbyte-source-aha==0.3.10             |
| ...                                    |
+----------------------------------------+


2. Show a source_config sample in the expected format by setting source_config to null.

You can then copy the result, modify it, and provide it as the source_config argument.

call bigfunctions.eu.load_api_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;

+----------------------------------------------------+
|                       result                       |
+----------------------------------------------------+
| # SOURCE CONFIG                                    |
|                                                    |
| dataset_name: # REQUIRED | string | The Name of... |
| format: "csv" # REQUIRED | string | The Format ... |
| reader_options: # OPTIONAL | string | This shou... |
| url: # REQUIRED | string | The URL path to acce... |
| provider:                                          |
|   ## -------- Pick one valid structure among th... |
|   storage: "HTTPS" # REQUIRED | string             |
|   user_agent: # OPTIONAL | boolean | Add User-A... |
| ...                                                |
+----------------------------------------------------+


3. Encrypt Secrets!

You usually have to write secrets, such as API keys, in source_config.

We strongly advise NOT to write your secrets in source_config in plain text.

Otherwise, they will be stored in plain text in your BigQuery logs for months.

Instead, use the encryption tool on this function's documentation page to generate an encrypted version of your secrets, which you can safely copy into source_config as shown in the example below. This public bigfunction (deployed on the bigfunctions GCP project) will be able to decrypt them, but no one else can.

call bigfunctions.eu.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null);
select * from bigfunction_result;

...

4. Show available streams by setting the streams argument to null.

You can then copy one or several of these streams (separated by commas) to use as the streams argument.

call bigfunctions.eu.load_api_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null);
select * from bigfunction_result;

+----------------------------------------+
|                 result                 |
+----------------------------------------+
| # AVAILABLE STREAMS                    |
|                                        |
| my_stream                              |
+----------------------------------------+


5. Extract and load my_stream into your_project.your_dataset.

call bigfunctions.eu.load_api_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;

+----------------------------------------+
|                 result                 |
+----------------------------------------+
| ok                                     |
+----------------------------------------+
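
Once the call returns ok, the stream is available as a table in your destination dataset. As a quick follow-up (a sketch reusing the names from this example; the loaded table's schema depends on the connector), you can query it directly:

-- my_stream is the stream loaded by the call above.
select * from your_project.your_dataset.my_stream limit 10;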


Need help using load_api_data?

The community can help! Join the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using load_api_data?

If the function does not work as expected, please

  • report a bug so that it can be improved,
  • or open a discussion with the community on Slack.

For professional support, don't hesitate to chat with us.

Use cases

Let's say you want to analyze customer feedback from your Zendesk Support instance in BigQuery. You can use the load_api_data function to achieve this without manual data extraction and uploads.

1. Identify the Source and Explore Configuration:

  • Source: airbyte-source-zendesk-support==2.6.10 (or a later compatible version)

2. Generate Encrypted Secret for your Zendesk Access Token:

  • Follow the instructions in the documentation to encrypt your Zendesk access token. This ensures your credentials aren't exposed in logs. Let's assume the encrypted secret is ENCRYPTED_SECRET(your_encrypted_token).

3. Determine Available Streams:

  • Call load_api_data with streams set to null to see what data Zendesk makes available:

call bigfunctions.us.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(your_encrypted_token)
''', null, null);
select * from bigfunction_result;

  • This will return a list of available streams, such as tickets, users, organizations, etc.

4. Select Desired Streams and Destination:

  • Decide which streams you need (e.g., tickets, users).
  • Choose your BigQuery destination dataset. For example: your_project.your_zendesk_data

5. Load the Data:

  • Call load_api_data with the correct parameters:

call bigfunctions.us.load_api_data('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(your_encrypted_token)
''', 'tickets,users', 'your_project.your_zendesk_data');
select * from bigfunction_result;

This will:

  • Create temporary tables within the bigfunctions project.
  • Extract data from the tickets and users streams in Zendesk.
  • Load the extracted data into the temporary tables.
  • Move the data from the temporary tables to your specified destination dataset (your_project.your_zendesk_data).
  • Clean up the temporary tables and resources.

Result: You now have Zendesk ticket and user data in your BigQuery dataset, ready for analysis. Subsequent calls will incrementally load new or updated data based on the state saved in the _airbyte_states table.
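
As a quick sanity check of such a load (a sketch reusing the names from this use case; table names follow the stream names, and exact schemas depend on the connector), you could count the loaded records:

-- Row counts for the two streams loaded above.
select 'tickets' as stream, count(*) as nb_rows
from your_project.your_zendesk_data.tickets
union all
select 'users', count(*)
from your_project.your_zendesk_data.users;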

Key Improvements over other Methods:

  • Simplified Data Integration: No need to build custom ETL pipelines or manage infrastructure.
  • Wide Connector Support: Access data from 250+ sources through Airbyte.
  • Incremental Loads: Avoids redundant data processing by loading only new or changed data.
  • Secure Credential Handling: Encryption protects sensitive information.
  • Serverless: Leverages BigQuery's serverless architecture for scalability and cost-efficiency.

This example showcases how load_api_data streamlines data ingestion from external APIs into BigQuery, enabling efficient data analysis and reporting. You can adapt this approach to integrate data from various other sources supported by Airbyte connectors.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
