load_saas_data

load_saas_data(source, source_config, streams, destination_dataset)

Description

Load SaaS data from 250+ sources using Airbyte Python Connectors.

  • The function creates a temporary dataset, accessible only to you, in the bigfunctions project.
  • Airbyte Serverless extracts data from source (one of the 250+ Airbyte Python Connectors available on PyPI) using source_config (the source configuration, in the YAML format expected by Airbyte Serverless).
  • It creates one table per stream (a stream is like a resource type) in the temporary dataset, plus one table _airbyte_logs for logs and one table _airbyte_states for states.
  • The data is then moved from the temporary dataset and appended to destination_dataset. The tables of the temporary dataset are deleted.
  • If you call this function several times, it starts by reading the latest state from the destination_dataset._airbyte_states table so that it only extracts and loads new data.
  • Examples below explain how to set the arguments.
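
The flow described above can be sketched in plain Python. This is a minimal in-memory illustration, not the function's actual implementation: the `Dataset` class, `fake_extract`, the `id`-based cursor, and the shape of the state rows are all assumptions made for the example. Only the table names `_airbyte_logs` and `_airbyte_states` come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """In-memory stand-in for a BigQuery dataset: table name -> list of rows."""
    tables: dict = field(default_factory=dict)

    def append(self, table, rows):
        self.tables.setdefault(table, []).extend(rows)


def fake_extract(records, state):
    """Hypothetical extractor: returns the records newer than the cursor in `state`."""
    cursor = state.get("cursor", 0)
    new = [r for r in records if r["id"] > cursor]
    new_state = {"cursor": max([cursor] + [r["id"] for r in new])}
    return new, new_state


def load(records, stream, destination):
    # 1. Read the latest state saved by a previous run, if any.
    states = destination.tables.get("_airbyte_states", [])
    state = states[-1] if states else {}

    # 2. Extract into a temporary dataset: one table per stream, plus logs and states.
    temp = Dataset()
    rows, new_state = fake_extract(records, state)
    temp.append(stream, rows)
    temp.append("_airbyte_logs", [{"message": f"extracted {len(rows)} rows"}])
    temp.append("_airbyte_states", [new_state])

    # 3. Append every temporary table to the destination, then drop the temporary tables.
    for table, table_rows in temp.tables.items():
        destination.append(table, table_rows)
    temp.tables.clear()
    return len(rows)


source_records = [{"id": 1}, {"id": 2}]
destination = Dataset()
first = load(source_records, "my_stream", destination)
source_records.append({"id": 3})
second = load(source_records, "my_stream", destination)
print(first, second)  # 2 1
```

The second call loads a single row because the state stored by the first call tells the extractor where to resume.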

Encrypt your secrets! ⚠️

Do NOT write secrets in plain text in your SQL queries!

Otherwise, anyone with access to your BigQuery logs can read them.

Instead, generate an encrypted version of your secret that you can safely share.

Enter a secret value to encrypt below, along with the emails of the users who are authorized to use it. This generates an encrypted version that you can paste into the arguments of your function, exactly as you would pass the plain-text version. If a user who is not in the authorized users list tries to use the encrypted version, the function raises a permission error. In addition, the encrypted version can only be used with this function, load_saas_data.

Encrypt a secret

How secret encryption works

Technically, this encryption system uses the same mechanism that is used to transfer data securely over the internet. It relies on a pair of keys: one public and one private.

The public key (contained in this web page) is used to encrypt a text. Only the corresponding private key can decrypt that text. The private key is stored in a secret manager and is only accessible to this function. Thus, this function (and this function only) can decrypt it.

Moreover, the function checks that its caller belongs to the list of authorized users that you gave at encryption time.

Thanks to this:

  • Nobody but this function will be able to decrypt it.
  • Nobody but authorized users can use the encrypted version in a function.
  • No function but the function load_saas_data can decrypt it.
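
To make these three guarantees concrete, here is a toy sketch using textbook RSA with tiny demo numbers. It is an assumption-laden illustration only: the page does not describe the real cipher, key size, or payload format, and the names `encrypt`, `decrypt_for`, and the JSON fields are hypothetical. Textbook RSA with such small keys is not secure.

```python
import json

# Textbook RSA with tiny, well-known demo numbers (p=61, q=53). NOT secure.
# The public key is (N, E); the private exponent D stays with the function.
N, E, D = 3233, 17, 2753


def encrypt(secret, authorized_users, function_name):
    """Public-key side: bundle the secret with its usage rules, then encrypt."""
    payload = json.dumps(
        {"secret": secret, "users": authorized_users, "function": function_name}
    )
    return [pow(byte, E, N) for byte in payload.encode("utf-8")]


def decrypt_for(ciphertext, caller, function_name):
    """Private-key side: decrypt, then enforce the rules stored in the payload."""
    payload = bytes(pow(c, D, N) for c in ciphertext).decode("utf-8")
    data = json.loads(payload)
    if data["function"] != function_name:
        raise PermissionError("secret was encrypted for another function")
    if caller not in data["users"]:
        raise PermissionError(f"{caller} is not an authorized user")
    return data["secret"]


token = encrypt("my-api-key", ["alice@example.com"], "load_saas_data")
print(decrypt_for(token, "alice@example.com", "load_saas_data"))  # my-api-key
```

Because the authorized users and the function name travel inside the encrypted payload, they cannot be altered without the private key, which is what lets the checks be trusted.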

Examples

Call or Deploy load_saas_data?
Call load_saas_data directly

The easiest way to use bigfunctions

  • The load_saas_data function is deployed in 39 public datasets, covering all 39 BigQuery regions.
  • It can be called by anyone. Just copy and paste the examples below into your BigQuery console. It just works!
  • You need to use the public dataset located in the same region as your own datasets; otherwise you may get a "function not found" error.

Public BigFunctions Datasets

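+--------------+---------------------------+
| Region       | Dataset                   |
+--------------+---------------------------+
| eu           | bigfunctions.eu           |
| us           | bigfunctions.us           |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1   | bigfunctions.asia_east1   |
| ...          | ...                       |
+--------------+---------------------------+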
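
The rows listed above suggest a naming pattern: the dataset is the region name in lowercase with hyphens replaced by underscores. The helper below is a small sketch of that pattern. `public_dataset` is a hypothetical name, and the rule is inferred from the four rows shown, so check the result against the full list for other regions.

```python
def public_dataset(region):
    """Guess the public dataset for a region from the pattern in the listed rows."""
    return "bigfunctions." + region.lower().replace("-", "_")


for region in ["eu", "us", "europe-west1", "asia-east1"]:
    print(region, "->", public_dataset(region))
```
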
Deploy load_saas_data in your project

Why deploy?

  • You may prefer to deploy load_saas_data in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

load_saas_data function can be deployed with:

pip install bigfunctions
bigfun get load_saas_data
bigfun deploy load_saas_data

1. Show valid sources for source argument by setting source to null

You can then copy one of these sources for source argument.

call bigfunctions.eu.load_saas_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data(null, null, null, null);
select * from bigfunction_result;

+----------------------------------------+
|                 result                 |
+----------------------------------------+
| # AVAILABLE SOURCES                    |
|                                        |
| airbyte-source-activecampaign==0.1.10  |
| airbyte-source-adjust==0.1.11          |
| airbyte-source-aha==0.3.10             |
| ...                                    |
+----------------------------------------+


2. Show a sample source_config in the expected format by setting source_config to null.

You can then copy the result, modify it and provide it as source_config argument.

call bigfunctions.eu.load_saas_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;

+----------------------------------------------------+
|                 result                             |
+----------------------------------------------------+
| # SOURCE CONFIG                                    |
|                                                    |
| dataset_name: # REQUIRED | string | The Name of... |
| format: "csv" # REQUIRED | string | The Format ... |
| reader_options: # OPTIONAL | string | This shou... |
| url: # REQUIRED | string | The URL path to acce... |
| provider:                                          |
|   ## -------- Pick one valid structure among th... |
|   storage: "HTTPS" # REQUIRED | string             |
|   user_agent: # OPTIONAL | boolean | Add User-A... |
| ...                                                |
+----------------------------------------------------+


3. Provide source_config with encrypted secrets.

call bigfunctions.eu.load_saas_data('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null);
select * from bigfunction_result;

...
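
Before pasting a source_config into a query, it can help to check that no secret is still in plain text. The sketch below scans a config string for secret-like keys whose value is not wrapped in ENCRYPTED_SECRET(...). The list of key names is an assumption chosen for illustration, and the line-based matching is a simplification that does not parse YAML fully.

```python
import re

# Hypothetical list of key names that usually hold secrets; adjust for your source.
SECRET_KEYS = ("access_token", "api_key", "password", "client_secret")
LINE = re.compile(r"^\s*(?P<key>[\w-]+)\s*:\s*(?P<value>.*?)\s*$")


def plain_text_secrets(source_config):
    """Return the secret-like keys whose value is not wrapped in ENCRYPTED_SECRET(...)."""
    offenders = []
    for line in source_config.splitlines():
        match = LINE.match(line)
        if not match or match["key"] not in SECRET_KEYS:
            continue
        value = match["value"]
        if value and not re.fullmatch(r"ENCRYPTED_SECRET\(.+\)", value):
            offenders.append(match["key"])
    return offenders


safe = """
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
"""
unsafe = """
  credentials:
    access_token: my-plain-token
"""
print(plain_text_secrets(safe))    # []
print(plain_text_secrets(unsafe))  # ['access_token']
```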

4. Show available streams by setting streams argument to null.

You can then copy one or several of these streams (separate them with commas) for streams argument.

call bigfunctions.eu.load_saas_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null);
select * from bigfunction_result;

+----------------------------------------+
|                 result                 |
+----------------------------------------+
| # AVAILABLE STREAMS                    |
|                                        |
| my_stream                              |
+----------------------------------------+


5. Extract and load my_stream into your_project.your_dataset.

call bigfunctions.eu.load_saas_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;

+----------------------------------------+
|                 result                 |
+----------------------------------------+
| ok                                     |
+----------------------------------------+
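
The five steps above all use the same statement with more and more arguments filled in. The sketch below assembles that statement from Python values, with None rendered as null and several streams joined with commas. `build_call` and `sql_string` are hypothetical helpers, not part of BigFunctions, and the escaping is deliberately strict: values that would need escaping are rejected rather than handled.

```python
import textwrap


def sql_string(value):
    """Render a Python value as a BigQuery triple-quoted string literal, or null."""
    if value is None:
        return "null"
    if "'''" in value or "\\" in value or value.endswith("'"):
        raise ValueError("value contains characters this sketch does not escape")
    return "'''" + value + "'''"


def build_call(dataset, source=None, source_config=None, streams=None, destination=None):
    """Assemble the CALL statement; arguments left as None become null."""
    streams_arg = ",".join(streams) if streams else None
    config = textwrap.dedent(source_config).strip("\n") if source_config else None
    args = ", ".join(sql_string(a) for a in (source, config, streams_arg, destination))
    return f"call {dataset}.load_saas_data({args});\nselect * from bigfunction_result;"


config = """
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
  provider:
    storage: "HTTPS"
"""
# Step 1: list sources. Step 5: extract and load one stream.
print(build_call("bigfunctions.eu"))
print(build_call("bigfunctions.eu", "airbyte-source-file==0.5.13", config,
                 ["my_stream"], "your_project.your_dataset"))
```

Printing the statement rather than running it keeps the sketch free of any BigQuery client dependency; the output can be pasted into the BigQuery console.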


Need help or Found a bug using load_saas_data?
Get help using load_saas_data

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about load_saas_data

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

We also provide professional support.

Spread the word!

BigFunctions is fully open-source. Help make it a success by spreading the word!
