load_saas_data¶
load_saas_data(source, source_config, streams, destination_dataset)
Description¶
Load SaaS data from 250+ sources using Airbyte's Python connectors, run through Airbyte Serverless.
- The function creates a temporary dataset (accessible only to you) in the bigfunctions project.
- It extracts data from source using source_config (the source configuration, in the YAML format expected by Airbyte Serverless) into:
  - one table per stream (a stream is like a resource type),
  - a table for logs: _airbyte_logs,
  - a table for states: _airbyte_states (to track where the extraction stopped and enable incremental extraction).
- The data is then moved from the temporary dataset and appended to destination_dataset.
- The temporary dataset is then deleted.
- If you call this function several times, it starts by reading the latest state from the destination_dataset._airbyte_states table so that it only extracts and loads new data.
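The incremental behaviour described in the last bullet can be illustrated with a small, self-contained simulation. This is only a sketch of the general idea (keep a cursor in a state table, extract what is newer, append it). It is not the function's actual implementation, and the names `incremental_load` and `updated_at`, as well as the in-memory lists standing in for BigQuery tables, are hypothetical.

```python
def incremental_load(source_records, destination, states):
    """Append to `destination` only the records newer than the last saved state.

    `source_records` and `destination` are lists of dicts; `states` is a list
    of cursor values standing in for the `_airbyte_states` table (assumption:
    the cursor is a single comparable `updated_at` value).
    """
    # Start from the latest state, if a previous run left one.
    last_cursor = states[-1] if states else None

    # "Temporary dataset": the records extracted during this run only.
    temporary = [
        record for record in source_records
        if last_cursor is None or record["updated_at"] > last_cursor
    ]

    # Move the extracted records to the destination (append, never overwrite).
    destination.extend(temporary)

    # Save the new state so that the next run resumes from here.
    if temporary:
        states.append(max(record["updated_at"] for record in temporary))

    # The temporary list is discarded when the function returns.
    return len(temporary)
```

Calling it twice on the same source with unchanged data loads everything the first time and nothing the second time, which is the behaviour the state table is meant to provide.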
Usage¶
Call or Deploy load_saas_data?
Call load_saas_data directly
The easiest way to use bigfunctions
- The load_saas_data function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
- It can be called by anyone. Just copy and paste the examples below into your BigQuery console. It just works!
- You need to use the dataset in the same region as your own datasets; otherwise you may get a "function not found" error.
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
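The rows shown above suggest a simple naming pattern: the dataset is `bigfunctions.` followed by the region name with hyphens replaced by underscores. A minimal Python sketch of that mapping, assuming the pattern also holds for the regions elided from the table (`public_dataset` is a hypothetical helper, not part of BigFunctions):

```python
def public_dataset(region: str) -> str:
    """Return the public BigFunctions dataset for a BigQuery region.

    Assumption: every region follows the pattern visible in the table,
    i.e. the lowercase region name with "-" replaced by "_".
    """
    return "bigfunctions." + region.strip().lower().replace("-", "_")


print(public_dataset("europe-west1"))  # bigfunctions.europe_west1
```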
Deploy load_saas_data in your project
Why deploy?
- You may prefer to deploy load_saas_data in your own project to build and manage your own catalog of functions.
- This is particularly useful if you want to create private functions (for example, calling your internal APIs).
- Get started by reading the framework page.
Deployment
The load_saas_data function can be deployed with:
pip install bigfunctions
bigfun get load_saas_data
bigfun deploy load_saas_data
Keep the secrets safe!
Do NOT write secrets in plain text in your SQL queries!
Otherwise, anyone with access to your BigQuery logs can read and use them.
Instead, generate an encrypted version that you can safely share:
- Enter a secret value below along with the emails of the users who are authorized to use it (separated by commas).
- Click on Encrypt Secret.
- The browser (no server is called) will generate an encrypted version and copy it to the clipboard.
- Paste the encrypted secret into the arguments of your function exactly as you would the plain-text version.
- The bigfunction will decrypt it and check that the calling user is authorized.
More on secret encryption
Technically, this encryption system uses the same mechanism that protects data transferred over the internet: a pair of public and private keys.
The public key (contained in this web page) is used to encrypt a text. Only the corresponding private key can decrypt that text. The private key is stored in a secret manager and is only accessible to this function. Thus, this function (and this function only) can decrypt it.
Moreover, the function checks that its caller belongs to the list of authorized users that you gave at encryption time.
Thanks to this:
- Nobody but this function will be able to decrypt it.
- Nobody but authorized users can use the encrypted version in a function.
- No function but load_saas_data can decrypt it.
Examples¶
1. Show valid sources for the source argument by setting source to null.
You can then copy one of these sources for the source argument.
call bigfunctions.eu.load_saas_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data(null, null, null, null);
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| # AVAILABLE SOURCES |
| |
| airbyte-source-activecampaign==0.1.10 |
| airbyte-source-adjust==0.1.11 |
| airbyte-source-aha==0.3.10 |
| ... |
+----------------------------------------+
2. Show a sample source_config in the expected format by setting source_config to null.
You can then copy the result, modify it, and provide it as the source_config argument.
call bigfunctions.eu.load_saas_data("airbyte-source-file==0.5.13", null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data("airbyte-source-file==0.5.13", null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data("airbyte-source-file==0.5.13", null, null, null);
select * from bigfunction_result;
+----------------------------------------------------+
| result |
+----------------------------------------------------+
| # SOURCE CONFIG |
| |
| dataset_name: # REQUIRED | string | The Name of... |
| format: "csv" # REQUIRED | string | The Format ... |
| reader_options: # OPTIONAL | string | This shou... |
| url: # REQUIRED | string | The URL path to acce... |
| provider: |
| ## -------- Pick one valid structure among th... |
| storage: "HTTPS" # REQUIRED | string |
| user_agent: # OPTIONAL | boolean | Add User-A... |
| ... |
+----------------------------------------------------+
3. Provide source_config with encrypted secrets:
call bigfunctions.eu.load_saas_data("airbyte-source-zendesk-support==2.6.10", '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data("airbyte-source-zendesk-support==2.6.10", '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data("airbyte-source-zendesk-support==2.6.10", '''
credentials:
  access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
...
4. Show available streams by setting the streams argument to null.
You can then copy one or several of these streams (separated by commas) for the streams argument.
call bigfunctions.eu.load_saas_data("airbyte-source-file==0.5.13", '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
  storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data("airbyte-source-file==0.5.13", '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
  storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data("airbyte-source-file==0.5.13", '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
  storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| # AVAILABLE STREAMS |
| |
| my_stream |
+----------------------------------------+
5. Extract and load my_stream into your_project.your_dataset.
call bigfunctions.eu.load_saas_data("airbyte-source-file==0.5.13", '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
  storage: "HTTPS"
'''
, "my_stream", "your_project.your_dataset");
select * from bigfunction_result;
call bigfunctions.us.load_saas_data("airbyte-source-file==0.5.13", '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
  storage: "HTTPS"
'''
, "my_stream", "your_project.your_dataset");
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data("airbyte-source-file==0.5.13", '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
  storage: "HTTPS"
'''
, "my_stream", "your_project.your_dataset");
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| ok |
+----------------------------------------+
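The calls in these examples all share one shape, so they can be generated from Python when the arguments come from a script. The helper below is a hedged sketch: `build_call` and its parameters are hypothetical names, it only builds the SQL text (running it requires a BigQuery client of your choice), and it rejects arguments containing characters that would need escaping rather than trying to escape them.

```python
def build_call(dataset, source=None, source_config=None,
               streams=None, destination_dataset=None):
    """Build a `call <dataset>.load_saas_data(...)` statement as text.

    `streams` is a list of stream names, joined with commas as described in
    example 4. Any argument left as None is passed as SQL null.
    """
    def check(value, forbidden):
        # Refuse values that would need escaping instead of guessing at it.
        for token in forbidden:
            if token in value:
                raise ValueError(f"{token!r} is not supported in {value!r}")
        return value

    def short(value):
        # Single-line arguments use a double-quoted string literal.
        if value is None:
            return "null"
        return '"' + check(value, ['"', "\\", "\n"]) + '"'

    def block(value):
        # The YAML config uses a triple-quoted string, as in the examples.
        if value is None:
            return "null"
        return "'''\n" + check(value, ["'''", "\\"]).strip("\n") + "\n'''"

    rendered = ", ".join([
        short(source),
        block(source_config),
        short(",".join(streams) if streams else None),
        short(destination_dataset),
    ])
    return (
        f"call {dataset}.load_saas_data({rendered});\n"
        "select * from bigfunction_result;"
    )
```

Called with only a dataset, for instance `build_call("bigfunctions.eu")`, it produces the statement from example 1. Building SQL from strings is only reasonable here because the inputs are your own configuration; untrusted input should not be passed through such a helper.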
Need help or Found a bug?
Get help using load_saas_data
The community can help! Join the conversation on Slack.
We also provide professional support.
Report a bug about load_saas_data
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
We also provide professional support.