load_saas_data
load_saas_data(source, source_config, streams, destination_dataset)
Description
Load SaaS data from 250+ sources using Airbyte Python Connectors.
- The function creates a temporary dataset, accessible only to you, in the bigfunctions project.
- Airbyte Serverless extracts data from source (one of the 250+ Airbyte Python Connectors available on PyPI) using source_config (the source configuration, in the YAML format expected by Airbyte Serverless).
- It creates one table per stream (a stream is similar to a resource type) in the dataset, plus one table _airbyte_logs for logs and one table _airbyte_states for states.
- The data is then moved from the temporary dataset and appended to destination_dataset. The tables of the temporary dataset are deleted.
- If you call this function several times, it starts by getting the latest state from the destination_dataset._airbyte_states table so that it only extracts and loads new data.
- The examples below explain how to set the arguments.
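The extract, stage and append flow described above can be pictured with a small in-memory simulation. This is a hypothetical model, not the function's actual code. Datasets are Python dicts of row lists. The source is a hard-coded SOURCE_RECORDS dict. The state is assumed to be a single updated_at cursor per stream, whereas real Airbyte states are connector-specific.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]
Dataset = Dict[str, List[Row]]  # table name -> rows

# Hypothetical source data: one stream whose records carry an increasing cursor.
SOURCE_RECORDS: Dict[str, List[Row]] = {
    "my_stream": [
        {"id": 1, "updated_at": 10},
        {"id": 2, "updated_at": 20},
    ]
}


def latest_state(destination: Dataset, stream: str) -> Optional[int]:
    """Return the last cursor saved for `stream` in _airbyte_states, or None on a first run."""
    states = [s for s in destination.get("_airbyte_states", []) if s["stream"] == stream]
    return states[-1]["cursor"] if states else None


def load_saas_data_sketch(streams: List[str], destination: Dataset) -> None:
    """Hypothetical model of one call: extract into a temporary dataset, then append."""
    temp: Dataset = {"_airbyte_logs": [], "_airbyte_states": []}
    for stream in streams:
        cursor = latest_state(destination, stream)
        # Incremental extraction: keep only records newer than the saved state.
        new_rows = [
            r for r in SOURCE_RECORDS[stream]
            if cursor is None or r["updated_at"] > cursor
        ]
        temp[stream] = new_rows  # one table per stream
        temp["_airbyte_logs"].append(
            {"stream": stream, "message": f"extracted {len(new_rows)} rows"}
        )
        if new_rows:
            temp["_airbyte_states"].append(
                {"stream": stream, "cursor": max(r["updated_at"] for r in new_rows)}
            )
    # Append every temporary table to the destination, then delete the temporary tables.
    for table, rows in temp.items():
        destination.setdefault(table, []).extend(rows)
    temp.clear()


if __name__ == "__main__":
    dest: Dataset = {}
    load_saas_data_sketch(["my_stream"], dest)
    SOURCE_RECORDS["my_stream"].append({"id": 3, "updated_at": 30})
    load_saas_data_sketch(["my_stream"], dest)
    print([r["id"] for r in dest["my_stream"]])  # [1, 2, 3]
```

Calling the sketch twice appends only the records whose cursor is newer than the last saved state. This mirrors the incremental behaviour described above.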
Encrypt your secrets! ⚠️
Do NOT write secrets in plain text in your SQL queries!
Otherwise, anyone with access to your BigQuery logs can read them.
Instead, generate an encrypted version of your secret that you can safely share.
Enter a secret value to encrypt below, along with the emails of the users who are authorized to use it. This generates an encrypted version that you can paste into the arguments of your function (exactly as you would pass the plain-text version). If a user who is not in the authorized users list tries to use the encrypted version, the function raises a permission error. Besides, the encrypted version can only be used with this function, load_saas_data.
Encrypt a secret
How secret encryption works
Technically, this system relies on the same kind of encryption that secures data transferred over the internet: public-key (asymmetric) encryption, based on a pair of public and private keys.
The public key (contained in this web page) is used to encrypt a text. Only the corresponding private key can decrypt it. The private key is stored in a secret manager and is only accessible to this function. Thus, this function (and only this function) can decrypt the text.
Moreover, the function checks that the caller belongs to the list of authorized users that you gave at encryption time.
Thanks to this:
- Nobody but this function can decrypt the secret.
- Nobody but the authorized users can use the encrypted version in a function.
- No function but load_saas_data can decrypt it.
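To make these guarantees concrete, here is a toy sketch of the idea in pure Python. It is an illustration built on assumptions, not the BigFunctions implementation. It uses textbook RSA with two Mersenne primes and no padding, which is insecure, so never use it for real secrets. It also assumes that the authorized users and the function name are encrypted together with the secret, so that the decrypting side can enforce both checks.

```python
import json
from typing import List, Tuple

# Toy textbook RSA key pair (NOT secure: illustration only).
P = 2**521 - 1  # Mersenne prime
Q = 2**607 - 1  # Mersenne prime
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

PUBLIC_KEY: Tuple[int, int] = (N, E)   # hypothetically published on the web page
PRIVATE_KEY: Tuple[int, int] = (N, D)  # hypothetically kept in a secret manager


def encrypt_secret(secret: str, authorized_users: List[str], function_name: str,
                   public_key: Tuple[int, int] = PUBLIC_KEY) -> int:
    """Bind the secret to its authorized users and to one function, then encrypt it."""
    n, e = public_key
    payload = json.dumps(
        {"secret": secret, "users": authorized_users, "function": function_name}
    ).encode()
    m = int.from_bytes(payload, "big")
    if m >= n:
        raise ValueError("payload too long for this toy key")
    return pow(m, e, n)


def decrypt_secret(ciphertext: int, caller: str, function_name: str,
                   private_key: Tuple[int, int] = PRIVATE_KEY) -> str:
    """Decrypt with the private key, then enforce the function and caller checks."""
    n, d = private_key
    m = pow(ciphertext, d, n)
    payload = json.loads(m.to_bytes((m.bit_length() + 7) // 8, "big"))
    if payload["function"] != function_name:
        raise PermissionError("this encrypted secret belongs to another function")
    if caller not in payload["users"]:
        raise PermissionError(f"{caller} is not an authorized user")
    return payload["secret"]


if __name__ == "__main__":
    token = encrypt_secret("my-token", ["alice@example.com"], "load_saas_data")
    print(decrypt_secret(token, "alice@example.com", "load_saas_data"))  # my-token
```

Anyone holding PUBLIC_KEY can produce a ciphertext, but only the holder of PRIVATE_KEY can read it. The embedded user list and function name then restrict who can use it and where.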
Examples
Call or deploy load_saas_data?
Call load_saas_data directly
The easiest way to use bigfunctions
- The load_saas_data function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
- It can be called by anyone. Just copy and paste the examples below into your BigQuery console. It just works!
- You need to use the dataset located in the same region as your datasets; otherwise, you may get a "function not found" error.
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Deploy load_saas_data in your project
Why deploy?
- You may prefer to deploy load_saas_data in your own project to build and manage your own catalog of functions.
- This is particularly useful if you want to create private functions (for example, functions that call your internal APIs).
- Get started by reading the framework page.
Deployment
The load_saas_data function can be deployed with:
pip install bigfunctions
bigfun get load_saas_data
bigfun deploy load_saas_data
1. Show valid sources for the source argument by setting source to null
You can then copy one of these sources for the source argument.
call bigfunctions.eu.load_saas_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data(null, null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data(null, null, null, null);
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| # AVAILABLE SOURCES |
| |
| airbyte-source-activecampaign==0.1.10 |
| airbyte-source-adjust==0.1.11 |
| airbyte-source-aha==0.3.10 |
| ... |
+----------------------------------------+
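If you script this step, a small helper can turn the listing into a Python list and split each pinned package into its name and version. The sketch assumes, based only on the output above, that the result is newline-separated text whose header lines start with #. parse_listing and split_pin are hypothetical helper names.

```python
from typing import List, Tuple


def parse_listing(result_text: str) -> List[str]:
    """Return the non-empty lines of a listing, skipping '#' header lines."""
    return [
        line.strip()
        for line in result_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]


def split_pin(pin: str) -> Tuple[str, str]:
    """Split a pinned PyPI requirement such as 'airbyte-source-file==0.5.13'."""
    name, sep, version = pin.partition("==")
    if not (sep and name and version):
        raise ValueError(f"expected 'package==version', got {pin!r}")
    return name, version


if __name__ == "__main__":
    listing = (
        "# AVAILABLE SOURCES\n\n"
        "airbyte-source-activecampaign==0.1.10\n"
        "airbyte-source-adjust==0.1.11\n"
    )
    for pin in parse_listing(listing):
        print(split_pin(pin))
```

The pinned string itself (for example airbyte-source-file==0.5.13) is what the examples pass as the source argument.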
2. Show a source_config sample in the expected format by setting source_config to null.
You can then copy the result, modify it, and provide it as the source_config argument.
call bigfunctions.eu.load_saas_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-file==0.5.13', null, null, null);
select * from bigfunction_result;
+----------------------------------------------------+
| result |
+----------------------------------------------------+
| # SOURCE CONFIG |
| |
| dataset_name: # REQUIRED | string | The Name of... |
| format: "csv" # REQUIRED | string | The Format ... |
| reader_options: # OPTIONAL | string | This shou... |
| url: # REQUIRED | string | The URL path to acce... |
| provider: |
| ## -------- Pick one valid structure among th... |
| storage: "HTTPS" # REQUIRED | string |
| user_agent: # OPTIONAL | boolean | Add User-A... |
| ... |
+----------------------------------------------------+
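Before passing the edited template back as source_config, it can help to check which REQUIRED fields are still empty. The helper below is a hypothetical convenience that relies only on the "# REQUIRED" comment convention visible in the sample output. It does not validate types or the nested "pick one" structures.

```python
import re
from typing import List

# Matches "key: value  # REQUIRED ..." where the value may be empty.
REQUIRED_LINE = re.compile(r"^\s*([\w-]+):\s*(.*?)\s*#\s*REQUIRED\b")


def missing_required_fields(config_text: str) -> List[str]:
    """Return the keys marked REQUIRED whose value is still empty."""
    missing = []
    for line in config_text.splitlines():
        match = REQUIRED_LINE.match(line)
        if match and match.group(2) == "":
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    template = (
        'dataset_name: # REQUIRED | string | The Name of...\n'
        'format: "csv" # REQUIRED | string | The Format ...\n'
        'url: # REQUIRED | string | The URL path to acce...\n'
        'provider:\n'
        '  storage: "HTTPS" # REQUIRED | string\n'
    )
    print(missing_required_fields(template))  # ['dataset_name', 'url']
```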
3. Provide source_config with secrets encrypted:
call bigfunctions.eu.load_saas_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-zendesk-support==2.6.10', '''
credentials:
access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
'''
, null, null);
select * from bigfunction_result;
...
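A plain-text token in a query ends up in your BigQuery logs, so a quick local check before running the call can catch mistakes. The sketch below is hypothetical. SECRET_KEYS is an assumed, non-exhaustive list of credential key names, and the check only inspects simple "key: value" lines.

```python
import re
from typing import List

# Hypothetical, non-exhaustive list of key names that usually hold credentials.
SECRET_KEYS = ("access_token", "api_key", "api_token", "password",
               "client_secret", "refresh_token")
KEY_VALUE = re.compile(r"^\s*([\w-]+):\s*(\S.*?)\s*$")
ENCRYPTED = re.compile(r"^ENCRYPTED_SECRET\([^)]*\)$")


def plaintext_secrets(config_text: str) -> List[str]:
    """Return secret-looking keys whose value is not wrapped in ENCRYPTED_SECRET(...)."""
    flagged = []
    for line in config_text.splitlines():
        match = KEY_VALUE.match(line)
        if not match:
            continue  # blank lines and keys without a value (e.g. "credentials:")
        key, value = match.groups()
        if key.lower() in SECRET_KEYS and not ENCRYPTED.match(value):
            flagged.append(key)
    return flagged


if __name__ == "__main__":
    config = "credentials:\n  access_token: my-plain-token\n"
    print(plaintext_secrets(config))  # ['access_token']
```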
4. Show available streams by setting the streams argument to null.
You can then copy one or several of these streams (separated by commas) for the streams argument.
call bigfunctions.eu.load_saas_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, null, null);
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| # AVAILABLE STREAMS |
| |
| my_stream |
+----------------------------------------+
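The streams argument is a comma-separated list. A minimal helper can validate your choice against the available names from the listing above before the call. build_streams_arg is a hypothetical name. Whether the function tolerates spaces after commas is not documented here, so the sketch joins names with a bare comma.

```python
from typing import Iterable, List


def build_streams_arg(chosen: Iterable[str], available: List[str]) -> str:
    """Check the chosen stream names against the available ones, then join them with commas."""
    names = [name.strip() for name in chosen]
    if not names:
        raise ValueError("choose at least one stream")
    unknown = [name for name in names if name not in available]
    if unknown:
        raise ValueError(f"unknown streams {unknown}; available: {available}")
    return ",".join(names)


if __name__ == "__main__":
    print(build_streams_arg(["my_stream"], ["my_stream"]))  # my_stream
```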
5. Extract and load my_stream into your_project.your_dataset.
call bigfunctions.eu.load_saas_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
call bigfunctions.us.load_saas_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
call bigfunctions.europe_west1.load_saas_data('airbyte-source-file==0.5.13', '''
dataset_name: "my_stream"
format: "csv"
url: https://raw.githubusercontent.com/MobilityData/gbfs/refs/heads/master/systems.csv
provider:
storage: "HTTPS"
'''
, 'my_stream', 'your_project.your_dataset');
select * from bigfunction_result;
+----------------------------------------+
| result |
+----------------------------------------+
| ok |
+----------------------------------------+
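When calls are generated from code (for example, one per source), building the script programmatically avoids quoting mistakes. This is a hypothetical sketch. It renders None as null and every other argument as a BigQuery triple-quoted string. It rejects values that would need escaping rather than escaping them, and the dataset-reference pattern is an assumption.

```python
import re
from typing import Optional

DATASET_REF = re.compile(r"[A-Za-z0-9_-]+\.[A-Za-z0-9_]+")  # e.g. bigfunctions.eu


def sql_literal(value: Optional[str]) -> str:
    """Render None as null and a string as a BigQuery triple-quoted literal."""
    if value is None:
        return "null"
    # Escaping is out of scope for this sketch: refuse values that would break the literal.
    if "'''" in value or value.endswith("'") or "\\" in value:
        raise ValueError(f"value needs escaping: {value!r}")
    return f"'''{value}'''"


def build_call_script(
    functions_dataset: str,
    source: Optional[str],
    source_config: Optional[str],
    streams: Optional[str],
    destination_dataset: Optional[str],
) -> str:
    """Return a 'call ...; select * from bigfunction_result;' script."""
    if not DATASET_REF.fullmatch(functions_dataset):
        raise ValueError(f"unexpected dataset reference: {functions_dataset!r}")
    args = ", ".join(
        sql_literal(v) for v in (source, source_config, streams, destination_dataset)
    )
    return (
        f"call {functions_dataset}.load_saas_data({args});\n"
        "select * from bigfunction_result;"
    )


if __name__ == "__main__":
    print(build_call_script("bigfunctions.eu", "airbyte-source-file==0.5.13",
                            None, None, None))
```

The returned text could then be run as a multi-statement script, for instance with the google-cloud-bigquery client (client.query(script).result()), which returns the rows of the script's last statement. This usage has not been tested against the function.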
Need help or found a bug using load_saas_data?
Get help using load_saas_data
The community can help! Join the conversation on Slack.
We also provide professional support.
Report a bug about load_saas_data
If the function does not work as expected, please:
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
We also provide professional support.
Spread the word!
BigFunctions is fully open-source. Help make it a success by spreading the word!