
load_saas_data_into_temp_dataset

load_saas_data_into_temp_dataset(source, source_config, streams, state)

Description

Load SaaS data from 250+ sources using Airbyte Python Connectors.

  • The function creates a temporary dataset in the bigfunctions project, accessible only to you.
  • Airbyte Serverless will extract data from source (one of the 250+ Airbyte Python Connectors available on PyPI) using source_config (the source configuration in YAML format expected by Airbyte Serverless).
  • It will create one table per stream (a stream is like a resource type) in the dataset, plus one table _airbyte_logs for logs and one table _airbyte_states for states.
  • If you provide a state, only data new since that state is loaded.
  • While running, connector logs are appended to the table _airbyte_logs.
  • Examples below explain how to set the arguments.
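As a sketch, the resulting dataset contains one table per extracted stream plus the two bookkeeping tables. The helper below is hypothetical; the exact table names follow the stream names reported by the connector:

```python
def expected_tables(streams):
    """Tables created in the temporary dataset: one per stream,
    plus the logs and states bookkeeping tables."""
    return list(streams) + ["_airbyte_logs", "_airbyte_states"]

print(expected_tables(["tickets", "users"]))
# ['tickets', 'users', '_airbyte_logs', '_airbyte_states']
```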

Encrypt your secrets! ⚠️

Do NOT write secrets in plain text in your SQL queries!

Otherwise, anyone with access to your BigQuery logs can read them.

Instead, generate an encrypted version of your secret that you can safely share.

Enter a secret value to encrypt below, along with the emails of the users authorized to use it. It will generate an encrypted version that you can paste into the arguments of your function (exactly as if you had passed the plain-text version). If a user who is not in the authorized users list tries to use the encrypted version, the function will raise a permission error. Moreover, the encrypted version can only be used with this function load_saas_data_into_temp_dataset.

Encrypt a secret

How secret encryption works

Technically, this encryption system uses the same mechanism used to transfer data over the internet: a pair of public and private keys.

The public key (contained in this web page) is used to encrypt a text. The corresponding private key is the only key able to decrypt it. The private key is stored in a secret manager and is only accessible to this function. Thus, this function (and this function only) can decrypt the text.

Moreover, the function checks that its caller belongs to the list of authorized users that you gave at encryption time.

Thanks to this:

  • Nobody but this function will be able to decrypt it.
  • Nobody but authorized users can use the encrypted version in a function.
  • No function but the function load_saas_data_into_temp_dataset can decrypt it.
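The checks above can be sketched as follows. This is a toy illustration with hypothetical names, not the actual implementation; the real function decrypts the secret with a private key held in a secret manager before performing these checks:

```python
# Toy sketch of the checks performed when an ENCRYPTED_SECRET(...) value
# is passed to the function. `payload` stands for the already-decrypted
# content: the plain-text secret plus the metadata stored at encryption time.

def use_encrypted_secret(payload: dict, caller_email: str) -> str:
    # 1. The secret is bound to a single function: any other function fails here.
    if payload["function"] != "load_saas_data_into_temp_dataset":
        raise PermissionError("secret not usable with this function")
    # 2. The caller must belong to the list given at encryption time.
    if caller_email not in payload["authorized_users"]:
        raise PermissionError(f"{caller_email} is not an authorized user")
    return payload["secret"]

payload = {
    "function": "load_saas_data_into_temp_dataset",
    "authorized_users": ["alice@example.com"],
    "secret": "my-api-token",
}

print(use_encrypted_secret(payload, "alice@example.com"))  # my-api-token
```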

Examples

Call or Deploy load_saas_data_into_temp_dataset?
Call load_saas_data_into_temp_dataset directly

The easiest way to use bigfunctions

  • load_saas_data_into_temp_dataset function is deployed in 39 public datasets for all of the 39 BigQuery regions.
  • It can be called by anyone. Just copy/paste the examples below into your BigQuery console. It just works!
  • (Use the dataset in the same region as your own datasets; otherwise you may get a function not found error.)

Public BigFunctions Datasets

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...
Deploy load_saas_data_into_temp_dataset in your project

Why deploy?

  • You may prefer to deploy load_saas_data_into_temp_dataset in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

load_saas_data_into_temp_dataset function can be deployed with:

pip install bigfunctions
bigfun get load_saas_data_into_temp_dataset
bigfun deploy load_saas_data_into_temp_dataset

1. Show valid sources for the source argument by setting source to null

You can then copy one of these sources into the source argument.

select bigfunctions.eu.load_saas_data_into_temp_dataset(null, null, null, null)
select bigfunctions.us.load_saas_data_into_temp_dataset(null, null, null, null)
select bigfunctions.europe_west1.load_saas_data_into_temp_dataset(null, null, null, null)
+--------------------------------------------------------------------------------------------------------------------------+
| destination_dataset                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------+
| # AVAILABLE SOURCES

airbyte-source-activecampaign==0.1.10
airbyte-source-adjust==0.1.11
airbyte-source-aha==0.3.10
...
 |
+--------------------------------------------------------------------------------------------------------------------------+

2. Show a source_config sample in the expected format by setting source_config to null.

You can then copy the result, modify it, and provide it as the source_config argument.

select bigfunctions.eu.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
select bigfunctions.us.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
select bigfunctions.europe_west1.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', null, null, null)
+--------------------------------------------------------------------------------------------------------------------------+
| destination_dataset                                                                                                      |
+--------------------------------------------------------------------------------------------------------------------------+
| # SOURCE CONFIG

dataset_name: # REQUIRED | string | The Name of the final table to replicate this file into (should include letters, numbers dash and underscores only).
format: "csv" # REQUIRED | string | The Format of the file which should be replicated (Warning: some formats may be experimental, please refer to the docs).
reader_options: # OPTIONAL | string | This should be a string in JSON format. It depends on the chosen file format to provide additional options and tune its behavior. | Examples: {}, {"sep": " "}, {"sep": " ", "header": 0, "names": ["column1", "column2"] }
url: # REQUIRED | string | The URL path to access the file which should be replicated. | Examples: https://storage.googleapis.com/covid19-open-data/v2/latest/epidemiology.csv, gs://my-google-bucket/data.csv, s3://gdelt-open-data/events/20190914.export.csv
provider:
  ## -------- Pick one valid structure among the examples below: --------
  storage: "HTTPS" # REQUIRED | string
  user_agent: # OPTIONAL | boolean | Add User-Agent to request
  ...
 |
+--------------------------------------------------------------------------------------------------------------------------+
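Note that reader_options is itself a JSON object serialized as a string and embedded in the YAML. A quick stdlib sketch to sanity-check it before pasting it into the config (the option values shown are illustrative):

```python
import json

# reader_options must be a valid JSON object serialized as a string,
# e.g. CSV parsing options for airbyte-source-file.
reader_options = '{"sep": ",", "header": 0, "names": ["column1", "column2"]}'

opts = json.loads(reader_options)  # raises ValueError if malformed
assert isinstance(opts, dict)      # must be a JSON object, not a list/scalar
print(opts["names"])               # ['column1', 'column2']
```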

3. Provide source_config with secrets encrypted:

select bigfunctions.eu.load_saas_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null)
select bigfunctions.us.load_saas_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null)
select bigfunctions.europe_west1.load_saas_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , null, null)
+---------------------+
| destination_dataset |
+---------------------+
| ...                 |
+---------------------+

4. Show available streams by setting the streams argument to null.

You can then copy one or several of these streams (separated by commas) into the streams argument.

select bigfunctions.eu.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null)
select bigfunctions.us.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null)
select bigfunctions.europe_west1.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , null, null)
+---------------------------------+
| destination_dataset             |
+---------------------------------+
| # AVAILABLE STREAMS

my_stream
 |
+---------------------------------+

5. Extract and load my_stream into the temporary dataset.

select bigfunctions.eu.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', null)
select bigfunctions.us.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', null)
select bigfunctions.europe_west1.load_saas_data_into_temp_dataset('airbyte-source-file==0.5.13', '''
  dataset_name: "my_stream"
  format: "csv"
  url: https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv
  provider:
    storage: "HTTPS"
  '''
  , 'my_stream', null)
+------------------------------------+
| destination_dataset                |
+------------------------------------+
| bigfunctions.temp__dkzodskslfdkdl  |
+------------------------------------+

6. Provide a state to load only data that is new since that state

select bigfunctions.eu.load_saas_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , 'tickets', {...})
select bigfunctions.us.load_saas_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , 'tickets', {...})
select bigfunctions.europe_west1.load_saas_data_into_temp_dataset('airbyte-source-zendesk-support==2.6.10', '''
  credentials:
    access_token: ENCRYPTED_SECRET(kdoekdswlxzapdldpzlfpfd)
  '''
  , 'tickets', {...})
+---------------------+
| destination_dataset |
+---------------------+
| ...                 |
+---------------------+
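Conceptually, the state acts as a cursor: the connector remembers the highest cursor value it has seen (e.g. a ticket's updated_at) and, on the next run, only emits records past it. A minimal generic illustration of the idea, not Airbyte's actual state format:

```python
def incremental_extract(records, state, cursor_field="updated_at"):
    """Return records newer than the cursor in `state`, plus the new state."""
    last = (state or {}).get(cursor_field)
    new = [r for r in records if last is None or r[cursor_field] > last]
    cursor = max((r[cursor_field] for r in records), default=last)
    return new, {cursor_field: cursor}

tickets = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-02-01"},
]

# First run: no state, so everything is loaded.
new, state = incremental_extract(tickets, None)
print(len(new), state)  # 2 {'updated_at': '2024-02-01'}

# Second run with that state: nothing new to load.
new, _ = incremental_extract(tickets, state)
print(len(new))  # 0
```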

Need help or Found a bug using load_saas_data_into_temp_dataset?
Get help using load_saas_data_into_temp_dataset

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about load_saas_data_into_temp_dataset

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or start a discussion with the community on Slack.

We also provide professional support.

Spread the word!

BigFunctions is fully open-source. Help make it a success by spreading the word!
