
load_file_into_temp_dataset

Call or Deploy load_file_into_temp_dataset?

✅ You can call this load_file_into_temp_dataset bigfunction directly from your Google Cloud Project (no install required).

  • This load_file_into_temp_dataset function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your own datasets (otherwise you may get a "function not found" error).
  • The function is public, so anyone can call it. Just copy/paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, functions calling your internal APIs). Discover the framework.

Public BigFunctions Datasets:

Region         Dataset
eu             bigfunctions.eu
us             bigfunctions.us
europe-west1   bigfunctions.europe_west1
asia-east1     bigfunctions.asia_east1
...            ...

Signature

load_file_into_temp_dataset(url, file_type, options)

Description

Downloads a web file into a temporary dataset in the bigfunctions project.

[Diagram: load file]

Each call to this function creates a new temporary dataset which:

  • will contain the destination_table with the file data.
  • is accessible only to you (the caller) and to the function itself. You have permission to read the data, delete the tables, and delete the dataset.
  • has a limited lifetime: the default expiration is 1 hour, so every table it contains is automatically deleted after 1 hour. Empty datasets are periodically removed.
  • has a random name (see the sketch after this list for how to address it).
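
Since the dataset name is random, a typical pattern is to capture the returned destination_table in a BigQuery script and query it dynamically. A minimal sketch (using the CSV from example 1 below):

declare destination_table string;

set destination_table = (
  select bigfunctions.eu.load_file_into_temp_dataset(
    'https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv',
    'csv', null
  )
);

-- The table lives in a randomly named dataset, so address it dynamically.
execute immediate format('select * from `%s` limit 10', destination_table);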

File data is downloaded using ibis with the DuckDB backend. Available file_type values are:

  • csv : doc
  • json : doc
  • parquet : doc
  • delta : doc
  • geo : doc (this uses GDAL under the hood and also enables you to read .xls, .xlsx, .shp, ...)
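
The options argument is a JSON string whose fields appear to be forwarded to the corresponding reader (example 5 below passes DuckDB read_csv options such as columns, delim, and skip). A hedged sketch for json, assuming DuckDB's read_json parameter names:

select bigfunctions.eu.load_file_into_temp_dataset(
    'https://geo.api.gouv.fr/departements?fields=nom,code,codeRegion,region',
    'json',
    '{"maximum_object_size": 20000000}'  -- assumption: forwarded to DuckDB's read_json
  )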

Examples

1. Load a sample CSV

select bigfunctions.eu.load_file_into_temp_dataset(
    'https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv', 
    'csv', null
  )
+------------------------------------------------------------------+
| destination_table                                                |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+

2. Load JSON (French départements)

select bigfunctions.eu.load_file_into_temp_dataset(
    'https://geo.api.gouv.fr/departements?fields=nom,code,codeRegion,region', 
    'json', null
  )
+------------------------------------------------------------------+
| destination_table                                                |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+

3. Load Parquet from Google Cloud Storage

select bigfunctions.eu.load_file_into_temp_dataset(
    'gs://bike-sharing-history/toulouse/jcdecaux/2024/Feb.parquet', 
    'parquet', null
  )
+------------------------------------------------------------------+
| destination_table                                                |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+

4. Load an xls or xlsx file

select bigfunctions.eu.load_file_into_temp_dataset(
    'https://github.com/AntoineGiraud/dbt_hypermarche/raw/refs/heads/main/input/Hypermarche.xlsx', 
    'geo', '{"layer":"Retours", "open_options": ["HEADERS=FORCE"]}'
  )
+------------------------------------------------------------------+
| destination_table                                                |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+

5. Load a tricky French CSV

select bigfunctions.eu.load_file_into_temp_dataset(
    'https://www.data.gouv.fr/fr/datasets/r/323af5b8-7831-445b-9a46-d4da140b61b6', 
    'csv', 
  '''{
    "columns": {
        "code_commune_insee": "VARCHAR",
        "nom_commune_insee": "VARCHAR",
        "code_postal": "VARCHAR",
        "lb_acheminement": "VARCHAR",
        "ligne_5": "VARCHAR"
    },
    "delim": ";",
    "skip": 1
  }'''
  )
+------------------------------------------------------------------+
| destination_table                                                |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+

Need help using load_file_into_temp_dataset?

The community can help! Join the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using load_file_into_temp_dataset?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open a discussion with the community on Slack.

For professional support, don't hesitate to chat with us.

Use cases

This function is useful for quickly loading data from various online sources directly into BigQuery for analysis without needing to manually download, format, and upload the data. Here are a few specific use cases:

1. Data Exploration and Prototyping:

  • You find a dataset on a public repository (like GitHub) or a government data portal, and you want to quickly explore it in BigQuery. load_file_into_temp_dataset lets you load the data directly without intermediate steps. This is perfect for initial data analysis and prototyping before deciding to store the data permanently.

2. Ad-hoc Analysis of Public Data:

  • You need to analyze some publicly available data, such as weather data, stock prices, or social media trends, for a one-time report or analysis. You can use this function to load the data on demand without storing it permanently.

3. ETL Pipelines with Dynamic Data Sources:

  • You're building an ETL pipeline that needs to process data from various sources that are updated frequently. load_file_into_temp_dataset can be integrated into your pipeline to dynamically load data from different URLs as needed. This is especially helpful when dealing with data sources that don't have a stable schema or format.

4. Data Enrichment:

  • You have a dataset in BigQuery and need to enrich it with external data, such as geographic information, currency exchange rates, or product catalogs. You can use this function to load the external data into a temporary table and then join it with your existing table, as sketched below.
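
A minimal sketch of this enrichment pattern, assuming a hypothetical exchange-rate CSV URL and a hypothetical my_project.my_dataset.sales table with a currency column:

declare rates_table string;

set rates_table = (
  select bigfunctions.eu.load_file_into_temp_dataset(
    'https://example.com/exchange_rates.csv',  -- hypothetical URL
    'csv', null
  )
);

-- Join the temporary table against your own data.
execute immediate format('''
  select s.*, r.rate
  from `my_project.my_dataset.sales` s  -- hypothetical table
  left join `%s` r on s.currency = r.currency
''', rates_table);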

5. Sharing Data Snippets:

  • You want to share a small dataset with a colleague or client without giving them access to your entire data warehouse. Load the data into a temporary dataset using this function and then grant them temporary access, as in the sketch below. This offers a secure and convenient way to share data snippets.
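
One way the grant could look, assuming your permissions on the temporary dataset allow it (the description above only guarantees read and delete rights, so treat this as an unverified sketch):

-- The dataset name is the one returned by your own call (this one is copied
-- from the example outputs above); the grantee is hypothetical.
grant `roles/bigquery.dataViewer`
on schema `bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac`
to "user:colleague@example.com";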

Example: Analyzing Tweet Sentiment from a Public API:

Imagine an API that returns tweet data in JSON format. You want to analyze the sentiment of tweets related to a specific hashtag.

  1. Call the API to retrieve the tweets. The API might offer a download link or allow you to stream the data directly.
  2. Use load_file_into_temp_dataset within a BigQuery query to load the JSON data from the API's URL (see the sketch after this list).
  3. Apply BigQuery's text processing functions to analyze the sentiment of the tweets in the temporary table.
  4. Generate your report or visualization directly from the results.
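
A hedged sketch of steps 2 and 3, assuming a hypothetical API URL and a text field in the returned JSON; the keyword rule stands in for a real sentiment function:

declare tweets_table string;

set tweets_table = (
  select bigfunctions.eu.load_file_into_temp_dataset(
    'https://api.example.com/tweets?hashtag=bigquery',  -- hypothetical URL
    'json', null
  )
);

execute immediate format('''
  select
    text,
    case  -- naive keyword matching, for illustration only
      when regexp_contains(lower(text), r'love|great|awesome') then 'positive'
      when regexp_contains(lower(text), r'hate|bad|terrible') then 'negative'
      else 'neutral'
    end as sentiment
  from `%s`
''', tweets_table);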

This avoids the need to download the JSON file, create a table schema, and manually load the data, significantly speeding up your analysis. The temporary dataset automatically cleans itself up, simplifying data management.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
