load_file_into_temp_dataset
Call or Deploy load_file_into_temp_dataset?

✅ You can call this load_file_into_temp_dataset bigfunction directly from your Google Cloud Project (no install required).

- This load_file_into_temp_dataset function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset located in the same region as your own datasets (otherwise you may get a "function not found" error).
- The function is public, so it can be called by anyone. Just copy/paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, calling your internal APIs). Discover the framework.
Public BigFunctions Datasets:
Region | Dataset
---|---
eu | bigfunctions.eu
us | bigfunctions.us
europe-west1 | bigfunctions.europe_west1
asia-east1 | bigfunctions.asia_east1
... | ...
Description
Signature
load_file_into_temp_dataset(url, file_type, options)
Description
Downloads a web file into a temporary dataset in the bigfunctions project.
Each call to this function creates a new temporary dataset which:
- contains the destination_table with the file data,
- is accessible only to you (the caller) and the function: you have permission to read the data, delete the tables, and delete the dataset,
- has a limited lifetime: the default expiration time is set to 1h, so every table created is automatically deleted after 1h (empty datasets are periodically removed),
- has a random name.
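Since the dataset name is random, a common pattern is to capture the returned destination_table in a script variable and query it dynamically. A minimal sketch using standard BigQuery scripting (the CSV URL is the one from example 1 below):

DECLARE destination_table STRING;

SET destination_table = (
  select bigfunctions.eu.load_file_into_temp_dataset(
    'https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv',
    'csv', null
  )
);

-- query the dynamically named temporary table
EXECUTE IMMEDIATE format('select * from `%s` limit 10', destination_table);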
File data is downloaded using ibis with DuckDB. Available file_type values are:
- csv : doc
- json : doc
- parquet : doc
- delta : doc
- geo : doc (this uses GDAL under the hood and enables you to also read .xls, .xlsx, .shp ...)
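The examples below cover csv, json, parquet, and geo. A delta load follows the same pattern; here is a hypothetical sketch (the gs:// URL is an assumption, not a real public table):

select bigfunctions.eu.load_file_into_temp_dataset(
  'gs://your-bucket/path/to/delta_table',  -- hypothetical Delta table location
  'delta', null
)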
Examples
1. load random csv
select bigfunctions.eu.load_file_into_temp_dataset(
'https://raw.githubusercontent.com/AntoineGiraud/dbt_hypermarche/refs/heads/main/input/achats.csv',
'csv', null
)
+------------------------------------------------------------------+
| destination_table |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+
2. load json - French departements
select bigfunctions.eu.load_file_into_temp_dataset(
'https://geo.api.gouv.fr/departements?fields=nom,code,codeRegion,region',
'json', null
)
+------------------------------------------------------------------+
| destination_table |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+
3. load parquet on Google Cloud Storage
select bigfunctions.eu.load_file_into_temp_dataset(
'gs://bike-sharing-history/toulouse/jcdecaux/2024/Feb.parquet',
'parquet', null
)
+------------------------------------------------------------------+
| destination_table |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+
4. load xls or xlsx
select bigfunctions.eu.load_file_into_temp_dataset(
'https://github.com/AntoineGiraud/dbt_hypermarche/raw/refs/heads/main/input/Hypermarche.xlsx',
'geo', '{"layer":"Retours", "open_options": ["HEADERS=FORCE"]}'
)
+------------------------------------------------------------------+
| destination_table |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+
5. load a tricky French csv
select bigfunctions.eu.load_file_into_temp_dataset(
'https://www.data.gouv.fr/fr/datasets/r/323af5b8-7831-445b-9a46-d4da140b61b6',
'csv',
'''{
"columns": {
"code_commune_insee": "VARCHAR",
"nom_commune_insee": "VARCHAR",
"code_postal": "VARCHAR",
"lb_acheminement": "VARCHAR",
"ligne_5": "VARCHAR"
},
"delim": ";",
"skip": 1
}'''
)
+------------------------------------------------------------------+
| destination_table |
+------------------------------------------------------------------+
| bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac.file_data |
+------------------------------------------------------------------+
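Because you have permission to delete the temporary dataset, you can also drop it before its 1h expiration instead of waiting for automatic cleanup. A hedged sketch (the dataset name is the one from the sample outputs above; yours will differ):

drop schema `bigfunctions.temp_6bdb75ca_7f72_4f1f_b46a_6ca59f7f66ac` cascade;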
Need help using load_file_into_temp_dataset?

The community can help! Engage the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using load_file_into_temp_dataset?

If the function does not work as expected, please:
- report a bug so that it can be improved,
- or open the discussion with the community on Slack.

For professional support, don't hesitate to chat with us.
Use cases
This function is useful for quickly loading data from various online sources directly into BigQuery for analysis without needing to manually download, format, and upload the data. Here are a few specific use cases:
1. Data Exploration and Prototyping:
- You find a dataset on a public repository (like GitHub) or a government data portal and want to quickly explore it in BigQuery. load_file_into_temp_dataset lets you load the data directly, without intermediate steps. This is perfect for initial data analysis and prototyping before deciding to store the data permanently.
2. Ad-hoc Analysis of Public Data:
- You need to analyze some publicly available data, such as weather data, stock prices, or social media trends, for a one-time report or analysis. You can use this function to load the data on demand without storing it permanently.
3. ETL Pipelines with Dynamic Data Sources:
- You're building an ETL pipeline that needs to process data from various sources that are updated frequently. load_file_into_temp_dataset can be integrated into your pipeline to dynamically load data from different URLs as needed. This is especially helpful when dealing with data sources that don't have a stable schema or format.
4. Data Enrichment:
- You have a dataset in BigQuery and need to enrich it with external data, such as geographic information, currency exchange rates, or product catalogs. You can use this function to load the external data into a temporary table and then join it with your existing table (see the sketch after this list).
5. Sharing Data Snippets:
- You want to share a small dataset with a colleague or client without giving them access to your entire data warehouse. Load the data into a temporary dataset using this function and then grant them temporary access. This offers a secure and convenient way to share data snippets.
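To illustrate the enrichment pattern from use case 4, here is a minimal sketch. The JSON URL is the one from example 2 above; my_project.my_dataset.sales and its departement_code column are hypothetical:

DECLARE ref_table STRING;

SET ref_table = (
  select bigfunctions.eu.load_file_into_temp_dataset(
    'https://geo.api.gouv.fr/departements?fields=nom,code,codeRegion,region',
    'json', null
  )
);

-- join the freshly loaded reference data with a (hypothetical) existing table
EXECUTE IMMEDIATE format('''
  select s.*, d.nom as departement_name
  from my_project.my_dataset.sales s
  left join `%s` d on s.departement_code = d.code
''', ref_table);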
Example: Analyzing Tweet Sentiment from a Public API:
Imagine an API that returns tweet data in JSON format. You want to analyze the sentiment of tweets related to a specific hashtag.
- Call the API to retrieve the tweets. The API might offer a download link or allow you to stream the data directly.
- Use load_file_into_temp_dataset within a BigQuery query to load the JSON data from the API's URL.
- Apply BigQuery's text processing functions to analyze the sentiment of the tweets in the temporary table.
- Generate your report or visualization directly from the results.
This avoids the need to download the JSON file, create a table schema, and manually load the data, significantly speeding up your analysis. The temporary dataset automatically cleans itself up, simplifying data management.
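A hypothetical sketch of the loading step (the API endpoint is an assumption; any URL returning a JSON array of tweet objects would do):

DECLARE tweets_table STRING;

SET tweets_table = (
  select bigfunctions.eu.load_file_into_temp_dataset(
    'https://api.example.com/tweets?hashtag=bigquery',  -- hypothetical endpoint
    'json', null
  )
);

-- inspect the loaded tweets before applying any sentiment analysis
EXECUTE IMMEDIATE format('select * from `%s` limit 10', tweets_table);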
Spread the word
BigFunctions is fully open-source. Help make it a success by spreading the word!