explore_dataset¶
Call or Deploy explore_dataset?
✅ You can call this explore_dataset bigfunction directly from your Google Cloud Project (no install required).
- This explore_dataset function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your own datasets (otherwise you may get a "function not found" error).
- The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework
Public BigFunctions Datasets:
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
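To pick the matching row in the table above, it can help to confirm where your own datasets live. The query below is a minimal sketch, not part of BigFunctions: `my_project` is a hypothetical project ID, and `region-eu` is an assumed location that you would replace with the one you want to inspect. It relies on BigQuery's standard `INFORMATION_SCHEMA.SCHEMATA` view.

```sql
-- List the datasets of a hypothetical project `my_project` that live in the EU multi-region.
-- Assumption: replace `my_project` and `region-eu` with your own project ID and location.
SELECT
  schema_name,  -- dataset name
  location      -- dataset location, for example EU, US or europe-west1
FROM `my_project`.`region-eu`.INFORMATION_SCHEMA.SCHEMATA
ORDER BY schema_name;
```

A dataset listed with location `EU` would then be explored with `bigfunctions.eu.explore_dataset`, while a dataset in `europe-west1` would use `bigfunctions.europe_west1.explore_dataset`.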
Description¶
Signature
explore_dataset(fully_qualified_dataset)
Description
Shows information about the tables of a dataset.
See the result as a data visualization in BigQuery Console!
The result of this function can be visualized as an HTML report directly in the BigQuery console!
- Install this bookmarklet: bigfunctions (it has to be done only once)
- Open BigQuery console
- Click on the installed bookmarklet.
- From now on, the bookmarklet code will observe the BigQuery console page.
- If a BigQuery result appears with a unique cell containing html content, it will be rendered.
- You will have to click on the bookmarklet again:
  - if you refresh the BigQuery console page,
  - if you open the BigQuery console in a new tab of your browser.
- Run the query of the example and open the result of the last subquery. The result will be rendered as HTML content.
Examples¶
call bigfunctions.eu.explore_dataset("bigfunctions.eu");
select html from bigfunction_result;
call bigfunctions.us.explore_dataset("bigfunctions.us");
select html from bigfunction_result;
call bigfunctions.europe_west1.explore_dataset("bigfunctions.europe_west1");
select html from bigfunction_result;
Need help using explore_dataset?
The community can help! Join the conversation on Slack.
For professional support, don't hesitate to chat with us.
Found a bug using explore_dataset?
If the function does not work as expected, please:
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
For professional support, don't hesitate to chat with us.
Use cases¶
The explore_dataset function, as described, provides information about the tables within a specified BigQuery dataset. Here are some use cases:
- Data Discovery and Exploration: A data analyst or scientist new to a project can use this function to quickly see which tables a dataset contains. Table names alone often hint at the kind of data stored, which speeds up the initial discovery phase.
- Data Auditing and Documentation: For compliance or documentation purposes, someone might need a list of all tables in a dataset. explore_dataset could automate generating this list, potentially including additional information.
- Impact Analysis: Before making changes to a dataset (e.g., deleting tables, changing schemas), a developer could use this function to review which tables exist, as a starting point for identifying downstream processes or reports that might be affected.
- Data Governance: A data governance team could use this function to review the contents of a dataset and check adherence to naming conventions or other data management policies.
- Building Data Catalogs: This function could be a building block for a more comprehensive data catalog. The output could be ingested into a metadata store or visualized in a custom dashboard.
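As an illustration of the governance use case, a naming-convention check can also be written directly against BigQuery's standard `INFORMATION_SCHEMA.TABLES` view. This is a minimal sketch that is independent of explore_dataset: `my_project` and `my_dataset` are hypothetical names, and the lowercase snake_case rule is an assumed convention chosen only for the example.

```sql
-- Find tables whose names break an assumed convention (lowercase snake_case).
-- Assumption: `my_project` and `my_dataset` are placeholders for your own project and dataset.
SELECT
  table_name,
  table_type  -- for example BASE TABLE or VIEW
FROM `my_project`.my_dataset.INFORMATION_SCHEMA.TABLES
WHERE NOT REGEXP_CONTAINS(table_name, r'^[a-z][a-z0-9_]*$')
ORDER BY table_name;
```

An empty result means every table name matches the pattern. Adjust the regular expression to whatever convention your team actually uses.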
Example Scenario:
Imagine a data analyst joins a new team. They are tasked with analyzing customer behavior data. They know the data resides in a BigQuery dataset called project.customer_data. They could use the explore_dataset function like this (assuming the dataset is in the us region):
call bigfunctions.us.explore_dataset("project.customer_data");
select html from bigfunction_result;
This would give them a quick overview of all tables in the customer_data dataset, helping them understand what data is available and where to start their analysis. The HTML output could potentially include table descriptions, schemas, last modified dates, or sizes, making the exploration even more efficient.
Limitations based on documentation:
The documentation heavily emphasizes using the pre-deployed public versions of this function. This might be convenient for quick checks but raises some concerns for production use:
- Dependency on external project: Relying on a third-party project introduces a potential point of failure or unexpected changes.
- Security: If you need to explore datasets with sensitive information, calling a public function isn't advisable. Deploying the function in your own project would allow you to control access.
- Customization: The output is limited to HTML. If you need to process the information programmatically (e.g., store it in a database), you'd need to parse the HTML. Deploying the function yourself allows for customizing the output format.
For serious applications, consider deploying the explore_dataset function within your own Google Cloud Project for better control, security, and customization.
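When tabular output is needed instead of HTML, one option that avoids parsing the report is to query BigQuery's standard `INFORMATION_SCHEMA` views directly. The sketch below is an alternative to explore_dataset, not a description of what the function runs internally. It reuses the project.customer_data dataset from the scenario above, and the choice of columns is only an assumption about what might be useful.

```sql
-- Tabular overview of a dataset: one row per table, with its type, creation time
-- and number of columns. Uses the example dataset `project.customer_data`.
SELECT
  t.table_name,
  t.table_type,                          -- for example BASE TABLE or VIEW
  t.creation_time,
  COUNT(c.column_name) AS column_count   -- number of columns in the table
FROM `project`.customer_data.INFORMATION_SCHEMA.TABLES AS t
LEFT JOIN `project`.customer_data.INFORMATION_SCHEMA.COLUMNS AS c
  ON c.table_name = t.table_name
GROUP BY t.table_name, t.table_type, t.creation_time
ORDER BY t.table_name;
```

Because the result is an ordinary query result, it can be saved to a table or exported like any other, which makes it easier to feed into a metadata store than the HTML report.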
Spread the word¶
BigFunctions is fully open-source. Help make it a success by spreading the word!