explore_column

Call or Deploy explore_column?

✅ You can call this explore_column bigfunction directly from your Google Cloud Project (no install required).

  • This explore_column function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset located in the same region as your own datasets; otherwise you may get a "function not found" error.
  • The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework

Public BigFunctions Datasets:

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...
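
As a minimal sketch of the region rule above, assume your data lives in asia-east1: the call then targets the bigfunctions.asia_east1 dataset listed in the table. The column path below is a hypothetical placeholder, not a real table.

```sql
-- Assumption: your_project.your_dataset is located in asia-east1,
-- so the function is called from the matching bigfunctions.asia_east1 dataset.
-- your_project.your_dataset.your_table.your_column is a hypothetical placeholder.
call bigfunctions.asia_east1.explore_column("your_project.your_dataset.your_table.your_column");
select html from bigfunction_result;
```

Calling the function from a dataset in a different region than your data is the situation that may produce the "function not found" error mentioned above.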

Description

Signature

explore_column(fully_qualified_column)

Description

Show column statistics

See the result as a data visualization in BigQuery Console!

The result of this function can be visualized as an HTML report directly in BigQuery Console!

  1. Install this bookmarklet: bigfunctions (it has to be done only once)
  2. Open BigQuery console
  3. Click on the installed bookmarklet.
    • From now on, the bookmarklet code will observe the BigQuery console page.
    • If a BigQuery result appears with a single cell containing HTML content, it will be rendered.
  4. You will have to click on the bookmarklet again:
    • If you refresh the BigQuery console page,
    • If you open the BigQuery console in a new tab of your browser.
  5. Run the query of the example and open the result of the last statement. The result will be rendered as HTML.



Examples

call bigfunctions.eu.explore_column("bigfunctions.eu.natality.weight_pounds");
select html from bigfunction_result;

call bigfunctions.us.explore_column("bigfunctions.us.natality.weight_pounds");
select html from bigfunction_result;

call bigfunctions.europe_west1.explore_column("bigfunctions.europe_west1.natality.weight_pounds");
select html from bigfunction_result;


Need help using explore_column?

The community can help! Join the conversation on Slack

For professional support, don't hesitate to chat with us.

Found a bug using explore_column?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

For professional support, don't hesitate to chat with us.

Use cases

The explore_column function provides statistics about a specified column in a BigQuery table. Here are a few use cases:

  • Data Understanding/Exploration: When working with a new dataset, you can quickly use explore_column to get a sense of the distribution of values within a particular column. This helps understand data types, ranges, potential outliers, and the general characteristics of the data. For example, if you have a column representing customer ages, explore_column could show you the average age, minimum and maximum ages, and potentially a histogram of the age distribution.

  • Data Quality Assessment: explore_column can help identify data quality issues. For instance, it might reveal unexpected values in a column (e.g., negative values in a column supposed to store positive numbers), a high number of NULL values, or a skewed distribution that might warrant further investigation.

  • Feature Engineering: Before using a column in a machine learning model, explore_column can help determine appropriate preprocessing steps. For example, if a column has a highly skewed distribution, you might decide to apply a logarithmic transformation. Understanding the distribution can also help you choose appropriate binning strategies for numeric features.

  • Report Generation: The function generates HTML output which can be incorporated into automated reports. This allows for easy sharing of column-level statistics with stakeholders without manual analysis.

  • Data Monitoring: By periodically running explore_column on key columns, you can monitor changes in data distributions over time. This can be useful for detecting anomalies or drifts in the data that might indicate problems with data ingestion or underlying business processes.
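
To make the kind of statistics discussed above concrete, here is a hand-written sketch that computes a few of them on the public natality column used in the examples. It is not the implementation of explore_column, and the exact contents of the function's report are not specified here; the query only uses standard BigQuery aggregate functions.

```sql
-- Illustrative only: column statistics computed by hand, not explore_column's own logic.
select
  count(*) as row_count,                          -- total number of rows
  countif(weight_pounds is null) as null_count,   -- rows where the column is NULL
  min(weight_pounds) as min_value,
  max(weight_pounds) as max_value,
  avg(weight_pounds) as avg_value,
  approx_quantiles(weight_pounds, 4) as quartiles -- array: min, Q1, median, Q3, max (approximate)
from bigfunctions.eu.natality;
```

A high null_count or a minimum below zero in a column that should only hold positive numbers are the sort of signals the data quality use case refers to.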

Example Scenario:

Imagine you're analyzing a dataset of website user activity. You have a column called time_spent_on_page (in seconds). Calling explore_column("your_project.your_dataset.your_table.time_spent_on_page") would quickly give you statistics such as the average, minimum, and maximum time spent on a page, potentially with a histogram visualization, and help you answer questions like:

  • Are there users spending an unusually long or short time on the page?
  • Is the distribution skewed? Are most users spending a short time, with a few outliers spending a very long time?
  • Are there a significant number of NULL values, indicating potential tracking issues?

Based on this information, you can make decisions about data cleaning, feature engineering, or further investigation.
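
The scenario above could be run as follows. This is a sketch that follows the call pattern of the examples; the table path is hypothetical, and the eu dataset is an assumption that only holds if your data is stored in the EU multi-region.

```sql
-- Hypothetical table: replace your_project.your_dataset.your_table with your own.
-- Assumption: the data is in the EU multi-region, hence bigfunctions.eu.
call bigfunctions.eu.explore_column("your_project.your_dataset.your_table.time_spent_on_page");

-- The report is read from bigfunction_result, as in the examples.
select html from bigfunction_result;
```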

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
