Skip to content

sql_to_flatten_json_column

sql_to_flatten_json_column(data, fully_qualified_column)

Description

Generate the SQL to flatten a json column into multiple flat columns.

Usage

Call or Deploy sql_to_flatten_json_column ?
Call sql_to_flatten_json_column directly

The easiest way to use bigfunctions

  • sql_to_flatten_json_column function is deployed in 39 public datasets for all of the 39 BigQuery regions.
  • It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
  • (You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)

Public BigFunctions Datasets

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...
Deploy sql_to_flatten_json_column in your project

Why deploy?

  • You may prefer to deploy sql_to_flatten_json_column in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

sql_to_flatten_json_column function can be deployed with:

pip install bigfunctions
bigfun get sql_to_flatten_json_column
bigfun deploy sql_to_flatten_json_column

Examples

1. Example with a temp table. Copy the code in your BigQuery console & run it!

create temp table sample_data as (

  select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
      union all
      select json '{"user": {"friends": ["Jack", "Peter"]}}' as data

);

select bigfunctions.eu.sql_to_flatten_json_column(data, sample_data.data)
from sample_data
create temp table sample_data as (

  select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
      union all
      select json '{"user": {"friends": ["Jack", "Peter"]}}' as data

);

select bigfunctions.us.sql_to_flatten_json_column(data, sample_data.data)
from sample_data
create temp table sample_data as (

  select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
      union all
      select json '{"user": {"friends": ["Jack", "Peter"]}}' as data

);

select bigfunctions.europe_west1.sql_to_flatten_json_column(data, sample_data.data)
from sample_data
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| sql                                                                                                                                                                                                                                       |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| select
    *,
    cast(json_value(`data`, '$.created_at') as date) as created_at,
    json_extract(`data`, '$.user.friends') as user__friends,
    cast(json_value(`data`, '$.user.name') as string) as user__name,
  from `sample_data`
 |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

2. Get SQL to flatten column data of your table project.dataset.table

select bigfunctions.eu.sql_to_flatten_json_column(data, project.dataset.table.data)
select bigfunctions.us.sql_to_flatten_json_column(data, project.dataset.table.data)
select bigfunctions.europe_west1.sql_to_flatten_json_column(data, project.dataset.table.data)
+-------------------------------------------------------+
| sql                                                   |
+-------------------------------------------------------+
| select
    *,
    ...
  from `project.dataset.table`
 |
+-------------------------------------------------------+

Use cases

You have a BigQuery table containing a JSON column called data, and you want to analyze specific fields within these JSON objects. Instead of repeatedly using JSON_EXTRACT or JSON_VALUE in your queries, you can use sql_to_flatten_json_column to generate a SQL query that extracts all the JSON fields into separate columns. This makes subsequent analysis easier and potentially more performant.

Use Case Example:

Let's say you have a table called website_events with a JSON column named event_details:

Table: website_events
Columns: event_id (INT64), event_timestamp (TIMESTAMP), event_details (STRING)

Sample Data:
1, 2024-10-26 10:00:00, '{"eventType": "pageview", "pageUrl": "/home", "userId": "123"}'
2, 2024-10-26 10:01:00, '{"eventType": "click", "elementId": "button1", "userId": "456"}'

You want to analyze the eventType, pageUrl (when available), and userId for all events.

Steps:

  1. Generate the flattening SQL:

In the BigQuery console, run the following query, replacing <your-project-id>.<your-dataset>.website_events with the fully qualified table name and choosing the appropriate BigFunctions dataset for your region (e.g., bigfunctions.us for US, bigfunctions.eu for EU, etc.):

SELECT bigfunctions.<your-region>.sql_to_flatten_json_column(event_details, '<your-project-id>.<your-dataset>.website_events.event_details');
  1. Execute the generated SQL: The output of the above query will be a new SQL query that flattens the JSON. It will look something like this:
SELECT
    *,
    CAST(JSON_VALUE(`event_details`, '$.eventType') AS STRING) AS eventType,
    JSON_VALUE(`event_details`, '$.pageUrl') AS pageUrl,
    CAST(JSON_VALUE(`event_details`, '$.userId') AS STRING) AS userId
FROM
    `<your-project-id>.<your-dataset>.website_events`
  1. Copy and run the generated SQL: This final query will give you a table with individual columns for eventType, pageUrl, and userId.

Benefits:

  • Simplified Queries: Instead of constantly extracting JSON fields in every query, you have dedicated columns, making your queries cleaner and easier to read.
  • Potential Performance Improvement: BigQuery can sometimes optimize queries against flattened data better than queries with repeated JSON extractions.
  • Data Exploration: Flattening the JSON makes it easier to explore the data in the BigQuery UI and identify all the fields present in the JSON data.

This approach is especially useful when you need to analyze the JSON data repeatedly or when the JSON structure is complex and contains numerous nested fields.


Need help or Found a bug?
Get help using sql_to_flatten_json_column

The community can help! Engage the conversation on Slack

We also provide professional suppport.

Report a bug about sql_to_flatten_json_column

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

We also provide professional suppport.


Show your ❤ by adding a ⭐ on