Skip to content

json_column_schema

json_column_schema(data)

Description

Returns the schema of a json column .

  • The schema of the json column is the union of schemas of the json objects of each cell.
  • schema is a json object like {path: type} with
  • path the path of the nested field
  • type among (string, numeric, bool, date, timestamp, array)

Usage

Call or Deploy json_column_schema ?
Call json_column_schema directly

The easiest way to use bigfunctions

  • json_column_schema function is deployed in 39 public datasets for all of the 39 BigQuery regions.
  • It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
  • (You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)

Public BigFunctions Datasets

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...
Deploy json_column_schema in your project

Why deploy?

  • You may prefer to deploy json_column_schema in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

json_column_schema function can be deployed with:

pip install bigfunctions
bigfun get json_column_schema
bigfun deploy json_column_schema

Examples

with sample_data as (

  select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
      union all
      select json '{"user": {"friends": ["Jack", "Peter"]}}' as data

)


select bigfunctions.eu.json_column_schema(data)
from sample_data
with sample_data as (

  select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
      union all
      select json '{"user": {"friends": ["Jack", "Peter"]}}' as data

)


select bigfunctions.us.json_column_schema(data)
from sample_data
with sample_data as (

  select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
      union all
      select json '{"user": {"friends": ["Jack", "Peter"]}}' as data

)


select bigfunctions.europe_west1.json_column_schema(data)
from sample_data
+------------------------------------------------------------------------+
| schema                                                                 |
+------------------------------------------------------------------------+
| {"created_at": "date", "user.name": "string", "user.friends": "array"} |
+------------------------------------------------------------------------+

Use cases

You have a BigQuery table containing a JSON column, and you want to understand the structure and data types within those JSON objects. However, the JSON data isn't entirely consistent across all rows; some rows might have additional fields or different data types for the same field. json_column_schema helps you discover all the possible fields and their respective types present in the entire column.

Scenario: You're analyzing event data stored as JSON in BigQuery. Each event might have slightly different properties. For instance:

{"event_type": "page_view", "url": "/home", "user_id": 123}
{"event_type": "purchase", "item_id": 456, "user_id": 456, "value": 10.99}
{"event_type": "sign_up", "user_id": 789, "referral_code": "ABC"}

Use Case with json_column_schema:

WITH event_data AS (
    SELECT JSON '{"event_type": "page_view", "url": "/home", "user_id": 123}' AS event_json
    UNION ALL
    SELECT JSON '{"event_type": "purchase", "item_id": 456, "user_id": 456, "value": 10.99}' AS event_json
    UNION ALL
    SELECT JSON '{"event_type": "sign_up", "user_id": 789, "referral_code": "ABC"}' AS event_json
)
SELECT bigfunctions.us.json_column_schema(event_json) AS schema
FROM event_data;

Result:

{"event_type": "string", "url": "string", "user_id": "numeric", "item_id": "numeric", "value": "numeric", "referral_code": "string"}

Benefits:

  • Schema Discovery: You automatically identify all potential fields and their most general types within the JSON column without manual inspection.
  • Data Validation: You can use the generated schema to validate incoming data or build data quality checks.
  • Downstream Processing: Knowing the complete schema helps you design efficient queries and transformations for further analysis. For example, you could use this information to create a materialized view with extracted JSON fields.

This function is particularly useful for exploring and understanding semi-structured JSON data where the schema might not be strictly enforced or evolves over time. It allows you to programmatically handle variations in JSON structure.


Need help or Found a bug?
Get help using json_column_schema

The community can help! Engage the conversation on Slack

We also provide professional suppport.

Report a bug about json_column_schema

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

We also provide professional suppport.


Show your ❤ by adding a ⭐ on