json_column_schema¶
json_column_schema(data)
Description¶
Returns the schema of a json column .
- The schema of the json column is the union of schemas of the json objects of each cell.
schema
is a json object like{path: type}
withpath
the path of the nested fieldtype
among (string
,numeric
,bool
,date
,timestamp
,array
)
Usage¶
Call or Deploy json_column_schema
?
Call json_column_schema
directly
The easiest way to use bigfunctions
json_column_schema
function is deployed in 39 public datasets for all of the 39 BigQuery regions.- It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
- (You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu |
bigfunctions.eu |
us |
bigfunctions.us |
europe-west1 |
bigfunctions.europe_west1 |
asia-east1 |
bigfunctions.asia_east1 |
... | ... |
Deploy json_column_schema
in your project
Why deploy?
- You may prefer to deploy
json_column_schema
in your own project to build and manage your own catalog of functions. - This is particularly useful if you want to create private functions (for example calling your internal APIs).
- Get started by reading the framework page
Deployment
json_column_schema
function can be deployed with:
pip install bigfunctions
bigfun get json_column_schema
bigfun deploy json_column_schema
Examples¶
with sample_data as (
select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
union all
select json '{"user": {"friends": ["Jack", "Peter"]}}' as data
)
select bigfunctions.eu.json_column_schema(data)
from sample_data
with sample_data as (
select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
union all
select json '{"user": {"friends": ["Jack", "Peter"]}}' as data
)
select bigfunctions.us.json_column_schema(data)
from sample_data
with sample_data as (
select json '{"created_at": "2022-01-01", "user": {"name": "James"}}' as data
union all
select json '{"user": {"friends": ["Jack", "Peter"]}}' as data
)
select bigfunctions.europe_west1.json_column_schema(data)
from sample_data
+------------------------------------------------------------------------+
| schema |
+------------------------------------------------------------------------+
| {"created_at": "date", "user.name": "string", "user.friends": "array"} |
+------------------------------------------------------------------------+
Use cases¶
You have a BigQuery table containing a JSON column, and you want to understand the structure and data types within those JSON objects. However, the JSON data isn't entirely consistent across all rows; some rows might have additional fields or different data types for the same field. json_column_schema
helps you discover all the possible fields and their respective types present in the entire column.
Scenario: You're analyzing event data stored as JSON in BigQuery. Each event might have slightly different properties. For instance:
{"event_type": "page_view", "url": "/home", "user_id": 123}
{"event_type": "purchase", "item_id": 456, "user_id": 456, "value": 10.99}
{"event_type": "sign_up", "user_id": 789, "referral_code": "ABC"}
Use Case with json_column_schema
:
WITH event_data AS (
SELECT JSON '{"event_type": "page_view", "url": "/home", "user_id": 123}' AS event_json
UNION ALL
SELECT JSON '{"event_type": "purchase", "item_id": 456, "user_id": 456, "value": 10.99}' AS event_json
UNION ALL
SELECT JSON '{"event_type": "sign_up", "user_id": 789, "referral_code": "ABC"}' AS event_json
)
SELECT bigfunctions.us.json_column_schema(event_json) AS schema
FROM event_data;
Result:
{"event_type": "string", "url": "string", "user_id": "numeric", "item_id": "numeric", "value": "numeric", "referral_code": "string"}
Benefits:
- Schema Discovery: You automatically identify all potential fields and their most general types within the JSON column without manual inspection.
- Data Validation: You can use the generated schema to validate incoming data or build data quality checks.
- Downstream Processing: Knowing the complete schema helps you design efficient queries and transformations for further analysis. For example, you could use this information to create a materialized view with extracted JSON fields.
This function is particularly useful for exploring and understanding semi-structured JSON data where the schema might not be strictly enforced or evolves over time. It allows you to programmatically handle variations in JSON structure.
Need help or Found a bug?
Get help using json_column_schema
The community can help! Engage the conversation on Slack
We also provide professional suppport.
Report a bug about json_column_schema
If the function does not work as expected, please
- report a bug so that it can be improved.
- or open the discussion with the community on Slack.
We also provide professional suppport.