json_schema
Call or Deploy json_schema?
✅ You can call this json_schema bigfunction directly from your Google Cloud Project (no install required).
- This json_schema function is deployed in the bigfunctions GCP project in 39 datasets for all of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may have a function not found error).
- Function is public, so it can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework.
Public BigFunctions Datasets:
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Description
Signature
json_schema(data)
Description
Returns the schema of data (with data a json object) as [{path, type}], with path the path of the nested field and type among (string, numeric, bool, date, timestamp).
Examples
select bigfunctions.eu.json_schema('{"created_at": "2022-01-01", "user": {"name": "James", "friends": ["Jack", "Peter"]}}')
select bigfunctions.us.json_schema('{"created_at": "2022-01-01", "user": {"name": "James", "friends": ["Jack", "Peter"]}}')
select bigfunctions.europe_west1.json_schema('{"created_at": "2022-01-01", "user": {"name": "James", "friends": ["Jack", "Peter"]}}')
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| schema |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| [
| struct("created_at" as path, "date" as type),
| struct("user.name" as path, "string" as type),
| struct("user.friends" as path, "array" as type)
| ]
|
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
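Since the result shown above looks like an array of structs with path and type fields, it can presumably be flattened into one row per field with UNNEST. The following is a minimal sketch under that assumption (the return type is inferred from the example output, not from the function's source). It uses the eu dataset; swap in the dataset for your region.

```sql
-- Flatten the json_schema result into one row per nested field.
-- Assumption: the function returns ARRAY<STRUCT<path STRING, type STRING>>.
SELECT
  field.path,
  field.type
FROM
  UNNEST(
    bigfunctions.eu.json_schema(
      '{"created_at": "2022-01-01", "user": {"name": "James", "friends": ["Jack", "Peter"]}}'
    )
  ) AS field;
```

A flattened result is usually easier to filter, join, or aggregate than the raw array.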
Need help using json_schema?
The community can help! Join the conversation on Slack.
For professional support, don't hesitate to chat with us.
Found a bug using json_schema?
If the function does not work as expected, please
- report a bug so that it can be improved.
- or open the discussion with the community on Slack.
For professional support, don't hesitate to chat with us.
Use cases
A use case for the json_schema function is to dynamically determine the schema of JSON data stored in a BigQuery table without prior knowledge of its structure. This can be particularly helpful in situations like:
- Data ingestion from diverse sources: Imagine receiving JSON data from various APIs or partners where the structure might not be consistent or documented thoroughly. json_schema can be used to automatically analyze a sample of the incoming data and infer its schema. This information can then be used to create or validate table schemas, ensuring proper data loading.
- Data exploration and analysis: When dealing with unfamiliar JSON data, json_schema helps quickly understand its structure and the types of information it contains. This is useful for exploratory data analysis and building queries without manually examining the JSON objects.
- Schema evolution tracking: By periodically applying json_schema to incoming data, you can detect changes in the JSON structure over time. This allows you to adapt your processing pipelines or table schemas as needed, ensuring compatibility and avoiding errors.
- Data validation: After inferring the schema, it can be used to validate subsequent JSON data against the expected structure. This can prevent malformed data from being ingested, ensuring data quality.
- Automated documentation: The output of json_schema can be used to generate documentation for the JSON data, simplifying communication and understanding among different teams or users.
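The data validation case above could be sketched roughly as follows. Everything here is illustrative: the expected CTE is a hypothetical hand-written schema, your_dataset.your_table and its raw_data column (JSON strings) are placeholder names, and the query assumes json_schema returns an array of structs with path and type fields.

```sql
-- Hypothetical expected schema; replace with the fields you actually expect.
WITH expected AS (
  SELECT 'created_at' AS path, 'date' AS type
  UNION ALL SELECT 'user.name', 'string'
  UNION ALL SELECT 'user.friends', 'array'
),
-- One row per (source row, inferred field).
observed AS (
  SELECT t.raw_data, field.path, field.type
  FROM your_dataset.your_table AS t
  CROSS JOIN UNNEST(bigfunctions.us.json_schema(t.raw_data)) AS field
)
-- Keep only fields whose (path, type) pair is absent from the expected schema.
SELECT o.raw_data, o.path, o.type
FROM observed AS o
LEFT JOIN expected AS e
  ON e.path = o.path AND e.type = o.type
WHERE e.path IS NULL;
```

Note that this only flags unexpected fields or types. Detecting fields that are missing from a row would need the join in the other direction.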
Example Scenario:
Let's say you have a BigQuery table containing a raw_data column storing JSON strings from different sources. You can use the following query to get the schema of the JSON data in each row:
SELECT bigfunctions.us.json_schema(raw_data) AS inferred_schema
FROM your_dataset.your_table;
This will return a table where each row contains the inferred schema of the corresponding JSON data in raw_data. You can then further process this output to:
- Identify the common schema across different JSON data.
- Create a new table with the appropriate schema to store the extracted JSON data in a structured format.
- Flag rows with unexpected schemas for further investigation.
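The first of these steps, identifying the common schema, might look like the sketch below. It assumes the same placeholder table as the query above and that json_schema returns an array of structs with path and type fields. The aliases field_path, field_type and occurrences are illustrative names.

```sql
-- Count how often each (path, type) pair occurs across all rows.
-- Pairs present in most rows form the "common" schema; rare pairs
-- point at outliers worth investigating.
SELECT
  field.path AS field_path,
  field.type AS field_type,
  COUNT(*) AS occurrences
FROM your_dataset.your_table AS t
CROSS JOIN UNNEST(bigfunctions.us.json_schema(t.raw_data)) AS field
GROUP BY field_path, field_type
ORDER BY occurrences DESC;
```

A path that shows up with more than one type in this result is a hint that sources disagree on how that field is encoded.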
By dynamically determining the schema of JSON data using json_schema, you can make your data ingestion, analysis, and validation processes more robust and efficient.
Spread the word
BigFunctions is fully open-source. Help make it a success by spreading the word!