xml_extract¶
xml_extract(xml, x_path)
Description¶
Returns content extracted from XML from given XPATH
Usage¶
Call or Deploy xml_extract
?
Call xml_extract
directly
The easiest way to use bigfunctions
xml_extract
function is deployed in 39 public datasets for all of the 39 BigQuery regions.- It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
- (You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu |
bigfunctions.eu |
us |
bigfunctions.us |
europe-west1 |
bigfunctions.europe_west1 |
asia-east1 |
bigfunctions.asia_east1 |
... | ... |
Deploy xml_extract
in your project
Why deploy?
- You may prefer to deploy
xml_extract
in your own project to build and manage your own catalog of functions. - This is particularly useful if you want to create private functions (for example calling your internal APIs).
- Get started by reading the framework page
Deployment
xml_extract
function can be deployed with:
pip install bigfunctions
bigfun get xml_extract
bigfun deploy xml_extract
Examples¶
1. Only one element for the xpath
select bigfunctions.eu.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.us.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.europe_west1.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
+-----------------+
| extracted_value |
+-----------------+
| ["John Doe"] |
+-----------------+
2. Multiple elements for the xpath
select bigfunctions.eu.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003cname\u003eJane Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.us.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003cname\u003eJane Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.europe_west1.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003cname\u003eJane Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
+--------------------------+
| extracted_value |
+--------------------------+
| ["John Doe", "Jane Doe"] |
+--------------------------+
3. Incorrect xpath
select bigfunctions.eu.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/na")
select bigfunctions.us.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/na")
select bigfunctions.europe_west1.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/na")
+-----------------+
| extracted_value |
+-----------------+
| null |
+-----------------+
Use cases¶
Let's say you have a BigQuery table called product_catalog
that stores product information, including an XML description field. The XML data might look like this:
<product>
<name>Awesome Gadget</name>
<features>
<feature>Long battery life</feature>
<feature>Waterproof</feature>
</features>
<price currency="USD">99.99</price>
</product>
Use Case 1: Extracting Feature List
You want to analyze the most common product features. You can use xml_extract
to pull out all the features into an array:
SELECT
product_id,
bigfunctions.us.xml_extract(xml_description, '/product/features/feature') AS features
FROM
product_catalog;
This query would return a table with product_id
and a features
column containing an array of strings, like ["Long battery life", "Waterproof"]
. You can then unnest this array for further analysis.
Use Case 2: Finding Products within a Price Range
You want to find all products priced between $50 and $100. You can use xml_extract
to extract the price and then filter based on its value:
SELECT
product_id,
CAST(bigfunctions.us.xml_extract(xml_description, '/product/price')[OFFSET(0)] AS BIGNUMERIC) AS price
FROM
product_catalog
WHERE CAST(bigfunctions.us.xml_extract(xml_description, '/product/price')[OFFSET(0)] AS BIGNUMERIC) BETWEEN 50 AND 100;
This query extracts the price, casts it to a numeric type (important!), and then filters the results. The [OFFSET(0)]
is used since xml_extract
returns an array, even for single elements.
Use Case 3: Checking for a Specific Feature
You want to find all products that have the "Waterproof" feature.
SELECT
product_id
FROM
product_catalog
WHERE 'Waterproof' IN (
SELECT feature FROM UNNEST(bigfunctions.us.xml_extract(xml_description, '/product/features/feature')) AS feature
);
This query uses UNNEST
to turn the array of features into individual rows and then filters based on the presence of "Waterproof".
These are just a few examples. The key takeaway is that xml_extract
allows you to query and analyze data embedded within XML structures stored in your BigQuery tables without needing complex string manipulation or external tools. This makes working with XML data in BigQuery significantly easier. Remember to replace bigfunctions.us
with the appropriate dataset for your BigQuery region.
Need help or Found a bug?
Get help using xml_extract
The community can help! Engage the conversation on Slack
We also provide professional suppport.
Report a bug about xml_extract
If the function does not work as expected, please
- report a bug so that it can be improved.
- or open the discussion with the community on Slack.
We also provide professional suppport.