Skip to content

xml_extract

xml_extract(xml, x_path)

Description

Returns content extracted from XML from given XPATH

Usage

Call or Deploy xml_extract ?
Call xml_extract directly

The easiest way to use bigfunctions

  • xml_extract function is deployed in 39 public datasets for all of the 39 BigQuery regions.
  • It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
  • (You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)

Public BigFunctions Datasets

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...
Deploy xml_extract in your project

Why deploy?

  • You may prefer to deploy xml_extract in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

xml_extract function can be deployed with:

pip install bigfunctions
bigfun get xml_extract
bigfun deploy xml_extract

Examples

1. Only one element for the xpath

select bigfunctions.eu.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.us.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.europe_west1.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
+-----------------+
| extracted_value |
+-----------------+
| ["John Doe"]    |
+-----------------+

2. Multiple elements for the xpath

select bigfunctions.eu.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003cname\u003eJane Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.us.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003cname\u003eJane Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
select bigfunctions.europe_west1.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003cname\u003eJane Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/name")
+--------------------------+
| extracted_value          |
+--------------------------+
| ["John Doe", "Jane Doe"] |
+--------------------------+

3. Incorrect xpath

select bigfunctions.eu.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/na")
select bigfunctions.us.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/na")
select bigfunctions.europe_west1.xml_extract("\u003ccustomer\u003e\u003cname\u003eJohn Doe\u003c/name\u003e\u003c/customer\u003e", "/customer/na")
+-----------------+
| extracted_value |
+-----------------+
| null            |
+-----------------+

Use cases

Let's say you have a BigQuery table called product_catalog that stores product information, including an XML description field. The XML data might look like this:

<product>
  <name>Awesome Gadget</name>
  <features>
    <feature>Long battery life</feature>
    <feature>Waterproof</feature>
  </features>
  <price currency="USD">99.99</price>
</product>

Use Case 1: Extracting Feature List

You want to analyze the most common product features. You can use xml_extract to pull out all the features into an array:

SELECT
    product_id,
    bigfunctions.us.xml_extract(xml_description, '/product/features/feature') AS features
  FROM
    product_catalog;

This query would return a table with product_id and a features column containing an array of strings, like ["Long battery life", "Waterproof"]. You can then unnest this array for further analysis.

Use Case 2: Finding Products within a Price Range

You want to find all products priced between $50 and $100. You can use xml_extract to extract the price and then filter based on its value:

SELECT
    product_id,
    CAST(bigfunctions.us.xml_extract(xml_description, '/product/price')[OFFSET(0)] AS BIGNUMERIC) AS price
  FROM
    product_catalog
  WHERE CAST(bigfunctions.us.xml_extract(xml_description, '/product/price')[OFFSET(0)] AS BIGNUMERIC) BETWEEN 50 AND 100;

This query extracts the price, casts it to a numeric type (important!), and then filters the results. The [OFFSET(0)] is used since xml_extract returns an array, even for single elements.

Use Case 3: Checking for a Specific Feature

You want to find all products that have the "Waterproof" feature.

SELECT
    product_id
  FROM
    product_catalog
  WHERE 'Waterproof' IN (
    SELECT feature FROM UNNEST(bigfunctions.us.xml_extract(xml_description, '/product/features/feature')) AS feature
  );

This query uses UNNEST to turn the array of features into individual rows and then filters based on the presence of "Waterproof".

These are just a few examples. The key takeaway is that xml_extract allows you to query and analyze data embedded within XML structures stored in your BigQuery tables without needing complex string manipulation or external tools. This makes working with XML data in BigQuery significantly easier. Remember to replace bigfunctions.us with the appropriate dataset for your BigQuery region.


Need help or Found a bug?
Get help using xml_extract

The community can help! Engage the conversation on Slack

We also provide professional suppport.

Report a bug about xml_extract

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

We also provide professional suppport.


Show your ❤ by adding a ⭐ on