parse_url

parse_url(url)

Description

Returns the parts of a URL (inspired by the sql-snippets repo).

Usage

Call or deploy parse_url?
Call parse_url directly

The easiest way to use bigfunctions

  • The parse_url function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
  • Anyone can call it. Copy and paste the examples below into your BigQuery console and they will run as-is.
  • Use the dataset located in the same region as your own datasets; otherwise you may get a "function not found" error.

Public BigFunctions Datasets

Region        Dataset
eu            bigfunctions.eu
us            bigfunctions.us
europe-west1  bigfunctions.europe_west1
asia-east1    bigfunctions.asia_east1
...           ...
Deploy parse_url in your project

Why deploy?

  • You may prefer to deploy parse_url in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

The parse_url function can be deployed with:

pip install bigfunctions
bigfun get parse_url
bigfun deploy parse_url

Examples

select bigfunctions.eu.parse_url("https://www.yoursite.com/pricing/details?myparam1=123&myparam2=abc#newsfeed")
select bigfunctions.us.parse_url("https://www.yoursite.com/pricing/details?myparam1=123&myparam2=abc#newsfeed")
select bigfunctions.europe_west1.parse_url("https://www.yoursite.com/pricing/details?myparam1=123&myparam2=abc#newsfeed")
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| url_parts                                                                                                                                            |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
| struct<'www.yoursite.com' as host, 'pricing/details' as path, 'myparam1=123&myparam2=abc#newsfeed' as query, 'newsfeed' as ref, 'https' as protocol> |
+------------------------------------------------------------------------------------------------------------------------------------------------------+
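
Since the result is a struct, its fields can be read individually. The following is a minimal sketch, not an official example: the names urls and parts are illustrative, and it assumes the eu dataset (swap in the dataset for your region). The field names are taken from the output shown above.

```sql
-- Sketch: read individual fields from the struct returned by parse_url.
-- `urls` and `parts` are illustrative names; replace `eu` with your region's dataset.
WITH urls AS (
  SELECT 'https://www.yoursite.com/pricing/details?myparam1=123&myparam2=abc#newsfeed' AS url
)
SELECT
  parts.protocol,
  parts.host,
  parts.path,
  parts.query,
  parts.ref
FROM (
  SELECT bigfunctions.eu.parse_url(url) AS parts
  FROM urls
);
```

Note that, according to the output above, the path comes back without its leading slash and the query field still includes the #fragment suffix.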

Use cases

You could use the parse_url function to analyze website traffic logs stored in BigQuery. Imagine you have a table with a column named request_url containing the full URLs of pages visited. You want to understand which parts of your website are most popular, which campaigns (identified through URL parameters) are driving traffic, and which page fragments (the part after #) are requested most often.

Here's a practical example:

SELECT
    parsed_url.host,
    parsed_url.path,
    REGEXP_EXTRACT(parsed_url.query, r'utm_campaign=([^&#]*)') AS utm_campaign,
    parsed_url.ref AS fragment,
    COUNT(*) AS page_views
FROM
    `your_project.your_dataset.your_table`,
    UNNEST([bigfunctions.your_region.parse_url(request_url)]) AS parsed_url
GROUP BY 1, 2, 3, 4
ORDER BY page_views DESC;

Explanation:

  1. your_project.your_dataset.your_table: Replace this with the actual location of your website traffic log table in BigQuery.
  2. bigfunctions.your_region.parse_url(request_url): This calls the parse_url function on the request_url column (replace your_region with the dataset for your BigQuery region) and breaks the URL into its components. The function returns a struct; the surrounding square brackets wrap that struct in a one-element array.
  3. UNNEST(...) AS parsed_url: This unnests the one-element array so that the struct is available under the alias parsed_url for every row.
  4. parsed_url.host, parsed_url.path, etc.: These access the individual components of the URL. As the example output above shows, ref holds the fragment after #, not the HTTP referrer.
  5. REGEXP_EXTRACT(...): This extracts a specific parameter from the query string, here utm_campaign (often used for tracking marketing campaigns). The character class excludes # as well as & because the query field returned by parse_url still contains the fragment. You can adapt the regular expression to extract other parameters you're interested in.
  6. COUNT(*) AS page_views: This counts the number of times each combination of host, path, campaign, and fragment appears, representing the number of page views.
  7. GROUP BY 1, 2, 3, 4: This groups the results by the four extracted fields.
  8. ORDER BY page_views DESC: This sorts the results to show the most viewed pages first.

This query shows which content is popular, how marketing campaigns perform, and which page fragments users land on. You could refine it further by adding filters based on date ranges, user segments, or other criteria relevant to your analysis.
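
As one way to apply the date-range refinement mentioned above, the sketch below restricts the analysis to the last seven days. It assumes the log table has a TIMESTAMP column named request_timestamp; that column name is hypothetical and does not come from the original example, so adjust it to your schema.

```sql
-- Sketch: page views per host and path over the last 7 days.
-- `request_timestamp` is a hypothetical TIMESTAMP column; rename it to match your table.
SELECT
    parsed_url.host,
    parsed_url.path,
    COUNT(*) AS page_views
FROM
    `your_project.your_dataset.your_table`,
    UNNEST([bigfunctions.your_region.parse_url(request_url)]) AS parsed_url
WHERE
    request_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY 1, 2
ORDER BY page_views DESC;
```

Filtering in the WHERE clause keeps the aggregation limited to rows inside the window; if the table is partitioned on that column, the same predicate may also reduce the amount of data scanned.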


Need help or Found a bug?
Get help using parse_url

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about parse_url

If the function does not work as expected, please

  • report a bug so that it can be improved,
  • or start a discussion with the community on Slack.

We also provide professional support.

