get_webpage_metadata
get_webpage_metadata(url)
Description
Gets webpage metadata (using the metadata_parser Python library).
Usage
Call or Deploy get_webpage_metadata?
Call get_webpage_metadata directly
The easiest way to use bigfunctions
- The get_webpage_metadata function is deployed in 39 public datasets for all of the 39 BigQuery regions.
- It can be called by anyone. Just copy / paste the examples below in your BigQuery console. It just works!
- (You need to use the dataset in the same region as your datasets, otherwise you may get a "function not found" error.)
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Deploy get_webpage_metadata in your project
Why deploy?
- You may prefer to deploy get_webpage_metadata in your own project to build and manage your own catalog of functions.
- This is particularly useful if you want to create private functions (for example calling your internal APIs).
- Get started by reading the framework page
Deployment
The get_webpage_metadata function can be deployed with:
pip install bigfunctions
bigfun get get_webpage_metadata
bigfun deploy get_webpage_metadata
Examples
select bigfunctions.eu.get_webpage_metadata("https://apps.apple.com/fr/app/nickel-compte-pour-tous/id1119225763")
select bigfunctions.us.get_webpage_metadata("https://apps.apple.com/fr/app/nickel-compte-pour-tous/id1119225763")
select bigfunctions.europe_west1.get_webpage_metadata("https://apps.apple.com/fr/app/nickel-compte-pour-tous/id1119225763")
+----------+
| metadata |
+----------+
| {...} |
+----------+
Use cases
You could use this function in BigQuery to analyze a dataset of URLs and extract metadata from each URL. Here are a few concrete use cases:
- SEO Analysis: Imagine you have a table of competitor websites. You could use get_webpage_metadata to extract title tags, descriptions, and other metadata to understand their SEO strategies and identify opportunities. You could analyze trends in keywords used in titles and descriptions.
- Content Auditing: For a large website, you might have a table of all your pages. This function could help you audit your content by extracting metadata and looking for missing or inconsistent information, like missing title tags or descriptions that are too short.
- Social Media Analysis: If you have a table of URLs shared on social media, you could use this function to understand the type of content being shared. Extracting titles and descriptions can give you insights into the topics and themes that resonate with your audience.
- Data Enrichment: Suppose you have a table of news articles with only URLs. You can enrich this data by extracting metadata such as the publisher, publication date, and author, if available, using this function.
- Classifying Web Pages: Based on the extracted metadata like title and description, you can train a machine learning model to categorize web pages into different topics or industries.
Here's a simplified example in BigQuery (assuming your dataset is in the us region and your table is named urls with a column named url):
SELECT
url,
bigfunctions.us.get_webpage_metadata(url) AS metadata
FROM
`your_project.your_dataset.urls`;
This query returns each URL alongside a new result column called metadata containing the extracted metadata; it does not modify the urls table itself. You could then further process this JSON metadata within BigQuery to extract specific fields. For instance, to extract the title:
SELECT
url,
JSON_EXTRACT_SCALAR(bigfunctions.us.get_webpage_metadata(url), '$.title') AS title
FROM
`your_project.your_dataset.urls`;
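Because the function is evaluated for every row each time a query runs, it may be worth storing the results once and querying the stored copy. The sketch below illustrates this together with the content auditing use case. It rests on several assumptions: the table name `urls_metadata` is hypothetical, the `$.title` path follows the example above, the `$.description` path is a guess about the returned JSON, and the 50-character threshold is arbitrary. `JSON_VALUE` is used here as the newer equivalent of `JSON_EXTRACT_SCALAR`.

```sql
-- Store the metadata once so later queries do not call the function again.
-- `urls_metadata` is a hypothetical table name.
CREATE OR REPLACE TABLE `your_project.your_dataset.urls_metadata` AS
SELECT
  url,
  bigfunctions.us.get_webpage_metadata(url) AS metadata
FROM
  `your_project.your_dataset.urls`;

-- Content audit: list pages whose title is missing or whose description is short.
-- The '$.description' path and the 50-character threshold are assumptions.
SELECT
  url,
  JSON_VALUE(metadata, '$.title') AS title,
  JSON_VALUE(metadata, '$.description') AS description
FROM
  `your_project.your_dataset.urls_metadata`
WHERE
  JSON_VALUE(metadata, '$.title') IS NULL
  OR LENGTH(IFNULL(JSON_VALUE(metadata, '$.description'), '')) < 50;
```

Inspect a few rows of the metadata column first and adjust the JSON paths to match the structure the function actually returns.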
Need help or Found a bug?
Get help using get_webpage_metadata
The community can help! Join the conversation on Slack.
We also provide professional support.
Report a bug about get_webpage_metadata
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.