get_webpage_data
Call or Deploy get_webpage_data?
✅ You can call this get_webpage_data bigfunction directly from your Google Cloud Project (no install required).
- This get_webpage_data function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may get a "function not found" error).
- The function is public, so it can be called by anyone. Just copy and paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, functions calling your internal APIs). Discover the framework.
Public BigFunctions Datasets:
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
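To find out which public dataset matches your data, you can look up the location of your own datasets. The query below is a minimal sketch: it assumes your datasets live in the EU multi-region (adjust the `region-eu` qualifier otherwise), and it assumes that public dataset names follow the pattern visible in the table above (region name in lowercase, hyphens replaced by underscores). That pattern is inferred from the listed rows and is not confirmed for all 39 regions.

```sql
-- List your datasets in one location and the bigfunctions dataset
-- that would match each of them.
-- Assumption: the naming pattern (lowercase, '-' replaced by '_')
-- is inferred from the rows listed in the table above.
SELECT
  schema_name,
  location,
  CONCAT('bigfunctions.', REPLACE(LOWER(location), '-', '_')) AS matching_bigfunctions_dataset
FROM `region-eu`.INFORMATION_SCHEMA.SCHEMATA
ORDER BY schema_name;
```

Run the query once per location you use, changing the region qualifier each time.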
Description
Signature
get_webpage_data(prompt, url)
Description
Extract data from url using prompt (using the scrapegraph-ai Python library).
Examples
select bigfunctions.eu.get_webpage_data('''
Return the list of bigfunctions in the category "get data".
Result must be a dict with the name of the bigfunction as key and its description as value.
Do not include arguments in the name.
'''
, 'https://unytics.io/bigfunctions/bigfunctions/')
select bigfunctions.us.get_webpage_data('''
Return the list of bigfunctions in the category "get data".
Result must be a dict with the name of the bigfunction as key and its description as value.
Do not include arguments in the name.
'''
, 'https://unytics.io/bigfunctions/bigfunctions/')
select bigfunctions.europe_west1.get_webpage_data('''
Return the list of bigfunctions in the category "get data".
Result must be a dict with the name of the bigfunction as key and its description as value.
Do not include arguments in the name.
'''
, 'https://unytics.io/bigfunctions/bigfunctions/')
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| data |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"exchange_rate": "Get `exchange_rate`",
"faker": "Generates fake data",
"get": "Request `url`",
"get_appstore_reviews": "GET Apple App Store Reviews of an app",
...
}
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
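Once you have a result shaped like the one above, you will usually want individual fields rather than the whole object. The following sketch assumes the function returns a JSON value (or a JSON-formatted string) with the same keys as the sample output. To stay runnable without calling the function, it uses a literal built from two entries of that sample.

```sql
-- Stand-in for the function result, copied from the sample output above.
-- In practice, replace the literal with the get_webpage_data(...) call.
WITH result AS (
  SELECT JSON '{"faker": "Generates fake data", "get_appstore_reviews": "GET Apple App Store Reviews of an app"}' AS data
)
SELECT
  -- JSON_VALUE extracts a scalar as a STRING; it returns NULL if the key is missing.
  JSON_VALUE(data, '$.faker') AS faker_description,
  JSON_VALUE(data, '$.get_appstore_reviews') AS appstore_reviews_description
FROM result;
```

Because the content is produced from a natural language prompt, the keys are not guaranteed to be present on every call, so downstream queries should handle NULL values.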
Need help using get_webpage_data?
The community can help! Join the conversation on Slack.
For professional support, don't hesitate to chat with us.
Found a bug using get_webpage_data?
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
For professional support, don't hesitate to chat with us.
Use cases
The provided function get_webpage_data(prompt, url) allows you to extract specific data from a webpage using a natural language prompt. Here are a few use cases:
- Competitive Analysis: You could extract pricing information from competitor websites. For example:
SELECT bigfunctions.us.get_webpage_data(
'Extract the price of the "Product X" from the product page.',
'https://competitorwebsite.com/product-x'
);
- Market Research: Extract product descriptions and customer reviews from e-commerce sites to understand market trends and customer sentiment:
SELECT bigfunctions.us.get_webpage_data(
'Extract all customer reviews and ratings for "Product Y".',
'https://ecommercewebsite.com/product-y'
);
- Lead Generation: Extract contact information from business directories or websites:
SELECT bigfunctions.us.get_webpage_data(
'Extract the email address and phone number from the contact us page.',
'https://targetcompany.com/contact-us'
);
- Content Aggregation: Pull news headlines and summaries from various news websites to create a consolidated news feed:
SELECT bigfunctions.us.get_webpage_data(
'Extract the headline and summary of the top 3 news articles on the homepage.',
'https://newswebsite.com'
);
- Real Estate Data Analysis: Extract property details like price, square footage, and number of bedrooms from real estate listings:
SELECT bigfunctions.us.get_webpage_data(
'Extract the price, square footage, number of bedrooms, and address of the property.',
'https://realestatewebsite.com/property-listing-123'
);
- Monitoring Website Changes: Track changes in product availability or pricing on a specific webpage by periodically calling the function with the same prompt and URL.
- Extracting Data from Tables within Web Pages: The function can be used to parse HTML tables and extract structured data. For instance, it can extract financial data from tables on a company's investor relations page.
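The "Monitoring Website Changes" use case could be sketched as follows. The table `my_dataset.webpage_snapshots` is hypothetical and is assumed to live in the same region as the `bigfunctions.us` dataset. The URL and prompt reuse the placeholder values from the competitive analysis example. The insert statement is meant to be run on a schedule (for instance as a scheduled query).

```sql
-- Hypothetical snapshot table; name and columns are illustrative.
CREATE TABLE IF NOT EXISTS my_dataset.webpage_snapshots (
  fetched_at TIMESTAMP,
  url STRING,
  prompt STRING,
  data STRING
);

-- Run periodically: store one snapshot per (url, prompt).
-- TO_JSON_STRING serializes the result to a string whatever its type,
-- so that snapshots can be compared with plain string equality.
INSERT INTO my_dataset.webpage_snapshots (fetched_at, url, prompt, data)
SELECT
  CURRENT_TIMESTAMP(),
  url,
  prompt,
  TO_JSON_STRING(bigfunctions.us.get_webpage_data(prompt, url))
FROM (
  SELECT
    'https://competitorwebsite.com/product-x' AS url,
    'Extract the price of the "Product X" from the product page.' AS prompt
);

-- List the snapshots whose content differs from the previous one.
SELECT url, fetched_at, previous_data, data
FROM (
  SELECT
    url,
    fetched_at,
    data,
    LAG(data) OVER (PARTITION BY url, prompt ORDER BY fetched_at) AS previous_data
  FROM my_dataset.webpage_snapshots
)
WHERE previous_data IS NOT NULL
  AND data != previous_data
ORDER BY fetched_at;
```

String comparison is deliberately naive here: the extracted output may vary in wording or key order between calls even when the page is unchanged, so a prompt that requests a fixed set of keys tends to make the comparison more meaningful.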
The key advantage is the use of natural language prompts, making it easier to specify what data you need without needing to write complex web scraping code or understand the underlying HTML structure. However, the accuracy and reliability depend heavily on the clarity and specificity of the prompt and the complexity of the target website's structure. It's essential to test and refine prompts for optimal results.
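One way to test and refine prompts is to run several variants against the same page and compare the results side by side. The sketch below reuses the placeholder URL from the competitive analysis example. The second prompt is an illustrative variant, not a recommended wording.

```sql
-- Call the function once per prompt variant on the same URL.
SELECT
  prompt,
  bigfunctions.us.get_webpage_data(
    prompt,
    'https://competitorwebsite.com/product-x'
  ) AS data
FROM UNNEST([
  'Extract the price of the "Product X" from the product page.',
  'Return a dict with the keys "price" and "currency" for "Product X". Use null when a value is missing.'
]) AS prompt;
```

Prompts that name the expected keys and say what to return when a value is missing generally make the output easier to query afterwards.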
Spread the word
BigFunctions is fully open-source. Help make it a success by spreading the word!