get_webpage_data¶
get_webpage_data(prompt, url)
Description¶
Extract data from url using prompt (using the scrapegraph-ai Python library).
Usage¶
Call or Deploy get_webpage_data?
Call get_webpage_data directly
The easiest way to use bigfunctions
- The get_webpage_data function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
- It can be called by anyone. Just copy and paste the examples below into your BigQuery console. It just works!
- You need to use the dataset in the same region as your own datasets; otherwise you may get a "function not found" error.
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Deploy get_webpage_data in your project
Why deploy?
- You may prefer to deploy get_webpage_data in your own project to build and manage your own catalog of functions.
- This is particularly useful if you want to create private functions (for example, functions that call your internal APIs).
- Get started by reading the framework page.
Deployment
The get_webpage_data function can be deployed with:
pip install bigfunctions
bigfun get get_webpage_data
bigfun deploy get_webpage_data
Requirements
get_webpage_data uses the following secrets. Follow the documentation link to obtain each one, store it in Google Secret Manager in the project where you deploy the function, and grant the Secret Manager Secret Accessor role to the function's service account:
name | description | documentation to get the secret |
---|---|---|
gemini_api_key | Gemini API Key | doc |
Examples¶
select bigfunctions.eu.get_webpage_data('''
Return the list of bigfunctions in the category "get data".
Result must be a dict with the name of the bigfunction as key and its description as value.
Do not include arguments in the name.
'''
, "https://unytics.io/bigfunctions/bigfunctions/")
select bigfunctions.us.get_webpage_data('''
Return the list of bigfunctions in the category "get data".
Result must be a dict with the name of the bigfunction as key and its description as value.
Do not include arguments in the name.
'''
, "https://unytics.io/bigfunctions/bigfunctions/")
select bigfunctions.europe_west1.get_webpage_data('''
Return the list of bigfunctions in the category "get data".
Result must be a dict with the name of the bigfunction as key and its description as value.
Do not include arguments in the name.
'''
, "https://unytics.io/bigfunctions/bigfunctions/")
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| data |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| {
"exchange_rate": "Get `exchange_rate`",
"faker": "Generates fake data",
"get": "Request `url`",
"get_appstore_reviews": "GET Apple App Store Reviews of an app",
...
}
|
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Use cases¶
The get_webpage_data(prompt, url) function lets you extract specific data from a webpage using a natural language prompt. Here are a few use cases:
- Competitive Analysis: You could extract pricing information from competitor websites. For example:
SELECT bigfunctions.us.get_webpage_data(
'Extract the price of the "Product X" from the product page.',
'https://competitorwebsite.com/product-x'
);
- Market Research: Extract product descriptions and customer reviews from e-commerce sites to understand market trends and customer sentiment:
SELECT bigfunctions.us.get_webpage_data(
'Extract all customer reviews and ratings for "Product Y".',
'https://ecommercewebsite.com/product-y'
);
- Lead Generation: Extract contact information from business directories or websites:
SELECT bigfunctions.us.get_webpage_data(
'Extract the email address and phone number from the contact us page.',
'https://targetcompany.com/contact-us'
);
- Content Aggregation: Pull news headlines and summaries from various news websites to create a consolidated news feed:
SELECT bigfunctions.us.get_webpage_data(
'Extract the headline and summary of the top 3 news articles on the homepage.',
'https://newswebsite.com'
);
- Real Estate Data Analysis: Extract property details like price, square footage, and number of bedrooms from real estate listings:
SELECT bigfunctions.us.get_webpage_data(
'Extract the price, square footage, number of bedrooms, and address of the property.',
'https://realestatewebsite.com/property-listing-123'
);
- Monitoring Website Changes: Track changes in product availability or pricing on a specific webpage by periodically calling the function with the same prompt and URL.
- Extracting Data from Tables within Web Pages: The function can be used to parse HTML tables and extract structured data. For instance, it can extract financial data from tables on a company's investor relations page.
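For the monitoring use case above, the comparison step can be done on the client side once two results have been collected. The sketch below is only an illustration in Python: it assumes each call returned a JSON object as text (as in the example output earlier on this page), and the helper name diff_snapshots and the sample values are hypothetical, not part of bigfunctions.

```python
import json


def diff_snapshots(previous: str, current: str) -> dict:
    """Compare two JSON-object results and report added, removed and changed keys."""
    old = json.loads(previous)
    new = json.loads(current)
    return {
        "added": {k: new[k] for k in set(new) - set(old)},
        "removed": {k: old[k] for k in set(old) - set(new)},
        "changed": {
            k: (old[k], new[k]) for k in set(old) & set(new) if old[k] != new[k]
        },
    }


# Hypothetical results from two runs with the same prompt and URL.
yesterday = '{"price": "19.99", "availability": "in stock"}'
today = '{"price": "17.99", "availability": "in stock", "discount": "10%"}'

changes = diff_snapshots(yesterday, today)
print(changes)
```

Because the extraction is driven by a language model, two runs on an unchanged page may still differ in wording, so a diff like this is more dependable on short, well-defined fields (a price, a stock status) than on free text.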
The key advantage is the use of natural language prompts, making it easier to specify what data you need without needing to write complex web scraping code or understand the underlying HTML structure. However, the accuracy and reliability depend heavily on the clarity and specificity of the prompt and the complexity of the target website's structure. It's essential to test and refine prompts for optimal results.
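One way to make prompt testing repeatable is to check each result against the fields the prompt asked for. The following Python sketch assumes the result is JSON text and that the prompt requested a flat object; the function name check_result and the sample keys are hypothetical and only illustrate the idea.

```python
import json


def check_result(raw: str, required_keys: set) -> list:
    """Return a list of problems found in one result; an empty list means it looks usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["expected a JSON object"]

    problems = []
    # Keys the prompt asked for that are absent from the result.
    for key in sorted(required_keys - set(data)):
        problems.append(f"missing key: {key}")
    # Keys that are present but carry no usable value.
    for key in sorted(required_keys & set(data)):
        if data[key] in (None, "", [], {}):
            problems.append(f"empty value: {key}")
    return problems


# Hypothetical result for the real estate prompt above.
sample = '{"price": "450000", "address": ""}'
print(check_result(sample, {"price", "address", "bedrooms"}))
```

A non-empty list is a hint to tighten the prompt, for example by naming the expected keys explicitly, before relying on the output.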
Need help or Found a bug?
Get help using get_webpage_data
The community can help! Join the conversation on Slack.
We also provide professional support.
Report a bug about get_webpage_data
If the function does not work as expected, please:
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
We also provide professional support.