get_webpage_data

Call or deploy get_webpage_data?

✅ You can call this get_webpage_data bigfunction directly from your Google Cloud Project (no install required).

  • This get_webpage_data function is deployed in the bigfunctions GCP project, in 39 datasets covering all 39 BigQuery regions. Use the dataset in the same region as your own datasets; otherwise you may get a "function not found" error.
  • The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful for private functions (for example, functions that call your internal APIs). Discover the framework.

Public BigFunctions Datasets:

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...

Description

Signature

get_webpage_data(prompt, url)

Description

Extracts data from url according to prompt, using the scrapegraph-ai Python library.

Examples

Use the query whose dataset matches your region:

select bigfunctions.eu.get_webpage_data('''
  Return the list of bigfunctions in the category "get data".

  Result must be a dict with the name of the bigfunction as key and its description as value.
  Do not include arguments in the name.
  '''
  , 'https://unytics.io/bigfunctions/bigfunctions/');

select bigfunctions.us.get_webpage_data('''
  Return the list of bigfunctions in the category "get data".

  Result must be a dict with the name of the bigfunction as key and its description as value.
  Do not include arguments in the name.
  '''
  , 'https://unytics.io/bigfunctions/bigfunctions/');

select bigfunctions.europe_west1.get_webpage_data('''
  Return the list of bigfunctions in the category "get data".

  Result must be a dict with the name of the bigfunction as key and its description as value.
  Do not include arguments in the name.
  '''
  , 'https://unytics.io/bigfunctions/bigfunctions/');

+--------------------------------------------------------------------+
| data                                                               |
+--------------------------------------------------------------------+
| {                                                                  |
|   "exchange_rate": "Get `exchange_rate`",                          |
|   "faker": "Generates fake data",                                  |
|   "get": "Request `url`",                                          |
|   "get_appstore_reviews": "GET Apple App Store Reviews of an app", |
|   ...                                                              |
| }                                                                  |
+--------------------------------------------------------------------+
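
The examples above pass a literal URL. The function can also be applied to every row of a table, as long as that table lives in the same region as the function dataset. A minimal sketch, assuming a hypothetical table my_project.my_eu_dataset.pages stored in the EU multi-region (hence bigfunctions.eu) with a STRING column url; the prompt is only illustrative:

```sql
-- Sketch only: my_project.my_eu_dataset.pages is a hypothetical table
-- stored in the EU multi-region, so the matching bigfunctions.eu dataset is used.
WITH sample AS (
  -- Keep the sample small while testing: each row is scraped separately.
  SELECT url
  FROM `my_project.my_eu_dataset.pages`
  LIMIT 5
)
SELECT
  url,
  bigfunctions.eu.get_webpage_data(
    'Return the page title and a one-sentence summary of the page.',
    url
  ) AS data
FROM sample;
```

Starting from a small sample makes it cheap to check the prompt before running it on the whole table.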

Need help using get_webpage_data?

The community can help! Start a conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using get_webpage_data?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or start a discussion with the community on Slack.

Use cases

get_webpage_data(prompt, url) extracts specific data from a webpage based on a natural-language prompt. Here are a few use cases:

  • Competitive Analysis: You could extract pricing information from competitor websites. For example:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the price of the "Product X" from the product page.',
    'https://competitorwebsite.com/product-x'
);
  • Market Research: Extract product descriptions and customer reviews from e-commerce sites to understand market trends and customer sentiment:
SELECT bigfunctions.us.get_webpage_data(
    'Extract all customer reviews and ratings for "Product Y".',
    'https://ecommercewebsite.com/product-y'
);
  • Lead Generation: Extract contact information from business directories or websites:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the email address and phone number from the contact us page.',
    'https://targetcompany.com/contact-us'
);
  • Content Aggregation: Pull news headlines and summaries from various news websites to create a consolidated news feed:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the headline and summary of the top 3 news articles on the homepage.',
    'https://newswebsite.com'
);
  • Real Estate Data Analysis: Extract property details like price, square footage, and number of bedrooms from real estate listings:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the price, square footage, number of bedrooms, and address of the property.',
    'https://realestatewebsite.com/property-listing-123'
);
  • Monitoring Website Changes: Track changes in product availability or pricing on a specific webpage by calling the function periodically (for example, from a scheduled query) with the same prompt and URL and comparing successive results.

  • Extracting Data from Tables within Web Pages: The function can be used to parse HTML tables and extract structured data. For instance, it can extract financial data from tables on a company's investor relations page.
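
For the monitoring use case, one possible approach is to store each extracted value with a timestamp and compare successive snapshots. The sketch below assumes a hypothetical dataset my_project.my_us_dataset in the US multi-region (to match bigfunctions.us), reuses the placeholder product URL from the competitive analysis example, and asks the function for a JSON object with a "price" key. Whether the returned data actually has that shape depends on the page and the prompt, so check a few results first. Because scrapegraph-ai relies on a language model to read the page, a tightly constrained prompt like this one also helps keep outputs comparable from one run to the next.

```sql
-- Sketch only: the table, the URL and the "price" key are hypothetical.
-- JSON_VALUE accepts both STRING and JSON input, so the extraction below
-- works whichever of the two types get_webpage_data returns.
CREATE TABLE IF NOT EXISTS `my_project.my_us_dataset.price_snapshots` (
  checked_at TIMESTAMP,
  url STRING,
  price STRING
);

-- Run this statement on a schedule (for example as a BigQuery scheduled query).
INSERT INTO `my_project.my_us_dataset.price_snapshots` (checked_at, url, price)
SELECT
  CURRENT_TIMESTAMP(),
  url,
  JSON_VALUE(
    bigfunctions.us.get_webpage_data(
      'Return a JSON object with one key "price" holding the price of "Product X".',
      url
    ),
    '$.price'
  )
FROM UNNEST(['https://competitorwebsite.com/product-x']) AS url;

-- List the runs where the extracted price differs from the previous run.
SELECT url, checked_at, previous_price, price
FROM (
  SELECT
    url,
    checked_at,
    price,
    LAG(price) OVER (PARTITION BY url ORDER BY checked_at) AS previous_price
  FROM `my_project.my_us_dataset.price_snapshots`
)
WHERE previous_price IS NOT NULL
  AND price IS DISTINCT FROM previous_price
ORDER BY checked_at DESC;
```

Only the INSERT needs to be scheduled; the CREATE TABLE runs once, and the final SELECT can be run whenever you want to review changes. A NULL price usually means the key was missing from the result, which is worth treating as a prompt problem rather than a real change.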

The key advantage is that you describe the data you need in natural language, without writing web scraping code or studying the page's HTML structure. However, accuracy and reliability depend heavily on how clear and specific the prompt is and on how complex the target website's structure is, so test and refine prompts before relying on the results.
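
One way to test and refine a prompt is to run several candidate prompts against the same page in a single query and compare the outputs side by side. A sketch using the URL from the examples above; the two prompts are illustrative, one vague and one that states the category, the output shape and what to exclude:

```sql
-- Sketch: compare a vague prompt with a more specific one on the same page.
SELECT
  prompt,
  bigfunctions.us.get_webpage_data(
    prompt,
    'https://unytics.io/bigfunctions/bigfunctions/'
  ) AS data
FROM UNNEST([
  'List the functions on this page.',
  'Return a JSON object whose keys are the names of the bigfunctions in the category "get data" (without arguments) and whose values are their descriptions.'
]) AS prompt;
```

Adding more candidate prompts to the array is an easy way to iterate until the output has the structure you need.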

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!