
get_webpage_data

get_webpage_data(prompt, url)

Description

Extracts data from the web page at url, as described by prompt (powered by the scrapegraph-ai Python library).

Usage

Call or Deploy get_webpage_data?
Call get_webpage_data directly

The easiest way to use bigfunctions

  • The get_webpage_data function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
  • It can be called by anyone: just copy/paste the examples below into your BigQuery console. It just works!
  • (You need to use the dataset located in the same region as your own datasets; otherwise you may get a "function not found" error.)
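Because get_webpage_data is called like any other BigQuery function, it can also be applied row by row to a table of URLs. The sketch below assumes a hypothetical table `my_project.my_dataset.urls` with a STRING column `url`, stored in the EU multi-region, which is why the bigfunctions.eu dataset is used; the prompt is only an illustration.

```sql
-- Assumptions: `my_project.my_dataset.urls` is a hypothetical table with a
-- STRING column `url`, stored in the EU multi-region (hence bigfunctions.eu).
select
  url,
  bigfunctions.eu.get_webpage_data(
    'Return the title of the page and a one-sentence summary of its content.',
    url
  ) as data
from (
  -- Keep the batch small while testing: each row fetches one page and runs the prompt on it.
  select url
  from `my_project.my_dataset.urls`
  limit 10
)
```

Limiting the rows inside the subquery, rather than on the outer query, makes it explicit that only those URLs are passed to the function.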

Public BigFunctions Datasets

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...

Deploy get_webpage_data in your project

Why deploy?

  • You may prefer to deploy get_webpage_data in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

The get_webpage_data function can be deployed with:

pip install bigfunctions
bigfun get get_webpage_data
bigfun deploy get_webpage_data
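Once deployed, the function is called from your own project instead of a public bigfunctions dataset. In this sketch, `my_project.my_dataset` is a placeholder for whatever project and dataset you deployed into, and the gemini_api_key secret from the Requirements section is assumed to be already set up.

```sql
-- `my_project.my_dataset` is a placeholder for the project and dataset
-- you deployed get_webpage_data into.
select `my_project.my_dataset.get_webpage_data`(
  'Return the list of bigfunctions in the category "get data".',
  'https://unytics.io/bigfunctions/bigfunctions/'
) as data
```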

Requirements

get_webpage_data uses the following secrets. Get them by following the documentation link, store them in Google Secret Manager in the project where you deploy the function, and grant the Secret Manager Secret Accessor role to the function's service account:

name description documentation to get the secret
gemini_api_key Gemini API key doc

Examples

select bigfunctions.eu.get_webpage_data('''
      Return the list of bigfunctions in the category "get data".

      Result must be a dict with the name of the bigfunction as key and its description as value.
      Do not include arguments in the name.
      '''
      , "https://unytics.io/bigfunctions/bigfunctions/")
select bigfunctions.us.get_webpage_data('''
      Return the list of bigfunctions in the category "get data".

      Result must be a dict with the name of the bigfunction as key and its description as value.
      Do not include arguments in the name.
      '''
      , "https://unytics.io/bigfunctions/bigfunctions/")
select bigfunctions.europe_west1.get_webpage_data('''
      Return the list of bigfunctions in the category "get data".

      Result must be a dict with the name of the bigfunction as key and its description as value.
      Do not include arguments in the name.
      '''
      , "https://unytics.io/bigfunctions/bigfunctions/")
+--------------------------------------------------------------------+
| data                                                               |
+--------------------------------------------------------------------+
| {                                                                  |
|   "exchange_rate": "Get `exchange_rate`",                          |
|   "faker": "Generates fake data",                                  |
|   "get": "Request `url`",                                          |
|   "get_appstore_reviews": "GET Apple App Store Reviews of an app", |
|   ...                                                              |
| }                                                                  |
+--------------------------------------------------------------------+
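Since the prompt asks for a dict, individual values can be read with BigQuery's JSON functions. This is a sketch that assumes the returned data is a JSON-formatted string or a JSON value (the page does not state the exact return type; JSON_VALUE accepts both). The keys depend on what the model returns, so a missing key simply yields NULL.

```sql
-- Assumption: get_webpage_data returns a JSON string (or JSON value) shaped
-- like the example output, e.g. {"faker": "...", "get": "...", ...}.
with result as (
  select bigfunctions.eu.get_webpage_data('''
      Return the list of bigfunctions in the category "get data".

      Result must be a dict with the name of the bigfunction as key and its description as value.
      Do not include arguments in the name.
      '''
      , "https://unytics.io/bigfunctions/bigfunctions/") as data
)
-- JSON_VALUE returns NULL when the key is absent from the model's answer.
select
  json_value(data, '$.faker') as faker_description,
  json_value(data, '$.get') as get_description
from result
```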

Use cases

The provided function get_webpage_data(prompt, url) allows you to extract specific data from a webpage using a natural language prompt. Here are a few use cases:

  • Competitive Analysis: You could extract pricing information from competitor websites. For example:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the price of the "Product X" from the product page.',
    'https://competitorwebsite.com/product-x'
);
  • Market Research: Extract product descriptions and customer reviews from e-commerce sites to understand market trends and customer sentiment:
SELECT bigfunctions.us.get_webpage_data(
    'Extract all customer reviews and ratings for "Product Y".',
    'https://ecommercewebsite.com/product-y'
);
  • Lead Generation: Extract contact information from business directories or websites:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the email address and phone number from the contact us page.',
    'https://targetcompany.com/contact-us'
);
  • Content Aggregation: Pull news headlines and summaries from various news websites to create a consolidated news feed:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the headline and summary of the top 3 news articles on the homepage.',
    'https://newswebsite.com'
);
  • Real Estate Data Analysis: Extract property details like price, square footage, and number of bedrooms from real estate listings:
SELECT bigfunctions.us.get_webpage_data(
    'Extract the price, square footage, number of bedrooms, and address of the property.',
    'https://realestatewebsite.com/property-listing-123'
);
  • Monitoring Website Changes: Track changes in product availability or pricing on a specific webpage by periodically calling the function with the same prompt and URL.

  • Extracting Data from Tables within Web Pages: The function can be used to parse HTML tables and extract structured data. For instance, it can extract financial data from tables on a company's investor relations page.
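The monitoring use case can be sketched as a scheduled query that appends one snapshot per run, plus a second query that compares each snapshot with the previous one. Everything here is illustrative: the table `my_project.my_dataset.price_snapshots` and its schema are hypothetical, and the sketch assumes the function returns a STRING (if it returns JSON, wrap the call in TO_JSON_STRING so values can be compared).

```sql
-- Run on a schedule (e.g. as a BigQuery scheduled query).
-- Hypothetical snapshot table, assumed to exist with columns:
--   checked_at TIMESTAMP, url STRING, data STRING
-- Assumes get_webpage_data returns a STRING; if it returns JSON,
-- wrap the call in TO_JSON_STRING(...).
insert into `my_project.my_dataset.price_snapshots` (checked_at, url, data)
select
  current_timestamp(),
  url,
  bigfunctions.us.get_webpage_data(
    'Extract the price of the "Product X" from the product page. Return only the number.',
    url
  )
from unnest(['https://competitorwebsite.com/product-x']) as url;

-- Later: list the runs where the extracted value differs from the previous run.
select checked_at, url, data, previous_data
from (
  select
    checked_at,
    url,
    data,
    lag(data) over (partition by url order by checked_at) as previous_data
  from `my_project.my_dataset.price_snapshots`
)
where previous_data is not null
  and data != previous_data
```

Because the model's wording may vary between runs even when the page has not changed, asking for a strict, minimal format (such as "return only the number") helps avoid false alerts.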

The key advantage is the use of natural language prompts, making it easier to specify what data you need without needing to write complex web scraping code or understand the underlying HTML structure. However, the accuracy and reliability depend heavily on the clarity and specificity of the prompt and the complexity of the target website's structure. It's essential to test and refine prompts for optimal results.
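One way to make the output more predictable is to spell out the exact shape of the result in the prompt, as the first example on this page does. The sketch below applies that idea to the table-extraction use case; the URL is a placeholder, the key names ("quarters", "quarter", "revenue") are arbitrary choices, and it assumes the function returns the dict as a JSON string or JSON value.

```sql
-- Placeholder URL; the key names are spelled out in the prompt so that
-- the parsing below can rely on them.
with result as (
  select bigfunctions.us.get_webpage_data('''
      Extract the quarterly revenue table from this page.

      Result must be a dict with a single key "quarters" whose value is a list of dicts,
      one per quarter, with exactly these keys:
      "quarter" (string, e.g. "Q1 2024") and "revenue" (number, without currency symbol).
      Do not include any other keys.
      '''
      , 'https://examplecompany.com/investor-relations') as data
)
-- Turn the returned list into one row per quarter; SAFE_CAST yields NULL
-- instead of failing if the model returns a non-numeric revenue.
select
  json_value(item, '$.quarter') as quarter,
  safe_cast(json_value(item, '$.revenue') as float64) as revenue
from result, unnest(json_query_array(data, '$.quarters')) as item
```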


Need help or found a bug?
Get help using get_webpage_data

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about get_webpage_data

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.


