Skip to content

bigfunctions > rare_values

rare_values

Call or Deploy rare_values ?

✅ You can call this rare_values bigfunction directly from your Google Cloud Project (no install required).

  • This rare_values function is deployed in bigfunctions GCP project in 39 datasets for all of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may have a function not found error).
  • Function is public, so it can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework

Public BigFunctions Datasets:

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...

Description

Signature

rare_values(values, frequency_threshold)

Description

Returns rare_values among array of values

This function computes the frequency of each value in values array and returns the values which frequency is stricly below the given frequency_threshold.

By returning rare values, this function can be used for anomaly detection in a variety of use cases.

Examples

Detect rare strings in an array of strings with a frequency_threshold of 0.2. cherry appears once for an array of 6 elements so its frequency is 1 / 6 ~= 0.167 < 0.2. It is a rare string compared to the frequency_threshold.

select bigfunctions.eu.rare_values(["apple", "apple", "banana", "banana", "banana", "cherry"], 0.2)
select bigfunctions.us.rare_values(["apple", "apple", "banana", "banana", "banana", "cherry"], 0.2)
select bigfunctions.europe_west1.rare_values(["apple", "apple", "banana", "banana", "banana", "cherry"], 0.2)
+-------------+
| rare_values |
+-------------+
| ["cherry"]  |
+-------------+

Need help using rare_values?

The community can help! Engage the conversation on Slack

For professional suppport, don't hesitate to chat with us.

Found a bug using rare_values?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

For professional suppport, don't hesitate to chat with us.

Use cases

Let's say you have a dataset of e-commerce transactions and you want to identify potentially fraudulent orders based on unusual shipping addresses. You could use the rare_values function to find addresses that appear infrequently.

Scenario:

You have a table orders with a column shipping_city. Most orders are shipped to common cities, but fraudulent orders might be shipped to less common locations.

Query:

SELECT
    shipping_city
  FROM
    `your-project.your_dataset.orders`
  WHERE
    shipping_city IN (
      SELECT
          *
        FROM
          UNARRAY(
            bigfunctions.us.rare_values(
              (
                SELECT
                    ARRAY_AGG(shipping_city)
                  FROM
                    `your-project.your_dataset.orders`
              ),
              0.01
            )
          )
    )

Explanation:

  1. SELECT ARRAY_AGG(shipping_city) FROM your-project.your_dataset.orders: This subquery aggregates all the shipping_city values into a single array.
  2. bigfunctions.us.rare_values(... , 0.01): This calls the rare_values function with the array of cities and a frequency_threshold of 0.01. This means any city that appears in less than 1% of the orders will be considered "rare".
  3. SELECT * FROM UNARRAY(...): This unnests the array of rare values returned by rare_values into individual rows.
  4. WHERE shipping_city IN (...): This filters the original orders table to only include rows where the shipping_city is present in the list of rare cities.

Result:

The query will return a list of shipping_city values that are considered rare based on the defined threshold. You can then further investigate these orders to determine if they are potentially fraudulent.

Other Use Cases:

  • Product Anomaly Detection: Identify rarely purchased products, which could indicate data entry errors, discontinued items, or sudden changes in demand.
  • User Behavior Analysis: Find users with uncommon activity patterns, which could be a sign of bots or malicious actors.
  • Error Detection in Logs: Identify rare error messages in system logs, which might point to new or infrequent bugs.
  • Spam Detection: Find rare words or phrases used in emails or messages, which could indicate spam or phishing attempts.

By adjusting the frequency_threshold, you can fine-tune the sensitivity of the rare value detection to suit your specific needs.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!

Share on Add a on