bigfunctions > rare_values
rare_values¶
Call or Deploy rare_values
?
✅ You can call this rare_values
bigfunction directly from your Google Cloud Project (no install required).
- This
rare_values
function is deployed inbigfunctions
GCP project in 39 datasets for all of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may have a function not found error). - Function is public, so it can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework
Public BigFunctions Datasets:
Region | Dataset |
---|---|
eu |
bigfunctions.eu |
us |
bigfunctions.us |
europe-west1 |
bigfunctions.europe_west1 |
asia-east1 |
bigfunctions.asia_east1 |
... | ... |
Description¶
Signature
rare_values(values, frequency_threshold)
Description
Returns rare_values
among array of values
This function computes the frequency of each value in values
array and returns the values which frequency is stricly below the given frequency_threshold
.
By returning rare values, this function can be used for anomaly detection in a variety of use cases.
Examples¶
Detect rare strings in an array of strings with a frequency_threshold
of 0.2.
cherry
appears once for an array of 6 elements so its frequency
is 1 / 6 ~= 0.167 < 0.2.
It is a rare string compared to the frequency_threshold
.
select bigfunctions.eu.rare_values(["apple", "apple", "banana", "banana", "banana", "cherry"], 0.2)
select bigfunctions.us.rare_values(["apple", "apple", "banana", "banana", "banana", "cherry"], 0.2)
select bigfunctions.europe_west1.rare_values(["apple", "apple", "banana", "banana", "banana", "cherry"], 0.2)
+-------------+
| rare_values |
+-------------+
| ["cherry"] |
+-------------+
Need help using rare_values
?
The community can help! Engage the conversation on Slack
For professional suppport, don't hesitate to chat with us.
Found a bug using rare_values
?
If the function does not work as expected, please
- report a bug so that it can be improved.
- or open the discussion with the community on Slack.
For professional suppport, don't hesitate to chat with us.
Use cases¶
Let's say you have a dataset of e-commerce transactions and you want to identify potentially fraudulent orders based on unusual shipping addresses. You could use the rare_values
function to find addresses that appear infrequently.
Scenario:
You have a table orders
with a column shipping_city
. Most orders are shipped to common cities, but fraudulent orders might be shipped to less common locations.
Query:
SELECT
shipping_city
FROM
`your-project.your_dataset.orders`
WHERE
shipping_city IN (
SELECT
*
FROM
UNARRAY(
bigfunctions.us.rare_values(
(
SELECT
ARRAY_AGG(shipping_city)
FROM
`your-project.your_dataset.orders`
),
0.01
)
)
)
Explanation:
SELECT ARRAY_AGG(shipping_city) FROM your-project.your_dataset.orders
: This subquery aggregates all theshipping_city
values into a single array.bigfunctions.us.rare_values(... , 0.01)
: This calls therare_values
function with the array of cities and afrequency_threshold
of 0.01. This means any city that appears in less than 1% of the orders will be considered "rare".SELECT * FROM UNARRAY(...)
: This unnests the array of rare values returned byrare_values
into individual rows.WHERE shipping_city IN (...)
: This filters the originalorders
table to only include rows where theshipping_city
is present in the list of rare cities.
Result:
The query will return a list of shipping_city
values that are considered rare based on the defined threshold. You can then further investigate these orders to determine if they are potentially fraudulent.
Other Use Cases:
- Product Anomaly Detection: Identify rarely purchased products, which could indicate data entry errors, discontinued items, or sudden changes in demand.
- User Behavior Analysis: Find users with uncommon activity patterns, which could be a sign of bots or malicious actors.
- Error Detection in Logs: Identify rare error messages in system logs, which might point to new or infrequent bugs.
- Spam Detection: Find rare words or phrases used in emails or messages, which could indicate spam or phishing attempts.
By adjusting the frequency_threshold
, you can fine-tune the sensitivity of the rare value detection to suit your specific needs.
Spread the word¶
BigFunctions is fully open-source. Help make it a success by spreading the word!