roc_auc

Call or Deploy roc_auc

✅ You can call this roc_auc bigfunction directly from your Google Cloud project (no install required).

- This roc_auc function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your data (otherwise you may get a "function not found" error).
- The function is public, so it can be called by anyone. Just copy/paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions --> read Getting Started. This is particularly useful if you want to create private functions (for example, functions calling your internal APIs).
- For any question or difficulty, please read Getting Started.
- Found a bug? Please raise an issue here.
Public BigFunctions datasets look like:

Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Description

Signature

roc_auc(predictions)

Description

Returns the Area Under the Receiver Operating Characteristic Curve (a.k.a. ROC AUC), computed with the trapezoidal rule, given a set of predicted scores and ground-truth labels.
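To make the computation concrete, here is a small Python sketch of ROC AUC via the trapezoidal rule (an illustration only, not the function's actual deployed implementation): sweep the decision threshold from the highest score down, track the true-positive and false-positive rates at each step, and integrate TPR over FPR with trapezoids.

```python
def roc_auc(predictions):
    """Compute ROC AUC with the trapezoidal rule.

    predictions: list of (score, label) pairs, label True for the
    positive class -- mirroring the array of structs the SQL function takes.
    """
    n_pos = sum(1 for _, label in predictions if label)
    n_neg = len(predictions) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the threshold from the highest score down, grouping tied
    # scores so each distinct score moves the ROC curve in one step.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    auc = 0.0
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    i, n = 0, len(ranked)
    while i < n:
        score = ranked[i][0]
        while i < n and ranked[i][0] == score:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        tpr, fpr = tp / n_pos, fp / n_neg
        # Area of the trapezoid between the previous and current ROC points.
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_tpr, prev_fpr = tpr, fpr
    return auc


# A perfect ranking: every positive outscores every negative.
print(roc_auc([(0.9, True), (0.8, True), (0.3, False), (0.2, False)]))  # 1.0
# A partial ranking: 3 of the 4 positive/negative pairs ordered correctly.
print(roc_auc([(0.8, True), (0.6, False), (0.4, True), (0.2, False)]))  # 0.75
```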
Examples¶
1. Random classifier
select bigfunctions.eu.roc_auc(
  (
    select array_agg(struct(cast(predicted_score as float64), rand() > 0.5))
    from unnest(generate_array(1, 1000)) as predicted_score
  )
)

(Replace eu with the dataset of your region, e.g. us or europe_west1.)
+---------+
| roc_auc |
+---------+
| 0.5 |
+---------+
2. Good classifier
select bigfunctions.eu.roc_auc(
  (
    select array_agg(struct(cast(predicted_score as float64), predicted_score > 500))
    from unnest(generate_array(1, 1000)) as predicted_score
  )
)
+---------+
| roc_auc |
+---------+
| 1.0 |
+---------+
3. Bad classifier
select bigfunctions.eu.roc_auc(
  (
    select array_agg(struct(cast(predicted_score as float64), predicted_score < 500))
    from unnest(generate_array(1, 1000)) as predicted_score
  )
)
+---------+
| roc_auc |
+---------+
| 0.0 |
+---------+
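The three results above match the rank-statistic view of ROC AUC: it equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties counted as half). This small Python sketch (an illustration, not the deployed implementation) reproduces the perfect and always-wrong cases on a scaled-down analogue of the examples above:

```python
from itertools import product

def auc_pairwise(predictions):
    """ROC AUC as the probability a random positive outranks a random negative."""
    pos = [score for score, label in predictions if label]
    neg = [score for score, label in predictions if not label]
    wins = sum((p > n) + 0.5 * (p == n) for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))

# Scaled-down analogues of examples 2 and 3 (scores 1..10 instead of 1..1000):
good = [(float(s), s > 5) for s in range(1, 11)]  # label agrees with the score
bad = [(float(s), s < 5) for s in range(1, 11)]   # label contradicts the score

print(auc_pairwise(good))  # 1.0
print(auc_pairwise(bad))   # 0.0
# Flipping the sign of an always-wrong classifier's scores makes it perfect:
print(auc_pairwise([(-s, label) for s, label in bad]))  # 1.0
```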
Use cases

Let's say you're building a machine learning model in BigQuery to predict customer churn for a subscription service. You've trained your model, and it outputs a predicted_score between 0 and 1 for each customer, where higher scores indicate a higher probability of churn. You also have the ground-truth labels indicating whether each customer actually churned (true) or not (false).

You can use the roc_auc function to evaluate the performance of your churn prediction model. Here's how:
SELECT bigfunctions.us.roc_auc(
(
SELECT
ARRAY_AGG(STRUCT(predicted_score, churned))
FROM `your_project.your_dataset.your_predictions_table`
)
);
- your_project.your_dataset.your_predictions_table: This table contains your model's predictions and the actual churn outcomes. It should have at least two columns: predicted_score (FLOAT64) and churned (BOOL).
- ARRAY_AGG(STRUCT(predicted_score, churned)): This gathers all the predictions and labels into an array of structs, which is the required input format for the roc_auc function.
- bigfunctions.us.roc_auc(...): This calls the roc_auc function in the us region (replace with your appropriate region) with the array of structs.
The query will return a single value representing the ROC AUC. This value will be between 0 and 1. A higher ROC AUC indicates a better performing model:
- ROC AUC = 1: Perfect classifier.
- ROC AUC = 0.5: No better than random guessing.
- ROC AUC = 0: The classifier is always wrong (predicting positive when it's negative, and vice versa).
By calculating the ROC AUC, you can quantify how well your churn prediction model distinguishes between customers who will churn and those who won't. This allows you to compare different models, tune hyperparameters, and ultimately select the best model for deployment.