roc_auc

Call or Deploy roc_auc?

✅ You can call this roc_auc bigfunction directly from your Google Cloud Project (no install required).

  • This roc_auc function is deployed in the bigfunctions GCP project, in 39 datasets covering all 39 BigQuery regions. Use the dataset in the same region as your data (otherwise you may get a function not found error).
  • The function is public, so anyone can call it. Just copy/paste one of the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions --> Read Getting Started. This is particularly useful if you want to create private functions (for example, functions calling your internal APIs).
  • For any questions or difficulties, please read Getting Started.
  • Found a bug? Please raise an issue here.

Public BigFunctions datasets are named after their region:

Region          Dataset
eu              bigfunctions.eu
us              bigfunctions.us
europe-west1    bigfunctions.europe_west1
asia-east1      bigfunctions.asia_east1
...             ...

Signature

roc_auc(predictions)

Description

Returns the Area Under the Receiver Operating Characteristic Curve (a.k.a. ROC AUC) for a set of predicted scores and ground-truth labels, computed with the trapezoidal rule.
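
The function's implementation is not reproduced here, but as a rough illustration of the technique, the sketch below computes ROC AUC with the trapezoidal rule in plain BigQuery SQL. It assumes a placeholder table your_predictions with columns predicted_score (FLOAT64) and label (BOOL), containing at least one positive and one negative label.

with curve as (
  -- One ROC point per row: cumulative true-positive and false-positive rates
  -- when thresholding at that row's predicted_score (tied scores share one point).
  select
    sum(cast(label as int64)) over (order by predicted_score desc)
      / sum(cast(label as int64)) over () as tpr,
    sum(cast(not label as int64)) over (order by predicted_score desc)
      / sum(cast(not label as int64)) over () as fpr
  from your_predictions
),
segments as (
  -- Pair each ROC point with the previous one, starting from (0, 0).
  select
    fpr,
    tpr,
    lag(fpr, 1, 0.0) over (order by fpr, tpr) as prev_fpr,
    lag(tpr, 1, 0.0) over (order by fpr, tpr) as prev_tpr
  from curve
)
-- Sum the trapezoid areas between consecutive ROC points.
select sum((fpr - prev_fpr) * (tpr + prev_tpr) / 2) as roc_auc
from segments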

Examples

1. Random classifier

select bigfunctions.eu.roc_auc((
  select array_agg(struct(cast(predicted_score as float64), rand() > 0.5))
  from unnest(generate_array(1, 1000)) as predicted_score
))
+---------+
| roc_auc |
+---------+
| 0.5     |
+---------+

2. Good classifier

select bigfunctions.eu.roc_auc((
  select array_agg(struct(cast(predicted_score as float64), predicted_score > 500))
  from unnest(generate_array(1, 1000)) as predicted_score
))
+---------+
| roc_auc |
+---------+
| 1.0     |
+---------+

3. Bad classifier

select bigfunctions.eu.roc_auc((
  select array_agg(struct(cast(predicted_score as float64), predicted_score < 500))
  from unnest(generate_array(1, 1000)) as predicted_score
))
+---------+
| roc_auc |
+---------+
| 0.0     |
+---------+

Use cases

Let's say you're building a machine learning model in BigQuery to predict customer churn for a subscription service. You've trained your model and it outputs a predicted_score between 0 and 1 for each customer, where higher scores indicate a higher probability of churn. You also have the ground truth labels indicating whether each customer actually churned (true) or not (false).

You can use the roc_auc function to evaluate the performance of your churn prediction model. Here's how:

SELECT bigfunctions.us.roc_auc(
    (
        SELECT
            ARRAY_AGG(STRUCT(predicted_score, churned))
        FROM `your_project.your_dataset.your_predictions_table`
    )
);
  • your_project.your_dataset.your_predictions_table: This table contains your model's predictions and the actual churn outcomes. It should have at least two columns: predicted_score (FLOAT64) and churned (BOOL). See the sketch after this list for one way such a table could be produced with BigQuery ML.
  • ARRAY_AGG(STRUCT(predicted_score, churned)): This gathers all the predictions and labels into an array of structs, which is the required input format for the roc_auc function.
  • bigfunctions.us.roc_auc(...): This calls the roc_auc function in the us region (replace with your appropriate region) with the array of structs.
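
For instance, if your churn model is a BigQuery ML classification model whose label column is churned, the predictions array could be built directly from ML.PREDICT, keeping the positive-class probability as the score. This is only a sketch with placeholder model and table names; the exact output column names (predicted_churned_probs here) depend on your model's label column.

SELECT bigfunctions.us.roc_auc(
    (
        SELECT
            ARRAY_AGG(STRUCT(predicted_score, churned))
        FROM (
            SELECT
                -- Probability assigned to the positive class (churned = TRUE)
                (SELECT p.prob
                 FROM UNNEST(predicted_churned_probs) AS p
                 WHERE p.label = TRUE) AS predicted_score,
                churned
            FROM ML.PREDICT(
                MODEL `your_project.your_dataset.churn_model`,
                (SELECT * FROM `your_project.your_dataset.customers_holdout`)
            )
        )
    )
);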

The query will return a single value representing the ROC AUC. This value will be between 0 and 1. A higher ROC AUC indicates a better performing model:

  • ROC AUC = 1: Perfect classifier.
  • ROC AUC = 0.5: No better than random guessing.
  • ROC AUC = 0: The classifier is always wrong (predicting positive when it's negative, and vice versa).
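
These interpretations follow from an equivalent definition: the ROC AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one, with ties counted as one half. As a hypothetical sanity check against the same placeholder predictions table, that probability can be computed directly, although the pairwise cross join makes it far slower than roc_auc on large tables:

WITH preds AS (
  SELECT predicted_score, churned
  FROM `your_project.your_dataset.your_predictions_table`
)
SELECT
  -- Average pairwise comparison of positive vs. negative scores (ties count as 0.5)
  AVG(CASE
        WHEN pos.predicted_score > neg.predicted_score THEN 1.0
        WHEN pos.predicted_score = neg.predicted_score THEN 0.5
        ELSE 0.0
      END) AS roc_auc
FROM (SELECT predicted_score FROM preds WHERE churned) AS pos
CROSS JOIN (SELECT predicted_score FROM preds WHERE NOT churned) AS neg;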

By calculating the ROC AUC, you can quantify how well your churn prediction model distinguishes between customers who will churn and those who won't. This allows you to compare different models, tune hyperparameters, and ultimately select the best model for deployment.
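
Because roc_auc is called like any other SQL function, such comparisons fit in a single query. A sketch, assuming the placeholder predictions table also has a model_name column identifying which model produced each row:

SELECT
    model_name,
    bigfunctions.us.roc_auc(ARRAY_AGG(STRUCT(predicted_score, churned))) AS roc_auc
FROM `your_project.your_dataset.your_predictions_table`
GROUP BY model_name
ORDER BY roc_auc DESC;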