roc_curve

roc_curve(predictions)

Description

Returns the Receiver Operating Characteristic (ROC) curve for a set of predicted scores and ground-truth labels.

Usage

Call or Deploy roc_curve ?
Call roc_curve directly

The easiest way to use bigfunctions

  • The roc_curve function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
  • Anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
  • (Use the public dataset in the same region as your own datasets; otherwise you may get a "function not found" error.)

Public BigFunctions Datasets

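+--------------+---------------------------+
| Region       | Dataset                   |
+--------------+---------------------------+
| eu           | bigfunctions.eu           |
| us           | bigfunctions.us           |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1   | bigfunctions.asia_east1   |
| ...          | ...                       |
+--------------+---------------------------+
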
Deploy roc_curve in your project

Why deploy?

  • You may prefer to deploy roc_curve in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

The roc_curve function can be deployed with:

pip install bigfunctions
bigfun get roc_curve
bigfun deploy roc_curve

Examples

select * from bigfunctions.eu.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.us.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.europe_west1.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])

+---------------------+--------------------+
| false_positive_rate | true_positive_rate |
+---------------------+--------------------+
|         0.0         |         0.0        |
|         0.0         |         0.5        |
|         0.5         |         0.5        |
|         0.5         |         1.0        |
|         1.0         |         1.0        |
+---------------------+--------------------+
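
To see where the rows of this output come from, here is a minimal pure-Python sketch of one common way to build ROC points: rank the predictions by score in descending order, lower the threshold one distinct score at a time, and record the cumulative false positive and true positive rates. This is an illustration only, not the BigQuery implementation of roc_curve; the helper name roc_points and its handling of tied scores are assumptions.

```python
from itertools import groupby


def roc_points(predictions):
    """Return (false_positive_rate, true_positive_rate) pairs.

    `predictions` is a list of (score, label) tuples, with label True for
    the positive class. Hypothetical helper, not part of bigfunctions.
    """
    positives = sum(1 for _, label in predictions if label)
    negatives = len(predictions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Highest scores first: lowering the threshold admits them in this order.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    # Tied scores cross the threshold together, so they yield a single point.
    for _, group in groupby(ranked, key=lambda p: p[0]):
        for _, label in group:
            if label:
                tp += 1
            else:
                fp += 1
        points.append((fp / negatives, tp / positives))
    return points


example = [(0.1, False), (0.4, False), (0.35, True), (0.8, True)]
points = roc_points(example)
for fpr, tpr in points:
    print(fpr, tpr)
```

On the four example predictions, this sketch yields the same five (false_positive_rate, true_positive_rate) pairs as the table above.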


Use cases

You're evaluating a new machine learning model designed to predict customer churn for a telecommunications company. You have a dataset with predicted churn probabilities (output of your model) and the actual churn outcomes (true or false) for a set of customers. You want to assess the performance of your model across different probability thresholds. The ROC curve is a perfect tool for this.

Here's how you would use the roc_curve BigQuery function in this scenario:

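#standardSQL
WITH churn_predictions AS (
    SELECT
        customer_id,
        predicted_churn_probability,
        -- Assumes `churned` is already BOOL; IF() requires a boolean condition.
        IF(churned, TRUE, FALSE) AS actual_churned
    FROM
        `your_project.your_dataset.customer_churn_data`
)

SELECT *
FROM bigfunctions.your_region.roc_curve(
    -- Scalar subquery: aggregate the CTE rows into a single array argument.
    (
        SELECT ARRAY_AGG(
            STRUCT(predicted_churn_probability, actual_churned)
        )
        FROM churn_predictions
    )
) AS roc;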

Explanation:

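  1. churn_predictions CTE: This selects the customer ID, the predicted churn probability from your model, and the actual churn outcome. The roc_curve function expects a boolean label. BigQuery's IF requires a BOOL condition, so IF(churned, TRUE, FALSE) only works when the churned column is already boolean. If it is stored as an integer or a string, replace the expression with a comparison that fits your data (for example churned = 1, a hypothetical encoding).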

  2. ARRAY_AGG: This aggregates the predicted probability and actual churn outcome into an array of structs, which is the expected input format for the roc_curve function.

  3. bigfunctions.your_region.roc_curve(...): This calls the roc_curve function with the array of structs. Remember to replace your_region with the public dataset for your BigQuery region. Dataset names use underscores where region names use hyphens (e.g., us, eu, europe_west1).

  4. AS roc: This assigns the output of the function to a table alias roc.

Result and Interpretation:

The query returns a table with two columns: false_positive_rate and true_positive_rate. Each row is one point on the ROC curve. Plotting these points shows the trade-off between the model's sensitivity (true positive rate) and its specificity (1 - false positive rate) as the decision threshold varies. The area under the ROC curve (AUC) summarizes this in a single number: 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a higher AUC indicates better model performance.
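
The roc_curve output does not include the AUC itself, but it can be estimated from the returned points. The sketch below applies the trapezoidal rule to the five points from the Examples section. The function name auc_trapezoid is hypothetical, and it assumes the points have already been fetched from BigQuery into a list of (false_positive_rate, true_positive_rate) tuples.

```python
def auc_trapezoid(points):
    """Estimate the area under a ROC curve with the trapezoidal rule.

    `points` is a list of (false_positive_rate, true_positive_rate) tuples.
    Hypothetical helper, not part of bigfunctions.
    """
    # Sort by FPR, then TPR, so that vertical steps are traversed upwards.
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        # Width of the step times the mean height of its two endpoints.
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Points copied from the example output in the Examples section.
roc = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
auc = auc_trapezoid(roc)
print(auc)  # 0.75
```

An AUC of 0.75 matches the pairwise reading of the same example: of the four (positive, negative) pairs, three have the positive scored higher than the negative.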

This example demonstrates how roc_curve can be practically used to evaluate the performance of a binary classification model in a real-world business scenario. You could then use this information to choose an appropriate threshold for your model based on the desired balance between correctly identifying churned customers and minimizing false alarms.
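
Because the roc_curve output lists rates but not the thresholds that produced them, choosing a threshold requires going back to the scored predictions. The sketch below uses Youden's J statistic (true positive rate minus false positive rate) as one possible selection rule. It is an assumption chosen for illustration, not something roc_curve computes, and the right rule depends on the relative cost of missed churners and false alarms. The name best_threshold is hypothetical.

```python
def best_threshold(predictions):
    """Return (threshold, j) maximizing Youden's J = TPR - FPR.

    `predictions` is a list of (score, label) tuples; a prediction counts as
    positive when score >= threshold. Hypothetical helper for illustration.
    """
    positives = sum(1 for _, label in predictions if label)
    negatives = len(predictions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best = None
    # Every distinct score is a candidate threshold, highest first.
    for threshold in sorted({score for score, _ in predictions}, reverse=True):
        tp = sum(1 for s, label in predictions if label and s >= threshold)
        fp = sum(1 for s, label in predictions if not label and s >= threshold)
        j = tp / positives - fp / negatives
        # Strict comparison keeps the highest threshold when several tie.
        if best is None or j > best[1]:
            best = (threshold, j)
    return best


example = [(0.1, False), (0.4, False), (0.35, True), (0.8, True)]
choice = best_threshold(example)
print(choice)  # (0.8, 0.5)
```

On the four example predictions, thresholds 0.8 and 0.35 both reach J = 0.5; the sketch keeps the higher one, which favors fewer false alarms.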


Need help or Found a bug?
Get help using roc_curve

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about roc_curve

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

We also provide professional support.

