
roc_curve

Call or Deploy roc_curve?

✅ You can call this roc_curve bigfunction directly from your Google Cloud project (no install required).

  • This roc_curve function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your data (otherwise you may get a function-not-found error).
  • The function is public, so it can be called by anyone. Just copy/paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, functions calling your internal APIs). Discover the framework

Public BigFunctions Datasets:

Region         Dataset
eu             bigfunctions.eu
us             bigfunctions.us
europe-west1   bigfunctions.europe_west1
asia-east1     bigfunctions.asia_east1
...            ...


Signature

roc_curve(predictions)

Description

Returns the Receiver Operating Characteristic curve (a.k.a. ROC curve) given a set of predicted scores and ground-truth labels.
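
Each point on the curve is a (false positive rate, true positive rate) pair obtained by sweeping a decision threshold over the predicted scores, using the standard definitions:

true_positive_rate  = TP / (TP + FN)
false_positive_rate = FP / (FP + TN)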

Examples

select * from bigfunctions.eu.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.us.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.europe_west1.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])

+---------------------+--------------------+
| false_positive_rate | true_positive_rate |
+---------------------+--------------------+
|         0.0         |         0.0        |
|         0.0         |         0.5        |
|         0.5         |         0.5        |
|         0.5         |         1.0        |
|         1.0         |         1.0        |
+---------------------+--------------------+
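
Each row corresponds to lowering the decision threshold past one of the predicted scores, taken in descending order (0.8, 0.4, 0.35, 0.1). With two positives and two negatives in the example, admitting 0.8 (a true positive) moves the curve from (0.0, 0.0) to (0.0, 0.5); admitting 0.4 (a false positive) moves it to (0.5, 0.5); 0.35 (a true positive) to (0.5, 1.0); and 0.1 (a false positive) to (1.0, 1.0).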


Need help using roc_curve?

The community can help! Join the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using roc_curve?

If the function does not work as expected, please

  • report a bug so that it can be improved,
  • or open a discussion with the community on Slack.

For professional support, don't hesitate to chat with us.

Use cases

You're evaluating a new machine learning model designed to predict customer churn for a telecommunications company. You have a dataset with predicted churn probabilities (output of your model) and the actual churn outcomes (true or false) for a set of customers. You want to assess the performance of your model across different probability thresholds. The ROC curve is a perfect tool for this.

Here's how you would use the roc_curve BigQuery function in this scenario:

#standardSQL
-- Build the array of (predicted score, actual label) structs that roc_curve expects,
-- then pass it to the function.
DECLARE predictions ARRAY<STRUCT<predicted_score FLOAT64, label BOOL>>;

SET predictions = (
    SELECT ARRAY_AGG(STRUCT(
        predicted_churn_probability,
        churned  -- must be a BOOL; use `churned = 1` instead if it is stored as 0/1
    ))
    FROM `your_project.your_dataset.customer_churn_data`
);

SELECT *
FROM bigfunctions.your_region.roc_curve(predictions);

Explanation:

  1. DECLARE / SET: builds an array variable pairing each predicted churn probability with the actual churn outcome. The label must be a BOOL, as required by roc_curve; if churned is stored as an integer or string, convert it first (e.g. churned = 1).

  2. ARRAY_AGG: aggregates one STRUCT per customer into the single array of structs that roc_curve takes as input.

  3. bigfunctions.your_region.roc_curve(...): calls the function with the aggregated array. Remember to replace your_region with the dataset matching your BigQuery region (e.g. us, eu, europe_west1).

Result and Interpretation:

The query will return a table with two columns: false_positive_rate and true_positive_rate. These represent the coordinates of the ROC curve. By plotting these points, you can visualize the trade-off between the model's sensitivity (true positive rate) and its specificity (1 - false positive rate) at various threshold settings. A higher area under the ROC curve (AUC) indicates better model performance.
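
If you also want the AUC itself, here is a minimal sketch that integrates the function's output with the trapezoidal rule (assuming, as in the example above, that the returned points trace the curve from (0, 0) to (1, 1)):

WITH roc AS (
    SELECT * FROM bigfunctions.eu.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
),
segments AS (
    SELECT
        false_positive_rate - LAG(false_positive_rate) OVER w AS dx,
        (true_positive_rate + LAG(true_positive_rate) OVER w) / 2 AS mean_tpr
    FROM roc
    WINDOW w AS (ORDER BY false_positive_rate, true_positive_rate)
)
SELECT SUM(dx * mean_tpr) AS auc  -- 0.75 for the example data above
FROM segments;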

This example demonstrates how roc_curve can be practically used to evaluate the performance of a binary classification model in a real-world business scenario. You could then use this information to choose an appropriate threshold for your model based on the desired balance between correctly identifying churned customers and minimizing false alarms.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
