roc_curve¶
Call or Deploy roc_curve?
✅ You can call this roc_curve bigfunction directly from your Google Cloud Project (no install required).
- This roc_curve function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. Use the dataset in the same region as your own datasets (otherwise you may get a "function not found" error).
- The function is public, so anyone can call it. Just copy and paste the examples below into your BigQuery console. It just works!
- You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example, functions that call your internal APIs). Discover the framework
Public BigFunctions Datasets:
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Description¶
Signature
roc_curve(predictions)
Description
Returns the Receiver Operating Characteristic curve (a.k.a. ROC curve) for predictions, an array of (predicted score, ground truth label) pairs.
Examples¶
select * from bigfunctions.eu.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.us.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.europe_west1.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
+---------------------+--------------------+
| false_positive_rate | true_positive_rate |
+---------------------+--------------------+
| 0.0 | 0.0 |
| 0.0 | 0.5 |
| 0.5 | 0.5 |
| 0.5 | 1.0 |
| 1.0 | 1.0 |
+---------------------+--------------------+
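To make the output above easier to read, here is a minimal Python sketch of the usual way ROC points are derived from (score, label) pairs: sort by descending score, lower the threshold one score at a time, and record the cumulative rates. The function name roc_points and the tie-handling rule are assumptions for illustration; this is not taken from the BigFunction's source and may differ from it in edge cases.

```python
from typing import List, Tuple


def roc_points(predictions: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (false_positive_rate, true_positive_rate) points, starting at (0.0, 0.0)."""
    positives = sum(1 for _, label in predictions if label)
    negatives = len(predictions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Walk the predictions from the highest score to the lowest, which is the
    # same as lowering the decision threshold one score at a time.
    ranked = sorted(predictions, key=lambda pair: pair[0], reverse=True)

    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for index, (score, label) in enumerate(ranked):
        if label:
            true_pos += 1
        else:
            false_pos += 1
        # Assumption: tied scores share one threshold, so a point is emitted
        # only after the last prediction in each group of equal scores.
        is_last_of_group = index + 1 == len(ranked) or ranked[index + 1][0] != score
        if is_last_of_group:
            points.append((false_pos / negatives, true_pos / positives))
    return points


example = [(0.1, False), (0.4, False), (0.35, True), (0.8, True)]
print(roc_points(example))
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

On the four example predictions, this sketch reproduces the five rows shown in the table above.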
Need help using roc_curve?
The community can help! Join the conversation on Slack.
For professional support, don't hesitate to chat with us.
Found a bug using roc_curve?
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
For professional support, don't hesitate to chat with us.
Use cases¶
You're evaluating a new machine learning model designed to predict customer churn for a telecommunications company. You have a dataset with predicted churn probabilities (output of your model) and the actual churn outcomes (true or false) for a set of customers. You want to assess the performance of your model across different probability thresholds. The ROC curve is a perfect tool for this.
Here's how you would use the roc_curve BigQuery function in this scenario:
#standardSQL
WITH churn_predictions AS (
  SELECT
    customer_id,
    predicted_churn_probability,
    IF(churned, TRUE, FALSE) AS actual_churned
  FROM
    `your_project.your_dataset.customer_churn_data`
)
SELECT *
FROM bigfunctions.your_region.roc_curve(
  (
    -- Scalar subquery: collect every (score, label) pair from the CTE into one array.
    SELECT ARRAY_AGG(STRUCT(predicted_churn_probability, actual_churned))
    FROM churn_predictions
  )
) AS roc;
Explanation:
- churn_predictions CTE: This selects the customer ID, the predicted churn probability from your model, and the actual churn outcome. The IF expression assumes the churned column is already a BOOL and maps NULL to FALSE. If churned is stored as an integer or string, replace the IF with a comparison such as churned = 1, so that the label is the boolean TRUE or FALSE required by the roc_curve function.
- ARRAY_AGG: This aggregates the predicted probability and actual churn outcome of every row into an array of structs, which is the expected input format for the roc_curve function.
- bigfunctions.your_region.roc_curve(...): This calls the roc_curve function with the array of structs. Remember to replace your_region with the dataset name for your BigQuery region (e.g., us, eu, us_central1). Dataset names use underscores where region names use hyphens.
- AS roc: This assigns the output of the function to the table alias roc.
Result and Interpretation:
The query returns a table with two columns: false_positive_rate and true_positive_rate. Each row is one point of the ROC curve. Plotting these points shows the trade-off between the model's sensitivity (true positive rate) and its specificity (1 - false positive rate) as the decision threshold varies. A larger area under the ROC curve (AUC) indicates better ranking performance: 0.5 corresponds to random scoring and 1.0 to a perfect separation of churners from non-churners.
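As a rough illustration of the AUC mentioned above, the following Python sketch integrates ROC points with the trapezoidal rule. It assumes the rows have already been exported from BigQuery as (false_positive_rate, true_positive_rate) tuples; the function name auc_trapezoid is hypothetical, and the input reuses the example output shown earlier on this page.

```python
from typing import List, Tuple


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Area under a ROC curve given (false_positive_rate, true_positive_rate) points."""
    # Sort by false positive rate (then true positive rate) so that
    # consecutive points form the segments of the curve from left to right.
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        # Trapezoid between two consecutive points; vertical steps add 0.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


roc = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(roc))  # 0.75
```

For the four-prediction example, the area is 0.75: the model ranks three of the four positive/negative pairs correctly.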
This example demonstrates how roc_curve can be used in practice to evaluate a binary classification model in a real-world business scenario. You could then use this information to choose an appropriate threshold for your model, based on the desired balance between correctly identifying churned customers and minimizing false alarms.
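One possible way to turn that balance into a concrete threshold is Youden's J statistic (true positive rate minus false positive rate). The sketch below is an illustration under assumptions, not something this page prescribes: it works from the original (score, label) pairs because the roc_curve output shown here contains only the two rates, it treats a score greater than or equal to the threshold as a positive prediction, and the name best_threshold is hypothetical.

```python
from typing import List, Tuple


def best_threshold(predictions: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (threshold, J) for the threshold that maximizes J = TPR - FPR."""
    positives = sum(1 for _, label in predictions if label)
    negatives = len(predictions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_score, best_j = None, float("-inf")
    # Try each distinct score as a candidate threshold, highest first, so that
    # ties in J are resolved in favour of the stricter threshold.
    for threshold in sorted({score for score, _ in predictions}, reverse=True):
        true_pos = sum(1 for s, label in predictions if label and s >= threshold)
        false_pos = sum(1 for s, label in predictions if not label and s >= threshold)
        j = true_pos / positives - false_pos / negatives
        if j > best_j:
            best_score, best_j = threshold, j
    return best_score, best_j


example = [(0.1, False), (0.4, False), (0.35, True), (0.8, True)]
print(best_threshold(example))  # (0.8, 0.5)
```

Youden's J weighs missed churners and false alarms equally. If one kind of error is more costly for your business, a cost-weighted criterion would be a better fit.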
Spread the word¶
BigFunctions is fully open-source. Help make it a success by spreading the word!