roc_curve
roc_curve(predictions)
Description
Returns the Receiver Operating Characteristic (ROC) curve for a set of predicted scores and ground-truth labels.
Usage
Call or Deploy roc_curve?
Call roc_curve directly
The easiest way to use bigfunctions
- The roc_curve function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
- It can be called by anyone. Just copy and paste the examples below into your BigQuery console. It just works!
- (You need to use the dataset located in the same region as your own datasets; otherwise you may get a "function not found" error.)
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu | bigfunctions.eu |
us | bigfunctions.us |
europe-west1 | bigfunctions.europe_west1 |
asia-east1 | bigfunctions.asia_east1 |
... | ... |
Deploy roc_curve in your project
Why deploy?
- You may prefer to deploy roc_curve in your own project to build and manage your own catalog of functions.
- This is particularly useful if you want to create private functions (for example, functions that call your internal APIs).
- Get started by reading the framework page.
Deployment
The roc_curve function can be deployed with:
pip install bigfunctions
bigfun get roc_curve
bigfun deploy roc_curve
Examples
select * from bigfunctions.eu.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.us.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.europe_west1.roc_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
+---------------------+--------------------+
| false_positive_rate | true_positive_rate |
+---------------------+--------------------+
| 0.0 | 0.0 |
| 0.0 | 0.5 |
| 0.5 | 0.5 |
| 0.5 | 1.0 |
| 1.0 | 1.0 |
+---------------------+--------------------+
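To make the output above easier to read, here is a minimal Python sketch of one common way to build ROC points: sort by descending score and lower the threshold one distinct score at a time. It is an illustration only, not the BigFunctions implementation; the function name `roc_points` and its handling of tied scores are assumptions.

```python
from typing import List, Tuple


def roc_points(predictions: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Return (false_positive_rate, true_positive_rate) points, starting at (0, 0).

    Hypothetical helper for illustration; not part of BigFunctions.
    """
    positives = sum(1 for _, label in predictions if label)
    negatives = len(predictions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    # Walking from the highest score to the lowest is equivalent to
    # lowering the decision threshold one distinct score at a time.
    ordered = sorted(predictions, key=lambda p: p[0], reverse=True)

    points = [(0.0, 0.0)]
    true_pos = 0
    false_pos = 0
    i = 0
    while i < len(ordered):
        score = ordered[i][0]
        # Consume every prediction tied at this score before emitting a point.
        while i < len(ordered) and ordered[i][0] == score:
            if ordered[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


example = [(0.1, False), (0.4, False), (0.35, True), (0.8, True)]
for fpr, tpr in roc_points(example):
    print(fpr, tpr)
```

On the four example predictions this sketch yields the same five points as the result table: (0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0).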
Use cases
You're evaluating a new machine learning model designed to predict customer churn for a telecommunications company. You have a dataset with predicted churn probabilities (output of your model) and the actual churn outcomes (true or false) for a set of customers. You want to assess the performance of your model across different probability thresholds. The ROC curve is a perfect tool for this.
Here's how you would use the roc_curve BigQuery function in this scenario:
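#standardSQL
WITH churn_predictions AS (
  SELECT
    customer_id,
    predicted_churn_probability,
    -- IF needs a BOOL condition, so churned must be a BOOL column; NULL becomes FALSE.
    IF(churned, TRUE, FALSE) AS actual_churned
  FROM
    `your_project.your_dataset.customer_churn_data`
)
SELECT *
FROM bigfunctions.your_region.roc_curve(
  -- Scalar subquery: ARRAY_AGG needs a FROM clause to have rows to aggregate.
  (
    SELECT ARRAY_AGG(STRUCT(predicted_churn_probability, actual_churned))
    FROM churn_predictions
  )
) AS roc;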
Explanation:
- churn_predictions CTE: This selects the customer ID, the predicted churn probability from your model, and the actual churn outcome. The IF expression needs churned to be a BOOL column (BigQuery's IF takes a boolean condition) and returns FALSE when churned is NULL, so the roc_curve function always receives TRUE or FALSE. If churned is stored as an integer or string, compare it explicitly instead, for example churned = 1.
- ARRAY_AGG: This aggregates the predicted probability and actual churn outcome of every row of churn_predictions into a single array of structs, which is the expected input format for the roc_curve function.
- bigfunctions.your_region.roc_curve(...): This calls the roc_curve function with the array of structs. Remember to replace your_region with the dataset name for your BigQuery region (e.g., us, eu, europe_west1).
- AS roc: This assigns the output of the function to the table alias roc.
Result and Interpretation:
The query will return a table with two columns: false_positive_rate and true_positive_rate. These represent the coordinates of the ROC curve. By plotting these points, you can visualize the trade-off between the model's sensitivity (true positive rate) and its specificity (1 - false positive rate) at various threshold settings. A higher area under the ROC curve (AUC) indicates better model performance: an AUC of 0.5 corresponds to random ranking and an AUC of 1.0 to a perfect ranking of positives above negatives.
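The AUC mentioned above can be estimated from the returned points with the trapezoidal rule. The following Python sketch assumes the rows have already been fetched as (false_positive_rate, true_positive_rate) pairs; the helper name `auc_trapezoid` is hypothetical and not part of BigFunctions.

```python
from typing import List, Tuple


def auc_trapezoid(points: List[Tuple[float, float]]) -> float:
    """Approximate the area under a ROC curve with the trapezoidal rule.

    `points` are (false_positive_rate, true_positive_rate) pairs.
    """
    # Sort by false positive rate, then true positive rate, so that
    # consecutive points trace the curve from (0, 0) to (1, 1).
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        # Each segment contributes width * average height.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Points copied from the result table in the Examples section.
roc = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc_trapezoid(roc))
```

For the example points the area is 0.75, which matches the fraction of (positive, negative) pairs in which the positive example has the higher score (3 out of 4).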
This example demonstrates how roc_curve can be practically used to evaluate the performance of a binary classification model in a real-world business scenario. You could then use this information to choose an appropriate threshold for your model based on the desired balance between correctly identifying churned customers and minimizing false alarms.
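One possible way to turn that balance into a concrete threshold is Youden's J statistic (J = TPR - FPR), sketched below in Python. This is a swapped-in illustration, not something roc_curve does for you: the output documented on this page contains only the two rate columns, so the sketch recomputes the rates per candidate threshold from the raw predictions. The function name `best_threshold` and the "score >= threshold means positive" convention are assumptions.

```python
from typing import List, Tuple


def best_threshold(predictions: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """Return (threshold, J) maximising Youden's J = TPR - FPR.

    A prediction counts as positive when score >= threshold.
    Ties in J keep the highest threshold.
    """
    positives = sum(1 for _, label in predictions if label)
    negatives = len(predictions) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")

    best_score = None
    best_j = float("-inf")
    # Every distinct score is a candidate threshold, highest first.
    for threshold in sorted({score for score, _ in predictions}, reverse=True):
        true_pos = sum(1 for s, label in predictions if label and s >= threshold)
        false_pos = sum(1 for s, label in predictions if not label and s >= threshold)
        j = true_pos / positives - false_pos / negatives
        if j > best_j:
            best_score, best_j = threshold, j
    return best_score, best_j


example = [(0.1, False), (0.4, False), (0.35, True), (0.8, True)]
print(best_threshold(example))
```

Youden's J weights false alarms and missed churners equally. If one kind of error is more costly for your business, a weighted criterion would be a better fit than this equal-weight sketch.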
Need help or found a bug?
Get help using roc_curve
The community can help! Join the conversation on Slack.
We also provide professional support.
Report a bug about roc_curve
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open a discussion with the community on Slack.
We also provide professional support.