precision_recall_auc¶
precision_recall_auc(predictions)
Description¶
Returns the Area Under the Precision-Recall Curve (a.k.a. AUC PR) given a set of predicted scores and ground-truth labels, computed with the trapezoidal rule.
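To make the trapezoidal rule concrete, here is a minimal, self-contained SQL sketch (not the function's actual implementation): the curve CTE holds made-up (recall, precision) points such as those obtained by sweeping the decision threshold over the predictions, and each trapezoid contributes its recall width times its average precision.

WITH curve AS (
  -- Hypothetical (recall, precision) points, ordered by recall
  SELECT * FROM UNNEST([
    STRUCT(0.0 AS recall, 1.0 AS precision),
    STRUCT(0.5 AS recall, 0.8 AS precision),
    STRUCT(1.0 AS recall, 0.6 AS precision)
  ])
),
segments AS (
  SELECT
    recall - LAG(recall) OVER (ORDER BY recall) AS width,
    (precision + LAG(precision) OVER (ORDER BY recall)) / 2 AS avg_precision
  FROM curve
)
SELECT SUM(width * avg_precision) AS auc_pr  -- 0.5*0.9 + 0.5*0.7 = 0.8
FROM segments;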
Usage¶
Call or Deploy precision_recall_auc
Call precision_recall_auc directly
The easiest way to use bigfunctions: the precision_recall_auc function is deployed in 39 public datasets, one for each of the 39 BigQuery regions, and can be called by anyone. Just copy / paste the examples below into your BigQuery console. It just works!
(You need to use the dataset in the same region as your data, otherwise you may get a "function not found" error.)
Public BigFunctions Datasets

| Region | Dataset |
|---|---|
| eu | bigfunctions.eu |
| us | bigfunctions.us |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1 | bigfunctions.asia_east1 |
| ... | ... |
Deploy precision_recall_auc in your project

Why deploy?

- You may prefer to deploy precision_recall_auc in your own project to build and manage your own catalog of functions.
- This is particularly useful if you want to create private functions (for example, functions calling your internal APIs).
- Get started by reading the framework page.

Deployment

The precision_recall_auc function can be deployed with:

pip install bigfunctions
bigfun get precision_recall_auc
bigfun deploy precision_recall_auc
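Once deployed, the function lives in your own project rather than a public dataset. A minimal sanity check with inline values, assuming you deployed to a (hypothetical) dataset your_project.your_dataset and the same (score FLOAT64, label BOOL) struct layout as the public examples below:

SELECT your_project.your_dataset.precision_recall_auc(
  -- array of (predicted_score, label) structs
  [STRUCT(0.9, true), STRUCT(0.4, false), STRUCT(0.7, true)]
) AS auc_pr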
Examples¶
1. Random classifier (labels are independent of the scores, so AUC-PR is close to the positive rate, here ~0.5)
select bigfunctions.eu.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), rand() > 0.5)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.us.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), rand() > 0.5)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.europe_west1.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), rand() > 0.5)) from unnest(generate_array(1, 1000)) as predicted_score))
+--------+
| auc_pr |
+--------+
| 0.5 |
+--------+
2. Good classifier (every positive is scored above every negative, so AUC-PR = 1.0)
select bigfunctions.eu.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), predicted_score > 500)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.us.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), predicted_score > 500)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.europe_west1.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), predicted_score > 500)) from unnest(generate_array(1, 1000)) as predicted_score))
+--------+
| auc_pr |
+--------+
| 1.0 |
+--------+
Use cases¶
You're evaluating a machine learning model designed to predict customer churn for a telecommunications company. You have a dataset with customer features and a label indicating whether they churned (1) or not (0). Your model outputs a churn probability score for each customer.
Here's how you would use the `precision_recall_auc` function in BigQuery to evaluate your model:
SELECT bigfunctions.YOUR_REGION.precision_recall_auc(
  (
    SELECT
      ARRAY_AGG(
        STRUCT(
          CAST(predicted_churn_probability AS FLOAT64) AS predicted_score,
          churned = 1 AS label
        )
      )
    FROM
      `your_project.your_dataset.customer_churn_predictions`
  )
) AS auc_pr;
Explanation:

- `your_project.your_dataset.customer_churn_predictions`: Replace this with the actual location of your BigQuery table containing the predictions. This table should have at least two columns:
    - `predicted_churn_probability`: The predicted probability of churn (a floating-point number between 0 and 1).
    - `churned`: The ground-truth label (1 for churn, 0 for no churn). The comparison `churned = 1` converts it to the BOOL the function expects, matching the examples above.
- `ARRAY_AGG(STRUCT(...))`: This constructs an array of structs, where each struct contains the predicted score and the true label for a single customer. This is the required input format for the `precision_recall_auc` function.
- `bigfunctions.YOUR_REGION.precision_recall_auc`: Replace `YOUR_REGION` with the public dataset matching the region where your data resides (e.g., `us`, `eu`, `europe_west1`). This function calculates the area under the precision-recall curve.
- `AS auc_pr`: This assigns the resulting AUC-PR value to a column named `auc_pr`.
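To try this end to end without real data, here is a self-contained sketch in which a CTE stands in for the predictions table (all values are made up; use the public dataset for your region in place of bigfunctions.eu):

WITH customer_churn_predictions AS (
  SELECT * FROM UNNEST([
    STRUCT(0.92 AS predicted_churn_probability, 1 AS churned),
    STRUCT(0.35 AS predicted_churn_probability, 0 AS churned),
    STRUCT(0.78 AS predicted_churn_probability, 1 AS churned),
    STRUCT(0.10 AS predicted_churn_probability, 0 AS churned)
  ])
)
SELECT bigfunctions.eu.precision_recall_auc(
  ARRAY_AGG(STRUCT(CAST(predicted_churn_probability AS FLOAT64), churned = 1))
) AS auc_pr
FROM customer_churn_predictions;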
Why use AUC-PR in this case?
Churn prediction is often an imbalanced classification problem, meaning there are significantly more non-churners than churners. AUC-PR is a better metric than AUC-ROC for imbalanced datasets because it focuses on the positive class (churners in this case). A higher AUC-PR indicates a model that is better at identifying churners, even when they are a small portion of the overall customer base.
By calculating the AUC-PR, you get a single number summarizing your model's performance, making it easier to compare different models or track the performance of a single model over time.
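Because the function takes a plain aggregated array, it also works per group. A sketch for comparing several models at once, assuming the predictions table has an additional (hypothetical) model_name column:

SELECT
  model_name,
  bigfunctions.eu.precision_recall_auc(
    ARRAY_AGG(STRUCT(CAST(predicted_churn_probability AS FLOAT64), churned = 1))
  ) AS auc_pr
FROM `your_project.your_dataset.customer_churn_predictions`
GROUP BY model_name
ORDER BY auc_pr DESC;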
Need help or Found a bug?

Get help using precision_recall_auc
The community can help! Engage the conversation on Slack.
We also provide professional support.

Report a bug about precision_recall_auc
If the function does not work as expected, please
- report a bug so that it can be improved,
- or open the discussion with the community on Slack.
We also provide professional support.