precision_recall_auc

precision_recall_auc(predictions)

Description

Returns the area under the precision-recall curve (a.k.a. AUC PR), computed with the trapezoidal rule, given a set of predicted scores and ground truth labels.
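To make the description concrete, here is a minimal Python sketch of a trapezoidal AUC PR computation. It is not the code of the deployed BigQuery function: the Python function name, the choice to start the curve at (recall 0, precision 1), and the grouping of tied scores into a single curve point are all assumptions made for illustration, and the deployed function may handle these details differently.

```python
from typing import List, Tuple


def precision_recall_auc(predictions: List[Tuple[float, bool]]) -> float:
    """Illustrative trapezoidal area under the precision-recall curve.

    `predictions` holds (predicted_score, ground_truth_label) pairs.
    Assumptions of this sketch: the curve starts at (recall=0, precision=1)
    and rows with identical scores produce a single curve point.
    """
    # Rank rows from the highest predicted score to the lowest.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)
    total_positives = sum(1 for _, label in ranked if label)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")

    points = [(0.0, 1.0)]  # (recall, precision)
    true_positives = 0
    false_positives = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Lower the threshold past every row that shares this score.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1]:
                true_positives += 1
            else:
                false_positives += 1
            i += 1
        recall = true_positives / total_positives
        precision = true_positives / (true_positives + false_positives)
        points.append((recall, precision))

    # Trapezoidal rule: width in recall times the mean of the two precisions.
    area = 0.0
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        area += (r1 - r0) * (p0 + p1) / 2
    return area


# Same shape as the "Good classifier" example: scores 1..1000, label = score > 500.
good = [(float(s), s > 500) for s in range(1, 1001)]
good_auc = precision_recall_auc(good)

# A tiny hand-checkable case with one ranking mistake.
small_auc = precision_recall_auc([(0.9, True), (0.8, False), (0.7, True)])
print(good_auc, small_auc)
```

In the small case the curve points are (0, 1), (0.5, 1), (0.5, 0.5) and (1, 2/3), so the area is 0.5 + 0 + 0.5 × (0.5 + 2/3) / 2, roughly 0.79.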

Usage

Call or Deploy precision_recall_auc?
Call precision_recall_auc directly

The easiest way to use bigfunctions

  • The precision_recall_auc function is deployed in 39 public datasets, one for each of the 39 BigQuery regions.
  • Anyone can call it. Just copy and paste the examples below into your BigQuery console.
  • Use the public dataset in the same region as your own datasets; otherwise you may get a "function not found" error.

Public BigFunctions Datasets

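+--------------+---------------------------+
| Region       | Dataset                   |
+--------------+---------------------------+
| eu           | bigfunctions.eu           |
| us           | bigfunctions.us           |
| europe-west1 | bigfunctions.europe_west1 |
| asia-east1   | bigfunctions.asia_east1   |
| ...          | ...                       |
+--------------+---------------------------+
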
Deploy precision_recall_auc in your project

Why deploy?

  • You may prefer to deploy precision_recall_auc in your own project to build and manage your own catalog of functions.
  • This is particularly useful if you want to create private functions (for example calling your internal APIs).
  • Get started by reading the framework page

Deployment

The precision_recall_auc function can be deployed with:

pip install bigfunctions
bigfun get precision_recall_auc
bigfun deploy precision_recall_auc

Examples

1. Random classifier

select bigfunctions.eu.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), rand() > 0.5)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.us.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), rand() > 0.5)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.europe_west1.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), rand() > 0.5)) from unnest(generate_array(1, 1000)) as predicted_score))
+--------+
| auc_pr |
+--------+
| 0.5    |
+--------+

2. Good classifier

select bigfunctions.eu.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), predicted_score > 500)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.us.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), predicted_score > 500)) from unnest(generate_array(1, 1000)) as predicted_score))
select bigfunctions.europe_west1.precision_recall_auc((select array_agg(struct(cast(predicted_score as float64), predicted_score > 500)) from unnest(generate_array(1, 1000)) as predicted_score))
+--------+
| auc_pr |
+--------+
| 1.0    |
+--------+

Use cases

You're evaluating a machine learning model designed to predict customer churn for a telecommunications company. You have a dataset with customer features and a label indicating whether they churned (1) or not (0). Your model outputs a churn probability score for each customer.

Here's how you would use the precision_recall_auc function in BigQuery to evaluate your model:

SELECT bigfunctions.YOUR_REGION.precision_recall_auc(
    (
        SELECT
            ARRAY_AGG(
                STRUCT(
                    predicted_churn_probability AS predicted_score,
                    churned AS label
                )
            )
        FROM
            `your_project.your_dataset.customer_churn_predictions`
    )
) AS auc_pr;

Explanation:

  1. your_project.your_dataset.customer_churn_predictions: Replace this with the actual location of your BigQuery table containing the predictions. This table should have at least two columns:

    • predicted_churn_probability: The predicted probability of churn (a floating-point number between 0 and 1).
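    • churned: The ground truth label (1 for churn, 0 for no churn). The examples above pass the label as a boolean expression, so if this column is stored as an integer you may need to convert it first, for example with churned = 1.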
  2. ARRAY_AGG(STRUCT(...)): This constructs an array of structs, where each struct contains the predicted score and the true label for a single customer. This is the required input format for the precision_recall_auc function.

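  3. bigfunctions.YOUR_REGION.precision_recall_auc: Replace YOUR_REGION with the public dataset name for the BigQuery region where your data resides (e.g., us, eu, europe_west1). As the list of public datasets shows, hyphens in region names become underscores in dataset names. This function calculates the area under the precision-recall curve.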

  4. AS auc_pr: This assigns the resulting AUC-PR value to a column named auc_pr.

Why use AUC-PR in this case?

Churn prediction is often an imbalanced classification problem: there are significantly more non-churners than churners. AUC-PR is often more informative than AUC-ROC on imbalanced datasets because precision and recall both focus on the positive class (churners in this case) and are unaffected by the large number of true negatives. A higher AUC-PR indicates a model that is better at identifying churners, even if they are a small portion of the overall customer base.

By calculating the AUC-PR, you get a single number summarizing your model's performance, making it easier to compare different models or track the performance of a single model over time.
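When comparing AUC-PR values, it helps to know that the score of an uninformative model depends on class balance. The Python sketch below illustrates this with two synthetic rankings in which positives are spread evenly, so the score carries no information about the label. The helper `pr_auc` and both rankings are hypothetical and written only for this illustration; they are not part of BigFunctions, and the sketch assumes the curve starts at (recall 0, precision 1).

```python
def pr_auc(labels_by_rank):
    """Trapezoidal area under the PR curve.

    `labels_by_rank` lists ground truth labels ordered from the highest
    predicted score to the lowest (no tied scores assumed).
    """
    total_positives = sum(labels_by_rank)
    points = [(0.0, 1.0)]  # (recall, precision), assumed starting point
    true_positives = 0
    for rank, label in enumerate(labels_by_rank, start=1):
        true_positives += label  # True counts as 1, False as 0
        points.append((true_positives / total_positives, true_positives / rank))
    return sum(
        (r1 - r0) * (p0 + p1) / 2
        for (r0, p0), (r1, p1) in zip(points, points[1:])
    )


# Uninformative rankings: positives are evenly spread through the ranking.
balanced = [k % 2 == 1 for k in range(1000)]     # 50% positives
imbalanced = [k % 10 == 5 for k in range(1000)]  # 10% positives

balanced_auc = pr_auc(balanced)
imbalanced_auc = pr_auc(imbalanced)
print(f"balanced: {balanced_auc:.3f}, imbalanced: {imbalanced_auc:.3f}")
```

Both values land close to the share of positives in the data (about 0.5 and about 0.1). This is consistent with the random classifier example above, whose labels are balanced, and it suggests reading a churn model's AUC-PR against the churn rate rather than against 0.5.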


Need help or Found a bug?
Get help using precision_recall_auc

The community can help! Join the conversation on Slack.

We also provide professional support.

Report a bug about precision_recall_auc

If the function does not work as expected, please:

  • report a bug so that it can be improved, or
  • open a discussion with the community on Slack.