
precision_recall_curve

Call or Deploy precision_recall_curve?

✅ You can call this precision_recall_curve bigfunction directly from your Google Cloud Project (no install required).

  • This precision_recall_curve function is deployed in the bigfunctions GCP project in 39 datasets, one for each of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may get a function not found error).
  • The function is public, so it can be called by anyone. Just copy / paste the examples below into your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework

Public BigFunctions Datasets:

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...

Description

Signature

precision_recall_curve(predictions)

Description

Returns the Precision-Recall Curve (as a table) given predictions, an array of (predicted_score, ground_truth_label) structs.

Examples

select * from bigfunctions.eu.precision_recall_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.us.precision_recall_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])
select * from bigfunctions.europe_west1.precision_recall_curve([(0.1, false), (0.4, false), (0.35, true), (0.8, true)])

+-----------+---------+
| precision |  recall |
+-----------+---------+
|    0.5    |   1.0   |
|    0.667  |   1.0   |
|    0.5    |   0.5   |
|    1.0    |   0.5   |
|    1.0    |   0.0   |
+-----------+---------+
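
To read this table: each row corresponds to one decision threshold over the predicted scores, with precision = TP / (TP + FP) and recall = TP / (TP + FN) (standard definitions; the exact thresholding convention is inferred from this sample output). For example, flagging every prediction with a score of at least 0.35 marks (0.4, false), (0.35, true) and (0.8, true) as positive: 2 true positives and 1 false positive, so precision = 2/3 ≈ 0.667 and recall = 2/2 = 1.0, which is the second row above.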


Need help using precision_recall_curve?

The community can help! Join the conversation on Slack.

For professional support, don't hesitate to chat with us.

Found a bug using precision_recall_curve?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open a discussion with the community on Slack.

For professional support, don't hesitate to chat with us.

Use cases

You're evaluating a binary classification model (e.g., spam detection, fraud detection, disease diagnosis) and want to understand its performance across different thresholds. The precision_recall_curve function helps you analyze the trade-off between precision and recall.

Use Case: Optimizing a Fraud Detection Model

Imagine you've trained a model to predict fraudulent transactions. Each transaction is assigned a score between 0 and 1, representing the model's confidence that the transaction is fraudulent. You need to choose a threshold above which you flag a transaction as fraudulent. A higher threshold means higher precision (fewer false positives—legitimate transactions flagged as fraud) but lower recall (more false negatives—fraudulent transactions missed).
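
For intuition, here is a minimal sketch of what applying a single threshold looks like in SQL (the transaction_predictions table, its columns, and the 0.6 cut-off are hypothetical examples, not part of BigFunctions):

SELECT
    transaction_id,
    predicted_score,
    predicted_score >= 0.6 AS flagged_as_fraud  -- 0.6 is an arbitrary example threshold
FROM your_project.your_dataset.transaction_predictions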

Here's how precision_recall_curve helps:

  1. Data Preparation: You have a dataset with the predicted scores from your model and the ground truth labels (whether the transaction was actually fraudulent). This data is formatted as an array of structs, where each struct contains the predicted_score (float64) and the ground_truth_label (bool). A sketch of building this array from a table appears after this list.

  2. Calling the Function: You use the precision_recall_curve function in your BigQuery query, passing in the array of structs:

SELECT *
FROM bigfunctions.your_region.precision_recall_curve(
    ARRAY[
        (0.1, false), -- Low score, not fraud
        (0.4, false), -- Low score, not fraud
        (0.35, true), -- Moderate score, fraud
        (0.8, true), -- High score, fraud
        (0.95, false), -- Very high score, surprisingly not fraud (potential outlier?)
        (0.6, true), --  Moderate-high score, fraud
        (0.2, false) -- Low score, not fraud
    ]
);
  3. Interpreting the Results: The function returns a table with precision and recall columns. Each row represents a different threshold, and the values show the precision and recall achieved at that threshold. Examining this curve lets you do the following:

  4. Visualization: You can plot the precision-recall curve (precision on the y-axis, recall on the x-axis) to visualize the trade-off.

  5. Threshold Selection: You can identify the optimal threshold based on your specific business requirements. For fraud detection, you might prioritize high recall (catching most fraudulent transactions even if it means more false positives that you can investigate manually) or balance precision and recall based on the costs associated with each type of error. See the query sketch after this list.
  6. Model Evaluation: The overall shape of the curve tells you about the performance of your model. A curve closer to the top-right corner indicates a better-performing model. You can compare the precision-recall curves of different models to choose the best one.
  7. Identifying Issues: The example shows a case where a very high score (0.95) was associated with a non-fraudulent transaction. This could be a sign of an issue with your model or a data anomaly worth investigating. The precision-recall curve, combined with an understanding of your data, helps pinpoint such scenarios.
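
Here is a sketch of how data preparation (step 1) and threshold selection (step 5) might look on a real table. The transaction_predictions table and its columns are hypothetical, your_region is a placeholder for the dataset matching your region, and the sketch assumes the function accepts any array-valued expression, not only an array literal.

DECLARE preds ARRAY<STRUCT<predicted_score FLOAT64, ground_truth_label BOOL>> DEFAULT (
    -- Aggregate (score, label) pairs from a hypothetical predictions table
    SELECT ARRAY_AGG(STRUCT(predicted_score AS predicted_score, is_fraud AS ground_truth_label))
    FROM your_project.your_dataset.transaction_predictions
);

-- Compute the curve and keep the most precise operating point
-- that still catches at least 90% of fraudulent transactions
SELECT *
FROM bigfunctions.your_region.precision_recall_curve(preds)
WHERE recall >= 0.9
ORDER BY precision DESC
LIMIT 1;

The WHERE and ORDER BY clauses simply pick, from the returned curve, the most precise point that still satisfies a minimum-recall requirement; adjust the 0.9 target to your own tolerance for missed fraud.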

In essence, the precision_recall_curve function provides a powerful tool for evaluating and fine-tuning your binary classification models, enabling you to make informed decisions about selecting the best operating point based on the desired balance between precision and recall.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!
