Skip to content

bigfunctions > percentile_value

percentile_value

Call or Deploy percentile_value ?

✅ You can call this percentile_value bigfunction directly from your Google Cloud Project (no install required).

  • This percentile_value function is deployed in bigfunctions GCP project in 39 datasets for all of the 39 BigQuery regions. You need to use the dataset in the same region as your datasets (otherwise you may have a function not found error).
  • Function is public, so it can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
  • You may prefer to deploy the BigFunction in your own project if you want to build and manage your own catalog of functions. This is particularly useful if you want to create private functions (for example calling your internal APIs). Discover the framework

Public BigFunctions Datasets:

Region Dataset
eu bigfunctions.eu
us bigfunctions.us
europe-west1 bigfunctions.europe_west1
asia-east1 bigfunctions.asia_east1
... ...

Description

Signature

percentile_value(arr, percentile)

Description

Returns percentile of an array with percentile a float in range [0, 1].

Algorithm to calculate percentile is based on R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," The American Statistician, 50(4), pp. 361-365, 1996

Examples

select bigfunctions.eu.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3], 0.74)
select bigfunctions.us.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3], 0.74)
select bigfunctions.europe_west1.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3], 0.74)
+------------------+
| percentile_value |
+------------------+
| 13.8             |
+------------------+

select bigfunctions.eu.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 0.9)
select bigfunctions.us.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 0.9)
select bigfunctions.europe_west1.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 0.9)
+------------------+
| percentile_value |
+------------------+
| 15.9             |
+------------------+

select bigfunctions.eu.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 2)
select bigfunctions.us.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 2)
select bigfunctions.europe_west1.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 2)
+------------------+
| percentile_value |
+------------------+
| null             |
+------------------+

Need help using percentile_value?

The community can help! Engage the conversation on Slack

For professional suppport, don't hesitate to chat with us.

Found a bug using percentile_value?

If the function does not work as expected, please

  • report a bug so that it can be improved.
  • or open the discussion with the community on Slack.

For professional suppport, don't hesitate to chat with us.

Use cases

Let's illustrate a use case for the percentile_value BigQuery function.

Scenario: You have a table storing website session durations (in seconds) for different users. You want to analyze user engagement and identify the 95th percentile of session durations. This will help you understand how long highly engaged users typically spend on your site.

Table Schema:

CREATE OR REPLACE TABLE `your_project.your_dataset.session_durations` (
  user_id INT64,
  session_duration INT64
);

INSERT INTO `your_project.your_dataset.session_durations` (user_id, session_duration) VALUES
(1, 120), (2, 300), (3, 60), (4, 1800), (5, 45), (6, 900), (7, 240), (8, 30), (9, 600), (10, 150);

Query using percentile_value:

SELECT
    bigfunctions.us.percentile_value(ARRAY_AGG(session_duration), 0.95) AS p95_session_duration
  FROM
    `your_project.your_dataset.session_durations`;

Explanation:

  1. ARRAY_AGG(session_duration): This aggregates all session durations into an array.
  2. bigfunctions.us.percentile_value(..., 0.95): This calculates the 95th percentile value from the array of session durations. Remember to replace us with your BigQuery region if different.
  3. AS p95_session_duration: This aliases the result for clarity.

Result:

The query will return a single value representing the 95th percentile of session durations. This value indicates that 95% of sessions are shorter than or equal to this duration. Let's say the result is 1500 seconds. This tells you that highly engaged users tend to have sessions lasting around 1500 seconds or less.

Benefits of using percentile_value:

  • Simplified calculation: Instead of manually implementing percentile logic, you can use this function directly.
  • Efficiency: BigQuery functions are generally optimized for performance.
  • Flexibility: You can easily change the percentile value (e.g., to calculate the median (50th percentile) or other percentiles) by adjusting the second argument.

This is a simple example. You can apply this function to any scenario where you need to calculate percentiles from an array of values within BigQuery, such as:

  • E-commerce: Analyzing product prices, order values, or customer spending.
  • Gaming: Analyzing player scores, playtime, or in-game purchases.
  • Finance: Analyzing stock prices, transaction amounts, or customer balances.
  • Healthcare: Analyzing patient wait times, treatment costs, or lengths of stay.

By using percentile_value, you can gain valuable insights into the distribution of your data and identify important thresholds or outliers.

Spread the word

BigFunctions is fully open-source. Help make it a success by spreading the word!

Share on Add a on