percentile_value¶
percentile_value(arr, percentile)
Description¶
Returns percentile of an array with percentile a float in range [0, 1].
Algorithm to calculate percentile is based on R. J. Hyndman and Y. Fan, "Sample quantiles in statistical packages," The American Statistician, 50(4), pp. 361-365, 1996
Usage¶
Call or Deploy percentile_value
?
Call percentile_value
directly
The easiest way to use bigfunctions
percentile_value
function is deployed in 39 public datasets for all of the 39 BigQuery regions.- It can be called by anyone. Just copy / paste examples below in your BigQuery console. It just works!
- (You need to use the dataset in the same region as your datasets otherwise you may have a function not found error)
Public BigFunctions Datasets
Region | Dataset |
---|---|
eu |
bigfunctions.eu |
us |
bigfunctions.us |
europe-west1 |
bigfunctions.europe_west1 |
asia-east1 |
bigfunctions.asia_east1 |
... | ... |
Deploy percentile_value
in your project
Why deploy?
- You may prefer to deploy
percentile_value
in your own project to build and manage your own catalog of functions. - This is particularly useful if you want to create private functions (for example calling your internal APIs).
- Get started by reading the framework page
Deployment
percentile_value
function can be deployed with:
pip install bigfunctions
bigfun get percentile_value
bigfun deploy percentile_value
Examples¶
select bigfunctions.eu.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3], 0.74)
select bigfunctions.us.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3], 0.74)
select bigfunctions.europe_west1.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3], 0.74)
+------------------+
| percentile_value |
+------------------+
| 13.8 |
+------------------+
select bigfunctions.eu.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 0.9)
select bigfunctions.us.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 0.9)
select bigfunctions.europe_west1.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 0.9)
+------------------+
| percentile_value |
+------------------+
| 15.9 |
+------------------+
select bigfunctions.eu.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 2)
select bigfunctions.us.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 2)
select bigfunctions.europe_west1.percentile_value([20, 16, 15, 13, 10, 9, 8, 8, 7, 6, 3, 2], 2)
+------------------+
| percentile_value |
+------------------+
| null |
+------------------+
Use cases¶
Let's illustrate a use case for the percentile_value
BigQuery function.
Scenario: You have a table storing website session durations (in seconds) for different users. You want to analyze user engagement and identify the 95th percentile of session durations. This will help you understand how long highly engaged users typically spend on your site.
Table Schema:
CREATE OR REPLACE TABLE `your_project.your_dataset.session_durations` (
user_id INT64,
session_duration INT64
);
INSERT INTO `your_project.your_dataset.session_durations` (user_id, session_duration) VALUES
(1, 120), (2, 300), (3, 60), (4, 1800), (5, 45), (6, 900), (7, 240), (8, 30), (9, 600), (10, 150);
Query using percentile_value
:
SELECT
bigfunctions.us.percentile_value(ARRAY_AGG(session_duration), 0.95) AS p95_session_duration
FROM
`your_project.your_dataset.session_durations`;
Explanation:
ARRAY_AGG(session_duration)
: This aggregates all session durations into an array.bigfunctions.us.percentile_value(..., 0.95)
: This calculates the 95th percentile value from the array of session durations. Remember to replaceus
with your BigQuery region if different.AS p95_session_duration
: This aliases the result for clarity.
Result:
The query will return a single value representing the 95th percentile of session durations. This value indicates that 95% of sessions are shorter than or equal to this duration. Let's say the result is 1500 seconds. This tells you that highly engaged users tend to have sessions lasting around 1500 seconds or less.
Benefits of using percentile_value
:
- Simplified calculation: Instead of manually implementing percentile logic, you can use this function directly.
- Efficiency: BigQuery functions are generally optimized for performance.
- Flexibility: You can easily change the percentile value (e.g., to calculate the median (50th percentile) or other percentiles) by adjusting the second argument.
This is a simple example. You can apply this function to any scenario where you need to calculate percentiles from an array of values within BigQuery, such as:
- E-commerce: Analyzing product prices, order values, or customer spending.
- Gaming: Analyzing player scores, playtime, or in-game purchases.
- Finance: Analyzing stock prices, transaction amounts, or customer balances.
- Healthcare: Analyzing patient wait times, treatment costs, or lengths of stay.
By using percentile_value
, you can gain valuable insights into the distribution of your data and identify important thresholds or outliers.
Need help or Found a bug?
Get help using percentile_value
The community can help! Engage the conversation on Slack
We also provide professional suppport.
Report a bug about percentile_value
If the function does not work as expected, please
- report a bug so that it can be improved.
- or open the discussion with the community on Slack.
We also provide professional suppport.