How to Easily Calculate Percentiles in R (With Examples) (2024)

The nth percentile of a dataset is the value that cuts off the first n percent of the data values when all of the values are sorted from least to greatest.

For example, the 90th percentile of a dataset is the value that separates the bottom 90% of the data values from the top 10% of data values.

One of the most commonly used percentiles is the 50th percentile, which represents the median value of a dataset: the value below which 50% of all data values fall.

Percentiles can be used to answer questions such as:

  • What score does a student need to earn on a particular test to be in the top 10% of scores? To answer this, we would find the 90th percentile of all scores, which is the value that separates the bottom 90% of values from the top 10%.
  • What heights encompass the middle 50% of heights for students at a particular school? To answer this, we would find the 75th percentile of heights and 25th percentile of heights, which are the two values that determine the upper and lower bounds for the middle 50% of heights.
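Both questions reduce to a single call to R's quantile() function, which the rest of this tutorial covers in detail. A minimal sketch, using small made-up vectors of test scores and heights purely for illustration (the variable names are hypothetical):

```r
# Hypothetical test scores for 10 students (illustrative data, not from a real test)
scores <- c(62, 68, 71, 74, 77, 80, 83, 86, 90, 95)

# Score needed to be in the top 10%: the 90th percentile
top10_cutoff <- quantile(scores, probs = 0.9)
print(top10_cutoff)

# Hypothetical student heights in inches (illustrative data)
heights <- c(60, 61, 63, 64, 66, 67, 69, 70, 72)

# Lower and upper bounds of the middle 50% of heights: the 25th and 75th percentiles
middle50 <- quantile(heights, probs = c(0.25, 0.75))
print(middle50)
```

Because quantile() interpolates between neighboring values by default, the top-10% cutoff (90.5 here) need not be a score that anyone actually earned.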

How to Calculate Percentiles in R

We can easily calculate percentiles in R using the quantile() function, which uses the following syntax:

quantile(x, probs = seq(0, 1, 0.25))

where:

  • x: a numeric vector whose percentiles we wish to find.
  • probs: a numeric vector of probabilities in [0,1] that represent the percentiles we wish to find.
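With the default probs = seq(0, 1, 0.25), quantile() returns the minimum, the three quartiles, and the maximum, each labeled with a percentage name. The short sketch below uses the integers 1 to 9 as toy input; it also shows the names argument (TRUE by default), which controls whether those labels are attached.

```r
x <- 1:9  # toy input: the integers 1 through 9

# Default probs = seq(0, 1, 0.25): minimum, quartiles, maximum
q_default <- quantile(x)
print(q_default)  # named vector with names "0%", "25%", "50%", "75%", "100%"

# names = FALSE drops the percentage labels and returns a plain numeric vector
q_plain <- quantile(x, probs = c(0.25, 0.75), names = FALSE)
print(q_plain)
```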

The following examples show how to use this function in different scenarios.

Finding Percentiles of a Vector

The following code illustrates how to find various percentiles for a given vector in R:

#create vector of 100 random values uniformly distributed between 0 and 500
#(no seed is set, so the values you get will differ from the output shown here)
data <- runif(100, 0, 500)

#Find the quartiles (25th, 50th, and 75th percentiles) of the vector
quantile(data, probs = c(.25, .5, .75))

#      25%       50%       75% 
# 97.78961 225.07593 356.47943 

#Find the deciles (10th, 20th, 30th, ..., 90th percentiles) of the vector
quantile(data, probs = seq(.1, .9, by = .1))

#       10%       20%       30%       40%       50%       60%       70%       80% 
#  45.92510  87.16659 129.49574 178.27989 225.07593 300.79690 337.84393 386.36108 
#       90% 
# 423.28070 

#Find the 37th, 53rd, and 87th percentiles
quantile(data, probs = c(.37, .53, .87))

#      37%      53%      87% 
# 159.9561 239.8420 418.4787 
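Because runif() draws new random numbers on every call, the percentiles above change each time the code runs. A sketch of how to make the results reproducible with set.seed() (the seed value 1 is arbitrary), which also confirms that the 50th percentile equals median():

```r
# Fix the random number generator so the same 100 values are drawn on every run
set.seed(1)  # arbitrary seed
data <- runif(100, 0, 500)

quartiles <- quantile(data, probs = c(.25, .5, .75))
print(quartiles)

# The 50th percentile returned by quantile() is the median of the data
print(unname(quartiles["50%"]))
print(median(data))
```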

Finding Percentiles of a Data Frame Column

To illustrate how to find the percentiles of a specific data frame column, we’ll use the built-in dataset iris:

#view first six rows of iris dataset
head(iris)

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The following code shows how to find the 90th percentile value for the column Sepal.Length:

quantile(iris$Sepal.Length, probs = 0.9)

#90% 
#6.9 

Finding Percentiles of Several Data Frame Columns

We can also find percentiles for several columns at once using the apply() function:

#define columns we want to find percentiles for
small_iris <- iris[ , c('Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width')]

#use apply() function to find 90th percentile for every column
apply(small_iris, 2, function(x) quantile(x, probs = .9))

#Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
#        6.90         3.61         5.80         2.20 

Finding Percentiles by Group

We can also find percentiles by group in R using the group_by() function from the dplyr library.

The following code illustrates how to find the 90th percentile of Sepal.Length for each of the three species in the iris dataset:

#load dplyr library
library(dplyr)

#find 90th percentile of Sepal.Length for each of the three species
iris %>%
  group_by(Species) %>%
  summarise(percent90 = quantile(Sepal.Length, probs = .9))

# A tibble: 3 x 2
#  Species    percent90
#  <fct>          <dbl>
#1 setosa          5.41
#2 versicolor      6.7 
#3 virginica       7.61

The following code illustrates how to find the 90th percentile for all of the variables by Species:

iris %>%
  group_by(Species) %>%
  summarise(percent90_SL = quantile(Sepal.Length, probs = .9),
            percent90_SW = quantile(Sepal.Width, probs = .9),
            percent90_PL = quantile(Petal.Length, probs = .9),
            percent90_PW = quantile(Petal.Width, probs = .9))

# A tibble: 3 x 5
#  Species    percent90_SL percent90_SW percent90_PL percent90_PW
#  <fct>             <dbl>        <dbl>        <dbl>        <dbl>
#1 setosa             5.41         3.9          1.7          0.4 
#2 versicolor         6.7          3.11         4.8          1.51
#3 virginica          7.61         3.31         6.31         2.4 
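Writing one summarise() argument per column gets repetitive as the number of columns grows. Assuming dplyr 1.0.0 or later (the release that introduced across()), the same table can be produced more compactly; a sketch:

```r
library(dplyr)

# Apply the same 90th-percentile calculation to every non-grouping column.
# Inside summarise(), across() skips the grouping column (Species), and the
# result columns keep their original names (Sepal.Length, Sepal.Width, ...).
percent90_all <- iris %>%
  group_by(Species) %>%
  summarise(across(everything(), ~ quantile(.x, probs = 0.9)))

print(percent90_all)
```

The trade-off is that the columns are no longer renamed to percent90_SL and so on; across() also accepts a .names argument if custom names are needed.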

Visualizing Percentiles

R has no dedicated function for plotting the percentiles of a dataset, but we can create such a plot relatively easily.

The following code illustrates how to create a plot of the percentiles for the data values of Sepal.Length from the iris dataset:

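#number of observations
n <- length(iris$Sepal.Length)

#plot the sorted values against their relative position (0 = smallest, 1 = largest)
plot((1:n - 1)/(n - 1), sort(iris$Sepal.Length), type = "l",
     main = "Visualizing Percentiles", xlab = "Percentile", ylab = "Value")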

[Figure: line plot titled "Visualizing Percentiles", with Percentile on the x-axis and Value on the y-axis]
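The x-coordinates (1:n - 1)/(n - 1) used in the plotting code are exactly the probabilities at which R's default quantile definition (type = 7) lands on the k-th smallest value, so the plotted curve is the sample quantile function. A quick check of that equivalence:

```r
x <- iris$Sepal.Length
n <- length(x)

# Probabilities used as x-coordinates in the plot
p <- (1:n - 1) / (n - 1)

# With the default type = 7, quantile() at these probabilities returns the sorted data
q <- quantile(x, probs = p, names = FALSE)
print(all.equal(q, sort(x)))
```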

Additional Resources

The following tutorials explain how to perform other common tasks in R:

How to Calculate Percentile Rank in R
How to Calculate Z-Scores in R
How to Calculate Relative Frequencies Using dplyr

FAQs

How to calculate percentiles using R?

In R, you can calculate percentiles using the quantile() function, which returns the value at each requested percentile. To go the other way and find the percentile rank of a given value, you can use the empirical cumulative distribution function, ecdf().
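ecdf() returns a function; calling that function on a value gives the proportion of observations less than or equal to the value, which is the opposite direction to quantile(). A sketch using the Sepal.Length column of the iris dataset from the earlier examples:

```r
# Build the empirical cumulative distribution function of Sepal.Length
Fn <- ecdf(iris$Sepal.Length)

# Value -> proportion: share of flowers with Sepal.Length <= 6.9
share <- Fn(6.9)
print(share)

# Proportion -> value: the 90th percentile
p90 <- quantile(iris$Sepal.Length, probs = 0.9)
print(p90)
```

The share should come out at or slightly above 0.9, because every observation tied at 6.9 is counted.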

What is the easiest way to calculate percentile?

The percentile rank of a data point can be found with the equation P = n/N * 100%, where P is the percentile, lowercase n is the number of data points below the data point of interest, and N is the total number of data points in the data set.
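The formula counts only the data points strictly below the value of interest. A minimal sketch of it as an R function; the name percentile_rank and the sample data are hypothetical, for illustration only:

```r
# Hypothetical helper: percentile rank of `value` within `x`,
# using P = n / N * 100, where n counts the data points strictly below `value`
percentile_rank <- function(x, value) {
  n_below <- sum(x < value)  # n: number of data points below the value of interest
  N <- length(x)             # N: total number of data points
  n_below / N * 100
}

scores <- c(2, 4, 4, 5, 7, 9, 10, 12)  # illustrative data
percentile_rank(scores, 9)             # 5 of 8 values are below 9, so this returns 62.5
```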

How to get 95th percentile in R?

The 0.95 quantile is exactly the same as the 95th percentile. R's functions are phrased in terms of quantiles rather than percentiles, so the command to use is quantile(): pass it the variable holding the data and one or more probabilities between 0 and 1 (0.95 for the 95th percentile).
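For example, with the integers 0 to 100 as toy data, the 95th percentile is 95, and the probs argument can take several values in one call:

```r
x <- 0:100  # toy data: the integers 0 through 100

# 95th percentile = 0.95 quantile
p95 <- quantile(x, probs = 0.95)
print(p95)

# Several percentiles in one call: the 5th and the 95th
p5_p95 <- quantile(x, probs = c(0.05, 0.95))
print(p5_p95)
```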

How do you find the percentile of a normal distribution in R?

We obtain percentile values in R using the function qnorm(). Given a probability, it returns the corresponding value of the normal distribution (the standard normal distribution by default). For example, qnorm(0.5) returns 0, the median of the standard normal distribution.
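A few calls illustrating qnorm() on the standard normal distribution (mean 0, standard deviation 1):

```r
# Median of the standard normal distribution
qnorm(0.5)    # returns 0

# 90th and 97.5th percentiles of the standard normal distribution
qnorm(0.9)    # about 1.2816
qnorm(0.975)  # about 1.96
```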

How to find quantiles in RStudio?

You can use the quantile() function to find quartiles in R. If your data is called data, then quantile(data, probs = c(.25, .5, .75)) returns its 25th, 50th, and 75th percentiles.

What is the general formula for percentile?

We calculate percentiles using the formula n = (P/100) x N, where P = percentile, N = number of values in a data set (sorted from smallest to largest), and n = ordinal rank of a given value. Percentiles are frequently used to understand test scores and biometric measurements.
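When (P/100) x N is not a whole number, one common convention, the nearest-rank method, rounds it up to the next integer and takes the value at that position in the sorted data. A sketch under that assumption, with a hypothetical helper nearest_rank_percentile() and toy data; R's quantile() reproduces this convention with type = 1:

```r
# Hypothetical helper implementing the nearest-rank method:
# rank n = ceiling((P / 100) * N) in the sorted data (P must be in (0, 100])
nearest_rank_percentile <- function(x, P) {
  sorted <- sort(x)
  N <- length(sorted)
  n <- ceiling(P * N / 100)  # same quantity as (P / 100) * N, rounded up
  sorted[n]
}

x <- c(15, 20, 35, 40, 50)  # toy data
P <- c(5, 30, 40, 50, 100)

manual <- sapply(P, function(p) nearest_rank_percentile(x, p))
print(manual)

# quantile() with type = 1 (inverse of the empirical CDF) gives the same values here
builtin <- quantile(x, probs = P / 100, type = 1, names = FALSE)
print(builtin)
```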

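How to calculate the 75th percentile?

To calculate the value at the 75th percentile, first find its index (position) in the sorted data, then interpolate linearly between the values at the ranks just below (L) and just above (U) that index: Value = Value(L) + (index – L) × (Value(U) – Value(L)). For example, if the index is 6.75, the 6th value is 85, and the 7th value is 90: Value = 85 + (6.75 – 6) × (90 – 85) = 85 + 0.75 × 5 = 85 + 3.75 = 88.75.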
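The interpolation step can be written directly in R. The sketch below uses a hypothetical helper interpolate_percentile() and a made-up, already sorted set of eight values whose 6th and 7th values are 85 and 90, matching the worked example. It assumes the index 6.75 came from the convention index = P/100 × (N + 1), which corresponds to type = 6 in R's quantile(); R's default, type = 7, uses index = 1 + P/100 × (N − 1) instead and therefore gives a different answer for the same data.

```r
# Hypothetical helper: linear interpolation between the sorted values at
# ranks L = floor(index) and U = L + 1 (assumes 1 <= index < length(sorted))
interpolate_percentile <- function(sorted, index) {
  L <- floor(index)
  sorted[L] + (index - L) * (sorted[L + 1] - sorted[L])
}

# Made-up sorted data (N = 8); the 6th and 7th values are 85 and 90
x <- c(60, 65, 70, 75, 80, 85, 90, 95)

# Assumed index convention: P/100 * (N + 1) = 0.75 * 9 = 6.75
index <- 0.75 * (length(x) + 1)
manual <- interpolate_percentile(x, index)
print(manual)  # 85 + 0.75 * 5 = 88.75

# quantile() with type = 6 uses the same (N + 1) convention
print(quantile(x, probs = 0.75, type = 6))

# R's default (type = 7) uses index 1 + 0.75 * (N - 1) = 6.25 instead
print(quantile(x, probs = 0.75))  # 85 + 0.25 * 5 = 86.25
```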

What is the percentile in statistics for dummies?

In statistics, a percentile is a term that describes how a score compares to other scores from the same set. While there is no universal definition of percentile, it is commonly expressed as the percentage of values in a set of data scores that fall below a given value.

How do you rank percentiles in R?

You can use the percent_rank() function from dplyr to compute the percentile rank of each value. In the Exploratory app, the same calculation is available from a column's menu via 'Create Window Calculation' -> 'Rank' -> 'Percent Rank'; once you run it, the rank is calculated for each row.
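dplyr documents percent_rank() as min_rank() rescaled to the range 0 to 1, i.e. (rank − 1) / (N − 1), where tied values share their lowest rank. A base R sketch of that definition with toy data (assuming no missing values); the comparison against dplyr only runs if the package is installed:

```r
x <- c(10, 20, 20, 40, 50)  # toy data

# (minimum rank - 1) / (number of values - 1); tied values share the lowest rank
manual_pr <- (rank(x, ties.method = "min") - 1) / (length(x) - 1)
print(manual_pr)  # 0.00 0.25 0.25 0.75 1.00

# Compare with dplyr::percent_rank() when dplyr is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  print(all.equal(manual_pr, dplyr::percent_rank(x)))
}
```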

How to calculate the 97.5th percentile in R?

For this purpose, we can use the quantile() function in R. To find the 2.5th percentile, use the probability 0.025, and for the 97.5th percentile use the probability 0.975.
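A sketch with the integers 0 to 1000 as toy data, computing both tails in one call; the 97.5th percentile corresponds to probs = 0.975:

```r
x <- 0:1000  # toy data

# 2.5th and 97.5th percentiles, i.e. the bounds of the middle 95% of values
tails <- quantile(x, probs = c(0.025, 0.975))
print(tails)  # 25 and 975
```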

What does qnorm() do in R?

R's qnorm() function calculates which value in a normal population has a given proportion of values below it. In other words, it is the inverse of the cumulative normal distribution function, pnorm().
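Because qnorm() is the inverse of pnorm(), feeding the output of one into the other returns the starting proportions (up to floating-point rounding):

```r
p <- c(0.1, 0.5, 0.9)

z <- qnorm(p)     # values with proportion p of the standard normal below them
back <- pnorm(z)  # proportions below those values

print(z)
print(back)       # recovers 0.1, 0.5, 0.9
```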

How to find quantiles of a normal distribution in R?

The qnorm() function in R is used to calculate the quantiles of the normal distribution. Its main arguments are p, the probability of getting a value less than or equal to the quantile, and mean and sd, which default to 0 and 1 (the standard normal distribution).
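Passing mean and sd gives quantiles of any normal distribution. For instance, for a hypothetical test whose scores are normally distributed with mean 100 and standard deviation 15 (illustrative numbers):

```r
# 90th percentile of a normal distribution with mean 100 and sd 15
cutoff <- qnorm(0.9, mean = 100, sd = 15)
print(cutoff)  # about 119.2

# Equivalent: shift and scale the standard normal quantile
print(100 + 15 * qnorm(0.9))
```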

How do you work out percentages in R?

  • Count the occurrences: Count the number of occurrences or instances within each group.
  • Calculate the total: Find the total number of occurrences in the entire dataset.
  • Calculate the percentage: Divide the count of each subgroup by the total count and multiply by 100 to get the percentage.
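The three steps map directly onto base R's table() and sum(). A minimal sketch with a made-up vector of group labels:

```r
# Made-up group labels (illustrative data)
groups <- c("a", "b", "a", "c", "a", "b")

# Step 1: count the occurrences in each group
counts <- table(groups)

# Step 2: total number of occurrences
total <- sum(counts)

# Step 3: divide each count by the total and multiply by 100
percentages <- setNames(as.numeric(counts) / total * 100, names(counts))
print(round(percentages, 2))  # a: 50, b: 33.33, c: 16.67
```

prop.table(counts) * 100 is a built-in shortcut for the last two steps.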

How do you convert a z-score to a percentile in R?

To convert z-scores to percentiles in R, use the pnorm() function, which returns the cumulative probability of the normal distribution, i.e. the proportion of values below a given z-score. First calculate the z-scores for your data points, then pass them to pnorm() and multiply the result by 100.
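A sketch of both steps with three made-up values; reading pnorm() output as a percentile assumes the data are approximately normally distributed:

```r
# Hypothetical raw scores (illustrative only)
x <- c(10, 20, 30)

# Step 1: z-scores, using the sample standard deviation computed by sd()
z <- (x - mean(x)) / sd(x)   # gives -1, 0, 1

# Step 2: pnorm() gives the share of a standard normal distribution below each
# z-score; multiplying by 100 expresses it as a percentile
percentiles <- pnorm(z) * 100
print(round(percentiles, 2))
```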

Article information

Author: Velia Krajcik
