In the previous article of this R programming series, we discussed Descriptive Statistics in R. In this article, we discuss the apply family of functions. These are looping functions: they perform a repeated task over the elements of a data structure and remove the boilerplate of writing the loop by hand. These functions are real lifesavers. We will learn about them through examples.
The apply family ships with base R, so there is nothing to install separately. If you are new to packages in R, it is worth going through our article on Packages in R Programming.
The functions in the apply family let us manipulate data frames, arrays, matrices, and vectors. They are an alternative to loops: they are usually more concise, and in some cases faster, although the speed gain over a well-written for loop is often small. They reduce the need to create an explicit loop in R. The following functions are part of the apply family:
The apply() function
The lapply() function
The sapply() function
The tapply() function
The mapply() function
These functions differ in the data structure they operate on (vectors, data frames, matrices, etc.).
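To make the idea of replacing an explicit loop concrete, here is a minimal sketch. The vector `values` and the result names are made up for this illustration; they are not from the original article.

```r
# Hypothetical input vector, made up for this illustration
values <- c(1, 4, 9, 16)

# Explicit loop: pre-allocate the result, then fill it element by element
roots_loop <- numeric(length(values))
for (i in seq_along(values)) {
  roots_loop[i] <- sqrt(values[i])
}

# Same task with sapply(): the looping is handled for us
roots_apply <- sapply(values, sqrt)

# Both approaches give the same result
identical(roots_loop, roots_apply)
```

The two versions do the same work; the sapply() call simply hides the bookkeeping of the loop.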
The apply() function applies a function to the rows or columns (margins) of a matrix or a data frame. Its syntax is shown below:
apply(X, MARGIN, FUN, ...)
Where,
X - the matrix or data frame on which we want to apply the function.
MARGIN - a vector that defines which part of the matrix/data frame the function is applied to. For example, if MARGIN = 1, the function is applied to each row; if MARGIN = 2, to each column; and if MARGIN = c(1, 2), to each individual cell (every row and column combination).
FUN - the function to be applied over the MARGIN.
… - any further optional arguments, passed on to FUN.
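The effect of the different MARGIN values can be sketched on a small matrix. The matrix `m` below is a made-up example, not data from the article.

```r
# Hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

row_sums <- apply(m, 1, sum)                    # one value per row
col_sums <- apply(m, 2, sum)                    # one value per column
squared  <- apply(m, c(1, 2), function(v) v^2)  # applied to every cell

row_sums  # 9 12
col_sums  # 3 7 11
dim(squared)  # same shape as m: 2 3
```

With MARGIN = c(1, 2) the result keeps the shape of the input, because the function is called once per cell.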
Let us see an example of the apply function as shown below:
Let us consider the “cars” data set in R. It contains the speed of cars and the distance they take to stop.
Image for cars dataset
We would like to find the average speed and the average stopping distance.
Example code for the apply() function in R
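A minimal sketch of this step, assuming the built-in `cars` data frame with its `speed` and `dist` columns. The variable name `avg` is chosen here for illustration.

```r
# cars is a built-in data frame with the columns speed and dist
head(cars)

# MARGIN = 2 applies mean() to each column.
# apply() converts the data frame to a matrix first, which is fine here
# because both columns are numeric.
avg <- apply(cars, 2, mean)
avg
```

For plain column means, base R's colMeans(cars) should give the same values; apply() becomes more useful when FUN is something without a dedicated shortcut.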
The lapply() function takes a list (or a vector) as an argument and applies a function to each of its elements. Its syntax is shown below:
lapply(X, FUN, ...)
Where,
X - the list whose elements the function is applied to.
FUN - the function to apply to each element of the list.
… - any further optional arguments, passed on to FUN.
Let us see an example for the lapply() function in R:
R ships with the beaver1 and beaver2 data sets. We will create a list out of them and call lapply() on the resulting list.
Creating a list of beaver’s data (partial output)
Now, we will use the lapply() function to get the mean of each element of the list created.
Example code for the lapply() function in R
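A minimal sketch of these two steps. The list name `beaver_data` follows the article; using colMeans() is an assumption made here, because each list element is a whole data frame and mean() does not accept a data frame directly.

```r
# Combine the two built-in data sets into a named list
beaver_data <- list(beaver1 = beaver1, beaver2 = beaver2)

# Each element is a data frame, so this sketch uses colMeans()
# (an assumption) to get one mean per column
beaver_means <- lapply(beaver_data, colMeans)

class(beaver_means)   # "list"
beaver_means$beaver1  # named vector: day, time, temp, activ
```

Whatever function is used, lapply() returns a list with one entry per element of the input list.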
The sapply() function is a wrapper around lapply() that simplifies the result. It works the same way as lapply(); the only difference is the shape of the output. Where lapply() always returns a list, sapply() follows a few rules when building its output:
If every element of the result has length one, a vector is returned.
If every element of the result is a vector of the same length (greater than one), a matrix is returned.
If neither of the above rules applies, a list is returned.
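The three rules above can be sketched with small anonymous functions. The inputs and the names `v`, `m`, and `l` are made up for this illustration.

```r
# Every result has length one: simplified to a vector
v <- sapply(1:3, function(i) i^2)

# Every result has the same length (two): simplified to a matrix,
# with one column per input element
m <- sapply(1:3, function(i) c(i, i^2))

# Results have different lengths: left as a list
l <- sapply(1:3, function(i) seq_len(i))

v        # 1 4 9
dim(m)   # 2 3
class(l) # "list"
```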
The syntax for the sapply() function is as shown below:
sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)
Where,
X - the list (or vector) on which we want to apply the function.
FUN - the function to be applied.
… - further optional arguments, passed on to FUN.
simplify - whether the result should be simplified to a vector or matrix where possible.
USE.NAMES - whether the result should carry names; if X is a character vector without names, its values are used as the names.
Now, to demonstrate the difference between lapply() and sapply(), we will reuse the previous example, where we have a list of beaver1 and beaver2, and see how the result of sapply() differs from that of lapply().
Example code for the difference between lapply() and sapply() function
In the code above, when we used lapply() on the list beaver_data to get the sum, it returned a list with two elements. When we used sapply() on the same list, the output was simplified to a vector with two elements. Since each element of the result has length one, sapply() switched it to a vector.
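A minimal sketch of this comparison, assuming `beaver_data` is a named list of the two built-in data sets. The names `sum_list` and `sum_vec` are chosen here for illustration.

```r
# Named list of the two built-in data sets
beaver_data <- list(beaver1 = beaver1, beaver2 = beaver2)

# lapply() always returns a list: one total per data frame
sum_list <- lapply(beaver_data, sum)

# sapply() simplifies the same result to a named numeric vector
sum_vec <- sapply(beaver_data, sum)

class(sum_list)  # "list"
class(sum_vec)   # "numeric"
sum_vec
```

Note that sum() here adds up every value in each data frame, which works because all columns are numeric.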
The tapply() function applies a function to subsets of a vector, where the subsets are defined by the levels of one or more factors. When we want to break the data into subgroups and apply a function to each subgroup, we can use tapply().
The syntax for tapply() function is as shown below:
tapply(X, INDEX, FUN, ..., simplify = TRUE)
Where,
X - the vector on which the function is to be applied.
INDEX - a factor (or a list of factors) of the same length as X that defines the subgroups.
FUN - the function to be applied to each subgroup.
simplify - whether we want a simplified result. Use TRUE for a simplified result, otherwise FALSE.
Let us see an example for the tapply() function as shown below:
Consider the airquality data, which has six columns: Ozone, Solar.R, Wind, Temp, Month, and Day.
The air quality Data
Out of these six columns, we are interested in the average temperature for each month. The code below does this for us.
Example code for the tapply() function in R
Here we get the month-wise average temperature values for the air quality data.
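A minimal sketch of this step, assuming the built-in `airquality` data frame. The name `monthly_temp` is chosen here for illustration.

```r
# Temp is the vector to summarise, Month defines the subgroups
monthly_temp <- tapply(airquality$Temp, airquality$Month, mean)

# One average per month; the names are the month numbers 5 to 9
monthly_temp
```

The Month column is numeric rather than a factor, but tapply() converts it to a factor internally to form the groups.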
The mapply() function is a multivariate version of sapply(). It applies a function in parallel over a set of arguments: FUN is called with the first elements of each argument, then with the second elements, then the third, and so on. For example, with sum as FUN, mapply() returns the sum of the first elements, the sum of the second elements, and so on.
The syntax for the mapply() function is as shown below:
mapply(FUN, ..., MoreArgs = NULL, SIMPLIFY = TRUE, USE.NAMES = TRUE)
Where,
FUN - the function to be applied over the objects.
… - the R objects (vectors or lists) whose elements are passed to FUN in parallel.
MoreArgs - a list of other arguments to FUN that are not vectorized over.
SIMPLIFY - whether we want simplified results or not.
USE.NAMES - whether the result should carry names taken from the first … argument.
Let us see an example for the mapply() function in R
Example code for the mapply() function in R
Here, inside mapply() we define a function that takes two arguments (a and b) and returns a ^ b. We then pass the two sets of arguments (x = 2, y = 4 and w = 5, z = 5) over which the function is applied, and mapply() returns the output.
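A minimal sketch of such a call. Passing the values as two named vectors, c(x = 2, y = 4) for a and c(w = 5, z = 5) for b, is one reading of the description above and is an assumption; the name `powers` is also chosen here for illustration.

```r
# Assumed pairing: a takes its values from the first vector,
# b from the second, element by element
powers <- mapply(function(a, b) a^b,
                 c(x = 2, y = 4),
                 c(w = 5, z = 5))

# 2^5 and 4^5; the names come from the first vector
powers
#    x    y
#   32 1024
```

The function is called twice, once per pair of elements, which is what "in parallel" means for mapply().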
The apply family of functions are looping functions that do the same job as an explicit loop, with less code and, in some cases, better speed.
The apply() function applies a function over a matrix or a data frame. MARGIN specifies whether the function is applied to rows, to columns, or to each individual cell.
The lapply() function applies a function to each element of a list and returns a list as an output (the output will always be a list).
The sapply() function is a simplifying version of lapply(). It checks whether the resulting list can be simplified into a vector or a matrix; if so, it returns a vector or a matrix, and otherwise it returns the list.
The tapply() function applies a function to subsets of a vector, defined by one or more factors. For example, if we want the average sales value for each month, we can use tapply(), with the months as the factor.
The mapply() function is a multivariate version of the sapply() function. It applies the function to the first elements of each argument we pass to it, then to the second elements, and so on.
That is it for this article. In the next one, we will come back with another interesting topic from the world of R programming. Until then, stay safe! Stay healthy! :)
You can read my other blogs on R programming under R Programming Articles.