Using Pipe (%>%) Operator to Simplify Your Code in R Programming

Lalit Salunkhe
Oct 22, 2020

Efficiency and readability are two qualities every programmer cares about. Efficiency usually comes from well-chosen control statements and functions in your code.

Readability is harder to achieve: complex code becomes difficult to follow, especially when function calls are nested inside one another. To make such code easier both to read and to write, R programming offers the pipe operator.

What is the Pipe Operator?

The pipe operator is an infix operator that comes from the magrittr package and is also made available by dplyr. It passes the result of one expression to the next function as its first argument, so that a series of calls can be written in sequence. It is written as %>% in R Programming. Using this operator improves the readability and simplicity of code that would otherwise consist of nested function calls. The efficiency it brings is in writing and maintaining the code, not in how fast the code runs.
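The passing described above can be sketched with a small vector. This is only an illustration, assuming the magrittr package is installed; the vector x and the variable names nested and piped are made up for this example.

```r
library(magrittr)  # provides %>%

x <- c(1, 4, 9, 16)

# Nested form: has to be read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped form: reads left to right; x %>% f(y) is evaluated as f(x, y)
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both forms compute the same value
identical(nested, piped)
```

Both versions call the same three functions in the same order. Only the way the calls are written differs.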

There are different ways in which we can use the pipe operator in R programming. We will study a few of those ways through examples.

What is the Basic Use of the Pipe Operator?

As described above, the pipe operator passes a value to a function as its argument. It is most useful for nested calls, where the result of one function is an argument for the next function. See the example below for nested functions:

library(dplyr)

summarize(
  group_by(
    filter(mtcars, carb > 1),
    cyl
  ),
  mpg_mean = mean(mpg)
)

In this example, several functions are used to filter, group, and summarize the data from the mtcars dataset. Because the calls are nested, they run from the inside out: the data is filtered first (carb > 1), then grouped by cyl, and finally summarized as the mean of mpg for each group. Code written this way has to be read inside out, which makes it hard to follow. See the output of the code below:

The output of the code mentioned above

Let us see how we can use the pipe operator to make this code more readable.

mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(mpg_mean = mean(mpg))

Here, the pipe operator passes the output of each function to the next one as its first argument, and so on. The output for this code will be the same as the previous one. See below:

Output for the code with the pipe operator

You can see that the output for this code is the same as the previous one, yet the code is more readable. It is easy to follow that the mtcars dataset is passed to filter(), which compares one of its columns against a value. Then the group_by() function takes the filtered data as its argument, and finally the summarize() function takes the grouped data as its argument. The final output is shown in the screenshot above.
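One way to check that the two styles really agree is to store both results and compare them. The sketch below assumes dplyr is installed and uses one column name, mpg_mean, in both versions; the object names nested and piped are made up for this check.

```r
library(dplyr)

# Nested version, written on one line for brevity
nested <- summarize(group_by(filter(mtcars, carb > 1), cyl), mpg_mean = mean(mpg))

# Piped version of the same three steps
piped <- mtcars %>%
  filter(carb > 1) %>%
  group_by(cyl) %>%
  summarize(mpg_mean = mean(mpg))

# Compare as plain data frames; TRUE means the results match exactly
identical(as.data.frame(nested), as.data.frame(piped))
```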

Assignment Using the Pipe (%<>%) Operator

The magrittr package also provides an interesting operator called the compound assignment infix-operator (%<>%). This operator both pipes a value through a chain and assigns the result back to the same object.

See the code below, in which we create a data frame from the first five columns of the mtcars data, keeping only the rows with disp > 100 and cyl equal to 6.

d_frame <- mtcars
d_frame <- d_frame %>% select(1:5) %>% filter(disp > 100, cyl == 6)

See the output of this code as shown below:

Output for creating a data frame

Here, the second line of code has to use the assignment operator so that the result of the selection and filtering is stored back in the same data frame. That assignment operator in the second line of code can be eliminated. See the code below:

library(magrittr)  # %<>% comes from magrittr; dplyr does not provide it
d_frame <- mtcars
d_frame %<>% select(1:5) %>% filter(disp > 100, cyl == 6)
print(d_frame)

In this code, the compound assignment infix-operator (%<>%) is used to eliminate that redundancy (d_frame <- d_frame %>%). This makes the code more concise, and you can see in the screenshot below that there is no difference in the output.

Output for the code with compound assignment infix-operator

Here, we get a data frame that is a subset of the original data: the first five columns, restricted to rows where disp is greater than 100 and cyl equals 6. In other words, we get all the cars that meet these criteria.
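The same idea can be seen on a plain vector. This is a minimal sketch assuming only the magrittr package; the vector name values is made up for illustration.

```r
library(magrittr)  # provides both %>% and %<>%

values <- c(9, 16, 25)

# Equivalent to: values <- values %>% sqrt() %>% sum()
values %<>% sqrt() %>% sum()

values  # 12, the sum of 3, 4 and 5
```

Note that %<>% overwrites the object on its left-hand side, so the original contents of values are gone after the call.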

How to Use the Pipe Operator With ggplot2

We can also use the pipe operator with the plotting functions of the well-known ggplot2 package. This gives us a single pipeline for Exploratory Data Analysis (EDA): the data is aggregated and plotted in one chain, so you do not need to create intermediate aggregated data frames manually, and you compute only the aggregations you actually want. Note that the ggplot2 layers themselves are still joined with +, not with %>%. Let’s see the code below, which combines ggplot2 with the pipe operator.

library(ggplot2)

mtcars %>% 
  filter(carb > 1) %>% 
  group_by(gear) %>% 
  summarize(avg_mpg = mean(mpg)) %>% 
  ggplot(aes(x = gear, y = avg_mpg)) + 
  geom_bar(stat = "identity")

Here, a bar plot of the avg_mpg values against the number of gears is created, using only the rows with carb > 1. The pipe operator builds a pipeline in which the data is filtered by carb, grouped by gear, and summarized as avg_mpg (the average value of mpg), and the result is then plotted with gear on the x-axis and avg_mpg on the y-axis. See the output graph of this code below:

Output image of the code above

Additional Tee pipe (%T>%) Operator

There are some additional pipe operators in the magrittr package. One of those is the tee (%T>%) pipe operator. It calls the function on its right-hand side only for its side effect (for example plotting or printing) and then passes on the original left-hand side value instead of that function's return value.

Plotting functions such as plot() are called for their side effect and return NULL, so whatever comes after them in a pipe receives NULL instead of the data. See the example below, where the summary() function is piped after plot().

library(dplyr)
library(magrittr)
mtcars %>%
  filter(carb > 1) %>%   # filter first: carb is not among the first four columns
  select(1:4) %>%
  plot() %>%
  summary()

Here, the code keeps the rows of mtcars that have a carb value > 1, selects the first four columns, and plots those variables against each other. Finally, the summary() function is called at the end of the pipe. However, plot() returns NULL, so summary() receives NULL instead of the data and reports a Length of 0 with Class and Mode NULL. See the output as shown below:

A Plot that gets generated after executing the code

The summary output, however, describes NULL rather than the data, as shown below:

The summary of NULL, produced because plot() passed NULL down the pipe

If we use %T>% immediately before the plot() function, the data continues down the pipeline and we get the summary statistics. See the code as shown below:

library(dplyr)
library(magrittr)
mtcars %>%
  filter(carb > 1) %>%   # filter first: carb is not among the first four columns
  select(1:4) %T>%
  plot() %>%
  summary()

Here, the tee (%T>%) pipe is placed after the select() function, directly before plot(). It calls plot() for its side effect and then passes the selected data, rather than the NULL returned by plot(), on to summary(). If we run this code, we get the same graph as above (no changes there) as well as the summary statistics for the four columns (variables) that appear in the plot. See the output below:

Summary statistics are now visible after plot() function since we used the tee (%T>%) Operator

This is how the tee(%T>%) operator works in R Programming.
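The difference between the two pipes can also be seen without drawing anything. In the sketch below, show_range() is a hypothetical helper written for this example: like plot(), it does something visible and returns NULL. Only the magrittr package is assumed.

```r
library(magrittr)  # provides %>% and %T>%

# Hypothetical stand-in for plot(): prints a message, returns NULL
show_range <- function(v) {
  message("range: ", min(v), " to ", max(v))
  invisible(NULL)
}

x <- c(2, 4, 6)

# Ordinary pipe: sum() receives the NULL returned by show_range()
plain <- x %>% show_range() %>% sum()

# Tee pipe: show_range() runs for its side effect, then x itself is passed on
teed <- x %T>% show_range() %>% sum()

plain  # 0, because sum(NULL) is 0
teed   # 12
```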

Conclusion

The pipe operator is used when we have nested functions in R Programming, where the result of one function becomes the argument for the next function.
The pipe operators improve the readability of code and make it quicker to write and maintain.
The basic use of the pipe operator is to create a pipeline, or chain, of function calls that works exactly the same as the nested functions.
We can also use the assignment pipe (%<>%), also known as the compound assignment infix-operator, to assign the result of the chain on its right-hand side to the object on its left-hand side without using the traditional assignment operator (<-).
The pipe operator can also be used to create a chain of function calls while working with plotting functions.
There are additional pipes such as the tee (%T>%) pipe. This pipe lets a chain continue past a plotting function, which would otherwise pass NULL to the next step.

That is it for this article. In the next one, we will cover another topic from the field of R Programming. Until then, stay home, keep safe, and keep enhancing!
