Session 4: Manipulating Data

Session Description

In this session, we’ll learn to use more advanced tools for data manipulation. We will discuss the principles of tidy data and explore how the dplyr package makes data manipulation and summarization more efficient.

Before Class

Data Carpentry (Manipulating Data Frames)

Please download the Session 4 workbook.

Optional: Check out Allison Horst’s interactive dplyr tutorial.


Other Resources

In this lab, you are introduced to the dplyr package, which is designed to help us manipulate rectangular data stored in data frames. dplyr provides a very useful alternative to some of the base R commands for querying our data.

  • The use of pipes (%>%) makes our commands much more legible and lets us see each manipulation step our data passes through before producing a result.
  • Commands like filter() and select() allow us to subset the rows and columns of our dataset, respectively.
  • The summarise() command allows us to produce summary tables of our dataset, which are very useful for exploring patterns and for producing information on subgroups of the data.
  • The group_by() command allows us to specify grouping variables for our summaries.
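
The commands listed above can be combined in a single pipeline. The following is a minimal sketch rather than part of the Session 4 workbook: it assumes the dplyr package is installed and uses the built-in mtcars dataset purely for illustration. The 20 mpg cutoff and the column names n_cars, mean_mpg, and mean_wt are arbitrary choices made for this example.

```r
library(dplyr)

# mtcars ships with R: one row per car model.
# Illustrative pipeline; the threshold and summary names are assumptions.
result <- mtcars %>%
  filter(mpg > 20) %>%        # keep only rows where mpg is above 20
  select(mpg, cyl, wt) %>%    # keep only these three columns
  group_by(cyl) %>%           # treat each cylinder count as a group
  summarise(
    n_cars   = n(),           # number of cars in each group
    mean_mpg = mean(mpg),     # average fuel economy per group
    mean_wt  = mean(wt)       # average weight per group
  )

result

# For comparison, the filter() and select() steps in base R:
base_subset <- mtcars[mtcars$mpg > 20, c("mpg", "cyl", "wt")]
```

Reading the pipeline from top to bottom mirrors the order in which the data is transformed: each %>% passes the result of one step to the next, and summarise() returns one row per group defined by group_by().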