Please study Chapter 3, Data transformation, of the book R for Data Science (2nd edition) (https://r4ds.hadley.nz).

library(nycflights13) # provides the flights dataset
library(tidyverse)    # attaches dplyr along with the other core tidyverse packages

1 tibble

flights

## # A tibble: 336,776 × 19
##     year month   day dep_time sched_dep_time dep_delay arr_time sched_arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>          <int>
##  1  2013     1     1      517            515         2      830            819
##  2  2013     1     1      533            529         4      850            830
##  3  2013     1     1      542            540         2      923            850
##  4  2013     1     1      544            545        -1     1004           1022
##  5  2013     1     1      554            600        -6      812            837
##  6  2013     1     1      554            558        -4      740            728
##  7  2013     1     1      555            600        -5      913            854
##  8  2013     1     1      557            600        -3      709            723
##  9  2013     1     1      557            600        -3      838            846
## 10  2013     1     1      558            600        -2      753            745
## # ℹ 336,766 more rows
## # ℹ 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## #   tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## #   hour <dbl>, minute <dbl>, time_hour <dttm>

“flights is a tibble, a special type of data frame used by the tidyverse to avoid some common gotchas. The most important difference between tibbles and data frames is the way tibbles print; they are designed for large datasets, so they only show the first few rows and only the columns that fit on one screen.”
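
A quick way to see this printing behaviour for yourself. This is a minimal sketch, assuming the nycflights13 and tidyverse packages are installed; `glimpse()` and `print(..., width = Inf)` are two standard ways to see every column of a wide tibble.

```r
library(nycflights13)
library(tidyverse)

# A tibble is still a data frame: its class vector ends in "data.frame"
class(flights)

# glimpse(): one line per column showing its type and first few values,
# handy when the tibble is too wide to print
glimpse(flights)

# Override the default printing so that all columns are shown
print(flights, width = Inf)
```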

2 3.1.3 dplyr basics

From the book: before discussing the individual dplyr verbs, it’s worth stating what they have in common:

1. The first argument is always a data frame.
2. The subsequent arguments typically describe which columns to operate on using the variable names (without quotes).
3. The output is always a new data frame.
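
The three shared properties can be checked directly. This is a minimal sketch, assuming R 4.1 or later (for the native pipe `|>`) and installed nycflights13 and tidyverse packages; the object names `iah_direct` and `iah_piped` are illustrative only.

```r
library(nycflights13)
library(tidyverse)

# 1. The first argument is a data frame, so the pipe |> simply passes
#    `flights` in as that first argument: both calls give the same result
iah_direct <- filter(flights, dest == "IAH")
iah_piped  <- flights |> filter(dest == "IAH")

# 2. Columns are referred to by bare names: dest, not "dest"

# 3. The output is a new data frame; `flights` itself is left unchanged
nrow(flights)   # still the full dataset
nrow(iah_piped) # only the flights to IAH
```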

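flights |>
  filter(dest == "IAH") |>        # keep only flights to Houston (IAH)
  group_by(year, month, day) |>   # one group per calendar day
  summarize(
    arr_delay = mean(arr_delay, na.rm = TRUE) # average arrival delay, ignoring missing values
  )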

## `summarise()` has grouped output by 'year', 'month'. You can override using the
## `.groups` argument.

## # A tibble: 365 × 4
## # Groups:   year, month [12]
##     year month   day arr_delay
##    <int> <int> <int>     <dbl>
##  1  2013     1     1     17.8 
##  2  2013     1     2      7   
##  3  2013     1     3     18.3 
##  4  2013     1     4     -3.2 
##  5  2013     1     5     20.2 
##  6  2013     1     6      9.28
##  7  2013     1     7     -7.74
##  8  2013     1     8      7.79
##  9  2013     1     9     18.1 
## 10  2013     1    10      6.68
## # ℹ 355 more rows
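
The message above appears because `summarize()` removes only the last grouping variable (`day`), so the result is still grouped by `year` and `month`. A minimal sketch, assuming the same packages as above, of setting `.groups` explicitly; `daily_iah` is an illustrative name.

```r
library(nycflights13)
library(tidyverse)

daily_iah <- flights |>
  filter(dest == "IAH") |>
  group_by(year, month, day) |>
  summarize(
    arr_delay = mean(arr_delay, na.rm = TRUE),
    .groups = "drop" # return an ungrouped tibble; no message is printed
  )

group_vars(daily_iah) # no grouping variables remain
```

Dropping the groups explicitly avoids surprises when later verbs would otherwise still operate per `year` and `month`.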

3 Other parts

3.2 Rows:
- filter: allows you to keep rows based on the values of the columns
- arrange: changes the order of the rows based on the value of the columns
- distinct: finds all the unique rows in a dataset
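
A minimal sketch of the three row verbs, assuming R 4.1 or later and the nycflights13 and tidyverse packages; the object names are illustrative only.

```r
library(nycflights13)
library(tidyverse)

# filter(): keep rows where the condition is TRUE
# (rows where it evaluates to NA are dropped too)
delayed <- flights |> filter(dep_delay > 120)

# arrange(): reorder rows; desc() sorts from largest to smallest,
# and missing values always go to the end
by_delay <- flights |> arrange(desc(dep_delay))

# distinct(): unique combinations of the named columns
routes <- flights |> distinct(origin, dest)
```
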
3.3 Columns:
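- mutate: adds new columns that are calculated from the existing columns
- select: keeps only the columns you choose, based on their names
- rename: changes the names of columns while keeping all other columns
- relocate: moves columns to a new position (by default, to the front)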
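
A minimal sketch of the four column verbs, assuming R 4.1 or later and the nycflights13 and tidyverse packages; `gain`, `speed`, `tail_num` and the object names are illustrative only.

```r
library(nycflights13)
library(tidyverse)

# mutate(): compute new columns; .before = 1 places them at the front
with_gain <- flights |>
  mutate(
    gain  = dep_delay - arr_delay,   # minutes made up in the air
    speed = distance / air_time * 60, # miles per hour (air_time is in minutes)
    .before = 1
  )

# select(): keep only the named columns
ymd <- flights |> select(year, month, day)

# rename(): new_name = old_name; every other column is kept
renamed <- flights |> rename(tail_num = tailnum)

# relocate(): move the named columns to the front
moved <- flights |> relocate(time_hour, air_time)
```
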
3.5 Groups:
- group_by: divide your dataset into groups meaningful for your analysis
- summarize: reduces each group to a single row of summary statistics
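
A minimal sketch of `group_by()` followed by `summarize()`, assuming R 4.1 or later and the nycflights13 and tidyverse packages; `monthly`, `avg_delay` and `n` are illustrative names.

```r
library(nycflights13)
library(tidyverse)

monthly <- flights |>
  group_by(month) |>                          # one group per month
  summarize(
    avg_delay = mean(dep_delay, na.rm = TRUE), # mean departure delay per month
    n = n()                                    # number of flights in the group
  )

monthly
```

With a single grouping variable, `summarize()` returns an ungrouped tibble with one row per month.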

4 Optional

If you have time, you can also read ahead, for example Chapter 5, Data tidying.

Week 1: Data Transformation – RDS – dplyr tidyverse