Introduction to R Programming

Exercise Solutions

Welcome

Welcome to the Intro to R Programming Workshop!

This is the solutions notebook. Only check this out if you want to see the solutions to the exercises!

For Part I (Google Colab) click here

For Part II (Google Colab) click here.

Link to Slides: https://favstats.github.io/ds3_r_intro/

Link to all Materials: https://github.com/favstats/ds3_r_intro

Exercises I

The following includes a list of exercises that you can complete on your own.

Task 1

Take a look at the table below.

Pick three animals from the Animal Lifespan data we haven’t talked about yet.

Assign the lifespan values to respective objects with appropriate names.

Animal                  Maximum Longevity (in years)
Human                   122.5
Domestic dog            24.0
Domestic cat            30.0
American alligator      77.0
Golden hamster          3.9
King penguin            26.0
Lion                    27.0
Greenland shark         392.0
Galapagos tortoise      177.0
African bush elephant   65.0
California sea lion     35.7
Fruit fly               0.3
House mouse             4.0
Giraffe                 39.5
Wild boar               27.0
giraffe_lifespan <- 29.5
penguin_lifespan <- 26
elephant_lifespan <- 65

Task 2

Create three different logical tests that compare the maximum longevity of your chosen animals.

Does the output you get make sense?

giraffe_lifespan == penguin_lifespan
## [1] FALSE
giraffe_lifespan > penguin_lifespan 
## [1] TRUE
elephant_lifespan != penguin_lifespan
## [1] TRUE

Task 3

Create two vectors with the help of c():

  1. strings (i.e. texts) of all the animals you chose
  2. the respective lifespan values (in the same order)
theanimals <- c("giraffe", "penguin", "elephant")
lifespans <- c(giraffe_lifespan, penguin_lifespan, elephant_lifespan)

Task 4

Calculate the mean of your lifespan vector.

mean(lifespans)
## [1] 40.16667

Task 5

5.1 Retrieve the second value of the vector that contains your animal names.

Tip: Square brackets are your friend.

theanimals[2]
## [1] "penguin"

5.2 Using code, find out which animals in your lifespans vector have a maximum longevity above 25.

Tip: For an elegant solution you need to use both vectors, square brackets and a logical test. If you need help, revisit Indexing with logical tests.

theanimals[lifespans > 25]
## [1] "giraffe"  "penguin"  "elephant"
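
With a cutoff of 25 every chosen animal passes, so the filtering is hard to see. The sketch below reuses the same values with a stricter cutoff of 28 (picked only for illustration) to show that the logical test returns one TRUE/FALSE per element and that the square brackets keep the TRUE positions. The object names is_long_lived and long_lived_animals are made up for this example.

```r
# Values copied from the solution above
theanimals <- c("giraffe", "penguin", "elephant")
lifespans <- c(29.5, 26, 65)

# The logical test returns one TRUE/FALSE per element
is_long_lived <- lifespans > 28
is_long_lived

# Square brackets keep only the positions that are TRUE
long_lived_animals <- theanimals[is_long_lived]
long_lived_animals

# which() returns the matching positions instead of the values
which(is_long_lived)
```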

Task 6

Calculate the animal-to-human conversion ratios for the animals you’ve picked and assign the results to an object.

conversions <- 122.5/lifespans

Task 7

Calculate the human years for the animals you picked, assuming they are all 5 years old.

conversions*5
## [1] 20.762712 23.557692  9.423077

Task 8

Pick one of the animals you chose and create a function which takes animal years as input and returns human years. Test the function and validate it against the results from Task 7.

You can name the function in this style:

[your_animal_name]_to_human_years

Tip: If you need help, revisit the section Dog to Human years function.

Create the function here:

penguin_to_human_years <- function(animal_years, human_lifespan = 122.5, penguin_lifespan = 26){

  ratio <- human_lifespan/penguin_lifespan

  human_years <- animal_years*ratio

  return(human_years)
}

Try it out here:

penguin_to_human_years(5)
## [1] 23.55769
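
One possible extension, sketched here under the assumption that you want a single function for every animal: turn the animal's lifespan into an argument. The name animal_to_human_years and its argument names are hypothetical, not part of the workshop materials. Because R arithmetic is vectorised, the same function can also reproduce the Task 7 result in one call.

```r
# Hypothetical generalisation: the animal's lifespan is an argument
animal_to_human_years <- function(animal_years, animal_lifespan, human_lifespan = 122.5) {
  ratio <- human_lifespan / animal_lifespan
  animal_years * ratio
}

# A 5-year-old penguin, should agree with penguin_to_human_years(5)
penguin_check <- animal_to_human_years(5, animal_lifespan = 26)
penguin_check

# All three lifespans at once, should agree with conversions*5 from Task 7
all_checks <- animal_to_human_years(5, animal_lifespan = c(29.5, 26, 65))
all_checks
```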

Exercises II

The following includes a list of exercises that you can complete on your own.

We are going to use the palmerpenguins dataset for the tasks ahead!

Functions reference list

For reference, here is a list of some useful functions.

If you have trouble with any of these functions, try reading the documentation with ?function_name

Remember: all these functions take the data first.

  • filter()

    • Subset rows using column values
  • mutate()

    • Create, modify, and delete columns
  • rename()

    • Rename columns
  • select()

    • Subset columns using their names and types
  • summarise(); summarize()

    • Summarise each group to fewer rows
  • group_by(); ungroup()

    • Group by one or more variables
  • arrange()

    • Arrange rows by column values
  • count(); tally()

    • Count observations by group
  • distinct()

    • Subset distinct/unique rows
  • pull()

    • Extract a single column
  • ifelse()

    • useful for coding of binary variables
  • case_when()

    • useful for recoding (when ifelse is not enough)
  • separate()

    • split one column into several columns by some separator
  • pivot_wider()

    • turn data into wide format
  • pivot_longer()

    • turn data into long format
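
Since every function in this list takes the data as its first argument, the same steps can be written as nested calls or as a pipe. Here is a minimal sketch on a toy tibble (a few rows borrowed from the lifespan table in Exercises I); the object names toy, nested and piped are made up for this example.

```r
library(dplyr)

# Toy data: four rows from the lifespan table in Exercises I
toy <- tibble(
  animal = c("dog", "cat", "lion", "mouse"),
  lifespan = c(24, 30, 27, 4)
)

# Nested calls: read from the inside out
nested <- arrange(filter(toy, lifespan > 20), desc(lifespan))

# Piped calls: read from top to bottom
piped <- toy %>%
  filter(lifespan > 20) %>%
  arrange(desc(lifespan))

# Both spellings give the same result
identical(nested, piped)
```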

Task 1

Load the tidyverse and janitor packages.

If janitor is not installed yet (R will report that there is no package called 'janitor'), install it.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0     ✔ purrr   0.3.5
## ✔ tibble  3.2.1     ✔ dplyr   1.1.2
## ✔ tidyr   1.2.1     ✔ stringr 1.4.1
## ✔ readr   2.1.3     ✔ forcats 1.0.0
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'dplyr' was built under R version 4.2.3
## Warning: package 'forcats' was built under R version 4.2.3
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(janitor)
## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
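
If you want the notebook to install a missing package on its own, one way is to check with requireNamespace() first. This is only a sketch: the helper name ensure_installed is made up here, and it assumes a CRAN mirror is configured so that install.packages() can run.

```r
# Hypothetical helper: install a package only when it is not available yet
ensure_installed <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can be loaded now
  invisible(requireNamespace(pkg, quietly = TRUE))
}
```

Calling ensure_installed("janitor") before library(janitor) would then skip the installation whenever the package is already there.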

Task 2

Read in the already cleaned palmerpenguins dataset using read_csv().

Assign the resulting data to penguins.

Then take a look at it using glimpse().

What kind of variables can you recognize?

penguins <- read_csv("https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv")
## Rows: 344 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): species, island, sex
## dbl (5): bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g, year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species           <chr> "Adelie", "Adelie", "Adelie", "Adelie", "Adelie", "A…
## $ island            <chr> "Torgersen", "Torgersen", "Torgersen", "Torgersen", …
## $ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
## $ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
## $ flipper_length_mm <dbl> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
## $ body_mass_g       <dbl> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
## $ sex               <chr> "male", "female", "female", NA, "female", "male", "f…
## $ year              <dbl> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…

Task 3

Only keep the variables: species, island and sex.

select(penguins, species, island, sex)
## # A tibble: 344 × 3
##    species island    sex   
##    <chr>   <chr>     <chr> 
##  1 Adelie  Torgersen male  
##  2 Adelie  Torgersen female
##  3 Adelie  Torgersen female
##  4 Adelie  Torgersen <NA>  
##  5 Adelie  Torgersen female
##  6 Adelie  Torgersen male  
##  7 Adelie  Torgersen female
##  8 Adelie  Torgersen male  
##  9 Adelie  Torgersen <NA>  
## 10 Adelie  Torgersen <NA>  
## # ℹ 334 more rows
penguins %>%
   select(species, island, sex)
## # A tibble: 344 × 3
##    species island    sex   
##    <chr>   <chr>     <chr> 
##  1 Adelie  Torgersen male  
##  2 Adelie  Torgersen female
##  3 Adelie  Torgersen female
##  4 Adelie  Torgersen <NA>  
##  5 Adelie  Torgersen female
##  6 Adelie  Torgersen male  
##  7 Adelie  Torgersen female
##  8 Adelie  Torgersen male  
##  9 Adelie  Torgersen <NA>  
## 10 Adelie  Torgersen <NA>  
## # ℹ 334 more rows

Only keep variables 2 to 4.

select(penguins, 2:4)
## # A tibble: 344 × 3
##    island    bill_length_mm bill_depth_mm
##    <chr>              <dbl>         <dbl>
##  1 Torgersen           39.1          18.7
##  2 Torgersen           39.5          17.4
##  3 Torgersen           40.3          18  
##  4 Torgersen           NA            NA  
##  5 Torgersen           36.7          19.3
##  6 Torgersen           39.3          20.6
##  7 Torgersen           38.9          17.8
##  8 Torgersen           39.2          19.6
##  9 Torgersen           34.1          18.1
## 10 Torgersen           42            20.2
## # ℹ 334 more rows
penguins %>%
   select(2:4)
## # A tibble: 344 × 3
##    island    bill_length_mm bill_depth_mm
##    <chr>              <dbl>         <dbl>
##  1 Torgersen           39.1          18.7
##  2 Torgersen           39.5          17.4
##  3 Torgersen           40.3          18  
##  4 Torgersen           NA            NA  
##  5 Torgersen           36.7          19.3
##  6 Torgersen           39.3          20.6
##  7 Torgersen           38.9          17.8
##  8 Torgersen           39.2          19.6
##  9 Torgersen           34.1          18.1
## 10 Torgersen           42            20.2
## # ℹ 334 more rows

Remove the column year.

select(penguins, -year)
## # A tibble: 344 × 7
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 1 more variable: sex <chr>
penguins %>%
   select(-year)
## # A tibble: 344 × 7
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 1 more variable: sex <chr>

Only include columns that contain “mm” in the variable name.

select(penguins, contains("mm"))
## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm flipper_length_mm
##             <dbl>         <dbl>             <dbl>
##  1           39.1          18.7               181
##  2           39.5          17.4               186
##  3           40.3          18                 195
##  4           NA            NA                  NA
##  5           36.7          19.3               193
##  6           39.3          20.6               190
##  7           38.9          17.8               181
##  8           39.2          19.6               195
##  9           34.1          18.1               193
## 10           42            20.2               190
## # ℹ 334 more rows
penguins %>%
   select(contains("mm"))
## # A tibble: 344 × 3
##    bill_length_mm bill_depth_mm flipper_length_mm
##             <dbl>         <dbl>             <dbl>
##  1           39.1          18.7               181
##  2           39.5          17.4               186
##  3           40.3          18                 195
##  4           NA            NA                  NA
##  5           36.7          19.3               193
##  6           39.3          20.6               190
##  7           38.9          17.8               181
##  8           39.2          19.6               195
##  9           34.1          18.1               193
## 10           42            20.2               190
## # ℹ 334 more rows
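
contains() is one of several selection helpers. The sketch below tries a few others on a two-row stand-in that reuses column names from penguins; the tibble toy_penguins and the result names are made up for this example.

```r
library(dplyr)

# Hypothetical two-row stand-in with some of the penguins columns
toy_penguins <- tibble(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm = c(18.7, 13.2),
  body_mass_g = c(3750, 4500)
)

# Columns whose names start with "bill"
bill_cols <- select(toy_penguins, starts_with("bill"))

# Columns whose names end with "_mm"
mm_cols <- select(toy_penguins, ends_with("_mm"))

# Columns picked by type rather than by name
numeric_cols <- select(toy_penguins, where(is.numeric))

names(numeric_cols)
```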

Task 4

Rename island to location.

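rename(penguins, location = island)
## # A tibble: 344 × 8
##    species location  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>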
penguins %>%
   rename(location = island)
## # A tibble: 344 × 8
##    species location  bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>
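
Both rename() and select() accept the new_name = old_name form, but they treat the remaining columns differently. A minimal sketch on a made-up stand-in tibble (toy_penguins and study_year are hypothetical names):

```r
library(dplyr)

# Hypothetical two-row stand-in with some of the penguins columns
toy_penguins <- tibble(
  species = c("Adelie", "Gentoo"),
  island = c("Torgersen", "Biscoe"),
  year = c(2007, 2007)
)

# rename() changes the name and keeps every column
renamed <- rename(toy_penguins, location = island)
names(renamed)

# select() renames too, but keeps only the columns it names
selected <- select(toy_penguins, location = island)
names(selected)

# Several columns can be renamed in one call
renamed_twice <- rename(toy_penguins, location = island, study_year = year)
names(renamed_twice)
```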

Task 5

Filter the data so that species only includes Chinstrap.

filter(penguins, species == "Chinstrap")
## # A tibble: 68 × 8
##    species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>     <chr>           <dbl>         <dbl>             <dbl>       <dbl>
##  1 Chinstrap Dream            46.5          17.9               192        3500
##  2 Chinstrap Dream            50            19.5               196        3900
##  3 Chinstrap Dream            51.3          19.2               193        3650
##  4 Chinstrap Dream            45.4          18.7               188        3525
##  5 Chinstrap Dream            52.7          19.8               197        3725
##  6 Chinstrap Dream            45.2          17.8               198        3950
##  7 Chinstrap Dream            46.1          18.2               178        3250
##  8 Chinstrap Dream            51.3          18.2               197        3750
##  9 Chinstrap Dream            46            18.9               195        4150
## 10 Chinstrap Dream            51.3          19.9               198        3700
## # ℹ 58 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>
penguins %>%
   filter(species == "Chinstrap")
## # A tibble: 68 × 8
##    species   island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>     <chr>           <dbl>         <dbl>             <dbl>       <dbl>
##  1 Chinstrap Dream            46.5          17.9               192        3500
##  2 Chinstrap Dream            50            19.5               196        3900
##  3 Chinstrap Dream            51.3          19.2               193        3650
##  4 Chinstrap Dream            45.4          18.7               188        3525
##  5 Chinstrap Dream            52.7          19.8               197        3725
##  6 Chinstrap Dream            45.2          17.8               198        3950
##  7 Chinstrap Dream            46.1          18.2               178        3250
##  8 Chinstrap Dream            51.3          18.2               197        3750
##  9 Chinstrap Dream            46            18.9               195        4150
## 10 Chinstrap Dream            51.3          19.9               198        3700
## # ℹ 58 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>

Filter the data so that species only includes Chinstrap or Gentoo.

filter(penguins, species %in% c("Chinstrap", "Gentoo"))
## # A tibble: 192 × 8
##    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>             <dbl>       <dbl>
##  1 Gentoo  Biscoe           46.1          13.2               211        4500
##  2 Gentoo  Biscoe           50            16.3               230        5700
##  3 Gentoo  Biscoe           48.7          14.1               210        4450
##  4 Gentoo  Biscoe           50            15.2               218        5700
##  5 Gentoo  Biscoe           47.6          14.5               215        5400
##  6 Gentoo  Biscoe           46.5          13.5               210        4550
##  7 Gentoo  Biscoe           45.4          14.6               211        4800
##  8 Gentoo  Biscoe           46.7          15.3               219        5200
##  9 Gentoo  Biscoe           43.3          13.4               209        4400
## 10 Gentoo  Biscoe           46.8          15.4               215        5150
## # ℹ 182 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>
penguins %>%
   filter(species %in% c("Chinstrap", "Gentoo"))
## # A tibble: 192 × 8
##    species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>           <dbl>         <dbl>             <dbl>       <dbl>
##  1 Gentoo  Biscoe           46.1          13.2               211        4500
##  2 Gentoo  Biscoe           50            16.3               230        5700
##  3 Gentoo  Biscoe           48.7          14.1               210        4450
##  4 Gentoo  Biscoe           50            15.2               218        5700
##  5 Gentoo  Biscoe           47.6          14.5               215        5400
##  6 Gentoo  Biscoe           46.5          13.5               210        4550
##  7 Gentoo  Biscoe           45.4          14.6               211        4800
##  8 Gentoo  Biscoe           46.7          15.3               219        5200
##  9 Gentoo  Biscoe           43.3          13.4               209        4400
## 10 Gentoo  Biscoe           46.8          15.4               215        5150
## # ℹ 182 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>

Filter the data so it includes only penguins that are male and of the species Adelie.

filter(penguins, sex == "male" & species == "Adelie")
## # A tibble: 73 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.3          20.6               190        3650
##  3 Adelie  Torgersen           39.2          19.6               195        4675
##  4 Adelie  Torgersen           38.6          21.2               191        3800
##  5 Adelie  Torgersen           34.6          21.1               198        4400
##  6 Adelie  Torgersen           42.5          20.7               197        4500
##  7 Adelie  Torgersen           46            21.5               194        4200
##  8 Adelie  Biscoe              37.7          18.7               180        3600
##  9 Adelie  Biscoe              38.2          18.1               185        3950
## 10 Adelie  Biscoe              38.8          17.2               180        3800
## # ℹ 63 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>
penguins %>%
   filter(sex == "male" & species == "Adelie")
## # A tibble: 73 × 8
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.3          20.6               190        3650
##  3 Adelie  Torgersen           39.2          19.6               195        4675
##  4 Adelie  Torgersen           38.6          21.2               191        3800
##  5 Adelie  Torgersen           34.6          21.1               198        4400
##  6 Adelie  Torgersen           42.5          20.7               197        4500
##  7 Adelie  Torgersen           46            21.5               194        4200
##  8 Adelie  Biscoe              37.7          18.7               180        3600
##  9 Adelie  Biscoe              38.2          18.1               185        3950
## 10 Adelie  Biscoe              38.8          17.2               180        3800
## # ℹ 63 more rows
## # ℹ 2 more variables: sex <chr>, year <dbl>
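
Two details of filter() are easy to miss: conditions separated by commas are combined like &, and rows where the test evaluates to NA are dropped. The sketch below shows both on a made-up four-row tibble (toy_penguins and the result names are hypothetical).

```r
library(dplyr)

# Hypothetical toy data with one missing sex value
toy_penguins <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  sex = c("male", "female", NA, "male")
)

# Comma-separated conditions behave like &
with_comma <- filter(toy_penguins, sex == "male", species == "Adelie")
with_and <- filter(toy_penguins, sex == "male" & species == "Adelie")
identical(with_comma, with_and)

# The row with a missing sex is dropped because NA != "male" is NA
not_male <- filter(toy_penguins, sex != "male")
nrow(not_male)

# Ask for missing values explicitly if you want to keep them
not_male_or_missing <- filter(toy_penguins, sex != "male" | is.na(sex))
nrow(not_male_or_missing)
```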

Task 6

Create three new variables that convert bill_length_mm, bill_depth_mm and flipper_length_mm from millimeters to centimeters.

Tip: divide the length value by 10.

mutate(penguins, 
      bill_length_cm = bill_length_mm/10,
      bill_depth_cm = bill_depth_mm/10,
      flipper_length_cm = flipper_length_mm/10
)
## # A tibble: 344 × 11
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 5 more variables: sex <chr>, year <dbl>, bill_length_cm <dbl>,
## #   bill_depth_cm <dbl>, flipper_length_cm <dbl>
penguins %>%
   mutate(bill_length_cm = bill_length_mm/10,
          bill_depth_cm = bill_depth_mm/10,
          flipper_length_cm = flipper_length_mm/10)
## # A tibble: 344 × 11
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 5 more variables: sex <chr>, year <dbl>, bill_length_cm <dbl>,
## #   bill_depth_cm <dbl>, flipper_length_cm <dbl>

Create a new variable called bill_depth_cat which has two values:

  • Everything above a bill depth of 18mm and 18mm itself is “high”
  • Everything below a bill depth of 18mm is “low”
mutate(penguins, bill_depth_cat = ifelse(bill_depth_mm >= 18, "high", "low"))
## # A tibble: 344 × 9
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 3 more variables: sex <chr>, year <dbl>, bill_depth_cat <chr>
penguins %>%
   mutate(bill_depth_cat = ifelse(bill_depth_mm >= 18, "high", "low"))
## # A tibble: 344 × 9
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 3 more variables: sex <chr>, year <dbl>, bill_depth_cat <chr>
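
Row 4 has a missing bill depth, so ifelse() returns NA for it. If you would rather give those rows their own label, dplyr's if_else() has a missing argument. A small sketch on the first four bill depths shown above; the label "unknown" is just an assumption for illustration.

```r
library(dplyr)

# First four bill depths from the output above
bill_depth_mm <- c(18.7, 17.4, 18.0, NA)

# Base ifelse(): a missing input gives a missing category
base_cat <- ifelse(bill_depth_mm >= 18, "high", "low")
base_cat

# dplyr::if_else(): missing inputs can get an explicit label
labelled_cat <- if_else(bill_depth_mm >= 18, "high", "low", missing = "unknown")
labelled_cat
```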

Create a new variable called species_short.

  • Adelie should become A
  • Chinstrap should become C
  • Gentoo should become G
mutate(penguins, 
        species_short = case_when(
          species == "Adelie"  ~ "A",
          species == "Chinstrap"  ~ "C",
          species == "Gentoo"  ~ "G"
        ))
## # A tibble: 344 × 9
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 3 more variables: sex <chr>, year <dbl>, species_short <chr>
penguins %>% 
  mutate(species_short = case_when(
          species == "Adelie"  ~ "A",
          species == "Chinstrap"  ~ "C",
          species == "Gentoo"  ~ "G"
      ))
## # A tibble: 344 × 9
##    species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
##    <chr>   <chr>              <dbl>         <dbl>             <dbl>       <dbl>
##  1 Adelie  Torgersen           39.1          18.7               181        3750
##  2 Adelie  Torgersen           39.5          17.4               186        3800
##  3 Adelie  Torgersen           40.3          18                 195        3250
##  4 Adelie  Torgersen           NA            NA                  NA          NA
##  5 Adelie  Torgersen           36.7          19.3               193        3450
##  6 Adelie  Torgersen           39.3          20.6               190        3650
##  7 Adelie  Torgersen           38.9          17.8               181        3625
##  8 Adelie  Torgersen           39.2          19.6               195        4675
##  9 Adelie  Torgersen           34.1          18.1               193        3475
## 10 Adelie  Torgersen           42            20.2               190        4250
## # ℹ 334 more rows
## # ℹ 3 more variables: sex <chr>, year <dbl>, species_short <chr>
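
case_when() returns NA for any value that matches none of its conditions. A common way to catch those is a final TRUE ~ ... line. The sketch below uses a made-up vector in which "Emperor" and NA are not covered by the three species rules; the fallback label "other" is an assumption for illustration.

```r
library(dplyr)

# Hypothetical input: "Emperor" and NA match none of the species rules
species <- c("Adelie", "Chinstrap", "Gentoo", "Emperor", NA)

# Without a fallback, unmatched values become NA
no_fallback <- case_when(
  species == "Adelie" ~ "A",
  species == "Chinstrap" ~ "C",
  species == "Gentoo" ~ "G"
)
no_fallback

# With a final TRUE ~ ... line, unmatched values get a label
species_short <- case_when(
  species == "Adelie" ~ "A",
  species == "Chinstrap" ~ "C",
  species == "Gentoo" ~ "G",
  TRUE ~ "other"
)
species_short
```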

Task 7

Calculate the average body_mass_g per island.

grouped_by_island <- group_by(penguins, island) 

summarise(grouped_by_island, avg_body_mass_g = mean(body_mass_g, na.rm = TRUE))
## # A tibble: 3 × 2
##   island    avg_body_mass_g
##   <chr>               <dbl>
## 1 Biscoe              4716.
## 2 Dream               3713.
## 3 Torgersen           3706.

If you haven’t done so already, try using the %>% operator to do this.

penguins %>% 
   group_by(island) %>%
   summarise(avg_body_mass_g = mean(body_mass_g, na.rm = TRUE))
## # A tibble: 3 × 2
##   island    avg_body_mass_g
##   <chr>               <dbl>
## 1 Biscoe              4716.
## 2 Dream               3713.
## 3 Torgersen           3706.
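
The na.rm = TRUE argument matters because mean() returns NA as soon as one value is missing. summarise() can also compute several statistics per group in one go. A minimal sketch on a made-up five-row tibble (toy_penguins, n_penguins and n_weighed are hypothetical names):

```r
library(dplyr)

# Hypothetical toy data with one missing body mass
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Dream", "Dream", "Dream"),
  body_mass_g = c(4500, 5700, 3500, 3900, NA)
)

# Without na.rm = TRUE a single NA makes the whole mean NA
mean(toy_penguins$body_mass_g)

# Several summaries per island in one call
island_summary <- toy_penguins %>%
  group_by(island) %>%
  summarise(
    n_penguins = n(),                                   # rows per island
    n_weighed = sum(!is.na(body_mass_g)),               # rows with a body mass
    avg_body_mass_g = mean(body_mass_g, na.rm = TRUE)   # mean of the non-missing values
  )
island_summary
```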

Task 8

Use the pipe operator (%>%) to do all the operations below.

  1. Filter the penguins data so that it only includes Chinstrap or Adelie.
  2. Rename sex to observed_sex
  3. Only keep the variables species, observed_sex, bill_length_mm and bill_depth_mm
  4. Calculate the ratio between bill_length_mm and bill_depth_mm
  5. Sort the data by the highest ratio

Try to create the pipe step by step and execute code as you go to see if it works.

Once you are done, assign the data to new_penguins.

new_penguins <- penguins %>% 
   filter(species %in% c("Chinstrap", "Adelie")) %>%
   rename(observed_sex = sex) %>%
   select(species, observed_sex, bill_length_mm, bill_depth_mm) %>%
   mutate(ratio = bill_length_mm/bill_depth_mm) %>%
   arrange(desc(ratio))
new_penguins
## # A tibble: 220 × 5
##    species   observed_sex bill_length_mm bill_depth_mm ratio
##    <chr>     <chr>                 <dbl>         <dbl> <dbl>
##  1 Chinstrap female                 58            17.8  3.26
##  2 Chinstrap female                 48.1          16.4  2.93
##  3 Chinstrap female                 49.8          17.3  2.88
##  4 Chinstrap male                   52            18.1  2.87
##  5 Chinstrap female                 50.9          17.9  2.84
##  6 Chinstrap female                 46.8          16.5  2.84
##  7 Chinstrap female                 47.5          16.8  2.83
##  8 Chinstrap female                 46.9          16.6  2.83
##  9 Chinstrap male                   51.3          18.2  2.82
## 10 Chinstrap male                   55.8          19.8  2.82
## # ℹ 210 more rows

Calculate the average ratio by species and sex, again using pipes.

penguins %>% 
   group_by(island, sex) %>%
   summarise(avg_body_mass_g = mean(body_mass_g, na.rm = TRUE))
## `summarise()` has grouped output by 'island'. You can override using the
## `.groups` argument.
## # A tibble: 9 × 3
## # Groups:   island [3]
##   island    sex    avg_body_mass_g
##   <chr>     <chr>            <dbl>
## 1 Biscoe    female           4319.
## 2 Biscoe    male             5105.
## 3 Biscoe    <NA>             4588.
## 4 Dream     female           3446.
## 5 Dream     male             3987.
## 6 Dream     <NA>             2975 
## 7 Torgersen female           3396.
## 8 Torgersen male             4035.
## 9 Torgersen <NA>             3681.
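
The prompt asks for the average ratio by species and sex, while the block above averages body mass by island and sex. Here is a minimal sketch of the ratio version. It runs on a made-up four-row tibble shaped like the Task 8 result (toy_new_penguins and avg_ratio are hypothetical names), so the numbers are not the real answer. Setting .groups = "drop" returns an ungrouped tibble and avoids the grouping message.

```r
library(dplyr)

# Hypothetical toy rows in the shape of the Task 8 result
toy_new_penguins <- tibble(
  species = c("Adelie", "Adelie", "Chinstrap", "Chinstrap"),
  observed_sex = c("male", "male", "female", "female"),
  bill_length_mm = c(39.1, 39.3, 46.5, 58.0),
  bill_depth_mm = c(18.7, 20.6, 17.9, 17.8)
)

avg_ratio <- toy_new_penguins %>%
  mutate(ratio = bill_length_mm / bill_depth_mm) %>%
  group_by(species, observed_sex) %>%
  summarise(avg_ratio = mean(ratio, na.rm = TRUE), .groups = "drop")
avg_ratio
```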

Task 9

Count the number of penguins by island and species.

penguins %>%
  count(island, species)
## # A tibble: 5 × 3
##   island    species       n
##   <chr>     <chr>     <int>
## 1 Biscoe    Adelie       44
## 2 Biscoe    Gentoo      124
## 3 Dream     Adelie       56
## 4 Dream     Chinstrap    68
## 5 Torgersen Adelie       52
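
count() is a shortcut for group_by() followed by summarise(n = n()). The sketch below compares the two on a made-up six-row tibble (toy_penguins and the result names are hypothetical) and shows the sort argument.

```r
library(dplyr)

# Hypothetical toy data
toy_penguins <- tibble(
  island = c("Biscoe", "Biscoe", "Biscoe", "Dream", "Dream", "Torgersen"),
  species = c("Adelie", "Gentoo", "Gentoo", "Adelie", "Chinstrap", "Adelie")
)

# The shortcut
by_count <- toy_penguins %>%
  count(island, species)

# The long way round
by_summarise <- toy_penguins %>%
  group_by(island, species) %>%
  summarise(n = n(), .groups = "drop")

# Same counts either way
identical(by_count$n, by_summarise$n)

# sort = TRUE puts the largest groups first
sorted_counts <- toy_penguins %>%
  count(island, species, sort = TRUE)
sorted_counts
```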

Task 10

Below is a dataset that needs some cleaning.

Use the skills that you have learned so far to turn the data into a tidy dataset.

animal_friends <- tibble(
  Names = c("Francis", "Catniss", "Theodor", "Eugenia"),
  TheAnimals = c("Dog", "Cat", "Hamster", "Rabbit"),
  Sex = c("m", "f", "m", "f"),
  a_opterr = c("me", "me", "me", "me"),
  `Age/Adopted/Condition` = c("8/2020/Very Good", "13/2019/Wild", "1/2021/Fair", "2/2020/Good")    
) 

Start here:

tidy_animal_friends <- animal_friends %>%
  ## first clean the names
  clean_names() %>%
  ## rename some variables
  rename(adopter = a_opterr,
         animals = the_animals) %>%
  ## drop columns that hold a single constant value (here: adopter)
  remove_constant() %>%
  ## split the combined column into three separate columns
  separate(age_adopted_condition, into = c("age", "year_adopted", "condition"), sep = "/")
tidy_animal_friends
## # A tibble: 4 × 6
##   names   animals sex   age   year_adopted condition
##   <chr>   <chr>   <chr> <chr> <chr>        <chr>    
## 1 Francis Dog     m     8     2020         Very Good
## 2 Catniss Cat     f     13    2019         Wild     
## 3 Theodor Hamster m     1     2021         Fair     
## 4 Eugenia Rabbit  f     2     2020         Good
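
Note that age and year_adopted come out as character columns. If you want numbers, separate() has a convert argument that guesses the type of each new column. A minimal sketch on two of the rows above (toy_friends and typed_friends are made-up names):

```r
library(dplyr)
library(tidyr)

# Two rows from the data above, already with clean names
toy_friends <- tibble(
  names = c("Francis", "Catniss"),
  age_adopted_condition = c("8/2020/Very Good", "13/2019/Wild")
)

# convert = TRUE turns columns that look like numbers into numbers
typed_friends <- separate(
  toy_friends,
  age_adopted_condition,
  into = c("age", "year_adopted", "condition"),
  sep = "/",
  convert = TRUE
)

sapply(typed_friends, class)
```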

If you are done, turn the final data into long format.

tidy_animal_friends %>%
  pivot_longer(cols = c(sex, age, year_adopted, condition))
## # A tibble: 16 × 4
##    names   animals name         value    
##    <chr>   <chr>   <chr>        <chr>    
##  1 Francis Dog     sex          m        
##  2 Francis Dog     age          8        
##  3 Francis Dog     year_adopted 2020     
##  4 Francis Dog     condition    Very Good
##  5 Catniss Cat     sex          f        
##  6 Catniss Cat     age          13       
##  7 Catniss Cat     year_adopted 2019     
##  8 Catniss Cat     condition    Wild     
##  9 Theodor Hamster sex          m        
## 10 Theodor Hamster age          1        
## 11 Theodor Hamster year_adopted 2021     
## 12 Theodor Hamster condition    Fair     
## 13 Eugenia Rabbit  sex          f        
## 14 Eugenia Rabbit  age          2        
## 15 Eugenia Rabbit  year_adopted 2020     
## 16 Eugenia Rabbit  condition    Good
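
pivot_longer() can stack these columns because they are all character columns; the columns that end up in value need to share a type. The sketch below names the two new columns explicitly and then uses pivot_wider() to go back to the wide format. It runs on a made-up subset of the data (tidy_toy, long_toy, wide_again and the column name attribute are hypothetical).

```r
library(dplyr)
library(tidyr)

# Hypothetical subset of the tidy data above
tidy_toy <- tibble(
  names = c("Francis", "Catniss"),
  sex = c("m", "f"),
  condition = c("Very Good", "Wild")
)

# Long format with explicit names for the two new columns
long_toy <- tidy_toy %>%
  pivot_longer(
    cols = c(sex, condition),
    names_to = "attribute",
    values_to = "value"
  )
long_toy

# And back to wide format
wide_again <- long_toy %>%
  pivot_wider(names_from = attribute, values_from = value)
wide_again
```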