Part 2
4.) Data manipulation
dplyr / tidyverse5.) Data visualisation
ggplot2heredplyr packageWe will learn how to:
%>% to work with functions more efectivelyselect()rename()arrange()filter()summarise()mutate()But first, install and load package dplyr and here and load dartpoints.csv data
Name Catalog TARL Quad Length Width Thickness B.Width J.Width H.Length
1 Darl 41-0322 41CV0536 26/59 42.8 15.8 5.8 11.3 10.6 11.6
2 Darl 35-2946 41CV0235 21/63 40.5 17.4 5.8 NA 13.7 12.9
Weight Blade.Sh Base.Sh Should.Sh Should.Or Haft.Sh Haft.Or
1 3.6 S I S T S E
2 4.5 S I S T S E
select(dataframe, variable1, variable2)%>% hotkey on my computer “CTRL + SHIFT + M”rename(data, new_name = old_name) can be useful when dealing with complicated code names or different languagesarrange(data, variable) dart_type dart_ID dart_length dart_width dart_weight
1 Darl 36-3321 30.6 17.1 2.3
2 Darl 35-2382 31.2 15.6 2.5
3 Darl 36-3619 32.0 16.0 3.3
4 Darl 36-3520 32.4 14.5 2.5
desc() into the arrange():function filter(data, variable <operator> value) allows you to filter your data based on different conditions, for example minimal weight, type of the dartpoint, etc
logical and mathematical operators: ==, !=, <, >, >=, <=, &, |, etc (use ?dplyr::filter for more details)
here we use > to get only dartpoints longer than 80 mm
df_darts_travis <- df_darts_edit %>%
filter(dart_type == "Travis")
unique(df_darts_travis$dart_type)[1] "Travis"
| or & for filtering with more than one conditiondf_darts_wells_10 <- df_darts_edit %>%
filter(dart_type == "Wells" & dart_weight > 10)
head(df_darts_wells_10) dart_type dart_ID dart_length dart_width dart_weight
1 Wells 36-3088 65.4 25.1 12.6
2 Wells 44-0732 63.1 24.7 16.3
3 Wells 35-3079 58.9 24.4 10.5
Task: instead of & try operator | (OR) and see how the result differs
%in%summarise(data, new_variable = summary_statistics) is much more helpfullmean(), median(), sd(), min()…, (use ?summarise for more details)group_by(data, variable_to_be_grouped_by)df_darts_edit %>%
group_by(dart_type) %>%
summarise(
mean_length = mean(dart_length),
type_count = n()
)# A tibble: 5 × 3
dart_type mean_length type_count
<chr> <dbl> <int>
1 Darl 39.8 28
2 Ensor 42.7 10
3 Pedernales 57.9 32
4 Travis 51.4 11
5 Wells 53.1 10
round()# A tibble: 91 × 6
# Groups: dart_type [5]
dart_type dart_ID dart_length dart_width dart_weight mean_weight
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Pedernales 35-2855 110. 49.3 28.8 10.6
2 Pedernales 36-3879 84 21.2 9.3 10.6
3 Pedernales 35-0173 78.3 28.1 14.8 10.6
4 Pedernales 38-0098 70.4 30.4 13.1 10.6
5 Travis 43-0112 69 20.9 11.4 8.59
6 Pedernales 41-0239 67.2 27.1 15.3 10.6
7 Pedernales 35-2391 66 27.2 12.5 10.6
8 Wells 36-3088 65.4 25.1 12.6 8.68
9 Pedernales 43-0110 65 31.6 4.6 10.6
10 Travis 36-0006 64.6 21.5 15 8.59
# ℹ 81 more rows
summarise() creates a new dataframe from calculated values# A tibble: 5 × 2
dart_type width_max
<chr> <dbl>
1 Darl 23.3
2 Ensor 27.3
3 Pedernales 49.3
4 Travis 22.4
5 Wells 29.6
mutate() adds a new variable to the dataframe# A tibble: 8 × 6
# Groups: dart_type [3]
dart_type dart_ID dart_length dart_width dart_weight width_max
<chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 Pedernales 35-2855 110. 49.3 28.8 49.3
2 Pedernales 36-3879 84 21.2 9.3 49.3
3 Pedernales 35-0173 78.3 28.1 14.8 49.3
4 Pedernales 38-0098 70.4 30.4 13.1 49.3
5 Travis 43-0112 69 20.9 11.4 22.4
6 Pedernales 41-0239 67.2 27.1 15.3 49.3
7 Pedernales 35-2391 66 27.2 12.5 49.3
8 Wells 36-3088 65.4 25.1 12.6 29.6
df_darts_sum <- df_darts_edit %>%
group_by(dart_type) %>%
summarise(
length_mean = round(mean(dart_length), 1),
weight_mean = round(mean(dart_weight), 1),
type_count = n()) %>%
mutate(type_percent = round(type_count/sum(type_count)*100, 1)) %>%
arrange(desc(type_count))
df_darts_sum# A tibble: 5 × 5
dart_type length_mean weight_mean type_count type_percent
<chr> <dbl> <dbl> <int> <dbl>
1 Pedernales 57.9 10.6 32 35.2
2 Darl 39.8 4.4 28 30.8
3 Travis 51.4 8.6 11 12.1
4 Ensor 42.7 5.1 10 11
5 Wells 53.1 8.7 10 11
write.csv(name_of_your_object, file = "path_to_your_folder") will save your result as a .csv file, which is niceBasic syntax
ggplot(data = <your data frame>) +
aes(x = <variable to be mapped to axis x>) +
geom_<geometry>()
Lets go back to the scatterplot and play a little
Different shapes with their codes:

ggplot(data = df_darts_edit)+
aes(x = dart_weight, y = dart_length, color = dart_type, size = dart_weight)+
geom_point(alpha = 0.5)+
labs(
title = "A very nice plot",
subtitle = "Look at those colors!",
x ="weight (g)",
y = "length (g)",
caption = "Data = package Archdata",
color = "Type of a dart",
size = "Weight of a dart")+
theme_classic()ggplot(data = df_darts_edit)+
aes(x = dart_weight, y = dart_length, color = dart_type, size = dart_weight)+
geom_point(alpha = 0.5)+
facet_wrap(~dart_type)+
labs(
title = "A very nice plot",
subtitle = "Look at those colors!",
x ="weight (g)",
y = "length (g)",
caption = "Data = package Archdata",
color = "Type of a dart",
size = "Weight of a dart")+
theme_light()best_plot <- ggplot(data = df_darts_edit)+
aes(x = dart_weight, y = dart_length, color = dart_type, size = dart_weight)+
geom_point(alpha = 0.5)+
labs(
title = "A very nice plot",
subtitle = "Look at those colors!",
x ="weight (g)",
y = "length (g)",
caption = "Data = package Archdata",
color = "Type of a dart",
size = "Weight of a dart")+
theme_classic()Hint: you can get the information about the dataset by:
geom_histogram(), geom_boxplot(), geom_point(), labs()
RD ND SD H NH Phase
1 11.1 10.0 10.3 5.5 2.5 Subapennine
2 9.5 9.2 9.8 4.8 2.0 Subapennine
3 20.8 20.9 22.0 9.5 3.8 Subapennine
4 19.5 18.2 19.5 8.8 2.7 Subapennine
5 15.5 15.5 18.8 9.8 3.2 Subapennine
6 11.7 11.1 11.5 3.8 1.4 Subapennine
'data.frame': 60 obs. of 6 variables:
$ RD : num 11.1 9.5 20.8 19.5 15.5 11.7 10.8 15 18.5 11 ...
$ ND : num 10 9.2 20.9 18.2 15.5 11.1 10.7 16.1 16.4 8.9 ...
$ SD : num 10.3 9.8 22 19.5 18.8 11.5 10.8 16.4 18 9.5 ...
$ H : num 5.5 4.8 9.5 8.8 9.8 3.8 3.5 11.8 10.5 5.8 ...
$ NH : num 2.5 2 3.8 2.7 3.2 1.4 1.7 3.5 4.8 3.7 ...
$ Phase: chr "Subapennine" "Subapennine" "Subapennine" "Subapennine" ...