In this course, you will learn how to:
plot the result with ggplot()
Packages: here, dplyr, tidyr, ca, tibble, ggplot2
  grave_number dating    sex artefact_type artefact_count artefact_material
1          900 en.zvo   male        beaker              3          ceramics
2          900 en.zvo   male     dartpoint              5           lithics
3          900 en.zvo   male    wristguard              1           lithics
4          900 en.zvo   male        dagger              1            copper
5          900 en.zvo   male           axe              1            copper
6          900 en.zvo   male          bowl              1          ceramics
7          901 en.zvo female        beaker              1          ceramics
8          901 en.zvo female     dartpoint              2           lithics
'data.frame': 230 obs. of 6 variables:
$ grave_number : int 900 901 902 903 904 905 906 907 908 909 ...
$ dating : chr "en.zvo" "en.zvo" "en.zvo" "en.zvo" ...
$ sex : chr "male" "female" "male" "female" ...
$ artefact_type : chr "beaker" "beaker" "beaker" "beaker" ...
$ artefact_count : int 3 1 1 1 1 1 1 2 1 1 ...
$ artefact_material: chr "ceramics" "ceramics" "ceramics" "ceramics" ...
tidyr::pivot_wider()
Remember the function pivot_longer() from the tidyr package? Now we will use its reverse sister, pivot_wider().
Syntax within the pipeline:
df |> pivot_wider(names_from = , values_from = )
names_from - defines the column from which the new variables will be created (i.e. the values in this column will become variables)
values_from - defines the column from which the count values will be taken
# A tibble: 6 × 15
  grave_number dating sex   beaker dartpoint wristguard dagger   axe  bowl  beam
         <int> <chr>  <chr>  <int>     <int>      <int>  <int> <int> <int> <int>
1          900 en.zvo male       3         5          1      1     1     1    NA
2          901 en.zvo fema…      1         2          1     NA    NA     1     2
3          902 en.zvo male       1         2          1     NA    NA     1    NA
4          903 en.zvo fema…      1         2          1     NA    NA     1     2
5          904 en.zvo male       1         2          1     NA    NA     1    NA
6          905 en.zvo fema…      1         2          1     NA    NA     1     3
# ℹ 5 more variables: beam_amber <int>, koflik <int>, bracelet_bronze <int>,
#   needle <int>, spear <int>
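The reshaping step that yields a wide table like the one above can be sketched as follows. This is a minimal example on a toy table with made-up values; `df_long` and `df_wide` are hypothetical names, and only the column names are borrowed from the data shown earlier. The printed wide table has no artefact_material column, so it was presumably dropped before reshaping; otherwise pivot_wider() would treat it as an identifier and return several rows per grave.

```r
library(tibble)
library(tidyr)

# Toy long-format table: one row per grave and artefact type (hypothetical values)
df_long <- tribble(
  ~grave_number, ~sex,     ~artefact_type, ~artefact_count,
  900,           "male",   "beaker",       3,
  900,           "male",   "dagger",       1,
  901,           "female", "beaker",       1,
  901,           "female", "beam",         2
)

# Each distinct artefact_type becomes a column filled with artefact_count;
# combinations that do not occur in the long table are filled with NA
df_wide <- df_long |>
  pivot_wider(names_from = artefact_type, values_from = artefact_count)

df_wide
```

Each grave now occupies a single row, and artefact types that were not found in a grave show up as NA.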
We use dplyr::mutate() because we want to change specific values based on specific conditions; here, every NA in the artefact columns is replaced with 0:
df_graves_wide <- df_graves_wide |>
  mutate(
    # replace NA with 0 in every column from beaker to spear
    across(beaker:spear, ~ replace_na(.x, 0))
  )
head(df_graves_wide)
# A tibble: 6 × 15
  grave_number dating sex   beaker dartpoint wristguard dagger   axe  bowl  beam
         <int> <chr>  <chr>  <int>     <int>      <int>  <int> <int> <int> <int>
1          900 en.zvo male       3         5          1      1     1     1     0
2          901 en.zvo fema…      1         2          1      0     0     1     2
3          902 en.zvo male       1         2          1      0     0     1     0
4          903 en.zvo fema…      1         2          1      0     0     1     2
5          904 en.zvo male       1         2          1      0     0     1     0
6          905 en.zvo fema…      1         2          1      0     0     1     3
# ℹ 5 more variables: beam_amber <int>, koflik <int>, bracelet_bronze <int>,
#   needle <int>, spear <int>
ca::ca()
Correspondence analysis is computed with the ca() function from the ca package, and we can put ca() into the pipe.
Use the colnames() function to see the column names:
 [1] "grave_number"    "dating"          "sex"             "beaker"
 [5] "dartpoint"       "wristguard"      "dagger"          "axe"
 [9] "bowl"            "beam"            "beam_amber"      "koflik"
[13] "bracelet_bronze" "needle"          "spear"
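A minimal sketch of putting ca() into the pipe, assuming that only the count columns should enter the analysis. The toy table and the names `df_wide` and `ca_result` are hypothetical; the column names above suggest that the real call would select the range beaker:spear instead.

```r
library(dplyr)
library(ca)

# Toy wide table of artefact counts (hypothetical values)
df_wide <- tibble::tibble(
  grave_number = 900:904,
  sex          = c("male", "female", "male", "female", "male"),
  beaker       = c(3, 1, 1, 0, 2),
  dartpoint    = c(5, 2, 0, 1, 0),
  dagger       = c(1, 0, 0, 0, 1),
  beam         = c(0, 2, 0, 3, 1)
)

ca_result <- df_wide |>
  select(beaker:beam) |>  # keep the count columns, drop identifiers
  ca()                    # correspondence analysis on the count table

# The result is a list of class "ca"
names(ca_result)
```

Identifier columns such as grave_number or sex are left out because ca() expects a table of counts.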
plot()
We can pass the CA result to plot() and create the so-called biplot.
The result cannot go directly into the ggplot() function, because it works only with dataframes or tables, and the result of the CA is not a dataframe but a list:
List of 16
$ sv : num [1:11] 0.715 0.555 0.426 0.302 0.28 ...
$ nd : logi NA
$ rownames : NULL
$ rowmass : num [1:50] 0.0345 0.023 0.0144 0.0201 0.0144 ...
$ rowdist : num [1:50] 1.04 0.795 0.811 0.811 0.811 ...
$ rowinertia: num [1:50] 0.03727 0.01451 0.00944 0.01322 0.00944 ...
$ rowcoord : num [1:50, 1:11] -0.594 -0.582 -0.565 -0.997 -0.565 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:11] "Dim1" "Dim2" "Dim3" "Dim4" ...
$ rowsup : logi(0)
$ colnames : chr [1:12] "beaker" "dartpoint" "wristguard" "dagger" ...
$ colmass : num [1:12] 0.0948 0.2989 0.1063 0.0115 0.0603 ...
$ coldist : num [1:12] 1.03 0.405 0.712 2.42 1.336 ...
$ colinertia: num [1:12] 0.1006 0.0491 0.0539 0.0673 0.1077 ...
$ colcoord : num [1:12, 1:11] -1.127 -0.211 -0.578 -0.954 0.761 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:12] "beaker" "dartpoint" "wristguard" "dagger" ...
.. ..$ : chr [1:11] "Dim1" "Dim2" "Dim3" "Dim4" ...
$ colsup : logi(0)
$ N : int [1:50, 1:12] 3 1 1 1 1 1 1 2 1 1 ...
$ call : language ca.matrix(obj = as.matrix(obj))
- attr(*, "class")= chr "ca"
And it looks like this:
Principal inertias (eigenvalues):
1 2 3 4 5 6 7
Value 0.511633 0.307624 0.181845 0.091006 0.078339 0.059623 0.033804
Percentage 38.58% 23.2% 13.71% 6.86% 5.91% 4.5% 2.55%
8 9 10 11
Value 0.027614 0.022635 0.010099 0.001813
Percentage 2.08% 1.71% 0.76% 0.14%
Rows:
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
Mass 0.034483 0.022989 0.014368 0.020115 0.014368 0.022989 0.008621
ChiDist 1.039607 0.794512 0.810642 0.810643 0.810642 1.022514 1.260380
Inertia 0.037268 0.014511 0.009442 0.013218 0.009442 0.024035 0.013694
Dim. 1 -0.594280 -0.582023 -0.565171 -0.997440 -0.565171 -1.132524 -0.892933
Dim. 2 -0.618500 0.882143 -0.269234 0.480842 -0.269234 0.715240 -0.386827
[,8] [,9] [,10] [,11] [,12] [,13] [,14]
Mass 0.031609 0.017241 0.014368 0.025862 0.014368 0.008621 0.020115
ChiDist 1.020069 0.653439 0.810642 1.214695 0.810642 1.260380 0.810643
Inertia 0.032891 0.007362 0.009442 0.038159 0.009442 0.013694 0.013218
Dim. 1 -0.505019 -0.817328 -0.565171 -1.237589 -0.565171 -0.892933 -0.997440
Dim. 2 -0.684581 0.168310 -0.269234 0.897550 -0.269234 -0.386827 0.480842
[,15] [,16] [,17] [,18] [,19] [,20] [,21]
Mass 0.031609 0.057471 0.022989 0.022989 0.008621 0.014368 0.022989
ChiDist 1.020069 1.164693 1.022514 1.022514 1.260380 0.810642 0.805052
Inertia 0.032891 0.077960 0.024035 0.024035 0.013694 0.009442 0.014899
Dim. 1 -0.505019 -1.124728 -1.132524 -1.132524 -0.892933 -0.565171 -0.854313
Dim. 2 -0.684581 0.926668 0.715240 0.715240 -0.386827 -0.269234 0.481463
[,22] [,23] [,24] [,25] [,26] [,27] [,28]
Mass 0.014368 0.022989 0.014368 0.008621 0.020115 0.008621 0.045977
ChiDist 0.810642 1.022514 0.810642 1.260380 0.810643 1.260380 1.528425
Inertia 0.009442 0.024035 0.009442 0.013694 0.013218 0.013694 0.107406
Dim. 1 -0.565171 -1.132524 -0.565171 -0.892933 -0.997440 -0.892933 1.013292
Dim. 2 -0.269234 0.715240 -0.269234 -0.386827 0.480842 -0.386827 -1.436801
[,29] [,30] [,31] [,32] [,33] [,34] [,35]
Mass 0.020115 0.031609 0.002874 0.014368 0.017241 0.017241 0.043103
ChiDist 1.720234 1.029241 1.531716 1.286873 0.814505 1.567031 1.127401
Inertia 0.059524 0.033485 0.006742 0.023794 0.011438 0.042338 0.054786
Dim. 1 1.754818 0.539908 -0.294629 1.401986 0.282946 1.659640 0.762122
Dim. 2 1.958447 -1.253150 -0.671498 1.213833 -1.042592 1.669647 -1.452072
[,36] [,37] [,38] [,39] [,40] [,41] [,42]
Mass 0.017241 0.002874 0.034483 0.022989 0.028736 0.020115 0.025862
ChiDist 1.567031 1.531716 0.999467 1.748814 1.140247 2.130126 0.990176
Inertia 0.042338 0.006742 0.034446 0.070307 0.037361 0.091270 0.025356
Dim. 1 1.659640 -0.294629 0.673294 1.880786 1.224527 1.524143 1.102153
Dim. 2 1.669647 -0.671498 -1.359819 1.429176 0.902438 0.106477 0.592570
[,43] [,44] [,45] [,46] [,47] [,48] [,49]
Mass 0.002874 0.017241 0.020115 0.037356 0.002874 0.017241 0.014368
ChiDist 1.531716 0.814505 1.426285 1.092480 1.531716 0.814505 1.286873
Inertia 0.006742 0.011438 0.040920 0.044585 0.006742 0.011438 0.023794
Dim. 1 -0.294629 0.282946 1.443630 0.703363 -0.294629 0.282946 1.401986
Dim. 2 -0.671498 -1.042592 1.500527 -1.517977 -0.671498 -1.042592 1.213833
[,50]
Mass 0.002874
ChiDist 1.531716
Inertia 0.006742
Dim. 1 -0.294629
Dim. 2 -0.671498
Columns:
beaker dartpoint wristguard dagger axe bowl beam
Mass 0.094828 0.298851 0.106322 0.011494 0.060345 0.123563 0.100575
ChiDist 1.030170 0.405361 0.712256 2.419617 1.335676 0.483134 1.584578
Inertia 0.100636 0.049106 0.053938 0.067294 0.107657 0.028842 0.252532
Dim. 1 -1.127400 -0.210744 -0.577961 -0.953831 0.761197 0.105557 -1.486444
Dim. 2 0.060119 -0.372439 -0.331328 -0.478237 -1.894577 0.269450 1.306746
beam_amber koflik bracelet_bronze needle spear
Mass 0.051724 0.080460 0.025862 0.043103 0.002874
ChiDist 2.066809 1.185859 2.213668 1.837746 4.555217
Inertia 0.220950 0.113148 0.126733 0.145574 0.059626
Dim. 1 2.108594 1.347019 1.531099 1.663676 1.416626
Dim. 2 2.190109 -0.768237 -1.405001 2.047311 -2.590517
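The percentages in the output above can be recomputed from the principal inertias by dividing each one by their sum. The sketch below copies the eigenvalues from the printed result. The names `eigenvalue_1` and `eigenvalue_2` match the ones used in the axis labels of the final plot, but since their definition is not shown, treating them as rounded percentages is an assumption. With a fitted object, the same eigenvalues should be available as the squares of the singular values stored in sv.

```r
# Principal inertias (eigenvalues) copied from the printed CA result
eigenvalues <- c(0.511633, 0.307624, 0.181845, 0.091006, 0.078339, 0.059623,
                 0.033804, 0.027614, 0.022635, 0.010099, 0.001813)

# Share of the total inertia explained by each dimension, in percent
percentages <- round(100 * eigenvalues / sum(eigenvalues), 2)

# Assumed definitions of the two values pasted into the axis labels
eigenvalue_1 <- percentages[1]
eigenvalue_2 <- percentages[2]

paste0("dimension 1 (", eigenvalue_1, " %)")
```

The first two dimensions together account for a little under 62 % of the total inertia, which is the share of the variation a two-dimensional plot can display.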
ggplot()
ggplot() works only with data in table format, so we need to do some preparations first.
Graves and artefacts will each be plotted by ggplot() in its own geom_point() layer.
The coordinates of the graves are stored in rowcoord of the CA result.
The coordinates of the artefacts are stored in colcoord.
With [,1] and [,2] we can subset the coordinates in the first and second dimension, respectively.
With mutate() we can add external vectors into the existing table. Just keep in mind that the vectors must have the same number of objects as the table has rows.
With select() we can select the variables we need: coordinates, identifier, sex and dating.
# A tibble: 4 × 5
  dating sex    coord_grave_1 coord_grave_2 grave_number
  <chr>  <chr>          <dbl>         <dbl>        <int>
1 en.zvo male          -0.594        -0.619          900
2 en.zvo female        -0.582         0.882          901
3 en.zvo male          -0.565        -0.269          902
4 en.zvo female        -0.997         0.481          903
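The preparation steps listed above can be sketched as follows. This is a minimal example on toy data with made-up values, not the original script; `df_wide` and `ca_result` are hypothetical names, while `table_graves` and the coordinate column names follow the table above.

```r
library(dplyr)
library(ca)

# Toy wide table (hypothetical values)
df_wide <- tibble::tibble(
  grave_number = 900:904,
  dating       = "en.zvo",
  sex          = c("male", "female", "male", "female", "male"),
  beaker       = c(3, 1, 1, 0, 2),
  dartpoint    = c(5, 2, 0, 1, 0),
  dagger       = c(1, 0, 0, 0, 1),
  beam         = c(0, 2, 0, 3, 1)
)

ca_result <- df_wide |>
  select(beaker:beam) |>
  ca()

table_graves <- df_wide |>
  mutate(
    # row coordinates: one value per grave, in the same order as the table rows
    coord_grave_1 = ca_result$rowcoord[, 1],
    coord_grave_2 = ca_result$rowcoord[, 2]
  ) |>
  select(dating, sex, coord_grave_1, coord_grave_2, grave_number)

table_graves
```

This works because rowcoord has exactly one row per grave, in the same order as the table that went into ca(), so the two coordinate vectors line up with the rows that mutate() extends.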
tidyr::tibble()
tidyr::tibble() creates a “tibble”, which is a type of dataframe and works well with tidy data.
tibble() binds together 2 vectors with the coordinates and a third vector, “artefact”, that consists of artefact names. These names are extracted by colnames() from the names of the columns defined by select().
ggplot()
Each table gets its own geom_point() layer.
Now we can put everything into ggplot() and play a bit with the settings:
ggplot() +
  # dashed reference lines through the origin
  geom_hline(yintercept = 0,
             color = "gray50",
             linetype = "dashed",
             linewidth = 0.5) +
  geom_vline(xintercept = 0,
             color = "gray50",
             linetype = "dashed",
             linewidth = 0.5) +
  # graves: colour by dating, point shape by sex
  geom_point(data = table_graves,
             aes(x = coord_grave_1, y = coord_grave_2, color = dating, fill = dating, shape = sex),
             alpha = 0.5,
             size = 2) +
  # artefacts: crosses with text labels above them
  geom_point(data = table_artefacts,
             aes(x = coord_artefact_1, y = coord_artefact_2),
             shape = 4,
             size = 2) +
  geom_text(data = table_artefacts,
            aes(label = artefact, x = coord_artefact_1, y = coord_artefact_2),
            vjust = -1.5,
            size = 3) +
  scale_shape_manual(values = c("male" = 22, "female" = 25, "child" = 21)) +
  xlim(-1.55, 2.4) +
  ylim(-2.6, 2.6) +
  # axis titles carry the percentage of inertia of each dimension
  labs(x = paste0("dimension 1 (", eigenvalue_1, " %)"),
       y = paste0("dimension 2 (", eigenvalue_2, " %)"),
       title = "Correspondence analysis") +
  theme_light()
Let’s have a look at the plot again and try to answer the research questions:
AES_707 Statistics seminar for archaeologists | Correspondence analysis