Intro & Joining tables

Today we will…

  • learn basics of the correspondence analysis
  • explore distances, similarity and correlation
  • transform your data from the long-table to wide-table format
  • finish the exercise from the last lesson

Joining tables

It’s rare that a data analysis involves only a single data frame. Typically you have many data frames, and you must join them together to answer the questions that you’re interested in.

(Hadley Wickham, R for Data Science, https://r4ds.hadley.nz/joins.html#introduction)

Joins

Inner join

  • Only the rows that have a common key in both x and y tables.

Full join

  • All the rows in tables x and y.
  • Missing values (NA) where keys do not match in x or y.

Left join

  • All rows from x.
  • Missing values (NA) where key from x does not match any key in y.

Example

graves
# A tibble: 3 × 3
  grave_id burial     mound_present
  <chr>    <chr>      <lgl>        
1 G-001    missing    FALSE        
2 G-002    inhumation FALSE        
3 G-003    inhumation TRUE         
artefacts
# A tibble: 5 × 3
  artefact_id grave_id artefact_type
  <chr>       <chr>    <chr>        
1 A-0076      G-001    Whetstone    
2 A-0135      G-001    Arrowhead    
3 A-0187      G-002    Quern stone  
4 A-0116      G-004    Dagger       
5 A-0428      <NA>     Dagger       

Inner join

inner_join(graves, artefacts, join_by(grave_id))
# A tibble: 3 × 5
  grave_id burial     mound_present artefact_id artefact_type
  <chr>    <chr>      <lgl>         <chr>       <chr>        
1 G-001    missing    FALSE         A-0076      Whetstone    
2 G-001    missing    FALSE         A-0135      Arrowhead    
3 G-002    inhumation FALSE         A-0187      Quern stone  

Full join

full_join(graves, artefacts)
# A tibble: 6 × 5
  grave_id burial     mound_present artefact_id artefact_type
  <chr>    <chr>      <lgl>         <chr>       <chr>        
1 G-001    missing    FALSE         A-0076      Whetstone    
2 G-001    missing    FALSE         A-0135      Arrowhead    
3 G-002    inhumation FALSE         A-0187      Quern stone  
4 G-003    inhumation TRUE          <NA>        <NA>         
5 G-004    <NA>       NA            A-0116      Dagger       
6 <NA>     <NA>       NA            A-0428      Dagger       

Left join graves -> artefacts

graves |> 
  left_join(artefacts)
# A tibble: 4 × 5
  grave_id burial     mound_present artefact_id artefact_type
  <chr>    <chr>      <lgl>         <chr>       <chr>        
1 G-001    missing    FALSE         A-0076      Whetstone    
2 G-001    missing    FALSE         A-0135      Arrowhead    
3 G-002    inhumation FALSE         A-0187      Quern stone  
4 G-003    inhumation TRUE          <NA>        <NA>         

Left join artefacts -> graves

artefacts |> 
  left_join(graves)
# A tibble: 5 × 5
  artefact_id grave_id artefact_type burial     mound_present
  <chr>       <chr>    <chr>         <chr>      <lgl>        
1 A-0076      G-001    Whetstone     missing    FALSE        
2 A-0135      G-001    Arrowhead     missing    FALSE        
3 A-0187      G-002    Quern stone   inhumation FALSE        
4 A-0116      G-004    Dagger        <NA>       NA           
5 A-0428      <NA>     Dagger        <NA>       NA