class: center, top

![:scale 30%](/assets/images/coding_club_logo_1.png)

# 29 March 2022

## INBO coding club

01.05 - Isala Van Diest

---
class: center, middle

![:scale 100%](/assets/images/20220329/20220329_happy_families.png)

---
class: center, middle

![:scale 90%](/assets/images/20220329/20220329_badge_tidy_data.png)

---
class: center, middle

## What does tidy data mean?

![:scale 60%](/assets/images/20220329/20220329_paper_tidy_data.png)

Hadley Wickham's paper: [Tidy Data](https://www.jstatsoft.org/article/view/v059i10), Journal of Statistical Software, 2014

---
class: center, middle

![:scale 100%](/assets/images/20220329/20220329_paper_untidy_data_example.png)

---
class: center, middle

![:scale 80%](/assets/images/20220329/20220329_paper_untidy_to_tidy_data_example.png)

---
class: center, middle

![:scale 100%](/assets/images/20220329/20220329_tidy_data_principles_1.png)

---
class: center, middle

![:scale 100%](/assets/images/20220329/20220329_tidy_data_principles_2.png)

---
class: center, middle

![:scale 100%](/assets/images/20220329/20220329_tidy_data_principles_3.png)

---
class: center, top

![:scale 100%](/assets/images/20220329/20220329_cheat_sheet_tidyr.png)

[Download cheatsheet here](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20220329_cheat_sheet_tidyr.pdf)

---
class: center, top

### How to get started?

Check the [Each session setup](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup) to get started.

### First time coding club?

Check the [First time setup](https://inbo.github.io/coding-club/gettingstarted.html#first-time-setup) section to set up.

---
class: center, top

### Share your code during the coding session!

Go to https://hackmd.io/EzGX1Ws9TzCTRUZuFtgcJQ?edit
---
class: left, top

# Download data and code

- Download everything automatically via `inborutils::setup_codingclub_session()`
- Manually*, from [data/20220329](https://github.com/inbo/coding-club/blob/master/data/20220329/) and [src/20220329](https://github.com/inbo/coding-club/blob/master/src/20220329). Place the R script in your folder `src/20220329/` and the data in `data/20220329/`.

In general, you can pass the date in "YYYYMMDD" format, e.g. `setup_codingclub_session("20201027")`, to download the coding club material of 27 October 2020. For all options, check the [tutorial online](https://inbo.github.io/tutorials/tutorials/r_setup_codingclub_session/).
__\* Note__: check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)
---
class: left, top

Load tidyverse packages

```r
library(tidyverse)
```

---
background-image: url(/assets/images/background_challenge_1.png)
class: left, top

# Challenge 1

Make the file [20220329_surveys.xlsx](https://github.com/inbo/coding-club/blob/master/data/20220329/20220329_surveys.xlsx) as tidy as possible:

1. Never modify the raw data: a (very) good practice
2. Document the issues you encountered in the [hackmd (challenge 1)](https://hackmd.io/EzGX1Ws9TzCTRUZuFtgcJQ?both#Challenge-1)

---
background-image: url(/assets/images/background_challenge_2.png)
class: left, top

# Challenge 2

Make the file [20220329_manta_datasample.xlsx](https://github.com/inbo/coding-club/blob/master/data/20220329/20220329_manta_datasample.xlsx) as tidy as possible:

1. Never modify the raw data: a (very) good practice
2. Document the issues you encountered in the [hackmd (challenge 2)](https://hackmd.io/EzGX1Ws9TzCTRUZuFtgcJQ?both#Challenge-2)

---
class: left, top

# Intermezzo: metadata

"data that provides information about other data". See [definition from Merriam-Webster](https://www.merriam-webster.com/dictionary/metadata/)
meta + data = transcending the data
metadata can describe fields (= columns) used in a standard, e.g. the [Darwin Core standard](https://dwc.tdwg.org/terms/)
![:scale 110%](/assets/images/20220329/20220329_example_metadata_standard.png)

---
class: left, top

# Intermezzo: metadata

metadata can describe datasets, e.g. a [GBIF checklist](https://www.gbif.org/dataset/6d9e952f-948c-4483-9807-575348147c7e)
![:scale 110%](/assets/images/20220329/20220329_example_metadata_dataset.png)

---
background-image: url(/assets/images/background_challenge_3.png)
class: left, top

# Challenge 3

1. In the world of habitat studies, if multiple vegetation types occur in the same polygon, they are listed using semicolons as separators. Human readable? YES. Machine readable? It could be better :-) Make the data.frame in [20220329_habitat_types.txt](https://github.com/inbo/coding-club/blob/master/data/20220329/20220329_habitat_types.txt) tidy. Hint: use the cheat sheet.
2. Other people used the vegetation types they encountered as columns with logical values (`TRUE`/`FALSE`). Human readable? YES. Machine readable? YES. But not tidy! Make the data.frame in [20220329_habitat_types_2.txt](https://github.com/inbo/coding-club/blob/master/data/20220329/20220329_habitat_types_2.txt) tidy and clean it up. See screenshot below. Hint: use the cheat sheet or check the [online documentation](https://tidyr.tidyverse.org/articles/pivot.html) of the pivot functions.
3. In the world of camera trap data, there are people who like making things hard to read and understand. Try to tidy the dataset [20220329_camtrap_data.txt](https://github.com/inbo/coding-club/blob/master/data/20220329/20220329_camtrap_data.txt) using R, based on the (tidy!) metadata table* [20220329_camtrap_metadata.txt](https://github.com/inbo/coding-club/blob/master/data/20220329/20220329_camtrap_metadata.txt).
__\* Note__: you do NOT need to read the metadata table in R. It's just for copy pasting in your code.
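
As a warm-up for points 1 and 2, here is a minimal sketch on toy data. The tibbles and column names (`polygon_id`, `habitat_types`, `heath`, `forest`) are invented for illustration and are not the contents of the challenge files; it only shows how `separate_rows()` and `pivot_longer()` behave.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): several vegetation types in one cell
habitats <- tibble::tibble(
  polygon_id = c(1, 2),
  habitat_types = c("heath;forest", "grassland")
)

# One row per polygon and vegetation type
habitats_tidy <- habitats %>%
  separate_rows(habitat_types, sep = ";")

# Toy data (hypothetical): one logical column per vegetation type
habitats_wide <- tibble::tibble(
  polygon_id = c(1, 2),
  heath = c(TRUE, FALSE),
  forest = c(TRUE, TRUE)
)

# Move the column names into a single variable, keep only the types present
habitats_long <- habitats_wide %>%
  pivot_longer(
    cols = -polygon_id,
    names_to = "habitat_type",
    values_to = "present"
  ) %>%
  filter(present) %>%
  select(-present)
```

Both results have one row per combination of polygon and vegetation type, which is the shape the challenge asks for.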
![:scale 30%](/assets/images/20220329/20220329_tidy_habitats2.png)

---
class: left, top

# Bonus challenge

The bird rings datafile [bird_rings_untidy.csv](https://github.com/inbo/coding-club/blob/master/data/20220329/20220329_bird_rings_untidy.csv) shows how data that are nicely tidy at the start of a project can become untidy as the project grows. When this table of bird ring inscriptions was conceived, the column `inscription` was enough to uniquely identify a bird. When a bird lost its ring and had to get a second one, the data manager decided to add a new column called `last_inscription`. But what if a bird gets a third ring? Or a fourth? Try to make this dataset tidy using R. Hint: check the [online documentation](https://tidyr.tidyverse.org/articles/pivot.html) of the pivot functions or type `vignette("pivot")` in the R console.
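
To see why one row per ring scales better than one column per ring, here is a small sketch on invented data. The `bird_id` column and all values are assumptions made for illustration; the real file may be structured differently.

```r
library(dplyr)
library(tidyr)

# Toy data (hypothetical): one column per ring a bird has worn
rings <- tibble::tibble(
  bird_id = c("A", "B"),
  inscription = c("X001", "X002"),
  last_inscription = c(NA, "X105")
)

# One row per bird and ring: a third or fourth ring is just an extra row
rings_long <- rings %>%
  pivot_longer(
    cols = c(inscription, last_inscription),
    names_to = "ring_column",
    values_to = "ring",
    values_drop_na = TRUE
  ) %>%
  group_by(bird_id) %>%
  mutate(ring_order = row_number()) %>%
  ungroup() %>%
  select(bird_id, ring_order, ring)
```

In this shape a new ring never requires a new column, only a new row with the next `ring_order`.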
![:scale 60%](/assets/images/20220329/20220329_pivoting.png)

---
class: left, top

## Resources

- [challenge solutions](https://github.com/inbo/coding-club/blob/master/src/20220329/20220329_challenges_solutions.R)
- [tidyr package documentation](https://tidyr.tidyverse.org/)
- [tidyr cheat sheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20220329_cheat_sheet_tidyr.pdf): very instructive and complete
- the inspiring [datacarpentry lesson](https://datacarpentry.org/spreadsheet-ecology-lesson/) about spreadsheets
- specific documentation about [pivot functions](https://tidyr.tidyverse.org/articles/pivot.html)

---
class: center, top

![:scale 30%](/assets/images/coding_club_logo_1.png)

Room: 01.05 - Isala Van Diest
Date: __28/04/2022__, from 10:00 to 12:30
Subject: **data manipulation with dplyr**
(registration announced via DG_useR@inbo.be)