class: center, middle

![:scale 30%](/assets/images/coding_club_logo_1.png)

# 30 MARCH 2021

## INBO coding club

Exclusively on dIsNeyBO+
---
class: center, middle

![:scale 100%](/assets/images/20210330/20210330_happy_families.png)

---
class: center, middle

![:scale 90%](/assets/images/20210330/20210330_badge_tidy_data.png)

---
class: center, middle

## What does tidy mean?

![:scale 100%](/assets/images/20210330/20210330_meaning_tidy_data_google_translate.png)

source: [Google Translate](https://translate.google.be/?hl=en&sl=en&tl=nl&text=tidy%20data.%0Atidy.%0Ato%20tidy.&op=translate)
---
class: center, middle

## What do we really mean?

![:scale 60%](/assets/images/20210330/20210330_paper_tidy_data.png)

Hadley Wickham's paper: [Tidy Data](https://www.jstatsoft.org/article/view/v059i10), Journal of Statistical Software, 2014

---
class: center, middle

![:scale 100%](/assets/images/20210330/20210330_paper_untidy_data_example.png)

---
class: center, middle

![:scale 80%](/assets/images/20210330/20210330_paper_untidy_to_tidy_data_example.png)

---
class: center, middle

![:scale 100%](/assets/images/20210330/20210330_tidy_data_principles_1.png)

---
class: center, middle

![:scale 100%](/assets/images/20210330/20210330_tidy_data_principles_2.png)

---
class: center, middle

![:scale 100%](/assets/images/20210330/20210330_tidy_data_principles_3.png)

---
class: center, middle

### How to get started?

Check the [Each session setup](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup) section to get started.

### First time coding club?

Check the [First time setup](https://inbo.github.io/coding-club/gettingstarted.html#first-time-setup) section to set up.

---
class: left, middle

![:scale 100%](/assets/images/coding_club_sticky_concept.png)
No yellow sticky notes online :-( We use hackmd (see next slide), but the basic principle doesn't change.

---
class: center, middle

### Share your code during the coding session!

Go to https://hackmd.io/6RPBUP5BT9aOt9SQaUoOyg?edit
---
class: left, middle

# Download data and code

You can download the material:

- automatically via `inborutils::setup_codingclub_session()`*
- manually, via [data](https://github.com/inbo/coding-club/blob/master/data/20210330/) and [scripts](https://github.com/inbo/coding-club/blob/master/src/20210330/)**!
__\* Note__: you can use the date in "YYYYMMDD" format to download the coding club material of a specific day, e.g. run `setup_codingclub_session("20201027")` to download the coding club material of October 27, 2020. If the date is omitted, today's date is used. For all options, check the [tutorial online](https://inbo.github.io/tutorials/tutorials/r_setup_codingclub_session/).
__\*\* Note__: check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)
---
class: left, middle

Load tidyverse packages

```r
library(tidyverse)
```

---
background-image: url(/assets/images/background_challenge_1.png)
class: left, middle

# Challenge 1

Make the file [20210330_surveys.xlsx](https://github.com/inbo/coding-club/blob/master/data/20210330/20210330_surveys.xlsx) as tidy as possible:

1. Never modify the raw data: a (very) good practice
2. Document the issues you encountered in the [hackmd (challenge 1)](https://hackmd.io/6RPBUP5BT9aOt9SQaUoOyg?both#Challenge-1)
Based on https://datacarpentry.org/spreadsheet-ecology-lesson/
---
background-image: url(/assets/images/background_challenge_2.png)
class: left, middle

# Challenge 2

Make the file [20210330_manta_datasample.xlsx](https://github.com/inbo/coding-club/blob/master/data/20210330/20210330_manta_datasample.xlsx) as tidy as possible:

1. Never modify the raw data: a (very) good practice
2. Document the issues you encountered in the [hackmd (challenge 2)](https://hackmd.io/6RPBUP5BT9aOt9SQaUoOyg?both#Challenge-2)

---
class: left, middle

# Intermezzo: metadata

"data that provides information about other data". See [definition from Merriam-Webster](https://www.merriam-webster.com/dictionary/metadata/)
meta + data = transcending the data
metadata can describe fields (= columns) used in a standard, e.g. the [Darwin Core standard](https://dwc.tdwg.org/terms/)
![:scale 110%](/assets/images/20210330/20210330_example_metadata_standard.png)

---
class: left, middle

# Intermezzo: metadata

metadata can describe datasets, e.g. a [GBIF checklist](https://www.gbif.org/dataset/6d9e952f-948c-4483-9807-575348147c7e)
![:scale 110%](/assets/images/20210330/20210330_example_metadata_dataset.png)

---
background-image: url(/assets/images/background_challenge_3.png)
class: left, middle

# Challenge 3

In the world of camera trap data, there are people who like making things hard to read and understand. Try to tidy the dataset [20210330_camtrap_data.csv](https://github.com/inbo/coding-club/blob/master/data/20210330/20210330_camtrap_data.csv) using R, based on the (tidy!) metadata table\* [20210330_camtrap_metadata.csv](https://github.com/inbo/coding-club/blob/master/data/20210330/20210330_camtrap_metadata.csv)
\* Hint: you don't need to read this metadata table in R. It is there mainly for copy-pasting into your code, and to show that metadata should be tidy too.
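
One way the copy-paste idea in the hint could look, as a minimal sketch. The column names, the values and the mapping below are all hypothetical and are not taken from the real camera trap files; the sketch only assumes that the metadata table pairs cryptic field names with readable ones.

```r
library(dplyr)

# Hypothetical untidy records with cryptic column names (not the real data)
camtrap_raw <- tibble(
  dep = c("d01", "d02"),
  sp = c("Vulpes vulpes", "Sus scrofa"),
  n = c(1, 3)
)

# Mapping pasted by hand from a (hypothetical) metadata table,
# written as new_name = "old_name"
field_names <- c(
  deployment_id = "dep",
  scientific_name = "sp",
  count = "n"
)

# rename() accepts a named character vector wrapped in all_of()
camtrap_tidy <- camtrap_raw %>%
  rename(all_of(field_names))

names(camtrap_tidy)
```

Keeping the mapping in a single named vector keeps the renaming step readable and easy to compare with the metadata table.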
---
class: left, middle

# Bonus challenge

Some real data from one of the INBO databases. The bird rings data file [20210330_bird_rings_untidy.csv](https://github.com/inbo/coding-club/blob/master/data/20210330/20210330_bird_rings_untidy.csv) shows that it's easy to structure data in a tidy way at the beginning of a project, but data can become untidy as the project grows.

When this table of bird ring inscriptions was conceived, the column `inscription` was enough to uniquely identify a bird. The researchers probably didn't expect that a bird could get a second ring. When it happened, they decided to add a new column called `last_inscription`. And what if a bird gets a third ring?

Try to make this dataset tidy using R.

Hint: check the [online documentation](https://tidyr.tidyverse.org/articles/pivot.html) of the `pivot_*()` functions or type `vignette("pivot")` in the R console.
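
A minimal sketch of the pivoting idea on toy data. The `bird_id` column and all values are hypothetical and are not taken from the INBO database; only `inscription` and `last_inscription` are column names from the challenge description. This is one possible approach, not necessarily the one in the solutions script.

```r
library(tidyr)

# Toy data: one row per bird, one column per ring (hypothetical values)
bird_rings <- tibble(
  bird_id = c(1, 2, 3),
  inscription = c("AAA", "BBB", "CCC"),
  last_inscription = c(NA, "BBD", NA)
)

# One row per bird-ring combination; birds without a second ring
# do not get an empty row thanks to values_drop_na = TRUE
rings_tidy <- bird_rings %>%
  pivot_longer(
    cols = c(inscription, last_inscription),
    names_to = "ring",
    values_to = "ring_inscription",
    values_drop_na = TRUE
  )

rings_tidy
```

In the long format a third ring is just one more row, so the table structure no longer has to change when a bird is ringed again.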
![:scale 60%](/assets/images/20210330/20210330_bird_rings_preview.png)

---
class: left, middle

## Resources

- R script with [challenge solutions](https://github.com/inbo/coding-club/blob/master/src/20210330/20210330_challenges_solutions.R)
- the [video recording](https://vimeo.com/532185427) of this coding club has been published on the [INBO vimeo channel](https://vimeo.com/inbo)
- [tidyr package documentation](https://tidyr.tidyverse.org/)
- [tidyr cheat sheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20210225_cheat_sheet_data_import.pdf): the second page is about `tidyr` and tidying data
- the inspiring [datacarpentry lesson](https://datacarpentry.org/spreadsheet-ecology-lesson/) about spreadsheets
- tidyr's [`pivot_wider()` and `pivot_longer()` specific documentation](https://tidyr.tidyverse.org/articles/pivot.html)

---
class: center, middle

![:scale 30%](/assets/images/coding_club_logo_1.png)

Room: 01.05 - Isala Van Diest(?)
Date: __29/04/2021__, from 10:00 to 12:00
Subject: **data manipulation with dplyr**
(registration announced via DG_useR@inbo.be)