Load the tidyverse: `library("tidyverse")`
Some datasets are saved as `.xlsx` files. You need to install `readxl`: `install.packages("readxl")`
Load it in your R session by typing: `library("readxl")`

---
class: left, middle

## Download data and source file

- Go to [`data`](https://github.com/inbo/coding-club/tree/master/data)
- All your datasets are saved with the prefix `20200121`. Click on yours and download\* it to the `/data` folder on your laptop.
- Click [`20200121/20200121_challenges.R`](https://github.com/inbo/coding-club/blob/master/src/20200121/20200121_challenges.R) and download\* the script file to the `/src` folder on your laptop.

\* __Note__: check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)

---
class: left, middle

### Cheatsheets and documentation

Before starting, here are some links to documentation and cheatsheets you may need to solve the challenges:

- [`readxl` package documentation](https://readxl.tidyverse.org/) to import Excel files in R.
- [data import cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20180123_cheat_sheet_data_import.pdf) for importing text files (`txt`, `csv`, `tsv`) in R.
- See the [data transformation cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20181129_cheat_sheet_data_transformation.pdf) for exploring data.frames.
- See the [ggplot cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20180522_cheat_sheet_ggplot2.pdf) for help making graphs.

---
background-image: url(/assets/images/background_challenge_1.png)
class: left, middle

# Challenge 1. Import

Import your dataset:

1. `xlsx`, `xls` files? Use the `read_excel()` function.
2. `txt`, `csv` (text) files? Use the `read_delim()` function.

- What is the `head()` of your data.frame?
- What is the `str()`ucture of your data.frame?
- Did you know you can inspect some `summary()` statistics?
- Did you know you can get `nrow()` and `ncol()`?

---
background-image: url(/assets/images/background_challenge_2.png)
class: left, middle

# Challenge 2. Explore

- Did you know you can get `distinct()` rows, i.e. remove duplicated rows if present?
- Did you know you can `filter()` rows?
    1. `count()` how many NAs you have in a column. Tip: use `is.na()`
    2. Filter out the rows with NAs in a specific column and assign the result to a new data.frame
- Did you know you can `select()` columns?
- Did you know you can `rename()` columns?
- Calculate the `min()`, `max()` and `mean()` of a column
- How do you calculate how many distinct values there are in a column? Tip: it is on the 2nd page of the [data transformation cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20181129_cheat_sheet_data_transformation.pdf).

---
class: left, middle

# Intermezzo: this `%>%` is a pipe

Why use `%>%`? Try this:

```
select(filter(iris, Sepal.Length > 7.5), Petal.Length, Petal.Width, Species)
```

Can you understand what I have done here? Maybe this is better...

```
iris %>%
  filter(Sepal.Length > 7.5) %>%
  select(Petal.Length, Petal.Width, Species)
```

`%>%` makes code easier to write and to read. Life is easy with a pipe...

---
class: left, middle

# Intermezzo: the ggplot recipe

What do you need to make a plot? The ggplot recipe says you need:

1. data
2. mapping (essentially which columns you want to use)
3. geometry (do you want a bar plot, line plot, histogram, ...?)

![:scale 90%](/assets/images/20191126/20191126_ggplot_recipe.png)

---
class: left, middle
background-image: url(/assets/images/background_challenge_3.png)

# Challenge 3. Visualize

1. Make two basic plots from your data based on the ggplot recipe intermezzo
2. Customize them a little more and make them shine

![:scale 60%](/assets/images/20191126/20191126_cheatsheet_ggplot.png)

You can check the [ggplot cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20180522_cheat_sheet_ggplot2.pdf) or search the internet. Everybody does, you know...

---
class: left, middle

# Tutorials online

There is plenty of documentation and there are many tutorials online:

1. Data Carpentry: [Data Analysis and Visualization in R for Ecologists](https://datacarpentry.org/R-ecology-lesson/): the clearest and most comprehensive lesson on using R for ecology data ever.
2. [Stanford University tutorial about ggplot2](https://cengel.github.io/R-data-viz/data-visualization-with-ggplot2.html).
3. [R Graphics Cookbook, 2nd edition](https://r-graphics.org/): comprehensive.

---
class: center, middle

![:scale 30%](/assets/images/coding_club_logo_1.png)

Room: Herman Teirlinck - 01.05 - Isala Van Diest

Date: __28/01/2020__, from 10:00 to 12:00

Title: **Data exploration in R with tidyverse** (registration via [gsheet](https://docs.google.com/spreadsheets/d/1D80p7lxLUnWUxEkTIYOMhhYdL39kZOKgKmLOXsr4HGM/edit#gid=967842163))
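
---
class: left, middle

# Example: import and inspect

A possible sketch for Challenge 1 (Import). It assumes the built-in `iris` dataset instead of your own file: the Excel part reads the example workbook that `readxl` ships (`readxl_example("datasets.xlsx")`, sheet `"iris"`), and the text part first writes `iris` to a temporary CSV. File names and variable names are illustrative only.

```r
library(tidyverse) # attaches readr (read_delim, write_csv), dplyr, ggplot2, ...
library(readxl)    # read_excel() is installed with tidyverse but not attached

# Excel file: example workbook shipped with readxl (stand-in for your own file)
xlsx_path <- readxl_example("datasets.xlsx")
iris_xlsx <- read_excel(xlsx_path, sheet = "iris")

# Text file: write iris to a temporary CSV so the example needs no download
csv_path <- tempfile(fileext = ".csv")
write_csv(iris, csv_path)
iris_csv <- read_delim(csv_path, delim = ",")

# Inspect the imported data.frame
head(iris_csv)    # first 6 rows
str(iris_csv)     # column names and types
summary(iris_csv) # summary statistics per column
nrow(iris_csv)    # number of rows
ncol(iris_csv)    # number of columns
```

For your own data, replace the paths with the file you downloaded to `/data`; for a semicolon-separated file, use `delim = ";"`.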
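
---
class: left, middle

# Example: explore

A minimal sketch for Challenge 2 (Explore), again on `iris`. Because `iris` has no missing values, the example assumes three made-up NAs in `Sepal.Length`; the new column name `species` and the summary names (`min_sl`, ...) are hypothetical choices.

```r
library(tidyverse)

# Copy of iris with made-up NAs (hypothetical), so there is something to count
iris_na <- iris
iris_na$Sepal.Length[c(1, 5, 10)] <- NA

# Remove duplicated rows, if any
iris_distinct <- distinct(iris_na)

# Count the NAs in one column
na_count <- iris_na %>%
  filter(is.na(Sepal.Length)) %>%
  count()

# Filter out rows with NA in that column and keep the result as a new data.frame
iris_complete <- iris_na %>%
  filter(!is.na(Sepal.Length))

# Select and rename columns
petals <- iris_complete %>%
  select(Species, Petal.Length, Petal.Width) %>%
  rename(species = Species)

# min, max and mean of a column, plus the number of distinct values in another
sl_stats <- iris_complete %>%
  summarise(
    min_sl = min(Sepal.Length),
    max_sl = max(Sepal.Length),
    mean_sl = mean(Sepal.Length),
    n_species = n_distinct(Species)
  )
```

Following the `is.na()` tip literally, `count(iris_na, is.na(Sepal.Length))` returns the number of NA and non-NA values in a single table.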
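
---
class: left, middle

# Example: the ggplot recipe in code

One way to map the ggplot recipe (data, mapping, geometry) onto code, sketched with `iris` for Challenge 3. The chosen columns, titles and theme are illustrative assumptions, not the only way to do it.

```r
library(tidyverse)

# Recipe: 1. data (iris), 2. mapping (aes), 3. geometry (geom_*)
scatter <- ggplot(data = iris,
                  mapping = aes(x = Petal.Length, y = Petal.Width, colour = Species)) +
  geom_point()

# Same data, another mapping and geometry: a histogram needs only x
histo <- ggplot(data = iris, mapping = aes(x = Sepal.Length)) +
  geom_histogram(bins = 20)

# A little customization: title, axis labels and a different theme
scatter_nice <- scatter +
  labs(title = "Iris petals", x = "Petal length (cm)", y = "Petal width (cm)") +
  theme_minimal()

scatter_nice # draws the plot
```

Every extra layer is added with `+`, so you can keep building on a basic plot until it shines.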