class: center, middle ![:scale 30%](/assets/images/coding_club_logo_1.png) # 29 FEBRUARY 2024 ## INBO coding club Herman Teirlinck Building 01.19 - Paul Van Ostaijen --- class: left, top ## ROOMIE: room reservation ``` > if (isFALSE(roomie)) { + warning("Please confirm asap the room reservation on the roomie") + } Warning message: Please confirm asap the room reservation on the roomie ``` --- class: left, top ## Record the session Kind reminder to... myself. --- class: left, top # Welcome 👋 Say "Hi!" to Rhea. She has just joined the INBO coding club core team. She will replace Hans, one of the godfathers of the club. The core team members: - Raïsa Carmen - Emma Cartuyvels - Dirk Maes - Rhea Maesele - Amber Mertens - Damiano Oldoni For any idea, suggestion, problem, ... just let us know. --- class: center, middle ![:scale 90%](/assets/images/20240229/20240229_badge_ggplot.png) --- class: center, top ![:scale 80%](/assets/images/20240229/20240229_cheat_sheet_ggplot.png)
Download [cheatsheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20240229_cheat_sheet_ggplot2.pdf) CC-BY RStudio. Download the cheat sheet in [English](https://rstudio.github.io/cheatsheets/data-visualization.pdf) or in [Dutch](https://rstudio.github.io/cheatsheets/translations/dutch/data-visualization_nl.pdf). Do you know that this cheat sheet is available as [html](https://rstudio.github.io/cheatsheets/html/data-visualization.html) format? The pdf is, besides Dutch and English, available in other 7 languages. See e.g. the [French](https://rstudio.github.io/cheatsheets/translations/french/data-visualization_fr.pdf) and the [Spanish](https://rstudio.github.io/cheatsheets/translations/spanish/data-visualization_es.pdf) versions. --- class: left, middle ## How to get started? Check the [Each session setup](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup) to get started. ## First time coding club? Check the [First time setup](https://inbo.github.io/coding-club/gettingstarted.html#first-time-setup) section to setup. --- class: left, top ![:scale 100%](/assets/images/coding_club_sticky_concept.png) --- class: center, top # Share your code during the coding session Go to https://hackmd.io/OKAoEJMTQZWT_2Lvl5cEmQ?both and start by adding your name in section "Participants".
--- class: left, top # Download data and code You can download the material of today: - automatically via `inborutils::setup_codingclub_session()`* - manually** from GitHub foders [coding-club/data/20240229](https://github.com/inbo/coding-club/tree/master/data/20240229) and [coding-club/src/20240229](https://github.com/inbo/coding-club/tree/master/src/20240229)
__\* Note__: you can use the date in "YYYYMMDD" format to download the coding club material of a specific day, e.g. run `setup_codingclub_session("20230228")` to download the coding club material of February, 28 2023. If date is omitted, i.e. `setup_codingclub_session()`, the date of today is used. For all options, check the [tutorial online](https://inbo.github.io/tutorials/tutorials/r_setup_codingclub_session/).
__\*\* Note__: check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)
--- class: left, top # Data and scripts description Today we will work with rodents data: - `20240229_rodentia.txt`: human observations of rodents in Belgium from 1980 to 2020.* R script: - `20240229_challenges.R`: code to start
__\* Note__: Extracted from GBIF.org (27 February 2024) GBIF Occurrence Download https://doi.org/10.15468/dl.ngqknn
--- class: left, middle # ggplot recipe: data - mapping - geometry ![:scale 80%](/assets/images/20240229/20240229_ggplot_recipe.png) --- class: left, top # Load libraries ```r library(tidyverse) ``` --- background-image: url(/assets/images/background_challenge_1.png) class: left, top # Challenge 1 1. Make a bar chart with the number of observations of rodents in Belgium per `year`. Tip: check the difference between `geom_bar()` and `geom_col()`, e.g. via help `?geom_bar`. 2. Change x and y labels to "Year" and "number of observations" respectively. 3. Add the title "Evolution of rodents in Belgium" to the plot. 4. Show only data from 2000. 5. **Fill** all bars with color blue and set contour **color** to red. 6. Sometimes, it is adviced to bin data to deal better with outliers and show trends. In this case, group years in bins of 5 years. --- class: left, top # Intermezzo: aesthetics ```r ggplot(data = rodentia, mapping = aes(x = year)) + geom_bar() ``` This works too: ```r ggplot(data = rodentia) + geom_bar(mapping = aes(x = year)) ``` Both are good. Still, we have the feeling that the second version is used more often. --- class: left, top # Intermezzo: aesthetics The only case where it is better to opt for the second option is when you plot two or more geometries with **different** aesthetics. For example, plot pressure and temperature vs time* : ```r # dummy dataset press_temp <- tibble::tibble( year = c(2000:2004), press = c(1.0, 1.1, 1.4, 1.2, 1.6), temp = c(13.2, 15.1, 12.2, 11.8, 10.9) ) ggplot(press_temp) + geom_point(mapping = aes(x = year, y = press)) + geom_point(mapping = aes(x = year, y = temp), color = "red") + scale_y_continuous(sec.axis = sec_axis(~., name = "temperature")) + theme(axis.line.y.right = element_line(color = "red"), axis.text.y.right = element_text(color = "red"), axis.title.y.right = element_text(color = "red")) ```
__\* Note__: such kind of plot is an example of BAD PRACTICES, by the way! --- background-image: url(/assets/images/background_challenge_2.png) class: left, top # Challenge 2A 1. Make a bar chart plot similar to the one in challenge 1.1 with year on the x-axis, but split the bars at genus level, for example by filling them using different colors. How are displayed the bars, by default? 2. Try other ways to "adjust" the bars. Tip: check the [positon adjustment section](https://ggplot2.tidyverse.org/reference/#position-adjustment) section in ggplot2 reference website. 3. How to split the bar chart in subplots based on family? Set a "free" scale. --- background-image: url(/assets/images/background_challenge_2.png) class: left, top # Challenge 2B Bar charts or histograms are a good solution for displaying data where x is a discrete variable, in our case `year`. But we could actually think of year as a continuous variable. And so, points and lines could be a nice alternative. After running the provided code: 1. Create a plot showing the number of observations of Ondatra zibeticus and Rattus norvegicus. Use both points and lines and distinguish the two species via color. 2. Typically, points are combined with a line showing the trend, calculated via a smoother function. ggplot helps us enormously to model and smooth our data. Try different smoother methods. Question for the staticians in the room and at home: what is the best smoother function and parameters to use in this case? --- class: left, top # Intermezzo: customize non-data components You can personalize almost everything within your plots. ggplot2 provides some predefined themes via functions named `theme_*()`, e.g. `theme_bw()` for a white background with black lines. See more on the cheat sheet. ```r ggplot(data = rodentia, mapping = aes(x = year)) + geom_bar() + theme_bw() ``` But you can also customize any non-data component via function `theme()`! Look at the help `?theme`. Notice the help functions `element_*()`, e.g. `element_line()`, `element_text()` and `element_rect()` (rect stands for rectangle). ```r ggplot(data = rodentia, mapping = aes(x = year)) + geom_bar() + # we love flashy plots! theme(panel.background = element_rect(fill = "green"), panel.grid.major = element_line(colour = "blue", linewidth = 2)) ``` --- class: left, top # Intermezzo: customize non-data components Do you know that INBO has its own theme? It's called [INBOtheme](https://inbo.github.io/INBOtheme/): it's a package. Install it, load it and all figures will authomatically have an INBO touch. Read the nice [tutorials](https://inbo.github.io/INBOtheme/articles/index.html) for more. ```r library(INBOtheme) ggplot(data = rodentia, mapping = aes(x = year)) + geom_bar() + ggtitle(label = "Temporal distribution of rodents observations in Belgium") ``` INBOtheme takes care of the quality of your plots: a warning is returned if more than 4 colors are used. ``` library(INBOtheme) ggplot(data = rodentia, mapping = aes(x = year, fill = family)) + geom_bar() + ggtitle(label = "Temporal distribution of rodents observations in Belgium") > using more than 4 colours might make the plot hard to read ``` To find the most appropriate graph for our data: [from Data to Viz](https://www.data-to-viz.com/) can definitely help us. --- background-image: url(/assets/images/background_challenge_3.png) class: left, top # Challenge 3 Let's use the [R graph gallery](https://r-graph-gallery.com/) as a source of inspiration to visualize data about rodents in Belgium from 1800 and 2000. - Let's start showing the distribution of the observed `species` per `year`. Try to use INBO colors or the viridis palette. Try for example boxplots, violin plots, ridgeplots. - Are all these plots meaningful while showing the distribution per `month` instead of `year`? - Piecharts are not done in scientific papers, BUT they are perfectly accetable in presentations. They can be the best choice even! In our case, a piechart is a good idea for showing that two genera have the absolute majority of observations of rodents collected in Belgium up to 2020. All the other genera should be put together as "Other". Try to make your best piechart ever :-) For the staticians who love astonishing plots, check what you can do using [violin plots with ggstatsplot](https://r-graph-gallery.com/web-violinplot-with-ggstatsplot.html)* or how to show uncertainty with [ggdist](https://mjskay.github.io/ggdist/).
__\* Note__: [ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot/) R package is an extension of ggplot2 package for creating graphics with details from statistical test.
--- class: left, top # The package of the month: Emma's choice [ggpattern](https://coolbutuseless.github.io/package/ggpattern/index.html) can be definitely funny, but it can be also the cherry on top :-) Just think to add rodents pictures for each species on each bar. ![:scale 80%](/assets/images/20240229/20240229_ggpattern.png) --- class: left, top # The package of the month: Pieter's choice If you forget how to create plots with ggplot (it happens frequently, no worries!), the RStudio's addin esquisse can help you to generate plots with few clicks. The [getting started](https://cran.r-project.org/web/packages/esquisse/vignettes/get-started.html) vignette is a very handy tutorial on how to use esquisse. And by exporting the code behind the plot we have just created, we will learn a lot, for sure! ![:scale 80%](/assets/images/20240229/20240229_esquisse.png) --- class: left, top ## Resources - Extended and commented [solutions](https://github.com/inbo/coding-club/blob/main/src/20240229/20240229_challenges_solutions.R) are online available. - The edited [video recording](https://vimeo.com/919340067?share=copy) is available on our [vimeo channel](https://vimeo.com/user/8605285/folder/1978815). - `ggplot2` R package: https://ggplot2.tidyverse.org/ - Article of H. Wickham about the layered [grammar of graphics](http://vita.had.co.nz/papers/layered-grammar.pdf) - The [R Graphics Cookbook, 2nd edition](https://r-graphics.org/): the reference book about graphics in R. A must to have :-) - R for Data Science: [Chapter 3: Data visualization](https://r4ds.had.co.nz/data-visualisation.html) - R for Data Science: [Chapter 28: Graphics for communication](https://r4ds.had.co.nz/graphics-for-communication.html) - [Datacarpentry's data visualiation tutorial](https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html) - [Stanford University tutorial](https://cengel.github.io/R-data-viz/data-visualization-with-ggplot2.html): chapter 1 - [INBOtheme](https://inbo.github.io/INBOtheme/) homepage - [R Graph Gallery](https://r-graph-gallery.com/index.html): a collection of charts made with R - [from Data to Viz](https://www.data-to-viz.com/): a good help to find the most appropriate graph for your data. - Links to the ggplot extensions we could have used today: [ggridges](https://wilkelab.org/ggridges/index.html) for ridgeplots, [ggstatsplot](https://indrajeetpatil.github.io/ggstatsplot/) for statistics. - Packages of the month: [ggpattern](https://coolbutuseless.github.io/package/ggpattern/index.html) and [esquisse](https://cran.r-project.org/web/packages/esquisse/vignettes/get-started.html). --- class: left, top # Topic of the next coding club: you vote Yes, you can vote among **two topics**. The poll for March's coding club is open. Let us know your favorite topic before March 7! https://forms.gle/Zg4uTCPZNZUZFQfH9 You can choose between: - Beyond ggplot: ggplot extensions and other visualization packages - Data wrangling with tidyverse (dplyr, tidyr, ...) --- class: center, middle ![:scale 30%](/assets/images/coding_club_logo_1.png) Room: 01.69 - Paul Janssen
Date: __26/03/2024__, van 10:00 tot 12:30
Subject: to be decided
(registration announced via DG_useR@inbo.be)