class: center, middle

# 27 NOVEMBER 2025

## INBO coding club

Herman Teirlinck Building 01.23 - Léon Stynen

---
class: left, top

# Reminders

1. Did we confirm the room reservation on the _roomie_?
2. Did we start the recording?

---
class: left, top

# INBO Coding Club at Living Data 2025

I gave a [lightning talk](https://www.livingdata2025.com/program.html?abstract=7005752) about our INBO coding club at [Living Data 2025](https://www.livingdata2025.com) during the session ["Community Engagement and Capacity Building for Increased Biodiversity Data Accessibility (Part 2)"](https://www.livingdata2025.com/program.html?session=6798479-2_2025-10-23_Ballroom+B1)! Check the [slides](https://speakerdeck.com/damianooldoni/serving-a-sustainable-coding-community-the-inbo-coding-club-story).

---
class: center, middle

---
class: center, middle

## Strings + tidyverse = stringr

Artwork by [Allison Horst](https://allisonhorst.com/) (CC-BY license). Great artworks, worth a look.

---
class: left, top

CC BY SA [Posit Software](https://posit.co). Download the [cheat sheet](https://github.com/inbo/coding-club/blob/master/cheat_sheets/20251127_cheat_sheet_stringr.pdf).

Did you know that this cheat sheet is also available in [HTML](https://rstudio.github.io/cheatsheets/html/strings.html) format? It can also be downloaded in [Spanish](https://rstudio.github.io/cheatsheets/translations/spanish/strings_es.pdf), [Portuguese](https://rstudio.github.io/cheatsheets/translations/portuguese/strings_pt_br.pdf) and [Vietnamese](https://rstudio.github.io/cheatsheets/translations/vietnamese/strings_vi.pdf).

---
class: left, top

### How to get started?

Check the [Each session setup](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup) to get started.

### First time coding club?

Check the [First time setup](https://inbo.github.io/coding-club/gettingstarted.html#first-time-setup) section to set up.
---
class: left, top

---
class: center, top

# Share your code during the coding session

Go to https://hackmd.io/D46xpmDFR0KY0p5DL1SQRA?edit and start by adding your name in section "Participants".
---
class: left, top

# Download data and code

You can download the material of today:

- automatically via `inborutils::setup_codingclub_session()`\*
- manually\*\* from GitHub folder [coding-club/data/20251127](https://github.com/inbo/coding-club/tree/master/data/20251127)

No R script to download today. Just data.

\* You can use the date in "YYYYMMDD" format to download the coding club material of a specific day, e.g. run `setup_codingclub_session("20220428")` to download the coding club material of April 28, 2022. If the date is omitted, i.e. `setup_codingclub_session()`, the date of today is used. For all options, check the [tutorial online](https://inbo.github.io/tutorials/tutorials/r_setup_codingclub_session/).
\*\* Check the getting started instructions on [how to download a single file](https://inbo.github.io/coding-club/gettingstarted.html#each-session-setup)
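
To make the footnote about dates concrete, here is a minimal sketch. The variable names `session_day` and `session_id` are only illustrative (they are not part of inborutils), and the download call is guarded so that it runs only in an interactive session where inborutils is installed:

```r
# Build the "YYYYMMDD" string expected by setup_codingclub_session()
# from a Date object (variable names are illustrative)
session_day <- as.Date("2025-11-27")
session_id <- format(session_day, "%Y%m%d")
session_id
#> [1] "20251127"

# Download the material of that session. Guarded: nothing is downloaded
# when the script runs non-interactively or inborutils is not installed.
if (interactive() && requireNamespace("inborutils", quietly = TRUE)) {
  inborutils::setup_codingclub_session(session_id)
}
```

Passing the date explicitly is handy when you want to catch up on a past session instead of today's one.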
---
class: left, top

# Data and scripts description

- `20251127_bird_rings.tsv`: tab separated text file with color and metal rings information. Source: INBO. Attention: the data have been manipulated here and there.
- `20251127_scientificnames.tsv`: tab separated text file with some scientific names to clean a little bit.

---
class: left, top

# Load packages

Loading tidyverse, we load stringr as well:

```r
library(tidyverse)
```

# Code to run

No R script to start with today. Just read files:

```r
library(tidyverse)

# Read the two tab separated files of today
birds <- read_tsv("./data/20251127/20251127_bird_rings.tsv")
sc_names_df <- read_tsv("./data/20251127/20251127_scientificnames.tsv")

# Column `comment` gives you an idea about what you should clean
View(sc_names_df)

# Character vector with the scientific names to clean
sc_names <- sc_names_df$scientific_name
```

---
background-image: url(/assets/images/background_challenge_1.png)
class: left, top

# Challenge 1

For this challenge we will work with two columns of `20251127_bird_rings.tsv`:

- `color_ring`: column containing the **color** rings
- `metal_ring`: column containing the **metal** rings

1. Get the length of the **metal** rings.
2. Do the **color** rings start with a "C"?
3. Do the **color** rings end with an "R"?
4. Are all the **color** rings uppercase?
5. Solve all the anomalies found in (4) by setting all **color** rings uppercase.

Extra: tidyverse packages are made to work nicely together. Use stringr and dplyr to get the birds with a 6-character-long **metal** ring and a **color** ring starting with a "C" and ending with an "R".

---
class: left, top

# Intermezzo: Unicode, UTF-8, UTF-16, UTF-32 ... and Windows

**Unicode** is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. It assigns a unique number, or **code point**, to each character, no matter the platform, program, or language.

And UTF? UTF stands for **Unicode Transformation Format**.
This format specifies how to represent these code points in binary (= sequence of bytes). The most common UTF forms are UTF-8, UTF-16 and UTF-32. More details? Check this clearly written blog post: [Unicode - UTF-8, UTF-16 and UTF32](https://www.geeksforgeeks.org/computer-organization-architecture/what-is-unicode/).

People working on Windows systems should be aware that Windows has traditionally used a different default encoding, in Western European locales **Windows-1252** or **CP-1252** (Windows code page 1252). This encoding is not fully compatible with UTF-8: only the plain ASCII characters are encoded the same way. This may lead to problems when reading text files created on Windows systems. Always ensure that your text files are encoded in UTF-8. Still, Windows has started to support UTF-8 natively in recent years. More info: [Wikipedia - Windows-1252](https://en.wikipedia.org/wiki/Windows-1252).

---
background-image: url(/assets/images/background_challenge_2.png)
class: left, top

# Challenge 2

1. Create a new column called `color_ring_complete` containing color ring information in this format: `background_color`+`inscription_color`+`"("`+`color_ring`+`")"`, e.g. RW(FJAC)
2. All color rings have length 4: check it first if you don't believe it :-) But are they all **4-letter only**? And is the third letter always an "A"?
3. Do the color rings contain at least one digit?
4. Create a new column called `digit` containing the first digit, if any, as a number.

Extra: again, by combining dplyr and stringr, select the birds whose color rings satisfy the conditions in (2).

---
class: left, top

## Intermezzo: the power of regex

*regex* what?? REGular EXpressions! Go to page 2 of the cheatsheet.

```r
example_string <- "I. love. the. 2025(!!) INBO. Coding. Club! Session. of. 27/11/2025...."
```

---
class: left, top

## Intermezzo: the power of regex

On the internet you can find a lot of tutorials about regex. Check for example ["Learn RegEx with Real Life Examples"](https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/) provided by freecodecamp.
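
For a first taste before opening a tutorial, here is a minimal sketch applied to the example string of the previous slide. It only uses the stringr functions `str_detect()`, `str_extract()` and `str_extract_all()`; the patterns and variable names are illustrations, not the solution to any challenge:

```r
library(stringr)

example_string <- "I. love. the. 2025(!!) INBO. Coding. Club! Session. of. 27/11/2025...."

# Is there at least one digit? \\d matches any single digit
has_digit <- str_detect(example_string, "\\d")
has_digit
#> [1] TRUE

# First sequence of four consecutive digits
first_year <- str_extract(example_string, "\\d{4}")
first_year
#> [1] "2025"

# A date written as dd/mm/yyyy
session_date <- str_extract(example_string, "\\d{2}/\\d{2}/\\d{4}")
session_date
#> [1] "27/11/2025"

# All runs of one or more digits
all_numbers <- str_extract_all(example_string, "\\d+")[[1]]
all_numbers
#> [1] "2025" "27"   "11"   "2025"
```

Note the double backslash: in an R string `"\\d"` is how the regex `\d` is written.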
Some websites you can use to test regex expressions:

- https://regex101.com/
- https://regexr.com/ (more complex, but very complete)

Or, if you want to do everything in RStudio, the stringr function `str_view()` is your friend. Example:

```r
example_string <- "I. love. the. 2025(!!) INBO. Coding. Club! Session. of. 27/11/2025...."
# Highlight every single digit
str_view(example_string, "[:digit:]")
# Highlight every sequence of four digits, rendered as HTML in the Viewer
str_view(example_string, "\\d{4}", html = TRUE)
```

Challenge yourself by solving regular expressions online via https://www.hackerrank.com/domains/regex

Do you use other regex resources? Share them by adding the links in the hackmd. I will be happy to improve this slide and the "Resources" slide at the end.

---
class: left, top

## Intermezzo: regex in real life

Real life example: how to extract the version number from the URL of the used [Camtrap Data Package](https://camtrap-dp.tdwg.org/) profile version? For example, how to extract the expected outputs from these URLs?

- URL 1: "https://rs.gbif.org/sandbox/data-packages/camtrap-dp/1.0/profile/camtrap-dp-profile.json". Output: 1.0
- URL 2: "a/b/c/d1d/cam/3.0.5/camtrap-dp-profile.json". Output: 3.0.5
- URL 3: "a/b/c/d1d/cam/v2/camtrap-dp-profile.json". Output: NA
- URL 4: "a/b/c/d1.d/cam/3.0.5/camtrap-dp-profile.json". Output: 3.0.5
- URL 5: "cam/12.4.0.999/camtrap-dp-profile.json". Output: 12.4.0

Rule: a version number is composed of two or three numbers separated by a dot.

Solving this example requires some complex regex expressions and is part of the Bonus Challenge. Still, this was a real situation! See [GitHub issue](https://github.com/inbo/camtraptor/issues/295).

---
background-image: url(/assets/images/background_challenge_3.png)
class: left, top

# Challenge 3

Are you bored of working with bird rings? Maybe you find cleaning scientific names something more similar to your daily tasks. This challenge is for you!
Matching scientific names against the GBIF Backbone Taxonomy or other taxonomy backbone services can sometimes fail because the names contain abbreviations like "sp.", "spec.", "indet.", "cf", "nov.", "ined". Try to clean the names provided in `20251127_scientificnames.tsv` by removing such abbreviations. Also ensure that the resulting scientific names have no whitespace at the start or at the end and that they have single spaces between words.

Hint: check the cheatsheet 🔍. Also, the column `comment` in the file tells you what to clean.

---
background-image: url(/assets/images/background_challenge_3.png)
class: left, top

# Bonus challenge - bird rings again!

1. The dots in color rings (column `color_ring_dots`), e.g. `KRO.C`, `KZ.AC`, are used for improving readability. Apart from that, the values in column `color_ring_dots` should be exactly the same as the ones in column `color_ring`. Find anomalies.
2. Some metal rings (column `metal_ring`) start with one or more asterisks. Remove them.
3. Find color rings (column `color_ring`) containing two consecutive vowels.
4. How to extract the version number from the URLs provided in the [intermezzo slide](https://coding-club.inbo.be/sessions/20251127_mastering_text_wrangling_in_r.html#18)?

---
class: left, top

# The R package of the month

A tidyverse package to scrape (or harvest) data from web pages: [rvest](https://rvest.tidyverse.org/).

---
class: left, top

# Resources

- Very comprehensive [solutions](https://github.com/inbo/coding-club/blob/main/src/20251127/20251127_challenges_solutions.R) are available on GitHub. You can opt to download the solutions automatically by using `inborutils::setup_codingclub_session("20251127")`.
- The edited [video recording](https://vimeo.com/1141705801) is available on our [vimeo channel](https://vimeo.com/user/8605285/folder/1978815).
- The [stringr](https://stringr.tidyverse.org/) R package documentation.
- The beautiful [artwork collections](https://allisonhorst.com/) of Allison Horst.
- ["Learn RegEx with Real Life Examples"](https://www.freecodecamp.org/news/practical-regex-guide-with-real-life-examples/): freecodecamp tutorial about regex.
- Two widely used online regex testers: https://regex101.com/ and https://regexr.com/.
- All kinds of [regex exercises](https://www.hackerrank.com/domains/regex).
- The [rvest](https://rvest.tidyverse.org/) R package documentation.
- The [waldo](https://waldo.r-lib.org/) R package documentation. We mentioned and promoted the use of {waldo} to easily compare R objects.

---
class: center, middle

Room: 01.71 - Frans Breziers
Date: __16/12/2025__, from 10:00 to 12:00
Subject: **to be decided**
(registration announced via DG_useR@inbo.be)