![:scale 40%](/assets/images/20230126/20230126_daylight_saving_time.png)

---
class: left, middle

## Intermezzo. UTC, time zones & Daylight Saving Time

- How to define time zones? Typically `"Continent/City"` works well:

```r
as_datetime("2020-08-01 09:00:00", tz = "Asia/Tehran")
[1] "2020-08-01 09:00:00 +0430"
as_datetime("2020-08-01 09:00:00", tz = "Europe/Brussels")
[1] "2020-08-01 09:00:00 CEST"
```

or a time zone abbreviation:

```r
as_datetime("2020-08-01 09:00:00", tz = "CET")
[1] "2020-08-01 09:00:00 CEST"
```

Note the automatic conversion to CEST (Central European *Summer* Time). However, if you use CEST it will not work, and you get the GMT version:

```r
as_datetime("2020-08-01 09:00:00", tz = "CEST")
[1] "2020-08-01 07:00:00 GMT"
Warning message:
In as.POSIXlt.POSIXct(x, tz) : unknown timezone 'CEST'
```

CEST is **NOT** a time zone!

---
background-image: url(/assets/images/background_challenge_2.png)
class: left, top

# Challenge 2

First, read `20230126_deployments.txt` (code provided!). Notice how the datetimes are automatically read as standard UTC times if expressed in the ISO standard, i.e. yyyy-mm-ddThh:mm:ss!

1. Add two columns called `local_start` and `local_end` with the local clock time, showing the time zone. The deployments are all located in Belgium. Notice that the datetimes in `start` and `end` are UTC times. Tip: read the [time-zones](https://lubridate.tidyverse.org/reference/lubridate-package.html#time-zones) section from the "Get started".
2. Oh no, the data manager told us that `start` and `end` are not UTC times but local times: a bug in the system caused the loss of the time zone while writing the datetime. How to set `local_start` and `local_end` properly in this case? Tip: the same as before. Correct the UTC `start` and `end` columns as well.
3. Based on `start` and `end`, calculate the duration of each deployment and store it in column `duration`.
4. Get hour and day information of the start of the deployments.
You can store them as additional columns called `hour_start` and `day_start`.

---
class: left, top

## Intermezzo: regular expressions

How to detect/remove/extract:

- any kind of digit?
- anything but letters `a`, `b` and `e`?
- all full stops (`.`)?
- any extra full stop (`.`)?

![:scale 40%](/assets/images/20230126/20230126_regular_expresssions.png)

Try some of these rules yourself before moving to challenge 3. Also check the very useful [regex101](https://regex101.com) website.

---
background-image: url(/assets/images/background_challenge_3.png)
class: left, top

# Challenge 3

1. Select the occurrences of Persian hogweed coming from the [Finnish Biodiversity Information Facility](https://laji.fi/en) (FinBIF), i.e. the occurrences with `occurrenceID` starting with `http://tun.fi/`. Store them in a new data.frame called `finbif_occs`.
2. Retrieve the authorship from column `scientificName`, i.e. the string after the value in column `species`. Store it in a new column called `authorship`.
3. Get the "internal" ID for each occurrence of the `persian_hogweed` data.frame. This ID is the string after the very last colon (`:`), dot (`.`) or slash (`/`), e.g. `105853683` for `urn:lsid:artportalen.se:sighting:105853683` or `36631219` for `http://tun.fi/MKC.36631219` or `63a587a8d5de65595547a609#Unit1` for `http://tun.fi/KE.176/63a587a8d5de65595547a609#Unit1`. Tip: one of [these functions](https://stringr.tidyverse.org/reference/str_split.html) and a simple regex can help you a lot!

---
class: left, top

# Bonus challenge: strings

Regex is very powerful but can become quite complex. Fortunately, we have the cheat sheet and the regex101 website. We also have the entire internet, full of regex questions and answers :-)

1. The occurrences in `finbif_occs` come from different projects. They can be identified by the letters following the prefix `"http://tun.fi/"` and preceding a dot, e.g. `MKC` for `http://tun.fi/MKC.36631219`, `JX` for `http://tun.fi/JX.1443935#3`.
Extract the project acronyms from the `occurrenceID` of `finbif_occs` and store them in column `project`. Tip: check the cheat sheet and use the [regex101](https://regex101.com) website.
2. Metal rings, typically applied to animals such as birds, should be expressed as a letter followed by a sequence of 10 digits. If fewer than 10 digits are present on the ring, dots should be added between the letter and the digits. Example: `E123456` should become `E....123456`. Tip: the cheat sheet helps a lot!

```r
metal_rings <- c("A1234567890", "B123456789", "C12345678", "D1234567")
```

---
class: left, top

# Bonus challenge: dates

The provided code in `20230126_challenges.R` produces a plot where data are grouped by week. How can you leave the plot as it is but put breaks and labels at month level? You should get something like the plot below. Tip: use the ggplot function `scale_x_continuous()` and define the right values for the `breaks` and `labels` arguments.

![110%](/assets/images/20230126/20230126_bonus_challenge_dates_result.png)

---
class: left, middle

# The future of UTC: to leap or not to leap?

UTC is based on time as measured by atomic clocks, so it is independent of Earth's rotation. The [leap second](https://en.wikipedia.org/wiki/Leap_second) is a one-second adjustment that is occasionally applied to Coordinated Universal Time (UTC) to keep it close to the imprecise observed solar time, which varies due to irregularities and a long-term slowdown in the Earth's rotation.

![:scale 75%](/assets/images/20230126/20230126_leapsecond2016.png)

So, what to do with these annoying leap seconds? Should the UTC standard get rid of them or not?
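
A small illustration of why leap seconds are awkward for software: R's date-time classes follow POSIX time, which does not count leap seconds. The sketch below uses only base R (`as.POSIXct()`, `difftime()` and the built-in `.leap.seconds` vector); the variable names are ours, chosen for illustration.

```r
# Base R stores date-times as POSIX time, which ignores leap seconds.
# A leap second was inserted at the end of 2016 (23:59:60 UTC).
before <- as.POSIXct("2016-12-31 23:59:59", tz = "UTC")
after  <- as.POSIXct("2017-01-01 00:00:00", tz = "UTC")

# Two seconds elapsed on an atomic clock, but R counts only one
elapsed <- as.numeric(difftime(after, before, units = "secs"))
elapsed

# Base R ships the leap seconds known to your R version, for reference
n_leap <- length(.leap.seconds)
tail(.leap.seconds, 3)
```

If you ever need exact elapsed times across a leap second, you have to correct for it yourself, e.g. by counting the entries of `.leap.seconds` that fall within your interval.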

---
class: left, top

# Resources

- [Solutions](https://github.com/inbo/coding-club/blob/main/src/20230126/20230126_challenges_solutions.R) and [video recording](https://vimeo.com/796590222) available
- [lubridate](https://lubridate.tidyverse.org) package homepage
- [stringr](https://stringr.tidyverse.org) package homepage
- [lubridate cheat sheet](https://github.com/inbo/coding-club/blob/main/cheat_sheets/20230126_cheat_sheet_lubridate.pdf)
- [stringr cheat sheet](https://github.com/inbo/coding-club/blob/main/cheat_sheets/20220428_cheat_sheet_stringr.pdf)
- [Chapter 16: Dates and times](https://r4ds.had.co.nz/dates-and-times.html) from the "R for Data Science" digital book
- Test your regex via the [regex101](https://regex101.com) website
- The [world time zone map](https://24timezones.com/timezone-map)
- [GMT versus UTC](https://24timezones.com/gmt-vs-utc)
- News about the [ban of daylight saving time](https://www.dw.com/en/eu-parliament-votes-to-end-daylight-savings/a-48064185) by end 2021 and the subsequent [obstacles](https://www.bloomberg.com/news/articles/2021-03-11/will-daylight-saving-time-ever-end)
- The fascinating world of the [leap second](https://en.wikipedia.org/wiki/Leap_second)

---
class: left, top

# Coding club topics 2023: you vote!

Every month you can vote between **two topics**!

The poll for February's coding club is open! Let us know your favorite before **1 February 2023**.

https://forms.gle/V4w6U39FkHUqTjYu5

![25%](/assets/images/20230126/20230126_packages_fight.jpg)

---
class: center, middle

![:scale 30%](/assets/images/coding_club_logo_1.png)

Topic: data manipulation OR making outstanding plots with ggplot

Room: HT - 01.04 - Transitielab

Date: **28/02/2023**, from **10:00** to **12:30**

Help needed with technical setup? You are welcome from **9:45 am**