+ - 0:00:00
Notes for current slide
Notes for next slide

26 JANUARY 2021

INBO coding club

Exclusively on INBOflix

1 / 23

2 / 23

No VPN? No party

3 / 23

Introduction

We will explore the functionalities provided by two "made in INBO" packages and then we play with SQL and tidyverse.

  1. inbodb
  2. watina
4 / 23

inbodb package

  1. Connect to any INBO database
  2. Run specific queries for two INBO databases: florabank and INBOVEG

inbodb functions

5 / 23

watina package

  1. Connect to watina database
  2. Run specific queries for watina database
  3. Provide some useful helper functions

watina functions

6 / 23

No cheatsheet, but a well-written INBO tutorial is available, yeah!

7 / 23

How to get started?

Check the Each session setup to get started.

First time coding club?

Check the First time setup section to setup.

8 / 23


No yellow sticky notes online :-( We use hackmd (see next slide) but basic principle doesn't change.

9 / 23

Share your code during the coding session!

Go to https://hackmd.io/YHvcGxRpSEqHbRaNGDRFaw?edit

10 / 23

Challenge 0

  1. Log in to INBO VPN
  2. If not done, install the required packages by following the instructions:
  3. Load the required packages:
    library(tidyverse)
    library(inbodb)
    library(watina)
  4. Open connections to databases
    watina <- connect_watina()
    inboveg <- connect_inbo_dbase("D0010_00_Cydonia")
    florabank <- connect_inbo_dbase("D0021_00_userFlora")
  5. Did you get this error: "The 'ODBC Driver for SQL Server' is missing. Please install it or contact your system administrator."? We will contact ICT helpdesk this afternoon, but now run this after restarting R:
    remotes::install_github("inbo/inbodb@sos")
11 / 23

Challenge 1

  1. Find the R connection pane of RStudio and open a view of table tblTaxon of florabank (D0021_00_userFlora), table ivRecording of INBOVEG (D0010_00_Cydonia) and table DimChemVar of watina (W0002_00_Watina). Tips: check RStudio connection pane documentation; tables are under (default) dbo schema

  2. Still using the RStudio connection pane, find the type of the columns of (one of the) tables in 1, e.g. character, integer

  3. What type does bit stand for? Tip: check e.g. the values contained in column CURRENT_IND of table DimChemVar opened in 1

12 / 23

Intermezzo 1: R is lazy... so are we

Let us get the locations from watina database using watina function get_locs(watina). Why do you get an error if you try to save them in a text format (e.g. csv)?

locations <- get_locs(watina)
write_csv(locations)
> Error in write_delim(x, file, delim = ",", na = na, append = append, col_names = col_names, :
is.data.frame(x) is not TRUE

Is locations not a data.frame???

class(locations)
[1] "tbl_Microsoft SQL Server" "tbl_dbi"
[3] "tbl_sql" "tbl_lazy"
[5] "tbl"

All you have is a table (tbl), but most of all, it's a lazy table... Welcome to the lazy woRld!

13 / 23

Intermezzo 1: R is lazy... so are we

locations is an SQL query, no data are really loaded on your laptop. It is like a reference to the result of the query but the full result of the query is not brought into memory.

If you are thinking: "are you kidding me?" Well, try this:

locations %>% show_query()

IMPORTANT: to avoid bad surprises (crashes, out of memory issues), first run your queries lazily.

14 / 23

Intermezzo 1: R is lazy... so are we

R is not only lazy with databases, R is ALWAYS lazy

number <- 5
multiply_by_2 <- function(a, b) {
a * 2 # b is not used
}
multiply_by_2(number) # no value given to argument b

b is a function argument, but never evaluated, so no error. This is what we mean by being lazy: doing the strict minimum of evaluation*. For this reason lazy evaluation is sometimes referred as call-by-need.

*More about lazy evaluation? Give a look at this tutorial... or spend a coffeebreak with Thierry ;-)

15 / 23

Challenge 2

  1. Using watina package and always in lazy mode:

    a. How many area codes exist in watina database?

    b. select locations from watina database with area_codes "KAL", "WES" or "ZAB" and depth range between 2 and 4 meters

  2. Using inbodb and florabank database and still being lazy, get observations of taxon Erigeron

  3. Using inbodb and inboveg database and still being lazy, search information about survey "ABS-LIM2011"

  4. How to collect the data we queried in 1, 2 and 3?

16 / 23

Intermezzo 2: do I really need to learn SQL?

Not everything is already wrapped in a function, of course. How to write your own SQL query?

If you know SQL syntax, excellent!

gebogen_kransblad_query <- "SELECT NaamWetenschappelijk
FROM tblTaxon
WHERE NaamNederlands = 'Gebogen kransblad'"
tbl(src = florabank,
sql(gebogen_kransblad_query))

But you don't have to be an SQL programmer, thanks to tidyverse!

tbl(src = florabank, "tblTaxon") %>%
filter(NaamNederlands == "Gebogen kransblad") %>%
select(NaamWetenschappelijk)
17 / 23

Challenge 3

Feel free to use tidyverse syntax or SQL syntax or experiment both!

  1. Using florabank and its table tblTaxon:

    a. get the first 10 rows

    b. get the scientific name (NaamWetenschappelijk) of Dutch name (NaamNederlands) Slank snavelmos

    c. get the scientific names and Dutch names of taxa with Dutch name starting with Slank. Tip for SQL query; tip for tidyverse

  2. Using INBOVEG database and its table ivRecording, retrieve the 10 locations (LocationCode) with the highest number of recordings with Latitude between 50.9 and 51.1 and Longitude between 3.5 and 3.9. Show the locations and the number of recordings and order them by decreasing order (location with highest number of recordings first)

  3. How to get the column names of table ivRecording from INBOVEG?

18 / 23

Bonus challenge 1

Writing queries can be hard and we have specialists in house who are great at it... Let's make sure we can reuse our own and their hard work!

Based on solution of challenge 3.2, create a function called inboveg_locs_most_recs():

  • the function has four input arguments:
    • a connection to database ( con)
    • number of locations to return (n_locations),
    • vector with min/max values of Latitude (lat)
    • vector with min/max values of Longitude (lng)
  • the function returns a dataframe with two columns: locationCode and n (number of recordings) arranged by decreasing values of n
  • test your function by using it, e.g.
inboveg_locs_most_recs(inboveg, n_locations = 8, lat = c(50.1, 50.3), lng = c(4.2, 4.9))
  • write it in SQL* or in tidyverse syntax

* Package glue can help you. This part of the tutorial can also help...

19 / 23

Bonus challenge 2

Packages inbodb and watina and their functions can be improved, of course! Add a new functionality to one of the get_ functions of inbodb or watina, e.g. add an argument, sf_poly (a sf polygon) to get_inboveg_header() to retrieve metadata for vegetation recordings located in the polygon sf_poly.

20 / 23

HELP FUTURE-YOU

Do you have queries you run very often? Do you have queries where you have to change some arguments manually (e.g. taxon name, location code)?

Add an issue to the inbodb package and help future-you and your colleagues.

inbodb is not a static thing. It is something we can improve, TOGETHER! *

*Joe Biden would say the same if he would work for INBO

21 / 23

Resources

22 / 23

Room: 01.05 - Isala Van Diest(?)
Date: 25/02/2021, van 10:00 tot 12:00
Subject: read data in R
(registration announced via DG_useR@inbo.be)

23 / 23

2 / 23
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow