Exclusively on INBOflix
inbodb
packagewatina
packageCheck the Each session setup to get started.
Check the First time setup section to setup.
No yellow sticky notes online :-( We use hackmd (see next slide) but basic principle doesn't change.
Go to https://hackmd.io/YHvcGxRpSEqHbRaNGDRFaw?edit
library(tidyverse)library(inbodb)library(watina)
watina <- connect_watina()inboveg <- connect_inbo_dbase("D0010_00_Cydonia")florabank <- connect_inbo_dbase("D0021_00_userFlora")
"The 'ODBC Driver for SQL Server' is missing. Please install it or contact your system administrator."
? We will contact ICT helpdesk this afternoon, but now run this after restarting R:remotes::install_github("inbo/inbodb@sos")
Find the R connection pane of RStudio and open a view of table tblTaxon
of florabank
(D0021_00_userFlora
), table ivRecording
of INBOVEG (D0010_00_Cydonia
) and table DimChemVar
of watina (W0002_00_Watina
). Tips: check RStudio connection pane documentation; tables are under (default) dbo
schema
Still using the RStudio connection pane, find the type of the columns of (one of the) tables in 1, e.g. character, integer
What type does bit
stand for? Tip: check e.g. the values contained in column CURRENT_IND
of table DimChemVar
opened in 1
Let us get the locations from watina database using watina function get_locs(watina)
. Why do you get an error if you try to save them in a text format (e.g. csv)?
locations <- get_locs(watina)write_csv(locations)> Error in write_delim(x, file, delim = ",", na = na, append = append, col_names = col_names, : is.data.frame(x) is not TRUE
Is locations not a data.frame???
class(locations)[1] "tbl_Microsoft SQL Server" "tbl_dbi"[3] "tbl_sql" "tbl_lazy"[5] "tbl"
All you have is a table (tbl
), but most of all, it's a lazy table... Welcome to the lazy woRld!
locations
is an SQL query, no data are really loaded on your laptop.
It is like a reference to the result of the query but the full result of the query is not brought into memory.
If you are thinking: "are you kidding me?" Well, try this:
locations %>% show_query()
IMPORTANT: to avoid bad surprises (crashes, out of memory issues), first run your queries lazily.
R is not only lazy with databases, R is ALWAYS lazy
number <- 5multiply_by_2 <- function(a, b) { a * 2 # b is not used}multiply_by_2(number) # no value given to argument b
b
is a function argument, but never evaluated, so no error. This is what we mean by being lazy: doing the strict minimum of evaluation*. For this reason lazy evaluation is sometimes referred as call-by-need.
*More about lazy evaluation? Give a look at this tutorial... or spend a coffeebreak with Thierry ;-)
Using watina package and always in lazy mode:
a. How many area codes exist in watina database?
b. select locations from watina database with area_codes
"KAL"
, "WES"
or "ZAB"
and depth range between 2 and 4 meters
Using inbodb and florabank database and still being lazy, get observations of taxon Erigeron
Using inbodb and inboveg database and still being lazy, search information about survey "ABS-LIM2011"
How to collect the data we queried in 1, 2 and 3?
Not everything is already wrapped in a function, of course. How to write your own SQL query?
If you know SQL syntax, excellent!
gebogen_kransblad_query <- "SELECT NaamWetenschappelijk FROM tblTaxon WHERE NaamNederlands = 'Gebogen kransblad'"tbl(src = florabank, sql(gebogen_kransblad_query))
But you don't have to be an SQL programmer, thanks to tidyverse!
tbl(src = florabank, "tblTaxon") %>% filter(NaamNederlands == "Gebogen kransblad") %>% select(NaamWetenschappelijk)
Feel free to use tidyverse syntax or SQL syntax or experiment both!
Using florabank
and its table tblTaxon
:
a. get the first 10 rows
b. get the scientific name (NaamWetenschappelijk
) of Dutch name (NaamNederlands
) Slank snavelmos
c. get the scientific names and Dutch names of taxa with Dutch name starting with Slank. Tip for SQL query; tip for tidyverse
Using INBOVEG
database and its table ivRecording
, retrieve the 10 locations (LocationCode
) with the highest number of recordings with
Latitude
between 50.9 and 51.1 and Longitude
between 3.5 and 3.9. Show the locations and the number of recordings and order them by decreasing order (location with highest number of recordings first)
How to get the column names of table ivRecording
from INBOVEG
?
Writing queries can be hard and we have specialists in house who are great at it... Let's make sure we can reuse our own and their hard work!
Based on solution of challenge 3.2, create a function called inboveg_locs_most_recs()
:
con
)n_locations
),Latitude
(lat)Longitude
(lng)locationCode
and n
(number of recordings) arranged by decreasing values of n
inboveg_locs_most_recs(inboveg, n_locations = 8, lat = c(50.1, 50.3), lng = c(4.2, 4.9))
* Package glue can help you. This part of the tutorial can also help...
Packages inbodb and watina and their functions can be improved, of course!
Add a new functionality to one of the get_
functions of inbodb
or watina
, e.g. add an argument, sf_poly
(a sf
polygon) to get_inboveg_header()
to retrieve metadata for vegetation recordings located in the polygon sf_poly
.
Do you have queries you run very often? Do you have queries where you have to change some arguments manually (e.g. taxon name, location code)?
Add an issue to the inbodb
package and help future-you and your colleagues.
inbodb
is not a static thing. It is something we can improve, TOGETHER! *
*Joe Biden would say the same if he would work for INBO
Room: 01.05 - Isala Van Diest(?)
Date: 25/02/2021, van 10:00 tot 12:00
Subject: read data in R
(registration announced via DG_useR@inbo.be)
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |