We use the here library to define relative paths:
here::i_am("how2/download-with-R.qmd")
here() starts at /Users/z3529065/proyectos/typology-website/typology-map-info
The workbook with profile content for the Ecosystem Functional Groups of the IUCN Global Ecosystem Typology (Level 3 units) is available at https://osf.io/4dcea
We will create a folder for the data downloaded from OSF:
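A minimal sketch of this step, assuming the folder lives under data/OSF inside the project (the folder name is hypothetical, not taken from the original page):

```r
# Hypothetical target folder for the OSF download; adjust to your project layout
osf_dir <- here::here("data", "OSF")

# Create the folder (and any missing parent folders) only if it is not there yet
if (!dir.exists(osf_dir)) {
  dir.create(osf_dir, recursive = TRUE)
}
```

Checking with dir.exists first keeps the step safe to re-run when the page is rendered again.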
And now we download the file there:
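The original download code is not shown here, so the following is only a sketch. It assumes that the osfr package is installed and that the OSF identifier 4dcea points to a single file rather than a whole project; the helper name fetch_osf_file is hypothetical.

```r
# Sketch of an OSF download helper (assumes the osfr package is available)
fetch_osf_file <- function(osf_id, dest_dir) {
  # Make sure the destination folder exists
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  # Look up the file by its OSF identifier (assumed to be a file, not a project)
  osf_file <- osfr::osf_retrieve_file(osf_id)
  # Download into dest_dir; "skip" leaves an existing local copy untouched
  osfr::osf_download(osf_file, path = dest_dir, conflicts = "skip")
}
```

It could then be called as fetch_osf_file("4dcea", dest_dir), where dest_dir is the folder created for the OSF data.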
Indicative maps are available from different Zenodo repositories.
We will use the zen4R and parallel libraries:
One repository holds the bundle of maps in compressed tar archives. This DOI (digital object identifier) automatically resolves to the latest version, but we need to be explicit when we use the parallel download (otherwise it could get stuck in the first DOI):
[1] "10.5281/zenodo.10081251"
We will create a folder for this direct download from Zenodo of the latest version of the bundle:
This can be used to download directly to the output folder. A value set with options(timeout=500) is overridden by the function's own timeout argument, so an appropriate timeout has to be passed as an argument. For some reason the parallel download does not work with the path argument, so this workaround uses getwd and setwd:
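The workaround described above can be sketched as a small helper. This is an illustration under assumptions, not the original code: the helper name download_zenodo_to and its arguments are hypothetical, while the download_zenodo arguments are the ones used further down this page.

```r
# Sketch: download all files of a Zenodo record into out_folder
download_zenodo_to <- function(doi, out_folder, n_workers = 4, timeout = 5000) {
  if (!dir.exists(out_folder)) {
    dir.create(out_folder, recursive = TRUE)
  }

  # Workaround: move into the output folder instead of using the path argument
  oldwd <- setwd(out_folder)
  on.exit(setwd(oldwd), add = TRUE) # always return to the starting folder

  cl <- parallel::makeCluster(n_workers)
  # Stop the workers on exit; try() keeps this harmless if they are already stopped
  on.exit(try(parallel::stopCluster(cl), silent = TRUE), add = TRUE)

  zen4R::download_zenodo(
    doi = doi,
    parallel = TRUE,
    parallel_handler = parallel::parLapply,
    cl = cl,
    timeout = timeout # passed explicitly, since options(timeout) is overridden
  )
}
```

Using on.exit means the working directory is restored even if the download fails part way through. A call would look like download_zenodo_to("10.5281/zenodo.10081251", out_folder), with out_folder set to the folder created for the bundle.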
We extract maps from the tar archives in a sandbox folder:
And similarly for the vector data:
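A minimal sketch of the extraction step, using base R only. The helper name extract_archives and the file name pattern are assumptions, since the archive names are not shown on this page.

```r
# Sketch: extract every tar archive found in archive_dir into sandbox_dir
extract_archives <- function(archive_dir, sandbox_dir,
                             pattern = "\\.tar(\\.[a-z0-9]+)?$") {
  if (!dir.exists(sandbox_dir)) {
    dir.create(sandbox_dir, recursive = TRUE)
  }
  # The pattern matches .tar as well as compressed variants such as .tar.gz or .tar.bz2
  archives <- list.files(archive_dir, pattern = pattern, full.names = TRUE)
  for (archive in archives) {
    utils::untar(archive, exdir = sandbox_dir)
  }
  # Return the names of the files now present in the sandbox folder
  invisible(list.files(sandbox_dir))
}
```

The same helper can be run once for the raster archive and once for the vector archive, either into the same sandbox folder or into separate ones.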
Map details are stored in an XML file that is part of the map bundle downloaded from Zenodo.
Check the file was downloaded:
[1] TRUE
We’ll use the xml2 library to read the XML file:
We can query map details for a specific map:
{xml_node}
<Map efg_code="T1.1" map_code="T1.1.web.mix" map_version="v2.0" update="2020-11-08">
[1] <Functional_group>T1.1 Tropical/Subtropical lowland rainforests</Function ...
[2] <Description>Major and minor occurrences were initially identified using ...
[3] <Contributors>\n <map-contributor>JR Ferrer-Paris</map-contributor>\n < ...
[4] <Dataset-doi>10.5281/zenodo.5090450</Dataset-doi>
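A query like the one that produced this output can be sketched with an XPath expression. The toy document below only mimics the structure of the printed node: the root element name Maps is an assumption, and the second Dataset-doi value is a placeholder, not a real DOI.

```r
library(xml2)

# Toy document standing in for the real map details file
map_details <- read_xml('<Maps>
  <Map efg_code="T1.1" map_code="T1.1.web.mix" map_version="v2.0" update="2020-11-08">
    <Functional_group>T1.1 Tropical/Subtropical lowland rainforests</Functional_group>
    <Dataset-doi>10.5281/zenodo.5090450</Dataset-doi>
  </Map>
  <Map efg_code="T1.3" map_code="T1.3.WM.nwx" map_version="v1.0" update="2021-11-26">
    <Dataset-doi>placeholder-doi</Dataset-doi>
  </Map>
</Maps>')

# Select the first Map element whose efg_code attribute equals "T1.1"
map_info <- xml_find_first(map_details, "//Map[@efg_code='T1.1']")

# Read a single attribute from the selected node
xml_attr(map_info, "map_code")
```

With the real file, map_details would come from read_xml applied to the path of the downloaded XML file.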
The DOI for each individual map is stored in the Dataset-doi tag. We can run a query for all elements with this tag:
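A hedged sketch of such a query, run on a toy document so that it is self-contained. The root element name Maps and the second DOI value are placeholders; the object name all_dois matches the one used in the download loop on this page.

```r
library(xml2)

# Toy document: two Map entries, each with a Dataset-doi child
map_details <- read_xml('<Maps>
  <Map efg_code="T1.1"><Dataset-doi>10.5281/zenodo.5090450</Dataset-doi></Map>
  <Map efg_code="T1.3"><Dataset-doi>placeholder-doi</Dataset-doi></Map>
</Maps>')

# Find every Dataset-doi element, keep its text, and drop repeated values
all_dois <- map_details |>
  xml_find_all("//Dataset-doi") |>
  xml_text() |>
  unique()

all_dois
```

The call to unique is a precaution in case several maps point to the same record; it can be dropped if every map has its own DOI.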
Now we use this list of DOIs to download a copy of each of the repositories containing files for each ecosystem functional group:
output not shown
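oldwd <- getwd()
for (doi in all_dois) {
  # one output folder per DOI
  out_folder <- here::here("gisdata", "indicative-maps", doi)
  if (!dir.exists(out_folder))
    dir.create(out_folder, recursive = TRUE)
  # workaround: the path argument does not work with the parallel download
  setwd(out_folder)
  mycl <- makeCluster(4)
  download_zenodo(doi = doi,
    parallel = TRUE,
    parallel_handler = parLapply,
    cl = mycl,
    timeout = 5000 # options(timeout) is overridden, so pass it here
  )
  #stopCluster(cl = mycl)
  # go back to the starting folder before the next DOI
  setwd(oldwd)
}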
Now we have two copies of each map file, one in the sandbox folder (extracted from the map bundle), and one downloaded directly from the corresponding record.
Map version is described in the map attributes in the xml file:
efg_code map_code map_version update
"T1.3" "T1.3.WM.nwx" "v1.0" "2021-11-26"
The file extracted from the tar archive uses the same code for the file name:
efg_code <- map_info |> xml_attr("efg_code")
match_pattern <- sprintf("^%s", efg_code)
dir(workdir, pattern = match_pattern)
[1] "T1.3.WM.nwx_v1.0.json" "T1.3.WM.nwx_v1.0.tif"
The file downloaded from the specific repository uses the same name:
ds_doi <- map_info |> xml_find_first("Dataset-doi") |> xml_text()
downloaded_copy <- here::here("gisdata", "indicative-maps", ds_doi)
dir(downloaded_copy)
[1] "README.md" "T1_3_Trop_montane_rainforests.png"
[3] "T1_3_Trop_montane_rainforests.xml" "T1.3.WM.nwx_v1.0.json"
[5] "T1.3.WM.nwx_v1.0.tif"
Let’s double-check that this is the expected file name:
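One way to run this check is to rebuild the file name from the map attributes and look for it in the folder listing. The sketch below hard-codes the values printed earlier on this page instead of reading them from the XML file and the folders, so the variable names are illustrative only.

```r
# Attribute values as printed above for this map
map_code <- "T1.3.WM.nwx"
map_version <- "v1.0"

# File listing as printed above for the sandbox folder
extracted_files <- c("T1.3.WM.nwx_v1.0.json", "T1.3.WM.nwx_v1.0.tif")

# Expected raster file name: map code and version joined by an underscore
expected_file <- sprintf("%s_%s.tif", map_code, map_version)

# TRUE when the expected name is present in the listing
expected_file %in% extracted_files
```

With real data, map_code and map_version would come from xml_attr applied to map_info, and the listing from dir applied to each folder.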
We’ll use the terra library to read the raster file:
terra 1.7.55
Summary of the raster layer for the first copy:
class : SpatRaster
dimensions : 20039, 40076, 1 (nrow, ncol, nlyr)
resolution : 1000, 1000 (x, y)
extent : -20038500, 20037500, -10019500, 10019500 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=eqc +lat_ts=0 +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
source : T1.3.WM.nwx_v1.0.tif
name : T1.3.WM.nwx_v1.0
Summary of the raster layer for the second copy:
class : SpatRaster
dimensions : 20039, 40076, 1 (nrow, ncol, nlyr)
resolution : 1000, 1000 (x, y)
extent : -20038500, 20037500, -10019500, 10019500 (xmin, xmax, ymin, ymax)
coord. ref. : +proj=eqc +lat_ts=0 +lat_0=0 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs
source : T1.3.WM.nwx_v1.0.tif
name : T1.3.WM.nwx_v1.0
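The two summaries are identical, but a printed summary does not compare cell values. A hedged sketch of a more explicit check follows. It uses two small in-memory rasters as stand-ins for the two copies, so the object names and values are illustrative and not taken from the real files.

```r
library(terra)

# Two small rasters stand in for the two copies of the map;
# with real data these would come from rast() applied to each .tif path
r1 <- rast(nrows = 10, ncols = 20, xmin = 0, xmax = 20, ymin = 0, ymax = 10)
values(r1) <- rep(c(1, 2), length.out = ncell(r1))
r2 <- rast(r1)            # same geometry, no values yet
values(r2) <- values(r1)  # copy the cell values

# Do extent, dimensions, resolution and CRS agree?
same_geometry <- compareGeom(r1, r2, stopOnError = FALSE)

# Number of cells whose values differ (cells that are NA in either raster are ignored)
n_diff <- global(r1 != r2, "sum", na.rm = TRUE)$sum

same_geometry
n_diff
```

For a raster of the size shown above, the cell-by-cell comparison reads the whole layer and may take a while.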
The raster looks like this:
Compare this with the thumbnail downloaded from Zenodo:
We follow similar steps for the vector files.
We select the map for this functional group using the map_code and map_version from the map details xml:
We can read both directly from the respective folder:
But notice some differences in the summaries:
class : SpatVector
geometry : polygons
dimensions : 158540, 1 (geometries, attributes)
extent : -17770500, 18911500, -3298500, 4351500 (xmin, xmax, ymin, ymax)
source : T1.3.WM.nwx_v1.0.json
coord. ref. : lon/lat WGS 84 (EPSG:4326)
names : occurrence
type : <int>
values : 2
2
2