import pandas as pd
import os
import rasterio
from matplotlib import pyplot as plt
from PIL import Image
import numpy as np
import tarfile
import pyprojroot
import requests
import xml.etree.ElementTree as ET
Download data with Python
Set up required libraries
Set up work and output directories
We use library pyprojroot
to define relative paths from our repository root folder:
= pyprojroot.find_root(pyprojroot.has_dir(".git")) repodir
Profile information from OSF
Workbook with profile content for Ecosystem Functional Groups of the IUCN Global Ecosystem Typology (Level 3 units) available at https://osf.io/4dcea
We can read this workbook directly from the url using pandas
:
='https://osf.io/download/4dcea/'
profiles_url= pd.read_excel(profiles_url, sheet_name = 'Short description') EFG_profiles
Here is a peak of the downloaded file:
'name','short description', 'url']].head() EFG_profiles[[
name | short description | url | |
---|---|---|---|
0 | F1.5 Seasonal lowland rivers | These medium to large rivers in tropical, subt... | https://global-ecosystems.org/explore/groups/F1.5 |
1 | F1.4 Seasonal upland streams | Seasonal rainfall patterns in large parts of t... | https://global-ecosystems.org/explore/groups/F1.4 |
2 | F1.3 Freeze-thaw rivers and streams | In cold climates at high latitudes or altitude... | https://global-ecosystems.org/explore/groups/F1.3 |
3 | F1.2 Permanent lowland rivers | Lowland rivers with slow continuous flows up t... | https://global-ecosystems.org/explore/groups/F1.2 |
4 | F1.1 Permanent upland streams | These small rivers or streams in mountainous o... | https://global-ecosystems.org/explore/groups/F1.1 |
Indicative map data download from zenodo
Indicative maps are available from different Zenodo repositories.
Bundle of indicative maps
One repository holds the bundle of maps in compressed tar archives. This record id automatically resolves to the latest version:
= 'https://zenodo.org/api/records/3546513'
record_url = requests.get(record_url) r
Check the difference between the id
and conceptid
, and the two DOI (digital object identifier):
= r.json()
response print((response['id'],
'conceptrecid'],
response['doi'],
response['conceptdoi'])) response[
(10081251, '3546513', '10.5281/zenodo.10081251', '10.5281/zenodo.3546513')
We will create a folder for this direct download from zenodo for the latest version of the bundle:
= repodir / "gisdata" / "indicative-maps-bundle" / "latest"
datadir =True, exist_ok=True) datadir.mkdir(parents
The zenodo record has four different files:
'files'] response[
[{'id': 'fe0bbd7a-bc60-415f-a8f6-6848acd8c220',
'key': 'README.md',
'size': 19383,
'checksum': 'md5:e6cdcb4b46b261fecd5fce5c31fb5805',
'links': {'self': 'https://zenodo.org/api/records/10081251/files/README.md/content'}},
{'id': '437d31b3-6502-454a-9511-da1432b22d1f',
'key': 'all-maps-vector-geojson.tar.bz2',
'size': 853479699,
'checksum': 'md5:8490adfd971bc3a5b02cc72832e2df3e',
'links': {'self': 'https://zenodo.org/api/records/10081251/files/all-maps-vector-geojson.tar.bz2/content'}},
{'id': 'd2cb3226-e68c-452d-a6f8-a65d0686380b',
'key': 'all-maps-raster-geotiff.tar.bz2',
'size': 157378816,
'checksum': 'md5:245d09d535609d1c010eff52e9c8bfaa',
'links': {'self': 'https://zenodo.org/api/records/10081251/files/all-maps-raster-geotiff.tar.bz2/content'}},
{'id': 'a42361fc-e4bc-48d3-8c8a-0deafa722f80',
'key': 'map-details.xml',
'size': 114612,
'checksum': 'md5:5177a7a6fc5aa4981088093a68b2964e',
'links': {'self': 'https://zenodo.org/api/records/10081251/files/map-details.xml/content'}}]
We want to download three of them:
for i in range(len(response['files'])):
= response['files'][i]['links']['self']
zenodo_url = datadir / response['files'][i]['key']
outputfile if not outputfile.exists():
= requests.get(zenodo_url)
zdownload if zdownload.ok:
with open(outputfile, mode="wb") as file:
file.write(zdownload.content)
We extract maps from the tar archives in a sandbox folder:
= repodir / "sandbox" / "latest"
sandbox =True, exist_ok=True) sandbox.mkdir(parents
= datadir / 'all-maps-raster-geotiff.tar.bz2'
maps_tar with tarfile.open(maps_tar) as my_tarfile:
if hasattr(tarfile, 'data_filter'):
=sandbox,filter='data')
my_tarfile.extractall(pathelse:
# remove this when no longer needed, see https://docs.python.org/3/library/tarfile.html#tarfile-extraction-filter
print('Extracting may be unsafe; consider updating Python')
=sandbox) my_tarfile.extractall(path
Extracting may be unsafe; consider updating Python
Repositories for single EFG maps
Map details are stored in a xml file that is part of the map bundle zenodo download.
Check the file was downloaded:
= datadir / "map-details.xml"
map_details_file map_details_file.exists()
True
We’ll use the xml2
library to read the xml file
= ET.parse(map_details_file)
tree #maplist_root = tree.getroot()
We can query map details for an specific map:
=tree.find(".//Map[@efg_code='T1.1']")
nodefor child in node.iter():
print(ET.tostring(child))
b'<Map efg_code="T1.1" map_code="T1.1.web.mix" map_version="v2.0" update="2020-11-08">\n <Functional_group>T1.1 Tropical/Subtropical lowland rainforests</Functional_group>\n <Description>Major and minor occurrences were initially identified using consensus land-cover maps (Tuanmu et al. 2014) and then cropped to selected terrestrial ecoregions (Dinerstein et al. 2017; DAWE 2012) at 30 arc seconds spatial resolution. Ecoregions were selected if: i) their descriptions mentioned features consistent with those identified in the profile of the Ecosystem Functional Group; and ii) if their location was consistent with the ecological drivers described in the profile.</Description>\n <Contributors>\n <map-contributor>JR Ferrer-Paris</map-contributor>\n <map-contributor>DA Keith</map-contributor>\n </Contributors>\n <Dataset-doi>10.5281/zenodo.5090450</Dataset-doi>\n </Map>\n '
b'<Functional_group>T1.1 Tropical/Subtropical lowland rainforests</Functional_group>\n '
b'<Description>Major and minor occurrences were initially identified using consensus land-cover maps (Tuanmu et al. 2014) and then cropped to selected terrestrial ecoregions (Dinerstein et al. 2017; DAWE 2012) at 30 arc seconds spatial resolution. Ecoregions were selected if: i) their descriptions mentioned features consistent with those identified in the profile of the Ecosystem Functional Group; and ii) if their location was consistent with the ecological drivers described in the profile.</Description>\n '
b'<Contributors>\n <map-contributor>JR Ferrer-Paris</map-contributor>\n <map-contributor>DA Keith</map-contributor>\n </Contributors>\n '
b'<map-contributor>JR Ferrer-Paris</map-contributor>\n '
b'<map-contributor>DA Keith</map-contributor>\n '
b'<Dataset-doi>10.5281/zenodo.5090450</Dataset-doi>\n '
The field with doi for the individual map are stored in the Dataset-doi
tag. We can run a query for all elements containing this tag and summarise this using list comprehension:
= [doi.text for doi in tree.findall( ".//Dataset-doi")] all_dois
Now we use this list of DOIs to download a copy of each of the repositories containing files for each ecosystem functional group:
for doi in all_dois:
= repodir / "gisdata" / "indicative-maps" / doi
out_folder =True, exist_ok=True)
out_folder.mkdir(parentsif len(os.listdir(out_folder)) ==0:
# query doi to get the corresponding zenodo API url
= requests.get('https://doi.org/'+doi, allow_redirects=True)
r = urlparse(r.url)
parsed_uri = '{uri.scheme}://{uri.netloc}/api{uri.path}'.format(uri=parsed_uri)
zenodo_url
r.close()# now query the API
= requests.get(zenodo_url)
r = r.json()
response for i in range(len(response['files'])):
= response['files'][i]['links']['self']
zenodo_url = out_folder / response['files'][i]['key']
outputfile if not outputfile.exists():
= requests.get(zenodo_url)
zdownload if zdownload.ok:
with open(outputfile, mode="wb") as file:
file.write(zdownload.content)
Compare downloaded files
Now we have two copies of each map file, one in the sandbox folder (extracted from the map bundle), and one downloaded directly from the corresponding record.
Map code and version are described in the map attributes in the xml file:
=tree.find(".//Map[@efg_code='T1.3']")
node node.attrib
{'efg_code': 'T1.3',
'map_code': 'T1.3.WM.nwx',
'map_version': 'v1.0',
'update': '2021-11-26'}
The file extracted from the tar archive uses the same code and version combination for the file name:
for k in sandbox.glob("%s*.tif" % node.attrib['efg_code']):
print(k.name)
T1.3.WM.nwx_v1.0.tif
As well as the file downloaded from the specific repository:
= repodir / "gisdata" / "indicative-maps" / node.find('Dataset-doi').text
out_folder for k in out_folder.glob("%s*.tif" % node.attrib['efg_code']):
print(k.name)
T1.3.WM.nwx_v1.0.tif
Raster maps
We’ll use the rasterio
to read and plot the raster file
= rasterio.open(k)
src 1), cmap='pink')
plt.imshow(src.read( plt.show()
Compare this with the thumbnail downloaded from Zenodo:
= [x for x in out_folder.glob("*.png")]
pngfile = np.asarray(Image.open(pngfile[0]))
img plt.imshow(img)