How easy is it to find in-situ data on the Internet? This presentation uses an Arctic Observations Value Tree Analysis as an example of what can be found and how. The conclusions highlight what should be improved in data dissemination systems. The presentation was held at the GEO Data Technology Workshop in April 2019.
2. Content
• In-situ observations are very diverse
• Crowdsourced data consists of free text and photos
• Searching for Arctic in-situ data
• Value tree analysis of Arctic Observations
• How to develop search and dissemination?
• GEOSS actions to help use in-situ data better?
3. Sodankylä/Pallas supersite: observing long term
• First thermo-/barometer based records in 1856
• Weather station during the 1st IPY 1882/83
• Homogenized synoptic weather records 1908
• Upper air soundings 1949
• Solar radiation observations (1st IGY) 1957/58
• Radioactivity monitoring 1963
• Air quality observations 1970s
• Ozone and UV-observations 1988
• Stratospheric aerosol/humidity mid 1990s
• GAW station on Sammaltunturi 1994
• Micrometeorological tower 1999
• Weather radar at Luosto 2000
• Satellite data reception 2003
• FTIR station for TCCON 2009
• Sodankylä GAW station 2012
• Several ICOS stations 2016
4. Sensor data structure
• Devices have proprietary software and formats
• Processes from signals to data range from very simple to extremely complex
• Row/column naming in tabular data is arbitrary
• Semantic standards are sector specific
• Humans as sensors add extra complexity
• Crowdsourced observations from observers without much training are diverse
• Fake observations made as pranks need to be identified
• Quality comes from having many comparable observations
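Arbitrary column naming is one place where a small mapping layer helps before any semantics can be attached. The sketch below renames CSV headers to one shared vocabulary on ingestion; the alias table and the CF-style target names are illustrative assumptions, not an FMI or WMO mapping.

```python
import csv
import io

# Hypothetical alias table: header spellings seen in different loggers mapped
# to one CF-style variable name. Extend per data source; not a standard list.
ALIASES = {
    "air_temperature": {"t", "temp", "ta", "air temp"},
    "relative_humidity": {"rh", "hum", "humidity"},
    "air_pressure": {"p", "pres", "pressure", "baro"},
}

# Invert once so that every alias points to its canonical name.
LOOKUP = {alias: name for name, aliases in ALIASES.items() for alias in aliases}


def normalize_header(header: str) -> str:
    """Return the canonical name for a header, or the cleaned header if unknown."""
    key = header.strip().lower()
    return LOOKUP.get(key, key)


def read_normalized(text: str) -> list[dict[str, str]]:
    """Parse CSV text and return rows as dicts keyed by canonical column names."""
    reader = csv.reader(io.StringIO(text))
    header = [normalize_header(h) for h in next(reader)]
    return [dict(zip(header, row)) for row in reader]


rows = read_normalized("Temp,RH\n-12.5,81\n")
print(rows)  # [{'air_temperature': '-12.5', 'relative_humidity': '81'}]
```

Unknown headers pass through in lower case, so they can be flagged for a human curator or, as the conclusions suggest, a trained model.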
5. FMI Weather app (Android/iOS)
• Released 2012, new UI in 2016
• Global 10-day forecast, with an hourly forecast for the first two days
• Radar images (only in Finland)
• Warnings for 5 days (only for Finland)
• Over 500k downloads in Google Play
• Over 200k downloads in App Store
• Around 100k-200k daily users
=> The user base has potential from the start; no need to market downloads
Supports 3 languages: Finnish, Swedish and English
More information about the app:
https://en.ilmatieteenlaitos.fi/smartphones
7. Observation list of phenomena
Always something to observe and report to us:
- Select a phenomenon
- Answer a specific question
  - with preset options
  - free text box for more details
- Add photo(s)
- Pick the location from a map
  - or leave it geolocated
- Insert the date and time accuracy
  - within the last 10 min, 1 h…
- Check the summary and send
The first observation generates an Observer tag.
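The reporting steps above define a small record per observation. A minimal sketch of such a record, assuming the field names, the validation rules and the time-accuracy keys shown here; none of them are taken from the FMI app's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical time-accuracy options, modeled on "within the last 10 min, 1 h".
TIME_ACCURACY = {"10min": timedelta(minutes=10), "1h": timedelta(hours=1)}


@dataclass
class Observation:
    """One crowdsourced report; field names are illustrative assumptions."""
    phenomenon: str          # selected phenomenon, e.g. "snowfall"
    answer: str              # chosen preset option for the phenomenon's question
    lat: float               # picked from the map or taken from device geolocation
    lon: float
    reported_at: datetime    # timezone-aware time the report was made
    time_accuracy: str       # key into TIME_ACCURACY
    free_text: str = ""
    photos: list = field(default_factory=list)  # photo file names or URLs

    def __post_init__(self):
        if not (-90.0 <= self.lat <= 90.0 and -180.0 <= self.lon <= 180.0):
            raise ValueError("location must be WGS84 latitude/longitude in degrees")
        if self.time_accuracy not in TIME_ACCURACY:
            raise ValueError(f"unknown time accuracy: {self.time_accuracy!r}")
        if self.reported_at.tzinfo is None:
            raise ValueError("reported_at must be timezone-aware")

    def time_window(self):
        """Interval in which the phenomenon was seen: the chosen period before the report."""
        return self.reported_at - TIME_ACCURACY[self.time_accuracy], self.reported_at


obs = Observation("snowfall", "heavy", 67.4, 26.6,
                  datetime(2018, 4, 2, 12, 0, tzinfo=timezone.utc), "1h")
start, end = obs.time_window()
print(start.isoformat(), end.isoformat())
```

Storing the accuracy as an interval rather than a single timestamp keeps the reported uncertainty usable when the reports are later compared with instrument data.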
8. Daily observations
[Figure: daily observations between 10th July 2017 and 31st October 2018; y-axis 0 to 1800 observations per day]
Annotated events:
• Public launch, 10th July 2017
• Severe thunderstorm (Kiira), 12th August 2017
• Blizzard, 12th December 2017
• Blizzard, 2nd April 2018
• Midsummer (party) rains, June 2018
• Thunderstorms between July and September 2018
TOP 3 days:
• 1592 (11th July 2017)
• 1199 (12th August 2017)
• 1095 (12th July 2017)
BOTTOM 3 days:
• 13 (16th August 2018)
• 13 (2nd October 2018)
• 16 (14th October 2018)
Average: ~100 observations/day
Standard set of crowdsourced observation types globally?
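Summary figures like the top and bottom days and the average per day can be derived directly from raw report dates. A minimal sketch, assuming each report carries a date; the toy data in the usage is made up for illustration and is not the FMI data.

```python
from collections import Counter
from datetime import date, timedelta


def daily_stats(obs_dates, start, end, k=3):
    """Per-day counts from start to end (inclusive): top k days, bottom k days, mean per day.

    Days without any observation are included with a count of 0.
    Ties are broken by the earlier date.
    """
    counts = Counter(obs_dates)
    days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
    per_day = [(day, counts[day]) for day in days]
    top = sorted(per_day, key=lambda p: (-p[1], p[0]))[:k]
    bottom = sorted(per_day, key=lambda p: (p[1], p[0]))[:k]
    mean = sum(n for _, n in per_day) / len(per_day)
    return top, bottom, mean


# Toy data: three reports on one day, one on the next, none on the third.
toy = [date(2017, 7, 10)] * 3 + [date(2017, 7, 11)]
top, bottom, mean = daily_stats(toy, date(2017, 7, 10), date(2017, 7, 12), k=1)
print(top, bottom, mean)
```

Filling in the full date range matters for the bottom list: without it, days with zero reports would silently disappear from the ranking and inflate the average.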
10. Search for Arctic in-situ data
12 153 records -> analyzed the first 120 records:
• Many more metadata records than actual data
• Irrelevant: fire brigade, public transit, TV transmitter etc. "stations"; many records not working (GBIF)
BIG issue: bounding box is global while the data area is tiny!
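One way to spot the global-bounding-box problem automatically is to compute how much of the Earth a record's box claims to cover. A sketch on a spherical Earth (an approximation), assuming boxes are given as west, south, east, north in degrees; the 0.5 threshold is an arbitrary assumption.

```python
import math


def bbox_sphere_fraction(west, south, east, north):
    """Fraction of a spherical Earth's surface inside a lon/lat box given in degrees.

    Uses area = R^2 * dlon * (sin(north) - sin(south)), divided by 4 * pi * R^2.
    A box with east < west is treated as crossing the antimeridian.
    """
    if not -90.0 <= south <= north <= 90.0:
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    width = east - west if east >= west else east - west + 360.0
    if not 0.0 <= width <= 360.0:
        raise ValueError("longitude width must be between 0 and 360 degrees")
    lat_term = math.sin(math.radians(north)) - math.sin(math.radians(south))
    return math.radians(width) * lat_term / (4.0 * math.pi)


def looks_global(bbox, threshold=0.5):
    """Flag boxes claiming more than `threshold` of the globe (threshold is an assumption)."""
    return bbox_sphere_fraction(*bbox) > threshold


print(round(bbox_sphere_fraction(-180, -90, 180, 90), 6))  # 1.0 (full globe)
print(round(bbox_sphere_fraction(-180, 60, 180, 90), 4))   # 0.067 (cap north of 60°N)
```

The whole region north of 60°N is only about 6.7 % of the globe, so a single-station record whose box scores near 1.0 almost certainly carries a default extent rather than its real data area.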
11. https://oscar.wmo.int/
• Observing Systems Capability Analysis and Review Tool
- Information on both satellite missions and surface networks
- Weather, climate and marine coverage good
- Hydrology aspired
- Some gaps in networks, but generally global coverage
- Only metadata: actual observations are not linked!
12. OSCAR network info
• Station classes for met-ocean stations
• Useful for grouping in-situ inputs to the value tree
• Cost estimates for single stations can be extrapolated to networks
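Extrapolating from station to network amounts to summing per-class unit costs times station counts. A minimal sketch; the class names and numbers in the usage are placeholders, not FMI or OSCAR figures.

```python
def network_cost(unit_cost, station_counts):
    """Annual network cost: sum over station classes of (cost per station) * (number of stations).

    unit_cost: station class -> estimated annual cost per station (e.g. k€/year)
    station_counts: station class -> number of stations in the network (e.g. from OSCAR)
    """
    missing = set(station_counts) - set(unit_cost)
    if missing:
        raise KeyError(f"no unit cost estimate for classes: {sorted(missing)}")
    return sum(unit_cost[cls] * n for cls, n in station_counts.items())


# Placeholder values for illustration only.
unit_cost = {"synop": 20.0, "upper_air": 250.0}
counts = {"synop": 10, "upper_air": 2}
print(network_cost(unit_cost, counts))  # 700.0
```

Raising on a missing class, rather than silently skipping it, keeps gaps in the cost estimates visible instead of understating network totals.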
13. Observations and models as value for SBAs
• Observation platform and modeling costs per year were estimated by FMI network experts or from EU Copernicus program tenders, and multiplied by network information from WMO OSCAR or other sources
• The value in m€/year is carried along production chains to the service level
  - weather, marine, environmental, climate, research and Arctic Council working groups
• EO inputs cost 178 m€/year north of 60°N
  - 810 m€/year between 30°N and 60°N
• Analyze the value tree at http://arctic-obs.fmi.fi/
• Report at http://hdl.handle.net/10138/300768
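Carrying value along production chains can be sketched as propagation over a directed acyclic graph: each node adds its own annual cost and passes its accumulated value downstream in fixed shares. This is a minimal illustration under that assumption; the node names, shares and numbers in the usage are hypothetical and not the model behind the value tree tool.

```python
from graphlib import TopologicalSorter


def propagate_value(own_cost, edges):
    """Carry annual value (m€/year) along production chains towards services.

    own_cost: node -> cost added at that node (observations, models, services).
    edges: (upstream, downstream, share) triples; shares leaving one node must
           sum to 1 so that no value is created or lost on the way.
    Returns node -> accumulated value (own cost plus value received from upstream).
    """
    preds = {node: set() for node in own_cost}
    outgoing = {}
    for up, down, share in edges:
        preds.setdefault(up, set())
        preds.setdefault(down, set()).add(up)
        outgoing.setdefault(up, []).append((down, share))
    for up, links in outgoing.items():
        if abs(sum(share for _, share in links) - 1.0) > 1e-9:
            raise ValueError(f"shares leaving {up!r} do not sum to 1")
    value = {}
    # Predecessors come first, so a node has received all upstream value when visited.
    for node in TopologicalSorter(preds).static_order():
        value[node] = value.get(node, 0.0) + own_cost.get(node, 0.0)
        for down, share in outgoing.get(node, []):
            value[down] = value.get(down, 0.0) + share * value[node]
    return value


own = {"radiosondes": 10.0, "nwp_model": 30.0, "weather_service": 0.0, "marine_service": 0.0}
edges = [("radiosondes", "nwp_model", 1.0),
         ("nwp_model", "weather_service", 0.7),
         ("nwp_model", "marine_service", 0.3)]
print(propagate_value(own, edges))
```

Requiring outgoing shares to sum to 1 keeps the total conserved, so the value arriving at the service level equals the total cost put into the chain.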
14. Connections to SBAs
- Based on the international assessment framework for Arctic observations by SAON/IDA STPI
- 170 key objectives from the Societal Benefit Areas connected to services
- Helps to understand the value flow and the complex interrelations of different components
- The estimate should be developed into a live system based on real data flows
-> Internet discovery of data needs to be improved
15. Non-WMO in-situ sources
• Arctic Observing Viewer http://arcticobserving.utep.edu/aov_viewer/
  • US-centric, metadata only
  • Search listing not working, no export
• Arctic SDI https://geoportal.arctic-sdi.org/
  • Limited to maps; stations difficult to discern
• Spatineo harvesting (for Natural Resources Canada)
  • https://arctic-sdi-catalogue.spatineo-devops.com/ (needs login/password)
  • 6 398 services have 333 258 datasets
  • Harvested with ML into a GeoNetwork catalogue in early 2018; the 2019 update already had 350 000+ data sets, but was discontinued by NRC
• AGAIN: bounding box too often set to the full globe, even for very local data sets
• Latitudes not between -90…90, probably a coordinate system other than WGS84
• OGC services are a pain to harvest for search engines!
• The global Spatineo directory lists 105 924 spatial web services with 2 313 526 data sets
• Services with data fully within 60°N to 90°N: 11 003
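The harvesting problems listed above (default global boxes, latitudes outside -90…90, and counting services fully north of 60°N) can be checked with a few rules. A sketch assuming boxes arrive as west, south, east, north in degrees and are meant to be WGS84; the label names are hypothetical.

```python
from collections import Counter


def classify_bbox(west, south, east, north, arctic_south=60.0):
    """Label a harvested bounding box given in degrees, assuming WGS84 lon/lat."""
    if not (-90.0 <= south <= 90.0 and -90.0 <= north <= 90.0):
        return "invalid_latitude"      # possibly a projected CRS instead of WGS84
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        return "invalid_longitude"
    if south > north:
        return "invalid_order"
    if (west, south, east, north) == (-180.0, -90.0, 180.0, 90.0):
        return "global"                # often a default rather than the real extent
    if south >= arctic_south:
        return "arctic"                # data fully within 60°N to 90°N
    return "other"


def summarize(bboxes):
    """Count labels over many harvested boxes."""
    return Counter(classify_bbox(*b) for b in bboxes)


boxes = [(-180, -90, 180, 90),                  # default global box
         (26.5, 67.3, 26.7, 67.4),              # small Arctic site
         (350000, 7470000, 360000, 7480000),    # metres, not degrees
         (10, 45, 12, 47)]
print(summarize(boxes))
```

Separating "invalid" from "global" matters for harvesting: invalid boxes point to a coordinate system mix-up that a harvester could repair, while global boxes need the real extent from the data provider.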
16. Most data is not behind OGC services
• Demanding standards -> poor implementation
• Catalogue stats: data sets in the hundreds of millions, services ~100 000, but services do not have 100 000 sets on average
• Many services have poorly defined bounding boxes, time and other parameters
• -> Search engines crawl neither catalogues nor services
• Wide adoption is the key to success
• Services need to be simpler to implement
• Small in-situ data sets in particular are missing from the geospatial web: the hurdle is too high for a single site
• Usage of crowdsourced data sets resembles big data text and image analysis more than traditional EO analysis
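The mismatch in the catalogue statistics can be checked with simple averages from the figures quoted for the Spatineo harvests: hundreds of millions of data sets spread over about 100 000 services would require at least about 1 000 data sets per service, while the directory figures give far fewer.

```latex
\frac{10^{8}\ \text{data sets}}{10^{5}\ \text{services}} = 10^{3}\ \text{per service},
\qquad
\frac{2\,313\,526}{105\,924} \approx 21.8,
\qquad
\frac{333\,258}{6\,398} \approx 52.1
```

On these numbers, most catalogued data sets cannot be sitting behind OGC services.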
17. Search engine, catalogue, data access
• Federated search is an aspiration of the polar data communities (ADC and SCADM)
• Schema.org vocabulary for common semantics
• Data needs to be presented as HTML with embedded structured data (Microdata, RDFa or JSON-LD); implementation needs to spread
• SpatioTemporal Asset Catalog (STAC 0.6)
• New, simpler catalogue as JSON with variable, spatial and time definitions, extendable; in development!
• Born mainly from EO data, developing 5D model descriptions
• Catalogue easily turned into HTML -> can be caught by crawling search engines
• OGC Web Feature Service 3.0 proposal is also moving towards simpler and extendable services; pairs with STAC
• ULTIMATE GOAL: data to be found and accessed in small chunks!
• Subsetting by spatial and temporal extents
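A JSON catalogue entry can be turned into an HTML page that crawlers understand by embedding schema.org `Dataset` markup as JSON-LD. A minimal sketch: the record is only loosely modeled on a STAC Item (a GeoJSON Feature with a bbox), and its property names, including the `variables` field, are assumptions rather than STAC 0.6 fields. The schema.org `GeoShape` `box` is written as "south west north east" (lower corner first, latitude before longitude).

```python
import html
import json


def catalogue_entry(entry_id, bbox, start, end, variables):
    """A minimal catalogue record loosely modeled on a STAC Item (GeoJSON Feature).

    bbox is (west, south, east, north) in degrees; start/end are ISO 8601 strings.
    Property names are assumptions, not STAC core fields.
    """
    west, south, east, north = bbox
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "id": entry_id,
        "bbox": [west, south, east, north],
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"start_datetime": start, "end_datetime": end, "variables": variables},
        "links": [],
        "assets": {},
    }


def to_html(entry, name, description):
    """Render a catalogue entry as HTML with schema.org Dataset JSON-LD for crawlers."""
    west, south, east, north = entry["bbox"]
    props = entry["properties"]
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "variableMeasured": props["variables"],
        "temporalCoverage": f"{props['start_datetime']}/{props['end_datetime']}",
        "spatialCoverage": {
            "@type": "Place",
            "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        },
    }
    # Escape "</" so the JSON cannot close the surrounding <script> element early.
    json_ld = json.dumps(dataset, indent=2).replace("</", "<\\/")
    title = html.escape(name)
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{title}</title>\n"
        f'<script type="application/ld+json">\n{json_ld}\n</script>\n'
        "</head>\n<body>\n"
        f"<h1>{title}</h1>\n<p>{html.escape(description)}</p>\n"
        "</body>\n</html>\n"
    )


entry = catalogue_entry("example-station-temperature", (26.5, 67.3, 26.7, 67.4),
                        "2017-07-10", "2018-10-31", ["air_temperature"])
page = to_html(entry, "Example station air temperature", "Hypothetical example record.")
print(page)
```

The same JSON record stays the machine-readable source, and the HTML page is generated from it, so the crawlable view cannot drift from the catalogue.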
18. Conclusions
• From complex to simple
• In-situ data cataloguing and dissemination need services that expose data for wider use
• Trained machine learning for ingesting tabular data to add semantics and descriptions?
• Spatial and temporal info captured uniformly
• Curation to add a quality seal
• Dissemination as OGC services is challenging
• Simpler ways are in development
• Find and access multidimensional data chunks
• How far can a GEOSS hub advance this?
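Finding and accessing small chunks maps naturally onto the `bbox` and `datetime` query parameters of OGC API - Features, the successor of the WFS 3.0 proposal. A sketch assuming a hypothetical service at example.org; the `Obs` type and the in-memory `subset` function only illustrate what a server does with such a request and do not describe any particular implementation.

```python
from datetime import datetime, timezone
from typing import NamedTuple
from urllib.parse import urlencode


class Obs(NamedTuple):
    time: datetime
    lat: float
    lon: float
    value: float


def items_url(base, collection, bbox, start, end, limit=100):
    """Build an OGC API - Features items request for one spatio-temporal chunk.

    bbox is (west, south, east, north); start/end are RFC 3339 strings.
    The base URL and collection id are placeholders for a real service.
    """
    query = urlencode(
        [("bbox", ",".join(str(v) for v in bbox)),
         ("datetime", f"{start}/{end}"),
         ("limit", limit)],
        safe=",/:",
    )
    return f"{base.rstrip('/')}/collections/{collection}/items?{query}"


def subset(observations, bbox, start, end):
    """Keep points inside the box and the closed time interval (no antimeridian handling)."""
    west, south, east, north = bbox
    return [o for o in observations
            if start <= o.time <= end and south <= o.lat <= north and west <= o.lon <= east]


print(items_url("https://example.org/ogcapi/", "obs", (20.0, 60.0, 32.0, 70.0),
                "2018-04-01T00:00:00Z", "2018-04-03T00:00:00Z"))
```

Keeping the chunk definition to a box plus a time interval is what lets a small in-situ site expose its data without implementing a full OGC service stack.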