Attribution 4.0 (CC BY 4.0) https://creativecommons.org/licenses/by/4.0/
License information was derived automatically
FILS, Douglas, Ocean Leadership, 1201 New York Ave, NW, 4th Floor, Washington, DC 20005, SHEPHERD, Adam, Woods Hole Oceanographic Inst, 266 Woods Hole Road, Woods Hole, MA 02543-1050 and LINGERFELT, Eric, Earth Science Support Office, Boulder, CO 80304
The growth in the amount of geoscience data on the internet is paralleled by the need to address issues of data citation, access and reuse. Additionally, new research tools are driving a demand for machine-accessible data as part of researcher workflows. In the commercial sector, elements of this have been addressed by the use of the schema.org vocabulary encoded via JSON-LD and coupled with web publishing patterns. Adaptable publishing approaches are already in use by many data facilities as they work to address publishing and FAIR patterns. While these workflows often lack structured data elements, they could be leveraged to additionally implement schema.org-style publishing patterns.
This presentation will report on work that grew out of the EarthCube Council of Data Facilities, known as Project 418. Project 418 was a proof of concept funded by the EarthCube Science Support Office to explore the publishing of JSON-LD with schema.org and extensions by a set of NSF data facilities. The goal was to use this approach to describe dataset resources and to evaluate the use of this structured metadata to address discovery. Additionally, we will discuss growing interest by Google and others in leveraging this approach for dataset discovery.
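As a rough illustration of this publishing pattern, the following minimal Python sketch builds a schema.org Dataset description and wraps it in the script tag that is commonly embedded in a dataset landing page. The function names, the choice of properties, and all example values are assumptions made for illustration; they are not taken from Project 418 or from any facility's actual records.

```python
import json


def build_dataset_jsonld(name, description, landing_url, download_url, identifier):
    """Return a schema.org Dataset description as a JSON-LD dictionary.

    The selection of properties here is illustrative only.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": landing_url,
        "identifier": identifier,
        "distribution": {
            "@type": "DataDownload",
            "contentUrl": download_url,
        },
    }


def as_script_tag(doc):
    """Serialize the JSON-LD document inside an HTML script element."""
    body = json.dumps(doc, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical values; example.org and the DOI are placeholders.
    doc = build_dataset_jsonld(
        name="Example CTD profiles",
        description="Placeholder description of a dataset.",
        landing_url="https://example.org/dataset/1",
        download_url="https://example.org/dataset/1/data.csv",
        identifier="doi:10.0000/example",
    )
    print(as_script_tag(doc))
```

A landing page that carries such a block can be read by any harvester that understands JSON-LD, without the facility changing how the page is otherwise rendered.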
The work encompassed 47,650 datasets from 10 NSF-funded data facilities. Across these datasets, the harvester found 54,665 data download URLs, approximately 560,000 dataset variables and 35,000 unique identifiers (DOIs, IGSNs or ORCIDs).
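To make the kind of counting reported above concrete, here is a hedged sketch of how a harvester might pull JSON-LD out of a landing page and tally download URLs, variables and identifiers. It uses only the Python standard library. The class and function names are hypothetical, the reliance on the distribution, variableMeasured and identifier properties is an assumption, and this is not the harvester used in the project.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of application/ld+json script elements."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.documents.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # Skip malformed blocks rather than abort the harvest.


def as_list(value):
    """Normalize a JSON-LD value that may be absent, single, or a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def summarize(html):
    """Tally download URLs, variables and identifiers for Dataset records."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    summary = {"datasets": 0, "download_urls": [], "variables": 0, "identifiers": []}
    for doc in parser.documents:
        if not isinstance(doc, dict) or doc.get("@type") != "Dataset":
            continue
        summary["datasets"] += 1
        for dist in as_list(doc.get("distribution")):
            if isinstance(dist, dict) and "contentUrl" in dist:
                summary["download_urls"].append(dist["contentUrl"])
        summary["variables"] += len(as_list(doc.get("variableMeasured")))
        for ident in as_list(doc.get("identifier")):
            if isinstance(ident, str):
                summary["identifiers"].append(ident)
            elif isinstance(ident, dict) and "value" in ident:
                summary["identifiers"].append(ident["value"])
    return summary
```

Running summarize over each harvested page and merging the results would give totals of the same shape as those quoted above, although a production harvester would also need to handle sitemaps, @graph containers and duplicate records.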
The various publishing workflows used by the participating data facilities will be presented along with the harvesting and interface developments. We will also detail how resources were indexed into text, spatial and graph systems to support search interfaces, and outline future directions that build on this foundation.