Enabling Technologies and Standards

Last updated on 2023-11-16

“The original idea of the web was that it should be a collaborative space where you can communicate through sharing information.”
- Tim Berners-Lee

Overview

Questions

  • What are the benefits of creating or using standards and metadata schemas?
  • How do you find a suitable metadata standard or terminology for your research field online?

Objectives

  • Understand, read and write XML / JSON Schema.
  • Find metadata standards and terminologies relevant to your scientific domain.

A Brief History of the World Wide Web


In 1989, researchers Tim Berners-Lee and Robert Cailliau started their HyperText project, the WWW (World Wide Web, or simply the Web), at the CERN research center in Geneva, Switzerland. The Web was developed to “meet the demand for automated information-sharing between scientists in universities and institutes around the world”.[1]

The main building blocks of the World Wide Web are:

  • HTML (HyperText Markup Language) with “hyperlinks”
  • HTTP (HyperText Transfer Protocol)
  • URI (Uniform Resource Identifier)

HTML is the standard markup language for creating web pages. It describes a page's structure and tells the browser how to display its content.[2]


“a combination of natural language text with the computer’s capacity for interactive branching, or dynamic display …”
- Ted Nelson


HTTP is a simple protocol for communication between devices that store and provide resources (“servers”) and devices that want to access and update them (“clients”). It is still the main protocol used on the World Wide Web.
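HTTP messages are plain text. The following Python sketch (standard library only, no network access) builds a minimal GET request as a client would send it, and splits a hand-written, hypothetical server response into status code, headers and body. It deliberately ignores many details of real HTTP, such as case-insensitive header names or chunked bodies.

```python
def build_get_request(host, path):
    """Return the raw text of a minimal HTTP/1.1 GET request."""
    # Request line, then one header per line, each ending in CRLF;
    # an empty line marks the end of the headers.
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = dict(line.split(": ", 1) for line in header_lines)
    return int(code), headers, body


print(build_get_request("home.cern", "/science/computing/birth-web"), end="")

# A hypothetical server response, written by hand for illustration.
raw = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
code, headers, body = parse_response(raw)
print(code, headers["Content-Type"], body)  # 200 text/html <html>...</html>
```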

For URIs, see the chapter (Web) Location & Identifiers.

The arXiv preprint repository was one of the first adopters worldwide: it switched from email to HTTP for manuscript dissemination in 1991.[3] In 1992, Deutsches Elektronen-Synchrotron (DESY) in Hamburg connected a web server to the WWW.

Web repositories store and publish scholarly digital objects, such as publications and research data, together with their metadata records. In this way, they aim to improve the persistent findability and accessibility of research output. Repositories, in turn, are indexed in registries such as re3data and OpenDOAR, which makes the repositories themselves findable.

Metadata Schemas


Callout

A metadata schema is a template that precisely specifies which metadata elements are expected and how they should be structured.

XML Schemas (.xsd files) are written in XML and are used to specify and syntactically validate the structure of XML documents or (meta)data records.[4] You might encounter XML Schemas while looking for standards relevant to your field of research. However, XSD is used less frequently for modern standards.

JSON Schema is a vocabulary used to specify and syntactically validate the structure of JSON (meta)data records. We will focus on JSON Schema in the next hands-on task. Each JSON schema is itself a JSON object literal.[5]

A simple JSON schema could look like the one below. It declares:

  • the JSON Schema version, with $schema
  • a list (an array) of required (i.e. mandatory) properties, here containing a single property, "superhero"
  • one optional property, "power"
  • data type constraints for the property values (e.g. "type": "integer")

It also contains descriptions for human readers.

JSON

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "In real life you would add a meaningful description here.",
  "type": "object",
  "required": [
    "superhero"
  ],
  "properties": {
    "superhero": {
      "description": "A mandatory string property.",
      "type": "string"
    },
    "power": {
      "description": "An optional integer property.",
      "type": "integer"
    }
  }
}
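Because a JSON schema is itself a JSON object, any JSON parser can read it. The following Python sketch (standard library only) parses a copy of the superhero schema above, with the descriptions left out for brevity, and lists which declared properties are required and which are optional. The helper name `split_properties` is hypothetical and only serves this illustration.

```python
import json

# The superhero schema from above, without the "description" entries.
SCHEMA_TEXT = """
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "required": ["superhero"],
  "properties": {
    "superhero": {"type": "string"},
    "power": {"type": "integer"}
  }
}
"""


def split_properties(schema):
    """Return (required, optional) names among the properties an object schema declares."""
    declared = list(schema.get("properties", {}))
    required = [name for name in declared if name in schema.get("required", [])]
    optional = [name for name in declared if name not in required]
    return required, optional


schema = json.loads(SCHEMA_TEXT)  # a schema parses like any other JSON document
required, optional = split_properties(schema)
print("dialect: ", schema["$schema"])
print("required:", required)  # required: ['superhero']
print("optional:", optional)  # optional: ['power']
```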

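A JSON document is called a valid instance of a schema if it conforms to the definitions specified by that schema. Note that the JSON Schema required keyword holds a list of keys that must be present for a JSON object to be considered a valid instance of the schema. For example, the following record is a valid instance of the superhero schema above: it contains the required key "superhero" with a string value, and the optional "power" may be omitted.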

JSON

{
  "superhero": "I am just a string"
}
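To make "valid instance" more concrete, here is a minimal Python sketch of a validator. It is a simplified, hypothetical implementation that supports only the keywords used in this episode (type, enum, minLength, minimum, exclusiveMaximum, required, properties) and skips many rules of the specification; in practice you would use an established JSON Schema implementation, such as the `jsonschema` package for Python.

```python
# Simplified checks for JSON Schema "type" values. This is a sketch, not the
# full specification: e.g. JSON Schema accepts 1.0 as an integer, this does not.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "array": lambda v: isinstance(v, list),
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly.
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "null": lambda v: v is None,
}


def validate(instance, schema, path="$"):
    """Return a list of error messages; an empty list means `instance` is valid."""
    expected = schema.get("type")
    if expected is not None and not TYPE_CHECKS[expected](instance):
        # Stop here: the remaining checks assume the value has the right type.
        return [f"{path}: expected {expected}, got {type(instance).__name__}"]

    errors = []
    if "enum" in schema and instance not in schema["enum"]:
        errors.append(f"{path}: {instance!r} is not one of {schema['enum']}")
    if isinstance(instance, str) and len(instance) < schema.get("minLength", 0):
        errors.append(f"{path}: shorter than {schema['minLength']} characters")
    if TYPE_CHECKS["number"](instance):
        if "minimum" in schema and instance < schema["minimum"]:
            errors.append(f"{path}: {instance} is below minimum {schema['minimum']}")
        if "exclusiveMaximum" in schema and instance >= schema["exclusiveMaximum"]:
            errors.append(f"{path}: {instance} is not below {schema['exclusiveMaximum']}")
    if isinstance(instance, dict):
        for key in schema.get("required", []):
            if key not in instance:
                errors.append(f"{path}: missing required property {key!r}")
        for key, subschema in schema.get("properties", {}).items():
            if key in instance:  # optional properties are only checked when present
                errors.extend(validate(instance[key], subschema, f"{path}.{key}"))
    return errors


superhero_schema = {
    "type": "object",
    "required": ["superhero"],
    "properties": {
        "superhero": {"type": "string"},
        "power": {"type": "integer"},
    },
}

print(validate({"superhero": "I am just a string"}, superhero_schema))
# []
print(validate({"power": 9000}, superhero_schema))
# ["$: missing required property 'superhero'"]
print(validate({"superhero": "Batman", "power": "rich"}, superhero_schema))
# ['$.power: expected integer, got str']
```

Collecting all errors instead of stopping at the first one mirrors what a user-facing form typically needs: every problem in a record can be reported at once.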

Callout

The most challenging part of schema development can be to have everyone agree on the same expectations.

Challenge 4: JSON Schema

After a couple of researchers upload their JSON metadata records to the project repository, it becomes obvious that well-formed JSON metadata describing similar experiments can still be expressed in a myriad of ways.

Your collaboration decides to develop a metadata schema to standardize metadata records across the project. The agreed structure is encoded in a JSON Schema.
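The problem can be reproduced with any JSON parser. In this Python sketch, two hypothetical records, invented for illustration, describe the same kind of measurement. Both are well-formed JSON and parse without error, yet they use different keys and value formats, so a program cannot compare them without extra knowledge.

```python
import json

# Two hypothetical, well-formed records describing similar experiments.
record_a = '{"device": "iPhone X", "personHeight": 180}'
record_b = '{"recording": {"testDevice": "iphone-x"}, "testPerson": {"height": "1.80 m"}}'

a = json.loads(record_a)  # parses: the JSON is well-formed
b = json.loads(record_b)  # parses as well

# Well-formedness says nothing about structure: the top-level keys do not overlap.
print(sorted(a))        # ['device', 'personHeight']
print(sorted(b))        # ['recording', 'testPerson']
print(set(a) & set(b))  # set()
```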

Now it is your task to help with the subschema for experimental conditions!

In the following code block you see valid JSON metadata that specifies experimental conditions as agreed on in the project.

JSON

{
  "experimentalConditions": {
    "ride": {
      "rideType": "roller coaster",
      "rideName": "Flight of the Bat",
      "location": "Gotham City, New Jersey"
    },
    "testPerson": {
      "sex": "male",
      "height": 180
    },
    "recording": {
      "testDevice": "iPhone X",
      "testDeviceFixture": "left upper arm",
      "testApp": "Physics Toolbox Suite by Vieyra Software"
    }
  }
}

In the following code block you see the JSON schema draft for the experimental conditions. Your collaborators already modelled constraints and valid values for ride and testPerson.

Discuss and add constraints to the recording property.

  • testDevice, testDeviceFixture and testApp are mandatory properties for the recording object
  • testDevice value must be one of:
    • iPhone X
    • iPhone 6
    • iPhone 6s
    • other
  • testApp value must be one of:
    • Physics Toolbox Suite by Vieyra Software
    • Bunny Rollercoaster Physics App
  • testDeviceFixture value must be one of:
    • left upper arm
    • right upper arm
    • mouth fixture device
    • other

JSON

{
  "experimentalConditions": {
    "description": "A summary of the experimental conditions. Include sufficient detail to facilitate search and discovery.",
    "type": "object",
    "required": [
      "recording",
      "ride",
      "testPerson"
    ],
    "properties": {
      "recording": {
        /* Insert your schema here and delete this comment */
      },
      "ride": {
        "description": "Properties of the ride.",
        "type": "object",
        "required": [
          "rideType",
          "rideName"
        ],
        "properties": {
          "rideType": {
            "description": "Ride type.",
            "type": "string",
            "enum": [
              "roller coaster",
              "water slide",
              "bob sled"
            ]
          },
          "rideName": {
            "description": "Official name of the ride.",
            "type": "string",
            "minLength": 3
          },
          "location": {
            "description": "City and State in which the ride is located.",
            "type": "string",
            "minLength": 10
          }
        }
      },
      "testPerson": {
        "description": "Properties of the person carrying the test device.",
        "type": "object",
        "required": [
          "height",
          "sex"
        ],
        "properties": {
          "height": {
            "description": "Height of the test person in cm (SI unit of length).",
            "type": "number",
            "minimum": 120,
            "exclusiveMaximum": 220
          },
          "sex": {
            "description": "Sex of the test person.",
            "type": "string",
            "enum": [
              "female",
              "male",
              "non-binary",
              "not disclosed"
            ]
          }
        }
      }
    }
  }
}

Solution

JSON

{
  "experimentalConditions": {
    "description": "A summary of the experimental conditions. Include sufficient detail to facilitate search and discovery.",
    "type": "object",
    "required": [
      "recording",
      "ride",
      "testPerson"
    ],
    "properties": {
      "recording": {
        "description": "Properties of the recording.",
        "type": "object",
        "required": [
          "testApp",
          "testDevice",
          "testDeviceFixture"
        ],
        "properties": {
          "testApp": {
            "description": "Test app used.",
            "type": "string",
            "enum": [
              "Physics Toolbox Suite by Vieyra Software",
              "Bunny Rollercoaster Physics App"
            ]
          },
          "testAppVersion": {
            "description": "Version of the test app (free text). Semantic versioning (MAJOR.MINOR.PATCH) is preferred.",
            "type": "string",
            "minLength": 1
          },
          "testDevice": {
            "description": "Test device used.",
            "type": "string",
            "enum": [
              "iPhone X",
              "iPhone 6",
              "iPhone 6s",
              "other"
            ]
          },
          "testDeviceFixture": {
            "description": "Test device fixture.",
            "type": "string",
            "enum": [
              "left upper arm",
              "right upper arm",
              "mouth fixture device",
              "other"
            ]
          }
        }
      },
      "ride": {
        "description": "Properties of the ride.",
        "type": "object",
        "required": [
          "rideType",
          "rideName"
        ],
        "properties": {
          "rideType": {
            "description": "Ride type.",
            "type": "string",
            "enum": [
              "roller coaster",
              "water slide",
              "bob sled"
            ]
          },
          "rideName": {
            "description": "Official name of the ride.",
            "type": "string",
            "minLength": 3
          },
          "location": {
            "description": "City and State in which the ride is located.",
            "type": "string",
            "minLength": 10
          }
        }
      },
      "testPerson": {
        "description": "Properties of the person carrying the test device.",
        "type": "object",
        "required": [
          "height",
          "sex"
        ],
        "properties": {
          "height": {
            "description": "Height of the test person in cm (SI unit of length).",
            "type": "number",
            "minimum": 120,
            "exclusiveMaximum": 220
          },
          "sex": {
            "description": "Sex of the test person.",
            "type": "string",
            "enum": [
              "female",
              "male",
              "non-binary",
              "not disclosed"
            ]
          }
        }
      }
    }
  }
}

TASK 5: Form input and validation with JSON schema

Congratulations, you finished your metadata schema! Now, collecting interoperable metadata will be a lot easier in your collaboration.

We must admit: writing a valid JSON metadata record for every experiment you perform is tedious and time-consuming. But now that you have a JSON Schema at hand, things get a lot easier! The project sets up a user-friendly HTML form for entering JSON metadata.
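Form generators of this kind walk through the schema and pick an input widget for each property. The Python sketch below illustrates the idea with a hypothetical, much simplified mapping: an enum becomes a drop-down list, number and integer become numeric fields, boolean becomes a checkbox, an object becomes a group of nested fields, and anything else becomes a text field. It is not how react-jsonschema-form is actually implemented; the function name `describe_form` is made up for this example.

```python
def describe_form(schema, name="root", required=False, indent=0):
    """Return lines describing the (hypothetical) widget chosen for each property."""
    pad = "  " * indent
    star = " *" if required else ""  # mark required fields, as many forms do
    kind = schema.get("type")
    if kind == "object":
        lines = [f"{pad}{name}{star}: group"]
        needed = set(schema.get("required", []))
        for key, sub in schema.get("properties", {}).items():
            lines += describe_form(sub, key, key in needed, indent + 1)
        return lines
    if "enum" in schema:
        widget = "drop-down list of " + ", ".join(map(str, schema["enum"]))
    elif kind in ("number", "integer"):
        widget = "numeric field"
    elif kind == "boolean":
        widget = "checkbox"
    else:
        widget = "text field"
    return [f"{pad}{name}{star}: {widget}"]


# The testPerson subschema from the challenge, without descriptions.
test_person = {
    "type": "object",
    "required": ["height", "sex"],
    "properties": {
        "height": {"type": "number", "minimum": 120, "exclusiveMaximum": 220},
        "sex": {"type": "string", "enum": ["female", "male", "non-binary", "not disclosed"]},
    },
}
print("\n".join(describe_form(test_person, "testPerson")))
# testPerson: group
#   height *: numeric field
#   sex *: drop-down list of female, male, non-binary, not disclosed
```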

Let’s try this:

  • Download the full JSON schema here: exampleDataObject_Schema.json
  • Inspect the JSON schema briefly.
  • In your browser, go to the react-jsonschema-form playground.
  • Delete the sample content in the JSONSchema and formData boxes.
  • Copy and paste the full schema into the JSONSchema box.
  • Check whether the Chuck Norris sample properties have reappeared in formData; he can be tough 😄
  • Inspect the form interface thoroughly.
  • Optional: Copy the final JSON object literal from formData into a separate text file and save it as exampleDataObject.json.

Note that the JSON Schema used for this demo lacks the recommended $schema keyword, because the playground unfortunately rejects it. You should always follow best practices when writing a schema, but sometimes adaptations are needed to make a schema work in a particular tool.

Plenary result discussion

  • How does the browser display lists of predefined values (specified as enum in the schema)?
  • How are arrays and objects interpreted in the form interface?
  • What happens if you enter an invalid value (e.g. a string for the test person's height)?
  • What happens if you enter a nonsense value (e.g. a nonsense string for rideName)?
  • How does the web service respond if you click Submit without filling in all the “required” fields?

Metadata Standards


Callout

A metadata schema can become a standard through a governing authority or through common adoption.

Researchers, librarians and web technologists drafted the Dublin Core, a set of 15 metadata elements for the web modelled on library card catalogs, in 1995 at a meeting in Dublin, Ohio (USA).[6]

Dublin Core and its extensions are widely used and referenced today. The Dublin Core Metadata Initiative (DCMI) describes itself as working openly, with a paid-membership model.

The 15 generic Dublin Core metadata elements have been formally standardized for cross-domain resource description, e.g. in ISO 15836-1:2017.[7]

Depiction of the 15 Dublin Core Elements: Creator, Contributor, Publisher, Title, Date, Language, Format, Subject, Description, Identifier, Relation, Source, Type, Coverage, Rights
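As an illustration, the following Python sketch uses the standard library's `xml.etree.ElementTree` to write a small record with five of the 15 elements. It uses the Dublin Core element namespace `http://purl.org/dc/elements/1.1/` inside the `oai_dc:dc` container element that OAI-PMH uses for Dublin Core records. The described resource and all values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"                  # Dublin Core elements
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"   # oai_dc container
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def dublin_core_record(fields):
    """Build an oai_dc:dc element; `fields` maps DC element names to lists of values."""
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for element, values in fields.items():
        for value in values:  # every Dublin Core element is optional and repeatable
            ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return root


# Hypothetical values for a dataset from the roller coaster experiments.
record = dublin_core_record({
    "title": ["Acceleration data recorded on the Flight of the Bat roller coaster"],
    "creator": ["Doe, Jane"],
    "date": ["2023-01-01"],
    "type": ["Dataset"],
    "format": ["application/json"],
})
print(ET.tostring(record, encoding="unicode"))
```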

Many scholarly repositories expose a standardized application programming interface (API) for harvesting Dublin Core metadata, as specified in the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) version 2.0.
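Harvesting works through plain HTTP GET requests. The Python sketch below builds the two kinds of OAI-PMH ListRecords requests a harvester sends: the first asks for records in the oai_dc (Dublin Core) format, and follow-up requests pass only the resumptionToken returned by the repository. The base URL and the token are hypothetical placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://repository.example.org/oai"  # hypothetical repository endpoint


def list_records_url(base_url, resumption_token=None, metadata_prefix="oai_dc"):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is None:
        # First request: ask for records in unqualified Dublin Core (oai_dc).
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    else:
        # Follow-up requests: resumptionToken is an exclusive argument,
        # so it is sent together with the verb only.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    return f"{base_url}?{urlencode(params)}"


print(list_records_url(BASE_URL))
# https://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
print(list_records_url(BASE_URL, resumption_token="abc/123"))
# https://repository.example.org/oai?verb=ListRecords&resumptionToken=abc%2F123
```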

Challenge 6: Domain specific metadata standards

  1. Open one of these metadata standard registries in your preferred browser:
  2. Search for a metadata schema, standard or vocabulary relevant to your research domain.
  3. Inspect the information provided.

Key Points

  • The WWW was developed within and for the scientific community to connect researchers worldwide and enable information sharing.
  • Metadata schemas serve as templates and validation rules for metadata records.
  • JSON Schemas are special JSON object literals that describe what other JSON documents must look like.
  • Well-established metadata schemas have the potential to become a (community) standard

  1. The birth of the Web. CERN. (2023, August 11). https://home.cern/science/computing/birth-web

  2. XML Schema Tutorial. (C) 1999-2022. Refsnes Data, W3Schools. https://www.w3schools.com/xml/schema_intro.asp

  3. The arXiv of the future will not look like the arXiv. (n.d.). Ar5iv. https://ar5iv.labs.arxiv.org/html/1709.07020

  4. XML Schema Tutorial. (C) 1999-2022. Refsnes Data, W3Schools. https://www.w3schools.com/xml/schema_intro.asp

  5. Understanding JSON Schema. The basics. (C) 2013-2016 Michael Droettboom, Space Telescope Science Institute. Last updated on Feb 07, 2022. https://json-schema.org/understanding-json-schema/basics.html

  6. Metadata Basics. (2018, December 15). https://www.dublincore.org/resources/metadata-basics/

  7. ISO 15836-1:2017. (n.d.). ISO. https://www.iso.org/standard/71339.html