Enabling Technologies and Standards

Last updated on 2023-11-16

Estimated time: 120 minutes

Quote by Tim Berners-Lee saying 'The original idea of the web was that it should be a collaborative space where you can communicate through sharing information.'

Overview

Questions

  • What are the benefits of creating or using standards and metadata schemas?
  • How do you find a suitable metadata standard or terminology for your research field online?

Objectives

  • Understand, read and write XML / JSON Schema.
  • Find metadata standards and terminologies relevant to your scientific domain.

A Brief History of the World Wide Web


Slide set: BriefHistoryOfTheWeb.pdf

Time estimate: 10 min

  1. Before we dive deeper into the topic of metadata and how we can make it more accessible, we would like to travel a few decades back and take a quick look at the technical advancements that paved the way for us to share data across the globe instantaneously: the internet and the World Wide Web.
  2. Let’s travel back to the 1960s, when computers still looked like this (point to the bottom right corner). Computers became increasingly common in work environments, especially universities, research institutes and governmental agencies.
  3. With these advancements, the need to share data between computers arose and local network technologies were developed. Within the institutions, the computers were linked with cables…
  4. and communication protocols were developed that allowed the transfer of data between computers in the local area network and controlled the means of data distribution.
  5. This was a development that naturally occurred on many sites simultaneously and soon the scientific drive to collaborate was aiming to use this new technology to share data quickly across inter-institutional boundaries…
  6. However, there was a major issue. The individual networks were using different communication protocols that were largely incompatible with each other.
  7. To solve this problem, two researchers at the US Defense Advanced Research Projects Agency, Vint Cerf & Bob Kahn, worked hard on a solution. In 1974 they introduced the Transmission Control Protocol / Internet Protocol – TCP / IP, a term you have certainly heard at some point.
  8. This Internet Protocol Suite finally enabled communication between networks. It was soon declared the standard for all military computer networking and adopted by research organizations as well as large communication companies. The internet, the network of networks, was born, and Vint Cerf and Bob Kahn became known as the fathers of the internet. That TCP/IP developed into a global standard for communication between networks can be attributed…
  9. to another decision based in academia. In 1989, the University of California, Berkeley, decided to share their TCP/IP code with the world and placed it in the public domain, free for anyone to use.
  10. And today, computer networks all over the world are connected with each other over uncountable miles of cables…
  11. Crossing the oceans.
  12. Now that communication between networks was secured, the way was free for applications running on top of the Internet Protocol Suite. And scientists love to communicate, right? So again, two researchers at the University of California – Jon Postel and Suzanne Sluizer – developed the Mail Transfer Protocol. No more letters, no more making appointments for phone calls, but sharing information instantaneously by email.
  13. Meanwhile in Europe, two computer scientists shared the vision to implement an application for the internet that “serves as a collaborative space where you can communicate through sharing information”.
  14. These two were Tim Berners-Lee and Robert Cailliau…
  15. at the CERN research center in Geneva, Switzerland. They wrote a joint proposal for the “World Wide Web”, which aimed to “meet the demand for automated information-sharing between scientists in universities and institutes around the world”. So the World Wide Web was, from its beginning, meant to facilitate scientific data exchange.
  16. So to distinguish the World Wide Web from the internet: The internet is the global network connecting local networks with each other and allowing them to communicate. The World Wide Web is a service that runs on top of the internet and allows users to search, retrieve and share data.
  17. And this was all established in a very familiar, academic environment. The name of this data sharing application was discussed at lunch in the cafeteria.
  18. Based on the early protocols, the building blocks of today’s World Wide Web were developed: HTML, which is still the standard markup language for documents designed to be interpreted by and displayed in web browsers; the HyperText Transfer Protocol, an application layer protocol that enables client-server communication on the Web; and URIs, Uniform Resource Identifiers, which uniquely identify resources on the web and ensure their accessibility and retrievability. We will come back to these technologies…
  19. tomorrow, when we discuss how data is transferred on the World Wide Web and the importance of web identifiers.
  20. Coming back to this 1989 picture, we haven’t talked about this young woman sitting next to Tim Berners-Lee.
  21. Her name is Nicola Pellow and back in 1990 she was still studying Mathematics and Information Science at Leicester Polytechnic. She went to Switzerland for an internship in the research group of Tim Berners-Lee and Robert Cailliau and as a student intern…
  22. she developed the first operating-system-independent web browser, the Line Mode Browser, making the World Wide Web accessible on many computers.
  23. Within the scientific community, the World Wide Web was an immediate success. As early as 1991, the arXiv preprint repository switched from email dissemination of manuscripts to HTTP. The picture on the right gives an insight into how scientific articles were distributed before the World Wide Web. You couldn’t simply click on any article that you found on Google Scholar, download it (provided it isn’t hidden behind a paywall) and decide later whether it was helpful or not. You needed to know exactly which article you wanted to study, contact the archiving repository, and wait for an actual human being to retrieve the article, scan it and send it back to you. (To reactivate the learners, you can ask them at this point who has a folder on their drive for “unread articles” or “articles to read”.) The first German institute to connect a web server to the Web was the Deutsches Elektronen-Synchrotron DESY. (The lesson was created within the Helmholtz Association in Germany, which makes this a nice fun fact for the original audience. Feel free to substitute another institution relevant to your learners.)
  24. Today the web hosts so many repositories for research articles and data (global, local, very general or highly domain-specific) that it becomes hard to find the right repository for the resource you want to retrieve or upload. Needless to say, you can find several repositories for repositories online. To sum it up, the World Wide Web was originally created to share data between scientists and scientific institutions, and it succeeded so well that today it is barely imaginable to conduct research without the Web.
  25. The take-away message of this brief history lesson is that the World Wide Web was created by scientists for scientists to promote data sharing and collaboration in a scholarly environment across physical borders. And the decision of CERN to put all the components of Web software in the public domain paved the way for the web as we know it today.

In 1989 researchers Tim Berners-Lee and Robert Cailliau started their HyperText project called the WWW (World-Wide Web, short Web) at the CERN research center in Geneva, Switzerland. The Web was developed to “meet the demand for automated information-sharing between scientists in universities and institutes around the world”.1

The main building blocks of the World Wide Web are:

  • HTML (HyperText Markup Language) with “hyperlinks”
  • HTTP (HyperText Transfer Protocol)
  • URI (Uniform Resource Identifier)

HTML is the standard markup language to create Web pages. It describes the Web page’s structure and tells the browser how to display the content.2


“a combination of natural language text with the computer’s capacity for interactive branching, or dynamic display …”
- Ted Nelson


HTTP is a simple protocol for communication between devices that store and provide resources (“server”) and devices that want to access and update them (“clients”). It is still the main protocol used on the World Wide Web.

For URI see chapter (Web) Location & Identifiers.

In 1992 Deutsches Elektronen-Synchrotron DESY in Hamburg connected a web server to the WWW. One of the first adopters worldwide was the arXiv preprint repository. They switched from email to HTTP for manuscript dissemination in 1991.3

So-called web repositories store and publish (scholarly) digital objects – like paper publications and research data – and their metadata records. This way, they aim to improve the persistent findability and accessibility of research output. Repositories in turn are indexed for findability in registry services like re3data and OpenDOAR.

Metadata Schemas


Callout

A metadata schema is a template that precisely specifies the expected metadata elements and how they should be structured.

Slide set: MetadataSchemas.pdf

Time estimate: 10 min

  1. Even though the World Wide Web has provided the means of easily sharing information with individuals and broad communities, we experienced yesterday, and experience in our daily life as researchers, that simply sharing data does not guarantee the reusability of this information. And we have all experienced the yearning for guidelines on how this information should be provided. So we will now introduce you to metadata schemas. (You can also show a slide with the JSON object literal results from day 1 at this point or learner quotes from the result discussion of Challenge 3.)
  2. With a metadata schema, it is possible to express requirements on how a metadata record should be structured and even enforce this structure.
  3. An example of validated and enforced data submission we are probably all familiar with is the classic customer information form that you need to fill out when you are placing an order in an online shop.
  4. We know these asterisks tell us that these data entries are required. Some entries only allow string values, like the name, or specific formats, like e-mail addresses and will throw an error as soon as you diverge from the expected data type or format. Or you have a fixed subset of values, which you can pick from a drop-down list.
  5. If you press submit, the data in this form will be validated and, if it passes this validation, stored or used in other applications. This means that it will be stored and transmitted in some suitable common data format such as JSON.
  6. The names of the fields specify the keys - or properties of the JSON object…
  7. and below you can find a description of the data values that are supposed to be entered.
  8. For metadata records in general, these conventions or constraints can be set by metadata schemas. These schemas are defined in the same data format as the expected metadata record, such as XML or JSON. Again, this allows for parsing and automated validation.
  9. This means that XML Schemas are written in XML and JSON Schemas are written in JSON. So far, we have written JSON object literals and we will be further focusing on JSON…
  10. so we will take a deeper look at JSON Schema.
  11. On the right, you see a simple JSON Schema, which is a JSON object that follows the JSON Schema standard. This is indicated by the “$schema”-key, which is used as a version identifier and points to the location of the schema specification. The value of this keyword must be a URI. We will tackle the topic of referencing in metadata records later in this lesson.
  12. The schema specifies which keys - or properties - require a data value. In this case…
  13. The superhero property is required and expects an entry of data type: string.
  14. A second property is defined: power. However, power is not listed in the required properties, so it is optional.
  15. Coming back to the data types: This schema specifies a JSON object in which the superhero-value should be a string and the power-value needs to be of data type integer. And for the human reader, some meaningful descriptions are added to describe the individual properties.
  16. Based on this example schema, the object on top would be valid. A string value is assigned to the required property. The bottom object, however, does not conform to the schema and would throw an error, as the data type of the value is not a string. It is important to know that schema validation only checks for syntactical validity. In this case, it only checks whether the object contains the required property and the corresponding value, and whether the value conforms to the expected data type. To prevent nonsensical values, a meaningful property description can be of great help.
  17. (Allow some time for questions.)
  18. Before we write a JSON schema for our roller coaster experiment data, let’s get back to the customer information form and design a JSON schema together that enforces the data entries for this form.

The concept of a schema can be overwhelming for learners with little to no prior knowledge. To prepare the learners for Challenge 4, we recommend including an interactive live coding session. The following instructions and suggestions are based on the narrative we follow in our course setup.

Time: 8 min

Instructor material:

Creator’s recommendation:

  • Introduce a customer information form as an example for data validation / enforcement in your lecture.
  • Open the image of the customer information form and an empty JSON file in split-screen view.

Screenshot of the recommended live coding session.

  • For live coding, we recommend using an IDE with JSON syntax highlighting (e.g. VS Code)

Narrative / Teaching script:

  • start off with 2 indentations
  • demonstrate specifying the first property in the form "Full Name" by entering a meaningful "description" and "type": "string"
  • encourage the learners to shout out the values (and keys), that specify the subsequent form properties.
  • highlight the following aspects:
    • "Country/Region": to restrict a value to a fixed set of values, the keyword "enum" is used. Fixed values are specified in an array of unique elements.
    • "Number of super powers": introduce "type": "number"
    • "E-mail": introduce the "format"-keyword. "format": "email" validates against the correct formatting of an e-mail address (someString - @-sign - domain name). However, it does not check whether the e-mail address exists.
    • "Date of birth": specify date-format.
  • The specified keywords represent the "properties" of the customerInformation-object
  • the "properties"-key of a JSON schema object takes a value of data type object -> enclose the specified field-objects in curly brackets
  • collaboratively define the JSON schema keys "title", "description", and "type"
  • highlight, that a JSON schema is a JSON object literal -> enclose the schema-object in curly brackets
  • Finally, draw the learners’ attention to the mandatory fields in the customer information form and introduce the "required"-keyword
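
The outcome of such a live coding session could resemble the sketch below, written here as a Python dictionary and serialized to JSON with the standard library. The field names follow the teaching script. The enum values for "Country/Region", the choice of required fields, and the title and description texts are assumptions made for illustration, since they depend on the form you show.

```python
import json

# Sketch of the customer information schema built in the live coding session.
# Enum values, required fields, title and descriptions are illustrative assumptions.
customer_information_schema = {
    "title": "customerInformation",
    "description": "Data collected by a customer information form.",
    "type": "object",
    "required": ["Full Name", "E-mail"],  # assumed mandatory fields
    "properties": {
        "Full Name": {
            "description": "First and last name of the customer.",
            "type": "string",
        },
        "Country/Region": {
            "description": "Country or region of residence.",
            "type": "string",
            "enum": ["Germany", "Switzerland", "other"],  # assumed drop-down values
        },
        "Number of super powers": {
            "description": "How many super powers the customer has.",
            "type": "number",
        },
        "E-mail": {
            "description": "Contact e-mail address.",
            "type": "string",
            "format": "email",
        },
        "Date of birth": {
            "description": "Date of birth as YYYY-MM-DD.",
            "type": "string",
            "format": "date",
        },
    },
}

# A JSON Schema is itself a JSON object literal, so it serializes like any other.
schema_text = json.dumps(customer_information_schema, indent=2)
print(schema_text)
```

Note that "required" sits next to "properties", not inside it. Whether "format" is enforced or only treated as an annotation depends on the validator and its configuration.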

XML Schemas (.xsd) are written in XML and used to specify & syntactically validate the structure of XML documents or (meta)data records.4 You might encounter XML Schemas while looking for certain standards relevant to your field of research. However, XSD is less frequently used for modern standards.

The JSON Schema Vocabulary is used to specify & syntactically validate the structure of JSON (meta)data records. We will focus on JSON Schema in our next hands-on task. Each JSON schema is a JSON object literal by itself.5

A simple JSON schema could look like the one below. It declares:

  • JSON Schema version with $schema
  • a list (an array) of required (i.e. mandatory) properties with one required property (i.e. "superhero")
  • one optional property (i.e. "power")
  • data type constraints for record values (e.g. "type": "integer")

There are also some descriptions added for the human reader.

JSON

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "description": "In real life you would add a meaningful description here.",
  "type": "object",
  "required": [
    "superhero"
  ],
  "properties": {
    "superhero": {
      "description": "A mandatory string property.",
      "type": "string"
    },
    "power": {
      "description": "An optional numeric property.",
      "type": "integer"
    }
  }
}

A JSON entity is syntactically valid and is called an instance of a schema if it conforms to the definition specified by the JSON schema. Note that the JSON Schema required keyword holds a list of keys that must be present for a JSON object to be considered a valid instance of this schema.

JSON

{
  "superhero": "I am just a string"
}
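
To make the idea of an instance more concrete, the following minimal Python sketch checks only the required and type keywords of the superhero schema (trimmed here to the keywords the sketch uses). The helper `check_instance` is hypothetical and written for illustration. It is not a JSON Schema implementation, and real projects would normally rely on a full validator library.

```python
import json

# The superhero schema from above, reduced to the keywords this sketch understands.
SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["superhero"],
  "properties": {
    "superhero": {"type": "string"},
    "power": {"type": "integer"}
  }
}
""")

# Mapping from JSON Schema type names to Python types (only the two used here).
PYTHON_TYPES = {"string": str, "integer": int}


def check_instance(instance, schema):
    """Return a list of error messages; an empty list means 'valid'.

    Hypothetical helper: it understands only 'required' and 'type'.
    """
    errors = []
    for key in schema.get("required", []):
        if key not in instance:
            errors.append(f"missing required property: {key}")
    for key, rules in schema.get("properties", {}).items():
        if key not in instance:
            continue  # optional property that was left out
        expected = PYTHON_TYPES[rules["type"]]
        value = instance[key]
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{key} must be of type {rules['type']}")
    return errors


valid = json.loads('{"superhero": "I am just a string"}')
invalid = json.loads('{"superhero": 42}')
missing = json.loads('{"power": 9000}')

print(check_instance(valid, SCHEMA))    # no errors
print(check_instance(invalid, SCHEMA))  # wrong data type
print(check_instance(missing, SCHEMA))  # required key absent
```

The check is purely syntactic: any string, meaningful or not, passes as a value for "superhero".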

Callout

The most challenging part of schema development can be to have everyone agree on the same expectations.

Slide set: Challenge45Introduction.pdf

Time estimate: 5 min

  1. Let’s briefly recap on the most important characteristics of a JSON schema.
  • each of the properties in the schema is described in a separate object.
  • the data type of a property is specified by the key type.
  • if you want to further specify the format of a data entry, use the key format.
  • mandatory data entries are given by the key required in an array, and not inside the properties section, but parallel to it
  • for controlled lists, you specify the given values in an array and the key enum.
  2. The complete specifications for JSON schema can be found on json-schema.org.
  3. If you develop a schema that is supposed to be used by a group of people, the most challenging part can be to have everyone agree on the same expectations.
  4. (Allow some time for questions.)
  5. We will now head back to our roller coaster data. In your collaboration you spent some time discussing your expectations of the metadata that needs to be recorded with the data. But finally you have created an example JSON object, basically the ideal metadata record for the experiment within your collaboration. To enforce this metadata structure, you want to write a JSON schema. As this can get very lengthy…
  6. you agree to distribute the schema definition among the collaborators.
  7. We will now head back to the breakout rooms in groups and work on challenges 4 and 5. In challenge 4, you will discuss and develop an excerpt of the roller coaster JSON schema.
  8. After that, in Challenge 5, you will be introduced to an online tool that shows how you can benefit from the work you invested into writing the schema.
  9. (Specify the time when the learners are expected to be back in the lecture setting.)

The following challenges 4 & 5 will be processed consecutively in groups of 4 - 6 learners. In our experience, changing the group composition for these tasks benefits the overall collaborative atmosphere.

Total time: 30 min

Implementation:
As in challenges 2 & 3, we recommend using a pre-structured shared notes document with the groups.

Shared notes:
You can find an example Markdown file for the group handouts HERE. This document is optimized for use in a Hedgedoc document.

Challenge 4: JSON Schema

Time: 20 min

Challenge type: group activity, production

Objective:
By writing a short excerpt of a JSON Schema, the learner gets familiar with the schema syntax, gains the ability to read and understand a schema, and gets to know some important JSON Schema keywords. By writing the schema in a group setup, the learners experience the discussion process as a crucial part of schema development.

Challenge 5: Form Input and Validation with JSON Schema

Time: 10 min

Challenge type: group activity OR individual exploration

Objective:
By downloading and inspecting the final JSON Schema, the learners comprehend the complexity a metadata schema can acquire. With the implementation of the schema in the UI of the react-jsonschema-form playground, the learners bring the developed schema into use, get to know a software tool they can benefit from, and experience relief from the frustration after Challenge 3.

Challenge 4: JSON Schema

After a couple of researchers upload their JSON metadata records to the project repository, it becomes obvious that well-formed JSON metadata describing similar experiments can still be expressed in a myriad of ways.

Your collaboration decides to develop a metadata schema to standardize metadata records across the project. Consensus is encoded in a JSON Schema.

Now it is your task to help with the subschema for experimental conditions!

In the following code block you see valid JSON metadata that specifies experimental conditions as agreed on in the project.

JSON

{
  "experimentalConditions": {
    "ride": {
      "rideType": "roller coaster",
      "rideName": "Flight of the Bat",
      "location": "Gotham City, New Jersey"
    },
    "testPerson": {
      "sex": "male",
      "height": 180
    },
    "recording": {
      "testDevice": "iPhone X",
      "testDeviceFixture": "left upper arm",
      "testApp": "Physics Toolbox Suite by Vieyra Software"
    }
  }
}

In the following code block you see the JSON schema draft for the experimental conditions. Your collaborators already modelled constraints and valid values for ride and testPerson.

Discuss and add constraints to the recording property.

  • testDevice, testDeviceFixture and testApp are mandatory properties for the recording object
  • testDevice value must be one of:
    • iPhone X
    • iPhone 6
    • iPhone 6s
    • other
  • testApp value must be one of:
    • Physics Toolbox Suite by Vieyra Software
    • Bunny Rollercoaster Physics App
  • testDeviceFixture value must be one of:
    • left upper arm
    • right upper arm
    • mouth fixture device
    • other

JSON

{
  "experimentalConditions": {
    "description": "A summary of the experimental conditions. Include sufficient detail to facilitate search and discovery.",
    "type": "object",
    "required": [
      "recording",
      "ride",
      "testPerson"
      ],
    "properties": {
    
      "recording": {
      /* Insert your schema here and delete this comment */
        },
      
      "ride": {
        "description": "Properties of the ride.",
        "type": "object",
        "required": [
          "rideType",
          "rideName"
        ],
        "properties": {
          "rideType": {
            "description": "Ride type.",
            "type": "string",
            "enum": [
              "roller coaster",
              "water slide",
              "bob sled"
            ]
          },
          "rideName": {
            "description": "Official name of the ride.",
            "type": "string",
            "minLength": 3
          },
          "location": {
            "description": "City and State in which ride is located.",
            "type": "string",
            "minLength": 10
          }
        }
      },
      "testPerson": {
        "description": "Properties of person carrying the test device.",
        "type": "object",
        "required": [
          "height",
          "sex"
        ],
        "properties": {
          "height": {
            "description": "Height of test person in cm (SI unit of length).",
            "type": "number",
            "minimum": 120,
            "exclusiveMaximum": 220
          },
          "sex": {
            "description": "Sex of test person.",
            "type": "string",
            "enum": [
              "female",
              "male",
              "non-binary",
              "not disclosed"
            ]
          }
        }
      }
    }
  }
}

JSON

{
  "experimentalConditions": {
    "description": "A summary of the experimental conditions. Include sufficient detail to facilitate search and discovery.",
    "type": "object",
    "required": [
      "recording",
      "ride",
      "testPerson"
    ],
    "properties": {
      "recording": {
        "description": "Properties of the recording.",
        "type": "object",
        "required": [
          "testApp",
          "testDevice",
          "testDeviceFixture"
        ],
        "properties": {
          "testApp": {
            "description": "Test app used.",
            "type": "string",
            "enum": [
              "Physics Toolbox Suite by Vieyra Software",
              "Bunny Rollercoaster Physics App"
            ]
          },
          "testAppVersion": {
            "description": "Version of test app (free text input). Full semantic versioning input preferred: Major.minor.bugfix",
            "type": "string",
            "minLength": 1
          },
          "testDevice": {
            "description": "Test device used.",
            "type": "string",
            "enum": [
              "iPhone X",
              "iPhone 6",
              "iPhone 6s",
              "other"
            ]
          },
          "testDeviceFixture": {
            "description": "Test device fixture.",
            "type": "string",
            "enum": [
              "left upper arm",
              "right upper arm",
              "mouth fixture device",
              "other"
            ]
          }
        }
      },
      "ride": {
        "description": "Properties of the ride.",
        "type": "object",
        "required": [
          "rideType",
          "rideName"
        ],
        "properties": {
          "rideType": {
            "description": "Ride type.",
            "type": "string",
            "enum": [
              "roller coaster",
              "water slide",
              "bob sled"
            ]
          },
          "rideName": {
            "description": "Official name of the ride.",
            "type": "string",
            "minLength": 3
          },
          "location": {
            "description": "City and State in which ride is located.",
            "type": "string",
            "minLength": 10
          }
        }
      },
      "testPerson": {
        "description": "Properties of person carrying the test device.",
        "type": "object",
        "required": [
          "height",
          "sex"
        ],
        "properties": {
          "height": {
            "description": "Height of test person in cm (SI unit of length).",
            "type": "number",
            "minimum": 120,
            "exclusiveMaximum": 220
          },
          "sex": {
            "description": "Sex of test person.",
            "type": "string",
            "enum": [
              "female",
              "male",
              "non-binary",
              "not disclosed"
            ]
          }
        }
      }
    }
  }
}
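
As a rough illustration of what the enum, minimum and exclusiveMaximum keywords in this schema express, the Python sketch below checks values for recording and for the test person's height by hand. The function names are hypothetical and only these keywords are covered. It is a sketch under those assumptions, not a replacement for a JSON Schema validator.

```python
# Allowed values copied from the task description above.
RECORDING_ENUMS = {
    "testDevice": ["iPhone X", "iPhone 6", "iPhone 6s", "other"],
    "testApp": [
        "Physics Toolbox Suite by Vieyra Software",
        "Bunny Rollercoaster Physics App",
    ],
    "testDeviceFixture": [
        "left upper arm",
        "right upper arm",
        "mouth fixture device",
        "other",
    ],
}


def check_recording(recording):
    """Hypothetical helper: every listed key is required and restricted by enum."""
    errors = []
    for key, allowed in RECORDING_ENUMS.items():
        if key not in recording:
            errors.append(f"missing required property: {key}")
        elif recording[key] not in allowed:
            errors.append(f"{key}: {recording[key]!r} is not an allowed value")
    return errors


def height_in_range(height):
    """'minimum': 120 includes 120; 'exclusiveMaximum': 220 excludes 220."""
    return 120 <= height < 220


# The recording object from the example metadata record.
example_recording = {
    "testDevice": "iPhone X",
    "testDeviceFixture": "left upper arm",
    "testApp": "Physics Toolbox Suite by Vieyra Software",
}

# A made-up record with a value outside the enum and one required key missing.
broken_recording = {
    "testDevice": "Nokia 3310",
    "testApp": "Bunny Rollercoaster Physics App",
}

print(check_recording(example_recording))
print(check_recording(broken_recording))
print(height_in_range(120), height_in_range(180), height_in_range(220))
```

The example record passes, while the made-up record is reported twice: once for the device name that is not in the list and once for the missing testDeviceFixture.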

Challenge 5: Form Input and Validation with JSON Schema

Congratulations, you finished your metadata schema! Now, collecting interoperable metadata will be a lot easier in your collaboration.

We must admit: writing a valid JSON metadata record for each and every experiment that you perform is tedious and time-consuming. But now that you have a JSON Schema at hand, things will get a lot easier! The project sets up a user-friendly HTML form interface for the input of JSON metadata.

Let’s try this:

  • Download the full JSON schema here: exampleDataObject_Schema.json
  • Inspect the JSON schema briefly.
  • In your browser, go to the react-jsonschema-form playground.
  • Delete the sample content in JSONschema and formData
  • Copy and paste the full schema into the JSONschema box
  • Check again if Chuck Norris properties reappeared in formData results; he can be tough 😄
  • Inspect the form interface thoroughly.
  • Optional: Copy the final JSON object literal from formData into a separate text document and save the file as exampleDataObject.json

Note that the JSON Schema used for this demo lacks the recommended $schema keyword: this is because the playground will unfortunately reject the keyword. You should always follow the best practices when writing a schema, but sometimes some adaptations are needed to make them work in different situations.

Time: 10 min

Implementation:

  • share the react-jsonschema-form playground pre-filled with the final schema and metadata values in your browser OR
  • share the final JSON schema on your screen OR
  • display the final JSON metadata records (exampleDataObject.json) of each group next to each other

Objective:
The learners are aware of the advantages of a metadata schema and metadata record validation, pay attention to how the proposed tool interprets the schema in its interface, and, thanks to the harmonized metadata records, feel relief from the frustration experienced after Challenge 3.

Leading questions:

  • How does the browser display lists of pre-defined values (specified as enum in the schema)?
  • How are arrays and objects interpreted in the form interface?
  • What happens if you enter an invalid value (e.g. try to enter a string for the test person’s height)?
  • What happens if you enter a nonsense value (e.g. try to enter a nonsense string for rideName)?
  • How does the web service respond if you click on submit without filling out all the “required” fields?

Plenary result discussion

  • How does the browser display lists of pre-defined values (specified as enum in the schema)?
  • How are arrays and objects interpreted in the form interface?
  • What happens if you enter an invalid value (e.g. try to enter a string for the test person’s height)?
  • What happens if you enter a nonsense value (e.g. try to enter a nonsense string for rideName)?
  • How does the web service respond if you click on submit without filling out all the “required” fields?

Slide set: AnnotatingTheExampleData.pdf

Time estimate: 5 min

(Challenge 5 concludes the work with the example dataset. At this point we like to give a brief summary about the metadata annotation process)

  1. Let’s have a last look at our roller coaster data.
  2. Yesterday morning, we started with a cryptic dataset, that was simply not understandable without further information. We were even guessing the meaning of the chosen variable names…
  3. after putting the dataset in context, we did not only know, how to read the data…
  4. but also who recorded the data, the test object and when it was recorded…
  5. and which conditions applied to this particular experiment.
  6. By structuring our metadata records as JSON objects…
  7. we made the information not only findable but also accessible to machines. Because we put some effort into developing a schema for the metadata records, we can make sure that experiments in the same study and collaboration are recorded and annotated in the same way, which increases their reusability…
  8. and ensures that metadata records can be validated.
  9. (If you like, recommend some tools that support and facilitate metadata annotation and validation. For more information on the examples in the slides, visit the DirSchema and Metador GitHub repositories.)
  10. With the metadata schema you have developed, you made sure that every researcher in the collaboration annotates their data in the same meaningful way. Collaborating within this closed circle has become a lot easier.
  11. But now imagine that you want to analyse data published by another scientist in your research field, someone who is not part of your collaboration…
  12. Wouldn’t it be nice if this data were annotated in the same way as yours?! We will dive deeper into community-wide metadata schemas and standards after the break.

We recommend a 15 min break at this point.

Metadata Standards


Callout

A metadata schema can become a standard through endorsement by a governing authority or through common adoption.

Slide set: MetadataStandards.pdf

Time estimate: 4 min

  1. We have been discussing metadata schemas as a means to enforce, harmonize and validate metadata records. But you might also have heard the term metadata standard. So what is the difference?
  2. Basically, a metadata standard is a metadata schema. A schema can become a standard when it is well-established, endorsed, and widely accepted by its user community.
  3. One of the best-known, most generic, and most widely used metadata standards for online resources is the Dublin Core. The Dublin Core was developed by a consortium of researchers, librarians, and web technologists in 1995 during a meeting in Dublin, Ohio, and was born out of the need for a unified description of resources on the web. Its design was inspired by the library cards that are still used to catalog books in physical libraries.
  4. When we are talking about online resources, we talk about any information entity that can be retrieved from the web, such as websites, metadata and data files, images, videos, and so on. In the scientific context, “resource” can also refer to experimental data, protocols, or software code. Have you published anything under your name online? Guess what: in terms of the world wide web, you are a resource, too!
  5. To enhance the identification and findability of these resources, the Dublin Core Initiative has specified a set of 15 metadata elements to describe any type of resource on the Web. These core elements hold information on the creators, format and type, and detailed descriptions of the resource.
  6. Remember the first challenge we approached yesterday? When looking up metadata elements in the <head>-elements of websites…
  7. some properties were given with the prefix dc:, such as (in this example) dc:creator, directly stating that this entry conforms to the creator element of the Dublin Core metadata standard.
  8. The Dublin Core is just one of many implemented and endorsed metadata standards. For general online resource description, Facebook’s Open Graph and schema.org have become industry metadata standards. But, you guessed it, there are also many community-specific and scientific metadata standards that describe scientific resources in more detail.
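
The lookup of Dublin Core entries in a page's <head> element can also be done programmatically. The following is a minimal sketch using Python's standard `html.parser` module. The page source, the creator name and the `DublinCoreMetaParser` class are made up for illustration, and the sketch only matches the `dc:` prefix used in the example above; real pages may use other spellings such as `DC.creator`.

```python
from html.parser import HTMLParser

# Hypothetical page source; names and values are made up for illustration.
PAGE = """
<html>
  <head>
    <title>Roller coaster measurements</title>
    <meta name="dc:creator" content="Jane Doe">
    <meta name="dc:date" content="2023-01-01">
    <meta name="viewport" content="width=device-width">
  </head>
  <body><p>Hello</p></body>
</html>
"""


class DublinCoreMetaParser(HTMLParser):
    """Collect <meta> entries whose name starts with the prefix 'dc:'."""

    def __init__(self):
        super().__init__()
        self.elements = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name.startswith("dc:"):
            element = name[len("dc:"):]
            # An element may occur several times, so values are kept in a list.
            self.elements.setdefault(element, []).append(attributes.get("content"))


parser = DublinCoreMetaParser()
parser.feed(PAGE)
print(parser.elements)
```

The `viewport` entry is skipped because it carries no Dublin Core prefix, which mirrors what a harvester interested only in Dublin Core metadata would do.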

Researchers, librarians and web technologists drafted the Dublin Core, a set of 15 metadata elements for the web modelled on library catalog cards, in 1995 at a meeting in Dublin, Ohio (USA).[6]

Dublin Core and its extensions are widely used and referenced today. The Dublin Core Metadata Initiative (DCMI) describes its work as open and has a paid-membership model.

The 15 generic Dublin Core metadata elements have been formally standardized for cross-domain resource description, e.g. in ISO 15836-1:2017.[7]

Depiction of the 15 Dublin Core Elements: Creator, Contributor, Publisher, Title, Date, Language, Format, Subject, Description, Identifier, Relation, Source, Type, Coverage, Rights

Many scholarly repositories expose a standardized application programming interface (API) for harvesting Dublin Core metadata, as specified in version 2.0 of the OAI Protocol for Metadata Harvesting (OAI-PMH).
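
As a rough illustration of such harvesting, the sketch below builds an OAI-PMH `ListRecords` request for the `oai_dc` metadata format and extracts the Dublin Core elements from a response. The verb, the metadata prefix and the XML namespaces are taken from the OAI-PMH 2.0 and Dublin Core specifications; the base URL, the shortened sample response and the `harvest` helper are hypothetical, so that the example runs without network access.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hypothetical repository endpoint; replace with a real OAI-PMH base URL.
BASE_URL = "https://repository.example.org/oai"

# ListRecords and the oai_dc metadata prefix are defined by OAI-PMH 2.0.
request_url = BASE_URL + "?" + urlencode(
    {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
)

# Shortened, made-up response standing in for what a repository would return.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Roller coaster acceleration data</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:type>Dataset</dc:type>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DC_NS = "http://purl.org/dc/elements/1.1/"


def harvest(xml_text):
    """Return (identifier, Dublin Core fields) pairs from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for record in root.findall(".//oai:record", OAI_NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=OAI_NS)
        fields = {}
        for element in record.iter():
            # Keep only elements from the Dublin Core element namespace.
            if element.tag.startswith("{" + DC_NS + "}"):
                name = element.tag.split("}", 1)[1]
                fields.setdefault(name, []).append(element.text)
        result.append((identifier, fields))
    return result


print(request_url)
for identifier, fields in harvest(SAMPLE_RESPONSE):
    print(identifier, fields)
```

Against a real repository, the response would be fetched from `request_url` instead of being read from a string, and long result lists are typically delivered in several parts.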

Slide set: Challenge6Introduction.pdf

Time estimate: 1 min

  • In the next challenge, we will introduce you to some online metadata standard registries: FAIRsharing.org, the RDA Metadata Directory and the RDA Metadata Standards Catalog.
  • Head over to one or more of these registries and make yourself familiar with their navigation and content.
  • Look specifically for metadata standards that are associated with your field of research and inspect the information that is provided on the standards.

Time: 25 min

Challenge type: individual exploration, guided confrontation

Objective:
The learners are aware of metadata standard registries and know how to navigate them.

Creator’s recommendation:
After giving the challenge instructions, allow some time for the learners to explore the registries (ca. 15 min). Start a screen share of one of the metadata registries and navigate to a metadata standard of your liking. Talking points could be:

  • different representations of metadata standards (JSON / XML schema, RDF, SKOS, OWL)
  • granularity of representation (e.g. high-level standard vs. application-specific standard)
  • terms, properties, specifications
  • accompanying journal publications

Transition to “(Web) Locations and Identifiers”: During the discussion, ask the group of learners whether someone encountered a 404 error while exploring a metadata standard. If you are working with a group that is heterogeneous in scientific background, chances are high that at least one of the learners had this experience. (It is not unusual that the development of a metadata standard in a specific research field was concluded and the maintenance of its web representation was terminated at the end of the funding period.)

Challenge 6: Domain specific metadata standards

  1. Open one of the metadata standard registries introduced above (FAIRsharing.org, the RDA Metadata Directory or the RDA Metadata Standards Catalog) in your preferred browser.
  2. Search for a metadata schema, standard or vocabulary relevant to your research domain.
  3. Inspect the information provided.

Key Points

  • The WWW was developed by and for the scientific community to connect researchers worldwide and enable the sharing of information
  • Metadata schemas serve as a template and validation matrix for metadata records
  • JSON Schemas are special JSON object literals describing what other JSON must look like
  • Well-established metadata schemas have the potential to become a (community) standard

  1. The birth of the Web | CERN. (2023, August 11). https://home.cern/science/computing/birth-web

  2. XML Schema Tutorial. © 1999-2022. Refsnes Data, W3Schools. https://www.w3schools.com/xml/schema_intro.asp

  3. The arXiv of the future will not look like the arXiv. (n.d.). Ar5iv. https://ar5iv.labs.arxiv.org/html/1709.07020

  4. XML Schema Tutorial. © 1999-2022. Refsnes Data, W3Schools. https://www.w3schools.com/xml/schema_intro.asp

  5. Understanding JSON Schema. The basics. © 2013-2016 Michael Droettboom, Space Telescope Science Institute; last updated on Feb 07, 2022. https://json-schema.org/understanding-json-schema/basics.html

  6. Metadata Basics. (2018, December 15). https://www.dublincore.org/resources/metadata-basics/

  7. ISO 15836-1:2017. (n.d.). ISO. https://www.iso.org/standard/71339.html