Frictionless Tables - Primary and Foreign Keys

Overview

Teaching: 0 min
Exercises: 0 min

Questions

How can I create or show relationships between tables in my dataset?

Objectives

Identify primary key columns.

Add a foreign key to create a relationship with another table.

So far we have used the Frictionless Data Table Schema to add metadata to the fields in our dataset table. Using the table schema we can also define primary and foreign key relationships between the tables in our dataset, similar to an SQL database.

The primary key is a field which uniquely identifies every record in a table. The foreign key is a field in one table that refers to a primary key in another table.

The following diagram shows how our three tables are related to each other.

Figure Description

Adding Primary Keys

Using the Frictionless Python module we can add a primary key to a table schema. For example in the yields table, plot_no is the unique identifier for each record so we can make this the primary key.

yields_schema.schema.primary_key = "plot_no"
pp.pprint(yields_schema)

We now have plot_no identified as the primary key in the JSON schema for the yields table.

{'encoding': 'utf-8',
 'format': 'csv',
 'hashing': 'md5',
 'name': 'yields',
 'path': 'data/yields.csv',
 'profile': 'tabular-data-resource',
 'schema': {'fields': [{'description': 'A unique identifer for the plot',
                        'name': 'plot_no',
                        'title': 'Plot Number',
                        'type': 'integer'},
                       {'description': 'Institute standard code for a field '
                                       'experiment',
                        'name': 'expt_id',
                        'title': 'Experiment Code',
                        'type': 'string'},
                       {'description': 'Date on which the plot was harvested',
                        'name': 'h_date',
                        'title': 'Harvest Date',
                        'type': 'string'},
                       {'name': 'col_y', 'type': 'integer'},
                       {'name': 'col_x', 'type': 'integer'},
                       {'name': 'variety', 'type': 'string'},
                       {'name': 'grain_weight', 'type': 'number'}],
            'primaryKey': 'plot_no'},
 'scheme': 'file'}

Exercise

Challenge: Add primary keys to the experiments and varieties table schemas.

Using the code for adding a primary key to the yields table as an example add primary keys to the experiments and varieties table schemas. The primary key for the varieties table is variety and for the experiments table is expt_code
Solution
varieties_schema.schema.primary_key = "variety"
experiments_schema.schema.primary_key = "expt_code"

Adding Foreign Keys

With the primary keys defined we can now add foreign keys to the yields table. To add a foreign key we need to pass a JSON string which defines the table and field being referenced.

The JSON specifies the foreign key field for the table and the referenced table and its primary key field using the following syntax:

{
    "fields": "FOREIGN-KEY-FIELD-NAME",
    "reference": {
        "resource": "REFERENCED-TABLE-NAME",
        "fields": "REFERENCED-TABLE-PRIMARY-KEY-NAME" 
    }
} 

Exercise

Challenge: Add foreign keys to the yields table schema.

Complete the following code to make variety and expt_code foreign keys in the yields table schema. Remember, variety reference the varieties table schema and expt_code references the experiments table schema. Note we add the keys to the schema as an array.

f_keys = []
f_keys.append({
  "fields": "variety",
  "reference": {
      "resource": "varieties",        
      "fields": "variety"
  }            
})
f_keys.append({
  "fields": "______",
  "reference": {
      "resource": "______",        
      "fields": "______"
  }            
})
yields_schema.schema.foreign_keys = f_keys
pp.pprint(yields_schema)

Solution

f_keys = []
f_keys.append({
  "fields": "variety",
  "reference": {
      "resource": "varieties",        
      "fields": "variety"
  }            
})
f_keys.append({
  "fields": "expt_code",
  "reference": {
      "resource": "experiments",        
      "fields": "expt_code"
  }            
})
yields_schema.schema.foreign_keys = f_keys
pp.pprint(yields_schema)

{'encoding': 'utf-8',
'format': 'csv',
'hashing': 'md5',
'name': 'yields',
'path': 'data/yields.csv',
'profile': 'tabular-data-resource',
'schema': {'fields': [{'description': 'A unique identifer for the plot',
                       'name': 'plot_no',
                       'title': 'Plot Number',
                       'type': 'integer'},
                      {'description': 'Institute standard code for a field '
                                      'experiment',
                       'name': 'expt_id',
                       'title': 'Experiment Code',
                       'type': 'string'},
                      {'description': 'Date on which the plot was harvested',
                       'name': 'h_date',
                       'title': 'Harvest Date',
                       'type': 'string'},
                      {'name': 'col_y', 'type': 'integer'},
                      {'name': 'col_x', 'type': 'integer'},
                      {'name': 'variety', 'type': 'string'},
                      {'name': 'grain_weight', 'type': 'number'}],
           'foreignKeys': [{'fields': 'variety',
                            'reference': {'fields': 'variety',
                                          'resource': 'varieties'}},
                           {'fields': 'expt_code',
                            'reference': {'fields': 'expt_code',
                                          'resource': 'experiments'}}],
            'primaryKey': 'plot_no'},
'scheme': 'file'}

Key Points

Frictionless allows you to define table fields as primary keys and foreign keys and create relationships between them

previous episode

Frictionless Data for Agricultural Research

next episode

Frictionless Tables - Primary and Foreign Keys

Overview

Adding Primary Keys

Exercise

Solution

Adding Foreign Keys

Exercise

Solution

Key Points

previous episode

next episode