databases-for-lte-data
Last updated on 2024-11-19 | Edit this page
Overview
Questions
- How are databases different from spreadsheets?
- Why are databases useful for managing LTE data
Objectives
- Understand the differences between databases and spreadsheets.
- Explain why databases may be a more useful tool for long-term LTE data management.
Excel is not a database
We commonly use Excel to manage data, but it is important to remember Excel is not a database management system. There are several key differences between how Excel, as a spreadsheet program, manages data compared to a database. How does data storage differ between spreadsheets and databases?
Unit of data
In a spreadsheet the unit of data is a cell, in a database it is a table record. If we create a table in Excel, the value in one column is independent from the value in the neighbouring cells. As a human we can see there is a relationship between cells in a row, but Excel does not share our concept of a table and will allow you to sort one column independently of other columns. Excel does not have a formal table structure, you can draw a table layout, but you can move the table around a worksheet, and place other information around it. In a database a table exists as a formal data structure and its unit of data is a row. A table can contain only data in the columns define for it and nothing else and columns cannot be sorted independently of each other.
Column data types
Database table columns are typed, in excel they are not. This means
in a database table you must define the data type for each column, for
example text
, integer
, numeric
,
date
, or datetime
(the exact datatypes will
vary between different database systems). Therefore a colmnn defined as
integer can only store integer values, it cannot store decimals, text or
dates. In Excel you can add any type of data into a cell. Databases
therefore provide very robust rules for quality controlling your data
compared to Excel.
Databases define formal relationships between tables
A database can define formal relationships between tables using key columns. This is a very powerful feature of relational databases which allow for complex data structures to be defined. An important use of relationships is to reduce data duplication and increase the integrity of the data. For example, the Chitedze dataset has eight different treatments. These treatments can be defined in one small table, and the unique treatment number copied into the yield data table to create a relationship. None of the information about treatments needs to be duplicated in the yield data table, and a rule known as referential integrity means only ID values from the treatments table can be used in the yield data table.
Databases live on database servers
Relational databases are usually hosted on a database server (SQLite and MS Access are two exceptions being file based programs). This brings several security features not available in Excel, including: - multi-user access - backup and recovery - permissioned access - multi-user access - audit logs
Databases are transactional
Every update to a database happens inside a transaction which has a start and an end. This means data is only saved into the database once the transaction successful completes, if something goes wrong, or you forget to finish, the transaction will rollback to the databases original state. This is a critical feature which helps to maintain the health of your data.
Databases are not easy to copy
Compared to Excel, it is very difficult to make multiple copies of a database. With Excel it is very easy to make multiple copies of a file which, unless a robust naming and versioning system is in place, can lead to confusion.
Databases are harder to use
While databases clearly have many advantages over Excel for managing data, this comes at a cost. To use a database effectively you need a good understanding of database design principles and SQL, the language used to interact with a database.
Key Points
- Use
.md
files for episodes when you want static content - Use
.Rmd
files for episodes when you need to generate output - Run
sandpaper::check_lesson()
to identify any issues with your lesson - Run
sandpaper::build_lesson()
to preview your lesson locally