Analyzing Patient Data
Last updated on 2026-01-27
Overview
Questions
- How can I process tabular data files in Julia?
Objectives
- Explain what a package is and what libraries are used for.
- Import a package and use the functions it contains.
- Read tabular data from a file into a program.
- Select individual values and subsections from data.
- Perform operations on arrays of data.
Loading data into Julia
To begin processing the clinical trial inflammation data, we need to
load it into Julia. Different file formats require different packages,
for example XLSX.jl for Excel files or JSON3.jl for JSON. In this
example we work with a CSV file, so we use the package CSV.jl.
Before we can use a package in Julia, we need to install it. This can
be done either by entering the package mode in the Julia REPL or by
using Pkg.add("PackageName"), for example inside a
script.
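To enter the package manager mode, press ] in the Julia
REPL. The prompt changes to pkg>.
Then you can add a package by typing add followed by its name, for example add CSV.
Alternatively, to add a package inside a script, load the package manager with using Pkg and call Pkg.add("CSV").
After installing the package, you still need to load it before using its functionality, for example with using CSV.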
Besides CSV.jl, we also need DataFrames.jl.
You can install it the same way — give it a try!
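As a minimal sketch of the script-based route described above, the following installs and then loads both packages. It assumes network access on the first run; nothing beyond Pkg.add and using is involved.

```julia
# Install the two packages from a script (only needed once per environment).
using Pkg
Pkg.add("CSV")
Pkg.add("DataFrames")

# Load them so their functions become available in this session.
using CSV
using DataFrames
```

In the REPL, the equivalent is to press ] and type add CSV DataFrames at the pkg> prompt.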
After installing both packages, we can read the data file like this:
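A minimal sketch of the reading step, using a tiny in-memory stand-in so it runs without the lesson's data file (the file path in the final comment is hypothetical). By default CSV.read treats the first line as the header, which is an assumption about why the column names in the output shown in this lesson (0, 0_1, 1, 3, ...) look like data values; passing header=false keeps every line as data.

```julia
using CSV, DataFrames

# Tiny stand-in for the data file: three lines of comma-separated integers.
text = "0,0,1,3\n0,1,2,1\n0,1,1,3\n"

# Default behaviour: the first line becomes the header, so 3 lines give 2 rows.
with_header = CSV.read(IOBuffer(text), DataFrame)
println(size(with_header))   # (2, 4)

# header=false keeps every line as data and generates column names.
no_header = CSV.read(IOBuffer(text), DataFrame; header=false)
println(size(no_header))     # (3, 4)

# With a file on disk, pass its path instead (hypothetical path):
# df = CSV.read("data/inflammation-01.csv", DataFrame)
```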
OUTPUT
59×40 DataFrame
Row │ 0 0_1 1 3 1_1 2 4 7 8 3_1 3 ⋯
│ Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 I ⋯
─────┼──────────────────────────────────────────────────────────────────────────
1 │ 0 1 2 1 2 1 3 2 2 6 ⋯
2 │ 0 1 1 3 3 2 6 2 5 9
3 │ 0 0 2 0 4 2 2 1 6 7
4 │ 0 1 1 3 3 1 3 5 2 4
5 │ 0 0 1 2 2 4 2 1 6 4 ⋯
6 │ 0 0 2 2 4 2 2 5 5 8
7 │ 0 0 1 2 3 1 2 3 5 3
8 │ 0 0 0 3 1 5 6 5 5 8
⋮ │ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋱
53 │ 0 0 2 1 1 4 4 7 2 9 ⋯
54 │ 0 1 2 1 1 4 5 4 4 5
55 │ 0 0 1 3 2 3 6 4 5 7
56 │ 0 1 1 2 2 5 1 7 4 2
57 │ 0 1 1 1 4 1 6 4 6 3 ⋯
58 │ 0 0 0 1 4 5 6 3 8 7
59 │ 0 0 1 0 3 2 5 4 8 2
30 columns and 44 rows omitted
If we want to check that the data loaded correctly, we can just print it:
Or view the first few rows using:
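A minimal sketch of both checks on a small stand-in DataFrame (the column names and values here are made up for illustration, and the row count passed to first is an arbitrary choice):

```julia
using DataFrames

# Stand-in data: 3 patients (rows) measured on 3 days (columns).
df = DataFrame(day1 = [0, 0, 0], day2 = [1, 1, 0], day3 = [2, 1, 2])

# Print the whole DataFrame.
println(df)

# Keep only the first two rows; use first(df, 5) for five rows, and so on.
head = first(df, 2)
println(head)
```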
We can check the type of object we have created with typeof(df):
OUTPUT
DataFrame
To see how many rows and columns the data contains, we can use size(df):
OUTPUT
(59, 40)
We can also get just the number of rows or columns with nrow(df) and ncol(df):
OUTPUT
59
40
Accessing Elements
In Julia, you can access data in a DataFrame by column position,
by column name, or by specifying row and column indices.
Accessing a Single Column
You can access a single column by its position (column number) or its name:
The ! means you are accessing the column stored in the DataFrame
directly, without making a copy.
Important: df[!, 1] returns the actual column vector held by
the DataFrame. If you modify this vector, it will also change the
original DataFrame. Use df[:, 1] instead if you want a
copy of the data.
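The difference between ! and : can be checked directly. This is a minimal sketch on a small stand-in DataFrame; the column names a and b are made up for illustration.

```julia
using DataFrames

df = DataFrame(a = [1, 2, 3], b = [4, 5, 6])

# Access a column by position or by name; both return the stored vector.
col_by_position = df[!, 1]
col_by_name = df[!, :a]        # df[!, "a"] and df.a work as well

# Modifying the vector obtained with ! changes the DataFrame itself.
col_by_position[1] = 100
println(df.a)                  # [100, 2, 3]

# Modifying a copy obtained with : leaves the DataFrame untouched.
col_copy = df[:, 1]
col_copy[2] = -1
println(df.a)                  # still [100, 2, 3]
```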
Slicing data
An index like [30, 20] selects a single element of an array, but we can select whole sections as well. For example, we can select the first ten columns of values for the first four patients (rows) like this:
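A minimal sketch of both kinds of indexing, using random stand-in data with the same shape as the lesson's DataFrame, so the values differ from the output shown in this lesson:

```julia
using DataFrames

# Stand-in data: 59 rows and 40 columns of random integers from 0 to 20.
df = DataFrame(rand(0:20, 59, 40), :auto)

# A single element: row 30, column 20.
single_value = df[30, 20]

# A block: rows 1 to 4 and columns 1 to 10.
first_block = df[1:4, 1:10]
println(first_block)
```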
OUTPUT
4×10 DataFrame
Row │ 0 0_1 1 3 1_1 2 4 7 8 3_1
│ Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────
1 │ 0 1 2 1 2 1 3 2 2 6
2 │ 0 1 1 3 3 2 6 2 5 9
3 │ 0 0 2 0 4 2 2 1 6 7
4 │ 0 1 1 3 3 1 3 5 2 4
The slice 1:4 means, “Start at index 1 and go up to and
including index 4”. Julia uses 1-based indexing, so
indices start at 1.
We don't have to start slices at 1. For example, df[6:10, 1:10] selects rows 6 through 10:
OUTPUT
5×10 DataFrame
Row │ 0 0_1 1 3 1_1 2 4 7 8 3_1
│ Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────
1 │ 0 0 2 2 4 2 2 5 5 8
2 │ 0 0 1 2 3 1 2 3 5 3
3 │ 0 0 0 3 1 5 6 5 5 8
4 │ 0 1 1 2 1 3 5 3 5 8
5 │ 0 1 0 0 4 3 3 5 5 4
We can also use end inside a range, as in 37:end, to select everything
from a certain position up to the last element. If we use : on its own, it
includes everything:
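A minimal sketch on random stand-in data with 40 columns, so the values differ from the output shown in this lesson:

```julia
using DataFrames

# Stand-in data: 59 rows and 40 columns of random integers from 0 to 20.
df = DataFrame(rand(0:20, 59, 40), :auto)

# A bare colon keeps every column.
all_columns = df[1:3, :]

# end stands for the last index along that dimension (here column 40).
last_columns = df[1:3, 37:end]
println(last_columns)
```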
This selects rows 1 through 3 and columns 37 through to the end of the array.
OUTPUT
3×4 DataFrame
Row │ 2_1 3_5 0_2 0_3
│ Int64 Int64 Int64 Int64
─────┼────────────────────────────
1 │ 1 1 0 1
2 │ 2 2 1 1
3 │ 2 3 2 1
Analyzing Data
Julia provides powerful tools for analyzing data stored in a
DataFrame. To calculate the average
inflammation for all patients on all
days, we can use the mean function from the
Statistics standard library.
Tip: Make sure to install the required packages first
Then load the packages:
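A minimal sketch of the calculation on a small stand-in DataFrame (made-up values, so the result differs from the output shown in this lesson):

```julia
using Statistics   # provides mean
using DataFrames

# Stand-in data: 3 patients (rows), 2 days (columns).
df = DataFrame(day1 = [0, 0, 0], day2 = [1, 2, 3])

# Convert to a plain matrix, then average over every value.
overall_mean = mean(Matrix(df))
println(overall_mean)   # 1.0
```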
OUTPUT
6.160593220338983
Here’s what’s happening:
- Matrix(df) converts the DataFrame to a regular array of numbers.
- mean(...) calculates the average of all the values.
Descriptive Statistics in Julia
Let’s use three Julia functions to get some basic descriptive statistics from our dataset: maximum, minimum, and standard deviation.
We can use multiple assignment to store all the results in one line.
JULIA
using Statistics  # provides std; maximum and minimum are part of Base

data = Matrix(df)  # convert the DataFrame once and reuse the matrix
maxval, minval, stdval = maximum(data), minimum(data), std(data)
println("maximum inflammation: ", maxval)
println("minimum inflammation: ", minval)
println("standard deviation: ", stdval)
OUTPUT
maximum inflammation: 20
minimum inflammation: 0
standard deviation: 4.625075651890539
Exploring Functions in Julia
How can you find out what functions are available in a Julia module and how to use them?
Julia provides several ways to explore functions and get help:
- To list functions and names in a module, use the names() function, for example names(Statistics).
- To get detailed documentation on a function, use the help mode by typing a question mark ? before the function name in the REPL or Jupyter notebook, for example ?mean. This will show you the official help text for mean.
- To see all methods and signatures of a function, use methods(), for example methods(mean).
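The first and third options can also be run from a script. This is a minimal sketch using the Statistics standard library as the module to explore; help mode itself only works interactively, so it appears as a comment.

```julia
using Statistics

# Names exported by the module, returned as a vector of symbols.
exported = names(Statistics)
println(exported)

# All methods defined for mean, with their signatures.
mean_methods = methods(mean)
println(length(mean_methods), " methods defined for mean")

# In the REPL, type ?mean to read the help text interactively.
```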
When analyzing data, we often want to find values like the maximum inflammation per patient or the average inflammation per day.
One way is to first select the data for a single patient (row), then apply a function to that data:
JULIA
# Select data for patient 1
patient_1 = df[1, :]
println("maximum inflammation for patient 1: ", maximum(patient_1))
OUTPUT
maximum inflammation for patient 1: 18
We don’t need to store the row separately — we can combine selecting the data and applying the function in one step:
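A minimal sketch of the one-step form on a small stand-in DataFrame (made-up values, so the result differs from the output shown in this lesson):

```julia
using DataFrames

# Stand-in data: 3 patients (rows), 3 days (columns).
df = DataFrame(day1 = [0, 0, 0], day2 = [3, 1, 2], day3 = [1, 5, 4])

# Select row 3 and take its maximum in a single expression.
max_patient_3 = maximum(df[3, :])
println("maximum inflammation for patient 3: ", max_patient_3)
```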
OUTPUT
maximum inflammation for patient 3: 17
For many numerical operations it is much easier to work with an array than with a DataFrame. To convert a DataFrame to an array, use Matrix(df), for example data = Matrix(df).
What if we want the maximum inflammation for each patient across all days (i.e., row-wise maximum), or the average inflammation for each day across all patients (i.e., column-wise average)?
In Julia, many functions accept a dims keyword argument
that specifies the dimension (axis) to operate on:
- dims=1 means operate across rows, producing one result per column.
- dims=2 means operate across columns, producing one result per row.
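The effect of dims is easiest to see on a small matrix. This is a minimal sketch with made-up values, where rows stand for patients and columns for days:

```julia
using Statistics

# Stand-in data: 2 patients (rows), 3 days (columns).
data = [1 2 3;
        4 5 6]

# dims=1 collapses the rows: one average per column (per day).
per_day = mean(data, dims=1)
println(per_day)            # [2.5 3.5 4.5]
println(size(per_day))      # (1, 3)

# dims=2 collapses the columns: one average per row (per patient).
per_patient = mean(data, dims=2)
println(per_patient)        # [2.0; 5.0;;]
println(size(per_patient))  # (2, 1)
```

Note that the result keeps two dimensions in both cases; vec(per_patient) turns it into a plain vector if that is more convenient.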
For example, to find the average inflammation per day (the column-wise average across all patients), use mean(data, dims=1):
OUTPUT
[0.0 0.4576271186440678 1.11864406779661 1.728813559322034 ...]
To check the shape of the result, pass it to size, as in size(mean(data, dims=1)).
To get the average inflammation per patient, use mean(data, dims=2):
OUTPUT
[5.45; 5.425; 6.1; 5.9; 5.55; ...]
Slicing Strings
A section of an array is called a slice. We can take slices of character strings as well:
JULIA
element = "oxygen"
println("first three characters: ", element[1:3])
println("last three characters: ", element[4:6])
OUTPUT
first three characters: oxy
last three characters: gen
What is the value of element[1:4]? What about
element[5:end]? Or element[:]? What is
element[end]? What is element[end-1]?
OUTPUT
oxyg
en
oxygen
n
e
Thin Slices
The expression element[4:3] (a range where the start is
greater than the end) produces an empty string in Julia, a string
that contains no characters.
If data is an array what does
data[4:3, 5:4] produce? What about
data[4:4, :]?
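One way to explore these questions is to run them on a small stand-in array. This is a minimal sketch; the 6×8 shape is an arbitrary choice for illustration.

```julia
element = "oxygen"
data = collect(reshape(1:48, 6, 8))   # stand-in 6×8 matrix

# A range whose start exceeds its end selects nothing.
empty_string = element[4:3]
empty_block = data[4:3, 5:4]
println(size(empty_block))   # (0, 0)

# A one-element range keeps the dimension: the result is a 1×8 matrix.
one_row_matrix = data[4:4, :]
println(size(one_row_matrix))   # (1, 8)

# A plain integer index drops the dimension: the result is a vector.
one_row_vector = data[4, :]
println(size(one_row_vector))   # (8,)
```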
Stacking Arrays
Arrays can be concatenated and stacked on top of one another in Julia using square bracket syntax.
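A minimal sketch of the syntax (the names A, B and C are chosen for illustration): a space between arrays concatenates them horizontally and a semicolon stacks them vertically.

```julia
A = [1 2 3; 4 5 6; 7 8 9]

# Space-separated: place the arrays side by side (horizontal concatenation).
B = [A A]

# Semicolon-separated: place one on top of the other (vertical concatenation).
C = [A; A]

display(A)
display(B)
display(C)
```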
OUTPUT
3×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
3×6 Matrix{Int64}:
1 2 3 1 2 3
4 5 6 4 5 6
7 8 9 7 8 9
6×3 Matrix{Int64}:
1 2 3
4 5 6
7 8 9
1 2 3
4 5 6
7 8 9
Write additional code that slices the first and last
columns of A, and stacks them side by side into a
3×2 array, using only square bracket
syntax.
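One possible solution, as a sketch (the name D is chosen for illustration): slicing a column with an integer index gives a vector, and two vectors separated by a space are joined as columns.

```julia
A = [1 2 3; 4 5 6; 7 8 9]

# First and last columns, placed side by side.
D = [A[:, 1] A[:, end]]
display(D)
```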
Change in Inflammation
The patient data is longitudinal — each row represents a series of measurements for one patient over time. That means calculating the change in inflammation over consecutive days is meaningful.
Here’s how you can explore those changes:
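A minimal sketch of selecting one patient's first week. The stand-in matrix holds the first four rows and ten columns of the DataFrame as displayed earlier in this lesson; the variable names are assumptions.

```julia
# First four patients and first ten days, copied from the table shown earlier.
data = [0 1 2 1 2 1 3 2 2 6;
        0 1 1 3 3 2 6 2 5 9;
        0 0 2 0 4 2 2 1 6 7;
        0 1 1 3 3 1 3 5 2 4]

# Row 4, columns 1 through 7: the first week of data for patient 4.
patient4_week1 = data[4, 1:7]
display(patient4_week1)
```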
OUTPUT
7-element Vector{Int64}:
0
1
1
3
3
1
3
To compute changes day by day, we use diff:
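A minimal sketch, with the week of values written out literally so the block runs on its own (the variable names are assumptions):

```julia
# Seven daily measurements for one patient.
patient4_week1 = [0, 1, 1, 3, 3, 1, 3]

# diff returns the difference between each value and the one before it,
# so seven values give six differences.
daily_change = diff(patient4_week1)
println(daily_change)   # [1, 0, 2, 0, -2, 2]
```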
OUTPUT
[1, 0, 2, 0, -2, 2]
- Set dims=2 in diff(data; dims=2) to compute differences along each row (across days for each patient).
- If your data is shaped (60, 40), the result of diff will be (60, 39): one fewer column, because differences are between pairs of adjacent days.
- To get the largest magnitude change for each patient, combine diff, abs., and maximum, again using dims=2:
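A minimal sketch of that combination on a small stand-in matrix (made-up values; rows are patients and columns are days):

```julia
# Stand-in data: 3 patients (rows), 4 days (columns).
data = [0 1 3 2;
        0 2 2 5;
        1 1 0 4]

# Day-to-day differences along each row: a 3×3 matrix.
changes = diff(data; dims=2)

# abs. applies abs to every element; maximum with dims=2 then
# reduces each row to its largest absolute change.
max_change = maximum(abs.(changes); dims=2)
println(max_change)   # [2; 3; 4;;]
```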
For data shaped (60, 40), this returns a 60×1 array, where each entry is the maximum absolute change in inflammation for that patient.
- Use Pkg.add("PackageName") to install and using PackageName to load packages in Julia.
- Load CSV data into a DataFrame with CSV.read("file.csv", DataFrame).
- Use df[row, column] to access specific values; use df[!, column] to access entire columns.
- Use size(df), nrow(df), and ncol(df) to inspect DataFrame dimensions.
- Convert a DataFrame to a matrix using Matrix(df) for numerical operations.
- Use mean, maximum, minimum, and std to compute statistics on data arrays.
- Use mean(data, dims=1) for column-wise and dims=2 for row-wise operations.
- Use diff(data; dims=2) to calculate daily changes per patient.