Stan extensions
Last updated on 2024-11-19
Estimated time: 12 minutes
Overview
Questions
- Which packages take advantage of Stan, and how are they used?
Objectives
- Learn to use Stan with additional R packages
In this chapter, we will introduce packages that take advantage of Stan. The covered packages are loo, which enables approximate Bayesian cross-validation, bayesplot, which contains plotting tools, and brms, which allows calling Stan using common R syntax, without having to write the Stan code.
loo
The loo package allows computing approximate leave-one-out cross-validation (LOO-CV) for models fitted with Stan. The approximation is based on Pareto smoothed importance sampling (PSIS) [1]. The package can also be used for computing WAIC and model weights for averaging predictive distributions.
Example 1
We will demonstrate the loo package on the model comparison example studied in Episode 5. We will fit the normal and Cauchy models on the same synthetic data, then use the tools provided in loo to compute and compare the approximate LOO-CV scores for these two models.
To be able to utilize the package functions, we need to add a log-likelihood computation to the Stan code, in the generated quantities block. The object containing the log-likelihood needs to be named log_lik so the loo functions can find it. Below, we demonstrate this with the two models we are comparing.
STAN
// Normal model
data {
  int<lower=0> N;
  vector[N] X;
}
parameters {
  real<lower=0> sigma;
  real mu;
}
model {
  X ~ normal(mu, sigma);
  mu ~ normal(0, 1);
  sigma ~ gamma(2, 1);
}
generated quantities {
  vector[N] X_rep;
  for (i in 1:N) {
    X_rep[i] = normal_rng(mu, sigma);
  }
  // Calculate log-likelihood for loo
  vector[N] log_lik;
  for (i in 1:N) {
    log_lik[i] = normal_lpdf(X[i] | mu, sigma);
  }
}
STAN
// Cauchy model
data {
  int<lower=0> N;
  vector[N] X;
}
parameters {
  // Scale
  real<lower=0> sigma;
  // Location
  real mu;
}
model {
  // location = mu and scale = sigma
  X ~ cauchy(mu, sigma);
  mu ~ normal(0, 1);
  sigma ~ gamma(2, 1);
}
generated quantities {
  vector[N] X_rep;
  for (i in 1:N) {
    X_rep[i] = cauchy_rng(mu, sigma);
  }
  // Calculate log-likelihood for loo
  vector[N] log_lik;
  for (i in 1:N) {
    log_lik[i] = cauchy_lpdf(X[i] | mu, sigma);
  }
}
Now we can fit the models in the usual way.
R
# Fit normal model
normal_fit <- rstan::sampling(normal_model_loo,
                              list(N = N, X = df5$X),
                              refresh = 0, seed = 2024)

# Fit Cauchy model
cauchy_fit <- rstan::sampling(cauchy_model_loo,
                              list(N = N, X = df5$X),
                              refresh = 0, seed = 2024)
We can now compute PSIS-LOO for both models with the loo::loo function. After calling the function, information about the fit can be viewed by printing the loo objects.
R
# PSIS-LOO computation for normal model
normal_loo <- loo::loo(normal_fit)
WARNING
Warning: Some Pareto k diagnostic values are too high. See help('pareto-k-diagnostic') for details.
R
print(normal_loo)
OUTPUT
Computed from 4000 by 88 log-likelihood matrix.
Estimate SE
elpd_loo -288.8 41.6
p_loo 18.5 17.6
looic 577.7 83.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.8, 0.9]).
Pareto k diagnostic values:
Count Pct. Min. ESS
(-Inf, 0.7] (good) 87 98.9% 2167
(0.7, 1] (bad) 0 0.0% <NA>
(1, Inf) (very bad) 1 1.1% <NA>
See help('pareto-k-diagnostic') for details.
R
# PSIS-LOO computation for cauchy model
cauchy_loo <- loo::loo(cauchy_fit)
print(cauchy_loo)
OUTPUT
Computed from 4000 by 88 log-likelihood matrix.
Estimate SE
elpd_loo -206.9 14.7
p_loo 2.0 0.0
looic 413.8 29.3
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.7, 0.8]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Running print returns \(\widehat{\text{elpd}}_{\text{loo}}\) (expected log pointwise predictive density), \(\hat{p}_{\text{loo}}\) (estimated effective number of parameters) and \(\text{looic}\) (LOO information criterion) values and their standard errors. It also returns a table with the Pareto \(k\) diagnostic values, which are used to assess the reliability of the estimates. Estimates with \(k < 0.7\) are considered reliable, while larger values indicate problems with the PSIS approximation.
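The pointwise Pareto \(k\) values can also be inspected directly, for instance to identify which observations cause unreliable importance sampling. A minimal sketch, using the normal_loo object computed above:

```r
# Extract the pointwise Pareto k diagnostics from a loo object
k_vals <- loo::pareto_k_values(normal_loo)

# Indices of observations whose k exceeds the 0.7 reliability threshold
which(k_vals > 0.7)
```

Observations flagged here are the ones dominating the importance weights; for such points the loo approximation may disagree with exact cross-validation.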
Model comparison can be done by using the loo::loo_compare function on the loo objects. The comparison is based on the models' elpd values.
R
# Comparing models based on loo
loo::loo_compare(normal_loo, cauchy_loo)
OUTPUT
elpd_diff se_diff
model2 0.0 0.0
model1 -81.9 36.2
The comparison shows that the elpd difference is larger than its standard error, indicating that the Cauchy model is expected to have better predictive performance than the normal model. This is in line with what we saw in Episode 5: the Cauchy distribution is a superior model for the data.
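The model weights mentioned at the start of this section can be computed from the same loo objects. A sketch using loo::loo_model_weights, which by default uses stacking of predictive distributions:

```r
# Model weights for averaging the two predictive distributions
loo::loo_model_weights(list(normal = normal_loo, cauchy = cauchy_loo))
```

A model that clearly dominates in predictive performance will receive a weight close to 1.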
Challenge
loo can also be used to compute WAIC for Bayesian models. Calculate WAIC for the two models and then compare them.
First we need to extract the log-likelihood values from the fitted model object.
R
# Extracting loglik
normal_loglik <- loo::extract_log_lik(normal_fit)
cauchy_loglik <- loo::extract_log_lik(cauchy_fit)
# Computing WAIC for the models
normal_waic <- loo::waic(normal_loglik)
WARNING
Warning:
1 (1.1%) p_waic estimates greater than 0.4. We recommend trying loo instead.
R
print(normal_waic)
OUTPUT
Computed from 4000 by 88 log-likelihood matrix.
Estimate SE
elpd_waic -290.0 42.8
p_waic 19.7 18.8
waic 580.1 85.6
1 (1.1%) p_waic estimates greater than 0.4. We recommend trying loo instead.
R
cauchy_waic <- loo::waic(cauchy_loglik)
print(cauchy_waic)
OUTPUT
Computed from 4000 by 88 log-likelihood matrix.
Estimate SE
elpd_waic -206.9 14.7
p_waic 2.0 0.0
waic 413.8 29.3
Computing WAIC for a model returns values for \(\widehat{\text{elpd}}_{\text{WAIC}}\), \(\hat{p}_{\text{WAIC}}\) and \(\widehat{\text{WAIC}}\). Models can be compared based on WAIC using the same function as with PSIS-LOO.
R
# Comparing models based on WAIC
loo::loo_compare(normal_waic, cauchy_waic)
OUTPUT
elpd_diff se_diff
model2 0.0 0.0
model1 -83.1 37.4
bayesplot
Next, we will look at the bayesplot R package. The package provides a library of plotting functions for fitted Stan models. The created plots are ggplot objects, meaning that they can be customized with functions from the ggplot2 package. The package enables plotting posterior draws, visual MCMC diagnostics, and graphical posterior and prior predictive checking. The functions of the package also work with models fit with the popular packages brms and rstanarm.
Example 1 continued
We will demonstrate bayesplot with the Cauchy model used in the first example. First, we need to extract the posterior draws. Then, we will plot uncertainty intervals for \(\mu\) and \(\sigma\).
R
# Extracting draws
cauchy_draws <- as.array(cauchy_fit)
# Plotting uncertainty intervals
bayesplot::mcmc_intervals(cauchy_draws, pars = c("mu", "sigma"))
Alternatively, we can plot the (marginal) posterior sample histograms or densities with credible intervals as shaded areas as follows:
R
# Plotting estimated density curves
bayesplot::mcmc_areas(cauchy_draws, pars = c("mu", "sigma"),
                      prob = 0.95,
                      point_est = "mean")
R
# Plotting histogram
bayesplot::mcmc_hist(cauchy_draws, pars = c("mu", "sigma"))
bayesplot also provides functions for assessing MCMC convergence and visualizing fit diagnostics. For example, we can generate trace plots for the chains:
R
# Plotting trace plot
bayesplot::mcmc_trace(cauchy_draws, pars = c("mu", "sigma"),
                      facet_args = list(ncol = 1))
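The joint posterior can also be examined. For instance, a scatterplot matrix of the parameters can be drawn with mcmc_pairs, using the draws extracted above:

```r
# Scatterplot matrix of the joint posterior of mu and sigma
bayesplot::mcmc_pairs(cauchy_draws, pars = c("mu", "sigma"))
```

This is useful for spotting strong posterior correlations between parameters, which can slow down sampling.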
Challenge
Perform a graphical posterior predictive check with bayesplot. Using the Cauchy model fit generated above, plot the density of the \(X_{rep}\) samples overlaid with the density of \(X\). Alternatively, you can plot the corresponding histograms.
R
# Extracting replicates and getting a subset
set.seed(2024)
X_rep <- rstan::extract(cauchy_fit, "X_rep")[[1]] %>%
  data.frame() %>%
  mutate(sample = 1:nrow(.))

N_rep <- 9
X_rep_sub <- X_rep %>%
  filter(sample %in% sample(X_rep$sample, N_rep, replace = FALSE))

# Drop the sample column (column 89) and convert to a matrix
X_rep_sub <- X_rep_sub[, -89] %>%
  as.matrix()
R
# Plot density
# Limit x range for better illustration
bayesplot::ppc_dens_overlay(y = df5$X, yrep = X_rep_sub) + xlim(-25, 50)
R
# Plot histograms
bayesplot::ppc_hist(y = df5$X, yrep = X_rep_sub) + xlim(-25,50)
brms
We will now introduce the brms R package. The package allows fitting probabilistic generalized (non-)linear models with Stan. A large range of distributions and link functions are supported, in addition to multilevel structure. Moreover, several built-in functions are available for priors. Models are specified using the familiar R formula syntax, passed to an R function which compiles and calls the Stan model in the backend.
The package also provides tools for evaluating the fit and MCMC convergence. These tools, in turn, use functions from the loo and bayesplot packages, that is, many of the same tools we covered earlier in this Episode.
Next, we will demonstrate usage of the package with two different examples.
Next, we will demonstrate usage of the package with two different examples.
Example 2: Survival modeling
In this example, we will demonstrate fitting a Cox proportional hazards model with brms. First, however, we will briefly describe the model and the main ideas of survival modeling.
The Cox model is a standard approach used in survival modeling, in which the outcome of interest is the time to some event. A common application is medical studies where patients are followed in time until an event (e.g. death) or until censoring. A subject is censored if the event doesn’t occur during the follow-up.
An important ingredient in survival modeling is the hazard function, representing the instantaneous risk for an event at time \(t\), defined as \(\lambda(t)=\text{lim}_{h \to 0+} \frac{1}{h}P(t \le T<t+h|T\ge t)\). In the Cox model, the hazard function is of the form \(\lambda(t_i,Z_i,\theta)=\lambda_0(t_i)\text{exp}(\beta^\prime Z_i)\).
The baseline hazard function \(\lambda_0(t_i)\) represents the hazard when the covariates are set to their baselines, and is the same for each subject \(i\). Commonly, the functional form of the baseline hazard is not specified. The second part of the hazard function contains the subject-specific covariates, \(\text{exp}(\beta^\prime Z_i)\).
The exponentials of the effects \(\beta\) are called hazard ratios, which measure the hazard in one group against the hazard in another group.
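As a quick numerical illustration (with a made-up coefficient value): a log hazard ratio of \(\beta = -0.5\) corresponds to a hazard ratio of \(\exp(-0.5) \approx 0.61\), i.e. roughly a 39% lower hazard in that group.

```r
# A hypothetical log hazard ratio for a binary covariate
beta <- -0.5

# The hazard ratio: hazard in one group relative to the other
exp(beta)  # approximately 0.61
```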
When fitting the Cox model, brms uses M-splines for the baseline hazard. Splines are functions built from piecewise-defined polynomials; in other words, the baseline hazard is a combination of several polynomial pieces. M-splines are non-negative spline functions, which guarantees that the baseline hazard itself stays non-negative, as a hazard must. Hopefully, the reader can appreciate the simplicity of the upcoming brms function call.
Before fitting the model, we will take a look at the lung dataset from the survival R package, which we will analyze below. The dataset consists of survival times of patients with advanced lung cancer, along with some clinical covariates.
R
# Get data
lung <- survival::lung
# Take a peek
head(lung)
OUTPUT
inst time status age sex ph.ecog ph.karno pat.karno meal.cal wt.loss
1 3 306 2 74 1 1 90 100 1175 NA
2 3 455 2 68 1 0 90 90 1225 15
3 3 1010 1 56 1 0 90 90 NA 15
4 5 210 2 57 1 1 90 60 1150 11
5 1 883 2 60 1 0 100 90 NA 0
6 12 1022 1 74 1 1 50 80 513 0
The variable status denotes whether an event (death) was observed or the subject was censored. We will use three covariates: age, sex and ph.karno. The variable ph.karno describes how well a patient can perform daily activities, as rated by a physician. We will split the variable into two categories, "high" and "low".
The Cox model can be fit with the brms::brm() function by specifying family = brmsfamily("cox"). Censored data points are indicated with the cens(1 - status) argument. We will use a standard \(\text{Normal}(0, 10)\) prior for the population-level effects, with the argument prior(normal(0, 10), class = b). The option class = b sets the prior for all population-level effects.
R
# Let's change status coding from 2/1 to 1/0
lung$status <- lung$status - 1

# Remove observations with NA ph.karno
lung <- lung[!is.na(lung$ph.karno), ]

# Creating new variable for ph.karno status
lung$ph.karno_status <- cut(lung$ph.karno,
                            breaks = c(0, 70, 100),
                            labels = c("low", "high"))

# Fitting the model
fit_cox <- brms::brm(time | cens(1 - status) ~ sex + age + ph.karno_status,
                     data = lung, family = brmsfamily("cox"), seed = 2024,
                     silent = 2, refresh = 0, cores = 4,
                     prior = prior(normal(0, 10), class = b))
# Summary of the fit
summary(fit_cox)
OUTPUT
Family: cox
Links: mu = log
Formula: time | cens(1 - status) ~ sex + age + ph.karno_status
Data: lung (Number of observations: 227)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 1.27 0.69 -0.15 2.58 1.00 5012 2490
sex -0.51 0.17 -0.83 -0.19 1.00 4789 2922
age 0.01 0.01 -0.01 0.03 1.00 5207 2779
ph.karno_statushigh -0.36 0.18 -0.72 -0.01 1.00 4534 2669
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The summary output of the brms fit prints coefficient estimates, and also returns Rhat, Bulk_ESS and Tail_ESS values, which can be used to assess the convergence of the model.
It is important to notice that the coefficients are log hazard ratios, which means we still need to exponentiate them. The bayesplot::mcmc_intervals() function allows transforming the parameters before plotting with the transform = "exp" argument.
R
# Get hazard values
sum_cox <- summary(fit_cox)
exp(sum_cox$fixed[,1:4])
OUTPUT
Estimate Est.Error l-95% CI u-95% CI
Intercept 3.5486294 1.985898 0.8641319 13.1790893
sex 0.6031900 1.179877 0.4380762 0.8288425
age 1.0131344 1.009490 0.9948428 1.0323407
ph.karno_statushigh 0.6964744 1.194624 0.4884050 0.9913700
R
# Credible intervals
bayesplot::mcmc_intervals(fit_cox,
                          pars = c("b_sex", "b_age", "b_ph.karno_statushigh"),
                          transform = "exp")
Based on the estimates, it seems that age has only a minor effect on the hazard. Female sex and being "high" in ph.karno imply smaller hazards, meaning that these factors are protective.
After fitting the model, we can print information about the priors used with the function brms::get_prior.
R
# Get priors for the cox model
brms::get_prior(fit_cox)
OUTPUT
prior class coef group resp dpar nlpar lb
normal(0, 10) b
normal(0, 10) b age
normal(0, 10) b ph.karno_statushigh
normal(0, 10) b sex
student_t(3, 5.6, 2.5) Intercept
dirichlet(1) sbhaz
ub source
user
(vectorized)
(vectorized)
(vectorized)
default
default
The population-level effects have the normal prior we specified. In brms, the default prior for the intercept is a Student's t-distribution with three degrees of freedom. The Stan program brms ran under the hood can be printed with the brms::stancode function.
R
# Print the Stan code
brms::stancode(fit_cox)
OUTPUT
// generated with brms 2.21.0
functions {
/* distribution functions of the Cox proportional hazards model
* parameterize hazard(t) = baseline(t) * mu
* so that higher values of 'mu' imply lower survival times
* Args:
* y: the response value; currently ignored as the relevant
* information is passed via 'bhaz' and 'cbhaz'
* mu: positive location parameter
* bhaz: baseline hazard
* cbhaz: cumulative baseline hazard
*/
real cox_lhaz(real y, real mu, real bhaz, real cbhaz) {
return log(bhaz) + log(mu);
}
real cox_lccdf(real y, real mu, real bhaz, real cbhaz) {
// equivalent to the log survival function
return - cbhaz * mu;
}
real cox_lcdf(real y, real mu, real bhaz, real cbhaz) {
return log1m_exp(cox_lccdf(y | mu, bhaz, cbhaz));
}
real cox_lpdf(real y, real mu, real bhaz, real cbhaz) {
return cox_lhaz(y, mu, bhaz, cbhaz) + cox_lccdf(y | mu, bhaz, cbhaz);
}
// Distribution functions of the Cox model in log parameterization
real cox_log_lhaz(real y, real log_mu, real bhaz, real cbhaz) {
return log(bhaz) + log_mu;
}
real cox_log_lccdf(real y, real log_mu, real bhaz, real cbhaz) {
return - cbhaz * exp(log_mu);
}
real cox_log_lcdf(real y, real log_mu, real bhaz, real cbhaz) {
return log1m_exp(cox_log_lccdf(y | log_mu, bhaz, cbhaz));
}
real cox_log_lpdf(real y, real log_mu, real bhaz, real cbhaz) {
return cox_log_lhaz(y, log_mu, bhaz, cbhaz) +
cox_log_lccdf(y | log_mu, bhaz, cbhaz);
}
}
data {
int<lower=1> N; // total number of observations
vector[N] Y; // response variable
array[N] int<lower=-1,upper=2> cens; // indicates censoring
int<lower=1> K; // number of population-level effects
matrix[N, K] X; // population-level design matrix
int<lower=1> Kc; // number of population-level effects after centering
// data for flexible baseline functions
int Kbhaz; // number of basis functions
// design matrix of the baseline function
matrix[N, Kbhaz] Zbhaz;
// design matrix of the cumulative baseline function
matrix[N, Kbhaz] Zcbhaz;
// a-priori concentration vector of baseline coefficients
vector<lower=0>[Kbhaz] con_sbhaz;
int prior_only; // should the likelihood be ignored?
}
transformed data {
matrix[N, Kc] Xc; // centered version of X without an intercept
vector[Kc] means_X; // column means of X before centering
for (i in 2:K) {
means_X[i - 1] = mean(X[, i]);
Xc[, i - 1] = X[, i] - means_X[i - 1];
}
}
parameters {
vector[Kc] b; // regression coefficients
real Intercept; // temporary intercept for centered predictors
simplex[Kbhaz] sbhaz; // baseline coefficients
}
transformed parameters {
real lprior = 0; // prior contributions to the log posterior
lprior += normal_lpdf(b | 0, 10);
lprior += student_t_lpdf(Intercept | 3, 5.6, 2.5);
lprior += dirichlet_lpdf(sbhaz | con_sbhaz);
}
model {
// likelihood including constants
if (!prior_only) {
// compute values of baseline function
vector[N] bhaz = Zbhaz * sbhaz;
// compute values of cumulative baseline function
vector[N] cbhaz = Zcbhaz * sbhaz;
// initialize linear predictor term
vector[N] mu = rep_vector(0.0, N);
mu += Intercept + Xc * b;
for (n in 1:N) {
// special treatment of censored data
if (cens[n] == 0) {
target += cox_log_lpdf(Y[n] | mu[n], bhaz[n], cbhaz[n]);
} else if (cens[n] == 1) {
target += cox_log_lccdf(Y[n] | mu[n], bhaz[n], cbhaz[n]);
} else if (cens[n] == -1) {
target += cox_log_lcdf(Y[n] | mu[n], bhaz[n], cbhaz[n]);
}
}
}
// priors including constants
target += lprior;
}
generated quantities {
// actual population-level intercept
real b_Intercept = Intercept - dot_product(means_X, b);
}
Example 3: Hierarchical binomial model
We will now demonstrate one of the key focuses of brms: fitting hierarchical models. The syntax for specifying hierarchical models is similar to that of the lme4 package, which is used to fit frequentist multilevel models in R.
For this example, we will use the VerbAgg data from the lme4 package. The data consist of item responses to a questionnaire on verbal aggression.
R
# Get data
VerbAgg <- lme4::VerbAgg
head(VerbAgg)
OUTPUT
Anger Gender item resp id btype situ mode r2
1 20 M S1WantCurse no 1 curse other want N
2 11 M S1WantCurse no 2 curse other want N
3 17 F S1WantCurse perhaps 3 curse other want Y
4 21 F S1WantCurse perhaps 4 curse other want Y
5 17 F S1WantCurse perhaps 5 curse other want Y
6 21 F S1WantCurse yes 6 curse other want Y
We will estimate population-level effects for Anger, Gender, btype and situ, and include a group-level intercept for id. The variable of interest is the binary r2, which contains the response to a question in the questionnaire. We will use \(\text{Normal}(0, 10)\) as the prior for all the population-level effects. For the standard deviation of the group-level effect we will set a (half-)\(\text{Cauchy}(0, 5)\) prior. By default, brms uses a half-Student's t-distribution with three degrees of freedom for standard deviation parameters. The group-level intercept for the variable id is specified with the argument (1|id). Let's now fit the model.
R
# Change coding for r2
VerbAgg <- VerbAgg %>%
  mutate(r2 = ifelse(r2 == "N", 0, 1))

# Fit model
fit_hier <- brms::brm(r2 ~ Anger + Gender + btype + situ + (1|id),
                      family = bernoulli,
                      data = VerbAgg,
                      seed = 2024, cores = 4, silent = 2, refresh = 0,
                      prior = prior(normal(0, 10), class = b) +
                        prior(cauchy(0, 5), class = sd))
# Summary
summary(fit_hier)
OUTPUT
Family: bernoulli
Links: mu = logit
Formula: r2 ~ Anger + Gender + btype + situ + (1 | id)
Data: VerbAgg (Number of observations: 7584)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~id (Number of levels: 316)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.30 0.07 1.17 1.43 1.00 1086 1687
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.21 0.34 -0.49 0.87 1.01 637 1035
Anger 0.05 0.02 0.02 0.09 1.01 664 1026
GenderM 0.31 0.19 -0.05 0.68 1.00 580 1257
btypescold -1.03 0.07 -1.17 -0.90 1.00 4963 3254
btypeshout -2.00 0.07 -2.14 -1.85 1.00 4867 3286
situself -1.01 0.06 -1.12 -0.89 1.00 5877 3022
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
The conditional effects of the predictors can easily be plotted with the function brms::conditional_effects.
R
# Conditional effects
plots <- plot(conditional_effects(fit_hier), plot = FALSE)
cowplot::plot_grid(plots[[1]], plots[[2]], plots[[3]], plots[[4]])
The function can also plot variable interactions. Let’s plot the conditional effect for interaction between Anger and btype.
R
# Plot conditional effect for interaction of Anger and btype
plot(conditional_effects(fit_hier, effects = "Anger:btype"))
Let us now make a slight alteration to the model and add another group-level intercept for the item variable. The priors are the same as in the first model. The update function can be used to modify the formula without writing it anew in its entirety.
R
# Update model
fit_hier2 <- update(fit_hier, formula. = ~ . + (1|item),
                    newdata = VerbAgg, seed = 2024,
                    cores = 4, silent = 2, refresh = 0)
# Summary
summary(fit_hier2)
OUTPUT
Family: bernoulli
Links: mu = logit
Formula: r2 ~ Anger + Gender + btype + situ + (1 | id) + (1 | item)
Data: VerbAgg (Number of observations: 7584)
Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
total post-warmup draws = 4000
Multilevel Hyperparameters:
~id (Number of levels: 316)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 1.36 0.07 1.23 1.51 1.00 1347 1922
~item (Number of levels: 24)
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept) 0.59 0.11 0.42 0.84 1.00 1404 2252
Regression Coefficients:
Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept 0.18 0.44 -0.65 1.05 1.01 831 1522
Anger 0.06 0.02 0.02 0.09 1.00 801 1568
GenderM 0.32 0.20 -0.07 0.72 1.00 847 1333
btypescold -1.06 0.30 -1.63 -0.43 1.00 1191 1561
btypeshout -2.11 0.30 -2.71 -1.52 1.00 1284 1889
situself -1.05 0.25 -1.54 -0.55 1.00 1156 1828
Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
Another useful aspect of update is that it allows resampling from the model without recompilation, for example, using a different number of iterations. However, changes to the model itself require recompilation.
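For instance, drawing more iterations from the already compiled model could look like the following sketch (update accepts the usual sampling arguments such as iter and warmup):

```r
# Resample the same model with more iterations; no recompilation needed
fit_hier_long <- update(fit_hier2, iter = 4000, warmup = 1000,
                        cores = 4, seed = 2024, refresh = 0)
```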
To end this section, let's compare the two models using brms::loo(). This works in the same way as loo::loo_compare.
R
# Compare models
brms::loo(fit_hier, fit_hier2)
OUTPUT
Output of model 'fit_hier':
Computed from 4000 by 7584 log-likelihood matrix.
Estimate SE
elpd_loo -4004.9 42.9
p_loo 268.0 3.7
looic 8009.7 85.8
------
MCSE of elpd_loo is 0.2.
MCSE and ESS estimates assume MCMC draws (r_eff in [1.1, 2.2]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Output of model 'fit_hier2':
Computed from 4000 by 7584 log-likelihood matrix.
Estimate SE
elpd_loo -3866.9 43.9
p_loo 287.0 4.1
looic 7733.8 87.8
------
MCSE of elpd_loo is 0.2.
MCSE and ESS estimates assume MCMC draws (r_eff in [1.1, 2.6]).
All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.
Model comparisons:
elpd_diff se_diff
fit_hier2 0.0 0.0
fit_hier -138.0 16.2
Based on the output, the second model provides a superior fit compared to the first model.
Challenge
Experiment with different priors for the model. How much does the chosen prior affect the results? Is there a big difference between a flat and the weakly informative prior used above?
Other packages built on Stan
In addition to the ones covered here, there are several other packages that take advantage of Stan. Here we will briefly introduce some of them. CmdStanR is a lightweight command-line-based interface for Stan and provides an alternative to rstan. rstanarm emulates popular R model-fitting functions using Stan. The package can do a lot of the same things as brms, but they do have differences; for example, rstanarm models come pre-compiled, while brms compiles the models when called.
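A minimal CmdStanR sketch (assuming CmdStan is installed and a hypothetical file normal_model.stan containing, say, the normal model from Example 1):

```r
# Compile a Stan file and sample with the CmdStanR interface
model <- cmdstanr::cmdstan_model("normal_model.stan")
fit <- model$sample(data = list(N = N, X = df5$X), seed = 2024)
fit$summary()
```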
shinystan uses Shiny to provide the user with interactive, customizable visual and numerical summaries of model parameters and convergence diagnostics. projpred performs projection predictive variable selection for various models; it works with models from brms and rstanarm. posterior provides tools for manipulating posterior draws and contains methods for common operations, such as subsetting and binding, and for producing posterior summaries and diagnostics.
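As a brief taste of posterior, the draws of any of the fits above could be summarized as follows (a sketch, using the fit_cox object from Example 2):

```r
# Convert a fitted model's draws to a data frame, one row per draw
draws <- posterior::as_draws_df(fit_cox)

# Compute summaries (means, quantiles, Rhat, ESS) for all variables
posterior::summarise_draws(draws)
```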
Key Points
- There are several R packages that provide more user-friendly ways of using Stan.
- The brms package can be used to fit a vast array of different Bayesian models.
- The bayesplot package is a library of various plotting tools.
- Approximate leave-one-out cross-validation can be performed with the loo package.
Reading
References
- [1] A. Vehtari et al., Pareto Smoothed Importance Sampling, Journal of Machine Learning Research 25 (2024) 1-58.