Transforming raw data into ICER plots using the easyicer package

library(easyicer)
library(here)

Introduction

This vignette is a tutorial on how to seamlessly create incremental cost-effectiveness ratio (ICER) analyses, including plots, by making use of all the functions in this package. This package was designed in a pipeline structure so that the end-user could cohesively deploy the package functions in a step-wise manner to arrive at a final cost-effectiveness analysis (CEA) with ICER plots. For instructive purposes, this vignette will be organized into 3 use-cases that can be integrated as a single workflow or deployed standalone. The target domain for this package is health technology assessment and economics, where CEAs are a common tool for evaluating the risk-benefit of new health interventions for implementation.

Data

The data used in this tutorial is a modified dataset obtained from the authors’ research project. The full dataset is not publicly available. The modified dataset used herein can be located in the GitHub repository for this package. For the purpose of this tutorial, we will assign the dataset to an object.

# Load the "here" package to access the .csv file in the data folder if you are in the source package repository.
easyicer_data <- read.csv(here::here("data/easyicer_data.csv"))

dim(easyicer_data)
#> [1] 828   8

head(easyicer_data, n = 10)
#>    cohort scr_age ppt_rate scr_mod year Cost_of_screen_x_reen_positives
#> 1    1973      50     0.18     FIT 2015                        27458883
#> 2    1973      50     0.18     FIT 2016                        27805623
#> 3    1973      50     0.18     FIT 2017                        27650384
#> 4    1973      50     0.18     FIT 2018                        16568149
#> 5    1973      50     0.18     FIT 2019                        16598985
#> 6    1973      50     0.18     FIT 2020                        17469883
#> 7    1973      50     0.18     FIT 2021                        17340106
#> 8    1973      50     0.18     FIT 2022                        17281809
#> 9    1973      50     0.18     FIT 2023                        22268950
#> 10   1973      50     0.18     FIT 2024                        29240148
#>    Cost_of_clinic_x_ow_up_protocol Health_adjusted_person_years
#> 1                          1660290                      2132296
#> 2                          3498682                      2141253
#> 3                          3764434                      2149907
#> 4                          6989066                      2151707
#> 5                          7421521                      2149004
#> 6                          7601313                      2147614
#> 7                          8097227                      2145495
#> 8                          8479750                      2142464
#> 9                          7999188                      2138152
#> 10                         8926254                      2132390

The dataset contains 8 variables with 828 rows of observations. This data was obtained from a microsimulation model projecting lifetime health and economic outcomes of average-risk colorectal cancer screening simulations in pre-defined Canadian birth cohorts. There are 4 predictor variables, or characteristics (cohort, scr_age, ppt_rate, scr_mod, year), and 3 outcome variables (including both cost outcomes and a health outcome). See below for full variable definitions.

Variable Dictionary

cohort: A simulated birth cohort in the Canadian population with modeled adenoma risk relative to the age group. Example: 1973 represents a cohort of individuals born between 1973 and 1977.
scr_age: The modeled age of screening initiation for each scenario. Example: 50 means that simulated individuals began screening at age 50.
ppt_rate: The modeled participation rate for colorectal cancer screening in the screening scenario. Example: 0.18 = 18% of the cohort underwent screening.
scr_mod: The modality of the screening test. Either a fecal immunochemical test (FIT) or a colonoscopy (f10q).
year: The corresponding year that outcomes were simulated for.
Cost_of_screen_x_reen_positives: The total cost of screening, including follow-up screening of positive cases.
Cost_of_clinic_x_ow_up_protocol: The cumulative cost incurred by year from clinical diagnoses and follow-up procedures.
Health_adjusted_person_years: The estimated cumulative life years of all individuals based on individually ascribed health utility scores. Herein defined as quality-adjusted life years (QALY).

Use-Case 1: Calculating cumulative outputs for cost and QALY outcomes with `summed_output()`

Often times with health economic data, we might get a large volume of raw and unprocessed outputs. This is particularly the case for simulation models that generate and display highly granular information. Often times, the individual outputs are meaningless unless they are transformed in some way (e.g., calculating a cumulative value) This is where it is helpful to deploy the summed_output() function, especially when this process is iterative. The summed_output() function calculates the sum of a numeric column in the dataset to arrive at a cumulative value. With this dataset, this is valuable because we need to calculate the sum of each output variable (costs and QALY) since they were originally calculated per year. This function also allows the end-user to group by and filter specific characteristics. For example, in our data, we have multiple birth cohorts, screening ages and screening modalities that we want to group by before calculating cumulative values. Lastly, the function can handle any NA values that might otherwise disrupt the function calculation.


# Set a filter parameter to call in filter_vars
ppt18 <- easyicer_data$ppt_rate == 0.18

# Calculate the cumulative values for the cost outcome
cumulative_cost <- summed_output(
  data = easyicer_data,
  group_vars = c("scr_mod", "scr_age"),
  sum_var = "Cost_of_screen_x_reen_positives",
  filter_vars = ppt18
)

# Calculate the cumulative values for the health outcome
cumulative_qaly <- summed_output(
  data = easyicer_data,
  group_vars = c("scr_mod", "scr_age"),
  sum_var = "Health_adjusted_person_years",
  filter_vars = ppt18
)

print(cumulative_cost)
#> # A tibble: 4 × 3
#> # Groups:   scr_mod [2]
#>   scr_mod scr_age  cumulative
#>   <chr>     <int>       <dbl>
#> 1 FIT          45 2694728140.
#> 2 FIT          50 2517849263.
#> 3 f10q         45 3420224126.
#> 4 f10q         50 3312806274.
print(cumulative_qaly)
#> # A tibble: 4 × 3
#> # Groups:   scr_mod [2]
#>   scr_mod scr_age cumulative
#>   <chr>     <int>      <dbl>
#> 1 FIT          45 205541825.
#> 2 FIT          50 205524017.
#> 3 f10q         45 205610208.
#> 4 f10q         50 205601137.

Now we have arrived at our necessary outputs to calculate ICERs. Did you notice that this function is a general sum function with the ability to sum by groups or filtered datasets? This function is not strictly intended for calculating ICERs and is versatile with other purposes, but it makes iterative sum calculations very convenient - especially when calculating ICERs. Note that we did not specifically call na.rm = TRUE, this is because it is set to TRUE as default so if it is not called in the function it will automatically remove NA values.

Use-Case 2: Calculating ICERs from cumulative cost and QALY inputs with `icercalc()`

Before we can calculate ICERs, we need to transform the data structure so that the values are recognized in the correct order by the icercalc() function. This is because with ICERs, there are reference values that correspond with a reference or comparator scenario/intervention. In the context of our dataset, this refers to scr_age = 50 and scr_mod = FIT (FIT 50), which is the current colorectal cancer screening strategy in Canada. In order to properly calculate the ICERs, the function expects a matrix structure as illustrated:

        ref     new
 cost   n00     n01
 qaly   n10     n11

Once you have created a matrix, you might realize that the order of values in the matrix is not correct. Instead of manually re-coding the matrix, you can specify rev = as "neither", "rows", "columns" or "both" to reverse the order of the columns, rows, or columns and rows. By default rev = "neither" so you do not need to specify the rev argument if the matrix is correct. For instructive purposes, the rev argument will be showcased below.


# Create matrix of cost and QALY values for each scenario
icer_matrix_fit45 <- matrix(c(
  cumulative_cost$cumulative[2],
  cumulative_cost$cumulative[1],
  cumulative_qaly$cumulative[2],
  cumulative_qaly$cumulative[1]),
  nrow = 2,
  ncol = 2,
  byrow = TRUE)

# Calculate ICERs using matrix 
icer_fit45 <- icercalc(data = icer_matrix_fit45)
#> [1] "Modified matrix:"
#>            [,1]       [,2]
#> [1,] 2517849263 2694728140
#> [2,]  205524017  205541825
#> 
#> Incremental Cost:
#>  176878878 
#> Incremental QALY:
#>  17807.78 
#> Incremental cost-effectiveness ratio (ICER):
#>  9932.676

# Re-iteration for COL 45 scenario
icer_matrix_col45 <- matrix(c(
  cumulative_cost$cumulative[2],
  cumulative_cost$cumulative[3],
  cumulative_qaly$cumulative[2],
  cumulative_qaly$cumulative[3]),
  nrow = 2,
  ncol = 2,
  byrow = TRUE)

icer_col45 <- icercalc(data = icer_matrix_col45)
#> [1] "Modified matrix:"
#>            [,1]       [,2]
#> [1,] 2517849263 3420224126
#> [2,]  205524017  205610208
#> 
#> Incremental Cost:
#>  902374863 
#> Incremental QALY:
#>  86190.82 
#> Incremental cost-effectiveness ratio (ICER):
#>  10469.5

# # Re-iteration for COL 50 scenario
icer_matrix_col50 <- matrix(c(
  cumulative_cost$cumulative[4], # The columns of the matrix are purposefully in inverse order to demonstrate rev argument.
  cumulative_cost$cumulative[2],
  cumulative_qaly$cumulative[4],
  cumulative_qaly$cumulative[2]),
  nrow = 2,
  ncol = 2,
  byrow = TRUE)

icer_col50 <- icercalc(data = icer_matrix_col50, rev = "columns")
#> [1] "Modified matrix:"
#>            [,1]       [,2]
#> [1,] 2517849263 3312806274
#> [2,]  205524017  205601137
#> 
#> Incremental Cost:
#>  794957011 
#> Incremental QALY:
#>  77119.96 
#> Incremental cost-effectiveness ratio (ICER):
#>  10308.06

The output of icercalc() displays the final matrix used in the ICER calculation, the incremental cost output (difference in cost between the new and reference scenarios), the incremental QALY (difference in QALY between new and reference scenarios) and the ICER. The component cost and QALY outcomes are displayed because if the end user works with the entire easyicer pipeline, both values will be required for the icerplot() function.

Use-Case 3: Creating ICER plots from calculated ICERs with `icerplot()`

The last stage of a basic CEA is to plot the ICERs along a coordinate system. This is helpful for visually comparing how cost-effective multiple interventions are relative to the reference intervention. The icerplot() function creates a scatterplot based on a dataframe with at least two numeric columns (representing cost and QALY outcomes). An additional and optional names argument is implemented which creates legend labels if the ICER values have corresponding names.

# Create a dataframe (if needed) with cost and QALY derivatives of ICER values.
icer_coordinates <- data.frame(
intervention = c("FIT 45", "COL 45", "COL 50"),
QALY = c(17807.78, 86190.82, 77119.96),
Cost = c(176878878, 902374863, 794957011))

icerplot(data = icer_coordinates, x = "QALY", y = "Cost", names = "intervention")

Notice that you do not need the ICER outputs from icercalc() to use this function. In fact, the ICER values calculated with icercalc() will not work since both the component costs and QALY outcomes are required in the dataframe, which is why icercalc() provides both incremental costs and QALYs as well as the ICER. This highlights that these functions can be used in an integrated manner or independently based on the end user’s needs and data structures.

Introduction

Data

Variable Dictionary

Use-Case 1: Calculating cumulative outputs for cost and QALY outcomes with summed_output()

Use-Case 2: Calculating ICERs from cumulative cost and QALY inputs with icercalc()

Use-Case 3: Creating ICER plots from calculated ICERs with icerplot()

Use-Case 1: Calculating cumulative outputs for cost and QALY outcomes with `summed_output()`

Use-Case 2: Calculating ICERs from cumulative cost and QALY inputs with `icercalc()`

Use-Case 3: Creating ICER plots from calculated ICERs with `icerplot()`