Introduction
This document is intended to get you started with the amp.dm package. The package was developed to ease the process of creating NONMEM datasets, but can in principle be used for any other dataset within the field of pharmacometrics.
Constructing an analysis data set is highly data driven, and the strategy depends to a great extent on the design of a study. However, certain steps are necessary in almost all cases, and this package contains functions to help with these steps. An important part of coding in the pharmaceutical industry is logging and documentation; the amp.dm package includes various functions to help in this process.
Documentation and logging
An important part of pharmacometric analyses is the documentation and logging of the various steps that have been performed. This is important when communicating between data management and modelers, as well as for submission purposes. Primary information regarding the meaning of variables, units of measurement, or the (de)coding of categories is key to understanding the data. Furthermore, information on records that have been dropped/added is essential. Other information, such as statistics or system information, provides a complete overview of the data management process.
At the base of this is the construction of data sets using rmarkdown. This workflow makes it easy to add comments regarding the data management process, and various types of tables with important information can easily be provided here. On top of this, amp.dm has various functions that log information or present it within an rmarkdown document.
Functions logging results
The package has a few functions that log results, which can be used to add to the documentation at a later stage. These functions are mainly wrappers around existing functions but have additional options for logging. See below for the three main functions that are available:
library(dplyr)
library(amp.dm)
xmpl <- system.file("example/NM.theoph.V1.csv",package="amp.dm")
# The read_data function can read most common formats; for less common formats
# a manual function can be passed to enable documenting the process
dat <- read_data(xmpl, comment="Read example data")
ℹ Read in '/home/runner/work/_temp/Library/amp.dm/example/NM.theoph.V1.csv' which has 288 records and 19 variables
# We can filter data with logging
dat2 <- filterr(dat,STIME<2, comment = "remove time-points") %>%
select(ID,STIME) %>% mutate(FLAG=1)
ℹ Filter applied with 168 record(s) deleted
# We can also join with logging
dat3 <- left_joinr(dat2, dat, comment = "example join")
Joining with `by = join_by(ID, STIME)`
ℹ Output data contains 168 records
ℹ dat2 contained 120 records
ℹ dat contained 288 records
! Be aware for possible cartesian product
The functions above will provide some additional information in the console. On top of this, all relevant information is saved in the package environment and can be shown using the get_log function:
get_log()
$filterr_nfo
datain coding datainrows dataoutrows rowsdropped comment
1 dat STIME < 2 288 120 168 remove time-points
$joinr_nfo
datainl datainr datainrowsl datainrowsr dataoutrowsl dataoutrows comment
1 dat2 dat 120 288 0 168 example join
$read_nfo
datain datainrows
1 /home/runner/work/_temp/Library/amp.dm/example/NM.theoph.V1.csv 288
dataincols comment
1 19 Read example data
Besides the functions above there are two other functions that can be used for logging and documentation:

1. The cmnt function can be used to provide a comment regarding a piece of code within a large code block. This can then be presented after a code chunk (using cmnt_print). This is mainly useful to list items that need special attention.
2. The srce function can be used to identify where certain variables derive from. This information can be used later on in the documentation, which is particularly useful for registration purposes.
cmnt("**Be aware** that *ID 1* is removed using `subset`")
dat4 <- subset(dat,ID!=1, select=-BMI)
srce(BMI,c(dat4.WEIGHT,dat4.HEIGHT),'d')
dat4$BMI <- dat4$WEIGHT/(dat4$HEIGHT)^2
# Note it is easier to directly use inline code, e.g.: `r cmnt_print()`
cat(cmnt_print())
Assumptions and special attention:

- **Be aware** that *ID 1* is removed using `subset`
# This is also available in tabulation functions e.g. define_tbl
get_log()$srce_nfo
variable type source
1 BMI d dat4.WEIGHT, dat4.HEIGHT
Handling of attributes
Data attributes hold vital meta information about a constructed data set. Mainly, an explanation of the variables, their units, and the way they were constructed is key. Additionally, mainly for NONMEM analyses, it is important to provide an explanation for categorical variables. NONMEM can only handle numeric values; this means that categorical data like gender and country must be re-coded as numerics. The meaning of these categories is important to understand the content of the data.
Data attributes can be created in an Excel file. In such a file, all the variables of a data set are listed with the corresponding meta information. When a data set is constructed, the meta data can be obtained (using the attr_xls function) and used in various ways, which is explained further on. A template of such an Excel file is available in the package (see system.file("example/Attr.Template.xlsx", package="amp.dm")).
The other functions available to work with attributes in the package are:

- The attr_add function; this can be used to add attributes to a data set.
- The attr_extract function; this can be used to extract attributes from a data set.
- The attr_factor function; this can be used to create factors for numerical/categorical variables within a data set.
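As a sketch of how these could fit together (the exact argument names and return values below are assumptions, not confirmed by this vignette; consult the function help pages):

```r
library(amp.dm)

# Hypothetical sketch: argument names are assumptions
tmpl  <- system.file("example/Attr.Template.xlsx", package = "amp.dm")
attrs <- attr_xls(tmpl)         # obtain the meta data from the Excel file
dat   <- attr_add(dat, attrs)   # add the attributes to a constructed data set
attr_extract(dat)               # extract the attributes from the data set
dat   <- attr_factor(dat)       # create factors for categorical variables
```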
Tabulation and checking
When a data set is constructed using the functions in the previous section, results can be tabulated using various functions. The define_tbl function can be used to present a table of the attributes of a data set. The resulting table is typically directly usable for a ‘define.pdf’ file. Another important table for reviewing the data can be generated using the stats_df function. This function will show some simple statistics including ranges, missing data and number of categories of a data set. The counts_df function can be used to show the number of records or unique subjects, stratified over one or multiple variables. Finally, information from the functions that log results (e.g. reading, filtering or joining data) can be tabulated using the log_df function. A more specific function to mention is the check intended for NONMEM data implemented in check_nmdata. This function will check whether a data set meets the minimum requirements to be used in a NONMEM model. You can also check for non-essential requirements that could trigger further investigation.
All of these functions create a LaTeX table using the general_tbl function. This ensures that results are presented nicely and uniformly when placed in an rmarkdown or quarto chunk (using the “asis” option), e.g.
\begin{longtable}{l}
\caption{General table} \\
\toprule result \\
\midrule\endhead this is a test \\
\hline
\end{longtable}
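As an illustration, the tabulation functions could be combined in an rmarkdown chunk roughly as follows (a sketch only; the exact signatures are assumptions and should be checked against the help pages):

```r
library(amp.dm)

# Hypothetical sketch: exact arguments are assumptions
define_tbl(dat)    # table of attributes, e.g. for a 'define.pdf' file
stats_df(dat)      # simple statistics: ranges, missing data, categories
counts_df(dat)     # records/unique subjects, stratified over variable(s)
log_df()           # tabulate the logged reading/filtering/joining results
check_nmdata(dat)  # check the minimum requirements for a NONMEM model
```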
Analysis functions
There are multiple functions implemented in the package that are quite specific to NONMEM analyses. These mainly include the following:

- time_calc; creates time variables for usage in NONMEM analyses
- expand_addl_ii; expands rows in case the NONMEM ADDL and II variables are present
- fill_dates; fills down dates within a data frame that includes a start and end date. Although not strictly for NONMEM datasets, it is often used there to fill out dose records
- impute_dose; imputes dose records using ADDL and II by looking forwards and backwards
- create_addl and expand_addl_ii; work with dose levels to reduce the size of the data by creating ADDL and II records, or expand dose records by looking at ADDL/II data
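A minimal sketch of how these might be chained (the argument names are assumptions here, not taken from the package documentation):

```r
library(amp.dm)

# Hypothetical sketch: signatures are assumptions
dat <- time_calc(dat)        # derive time variables for the NONMEM analysis
dat <- fill_dates(dat)       # fill down dates between start and end dates
dat <- create_addl(dat)      # compress repeated doses into ADDL/II records
dat <- expand_addl_ii(dat)   # or expand dose records based on ADDL/II
```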
There are other functions that are not directly restricted to NONMEM usage but are often used to create common variables. For example, the egfr function calculates the estimated glomerular filtration rate using different formulas, and weight_height calculates various metrics like BMI, LBM and FFM.
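For instance (again a sketch; the arguments shown are assumptions for illustration):

```r
library(amp.dm)

# Hypothetical sketch: argument names are assumptions
dat <- weight_height(dat)  # derive metrics such as BMI, LBM and FFM
dat <- egfr(dat)           # add the estimated glomerular filtration rate
```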
Conclusion
Although there are other functions available in the package, this vignette should provide a solid starting point for using it. Additionally, the example study vignette provides a practical example of how the functions can be used and what the final documentation of such a dataset will look like.