Tests of the General Linear Hypothesis on a Model with a Single Continuous Outcome and Categorical Variable with Three Levels
When parameters of a general linear model are estimated, analysts often report the main effects. However, sometimes the hypothesis of interest is a linear combination of the main effects that is not displayed by default in a standard regression table. Testing many hypotheses from the same linear model is especially relevant when an analyst fits a model using categorical variables where many post-hoc hypotheses are of interest. This presentation will show how to code a categorical variable for use in the general linear model and use tests of the so-called general linear hypothesis to test any number of hypotheses about the effects.
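As a rough illustration of the idea, the sketch below codes a three-level categorical variable with reference (dummy) coding, fits the model by ordinary least squares, and then tests one linear combination of the coefficients that a default regression table would not show (level B versus level C). It is written in plain Python with made-up data. The function names fit_dummy_coded and linear_hypothesis are hypothetical and are not taken from the presentation.

```python
from math import sqrt

def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment the matrix with the identity: [m | I]
    a = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]

def fit_dummy_coded(groups, y, levels):
    """OLS fit with an intercept and one indicator per non-reference level.

    The first entry of `levels` is treated as the reference level.
    Returns the coefficients, (X'X)^-1, and the residual variance.
    """
    X = [[1.0] + [1.0 if g == lev else 0.0 for lev in levels[1:]] for g in groups]
    p = len(levels)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - p)
    return beta, xtx_inv, s2

def linear_hypothesis(c, beta, xtx_inv, s2):
    """Estimate c'beta, its standard error, and the t statistic for H0: c'beta = 0."""
    k = len(c)
    est = sum(ci * bi for ci, bi in zip(c, beta))
    var = s2 * sum(c[i] * xtx_inv[i][j] * c[j] for i in range(k) for j in range(k))
    se = sqrt(var)
    return est, se, est / se

# Made-up data: four observations in each of three levels.
levels = ["A", "B", "C"]
groups = ["A"] * 4 + ["B"] * 4 + ["C"] * 4
y = [10, 12, 11, 13, 14, 15, 13, 16, 18, 17, 19, 20]

beta, xtx_inv, s2 = fit_dummy_coded(groups, y, levels)

# The default table reports B - A and C - A; the contrast (0, 1, -1) gives B - C.
est, se, t_stat = linear_hypothesis([0.0, 1.0, -1.0], beta, xtx_inv, s2)
print("coefficients:", beta)
print("B - C estimate:", est, "SE:", se, "t:", t_stat)
```

The t statistic would be compared against a t distribution with n - p residual degrees of freedom (here 12 - 3 = 9). The sketch leaves that step out because the Python standard library has no t distribution.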
It is not an overstatement to say that the default output for a Principal Components Analysis (PCA) biplot using the biplot() function in R is gross. This presentation will show you how to use the output from the prcomp() function with the ggplot2 library to create a more aesthetically pleasing and informative PCA biplot.
Using Principal Components Analysis (PCA) to Analyze Latino Stress by Agricultural Season and Occupation
Principal Components Analysis (PCA) is a commonly used unsupervised machine learning technique. In this presentation, I describe the PCA method with a general description and a geometric interpretation using simulated two- and three-dimensional data. The description of the PCA methodology is followed by an application of PCA to analyze Latino stress by agricultural season and occupation in a majority-minority agricultural area of eastern Washington State.
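The geometric interpretation in two dimensions can be sketched in a few lines: center the data, compute the sample covariance matrix, and rotate the axes so that the first new axis points in the direction of greatest variance. The example below is a minimal sketch in plain Python with a handful of made-up points. The function name pca_2d is hypothetical, and the closed-form eigen decomposition used here applies only to the two-variable case.

```python
from math import atan2, cos, sin, sqrt

def pca_2d(points):
    """PCA for two variables using the closed-form eigen decomposition
    of the 2x2 sample covariance matrix."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centered = [(x - mx, y - my) for x, y in points]

    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Eigenvalues: the variances along the two principal axes
    mean = (sxx + syy) / 2
    half_diff = sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mean + half_diff, mean - half_diff

    # Angle of the first principal axis (direction of greatest variance)
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    pc1 = (cos(theta), sin(theta))
    pc2 = (-sin(theta), cos(theta))

    # Scores: coordinates of each centered point in the rotated axes
    scores = [(x * pc1[0] + y * pc1[1], x * pc2[0] + y * pc2[1])
              for x, y in centered]
    return (lam1, lam2), (pc1, pc2), scores

# Made-up, strongly correlated two-dimensional data
points = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
eigvals, axes, scores = pca_2d(points)

share = eigvals[0] / (eigvals[0] + eigvals[1])
print("variances along PC1 and PC2:", eigvals)
print("PC1 direction:", axes[0])
print("share of variance on PC1:", share)
```

Because the two variables move together, nearly all of the variance falls on the first component, which is the sense in which PCA reduces dimension.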
Inference and Estimation of the Treatment Effect in a Two-Arm Parallel Randomized Controlled Trial that is Marginalized Over Time
In a 2-arm parallel randomized controlled trial (RCT), an outcome of interest is measured at least twice: once before the treatment is administered and once after. However, to measure the stability of an effect over time, additional time points can be added after the first follow-up. In this presentation, I consider the case where an outcome of interest is measured at a baseline, a second follow-up, and a third follow-up. In this design, it might be of interest to know the treatment effect at the third follow-up that is unconditional on the treatment effect at the second follow-up. I derive such an effect and its standard error and apply the theory to a simulated outcome that is correlated over time.
Many investigators, project managers, and data managers have turned to REDCap to manage their data. Eventually, it falls to the statistician to take the REDCap data and load it into the statistical analysis program of their choice. In this presentation, I show how to use the CSV and R script files downloaded from REDCap to create a clean R data set.
Average Marginal Effects in a 2-Arm Parallel Randomized Controlled Trial with Heterogeneity of Effects by Strata
The utility of the method of Average Marginal Effects (AME) in many contexts of statistical modeling makes the lack of accessible resources in the literature surrounding it a tragedy for both statisticians and those who consume statistics. In this presentation, I attempt to solve this problem by deriving the estimate of the AME and its standard error in the context of a common experimental design: the 2-arm parallel randomized controlled trial (RCT) with heterogeneity of effects by site. Each section is followed by straightforward programming techniques for applying the method to real data.
The foreign R package is useful for exporting data sets to all kinds of formats including files for the proprietary SPSS program from IBM. However, the default method for writing to SPSS doesn’t allow for variable labels. Instead, it defaults to labeling all of the variables by their name from the column headings. In this presentation, we will use the Hmisc R package for its variable labeling functionality and write a modification to the original SPSS export function from the foreign R package.
The general linear model (GLM), more commonly called linear regression, is the most common statistical modeling tool a statistician or data scientist will use. As such, it is crucial to know how to present results from a GLM in a way that is understandable to your audience.
An essential part of any data analysis project is to understand the data at hand.
Creating an analytic data set is an important step in any data analysis, and the resulting data set is what will be used to reproduce the results.