The foreign R package is useful for exporting data sets to all kinds of formats including files for the proprietary SPSS program from IBM. However, the default method for writing to SPSS doesn’t allow for variable labels. Instead, it defaults to labeling all of the variables by their name from the column headings. In this presentation, we will use the Hmisc R package for its variable labeling functionality and write a modification to the original SPSS export function from the foreign R package. If you want to test this code but don’t have SPSS, there is an open-source version available called PSPP.
This presentation uses the R programming language and assumes the end user is taking advantage of RStudio IDE to compile their R markdown files into HTML (R Core Team 2019; RStudio Team 2016). All of the files needed to reproduce these results can be downloaded from the Git repository git clone https://git.waderstats.com/export_r_to_spss_with_variable_labels/
.
The libraries knitr, bookdown, and kableExtra generate the HTML output (Xie 2019, 2018; Zhu 2019). We use the Hmisc library to set variable labels and the foreign library to export the data to SPSS (Harrell Jr, Charles Dupont, and others. 2019; R Core Team 2018).
package_loader <- function(x, ...) {
if (x %in% rownames(installed.packages()) == FALSE) install.packages(x)
library(x, ...)
}
packages <- c("knitr", "bookdown", "kableExtra", "Hmisc", "foreign")
invisible(sapply(X = packages, FUN = package_loader, character.only = TRUE))
There are two essential parts to setting up the data. First, make sure each variable is coded to reflect its class. Second, we want to add labels to each variable in the data set using the label function of the Hmisc library.
set.seed(123)
data(mpg)
mpg <- data.frame(mpg)
colnames(mpg)[which(colnames(mpg) == "manufacturer")] <- "manu"
mpg$manu <- factor(mpg$manu)
mpg$model <- factor(mpg$model)
mpg$displ <- as.numeric(mpg$displ)
mpg$year <- factor(mpg$year, levels = c("1999", "2008"), ordered = TRUE)
mpg$dp <- as.Date(NA, origin = "1970-01-01")
mpg$dp[which(mpg$year == "1999")] <- sample(seq(as.Date('1999-01-01', format = "%Y-%m-%d", origin = "1970-01-01"), as.Date('1999-12-25', format = "%Y-%m-%d", origin = "1970-01-01"), by="day"), dim(mpg)[1]/2)
mpg$dp[which(mpg$year == "2008")] <- sample(seq(as.Date('2008-01-01', format = "%Y-%m-%d", origin = "1970-01-01"), as.Date('2008-12-25', format = "%Y-%m-%d", origin = "1970-01-01"), by="day"), dim(mpg)[1]/2)
mpg$cyl <- factor(mpg$cyl, levels = c(4, 5, 6, 8), ordered = TRUE)
mpg$trans <- factor(mpg$trans)
mpg$drv <- factor(mpg$drv, levels = c("f", "r", "4"), labels = c("front-wheel drive", "rear wheel drive", "4wd"))
mpg$fl <- factor(mpg$fl)
mpg$class <- factor(mpg$class)
mpg$rn <- rnorm(dim(mpg)[1], mean = 10, sd = 5)
mpg$rn[sample(1:length(mpg$rn), size = 50)] <- NA
mpg$party <- factor(sample(c("republican", "democrat", "independent", NA), dim(mpg)[1], replace = TRUE), levels = c("republican", "democrat", "independent"))
mpg$comments <- sample(c("I like this car!", "Meh.", "This is the worst car ever!", "Does it come in green?", "want cheese flavoured cars.", "Does it also fly?", "Blah, Blah, Blah, Blah, Blah, Blah, Blah, Blah", NA), dim(mpg)[1], replace = TRUE)
label(mpg$manu) <- "manufacturer"
label(mpg$model) <- "model name"
label(mpg$displ) <- "engine displacement, in litres"
label(mpg$year) <- "year of manufacture"
label(mpg$dp) <- "date of purchase"
label(mpg$cyl) <- "number of cylinders"
label(mpg$trans) <- "type of transmission"
label(mpg$drv) <- "drive type"
label(mpg$cty) <- "city miles per gallon"
label(mpg$hwy) <- "highway miles per gallon"
label(mpg$fl) <- "fuel type"
label(mpg$class) <- "type of car"
label(mpg$rn) <- "some random numbers that are generated from a normal distrubtion with mean = 10 and sd = 5"
label(mpg$party) <- "some random political parties"
label(mpg$comments) <- "some random comments"
kable(head(mpg), caption = "Header of <b>mpg</b>.", booktabs = TRUE, escape = FALSE) %>% kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"))
manu | model | displ | year | cyl | trans | drv | cty | hwy | fl | class | dp | rn | party | comments |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
audi | a4 | 1.8 | 1999 | 4 | auto(l5) | front-wheel drive | 18 | 29 | p | compact | 1999-06-28 | 18.98570 | democrat | I like this car! |
audi | a4 | 1.8 | 1999 | 4 | manual(m5) | front-wheel drive | 21 | 29 | p | compact | 1999-01-14 | NA | NA | I like this car! |
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | front-wheel drive | 20 | 31 | p | compact | 2008-02-08 | 19.50450 | independent | Does it also fly? |
audi | a4 | 2.0 | 2008 | 4 | auto(av) | front-wheel drive | 21 | 30 | p | compact | 2008-07-14 | NA | independent | I like this car! |
audi | a4 | 2.8 | 1999 | 6 | auto(l5) | front-wheel drive | 16 | 26 | p | compact | 1999-07-14 | 13.68097 | democrat | Meh. |
audi | a4 | 2.8 | 1999 | 6 | manual(m5) | front-wheel drive | 18 | 26 | p | compact | 1999-11-02 | 16.82888 | NA | Meh. |
The original unexported function can be viewed with the command foreign:::writeForeignSPSS. We are going to create a new function called writeForeignSPSS2. Below is the code with comments where changes are made.
# The function below isn't exported from the foreign package, so we recreate it here. Another option is the use the ::: operator but the documention for R discourages this use case. We will take their word for it :).
adQuote <- function(x) paste("\"", x, "\"", sep = "")
# The original function doesn't deal with dates. To address this we need to check if the variable is of class date to later store is a character string.
is.date <- function(x) inherits(x, 'Date')
# The original function uses the strwrap function that can sometime cause a split string to begin with a '*'. This function prevents this problem.
spssSafeSplit <- function(x) {
spssSplit <- strwrap(paste(x, collapse = " "), width = 70)
spssInvalid <- which(sapply(X = spssSplit, FUN = substring, 1, 1) == "*")
if (length(spssInvalid) == 0) return(spssSplit)
for (i in 1:length(spssInvalid)) {
spssSplit[spssInvalid[i]] <- gsub("* ", "", spssSplit[spssInvalid[i]], fixed = TRUE)
spssSplit[spssInvalid[i] - 1] <- paste(spssSplit[spssInvalid[i] - 1], "*")
}
return(spssSplit)
}
writeForeignSPSS2 <- function (df, datafile, codefile, varnames = NULL, maxchars = 32L) {
dfn <- lapply(df, function(x) if (is.factor(x)) as.numeric(x) else x)
# write.table(dfn, file = datafile, row.names = FALSE, col.names = FALSE, sep = ",", quote = FALSE, na = "", eol = ",\n")
write.table(dfn, file = datafile, row.names = FALSE, col.names = FALSE, sep = "\t", quote = FALSE, na = ".", eol = "\n") # The original code creates a comma-delimited file. In many data sets, there are comment columns that include commas which creates unexpected results. To avoid that I change it over to create tab-delimited data.
# varlabels <- names(df)
varlabels <- label(df) # We don't want the names of df but the labels.
if (is.null(varnames)) {
varnames <- abbreviate(names(df), maxchars)
if (any(sapply(varnames, nchar) > maxchars)) stop("I cannot abbreviate the variable names to 'maxchars' or fewer chars")
# if (any(varnames != varlabels)) warning("some variable names were abbreviated")
if (any(varnames != names(varlabels))) warning("some variable names were abbreviated") # The original code has an error check for the max length of variable names. We can still have the error check, but we need to use the names of varlables instead.
}
varnames <- gsub("[^[:alnum:]_\\$@#]", "\\.", varnames)
dl.varnames <- varnames
chv <- sapply(df, is.character)
fav <- sapply(df, is.factor)
dav <- sapply(df, is.date) # This is added in to deal with date class columns.
if (any(chv)) {
lengths <- sapply(df[chv], function(v) max(c(nchar(v), 8), na.rm = TRUE))
lengths <- paste0("(A", lengths, ")")
dl.varnames[chv] <- paste(dl.varnames[chv], lengths)
}
# Store any date class columns as a character string in SPSS. It is the same as above, but coerces the date to a character vector.
if (any(dav)) {
lengths <- sapply(df[dav], function(v) max(c(nchar(as.character(v)), 8), na.rm = TRUE))
lengths <- paste0("(A", lengths, ")")
dl.varnames[dav] <- paste(dl.varnames[dav], lengths)
}
if (any(fav)) {
dl.varnames[fav] <- paste(dl.varnames[fav], "(F8.0)")
}
# if (any(chv) || any(fav)) {
# star <- ifelse(c(FALSE, diff(chv | fav) == 1)[chv | fav], " *", " ")
# dl.varnames[chv | fav] <- paste(star, dl.varnames[chv | fav])
# }
# In the code above, the star detection isn't setup for dates. We can add this simple enough, as shown below.
if (any(chv) || dav || any(fav)) {
star <- ifelse(c(FALSE, diff(chv | dav | fav) == 1)[chv | dav | fav], " *", " ")
dl.varnames[chv | dav | fav] <- paste(star, dl.varnames[chv | dav | fav])
}
cat("SET DECIMAL=DOT.\n\n", file = codefile)
# cat("DATA LIST FILE=", adQuote(datafile), " free (\",\")\n", file = codefile, append = TRUE)
cat("DATA LIST FILE=", adQuote(datafile), " free (TAB)\n", file = codefile, append = TRUE) # Since we changed to a tab-delmited file, we also need to change the import format to TAB.
cat("ENCODING=\"Locale\"\n", file = codefile, append = TRUE)
# cat("/", paste(strwrap(paste(dl.varnames, collapse = " "), width = 70), "\n"), " .\n\n", file = codefile, append = TRUE)
cat("/", paste(spssSafeSplit(paste(dl.varnames, collapse = " ")), "\n"), " .\n\n", file = codefile, append = TRUE) # The original code forces a string wrap. If a new line ends up begining with a '*" weird things happen so we use the spssSafeSplit function to prevent this.
cat("VARIABLE LABELS\n", file = codefile, append = TRUE)
cat(paste(varnames, adQuote(varlabels), "\n"), ".\n", file = codefile, append = TRUE)
if (any(fav)) {
cat("\nVALUE LABELS\n", file = codefile, append = TRUE)
for (v in which(fav)) {
cat("/\n", file = codefile, append = TRUE)
cat(varnames[v], " \n", file = codefile, append = TRUE, sep = "")
levs <- levels(df[[v]])
cat(paste(seq_along(levs), adQuote(levs), "\n", sep = " "), file = codefile, append = TRUE)
}
cat(".\n", file = codefile, append = TRUE)
}
ord <- sapply(df, is.ordered)
if (any(ord)) cat("VARIABLE LEVEL", paste(strwrap(paste(varnames[ord], collapse = ", "), width = 70), "\n"), "(ordinal).\n", file = codefile, append = TRUE)
num <- sapply(df, is.numeric)
if (any(num)) cat("VARIABLE LEVEL", paste(strwrap(paste(varnames[num], collapse = ", "), width = 70), "\n"), "(scale).\n", file = codefile, append = TRUE)
cat("\nEXECUTE.\n", file = codefile, append = TRUE)
}
write.foreign(mpg, datafile = "export_r_to_spss_with_variable_labels.txt", codefile = "export_r_to_spss_with_variable_labels.sps", package = "SPSS2", maxchars = 64L)
If you are using a console you can create you .sav file with the following commands, otherwise if you are using the SPSS GUI, click away!
cat("\nSAVE OUTFILE = \"export_r_to_spss_with_variable_labels.sav\".", file = "export_r_to_spss_with_variable_labels.sps", append = TRUE)
system("pspp export_r_to_spss_with_variable_labels.sps")
Finally, we can view the results to make sure everything worked as expected.
get file='export_r_to_spss_with_variable_labels.sav'.
display dictionary.
list /cases=from 1 to 5.
EXECUTE.
cat(system("pspp export_r_to_spss_with_variable_labels_results.sps", intern = TRUE), sep = "\n")
## Variables
## +--------+--------+-------------+-----------+-----+-----+---------+------+
## | | | |Measurement| | | | Print|
## |Name |Position| Label | Level | Role|Width|Alignment|Format|
## +--------+--------+-------------+-----------+-----+-----+---------+------+
## |manu | 1|manufacturer |Scale |Input| 8|Right |F8.0 |
## |model | 2|model name |Scale |Input| 8|Right |F8.0 |
## |displ | 3|engine |Scale |Input| 8|Right |F8.2 |
## | | |displacement,| | | | | |
## | | |in litres | | | | | |
## |year | 4|year of |Ordinal |Input| 8|Right |F8.0 |
## | | |manufacture | | | | | |
## |cyl | 5|number of |Ordinal |Input| 8|Right |F8.0 |
## | | |cylinders | | | | | |
## |trans | 6|type of |Scale |Input| 8|Right |F8.0 |
## | | |transmission | | | | | |
## |drv | 7|drive type |Scale |Input| 8|Right |F8.0 |
## |cty | 8|city miles |Scale |Input| 8|Right |F8.2 |
## | | |per gallon | | | | | |
## |hwy | 9|highway miles|Scale |Input| 8|Right |F8.2 |
## | | |per gallon | | | | | |
## |fl | 10|fuel type |Scale |Input| 8|Right |F8.0 |
## |class | 11|type of car |Scale |Input| 8|Right |F8.0 |
## |dp | 12|date of |Nominal |Input| 10|Left |A10 |
## | | |purchase | | | | | |
## |rn | 13|some random |Scale |Input| 8|Right |F8.2 |
## | | |numbers that | | | | | |
## | | |are generated| | | | | |
## | | |from a normal| | | | | |
## | | |distrubtion | | | | | |
## | | |with mean = | | | | | |
## | | |10 and sd = 5| | | | | |
## |party | 14|some random |Scale |Input| 8|Right |F8.0 |
## | | |political | | | | | |
## | | |parties | | | | | |
## |comments| 15|some random |Nominal |Input| 32|Left |A46 |
## | | |comments | | | | | |
## +--------+--------+-------------+-----------+-----+-----+---------+------+
##
## +--------+------+-------+
## | | Write|Missing|
## |Name |Format| Values|
## +--------+------+-------+
## |manu |F8.0 | |
## |model |F8.0 | |
## |displ |F8.2 | |
## | | | |
## | | | |
## |year |F8.0 | |
## | | | |
## |cyl |F8.0 | |
## | | | |
## |trans |F8.0 | |
## | | | |
## |drv |F8.0 | |
## |cty |F8.2 | |
## | | | |
## |hwy |F8.2 | |
## | | | |
## |fl |F8.0 | |
## |class |F8.0 | |
## |dp |A10 | |
## | | | |
## |rn |F8.2 | |
## | | | |
## | | | |
## | | | |
## | | | |
## | | | |
## | | | |
## |party |F8.0 | |
## | | | |
## | | | |
## |comments|A46 | |
## | | | |
## +--------+------+-------+
##
## Value Labels
## +--------------------------------+----------------------+
## |Variable Value | Label |
## +--------------------------------+----------------------+
## |manufacturer 1 |audi |
## | 2 |chevrolet |
## | 3 |dodge |
## | 4 |ford |
## | 5 |honda |
## | 6 |hyundai |
## | 7 |jeep |
## | 8 |land rover |
## | 9 |lincoln |
## | 10|mercury |
## | 11|nissan |
## | 12|pontiac |
## | 13|subaru |
## | 14|toyota |
## | 15|volkswagen |
## |model name 1 |4runner 4wd |
## | 2 |a4 |
## | 3 |a4 quattro |
## | 4 |a6 quattro |
## | 5 |altima |
## | 6 |c1500 suburban 2wd |
## | 7 |camry |
## | 8 |camry solara |
## | 9 |caravan 2wd |
## | 10|civic |
## | 11|corolla |
## | 12|corvette |
## | 13|dakota pickup 4wd |
## | 14|durango 4wd |
## | 15|expedition 2wd |
## | 16|explorer 4wd |
## | 17|f150 pickup 4wd |
## | 18|forester awd |
## | 19|grand cherokee 4wd |
## | 20|grand prix |
## | 21|gti |
## | 22|impreza awd |
## | 23|jetta |
## | 24|k1500 tahoe 4wd |
## | 25|land cruiser wagon 4wd|
## | 26|malibu |
## | 27|maxima |
## | 28|mountaineer 4wd |
## | 29|mustang |
## | 30|navigator 2wd |
## | 31|new beetle |
## | 32|passat |
## | 33|pathfinder 4wd |
## | 34|ram 1500 pickup 4wd |
## | 35|range rover |
## | 36|sonata |
## | 37|tiburon |
## | 38|toyota tacoma 4wd |
## |year of manufacture 1 |1999 |
## | 2 |2008 |
## |number of cylinders 1 |4 |
## | 2 |5 |
## | 3 |6 |
## | 4 |8 |
## |type of transmission 1 |auto(av) |
## | 2 |auto(l3) |
## | 3 |auto(l4) |
## | 4 |auto(l5) |
## | 5 |auto(l6) |
## | 6 |auto(s4) |
## | 7 |auto(s5) |
## | 8 |auto(s6) |
## | 9 |manual(m5) |
## | 10|manual(m6) |
## |drive type 1 |front-wheel drive |
## | 2 |rear wheel drive |
## | 3 |4wd |
## |fuel type 1 |c |
## | 2 |d |
## | 3 |e |
## | 4 |p |
## | 5 |r |
## |type of car 1 |2seater |
## | 2 |compact |
## | 3 |midsize |
## | 4 |minivan |
## | 5 |pickup |
## | 6 |subcompact |
## | 7 |suv |
## |some random political parties 1 |republican |
## | 2 |democrat |
## | 3 |independent |
## +--------------------------------+----------------------+
##
## Data List
## +----+-----+-----+----+---+-----+---+-----+-----+--+-----+----------+-----+
## |manu|model|displ|year|cyl|trans|drv| cty | hwy |fl|class| dp | rn |
## +----+-----+-----+----+---+-----+---+-----+-----+--+-----+----------+-----+
## | 1| 2| 1.80| 1| 1| 4| 1|18.00|29.00| 4| 2|1999-06-28|18.99|
## | | | | | | | | | | | | | |
## | | | | | | | | | | | | | |
## | 1| 2| 1.80| 1| 1| 9| 1|21.00|29.00| 4| 2|1999-01-14| .|
## | | | | | | | | | | | | | |
## | | | | | | | | | | | | | |
## | 1| 2| 2.00| 2| 1| 10| 1|20.00|31.00| 4| 2|2008-02-08|19.50|
## | | | | | | | | | | | | | |
## | | | | | | | | | | | | | |
## | 1| 2| 2.00| 2| 1| 1| 1|21.00|30.00| 4| 2|2008-07-14| .|
## | | | | | | | | | | | | | |
## | | | | | | | | | | | | | |
## | 1| 2| 2.80| 1| 3| 4| 1|16.00|26.00| 4| 2|1999-07-14|13.68|
## +----+-----+-----+----+---+-----+---+-----+-----+--+-----+----------+-----+
##
## +-----+--------+
## |party|comments|
## +-----+--------+
## | 2|I like |
## | |this |
## | |car! |
## | .|I like |
## | |this |
## | |car! |
## | 3|Does it |
## | |also |
## | |fly? |
## | 3|I like |
## | |this |
## | |car! |
## | 2|Meh. |
## +-----+--------+
sessionInfo()
## R version 4.0.3 (2020-10-10)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices datasets utils methods base
##
## other attached packages:
## [1] foreign_0.8-79 Hmisc_4.4-1 ggplot2_3.3.2 Formula_1.2-4
## [5] survival_3.2-7 lattice_0.20-41 kableExtra_1.3.1 bookdown_0.21
## [9] knitr_1.30
##
## loaded via a namespace (and not attached):
## [1] xfun_0.19 splines_4.0.3 colorspace_2.0-0
## [4] vctrs_0.3.5 htmltools_0.5.0 viridisLite_0.3.0
## [7] yaml_2.2.1 base64enc_0.1-3 rlang_0.4.8
## [10] pillar_1.4.6 glue_1.4.2 withr_2.3.0
## [13] RColorBrewer_1.1-2 jpeg_0.1-8.1 lifecycle_0.2.0
## [16] stringr_1.4.0 munsell_0.5.0 gtable_0.3.0
## [19] rvest_0.3.6 htmlwidgets_1.5.2 evaluate_0.14
## [22] latticeExtra_0.6-29 htmlTable_2.1.0 backports_1.2.0
## [25] checkmate_2.0.0 renv_0.12.2 scales_1.1.1
## [28] webshot_0.5.2 gridExtra_2.3 png_0.1-7
## [31] digest_0.6.27 stringi_1.5.3 grid_4.0.3
## [34] tools_4.0.3 magrittr_2.0.1 tibble_3.0.4
## [37] cluster_2.1.0 crayon_1.3.4 pkgconfig_2.0.3
## [40] ellipsis_0.3.1 Matrix_1.2-18 data.table_1.13.2
## [43] xml2_1.3.2 rmarkdown_2.5 httr_1.4.2
## [46] rstudioapi_0.13 R6_2.5.0 rpart_4.1-15
## [49] nnet_7.3-14 compiler_4.0.3
Harrell Jr, Frank E, with contributions from Charles Dupont, and many others. 2019. Hmisc: Harrell Miscellaneous. https://CRAN.R-project.org/package=Hmisc.
R Core Team. 2018. Foreign: Read Data Stored by ’Minitab’, ’S’, ’Sas’, ’Spss’, ’Stata’, ’Systat’, ’Weka’, ’dBase’, ... https://CRAN.R-project.org/package=foreign.
———. 2019. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
RStudio Team. 2016. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, Inc. http://www.rstudio.com/.
Xie, Yihui. 2018. Bookdown: Authoring Books and Technical Documents with R Markdown. https://github.com/rstudio/bookdown.
———. 2019. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://yihui.name/knitr/.
Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.