library(xtable)
library(DT)
library(dplyr)
Many functions and packages summarize the variables in a data frame. However, they seem to offer little flexibility in how the output can be manipulated or exported, for instance when including it in .tex files.
In this post, we implement a simple function that depends entirely on base R and generates a more flexible variable summary object.
summarizeDf <- function(df, output = c("simple", "tex"), digits = 1){
  # Validate `output` only when the caller supplies it; the default behaves as "simple"
  if (!missing(output) && any(!output %in% c("simple", "tex"))){
    stop("output can only be 'simple' or 'tex'")
  }
  vars <- colnames(df)
  # One row per variable, filled in by the loop below
  df_summary <- data.frame(Variable = rep(NA, length(vars))
    , Type = rep(NA, length(vars))
    , Summary = rep(NA, length(vars))
  )
  for (i in seq_along(vars)){
    # `[[` returns a plain vector for data frames and tibbles alike
    vals <- df[[vars[[i]]]]
    if (is.numeric(vals)){
      # Numeric or integer: "[min, max]; mean (sd)"
      df_summary[["Type"]][[i]] <- "numeric"
      df_summary[["Variable"]][[i]] <- vars[[i]]
      df_summary[["Summary"]][[i]] <- paste0("["
        , round(min(vals), digits), ", "
        , round(max(vals), digits), "]; "
        , round(mean(vals), digits), " ("
        , round(sd(vals), digits), ")"
      )
    } else{
      # Everything else: percentage per category, most frequent first
      df_summary[["Type"]][[i]] <- "categorical"
      df_summary[["Variable"]][[i]] <- vars[[i]]
      perc <- sort(round(prop.table(table(vals))*100, digits)
        , decreasing = TRUE
      )
      if (missing(output) || "simple" %in% output){
        perc <- paste0(names(perc), " (", perc, "%)")
        df_summary[["Summary"]][[i]] <- paste0(perc
          , collapse = ";\n "
        )
      } else{
        # Escape % for LaTeX and start a new table row for each category
        perc <- paste0(names(perc), " (", perc, "\\%)")
        df_summary[["Summary"]][[i]] <- paste0(perc
          , collapse = "; \\\\ & & "
        )
      }
    }
  }
  return(df_summary)
}
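To see what the Summary column holds, the two string formats can be reproduced on toy vectors outside the function. This is only a minimal sketch: the vectors `num_vals` and `cat_vals` are made up for illustration and are not part of the survey data used later.

```r
# Toy inputs (hypothetical, for illustration only)
num_vals <- c(1, 2, 3, 4)
cat_vals <- c("a", "b", "a", "a")
digits <- 1

# Numeric format: [min, max]; mean (sd)
num_summary <- paste0("["
  , round(min(num_vals), digits), ", "
  , round(max(num_vals), digits), "]; "
  , round(mean(num_vals), digits), " ("
  , round(sd(num_vals), digits), ")"
)

# Categorical format: percentages sorted from most to least frequent
perc <- sort(round(prop.table(table(cat_vals)) * 100, digits)
  , decreasing = TRUE
)
cat_summary <- paste0(names(perc), " (", perc, "%)", collapse = "; ")

print(num_summary)  # "[1, 4]; 2.5 (1.3)"
print(cat_summary)  # "a (75%); b (25%)"
```

Both strings are plain character values, which is what makes the resulting data frame easy to pass on to other tools.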
summarizeDf
Summarizes a data frame. Computes [min, max]; mean (sd) for numeric or integer variables and the frequency distribution (percent) for categorical variables.
Inputs:
df - Input data frame.
output - Specifies the output structure. output = "simple" returns R-output-like output; output = "tex" returns an xtable-ready format.
digits - Number of digits to round to.
Details:
For categorical variables with several categories, output = "tex" is preferable. Add sanitize.text.function = function(x){x} to the xtable print function when producing .tex output.
Value:
Returns an object of class data.frame with one row per variable and the columns Variable, Type and Summary.
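The note about sanitize.text.function in the Details can be checked on a one-row table. The sketch below assumes the xtable package is installed; the data frame `toy` is hypothetical and simply mimics a row whose text is already LaTeX-escaped, as output = "tex" produces. It uses the `print.results = FALSE` argument of print.xtable to capture the LaTeX code as a string instead of printing it.

```r
library(xtable)

# Hypothetical one-row summary whose text is already LaTeX-escaped
toy <- data.frame(Variable = "gender"
  , Type = "categorical"
  , Summary = "Female (63.2\\%); Male (36.8\\%)"
  , stringsAsFactors = FALSE
)

# Default sanitizer: xtable escapes the backslash and the percent sign again
escaped <- print(xtable(toy)
  , include.rownames = FALSE
  , print.results = FALSE
)

# Identity sanitizer: the text is passed through untouched
as_is <- print(xtable(toy)
  , include.rownames = FALSE
  , sanitize.text.function = function(x){x}
  , print.results = FALSE
)

cat(as_is)
```

Comparing `escaped` with `as_is` shows why the identity function is needed: without it, the backslashes that the summary already contains are escaped a second time.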
To demonstrate this, we use social media survey data described in this post.
smedia_df <- read.csv("../datasets/multi_response.csv")
datatable(smedia_df, rownames = FALSE)
First, output the default R-like summary (see Value above).
smedia_df1 <- select(smedia_df, -c("doi"))
smedia_summary <-(smedia_df1
%>% summarizeDf(.)
)
datatable(smedia_summary, rownames = FALSE)
Next, generate a simple table with xtable, rendered here as HTML.
smedia_summary <-(smedia_df1
%>% summarizeDf(., output = "simple")
%>% xtable(., caption = "Simple data summary")
)
summary.tex <- print(smedia_summary, sanitize.text.function = function(x){x}
, type = "html"
, scalebox = 0.5
, include.rownames = FALSE
, caption.placement = "top"
)
Variable | Type | Summary |
---|---|---|
gender | categorical | Female (63.2%); Male (36.8%) |
age | numeric | [19, 28]; 22.1 (2) |
smedia_used | categorical | Facebook, Whatsapp (26.3%); Twitter, Facebook, Instagram, Whatsapp (18.4%); Twitter, Facebook, Whatsapp (18.4%); Facebook, Instagram, Whatsapp (10.5%); Facebook (5.3%); Twitter, Facebook, Instagram, Pinterest, Whatsapp (5.3%); Twitter, Instagram, Whatsapp (5.3%); Twitter, Facebook, Instagram, Pinterest, Whatsapp, Viber (2.6%); Twitter, Facebook, Pinterest, Whatsapp (2.6%); Twitter, Whatsapp (2.6%); Whatsapp (2.6%) |
freq_usage | categorical | Never (34.2%); Multiple times a day (26.3%); A few times a week (18.4%); At least once a day (10.5%); A few times a month (5.3%); Rarely (5.3%) |
freq_post | categorical | Never (36.8%); Rarely (23.7%); A few times a week (15.8%); A few times a month (10.5%); At least once a day (7.9%); Multiple times a day (5.3%) |
You can also generate a .tex file to include in other .tex files by changing type = "latex" and adding file = "filename.tex". See print.xtable.
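As a rough sketch of that last step, the snippet below writes a small table to a .tex file. It assumes the xtable package is installed; the data frame `toy` is hypothetical, and a temporary path stands in for "filename.tex".

```r
library(xtable)

# Hypothetical one-row summary table
toy <- data.frame(Variable = "age"
  , Type = "numeric"
  , Summary = "[19, 28]; 22.1 (2)"
  , stringsAsFactors = FALSE
)

# A temporary path stands in for "filename.tex"
out_file <- tempfile(fileext = ".tex")

# type = "latex" plus file = ... writes the table to disk instead of the console
print(xtable(toy, caption = "Simple data summary")
  , type = "latex"
  , file = out_file
  , include.rownames = FALSE
  , caption.placement = "top"
  , sanitize.text.function = function(x){x}
)

# Read the file back to confirm what was written
tex_lines <- readLines(out_file)
```

The resulting file can then be pulled into a LaTeX document with \input{}.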
You can download the function from here or markdown file from here.