如何使R中的摘要的某些变量保持静态 [英] How to keep some variables static for the summary in R

查看：61 发布时间：2020/10/17 2:48:50 r dataframe dplyr

本文介绍了如何使R中的摘要的某些变量保持静态的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在R中使用以下数据框：

I am using the following dataframe in R:

df<-

structure(list(uid = c("K-1", "K-1", 
"K-2", "K-3", "K-4", "K-5", 
"K-6", "K-7", "K-8", "K-9", 
"K-10", "K-11", "K-12", "K-13", 
"K-14"), Date = c("2020-03-16 12:11:33", "2020-03-16 12:11:33", 
"2020-03-16 06:13:55", "2020-03-16 10:03:43", "2020-03-16 12:37:09", 
"2020-03-16 06:40:24", "2020-03-16 09:46:45", "2020-03-16 12:07:44", 
"2020-03-16 14:09:51", "2020-03-16 09:19:23", "2020-03-16 09:07:37", 
"2020-03-16 11:48:34", "2020-03-16 06:23:24", "2020-03-16 04:39:03", 
"2020-03-16 04:59:13"), batch_no = c(7, 7, 8, 9, 9, 8, 
7, 6, 7, 9, 8, 8, 7, 7, 7), marking = c("S1", "S1", "S2", 
"SE_hold1", "SD_hold1", "SD_hold2", "S3", "S3", "", "SA_hold3", "S1", "S1", "S2", 
"S3", "S3"), seq = c("FRD", 
"FHL", NA, NA, NA, NA, NA, NA, "ABC", NA, NA, NA, NA, "DEF", NA)), .Names = c("uid", 
"Date", "batch_no", "marking",
"seq"), row.names = c(NA, 15L), class = "data.frame")

然后使用

# Function to summarise each of the vectors required: summariser => function
summariser <- function(vec) {
  within(unique(data.frame(
    vec = vec,
    counter = as.numeric(ifelse(is.na(vec), sum(is.na(vec)),
                     ave(vec, vec, FUN = length))), stringsAsFactors = FALSE
  )),
  {
    perc = paste0(round(counter / sum(counter) * 100, 2), "%")
  })
}
# Vectors to summarise: vecs_to_summarise => character vector
vecs_to_summarise <- c("seq", "marking", "batch_no")

# Create an empty list in order to allocate some memory: df_list => list
df_list <- vector("list", length(vecs_to_summarise))

# Apply the summariser function to each of the vectors required: df_list => list of dfs
df_list <- lapply(df[,vecs_to_summarise], summariser)

# Rename the vectors of each data.frame in the list: df_list => list of dfs: 
df_list <- lapply(seq_along(df_list), function(i) {
  names(df_list[[i]]) <- gsub("_vec", "",
                              paste(names(df_list[i]), names(df_list[[i]]), sep = "_"))
  return(df_list[[i]])
})


# Determine the number of rows of the maximum data.frame: numeric scalar 
max_df_length <- max(sapply(df_list, nrow))

# Extend each data.frame to be the same length (pad with NAs if necessary): df_list => list
df_list <- lapply(seq_along(df_list), function(i){
  y <- data.frame(df_list[[i]][rep(seq_len(nrow(df_list[[1]])), each = 1),])
  y[1:(nrow(y)),] <- NA
  y <- y[1:(max_df_length - nrow(df_list[[i]])),]
  if(length(y) > 0){
    x <- data.frame(rbind(df_list[[i]], y)[1:max_df_length,])
  }else{
      x <- data.frame(df_list[[i]][1:max_df_length,])
      }
  return(x)
  }
)

# Bind the data.frames in the list into a single df: analysed_df => data.frame
analysed_df <- do.call("cbind", df_list)

问题陈述：

我正在使用 sys.date（）创建一个数据框。现在可能在特定日期某些变量或所有变量都不可用于 batch_no ，标记或 seq 。

I am creating a dataframe by using sys.date(). Now it is possible that for a particular date some or all variables are not available for either batch_no, marking or seq.

问题是，我想为列 batch_no ，制作和 seq 静态在 analysed_df 无论该变量中的某些变量或全部变量在特定日期是否可用。


The question is, I want to keep some variables for column batch_no, making and seq static in the analysed_df irrespective of if some of or all those variables are available in the dataframe for that particular date.
如果在特定日期没有这些变量，则计数和百分比对于该特定变量，分别为0和0.00％。
If those variables are not availabe for the particular date then the count and percentage would be 0 and 0.00% respectively for that particular variable.
输出：
seq      count  percentage   Marking     count     Percentage     batch_no   count    Percentage
FRD      1      12.50%       S1          2         25.00%         6          1        12.50%
FHL      1      12.50%       S2          1         12.50%         7          2        25.00%
ABC      2      25.00%       S3          1         12.50%         8          2        25.00%
DEF      1      12.50%       Hold        2         25.00%         9          1        12.50%
XYZ      1      12.50%       NA          1         12.50%         NA         1        12.50%
ZZZ      1      12.50%       (Blank)     1         12.50%         (Blank)    1        12.50%
FRD      1      12.50%         -         -           -             -         -           -
NA       1      12.50%         -         -           -             -         -           -
(Blank)  0      0.00%          -         -           -             -         -           -
Total    8      112.50%        -         8         100.00%         -         8         100.00%

 
 
推荐答案
 Base R解决方案：
Base R Solution: 
# Vector containing the all unique elements of the uid vector: 
# unique_ids => character vector:
reporting_vars <- c("seq", "marking", "batch_no")

# Empty list to store all unique reported vector's values: report_struc_list => list
report_struc_list <- vector("list", length(reporting_vars))

# Populate the list: report_struc_list => list
report_struc_list <- lapply(df[, reporting_vars], function(x){sort(unique(x))})

# Simplify to a data.frame: report_struc => data.frame
report_struc <- cbind(
  data.frame(lapply(report_list, function(x) {
    length(x) <- max(lengths(report_list))
    return(x)
  })),
  counter = 0,
  perc = 0
)

# Order the reporting data.frame: report_struc_ordered => data.frame
report_struc_ordered <- report_struc[, c("seq", "marking", "batch_no", 
                                         "counter", "perc")]

# Function to generate reports, input data.frame: analysed_df => data.frame
report_func <- function(df){
  # Function to count elements and calculate perc of total: 
  # analyse_func => function
  analyse_func <- function(df, vec){
    vec_summary <- data.frame(lapply(within(
    merge(rbind(setNames(
      aggregate(
        rep(1, nrow(df))~vec,
        df,
        FUN = sum,
        na.action = na.pass
      ),
      c(gsub(".*\\$", "", deparse(
        substitute(vec)
      )), "counter")
    ), c(NA, sum(
      is.na(df[, gsub(".*\\$", "", deparse(substitute(vec)))])
    ))),
    report_struc_ordered[!(report_struc_ordered[, gsub(".*\\$", "", deparse(substitute(vec)))]
                           %in% vec),
                         c(grep(
                           gsub(".*\\$", "", deparse(substitute(vec))),
                           names(report_struc_ordered),
                           value = TRUE
                         ),
                         "counter", "perc")],
    all = TRUE),
    {
      perc = paste0(round(counter / sum(counter) * 100, 2), "%")

    }
  ), as.character), stringsAsFactors = FALSE)

  # Append a total to the bottom of the data.frame: vec_summary => data.frame
  vec_summary <- setNames(rbind(vec_summary,
                       c("TOTAL",
                         as.character(sum(
                           as.numeric(vec_summary$counter)
                         )),
                         as.character(paste0(
                           sum(as.double(gsub(
                             "\\%", "",
                             vec_summary$perc
                           ))),
                           "%"
                         )))), c(gsub(".*\\$", "", deparse(substitute(vec))),
                                 paste(gsub(".*\\$", "", 
                                             deparse(substitute(vec))), 
                                        names(vec_summary)[2:ncol(vec_summary)],
                                       sep = "_")))
  }

  # Apply the function to each of the vectors required: vec_summ_list => list
  vec_summ_list <- list(
    seq_df = analyse_func(df, df$seq),
    marking_df = analyse_func(df, df$marking),
    batch_no_df = analyse_func(df, df$batch_no)
  )

  # Store a scalar containing the row count of the data.frame
  # with the most rows in the vec_summ_list: max_df_length => numeric vector
  max_df_length <- max(sapply(vec_summ_list, nrow))

  # Extend each data.frame to be the same length 
  # (pad with NAs if necessary): vec_summ_list => list
  vec_summ_list <- setNames(lapply(seq_along(vec_summ_list), function(i){
    # Replicate the amount rows required to be padded: y => data.frame
    y <- data.frame(vec_summ_list[[i]][rep(seq_len(max_df_length - nrow(vec_summ_list[[i]])), 
                                           each = 1),])
    # Nullify the rows: y => data.frame
    y[1:(nrow(y)),] <- "-"

    # If necessary bind the replicated rows to the underlying data.frame:
    # x => data.frame
    if(length(y) > 0){
      x <- data.frame(rbind(vec_summ_list[[i]], y)[1:max_df_length,])
    }else{
      x <- data.frame(df_list[[i]][1:max_df_length,])
    }
    # Move the total row to the bottom of the data.frame: x => data.frame
    x[nrow(x),] <- x[which(grepl("TOTAL", x[,1])),]

    # Nullify the total row thats not the last row: x => data.frame
    suppressWarnings(if(length(which(grepl("TOTAL", x[,1]) < nrow(x))) > 0){
     tmp <- x[which(grepl("TOTAL", x[,1])),]
     x[which(grepl("TOTAL", x[,1])),] <- as.character("-")
     x[nrow(x),] <- tmp 
    }else{
      x
    })

    # Define the return object: 
    return(x)

  }
  ), names(vec_summ_list))

  # Flatten the list into a data.frame: analysed_df => data.frame
  analysed_df <- Reduce(cbind, vec_summ_list)

}

# Store an empty list to contain each unique date: df_list => list
df_list <- vector("list", length(unique(df$Date)))

# Store an empty list to hold the daily reports: report_list => list
report_list <- df_list

# Split the data.frame into many data.frames by date: df_list => list
df_list <- split(df, df$Date)

# Store the base report as a list element for each date: report_list => list
report_list <- lapply(df_list, function(x) report_func(x))

数据：
df <-
  structure(
    list(
      uid = c("K-1", "K-1", "K-2", "K-3", "K-4", "K-5",
              "K-6", "K-7", "K-8"),
      Date = structure(
        c(
          1584321093,
          1584321093,
          1584321093,
          1584321093,
          1584321093,
          1584321093,
          1584321093,
          1584321093,
          1584321093
        ),
        class = c("POSIXct", "POSIXt"),
        tzone = ""
      ),
      batch_no = c(7L,
                   7L, 8L, 9L, 8L, NA, 7L, NA, 6L),
      marking = c("S1", "S1", "SE_hold1",
                  "SD_hold2", "S1", NA, NA, "S2", "S3"),
      seq = c("FRD", "FHL",
              "ABC", "DEF", "XYZ", "ABC", "ZZZ", NA, "FRD")
    ),
    row.names = c(NA,-9L),
    class = "data.frame"
  )


                        这篇关于如何使R中的摘要的某些变量保持静态的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！


                    
                        查看全文

如何使R中的摘要的某些变量保持静态 [英] How to keep some variables static for the summary in R

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

如何使R中的摘要的某些变量保持静态 [英] How to keep some variables static for the summary in R

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭