将普通的R数据帧转换为粗糙的格式(la ftable) [英] Convert normal R data frame into ragged format (a la ftable)

查看:51
本文介绍了将普通的R数据帧转换为粗糙的格式(la ftable)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

R 中,函数 ftable()默认情况下会创建一个具有所谓的衣衫r外观的表:

In R the function ftable() creates by default a table with a so called ragged appearance:

data(UCBAdmissions)
ftable(UCBAdmissions)

...

                Dept   A   B   C   D   E   F
Admit    Gender                             
Admitted Male        512 353 120 138  53  22
         Female       89  17 202 131  94  24
Rejected Male        313 207 205 279 138 351
         Female       19   8 391 244 299 317

行和列参差不齐",因为标签仅在它们更改时才显示,并且明显的约定是从上至下读取行,从左至右读取列.( https://cran.r-project.org/doc/manuals/r-devel/R-data.html#Flat-contingency-tables )

Rows and columns are "ragged" in the sense that labels are only displayed when they change, with the obvious convention that rows are read from top to bottom and columns are read from left to right. (https://cran.r-project.org/doc/manuals/r-devel/R-data.html#Flat-contingency-tables)

问题:

如何为普通的 data.frame 对象获得相同的参差不齐"外观?

How can I get same "ragged" appearance for a normal data.frame object?

可复制的示例:

before= data.frame(C1= c(rep("A", 5), rep("L", 2)),
                   C2= c("B", rep("E", 3), rep("K", 2), "L"),
                   C3= c("C", "F", rep("H", 5)),
                   C4= c("D", "G", "I", rep("J", 4)), 
                   stringsAsFactors = FALSE)

before

...

  C1 C2 C3 C4
1  A  B  C  D
2  A  E  F  G
3  A  E  H  I
4  A  E  H  J
5  A  K  H  J
6  L  K  H  J
7  L  L  H  J

函数的外观如何,它将对象 before 转换为类 data.frame 的新对象 after ,该对象被打印到带有 print(after)的控制台,如下所示...

How does a function look like which converts the object before to a new object after of class data.frame, which is printed to console with print(after) as follows...

  C1 C2 C3 C4
1  A  B  C  D
2     E  F  G
3        H  I
4           J
5     K  H  J
6  L  K  H  J
7     L  H  J

如有必要,可以接受这种演示格式丢失的遗留数据.

If necessary, it is acceptable that the left out data are lost for this presentation format.

推荐答案

也许不是最优雅的解决方案(a.很多 for 循环,b.将任何类型的列强制转换为字符,c.没有输入断言,例如d.slow等),但是遵循函数 rag_blank 似乎基本上可以按照示例中的要求进行操作:

Maybe not the most elegant solution (a. lots of for loops, b. coercing any type of column to character, c. no input assertions, d. slow, etc.), but following function rag_blank seems to basically work as requested on the example:

## Task

before= data.frame(C1= c(rep("A", 5), rep("L", 2)),
                   C2= c("B", rep("E", 3), rep("K", 2), "L"),
                   C3= c("C", "F", rep("H", 5)),
                   C4= c("D", "G", "I", rep("J", 4)), 
                   stringsAsFactors = FALSE)

before


## Solution

library(dplyr)

rag_blank= function(x, cols= seq_along(x), blank= ":"){

  # Copy input
  res= x

  # 1st step: blank trailing cells
  for(df_col in cols){
    res[, df_col]= as.character(unlist(res[, df_col]))
    x[, df_col]= as.character(unlist(x[, df_col]))
    re= rle(unlist(res[, df_col]))
    re_df= data.frame(value= re$values,
                      length= re$lengths,
                      stringsAsFactors = F) %>%
      mutate(idx_start= cumsum(length) - length + 2,
             idx_end= idx_start + length -2)
    for(re_row in 1:nrow(re_df)){
      if(re_df$idx_start[re_row]<= re_df$idx_end[re_row]){
        res[(re_df$idx_start[re_row]:re_df$idx_end[re_row]), df_col]= blank
      }
    }
  }

  # 2nd step: restore value if blank, resp. changed from 1st step but left cell it is not blank
  for(df_col in cols[-1]){
    changed_before= res[, df_col]!= x[, df_col]
    left_not_changed= res[, df_col-1]== x[, df_col-1]
    to_change= changed_before & left_not_changed
    res[to_change, df_col]= x[to_change, df_col]
  }

  res
}

rag_blank(before)

...

  C1 C2 C3 C4
1  A  B  C  D
2  :  E  F  G
3  :  :  H  I
4  :  :  :  J
5  :  K  H  J
6  L  K  H  J
7  :  L  H  J

在某些情况下,不适合使用空格,这可能会有所帮助:

In some cases applying blanks is not appropriate, then this may be helpful:

rag_index= function(x){
  rag_blank(x) != x
}

rag_index(before)

...

        C1    C2    C3    C4
[1,] FALSE FALSE FALSE FALSE
[2,]  TRUE FALSE FALSE FALSE
[3,]  TRUE  TRUE FALSE FALSE
[4,]  TRUE  TRUE  TRUE FALSE
[5,]  TRUE FALSE FALSE FALSE
[6,] FALSE FALSE FALSE FALSE
[7,]  TRUE FALSE FALSE FALSE

更忙的示例:

data("diamonds", package = "ggplot2")
print(rag_blank(x= head(diamonds, 30)), n= 100)

...

Source: local data frame [30 x 10]

   carat       cut color clarity depth table price     x     y     z
   <chr>     <chr> <chr>   <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1   0.23     Ideal     E     SI2  61.5    55   326  3.95  3.98  2.43
2   0.21   Premium     E     SI1  59.8    61   326  3.89  3.84  2.31
3   0.23      Good     E     VS1  56.9    65   327  4.05  4.07  2.31
4   0.29   Premium     I     VS2  62.4    58   334   4.2  4.23  2.63
5   0.31      Good     J     SI2  63.3    58   335  4.34  4.35  2.75
6   0.24 Very Good     J    VVS2  62.8    57   336  3.94  3.96  2.48
7      :         :     I    VVS1  62.3    57   336  3.95  3.98  2.47
8   0.26 Very Good     H     SI1  61.9    55   337  4.07  4.11  2.53
9   0.22      Fair     E     VS2  65.1    61   337  3.87  3.78  2.49
10  0.23 Very Good     H     VS1  59.4    61   338     4  4.05  2.39
11   0.3      Good     J     SI1    64    55   339  4.25  4.28  2.73
12  0.23     Ideal     J     VS1  62.8    56   340  3.93   3.9  2.46
13  0.22   Premium     F     SI1  60.4    61   342  3.88  3.84  2.33
14  0.31     Ideal     J     SI2  62.2    54   344  4.35  4.37  2.71
15   0.2   Premium     E     SI2  60.2    62   345  3.79  3.75  2.27
16  0.32   Premium     E      I1  60.9    58   345  4.38  4.42  2.68
17   0.3     Ideal     I     SI2    62    54   348  4.31  4.34  2.68
18     :      Good     J     SI1  63.4    54   351  4.23  4.29   2.7
19     :         :     :       :  63.8    56   351  4.23  4.26  2.71
20     : Very Good     J     SI1  62.7    59   351  4.21  4.27  2.66
21     :      Good     I     SI2  63.3    56   351  4.26   4.3  2.71
22  0.23 Very Good     E     VS2  63.8    55   352  3.85  3.92  2.48
23     :         :     H     VS1    61    57   353  3.94  3.96  2.41
24  0.31 Very Good     J     SI1  59.4    62   353  4.39  4.43  2.62
25     :         :     :       :  58.1    62   353  4.44  4.47  2.59
26  0.23 Very Good     G    VVS2  60.4    58   354  3.97  4.01  2.41
27  0.24   Premium     I     VS1  62.5    57   355  3.97  3.94  2.47
28   0.3 Very Good     J     VS2  62.2    57   357  4.28   4.3  2.67
29  0.23 Very Good     D     VS2  60.5    61   357  3.96  3.97   2.4
30     :         :     F     VS1  60.9    57   357  3.96  3.99  2.42

如果有更好的解决方案,请感谢您的反馈.

In case there are more elegant solutions, appreciate your feedback.

这篇关于将普通的R数据帧转换为粗糙的格式(la ftable)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆