How to use the spread function properly in tidyr


Question

How do I change the following table:

Type    Name    Answer     n
TypeA   Apple   Yes        5
TypeA   Apple   No        10
TypeA   Apple   DK         8
TypeA   Apple   NA        20
TypeA   Orange  Yes        6
TypeA   Orange  No        11
TypeA   Orange  DK         8
TypeA   Orange  NA        23

into this one:

Type    Name    Yes   No   DK   NA  
TypeA   Apple   5     10   8    20
TypeA   Orange  6     11   8    23

I used the following code to get the first table.

df_1 <- 
  df %>% 
  group_by(Type, Name, Answer) %>% 
  tally()  

Then I tried to use the spread command to get to the second table, but I got the following error message:

"Error: All columns must be named"

df_2 <- spread(df_1, Answer)

Recommended answer

Following up on the comment from ayk, here is an example. It looks to me like a data_frame whose key column is a factor or character vector containing NA values cannot be spread unless you either remove those values or re-classify the data. This appears to be specific to a data_frame (note the dplyr class with the underscore in the name), because my example works when the NA values are in a plain data.frame. For example, take a slightly modified version of the example above.

Here is the data frame:

library(dplyr)
library(tidyr)
df_1 <- data_frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
                   Answer = c("Yes", "No", NA, "No"),
                   n = 1:4)
df_1

Which gives a data_frame that looks like this:

Source: local data frame [4 x 3]

   Type Answer     n
  (chr)  (chr) (int)
1 TypeA    Yes     1
2 TypeA     No     2
3 TypeB     NA     3
4 TypeB     No     4

Then, when we try to tidy it, we get an error message:

df_1 %>% spread(key=Answer, value=n)
Error: All columns must be named

But if we remove the NAs, then it 'works':

df_1 %>%
    filter(!is.na(Answer)) %>%
    spread(key=Answer, value=n)
Source: local data frame [2 x 3]

   Type    No   Yes
  (chr) (int) (int)
1 TypeA     2     1
2 TypeB     4    NA

However, removing the NAs may not give you the desired result: you might want them to be included in your tidied table. You could modify the data directly to change the NAs to a more descriptive value. Alternatively, you could convert your data to a data.frame, and then it spreads just fine:

as.data.frame(df_1) %>% spread(key=Answer, value=n)
   Type No Yes NA
1 TypeA  2   1 NA
2 TypeB  4  NA  3
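
The other option mentioned above, changing the NAs to a more descriptive value before spreading, might look roughly like the sketch below. The label "Missing" and the names df_labelled and wide are arbitrary choices made for illustration, not part of the original example. The data is built with data.frame() here (rather than data_frame()) only so the sketch runs the same way across package versions.

```r
library(dplyr)
library(tidyr)

# Same data as the example above, built as a plain data.frame (an assumption
# made for portability; the original example used data_frame()).
df_1 <- data.frame(Type = c("TypeA", "TypeA", "TypeB", "TypeB"),
                   Answer = c("Yes", "No", NA, "No"),
                   n = 1:4,
                   stringsAsFactors = FALSE)

# Recode missing answers to an explicit label so the key column has no NA.
# "Missing" is a hypothetical label; pick whatever suits your data.
df_labelled <- df_1 %>%
  mutate(Answer = ifelse(is.na(Answer), "Missing", Answer))

# Spread as before; the former NA key now becomes a column named "Missing",
# so TypeB's count of 3 is kept instead of being filtered out.
wide <- df_labelled %>% spread(key = Answer, value = n)
wide
```

Compared with filter(!is.na(Answer)), this keeps the row with the missing answer in the result, at the cost of choosing a label that must not collide with a real answer value.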
