R dplyr, using mutate with na.omit causes error incompatible size (%d)


Problem description

I'm doing data cleaning. I use mutate in dplyr a lot, since it builds new columns step by step and lets me see how each step turns out.

Here are two examples where I get this error:

Error: incompatible size (%d), expecting %d (the group size) or 1


Example 1: Get the town name from a ZIP code. The data looks like this:

    Zip
1 02345
2 02201

I noticed that when the data contains an NA, it doesn't work.

Without NA it works:

library(dplyr)
library(zipcode)
data(zipcode)

test = data.frame(Zip=c('02345','02201'),stringsAsFactors=FALSE)

test %>%
  rowwise() %>%
  mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )

resulting in

Source: local data frame [2 x 2]
Groups: <by row>

    Zip   Town1
1 02345 Manomet
2 02201  Boston

With NA it doesn't work:

library(dplyr)
library(zipcode)
data(zipcode)

test = data.frame(Zip=c('02345','02201',NA),stringsAsFactors=FALSE)

test %>%
  rowwise() %>%
  mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )

resulting in

Error: incompatible size (%d), expecting %d (the group size) or 1
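A likely cause, sketched in base R so it runs without dplyr or the zipcode package: in the NA row, na.omit(Zip) returns a vector of length 0, so the lookup returns nothing, and mutate needs exactly one value per row. The lookup table below is a hypothetical two-row stand-in for the zipcode data, not the real data set.

```r
# Hypothetical stand-in for the zipcode data set (two rows only).
lookup <- data.frame(zip = c("02345", "02201"),
                     city = c("Manomet", "Boston"),
                     stringsAsFactors = FALSE)

zip_na <- NA_character_        # the Zip value seen in the third row

dropped <- na.omit(zip_na)     # the NA is removed, leaving an empty vector
length(dropped)                # 0

hit <- lookup[lookup$zip == dropped, "city"]
length(hit)                    # 0, so there is no value to place in that row

# A vectorised lookup that keeps one result per input, NA included.
zips  <- c("02345", "02201", NA)
town1 <- lookup$city[match(zips, lookup$zip)]
town1                          # "Manomet" "Boston" NA
```

The match() version needs neither rowwise() nor na.omit(): an NA ZIP code simply yields an NA town.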


Example 2: I want to remove the redundant state name that appears in the Town column of the following data.

         Town State
1   BOSTON MA    MA
2 NORTH AMAMS    MA
3  CHICAGO IL    IL

This is how I do it: (1) split the string in Town into words, e.g. 'BOSTON' and 'MA' for row 1; (2) check whether any of these words matches the State of that row; (3) delete the matched word.

library(dplyr)
test = data.frame(Town=c('BOSTON MA','NORTH AMAMS','CHICAGO IL'), State=c('MA','MA','IL'), stringsAsFactors=FALSE)

test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(Town1 = Town.word[-is.state])

This results in:

         Town State Town.word is.state   Town1
1   BOSTON MA    MA  <chr[2]>        2  BOSTON
2 NORTH AMAMS    MA  <chr[2]>       NA      NA
3  CHICAGO IL    IL  <chr[2]>        2 CHICAGO

Meaning: E.g., row 1 shows is.state==2, meaning the 2nd word in Town is the state name. After getting rid of that word, Town1 is the correct town name.

Now I want to fix the NA in row 2, but adding na.omit causes an error:

test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(Town1 = Town.word[-na.omit(is.state)]) 

results in:

Error: incompatible size (%d), expecting %d (the group size) or 1


I checked the data type and size:

test %>%
  mutate(Town.word = strsplit(Town, split=' ')) %>%
  rowwise() %>% # rowwise ensures every calculation only considers the current row
  mutate(is.state = match(State,Town.word ) ) %>%
  mutate(length(is.state) ) %>%       
  mutate(class(na.omit(is.state)))

results in:

         Town State Town.word is.state length(is.state) class(na.omit(is.state))
1   BOSTON MA    MA  <chr[2]>        2                1                  integer
2 NORTH AMAMS    MA  <chr[2]>       NA                1                  integer
3  CHICAGO IL    IL  <chr[2]>        2                1                  integer

So is.state is an integer of length 1 in every row. Can somebody tell me what's wrong? Thanks
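One thing the check above does not measure is the length after na.omit has run. A minimal base R sketch for row 2 suggests that is where the size changes. The drop_state helper is hypothetical, written only for illustration; it pastes the remaining words back together so that every row produces exactly one value.

```r
words    <- c("NORTH", "AMAMS")    # Town.word for row 2
is_state <- match("MA", words)     # NA: no word equals the state

length(is_state)                   # 1, which is what the check above measured
length(na.omit(is_state))          # 0, the NA has been dropped
length(words[-na.omit(is_state)])  # 0, an empty index selects nothing

# Hypothetical helper: drop the matched word, or keep every word when
# nothing matched, then collapse to a single string.
drop_state <- function(town, state) {
  w <- strsplit(town, split = " ")[[1]]
  i <- match(state, w)
  if (!is.na(i)) w <- w[-i]
  paste(w, collapse = " ")
}

towns  <- c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL")
states <- c("MA", "MA", "IL")
town1  <- mapply(drop_state, towns, states, USE.NAMES = FALSE)
town1                              # "BOSTON" "NORTH AMAMS" "CHICAGO"
```

Collapsing with paste matters as well: without it, a row where nothing is removed would return two words, which is again not a single value for that row.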

Solution

Can you just sub it out?

test %>%
    rowwise() %>%
    mutate(Town=sub(sprintf('[, ]*%s$', State), '', Town))
## Source: local data frame [3 x 2]
## Groups: <by row>
##
##          Town State
## 1      BOSTON    MA
## 2 NORTH AMAMS    MA
## 3     CHICAGO    IL

(This way also catches commas after the town, if that happens.)
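To see what the pattern does outside of dplyr, here is a small base R sketch. The third town is changed to 'CHICAGO, IL' and the 'PANAMA' case is invented purely to illustrate a caveat; neither is in the original data.

```r
towns  <- c("BOSTON MA", "NORTH AMAMS", "CHICAGO, IL")
states <- c("MA", "MA", "IL")

# One pattern per row, e.g. "[, ]*MA$": optional commas/spaces, then the
# state code anchored at the end of the string.
patterns <- sprintf("[, ]*%s$", states)

cleaned <- mapply(function(p, x) sub(p, "", x), patterns, towns,
                  USE.NAMES = FALSE)
cleaned                            # "BOSTON" "NORTH AMAMS" "CHICAGO"

# Caveat: "*" also allows zero separators, so a town that merely ends in
# the state letters gets truncated. Requiring a separator with "+" avoids it.
loose  <- sub("[, ]*MA$", "", "PANAMA")   # "PANA"
strict <- sub("[, ]+MA$", "", "PANAMA")   # "PANAMA"
```

Whether the stricter "+" form is appropriate depends on the data; it would miss a state code glued directly onto the town name.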

NB: if you use ungroup() here with a rowwise_df (as this is), it will wipe the tbl_df class as well and output a straight data.frame, which is fine for your data but will clobber your screen if you aren't careful and are looking at large amounts of data (as I've done countless times). (Github references #936 and #553.)
