R dplyr, using mutate with na.omit causes error incompatible size (%d)
Problem description
I'm doing data cleaning. I use mutate in dplyr a lot, since it builds new columns step by step and I can easily see how each step turns out.
Here are two examples where I get this error:
Error: incompatible size (%d), expecting %d (the group size) or 1
Example 1: Get the town name from a zip code. The data simply looks like this:
Zip
1 02345
2 02201
I noticed that when the data has an NA in it, it doesn't work.
Without NA it works:
library(dplyr)
library(zipcode)
data(zipcode)
test = data.frame(Zip=c('02345','02201'),stringsAsFactors=FALSE)
test %>%
rowwise() %>%
mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
resulting in
Source: local data frame [2 x 2]
Groups: <by row>
Zip Town1
1 02345 Manomet
2 02201 Boston
With NA it doesn't work:
library(dplyr)
library(zipcode)
data(zipcode)
test = data.frame(Zip=c('02345','02201',NA),stringsAsFactors=FALSE)
test %>%
rowwise() %>%
mutate( Town1 = zipcode[zipcode$zip==na.omit(Zip),'city'] )
resulting in
Error: incompatible size (%d), expecting %d (the group size) or 1
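A likely explanation, sketched in base R so that it does not depend on the zipcode package: for the NA row, na.omit(Zip) drops the only element, so the comparison and the lookup both come back with length 0, while a rowwise mutate() needs exactly one value per row. The small lookup table below is a hypothetical stand-in for the zipcode data, and the match()-based lookup at the end is one possible workaround, not something taken from the original post.

```r
# Hypothetical stand-in for the zipcode data set (assumption, two rows only)
lookup <- data.frame(zip = c("02345", "02201"),
                     city = c("Manomet", "Boston"),
                     stringsAsFactors = FALSE)

# What the rowwise mutate() sees for the NA row
zip <- NA_character_
kept <- na.omit(zip)                        # the only element is dropped
hits <- lookup[lookup$zip == kept, "city"]  # zero-length comparison, zero-length result
length(kept)
length(hits)

# One possible workaround: a vectorised lookup with match(), which
# returns NA for an NA zip code, so neither rowwise() nor na.omit() is needed
test <- data.frame(Zip = c("02345", "02201", NA), stringsAsFactors = FALSE)
test$Town1 <- lookup$city[match(test$Zip, lookup$zip)]
test
```

With this approach the NA zip code simply yields an NA town instead of an error.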
Example 2: I want to get rid of the redundant state name that occurs in the Town column in the following data.
  Town        State
1 BOSTON MA   MA
2 NORTH AMAMS MA
3 CHICAGO IL  IL
This is how I do it:
(1) split the string in Town into words, e.g. 'BOSTON' and 'MA' for row 1.
(2) see if any of these words match the State of that line
(3) delete the matched words
library(dplyr)
test = data.frame(Town=c('BOSTON MA','NORTH AMAMS','CHICAGO IL'), State=c('MA','MA','IL'), stringsAsFactors=FALSE)
test %>%
mutate(Town.word = strsplit(Town, split=' ')) %>%
rowwise() %>% # rowwise ensures every calculation only considers the current row
mutate(is.state = match(State,Town.word ) ) %>%
mutate(Town1 = Town.word[-is.state])
This results in:
  Town        State Town.word is.state Town1
1 BOSTON MA   MA    <chr[2]>  2        BOSTON
2 NORTH AMAMS MA    <chr[2]>  NA       NA
3 CHICAGO IL  IL    <chr[2]>  2        CHICAGO
Meaning: e.g., row 1 shows is.state==2, so the 2nd word in Town is the state name. After getting rid of that word, Town1 is the correct town name.
Now I want to fix the NA in row 2, but adding na.omit causes an error:
test %>%
mutate(Town.word = strsplit(Town, split=' ')) %>%
rowwise() %>% # rowwise ensures every calculation only considers the current row
mutate(is.state = match(State,Town.word ) ) %>%
mutate(Town1 = Town.word[-na.omit(is.state)])
results in:
Error: incompatible size (%d), expecting %d (the group size) or 1
I checked the data type and size:
test %>%
mutate(Town.word = strsplit(Town, split=' ')) %>%
rowwise() %>% # rowwise ensures every calculation only considers the current row
mutate(is.state = match(State,Town.word ) ) %>%
mutate(length(is.state) ) %>%
mutate(class(na.omit(is.state)))
results in:
  Town        State Town.word is.state length(is.state) class(na.omit(is.state))
1 BOSTON MA   MA    <chr[2]>  2        1                integer
2 NORTH AMAMS MA    <chr[2]>  NA       1                integer
3 CHICAGO IL  IL    <chr[2]>  2        1                integer
So it is an integer (%d) of length 1. Can somebody tell me what's wrong? Thanks.
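The check above measures is.state, which is always of length 1; the value that mutate() rejects is the result of the subsetting. A base R sketch of row 2 (the variable names are illustrative): negating an empty index gives an empty index, and subsetting by an empty index returns nothing rather than dropping nothing. The last two lines show one possible alternative that removes the state by value; it is not the accepted answer.

```r
words <- strsplit("NORTH AMAMS", split = " ")[[1]]   # "NORTH" "AMAMS"
is_state <- match("MA", words)                       # NA: no word equals the state

idx <- -na.omit(is_state)    # na.omit() leaves an empty index; negating keeps it empty
length(idx)
town1 <- words[idx]          # an empty index selects nothing, it does not drop nothing
length(town1)

# One possible alternative: remove the state by value instead of by position
kept_words <- words[words != "MA"]
paste(kept_words, collapse = " ")
```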
Solution
Can you just sub it out?
test %>%
rowwise() %>%
mutate(Town=sub(sprintf('[, ]*%s$', State), '', Town))
## Source: local data frame [3 x 2]
## Groups: <by row>
##
## Town State
## 1 BOSTON MA
## 2 NORTH AMAMS MA
## 3 CHICAGO IL
(This way also catches commas after the town, if that happens.)
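To see what the pattern does outside the pipeline, here is a small base R sketch. sprintf() builds one regular expression per state, and since sub() only takes a single pattern, mapply() is used here in place of rowwise(). The helper name strip_state and the extra test strings are illustrative assumptions.

```r
# Hypothetical helper: remove a trailing state code, plus any commas or
# spaces in front of it, from a town string
strip_state <- function(town, state) {
  sub(sprintf("[, ]*%s$", state), "", town)
}

towns  <- c("BOSTON MA", "NORTH AMAMS", "CHICAGO IL")
states <- c("MA", "MA", "IL")

# Apply the helper pairwise, one town with its own state
cleaned <- mapply(strip_state, towns, states, USE.NAMES = FALSE)
cleaned

strip_state("BOSTON, MA", "MA")   # the comma is removed as well
strip_state("PANAMA", "MA")       # "*" allows zero separators, so this is cut too
```

Because `*` also matches zero separators, a town whose name merely ends in the state's letters is shortened as well; changing `*` to `+` in the pattern would require at least one comma or space before the state code.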
NB: if you use ungroup() here with a rowwise_df (as this is), it will wipe the tbl_df class as well and output a straight data.frame, which is fine for your data but will clobber your screen if you aren't careful and are looking at large amounts of data (as I've done countless times). (GitHub references #936 and #553.)