Determine which column name is causing 'undefined columns selected' error when using subset()
Question
I'm trying to take a large subset of columns from a very large data frame, using
data.new <- subset(data, select = vector)
where vector is a character vector containing the column names I'm trying to isolate. When I do this I get
Error in `[.data.frame`(x, r, vars, drop = drop) :
undefined columns selected
Is there a way to identify which specific column names in the vector are undefined? Through trial and error I've narrowed it down to about 400 names, but that still doesn't help.
Recommended answer
Find the elements of your vector that are not %in% the names() of your data frame.
Working example:
dd <- data.frame(a=1,b=2)
subset(dd,select=c("a"))
## a
## 1 1
Now try something that doesn't work:
v <- c("a","d")
subset(dd,select=v)
## Error in `[.data.frame`(x, r, vars, drop = drop) :
## undefined columns selected
v[!v %in% names(dd)]
## [1] "d"
or
setdiff(v,names(dd))
## [1] "d"
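With several hundred names it can be convenient to fold the same check into the selection step, so the error lists the offending names instead of just saying "undefined columns selected". The sketch below is a hypothetical helper, subset_checked(), which is not part of base R; it only covers the select part of subset(), and for a character vector of names the final df[, cols, drop = FALSE] should behave like subset(df, select = cols).

```r
# Hypothetical helper (not part of base R): select columns by name,
# but stop with the list of missing names if any are not in df.
subset_checked <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))  # requested names that df lacks
  if (length(missing_cols) > 0) {
    stop("undefined columns: ", paste(missing_cols, collapse = ", "))
  }
  df[, cols, drop = FALSE]  # plain column selection, keeps a data frame
}

dd <- data.frame(a = 1, b = 2)
subset_checked(dd, c("a", "b"))       # selects both columns
try(subset_checked(dd, c("a", "d")))  # error message names "d"
```

Note that setdiff() drops duplicates, so a misspelled name that appears twice in the vector is reported once; v[!v %in% names(dd)] keeps every occurrence, in the original order.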
The last few lines of the example code in ?match show a similar case.
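When the vector holds hundreds of names, it can also help to know where each bad name sits in it. ?match documents %in% as match(x, table, nomatch = 0) > 0, so match() itself can be used to get those positions. A minimal sketch, reusing the toy dd from above with a made-up vector v:

```r
dd <- data.frame(a = 1, b = 2)
v <- c("a", "d", "b", "e")

# match() gives the column index of each name in v, or NA if it is absent
idx <- match(v, names(dd))
idx
## [1]  1 NA  2 NA

# Positions within v of the names that are not columns of dd
bad_pos <- which(is.na(idx))
bad_pos
## [1] 2 4
v[bad_pos]
## [1] "d" "e"
```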