Error in running factor() on a column of a data frame
Question
I have a data frame with several columns. I want to run the factor() function on one of them, say a column named my_col. Initially I did it this way:
df[,"my_col"]<-factor((df[,"my_col"]))
This produced the following error:
Error: 'x' must be atomic for 'sort.list' Have you called 'sort' on a list?
After referring to a similar question on SO, my problem was solved.
If I use the following code instead of the first method, it works without any error:
df$"my_col"<-factor(df$"my_col")
Why is that? Is there a difference between accessing a column via df$vec_name and df[,vec_name]?
Update:
str(df)
Classes 'tbl_df', 'tbl' and 'data.frame': 160 obs. of 8 variables:
$ area : int 1 1 1 1 1 1 1 1 1 1 ...
$ temp : int 1 1 1 1 1 1 1 1 1 1 ...
$ size : int 1 1 1 1 1 1 1 1 1 1 ...
$ storage : int 1 1 1 1 1 2 2 2 2 2 ...
$ my_col : int 1 2 3 4 5 1 2 3 4 5 ...
$ texture : num 2.9 2.3 2.5 2.1 1.9 1.8 2.6 3 2.2 2 ...
$ flavor : num 3.2 2.5 2.8 2.9 2.8 3 3.1 3 3.2 2.8 ...
$ moistness: num 3 2.6 2.8 2.4 2.2 1.7 2.4 2.9 2.5 1.9 ...
Recommended answer
Your data is a tbl_df. I don't have your data, but we can look at an example using mtcars.
library(dplyr)
tbl_df(mtcars)[, "mpg"]
# Source: local data frame [32 x 1]
#
# mpg
# (dbl)
# 1 21.0
# 2 21.0
# 3 22.8
# 4 21.4
# 5 18.7
# 6 18.1
# 7 14.3
# 8 24.4
# 9 22.8
# 10 19.2
# .. ...
The result is still a data frame, whereas in base R it would have been dropped to an atomic vector. dplyr:::`[.tbl_df` does not drop single columns the way `[.data.frame` from base R does. This is why we can't run factor() on it.
factor(tbl_df(mtcars)[, "mpg"])
# Error in sort.list(y) : 'x' must be atomic for 'sort.list'
# Have you called 'sort' on a list?
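The difference in dropping can be reproduced in base R alone, without dplyr, by passing drop = FALSE to `[.data.frame`. This is only a sketch of the behaviour: it mimics what a tbl_df does by default and is not dplyr's implementation. The variable names dropped and kept are illustrative.

```r
# Base R drops a single selected column to a plain vector by default.
dropped <- mtcars[, "mpg"]

# drop = FALSE keeps the data frame wrapper, which is how a tbl_df
# behaves by default when indexed with [ , "col"].
kept <- mtcars[, "mpg", drop = FALSE]

is.atomic(dropped)     # TRUE: a numeric vector, safe to pass to factor()
is.data.frame(kept)    # TRUE: still a one-column data frame
is.atomic(kept)        # FALSE: a data frame is a list, not an atomic vector
```

A one-column data frame is a list under the hood, which is what the 'x' must be atomic message is complaining about.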
So you'll need to use [[, as in df[["my_col"]], or just use $.
df[["my_col"]] <- factor(df[["my_col"]])
Note: when you use the $ operator, you don't need quotes around the column name.
df$my_col <- factor(df$my_col)
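As a self-contained check, here is a minimal sketch of the fix on a small stand-in data frame. The values are the first few rows shown in the str() output in the question, and a plain data.frame is used so the example runs without dplyr. The variable col is a hypothetical name, included to show that [[ also works when the column name is stored in a variable, which $ does not support.

```r
# Small stand-in for the question's data (first rows from the str() output).
df <- data.frame(
  my_col  = c(1L, 2L, 3L, 4L, 5L, 1L, 2L),
  texture = c(2.9, 2.3, 2.5, 2.1, 1.9, 1.8, 2.6)
)

# [[ always extracts the column itself, for data frames and tbl_df alike.
col <- "my_col"                      # hypothetical variable holding the name
df[[col]] <- factor(df[[col]])

is.factor(df$my_col)                 # TRUE
levels(df$my_col)                    # "1" "2" "3" "4" "5"
```

Because [[ and $ extract the column itself rather than a one-column data frame, the same code should behave identically whether df is a plain data frame or a tbl_df.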