Convert *some* column classes in data.table
Question
I want to convert a subset of data.table cols to a new class. There's a popular question here (Convert column classes in data.table) but the answer creates a new object, rather than operating on the starter object.
As an example:
library(data.table)
dat <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
cols <- c('ID', 'Quarter')
How best to convert just the cols columns to (e.g.) a factor? In a normal data.frame you could do this:
dat[, cols] <- lapply(dat[, cols], factor)
but that doesn't work for a data.table, and neither does this:
dat[, .SD := lapply(.SD, factor), .SDcols = cols]
A comment in the linked question from Matt Dowle (from Dec 2013) suggests the following, which works fine, but seems a bit less elegant.
for (j in cols) set(dat, j = j, value = factor(dat[[j]]))
Is there currently a better data.table answer (i.e. shorter and without generating a counter variable), or should I just use the above plus rm(j)?
Recommended answer
Besides using the option as suggested by Matt Dowle, another way of changing the column classes is as follows:
dat[, (cols) := lapply(.SD, factor), .SDcols = cols]
By using the := operator you update the data.table by reference. To check whether this worked:
> sapply(dat,class)
ID Quarter value
"factor" "factor" "numeric"
As suggested by @MattDowle in the comments, you can also use a combination of for(...) set(...) as follows:
for (col in cols) set(dat, j = col, value = factor(dat[[col]]))
which will give the same result. A third alternative is:
for (col in cols) dat[, (col) := factor(dat[[col]])]
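To see that all three variants above modify the table in place, one option is to compare the table's memory address before and after the updates. The following is only a sketch: the toy table dt and the helper variables are made up for illustration, and address() is the helper exported by data.table.

```r
library(data.table)

# Hypothetical toy data mirroring the question's example
dt <- data.table(ID = rep(c("A", "B"), each = 5),
                 Quarter = rep(1:5, 2),
                 value = rnorm(10))
cols <- c("ID", "Quarter")

addr_before <- address(dt)  # address of the table object before any update

dt[, (cols) := lapply(.SD, factor), .SDcols = cols]            # := with lapply
for (col in cols) set(dt, j = col, value = factor(dt[[col]]))  # for + set
for (col in cols) dt[, (col) := factor(dt[[col]])]             # for + :=

# TRUE here means dt is still the same object, i.e. no new table was created
same_object <- identical(addr_before, address(dt))
col_classes <- sapply(dt, class)
```

Printing same_object and col_classes afterwards shows whether the table was copied and which classes the columns ended up with.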
On smaller datasets, the for(...) set(...) option is about three times faster than the lapply option (but that doesn't really matter, because the dataset is small). On larger datasets (e.g. 2 million rows), each of these approaches takes about the same amount of time. For testing on a larger dataset, I used:
dat <- data.table(ID=c(rep("A", 1e6), rep("B",1e6)),
                  Quarter=c(1:1e6, 1:1e6),
                  value=rnorm(2e6))
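A rough way to reproduce such a comparison is to time each approach on a fresh copy of the data with system.time(). This is a minimal sketch, not the benchmark behind the numbers above: make_dat() is a hypothetical helper, n is deliberately smaller than in the test data so that it runs quickly, and the timings will vary by machine and data.table version.

```r
library(data.table)

n <- 1e5  # rows per group; raise to 1e6 to match the 2-million-row test above

# Hypothetical helper: builds a fresh table so each approach starts from raw columns
make_dat <- function() {
  data.table(ID = rep(c("A", "B"), each = n),
             Quarter = rep(seq_len(n), 2),
             value = rnorm(2 * n))
}
cols <- c("ID", "Quarter")

d1 <- make_dat()
t_lapply <- system.time(d1[, (cols) := lapply(.SD, factor), .SDcols = cols])

d2 <- make_dat()
t_set <- system.time(for (col in cols) set(d2, j = col, value = factor(d2[[col]])))

d3 <- make_dat()
t_loop <- system.time(for (col in cols) d3[, (col) := factor(d3[[col]])])

# Elapsed seconds for each approach
timings <- c(lapply = t_lapply[["elapsed"]],
             set    = t_set[["elapsed"]],
             loop   = t_loop[["elapsed"]])
timings
```

A single system.time() call is noisy, so repeating the runs (or using a dedicated benchmarking package) would give a more reliable picture.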
Sometimes, you will have to do it a bit differently (for example when numeric values are stored as a factor). Then you have to use something like this:
dat[, (cols) := lapply(.SD, function(x) as.integer(as.character(x))), .SDcols = cols]
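The reason for going through as.character() is that a factor stores integer level codes internally, so converting it directly returns those codes rather than the numbers shown as labels. A small base R illustration, with a made-up factor f:

```r
# Hypothetical factor whose labels are numbers
f <- factor(c("10", "20", "10", "35"))

codes     <- as.integer(f)                # internal level codes, not the original numbers
values    <- as.integer(as.character(f))  # labels converted back to numbers
values_v2 <- as.integer(levels(f))[f]     # same result, converting each level only once

codes
values
```

The levels(f)[f] form converts each distinct level once instead of every element, which can matter on long columns with few levels.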
WARNING: The following explanation is not the data.table way of doing things. The data.table is not updated by reference because a copy is made and stored in memory (as pointed out by @Frank), which increases memory usage. It is included mainly to explain how with = FALSE works.
When you want to change the column classes the same way as you would with a data.frame, you have to add with = FALSE as follows:
dat[, cols] <- lapply(dat[, cols, with = FALSE], factor)
A check whether this worked:
> sapply(dat,class)
ID Quarter value
"factor" "factor" "numeric"
If you don't add with = FALSE, data.table will evaluate dat[, cols] as a vector (the character vector cols itself). Check the difference in output between dat[, cols] and dat[, cols, with = FALSE]:
> dat[, cols]
[1] "ID" "Quarter"
> dat[, cols, with = FALSE]
ID Quarter
1: A 1
2: A 2
3: A 3
4: A 4
5: A 5
6: B 1
7: B 2
8: B 3
9: B 4
10: B 5
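Besides with = FALSE, there are other ways to select columns by a character vector of names. The sketch below assumes a reasonably recent data.table release, since the .. prefix is not available in old versions. The table dt and the result names are made up for illustration.

```r
library(data.table)

# Hypothetical toy data mirroring the question's example
dt <- data.table(ID = rep(c("A", "B"), each = 5),
                 Quarter = rep(1:5, 2),
                 value = rnorm(10))
cols <- c("ID", "Quarter")

sel_with   <- dt[, cols, with = FALSE]      # columns named in cols
sel_dotdot <- dt[, ..cols]                  # .. prefix: look up cols outside the table
sel_sd     <- dt[, .SD, .SDcols = cols]     # .SD restricted to the chosen columns

all.equal(sel_with, sel_dotdot)
all.equal(sel_with, sel_sd)
```

Each of these returns a new data.table with only the selected columns, so they are suited to reading data. For changing classes in place, the := and set() forms shown earlier remain the better fit.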