Convert *some* column classes in data.table

Views: 175 | Posted: 2017/3/12 10:20:33 | Tags: r, data.table


Question

I want to convert a subset of data.table columns to a new class. There's a popular question here (Convert column classes in data.table), but the answer creates a new object rather than operating on the starting object.

Take this example:

library(data.table)
dat <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
cols <- c('ID', 'Quarter')

How best to convert just the cols columns to (e.g.) a factor? In a normal data.frame you could do this:

dat[, cols] <- lapply(dat[, cols], factor)

but that doesn't work for a data.table, and neither does this:

dat[, .SD := lapply(.SD, factor), .SDcols = cols]

A comment from Matt Dowle on the linked question (from Dec 2013) suggests the following, which works fine but seems a bit less elegant:

for (j in cols) set(dat, j = j, value = factor(dat[[j]]))

Is there currently a better data.table answer (i.e. shorter and without generating a counter variable), or should I just use the above plus rm(j)?

Recommended answer

Besides using the option suggested by Matt Dowle, another way of changing the column classes is as follows:

dat[, (cols) := lapply(.SD, factor), .SDcols=cols]

By using the := operator you update the data.table by reference. To check that this worked:

> sapply(dat,class)
       ID   Quarter     value 
 "factor"  "factor" "numeric"
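The following is a minimal, self-contained sketch of the := approach, assuming the data.table package is installed. The fixed seed and the use of address() to confirm that no copy was made are additions for illustration and are not part of the original answer.

```r
library(data.table)

set.seed(1)  # assumption: fixed seed so the example is reproducible
dat <- data.table(ID = rep(c("A", "B"), each = 5),
                  Quarter = c(1:5, 1:5),
                  value = rnorm(10))
cols <- c("ID", "Quarter")

addr_before <- address(dat)  # memory address of the table before the update

# Parentheses around cols make data.table use the *values* stored in cols as
# the target column names, instead of creating a column literally named "cols".
dat[, (cols) := lapply(.SD, factor), .SDcols = cols]

addr_after <- address(dat)   # memory address of the table after the update

classes <- sapply(dat, class)
print(classes)
print(identical(addr_before, addr_after))
```

If the two addresses match, dat is still the same object and the columns were replaced in place rather than in a copy.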

As suggested by @MattDowle in the comments, you can also use a combination of for(...) set(...) as follows:

for (col in cols) set(dat, j = col, value = factor(dat[[col]]))

which will give the same result. A third alternative is:

for (col in cols) dat[, (col) := factor(dat[[col]])]
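As a quick sanity check, the sketch below applies the three approaches to three fresh copies of the same table and compares the results column by column. The helper names make_dat and same_cols are hypothetical and exist only for this illustration; the value column uses fixed numbers instead of rnorm() so that the copies start out identical.

```r
library(data.table)

# Hypothetical helper: builds a fresh copy of the example table each time
make_dat <- function() {
  data.table(ID = rep(c("A", "B"), each = 5),
             Quarter = c(1:5, 1:5),
             value = (1:10) / 10)
}

# Hypothetical helper: TRUE if every column of a is identical to that of b
same_cols <- function(a, b) {
  all(vapply(names(a), function(n) identical(a[[n]], b[[n]]), logical(1)))
}

cols <- c("ID", "Quarter")

# Option 1: := with lapply over .SD
d1 <- make_dat()
d1[, (cols) := lapply(.SD, factor), .SDcols = cols]

# Option 2: for loop with set()
d2 <- make_dat()
for (col in cols) set(d2, j = col, value = factor(d2[[col]]))

# Option 3: for loop with :=
d3 <- make_dat()
for (col in cols) d3[, (col) := factor(d3[[col]])]

same <- same_cols(d1, d2) && same_cols(d2, d3)
print(same)
```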

On smaller datasets, the for(...) set(...) option is about three times faster than the lapply option (though that hardly matters, because the dataset is small). On larger datasets (e.g. 2 million rows), each of these approaches takes about the same amount of time. For testing on a larger dataset, I used:

dat <- data.table(ID=c(rep("A", 1e6), rep("B",1e6)),
                  Quarter=c(1:1e6, 1:1e6),
                  value=rnorm(2e6))
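The answer does not show its timing code, so the following is only a sketch of how such a comparison could be run with base R's system.time(). The helper make_big and the reduced size n are assumptions made here to keep the run short; timings depend heavily on the machine and the data.table version, so no expected numbers are given.

```r
library(data.table)

n <- 1e5  # assumption: smaller than the 1e6 rows per group used above
# Hypothetical helper: builds a fresh table so each approach starts from characters/integers
make_big <- function() {
  data.table(ID = rep(c("A", "B"), each = n),
             Quarter = c(seq_len(n), seq_len(n)),
             value = rnorm(2 * n))
}
cols <- c("ID", "Quarter")

# Time the := with lapply approach
d_lapply <- make_big()
t_lapply <- system.time(
  d_lapply[, (cols) := lapply(.SD, factor), .SDcols = cols]
)

# Time the for loop with set() approach
d_set <- make_big()
t_set <- system.time(
  for (col in cols) set(d_set, j = col, value = factor(d_set[[col]]))
)

# Elapsed seconds for each approach
timings <- c(lapply = t_lapply[["elapsed"]], set = t_set[["elapsed"]])
print(timings)
```

A single system.time() call is noisy; repeating each measurement several times gives a fairer comparison.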

Sometimes you will have to do it a bit differently (for example, when numeric values are stored as a factor). Then you have to use something like this:

dat[, (cols) := lapply(.SD, function(x) as.integer(as.character(x))), .SDcols=cols]
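To see why the detour through as.character() is needed, here is a small sketch with made-up data (the table below is an assumption for illustration, not the dataset from the question). Calling as.integer() directly on a factor returns its internal level codes rather than the numbers that the labels spell out.

```r
library(data.table)

# Made-up example: numbers stored as factor labels
dat <- data.table(ID = factor(c("10", "20", "30")),
                  Quarter = factor(c("4", "2", "4")),
                  value = c(0.1, 0.2, 0.3))
cols <- c("ID", "Quarter")

# as.integer() on a factor gives the level codes (levels here are "2", "4")
codes <- as.integer(dat$Quarter)

# Converting to character first recovers the numbers written in the labels
dat[, (cols) := lapply(.SD, function(x) as.integer(as.character(x))), .SDcols = cols]

print(codes)        # 2 1 2
print(dat$Quarter)  # 4 2 4
```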

WARNING: The following explanation is not the data.table way of doing things. The data.table is not updated by reference because a copy is made and stored in memory (as pointed out by @Frank), which increases memory usage. It is included mainly to explain how with = FALSE works.

When you want to change the column classes the same way as you would with a data.frame, you have to add with = FALSE as follows:

dat[, cols] <- lapply(dat[, cols, with = FALSE], factor)

To check that this worked:

> sapply(dat,class)
       ID   Quarter     value 
 "factor"  "factor" "numeric"

If you don't add with = FALSE, data.table will evaluate dat[, cols] as a vector. Compare the output of dat[, cols] and dat[, cols, with = FALSE]:

> dat[, cols]
[1] "ID"      "Quarter"

> dat[, cols, with=FALSE]
    ID Quarter
 1:  A       1
 2:  A       2
 3:  A       3
 4:  A       4
 5:  A       5
 6:  B       1
 7:  B       2
 8:  B       3
 9:  B       4
10:  B       5
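For comparison, the sketch below shows three ways of selecting columns whose names are held in a character vector. The .. prefix and the .SD form are not mentioned in the original answer; they are included here as assumptions about what a reasonably recent data.table version offers, so check them against the version you have installed.

```r
library(data.table)

dat <- data.table(ID = rep(c("A", "B"), each = 5),
                  Quarter = c(1:5, 1:5),
                  value = (1:10) / 10)
cols <- c("ID", "Quarter")

# with = FALSE: cols is treated as a vector of column names
sub_with <- dat[, cols, with = FALSE]

# .. prefix: cols is looked up in the calling scope, not among the columns
sub_dots <- dat[, ..cols]

# .SD restricted to the columns named in cols
sub_sd <- dat[, .SD, .SDcols = cols]

print(names(sub_with))
print(names(sub_dots))
print(names(sub_sd))
```

All three return a new data.table containing only the selected columns; none of them modifies dat.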
