Use data.table set() to convert all columns from integer to numeric
Question
I am working with a data.table that has 1900 columns and roughly 280,000 rows.
Currently, the data is entirely "integer", but I want it to be explicitly "numeric" so that I can pass it to the bigcor() function later. Apparently, bigcor() can only handle "numeric" and not "integer".
I have tried:
full.bind <- full.bind[,sapply(full.bind, as.numeric), with=FALSE]
Unfortunately, I get the error:
Error in `[.data.table`(full.bind, , sapply(full.bind, as.numeric), with = FALSE) :
j out of bounds
So, I tried using the data.table set() function, but I get the error:
Error in set(full.bind, value = as.numeric(full.bind)) :
(list) object cannot be coerced to type 'double'
I have created a simple reproducible example. Keep in mind that the actual columns are NOT "a", "b", or "c". They have extremely complicated names, so referencing columns individually is not an option.
library(data.table)
dt <- data.table(a=1:10, b=1:10, c=1:10)
So, my final questions are:
1) Why does my sapply technique not work? (What is the "j out of bounds" error?)
2) Why does the set() technique not work? (Why can't the data.table be coerced to numeric?)
3) Does the bigcor() function require a numeric object, or is there another problem?
Answer
Use .SD and assignment by reference:
library(data.table)
dt <- data.table(a=1:10, b=1:10, c=1:10)
sapply(dt, class)
#         a         b         c
# "integer" "integer" "integer"
# Replace every column, by reference, with a numeric copy of itself
dt[, names(dt) := lapply(.SD, as.numeric)]
sapply(dt, class)
#         a         b         c
# "numeric" "numeric" "numeric"
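The following sketch is a hedged illustration of question 1 and is not part of the original answer. sapply() simplifies its per-column results into one numeric matrix of data values. With with=FALSE, data.table reads j as column positions, so values such as 10 point past the last column. That is a plausible reading of the "j out of bounds" error. The sketch uses base R, with a data.frame standing in for the question's full.bind:

```r
# data.frame stand-in for the question's full.bind (illustrative assumption)
df <- data.frame(a = 1:10, b = 1:10, c = 1:10)

# sapply() applies as.numeric to every column and simplifies the list of
# results into a single 10 x 3 numeric matrix of data values
m <- sapply(df, as.numeric)
print(class(m))
print(dim(m))

# With with=FALSE, data.table treats j as column positions; here those
# "positions" are the data values themselves, and many exceed ncol(df)
print(sum(m > ncol(df)))
```

The point is that sapply() returns data rather than column selectors, which is why the answer uses := with lapply(.SD, ...) instead.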
Here, set() only works for one column at a time (note that the documentation doesn't say j is optional), because each replacement column has to be generated. If you want to use it, you need to loop over the columns (e.g., with a for loop). This can be preferable because it needs less memory. The extra memory corresponds to a single column, whereas the first approach needs extra memory for the whole data.table.
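# Start again from the integer example; set() converts one column per call, by reference
dt <- data.table(a=1:10, b=1:10, c=1:10)
for (k in seq_along(dt)) set(dt, j = k, value = as.numeric(dt[[k]]))
sapply(dt, class)
#         a         b         c
# "numeric" "numeric" "numeric"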
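The error in question 2 most likely comes from the value argument rather than from set() itself. as.numeric(full.bind) is evaluated first. A data.table is a list of columns, and as.numeric() cannot coerce such a list as a whole. Below is a minimal base R sketch that uses a plain list as an assumed stand-in for the data.table. The exact error wording varies between R versions, so the sketch only checks that an error occurs:

```r
# A data.table (like a data.frame) is a list of column vectors;
# a plain list is used here as a stand-in
cols <- list(a = 1:10, b = 1:10, c = 1:10)

# as.numeric() cannot flatten a list whose elements are longer than 1,
# so this signals a "cannot be coerced to type 'double'" error
res <- tryCatch(as.numeric(cols), error = function(e) e)
print(inherits(res, "error"))

# Converting column by column works, which is what lapply() and set() do
converted <- lapply(cols, as.numeric)
print(vapply(converted, is.double, logical(1)))
```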
However, bigcor (from package propagate) requires a matrix as input, and a data.table isn't a matrix. So your problem is not the column type; you need to pass as.matrix(dt) instead.
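Here is a small sketch of that conversion, assuming the data.table package is installed. It does not call bigcor() itself. The storage.mode() step is optional and is only there in case you also want the matrix values stored as double:

```r
library(data.table)

dt <- data.table(a = 1:10, b = 1:10, c = 1:10)

# as.matrix() turns the data.table into a plain matrix; with all-integer
# columns the result is an integer matrix
m <- as.matrix(dt)

# Optional: store the values as double as well
storage.mode(m) <- "double"

print(is.matrix(m))
print(dim(m))
# m is the kind of object you would then pass on, e.g. propagate::bigcor(m)
```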