R: data.table: searching on multiple columns and setting data type
Question
Q1:
Is it possible to search on two different columns in a data.table? I have a data set of roughly 2 million rows, and I want the option to search on either of two columns. One holds names and the other holds integers.
Example:
x <- data.table(foo=letters,bar=1:length(letters))
x
I want to do
x['c'] : searching on the foo column
as well as
x[2] : searching on the bar column
Q2:
Is it possible to change the default data types in a data.table? I am reading in a matrix with both character and integer columns, but everything is being read in as character.
Thanks!
-Abhi
Recommended answer
To answer your Q2 first: a data.table is a data.frame, and both are internally a list. Each column of a data.table (or data.frame) can therefore be of a different class, which is not possible with a matrix. You can use := to change the class of a column by reference (no unnecessary copy is made), for example for "bar" here:
x[, bar := as.integer(as.character(bar))]
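As a rough sketch of the matrix case from the question: the matrix `m` below is made-up illustration data, not the asker's actual input. It assumes the data arrives as a character matrix, is converted with as.data.table, and then has its integer column fixed in place with :=.

```r
library(data.table)

# Hypothetical character matrix: a matrix can hold only one type,
# so the integer column has been coerced to character.
m <- matrix(c("a", "b", "c", "1", "2", "3"), ncol = 2,
            dimnames = list(NULL, c("foo", "bar")))

dt <- as.data.table(m)         # both columns are character at this point
dt[, bar := as.integer(bar)]   # convert 'bar' by reference, no copy of dt

sapply(dt, class)              # inspect the resulting column classes
```

If a column had been read as a factor instead, going through as.character first, as in the line above, avoids converting the factor's internal codes rather than its labels.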
For Q1, if you want to use the fast subset (binary search) feature of data.table, you have to set a key using the function setkey.
setkey(x, foo)
This allows you to fast-subset on 'foo' alone, like x['a'] (or x[J('a')]). Similarly, setting a key on 'bar' allows you to fast-subset on that column.
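A minimal sketch of switching the key between the two columns, using the example table from the question. One point worth noting, which the original answer does not spell out: a bare number in i is treated as a row number, so a keyed lookup on an integer column has to go through J(). The names `by_foo` and `by_bar` are just illustrative.

```r
library(data.table)

x <- data.table(foo = letters, bar = seq_along(letters))

setkey(x, foo)        # sort by 'foo' and mark it as the key
by_foo <- x["c"]      # binary search on 'foo'

setkey(x, bar)        # re-key on 'bar'
by_bar <- x[J(3L)]    # binary search on 'bar'; x[3] would pick row 3 instead
```

Only one key is active at a time, so each call to setkey re-sorts the table by the new column.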
If you set the key on both 'foo' and 'bar', then you can provide values for both, like so:
setkey(x) # or alternatively setkey(x, foo, bar)
x[J('c', 3)]
However, this subsets the rows where foo == 'c' and bar == 3. Currently, I don't think there is a way to do a | (OR) operation with fast-subset directly. You'll have to resort to a vector-scan approach in that case.
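The vector-scan fallback for an OR condition might look like the following sketch, again on the example table from the question. It evaluates an ordinary logical expression in i and needs no key; `res` is an illustrative name.

```r
library(data.table)

x <- data.table(foo = letters, bar = seq_along(letters))

# Vector scan: each condition is evaluated over the whole column,
# then the two logical vectors are combined with |.
res <- x[foo == "c" | bar == 2L]
res
```

This scans both columns in full, so it is slower than a keyed lookup on large tables.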
Hope this is what your question was about. Not sure.