R: data.table: searching on multiple columns AND setting data type

Question

Q1:

Is it possible to search on two different columns in a data.table? I have roughly 2 million rows of data, and I want the option to search on either of the two columns. One holds names and the other holds integers.

Example:

library(data.table)
x <- data.table(foo = letters, bar = 1:length(letters))
x

# want to do:
# x['c']  : search on the foo column
# as well as
# x[2]    : search on the bar column

Q2: Is it possible to change the default data types in a data.table? I am reading in a matrix with both character and integer columns; however, everything is being read in as character.

Thanks! -Abhi

Answer

To answer your Q2 first: a data.table is a data.frame, and both are lists internally. Each column of a data.table (or data.frame) can therefore be of a different class. A matrix, by contrast, holds values of a single type, so a matrix with character and integer columns is stored entirely as character. You can use := to change the class of a column by reference (no unnecessary copy is made), for example of "bar" here:

x[, bar := as.integer(as.character(bar))]
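A minimal sketch of the type fix, assuming the data starts life as a character matrix (the matrices `m` and `m2`, the second table `y` and the vector `int_cols` are hypothetical names made up for illustration). It shows that converting the matrix leaves every column as character, then converts one column with `:=` and several columns at once with `lapply(.SD, ...)` plus `.SDcols`:

```r
library(data.table)

# Hypothetical input: mixed data stored in a matrix. A matrix holds a single
# type, so cbind() coerces the integers below to character.
m <- cbind(foo = letters[1:5], bar = 1:5)
x <- as.data.table(m)
sapply(x, class)          # both columns are "character"

# Convert one column to integer by reference (no copy of the table)
x[, bar := as.integer(bar)]
sapply(x, class)

# Convert several columns at once: list the names, apply a converter via .SD
m2 <- cbind(foo = letters[1:3], bar = 1:3, baz = 4:6)
y <- as.data.table(m2)
int_cols <- c("bar", "baz")
y[, (int_cols) := lapply(.SD, as.integer), .SDcols = int_cols]
sapply(y, class)
```

The parentheses in `(int_cols) :=` tell data.table to treat `int_cols` as a variable holding column names rather than as a literal column called "int_cols".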

For Q1, if you want to use data.table's fast subset (binary search) feature, you have to set a key using the function setkey:

setkey(x, foo)

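This allows you to fast-subset on 'foo' alone, like x['a'] (or x[J('a')]). Similarly, setting a key on 'bar' allows you to fast-subset on that column; since 'bar' is an integer column, wrap the value in J(), e.g. x[J(2L)], because a bare x[2] is interpreted as row number 2.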
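A short sketch of single-column keyed lookups on the example table from the question (the result names `row_c`, `row_c2` and `row_2` are just for illustration):

```r
library(data.table)

x <- data.table(foo = letters, bar = 1:length(letters))

# Key on 'foo': a character value in i becomes a binary-search join on foo
setkey(x, foo)
row_c  <- x["c"]        # row where foo == "c"
row_c2 <- x[J("c")]     # the same lookup, written with J()

# Re-key on 'bar' to search that column instead. setkey() re-sorts the
# table by reference, so switching keys costs a sort each time.
setkey(x, bar)
row_2 <- x[J(2L)]       # row where bar == 2 (a bare x[2] is a row index)
```

Since only one key is active at a time, letting users search whichever column they pick means either re-keying before each search (a full sort on 2 million rows) or, as one option, keeping two keyed copies of the table at the cost of twice the memory.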

If you set the key on both 'foo' and 'bar', you can provide values for both, like so:

setkey(x) # or alternatively setkey(x, foo, bar)
x[J('c', 3)]

However, this subsets the rows where foo == 'c' AND bar == 3. Currently, I don't think there is a way to do an OR (|) operation with fast-subset directly; you'll have to resort to a vector-scan approach in that case.
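A sketch of both cases on the example table: an AND lookup through a two-column key, and an OR done as a vector scan. The last two lines use the `on=` argument, which, as far as I know, only exists in data.table releases newer than this answer; it performs an ad hoc join on a named column without changing the key, which is another way to search either column. Check `?data.table` in your installed version before relying on it. The result names are illustrative.

```r
library(data.table)

x <- data.table(foo = letters, bar = 1:length(letters))

# AND: key on both columns, then give one value per key column (in key order)
setkey(x, foo, bar)
both <- x[J("c", 3L)]                  # foo == "c" AND bar == 3

# OR: a single binary search can't express this, so scan both columns
either <- x[foo == "c" | bar == 5L]    # rows with foo "c" and foo "e"

# Newer data.table versions: ad hoc joins with on= search one column
# without changing the key set by setkey()
by_foo <- x[J("c"), on = "foo"]        # foo == "c"
by_bar <- x[J(5L), on = "bar"]         # bar == 5
```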

I hope this is what your question was about; I'm not entirely sure.
