Rownames for data.table in R for model.matrix


Question


I have a data.table DT and I want to run model.matrix on it. Each row has a string ID, which is stored in the ID column of DT. When I run model.matrix on DT, my formula excludes the ID column. The problem is, model.matrix drops some rows because of NAs. If I set the rownames of DT to the ID column, before calling model.matrix, then the final model matrix has rownames, and I'm all set. Otherwise, I can't figure out what rows I end up with. I'm setting the rownames with rownames(DT) = DT$ID. However, when I try to add a new column to DT, I get a complaint about

"Invalid .internal.selfref detected . . . At an earlier point, this data.table has been copied by R."

So I'm wondering

  1. Is there a better way to set rownames for a data.table?
  2. Is there a better approach to solving this problem?

Solution

There are a couple of issues here.

Firstly, it is a feature of data.table that it does not have rownames; instead it has keys, which are far more powerful. See this great vignette.
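To make the point about keys concrete, here is a minimal sketch of a key doing the job a rowname would otherwise do. The table `DT` and its values are made up for illustration and are not the asker's data.

```r
library(data.table)

# Hypothetical table with a string ID per row
DT <- data.table(ID = c("c", "a", "b"), x = c(30, 10, 20))

# Sort DT by ID and mark ID as the key, by reference (no copy of DT)
setkey(DT, ID)

# Keyed lookup: fetch the row whose ID is "b"
row_b <- DT[.("b")]
```

Note that `setkey()` reorders the rows, which is why keeping an explicit row identifier column can be useful.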

But it isn't the end of the world: model.matrix returns sensible rownames when you pass it a data.table.

For example

library(data.table)

A <- data.table(ID = 1:5, x = c(NA, 1:4), y = c(4:2, NA, 3))

mm <- model.matrix( ~ x + y, A)

rownames(mm)

## [1] "2" "3" "5"

So rows 2, 3 and 5 are the ones included in the model matrix.

Now, you can add this sequence as a column to A. This will be useful if you then set the key to something else (thereby losing the original order):

A[, rowid := seq_len(nrow(A))]

You might consider making it character (like the rownames of mm), but it won't really matter, as you can just as easily convert rownames(mm) to numeric when you need to reference the rows.
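As a rough sketch of that conversion, the rownames of the model matrix can be turned back into row positions and used to look up the IDs of the rows that survived. The string IDs below are hypothetical stand-ins for the asker's ID column, and `kept` and `kept_ids` are names introduced only for this example.

```r
library(data.table)

# Hypothetical data: same x and y as the example, but with string IDs
A <- data.table(ID = c("a", "b", "c", "d", "e"),
                x = c(NA, 1:4),
                y = c(4:2, NA, 3))

mm <- model.matrix(~ x + y, A)

# rownames(mm) hold the positions of the rows kept after NA removal
kept <- as.integer(rownames(mm))

# Look up the IDs of those rows in the original table
kept_ids <- A$ID[kept]

# mm is an ordinary matrix, so relabelling its rows does not touch A
rownames(mm) <- kept_ids
```

Because the labels are put on the matrix rather than on the data.table, A itself is never modified through `rownames<-`.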

As for the warning that data.table gives, read the sentence that follows it:

Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr()

rownames are an attribute, so rownames<- (which internally, at some point, uses the equivalent of attr<-) may copy the data.table in the same way.

The line from `row.names<-.data.frame` is

attr(x, "row.names") <- value
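For comparison, a small sketch of the set* functions named in the warning, which modify a data.table by reference. The table, the column names and the "note" attribute are invented for illustration.

```r
library(data.table)

# Hypothetical table
DT <- data.table(ID = c("a", "b", "c"), x = 1:3)

# Rename a column by reference instead of using names<-
setnames(DT, "x", "value")

# Set an arbitrary attribute by reference instead of using attr<-
setattr(DT, "note", "hypothetical")

# Adding a column by reference afterwards
DT[, doubled := value * 2L]
```

Since none of these calls go through a replacement function such as `names<-` or `attr<-`, the later `:=` should not run into the .internal.selfref complaint.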

That being said, data.tables don't have rownames, so there is no point setting them.
