Rownames for data.table in R for model.matrix
Problem description
I have a data.table DT and I want to run model.matrix on it. Each row has a string ID, which is stored in the ID column of DT. When I run model.matrix on DT, my formula excludes the ID column. The problem is that model.matrix drops some rows because of NAs. If I set the rownames of DT to the ID column before calling model.matrix, then the final model matrix has rownames and I'm all set. Otherwise, I can't figure out which rows I end up with. I'm setting the rownames with rownames(DT) = DT$ID. However, when I try to add a new column to DT, I get a complaint about "Invalid .internal.selfref detected . . . At an earlier point, this data.table has been copied by R."
So I'm wondering:
- Is there a better way to set rownames for a data.table?
- Is there a better approach to solving this problem?
Solution
There are a couple of issues here.
Firstly, it is a feature of data.table that data.tables do not have rownames; instead they have keys, which are far more powerful. See this great vignette.
But it isn't the end of the world. model.matrix returns sensible rownames when you pass it a data.table. For example:
    library(data.table)

    A <- data.table(ID = 1:5, x = c(NA, 1:4), y = c(4:2, NA, 3))
    mm <- model.matrix(~ x + y, A)
    rownames(mm)
    ## [1] "2" "3" "5"

So rows 2, 3 and 5 are the ones included in the model matrix.
Now, you can add the row numbers as a column to A. This will be useful if you then set the key to something else (thereby losing the original order):

    A[, rowid := seq_len(nrow(A))]
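To tie the surviving rows back to the string IDs from the question, one option is to use rownames(mm) as positions into the original table. The following is only a minimal sketch: it assumes the table has not been reordered since model.matrix was called, and DT, kept and kept_ids are illustrative names, not anything from the original post.

```r
library(data.table)

# Hypothetical data: string IDs plus two predictors that contain NAs
DT <- data.table(
  ID = c("a", "b", "c", "d", "e"),
  x  = c(NA, 1:4),
  y  = c(4:2, NA, 3)
)

mm <- model.matrix(~ x + y, DT)

# rownames(mm) hold the positions of the rows that survived NA removal
kept <- as.integer(rownames(mm))

# Look up the matching IDs in the original table
kept_ids <- DT$ID[kept]
kept_ids
## [1] "b" "c" "e"

# Label the matrix itself; this modifies mm, not DT
rownames(mm) <- kept_ids
```

Setting rownames on the matrix is ordinary base R and leaves DT untouched, so it should not trigger the .internal.selfref warning.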
You might consider making rowid character (like the rownames of mm), but it won't really matter, as you can just as easily convert rownames(mm) to numeric when you need to reference rows.
As to the warning that data.table gives, read its next sentence:
"Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: setkey(), setnames() and setattr()"
rownames are an attribute, so rownames<- (which internally at some point uses the equivalent of attr<-) will possibly copy in the same way. The relevant line from `row.names<-.data.frame` is:

    attr(x, "row.names") <- value
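As a rough illustration of the set* advice quoted in the warning, the sketch below modifies a data.table only through by-reference operations. The table and column names (DT, value, flag) are made up for the example; it is meant to show the style of call, not a guaranteed fix for every case.

```r
library(data.table)

# Hypothetical table with a string ID column
DT <- data.table(ID = c("c", "a", "b"), x = c(1, NA, 3))

# Rename a column by reference instead of using names(DT) <- ...
setnames(DT, "x", "value")

# Set the key by reference instead of using key(DT) <- ...
setkey(DT, ID)

# Add a column by reference; no rownames<- or attr<- involved
DT[, flag := !is.na(value)]

names(DT)
key(DT)
```

Because none of these calls goes through a base R replacement function, the table is not copied along the way, and adding columns with := afterwards should work without the selfref complaint.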
That being said, data.tables don't have rownames, so there is no point setting them.