Concatenate dfm matrices in the 'quanteda' package


Question

Is there a method to concatenate two dfm matrices that differ in both their number of columns and their number of rows? It can be done with some additional coding, so I am not interested in ad hoc code but in a general and elegant solution, if one exists.

An example:

dfm1 <- dfm(c(doc1 = "This is one sample text sample."), verbose = FALSE)
dfm2 <- dfm(c(doc2 = "Surprise! This is one sample text sample."), verbose = FALSE)
rbind(dfm1, dfm2)

gives an error.

The 'tm' package can concatenate its document-term matrices out of the box, but it is too slow for my purposes.

Also recall that 'dfm' from 'quanteda' is an S4 class.

Recommended answer

This should work "out of the box" if you are using the latest version:

packageVersion("quanteda")
## [1] ‘0.9.6.9’

dfm1 <- dfm(c(doc1 = "This is one sample text sample."), verbose = FALSE)
dfm2 <- dfm(c(doc2 = "Surprise! This is one sample text sample."), verbose = FALSE)

rbind(dfm1, dfm2)
## Document-feature matrix of: 2 documents, 6 features.
## 2 x 6 sparse Matrix of class "dfmSparse"
##      is one sample surprise text this
## doc1  1   1      2        0    1    1
## doc2  1   1      2        1    1    1

See also ?selectFeatures, where features is a dfm object (there are examples in the help file).

Added:

Note that this correctly aligns the two texts in a common feature set, unlike the normal rbind methods for matrices, whose columns must match. For the same reason, rbind() does not actually work in the tm package for DocumentTermMatrix objects with different terms:

require(tm)
dtm1 <- DocumentTermMatrix(Corpus(VectorSource(c(doc1 = "This is one sample text sample."))))
dtm2 <- DocumentTermMatrix(Corpus(VectorSource(c(doc2 = "Surprise! This is one sample text sample."))))
rbind(dtm1, dtm2)
## Error in f(init, x[[i]]) : Numbers of columns of matrices must match.

This almost gets it, but seems to duplicate the repeated feature:

as.matrix(rbind(c(dtm1, dtm2)))
##     Terms
## Docs one sample sample. text this surprise!
##    1   1      1       1    1    1         0
##    1   1      1       1    1    1         1
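To make the idea of aligning documents in a common feature set concrete, here is a minimal base R sketch using ordinary dense matrices. It is not how quanteda implements rbind() for dfm objects. The helper name rbind_union and the hand-built count matrices are assumptions made for illustration only.

```r
# Hypothetical helper (not part of quanteda or tm): row-bind two count
# matrices on the union of their column names, filling absent features with 0.
rbind_union <- function(a, b) {
  feats <- sort(union(colnames(a), colnames(b)))
  pad <- function(m) {
    # Start from an all-zero matrix over the full feature set ...
    out <- matrix(0, nrow = nrow(m), ncol = length(feats),
                  dimnames = list(rownames(m), feats))
    # ... then copy the existing counts into the matching columns.
    out[, colnames(m)] <- m
    out
  }
  rbind(pad(a), pad(b))
}

# Hand-built counts mirroring the two example documents (assumed values).
m1 <- matrix(c(1, 1, 2, 1, 1), nrow = 1,
             dimnames = list("doc1", c("is", "one", "sample", "text", "this")))
m2 <- matrix(c(1, 1, 2, 1, 1, 1), nrow = 1,
             dimnames = list("doc2", c("is", "one", "sample", "surprise", "text", "this")))

combined <- rbind_union(m1, m2)
print(combined)
```

A plain rbind(m1, m2) would fail here because the column counts differ. Padding each matrix to the union of features first is what makes the rows line up. For large corpora a sparse representation would be preferable to the dense matrices used in this sketch.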
