Does the tm package itself provide a built-in way to combine document-term matrices?


Question

Does the tm package itself provide a built-in way to combine document-term matrices?

I generated 4 document-term matrices on the same corpus, one each for 1-, 2-, 3-, and 4-grams. They are all really big (200k x 10k), so converting them into data frames and then cbinding them is out of the question. I know I could write a program that records the non-zero elements in each of the matrices and builds a sparse matrix, but that is a lot of trouble. It just seems natural for the tm package to provide this functionality. If it does, I don't want to rebuild something that has already been built.

If it doesn't, is there a handier way to combine DTMs than writing a program that records the indices of their non-zero elements and then builds a sparse matrix?

Recommended answer

Have you tried tm_combine (see ?tm_combine)? You can use it via the generic function c, like so:

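library(tm)
data("acq")    # example corpora shipped with tm
data("crude")
# Combine two corpora
summary(c(acq, crude))
# Combine two single documents
summary(c(acq[[30]], crude[[10]]))
# Combine two term-document matrices
c(TermDocumentMatrix(acq), TermDocumentMatrix(crude))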
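
To see what c does to the matrices before relying on it, it may help to compare dimensions. The following is a minimal sketch, assuming that c on term-document matrices appends the documents of each input and takes the union of their terms. That assumption is worth checking on your own data, because it means c joins different sets of documents rather than adding new term columns for the same documents. The variable names tdm_acq, tdm_crude and combined are only illustrative.

```r
library(tm)

data("acq")    # example corpora shipped with tm
data("crude")

tdm_acq   <- TermDocumentMatrix(acq)
tdm_crude <- TermDocumentMatrix(crude)

# Combine via the generic c(), as in the answer above
combined <- c(tdm_acq, tdm_crude)

# Dimensions are terms x documents
print(dim(tdm_acq))
print(dim(tdm_crude))
print(dim(combined))

# Assumption being checked: documents are appended, terms are unioned
print(nDocs(combined) == nDocs(tdm_acq) + nDocs(tdm_crude))
print(nTerms(combined) == length(union(Terms(tdm_acq), Terms(tdm_crude))))
```

If both comparisons print TRUE, the combined matrix holds every document from both inputs over a shared vocabulary, and it stays sparse throughout.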
