Combining Lists of Word Frequency Data

Views: 97 | Published: 2020/7/14 6:16:14 | Tags: wolfram-mathematica, word-frequency

Problem description

This seems like it should be an obvious question, but the tutorials and documentation on lists are not forthcoming. Many of these issues stem from the sheer size of my text files (hundreds of MB) and my attempts to boil them down to something manageable by my system. As a result, I'm doing my work in segments and am now trying to combine the results.

I have multiple word frequency lists (~40 of them). The lists can either be taken through Import[ ] or as variables generated in Mathematica. Each list appears as the following and has been generated using the Tally[ ] and Sort[ ] commands:

{{"the", 42216}, {"of", 24903}, {"and", 18624}, {"n", 16850}, {"in", 16164}, {"de", 14930}, {"a", 14660}, {"to", 14175}, {"la", 7347}, {"was", 6030}, {"l", 5981}, {"le", 5735}, <<51293>>, {"abattoir", 1}, {"abattement", 1}, {"abattagen", 1}, {"abattage", 1}, {"abated", 1}, {"abandonn", 1}, {"abaiss", 1}, {"aback", 1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}}

Here is an example of the second file:

{{"the", 30419}, {"n", 20414}, {"de", 19956}, {"of", 16262}, {"and", 14488}, {"to", 12726}, {"a", 12635}, {"in", 11141}, {"la", 10739}, {"et", 9016}, {"les", 8675}, {"le", 7748}, <<101032>>, {"abattement", 1}, {"abattagen", 1}, {"abattage", 1}, {"abated", 1}, {"abandonn", 1}, {"abaiss", 1}, {"aback", 1}, {"aase", 1}, {"aaijaut", 1}, {"aaaah", 1}, {"aaa", 1}}

I want to combine them so that the frequency data aggregates: i.e. if the second file has 30,419 occurrences of 'the' and is joined to the first file, it should return that there are 72,635 occurrences (and so on as I move through the entire collection).
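
To make the desired aggregation concrete, here is a minimal sketch of one way it could be done in Mathematica. It is not taken from the original thread. The names `list1`, `list2`, `combined` and `sorted` are hypothetical, and the sample data is a hand-picked handful of entries from the two lists above rather than the full files. The idea is to join the lists, group the pairs by word with `GatherBy`, and total the counts within each group.

```mathematica
(* Small stand-ins for two of the word frequency lists (assumed sample data) *)
list1 = {{"the", 42216}, {"of", 24903}, {"aback", 1}};
list2 = {{"the", 30419}, {"of", 16262}, {"et", 9016}};

(* Join the lists, group the {word, count} pairs by word,
   then keep the word and the sum of its counts for each group *)
combined = {#[[1, 1]], Total[#[[All, 2]]]} & /@
   GatherBy[Join[list1, list2], First];

(* Order by descending frequency *)
sorted = SortBy[combined, -Last[#] &]
(* {{"the", 72635}, {"of", 41165}, {"et", 9016}, {"aback", 1}} *)
```

With the roughly 40 lists held in a single list of lists (say a hypothetical `allLists`), `Join[list1, list2]` could presumably be replaced by `Join @@ allLists`, so the whole collection is aggregated in one pass instead of pair by pair.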
