How can I add items to collections.Counter and then sort them in ascending order?


Question

At the moment I'm trying to process the lingspam dataset by counting the occurrences of words in 600 files (400 emails and 200 spam emails). I've already reduced each word to a common stem with the Porter stemming algorithm, and I would also like my results to be standardized across each file for further processing. But I'm unsure how to accomplish this.

In order to get the output below, I need to be able to add items that may not exist inside the file, and list them in a consistent ascending order.

printing from ./../lingspam_results/spmsgb164.txt.out
[('money', 0), ('univers', 0), ('sale', 0)]
printing from ./../lingspam_results/spmsgb166.txt.out
[('money', 2), ('univers', 0), ('sale', 0)]
printing from ./../lingspam_results/spmsgb167.txt.out
[('money', 0), ('univers', 0), ('sale', 1)]

Which I then plan on converting into vectors using numpy.

[0,0,0]
[2,0,0]
[0,0,1]

instead of the output I currently get:

printing from ./../lingspam_results/spmsgb165.txt.out
[]
printing from ./../lingspam_results/spmsgb166.txt.out
[('univers', 2)]
printing from ./../lingspam_results/spmsgb167.txt.out
[('sale', 1)]

How can I standardize my results from the Counter module into ascending order, while also adding items from my search_list to the Counter result even when they don't appear in the file? I've already tried the code below, which simply reads each text file and builds a list based on the search_list.

import numpy as np, os          # numpy is planned for the vector conversion later
from collections import Counter

def parse_bag(directory, search_list):
    # walk the directory tree and count words in every file found
    for (dirpath, dirnames, filenames) in os.walk(directory):
        for f in filenames:
            path = os.path.join(dirpath, f)
            count_words(path, search_list)

def count_words(filename, search_list):
    textwords = open(filename, 'r').read().split()
    # keep only the words we are searching for
    filteredwords = [t for t in textwords if t in search_list]
    wordfreq = Counter(filteredwords).most_common(5)
    print "printing from " + filename
    print wordfreq

search_list = ['sale', 'univers', 'money']
parse_bag("./../lingspam_results", search_list)

Thanks

Solution

From your question, it sounds like your requirement is the same words in a consistent ordering across all files, together with their counts. This should do it for you:

def count_words(filename, search_list):
    textwords = open(filename, 'r').read().split()
    filteredwords = [t for t in textwords if t in search_list]
    counter = Counter(filteredwords)
    for w in search_list:
        counter[w] += 0        # ensure exists
    wordfreq = sorted(counter.items())
    print "printing from " + filename
    print wordfreq

search_list = ['sale', 'univers', 'money']

Sample output:

printing from ./../lingspam_results/spmsgb164.txt.out
[('money', 0), ('sale', 0), ('univers', 0)]
printing from ./../lingspam_results/spmsgb166.txt.out
[('money', 2), ('sale', 0), ('univers', 0)]
printing from ./../lingspam_results/spmsgb167.txt.out
[('money', 0), ('sale', 1), ('univers', 0)]

I don't think you want to use most_common at all since you specifically don't want the contents of each file to affect the ordering or list length.
