How can I remove all instances of a single set item in set A from set B?
Problem description
As you can see below, I open test.txt, put its words into a set, and take the difference of that set with the common_words set. However, this only seems to remove a single instance of each word in common_words rather than all occurrences. How can I remove ALL instances of the items in common_words from title_words?
from string import punctuation
from operator import itemgetter

N = 10
words = {}

linestring = open('test.txt', 'r').read()

# set A, want to remove these from set B
common_words = set(("if", "but", "and", "the", "when", "use", "to", "for"))
title = linestring

# set B, want to remove ALL words in set A from this set and store in keywords
title_words = set(title.lower().split())

keywords = title_words.difference(common_words)

words_gen = (word.strip(punctuation).lower() for line in keywords
             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)
Recommended answer
I wrote some code recently that does something similar, although the style is very different from yours. Maybe it will help you out.
# Python 2 code: uses the print statement and the string-module functions.
import string
import sys

def main():
    # get some stop words
    stopf = open('stop_words.txt', "r")
    stopwords = {}
    for s in stopf:
        stopwords[string.strip(s)] = 1

    file = open(sys.argv[1], "r")
    filedata = file.read()
    words=string.split(filedata)

    # count every word that is not a stop word
    histogram = {}
    count = 0
    for word in words:
        # normalize before the stop-word check
        word = string.strip(word, string.punctuation)
        word = string.lower(word)
        if word in stopwords:
            continue
        histogram[word] = histogram.get(word, 0) + 1
        # progress marker: print '*' after every 1000 counted words
        count = (count+1) % 1000
        if count == 0:
            print '*',

    # sort by descending count and print the top 100
    flist = []
    for word, count in histogram.items():
        flist.append([count, word])
    flist.sort()
    flist.reverse()
    for pair in flist[0:100]:
        print "%30s: %4d" % (pair[1], pair[0])

main()
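
Two details of the question's code likely explain the behaviour. First, title_words is a set, so each distinct token appears in it only once, and difference() already removes every element that matches. Second, punctuation is stripped only after the difference is taken, so tokens such as "the," do not match "the" and survive the filter. Building a set also discards repetitions, so every frequency ends up as 1. The following is a minimal sketch of one way to address both points, written for Python 3 and using collections.Counter. The function name top_keywords and the sample text are illustrative assumptions and are not part of the original post.

```python
from collections import Counter
from string import punctuation

# Same stop words as the question's common_words set.
COMMON_WORDS = {"if", "but", "and", "the", "when", "use", "to", "for"}


def top_keywords(text, stop_words=COMMON_WORDS, n=10):
    """Return the n most frequent words in text, ignoring stop_words.

    Hypothetical helper for illustration; not from the original post.
    """
    # Normalize first: strip surrounding punctuation and lowercase, so that
    # "The," and "the" are treated as the same word.
    cleaned = (token.strip(punctuation).lower() for token in text.split())
    # Filter a sequence rather than a set, so repeated words are still counted.
    counts = Counter(w for w in cleaned if w and w not in stop_words)
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up sample text standing in for the contents of test.txt.
    sample = "The cat and the dog. The dog, when tired, sleeps; the cat plays."
    for word, frequency in top_keywords(sample):
        print("%s: %d" % (word, frequency))
```

The normalization step runs before the stop-word check, which is the same order the answer's loop uses; the set type is kept only for the stop words, where fast membership tests are what matter.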