如何从集合 B 中删除集合 A 中单个集合项目的所有实例? [英] How can I remove all instances of a single set item in set A from set B?

查看:54
本文介绍了如何从集合 B 中删除集合 A 中单个集合项目的所有实例?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如下所示,当我打开 test.txt 并将单词放入一个集合时,返回该集合与 common_words 集合的差异.但是,它只删除了 common_words 集中单词的单个实例,而不是它们的所有出现.我怎样才能做到这一点?我想从 title_words 中删除 common_words 中项目的所有实例

As you can see below, when I open test.txt and put the words into a set, the difference of the set with the common_words set is returned. However, it is only removing a single instance of the words in the common_words set rather than all occurrences of them. How can I achieve this? I want to remove ALL instances of items in common_words from title_words

from string import punctuation
from operator import itemgetter

N = 10
words = {}

linestring = open('test.txt', 'r').read()

//set A, want to remove these from set B
common_words = set(("if", "but", "and", "the", "when", "use", "to", "for"))

title = linestring

//set B, want to remove ALL words in set A from this set and store in keywords
title_words = set(title.lower().split())

keywords = title_words.difference(common_words)

words_gen = (word.strip(punctuation).lower() for line in keywords
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.iteritems(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print "%s: %d" % (word, frequency)

推荐答案

我最近写了一些代码,做了一些类似的事情,虽然风格和你的很不一样.也许它会帮助你.

I wrote some code recently that does something similar, although the style is very different from yours. Maybe it will help you out.

import string
import sys

def main():
    # get some stop words
    stopf = open('stop_words.txt', "r")
    stopwords = {}
    for s in stopf:
        stopwords[string.strip(s)] = 1

    file = open(sys.argv[1], "r")
    filedata = file.read()
    words=string.split(filedata)
    histogram = {}
    count = 0
    for word in words:
        word = string.strip(word, string.punctuation)
        word = string.lower(word)
        if word in stopwords:
            continue
        histogram[word] = histogram.get(word, 0) + 1
        count = (count+1) % 1000
        if count == 0:
            print '*',
    flist = []
    for word, count in histogram.items():
        flist.append([count, word])
    flist.sort()
    flist.reverse()
    for pair in flist[0:100]:
        print "%30s: %4d" % (pair[1], pair[0])

main()

这篇关于如何从集合 B 中删除集合 A 中单个集合项目的所有实例?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆