How can I remove all instances of a single set item in set A from set B?

Views: 54 Published: 2021/7/23 19:19:46 Tags: python set


Problem description

As shown below, I open test.txt, put its words into a set, and take the difference of that set with the common_words set. However, this only removes a single instance of each word in common_words rather than every occurrence. How can I remove ALL instances of the items in common_words from title_words?

from string import punctuation
from operator import itemgetter

N = 10
words = {}

with open('test.txt', 'r') as f:
    linestring = f.read()

# set A, want to remove these from set B
common_words = {"if", "but", "and", "the", "when", "use", "to", "for"}

title = linestring

# set B, want to remove ALL words in set A from this set and store in keywords
title_words = set(title.lower().split())

keywords = title_words.difference(common_words)

words_gen = (word.strip(punctuation).lower() for line in keywords
                                             for word in line.split())

for word in words_gen:
    words[word] = words.get(word, 0) + 1

top_words = sorted(words.items(), key=itemgetter(1), reverse=True)[:N]

for word, frequency in top_words:
    print("%s: %d" % (word, frequency))

Recommended answer

I recently wrote some code that does something similar, although the style is very different from yours. Maybe it will help you out.

import string
import sys

def main():
    # Load the stop words, one per line, into a set for fast lookup
    with open('stop_words.txt', 'r') as stopf:
        stopwords = {line.strip() for line in stopf}

    # Read the file named on the command line and split it on whitespace
    with open(sys.argv[1], 'r') as infile:
        words = infile.read().split()

    histogram = {}
    progress = 0
    for word in words:
        # Normalize before the stop-word check so "The," matches "the"
        word = word.strip(string.punctuation).lower()
        if word in stopwords:
            continue
        histogram[word] = histogram.get(word, 0) + 1
        # Print a progress marker after every 1000 counted words
        progress = (progress + 1) % 1000
        if progress == 0:
            print('*', end=' ')

    # Sort (count, word) pairs from most to least frequent
    flist = sorted(((count, word) for word, count in histogram.items()), reverse=True)
    for count, word in flist[:100]:
        print("%30s: %4d" % (word, count))

main()
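
For comparison, here is a minimal Python 3 sketch of the same idea applied to the question's data. It is one possible reading of the problem, not a definitive fix. In the question's code the set difference runs before punctuation is stripped, so a token such as "the," is not equal to "the" and survives the difference. Converting the text to a set also collapses repeated words, so every remaining count ends up as 1. The sketch normalizes each word first, filters against the stop words, and counts with collections.Counter. The function name top_keywords, its parameters, and the sample sentence are illustrative assumptions and do not come from the original post.

```python
from collections import Counter
from string import punctuation

# Same stop words as common_words in the question.
COMMON_WORDS = {"if", "but", "and", "the", "when", "use", "to", "for"}

def top_keywords(text, stop_words=COMMON_WORDS, n=10):
    """Return the n most frequent words in text that are not stop words."""
    # Normalize first: lowercase and strip surrounding punctuation, so that
    # "The" and "the," are both compared against "the".
    cleaned = (raw.strip(punctuation).lower() for raw in text.split())
    # Filter the full word sequence (not a set) so repeated words keep their counts.
    # Tokens that were pure punctuation become empty strings and are skipped.
    counts = Counter(word for word in cleaned if word and word not in stop_words)
    return counts.most_common(n)

# Hypothetical sample text standing in for the contents of test.txt.
sample = "The cat and the dog. When the cat sleeps, the dog sleeps too!"
print(top_keywords(sample, n=3))
```

Filtering a sequence rather than a set is the key design choice here: every occurrence of a stop word is dropped, while the remaining words keep their real frequencies.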
