Python - reading text file into dictionary


Question

I have a huge list of terms that I want to pull from a text file and get them grouped into one of the following groups: Animal, Art, Buildings, Vehicle, Person, People, Food, Glass, Bottle, Signage, Slogan, DJ, Party. I currently have four words in the tester2 file:

plate
pizza
fearns
mixer

Here is my code:

keyword_dictionary = {
    'Animal' : ['animal', 'dog', 'cat'],
    'Art' : ['art', 'sculpture', 'fearns'],
    'Buildings' : ['building', 'architecture', 'gothic', 'skyscraper'],
    'Vehicle' : ['car','formula','f-1','f1','f 1','f one','f-one','moped','mo ped','mo-ped','scooter'],
    'Person' : ['person','dress','shirt','woman','man','attractive','adult','smiling','sleeveless','halter','spectacles','button','bodycon'],
    'People' : ['people','women','men','attractive','adults','smiling','group','two','three','four','five','six','seven','eight','nine','ten','2','3','4','5','6','7','8','9','10'],
    'Food' : ['food','plate','chicken','steak','pizza','pasta','meal','asian','beef','cake','candy','food pyramid','spaghetti','curry','lamb','sushi','meatballs','biscuit','apples','meat','mushroom','jelly', 'sorbet','nacho','burrito','taco','cheese'],
    'Glass' : ['glass','drink','container','glasses','cup'],
    'Bottle' : ['bottle','drink'],
    'Signage' : ['sign','martini','ad','advert','card','bottles','logo','mat','chalkboard','blackboard'],
    'Slogan' : ['Luck is overrated'],
    'DJ' : ['dj','disc','jockey','mixer','instrument','turntable'],
    'Party' : ['party']
 }

y = 0
while (y < 1):
    try:
        def search(keywords, searchFor):
            for item in keywords:
                for terms in keywords[item]:
                    if searchFor in terms:
                        print item



        with open("C:/Users/USERNAME/Desktop/tester2.txt") as termsdesk:
                for line in termsdesk:
                    this = search (keyword_dictionary, line)
                    this2 = str(this)
                    #print this2
                    #print item
    except KeyError:
        break
    y = y+1

My results should look something like this:

Food
Food
Art
DJ

But I get:

DJ

I imagine it's because there's something wrong with my loop. Does anyone know what I need to change? I've tried moving the "while (y<1)" around but I haven't been able to get the results I want.

Recommended answer

Remove leading / trailing whitespace from the search term. The following works as expected:

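# keyword_dictionary is the dictionary defined in the question.
def search(keywords, searchFor):
    # Print every category whose word list contains the exact search term.
    for key, words in keywords.items():
        if searchFor in words:
            print(key)

with open("tester2.txt") as termsdesk:
    for line in termsdesk:
        # strip() drops the trailing newline that each line read from the file carries.
        search(keyword_dictionary, line.strip())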



$ cat tester2.txt 
plate
pizza
fearns
mixer

$ python test4.py 
Food
Food
Art
DJ
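
To see why stripping matters, here is a minimal sketch written for Python 3 (the code above targets Python 2). It assumes the file's last line has no trailing newline, which would explain why only DJ was printed in the question. The shortened dictionary, the in-memory `sample` stand-in for tester2.txt, and the list-returning `search` are illustrative choices, not part of the original answer.

```python
import io

# Shortened version of the question's dictionary (assumption: enough for a demo).
keyword_dictionary = {
    'Food': ['food', 'plate', 'pizza'],
    'Art': ['art', 'sculpture', 'fearns'],
    'DJ': ['dj', 'mixer'],
}

def search(keywords, search_for):
    """Return every category whose word list contains search_for exactly."""
    return [key for key, words in keywords.items() if search_for in words]

# Stand-in for tester2.txt; the last line deliberately has no trailing newline.
sample = io.StringIO("plate\npizza\nfearns\nmixer")

raw_results = []
stripped_results = []
for line in sample:
    # Without strip(), every line except the last still ends in "\n".
    raw_results.append(search(keyword_dictionary, line))
    stripped_results.append(search(keyword_dictionary, line.strip()))

print(raw_results)       # [[], [], [], ['DJ']]
print(stripped_results)  # [['Food'], ['Food'], ['Art'], ['DJ']]
```

Iterating over a file yields each line with its newline attached, so "plate\n" never equals "plate"; only a final line without a newline matches unmodified.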

Also, here is a performance improvement you could consider if you expect the number of search terms to be large relative to the size of the dictionary: you could build a reverse mapping from any word to its category. For example transform:

keyword_dict = {'DJ': ['mixer', 'speakers']}

into

category_dict = {
 'mixer': 'DJ',
 'speakers':'DJ'
}

This reverse mapping can be built once at the start and then reused for every query, which turns your search function into just category_dict[term]. The look-up is then faster (amortised O(1) complexity) and easier to write.
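
One possible way to build the reverse mapping is sketched below in Python 3. The helper name `build_category_dict` and the extra 'Food' entry are hypothetical, added only for illustration. Note that the question's dictionary lists some words (for example 'drink' and 'smiling') under more than one category; this sketch keeps only the first category seen, which is an assumption you may want to change.

```python
keyword_dict = {
    'DJ': ['mixer', 'speakers'],
    'Food': ['plate', 'pizza'],  # extra sample entry, not in the answer's example
}

def build_category_dict(keywords):
    """Invert {category: [words]} into {word: category}."""
    category_dict = {}
    for category, words in keywords.items():
        for word in words:
            # setdefault keeps the first category seen when a word is shared.
            category_dict.setdefault(word, category)
    return category_dict

# Build once, then reuse for every query.
category_dict = build_category_dict(keyword_dict)

print(category_dict['mixer'])        # DJ
print(category_dict.get('unknown'))  # None
```

Using `.get()` instead of `category_dict[term]` avoids a KeyError for terms that belong to no category. If a word should map to several categories, storing a list of categories per word would be the natural variation.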
