How to add words from a text file to a dictionary depending on the name?


Question

I have a text file containing the script of Act 1 of Romeo and Juliet, and I want to count how many times each person says each word.

Here is the text: http://pastebin.com/X0gaxAPK

There are 3 people speaking in the text: Gregory, Sampson, and Abraham.

Basically, I want to make 3 different dictionaries, one for each of the three speakers (if that's the best way to do it?). I want to populate each dictionary with the words that person says, and then count how many times they say each word in the entire script.

How would I go about doing this? I think I can figure out the word count, but I am a bit confused about how to separate who says what and put it into a separate dictionary for each person.

My output should look something like this (the numbers are not correct, it's just an example):

Gregory - 
25: the
15: a
5: from
3: while
1: hello
etc

The number is how often that person says the word in the file.

Right now I have code that reads the text file, strips the punctuation, and compiles the text into a list. I also don't want to use any outside modules; I'd like to do it the old-fashioned way to learn, thanks.

You don't have to post exact code; just explain what I need to do and hopefully I can figure it out. I'm using Python 3.

Recommended answer

import collections
import string

# Map each speaker name to a Counter of the words they say.
c = collections.defaultdict(collections.Counter)
speaker = None

with open('/tmp/spam.txt') as f:
    for line in f:
        if not line.strip():
            # Empty line: the current speaker has finished.
            speaker = None
            continue
        if line.count(' ') == 0 and line.strip().endswith(':'):
            # A single word ending in a colon marks a new speaker;
            # you might want to refine this check.
            speaker = line.strip()[:-1]
            continue
        # Words outside any speech are counted under the key None.
        c[speaker].update(x.strip(string.punctuation).lower() for x in line.split())

Example output:

In [1]: run /tmp/spam.py

In [2]: c.keys()
Out[2]: [None, 'Abraham', 'Gregory', 'Sampson']

In [3]: c['Gregory'].most_common(10)
Out[3]: 
[('the', 7),
 ('thou', 6),
 ('to', 6),
 ('of', 4),
 ('and', 4),
 ('art', 3),
 ('is', 3),
 ('it', 3),
 ('no', 3),
 ('i', 3)]
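
Since the question asks for the old-fashioned way, here is a rough sketch of the same idea using only plain dicts instead of collections. It assumes the same layout as above (a one-word line ending in a colon names the speaker, and a blank line ends the speech). The function names count_words_by_speaker and format_counts and the sample lines are made up for illustration, and unlike the code above it skips text that belongs to no speaker rather than counting it under None.

```python
import string


def count_words_by_speaker(lines):
    """Return {speaker: {word: count}} using only plain dicts."""
    counts = {}
    speaker = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            # A blank line ends the current speech.
            speaker = None
            continue
        if " " not in stripped and stripped.endswith(":"):
            # A one-word line ending in a colon names the next speaker.
            speaker = stripped[:-1]
            continue
        if speaker is None:
            # Skip text that belongs to no speaker (titles, stage directions).
            continue
        speaker_counts = counts.setdefault(speaker, {})
        for token in stripped.split():
            word = token.strip(string.punctuation).lower()
            if word:
                speaker_counts[word] = speaker_counts.get(word, 0) + 1
    return counts


def format_counts(name, word_counts):
    """Format one speaker's counts, most frequent first, ties alphabetical."""
    ordered = sorted(word_counts.items(), key=lambda item: (-item[1], item[0]))
    out = [name + " -"]
    for word, n in ordered:
        out.append("{}: {}".format(n, word))
    return "\n".join(out)


# Made-up sample lines, not the real script.
sample = [
    "Gregory:",
    "The dog, the cat and the dog.",
    "",
    "Sampson:",
    "A cat is a cat!",
    "",
]
result = count_words_by_speaker(sample)
print(format_counts("Gregory", result["Gregory"]))
```

To run it on the real file, you could pass the open file object in place of sample, since iterating over a file yields its lines.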
