Counting the number of times the first word in a line appears when read from a file, with exceptions

Problem description

Using a dummy file (streamt.txt) with the following contents:

andrew I hate mondays.
fred Python is cool.
fred Ko Ko Bop Ko Ko Bop Ko Ko Bop for ever
andrew @fred no it isn't, what do you think @john???
judy @fred enough with the k-pop
judy RT @fred Python is cool.
andrew RT @judy @fred enough with the k pop
george RT @fred Python is cool.
andrew DM @john Oops
john DM @andrew Who are you go away! Do you know him, @judy?

The first word of each line is a user and the rest of the line is a message, similar to Twitter. I need to print the top n users (n is entered by the user) ranked by the number of original messages they posted, with each user shown next to their message count.

A message that starts with 'RT' is not counted. The output should be formatted in justified columns, with tied users listed in lexicographic order.
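To make the counting rule above concrete, here is a minimal sketch of how a single line could be split into a user and a message and then tested for being an original post. The helper names `parse_line` and `is_original` are hypothetical and are not part of the question's code. The sketch assumes that only a leading 'RT' disqualifies a message, which is what the expected output further down implies (DM messages are still counted there).

```python
def parse_line(line):
    """Split a line into (user, message); return None for blank lines."""
    parts = line.split(None, 1)
    if not parts:
        return None
    user = parts[0]
    message = parts[1] if len(parts) > 1 else ''
    return user, message


def is_original(message):
    """A message counts as original unless its first word is exactly 'RT'."""
    words = message.split(None, 1)
    return not words or words[0] != 'RT'


sample = [
    'andrew I hate mondays.',
    'judy RT @fred Python is cool.',
    'andrew DM @john Oops',
]
for line in sample:
    user, message = parse_line(line)
    print(user, is_original(message))
```

Only the second sample line is rejected, because 'RT' is the first word of its message.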

As it stands, my code only finds the most used words across all messages; it doesn't exclude RT and DM messages or account for n:

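file = open('streamt.txt')

# Count every word on every line (not just the first word).
counts = dict()
for line in file:
    words = line.split()
    for word in words:
        counts[word] = counts.get(word, 0) + 1

# Build (count, word) pairs so that sorting orders by count first.
lst = list()
for key, value in counts.items():
    new = (value, key)
    lst.append(new)

lst = sorted(lst, reverse=True)

for value, key in lst[:10]:
    print(value, key)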

This is my output:

6 Ko
5 @fred
4 andrew
3 you
3 is
3 cool.
3 RT
3 Python
3 Bop
2 with

The expected output is:

Enter n: 10
3 andrew
2 fred
1 john judy

Any ideas as to how I should do this?

Solution

Count as follows:

#!/usr/bin/env python3.6
from collections import Counter, defaultdict
from pathlib import Path


def main():
    try:
        n = int(input('Enter n: '))
    except ValueError:
        print('Invalid input.')
        return
    lines = Path('streamt.txt').read_text().splitlines()
    # Count the first word (the user) of every non-blank line
    # whose second word is not 'RT'.
    c = Counter()
    for line in lines:
        words = line.split()
        if words and words[1:2] != ['RT']:
            c[words[0]] += 1
    # Group users by message count; most_common() yields the
    # highest counts first, so the groups come out in descending order.
    d = defaultdict(list)
    for k, v in c.most_common():
        d[v].append(k)
    # Print the first n count groups.
    print('\n'.join([f'{k} {" ".join(v)}' for k, v in list(d.items())[:n]]))


if __name__ == '__main__':
    main()

Output:

Enter n: 10
3 andrew
2 fred
1 judy john
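Note that the last line of this output lists the tied users as "judy john", while the question asks for lexicographic order ("john judy") and aligned columns. The following is a minimal sketch of one way to get that, not the answer author's method. The function names `count_original_posts`, `top_groups` and `format_groups` are hypothetical, the file contents are inlined as `LINES` so the example is self-contained, and n is assumed to limit the number of count groups, as in the answer above.

```python
from collections import Counter
from itertools import groupby

LINES = """\
andrew I hate mondays.
fred Python is cool.
fred Ko Ko Bop Ko Ko Bop Ko Ko Bop for ever
andrew @fred no it isn't, what do you think @john???
judy @fred enough with the k-pop
judy RT @fred Python is cool.
andrew RT @judy @fred enough with the k pop
george RT @fred Python is cool.
andrew DM @john Oops
john DM @andrew Who are you go away! Do you know him, @judy?
""".splitlines()


def count_original_posts(lines):
    """Count lines per user, skipping lines whose second word is 'RT'."""
    counts = Counter()
    for line in lines:
        words = line.split()
        if words and words[1:2] != ['RT']:
            counts[words[0]] += 1
    return counts


def top_groups(counts, n):
    """Return the first n (count, users) groups, ties sorted by name."""
    # Sort by descending count, then by user name within equal counts.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    groups = [
        (count, [user for user, _ in members])
        for count, members in groupby(ranked, key=lambda item: item[1])
    ]
    return groups[:n]


def format_groups(groups):
    """Right-align the count column so the user names line up."""
    if not groups:
        return ''
    width = max(len(str(count)) for count, _ in groups)
    return '\n'.join(
        f'{count:>{width}} {" ".join(users)}' for count, users in groups
    )


print(format_groups(top_groups(count_original_posts(LINES), 10)))
```

Sorting on the key `(-count, user)` handles both requirements in one pass: higher counts come first, and users with equal counts fall into alphabetical order before `groupby` collects them.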
