Trying to output the x most common words in a text file


Problem description

I'm trying to write a program that reads a text file and outputs a list of the most common words (30 as the code is written now) along with their counts, something like:

word1 count1
word2 count2
word3 count3
...   ...
...   ...
wordn countn

in order of count1 > count2 > count3 > ... > countn. This is what I have so far, but I cannot get the sorted function to do what I want. The error I get now is:

TypeError: list indices must be integers, not tuple

I'm new to Python. Any help would be appreciated. Thank you.

import sys

def count_func(dictionary_list):
  return dictionary_list[1]

def print_top(filename):
  word_list = {}
  with open(filename, 'r') as input_file:

    count = 0

    #best
    for line in input_file:
      for word in line.split():
        word = word.lower()
        if word not in word_list:
          word_list[word] = 1
        else:
          word_list[word] += 1

#sorted_x = sorted(word_list.items(), key=operator.itemgetter(1))
#  items = sorted(word_count.items(), key=get_count, reverse=True)

  word_list = sorted(word_list.items(), key=lambda x: x[1])

  for word in word_list:
    if (count > 30):#19
      break
    print "%s: %s" % (word, word_list[word])
    count += 1


# This basic command line argument parsing code is provided and
# calls the print_words() and print_top() functions which you must define.
def main():
  if len(sys.argv) != 3:
    print 'usage: ./wordcount.py {--count | --topcount} file'
    sys.exit(1)

  option = sys.argv[1]
  filename = sys.argv[2]
  if option == '--count':
    print_words(filename)
  elif option == '--topcount':
    print_top(filename)
  else:
    print 'unknown option: ' + option
    sys.exit(1)

if __name__ == '__main__':
  main()

Recommended answer

Some unsolicited advice: don't make so many functions until everything is working as one big block of code. Refactor into functions after it works. You don't even need a main section for a script this small.
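As for the error itself: sorted(word_list.items(), ...) returns a list of (word, count) tuples, so inside the loop each `word` is a tuple, and `word_list[word]` tries to index a list with a tuple. That appears to be what raises the TypeError. Below is a minimal sketch of one possible fix, not necessarily what the answerer had in mind. The helper name `top_words`, its parameters, and the sample input are made up for illustration. The print call is written so that it behaves the same under Python 2 and Python 3.

```python
def top_words(lines, n=30):
    """Return the n most frequent lowercased words as (word, count) pairs."""
    counts = {}
    for line in lines:
        for word in line.split():
            word = word.lower()
            counts[word] = counts.get(word, 0) + 1
    # sorted() over .items() yields a list of (word, count) tuples;
    # reverse=True puts the largest counts first.
    ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)
    # Slicing replaces the manual counter and avoids an off-by-one.
    return ranked[:n]


# Hypothetical sample input; an open file object can be passed the same way.
sample = ["the cat and the hat", "The cat sat"]
for word, count in top_words(sample, n=2):
    # Unpack each tuple instead of indexing the sorted list with it.
    print("%s: %s" % (word, count))
```

To use it on a file, pass the open file object, for example `with open(filename) as f: top_words(f)`. The standard library's `collections.Counter` with its `most_common(n)` method is another way to get the same ranking.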
