Keyword Extraction in Python_RAKE


Question

I am a novice user, puzzled by what should be a simple "loop" problem. I have a local directory with about 500 .txt files, and I would like to extract the keywords from each file using RAKE for Python. I've reviewed the RAKE documentation, but the code suggested in the tutorial extracts keywords from a single document. Can someone please explain how to loop over all the files stored in my local directory? Here's the code from the tutorial, which works really well for a single document.

$ git clone https://github.com/zelandiya/RAKE-tutorial

import rake
import operator

rake_object = rake.Rake("SmartStoplist.txt", 5, 3, 4)

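# Read one document; the with-block closes the file once it has been read
with open("data/docs/fao_test/w2167e.txt", 'r') as sample_file:
    text = sample_file.read()
keywords = rake_object.run(text)
# This form of print works under both Python 2 and Python 3
print("Keywords: %s" % (keywords,))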

Recommended answer

Create a list of the filenames you want to process:

filenames = [
    'data/docs/fao_test/w2167e.txt',
    'some/other/folder/filename.txt',
    # etc...
]

If you don't want to hardcode all the names, you can use the glob module to collect filenames with wildcard patterns.
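As a rough illustration of the glob approach, the sketch below gathers every .txt file directly inside one directory. The helper name collect_txt_files is made up for this example, and the directory passed to it is simply the folder from the question; adjust both to your own layout.

```python
import glob
import os


def collect_txt_files(directory):
    """Return a sorted list of paths to the .txt files directly inside `directory`."""
    # "*.txt" matches any filename ending in .txt; subdirectories are not searched
    pattern = os.path.join(directory, "*.txt")
    # glob.glob returns paths in arbitrary order, so sort for reproducible runs
    return sorted(glob.glob(pattern))


# Hypothetical usage: the directory is the one from the question's example.
# If it does not exist, glob simply returns an empty list.
filenames = collect_txt_files("data/docs/fao_test")
```

Sorting is optional; it only makes the processing order predictable from one run to the next.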

Create a dictionary for storing the results:

results = {}

Loop through each filename, reading the contents and storing the Rake results in the dictionary, keyed by filename:

for filename in filenames:
    with open(filename, 'r') as fp:
        results[filename] = rake_object.run(fp.read())
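The same loop can also be wrapped in a small function, which makes it easier to reuse and to try out without RAKE installed. This is only a sketch: the function name extract_keywords_from_files and its extract parameter are invented for illustration. extract can be any callable that takes a string; with the tutorial code you would pass rake_object.run.

```python
def extract_keywords_from_files(filenames, extract):
    """Return a dict mapping each filename to extract(contents of that file).

    `extract` is any callable that accepts the file's text. With the tutorial
    code this would be rake_object.run (assumes rake_object already exists).
    """
    results = {}
    for filename in filenames:
        # The with-block closes each file before the next one is opened
        with open(filename, 'r') as fp:
            results[filename] = extract(fp.read())
    return results
```

With the objects from the question and the list built earlier, the call would look like results = extract_keywords_from_files(filenames, rake_object.run), after which results[filename] holds whatever rake_object.run returned for that file.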
