Help make it faster please
Question
I wrote this function, which does the following: after reading lines from a
file, it splits each line into words and counts the word occurrences using a
hash table. For some reason this is quite slow. Can someone help me make it
faster?
import string

f = open(filename)
lines = f.readlines()

def create_words(lines):
    cnt = 0
    spl_set = '[",;<>{}_&?!():-[\.=+*\t\n\r]+'
    for content in lines:
        words = content.split()
        countDict = {}
        wordlist = []
        for w in words:
            w = string.lower(w)
            if w[-1] in spl_set: w = w[:-1]
            if w != '':
                if countDict.has_key(w):
                    countDict[w] = countDict[w] + 1
                else:
                    countDict[w] = 1
            wordlist = countDict.keys()
            wordlist.sort()
        cnt += 1
        if countDict != {}:
            for word in wordlist: print (word + ' ' +
                                         str(countDict[word]) + '\n')
Recommended answer
Why rebuild wordlist and sort it after processing each word? It seems that
this can be done once, after the for loop.
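The suggestion above can be sketched as follows. This is a minimal Python 3 port of the question's logic, not the poster's exact code: the names `count_words` and `TRAILING_CHARS` are made up for illustration, `dict.get` stands in for `has_key`, and `str.lower()` stands in for `string.lower`. The point it shows is that the sort happens once per line, after all words have been counted.

```python
# Characters treated as trailing punctuation; mirrors the question's spl_set.
TRAILING_CHARS = '[",;<>{}_&?!():-\\.=+*\t\n\r]'


def count_words(line):
    """Return sorted (word, count) pairs for one line of text."""
    counts = {}
    for w in line.split():
        w = w.lower()
        # Drop a single trailing punctuation character, as the original does.
        if w[-1] in TRAILING_CHARS:
            w = w[:-1]
        if w:
            counts[w] = counts.get(w, 0) + 1
    # Sort once, after every word on the line has been counted.
    return sorted(counts.items())


if __name__ == "__main__":
    for word, n in count_words("The cat, the dog; the end."):
        print(word, n)
```

With the sort inside the word loop, a line of n words is sorted n times; here it is sorted once. How much that matters in practice depends on line length and would need to be measured.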
pk******@gmail.com wrote:
Actually, I create a separate wordlist for each so-called line. By "line"
I mean what will be a paragraph in the future, so I will have to recreate the
wordlist on each pass through the outer loop.
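One way the per-paragraph variant described above might look is sketched below in Python 3. It assumes paragraphs are separated by blank lines, which the thread does not state, and the names `paragraphs`, `count_per_paragraph`, and `TRAILING_CHARS` are hypothetical. Each paragraph gets a fresh `collections.Counter`, so the word list is rebuilt once per paragraph rather than once per word.

```python
from collections import Counter

# Characters treated as trailing punctuation; mirrors the question's spl_set.
TRAILING_CHARS = '[",;<>{}_&?!():-\\.=+*\t\n\r]'


def paragraphs(text):
    """Split text on blank lines (an assumed paragraph convention)."""
    return [p for p in text.split("\n\n") if p.strip()]


def count_per_paragraph(text):
    """Yield one sorted (word, count) list per paragraph."""
    for para in paragraphs(text):
        counts = Counter()  # fresh counter, so paragraphs stay independent
        for w in para.split():
            w = w.lower()
            # Drop a single trailing punctuation character.
            if w[-1] in TRAILING_CHARS:
                w = w[:-1]
            if w:
                counts[w] += 1
        # One sort per paragraph, after counting is finished.
        yield sorted(counts.items())


if __name__ == "__main__":
    sample = "a b a.\n\nb c"
    for pairs in count_per_paragraph(sample):
        print(pairs)
```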
Oh sorry, the indentation was messed up here. The lines
wordlist = countDict.keys()
wordlist.sort()
should be outside the word loop. Now:
def create_words(lines):
    cnt = 0
    spl_set = '[",;<>{}_&?!():-[\.=+*\t\n\r]+'
    for content in lines:
        words = content.split()
        countDict = {}
        wordlist = []
        for w in words:
            w = string.lower(w)
            if w[-1] in spl_set: w = w[:-1]
            if w != '':
                if countDict.has_key(w):
                    countDict[w] = countDict[w] + 1
                else:
                    countDict[w] = 1
        wordlist = countDict.keys()
        wordlist.sort()
        cnt += 1
        if countDict != {}:
            for word in wordlist: print (word + ' ' +
                                         str(countDict[word]) + '\n')
OK, now this is the correct question I am asking.