如何使用python内置映射和归约函数计算字符串中的字母频率 [英] How to compute letter frequency in a string using pythons built-in map and reduce functions
问题描述
我想使用pythons map计算字符串中字母的频率并减少内置函数.谁能提供一些有关我该怎么做的见解?
I would like to compute the frequency of letters in a string using pythons map and reduce built-in functions. Could anyone offer some insight into how I might do this?
到目前为止我所拥有的:
What I've got so far:
s = "the quick brown fox jumped over the lazy dog"
# Map function
m = lambda x: (x,1)
# Reduce
# Add the two frequencies if they are the same
# else.... Not sure how to put both back in the list
# in the case where they are not the same.
r = lambda x,y: (x[0], x[1] + y[1]) if x[0] == y[0] else ????
freq = reduce(r, map(m, s))
当所有字母都相同时,这非常有用.
This works great when all the letters are the same.
>>> s
'aaaaaaa'
>>> map(m, s)
[('a', 1), ('a', 1), ('a', 1), ('a', 1), ('a', 1), ('a', 1), ('a', 1)]
>>> reduce(r, map(m, s))
('a', 7)
当字母不同时,如何使它正常工作?
How do I get it to work nicely when there are different letters?
推荐答案
暂时忽略有关您的代码的问题,我会指出一种常用的(最快的)计数方法是Counter类.集合模块.这是在Python 2.7.3解释器中使用的示例:
Sidestepping for a moment the question about your code, I will point out that one of the usual (and fastest) ways to count things is with the Counter class from the collections module. Here is an example of its use, in the Python 2.7.3 interpreter:
>>> from collections import Counter
>>> lets=Counter('aaaaabadfasdfasdfafsdff')
>>> lets
Counter({'a': 9, 'f': 6, 'd': 4, 's': 3, 'b': 1})
>>> s = "the quick brown fox jumped over the lazy dog"
>>> Counter(s)
Counter({' ': 8, 'e': 4, 'o': 4, 'd': 2, 'h': 2, 'r': 2, 'u': 2, 't': 2, 'a': 1, 'c': 1, 'b': 1, 'g': 1, 'f': 1, 'i': 1, 'k': 1, 'j': 1, 'm': 1, 'l': 1, 'n': 1, 'q': 1, 'p': 1, 'w': 1, 'v': 1, 'y': 1, 'x': 1, 'z': 1})
要使用reduce,请定义辅助功能addto(oldtotal,newitem)
,该功能将newitem
加到oldtotal
并返回新的总计.总计的初始化程序是一个空字典{}
.这是一个解释的示例.请注意,get()的第二个参数是当键尚未在字典中时要使用的默认值.
To use reduce, define an auxiliary function addto(oldtotal,newitem)
that adds newitem
to oldtotal
and returns a new total. The initializer for the total is an empty dictionary, {}
. Here is an interpreted example. Note that the second parameter to get() is a default value to use when the key is not yet in the dictionary.
>>> def addto(d,x):
... d[x] = d.get(x,0) + 1
... return d
...
>>> reduce (addto, s, {})
{' ': 8, 'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2, 'g': 1, 'f': 1, 'i': 1, 'h': 2, 'k': 1, 'j': 1, 'm': 1, 'l': 1, 'o': 4, 'n': 1, 'q': 1, 'p': 1, 'r': 2, 'u': 2, 't': 2, 'w': 1, 'v': 1, 'y': 1, 'x': 1, 'z': 1}
下面显示的代码打印几种方法中每一种方法的1000次通过的执行时间.在带有两个不同字符串s
的旧AMD Athlon 5000+ Linux 3.2.0-32 Ubuntu 12系统上执行时,它会打印:
The code shown below prints the execution times for 1000 passes each of several methods. When executed on an old AMD Athlon 5000+ Linux 3.2.0-32 Ubuntu 12 system with two different strings s
it printed:
String length is 44 Pass count is 1000
horsch1 : 0.77517914772
horsch2 : 0.778718948364
jreduce : 0.0403778553009
jcounter: 0.0699260234833
String length is 4931 Pass count is 100
horsch1 : 8.25176692009
horsch2 : 8.14318394661
jreduce : 0.260674953461
jcounter: 0.282369852066
(reduce方法的运行速度比Counter方法的运行速度稍快.)
时序代码如下.它使用 timeit 模块.在此处的代码中,timeit.Timer
的第一个参数是要重复计时的代码,第二个参数是设置代码.
(The reduce method ran slightly faster than the Counter method.)
The timing code follows. It uses the timeit module. In the code as here, the first parameter to timeit.Timer
is code to be repeatedly timed, and the second parameter is setup code.
import timeit
from collections import Counter
passes = 1000
m1 = lambda x: [int(ord(x) == i) for i in xrange(65,91)]
def m2(x):
return [int(ord(x) == i) for i in xrange(65,91)]
def es1(s):
add = lambda x,y: [x[i]+y[i] for i in xrange(len(x))]
freq = reduce(add,map(m1, s.upper()))
return freq
def es2(s):
add = lambda x,y: [x[i]+y[i] for i in xrange(len(x))]
freq = reduce(add,map(m2, s.upper()))
return freq
def addto(d,x):
d[x] = d.get(x,0) + 1
return d
def jwc(s):
return Counter(s)
def jwr(s):
return reduce (addto, s, {})
s = "the quick brown fox jumped over the lazy dog"
print 'String length is',len(s), ' Pass count is',passes
print "horsch1 :",timeit.Timer('f(s)', 'from __main__ import s, m1, es1 as f').timeit(passes)
print "horsch2 :",timeit.Timer('f(s)', 'from __main__ import s, m2, es2 as f').timeit(passes)
print "jreduce :",timeit.Timer('f(s)', 'from __main__ import s, addto, jwr as f').timeit(passes)
print "jcounter:",timeit.Timer('f(s)', 'from __main__ import s, Counter,jwc as f').timeit(passes)
这篇关于如何使用python内置映射和归约函数计算字符串中的字母频率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!