如何使用python内置映射和归约函数计算字符串中的字母频率 [英] How to compute letter frequency in a string using pythons built-in map and reduce functions

查看:114
本文介绍了如何使用python内置映射和归约函数计算字符串中的字母频率的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想使用pythons map计算字符串中字母的频率并减少内置函数.谁能提供一些有关我该怎么做的见解?

I would like to compute the frequency of letters in a string using pythons map and reduce built-in functions. Could anyone offer some insight into how I might do this?

到目前为止我所拥有的:

What I've got so far:

s = "the quick brown fox jumped over the lazy dog"

# Map function
m = lambda x: (x,1)

# Reduce
# Add the two frequencies if they are the same
# else.... Not sure how to put both back in the list
# in the case where they are not the same.
r = lambda x,y: (x[0], x[1] + y[1]) if x[0] == y[0] else ????

freq = reduce(r, map(m, s))

当所有字母都相同时,这非常有用.

This works great when all the letters are the same.

>>> s
'aaaaaaa'
>>> map(m, s)
[('a', 1), ('a', 1), ('a', 1), ('a', 1), ('a', 1), ('a', 1), ('a', 1)]
>>> reduce(r, map(m, s))
('a', 7)

当字母不同时,如何使它正常工作?

How do I get it to work nicely when there are different letters?

推荐答案

暂时忽略有关您的代码的问题,我会指出一种常用的(最快的)计数方法是Counter类.集合模块.这是在Python 2.7.3解释器中使用的示例:

Sidestepping for a moment the question about your code, I will point out that one of the usual (and fastest) ways to count things is with the Counter class from the collections module. Here is an example of its use, in the Python 2.7.3 interpreter:

>>> from collections import Counter
>>> lets=Counter('aaaaabadfasdfasdfafsdff')
>>> lets
Counter({'a': 9, 'f': 6, 'd': 4, 's': 3, 'b': 1})
>>> s = "the quick brown fox jumped over the lazy dog"
>>> Counter(s)
Counter({' ': 8, 'e': 4, 'o': 4, 'd': 2, 'h': 2, 'r': 2, 'u': 2, 't': 2, 'a': 1, 'c': 1, 'b': 1, 'g': 1, 'f': 1, 'i': 1, 'k': 1, 'j': 1, 'm': 1, 'l': 1, 'n': 1, 'q': 1, 'p': 1, 'w': 1, 'v': 1, 'y': 1, 'x': 1, 'z': 1})

要使用reduce,请定义辅助功能addto(oldtotal,newitem),该功能将newitem加到oldtotal并返回新的总计.总计的初始化程序是一个空字典{}.这是一个解释的示例.请注意,get()的第二个参数是当键尚未在字典中时要使用的默认值.

To use reduce, define an auxiliary function addto(oldtotal,newitem) that adds newitem to oldtotal and returns a new total. The initializer for the total is an empty dictionary, {}. Here is an interpreted example. Note that the second parameter to get() is a default value to use when the key is not yet in the dictionary.

 >>> def addto(d,x):
...     d[x] = d.get(x,0) + 1
...     return d
... 
>>> reduce (addto, s, {})
{' ': 8, 'a': 1, 'c': 1, 'b': 1, 'e': 4, 'd': 2, 'g': 1, 'f': 1, 'i': 1, 'h': 2, 'k': 1, 'j': 1, 'm': 1, 'l': 1, 'o': 4, 'n': 1, 'q': 1, 'p': 1, 'r': 2, 'u': 2, 't': 2, 'w': 1, 'v': 1, 'y': 1, 'x': 1, 'z': 1}

下面显示的代码打印几种方法中每一种方法的1000次通过的执行时间.在带有两个不同字符串s的旧AMD Athlon 5000+ Linux 3.2.0-32 Ubuntu 12系统上执行时,它会打印:

The code shown below prints the execution times for 1000 passes each of several methods. When executed on an old AMD Athlon 5000+ Linux 3.2.0-32 Ubuntu 12 system with two different strings s it printed:

String length is 44   Pass count is 1000
horsch1 : 0.77517914772
horsch2 : 0.778718948364
jreduce : 0.0403778553009
jcounter: 0.0699260234833
String length is 4931   Pass count is 100
horsch1 : 8.25176692009
horsch2 : 8.14318394661
jreduce : 0.260674953461
jcounter: 0.282369852066

(reduce方法的运行速度比Counter方法的运行速度稍快.) 时序代码如下.它使用 timeit 模块.在此处的代码中,timeit.Timer的第一个参数是要重复计时的代码,第二个参数是设置代码.

(The reduce method ran slightly faster than the Counter method.) The timing code follows. It uses the timeit module. In the code as here, the first parameter to timeit.Timer is code to be repeatedly timed, and the second parameter is setup code.

import timeit
from collections import Counter
passes = 1000

m1 = lambda x: [int(ord(x) == i) for i in xrange(65,91)]

def m2(x):
    return [int(ord(x) == i) for i in xrange(65,91)]

def es1(s):
    add = lambda x,y: [x[i]+y[i] for i in xrange(len(x))]
    freq = reduce(add,map(m1, s.upper()))
    return freq

def es2(s):
    add = lambda x,y: [x[i]+y[i] for i in xrange(len(x))]
    freq = reduce(add,map(m2, s.upper()))
    return freq

def addto(d,x):
    d[x] = d.get(x,0) + 1
    return d

def jwc(s):
    return Counter(s)

def jwr(s):
    return reduce (addto, s, {})

s = "the quick brown fox jumped over the lazy dog"
print 'String length is',len(s), '  Pass count is',passes
print "horsch1 :",timeit.Timer('f(s)', 'from __main__ import s, m1,     es1 as f').timeit(passes)
print "horsch2 :",timeit.Timer('f(s)', 'from __main__ import s, m2,     es2 as f').timeit(passes)
print "jreduce :",timeit.Timer('f(s)', 'from __main__ import s, addto,  jwr as f').timeit(passes)
print "jcounter:",timeit.Timer('f(s)', 'from __main__ import s, Counter,jwc as f').timeit(passes)

这篇关于如何使用python内置映射和归约函数计算字符串中的字母频率的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆