Counting elements in a list wildcard


Problem description

If I have a list, say of names, and I want to count all the people
named, say, Susie, but I don't care exactly how they spell it (i.e.,
Susy, Susi, and Susie all work), how would I do this? Set up a regular
expression inside the count? Is there a wildcard variable I can use?
Here is the code for the non-fuzzy way:
lstNames.count("Susie")
Any ideas? Is this something you wouldn't expect count to do?
Thanks y'all from a newbie.
Ed

Solution

"Ryan Ginstrom" <ry***@gol.com> writes:

If there are specific spellings you want to allow, you could just
create a list of them and see if your Suzy is in there:

>>> possible_suzys = ['Susy', 'Susi', 'Susie']
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Jane']
>>> for line in my_strings:
...     if line in possible_suzys: print(line)
...
Susi



If you wanted to do something later, rather than only during the scan
over the list, getting a list of suzies would probably be more useful:

>>> possible_suzys = ['Susy', 'Susi', 'Susie']
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']
>>> found_suzys = [s for s in my_strings if s in possible_suzys]
>>> found_suzys
['Susi', 'Susy']
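To get back to the original question of counting, here is a minimal sketch of two ways to do it: membership in an explicit set of spellings, and a regular expression. The sample list, the variable names, and the pattern itself are assumptions made up for illustration; `list.count` only tests for equality, so the matching has to happen in a generator expression instead.

```python
import re

names = ["Bob", "Sally", "Susi", "Dick", "Susy", "Jane", "Susie"]

# Option 1: count names that appear in an explicit set of accepted spellings.
accepted = {"Susy", "Susi", "Susie"}
count_by_set = sum(1 for name in names if name in accepted)

# Option 2: count names matching a regular expression.
# The pattern is an illustrative assumption; fullmatch() requires the
# whole name to match, so "Susiette" would not be counted.
pattern = re.compile(r"Su[sz](?:y|i|ie)", re.IGNORECASE)
count_by_regex = sum(1 for name in names if pattern.fullmatch(name))

print(count_by_set, count_by_regex)
```

The set version only accepts spellings you have listed, while the regex version also accepts variants such as "Suzy" that fit the pattern.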

--
\ "The number of UNIX installations has grown to 10, with more |
`\ expected." -- Unix Programmer's Manual, 2nd Ed., 12-Jun-1972 |
_o__) |
Ben Finney


hawkesed wrote:

If I have a list, say of names, and I want to count all the people
named, say, Susie, but I don't care exactly how they spell it (i.e.,
Susy, Susi, and Susie all work), how would I do this? Set up a regular
expression inside the count? Is there a wildcard variable I can use?
Here is the code for the non-fuzzy way:
lstNames.count("Susie")
Any ideas? Is this something you wouldn't expect count to do?
Thanks y'all from a newbie.
Ed



You might want to check out the SoundEx and MetaPhone algorithms which
provide approximations of the "sound" of a word based on spelling
(assuming English pronunciations).

Apparently a soundex module used to be built into Python but was
removed in 2.0. You can find several implementations on the 'net, for
example:

http://orca.mojam.com/~skip/python/soundex.py
http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213
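For readers who cannot reach those links, the following is a rough sketch of the classic American Soundex coding, written from the commonly described rules rather than taken from either implementation above. The function name `soundex` and the sample list are assumptions for illustration.

```python
def soundex(name):
    """Return a 4-character Soundex code (classic rules, ASCII letters)."""
    codes = {}
    groups = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
              ("L", "4"), ("MN", "5"), ("R", "6"))
    for letters, digit in groups:
        for ch in letters:
            codes[ch] = digit

    name = "".join(ch for ch in name.upper() if ch.isalpha())
    if not name:
        return ""

    result = name[0]                  # the first letter is kept as-is
    prev = codes.get(name[0], "")     # code of the previous letter
    for ch in name[1:]:
        if ch in "HW":
            continue                  # H and W do not separate equal codes
        code = codes.get(ch, "")      # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]       # pad or truncate to 4 characters


names = ["Bob", "Sally", "Susi", "Dick", "Susy", "Jane"]
found = [s for s in names if soundex(s) == soundex("Susy")]
print(found)
```

Here "Susy", "Susi", "Susie" and "Suzy" all share one code, while "Sally" gets a different one because of the L.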

MetaPhone is generally considered better than SoundEx for "sounds-like"
matching, although it's considerably more complex (IIRC; it's
been a long time since I wrote an implementation of either in any
language). Here is a Python MetaPhone implementation (there must be more
than this one?):

http://joelspeters.com/awesomecode/

Another algorithm that might be of interest isn't based on "sounds-like"
but instead computes the number of transforms necessary to get from one
word to another: the Levenshtein distance. A C-based implementation
(with Python interface) is available:

http://trific.ath.cx/resources/python/levenshtein/
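If a C extension is more than you need, the distance can also be computed in a few lines of pure Python. This is a sketch of the standard dynamic-programming formulation, not the API of the package linked above; the function name, the sample list, and the cut-off of 1 are assumptions for illustration.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))        # distances from "" to b[:j]
    for i, ca in enumerate(a, 1):
        curr = [i]                        # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb))) # substitute
        prev = curr
    return prev[-1]


names = ["Bob", "Sally", "Susi", "Dick", "Susy", "Jane"]
close = [s for s in names if levenshtein(s, "Susy") <= 1]
print(close)
```

The cut-off matters a lot with names this short: "Sally" is only 3 edits away from "Susy", so a loose threshold will let it through.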

Whichever algorithm you go with, you'll wind up with some sort of
"similar" function which could be applied in a similar manner to Ben's
example (I've just mocked up the following -- it's not an actual
session):

>>> import soundex
>>> import metaphone
>>> import levenshtein
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']
>>> found_suzys = [s for s in my_strings if soundex.sounds_similar(s, 'Susy')]
>>> found_suzys = [s for s in my_strings if metaphone.sounds_similar(s, 'Susy')]
>>> found_suzys = [s for s in my_strings if levenshtein.distance(s, 'Susy') < 4]
>>> found_suzys
['Susi', 'Susy'] (one hopes anyway!)
HTH,

Dave.
--


Dave Hughes wrote:

Another algorithm that might be of interest isn't based on "sounds-like"
but instead computes the number of transforms necessary to get from one
word to another: the Levenshtein distance. A C-based implementation
(with Python interface) is available:



I don't know what algorithm it uses, but the difflib module looks similar.
I've had good results using the get_close_matches function to locate
similarly-named mp3 files.

However, I don't think "close enough" is well suited for this application.
The sequences are short and non-distinct. Difference matching needs longer
sequences to be effective. Phoneme matching seems overly complex and might
grab things like Tsu-zi. I'd just use a list of alternate spellings like
Ben suggested.
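For anyone who wants to try it, here is a small sketch of `difflib.get_close_matches` from the standard library applied to the thread's sample names. The sample list and the choice of `n` are assumptions for illustration; 0.6 is the function's documented default cutoff.

```python
import difflib

names = ["Bob", "Sally", "Susi", "Dick", "Susy", "Jane"]

# n caps how many matches are returned; cutoff is the minimum similarity
# ratio (0.0 to 1.0) a candidate needs in order to be included.
matches = difflib.get_close_matches("Susy", names, n=len(names), cutoff=0.6)
print(matches)
print(len(matches))
```

Results come back ordered from most to least similar, so an exact match is listed first.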

