Counting elements in a list wildcard
If I have a list, say of names, and I want to count all the people
named, say, Susie, but I don't care exactly how they spell it (i.e.,
Susy, Susi, and Susie all count), how would I do this? Set up a regular
expression inside the count? Is there a wildcard variable I can use?
Here is the code for the non-fuzzy way:
    lstNames.count("Susie")
Any ideas? Is this something you wouldn't expect count to do?
Thanks y'all from a newbie.
Ed
"Ryan Ginstrom" <ry***@gol.com> writes:
If there are specific spellings you want to allow, you could just
create a list of them and see if your Suzy is in there:
>>> possible_suzys = ['Susy', 'Susi', 'Susie']
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Jane']
>>> for line in my_strings:
...     if line in possible_suzys: print(line)
...
Susi
If you wanted to do something later, rather than only during the scan
over the list, getting a list of suzies would probably be more useful:
>>> possible_suzys = ['Susy', 'Susi', 'Susie']
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']
>>> found_suzys = [s for s in my_strings if s in possible_suzys]
>>> found_suzys
['Susi', 'Susy']
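To get the count the original question asked for, the same membership test can feed sum(). The snippet below is a minimal sketch in Python 3, not part of the original reply; the name list (with an extra 'Susie' added) and the names count_by_set, count_by_regex, and suzy_pattern are made up for illustration. It also shows the regular-expression route the question mentions, using re.fullmatch so that only whole-name matches count.

```python
import re

# Hypothetical sample data: the thread's list plus one extra 'Susie'.
names = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane', 'Susie']

# Option 1: an explicit set of accepted spellings.
possible_suzys = {'Susy', 'Susi', 'Susie'}
count_by_set = sum(1 for name in names if name in possible_suzys)

# Option 2: a regular expression covering the same three spellings.
# fullmatch requires the whole string to match, so 'Susieq' is rejected.
suzy_pattern = re.compile(r'Sus(?:y|i|ie)')
count_by_regex = sum(1 for name in names if suzy_pattern.fullmatch(name))

print(count_by_set, count_by_regex)  # prints: 3 3
```

list.count() itself only tests for equality, so any fuzzy matching has to happen in a predicate like the ones above.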
--
\ "The number of UNIX installations has grown to 10, with more |
`\ expected." -- Unix Programmer's Manual, 2nd Ed., 12-Jun-1972 |
_o__) |
Ben Finney
hawkesed wrote:
If I have a list, say of names, and I want to count all the people
named, say, Susie, but I don't care exactly how they spell it (i.e.,
Susy, Susi, and Susie all count), how would I do this? Set up a regular
expression inside the count? Is there a wildcard variable I can use?
Here is the code for the non-fuzzy way:
    lstNames.count("Susie")
Any ideas? Is this something you wouldn't expect count to do?
Thanks y'all from a newbie.
Ed
You might want to check out the SoundEx and MetaPhone algorithms which
provide approximations of the "sound" of a word based on spelling
(assuming English pronunciations).
Apparently a soundex module used to be built into Python but was
removed in 2.0. You can find several implementations on the 'net, for
example:
http://orca.mojam.com/~skip/python/soundex.py
http://aspn.activestate.com/ASPN/Coo...n/Recipe/52213
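For a rough idea of what Soundex does, here is a minimal sketch of the common American Soundex rules, written from scratch in Python 3. It is not the module that used to ship with Python, nor either of the implementations linked above; the function name soundex and the table CODES are my own, and edge cases may differ between implementations.

```python
# Map each coded consonant to its Soundex digit. Vowels, y, h and w
# are deliberately absent from the table.
CODES = {}
for digit, letters in enumerate(['bfpv', 'cgjkqsxz', 'dt', 'l', 'mn', 'r'], start=1):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(word):
    """Return a 4-character Soundex code (letter plus three digits)."""
    word = ''.join(ch for ch in word.lower() if ch.isalpha())
    if not word:
        return ''
    result = word[0].upper()
    prev = CODES.get(word[0], '')
    for ch in word[1:]:
        if ch in 'hw':
            continue  # h and w do not break a run of equal codes
        code = CODES.get(ch, '')
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after it is coded again
    return (result + '000')[:4]  # pad with zeros, then truncate to 4


my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']
target = soundex('Susie')
found_suzys = [s for s in my_strings if soundex(s) == target]
print(target, found_suzys)  # prints: S200 ['Susi', 'Susy']
```

Under this sketch 'Susy', 'Susi', and 'Susie' all encode to S200, but so does 'Sasha', which gives a feel for how coarse the code is.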
MetaPhone is generally considered better than SoundEx for "sounds-like"
matching, although it's considerably more complex (IIRC, although it's
been a long time since I wrote an implementation of either in any
language). A Python MetaPhone implementation (there must be more than
this one?):
http://joelspeters.com/awesomecode/
Another algorithm that might interest you isn't based on "sounds-like" but
instead computes the number of transforms necessary to get from one
word to another: the Levenshtein distance. A C-based implementation
(with Python interface) is available:
http://trific.ath.cx/resources/python/levenshtein/
Whichever algorithm you go with, you'll wind up with some sort of
"similar" function which could be applied in a similar manner to Ben's
example (I've just mocked up the following -- it's not an actual
session):
>>> import soundex
>>> import metaphone
>>> import levenshtein
>>> my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']
>>> found_suzys = [s for s in my_strings if soundex.sounds_similar(s, 'Susy')]
>>> found_suzys = [s for s in my_strings if metaphone.sounds_similar(s, 'Susy')]
>>> found_suzys = [s for s in my_strings if levenshtein.distance(s, 'Susy') < 4]
>>> found_suzys
['Susi', 'Susy'] (one hopes anyway!)
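The Levenshtein line of the mock-up can be tried without the C extension. Below is a small pure-Python 3 sketch of the textbook dynamic-programming edit distance; the function name levenshtein is mine and is not the API of the linked package. Running it suggests the < 4 cutoff is too loose for names this short: 'Sally' is only three edits from 'Susy', so on this list a cutoff of < 3 is what gives the hoped-for result.

```python
def levenshtein(a, b):
    """Return the number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from '' to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']
loose = [s for s in my_strings if levenshtein(s, 'Susy') < 4]
tight = [s for s in my_strings if levenshtein(s, 'Susy') < 3]
print(loose)  # prints: ['Sally', 'Susi', 'Susy']
print(tight)  # prints: ['Susi', 'Susy']
```

The comparison is case-sensitive as written; lowercasing both strings first would be one way to change that.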
HTH,
Dave.
--
Dave Hughes wrote:
Another algorithm that might interest you isn't based on "sounds-like" but
instead computes the number of transforms necessary to get from one
word to another: the Levenshtein distance. A C-based implementation
(with Python interface) is available:
I don't know what algorithm it uses, but the difflib module looks similar.
I've had good results using the get_close_matches function to locate
similarly-named mp3 files.
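As a sketch of that approach applied to the name list from earlier in the thread (Python 3; the cutoff of 0.6 is simply difflib's documented default, and n is raised from its default of 3 so that every candidate can be returned):

```python
import difflib

my_strings = ['Bob', 'Sally', 'Susi', 'Dick', 'Susy', 'Jane']

# get_close_matches returns the candidates whose similarity ratio to the
# target is at least `cutoff`, best match first.
found_suzys = difflib.get_close_matches(
    'Susie', my_strings, n=len(my_strings), cutoff=0.6)

print(found_suzys)       # the close matches, most similar first
print(len(found_suzys))  # the count the original question asked for
```

Which names pass depends entirely on the cutoff, and with strings this short a small change in the cutoff can change the result.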
However, I don't think "close enough" is well suited for this application.
The sequences are short and non-distinct. Difference matching needs longer
sequences to be effective. Phoneme matching seems overly complex and might
grab things like Tsu-zi. I'd just use a list of alternate spellings like
Ben suggested.