newb: comparing two strings


Problem description





Hi,

Is there a clever way to see if two strings of the same length vary by
only one character, and what the character is in both strings.

E.g. str1=yaqtil str2=yaqtel

they differ at str1[4] and the difference is ('i','e')

But if there was str1=yiqtol and str2=yaqtel, I am not interested.

can anyone suggest a simple way to do this?

My next problem is, I have a list of 300,000+ words and I want to find
every pair of such strings. I thought I would first sort on length of
string, but how do I iterate through the following:

str1
str2
str3
str4
str5

so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5,
str2 & str3, str3 & str4, str3 & str5, str4 & str5.

Thanks in advance,
Matthew

Solution

> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.
>
> E.g. str1=yaqtil str2=yaqtel
>
> they differ at str1[4] and the difference is ('i','e')
>
> But if there was str1=yiqtol and str2=yaqtel, I am not interested.
>
> can anyone suggest a simple way to do this?

Use the Levenshtein distance.
http://en.wikisource.org/wiki/Levenshtein_distance
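
For the specific case asked about, where both strings have the same length, a position-by-position comparison (a Hamming-style check rather than a full Levenshtein distance) is probably enough. A minimal sketch follows; the function name `single_difference` is only illustrative and does not come from the thread.

```python
def single_difference(a, b):
    """Return (index, char_a, char_b) if a and b have equal length and
    differ in exactly one position; otherwise return None."""
    if len(a) != len(b):
        return None
    # Collect every position where the two strings disagree.
    diffs = [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    return diffs[0] if len(diffs) == 1 else None


print(single_difference("yaqtil", "yaqtel"))  # (4, 'i', 'e')
print(single_difference("yiqtol", "yaqtel"))  # None
```

Unlike Levenshtein distance, this ignores insertions and deletions, which matches the "same length" condition in the question.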

> My next problem is, I have a list of 300,000+ words and I want to find
> every pair of such strings. I thought I would first sort on length of
> string, but how do I iterate through the following:
>
> str1
> str2
> str3
> str4
> str5
>
> so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5,
> str2 & str3, str3 & str4, str3 & str5, str4 & str5.



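decorate-sort-undecorate is the idiom for this

    l = ["yaqtil", "ab", "yaqtel"]  # example list of strings

    l = [(len(w), w) for w in l]  # decorate each word with its length
    l.sort()                      # sort by length, then alphabetically
    l = [w for _, w in l]         # undecorate

Diez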
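
In current Python the same ordering can be had with `sorted(words, key=len)`, and the pairwise iteration the question asks about can be sketched with `itertools.groupby` and `itertools.combinations`. This is only an illustration under those assumptions; the name `pairs_within_length_groups` is hypothetical.

```python
from itertools import combinations, groupby


def pairs_within_length_groups(words):
    """Yield every unordered pair of words that share the same length."""
    ordered = sorted(words, key=len)  # key= replaces decorate-sort-undecorate
    for _, group in groupby(ordered, key=len):
        # combinations yields (w1, w2), (w1, w3), ..., (w2, w3), ... once each
        yield from combinations(list(group), 2)


words = ["str1", "ab", "str2", "cd", "str3"]
print(list(pairs_within_length_groups(words)))
# [('ab', 'cd'), ('str1', 'str2'), ('str1', 'str3'), ('str2', 'str3')]
```

Note that the number of pairs grows quadratically with the size of each length group, so with 300,000+ words this direct approach may be slow.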


manstey wrote:

> Hi,
>
> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.
>
> E.g. str1=yaqtil str2=yaqtel
>
> they differ at str1[4] and the difference is ('i','e')

something like this maybe?

    >>> str1 = 'yaqtil'
    >>> str2 = 'yaqtel'
    >>> set(enumerate(str1)) ^ set(enumerate(str2))
    set([(4, 'e'), (4, 'i')])



--
- Justin
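
The set trick above can be wrapped into a yes/no check. The sketch below is an assumption-laden illustration (the function names are made up here): for two strings of equal length, exactly one differing position leaves exactly two `(index, char)` pairs in the symmetric difference.

```python
def diff_by_set(a, b):
    """Return the sorted (index, char) pairs that occur in only one string."""
    return sorted(set(enumerate(a)) ^ set(enumerate(b)))


def differs_in_one_position(a, b):
    """True if a and b have equal length and differ at exactly one index."""
    # The length check matters: "ab" vs "abcd" also leaves two pairs.
    return len(a) == len(b) and len(diff_by_set(a, b)) == 2


print(diff_by_set("yaqtil", "yaqtel"))              # [(4, 'e'), (4, 'i')]
print(differs_in_one_position("yaqtil", "yaqtel"))  # True
print(differs_in_one_position("yiqtol", "yaqtel"))  # False
```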


manstey wrote:

> Hi,
>
> Is there a clever way to see if two strings of the same length vary by
> only one character, and what the character is in both strings.
>
> E.g. str1=yaqtil str2=yaqtel
>
> they differ at str1[4] and the difference is ('i','e')
>
> But if there was str1=yiqtol and str2=yaqtel, I am not interested.
>
> can anyone suggest a simple way to do this?
>
> My next problem is, I have a list of 300,000+ words and I want to find
> every pair of such strings. I thought I would first sort on length of
> string, but how do I iterate through the following:
>
> str1
> str2
> str3
> str4
> str5
>
> so that I compare str1 & str2, str1 & str3, str1 & str4, str1 & str5,
> str2 & str3, str3 & str4, str3 & str5, str4 & str5.



If your strings are pretty short you can do it like this even without
sorting by length first:

    def fuzzy_keys(s):
        # Yield one key per position, with that position replaced by chr(0).
        for pos in range(len(s)):
            yield s[0:pos] + chr(0) + s[pos+1:]

    def fuzzy_insert(d, s):
        for fuzzy_key in fuzzy_keys(s):
            if fuzzy_key in d:
                strings = d[fuzzy_key]
                if isinstance(strings, list):
                    # append, not +=, which would add s one character at a time
                    strings.append(s)
                else:
                    d[fuzzy_key] = [strings, s]
            else:
                d[fuzzy_key] = s

    def gather_fuzzy_matches(d):
        # Only keys shared by two or more strings hold a list.
        for strings in d.values():
            if isinstance(strings, list):
                yield strings

    acc = {}
    fuzzy_insert(acc, "yaqtel")
    fuzzy_insert(acc, "yaqtil")
    fuzzy_insert(acc, "oaqtil")
    print(list(gather_fuzzy_matches(acc)))

prints (the order of the groups may vary with the Python version)

    [['yaqtel', 'yaqtil'], ['yaqtil', 'oaqtil']]
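
A variant of the same idea, sketched here as an assumption rather than as the poster's method, keys each bucket on `(position, prefix, suffix)` instead of inserting a `chr(0)` sentinel, and reports the differing position along with each pair. The name `one_char_pairs` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def one_char_pairs(words):
    """Return sorted (position, word_a, word_b) triples for words that
    have the same length and differ in exactly one position."""
    buckets = defaultdict(list)
    for word in set(words):  # drop exact duplicates
        for pos in range(len(word)):
            # Key = the word with the character at `pos` blanked out.
            buckets[(pos, word[:pos], word[pos + 1:])].append(word)
    found = []
    for (pos, _, _), group in buckets.items():
        # Words in one bucket agree everywhere except at `pos`.
        for a, b in combinations(sorted(group), 2):
            found.append((pos, a, b))
    return sorted(found)


print(one_char_pairs(["yaqtel", "yaqtil", "oaqtil", "yiqtol"]))
# [(0, 'oaqtil', 'yaqtil'), (4, 'yaqtel', 'yaqtil')]
```

The work is roughly proportional to the total number of characters in the word list, plus the number of matching pairs, rather than to the number of all possible pairs, which is why this style of indexing may suit a 300,000-word list better than comparing every pair.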

