Python 3: differences between two strings

Question

I'd like to record, in a list, the positions at which two strings differ (so that I can remove those parts), preferably recording the widest separation points for each differing section, since these areas will have dynamic content.

Compare these:

total chars 178. Two unique sections

t1 = 'WhereTisthetotalnumberofght5y5wsjhhhhjhkmhm Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapxxxxxxxproximation,although'

total chars 211. Two unique sections

t2 = 'WhereTisthetotalnumberofdofodfgjnjndfgu><rgregw><sssssuguyguiygis>gggs<GS,Gs Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentrexxxxxxxsultsduetodifferinglevelsofapproximation,although'

I know difflib can do this but the output is bad.

I'd like to store the character positions in a list, preferably the larger separation values.

Pattern positions

t1 = 'WhereTisthetotalnumberof  24  ght5y5wsjhhhhjhkmhm  43  Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofap  151  xxxxxxx  158  proximation,although'
t2 = 'WhereTisthetotalnumberof  24  dofodfgjnjndfgu><rgregw><sssssuguyguiygis>gggs<GS,Gs  76  Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentre  155  xxxxxxx  162  sultsduetodifferinglevelsofapproximation,although'

Output:

output list = [24, 76, 151, 162]

Update

Response to @Olivier's post

Positions of all the y's, marked with ***

t1
WhereTisthetotalnumberofght5***y***5wsjhhhhjhkmhm Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapxxxxxxxproximation,although

t2
WhereTisthetotalnumberofdofodfgjnjndfgu><rgregw><sssssugu***y***gui***y***gis>gggs<GS,Gs Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentrexxxxxxxsultsduetodifferinglevelsofapproximation,although

Output after matcher.get_matching_blocks() and string = ''.join([t1[a:a+n] for a, _, n in blocks])

WhereTisthetotalnumberof***y*** Thethreemethodsthatreturntheratioofmatchingtototalcharacterscangivedifferentresultsduetodifferinglevelsofapproximation,although

Recommended answer

Using difflib is probably your best bet as you are unlikely to come up with a more efficient solution than the algorithms it provides. What you want is to use SequenceMatcher.get_matching_blocks. Here is what it will output according to the doc.

Return list of triples describing matching subsequences. Each triple is of the form (i, j, n), and means that a[i:i+n] == b[j:j+n]. The triples are monotonically increasing in i and j.

Here is a way you could use this to reconstruct a string from which you removed the delta.

from difflib import SequenceMatcher

x = "abc_def"
y = "abc--ef"

matcher = SequenceMatcher(None, x, y)
blocks = matcher.get_matching_blocks()

# blocks: [Match(a=0, b=0, size=3), Match(a=5, b=5, size=2), Match(a=7, b=7, size=0)]

string = ''.join([x[a:a+n] for a, _, n in blocks])

# string: "abcef"
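
If the goal is to record where the strings differ rather than what they share, the same matcher can report the non-matching regions directly. The following is a minimal sketch using SequenceMatcher.get_opcodes, which the answer above does not mention; the variable name diffs is chosen here for illustration only.

```python
from difflib import SequenceMatcher

x = "abc_def"
y = "abc--ef"

matcher = SequenceMatcher(None, x, y)

# Each opcode is (tag, i1, i2, j1, j2): x[i1:i2] corresponds to y[j1:j2].
# Any tag other than 'equal' marks a region where the two strings differ.
diffs = [(i1, i2, j1, j2)
         for tag, i1, i2, j1, j2 in matcher.get_opcodes()
         if tag != 'equal']

# diffs: [(3, 5, 3, 5)]
```

Here x[3:5] is "_d" and y[3:5] is "--", the only part that does not match.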

Edit: It was also pointed out that a problem arises when you have two strings like these.

t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'

Then the above code would return 'WordWordyWordWord'. This is because get_matching_blocks also matches the 'y' that is present in both strings between the expected blocks. One way around this is to filter the returned blocks by length.

string = ''.join([x[a:a+n] for a, _, n in blocks if n > 1])
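
To see the effect of the length filter, here is a small self-contained run on the two strings above. The names unfiltered and filtered are illustrative, and the threshold n > 1 is simply the one used in the answer; a larger threshold may suit other inputs better.

```python
from difflib import SequenceMatcher

t1 = 'WordWordaayaaWordWord'
t2 = 'WordWordbbbybWordWord'

blocks = SequenceMatcher(None, t1, t2).get_matching_blocks()

# Without a filter, the single shared 'y' is kept as a one-character block.
unfiltered = ''.join(t1[a:a+n] for a, _, n in blocks)

# Dropping blocks of length 1 (and the zero-length final block) removes it.
filtered = ''.join(t1[a:a+n] for a, _, n in blocks if n > 1)

# unfiltered: 'WordWordyWordWord'
# filtered:   'WordWordWordWord'
```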

If you want more complex analysis of the returned blocks you could also do the following.

def block_filter(substring):
    """Return True if the substring should be kept in the joined result, False otherwise"""
    ...


string = ''.join([x[a:a+n] for a, _, n in blocks if block_filter(x[a:a+n])])
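
The question asks for the positions of the differing sections rather than the cleaned string. One possible way to get them, sketched below, is to collect the gaps between the blocks that survive the length filter. The function name diff_spans and its min_size parameter are hypothetical, not part of difflib. The sketch also passes autojunk=False as an assumption: difflib's "popular item" heuristic applies once the second sequence reaches 200 items, which is the case for the 211-character t2 in the question, and it can change which blocks are found.

```python
from difflib import SequenceMatcher


def diff_spans(a, b, min_size=2):
    """Return (spans_a, spans_b): the [start, end) ranges of each string
    that lie outside the matching blocks of at least min_size characters."""
    # autojunk=False disables the "popular item" heuristic (an assumption
    # made for this sketch; the default is autojunk=True).
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    spans_a, spans_b = [], []
    end_a = end_b = 0  # end of the previous kept block in each string
    for i, j, n in matcher.get_matching_blocks():
        # The final block is a zero-length sentinel at (len(a), len(b));
        # keep it so that trailing differences are reported as well.
        is_sentinel = (i, j, n) == (len(a), len(b), 0)
        if n < min_size and not is_sentinel:
            continue
        if i > end_a:
            spans_a.append((end_a, i))
        if j > end_b:
            spans_b.append((end_b, j))
        end_a, end_b = i + n, j + n
    return spans_a, spans_b


spans_t1, spans_t2 = diff_spans('WordWordaayaaWordWord',
                                'WordWordbbbybWordWord')
# spans_t1: [(8, 13)]
# spans_t2: [(8, 13)]
```

The spans are reported per string because a differing section can start and end at different offsets in each input. Combining them into a single list, as in the question's desired output, would need an extra rule for choosing between the two sets of offsets.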
