Finding difference between strings in JavaScript
Question
I'd like to compare two strings (a before and after) and detect exactly where and what changed between them.
For any change, I want to know:
- The start position of the change (inclusive, starting at 0)
- The end position of the change, relative to the old text (inclusive, starting at 0)
- The change itself
Assume that strings will change in only one place at a time (for example, never "Bill" -> "Kiln").
Additionally, I need the start and end positions to reflect the type of change:
- If deletion, the start and end position should be the start and end positions of the deleted text, respectively
- If replacement, the start and end position should be the start and end positions of the "deleted" text, respectively (the change will be the "added" text)
- If insertion, the start and end positions should be the same; the entry point of the text
- If no change, let start and end positions remain zero, with an empty change
For example:
"0123456789" -> "03456789"
Start: 1, End: 2, Change: "" (deletion)
"03456789" -> "0123456789"
Start: 1, End: 1, Change: "12" (insertion)
"Hello World!" -> "Hello Aliens!"
Start: 6, End: 10, Change: "Aliens" (replacement)
"Hi" -> "Hi"
Start: 0, End: 0, Change: "" (no change)
I was able to somewhat detect the positions of the changed text, but it doesn't work in all cases because in order to do that accurately, I need to know what kind of change is made.
var OldText = "My edited string!";
var NewText = "My first string!";
var ChangeStart = 0;
var NewChangeEnd = 0;
var OldChangeEnd = 0;
console.log("Comparing start:");
for (var i = 0; i < NewText.length; i++) {
    console.log(i + ": " + NewText[i] + " -> " + OldText[i]);
    if (NewText[i] != OldText[i]) {
        ChangeStart = i;
        break;
    }
}
console.log("Comparing end:");
// "Addition"?
if (NewText.length > OldText.length) {
    for (var i = 1; i < NewText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// "Deletion"?
} else if (NewText.length < OldText.length) {
    for (var i = 1; i < OldText.length; i++) {
        console.log(i + "(N: " + (NewText.length - i) + " O: " + (OldText.length - i) + ": " + NewText.substring(NewText.length - i, NewText.length - i + 1) + " -> " + OldText.substring(OldText.length - i, OldText.length - i + 1));
        if (NewText.substring(NewText.length - i, NewText.length - i + 1) != OldText.substring(OldText.length - i, OldText.length - i + 1)) {
            NewChangeEnd = NewText.length - i;
            OldChangeEnd = OldText.length - i;
            break;
        }
    }
// Same length...
} else {
    // Do something
}
console.log("Change start: " + ChangeStart);
console.log("NChange end : " + NewChangeEnd);
console.log("OChange end : " + OldChangeEnd);
console.log("Change: " + OldText.substring(ChangeStart, OldChangeEnd + 1));
How can I tell whether an insertion, deletion, or replacement took place?
I've searched and come across a few other similar questions, but they don't seem to help.
Recommended answer
I have gone through your code, and your logic for matching the strings makes sense to me. It logs ChangeStart, NewChangeEnd and OldChangeEnd correctly and the algorithm flows alright. You just want to know whether an insertion, deletion or replacement took place. Here's how I would go about it.
First of all, once you have found the first point of mismatch, i.e. ChangeStart, you need to make sure that when you then traverse the strings from the end, the index doesn't cross ChangeStart.
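As a minimal sketch of that first step, the mismatch position can be computed with a small helper. The name commonPrefixLength is illustrative and not part of the original code. It assumes the comparison should stop at the end of the shorter string; the loop in the question instead leaves ChangeStart at 0 when the new text is a prefix of the old one (for example "abcd" -> "abc").

```javascript
// Hypothetical helper: returns how many leading characters the two strings share.
// This is the value the answer calls ChangeStart.
function commonPrefixLength(oldText, newText) {
  const limit = Math.min(oldText.length, newText.length);
  let i = 0;
  while (i < limit && oldText[i] === newText[i]) {
    i++;
  }
  return i;
}

console.log(commonPrefixLength("Hello Worlds!", "Hello Worllolds!")); // 10
console.log(commonPrefixLength("abcd", "abc")); // 3
```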
I'll give you an example. Consider the following strings:
var NewText = "Hello Worllolds!";
var OldText = "Hello Worlds!";
// ChangeStart  -> 10 (makes sense)
// OldChangeEnd -> 8
// NewChangeEnd -> 11
console.log("Change: " + NewText.substring(ChangeStart, NewChangeEnd + 1));
// Outputs "lo"
The problem in this case is when it starts matching from the back, the flow is something like this:
Comparing end:
1(N: 15 O: 12: ! -> !)
2(N: 14 O: 11: s -> s)
3(N: 13 O: 10: d -> d) -> You need to stop here!
//There is no mismatch yet, but we have reached ChangeStart in the old text, and
//we have already established that characters 0 -> ChangeStart-1 match.
//That is why it outputs "lo" instead of "lol"
Assuming what I just said makes sense, you just need to modify your for loops like so:
if (NewText.length > OldText.length) {
    for (var i = 1; i < NewText.length && ((OldText.length - i) >= ChangeStart); i++) {
        ...
        NewChangeEnd = NewText.length - i - 1;
        OldChangeEnd = OldText.length - i - 1;
        if (/* Mismatch condition reached */) {
            // break... That code is fine.
        }
    }
}
The condition (OldText.length - i) >= ChangeStart takes care of the anomaly I mentioned: the for loop terminates automatically once the backward scan reaches ChangeStart. However, as I just demonstrated, this condition may be reached before any mismatch is encountered. So on every iteration where the characters still match, you need to set NewChangeEnd and OldChangeEnd to one less than the matched index. In case of a mismatch, you store the values as before.
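The bounded backward scan can be sketched as follows. This is a hedged illustration rather than the code from the answer: the helper name commonSuffixLength is hypothetical, and it assumes the scan should be capped in both strings (by the shorter one), so the same helper works whichever text is longer.

```javascript
// Hypothetical helper: counts matching trailing characters, but never lets
// the backward scan move past changeStart in either string.
function commonSuffixLength(oldText, newText, changeStart) {
  const limit = Math.min(oldText.length, newText.length) - changeStart;
  let n = 0;
  while (
    n < limit &&
    oldText[oldText.length - 1 - n] === newText[newText.length - 1 - n]
  ) {
    n++;
  }
  return n;
}

const oldText = "Hello Worlds!";
const newText = "Hello Worllolds!";
const changeStart = 10; // first mismatch, as in the example above
const suffix = commonSuffixLength(oldText, newText, changeStart);
const oldChangeEnd = oldText.length - suffix - 1;
const newChangeEnd = newText.length - suffix - 1;
console.log(oldChangeEnd, newChangeEnd); // 9 12
console.log(newText.substring(changeStart, newChangeEnd + 1)); // lol
```

Because the cap stops the scan at ChangeStart, the reported change is "lol" rather than "lo".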
Instead of an else-if, we could use a plain else, which covers every case where NewText.length > OldText.length is not true, i.e. a replacement or a deletion. Likewise, NewText.length > OldText.length means it could be a replacement or an insertion, as per your examples. So the else could be something like:
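else {
    // NewText is not the longer string here, so the guard uses its index:
    // the backward scan must not cross ChangeStart in the shorter text.
    for (var i = 1; i < OldText.length && ((NewText.length - i) >= ChangeStart); i++) {
        ...
        NewChangeEnd = NewText.length - i - 1;
        OldChangeEnd = OldText.length - i - 1;
        if (/* Mismatch condition reached */) {
            // break... That code is fine.
        }
    }
}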
If you have understood the minor changes thus far, identifying the specific cases is really simple:
- Deletion - Condition -> ChangeStart > NewChangeEnd. The deleted string runs from ChangeStart -> OldChangeEnd.
Deleted text -> OldText.substring(ChangeStart, OldChangeEnd + 1);
- Insertion - Condition -> ChangeStart > OldChangeEnd. The string was inserted at ChangeStart.
Inserted text -> NewText.substring(ChangeStart, NewChangeEnd + 1);
- Replacement - If NewText != OldText and the above two conditions are not met, then it is a replacement.
Text in old string that got replaced -> OldText.substring(ChangeStart, OldChangeEnd + 1);
Replacement text -> NewText.substring(ChangeStart, NewChangeEnd + 1);
Start and end positions in the OldText that got replaced -> ChangeStart -> OldChangeEnd
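Putting the three cases together, a self-contained sketch might look like this. The function name describeChange and the shape of the returned object are assumptions made for illustration, and this is not the code from the jsfiddle. It follows the position conventions from the question: 0-based, inclusive, with positions relative to the old text.

```javascript
// Hypothetical function combining the steps above. Returns the start and end
// positions in the old text, the added text, and the kind of change.
function describeChange(oldText, newText) {
  if (oldText === newText) {
    return { type: "none", start: 0, end: 0, change: "" };
  }

  // ChangeStart: length of the common prefix.
  const shorter = Math.min(oldText.length, newText.length);
  let start = 0;
  while (start < shorter && oldText[start] === newText[start]) start++;

  // Common suffix, capped so the backward scan never crosses ChangeStart.
  let suffix = 0;
  while (
    suffix < shorter - start &&
    oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]
  ) suffix++;

  const oldChangeEnd = oldText.length - suffix - 1;
  const newChangeEnd = newText.length - suffix - 1;

  if (start > newChangeEnd) {
    // Nothing new between prefix and suffix: text was removed.
    return { type: "deletion", start, end: oldChangeEnd, change: "" };
  }
  const change = newText.substring(start, newChangeEnd + 1);
  if (start > oldChangeEnd) {
    // Nothing old between prefix and suffix: text was added at `start`.
    return { type: "insertion", start, end: start, change };
  }
  return { type: "replacement", start, end: oldChangeEnd, change };
}

console.log(describeChange("0123456789", "03456789"));
console.log(describeChange("03456789", "0123456789"));
console.log(describeChange("Hello World!", "Hello Aliens!"));
console.log(describeChange("Hi", "Hi"));
```

For the four examples in the question, this yields the Start, End and Change values listed there.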
I have created a jsfiddle incorporating the changes that I have mentioned in your code. You might want to check it out. Hope it gets you started in the right direction.