Grabbing Edits from two strings
I'm going to go into some depth with my problem; you can jump to the TL;DR if you don't want to read all of this.
What I'm trying to do
I need to store a "file" (text document) which can be user-edited. Say I have my original file (which could be huge):
Lorem ipsum dolor sit amet
and the user were to make a change:
Foo ipsum amet_ sit
Basically, I have the original string and the user-edited string, and I want to find the differences (the "edits"). To avoid storing duplicates of very large strings, I want to store only the original and the edits, then apply the edits to the original to rebuild the modified version. Kind of like data de-duplication. The problem is that I have no idea how different the edits can be, and I also need to be able to apply those edits to the string.
Attempts
Because the text could be huge, I am wondering what would be the most "efficient" way to store edits to the text without storing two separate versions. My first guess was something along the lines of:
    var str = 'Original String of text...'.split(' ') || [],
        mod = 'Modified String of text...'.split(' ') || [],
        i, edits = [];
    for (i = 0; i < str.length; i += 1) {
        edits.push(str[i] === mod[i] ? undefined : mod[i]);
    }
    console.log(edits); // ["Modified", null, null, null] (desired output)
then to revert back:
    for (i = 0; i < str.length; i += 1) {
        str[i] = edits[i] || str[i];
    }
    str.join(' '); // "Modified String of text..."
Basically, I'm trying to split the text by spaces into arrays, compare the arrays and store the differences, then apply the differences to generate the modified version.
Problems
But if the number of spaces were to change, problems would occur:
str: Original String of text...
mod: OriginalString of text...
Output: OriginalString of text... text...
My desired output: OriginalString of text...
Even if I were to switch str.length with mod.length and edits.length, like:
    // Get edits
    var str = 'Original String of text...'.split(' ') || [],
        mod = 'Modified String of text...'.split(' ') || [],
        i, edits = [];
    for (i = 0; i < mod.length; i += 1) {
        edits.push(str[i] === mod[i] ? undefined : mod[i]);
    }
    // Apply edits
    var final = [];
    for (i = 0; i < edits.length; i += 1) {
        final[i] = edits[i] || str[i];
    }
    final = final.join(' ');
edits would be: ["ModifiedString", "of", "text..."], which makes the whole "storing edits" idea useless, because every word after the first difference is stored again. It gets even worse if a word is added or removed. If str were to become "Original String of lots of text...", the output would still be the same.
I can see that there are many flaws in the way I'm doing this, but I can't think of any other way.
Snippet:
    document.getElementById('go').onclick = function() {
        var str = document.getElementById('a').value.split(' ') || [],
            mod = document.getElementById('b').value.split(' ') || [],
            i, edits = [];
        for (i = 0; i < mod.length; i += 1) {
            edits.push(str[i] === mod[i] ? undefined : mod[i]);
        }
        // Apply edits
        var final = [];
        for (i = 0; i < edits.length; i += 1) {
            final[i] = edits[i] || str[i];
        }
        final = final.join(' ');
        alert(final);
    };
    document.getElementById('go2').onclick = function() {
        var str = document.getElementById('a').value.split(' ') || [],
            mod = document.getElementById('b').value.split(' ') || [],
            i, edits = [];
        for (i = 0; i < str.length; i += 1) {
            edits.push(str[i] === mod[i] ? undefined : mod[i]);
        }
        for (i = 0; i < str.length; i += 1) {
            str[i] = edits[i] || str[i];
        }
        alert(str.join(' ')); // "Modified String of text..."
    };

    Base String: <input id="a"> <br/>
    Modified String: <input id="b" /> <br/>
    <button id="go">Second method</button>
    <button id="go2">First Method</button>
TL;DR:
How would you find the changes between two strings?
I'm dealing with large pieces of text; each could be about a hundred kilobytes. This is running in the browser.
Solution
Running a proper diff using just JavaScript can potentially be slow, but that depends on the performance requirements, the quality of diff you need, and of course how often it must be run.
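If each save usually changes one contiguous region, a much cheaper alternative to a full diff is to trim the common prefix and suffix and store only what lies between them. The sketch below illustrates that idea; it is not a proper minimal diff, and the names computeEdit and applyEdit are made up for this example. Scattered edits collapse into one large edit, but the result always round-trips.

```javascript
// Hypothetical helpers (not part of any library).
// computeEdit finds the single region that differs between two strings.
function computeEdit(original, modified) {
    var limit = Math.min(original.length, modified.length);

    // Length of the common prefix.
    var prefix = 0;
    while (prefix < limit &&
           original.charCodeAt(prefix) === modified.charCodeAt(prefix)) {
        prefix += 1;
    }

    // Length of the common suffix, not overlapping the prefix.
    var suffix = 0;
    while (suffix < limit - prefix &&
           original.charCodeAt(original.length - 1 - suffix) ===
           modified.charCodeAt(modified.length - 1 - suffix)) {
        suffix += 1;
    }

    return {
        start: prefix,                                    // where the change begins
        removed: original.length - prefix - suffix,       // characters dropped from original
        inserted: modified.slice(prefix, modified.length - suffix) // replacement text
    };
}

// applyEdit rebuilds the modified string from the original plus the edit.
function applyEdit(original, edit) {
    return original.slice(0, edit.start) +
           edit.inserted +
           original.slice(edit.start + edit.removed);
}

var edit = computeEdit('Original String of text...', 'OriginalString of text...');
// edit is { start: 8, removed: 1, inserted: "" }
console.log(applyEdit('Original String of text...', edit)); // "OriginalString of text..."
```

Both functions are linear in the length of the text, so a hundred kilobytes should not be a problem. The stored edit is only small when the change really is localized.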
One quite efficient way would be to track the edits while the user is actually editing the document, and store only those changes once they are done. For this you can use, for example, the ACE editor, or any other editor that supports change tracking.
ACE tracks the changes while the document is edited, and reports them in an easily comprehensible format like:
{"action":"insertText","range":{"start":{"row":0,"column":0}, "end":{"row":0,"column":1}},"text":"d"}
You can hook to the changes of the ACE editor and listen to the change events:
    var changeList = []; // list of changes
    // editor is here the ACE editor instance, for example
    var editor = ace.edit(document.getElementById("editorDivId"));
    editor.setValue("original text contents");
    editor.on("change", function(e) {
        // e.data has the change
        var cmd = e.data;
        var range = cmd.range;
        if (cmd.action === "insertText") {
            changeList.push([
                1,
                range.start.row,
                range.start.column,
                range.end.row,
                range.end.column,
                cmd.text
            ]);
        }
        if (cmd.action === "removeText") {
            changeList.push([
                2,
                range.start.row,
                range.start.column,
                range.end.row,
                range.end.column,
                cmd.text
            ]);
        }
        if (cmd.action === "insertLines") {
            changeList.push([
                3,
                range.start.row,
                range.start.column,
                range.end.row,
                range.end.column,
                cmd.lines
            ]);
        }
        if (cmd.action === "removeLines") {
            changeList.push([
                4,
                range.start.row,
                range.start.column,
                range.end.row,
                range.end.column,
                cmd.lines,
                cmd.nl
            ]);
        }
    });
To learn how it works, just create some test runs which capture the changes. Basically there are only these four commands:
- insertText
- removeText
- insertLines
- removeLines
Removing the newline from the text can be a bit tricky.
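The four action names above belong to the ACE release this answer was written against. As far as I know, newer releases pass the delta straight to the handler (no e.data) and use only two actions, "insert" and "remove", with the text given as an array of lines. Treat that shape as an assumption and check it against the version you load. A sketch of a recorder for that shape, where recordDelta is a made-up name and the tuple layout matches the listener above:

```javascript
// Assumed delta shape in newer ACE releases (verify against your version):
//   { action: "insert" | "remove",
//     start: { row, column }, end: { row, column },
//     lines: ["first line", "second line", ...] }
// recordDelta is a hypothetical helper, not an ACE function.
function recordDelta(changeList, delta) {
    var op;
    if (delta.action === "insert") {
        op = 1; // same code the listener above uses for inserted text
    } else if (delta.action === "remove") {
        op = 2; // same code the listener above uses for removed text
    } else {
        return changeList; // unknown action: ignore it
    }
    changeList.push([
        op,
        delta.start.row,
        delta.start.column,
        delta.end.row,
        delta.end.column,
        delta.lines.join("\n") // a bare line break arrives as ["", ""]
    ]);
    return changeList;
}

// Registration, assuming `editor` is an ACE editor instance:
// editor.session.on("change", function (delta) {
//     recordDelta(changeList, delta);
// });
```

With only two actions, whole-line inserts and removals are just text changes whose text contains line breaks, so the special cases for insertLines and removeLines are not needed.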
When you have this list of changes you are ready to replay the changes against the text file. You can even merge similar or overlapping changes into a single change - for example, inserts at consecutive positions (as happens while typing) could be merged into a single change.
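As an illustration of the merging idea, here is a minimal sketch that collapses runs of text inserts where each one starts exactly where the previous one ended. It assumes the tuple layout pushed by the listener above (code 1 for inserted text), and mergeInserts is a made-up name. Removals and line operations are passed through untouched.

```javascript
// Hypothetical helper: merge consecutive text inserts (op code 1) when the
// next insert begins at the exact position where the previous one ended.
// Tuple layout assumed: [op, startRow, startCol, endRow, endCol, payload].
function mergeInserts(changeList) {
    var merged = [];
    changeList.forEach(function (change) {
        var prev = merged[merged.length - 1];
        var contiguous = prev !== undefined &&
            prev[0] === 1 && change[0] === 1 &&
            prev[3] === change[1] &&   // previous end row === this start row
            prev[4] === change[2];     // previous end col === this start col
        if (contiguous) {
            prev[3] = change[3];       // extend the end position
            prev[4] = change[4];
            prev[5] += change[5];      // concatenate the inserted text
        } else {
            merged.push(change.slice()); // copy so the input list is not mutated
        }
    });
    return merged;
}

var typed = [
    [1, 0, 0, 0, 1, "a"],
    [1, 0, 1, 0, 2, "b"],
    [1, 0, 2, 0, 3, "c"]
];
console.log(JSON.stringify(mergeInserts(typed))); // [[1,0,0,0,3,"abc"]]
```

Typing a word produces one change per keystroke, so this kind of merge can shrink the stored list considerably before it is saved.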
You will run into some problems when testing this: composing the changes back into the final text is not trivial, but it is quite doable and should not take more than around 100 lines of code.
The nice thing is that once you are finished, undo and redo come almost for free, since you can replay the whole editing process.
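A rough sketch of such a replay is below. It assumes the tuple layout pushed by the listener above, that lines are separated by a single "\n", and that the recorded ranges mean what they appear to mean (start is where text goes in or comes out, end is where a removed span stops). The names toOffset and replayChanges are made up, and edge cases such as line operations at the very end of the document should be checked against real captured changes.

```javascript
// Hypothetical helper: convert a (row, column) position into a string offset,
// assuming lines are separated by a single "\n".
function toOffset(text, row, column) {
    var offset = 0;
    for (var r = 0; r < row; r += 1) {
        var next = text.indexOf("\n", offset);
        if (next === -1) {
            return text.length; // row is past the last line: clamp to the end
        }
        offset = next + 1;
    }
    return offset + column;
}

// Replay a change list of [op, startRow, startCol, endRow, endCol, payload]
// tuples against the original text and return the edited text.
function replayChanges(original, changeList) {
    var text = original;
    changeList.forEach(function (change) {
        var op = change[0];
        var start = toOffset(text, change[1], change[2]);
        var end = toOffset(text, change[3], change[4]);
        var payload = change[5];
        if (op === 1) {
            // insertText: splice the text in at the start position
            text = text.slice(0, start) + payload + text.slice(start);
        } else if (op === 2 || op === 4) {
            // removeText / removeLines: drop everything between start and end
            text = text.slice(0, start) + text.slice(end);
        } else if (op === 3) {
            // insertLines: payload is an array of whole lines
            text = text.slice(0, start) + payload.join("\n") + "\n" + text.slice(start);
        }
    });
    return text;
}

var result = replayChanges("hello world\nsecond line", [
    [1, 0, 0, 0, 1, "d"],        // type "d" at the very start
    [2, 0, 1, 0, 6, "hello"]     // delete "hello"
]);
console.log(result); // "d world\nsecond line"
```

Working on one flat string keeps the sketch short; for very long change lists it may be faster to keep an array of lines and join it once at the end.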