Grabbing Edits from two strings


Problem description

I'm going to go a bit in-depth with my problem; you can jump to the TL;DR if you don't want to read all of this.

What I'm trying to do

I need to store a "file" (text document) which can be user-edited. If I have my original file (which could be huge)

Lorem ipsum dolor sit amet

and the user were to make a change:

Foo ipsum amet_ sit

Basically, I have the original string and the user-edited string, and I want to find the differences between them (the "edits"). To avoid storing duplicates of very large strings, I want to store the original plus the edits, then apply the edits to the original to rebuild the modified version. It's kind of like data de-duplication. The problem is that I have no idea how to compute those edits, and I also need to be able to apply them to the string.

Attempts

Because the text could be huge, I am wondering what would be the most "efficient" way to store edits to the text without storing two separate versions. My first guess was something along the lines of:

var str = 'Original String of text...'.split(' ') || [],
    mod = 'Modified String of text...'.split(' ') || [], i, edits = [];

for (i = 0; i < str.length; i += 1) {
    edits.push(str[i]===mod[i] ? undefined : mod[i]);
}

console.log(edits); // ["Modified", undefined, undefined, undefined] (desired output)

then to revert back:

for (i = 0; i < str.length; i += 1) {
    str[i] = edits[i] || str[i];
}
str.join(' '); // "Modified String of text..."

Basically, I'm trying to split the text on spaces into arrays, compare the arrays, and store the differences. I then apply the differences to generate the modified version.

Problems

But if the amount of spaces were to change, problems would occur:

str: Original String of text...
mod: OriginalString of text...

Output: OriginalString of text... text...

My desired output: OriginalString of text...


Even if I were to switch str.length with mod.length and edits.length like:

// Get edits
var str = 'Original String of text...'.split(' ') || [],
    mod = 'ModifiedString of text...'.split(' ') || [], i, edits = [];

for (i = 0; i < mod.length; i += 1) {
    edits.push(str[i]===mod[i] ? undefined : mod[i]);
}

// Apply edits
var final = [];
for (i = 0; i < edits.length; i += 1) {
    final[i] = edits[i] || str[i];
}
final = final.join(' ');

edits would be: ["ModifiedString", "of", "text..."], which makes the whole idea of storing edits useless, because every word ends up stored. It gets even worse if a word is added or removed: if str were to become Original String of lots of text..., the output would still be the same.


I can see that there are many flaws in the way I'm doing this, but I can't think of any other way.

Snippet:

document.getElementById('go').onclick = function() {
  var str = document.getElementById('a').value.split(' ') || [],
    mod = document.getElementById('b').value.split(' ') || [],
    i, edits = [];

  for (i = 0; i < mod.length; i += 1) {
    edits.push(str[i] === mod[i] ? undefined : mod[i]);
  }

  // Apply edits
  var final = [];
  for (i = 0; i < edits.length; i += 1) {
    final[i] = edits[i] || str[i];
  }
  final = final.join(' ');
  alert(final);
};

document.getElementById('go2').onclick = function() {
  var str = document.getElementById('a').value.split(' ') || [],
    mod = document.getElementById('b').value.split(' ') || [],
    i, edits = [];

  for (i = 0; i < str.length; i += 1) {
    edits.push(str[i] === mod[i] ? undefined : mod[i]);
  }

  for (i = 0; i < str.length; i += 1) {
    str[i] = edits[i] || str[i];
  }
  alert(str.join(' ')); // "Modified String of text..."
};

Base String:
<input id="a">
<br/>Modified String:
<input id="b" />
<br/>
<button id="go">Second method</button>
<button id="go2">First Method</button>

TL;DR:

How would you find the changes between two strings?


I'm dealing with large pieces of text; each could be about a hundred kilobytes. This is running in the browser.

Solution

Running a proper diff using just JavaScript can be potentially slow, but it depends on the performance requirements and the quality of the diff, and of course how often it must be run.
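As a rough illustration of the cheap end of that trade-off, the sketch below is not a proper diff. It only trims the common prefix and suffix of the two strings and stores whatever lies between them as a single edit. The names diffStrings and applyEdit are made up for this example, and the approach is only worthwhile when the user changed one contiguous region; scattered edits collapse into one large edit.

```javascript
// Hypothetical helper: describe the change from `original` to `modified`
// as ONE edit, found by trimming the common prefix and common suffix.
function diffStrings(original, modified) {
  var shorter = Math.min(original.length, modified.length);

  // Length of the common prefix.
  var prefix = 0;
  while (prefix < shorter &&
         original.charAt(prefix) === modified.charAt(prefix)) {
    prefix += 1;
  }

  // Length of the common suffix, never overlapping the prefix.
  var suffix = 0;
  while (suffix < shorter - prefix &&
         original.charAt(original.length - 1 - suffix) ===
         modified.charAt(modified.length - 1 - suffix)) {
    suffix += 1;
  }

  return {
    start: prefix,                                   // where the edit begins
    removed: original.length - prefix - suffix,      // characters to delete
    inserted: modified.slice(prefix, modified.length - suffix) // replacement
  };
}

// Hypothetical helper: rebuild the modified string from original + edit.
function applyEdit(original, edit) {
  return original.slice(0, edit.start) +
         edit.inserted +
         original.slice(edit.start + edit.removed);
}

var edit = diffStrings('Original String of text...', 'OriginalString of text...');
console.log(edit); // { start: 8, removed: 1, inserted: '' }
console.log(applyEdit('Original String of text...', edit)); // OriginalString of text...
```

Because it works on character offsets rather than on words, this handles the removed-space case from the question, and the stored edit is never larger than the modified string itself.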

One quite efficient way would be to track the edits while the user is actually editing the document and store only those changes once they are done. For this you can use, for example, the ACE editor, or any other editor that supports change tracking.

http://ace.c9.io/

ACE tracks the changes while the document is being edited, and it reports each one as a command in an easily comprehensible format like:

{"action":"insertText","range":{"start":{"row":0,"column":0},
    "end":{"row":0,"column":1}},"text":"d"}

You can hook to the changes of the ACE editor and listen to the change events:

var changeList = []; // list of changes
// editor is here the ACE editor instance for example
var editor = ace.edit(document.getElementById("editorDivId"));
editor.setValue("original text contents");
editor.on("change", function(e) {
    // e.data has the change
    var cmd = e.data;
    var range = cmd.range;
    if (cmd.action == "insertText") {
        changeList.push([
            1,
            range.start.row,
            range.start.column,
            range.end.row,
            range.end.column,
            cmd.text
        ]);
    }
    if (cmd.action == "removeText") {
        changeList.push([
            2,
            range.start.row,
            range.start.column,
            range.end.row,
            range.end.column,
            cmd.text
        ]);
    }
    if (cmd.action == "insertLines") {
        changeList.push([
            3,
            range.start.row,
            range.start.column,
            range.end.row,
            range.end.column,
            cmd.lines
        ]);
    }
    if (cmd.action == "removeLines") {
        changeList.push([
            4,
            range.start.row,
            range.start.column,
            range.end.row,
            range.end.column,
            cmd.lines,
            cmd.nl
        ]);
    }
});

To learn how it works, just create some test runs that capture the changes. Basically there are only these four commands:

  1. insertText
  2. removeText
  3. insertLines
  4. removeLines

Removing the newline from the text can be a bit tricky.

When you have this list of changes you are ready to replay the changes against the text file. You can even merge similar or overlapping changes into a single change - for example inserts to subsequent characters could be merged into a single change.
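The merging step mentioned above could look roughly like the following. This is a sketch under assumptions rather than anything ACE provides: mergeInserts is a made-up name, it works on the [type, startRow, startColumn, endRow, endColumn, text] arrays built in the change handler, and it only merges single-line insertText entries where each one begins exactly where the previous one ended.

```javascript
// Hypothetical helper: collapse runs of adjacent single-line insertText
// changes (type 1) into one change. All other changes pass through as-is.
function mergeInserts(changeList) {
  var merged = [];

  changeList.forEach(function (change) {
    var prev = merged[merged.length - 1];
    var canMerge = !!prev &&
      prev[0] === 1 && change[0] === 1 &&        // both are insertText
      prev[1] === prev[3] &&                     // prev stays on one row
      change[1] === change[3] &&                 // change stays on one row
      prev[3] === change[1] &&                   // same row
      prev[4] === change[2];                     // starts where prev ended

    if (canMerge) {
      prev[4] = change[4];   // extend the end column
      prev[5] += change[5];  // append the inserted text
    } else {
      merged.push(change.slice()); // copy, so the input list is not mutated
    }
  });

  return merged;
}

// Typing "a" then "b" at the start of row 0 becomes a single insert of "ab".
console.log(mergeInserts([
  [1, 0, 0, 0, 1, 'a'],
  [1, 0, 1, 0, 2, 'b']
])); // [ [ 1, 0, 0, 0, 2, 'ab' ] ]
```

Inserts that contain a newline are deliberately left alone here, since merging across rows needs more careful bookkeeping.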

You will run into some problems when testing this. Composing the changes back into the text is not trivial, but it is quite doable and should not take more than around 100 lines of code.

The nice thing is that when you are finished, you also have undo and redo easily available, so you can replay the whole editing process.
