Replacing a lot of text in a browser's add-on


Problem description

I'm trying to develop a Firefox add-on that transliterates the text on any page into a specific language. The mapping is just a set of 2D arrays, which I iterate over using this code:

function escapeRegExp(str) {
    return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}

function replaceAll(find, replace) {
    return document.body.innerHTML.replace(new RegExp(escapeRegExp(find), 'g'), replace);
}

function convert2latin() {
    for (var i = 0; i < Table.length; i++) {
        document.body.innerHTML = replaceAll(Table[i][1], Table[i][0]);
    }
}

It works, and I can ignore HTML tags, since they can only be in English, but the problem is performance, which is very poor. As I have no experience with JS, I searched on Google and found that maybe documentFragment can help.
Maybe I should use a different approach altogether?

Solution

Based on your comments, you appear to have already been told that the most expensive part is the DOM rebuild that happens when you replace the entire contents of the page (i.e. when you assign to document.body.innerHTML). You are currently doing that for each substitution, which makes Firefox re-render the entire page every time. You only need to assign to document.body.innerHTML once, after you have made all of the substitutions.

The following should provide a first pass at making it faster:

function escapeRegExp(str) {
    return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}

function convert2latin() {
    //Work on a local copy of the HTML so the DOM is rebuilt only once.
    let newInnerHTML = document.body.innerHTML;
    for (let i = 0; i < Table.length; i++) {
        newInnerHTML = newInnerHTML.replace(new RegExp(escapeRegExp(Table[i][1]), 'g'), Table[i][0]);
    }
    document.body.innerHTML = newInnerHTML;
}

You mention in comments that there is no real need to use a RegExp for the match, so the RegExp can be dropped, which may be faster still:

function convert2latin() {
    let newInnerHTML = document.body.innerHTML;
    for (let i = 0; i < Table.length; i++) {
        //replace() with a string pattern changes only the first occurrence,
        //  so split()/join() is used to change every occurrence.
        newInnerHTML = newInnerHTML.split(Table[i][1]).join(Table[i][0]);
    }
    document.body.innerHTML = newInnerHTML;
}
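A string pattern behaves differently from a 'g' RegExp: replace() with a string substitutes only the first occurrence. The following is a minimal sketch of a non-RegExp helper that substitutes every occurrence. The two-entry table and the name convertText are hypothetical and used only for illustration; the table keeps the same [replacement, text to find] column order as Table.

```javascript
// Hypothetical sample table: [replacement, text to find],
// mirroring Table[i][0] and Table[i][1].
const sampleTable = [
    ['zh', 'ж'],
    ['sh', 'ш'],
];

// Replace every occurrence of each table entry without building a RegExp.
function convertText(text, table) {
    for (const [replacement, find] of table) {
        // split()/join() replaces all occurrences and treats both strings literally.
        text = text.split(find).join(replacement);
    }
    return text;
}

const input = 'жшж';
// With a string pattern, replace() changes only the first 'ж'.
const firstOnly = input.replace('ж', 'zh');
// The helper changes every occurrence of every entry.
const all = convertText(input, sampleTable);
console.log(firstOnly, all);
```

Because split()/join() treats the replacement literally, a replacement that contains `$` is inserted as-is, which is not the case for the replacement string passed to replace().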

If you really need to use a RegExp for the match, and you are going to perform these exact substitutions multiple times, you are better off creating all of the RegExp objects prior to the first use (e.g. when Table is created/changed) and storing them (e.g. in Table[i][2]).
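As a rough sketch of that idea, the RegExp objects can be built once and kept in a third column. The helper names compileTable and convertWithCompiled, and the sample rows, are assumptions made for this example and are not part of the original code.

```javascript
// Escape characters that have a special meaning in a RegExp.
function escapeRegExp(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: store a precompiled global RegExp in row[2].
// Call it once, whenever the table is created or changed.
function compileTable(table) {
    for (const row of table) {
        row[2] = new RegExp(escapeRegExp(row[1]), 'g');
    }
    return table;
}

// Hypothetical helper: apply every precompiled RegExp to the text.
function convertWithCompiled(text, table) {
    for (const row of table) {
        // replace() resets lastIndex for global RegExps, so reuse is safe.
        text = text.replace(row[2], row[0]);
    }
    return text;
}

// Sample rows in the same [replacement, text to find] order as Table.
const compiledTable = compileTable([
    ['zh', 'ж'],
    ['sh', 'ш'],
    ['.', '。'],
]);
const converted = convertWithCompiled('жшж。', compiledTable);
console.log(converted);
```

The cost of constructing the RegExp objects is then paid once per table change rather than once per conversion.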

However, assigning to document.body.innerHTML is a bad way to do this:

As the8472 mentioned, replacing the entire content of document.body.innerHTML is a very heavy-handed way to perform this task. It has some significant disadvantages, including probably breaking the functionality of other JavaScript in the page and potential security issues. A better solution would be to change only the textContent of the text nodes.

One method of doing this is to use a TreeWalker. The code to do so could be something like:

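function convert2latin(text) {
    for (let i = 0; i < Table.length; i++) {
        //replace() with a string pattern changes only the first occurrence,
        //  so split()/join() is used to change every occurrence.
        text = text.split(Table[i][1]).join(Table[i][0]);
    }
    return text;
}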

//Create the TreeWalker
let treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT,{
    acceptNode: function(node) { 
        if(node.textContent.length === 0
            || node.parentNode.nodeName === 'SCRIPT' 
            || node.parentNode.nodeName === 'STYLE'
        ) {
            //Don't include 0 length, <script>, or <style> text nodes.
            return NodeFilter.FILTER_SKIP;
        } //else
        return NodeFilter.FILTER_ACCEPT;
    }
}, false );
//Make a list of nodes prior to modifying the DOM. Once the DOM is modified the TreeWalker
//  can become invalid (i.e. stop after the first modification). Doing so is not needed
//  in this case, but is a good habit for when it is needed.
let nodeList=[];
while(treeWalker.nextNode()) {
    nodeList.push(treeWalker.currentNode);
}
//Iterate over all text nodes, changing the textContent of the text nodes 
nodeList.forEach(function(el){
    el.textContent = convert2latin(el.textContent);
});
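If the table is large, looping over every entry for every text node may itself become the bottleneck. One possible alternative, which is not part of the original answer, is to combine all entries into a single alternation RegExp and look up each match in a Map, so each string is scanned once. The sketch below assumes the same [replacement, text to find] column order; buildConverter and the sample rows are hypothetical. Unlike the sequential loop, it never re-scans text produced by an earlier replacement.

```javascript
// Escape characters that have a special meaning in a RegExp.
function escapeRegExp(str) {
    return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build one converter function from the whole table.
function buildConverter(table) {
    const lookup = new Map(table.map(([replacement, find]) => [find, replacement]));
    if (lookup.size === 0) {
        // An empty alternation would match everywhere, so return the text unchanged.
        return (text) => text;
    }
    // Longest keys first, so a longer entry wins over a shorter prefix of it.
    const pattern = [...lookup.keys()]
        .sort((a, b) => b.length - a.length)
        .map(escapeRegExp)
        .join('|');
    const regex = new RegExp(pattern, 'g');
    // The replacer function looks up each matched key in the Map.
    return (text) => text.replace(regex, (match) => lookup.get(match));
}

const toLatin = buildConverter([
    ['sh', 'ш'],
    ['shch', 'щ'],
    ['zh', 'ж'],
]);
const result = toLatin('щжш');
console.log(result);
```

With the TreeWalker approach, such a converter would be built once and then applied to each node, e.g. el.textContent = toLatin(el.textContent).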
