Replacing a lot of text in browser's addon
Problem description
I'm trying to develop a Firefox add-on that transliterates the text on any page into a specific language. It's really just a set of 2D arrays which I iterate over, using this code:
function escapeRegExp(str) {
    return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}
function replaceAll(find, replace) {
    return document.body.innerHTML.replace(new RegExp(escapeRegExp(find), 'g'), replace);
}
function convert2latin() {
    for (var i = 0; i < Table.length; i++) {
        document.body.innerHTML = replaceAll(Table[i][1], Table[i][0]);
    }
}
It works, and I can ignore HTML tags, since they can only be in English, but the problem is performance: it is very poor. As I have no experience in JS, I searched online and found that maybe documentFragment could help.
Maybe I should use another approach altogether?
Based on your comments, you appear to have already been told that the most expensive thing is the DOM rebuild that happens when you completely replace the entire contents of the page (i.e. when you assign to document.body.innerHTML). You are currently doing that for each substitution, which results in Firefox re-rendering the entire page for each substitution you make. You only need to assign to document.body.innerHTML once, after you have made all of the substitutions.
The following should provide a first pass at making it faster:
function escapeRegExp(str) {
    return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}
function convert2latin() {
    // Declare the variable locally instead of creating an implicit global.
    let newInnerHTML = document.body.innerHTML;
    for (let i = 0; i < Table.length; i++) {
        newInnerHTML = newInnerHTML.replace(new RegExp(escapeRegExp(Table[i][1]), 'g'), Table[i][0]);
    }
    // Assign to the DOM only once, after all substitutions are done.
    document.body.innerHTML = newInnerHTML;
}
You mention in comments that there is no real need to use a RegExp for the match, so the following would be even faster:
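function convert2latin() {
    let newInnerHTML = document.body.innerHTML;
    for (let i = 0; i < Table.length; i++) {
        // replace() with a string pattern only changes the first match,
        // so split/join is used to replace every occurrence.
        newInnerHTML = newInnerHTML.split(Table[i][1]).join(Table[i][0]);
    }
    document.body.innerHTML = newInnerHTML;
}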
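One caveat worth checking when matching with plain strings: String.prototype.replace() called with a string pattern replaces only the first occurrence, whereas a RegExp with the 'g' flag replaces all of them. The following is a minimal sketch of the difference; replaceEvery is a hypothetical helper name and the sample string is invented for illustration.

```javascript
// Hypothetical helper (not from the original post): replace every
// occurrence of a plain string without building a RegExp.
function replaceEvery(text, find, replacement) {
  if (find === "") {
    return text; // nothing to search for
  }
  // split() cuts the text at every match; join() glues the pieces
  // back together with the replacement in between.
  return text.split(find).join(replacement);
}

const sample = "шш"; // invented sample: two Cyrillic "sha" letters

// A string pattern only replaces the first match.
const firstOnly = sample.replace("ш", "sh");

// split/join replaces every match.
const every = replaceEvery(sample, "ш", "sh");

console.log(firstOnly); // prints "shш"
console.log(every);     // prints "shsh"
```

In browsers that support it, String.prototype.replaceAll() with a string pattern gives the same result as the split/join version for replacement strings that contain no `$` patterns.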
If you really need to use a RegExp for the match, and you are going to perform these exact substitutions multiple times, you are better off creating all of the RegExp objects prior to the first use (e.g. when Table is created/changed) and storing them (e.g. in Table[i][2]).
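The caching idea could be sketched roughly as follows. This is only an illustration under stated assumptions: compileTable, convertText and sampleTable are hypothetical names, and the sample rows are invented; each row follows the question's layout of [replacement, text to find], with the compiled RegExp stored in the third slot.

```javascript
// Escape characters that have a special meaning inside a RegExp
// (same pattern as the escapeRegExp used in the question).
function escapeRegExp(str) {
  return str.replace(/([.*+?^=!:${}()|\[\]\/\\])/g, "\\$1");
}

// Hypothetical helper: compile one global RegExp per row and cache it
// in row[2]. Call it once, whenever the table is created or changed.
function compileTable(table) {
  for (const row of table) {
    row[2] = new RegExp(escapeRegExp(row[1]), "g");
  }
  return table;
}

// Hypothetical helper: apply every cached RegExp, in table order.
function convertText(text, table) {
  for (const row of table) {
    text = text.replace(row[2], row[0]);
  }
  return text;
}

// Invented sample data: [latin replacement, source text to find].
const sampleTable = compileTable([
  ["sh", "ш"],
  ["a", "а"],
  ["b", "б"],
]);

console.log(convertText("баба шаша", sampleTable)); // prints "baba shasha"
```

Reusing a global RegExp this way is safe with String.prototype.replace(), which starts each call from the beginning of the string.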
However, assigning to document.body.innerHTML is a bad way to do this:
As the8472 mentioned, replacing the entire content of document.body.innerHTML is a very heavy-handed way to perform this task. It has some significant disadvantages, including probably breaking the functionality of other JavaScript in the page and potential security issues. A better solution would be to change only the textContent of the text nodes.
One method of doing this is to use a TreeWalker. The code to do so could be something like:
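function convert2latin(text) {
    for (let i = 0; i < Table.length; i++) {
        // replace() with a string pattern only changes the first match,
        // so split/join is used to replace every occurrence.
        text = text.split(Table[i][1]).join(Table[i][0]);
    }
    return text;
}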
//Create the TreeWalker
let treeWalker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT, {
    acceptNode: function(node) {
        if (node.textContent.length === 0
            || node.parentNode.nodeName === 'SCRIPT'
            || node.parentNode.nodeName === 'STYLE'
        ) {
            //Don't include 0 length, <script>, or <style> text nodes.
            return NodeFilter.FILTER_SKIP;
        } //else
        return NodeFilter.FILTER_ACCEPT;
    }
}, false);
//Make a list of nodes prior to modifying the DOM. Once the DOM is modified the TreeWalker
// can become invalid (i.e. stop after the first modification). Doing so is not needed
// in this case, but is a good habit for when it is needed.
let nodeList = [];
while (treeWalker.nextNode()) {
    nodeList.push(treeWalker.currentNode);
}
//Iterate over all text nodes, changing the textContent of the text nodes
nodeList.forEach(function(el) {
    el.textContent = convert2latin(el.textContent);
});