Best algorithm to highlight a list of given words in an HTML file


Question

I have some HTML files over which I have no control, so I can't change their structure or markup.

For each of these HTML files, a list of words is found by another algorithm. These words should be highlighted in the text of the HTML. For example, if the HTML markup is:

<p>
Monkeys are going to die soon, if we don't stop killing them. 
So, we have to try hard to persuade hunters not to hunt monkeys. 
Monkeys are very intelligent, and they should survive. 
In fact, they deserve to survive.
</p>

and the list of words is:

are, we, monkey

the result should be something like:

<p>
    <span class='highlight'>Monkeys</span> 
    <span class='highlight'>are</span> 
going to die soon, if 
    <span class='highlight'>we</span> 
don't stop killing them. 
So, 
    <span class='highlight'>we</span> 
have to try hard to persuade hunters 
not to hunt 
    <span class='highlight'>monkeys</span>
. 
    <span class='highlight'>Monkeys</span> 
    <span class='highlight'>are</span> 
very intelligent, and they should survive. 
In fact, they deserve to survive.
</p>

The highlighting algorithm should:

  1. be case-insensitive
  2. be written in JavaScript (it runs inside the browser; jQuery is welcome)
  3. be fast (it must handle the text of a book of almost 800 pages)
  4. not trigger the browser's famous "stop script" dialog
  5. work on dirty HTML files (for example, invalid markup such as unclosed elements; some of these files are HTML exports from MS Word, so you can guess what I mean by dirty)
  6. preserve the original HTML markup (no markup deleted, no markup changed except for wrapping the intended words in an element, no nesting changed; apart from the highlighted words, the HTML should look the same before and after the edit)
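
Requirements 3 and 4 pull against each other: a book-sized document means a lot of work, and doing it all in one uninterrupted run is what usually brings up the "stop script" dialog. One common way around this, sketched below, is to process the work in small batches and hand control back to the browser between batches. The function name `processInChunks` and its `schedule` parameter are hypothetical names chosen for this illustration; they are not part of the question's script.

```javascript
// Process `items` in batches of `chunkSize` (must be >= 1), yielding between
// batches so the browser can repaint and handle input during a long job.
// `schedule` is a hypothetical injection point for the yielding strategy;
// when omitted it falls back to setTimeout(fn, 0).
function processInChunks(items, handleItem, chunkSize, onDone, schedule) {
  const defer = schedule || ((fn) => setTimeout(fn, 0));
  let index = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      handleItem(items[index], index);
    }
    if (index < items.length) {
      defer(runChunk); // more work left: continue in a later task
    } else if (onDone) {
      onDone(); // everything processed
    }
  }

  runChunk();
}
```

In the browser, `items` would typically be the list of text nodes to examine and `handleItem` the function that highlights one node. The right batch size depends on the document and the machine, so it is best found by measuring.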

What I have done so far:

  1. I get the list of words in JavaScript as an array like ["are", "we", "monkey"]
  2. I try to select the text nodes in the browser (this part is currently faulty)
  3. I loop over each text node and, for each one, loop over each word in the list, try to find it, and wrap it inside an element
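
Step 2 is the part described as faulty, so here is a minimal sketch of one way to gather text nodes using only standard DOM properties (`nodeType`, `nodeName`, `childNodes`, `nodeValue`). It is not the script from the question, which is not shown here. The function name `collectTextNodes` and the list of skipped elements are assumptions made for this example.

```javascript
// Node type values defined by the DOM standard (Node.ELEMENT_NODE, Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Elements whose text should not be highlighted. This list is an assumption;
// adjust it to the documents at hand.
const SKIPPED_ELEMENTS = new Set(["SCRIPT", "STYLE", "TEXTAREA", "NOSCRIPT"]);

// Collect every non-blank text node under `root`, in document order.
// An explicit stack is used instead of recursion so that deeply nested
// markup cannot overflow the call stack.
function collectTextNodes(root) {
  const result = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === TEXT_NODE) {
      if (node.nodeValue.trim() !== "") {
        result.push(node);
      }
    } else if (
      node.nodeType === ELEMENT_NODE &&
      !SKIPPED_ELEMENTS.has(node.nodeName.toUpperCase())
    ) {
      // Push children in reverse so they are popped in document order.
      for (let i = node.childNodes.length - 1; i >= 0; i--) {
        stack.push(node.childNodes[i]);
      }
    }
  }
  return result;
}
```

In a page this would be called as `collectTextNodes(document.body)`. Because it walks the tree the browser has already parsed, unclosed or otherwise invalid tags in the source file do not need special handling here.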

Please note that you can watch it online here (username: demo@phis.ir, pass: demo). The current script can also be seen at the end of the page's source.

Recommended answer

Concatenate your words with | into a single string, interpret that string as a regex, and then replace each occurrence with the full match surrounded by the highlight tags.
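
A minimal sketch of that idea follows, under a few assumptions that go beyond the answer: the words are escaped before being joined, matches are limited to whole words with `\b`, and an optional trailing "s" is allowed so that "monkey" also covers "Monkeys" as in the question's example. The replacement is applied to the text of individual text nodes rather than to the raw HTML string, so tags and attributes are never touched. All function names here are hypothetical.

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(word) {
  return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Join the words with | into one case-insensitive, global pattern.
// Assumes a non-empty list. The optional "s" is a crude plural rule
// added for this example only.
function buildHighlightRegex(words) {
  const alternatives = words.map(escapeRegExp).join("|");
  return new RegExp("\\b(?:" + alternatives + ")s?\\b", "gi");
}

// Split `text` into consecutive pieces, flagging the ones that matched.
function splitByMatches(text, regex) {
  const pieces = [];
  let last = 0;
  for (const match of text.matchAll(regex)) {
    if (match.index > last) {
      pieces.push({ text: text.slice(last, match.index), highlight: false });
    }
    pieces.push({ text: match[0], highlight: true });
    last = match.index + match[0].length;
  }
  if (last < text.length) {
    pieces.push({ text: text.slice(last), highlight: false });
  }
  return pieces;
}

// Browser-only step: replace one text node with plain text plus
// <span class="highlight"> elements. `doc` is the page's document object.
function highlightTextNode(textNode, regex, doc) {
  const pieces = splitByMatches(textNode.nodeValue, regex);
  if (!pieces.some((piece) => piece.highlight)) return; // nothing to wrap
  const fragment = doc.createDocumentFragment();
  for (const piece of pieces) {
    if (piece.highlight) {
      const span = doc.createElement("span");
      span.className = "highlight";
      span.textContent = piece.text;
      fragment.appendChild(span);
    } else {
      fragment.appendChild(doc.createTextNode(piece.text));
    }
  }
  textNode.parentNode.replaceChild(fragment, textNode);
}
```

Building the regex once and reusing it for every text node avoids looping over the word list per node. Without the `\b` anchors, a plain `are|we|monkey` pattern would also match inside longer words such as "were" or "care", which is probably not wanted.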
