How to highlight the search-result of a text-query within an html document ignoring the html tags?
Question
I have a string which has html content in it. Something like this
const text = 'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.';
I render this inside a React component using dangerouslySetInnerHTML. This text is really long and has different types of HTML tags in it.
I want to search for a word and highlight it in that document as the user is typing. The functionality is similar to the browser's find (cmd + f) feature. As you type the text should get highlighted.
This is what I am looking for:
user types `an`
const text = 'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.';
result: "My name is Al<mark>an</mark> and I <span><mark>an</mark></span> <div class="someClass">artist</div>."
I tried using this library https://github.com/bvaughn/react-highlight-words but the issue is it highlights the text inside the tags too and messes up the content.
result: "My name is Al<mark>an</mark> and I <sp<mark>an</mark>><mark>an</mark></span> <div class="someClass">artist</div>."
Then I thought I'd use my own regex and came up with this one:
const regex = new RegExp(((`${searchedText}`)(?![^<>]*>)))
but React (ESLint) throws this error at the ? character:
This experimental syntax requires enabling the parser plugin: 'partial Application'
Here is my code:
get highlightedText() {
  if (searchText === '') return self.renderedText;
  const regex = new RegExp((`${searchText}`)((?![^<>]*>)));
  const parts = self.renderedText.split(regex);
  return parts
    .map(part => (regex.test(part) ? `<mark>${part}</mark>` : part))
    .join('');
},
I am not sure what I am doing wrong. The regex works perfectly fine as I tested the regex using regextester.com
Any help is appreciated. Thanks!
Recommended answer
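A note on the parser error first: it most likely arises because the pattern in the question is passed to RegExp as bare JavaScript rather than as a string, so the parser tries to read `(?` as JavaScript syntax. Wrapping the whole pattern in a single template literal avoids this. The following is a minimal sketch under that assumption. `escapeRegExp` and `highlightOutsideTags` are hypothetical helper names, and the lookahead only holds up as long as `<` and `>` never appear inside attribute values or text content.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: wraps every match that is not inside a tag in <mark>.
function highlightOutsideTags(markup, searchText) {
  if (searchText === '') return markup;
  // The complete pattern, lookahead included, is one string.
  // (?![^<>]*>) rejects a match that is followed by ">" before any "<",
  // which is the case for text inside a tag such as <span>.
  const regex = new RegExp(`(${escapeRegExp(searchText)})(?![^<>]*>)`, 'gi');
  return markup.replace(regex, '<mark>$1</mark>');
}

const sample =
  'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.';
console.log(highlightOutsideTags(sample, 'an'));
```

Using replace directly also sidesteps calling test repeatedly on a regex, which becomes stateful once the g flag is set. Note that the "an" in "and" gets highlighted as well, since it is a plain substring match.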
An approach based on regular expressions that manipulates html markup at string level only works for strictly valid, unnested markup, like the example given by the OP.
const text = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'
Such an approach will not work for any nested html markup like the following one ...
const text = 'My name is Alan and I\'m <span><em>an</em></span> <div><em>artist</em></div>.'
As for the OP's provided use case, in order to not accidentally manipulate any html markup, a regex needs to match and memorize opening and closing tags as well as the enclosed text content. Thus one needs to work with Capturing Groups.
An example-regex that uses Named Groups is hereby provided ...
const test = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'
const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);
[...test.matchAll(regXSimpleMarkup)].forEach((match, idx) =>
console.log(`match ${ idx } :: groups : `, match.groups)
);
console.log([...test.matchAll(regXSimpleMarkup)]);
... but as the output of the code above shows, this does not match or capture any of the text content before or after an html tag. Thus one should combine a capturing regex with split ...
const test = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'
// const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);
const regXSimpleMarkup = (/(<[^>]+>)([^<]+)(<\/[^>]+>)/g);
console.log(test.split(regXSimpleMarkup));
As shown above, for the OP's given example the result is a cleanly separated list of markup fragments. This list can now be processed step by step: the search-and-replace mechanism (search for the substring and create highlighting markup) is applied only to the fragments detected as text content, while with each iteration step the new html markup string gets built programmatically as well.
// How to escape regular expression special characters using javascript?
//
// [https://stackoverflow.com/questions/3115150/how-to-escape-regular-expression-special-characters-using-javascript/9310752#9310752]
//
function escapeRegExpSearchString(text) {
  // escape every regex metacharacter (and whitespace) with a backslash.
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}
function createTextSearchMarkup(fragment, search, isCaseSensitive) {
  const flags = `g${ !!isCaseSensitive ? '' : 'i' }`;
  search = escapeRegExpSearchString(search);
  search = RegExp(`(${ search })`, flags);
  return fragment.replace(search, '<mark>$1</mark>');
}
function concatTextSearchMarkup(collector, fragment) {
  const regXTag = (/^<[^>]+>$/);
  // only fragments that are not a tag get searched and highlighted.
  if (!regXTag.test(fragment)) {
    fragment = createTextSearchMarkup(
      fragment,
      collector.search,
      collector.isCaseSensitive
    );
  }
  collector.markup = [collector.markup, fragment].join(''); // concat.
  return collector;
}
function getHighlightTextSearchMarkup(markup, search, isCaseSensitive) {
  //const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);
  const regXSimpleMarkup = (/(<[^>]+>)([^<]+)(<\/[^>]+>)/g);
  return markup.split(regXSimpleMarkup).reduce(
    concatTextSearchMarkup, {
      isCaseSensitive,
      search,
      markup: ''
    }
  ).markup;
}
const markup = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.';
console.log('original markup => ', markup);
console.log(
  'case insensitive search for "an" => ',
  getHighlightTextSearchMarkup(markup, 'an')
);
console.log(
  'case insensitive search for "i" => ',
  getHighlightTextSearchMarkup(markup, 'i')
);
console.log(
  'case sensitive search for "i" => ',
  getHighlightTextSearchMarkup(markup, 'i', true)
);
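To see where this split-based approach stops working, one can run the same split on the nested example string. The sketch below only inspects the fragments; `nestedMarkup`, `fragments` and `mixedFragments` are illustrative names, not part of the code above.

```javascript
// Same pattern as above: opening tag, text content, closing tag.
const regXSimpleMarkup = /(<[^>]+>)([^<]+)(<\/[^>]+>)/g;
// Same tag test that decides whether a fragment is left untouched.
const regXTag = /^<[^>]+>$/;

const nestedMarkup =
  'My name is Alan and I\'m <span><em>an</em></span> <div><em>artist</em></div>.';

const fragments = nestedMarkup.split(regXSimpleMarkup);

// Fragments that are not recognized as a single tag but still contain
// markup: these would be searched as if they were plain text.
const mixedFragments = fragments.filter(
  fragment => !regXTag.test(fragment) && /[<>]/.test(fragment)
);

console.log(fragments);
console.log(mixedFragments);
```

Only the innermost tag/text/tag triples are separated cleanly. The outer tags stay glued to their neighbouring text, so a search for "an" would also hit the "an" inside the span tags of those mixed fragments.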
Note
For any nested markup within html template strings one needs an approach that takes advantage of a browser's native html parsing/rendering, e.g. via an HTML (fragment) node that at no time is part of the browser DOM.
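One possible shape of such a DOM-based approach is sketched below, assuming a browser environment: the markup is parsed into a detached template element, matches are wrapped only inside text nodes, and the result is serialized back to a string. All function names (`escapeRegExp`, `splitByMatches`, `highlightMarkup`) are hypothetical and chosen for illustration.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Pure helper: split plain text into segments and flag the matching ones.
function splitByMatches(text, search, isCaseSensitive = false) {
  if (search === '') return [{ text, isMatch: false }];
  const flags = isCaseSensitive ? 'g' : 'gi';
  const regex = new RegExp(`(${escapeRegExp(search)})`, flags);
  // With one capturing group, split puts the matches at the odd indices.
  return text
    .split(regex)
    .map((part, idx) => ({ text: part, isMatch: idx % 2 === 1 }))
    .filter(segment => segment.text !== '');
}

// Browser-only: the template content never becomes part of the page DOM.
function highlightMarkup(markup, search, isCaseSensitive = false) {
  const template = document.createElement('template');
  template.innerHTML = markup;

  // Collect the text nodes first, then mutate the tree.
  const walker = document.createTreeWalker(
    template.content,
    NodeFilter.SHOW_TEXT
  );
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  textNodes.forEach(node => {
    const segments = splitByMatches(node.nodeValue, search, isCaseSensitive);
    if (!segments.some(segment => segment.isMatch)) return;

    const fragment = document.createDocumentFragment();
    segments.forEach(({ text, isMatch }) => {
      if (isMatch) {
        const mark = document.createElement('mark');
        mark.textContent = text;
        fragment.appendChild(mark);
      } else {
        fragment.appendChild(document.createTextNode(text));
      }
    });
    node.replaceWith(fragment);
  });

  return template.innerHTML;
}
```

In the browser, `highlightMarkup(markup, 'an')` returns a string that can be passed to dangerouslySetInnerHTML as before. Because only text nodes are touched, tag names and attributes stay intact regardless of nesting depth; text inside script or style elements would be processed too, so such elements may need to be skipped.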