How to highlight the search result of a text query within an HTML document, ignoring the HTML tags?


Problem description

I have a string that contains HTML content, something like this:

  const text = 'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.';

I render this inside a React component using dangerouslySetInnerHTML. The text is very long and contains many different kinds of HTML tags.

I want to search for a word and highlight it in that document as the user types. The functionality is similar to the browser's find (cmd + f) feature: matches should be highlighted while typing.

This is what I am looking for:

 user types `an`
 const text = 'My name is Alan and I <span>an</span> <div class="someClass">artist</div>.';
result: "My name is Al<mark>an</mark> and I <span><mark>an</mark></span> <div class="someClass">artist</div>."

I tried the library https://github.com/bvaughn/react-highlight-words, but it also highlights text inside the tags and breaks the markup:

result: "My name is Al<mark>an</mark> and I <sp<mark>an</mark>><mark>an</mark></span> <div class="someClass">artist</div>."

Then I thought I would use my own regex and came up with this:

const regex = new RegExp(((`${searchedText}`)(?![^<>]*>)))

but React (ESLint) throws this error at the ?:

This experimental syntax requires enabling the parser plugin: 'partialApplication'

Here is my code:

get highlightedText() {
      if (searchText === '') return self.renderedText;
      const regex = new RegExp((`${searchText}`)((?![^<>]*>)));
      const parts = self.renderedText.split(regex);
      return parts
         .map(part => (regex.test(part) ? `<mark>${part}</mark>` : part))
         .join('');
    },

I am not sure what I am doing wrong. The regex worked perfectly when I tested it on regextester.com.

Any help is appreciated. Thanks!

Recommended answer

A regex-based approach that manipulates HTML markup at the string level only works for strictly valid, non-nested markup, such as the example given by the OP:

const text = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

Such an approach will not work for nested HTML markup like the following:

const text = 'My name is Alan and I\'m <span><em>an</em></span> <div><em>artist</em></div>.'

As for the OP's use case: in order not to accidentally manipulate any HTML markup, a regex needs to match and remember the opening and closing tags as well as the enclosed text content. This calls for capturing groups.

Here is an example regex that uses named groups:

const test = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);

[...test.matchAll(regXSimpleMarkup)].forEach((match, idx) =>
  console.log(`match ${ idx } :: groups : `, match.groups)
);

console.log([...test.matchAll(regXSimpleMarkup)]);


However, as the output of the code above shows, this does not match or capture the text content before or after an HTML tag. One should therefore combine a capturing regex with split:

const test = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

// const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);
const regXSimpleMarkup = (/(<[^>]+>)([^<]+)(<\/[^>]+>)/g);

console.log(test.split(regXSimpleMarkup));


As shown above, for the OP's example the result is a cleanly separated list of markup fragments. This list can now be processed step by step: the search-and-replace mechanism (find the substring and wrap it in highlighting markup) is applied only to fragments that are text content, while each iteration step also appends the fragment to the new HTML markup string that is being built.

//  How to escape regular expression special characters using javascript?
//
//  [https://stackoverflow.com/questions/3115150/how-to-escape-regular-expression-special-characters-using-javascript/9310752#9310752]
//
function escapeRegExpSearchString(text) {
  return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
}


function createTextSearchMarkup(fragment, search, isCaseSensitive) {
  const flags = `g${ !!isCaseSensitive ? '' : 'i' }`;

  search = escapeRegExpSearchString(search);
  search = RegExp(`(${ search })`, flags);

  return fragment.replace(search, '<mark>$1</mark>');
}

function concatTextSearchMarkup(collector, fragment) {
  const regXTag = (/^<[^>]+>$/);

  if (!regXTag.test(fragment)) {

    fragment = createTextSearchMarkup(
      fragment,
      collector.search,
      collector.isCaseSensitive
    );
  }
  collector.markup = [collector.markup, fragment].join(''); // concat.

  return collector;
}

function getHighlightTextSearchMarkup(markup, search, isCaseSensitive) {
//const regXSimpleMarkup = (/(?<tagStart><[^>]+>)(?<text>[^<]+)(?<tagEnd><\/[^>]+>)/g);
  const regXSimpleMarkup = (/(<[^>]+>)([^<]+)(<\/[^>]+>)/g);

  return markup.split(regXSimpleMarkup).reduce(
    concatTextSearchMarkup, {
      isCaseSensitive,
      search,
      markup: ''
    }
  ).markup;
}


const markup = 'My name is Alan and I\'m <span>an</span> <div class="someClass">artist</div>.'

console.log('original markup => ', markup);

console.log(
  'case insensitive search for "an" => ',
  getHighlightTextSearchMarkup(markup, 'an')
);
console.log(
  'case insensitive search for "i" => ',
  getHighlightTextSearchMarkup(markup, 'i')
);
console.log(
  'case sensitive search for "i" => ',
  getHighlightTextSearchMarkup(markup, 'i', true)
);


Note

For any nested markup within HTML template strings, one needs an approach that takes advantage of the browser's native HTML parsing, e.g. via an HTML (fragment) node that is never part of the document's rendered DOM.
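
One way this could look is sketched below. It is an illustration under assumptions, not the answer's own implementation: the helper names `splitByMatches` and `highlightInMarkup` are hypothetical, and `highlightInMarkup` only runs where a DOM is available (a browser). The markup is parsed into a `<template>` element, whose content is a detached fragment, then only the text nodes are visited with a `TreeWalker` and each match is wrapped in a `<mark>` element.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Pure helper (no DOM needed): split plain text into match / non-match pieces.
function splitByMatches(text, search, isCaseSensitive = false) {
  if (search === '') return [{ text, isMatch: false }];
  const regX = new RegExp(`(${escapeRegExp(search)})`, isCaseSensitive ? 'g' : 'gi');
  return text
    .split(regX)
    // one capturing group: odd indices are the matches
    .map((piece, idx) => ({ text: piece, isMatch: idx % 2 === 1 }))
    .filter(piece => piece.text !== '');
}

// Browser-only: highlight matches in text nodes, leaving all tags untouched.
function highlightInMarkup(markup, search, isCaseSensitive = false) {
  const template = document.createElement('template');
  template.innerHTML = markup; // parsed into template.content, never rendered

  // Collect the text nodes first, because replacing nodes while walking
  // would disturb the traversal.
  const walker = document.createTreeWalker(template.content, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  for (const node of textNodes) {
    const pieces = splitByMatches(node.nodeValue, search, isCaseSensitive);
    if (!pieces.some(piece => piece.isMatch)) continue;

    const fragment = document.createDocumentFragment();
    for (const piece of pieces) {
      if (piece.isMatch) {
        const mark = document.createElement('mark');
        mark.textContent = piece.text;
        fragment.appendChild(mark);
      } else {
        fragment.appendChild(document.createTextNode(piece.text));
      }
    }
    node.replaceWith(fragment);
  }
  return template.innerHTML; // serialized markup including the <mark> elements
}
```

In a browser, the returned string could then be passed to dangerouslySetInnerHTML as before. Since the parser does the tag handling, nesting depth and attribute contents do not matter here; matches that span several text nodes are still not found.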
