Remove HTML entities and extract text content using regex

Views: 37 Published: 2021/7/6 19:40:55 regex

Problem description

I have a text that contains HTML entities such as &lt; and &nbsp;. I need to remove all of them and get just the text content:

&nbspHello there&lt;testdata&gt;

So, I need to get Hello there and testdata from this section. Is there any way of using negative lookahead to do this?

I tried the following: /((?!&.+;).)+/ig but this doesn't seem to work very well. So, how can I extract just the required text from there?
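To see what the attempted pattern actually returns, here is a minimal Python sketch. It assumes the JavaScript-style /ig flags map to re.IGNORECASE plus re.finditer, and the helper name attempt_matches is made up for illustration. The negative lookahead only rejects the position of the & itself, so the rest of each entity name is still consumed as if it were text.

```python
import re

# The pattern from the question. re.IGNORECASE stands in for the /i flag,
# and finditer plays the role of the /g flag.
ATTEMPT = re.compile(r"((?!&.+;).)+", re.IGNORECASE)


def attempt_matches(text):
    """Return every full match of the attempted pattern (hypothetical helper)."""
    return [m.group(0) for m in ATTEMPT.finditer(text)]


# Sample string copied verbatim from the question.
sample = "&nbspHello there&lt;testdata&gt;"
print(attempt_matches(sample))
# ['nbspHello there', 'lt;testdata', 'gt;']
```

Only the & characters are skipped; nbsp, lt; and gt; all leak into the matches, which is why the result does not look like clean text.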

Recommended answer

Here are 2 suggestions:

1) Match all the entities using /(&.+;)/ig. Then, in whatever programming language you are using, replace those matches with an empty string. For example, in PHP use preg_replace; in C# use Regex.Replace. Note that .+ is greedy, so when a line holds several entities this pattern matches everything from the first & to the last ;. A narrower pattern such as /&#?\w+;/ig matches each entity on its own. See this Stack Overflow question for a similar solution that accounts for more cases: How to remove html special chars?

2) If you really want to do this by matching the plaintext portions, you could try something like this: /(?:^|;)([^&;]+)(?:&|$)/ig. What it's actually trying to do is match the pieces between ; and &, with special cases for the start and end of the string where there is no entity. This is probably not the way to go; you're likely to run into cases that this breaks.
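The two suggestions can be sketched in Python as follows. This is an illustration under stated assumptions, not the answerer's code: the sample is assumed to carry a semicolon after &nbsp, the entity pattern uses a character class (&#?\w+;) in place of the greedy .+ so that each entity is matched separately, and the function names are invented for the example.

```python
import re

# Assumption: the question's sample, with a semicolon added after "&nbsp".
SAMPLE = "&nbsp;Hello there&lt;testdata&gt;"

# Suggestion 1, narrowed: one entity per match instead of the greedy "&.+;".
ENTITY = re.compile(r"&#?\w+;")

# Suggestion 2, copied from the answer: text runs between ";" and "&".
PLAINTEXT = re.compile(r"(?:^|;)([^&;]+)(?:&|$)")


def strip_entities(text):
    """Delete every entity-shaped token, keeping the remaining text."""
    return ENTITY.sub("", text)


def text_pieces(text):
    """Split on entities and keep the non-blank plain-text pieces."""
    return [piece for piece in ENTITY.split(text) if piece.strip()]


def plaintext_runs(text):
    """Collect the captured runs matched by the suggestion-2 pattern."""
    return PLAINTEXT.findall(text)


print(strip_entities(SAMPLE))           # Hello theretestdata
print(text_pieces(SAMPLE))              # ['Hello there', 'testdata']
print(plaintext_runs(SAMPLE))           # ['Hello there', 'testdata']

# The greedy original swallows the whole line, from the first "&" to the last ";".
print(repr(re.sub(r"&.+;", "", SAMPLE)))  # ''

# One case that breaks suggestion 2: a literal ";" in the text drops "one".
print(plaintext_runs("one; two&amp;"))  # [' two']
```

Splitting on the entity pattern keeps Hello there and testdata as separate pieces, which is what the question asks for; deleting the entities outright glues the two pieces together.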
