Select HTML text element with regex?

Views: 132  Published: 2019/6/7 19:58:10  Tags: javascript jquery regex html-parsing text-extraction


Question

I want to look for &copy; in an HTML document and get the entity the copyright is attributed to.

The copyright line shows up in a few different ways:

<p class="bg-copy">&copy; 2011  The New York Times Company</p>

or

<a href="http://www.nytimes.com/ref/membercenter/help/copyright.html">
&copy; 2011</a> 
<a href="http://www.nytco.com/">The New York Times Company</a>

or

<br>Published since 1996<br>Copyright &copy; CounterPunch<br>
All rights reserved.<br>

I want to ignore the dates and intervening tags and just get "The New York Times Company" or "CounterPunch".

I haven't been able to find much on using regex with JavaScript or jQuery, though I get the impression that it can lead to major headaches. If there is a better approach to this, let me know.

Recommended answer

For a robust solution, you will probably need a combination of DOM navigation and some heuristics. Your examples are solvable with regex, but many more scenarios are possible...

&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)

This works for your three samples, but ONLY for them and similar cases.

See Rubular.

Explanation:

&copy; // copyright symbol
[\s\d]* // followed by spaces or digits
(?:<\/.+?>[^>]*>)? // maybe followed by a closing tag and another opening one
([^<]*) // then match anything up to the next tag
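To make the "only for them and similar cases" caveat concrete, here is a small sketch that runs the same pattern against two inputs it does not handle. Both input strings are hypothetical and were made up for illustration; they do not come from the question.

```javascript
// Same pattern as in the answer; group 1 is the text after the symbol and year.
var pattern = /&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)/;

// Hypothetical inputs, not taken from the question.
var nestedTag = '&copy; 2011 <b>Acme Corp</b>';
var yearRange = '&copy; 2010-2011 Acme Corp';

// An opening tag right after the year is not covered by the optional group,
// so the capture stops immediately and comes back empty.
console.log(JSON.stringify(nestedTag.match(pattern)[1])); // ""

// The hyphen is neither whitespace nor a digit, so part of the range is captured.
console.log(JSON.stringify(yearRange.match(pattern)[1])); // "-2011 Acme Corp"
```

Cases like these are where extra heuristics, or walking the DOM instead of the raw markup, would probably be needed.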

See this answer on how to use it in JavaScript with jQuery. Basically, you can use the match(/regex/) function:

// result is null when nothing matches; otherwise result[1] holds the captured text
var result = string.match(/&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)/);
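As a minimal sketch of how the match call might be wrapped, the function below applies the pattern from the answer to the three samples from the question and returns the first capture group. The names COPYRIGHT_RE and extractCopyrightHolder, and the trimming step, are assumptions added for illustration and are not part of the original answer.

```javascript
// Pattern from the answer; group 1 is the text that follows the symbol and year.
var COPYRIGHT_RE = /&copy;[\s\d]*(?:<\/.+?>[^>]*>)?([^<]*)/;

// Hypothetical helper: returns the captured text, or null when nothing matches.
function extractCopyrightHolder(html) {
  var result = html.match(COPYRIGHT_RE);
  return result ? result[1].trim() : null;
}

// The three samples from the question.
var samples = [
  '<p class="bg-copy">&copy; 2011  The New York Times Company</p>',
  '<a href="http://www.nytimes.com/ref/membercenter/help/copyright.html">\n' +
    '&copy; 2011</a> \n' +
    '<a href="http://www.nytco.com/">The New York Times Company</a>',
  '<br>Published since 1996<br>Copyright &copy; CounterPunch<br>\nAll rights reserved.<br>'
];

samples.forEach(function (html) {
  console.log(extractCopyrightHolder(html));
});
```

Checking for null before reading result[1] matters because match returns null, not an empty array, when the pattern is absent.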
