Adjust regex to ignore anything else inside link HTML tags

Views: 106 Posted: 2018/6/25 18:26:30 Tags: javascript html regex


Problem description


So I have this regex:
<a(?:.*)href="(.*)"(?:.*)>(.*)<\/a>
So far I have been able to get it to match HTML link tags that have extra attributes in them, such as classes and targets, and that works.

What I now want to do is adjust it so that it matches and ignores any other tags inside the link itself (if there are any), as I only want the text of the link along with the address. I am unsure about the best way to do this.
Solution
Always Use DOM Parsing instead of regex

This has been suggested many times. Judging by the comments on the increasingly complicated regex being built, it would be easier to just examine the DOM. Take the following for example:

// Parse a string of HTML into a DocumentFragment.
function fragmentFromString(strHTML) {
  return document.createRange().createContextualFragment(strHTML);
}

// The attribute values deliberately contain a nested link to show that the parser copes with it.
let html = `<a data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>" href="testing.html" data-id="1" data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>">Testing This</a>`;

let fragment = fragmentFromString(html);
let aTags = Array.from(fragment.querySelectorAll('a'));

// Keep only the address and the plain text of each link.
aTags = aTags.map(a => {
  return {
    href: a.href,
    text: a.textContent
  };
});

console.log(aTags);

The above will turn a string of HTML into actual DOM inside of a fragment. You would still need to append that fragment somewhere if you want it in the page, but the point is that you can now query the a tags. The above code gives you an array of objects, one per a tag, each holding the tag's href value and its text content (textContent), minus all the HTML.

Original answer. Don't use it; it stays here to serve as context for the real problem:

I changed this a little to use a non-greedy format (.*?). It also avoids ending the match early when an attribute value contains closing HTML, as pointed out by @Gaby aka G. Petrioli.
<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>
Check out the JS fiddle
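
To make the limits of that pattern concrete, here is a minimal sketch of what it captures. The sample string and the names linkPattern, stripTags and extractLink are made up for illustration and are not from the original post; the tag stripping is a naive assumption, not a robust HTML cleaner.

```javascript
// Pattern from the original answer above (kept only as context).
const linkPattern = /<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>/;

// Hypothetical sample input, not taken from the original post.
const sample = '<a class="nav" href="testing.html" data-id="1">Testing <b>This</b></a>';

// Hypothetical helper: naively removes anything that looks like a tag.
function stripTags(fragment) {
  return fragment.replace(/<[^>]*>/g, '');
}

// Returns the captured address and text, or null when the pattern does not match.
function extractLink(html) {
  const match = linkPattern.exec(html);
  if (match === null) {
    return null;
  }
  return {
    href: match[1],             // first capture group: the href value
    rawText: match[2],          // second capture group: still contains inner tags
    text: stripTags(match[2])   // inner tags removed in a separate step
  };
}

console.log(extractLink(sample));
```

In this sketch the second capture group still contains the inner b tag, so the tags have to be stripped in a separate pass. The pattern also appears to need at least one quoted attribute after href, so a plain link such as <a href="x.html">text</a> is not matched at all, and with several links in one string the greedy parts can run from one link into the next. That fragility is the reason the DOM approach above is preferred.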
