Adjust regex to ignore anything else inside link HTML tags

Question

So I have this regex:

<a(?:.*)href="(.*)"(?:.*)>(.*)<\/a>

So far I have been able to get it to match HTML link tags that have extra attributes in them, such as classes and targets, and that works.

What I now want to do is adjust it so that it matches and ignores any other tags inside the link itself (if there are any), as I only want the text of the link along with the address. I am unsure about the best way to do this.
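
To make the problem concrete, the following sketch runs the regex above against a link that contains a nested tag. The sample markup is made up for illustration. Because every `.*` is greedy, the wildcard after the `href` value swallows the nested `<b>` element, and the second group is left with only the trailing text.

```javascript
// The regex from the question, unchanged.
const linkRegex = /<a(?:.*)href="(.*)"(?:.*)>(.*)<\/a>/;

// Hypothetical sample: one extra attribute plus a nested <b> tag.
const sample = '<a class="btn" href="page.html"><b>Hi</b> there</a>';

const match = sample.match(linkRegex);

console.log(match[1]); // "page.html"
console.log(match[2]); // " there" (the "<b>Hi</b>" part was consumed by the greedy (?:.*) before ">")
```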

Solution

Always Use DOM Parsing instead of regex

This has been suggested many times. Judging by the comments on the increasingly complicated regex that was taking shape, it is easier to just examine the DOM. Take the following example:

function fragmentFromString(strHTML) {
  return document.createRange().createContextualFragment(strHTML);
}

let html = `<a data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>" href="testing.html" data-id="1" data-popup-text="take me to <a href='http://www.google.com'>a search engine</a>"><p>Testing <span>This</span></p></a>`;
let fragment = fragmentFromString(html);
let aTags = Array.from(fragment.querySelectorAll('a'));

aTags = aTags.map(a => {
  return {
    href: a.href,
    text: a.textContent
  }
});

console.log(aTags);

The above turns a string of HTML into actual DOM nodes inside a fragment. You would still need to append that fragment somewhere if you want it displayed, but the point is that you can already query the a tags. The code gives you an array of objects, one per a tag, each holding the tag's href value (resolved to an absolute URL by a.href) and its textContent, minus all the HTML.
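
As a variation on the same idea, here is a minimal sketch that uses `DOMParser` instead of a range, so nothing is tied to the live page. The helper name `extractLinks` and the sample markup are made up for this example. It reads the attribute with `getAttribute('href')`, which returns the value as written in the markup rather than the resolved URL that `a.href` gives.

```javascript
// Hypothetical helper: collect href and visible text for every <a href> under `root`.
// `root` can be a Document, a DocumentFragment, or an Element.
function extractLinks(root) {
  return Array.from(root.querySelectorAll('a[href]'), a => ({
    href: a.getAttribute('href'), // raw attribute value, not resolved
    text: a.textContent.trim()    // text of all descendants, tags stripped
  }));
}

// DOMParser exists only in browsers, so guard the demo.
if (typeof DOMParser !== 'undefined') {
  const html = '<a class="btn" href="testing.html" data-id="1"><p>Testing <span>This</span></p></a>';
  const doc = new DOMParser().parseFromString(html, 'text/html');
  console.log(extractLinks(doc));
}
```

Keeping the extraction in a function that accepts any queryable root means the same code also works on the fragment produced by `fragmentFromString` above.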


Original answer. Don't use it; it stays only as context for the real problem:

I changed the pattern slightly to use non-greedy quantifiers (.*?). It also avoids ending the match early when an attribute value contains closing HTML, as pointed out by @Gaby aka G. Petrioli.

<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>
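
A short sketch of how this pattern behaves, using made-up sample strings. It shows two reasons the regex is not recommended: the text group still contains any nested markup, and because `(?:[^"]*")+` requires at least one more quoted value after `href`, a link whose `href` is the last attribute does not match at all.

```javascript
// The regex from the original answer, unchanged.
const lazyRegex = /<.*?href="(.*?)"(?:[^"]*")+>(.*)<\/a>/;

// Hypothetical sample 1: href followed by another attribute, with a nested tag.
const withTarget = '<a class="x" href="page.html" target="_blank"><b>Hi</b></a>';
const m1 = withTarget.match(lazyRegex);
console.log(m1[1]); // "page.html"
console.log(m1[2]); // "<b>Hi</b>" (nested markup is still present)

// Hypothetical sample 2: href is the last attribute.
const hrefLast = '<a class="x" href="page.html">text</a>';
console.log(hrefLast.match(lazyRegex)); // null
```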

