Get "Title" attribute from html link using Regex

Views: 136 Published: 2018/6/26 20:42:01 Tags: c# .net html regex

Problem description

I have the following Regex to match all link tags on a page generated from our custom cms
<a\s+((?:(?:\w+\s*=\s*)(?:\w+|"[^"]*"|'[^']*'))*?\s*href\s*=\s*(?<url>\w+|"[^"]*"|'[^']*')(?:(?:\s+\w+\s*=\s*)(?:\w+|"[^"]*"|'[^']*'))*?)>.+?</a>
We are using C# to loop through all matches of this and add an onclick event to each link (for tracking software) before rendering the page content. I need to parse each link and pass the "link name" as a parameter to the onclick function.

I was going to modify the regex to capture the following subgroups:

1. The title attribute of the link
2. If the link contains an image tag, the alt text of the image
3. The text of the link

I can then check each subgroup in turn to acquire the relevant name for the link.

How would I modify the above regex to do this, or could I achieve the same thing using C# code?
Solution
Regexes are fundamentally bad at parsing HTML (see "Can you provide some examples of why it is hard to parse XML and HTML with a regex?" for why). What you need is an HTML parser. See "Can you provide an example of parsing HTML with your favorite parser?" for examples using a variety of parsers.

In particular you may be interested in the HTMLAgilityPack answer.
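As a rough illustration of the parser-based approach, the following C# sketch adds the tracking onclick using LINQ to XML (System.Xml.Linq) instead of a regex. It assumes the CMS emits well-formed XHTML without a default xmlns namespace, which real-world HTML often is not; HtmlAgilityPack, as suggested above, tolerates malformed markup. The tracking function name trackLink is hypothetical, and the name fallback order (title, then image alt text, then link text) simply follows the order listed in the question.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Sketch only: assumes the CMS output is well-formed XHTML with no default
// xmlns namespace, so System.Xml.Linq can parse it. Real-world HTML needs a
// tolerant parser such as HtmlAgilityPack instead.
public static class LinkTracker
{
    // Collapses runs of whitespace (including newlines) into single spaces.
    static string Normalize(string s) =>
        string.Join(" ", s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries));

    // Picks the link name in the order the question lists:
    // title attribute, then alt text of a nested <img>, then the link text.
    public static string ResolveLinkName(XElement anchor)
    {
        string title = (string)anchor.Attribute("title");
        if (!string.IsNullOrWhiteSpace(title))
            return Normalize(title);

        string alt = anchor.Descendants("img")
            .Select(img => (string)img.Attribute("alt"))
            .FirstOrDefault(a => !string.IsNullOrWhiteSpace(a));
        if (alt != null)
            return Normalize(alt);

        // XElement.Value concatenates all descendant text nodes.
        return Normalize(anchor.Value);
    }

    // Adds onclick="trackLink('<name>')" to every <a> that has an href.
    // trackLink is a hypothetical tracking function used for illustration.
    public static string AddTracking(string xhtml)
    {
        XElement root = XElement.Parse(xhtml);
        var links = root.DescendantsAndSelf("a")
            .Where(a => a.Attribute("href") != null)
            .ToList();
        foreach (XElement a in links)
        {
            // Escape backslashes and single quotes for a JS string literal;
            // LINQ to XML escapes the attribute value itself on output.
            string js = ResolveLinkName(a).Replace("\\", "\\\\").Replace("'", "\\'");
            a.SetAttributeValue("onclick", $"trackLink('{js}')");
        }
        return root.ToString(SaveOptions.DisableFormatting);
    }
}

public static class Program
{
    public static void Main()
    {
        string html = "<div>"
            + "<a href=\"/about\" title=\"About us\">About</a>"
            + "<a href=\"/home\"><img src=\"logo.png\" alt=\"Home\" /></a>"
            + "<a href=\"/contact\">Contact\n  page</a>"
            + "</div>";
        Console.WriteLine(LinkTracker.AddTracking(html));
    }
}
```

Working on a parsed tree means attribute order, quoting style and nested tags no longer matter, which is exactly where the regex in the question breaks down. For arbitrary, possibly malformed pages, the same traversal can be written against HtmlAgilityPack.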
