Regex in C#: extracting the URL from an <a> tag

Question

I am trying to extract the URL from an <a> tag; however, instead of getting https://website.com/-id1, I am getting the tag's link text. Here is my code:

string text = "<a style=\"font - weight: bold; \" href=\"https://website.com/-id1\">MyLink</a>";
string parsed = Regex.Replace(text, " <[^>] + href =\"([^\"]+)\"[^>]*>", "$1 " );
parsed = Regex.Replace(parsed, "<[^>]+>", "");
Console.WriteLine(parsed);

The result I got was MyLink, which is not what I want. I want something like:

https://website.com/-id1

Any help or a link will be highly appreciated.

Recommended answer

Regular expressions can be used with HTML in very specific, simple cases. For example, if the text contains only a single tag, you can use "href\\s*=\\s*\"(?<url>.*?)\"" to extract the URL, e.g.:

var url=Regex.Match(text,"href\\s*=\\s*\"(?<url>.*?)\"").Groups["url"].Value;

This pattern returns:

https://website.com/-id1

This regex doesn't do anything fancy. It looks for href= with optional whitespace around the equals sign, and then captures everything between the opening double quote and the next double quote in a non-greedy manner (.*?). The result is captured in the named group url.
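
As a minimal sketch of the pattern described above, the helper below wraps the match in a method and reports whether an href was found at all, since Groups["url"].Value is simply an empty string when the match fails. The names HrefExtractor and TryExtractFirstHref are hypothetical and chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class HrefExtractor
{
    // Same pattern as in the answer, written as a verbatim string:
    // "href", optional whitespace around '=', then a double-quoted value
    // captured lazily into the named group "url".
    private static readonly Regex HrefPattern =
        new Regex(@"href\s*=\s*""(?<url>.*?)""");

    // Returns true and sets url when an href is found; otherwise returns
    // false and sets url to an empty string.
    public static bool TryExtractFirstHref(string html, out string url)
    {
        Match match = HrefPattern.Match(html);
        url = match.Groups["url"].Value;
        return match.Success;
    }
}
```

Called with the text from the question, TryExtractFirstHref returns true and sets url to https://website.com/-id1.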

Anything fancier and things get very complex. For example, supporting both single and double quotes would require special handling to avoid starting on a single quote and ending on a double quote. The string could also contain multiple <a> tags that use both types of quotes.
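
To illustrate the kind of special handling this involves, here is one possible sketch that captures the opening quote in a named group and uses a backreference to require the same character as the closing quote, then collects every match with Regex.Matches. The names HrefScanner and ExtractAllHrefs are hypothetical, and the pattern still ignores cases such as unquoted attribute values or markup inside comments.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HrefScanner
{
    // (?<q>["']) captures the opening quote and \k<q> requires the same
    // character to close the value, so a value cannot start on a single
    // quote and end on a double quote.
    private static readonly Regex HrefPattern =
        new Regex(@"href\s*=\s*(?<q>[""'])(?<url>.*?)\k<q>");

    // Returns every quoted href value found in the input, in order.
    public static List<string> ExtractAllHrefs(string html)
    {
        var urls = new List<string>();
        foreach (Match match in HrefPattern.Matches(html))
        {
            urls.Add(match.Groups["url"].Value);
        }
        return urls;
    }
}
```

Even this small extension shows how quickly the pattern grows once real-world HTML variations have to be covered.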

For complex parsing, it would be better to use a library like AngleSharp or HtmlAgilityPack.
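
As a rough sketch of the library approach, the following uses HtmlAgilityPack to parse the markup and read the href attribute of each <a> element. It assumes the HtmlAgilityPack NuGet package is installed; the names LinkParser and ExtractLinks are hypothetical.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

public static class LinkParser
{
    // Parses the HTML and returns the href of every <a> element that has one.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var urls = new List<string>();
        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return urls;
        }
        foreach (HtmlNode anchor in anchors)
        {
            urls.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return urls;
    }
}
```

Because the parser handles quoting, attribute order, and multiple tags itself, the calling code does not need to change as the input HTML becomes more varied.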
