Regex in C#: extracting the URL from an <a> tag

Views: 59 | Published: 2020/11/2 21:49:41 | Tags: c# regex url extract

Question

I am trying to extract the URL from an <a> tag. However, instead of getting https://website.com/-id1, I am getting the tag's link text. Here is my code:

string text = "<a style=\"font - weight: bold; \" href=\"https://website.com/-id1\">MyLink</a>";
string parsed = Regex.Replace(text, " <[^>] + href =\"([^\"]+)\"[^>]*>", "$1 ");
parsed = Regex.Replace(parsed, "<[^>]+>", "");
Console.WriteLine(parsed);

The result I get is MyLink, which is not what I want. I want something like:

https://website.com/-id1

Any help or a link would be highly appreciated.

Recommended answer

Regular expressions can be used with HTML in very specific, simple cases. For example, if the text contains only a single tag, you can use "href\\s*=\\s*\"(?<url>.*?)\"" to extract the URL, e.g.:

var url = Regex.Match(text, "href\\s*=\\s*\"(?<url>.*?)\"").Groups["url"].Value;

This pattern returns:

https://website.com/-id1

This regex doesn't do anything fancy. It looks for href= with optional whitespace around the equals sign, then captures everything between the first double quote and the next one in a non-greedy manner (.*?). The result is captured in the named group url.
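As a rough illustration, the one-liner above can be placed in a small console program. This is only a sketch: the class name Program and the Success check are additions for illustration and are not part of the original answer. The check is there because Groups["url"].Value is an empty string when nothing matches, which would otherwise be indistinguishable from an empty href.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Input string taken from the question.
        string text = "<a style=\"font - weight: bold; \" href=\"https://website.com/-id1\">MyLink</a>";

        // href, optional whitespace around '=', then a lazy capture of
        // everything up to the next double quote into the named group "url".
        Match match = Regex.Match(text, "href\\s*=\\s*\"(?<url>.*?)\"");

        // Check Success so that "no href found" is not confused with
        // an href whose value is empty.
        if (match.Success)
        {
            Console.WriteLine(match.Groups["url"].Value);
        }
        else
        {
            Console.WriteLine("No href attribute found.");
        }
    }
}
```

For the input from the question, this prints https://website.com/-id1.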

Anything fancier gets complex very quickly. For example, supporting both single and double quotes requires special handling to avoid starting on a single quote and ending on a double quote. The string could also contain multiple <a> tags that use both types of quotes.
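One possible form of that special handling is to capture the opening quote in its own group and require the same character to close the value with a backreference. The sketch below is an assumption about how this could be done, not the answer's own code: the group name quote, the sample input, and the second URL (https://website.com/-id2) are made up for illustration. It still ignores unquoted attribute values and would also match attributes whose names merely end in href, which is part of why the complexity grows quickly.

```csharp
using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        // Hypothetical input: two <a> tags, one per quote style.
        string text = "<a href='https://website.com/-id1'>One</a> "
                    + "<a href=\"https://website.com/-id2\">Two</a>";

        // (?<quote>["']) remembers which quote opened the value;
        // \k<quote> requires the same quote character to close it.
        string pattern = "href\\s*=\\s*(?<quote>[\"'])(?<url>.*?)\\k<quote>";

        // Regex.Matches returns every non-overlapping match, one per tag here.
        foreach (Match match in Regex.Matches(text, pattern))
        {
            Console.WriteLine(match.Groups["url"].Value);
        }
    }
}
```

With this input, the loop prints the two URLs in order, one per line.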

For complex parsing, it would be better to use a library like AngleSharp or HtmlAgilityPack.
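A minimal sketch of the parser-based approach with HtmlAgilityPack might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced by the project; the XPath query //a[@href] and the overall program structure are illustrative choices, not something the original answer specifies.

```csharp
using System;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is installed

class Program
{
    static void Main()
    {
        // Input string taken from the question.
        string text = "<a style=\"font - weight: bold; \" href=\"https://website.com/-id1\">MyLink</a>";

        var doc = new HtmlDocument();
        doc.LoadHtml(text);

        // Select every <a> element that has an href attribute.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                // The second argument is the default returned if the attribute is missing.
                Console.WriteLine(link.GetAttributeValue("href", string.Empty));
            }
        }
    }
}
```

Because the library parses the markup itself, quote style, attribute order, and multiple tags no longer need to be encoded in a pattern.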
