Regex for extracting URLs from a string fails when the string contains multiple double quotes

Question

I am using a regex to extract URLs from a string, and it mostly works:

var regex=new Regex("<a [^>]*href=(?:'(?<href>.*?)')|(?:\"(?<href>.*?)\")",RegexOptions.IgnoreCase);

The following strings work fine:

"This is Test page <a href='test.aspx'>test page</a>"
"This is Test page <a href='test1.aspx'>test</a> another one <a href='test2.aspx'>test</a>"
"This is Tests\"s page <a href='test1.aspx'>test</a> another one <a href='test2.aspx'>test</a>"
"This is Test page"
"This is Test page\"s without problem"

But sometimes it does not return a good result. The following code returns a bad result (the string contains two double quotes):

var inputString = "This string create \"problem\" for me";
var regex = new Regex("<a [^>]*href=(?:'(?<href>.*?)')|(?:\"(?<href>.*?)\")", RegexOptions.IgnoreCase);
var urls = regex.Matches(inputString).OfType<Match>().Select(m => m.Groups["href"].Value);
foreach (var url in urls) {
  Console.WriteLine(url);
}

Could anyone help me solve this problem?

Recommended answer

Maybe you can change your regex like this:

<a .*?href=(?:['"](?<href>[^'"]*?)['"])

In C#, with the double quotes escaped:

"<a .*?href=(?:['\"](?<href>[^'\"]*?)['\"])"
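For context on why this helps: in the original pattern the top-level `|` has the lowest precedence, so the regex reads as "an `<a ... href='...'` tag" OR "any double-quoted text", and `"problem"` matches the second branch on its own. The suggested pattern keeps both quote styles behind the `href=` prefix. Below is a minimal sketch of how it could be wired up; the class name `HrefExtractor` and the method `ExtractHrefs` are hypothetical names chosen for illustration and are not part of the original answer. Note that this pattern uses `.*?` where the question used `[^>]*`, so it may be less strict about staying inside a single tag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class HrefExtractor
{
    // Pattern from the answer: a quote (single or double) is only accepted
    // directly after "href=", so a quoted word elsewhere cannot match alone.
    private static readonly Regex HrefRegex = new Regex(
        "<a .*?href=(?:['\"](?<href>[^'\"]*?)['\"])",
        RegexOptions.IgnoreCase);

    // Returns the value of the named group "href" for every match.
    public static List<string> ExtractHrefs(string input)
    {
        return HrefRegex.Matches(input)
            .Cast<Match>()
            .Select(m => m.Groups["href"].Value)
            .ToList();
    }

    public static void Main()
    {
        var inputs = new[]
        {
            // No anchor tag: no href values are extracted.
            "This string create \"problem\" for me",
            // Stray double quote plus two anchors: both hrefs are extracted.
            "This is Tests\"s page <a href='test1.aspx'>test</a> another one <a href='test2.aspx'>test</a>",
        };

        foreach (var input in inputs)
        {
            var hrefs = ExtractHrefs(input);
            Console.WriteLine(hrefs.Count + ": " + string.Join(", ", hrefs));
        }
    }
}
```

For anything beyond simple, well-formed snippets, an HTML parser is usually a more robust choice than a regex; the sketch only covers the cases shown in the question.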