C# Regex: find an href that contains a specific word in a string
Question
How can I find an href attribute whose value includes a specific word inside a string?
I tried the pattern "href=([?;.:=%-\/\\\'\"]+[a-zA-Z]*[blablabla][?;.:=%-\/\\\'\"]+[a-zA-Z]*$)", but it doesn't match anything.
Thanks.
Recommended answer
I strongly advise against using regex in this case. An HTML parser makes the task much easier.
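For context on why the pattern in the question fails on typical input: in .NET regex syntax square brackets form a character class, so `[blablabla]` matches exactly one character out of b, l and a, not the word itself. The pattern also requires at least one punctuation character right after `href=`, and the trailing `$` ties the match to the end of the input. The sketch below only illustrates the character-class point; the class name `CharClassDemo`, the helper `FullMatch` and the sample strings are made up for illustration, and it is not a suggestion to parse HTML this way.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: true only if the whole input matches the pattern.
    public static bool FullMatch(string input, string pattern) =>
        Regex.IsMatch(input, "^(?:" + pattern + ")$");

    public static void Main()
    {
        // Character class: a single 'b' is accepted...
        Console.WriteLine(FullMatch("b", "[blablabla]"));         // True
        // ...but the nine-letter word is not, because a class consumes one character.
        Console.WriteLine(FullMatch("blablabla", "[blablabla]")); // False
        // Written without brackets, the word is matched literally.
        Console.WriteLine(FullMatch("blablabla", "blablabla"));   // True
    }
}
```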
Here is an example of how it can be done with HtmlAgilityPack. Install it via Solution > Manage NuGet Packages for Solution... and use:
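// Requires: using System; using System.Collections.Generic;
// Place this method inside a class of your own.
public List<string> HtmlAgilityPackGetHrefIfValueContains(string html, string href_text)
{
    var hrefs = new List<string>();
    HtmlAgilityPack.HtmlDocument hap;
    Uri uriResult;
    // Accept https as well as http; the original check covered http only.
    if (Uri.TryCreate(html, UriKind.Absolute, out uriResult)
        && (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps))
    {   // html is a URL: download and parse the page
        var web = new HtmlAgilityPack.HtmlWeb();
        hap = web.Load(uriResult.AbsoluteUri);
    }
    else
    {   // html is a string of markup
        hap = new HtmlAgilityPack.HtmlDocument();
        hap.LoadHtml(html);
    }
    // Select every element that carries an href attribute
    var nodes = hap.DocumentNode.SelectNodes("//*[@href]");
    if (nodes != null) // SelectNodes returns null when nothing matches
    {
        foreach (var node in nodes)
        {
            foreach (var attribute in node.Attributes)
            {
                if (attribute.Name == "href" && attribute.Value.Contains(href_text))
                {
                    hrefs.Add(attribute.Value);
                }
            }
        }
    }
    return hrefs;
}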
Now you can pass either an HTML string or the URL of a web page as html and get back the href values of all tags whose href contains href_text. If you only want the hrefs of a tags, use the //a[@href] XPath instead.
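As a variation on the XPath note above, the substring test can also be moved into the XPath expression itself with the XPath 1.0 contains() function. This is a minimal sketch, assuming HtmlAgilityPack is installed from NuGet. The class name `HrefXPathDemo`, the method name `GetAnchorHrefsContaining` and the sample markup are hypothetical, and the word is assumed to contain no single quote because it is embedded in an XPath string literal.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class HrefXPathDemo
{
    // Hypothetical helper: returns the href of every <a> whose href contains `word`.
    public static List<string> GetAnchorHrefsContaining(string html, string word)
    {
        // Assumption: the word is spliced into an XPath literal, so reject single quotes.
        if (word.Contains("'"))
            throw new ArgumentException("word must not contain a single quote", nameof(word));

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        var nodes = doc.DocumentNode.SelectNodes("//a[contains(@href, '" + word + "')]");
        if (nodes == null) return hrefs; // SelectNodes returns null when nothing matches

        foreach (var node in nodes)
            hrefs.Add(node.GetAttributeValue("href", ""));
        return hrefs;
    }

    public static void Main()
    {
        var html = "<a href=\"/docs/blablabla.html\">x</a><a href=\"/other\">y</a>";
        foreach (var href in GetAnchorHrefsContaining(html, "blablabla"))
            Console.WriteLine(href);
    }
}
```

Filtering in XPath keeps the C# loop trivial, while the version above that checks attribute.Value in code avoids the quoting restriction.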