Regular Expression to Extract the Url out of the Anchor Tag

Question

I want to extract the http link from inside anchor tags. Only links to .wmv files should be extracted.

Recommended answer

Regex:

<a\s+href\s*=\s*(?<quote>["'])(?<link>[^"']*\.wmv)\k<quote>\s*>(?<name>.*?)\s*</a>

[Note: \s* is used in several places to match the extra whitespace characters that can occur in the HTML. The dot before wmv is escaped so that it matches a literal period, and \k<quote> requires the closing quote to be the same character as the opening one.]

Sample C# code:

using System.Text.RegularExpressions;

public static class MyRegEx
{
    /// <summary>
    /// Assigns values to wmvLink and name if htmlATag matches the pattern.
    /// Matches links to .wmv files only.
    /// </summary>
    /// <returns>true on success, false otherwise</returns>
    public static bool TryGetHrefDetailsWMV(string htmlATag, out string wmvLink, out string name)
    {
        wmvLink = null;
        name = null;

        // Verbatim string: "" stands for a literal double quote inside the pattern.
        string pattern = @"<a\s+href\s*=\s*(?<quote>[""'])(?<link>[^""']*\.wmv)\k<quote>\s*>(?<name>.*?)\s*</a>";

        // Match once, case-insensitively, so that ".WMV" is accepted as well.
        Match match = Regex.Match(htmlATag, pattern, RegexOptions.IgnoreCase);
        if (!match.Success)
            return false;

        wmvLink = match.Groups["link"].Value;
        name = match.Groups["name"].Value;
        return true;
    }
}

string wmvLink, name;

MyRegEx.TryGetHrefDetailsWMV("<td><a href='/path/to/file'>Name of File</a></td>",
                out wmvLink, out name); // No match
MyRegEx.TryGetHrefDetailsWMV("<td><a href='/path/to/file.wmv'>Name of File</a></td>",
                out wmvLink, out name); // Match
MyRegEx.TryGetHrefDetailsWMV("<td><a    href='/path/to/file.wmv'   >Name of File</a></td>",
                out wmvLink, out name); // Match
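
The method above reports a single match per call. If the input is a whole page with several anchors, something along the following lines might be closer to what is needed. This is only a sketch under stated assumptions: the class name WmvLinkExtractor and the method ExtractWmvLinks are hypothetical and not part of the answer, the pattern is a variant that also tolerates other attributes inside the <a> tag, and it assumes quoted href values that end in .wmv (no query strings).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class WmvLinkExtractor
{
    // Variant of the answer's pattern (an assumption, not the author's regex):
    //  - [^>]*? lets other attributes appear before or after href
    //  - \k<quote> requires the closing quote to equal the opening quote
    //  - .*? is lazy so each match stops at the first </a>
    private static readonly Regex WmvAnchor = new Regex(
        @"<a\s+[^>]*?href\s*=\s*(?<quote>[""'])(?<link>[^""']*\.wmv)\k<quote>[^>]*>(?<name>.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

    // Returns one pair per matching anchor: Key is the link, Value is the trimmed link text.
    public static List<KeyValuePair<string, string>> ExtractWmvLinks(string html)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (Match m in WmvAnchor.Matches(html))
        {
            results.Add(new KeyValuePair<string, string>(
                m.Groups["link"].Value,
                m.Groups["name"].Value.Trim()));
        }
        return results;
    }
}
```

Calling WmvLinkExtractor.ExtractWmvLinks(html) on a page yields every anchor whose href ends in .wmv and skips the rest. A regex of this kind does not handle unquoted attribute values or nested markup reliably, so for arbitrary real-world HTML a proper HTML parser is usually the safer choice.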
