将相对网址替换为绝对网址 [英] Replace relative urls to absolute

查看:122
本文介绍了将相对网址替换为绝对网址的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个字符串形式的页面html源

I have the html source of a page in a form of string with me:

<html>
    <head>
          <link rel="stylesheet" type="text/css" href="/css/all.css" /> 
    </head>
    <body>
        <a href="/test.aspx">Test</a>
        <a href="http://mysite.com">Test</a>
        <img src="/images/test.jpg"/>
        <img src="http://mysite.com/images/test.jpg"/>
    </body>
</html>

我想将所有相对路径都转换为绝对路径。我希望输出为:

I want to convert all the relative paths to absolute. I want the output be:

<html>
    <head>
          <link rel="stylesheet" type="text/css" href="http://mysite.com/css/all.css" /> 
    </head>
    <body>
        <a href="http://mysite.com/test.aspx">Test</a>
        <a href="http://mysite.com">Test</a>
        <img src="http://mysite.com/images/test.jpg"/>
        <img src="http://mysite.com/images/test.jpg"/>
    </body>
</html>

注意:我只希望在该 string 。该字符串中已经存在的绝对值不应该被触及,它们对我来说很好,因为它们已经是绝对值了。
可以通过正则表达式或其他方式吗?

Note: I want only the relative paths to be converted to absolute ones in that string. The absolute ones which are already in that string should not be touched, they are fine to me as they are already absolute. Can this be done by regex or other means?

推荐答案

不要尝试使用以下方式解析html:正则表达式,如 https://stackoverflow.com/a/1732454/932418 https://stackoverflow.com/a/1758162/932418

使用像 HtmlAgilityPack 之类的html解析器

Use an html parser like HtmlAgilityPack instead

string html = 
@"<html>
    <head>
            <link rel=""stylesheet"" type=""text/css"" href=""/css/all.css"" /> 
    </head>
    <body>
        <a href=""/test.aspx"">Test</a>
        <a href=""http://example.com"">Test</a>
        <img src=""/images/test.jpg""/>
        <img src=""http://example.com/images/test.jpg""/>
    </body>
</html>";

StringWriter writer = new StringWriter();
string baseUrl= "http://example.com";
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);

foreach(var img in doc.DocumentNode.Descendants("img"))
{
    img.Attributes["src"].Value = new Uri(new Uri(baseUrl), img.Attributes["src"].Value).AbsoluteUri;
}

foreach (var a in doc.DocumentNode.Descendants("a"))
{
    a.Attributes["href"].Value = new Uri(new Uri(baseUrl), a.Attributes["href"].Value).AbsoluteUri;
}

doc.Save(writer);

string newHtml = writer.ToString();

这篇关于将相对网址替换为绝对网址的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆