Adding http:// to all links without a protocol

Question

I use VB.NET and would like to add http:// to all links that don't already start with http://, https://, ftp:// and so on.

"I want to add http here <a href=""www.google.com"" target=""_blank"">Google</a>,
but not here <a href=""http://www.google.com"" target=""_blank"">Google</a>."

It was easy when I just had the links, but I can't find a good solution for an entire string containing multiple links. I guess RegEx is the way to go, but I wouldn't even know where to start.

I can find the RegEx myself, it's the parsing and prepending I'm having problems with. Could anyone give me an example with Regex.Replace() in C# or VB.NET?

Thanks for any help!

Answer

Quoting RFC 1738:

"Scheme names consist of a sequence of characters. The lower case letters "a"--"z", digits, and the characters plus ("+"), period ("."), and hyphen ("-") are allowed. For resiliency, programs interpreting URLs should treat upper case letters as equivalent to lower case in scheme names (e.g., allow "HTTP" as well as "http")."

Excellent! A regex to match:

/^[a-zA-Z0-9+.-]+:\/\//

If that matches your href string, continue on. If not, prepend "http://". Remaining sanity checks are yours unless you ask for specific details. Do note the other commenters' thoughts about relative links.
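
A minimal sketch of that check, written in Python only because it is easy to run (the names `SCHEME_RE` and `ensure_scheme` are hypothetical, not from the question or the answer). It applies the regex above to a single href value and prepends "http://" when no scheme is found.

```python
import re

# Scheme pattern from the answer: RFC 1738 scheme characters followed by "://".
SCHEME_RE = re.compile(r"^[a-zA-Z0-9+.-]+://")

def ensure_scheme(href: str, default: str = "http://") -> str:
    """Return href unchanged if it already starts with a scheme, else prefix default."""
    if SCHEME_RE.match(href):
        return href            # e.g. http://, https://, ftp://, svn+ssh://
    return default + href      # no scheme found: prepend the default one

print(ensure_scheme("www.google.com"))
print(ensure_scheme("http://www.google.com"))
print(ensure_scheme("ftp://example.org/file"))
```

Because the pattern requires "://", a relative href such as `/about` would become `http:///about`, and a scheme without slashes such as `mailto:` is not recognized, so those cases belong to the remaining sanity checks.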

I'm starting to suspect that you've asked the wrong question... that you perhaps don't have anything that splits the text up into the individual tokens you need to handle it. See "Looking for C# HTML parser".

As a blind attempt that ignores all of that and just attacks the raw text, use case-insensitive matching with:

/(<a +href *= *")(.*?)(" *>)/

If the second captured group matches /^[a-zA-Z0-9+.-]+:\/\//, do nothing. If it does not match, replace the whole match with

$1 + "http://" + $2 + $3

This isn't C# syntax, but it should translate across without too much effort.
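
Since the question asks for a `Regex.Replace()` example, here is a hedged sketch of the same idea in Python, whose `re.sub` accepts a per-match callback much like the .NET `Regex.Replace` overload that takes a `MatchEvaluator` delegate. The names `ANCHOR_RE`, `_fix_anchor` and `add_http` are hypothetical. The pattern is adapted from the answer, with two changes of my own: `\s` replaces the literal spaces, and `([^"]*)(")` replaces `(.*?)(" *>)` so that group 2 holds only the URL even when other attributes such as `target` follow. Like the original, it still assumes `href` is the first attribute and is double-quoted.

```python
import re

# Scheme check from the answer: RFC 1738 scheme characters followed by "://".
SCHEME_RE = re.compile(r"^[a-zA-Z0-9+.-]+://")

# Adapted from the answer's pattern (assumption: href is the first attribute
# and its value is double-quoted). IGNORECASE gives the case-insensitive match.
ANCHOR_RE = re.compile(r'(<a\s+href\s*=\s*")([^"]*)(")', re.IGNORECASE)

def _fix_anchor(m: re.Match) -> str:
    prefix, url, suffix = m.group(1), m.group(2), m.group(3)
    if SCHEME_RE.match(url):
        return m.group(0)                      # already has a scheme: leave it alone
    return prefix + "http://" + url + suffix   # $1 + "http://" + $2 + $3

def add_http(html: str) -> str:
    """Prepend http:// to every matched href value that lacks a scheme."""
    return ANCHOR_RE.sub(_fix_anchor, html)

text = ('I want to add http here <a href="www.google.com" target="_blank">Google</a>,\n'
        'but not here <a href="http://www.google.com" target="_blank">Google</a>.')
print(add_http(text))
```

In C# or VB.NET the same structure should carry over to `Regex.Replace(input, pattern, evaluator, RegexOptions.IgnoreCase)`, with the evaluator doing the scheme check.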