Detect URLs in a string and wrap with "<a href..." tag


Problem description




I am looking to write something that seems like it should be easy enough, but for whatever reason I'm having a tough time getting my head around it.

I am looking to write a Python function that, when passed a string, will pass that string back with each URL wrapped in an HTML anchor tag.

unencoded_string = "This is a link - http://google.com"

def encode_string_with_links(unencoded_string):
    # some sort of regex magic occurs
    return encoded_string

print(encode_string_with_links(unencoded_string))

'This is a link - <a href="http://google.com">http://google.com</a>'

Thank you!

Solution

The "regex magic" you need is just sub (which does a substitution):

def encode_string_with_links(unencoded_string):
    return URL_REGEX.sub(r'<a href="\1">\1</a>', unencoded_string)
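
To see what \1 does in the replacement string, here is a minimal sketch using a deliberately naive stand-in pattern. DEMO_REGEX and wrap_demo are hypothetical names, not part of the answer, and the pattern only matches http:// followed by non-space characters. The parentheses form group 1, and \1 re-inserts whatever that group matched:

```python
import re

# Hypothetical, deliberately naive pattern used only to demonstrate \1.
DEMO_REGEX = re.compile(r'(http://\S+)')

def wrap_demo(text):
    # \1 in the replacement refers to the text captured by the first group,
    # so the matched URL appears both as the href value and as the link text.
    return DEMO_REGEX.sub(r'<a href="\1">\1</a>', text)

print(wrap_demo("This is a link - http://google.com"))
```

Text outside the match is left untouched, so only the URL itself ends up inside the tag.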

URL_REGEX could be something like:

import re
URL_REGEX = re.compile(r'''((?:mailto:|ftp://|http://)[^ <>'"{}|\\^`[\]]*)''')

This is a pretty loose regex for URLs: it allows mailto, http and ftp schemes, and after that pretty much just keeps going until it runs into an "unsafe" character (except percent, which you want to allow for escapes). You could make it more strict if you need to. For example, you could require that percents are followed by a valid hex escape, or only allow one pound sign (for the fragment) or enforce the order between query parameters and fragments. This should be enough to get you started, though.
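
Putting the pieces together, the following is a sketch of the approach above as one runnable script, along with one possible stricter variant. STRICT_URL_REGEX and encode_strict are illustrative names that do not appear in the answer. The variant assumes "stricter" means that a percent sign only counts as part of the URL when it is followed by two hex digits:

```python
import re

# The loose pattern from the answer: a scheme, then everything up to an "unsafe" character.
URL_REGEX = re.compile(r'''((?:mailto:|ftp://|http://)[^ <>'"{}|\\^`[\]]*)''')

# Assumed stricter variant: "%" is accepted only as part of a %XX hex escape.
STRICT_URL_REGEX = re.compile(
    r'''((?:mailto:|ftp://|http://)(?:%[0-9A-Fa-f]{2}|[^ %<>'"{}|\\^`\[\]])*)'''
)

def encode_string_with_links(unencoded_string):
    # Wrap every match of the loose pattern in an anchor tag.
    return URL_REGEX.sub(r'<a href="\1">\1</a>', unencoded_string)

def encode_strict(unencoded_string):
    # Same substitution, but the match stops at a "%" that is not a valid escape.
    return STRICT_URL_REGEX.sub(r'<a href="\1">\1</a>', unencoded_string)

print(encode_string_with_links("This is a link - http://google.com"))
print(encode_strict("See http://example.com/a%20b and http://example.com/50%off"))
```

The first call prints the string asked for in the question. In the second, the strict pattern keeps %20 inside the link but ends the match before %off, so that text stays outside the tag. Two things to be aware of with either pattern: as written they do not match https:// URLs (changing http:// to https?:// would be one way to allow them), and punctuation directly after a URL, such as a sentence-ending period, is included in the match.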
