用URL链接替换文本中的URL [英] replace URLs in text with links to URLs

查看:375
本文介绍了用URL链接替换文本中的URL的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用Python我希望将文本正文中的所有网址替换为指向这些网址的链接,就像Gmail所做的那样。
这可以用单行正则表达式完成吗?

Using Python I want to replace all URLs in a body of text with links to those URLs, like what Gmail does. Can this be done in a one liner regular expression?

编辑:按文本正文我只是指纯文本 - 没有HTML

by body of text I just meant plain text - no HTML

推荐答案

您可以使用DOM / HTML解析库(请参阅html5lib)加载文档,获取所有文本节点,将它们与正则表达式匹配并替换使用PCRE的正则表达式替换URI及其周围的锚点的文本节点,例如:

You can load the document up with a DOM/HTML parsing library ( see html5lib ), grab all text nodes, match them against a regular expression and replace the text nodes with a regex replacement of the URI with anchors around it using a PCRE such as:

/(https?:[;\/?\\@&=+$,\[\]A-Za-z0-9\-_\.\!\~\*\'\(\)%][\;\/\?\:\@\&\=\+\$\,\[\]A-Za-z0-9\-_\.\!\~\*\'\(\)%#]*|[KZ]:\\*.*\w+)/g

我很确定你可以匆匆找到某种实用工具,我不能想到我的头脑中的任何一个。

I'm quite sure you can scourge through and find some sort of utility that does this, I can't think of any off the top of my head though.

编辑:尝试使用答案他re:我如何获得python-markdown另外urlify格式化纯文本时的链接?

Try using the answers here: How do I get python-markdown to additionally "urlify" links when formatting plain text?

import re

urlfinder = re.compile("([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}|((news|telnet|nttp|file|http|ftp|https)://)|(www|ftp)[-A-Za-z0-9]*\\.)[-A-Za-z0-9\\.]+):[0-9]*)?/[-A-Za-z0-9_\\$\\.\\+\\!\\*\\(\\),;:@&=\\?/~\\#\\%]*[^]'\\.}>\\),\\\"]")

def urlify2(value):
    return urlfinder.sub(r'<a href="\1">\1</a>', value)

在字符串上调用urlify2,如果你没有处理DOM对象,我认为就是这样。

call urlify2 on a string and I think that's it if you aren't dealing with a DOM object.

这篇关于用URL链接替换文本中的URL的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆