Distinguish a filename from a URL

Question

In WeasyPrint’s public API I accept either filenames or URLs (among other types) for the HTML input:

document = HTML(filename='/foo/bar/baz.html')
document = HTML(url='http://example.net/bar/baz.html')

There is also the option of not naming the argument and letting WeasyPrint guess its type:

document = HTML(sys.argv[1])

Some cases are easy: if the string starts with a / on Unix it’s a filename, and if it starts with http:// it’s probably a URL. But we need a general algorithm that gives an answer for any string.

Currently I try to match the regexp ^([a-z][a-z0-9.+-]*): against the input. A string that matches starts with a valid URI scheme according to RFC 3986 (URI). This is not bad on Unix, but it fails utterly on Windows: C:\foo\bar.html matches and is treated like a URL.

I could change the * to + in the regexp and only match URI schemes that are at least two characters long. Apparently there is no known URI scheme shorter than that, whereas a Windows drive letter is always a single character.
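The two-character rule described above could be sketched roughly as follows. The names URL_SCHEME_RE and looks_like_url are made up for illustration and are not part of WeasyPrint. The digit class is written as 0-9, following the scheme grammar in RFC 3986, and the case-insensitive flag is an added assumption (RFC 3986 treats schemes as case-insensitive).

```python
import re

# RFC 3986 scheme: a letter followed by letters, digits, "+", "-" or ".".
# The "+" quantifier requires at least two characters before the colon,
# so a Windows drive letter such as "C:" is not mistaken for a scheme.
# (Hypothetical name; IGNORECASE is an assumption, not from the question.)
URL_SCHEME_RE = re.compile(r'^([a-z][a-z0-9.+-]+):', re.IGNORECASE)


def looks_like_url(string):
    """Return True if the string starts with a scheme of two or more characters."""
    return URL_SCHEME_RE.match(string) is not None
```

With this sketch, 'http://example.net/bar/baz.html' is classified as a URL, while both '/foo/bar/baz.html' and r'C:\foo\bar.html' fall through to the filename case.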

Or is there a better criterion? Maybe I should just restrict "guessed" URLs to a handful of schemes. More exotic cases can still use HTML(url=foo).

url.startswith(('http:', 'https:', 'ftp:', 'data:'))
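As a small sketch of that whitelist idea: str.startswith accepts a tuple of prefixes (passing a list raises TypeError), so the check can be wrapped like this. GUESSED_URL_SCHEMES and guess_input_type are hypothetical names, and lowercasing the input first is an assumption made so that 'HTTP://...' is also recognized.

```python
# Hypothetical whitelist holding the four schemes listed in the question.
GUESSED_URL_SCHEMES = ('http:', 'https:', 'ftp:', 'data:')


def guess_input_type(string):
    """Return 'url' for whitelisted schemes, 'filename' for everything else."""
    # str.startswith takes a single string or a tuple of strings, not a list.
    if string.lower().startswith(GUESSED_URL_SCHEMES):
        return 'url'
    return 'filename'
```

Anything outside the whitelist, including Windows paths such as C:\foo\bar.html, is treated as a filename, and callers with other schemes would name the argument explicitly.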

Recommended answer

If you really must guess between filenames and URLs, I'd say a string that starts with two or more word characters followed by a colon is a URL, and anything else is a file, just as you suggest.

Another option: try to open it as a file. If that fails, try to open it as a URL.
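A minimal sketch of that fallback, using only the standard library. The function name open_guessed is hypothetical, and this is not how WeasyPrint itself is implemented. It assumes that any OSError from open() means "not a usable file", which may be too broad for some applications (a permission error on a real file would also trigger the URL attempt).

```python
from urllib.request import urlopen


def open_guessed(string):
    """Try the string as a local file first, then fall back to a URL."""
    try:
        return open(string, 'rb')
    except OSError:
        # Not an openable file: let urlopen decide whether it is a URL.
        # urlopen raises ValueError for strings without a URL scheme.
        return urlopen(string)
```

One consequence of this order is that a local file always wins over a URL with the same spelling, and a string that is neither produces the error from urlopen rather than the original file error.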

Better still might be to listen to the Zen of Python: "In the face of ambiguity, refuse the temptation to guess." Doesn't the caller know whether they're talking about a filename or a URL? Have them specify it.
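To illustrate the "have them specify it" approach, here is a rough sketch of an entry point that refuses to guess. The function resolve_source is invented for this example and only mirrors the filename= and url= keywords shown in the question; it makes no claim about WeasyPrint's actual behavior.

```python
def resolve_source(*, filename=None, url=None):
    """Require exactly one explicitly named source and return (kind, value)."""
    candidates = (('filename', filename), ('url', url))
    given = [(kind, value) for kind, value in candidates if value is not None]
    if len(given) != 1:
        raise TypeError('pass exactly one of filename= or url=')
    return given[0]
```

Because both parameters are keyword-only, a call such as resolve_source(sys.argv[1]) fails immediately with a TypeError instead of being silently misclassified.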
