确认Python 2.6 ftplib不支持Unicode文件名?备择方案? [英] Confirm that Python 2.6 ftplib does not support Unicode file names? Alternatives?

查看:182
本文介绍了确认Python 2.6 ftplib不支持Unicode文件名?备择方案?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有人可以确认Python 2.6 ftplib不支持Unicode文件名吗? 或者必须对Unicode文件名进行特殊编码才能与ftplib模块一起使用?

以下电子邮件交换似乎支持我的结论:ftplib模块只支持ASCII文件名称。



ftplib应该使用UTF-8而不是latin-1编码吗?
http://mail.python.org/ pipermail / python-dev / 2009-January / 085408.html



任何支持Unicode文件名的第三方Python FTP模块的建议?我没有成功搜索这个问题[1],[2]。
$ b

官方的Python文档没有提到Unicode文件名[3]。 b
$ b

谢谢,
Malcolm

[1] ftputil包裹了ftplib并继承了ftplib的明显的ASCII唯一支持?



[2] Paramiko的SFTP库支持Unicode文件名,但是我正在寻找相对于我们当前项目的ftp(vs sftp)支持。



[3] http://docs.python.org/ library / ftplib.html



解决方法:



Encodings.idna.ToASCII和.ToUnicode方法可用于将Unicode路径名转换为ASCII格式。如果使用这些函数包装所有远程路径名称和dir / nlst方法的输出,则可以使用标准ftplib创建一种保留Unicode路径名的方法(并且还保留文件系统上的Unicode文件名支持Unicode路径)。这种技术的缺点是服务器上的其他进程在引用上传到服务器的文件时也必须使用encodings.idna。顺便说一句:我明白,这是一个滥用encodings.idna图书馆。

谢谢彼得和鲍勃您的意见,我发现非常有帮助。

解决方案

ftplib 没有任何Unicode的知识。它的目的是为文件名传递字节串,当它被询问目录列表时它将返回字节串。这些是从服务器传递到/返回的字节的确切字符串。

如果将Unicode字符串传递给 ftplib 在Python 2.x中,当它被发送到底层套接字对象时,它最终会被强制转换为字节。这种强制使用Python的默认编码,即。为了安全起见,US-ASCII为非ASCII字符产生了异常。

你链接到的python-dev消息正在谈论 ftplib 在Python 3.x中,其中字符串是默认的Unicode。这使得像 ftplib 这样的模块在一个棘手的情况下,因为尽管他们现在在前端使用了Unicode字符串,但它背后的实际协议是基于字节的。因此,必须有一个额外的编码/解码级别,如果没有明确的干预来指定正在使用的编码,就会有一个公平的改变,它会选择错误的。

ftplib 选择默认为ISO-8859-1,以便将每个字节保留为Unicode字符串内的字符。不幸的是,在目标服务器为文件名使用UTF-8排序规则(无论FTP守护进程本身是否知道文件名是UTF-8,通常不会)的情况下,这会给出意想不到的结果。有很多这样的情况,Python标准库被粗暴地砍成Unicode字符串,带来负面的后果;包括Python 3在内的电池仍然在泄漏腐蚀性液体IMO。

Can someone confirm that Python 2.6 ftplib does NOT support Unicode file names? Or must Unicode file names be specially encoded in order to be used with the ftplib module?

The following email exchange seems to support my conclusion that the ftplib module only supports ASCII file names.

Should ftplib use UTF-8 instead of latin-1 encoding? http://mail.python.org/pipermail/python-dev/2009-January/085408.html

Any recommendations on a 3rd party Python FTP module that supports Unicode file names? I've googled this question without success [1], [2].

The official Python documentation does not mention Unicode file names [3].

Thank you, Malcolm

[1] ftputil wraps ftplib and inherits ftplib's apparent ASCII only support?

[2] Paramiko's SFTP library does support Unicode file names, however I'm looking specifically for ftp (vs. sftp) support relative to our current project.

[3] http://docs.python.org/library/ftplib.html

WORKAROUND:

The encodings.idna.ToASCII and .ToUnicode methods can be used to convert Unicode path names to an ASCII format. If you wrap all your remote path names and the output of the dir/nlst methods with these functions, then you can create a way to preserve Unicode path names using the standard ftplib (and also preserve Unicode file names on file systems that don't support Unicode paths). The downside to this technique is that other processes on the server will also have to use encodings.idna when referencing the files that you upload to the server. BTW: I understand that this is an abuse of the encodings.idna library.

Thank you Peter and Bob for your comments which I found very helpful.

解决方案

ftplib has no knowledge of Unicode whatsoever. It is intended to be passed byte-strings for filenames, and it'll return byte strings when asked for a directory list. Those are the exact strings of bytes passed-to/returned-from the server.

If you pass a Unicode string to ftplib in Python 2.x, it'll end up getting coerced to bytes when it's sent to the underlying socket object. This coercion uses Python's ‘default’ encoding, ie. US-ASCII for safety, with exceptions generated for non-ASCII characters.

The python-dev message to which you linked is talking about ftplib in Python 3.x, where strings are Unicode by default. This leaves modules like ftplib in a tricky situation because although they now use Unicode strings at their front-end, the actual protocol behind it is byte-based. There therefore has to be an extra level of encoding/decoding involved, and without explicit intervention to specify what encoding is in use, there's a fair change it'll choose wrong.

ftplib in 3.x chose to default to ISO-8859-1 in order to preserve each byte as a character inside the Unicode string. Unfortunately this will give unexpected results in the common case where the target server uses a UTF-8 collation for filenames (whether or not the FTP daemon itself knows that filenames are UTF-8, which it commonly won't). There are a number of cases like this where the Python standard libraries have been brutally hacked to Unicode strings with negative consequences; Python 3's batteries-included are still leaking corrosive fluid IMO.

这篇关于确认Python 2.6 ftplib不支持Unicode文件名?备择方案?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆