Deal with unicode usernames in python mkdtemp
I was bitten by http://bugs.python.org/issue1681974 - quoting from there:
mkdtemp fails on Windows if Windows user name has any non-ASCII characters, like ä or ö, in it. mkdtemp throws an encoding error. This seems to be because the default temp dir in Windows is
"c:\documents and settings\<user name>\local settings\temp"
The workaround the OP used is:
try:  # workaround for http://bugs.python.org/issue1681974
    return tempfile.mkdtemp(prefix=prefix)
except UnicodeDecodeError:
    tempdir = unicode(tempfile.gettempdir(), 'mbcs')
    return tempfile.mkdtemp(prefix=prefix, dir=tempdir)
I have 2 questions:
- Why should this work?
- How foolproof is this? From a similar question (see this answer: Python Popen failing to use proper encoding in Windows PowerShell) I got the notion that I should maybe use sys.stdout.encoding instead. Am I anywhere near the mark?
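On the first question, a minimal sketch of what is probably going on, assuming the failure comes from Python 2's implicit str-to-unicode coercion (which uses the ASCII codec) when a unicode prefix meets a byte-string temp dir. The path bytes and the cp1252 code page below are made-up illustrations, not values from the bug report:

```python
# Hypothetical temp dir as raw bytes, the way Python 2 would read it from
# the environment on a cp1252 system; \xe4 is "a with diaeresis" there.
raw_tempdir = b'c:\\users\\j\xe4rvi\\temp'

# Python 2 decodes byte strings with ASCII when they are combined with
# unicode, so this explicit decode mimics the implicit step that fails.
try:
    raw_tempdir.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the code page the bytes were written in succeeds. On
# Windows, unicode(tempfile.gettempdir(), 'mbcs') does the same thing
# with the system ANSI code page.
text_tempdir = raw_tempdir.decode('cp1252')
```

Once the directory is already unicode, joining it with a unicode prefix no longer needs any implicit decoding, which would explain why passing dir=tempdir sidesteps the error.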
Edit: actually the line:
print u"input encoding: %s; output encoding: %s; locale: %s" % (
    sys.stdin.encoding, getattr(sys.stdout, 'encoding', None),
    locale.getdefaultlocale())
prints
input encoding: None; output encoding: None; locale: ('ja_JP', 'cp932')
so maybe I should go for locale.getpreferredencoding() (see, for instance, subprocess.Popen with a unicode path).
Edit2: in the comments it was suggested that I encode the prefix in mbcs. Unfortunately this is not an option, as the codebase expects unicode everywhere and a byte-string path will blow up sooner or later. The code posted is a simplified fragment.
Edit3: my little workaround apparently did not work around anything - will try:
fsenc = sys.getfilesystemencoding() or 'mbcs'
return tempfile.mkdtemp(prefix=prefix.encode(fsenc)).decode(fsenc)
if there's any non-ASCII user left to test, that is.
Meanwhile - the reproducers below don't work for me:
C:\_\Python27\python.exe -u C:\__\JetBrains\PyCharm 3.4.1\helpers\pydev\pydevconsole.py 18324 18325
PyDev console: starting.
import sys; print('Python %s on %s' % (sys.version, sys.platform))
Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on win32
sys.path.extend(['C:\\Dropbox\\eclipse_workspaces\\python\\wrye-bash'])
>>> d = u'ελληνικα'.encode(sys.getfilesystemencoding()); os.environ['TEMP'] = os.path.abspath(d)
>>> import tempfile; tempfile.mkdtemp(prefix=u'x')
u'c:\\users\\mrd\\appdata\\local\\temp\\xtf3nav'
and variations...
Edit4: the directory exists in an absolute sense:
>>> d = u'ελληνικα'.encode(sys.getfilesystemencoding()); os.path.abspath(d)
'C:\\Dropbox\\eclipse_workspaces\\python\\wrye-bash\\e??????a'
>>> assert os.path.isdir(os.path.abspath(d))
Traceback (most recent call last):
File "<input>", line 1, in <module>
AssertionError
>>> d = u'ελληνικα'
>>> os.path.abspath(d)
u'C:\\Dropbox\\eclipse_workspaces\\python\\wrye-bash\\\u03b5\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03b1'
>>> assert os.path.isdir(os.path.abspath(d))
>>>
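A possible reading of the session above: the active code page cannot represent the Greek letters, so encoding replaces them and the resulting byte path names a directory that does not exist. A small sketch of that information loss, using cp1252 with the 'replace' error handler as a stand-in for the Windows-only 'mbcs' codec (an assumption; the e??????a in the session suggests 'mbcs' additionally maps a few letters to look-alikes):

```python
# The directory name from the session, written with escapes so the
# snippet does not depend on the source file encoding.
name = u'\u03b5\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03b1'

# cp1252 has no Greek letters, so every character is replaced by "?".
encoded = name.encode('cp1252', 'replace')

# Decoding the bytes again does not give the original name back, so a
# path built from them points at a different (likely missing) directory.
round_tripped = encoded.decode('cp1252')
lossy = round_tripped != name
```

If TEMP then names a directory that does not exist, tempfile appears to skip it and move on to the next candidate location, which would account for the default temp path in the mkdtemp output.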
I finally went with:
sys_fs_enc = sys.getfilesystemencoding() or 'mbcs'

@staticmethod
def tempDir(prefix=None):
    try:  # workaround for http://bugs.python.org/issue1681974 see there
        return tempfile.mkdtemp(prefix=prefix)
    except UnicodeDecodeError:
        try:
            traceback.print_exc()
            print 'Trying to pass temp dir in...'
            tempdir = unicode(tempfile.gettempdir(), sys_fs_enc)
            return tempfile.mkdtemp(prefix=prefix, dir=tempdir)
        except UnicodeDecodeError:
            try:
                traceback.print_exc()
                print 'Trying to encode temp dir prefix...'
                return tempfile.mkdtemp(
                    prefix=prefix.encode(sys_fs_enc)).decode(sys_fs_enc)
            except:
                traceback.print_exc()
                print 'Failed to create tmp dir, Bash will not function ' \
                      'correctly.'
Apparently the first try/except is sufficient, but I left the tracebacks in so I can get some more input ;)
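For comparison, a compact sketch of the same idea without the nested fallbacks: normalize both the temp dir and the prefix to text before calling mkdtemp, so byte strings and unicode never get mixed. The names to_text and make_temp_dir are hypothetical, and the 'utf-8' fallback is an assumption rather than anything taken from the bug report:

```python
import sys
import tempfile

# Encoding used for file names; the 'utf-8' fallback is an assumption.
FS_ENCODING = sys.getfilesystemencoding() or 'utf-8'


def to_text(value, encoding=FS_ENCODING):
    """Decode byte strings with the given encoding; pass text through."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def make_temp_dir(prefix=u'tmp'):
    """Create a temp directory and return its path as text."""
    tempdir = to_text(tempfile.gettempdir())
    created = tempfile.mkdtemp(prefix=to_text(prefix), dir=tempdir)
    return to_text(created)
```

Calling make_temp_dir(prefix=u'x') should return a text path in either Python 2 or Python 3; whether it covers every Windows code page setup is untested here.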