os.listdir, gets unicode, returns unicode... USUALLY?!?!?


Problem description





from the documentation (http://docs.python.org/lib/os-file-dir.html) for
os.listdir:

"On Windows NT/2k/XP and Unix, if path is a Unicode object, the result
will be a list of Unicode objects."

i'm on Unix (linux, ubuntu edgy).

so it seems that it does not always return unicode filenames.

it seems that it tries to interpret the filenames using the filesystem's
encoding, and if that fails, it simply returns the filename as a byte-string.

so you get back, let's say, an array of 21 filenames, of which 3 are
byte-strings and the rest unicode strings.

after digging around, i found this in the source code:


#ifdef Py_USING_UNICODE
    if (arg_is_unicode) {
        PyObject *w;

        w = PyUnicode_FromEncodedObject(v,
                Py_FileSystemDefaultEncoding,
                "strict");
        if (w != NULL) {
            Py_DECREF(v);
            v = w;
        }
        else {
            /* fall back to the original byte string, as
               discussed in patch #683592 */
            PyErr_Clear();
        }
    }
#endif
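
The fallback in the C snippet can be mimicked in pure Python, which makes the mixed result easier to see. This is only a sketch: `decode_or_keep` is a hypothetical helper (not part of `os`), the sample names are made up, and Python 3's `bytes`/`str` stand in for the Python 2 `str`/`unicode` pair discussed in this thread.

```python
def decode_or_keep(raw_names, encoding):
    """Mimic the listdir fallback: decode strictly, keep the bytes on failure.

    Hypothetical helper for illustration; raw_names is a list of bytes
    objects as they would come from the filesystem.
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding, "strict"))
        except UnicodeDecodeError:
            # Same idea as PyErr_Clear() in the C code: swallow the error
            # and fall back to the original byte string.
            result.append(raw)
    return result


# One valid UTF-8 name and one Latin-1 name that is not valid UTF-8.
names = decode_or_keep([b"\xe7\x8c\xab.txt", b"caf\xe9.txt"], "utf-8")
print([type(n).__name__ for n in names])
```

The caller gets back one list holding two different types, which is exactly the situation described above.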




so if the to-unicode-conversion fails, it falls back to the original
byte-string. i went and read the patch-discussion.

and now i'm not sure what to do.
i know that:

1. the documentation is completely wrong. it does not always return
unicode filenames
2. it's true that the documentation does not specify what happens if the
filename is not in the filesystem-encoding, but i simply expected that i
would get a Unicode-exception, as everywhere else. you see, exceptions are
ok, i can deal with them. but this is just plain wrong. from now on,
EVERYWHERE where i use os.listdir, i will have to go through all the
filenames in it, and check if they are unicode-strings or not.
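
The per-name check described in point 2 could look roughly like this. It is a sketch under assumptions: `split_names` is a hypothetical helper, the listing is invented, and Python 3's `str`/`bytes` are used in place of Python 2's `unicode`/`str`.

```python
def split_names(names):
    """Separate decoded names from names that stayed as raw bytes."""
    decoded = [n for n in names if isinstance(n, str)]
    undecoded = [n for n in names if isinstance(n, bytes)]
    return decoded, undecoded


# Made-up mixed result, shaped like what os.listdir is described as returning.
listing = ["a.txt", b"caf\xe9.txt", "b.txt"]
decoded, undecoded = split_names(listing)
print(decoded, undecoded)
```

Every caller that wants only unicode names would need a step like this after each `os.listdir` call, which is the burden being complained about.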

so basically i'd like to ask here: am i reading something incorrectly?
or am i using os.listdir the "wrong way"? how do other people deal with
this?

p.s: one additional note. if your code expects os.listdir to return
unicode, that usually means that all your code uses unicode strings.
which in turn means that those filenames will somehow later interact
with unicode strings. which means that the byte-string filename will
probably get auto-converted to unicode at a later point, and that
auto-conversion will VERY probably fail, because the auto-convert only
happens using 'ascii' as the encoding, and if it was not possible to
decode the filename inside listdir, it's quite probable that it also
will not work using 'ascii' as the charset.
gabor
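
The point in the p.s. can be checked directly: a name that cannot be decoded with the filesystem encoding will usually not decode as ASCII either. A small sketch, assuming a UTF-8 filesystem encoding and a made-up Latin-1 filename; `can_decode` is a hypothetical helper.

```python
def can_decode(raw, encoding):
    """Return True if the byte string decodes cleanly with the given encoding."""
    try:
        raw.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


raw_name = b"caf\xe9.txt"  # Latin-1 bytes, invented for illustration
results = {enc: can_decode(raw_name, enc) for enc in ("utf-8", "ascii", "latin-1")}
print(results)
```

Only the encoding the name was actually written in succeeds, so an implicit ASCII conversion later in the program has no better chance than the strict decode inside listdir had.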

Recommended answer



"gabor" <ga***@nekomancer.net> wrote in message
news:ed**************************@news.flashnewsgroups.com...

> so if the to-unicode-conversion fails, it falls back to the original
> byte-string. i went and read the patch-discussion.
>
> and now i'm not sure what to do.
> i know that:
>
> 1. the documentation is completely wrong. it does not always return
> unicode filenames




Unless someone says otherwise, report the discrepancy between doc and code
as a bug on the SF tracker. I have no idea of what the resolution should
be ;-).

tjr

gabor schrieb:

> so basically i'd like to ask here: am i reading something incorrectly?




You are reading it correctly. This is how it behaves.


> or am i using os.listdir the "wrong way"? how do other people deal with
> this?




You didn't say why the behavior causes a problem for you - you only
explained what the behavior is.

Most people use os.listdir in a way like this:

    for name in os.listdir(path):
        full = os.path.join(path, name)
        attrib = os.stat(full)
        if some-condition:
            f = open(full)
            ...

All this code will typically work just fine with the current behavior,
so people typically don't see any problem.

Regards,
Martin
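
One way to keep a loop like this free of mixed types is to stay in byte strings from start to finish: given a byte-string path, `os.listdir` returns byte-string names, so nothing is decoded and `os.path.join` never combines two types. The following is a sketch under assumptions, not something proposed in this thread: it targets Python 3 (`os.fsencode` does not exist in Python 2.4), and `list_sizes` is a hypothetical helper.

```python
import os
import tempfile


def list_sizes(path):
    """Return {name: size} for a directory, working in bytes throughout."""
    bpath = os.fsencode(path)  # byte-string path in, byte-string names out
    sizes = {}
    for name in os.listdir(bpath):
        full = os.path.join(bpath, name)  # bytes + bytes, no implicit decoding
        sizes[name] = os.stat(full).st_size
    return sizes


# Try it on a throwaway directory containing one small file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "a.txt"), "w") as f:
        f.write("hello")
    sizes = list_sizes(tmp)
print(sizes)
```

The trade-off is that the names stay undecoded, so they still have to be decoded (with some error policy) before being shown to a user.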


Martin v. Löwis wrote:

> gabor schrieb:
>> or am i using os.listdir the "wrong way"? how do other people deal with
>> this?



> You didn't say why the behavior causes a problem for you - you only
> explained what the behavior is.
>
> Most people use os.listdir in a way like this:
>
>     for name in os.listdir(path):
>         full = os.path.join(path, name)
>         attrib = os.stat(full)
>         if some-condition:
>             f = open(full)
>             ...
>
> All this code will typically work just fine with the current behavior,
> so people typically don't see any problem.




i am sorry, but it will not work. actually this is exactly what i did,
and it did not work. it dies in the os.path.join call, where file_name
is converted into unicode. and python uses 'ascii' as the charset in
such cases. but, because listdir already failed to decode the file_name
with the filesystem-encoding, it usually also fails when tried with 'ascii'.

example:


>>> dir_name = u'something'
>>> unicode_file_name = u'\u732b.txt' # the japanese cat-symbol
>>> bytestring_file_name = unicode_file_name.encode('utf-8')
>>> import os.path
>>> os.path.join(dir_name, unicode_file_name)
u'something/\u732b.txt'
>>> os.path.join(dir_name, bytestring_file_name)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib/python2.4/posixpath.py", line 65, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 1:
ordinal not in range(128)
>>>




gabor
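
A join that tolerates both kinds of names can be sketched by decoding any byte-string name explicitly before joining, instead of relying on the implicit ASCII conversion that fails in the session above. This is an illustration only: `safe_join` is a hypothetical helper, the UTF-8 default is an assumption, and the code uses Python 3's `bytes`/`str` (where joining mixed types raises a TypeError rather than a UnicodeDecodeError).

```python
import posixpath


def safe_join(dir_name, name, encoding="utf-8"):
    """Join a text directory with a name that may be text or raw bytes."""
    if isinstance(name, bytes):
        # Undecodable bytes become U+FFFD instead of raising.
        name = name.decode(encoding, "replace")
    return posixpath.join(dir_name, name)


joined = safe_join("something", "\u732b.txt".encode("utf-8"))
print(joined == "something/\u732b.txt")
```

A name that needed replacement characters no longer matches the file on disk, so a path built this way is suitable for display or logging, not for opening the file.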

