How do I handle Python unicode strings with null-bytes the 'right' way?

Question

Problem

It seems that PyWin32 is happy to return null-terminated unicode strings. I would like to deal with these strings the 'right' way.

Let's say I'm getting a string like: u'C:\\Users\\Guest\\MyFile.asy\x00\x00sy'. This appears to be a C-style null-terminated string hanging out in a Python unicode object. I want to trim this bad boy down to a regular ol' string of characters that I could, for example, display in a window title bar.
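To see why this breaks ordinary code, note that Python does not treat '\x00' as a terminator: the NUL and everything after it are ordinary characters, so string and path functions operate on the leftovers too. This is a small Python 3 sketch using the string above; `ntpath` is used only so that Windows path rules apply on any OS.

```python
import ntpath  # Windows path rules, usable on any OS

# The string from the question: the real path, two NULs, then leftover text.
raw = 'C:\\Users\\Guest\\MyFile.asy\x00\x00sy'

# '\x00' is an ordinary character in a Python string, not a terminator.
print(len(raw))                       # counts both NULs and the trailing 'sy'
print(raw.endswith('.asy'))           # False: the string ends in 'sy'
print(repr(ntpath.basename(raw)))     # the leftovers are part of the name
print(repr(ntpath.splitext(raw)[1]))  # the "extension" includes the NULs

try:
    open(raw)
except (TypeError, ValueError) as exc:
    # CPython rejects paths with an embedded NUL before touching the disk
    # (ValueError on recent versions, TypeError on some older ones).
    print('open() refused the path:', exc)
```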

Is trimming the string off at the first null byte the right way to deal with it?

I didn't expect to get a return value like this, so I wonder if I'm missing something important about how Python, Win32, and unicode play together... or if this is just a PyWin32 bug.

Background

I'm using the Win32 file chooser functions from the PyWin32 package (GetOpenFileNameW, and GetSaveFileNameW in the demo below). According to the documentation, these functions return a tuple containing the full filename path as a Python unicode object.

When I open the dialog with an existing path and filename set, I get a strange return value.

For example, I set the default to: C:\\Users\\Guest\\MyFileIsReallyReallyReallyAwesome.asy

In the dialog I changed the name to MyFile.asy and clicked save.

The full path part of the return value was: u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'

I expected it to be: u'C:\\Users\\Guest\\MyFile.asy'

The function is returning a recycled buffer without trimming off the terminating bytes. Needless to say, the rest of my code wasn't set up for handling a C-style null-terminated string.
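The "recycled buffer" explanation can be illustrated without Windows using `ctypes`. This Python 3 sketch assumes the dialog writes into a pre-filled, fixed-size wide-character buffer; it is not PyWin32's actual code, and it does not reproduce the exact leftover characters seen above, only the general mechanism.

```python
import ctypes

# A fixed-size wide-character buffer pre-filled with the default file name,
# standing in for the buffer a C API writes into (illustration only).
buf = ctypes.create_unicode_buffer('MyFileIsReallyReallyReallyAwesome.asy')

# The "dialog" stores a shorter name. Assigning .value copies the characters
# and writes a terminating NUL, but leaves the old characters after it intact.
buf.value = 'MyFile.asy'

whole_buffer = buf[:]  # every slot in the buffer, NULs and leftovers included
c_string = buf.value   # stops at the first NUL, as C code would

print(repr(whole_buffer))
print(repr(c_string))
```

Returning the whole buffer exposes the stale characters after the terminator, while reading only up to the first NUL, as `buf.value` does, yields the intended name.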

Demo code

The following code demonstrates the null-terminated string in the return value from GetSaveFileNameW.

Directions: In the dialog change the filename to 'MyFile.asy' then click Save. Observe what is printed to the console. The output I get is u'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'.

import win32con
import win32gui

if __name__ == "__main__":
    initial_dir = 'C:\\Users\\Guest'
    initial_file = 'MyFileIsReallyReallyReallyAwesome.asy'
    # Filter entries are NUL-separated: description\0pattern\0
    filter_string = 'All Files\0*.*\0'
    # Returns a (filename, customfilter, flags) tuple
    filename, customfilter, flags = win32gui.GetSaveFileNameW(
        InitialDir=initial_dir,
        Flags=win32con.OFN_EXPLORER,
        File=initial_file,
        DefExt='txt',
        Title="Save As",
        Filter=filter_string,
        FilterIndex=0)
    # repr() makes any embedded NULs (\x00) visible
    print(repr(filename))

Note: If you don't shorten the filename enough (for example, if you try MyFileIsReally.asy), the returned string is correct and contains no null byte.

Environment

Windows 7 Professional 64-bit (no service pack), Python 2.7.1, PyWin32 Build 216

Update: PyWin32 tracker artifact

Based on the comments and answers I have received so far, this is likely a pywin32 bug so I filed a tracker artifact.

Update 2: Fixed!

Mark Hammond reported in the tracker artifact that this is indeed a bug. A fix was checked in to rev f3fdaae5e93d, so hopefully that will make the next release.

I think Aleksi Torhamo's answer below is the best solution for versions of PyWin32 before the fix.

Recommended answer

I'd say it's a bug. The right way to deal with it would probably be fixing pywin32, but in case you aren't feeling adventurous enough, just trim it.

You can get everything before the first '\x00' with filename.split('\x00', 1)[0].
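A minimal sketch of that approach wrapped in helpers, written to run on both Python 2.7 and 3. `trim_at_null` and `trim_dialog_result` are hypothetical names, and the wrapper assumes the `(filename, customfilter, flags)` tuple layout that the demo code unpacks. Because a string without a NUL comes back unchanged, the helper should be harmless on PyWin32 versions that already include the fix.

```python
def trim_at_null(s):
    """Return the part of s before the first NUL character.

    A string without a NUL is returned unchanged.
    """
    return s.split('\x00', 1)[0]


def trim_dialog_result(result):
    """Hypothetical wrapper for a GetSaveFileNameW/GetOpenFileNameW result.

    Assumes the (filename, customfilter, flags) layout unpacked in the demo.
    """
    filename, customfilter, flags = result
    return trim_at_null(filename), customfilter, flags


garbled = 'C:\\Users\\Guest\\MyFile.asy\x00wesome.asy'
print(repr(trim_at_null(garbled)))         # 'C:\\Users\\Guest\\MyFile.asy'
print(repr(garbled.partition('\x00')[0]))  # same result via str.partition
print(repr(garbled.rstrip('\x00')))        # unchanged: the NUL is not trailing
```

In the demo this would look like `filename, customfilter, flags = trim_dialog_result(win32gui.GetSaveFileNameW(...))`. Note that `rstrip('\x00')` does not help here, because the stale characters come after the NUL rather than the NUL sitting at the end.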