C Python API extension ignores open(errors="ignore") and keeps throwing the encoding exception anyway
Problem description
Given a file /myfiles/file_with_invalid_encoding.txt that contains invalid UTF-8:
parse this correctly
Føö»BÃ¥r
also parse this correctly
I am using the builtin Python open function from the C API, as in the following minimal example (excluding the C Python setup boilerplate):
const char* filepath = "/myfiles/file_with_invalid_encoding.txt";
PyObject* iomodule = PyImport_ImportModule( "builtins" );
if( iomodule == NULL ) {
PyErr_PrintEx(100); return;
}
PyObject* openfunction = PyObject_GetAttrString( iomodule, "open" );
if( openfunction == NULL ) {
PyErr_PrintEx(100); return;
}
PyObject* openfile = PyObject_CallFunction( openfunction,
"s", filepath, "s", "r", "i", -1, "s", "UTF8", "s", "ignore" );
if( openfile == NULL ) {
PyErr_PrintEx(100); return;
}
PyObject* iterfunction = PyObject_GetAttrString( openfile, "__iter__" );
Py_DECREF( openfunction );
if( iterfunction == NULL ) {
PyErr_PrintEx(100); return;
}
PyObject* openfileresult = PyObject_CallObject( iterfunction, NULL );
Py_DECREF( iterfunction );
if( openfileresult == NULL ) {
PyErr_PrintEx(100); return;
}
PyObject* fileiterator = PyObject_GetAttrString( openfile, "__next__" );
Py_DECREF( openfileresult );
if( fileiterator == NULL ) {
PyErr_PrintEx(100); return;
}
PyObject* readline;
std::cout << "Here 1!" << std::endl;
while( ( readline = PyObject_CallObject( fileiterator, NULL ) ) != NULL ) {
std::cout << "Here 2!" << std::endl;
std::cout << PyUnicode_AsUTF8( readline ) << std::endl;
Py_DECREF( readline );
}
PyErr_PrintEx(100);
PyErr_Clear();
PyObject* closefunction = PyObject_GetAttrString( openfile, "close" );
if( closefunction == NULL ) {
PyErr_PrintEx(100); return;
}
PyObject* closefileresult = PyObject_CallObject( closefunction, NULL );
Py_DECREF( closefunction );
if( closefileresult == NULL ) {
PyErr_PrintEx(100); return;
}
Py_XDECREF( closefileresult );
Py_XDECREF( iomodule );
Py_XDECREF( openfile );
Py_XDECREF( fileiterator );
I am calling the open function and passing the ignore parameter to ignore encoding errors, but Python is ignoring me and keeps throwing encoding exceptions when it finds invalid UTF-8 bytes:
Here 1!
Traceback (most recent call last):
File "/usr/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 26: invalid start byte
As you can see above, and again below, I am passing the ignore parameter when calling builtins.open(), but it has no effect. I also tried changing ignore to replace, but C Python keeps throwing encoding exceptions anyway:
PyObject* openfile = PyObject_CallFunction( openfunction,
"s", filepath, "s", "r", "i", -1, "s", "UTF8", "s", "ignore" );
Recommended answer
PyObject_CallFunction (and Py_BuildValue, and others) takes a single format string describing all of the arguments. When you do
PyObject* openfile = PyObject_CallFunction( openfunction,
"s", filepath, "s", "r", "i", -1, "s", "UTF8", "s", "ignore" );
you've said "one string argument", and all the arguments after filepath get ignored, so open() receives only the path and falls back to its default strict error handling. Instead you should do:
PyObject* openfile = PyObject_CallFunction( openfunction,
"ssiss", filepath, "r", -1, "UTF8", "ignore" );
to say "5 arguments: two strings, an int, and two more strings". Even if you choose to use one of the other PyObject_Call* functions, you'll find it easier to build the arguments with Py_BuildValue this way too.
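To see the difference at the Python level, here is a minimal sketch of what each C call ends up doing. It is an illustration rather than part of the original answer: the temporary file and its bytes are assumptions modelled on the question's sample (a stray 0xbb at byte offset 26), and the encoding is pinned to UTF8 in the first call only so that the demo does not depend on the locale default.

```python
import os
import tempfile

# Assumed sample data: 0xbb is a lone continuation byte, so it is not valid UTF-8.
data = b"parse this correctly\nF\xc3\xb8\xc3\xb6\xbbBar\nalso parse this correctly\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

# Roughly what the format string "s" produces: open() gets the path only,
# so errors keeps its default ("strict"). Encoding is pinned for determinism.
strict_failed = False
try:
    with open(path, encoding="UTF8") as f:
        for _ in f:
            pass
except UnicodeDecodeError:
    strict_failed = True

# What the format string "ssiss" produces: five positional arguments,
# in the order file, mode, buffering, encoding, errors.
with open(path, "r", -1, "UTF8", "ignore") as f:
    lines = [line.rstrip("\n") for line in f]

os.remove(path)

print(strict_failed)
print(lines)
```

The second call works because the first five parameters of open() can all be passed positionally, which is exactly what the "ssiss" format string supplies from C.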