How to fix 'UnicodeDecodeError: 'utf-8' codec can't decode byte' when using Python C Extensions?
Question
Given the following file, bug.txt:
event "øat" not handled
I have the following in the file fastfilewrapper.cpp:
#include <Python.h>
#include <cstdio>
#include <iostream>
#include <sstream>
#include <fstream>

static PyObject* hello_world(PyObject *self, PyObject *args) {
    printf("Hello, world!\n");
    std::string retval;
    std::ifstream fileifstream;
    fileifstream.open("./bug.txt");
    std::getline( fileifstream, retval );
    fileifstream.close();
    std::cout << "retval " << retval << std::endl;
    return Py_BuildValue( "s", retval.c_str() );
}

static PyMethodDef hello_methods[] = { {
        "hello_world", hello_world, METH_NOARGS,
        "Print 'hello world' from a method defined in a C extension."
    },
    {NULL, NULL, 0, NULL}
};

static struct PyModuleDef hello_definition = {
    PyModuleDef_HEAD_INIT,
    "hello", "A Python module that prints 'hello world' from C code.",
    -1, hello_methods
};

PyMODINIT_FUNC PyInit_fastfilepackage(void) {
    Py_Initialize();
    return PyModule_Create(&hello_definition);
}
I build it with this setup.py:
from distutils.core import setup, Extension

# https://bugs.python.org/issue35893
from distutils.command import build_ext

def get_export_symbols(self, ext):
    parts = ext.name.split(".")
    if parts[-1] == "__init__":
        initfunc_name = "PyInit_" + parts[-2]
    else:
        initfunc_name = "PyInit_" + parts[-1]

build_ext.build_ext.get_export_symbols = get_export_symbols

setup(name='fastfilepackage', version='1.0', \
      ext_modules=[Extension('fastfilepackage', ['fastfilewrapper.cpp'])])
Then, I use this test.py script:
import fastfilepackage
iterable = fastfilepackage.hello_world()
print('iterable', iterable)
But Python throws this exception when I run the test.py script:
$ PYTHONIOENCODING=utf8 python3 test.py
Hello, world!
retval event "▒at" not handled
Traceback (most recent call last):
  File "test.py", line 3, in <module>
    iterable = fastfilepackage.hello_world()
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 7: invalid start byte
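The failure can be reproduced without the extension. The following is a minimal sketch, assuming bug.txt is saved as Latin-1 (ISO-8859-1), where "ø" is the single byte 0xF8. The file's real encoding is not stated in the question; it is only inferred from the byte reported in the error message.

```python
# Assumption: the line in bug.txt is Latin-1 encoded, so "ø" is the byte 0xF8.
raw = 'event "øat" not handled'.encode("latin-1")

error = None
try:
    # Strict UTF-8 decoding, which is also what Py_BuildValue("s", ...) performs.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The seven characters of `event "` occupy offsets 0 to 6, so the offending byte sits at offset 7, matching the position in the traceback.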
How can I recover from invalid Unicode characters? That is, how can I ignore these errors when binding C and Python?
When working purely in Python, I can use this:
file_in = open( './bug.txt', errors='replace' )
line = file_in.read()
print( "The input line was: {line}".format(line=line) )
What is the equivalent of errors='replace' when binding with Python C Extensions?
Recommended answer
Py_BuildValue with the "s" format decodes the C string as strict UTF-8, which is why the byte 0xf8 raises the error. If you want 'replace' error-handling semantics, decode the bytes yourself on the C side and return the resulting object to the Python side:
return PyUnicode_DecodeUTF8(retval.c_str(), retval.size(), "replace");
In our case, this gives something like:
Hello, world!
retval event "?at" not handled
iterable event "�at" not handled
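For comparison, here is a sketch in pure Python of what the standard error handlers do with the same bytes; the third argument of PyUnicode_DecodeUTF8 takes the same handler names. The byte string below is an assumption (Latin-1 encoded input), inferred from the 0xf8 in the error message rather than taken from the actual file.

```python
# Assumption: these are the bytes std::getline read from bug.txt.
raw = b'event "\xf8at" not handled'

# "replace": each undecodable byte becomes U+FFFD (the replacement character).
replaced = raw.decode("utf-8", errors="replace")

# "ignore": undecodable bytes are silently dropped.
ignored = raw.decode("utf-8", errors="ignore")

# If the file really is Latin-1, decoding with that codec recovers the "ø".
latin1 = raw.decode("latin-1")

print(replaced)
print(ignored)
print(latin1)
```

If the input encoding is known in advance, decoding with the matching codec on the C side (for Latin-1, the C API provides PyUnicode_DecodeLatin1) keeps the original character instead of substituting it.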