pandas -编写包含Unicode的Excel文件-IllegalCharacterError [英] Pandas - Writing an excel file containing unicode - IllegalCharacterError
问题描述
我有以下代码:
import pandas as pd
x = [u'string with some unicode: \x16']
df = pd.DataFrame(x)
如果我尝试将此数据帧写为excel文件:
If I try to write this dataframe as an excel file:
df.to_excel("test.xlsx")
或者,如果我尝试使用utf-8编码将此数据帧写为excel文件:
Or, if I try to write this dataframe as an excel file, with utf-8 encoding:
ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
df.to_excel(ew)
我收到以下错误:
IllegalCharacterError Traceback (most recent call last)
<ipython-input-4-62adec25ae8d> in <module>()
1 ew = pd.ExcelWriter('test.xlsx',options={'encoding':'utf-8'})
2 #df.to_excel("test.xlsx")
----> 3 df.to_excel(ew)
/usr/local/lib/python2.7/dist-packages/pandas/util/decorators.pyc in wrapper(*args, **kwargs)
86 else:
87 kwargs[new_arg_name] = new_arg_value
---> 88 return func(*args, **kwargs)
89 return wrapper
90 return _deprecate_kwarg
/usr/local/lib/python2.7/dist-packages/pandas/core/frame.pyc in to_excel(self, excel_writer, sheet_name, na_rep, float_format, columns, header, index, index_label, startrow, startcol, engine, merge_cells, encoding, inf_rep)
1258 formatted_cells = formatter.get_formatted_cells()
1259 excel_writer.write_cells(formatted_cells, sheet_name,
-> 1260 startrow=startrow, startcol=startcol)
1261 if need_save:
1262 excel_writer.save()
/usr/local/lib/python2.7/dist-packages/pandas/io/excel.pyc in write_cells(self, cells, sheet_name, startrow, startcol)
679 colletter = get_column_letter(startcol + cell.col + 1)
680 xcell = wks.cell("%s%s" % (colletter, startrow + cell.row + 1))
--> 681 xcell.value = _conv_value(cell.val)
682 style_kwargs = {}
683
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in value(self, value)
360 def value(self, value):
361 """Set the value and infer type and display options."""
--> 362 self._bind_value(value)
363
364 @property
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in _bind_value(self, value)
269 elif self.guess_types:
270 value = self._infer_value(value)
--> 271 self.set_explicit_value(value, self.data_type)
272
273
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in set_explicit_value(self, value, data_type)
235 raise ValueError('Invalid data type: %s' % data_type)
236 if isinstance(value, STRING_TYPES):
--> 237 value = self.check_string(value)
238 self._value = value
239 self.data_type = data_type
/usr/local/lib/python2.7/dist-packages/openpyxl/cell/cell.pyc in check_string(self, value)
220 value = value[:32767]
221 if next(ILLEGAL_CHARACTERS_RE.finditer(value), None):
--> 222 raise IllegalCharacterError
223 return value
224
IllegalCharacterError:
如何将包含unicode的pandas数据框写入Excel文件?
How can I write a pandas dataframe containing unicode to an excel file?
推荐答案
不是这样的Unicode问题... \x16
(或Unicode字符串\u0016
表示相同字符)是ASCII控制代码22(SYN) ). Pandas说,在Excel文件中包含控制代码(制表符和换行符除外)是无效的,尽管我对Excel文件了解不多,但肯定不可能将它们包含在XML 1.0文件中,而XML 1.0文件就是其中的内容. xlsx.
Not a Unicode issue as such... \x16
(or in Unicode strings \u0016
refers to the same character) is ASCII control code 22 (SYN). Pandas says it's invalid to have control codes (other than tab and newlines) in an Excel file, and though I don't know much about Excel files it would certainly be impossible to include them in an XML 1.0 file, which is what's inside a xlsx.
因此,很可能无法在Excel中包括任意字符序列(带有控制代码).您应该在编写之前将它们过滤掉,或者如果您确实需要保留原始数据,请使用仅由您的应用程序识别的某种形式的临时编码.
So most likely there is no way to include arbitrary character sequences (with control codes) in an Excel. You should filter them out before writing, or if you really need to preserve the original data use some form of ad hoc encoding recognised only by your application.
这篇关于 pandas -编写包含Unicode的Excel文件-IllegalCharacterError的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!