Pickle encoding utf-8 issue


Problem description

I'm trying to pickle a pandas DataFrame to my local directory so I can work on it in another Jupyter notebook. The write appears to succeed at first, but when I try to read the file in a new Jupyter notebook, the read fails.

When I open the pickle file I appear to have written, its only contents are:

Error! /Users/.../income.pickle is not UTF-8 encoded
Saving disabled.
See console for more details.

I also checked and the pickle file itself is only a few kilobytes.

Here is my code for writing the pickle:


with open('income.pickle', 'wb', encoding='UTF-8') as to_write:
    pickle.dump(new_income_df, to_write)

And here is my code for reading it:


with open('income.pickle', 'rb') as read_file:
    income_df = pickle.load(read_file)

Also when I return income_df I get this output:

Series([], dtype: float64)

It's an empty Series, and most Series methods raise errors when I call them on it.

If anyone knows a fix for this I'm all ears. Thanks in advance!

Edit:

This is the solution I arrived at:

import pickle

# Binary mode ('wb' / 'rb') with no encoding argument.
with open('cleaned_df', 'wb') as to_write:
    pickle.dump(df, to_write)

with open('cleaned_df', 'rb') as read_file:
    df = pickle.load(read_file)

Much simpler than I expected.
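
A minimal sketch of why the edited version works. It assumes a small hypothetical DataFrame in place of new_income_df and a temporary directory for the file, and the helper names roundtrip and binary_mode_rejects_encoding are my own. The binary round trip succeeds, while passing encoding= to open() in binary mode raises ValueError before anything is written.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd


def roundtrip(df: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Pickle df to path and load it back, using binary mode with no encoding."""
    with open(path, "wb") as to_write:
        pickle.dump(df, to_write)
    with open(path, "rb") as read_file:
        return pickle.load(read_file)


def binary_mode_rejects_encoding(path: Path) -> bool:
    """Return True if open() refuses an encoding argument in binary write mode."""
    try:
        with open(path, "wb", encoding="UTF-8"):
            pass
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    # Hypothetical sample data standing in for new_income_df.
    sample = pd.DataFrame({"income": [100, 101]})
    target = Path(tempfile.mkdtemp()) / "income.pickle"
    print(roundtrip(sample, target).equals(sample))  # True
    print(binary_mode_rejects_encoding(target))      # True
```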

Recommended answer

Pickle can serialize a pandas DataFrame, but it writes a binary file, so opening that file in Jupyter's text editor produces the "not UTF-8 encoded" message instead of showing the data. Also note that open() in binary mode ('wb') does not accept an encoding argument. If you want other ways to move data between notebooks, here are a few options.
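
A quick check that a pandas DataFrame survives a pickle round trip, using pandas' own DataFrame.to_pickle and pd.read_pickle helpers. The sample data, the income.pkl file name and the helper names are hypothetical. The sketch also tests whether the file's bytes decode as UTF-8. A pickle written with protocol 2 or later begins with the byte 0x80, which cannot start a UTF-8 sequence, so a text editor rejects the file.

```python
import tempfile
from pathlib import Path

import pandas as pd


def pickle_roundtrip(df: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Write df with DataFrame.to_pickle and read it back with pd.read_pickle."""
    df.to_pickle(path)
    return pd.read_pickle(path)


def is_valid_utf8(path: Path) -> bool:
    """Return True if the file's bytes decode as UTF-8 (what a text editor expects)."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    sample = pd.DataFrame({"income": [100, 101]})  # hypothetical data
    target = Path(tempfile.mkdtemp()) / "income.pkl"
    print(pickle_roundtrip(sample, target).equals(sample))  # True
    print(is_valid_utf8(target))                            # False
```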

1) You can write only the data from the DataFrame to a csv file.

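# Write/read a csv file using the DataFrame's "to_csv" method and pd.read_csv.
import pandas as pd
new_income_df.to_csv("mydata.csv")
# index_col=0 restores the saved index instead of adding an "Unnamed: 0" column.
new_income_df2 = pd.read_csv("mydata.csv", index_col=0)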
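
A caveat with this round trip: by default to_csv writes the index as an unnamed first column, so reading the file back without index_col=0 turns the index into an extra "Unnamed: 0" column. Here is a small sketch with hypothetical data. It uses an in-memory StringIO buffer instead of mydata.csv so it needs no file on disk.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"income": [100, 101]})  # hypothetical data
csv_text = df.to_csv()  # with no path, to_csv returns the CSV as a string

# Without index_col, the saved index comes back as an extra "Unnamed: 0" column.
raw = pd.read_csv(StringIO(csv_text))
# With index_col=0, the first column is used as the index again.
fixed = pd.read_csv(StringIO(csv_text), index_col=0)

print(list(raw.columns))    # ['Unnamed: 0', 'income']
print(list(fixed.columns))  # ['income']
```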

2) If your data can be saved as a function in a regular Python module (a *.py file), you can call it from a Jupyter notebook. You can also reload the function after you change the values inside it. See the IPython autoreload extension documentation: https://ipython.org/ipython-doc/3/config/extensions/autoreload.html

# Saved as "mymodule1.py" (from notebook1.ipynb).
import pandas as pd

def funcdata():
    new_income_df = pd.DataFrame(data=[100, 101])
    return new_income_df

# notebook2.ipynb
%load_ext autoreload
%autoreload 2
import pandas as pd
import mymodule1  # import by module name, without the ".py" extension
df2 = mymodule1.funcdata()
print(df2)
# Change the data inside funcdata() in mymodule1.py and re-run this cell to see the change here.
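
Outside IPython, roughly the same edit-then-reload workflow can be reproduced with the standard library's importlib.reload, which is the step %autoreload 2 automates inside Jupyter. The sketch below is self-contained. It writes a throwaway mymodule1.py into a temporary directory, imports it, rewrites the data in funcdata() and reloads it. The temporary-directory layout, the VERSION_1/VERSION_2 strings and the load_edit_reload helper are assumptions made for illustration, not a recommended project structure.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical contents of mymodule1.py before and after editing funcdata().
VERSION_1 = """
import pandas as pd

def funcdata():
    return pd.DataFrame(data=[100, 101])
"""

VERSION_2 = """
import pandas as pd

def funcdata():
    return pd.DataFrame(data=[100, 101, 102])
"""


def load_edit_reload():
    """Import mymodule1, rewrite its source, reload it, and return both results."""
    sys.dont_write_bytecode = True  # skip .pyc caching so the reload re-reads the source
    module_dir = tempfile.mkdtemp()
    module_path = Path(module_dir) / "mymodule1.py"
    module_path.write_text(VERSION_1)

    sys.path.insert(0, module_dir)
    sys.modules.pop("mymodule1", None)  # forget any previously imported copy
    importlib.invalidate_caches()       # make sure the new file can be found

    module = importlib.import_module("mymodule1")
    before = module.funcdata()

    module_path.write_text(VERSION_2)   # "edit" the data inside funcdata()
    module = importlib.reload(module)
    after = module.funcdata()
    return before, after


if __name__ == "__main__":
    before, after = load_edit_reload()
    print(len(before), len(after))  # 2 3
```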

3) You can share data between Jupyter notebooks using the %store magic command.
See: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
And: Share data between IPython Notebooks

# %store example, first Jupyter notebook.
from sklearn import datasets
dataset = datasets.load_iris()
%store dataset

# from a new Jupyter notebook read.
%store -r dataset
