How to store a 2D array in a pandas DataFrame and read it back without it turning into strings
Question
I have a df with 20 rows in which one column holds arrays; each cell is a 1*50 NumPy array.
import numpy as np
import pandas as pd
# 20 rows; each cell of the 'array' column holds a 1-D NumPy array of length 50
df = pd.DataFrame(zip(list(range(0, 20, 1)), np.random.rand(20, 50)),
                  columns=['id', 'array'])
At this point the array column works without issue in any array operation (addition, multiplication, division, etc.) with other arrays.
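To make "no issue at this point" concrete, here is a minimal sketch of operating on the in-memory column. The frame is built from a dict rather than zip for brevity, and the names other, summed, and matrix are only illustrative.

```python
import numpy as np
import pandas as pd

# Same shape as in the question: 20 rows, each cell a length-50 array.
df = pd.DataFrame({"id": range(20), "array": list(np.random.rand(20, 50))})
other = np.ones(50)  # hypothetical second operand

# Arithmetic on a single cell behaves like ordinary NumPy.
summed = df.loc[0, "array"] + other

# Stacking the object column yields a regular 2-D float array.
matrix = np.stack(df["array"].to_numpy())
print(summed.shape, matrix.shape)
```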
But if the df is saved as CSV and read back in another notebook (which I don't have a good way to demo here), each cell in the array column becomes a string that looks like a list, and neither ast.literal_eval nor to_numpy helps:
'[1.2 -2.3 2.1 ... 4.1]'
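The behaviour can probably be reproduced in a single session by round-tripping through an in-memory buffer instead of a file. The sketch below assumes a small 3 x 5 frame as a stand-in for the 20 x 50 one; the variable names are illustrative only.

```python
import ast
import io

import numpy as np
import pandas as pd

# Small stand-in for the frame in the question (sizes are arbitrary).
df = pd.DataFrame({"id": range(3), "array": list(np.random.rand(3, 5))})

# Round-trip through CSV in memory; a file on disk should behave the same way.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

cell = loaded.loc[0, "array"]
print(type(cell))  # the cell is now a plain Python str
print(cell)

# ast.literal_eval rejects the text: the numbers are separated by spaces,
# not commas, so it is not a valid Python list literal.
try:
    ast.literal_eval(cell)
    parsed_ok = True
except (ValueError, SyntaxError):
    parsed_ok = False
print(parsed_ok)
```

The CSV stores whatever str() produces for each array, which is why the values come back as space-separated text.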
How can I prevent the arrays from turning into strings here?
Recommended answer
For me, using converters in pandas.read_csv works. The converter is a lambda function that strips the surrounding brackets from each string and removes the line breaks. After that, np.fromstring can be applied with a space as the separator.
pd.read_csv("file.csv",
            converters={"array": lambda x: np.fromstring(x.strip("][").replace("\n", ""), sep=" ")})
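As a self-contained check of the converter approach, the sketch below writes a 20 x 50 frame to an in-memory buffer and reads it back. The helper name parse_array and the use of io.StringIO in place of "file.csv" are assumptions made for illustration, not part of the original answer.

```python
import io

import numpy as np
import pandas as pd


def parse_array(text):
    """Turn text like '[0.1 0.2\n 0.3]' back into a 1-D float array."""
    cleaned = text.strip("[]").replace("\n", " ")
    return np.fromstring(cleaned, sep=" ")


original = pd.DataFrame({"id": range(20), "array": list(np.random.rand(20, 50))})

# Write to and read from memory; a real file path would be used the same way.
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

restored = pd.read_csv(buffer, converters={"array": parse_array})

first = restored.loc[0, "array"]
print(type(first), first.shape)
```

Two caveats follow from the fact that the CSV holds the printed form of each array. Values come back rounded to NumPy's print precision (8 digits by default), and with default print options arrays of more than 1000 elements are abbreviated with "..." and cannot be recovered this way.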