How to store a 2D array in a pandas DataFrame and read it back without it turning into strings


Problem description

I have a df with 20 rows in which one column holds arrays: each cell is a NumPy array of 50 values (1*50).

import numpy as np
import pandas as pd

# 20 rows; each cell of 'array' holds one row of 50 random floats
df = pd.DataFrame(zip(range(20), np.random.rand(20, 50)),
                  columns=['id', 'array'])

At this point there is no issue using the array column in array operations (addition, multiplication, division, etc.) with other arrays.

But if one saves the df as a CSV and reads it in another notebook (which I don't have a good way to demo here), each cell in the array column comes back as a string that merely looks like a list, and neither ast.literal_eval nor to_numpy helps.

'[1.2 -2.3 2.1 ... 4.1]'
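The behaviour can be reproduced in a single session. The following is a minimal sketch, assuming an in-memory `io.StringIO` buffer stands in for the CSV file on disk; the variable names are illustrative only. It also shows why `ast.literal_eval` fails: NumPy's printed form separates values with spaces rather than commas, so the text is not a valid Python literal.

```python
import ast
import io

import numpy as np
import pandas as pd

# Build a frame like the one in the question: 20 rows, 50 floats per cell.
rng = np.random.default_rng(0)
df = pd.DataFrame({"id": range(20), "array": list(rng.random((20, 50)))})

# Write to an in-memory buffer instead of a file (assumption for the demo).
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)

# Reading back yields plain strings, because CSV only stores text.
loaded = pd.read_csv(buffer)
first_cell = loaded.loc[0, "array"]
print(type(first_cell))

# The string has no commas between values, so literal_eval rejects it.
parse_error = None
try:
    ast.literal_eval(first_cell)
except (ValueError, SyntaxError) as exc:
    parse_error = exc
print(type(parse_error).__name__)
```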

How can I prevent the arrays from turning into strings here?

Recommended answer

For me, using converters in pandas.read_csv works. The converter for the array column is a lambda function that strips the surrounding brackets from each string and removes the line breaks. The result can then be passed to np.fromstring with a space as the separator.

pd.read_csv("file.csv",converters={"array":lambda x: np.fromstring(x.strip("][").replace("\n", ""), sep=" ")})
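A self-contained round trip may make the converter easier to check. This is a sketch under stated assumptions: `parse_array` is a hypothetical helper name (not from the original answer), an in-memory buffer replaces `file.csv`, and the data is random. It uses the same `np.fromstring(..., sep=" ")` call as the answer.

```python
import io

import numpy as np
import pandas as pd


def parse_array(text):
    """Turn a string such as '[0.1 0.2\n 0.3]' back into a 1-D float array.

    Hypothetical helper: strips the brackets, replaces line breaks with
    spaces, then lets NumPy split the text on whitespace.
    """
    cleaned = text.strip("[]").replace("\n", " ")
    return np.fromstring(cleaned, sep=" ")


# Sample frame shaped like the one in the question.
rng = np.random.default_rng(0)
original = pd.DataFrame({"id": range(20), "array": list(rng.random((20, 50)))})

# In-memory stand-in for the CSV file (assumption for the demo).
buffer = io.StringIO()
original.to_csv(buffer, index=False)
buffer.seek(0)

# The converter runs on every cell of the 'array' column while parsing.
restored = pd.read_csv(buffer, converters={"array": parse_array})

first = restored.loc[0, "array"]
print(type(first), first.shape)

# Arithmetic with other arrays works again.
doubled = first * 2
```

Two caveats apply. The CSV holds NumPy's printed representation, so the restored values match the originals only to the printed precision (8 significant digits by default). Arrays longer than NumPy's print threshold (1000 elements by default) are abbreviated with `...` when written, and that elided data cannot be recovered by any converter.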
