Creating pandas DataFrame from numpy array leads to strange errors


Question

The short version is that DataFrames are throwing endianness errors when I try to create them from perfectly functional numpy arrays. Here is a pastebin, with more details below: http://pastebin.com/Sdg9EM61

In my field we store data in .FIT format, which is a binary format (this may be useful information later).

I don't really know how to deal with the following few lines of code and the error they produce.

d = fits.getdata('file.fit')
d2 = np.array(d)

Then you can do cool tricks with this, like:

d2[d2['key1'] > 10.]
d2[['key1', 'key2']]

When I convert it to a pandas DataFrame

d3 = pandas.DataFrame(d2)

things start to get weird. The column names have changed, for example:

d3.columns

returns

Index([u'key1', u'key2'], dtype='object')

with this new u in front of every column name, as opposed to

d2.dtype

which returns

dtype([('key1', '>i4'), ('key2', 'S19')])

although the data types look OK in the DataFrame when you use d3.dtypes instead of d3.columns...

Anyway, the DataFrame has all the data and preserves the data types, and I can print out characterizing data and such, but as soon as I try to do something like this:

d3[d3['key1'] > 10.]

I get a monster error about endianness:

ValueError: Big-endian buffer not supported on little-endian compiler

Any insight as to what this means and how to fix it?

Recommended answer

OK, the FITS file is in fact the issue. It turns out FITS files are all big-endian, while pandas, scipy and the like tend to assume little-endian (I have no idea what this endian business is, I am just summarizing a thread), and this apparently causes some weird issues (which I had never seen until I started looking at pandas).
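For readers who, like me, were unsure what "endian" means here: it is the order in which the bytes of a multi-byte number are stored. The sketch below is only an illustration, not part of the original thread. The array `sample` is a made-up stand-in that reuses the dtype from the question, so no FITS file is needed.

```python
import sys
import numpy as np

# The same integer stored with the two byte orders.
print(np.array([1], dtype='>i4').tobytes())  # b'\x00\x00\x00\x01' (big-endian)
print(np.array([1], dtype='<i4').tobytes())  # b'\x01\x00\x00\x00' (little-endian)

# Hypothetical stand-in for the FITS table, using the dtype from the question.
sample = np.array([(5, b'a'), (20, b'b')],
                  dtype=[('key1', '>i4'), ('key2', 'S19')])

print("machine byte order:", sys.byteorder)
for name in sample.dtype.names:
    field = sample.dtype[name]
    # byteorder codes: '>' big-endian, '<' little-endian,
    # '=' native, '|' not applicable (e.g. byte strings)
    print(name, field.byteorder, "native" if field.isnative else "non-native")
```

On a typical little-endian machine, `key1` should be reported as non-native, which is the situation the error message complains about.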

The solution I found is:

d = fits.getdata('data.fit')
df=pd.DataFrame(np.array(d).byteswap().newbyteorder())

The solution comes from here: https://github.com/astropy/astropy/issues/1156
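A caveat on the one-liner above: as far as I know, the `ndarray.newbyteorder()` method was removed in NumPy 2.0, so that exact call may fail on recent installations. The sketch below is an alternative under that assumption, not the method from the linked thread. The helper name `to_native_byteorder` and the sample array are made up for illustration. It converts the data with `astype`, using a copy of the dtype whose fields are all set to native byte order.

```python
import numpy as np
import pandas as pd

def to_native_byteorder(arr):
    """Return a copy of `arr` with every field in the machine's native byte order."""
    # dtype.newbyteorder('=') describes the same fields in native order;
    # astype then converts the stored bytes, so the values are unchanged.
    return arr.astype(arr.dtype.newbyteorder('='))

# Hypothetical stand-in for np.array(fits.getdata('data.fit')).
d2 = np.array([(5, b'a'), (20, b'b')],
              dtype=[('key1', '>i4'), ('key2', 'S19')])

d3 = pd.DataFrame(to_native_byteorder(d2))
print(d3[d3['key1'] > 10.])
```

With a real file, the array returned by `fits.getdata` would be passed to the helper in place of the sample `d2`.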
