Creating pandas DataFrame from numpy array leads to strange errors
Problem description
The long and short of it is that DataFrames are throwing endianness errors when I try to create them from otherwise working numpy arrays. Here is a pastebin, more details below: http://pastebin.com/Sdg9EM61
In my field we store data in .FIT (FITS) format, which is a binary format (this may be useful info later).
I don't really know how to deal with the following few lines of code and the error they lead to.
from astropy.io import fits
import numpy as np
d = fits.getdata('file.fit')
d2 = np.array(d)
Then you can do cool tricks with it, like:
d2[d2['key1'] > 10.]
d2[['key1', 'key2']]
etc.
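For readers without a FITS file at hand, here is a minimal sketch of the same two tricks on a small, made-up structured array that mimics the d2.dtype shown further down. The field names key1/key2 come from the question, and the values are invented:

```python
import numpy as np

# Made-up stand-in for d2: a big-endian int32 field plus a 19-byte string
# field, matching the dtype printed in the question.
d2 = np.array(
    [(5, b"a"), (12, b"b"), (30, b"c")],
    dtype=[("key1", ">i4"), ("key2", "S19")],
)

# A boolean mask built from one field keeps whole records.
big = d2[d2["key1"] > 10.0]

# A list of field names selects a subset of the fields.
both = d2[["key1", "key2"]]

print(big["key1"].tolist())  # [12, 30]
print(both.dtype.names)      # ('key1', 'key2')
```

NumPy's own comparisons work on non-native byte order without complaint, which is why these tricks succeed on the raw FITS data.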
When I convert it to a pandas DataFrame
d3 = pandas.DataFrame(d2)
things start to get weird. For example, the column names have changed:
d3.columns
returns
Index([u'key1', u'key2'], dtype='object')
with this new u in front of every column name as opposed to
d2.dtype
which returns
dtype([('key1', '>i4'), ('key2', 'S19')])
although the data types look OK in the DataFrame when you check d3.dtypes instead of d3.columns...
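In Python 2, the u only marks a unicode string in the printed Index. The labels themselves are the field names from d2.dtype. Below is a minimal check, sketched for Python 3 (which prints no u at all). It uses a made-up array with the question's field names, and that array is deliberately native-endian to keep the byte-order problem out of this particular example:

```python
import numpy as np
import pandas as pd

# Made-up data with the question's field names (native byte order on purpose).
d2 = np.array([(5, b"a"), (12, b"b")], dtype=[("key1", "i4"), ("key2", "S19")])
d3 = pd.DataFrame(d2)

# The DataFrame's column labels are the structured dtype's field names.
print(list(d3.columns))  # ['key1', 'key2']
print(d2.dtype.names)    # ('key1', 'key2')
print(d3.dtypes)
```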
Anyway, the DataFrame has all the data and preserves the data types, and I can print out summary information and such, but as soon as I try to do something like this:
d3[d3['key1'] > 10.]
I get a monster error about endianness:
ValueError: Big-endian buffer not supported on little-endian compiler
Any insight as to what this means and how to fix it?
Recommended answer
OK, the FITS file is in fact the issue. It turns out FITS data is always stored big-endian, while pandas, SciPy and similar libraries expect data in the machine's native byte order, which on ordinary PCs is little-endian (I have no idea what this endian business is, I'm just summarizing a thread). This apparently causes some weird issues that I had never seen until I started using pandas.
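"Endian" refers to the order in which the bytes of a multi-byte number are stored. The small NumPy sketch below uses a made-up value rather than FITS data. It shows the same integer in both layouts, and it shows how to check whether a dtype, including a structured one like d2.dtype, matches the machine's native order:

```python
import sys
import numpy as np

value = 1
big = np.array([value], dtype=">i4")     # big-endian, as stored in FITS files
little = np.array([value], dtype="<i4")  # little-endian

print(big.tobytes())     # b'\x00\x00\x00\x01'  (most significant byte first)
print(little.tobytes())  # b'\x01\x00\x00\x00'  (least significant byte first)

# Same number, different byte layout in memory.
print(big[0] == little[0])  # True

# Which order this machine uses, and whether each dtype is native to it.
print(sys.byteorder)  # 'little' on typical x86-64 machines
print(big.dtype.isnative, little.dtype.isnative)

# A structured dtype counts as native only if every field is native.
rec = np.dtype([("key1", ">i4"), ("key2", "S19")])
print(rec.isnative)
```

On a little-endian machine, rec.isnative is False. That is the situation the pandas error message complains about.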
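The solution I found is the following, assuming numpy, pandas and astropy.io.fits are imported as np, pd and fits. Before NumPy 2.0, the last line could be written as pd.DataFrame(np.array(d).byteswap().newbyteorder()). However, ndarray.newbyteorder() was removed in NumPy 2.0, so here the bytes are swapped first and the dtype is then relabelled with .view(). This leaves the values unchanged but stores them in little-endian order:
d = fits.getdata('data.fit')
arr = np.array(d)
df = pd.DataFrame(arr.byteswap().view(arr.dtype.newbyteorder()))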
The solution was found here: https://github.com/astropy/astropy/issues/1156
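To try the fix without a FITS file, here is a self-contained sketch on a made-up big-endian structured array standing in for np.array(d). It shows two routes to native byte order:

- The swap-and-relabel approach from the answer, written with .view() for NumPy 2.0 compatibility.
- An astype() conversion to native order ('='). This is an alternative added here, not something taken from the linked thread.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for np.array(fits.getdata(...)): FITS data is big-endian.
raw = np.array(
    [(5, b"a"), (12, b"b"), (30, b"c")],
    dtype=[("key1", ">i4"), ("key2", "S19")],
)

# Route 1: swap the bytes in memory, then relabel the dtype with the opposite
# byte order, so the values read back unchanged.
swapped = raw.byteswap().view(raw.dtype.newbyteorder())

# Route 2: convert every field to the machine's native order in one copy.
native = raw.astype(raw.dtype.newbyteorder("="))

df = pd.DataFrame(native)
filtered = df[df["key1"] > 10.0]
print(filtered["key1"].tolist())  # [12, 30]
```

The astype() route always targets whatever order the current machine uses. The swap-and-relabel route simply flips big-endian to little-endian, which matches native order on typical PCs.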