How to Use Pandas with a Multi-Column NumPy Array

Problem description

Okay, I'm stumped on this. I've looked at the pandas documentation, but I can't figure out the right way to do it, and I think I'm just making a mess. Basically, I have data stored in NumPy arrays, e.g.:

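import numpy
data = numpy.loadtxt('foo.txt', dtype=str, delimiter=',')
# GPS fields are columns 0-1 and 3-4; column 2 is the timestamp (scaled by 1/1000 below)
gps_data = numpy.concatenate((data[:, 0:2], data[:, 3:5]), axis=1)
gps_time = data[:, 2:3].astype(float) / 1000  # numpy.float was removed in NumPy 1.24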

gps_data basically looks like this:

array([['50.3482627', '-71.662499', '30', 'network'],
       ['50.3482588', '-71.6624934', '30', 'network'],
       ['50.34829', '-71.6625077', '30', 'network'],
       ...,
       ['20.3482488', '-78.66245463999999', '9', 'gps'],
       ['20.3482598', '-78.6625174', '30', 'network'],
       ['20.34824943', '-78.6624565', '10', 'gps']],
      dtype='|S18')

and gps_time:

array([[  1.16242035e+09],
       [  1.26242036e+09],
       [  1.36242038e+09],
       ...,
       [  1.32330411e+09],
       [  1.16330413e+09],
       [  1.26330413e+09]])

What I'm trying to do is use a DataFrame to bring in another similar-looking array called acc_data, combine it with gps_data, and then go back through and fill in the data missing at the differing timestamps. E.g., this is what I've been trying:

df1 = DataFrame(gps_data,index=gps_time,columns=['GPS'])

And it gives this error:

ValueError: Shape of passed values is (4, 35047), indices imply (1, 35047)
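A minimal reproduction of this error, as a sketch: it uses only the first three rows of gps_data and gps_time shown above, with the time values flattened to a 1-D array so that the column-name mismatch is the only problem. The exact wording and shape order of the message may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# First three rows of gps_data from the question: 3 rows x 4 string columns
gps_data = np.array([
    ['50.3482627', '-71.662499', '30', 'network'],
    ['50.3482588', '-71.6624934', '30', 'network'],
    ['50.34829', '-71.6625077', '30', 'network'],
])
# Matching timestamps, already 1-D so the index itself is valid
gps_time = np.array([1.16242035e+09, 1.26242036e+09, 1.36242038e+09])

error = None
try:
    # One column name for a four-column array: pandas rejects the shapes
    pd.DataFrame(gps_data, index=gps_time, columns=['GPS'])
except ValueError as exc:
    error = exc
    print(exc)
```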

I don't know how to handle this. If I can find a way around it, I assume the next step (df2, built the same way but for acc_data) will work fine, and then I can do:

p = Panel({'ACC': df1, 'GPS': df2})

Any help would be greatly appreciated; I've been stumped on this for the last few hours.

Recommended answer

You need to make sure you pass in as many column names (using the columns keyword) as there are columns in your NumPy array:

from pandas import DataFrame
df1 = DataFrame(gps_data, index=gps_time.ravel(), columns=['col1', 'col2', 'col3', 'col4'])  # ravel(): (n, 1) -> 1-D index
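A self-contained sketch of the working call, using the first three rows from the question. The column names 'lat', 'lon', 'accuracy' and 'provider' are hypothetical stand-ins for 'col1' to 'col4'. Because gps_time is built as an (n, 1) column, the sketch flattens it with ravel(), since recent pandas versions reject a 2-D index. It also assumes you want numeric dtypes, because loadtxt(dtype=str) leaves every field as a string.

```python
import numpy as np
import pandas as pd

gps_data = np.array([
    ['50.3482627', '-71.662499', '30', 'network'],
    ['50.3482588', '-71.6624934', '30', 'network'],
    ['50.34829', '-71.6625077', '30', 'network'],
])
# gps_time as the question builds it: an (n, 1) column of floats
gps_time = np.array([[1.16242035e+09], [1.26242036e+09], [1.36242038e+09]])

# Hypothetical names: exactly one per column of gps_data
columns = ['lat', 'lon', 'accuracy', 'provider']
df1 = pd.DataFrame(gps_data, index=gps_time.ravel(), columns=columns)

# The array held strings; convert the numeric columns to float
numeric = ['lat', 'lon', 'accuracy']
df1[numeric] = df1[numeric].astype(float)
print(df1)
```

With one name per column the constructor accepts the array, and df1 has 3 rows and 4 columns indexed by timestamp.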

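Pandas raises the error because you have given it an array with four columns but specified only one column name, 'GPS'. In the message, (4, 35047) is the shape of your values (4 columns by 35047 rows), while the single column name implies (1, 35047).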
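For the follow-up step in the question, note that pandas.Panel was deprecated in pandas 0.20 and has since been removed. One commonly used replacement, sketched here with small hypothetical frames, is pd.concat with a dict and axis=1: the dict keys become the outer column level, and the two time indexes are outer-joined, so timestamps present in only one frame show up as NaN in the other. Forward-filling with ffill() is one assumed way to "fill in the missing times"; the column names 'lat', 'lon' and 'x' and the acceleration values are made up for illustration.

```python
import pandas as pd

# Hypothetical GPS frame with readings at t = 1.0 and t = 3.0
df1 = pd.DataFrame({'lat': [50.3482627, 50.34829],
                    'lon': [-71.662499, -71.6625077]},
                   index=[1.0, 3.0])
# Hypothetical accelerometer frame with an extra reading at t = 2.0
df2 = pd.DataFrame({'x': [0.1, 0.2, 0.3]}, index=[1.0, 2.0, 3.0])

# Replacement for Panel({'ACC': df1, 'GPS': df2}): keys become the outer
# column level; axis=1 outer-joins the indexes, then sort by time
combined = pd.concat({'ACC': df2, 'GPS': df1}, axis=1).sort_index()

# Fill gaps at timestamps a frame did not record with its last known value
filled = combined.ffill()
print(filled)
```

If a linear estimate between readings suits the data better than carrying the last value forward, DataFrame.interpolate() is an alternative for the numeric columns.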
