ValueError: arrays must all be same length in Python using pandas DataFrame


Question

I'm new to Python and am using DataFrame from the pandas package (Python 3.6).

I set it up with the code below:

df = DataFrame({'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4, 'list5': list5, 'list6': list6})

and it raises an error like ValueError: arrays must all be same length

So I checked the lengths of all the arrays, and list1 and list2 each have 1 more data point than the other lists. If I want to add 1 data point to the other 4 lists (list3, list4, list5, list6) by using pd.resample, how should I write the code?

Also, these lists are time series sampled at 1-minute intervals.

Does anybody have an idea, or can anyone help me out here?

Thanks.

EDIT: I changed it as EdChum suggested and added a time list at the front. It looks like this:

2017-04-01 0:00 895.87  730 12.8    4   19.1    380
2017-04-01 0:01 894.4   730 12.8    4   19.1    380
2017-04-01 0:02 893.08  730 12.8    4   19.3    380
2017-04-01 0:03 890.41  730 12.8    4   19.7    380
2017-04-01 0:04 889.28  730 12.8    4   19.93   380

I entered code like this:

df.resample('1min', how='mean', fill_method='pad')

It gave me this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

Recommended answer

I'd just construct a Series for each list and then concat them all:

In [38]:
import pandas as pd

l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1, name='list1')
s2 = pd.Series(l2, name='list2')
df = pd.concat([s1,s2], axis=1)
df

Out[38]: 
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4

Because you can pass a name argument to the Series constructor, each column in the df gets its name from its Series, and NaN is placed wherever the column lengths don't match.
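For the six lists in the question, the same idea can be written as a loop over a dict of lists. This is only a sketch: the list contents below are made-up sample values (the original lists are not shown), and only four of the six columns are included to keep it short.

```python
import pandas as pd

# Hypothetical sample data: list1 and list2 have one more element than the rest.
lists = {
    'list1': [895.87, 894.40, 893.08],
    'list2': [730, 730, 730],
    'list3': [12.8, 12.8],
    'list4': [4, 4],
}

# Wrap each list in a named Series, then concatenate column-wise.
# Columns that are shorter than the longest one are padded with NaN at the end.
df = pd.concat(
    [pd.Series(values, name=name) for name, values in lists.items()],
    axis=1,
)
print(df)
```

Note that a column of integers that receives a NaN (list4 here) is converted to float, since NaN is a floating-point value.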

resample is for when you have a DatetimeIndex that you want to rebase or adjust in length based on some time period, which is not what you want here. What you want is reindex, although I think it is unnecessary and messy:

In [40]:
l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1)
s2 = pd.Series(l2)
df = pd.DataFrame({'list1':s1.reindex(s2.index), 'list2':s2})
df

Out[40]: 
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4

Here you'd need to know the longest length and then reindex every Series using that index. If you just concat, it will automatically adjust the lengths and fill missing elements with NaN.
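As for the TypeError in the question's edit: resample only works once the timestamps are the index rather than an ordinary column. The sketch below assumes the time values sit in a column called 'time' (a hypothetical name) and uses made-up rows with one minute missing. As far as I know, the how= and fill_method= arguments are no longer accepted by recent pandas versions, so the aggregation and fill are chained as methods instead.

```python
import pandas as pd

# Hypothetical frame: timestamps in an ordinary column, default RangeIndex.
df = pd.DataFrame({
    'time': ['2017-04-01 00:00', '2017-04-01 00:01', '2017-04-01 00:03'],
    'list1': [895.87, 894.40, 890.41],
})

# Parse the strings and make them the index so resample sees a DatetimeIndex.
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

# One row per minute; the missing 00:02 slot is forward-filled from 00:01.
per_minute = df.resample('1min').mean().ffill()
print(per_minute)
```

Whether forward-filling is appropriate depends on the data; leaving the gap as NaN (by dropping the ffill() call) may be the safer default.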
