ValueError: arrays must all be same length in Python using pandas DataFrame
Question
I'm new to Python and am using DataFrame from the pandas package (Python 3.6).
I set it up with the code below:
df = DataFrame({'list1': list1, 'list2': list2, 'list3': list3, 'list4': list4, 'list5': list5, 'list6': list6})
and it raises an error like ValueError: arrays must all be same length
So I checked the lengths of all the arrays, and list1 and list2 each have one more element than the other lists. If I want to add one element to the other four lists (list3, list4, list5, list6) by using pd.resample, how should I write the code?
Also, these lists are time series sampled at 1-minute intervals.
Does anybody have an idea, or can anyone help me out here?
Thanks.
EDIT: I changed the code as EdChum suggested and added a time list at the front. The data looks like this:
2017-04-01 0:00 895.87 730 12.8 4 19.1 380
2017-04-01 0:01 894.4 730 12.8 4 19.1 380
2017-04-01 0:02 893.08 730 12.8 4 19.3 380
2017-04-01 0:03 890.41 730 12.8 4 19.7 380
2017-04-01 0:04 889.28 730 12.8 4 19.93 380
I entered code like this:
df.resample('1min', how='mean', fill_method='pad')
It gave me this error: TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
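A note on this error: resample only works when the frame is indexed by datetimes, and here the timestamps are an ordinary column while the index is still the default RangeIndex. The following is a minimal sketch, not the asker's actual code: the column names ('time', 'list1', 'list2') are assumptions, only two of the value columns are reproduced, and the timestamps are written zero-padded. To my knowledge, recent pandas versions no longer accept the how= and fill_method= arguments, so the sketch chains .mean() and .ffill() instead.

```python
import pandas as pd

# Hypothetical frame shaped like the rows above; column names are assumptions.
df = pd.DataFrame({
    'time': ['2017-04-01 00:00', '2017-04-01 00:01', '2017-04-01 00:02',
             '2017-04-01 00:03', '2017-04-01 00:04'],
    'list1': [895.87, 894.4, 893.08, 890.41, 889.28],
    'list2': [730, 730, 730, 730, 730],
})

# Parse the strings and move them into the index so that
# resample sees a DatetimeIndex instead of a RangeIndex.
df['time'] = pd.to_datetime(df['time'])
df = df.set_index('time')

# Aggregate per minute, then forward-fill any minutes with no data.
resampled = df.resample('1min').mean().ffill()
print(resampled)
```

Because the sample rows are already one per minute, the resampled frame has the same five rows; the forward fill only matters if a minute is missing from the data.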
Recommended answer
I'd just construct a Series for each list and then concat them all:
In [38]:
import pandas as pd
l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1, name='list1')
s2 = pd.Series(l2, name='list2')
df = pd.concat([s1,s2], axis=1)
df
Out[38]:
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4
Because you can pass a name argument to the Series constructor, each column in the df gets that name, and concat places NaN wherever the column lengths don't match.
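The same idea extends to the six lists in the question. This is only a sketch: the list contents below are made-up placeholders (the question does not show the real data), chosen so that list1 and list2 have one more element than the rest.

```python
import pandas as pd

# Placeholder data standing in for the question's lists (assumed values).
lists = {
    'list1': [1, 2, 3, 4],
    'list2': [5, 6, 7, 8],
    'list3': [0.1, 0.2, 0.3],
    'list4': [10, 20, 30],
    'list5': [1.5, 2.5, 3.5],
    'list6': [7, 8, 9],
}

# Build one named Series per list, then concatenate them column-wise.
# Shorter columns are padded with NaN at the end.
df = pd.concat(
    [pd.Series(values, name=name) for name, values in lists.items()],
    axis=1,
)
print(df)
```

Note that integer columns that receive a NaN are converted to floats, since NaN is a floating-point value.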
resample is meant for the case where you have a DatetimeIndex that you want to rebase or adjust in length based on some time period, which is not what you want here. What you want is reindex, which I think is unnecessary and messy:
In [40]:
l1 = list('abc')
l2 = [1,2,3,4]
s1 = pd.Series(l1)
s2 = pd.Series(l2)
df = pd.DataFrame({'list1':s1.reindex(s2.index), 'list2':s2})
df
Out[40]:
  list1  list2
0     a      1
1     b      2
2     c      3
3   NaN      4
Here you'd need to know the longest length and then reindex all Series using that index. If you just concat, it will automatically adjust the lengths and fill the missing elements with NaN.
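For comparison, the reindex route with more than two Series might look like the sketch below. The third Series ('list3') and the use of max(..., key=len) to find the longest one are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

# Example Series of different lengths ('list3' is an added placeholder).
series = {
    'list1': pd.Series(list('abc')),
    'list2': pd.Series([1, 2, 3, 4]),
    'list3': pd.Series([0.5, 1.5]),
}

# Find the longest Series and reuse its index for every column.
longest = max(series.values(), key=len)
df = pd.DataFrame(
    {name: s.reindex(longest.index) for name, s in series.items()}
)
print(df)
```

This gives the same NaN-padded result as concat, at the cost of the extra step of finding the longest index first.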