Creating an empty Pandas DataFrame, then filling it?
Problem description
I'm starting from the pandas DataFrame docs here: http://pandas.pydata.org/pandas-docs/stable/dsintro.html
I'd like to iteratively fill the DataFrame with values in a time-series kind of calculation. So basically, I'd like to initialize a DataFrame with columns A, B and timestamp rows, all 0 or all NaN.
I'd then add initial values and go over this data, calculating each new row from the row before, say row[A][t] = row[A][t-1] + 1 or so.
I'm currently using the code below, but I feel it's kind of ugly and there must be a way to do this with a DataFrame directly, or just a better way in general. Note: I'm using Python 2.7.
    import datetime as dt
    import pandas as pd
    import scipy as s

    if __name__ == '__main__':
        base = dt.datetime.today().date()
        dates = [base - dt.timedelta(days=x) for x in range(0, 10)]
        dates.sort()

        valdict = {}
        symbols = ['A', 'B', 'C']
        for symb in symbols:
            valdict[symb] = pd.Series(s.zeros(len(dates)), dates)

        for thedate in dates:
            if thedate > dates[0]:
                for symb in valdict:
                    valdict[symb][thedate] = 1 + valdict[symb][thedate - dt.timedelta(days=1)]

        print valdict
Solution
Here are a couple of suggestions:
Use date_range for the index:

    import datetime
    import pandas as pd
    import numpy as np

    todays_date = datetime.datetime.now().date()
    index = pd.date_range(todays_date - datetime.timedelta(10), periods=10, freq='D')
    columns = ['A', 'B', 'C']
Note: we could create an empty DataFrame (with NaNs) simply by writing:

    df_ = pd.DataFrame(index=index, columns=columns)
    df_ = df_.fillna(0)  # with 0s rather than NaNs
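As a possible variation on the snippet above, the DataFrame constructor also accepts a scalar as its data when both an index and columns are given. The sketch below assumes a fixed start date (chosen only to make the example reproducible) and uses the illustrative names df_empty, df_zeros and df_nan, none of which appear in the original answer:

```python
import numpy as np
import pandas as pd

# Fixed start date, assumed here so the example is reproducible.
index = pd.date_range("2012-11-29", periods=10, freq="D")
columns = ["A", "B", "C"]

# Empty frame: every cell is NaN.
df_empty = pd.DataFrame(index=index, columns=columns)

# Scalar broadcast: every cell is 0.0 and the columns are float64 from the start.
df_zeros = pd.DataFrame(0.0, index=index, columns=columns)

# Same idea with NaN as the placeholder value.
df_nan = pd.DataFrame(np.nan, index=index, columns=columns)
```

Filling with a scalar up front gives numeric columns directly, which may be convenient if the frame is going to hold computed values later.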
To do these types of calculations on the data, use a NumPy array:

    data = np.array([np.arange(10)]*3).T
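For the recurrence described in the question, row[A][t] = row[A][t-1] + 1, the same array could also be built row by row. This is only a sketch: it assumes initial values of 0 in every column, and the names n_rows, n_cols and vectorized are illustrative:

```python
import numpy as np

n_rows, n_cols = 10, 3

# Preallocate the full array, then fill it row by row.
data = np.zeros((n_rows, n_cols), dtype=int)
data[0] = 0  # initial values (assumed to be 0 here)
for t in range(1, n_rows):
    # Each row is computed from the previous one.
    data[t] = data[t - 1] + 1

# For this simple recurrence the loop gives the same array as the one-liner.
vectorized = np.array([np.arange(n_rows)] * n_cols).T
```

Preallocating the array and assigning into it avoids growing a pandas object one row at a time. The explicit loop is only needed when a row really depends on the one before it in a way that cannot be vectorized.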
Hence we can create the DataFrame:

    In [10]: df = pd.DataFrame(data, index=index, columns=columns)

    In [11]: df
    Out[11]:
                A  B  C
    2012-11-29  0  0  0
    2012-11-30  1  1  1
    2012-12-01  2  2  2
    2012-12-02  3  3  3
    2012-12-03  4  4  4
    2012-12-04  5  5  5
    2012-12-05  6  6  6
    2012-12-06  7  7  7
    2012-12-07  8  8  8
    2012-12-08  9  9  9
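Putting the pieces together, here is a self-contained sketch in Python 3 syntax (the question itself used Python 2.7). The fixed start date is an assumption made so that the dates line up with the output shown above, and first_row and last_row are illustrative names:

```python
import numpy as np
import pandas as pd

# Fixed start date, assumed so the index matches the output shown above.
index = pd.date_range("2012-11-29", periods=10, freq="D")
columns = ["A", "B", "C"]

# Compute the values as a plain array first, then wrap them in a DataFrame once.
data = np.array([np.arange(10)] * 3).T
df = pd.DataFrame(data, index=index, columns=columns)

# Rows can then be looked up by timestamp.
first_row = df.loc[pd.Timestamp("2012-11-29")]
last_row = df.loc[pd.Timestamp("2012-12-08")]
```

Building the array first and constructing the DataFrame in a single step keeps the per-row work in NumPy and leaves pandas to handle the labeling.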