Picking dates from an imported CSV with pandas/python
Question
I have a .csv file with daily data, as follows:
some 19 more header rows
Werte
01.01.1971 07:00:00 ; 0.0
02.01.1971 07:00:00 ; 1.2
...and so on
which I import with:
RainD=pd.read_csv('filename.csv',skiprows=20,sep=';',dayfirst=True,parse_dates=True)
As a result, I get:
In [416]: RainD
Out[416]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 14976 entries, 1971-01-01 07:00:00 to 2012-01-01 07:00:00
Data columns:
Werte: 14976 non-null values
dtypes: object(1)
So it's a DataFrame, but maybe a TimeSeries would be the right way? How do I import it as such? The pandas documentation lists a dtype option for read_csv, but gives no info on what I can/should specify.
On the other hand, the DatetimeIndex: line suggests that pandas is well aware that it is dealing with dates here, yet it still makes it a DataFrame. And on that DataFrame, something like RainD['1971'] just results in a KeyError: u'no item named 1971'.
I have the feeling that I am just missing something really obvious, since time series analysis seems to be THE thing pandas was made for.
Another early idea of mine was that pandas might get confused by the fact that the dates are written in the correct (i.e. dd.mm.yyyy ;) ) way, but RainD.head() shows me that it could digest that just fine.
JC
Recommended answer
EdChum's df[df.index.year == 1971] solved my issue.
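For anyone landing here, a minimal sketch of that boolean-mask selection. The frame below is a small made-up stand-in for RainD (same "Werte" column name, invented values), so treat it as an illustration rather than the original data:

```python
import pandas as pd

# Hypothetical stand-in for RainD: a DatetimeIndex plus one value column.
idx = pd.to_datetime(
    [
        "31.12.1970 07:00:00",
        "01.01.1971 07:00:00",
        "02.01.1971 07:00:00",
        "01.01.1972 07:00:00",
    ],
    format="%d.%m.%Y %H:%M:%S",  # day first, as in the CSV
)
rain = pd.DataFrame({"Werte": [0.5, 0.0, 1.2, 3.4]}, index=idx)

# Boolean mask on the index: keep only rows whose timestamp falls in 1971.
rain_1971 = rain[rain.index.year == 1971]
print(rain_1971)
```

The mask is built from the index, not from a column, which is why it works even though RainD['1971'] is treated as a column lookup on a DataFrame.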
I might have some other issues (e.g. an outdated version of pandas), but for now, I can continue working.
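As for importing the data as a time series rather than a DataFrame: one possible approach, sketched below under some assumptions, is to read the file, parse the dates explicitly, and pull the single value column out as a Series. The sample text, the column name "Datum" and the regex separator are my own choices for illustration; for the real file, skiprows would have to account for the extra header rows. On reasonably recent pandas versions, .loc with a partial date string such as '1971' then selects the whole year:

```python
import io
import pandas as pd

# Hypothetical sample in the same layout as the question (extra header rows omitted).
raw = """Werte
31.12.1970 07:00:00 ; 0.5
01.01.1971 07:00:00 ; 0.0
02.01.1971 07:00:00 ; 1.2
01.01.1972 07:00:00 ; 3.4
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s*;\s*",            # swallow the spaces around ';' (needs the python engine)
    engine="python",
    skiprows=1,                # skip the 'Werte' header line; column names are set below
    names=["Datum", "Werte"],  # "Datum" is an assumed name for the date column
)

# Parse the day-first timestamps explicitly instead of relying on inference.
df["Datum"] = pd.to_datetime(df["Datum"], format="%d.%m.%Y %H:%M:%S")

# A single column indexed by timestamps is a time series (a pandas Series).
rain = df.set_index("Datum")["Werte"].astype(float)

# Partial string indexing on the DatetimeIndex: all rows from 1971.
rain_1971 = rain.loc["1971"]
print(rain_1971)
```

Converting the values to float explicitly also avoids ending up with an object column like the dtypes: object(1) shown in the question.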