Which is the fastest way to extract day, month and year from a given date?
Problem description
I read a CSV file containing 150,000 lines into a pandas DataFrame. This DataFrame has a field, 'Date', with the dates in yyyy-mm-dd format. I want to extract the month, day and year from it and copy them into the DataFrame's columns 'Month', 'Day' and 'Year' respectively. For a few hundred records the two methods below work OK, but for 150,000 records both take a ridiculously long time to execute. Is there a faster way to do this for 100,000+ records?
First method:
    df = pandas.read_csv(filename)
    for i in xrange(len(df)):
        df.loc[i,'Day'] = int(df.loc[i,'Date'].split('-')[2])
Second method:
    df = pandas.read_csv(filename)
    for i in xrange(len(df)):
        df.loc[i,'Day'] = datetime.strptime(df.loc[i,'Date'], '%Y-%m-%d').day
Thank you.
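Solution
In 0.15.0 you will be able to use the new .dt accessor, which gives a nicer syntax for this. Until then, wrapping the column in a DatetimeIndex gives vectorized access to the same fields; the function f below does this for 150,000 rows in about 22 ms: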
    In [36]: df = DataFrame(date_range('20000101',periods=150000,freq='H'),columns=['Date'])

    In [37]: df.head(5)
    Out[37]:
                     Date
    0 2000-01-01 00:00:00
    1 2000-01-01 01:00:00
    2 2000-01-01 02:00:00
    3 2000-01-01 03:00:00
    4 2000-01-01 04:00:00

    [5 rows x 1 columns]

    In [38]: %timeit f(df)
    10 loops, best of 3: 22 ms per loop

    In [39]: def f(df):
        df = df.copy()
        df['Year'] = DatetimeIndex(df['Date']).year
        df['Month'] = DatetimeIndex(df['Date']).month
        df['Day'] = DatetimeIndex(df['Date']).day
        return df
       ....:

    In [40]: f(df).head()
    Out[40]:
                     Date  Year  Month  Day
    0 2000-01-01 00:00:00  2000      1    1
    1 2000-01-01 01:00:00  2000      1    1
    2 2000-01-01 02:00:00  2000      1    1
    3 2000-01-01 03:00:00  2000      1    1
    4 2000-01-01 04:00:00  2000      1    1

    [5 rows x 4 columns]
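The session above starts from a column that is already datetime-typed. In the question, the dates come from a CSV file, so here is a minimal sketch of the same DatetimeIndex idea applied to that case. It assumes a recent pandas on Python 3; the in-memory CSV text and the helper name add_date_parts are hypothetical stand-ins, not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical stand-in for the asker's file: a 'Date' column in yyyy-mm-dd format.
csv_text = "Date\n2014-01-05\n2014-02-17\n2014-12-31\n"

# parse_dates converts the column to datetime64 while the file is read.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])


def add_date_parts(frame):
    """Return a copy with Year, Month and Day columns (same idea as f above)."""
    out = frame.copy()
    idx = pd.DatetimeIndex(out["Date"])  # build the index once, reuse it three times
    out["Year"] = idx.year
    out["Month"] = idx.month
    out["Day"] = idx.day
    return out


result = add_date_parts(df)
print(result)
```

Building the DatetimeIndex once rather than three times is a small variation on f; the extracted values are the same.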
From 0.15.0 on (released at the end of September 2014), the following is possible with the new .dt accessor:
    df['Year'] = df['Date'].dt.year
    df['Month'] = df['Date'].dt.month
    df['Day'] = df['Date'].dt.day
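One caveat worth spelling out: .dt is only available on datetime-like columns, while a plain read_csv leaves 'Date' as strings. The sketch below shows one way to bridge that, assuming the yyyy-mm-dd format from the question; the three sample dates are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: dates stored as plain strings, as after a default read_csv.
df = pd.DataFrame({"Date": ["2014-01-05", "2014-02-17", "2014-12-31"]})

# Convert the whole column in one vectorized call.
# Passing an explicit format avoids per-value format guessing.
dates = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# The .dt accessor then exposes the components for every row at once.
df["Year"] = dates.dt.year
df["Month"] = dates.dt.month
df["Day"] = dates.dt.day
print(df)
```

Both steps operate on the whole column at once, which is what replaces the per-row loops from the question.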