使用loc的pandas数据帧索引 [英] pandas data frame indexing using loc
问题描述
我有一种FIZZ-BUZZ问题。我有工作日的日历。在下一栏中,我将'FIZZ'放在某些行中。如果两个'FIZZ'之间存在空隙,我将'BUZZ'置于其间,除非工作日为'SUN'。请参阅下面的代码(我使用0.15.2 pandas版本):
I have a sort of FIZZ-BUZZ problem. I have a calendar with weekdays. In the next column I put 'FIZZ' at certain rows. If there is an empty gap between two 'FIZZ' I put 'BUZZ' in between except if the weekday is 'SUN'. See the code below (I use 0.15.2 pandas version):
import datetime
import pandas as pd
dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 10), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])
df['A'] = df['Weekday']
idx_lst = [0, 2, 3, 5, 9]
df.loc[idx_lst, 'A'] = 'FIZZ'
previous_idx = idx_lst[0]
for idx in idx_lst:
print idx
try:
print df.loc[idx - 1, 'Weekday'], df.loc[idx, 'Weekday']
if idx - previous_idx == 2 and df.loc[idx - 1, 'Weekday'] != 'SUN':
df.loc[idx-1, 'A'] = 'BUZZ'
except KeyError:
continue
previous_idx = idx
print df
输出为:
0
2
2014-12-18 00:00:00 FRI
3
FRI SAT
5
2014-12-21 00:00:00 MON
9
2014-12-18 00:00:00 FRI
Date Weekday A
0 2014-01-01 WED FIZZ
1 2014-01-02 THU BUZZ
2 2014-01-03 FRI FIZZ
3 2014-01-04 SAT FIZZ
4 2014-01-05 SUN BUZZ
5 2014-01-06 MON FIZZ
6 2014-01-07 TUE TUE
7 2014-01-08 WED WED
8 2014-01-09 THU THU
9 2014-01-10 FRI FIZZ
注意第4行,A列中应该有SUN而不是BUZZ。另请注意,如果idx-1不在idx_lst中,则.loc [idx-1]会给出时间戳。如果我使用.ix而不是.loc我会得到正确答案:
Notice line 4, there should be SUN in column A instead of BUZZ. Notice also that if idx-1 is not in idx_lst the .loc[idx-1] gives a time stamp. if I use .ix instead of .loc I get the correct answer:
0
2
THU FRI
3
FRI SAT
5
SUN MON
9
THU FRI
Date Weekday A
0 2014-01-01 WED FIZZ
1 2014-01-02 THU BUZZ
2 2014-01-03 FRI FIZZ
3 2014-01-04 SAT FIZZ
4 2014-01-05 SUN SUN
5 2014-01-06 MON FIZZ
6 2014-01-07 TUE TUE
7 2014-01-08 WED WED
8 2014-01-09 THU THU
9 2014-01-10 FRI FIZZ
有任何解释吗?在此先感谢。
Any explanations ? Thanks in advance.
推荐答案
令人惊讶的行为是由于 pd.Series
尝试将类似日期时间的值强制转换为 pd.Timestamps
。
The surprising behavior is due to pd.Series
attempting to coerce datetime-like values to pd.Timestamps
.
df.loc [1]
返回 pd.Series([pd.Timestamp('2014-01-02'),'THU','THU'])
,遗憾的是被强制转移到时间戳,因为这三个值都是类似日期时间的:
df.loc[1]
returns pd.Series([pd.Timestamp('2014-01-02'), 'THU', 'THU'])
which unfortunately gets coerced to Timestamps since all three values are datetime-like:
In [154]: pd.Series([pd.Timestamp('2014-01-02'), 'THU', 'THU'])
Out[154]:
0 2014-01-02
1 2014-12-18
2 2014-12-18
dtype: datetime64[ns]
相比之下, df.loc [2]
不会将值强制转换为时间戳,因为'FIZZ'
不是类似日期的:
In contrast, df.loc[2]
does not coerce the values to Timestamps since 'FIZZ'
is not datelike:
In [155]: pd.Series([pd.Timestamp('2014-01-03'), 'FRI', 'FIZZ'])
Out[155]:
0 2014-01-03 00:00:00
1 FRI
2 FIZZ
dtype: object
在使用 .loc $ c $之前,首先形成系列
df ['Weekday']
可以避免此问题c>:
The problem can be avoided by forming the Series, df['Weekday']
first before using .loc
:
In [158]: df['Weekday'].loc[1]
Out[158]: 'THU'
这是因为 df ['Weekday']。 dtype
仍为 dtype('O')
;没有转换为时间戳。
This works because df['Weekday'].dtype
remains dtype('O')
; there is no conversion to Timestamps.
for idx in idx_lst:
try:
# print(idx-1, df.ix[idx - 1, 'Weekday'], df.loc[idx - 1, 'Weekday'])
if idx - previous_idx == 2 and df['Weekday'].loc[idx - 1] != 'SUN':
df.loc[idx-1, 'A'] = 'BUZZ'
except KeyError:
continue
previous_idx = idx
收益率
Date Weekday A
0 2014-01-01 WED FIZZ
1 2014-01-02 THU BUZZ
2 2014-01-03 FRI FIZZ
3 2014-01-04 SAT FIZZ
4 2014-01-05 SUN SUN
5 2014-01-06 MON FIZZ
6 2014-01-07 TUE TUE
7 2014-01-08 WED WED
8 2014-01-09 THU THU
9 2014-01-10 FRI FIZZ
这篇关于使用loc的pandas数据帧索引的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!