pandas: iterating over DataFrame index with loc

Question

I can't seem to find the reasoning behind the behaviour of .loc. I know it is label-based, so if I iterate over the Index object, the following minimal example should work. But it doesn't. I googled, of course, but I need additional explanation from someone who already has a grip on indexing.

import datetime
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 15), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].apply(lambda x: dict_weekday[x.isoweekday()])

for idx in df.index:
    print(df.loc[idx, 'Weekday'])


Recommended answer

The problem is not in df.loc; df.loc[idx, 'Weekday'] is just returning a Series. The surprising behavior is due to the way pd.Series tries to cast datetime-like values to Timestamps.

df.loc[0, 'Weekday']

forms the Series

pd.Series(np.array([pd.Timestamp('2014-01-01 00:00:00'), 'WED'], dtype=object))

When pd.Series(...) is called, it tries to cast the data to an appropriate dtype.
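
As a minimal sketch of that inference on unambiguous inputs (the variable names are illustrative, and the exact dtypes chosen can vary between pandas versions), homogeneous data gets a specific dtype without any explicit request:

```python
import pandas as pd

# No dtype is passed; pandas infers one from the values.
ints = pd.Series([1, 2, 3])
stamps = pd.Series([pd.Timestamp('2014-01-01'), pd.Timestamp('2014-01-02')])

print(ints.dtype)    # an integer dtype
print(stamps.dtype)  # a datetime64 dtype

# pd.api.types.infer_dtype is a public helper for the same kind of inspection.
print(pd.api.types.infer_dtype(list(stamps)))  # 'datetime'
```

The trouble described in this answer only arises when the inputs are mixed, so that pandas has to decide which single dtype to prefer.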

If you trace through the code, you'll find that it eventually arrives at these lines in pandas.core.common._possibly_infer_to_datetimelike:

sample = v[:min(3,len(v))]
inferred_type = lib.infer_dtype(sample)

which is inspecting the first few elements of the data and trying to infer the dtype. When one of the values is a pd.Timestamp, Pandas checks to see if all the data can be cast as Timestamps. Indeed, 'Wed' can be cast to pd.Timestamp:

In [138]: pd.Timestamp('Wed')
Out[138]: Timestamp('2014-12-17 00:00:00')
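
The date in that output is not arbitrary. A bare weekday name carries no year, month or day, so a lenient parser fills those in from a default date (today, unless told otherwise) and then moves to the matching weekday. The sketch below calls dateutil directly, on the assumption that pandas hands such free-form strings to a dateutil-style parser; the default date is pinned to a hypothetical "today" of Monday 2014-12-15 so that the result is reproducible.

```python
from datetime import datetime
from dateutil import parser

# Hypothetical "today": Monday 2014-12-15, pinned so the result is reproducible.
fake_today = datetime(2014, 12, 15)

# Only the weekday is given, so everything else comes from the default date.
parsed = parser.parse('WED', default=fake_today)
print(parsed)  # 2014-12-17 00:00:00, the first Wednesday on or after the default
```

This is why the Timestamp in the output above is a date near the time the answer was written and has nothing to do with the data in df.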

This is the root of the problem, which results in pd.Series returning two Timestamps instead of a Timestamp and a string:

In [139]: pd.Series(np.array([pd.Timestamp('2014-01-01 00:00:00'), 'WED'], dtype=object))
Out[139]: 
0   2014-01-01
1   2014-12-17
dtype: datetime64[ns]

and thus returns

In [140]: df.loc[0, 'Weekday']
Out[140]: Timestamp('2014-12-17 00:00:00')

instead of 'WED'.
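
Whether a given installation shows this coercion depends on the pandas version, since dtype inference has changed over time. A minimal check, assuming only that numpy and a pandas version exposing pd.api.types are installed (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

# The same mixed Timestamp/string row as in the answer.
row = pd.Series(np.array([pd.Timestamp('2014-01-01'), 'WED'], dtype=object))

# True if pandas converted both elements to datetimes, False if it kept them as given.
coerced = pd.api.types.is_datetime64_any_dtype(row)
print(row.dtype, coerced)

if not coerced:
    # The string survived the round trip through pd.Series.
    print(row.iloc[1])
```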

Alternative: select the Series df['Weekday'] first:

There are many workarounds; EdChum shows that adding a non-datelike (integer) value to the sample can prevent pd.Series from casting all the values to Timestamps.

Alternatively, you could access df['Weekday'] before using .loc:

for idx in df.index:
    print(df['Weekday'].loc[idx])

Alternative: df.loc[[idx], 'Weekday']:

Another option is

for idx in df.index:
    print(df.loc[[idx], 'Weekday'].item())

df.loc[[idx], 'Weekday'] first selects the DataFrame df.loc[[idx]]. For example, when idx equals 0,

In [10]: df.loc[[0]]
Out[10]: 
        Date Weekday
0 2014-01-01     WED

whereas df.loc[0] returns a Series:

In [11]: df.loc[0]
Out[11]: 
Date      2014-01-01
Weekday   2014-12-17
Name: 0, dtype: datetime64[ns]

Series tries to cast the values to a single useful dtype. DataFrames can have a different dtype for each column. So the Timestamp in the Date column does not affect the dtype of the value in the Weekday column.

So the problem was avoided by using an index selector which returns a DataFrame.
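
The difference between the two selectors can be checked directly. This is a small sketch on a three-row frame built for illustration (the names one_row_frame and one_row_series are not from the original answer):

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.date_range('2014-01-01', periods=3, freq='D'),
    'Weekday': ['WED', 'THU', 'FRI'],
})

# Each column keeps its own dtype.
print(df.dtypes)

one_row_frame = df.loc[[0]]   # list selector -> DataFrame, per-column dtypes kept
one_row_series = df.loc[0]    # scalar selector -> Series, one dtype for both values

print(type(one_row_frame).__name__, type(one_row_series).__name__)
print(one_row_frame['Weekday'].item())
```

Because the one-row DataFrame never merges Date and Weekday into a single array, the string is returned unchanged regardless of how Series inference treats mixed values.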

Alternative: use integers in Weekday:

Yet another alternative is to store the isoweekday integer in Weekday, and convert to strings only at the end when you print:

import datetime
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame(pd.date_range(datetime.date(2014, 1, 1), datetime.date(2014, 1, 15), freq='D'), columns=['Date'])
df['Weekday'] = df['Date'].dt.weekday + 1   # dt.weekday is 0-based (Monday=0); add 1 for isoweekday

for idx in df.index:
    print(dict_weekday[df.loc[idx, 'Weekday']])
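
If the labels are needed for every row rather than one at a time, the integer column can be translated in a single step. This sketch uses Series.map with the same dict_weekday lookup; the separate labels Series is an illustrative choice, not part of the original answer, and it is kept out of df so that the frame still holds no strings next to the dates.

```python
import pandas as pd

dict_weekday = {1: 'MON', 2: 'TUE', 3: 'WED', 4: 'THU', 5: 'FRI', 6: 'SAT', 7: 'SUN'}
df = pd.DataFrame({'Date': pd.date_range('2014-01-01', '2014-01-15', freq='D')})

# dt.weekday counts Monday as 0, so add 1 to match isoweekday (Monday as 1).
df['Weekday'] = df['Date'].dt.weekday + 1

# Translate integers to names only for display; df['Weekday'] stays integer.
labels = df['Weekday'].map(dict_weekday)
print(labels.head(3).tolist())  # ['WED', 'THU', 'FRI']
```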






Alternative: use df.ix:

df.loc is a _LocIndexer, whereas df.ix is an _IXIndexer. They have different __getitem__ methods. If you step through the code (for example, using pdb) you'll find that df.ix calls df.get_value:

def __getitem__(self, key):
    if type(key) is tuple:
        try:
            values = self.obj.get_value(*key)

and the DataFrame method df.get_value succeeds in returning 'WED':

In [14]: df.get_value(0, 'Weekday')
Out[14]: 'WED'

This is why df.ix is another alternative that works here.
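
Both df.ix and DataFrame.get_value were later deprecated and removed in pandas 1.0, so this alternative only runs on older releases. As a sketch of the same idea on current versions (a substitution that the original answer does not mention), the label-based scalar accessor df.at fetches one cell by row and column label:

```python
import pandas as pd

df = pd.DataFrame({
    'Date': pd.date_range('2014-01-01', periods=3, freq='D'),
    'Weekday': ['WED', 'THU', 'FRI'],
})

# df.at takes exactly one row label and one column label and returns a scalar.
values = [df.at[idx, 'Weekday'] for idx in df.index]
print(values)  # ['WED', 'THU', 'FRI']
```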
