Pandas dataframe.query method syntax



Question:


I would like to gain a better understanding of the Pandas DataFrame.query method and what the following expression represents:

match = dfDays.query('index > @x.name & price >= @x.target')

What does @x.name represent?


I understand what this code outputs (a new column of pandas.tslib.Timestamp data), but I don't have a clear understanding of the expression used to get that result.

Data:

From here:

Vectorized way to query date and price data

import numpy as np
import pandas as pd

np.random.seed(seed=1)
rng = pd.date_range('1/1/2000', '2000-07-31', freq='D')
weeks = np.random.uniform(low=1.03, high=3, size=(len(rng),))
ts2 = pd.Series(weeks, index=rng)
dfDays = pd.DataFrame({'price': ts2})
dfWeeks = dfDays.resample('1W-Mon').first()
dfWeeks['target'] = (dfWeeks['price'] + .5).round(2)

def find_match(x):
    # x is one row of dfWeeks; return the first later day whose price reaches x's target
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

# axis=1 makes apply call find_match once per row
dfWeeks.assign(target_hit=dfWeeks.apply(find_match, axis=1))

Recommended answer


Everything @MaxU said is perfect!


I wanted to add some context to the specific problem that this was applied to.


find_match is a helper function that is passed to dfWeeks.apply. Two things to note:

  1. find_match takes a single argument, x, which will be a single row of dfWeeks.
    • Each row is a pd.Series object, and apply passes every row through this function in turn. That is the nature of using apply with axis=1.
    • When apply passes a row to the helper function, the row has a name attribute equal to that row's index value in dfWeeks. In this case I know the index values are pd.Timestamp objects, and I use x.name for the date comparison I need. Inside the query string, the @ prefix tells query to look up x as a Python variable rather than a column, so @x.name is the row's date and @x.target is the row's target price.
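A minimal sketch of what apply hands to the helper function, assuming a tiny hypothetical stand-in for dfWeeks (the frame `weeks`, its values, and the function `describe_row` are made up for illustration). Each row arrives as a pd.Series whose name is the row's index label:

```python
import pandas as pd

# Hypothetical stand-in for dfWeeks: a DatetimeIndex plus price/target columns
weeks = pd.DataFrame(
    {"price": [1.50, 2.00], "target": [2.00, 2.50]},
    index=pd.to_datetime(["2000-01-03", "2000-01-10"]),
)


def describe_row(x):
    # With axis=1, apply passes each row of `weeks` as a pd.Series.
    # x.name is that row's index label (a pd.Timestamp here) and
    # x.target is the value in the row's "target" column.
    return f"{type(x).__name__} named {x.name.date()} with target {x.target}"


for line in weeks.apply(describe_row, axis=1):
    print(line)
```

This should print one line per row, e.g. "Series named 2000-01-03 with target 2.0".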


I didn't have to use query... I just like using it, because in my opinion it makes some code prettier. The following function, as provided by the OP, could have been written differently:

def find_match(x):
    """Original"""
    match = dfDays.query('index > @x.name & price >= @x.target')
    if not match.empty:
        return match.index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match, axis=1))
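Two details of the query string in find_match are easy to miss. The @ prefix makes query look up a Python variable in the calling scope instead of a column, and inside a query string `&` binds more loosely than the comparisons (the opposite of plain Python), so no parentheses are needed. A small sketch, assuming a made-up daily frame `days` and a hand-built row `x` standing in for one row of dfWeeks (both hypothetical):

```python
import pandas as pd

# Hypothetical daily prices standing in for dfDays
days = pd.DataFrame(
    {"price": [1.2, 2.6, 2.1, 2.9]},
    index=pd.date_range("2000-01-01", periods=4, freq="D"),
)

# Hypothetical row standing in for one row x of dfWeeks; its name is the row's date label
x = pd.Series({"price": 1.2, "target": 2.5}, name=pd.Timestamp("2000-01-01"))

# "@x" reads the Python variable x; bare "index" and "price" refer to days' index and column.
# Inside a query string, comparisons bind tighter than &, so no parentheses are needed.
via_query = days.query("index > @x.name & price >= @x.target")

# The same filter written with boolean masks needs parentheses,
# because in plain Python & binds tighter than > and >=.
via_mask = days[(days.index > x.name) & (days["price"] >= x.target)]

print(via_query)
```

Both versions keep only 2000-01-02 and 2000-01-04, the days after x.name whose price is at least 2.5.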


find_match_alt

Or we could have written it like this, which may help explain what the query string above is doing:

def find_match_alt(x):
    """Alternative to OP's"""
    # dates strictly after this row's index label (x.name)
    date_is_afterwards = dfDays.index > x.name
    # days whose price reaches this row's target
    price_target_is_met = dfDays.price >= x.target
    both_are_true = price_target_is_met & date_is_afterwards
    if both_are_true.any():
        # first date satisfying both conditions
        return dfDays[both_are_true].index[0]

dfWeeks.assign(target_hit=dfWeeks.apply(find_match_alt, axis=1))


Comparing these two functions should give a good sense of what each part of the query string does.
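One way to make that comparison concrete is to run both functions on the question's data and check that they agree. This is a sketch that rebuilds the data from the question (using the equivalent 'W-MON' spelling of the weekly frequency); the `return None` lines only make the no-match case explicit, and the names `hit_query` and `hit_mask` are illustrative:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data
np.random.seed(seed=1)
rng = pd.date_range("2000-01-01", "2000-07-31", freq="D")
dfDays = pd.DataFrame({"price": np.random.uniform(low=1.03, high=3, size=len(rng))}, index=rng)
dfWeeks = dfDays.resample("W-MON").first()
dfWeeks["target"] = (dfWeeks["price"] + 0.5).round(2)


def find_match(x):
    """Original: first day after the row's label whose price meets its target."""
    match = dfDays.query("index > @x.name & price >= @x.target")
    if not match.empty:
        return match.index[0]
    return None


def find_match_alt(x):
    """Alternative: the same filter spelled out as two boolean masks."""
    date_is_afterwards = dfDays.index > x.name
    price_target_is_met = dfDays["price"] >= x.target
    both_are_true = price_target_is_met & date_is_afterwards
    if both_are_true.any():
        return dfDays[both_are_true].index[0]
    return None


hit_query = dfWeeks.apply(find_match, axis=1)
hit_mask = dfWeeks.apply(find_match_alt, axis=1)
print(hit_query.equals(hit_mask))
```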
