Correct use of map for mapping a function onto a df, python pandas


Question

I've been searching for a while now and can't find anything concrete on this. I'm looking for a best-practice answer. My code works, but I'm not sure whether I'm introducing problems.

# df['Action'] = list(map(my_function, df.param1))  # Works, but older style, I think?
df['Action'] = df['param1'].map(my_function)

Both of these produce the same VISIBLE result. I'm not entirely sure how the first, commented-out line works; it is an example I found on the internet, applied here, and it worked. Most other uses of map I've found look like the 2nd line, where it is called on the Series object.

So, first question: which of these is better practice, and what exactly is the first one doing?

2nd and final question, and the more important of the two: map, apply, applymap. I'm not sure which to use here. The commented-out lines of code below do NOT work, while the last one gives me exactly what I want.

def my_function(param1, param2, param3):
    return param1 * param2 * param3 # example

# Can't get this df.map function to work?
# Error map is not attribute of dataframe
# df['New_Col'] = df.map(my_function, df.param1, df.param1.shift(1), 
#    df.param2.shift(1))

# TypeError: my_function takes 3 positional args, but 4 were given
# df['New_Col'] = df.apply(my_function, args=(df.param1, df.param1.shift(1), 
#    df.param2.shift(1)))

# This works, not sure why
df['New_Col'] = list(map(my_function, df.param1, df.param1.shift(1), 
     df.param2.shift(1)))

I'm trying to compute a result based on two columns of the df, using values from the current and previous rows. I've tried variations on map and apply called directly on the df (df.map, df.apply) without success. But if I use the list(map(...)) notation, it works great.

Is list(map(...)) acceptable? Which is best practice? Is there a correct way to use apply or map directly on the df object?

Thanks, everyone.

MaxU's response below also works. As it stands, both of these work:

df['New_Col'] = list(map(my_function, df.param1, df.param1.shift(1), 
        df.param2.shift(1)))
df['New_Col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))

# This does NOT work
df['New_Col'] = df.apply(my_function, axis=1, args=(df.param1, 
        df.param1.shift(1), df.param2.shift(1)))
# Also does not work:
# AttributeError: ("'float' object has no attribute 'shift'",
#     'occurred at index 2000-01-04 00:00:00')
# Will work if I remove the shift(), but that's not what I need.
df['New_Col'] = df.apply(lambda x: my_function(x.param1, x.param1.shift(1),
    x.param2.shift(1)))    

I'm still unclear on the proper syntax for using apply here, and on whether any of these 3 methods is superior to the others (I'm guessing list(map(...)) is the "worst" of the 3, since it iterates and isn't vectorized).

Recommended answer

So, first question: which of these is better practice, and what exactly is the first one doing?

df['Action'] = df['param1'].map(my_function)

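is much more idiomatic and more reliable: it returns a Series that keeps df's index, whereas list(map(...)) builds a plain Python list that is assigned purely by position. Note that Series.map with a plain Python function still calls that function once per element, so it is not vectorized in the NumPy sense.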
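As a rough illustration of what each form returns, here is a minimal sketch. The sample DataFrame and the one-argument my_function are made up for the example, since the question does not show them.

```python
import pandas as pd

# Hypothetical data and function, not taken from the question.
df = pd.DataFrame({'param1': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])

def my_function(value):
    return value * 10

# Built-in map iterates over the column's values and yields a plain list;
# assigning that list to a column is purely positional.
as_list = list(map(my_function, df.param1))

# Series.map returns a new Series that carries df's index along.
as_series = df['param1'].map(my_function)

df['Action'] = as_series
```

Both variants call my_function once per element, so the difference is mainly about readability and index handling rather than speed.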

2nd and final question, and the more important of the two: map, apply, applymap. I'm not sure which to use here. The commented-out lines of code below do NOT work, while the last one gives me exactly what I want.

When this was written, pandas did NOT have DataFrame.map(), only Series.map(). (pandas 2.1 later added an element-wise DataFrame.map() as the replacement for applymap(); it still receives one value at a time, not several columns.) So if you need to access multiple columns in your mapping function, you can use DataFrame.apply() with axis=1.

Demo:

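# In a row-wise apply each x.param1 is a scalar, so shift the columns first.
# The *_prev column names are illustrative.
shifted = df.assign(param1_prev=df.param1.shift(1),
                    param2_prev=df.param2.shift(1))
df['New_Col'] = shifted.apply(lambda x: my_function(x.param1,
                                                    x.param1_prev,
                                                    x.param2_prev),
                              axis=1)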

Or simply:

df['New_Col'] = my_function(df.param1, df.param1.shift(1), df.param2.shift(1))
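
To check that the approaches discussed here agree, the following sketch runs all three on a small made-up DataFrame. The data and the names prev1, prev2, vectorized, mapped and applied are assumptions for illustration; my_function is the example from the question.

```python
import pandas as pd

# Hypothetical sample data; the question does not show its DataFrame.
df = pd.DataFrame({'param1': [1.0, 2.0, 3.0, 4.0],
                   'param2': [10.0, 20.0, 30.0, 40.0]})

def my_function(param1, param2, param3):
    return param1 * param2 * param3  # example from the question

prev1 = df.param1.shift(1)  # previous row's param1 (NaN in the first row)
prev2 = df.param2.shift(1)  # previous row's param2 (NaN in the first row)

# 1. Whole-column arithmetic: a single call on three Series.
vectorized = my_function(df.param1, prev1, prev2)

# 2. Built-in map: one Python-level call per row, collected into a list.
mapped = pd.Series(list(map(my_function, df.param1, prev1, prev2)),
                   index=df.index)

# 3. Row-wise apply on a frame that already holds the shifted columns.
applied = df.assign(prev1=prev1, prev2=prev2).apply(
    lambda row: my_function(row.param1, row.prev1, row.prev2), axis=1)

df['New_Col'] = vectorized
```

The direct call only works because my_function consists of operations that pandas Series support (here, multiplication). A function with Python-level branching on its arguments would need one of the per-row forms, or a rewrite in terms of column operations.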
