Pandas function: DataFrame.apply() runs top row twice

Question

I have two versions of a function that use pandas (on Python 2.7) to go through inputs.csv row by row.

The first version uses Series.apply() on a single column, and goes through each row as intended.

The second version uses DataFrame.apply() on multiple columns, and for some reason it reads the top row twice. It then goes on to execute the rest of the rows without duplicates.

Any ideas why the latter reads the top row twice?

Version 1 – Series.apply() (reads the top row once)

import pandas as pd
df = pd.read_csv("inputs.csv", delimiter=",")

def v1(x):
    # x is a single value from column X
    y = x
    print(y)
    return pd.Series(y)

df["Y"] = df["X"].apply(v1)


Version 2 – DataFrame.apply() (reads the top row twice)

import pandas as pd
df = pd.read_csv("inputs.csv", delimiter=",")

def v2(f):
    # f is a whole row (a Series indexed by column name)
    y = f["X"]
    print(y)
    return pd.Series(y)

df["Y"] = df[["X", "Z"]].apply(v2, axis=1)


Output of print y in each version:

v1(x):            v2(f):

    Row_1         Row_1
    Row_2         Row_1
    Row_3         Row_2
                  Row_3
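A minimal, self-contained reproduction may make the behaviour easier to check. It assumes a small in-memory DataFrame in place of inputs.csv (the values are made up) and counts how often the function sees each row instead of printing. On pandas versions that evaluate the first row twice, calls["Row_1"] should come out as 2; on versions without that behaviour every count should be 1.

```python
import pandas as pd
from collections import Counter

# Hypothetical stand-in for inputs.csv: three rows with columns X and Z
df = pd.DataFrame({"X": ["Row_1", "Row_2", "Row_3"], "Z": [10, 20, 30]})

calls = Counter()  # number of times the function saw each X value

def v2(f):
    # Side effect, playing the role of the print in the question
    calls[f["X"]] += 1
    return f["X"]

df["Y"] = df[["X", "Z"]].apply(v2, axis=1)

print(pd.__version__)
print(dict(calls))
```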

Answer

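This is by design. DataFrame.apply can return either a reduced Series or a full DataFrame depending on what the function returns, so it needs to know the shape of the returned data to work out how the results should be combined. In the pandas version used here, it calls the function twice on the first row (or column) to decide whether it can take a fast or a slow code path, so any side effects in the function, such as the print in the question, happen twice for the first row. Series.apply does not probe like this, which is why version 1 prints each row only once. The pandas documentation for DataFrame.apply warned about this, and pandas 1.1.0 changed it so that the first row/column is evaluated only once. On older versions, you can avoid the duplicate calls by breaking your function apart: keep the function passed to apply free of side effects and do the printing (or other side-effecting work) separately.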
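A minimal sketch of that split, assuming a small in-memory DataFrame in place of inputs.csv (values are made up) and that the goal is to print each X value exactly once. The function name v2_pure and the list seen are hypothetical, introduced only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for inputs.csv
df = pd.DataFrame({"X": ["Row_1", "Row_2", "Row_3"], "Z": [10, 20, 30]})

def v2_pure(f):
    # Pure function: no printing or other side effects, so an extra
    # evaluation of the first row (older pandas) cannot change anything
    return f["X"]

df["Y"] = df[["X", "Z"]].apply(v2_pure, axis=1)

# Side effects go in an explicit loop, which visits each row exactly once
seen = []
for row in df.itertuples(index=False):
    seen.append(row.X)
    print(row.X, row.Z)
```

For this particular function, which only copies column X, apply is not needed at all: df["Y"] = df["X"] produces the same column without calling a Python function per row.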
