Pandas function: DataFrame.apply() runs top row twice
Problem description
I have two versions of a function that uses Pandas for Python 2.7 to go through inputs.csv, row by row.
The first version uses Series.apply() on a single column, and goes through each row as intended.
The second version uses DataFrame.apply() on multiple columns, and for some reason it reads the top row twice. It then goes on to execute the rest of the rows without duplicates.
Any ideas why the latter reads the top row twice?
Version 1 – Series.apply() (reads the top row once)
import pandas as pd

df = pd.read_csv("inputs.csv", delimiter=",")

def v1(x):
    # x is a single value from column "X"
    y = x
    print(y)  # trace each call
    return pd.Series(y)

df["Y"] = df["X"].apply(v1)
Version 2 – DataFrame.apply() (reads the top row twice)
import pandas as pd

df = pd.read_csv("inputs.csv", delimiter=",")

def v2(f):
    # f is one row holding the "X" and "Z" columns
    y = f["X"]
    print(y)  # trace each call
    return pd.Series(y)

df["Y"] = df[["X", "Z"]].apply(v2, axis=1)
Output of print y:

v1(x):    v2(f):
Row_1     Row_1
Row_2     Row_1
Row_3     Row_2
          Row_3
Recommended answer
This is by design, as described here.
The apply function needs to know the shape of the returned data to figure out how the results will be combined. Apply is a shortcut that intelligently applies aggregate, transform or filter. You can try breaking apart your function to avoid the duplicate calls.
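One possible reading of that advice, offered as a sketch rather than the answer's own snippet: keep the function passed to apply() free of side effects, and do any printing or logging in a separate step. Older pandas documentation notes that apply() may call the function twice on the first row or column to choose a code path, and newer releases (1.1 and later, as far as I know) evaluate it only once. The in-memory DataFrame, the pick_x name, and the calls list below are assumptions for illustration; only the column names X and Z come from the question.

```python
import pandas as pd

# Hypothetical stand-in for inputs.csv; only the column names come from the question.
df = pd.DataFrame({"X": ["Row_1", "Row_2", "Row_3"], "Z": [1, 2, 3]})

calls = []  # diagnostic only: records every invocation of pick_x

def pick_x(row):
    # Recording the call is a side effect, so on older pandas versions
    # the first row may show up twice in this list.
    calls.append(row["X"])
    # The returned value is the same no matter how often the row is visited.
    return row["X"]

# Step 1: build the new column with apply().
df["Y"] = df[["X", "Z"]].apply(pick_x, axis=1)

# Step 2: run side effects outside apply(), exactly once per row.
printed = []
for value in df["Y"]:
    printed.append(value)
    print(value)
```

Here len(calls) can be 3 or 4 depending on the pandas version, while the Y column and the printed values contain each row exactly once either way.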