How to iterate over a pandas df with a def function whose variables change


Question

I hope you can guide me here, because I am a little lost and not very experienced in Python programming.

My goal: I have to calculate the "adducts" for a given "compound". Both are represented by numbers, but for each compound there are 46 different adducts.

Each adduct is calculated as follows:

Adduct 1 = [Exact_mass*M/Charge + Adduct_mass]

where Exact_mass is a number, M and Charge are numbers (1, 2, 3, etc.) that depend on the type of adduct, and Adduct_mass is a number (positive or negative) that also depends on the adduct.
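As a quick sanity check of the formula, here is a minimal sketch that evaluates it for one compound. The helper name `adduct_value` is hypothetical (it is not part of the question's code); the numbers are taken from the sample data given in the question.

```python
def adduct_value(exact_mass, m, charge, adduct_mass):
    """Hypothetical helper: Exact_mass * M / Charge + Adduct_mass."""
    return exact_mass * m / charge + adduct_mass

exact_mass = 316.203834  # C20H28O3 from the compound sample data

# "M+H": M = 1, Charge = 1, Adduct_mass = 1.007276
m_plus_h = adduct_value(exact_mass, 1, 1, 1.007276)

# "2M+H": M = 2, Charge = 1, Adduct_mass = 1.007276
two_m_plus_h = adduct_value(exact_mass, 2, 1, 1.007276)

print(round(m_plus_h, 6), round(two_m_plus_h, 6))
```

Only M, Charge and Adduct_mass change from one adduct to the next, which is why a single function with parameters can stand in for 46 separate ones.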

My data: two data frames. One holds the adduct names, M, Charge and Adduct_mass. The other holds the Compound_name and Exact_mass of the compounds I want to iterate over (I have only included a small data set here).

Adducts: df_al

import pandas as pd 
data = [["M+3H", 3, 1, 1.007276], ["M+3Na", 3, 1, 22.989], ["M+H", 1, 1, 1.007276], ["2M+H", 1, 2, 1.007276], ["M-3H", 3, 1, -1.007276]]
df_al = pd.DataFrame(data, columns=["Ion_name", "Charge", "M", "Adduct_mass"])

Compounds: df

import pandas as pd 
data1 = [[1, "C3H64O7", 596.465179], [2, "C30H42O7", 514.293038], [4, "C44H56O8", 712.397498], [4, "C24H32O6S", 448.191949], [5, "C20H28O3", 316.203834]]
df = pd.DataFrame(data1, columns=["CdId", "Formula", "exact_mass"])

My code:

df_name = df_al["Ion_name"]
df_mass = df_al["Adduct_mass"]  # column name as defined in df_al
df_div = df_al["Charge"]        # the divisor in the formula is the Charge column
df_M = df_al["M"]

Then I defined a function for each ion, using the index to pick each value:

def A0(x):
    return x*df_M[0]/df_div[0] + df_mass[0]

def A1(x):
    return x*df_M[1]/df_div[1] + df_mass[1]

def A2(x):
    return x*df_M[2]/df_div[2] + df_mass[2]

def A3(x):
    return x*df_M[3]/df_div[3] + df_mass[3]

def A4(x):
    return x*df_M[4]/df_div[4] + df_mass[4]

def A5(x): 
    return x*df_M[5]/df_div[5] + df_mass[5]

def A6(x):
    return x*df_M[6]/df_div[6] + df_mass[6]

and so on, up to function A46.

Then I .map each function to each of the compounds and store each value in a new column of the df. (Here is my other problem: how do I put the name of each ion at the top of the column that matches the corresponding function?)

df[df_name.loc[0]] = df["exact_mass"].map(A0)
df[df_name.loc[1]] = df["exact_mass"].map(A1)
df[df_name.loc[2]] = df["exact_mass"].map(A2)
df[df_name.loc[3]] = df["exact_mass"].map(A3)
df[df_name.loc[4]] = df["exact_mass"].map(A4)
df[df_name.loc[5]] = df["exact_mass"].map(A5)
df[df_name.loc[6]] = df["exact_mass"].map(A6)

. . . and so on, until A46 has been applied.

I think there could be a simpler way to define the function so that it changes according to each ion (maybe a for loop?), and also a simpler way to apply the function and get the corresponding name without calling .loc for each one.

Thanks!

Recommended answer

One way is to use functools.partial together with map.
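For readers new to `functools.partial`, the following toy sketch shows what it does: it freezes some arguments of a function and returns a new callable that only needs the remaining ones. The function `scale_and_shift` and its values are made up for illustration and are unrelated to the adduct data.

```python
from functools import partial

def scale_and_shift(x, factor, offset):
    # Toy function with one "data" argument and two fixed parameters.
    return x * factor + offset

# Freeze factor and offset; the result is a one-argument function of x.
double_plus_one = partial(scale_and_shift, factor=2, offset=1)

# map applies the one-argument function to every element.
results = list(map(double_plus_one, [1, 2, 3]))
print(results)  # [3, 5, 7]
```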

Given the regularity of your function calls, I would try something like:

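from functools import partial

def func(x, n):
    # adduct value for exact mass x, using row n of df_al
    return x*df_M[n]/df_div[n] + df_mass[n]

max_i = len(df_al)  # one new column per adduct row; change it if you need fewer
for i in range(max_i):
    # list() is needed in Python 3, where map returns a lazy iterator
    df[df_name.loc[i]] = list(map(partial(func, n=i), df["exact_mass"]))
    # df[df_name.loc[i]] = df["exact_mass"].map(partial(func, n=i)) should work as well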

More information here: https://docs.python.org/3.7/library/functools.html#functools.partial
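As an alternative sketch (not part of the answer above), the same columns can be built without defining any per-ion function, by looping over the rows of the adduct table and letting pandas do the arithmetic on the whole exact_mass column at once. This assumes the column names used in the question's sample data; the frames below are a shortened copy of that sample.

```python
import pandas as pd

# Shortened copy of the question's sample data.
df_al = pd.DataFrame(
    [["M+3H", 3, 1, 1.007276], ["M+H", 1, 1, 1.007276], ["2M+H", 1, 2, 1.007276]],
    columns=["Ion_name", "Charge", "M", "Adduct_mass"],
)
df = pd.DataFrame(
    [[1, "C30H42O7", 514.293038], [2, "C20H28O3", 316.203834]],
    columns=["CdId", "Formula", "exact_mass"],
)

# One pass per adduct row: the ion name becomes the column header,
# and the formula is applied to every compound in a single vectorized step.
for ion in df_al.itertuples(index=False):
    df[ion.Ion_name] = df["exact_mass"] * ion.M / ion.Charge + ion.Adduct_mass

print(df)
```

Because each new column is named directly from Ion_name, this also addresses the question of matching each ion name to its column without calling .loc for every ion.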
