Function for DataFrame operation using variables in the list with Python


Problem description

I have a list, list = ['OUT', 'IN'], where each element is the prefix of a group of variable names in the data frame, with the suffixes _3M, _6M, _9M and _15M attached to it.

List: list = ['OUT', 'IN']

Input_df:

ID  OUT_3M  OUT_6M  OUT_9M  OUT_15M  IN_3M  IN_6M  IN_9M  IN_15M
A   2       3       4       6        2      3      4      6
B   3       3       5       7        3      3      5      7
C   2       3       6       6        2      3      6      6
D   3       3       7       7        3      3      7      7

What I am trying to do is:

1. Subtract OUT_3M from OUT_6M and enter the result in a separate column, Out_3M-6M
2. Subtract OUT_6M from OUT_9M and enter the result in a separate column, Out_6M-9M
3. Subtract OUT_9M from OUT_15M and enter the result in a separate column, Out_9M-15M

The same is repeated for every element in the list, while keeping the OUT_3M and IN_3M columns, as shown in the sample Output_df dataset.

Output_df:

ID  Out_3M  Out_3M-6M  Out_6M-9M  Out_9M-15M  IN_3M  IN_3M-6M  IN_6M-9M  IN_9M-15M
A   2       1          1          2           2      1         1         2
B   3       0          2          2           3      0         2         2
C   2       1          3          0           2      1         3         0
D   3       0          4          0           3      0         4         0

There are many elements in the list that I need to perform this operation on. Is there any way I could solve this by writing a function? Thanks!

Solution

I'm not sure what you mean by writing a function. Aren't a couple of for loops enough for what you want to do? Something like:

# input_df is the DataFrame shown in the question
postfixes = ['3M', '6M', '9M', '15M']
prefixes = ['OUT', 'IN']

# Allocate the space, while also copying the _3M columns
output_df = input_df.copy()

# Rename each later column to its difference name, e.g. OUT_6M -> OUT_3M-6M
output_df.rename(columns={'_'.join((prefix, postfixes[i])): '_'.join((prefix, postfixes[i-1] + '-' + postfixes[i]))
                          for prefix in prefixes for i in range(1, len(postfixes))}, inplace=True)

# Compute the differences (later period minus earlier period, matching Output_df)
for prefix in prefixes:
    for i in range(1, len(postfixes)):
        # Same naming order as in the rename step, so the renamed column is overwritten
        postfix = postfixes[i-1] + '-' + postfixes[i]
        output_df['_'.join((prefix, postfix))] = (input_df['_'.join((prefix, postfixes[i]))].values
                                                  - input_df['_'.join((prefix, postfixes[i-1]))].values)

output_df starts out as a copy of input_df, both to avoid handling the _3M case separately and to pre-allocate the DataFrame instead of creating the columns one at a time (it doesn't matter for your data, but with thousands of columns it would otherwise waste time moving data around in memory).
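Since the question asks for a function, the same loops can be wrapped in one. The following is only a minimal sketch: the function name period_differences and its parameters are made up for illustration, and it assumes every prefix_postfix column exists in the frame. Note that it names the result columns OUT_... rather than Out_..., because it builds them from the prefixes.

```python
import pandas as pd


def period_differences(df, prefixes, postfixes):
    """Return a copy of df in which every later-period column is replaced by
    its difference from the previous period (later minus earlier)."""
    out = df.copy()
    renames = {}
    for prefix in prefixes:
        for prev, cur in zip(postfixes, postfixes[1:]):
            old = f"{prefix}_{cur}"
            # Read from the untouched input so earlier overwrites cannot interfere
            out[old] = df[old] - df[f"{prefix}_{prev}"]
            renames[old] = f"{prefix}_{prev}-{cur}"
    # Renaming in place keeps the original column order
    return out.rename(columns=renames)


# Sample data copied from the question
input_df = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "OUT_3M": [2, 3, 2, 3],
    "OUT_6M": [3, 3, 3, 3],
    "OUT_9M": [4, 5, 6, 7],
    "OUT_15M": [6, 7, 6, 7],
    "IN_3M": [2, 3, 2, 3],
    "IN_6M": [3, 3, 3, 3],
    "IN_9M": [4, 5, 6, 7],
    "IN_15M": [6, 7, 6, 7],
})

output_df = period_differences(input_df, ["OUT", "IN"], ["3M", "6M", "9M", "15M"])
print(output_df)
```

Passing the prefixes and postfixes as arguments means a longer list of variables only changes the call, not the function body.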

Also, you should avoid naming a list "list": the name shadows Python's built-in list, and you're going to get some nasty-to-find bugs along the way, for example when you try to convert a tuple to a list!
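To make the shadowing problem concrete, here is a small illustration. The helper shadowed_conversion is hypothetical and exists only to keep the bad name confined to one function.

```python
prefixes = ['OUT', 'IN']        # a descriptive name leaves the built-in alone
converted = list((1, 2, 3))     # the built-in list still works here


def shadowed_conversion():
    # Inside this function the name "list" now refers to a list object,
    # not to the built-in type, so calling it fails.
    list = ['OUT', 'IN']
    try:
        return list((1, 2, 3))
    except TypeError as exc:
        return f"TypeError: {exc}"


result = shadowed_conversion()
print(converted)
print(result)
```

The error appears at the call site, often far from the assignment that caused it, which is what makes this kind of bug hard to trace.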
