Function for DataFrame operation using variables in the list with Python
I have a list, list = ['OUT', 'IN'], where each element is the prefix of a variable name in the data frame, with the suffixes _3M, _6M, _9M, _15M attached to it.
List:
list = ['OUT', 'IN']
Input_df:
ID OUT_3M OUT_6M OUT_9M OUT_15M IN_3M IN_6M IN_9M IN_15M
A 2 3 4 6 2 3 4 6
B 3 3 5 7 3 3 5 7
C 2 3 6 6 2 3 6 6
D 3 3 7 7 3 3 7 7
What I am trying to do is:
1. Subtract OUT_3M from OUT_6M and enter the result into a separate column, Out_3M-6M
2. Subtract OUT_6M from OUT_9M and enter the result into a separate column, Out_6M-9M
3. Subtract OUT_9M from OUT_15M and enter the result into a separate column, Out_9M-15M
The same is repeated for every element in the list, while keeping the OUT_3M and IN_3M columns, as shown in the sample Output_df dataset.
Output_df:
ID Out_3M Out_3M-6M Out_6M-9M Out_9M-15M IN_3M IN_3M-6M IN_6M-9M IN_9M-15M
A 2 1 1 2 2 1 1 2
B 3 0 2 2 3 0 2 2
C 2 1 3 0 2 1 3 0
D 3 0 4 0 3 0 4 0
There are many elements in the list that I need to perform this operation on. Is there any way I could solve this by writing a function? Thanks!
I'm not sure what you mean by writing a function. Aren't a couple of for loops enough for what you want to do? Something like:
postfixes = ['3M', '6M', '9M', '15M']
prefixes = ['IN', 'OUT']
# Allocate the space, while also copying _3M
output_df = input_df.copy()
# Rename every column except _3M to its difference name, e.g. OUT_6M -> OUT_3M-6M
output_df.rename(columns={'_'.join((prefix, postfixes[i])): '_'.join((prefix, postfixes[i-1] + '-' + postfixes[i]))
                          for prefix in prefixes for i in range(1, len(postfixes))}, inplace=True)
# Compute the differences (later period minus earlier period, as in the sample output)
for prefix in prefixes:
    for i in range(1, len(postfixes)):
        # Same name as the renamed column, so the values are overwritten in place
        postfix = postfixes[i-1] + '-' + postfixes[i]
        output_df['_'.join((prefix, postfix))] = input_df['_'.join((prefix, postfixes[i]))].values - input_df['_'.join((prefix, postfixes[i-1]))].values
output_df starts out as a copy of input_df, both to avoid handling the _3M case separately and to pre-allocate the DataFrame instead of creating the columns one at a time (it doesn't matter for your data, but with thousands of columns, adding them one by one would waste time moving data around in memory).
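Since the question asks for a function, the same idea can be wrapped in one. This is only a minimal sketch, assuming pandas is installed; the name add_period_differences and its parameters are hypothetical and not part of the original answer. It computes the later period minus the earlier one, which is what the sample Output_df shows, and it keeps the prefix as given (so the column is OUT_3M-6M rather than Out_3M-6M).

```python
import pandas as pd


def add_period_differences(df, prefixes, postfixes):
    """Return a copy of df in which, for every prefix, each period column
    after the first is replaced by its difference from the previous period."""
    out = df.copy()  # keeps ID and the first-period columns untouched
    renames = {}
    for prefix in prefixes:
        # Walk over consecutive pairs of periods: (3M, 6M), (6M, 9M), ...
        for earlier, later in zip(postfixes, postfixes[1:]):
            old_name = f"{prefix}_{later}"
            # Always read from the original df so earlier overwrites do not interfere
            out[old_name] = df[old_name] - df[f"{prefix}_{earlier}"]
            renames[old_name] = f"{prefix}_{earlier}-{later}"
    return out.rename(columns=renames)


# Sample data copied from the question
input_df = pd.DataFrame({
    'ID': ['A', 'B', 'C', 'D'],
    'OUT_3M': [2, 3, 2, 3],
    'OUT_6M': [3, 3, 3, 3],
    'OUT_9M': [4, 5, 6, 7],
    'OUT_15M': [6, 7, 6, 7],
    'IN_3M': [2, 3, 2, 3],
    'IN_6M': [3, 3, 3, 3],
    'IN_9M': [4, 5, 6, 7],
    'IN_15M': [6, 7, 6, 7],
})

output_df = add_period_differences(input_df, ['OUT', 'IN'], ['3M', '6M', '9M', '15M'])
print(output_df)
```

Because the function works on a copy and returns it, input_df is left unchanged, and adding more prefixes only means passing a longer list.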
Also, you should avoid naming a list "list": doing so shadows Python's built-in list type, and you will get some hard-to-find bugs later, for example when you try to convert a tuple to a list!
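To make the naming problem concrete, here is a small illustrative sketch (the function name shadowing_demo is made up for this example). The shadowing is kept inside a function so that the built-in list stays usable everywhere else.

```python
def shadowing_demo():
    # Inside this function the name "list" now refers to a list object,
    # not to the built-in list type
    list = ['OUT', 'IN']
    try:
        # Intended as a tuple-to-list conversion, but it tries to call a list object
        return list(('3M', '6M'))
    except TypeError as exc:
        return str(exc)


message = shadowing_demo()
print(message)
```

Using a different name such as prefixes, as the code above does, avoids the problem entirely.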