Iterating over different data frames using an iterator
Question
Suppose I have n data frames df_1, df_2, df_3, ..., df_n, containing respectively columns named SPEED1, SPEED2, SPEED3, ..., SPEEDn, for instance:
import numpy as np
import pandas as pd
df_1 = pd.DataFrame({'SPEED1': np.random.uniform(0, 600, 100)})
df_2 = pd.DataFrame({'SPEED2': np.random.uniform(0, 600, 100)})
and I want to make the same changes to all of the data frames. How do I do so by defining a function along the following lines?
def modify(df,nr):
    df_invalid_nr=df_nr[df_nr['SPEED'+str(nr)]>500]
    df_valid_nr=~df_invalid_nr
    Invalid_cycles_nr=df[df_invalid]
    df=df[df_valid]
    print(Invalid_cycles_nr)
    print(df)
So, when I try to run the above function
modify(df_1,1)
it returns the entire data frame without modification and the invalid cycles as an empty array. I am guessing I need to apply the modification to the global data frame somewhere in the function for this to work.
I am also not sure whether I could do this another way, say by looping an iterator over all the data frames, but I am not sure that would work either.
for i in range(1,n+1):
    df_invalid_i=df_i[df_i['SPEED'+str(i)]>500]
    df_valid_i=~df_invalid_i
    Invalid_cycles_i=df[df_invalid]
    df=df[df_valid]
    print(Invalid_cycles_i)
    print(df)
How do I, in general, access df_1 using an iterator? That seems to be the problem.
Any help would be appreciated, thanks!
Recommended answer
Solution
Input
import pandas as pd
import numpy as np
df_1 = pd.DataFrame({'SPEED1': np.random.uniform(1, 600, 100)})
df_2 = pd.DataFrame({'SPEED2': np.random.uniform(1, 600, 100)})
Code
To my mind, a better approach is to store your DataFrames in a list and enumerate over it, adding a valid column to each one:
for idx, df in enumerate([df_1, df_2]):
    col = 'SPEED'+str(idx+1)
    df['valid'] = df[col] <= 500
print(df_1)
       SPEED1  valid
0  516.395756  False
1   14.643694   True
2  478.085372   True
3  592.831029  False
4    1.431332   True
You can then select rows with df_1[df_1.valid] or df_1[df_1.valid == False] (more idiomatically, df_1[~df_1.valid]).
This solution fits your problem as stated. See "Another solution" for an approach that may be cleaner, and "Notes" below for the explanations you need.
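As a minimal sketch of how the valid column can then be used to split each frame, the snippet below rebuilds the two sample frames and selects the valid and invalid rows. The seeded generator and the names valid_1 and invalid_1 are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, seeded so that the run is reproducible.
rng = np.random.default_rng(0)
df_1 = pd.DataFrame({'SPEED1': rng.uniform(0, 600, 100)})
df_2 = pd.DataFrame({'SPEED2': rng.uniform(0, 600, 100)})

for idx, df in enumerate([df_1, df_2], start=1):
    # Adding a column mutates the object in place,
    # so df_1 and df_2 themselves gain the new column.
    df['valid'] = df['SPEED' + str(idx)] <= 500

valid_1 = df_1[df_1.valid]       # rows with SPEED1 <= 500
invalid_1 = df_1[~df_1.valid]    # rows with SPEED1 > 500
print(len(valid_1), len(invalid_1))
```

Because the loop only adds a column and never rebinds df to a new object, the changes are visible through the original names after the loop ends.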
Another solution
If possible, rethink your code. Each DataFrame has a single speed column, so give it the same name, SPEED, in all of them:
dfs = dict(df_1=pd.DataFrame({'SPEED': np.random.uniform(0, 600, 100)}),
           df_2=pd.DataFrame({'SPEED': np.random.uniform(0, 600, 100)}))
This allows you to use the following one-liner:
dfs = dict(map(lambda key_val: (key_val[0],
                                key_val[1].assign(valid=key_val[1]['SPEED'] <= 500)),
               dfs.items()))
print(dfs['df_1'])
        SPEED  valid
0  516.395756  False
1   14.643694   True
2  478.085372   True
3  592.831029  False
4    1.431332   True
Explanations:
- dfs.items() returns the key/value pairs of the dictionary (the keys are the names, the values are the DataFrames).
- map(foo, bar) applies the function foo (see this answer, and DataFrame assign) to all the elements of bar (i.e. to all the key/value pairs of dfs.items()).
- dict() casts the map to a dict.
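The same steps can also be written as a dict comprehension, which some readers find easier to follow than map with a lambda. This is only a sketch of an equivalent form, not the answer's own code; the seeded sample data is an assumption for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, seeded so that the run is reproducible.
rng = np.random.default_rng(0)
dfs = {
    'df_1': pd.DataFrame({'SPEED': rng.uniform(0, 600, 100)}),
    'df_2': pd.DataFrame({'SPEED': rng.uniform(0, 600, 100)}),
}

# assign() returns a new DataFrame rather than modifying the old one,
# so the dictionary is rebuilt with the augmented frames.
dfs = {name: df.assign(valid=df['SPEED'] <= 500) for name, df in dfs.items()}
print(dfs['df_1'].head())
```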
Notes
Notice that your function modify does not return anything. I suggest reading more about mutability and immutability in Python. This article is interesting.
You can then test the following, for instance:
def modify(df):
    df = df[df.SPEED1 < 0.5]
    # Rebinding df only affects the name inside the function;
    # it does not modify your input, so return the new df...
    return df
# ... and assign the output to apply the change
df_1 = modify(df_1)
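Applied to the function from the question, the same idea could look like the sketch below: build a boolean mask once, then return both the valid and the invalid rows and assign them at the call site. The name split_by_speed, the limit parameter and the four sample values are assumptions chosen for illustration.

```python
import pandas as pd

def split_by_speed(df, nr, limit=500):
    """Return (valid_rows, invalid_rows) of df based on column 'SPEED<nr>'."""
    is_invalid = df['SPEED' + str(nr)] > limit   # boolean mask, one entry per row
    return df[~is_invalid], df[is_invalid]

# Hypothetical small frame so that the result is easy to check by eye.
df_1 = pd.DataFrame({'SPEED1': [10.0, 510.0, 499.9, 600.0]})

# Assign the returned frames; nothing changes unless the result is stored.
df_1, invalid_cycles_1 = split_by_speed(df_1, 1)
print(df_1)
print(invalid_cycles_1)
```

Note that the mask (a boolean Series) is what gets negated with ~, not a filtered DataFrame as in the question's code.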
About accessing df_1 using an iterator
Notice that when you do:
for i in range(1,n+1):
    df_i  # ... do something
df_i in your loop refers to an object literally named df_i on every iteration (and not to df_1, df_2, etc.).
To look up an object by its name, use globals()['df_'+str(i)] instead (assuming that df_1 to df_n are located in globals()), as suggested in this answer.
To my mind, looking variables up in globals() is not a clean approach. I don't know how you create your DataFrames, but if possible I suggest storing them in a dictionary instead of assigning each one to its own variable by hand:
dfs = {}
dfs['df_1'] = ...
or, a bit more automatically if df_1 to df_n already exist (following the first part of vestland's answer):
dfs = dict((var, eval(var))
           for var in dir()
           if isinstance(eval(var), pd.DataFrame) and 'df_' in var)
It is then easier to iterate over your DataFrames:
for i in range(1,n+1):
    dfs['df_'+str(i)]  # ... do something
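To make the loop concrete, here is a minimal sketch that builds such a dictionary and then filters every frame inside the loop. The sample values, n = 3 and the invalid_cycles dictionary are assumptions for illustration only.

```python
import pandas as pd

n = 3

# Hypothetical frames: df_i holds a single column named SPEEDi.
dfs = {}
for i in range(1, n + 1):
    dfs['df_' + str(i)] = pd.DataFrame({'SPEED' + str(i): [100.0 * i, 450.0 + 30.0 * i]})

invalid_cycles = {}
for i in range(1, n + 1):
    name = 'df_' + str(i)
    too_fast = dfs[name]['SPEED' + str(i)] > 500   # boolean mask
    invalid_cycles[name] = dfs[name][too_fast]     # keep the rejected rows
    dfs[name] = dfs[name][~too_fast]               # store the filtered frame back

print(invalid_cycles['df_2'])
```

Writing the filtered frame back into the dictionary is what makes the change persist, which is the step the loop in the question was missing.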