Iterating over different data frames using an iterator


Question

Suppose I have n data frames df_1, df_2, df_3, ..., df_n, containing respectively columns named SPEED1, SPEED2, SPEED3, ..., SPEEDn, for instance:

import numpy as np
import pandas as pd
df_1 = pd.DataFrame({'SPEED1':np.random.uniform(0,600,100)})
df_2 = pd.DataFrame({'SPEED2':np.random.uniform(0,600,100)})

and I want to make the same changes to all of the data frames. How do I do so by defining a function along the following lines?

def modify(df,nr):
    df_invalid_nr=df_nr[df_nr['SPEED'+str(nr)]>500]
    df_valid_nr=~df_invalid_nr
    Invalid_cycles_nr=df[df_invalid]
    df=df[df_valid]
    print(Invalid_cycles_nr)
    print(df)

So, when I try to run the above function:

modify(df_1,1)

it returns the entire data frame without modification, and the invalid cycles as an empty array. I am guessing I need to apply the modification to the global data frame somewhere in the function for this to work.

I am also not sure whether I could do this another way, say by looping an iterator through all the data frames, but I am not sure it will work:

for i in range(1,n+1):
    df_invalid_i=df_i[df_i['SPEED'+str(i)]>500]
    df_valid_i=~df_invalid_i
    Invalid_cycles_i=df[df_invalid]
    df=df[df_valid]
    print(Invalid_cycles_i)
    print(df)

How do I, in general, access df_1 using an iterator? That seems to be the problem.

Any help would be appreciated, thanks!

Recommended answer

Solution

Input

import pandas as pd
import numpy as np

# The closing brace of each dict literal was missing in the original
df_1 = pd.DataFrame({'SPEED1':np.random.uniform(1,600,100)})
df_2 = pd.DataFrame({'SPEED2':np.random.uniform(1,600,100)})

Code

To my mind, a better approach is to store your DataFrames in a list and enumerate over it, adding a boolean valid column to each one:

for idx, df in enumerate([df_1, df_2]):
    col = 'SPEED'+str(idx+1)
    df['valid'] = df[col] <= 500

print(df_1.head())

       SPEED1  valid
0  516.395756  False
1   14.643694   True
2  478.085372   True
3  592.831029  False
4    1.431332   True

You can then select the valid rows with df_1[df_1.valid] and the invalid rows with df_1[df_1.valid == False].
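As a minimal sketch of that selection step (the seed and the names valid_rows and invalid_rows are assumptions made for illustration, not part of the original answer), the boolean column can be used directly as a mask, with ~ negating it:

```python
import numpy as np
import pandas as pd

# Reproducible sample data; the seed is an arbitrary choice for this sketch.
rng = np.random.default_rng(0)
df_1 = pd.DataFrame({'SPEED1': rng.uniform(0, 600, 100)})
df_1['valid'] = df_1['SPEED1'] <= 500

valid_rows = df_1[df_1['valid']]      # rows with SPEED1 <= 500
invalid_rows = df_1[~df_1['valid']]   # rows with SPEED1 > 500

print(len(valid_rows), len(invalid_rows))
```

Writing ~df_1['valid'] is equivalent to df_1.valid == False here and is the more common spelling in pandas code.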

This solution fits your problem as stated. See "Another solution" for an approach that may be cleaner, and "Notes" below for the explanations you need.

Another solution

If possible, rethink your code. Each DataFrame has a single speed column, so give it the same name, SPEED, in all of them:

dfs = dict(df_1=pd.DataFrame({'SPEED':np.random.uniform(0,600,100)}),
           df_2=pd.DataFrame({'SPEED':np.random.uniform(0,600,100)}))

This allows you to write the following one-liner:

dfs = dict(map(lambda key_val: (key_val[0],
                                key_val[1].assign(valid = key_val[1]['SPEED'] <= 500)),
               dfs.items()))

print(dfs['df_1'].head())

        SPEED  valid
0  516.395756  False
1   14.643694   True
2  478.085372   True
3  592.831029  False
4    1.431332   True

Explanations:

  • dfs.items() returns the (key, value) pairs of the dictionary, i.e. the names and the DataFrames.
  • map(foo, bar) applies the function foo (see this answer, and DataFrame.assign) to every element of bar, i.e. to every key/value pair of dfs.items().
  • dict() converts the result of map back into a dict.
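The same transformation can also be written as a dict comprehension, which some readers find easier to follow than map with a lambda. This is only a sketch of an equivalent spelling, assuming the same dfs layout as above; it is not taken from the original answer:

```python
import numpy as np
import pandas as pd

dfs = {
    'df_1': pd.DataFrame({'SPEED': np.random.uniform(0, 600, 100)}),
    'df_2': pd.DataFrame({'SPEED': np.random.uniform(0, 600, 100)}),
}

# assign() returns a new DataFrame with the extra column,
# so each name is rebound to the augmented copy.
dfs = {name: df.assign(valid=df['SPEED'] <= 500) for name, df in dfs.items()}

print(dfs['df_1'].head())
```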

Notes

Notice that your function modify does not return anything. I suggest reading more about mutability and immutability in Python. This article is interesting.

You can then test the following, for instance:

def modify(df):
    df = df[df.SPEED1 <= 500]
    # Rebinding df only affects the name inside the function;
    # the caller's DataFrame is unchanged, so return the result...
    return df

# ... and assign the output back to apply the change
df_1 = modify(df_1)
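To get closer to what the question asks for (both the filtered frame and the invalid cycles), the function can return two DataFrames. The sketch below is one possible shape; the name split_by_speed and the limit parameter are hypothetical and chosen only for illustration:

```python
import numpy as np
import pandas as pd

def split_by_speed(df, nr, limit=500):
    """Return (valid, invalid) DataFrames, split on column 'SPEED<nr>'."""
    col = 'SPEED' + str(nr)
    is_invalid = df[col] > limit
    # Boolean indexing builds new DataFrames; the input df is left untouched.
    return df[~is_invalid], df[is_invalid]

df_1 = pd.DataFrame({'SPEED1': np.random.uniform(0, 600, 100)})

valid_1, invalid_1 = split_by_speed(df_1, 1)
print(len(df_1), len(valid_1), len(invalid_1))
```

Because nothing is modified in place, the caller decides whether to rebind df_1 to valid_1 or keep the original around.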

About accessing df_1 using an iterator

Notice that when you do:

for i in range(1,n+1):
    df_i something

df_i in your loop refers to a variable literally named df_i on every iteration (and not to df_1, df_2, etc.). To look up an object by its name, use globals()['df_'+str(i)] instead (assuming that df_1 to df_n are located in globals()), as described in this answer.

To my mind this is not a clean approach. I don't know how you create your DataFrames, but if possible I suggest storing them in a dictionary instead of assigning each one to its own variable by hand:

dfs = {}
dfs['df_1'] = ...

or, a bit more automatically if df_1 to df_n already exist, following the first part of vestland's answer:

dfs = dict((var, eval(var)) for
           var in dir() if
           isinstance(eval(var), pd.core.frame.DataFrame) and 'df_' in var)

It is then easier to iterate over your DataFrames:

for i in range(1,n+1):
    df = dfs['df_'+str(i)]  # do something with this DataFrame
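Putting the pieces together, a minimal sketch of the whole workflow might look like the following. It assumes the DataFrames are created directly in a dictionary with a shared SPEED column; the value of n and the invalid_cycles dictionary are illustrative choices, not part of the original answer:

```python
import numpy as np
import pandas as pd

n = 3
# Build the DataFrames directly in a dictionary keyed 'df_1', 'df_2', ...
dfs = {'df_' + str(i): pd.DataFrame({'SPEED': np.random.uniform(0, 600, 100)})
       for i in range(1, n + 1)}

invalid_cycles = {}
for name, df in dfs.items():
    invalid_cycles[name] = df[df['SPEED'] > 500]   # rows to report
    dfs[name] = df[df['SPEED'] <= 500]             # keep only valid rows

for name in dfs:
    print(name, len(dfs[name]), len(invalid_cycles[name]))
```

Reassigning existing keys while looping over dfs.items() is safe because the set of keys does not change.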
