Iterating over different data frames using an iterator

Views: 89 Published: 2020/5/4 5:41:38 Tags: python pandas loops dataframe

Problem description

Suppose I have n data frames df_1, df_2, df_3, ..., df_n, which contain columns named SPEED1, SPEED2, SPEED3, ..., SPEEDn respectively. For instance:

import numpy as np
import pandas as pd
df_1 = pd.DataFrame({'SPEED1':np.random.uniform(0,600,100)})
df_2 = pd.DataFrame({'SPEED2':np.random.uniform(0,600,100)})

I want to make the same changes to all of the data frames. How can I do that by defining a function along these lines?

def modify(df,nr):
    df_invalid_nr=df_nr[df_nr['SPEED'+str(nr)]>500]
    df_valid_nr=~df_invalid_nr
    Invalid_cycles_nr=df[df_invalid]
    df=df[df_valid]
    print(Invalid_cycles_nr)
    print(df)

So, when I try to run the above function:

modify(df_1,1)

It returns the entire data frame unmodified, and the invalid cycles as an empty array. I am guessing that I need to apply the modification to the global data frame somewhere in the function for this to work.

I am also not sure whether I could do this another way, say by looping an iterator over all the data frames. I am not sure that would work either.

for i in range(1,n+1):
    df_invalid_i=df_i[df_i['SPEED'+str(i)]>500]
    df_valid_i=~df_invalid_i
    Invalid_cycles_i=df[df_invalid]
    df=df[df_valid]
    print(Invalid_cycles_i)
    print(df)

How do I, in general, access df_1 using an iterator? That seems to be the problem.

Any help would be appreciated, thanks!

Solution

Input

import pandas as pd
import numpy as np 

df_1 = pd.DataFrame({'SPEED1':np.random.uniform(1,600,100)})
df_2 = pd.DataFrame({'SPEED2':np.random.uniform(1,600,100)})

Code

To my mind, a better approach is to store your data frames in a list and enumerate over it, adding a valid column to each one:

for idx, df in enumerate([df_1, df_2]):
    col = 'SPEED'+str(idx+1)
    df['valid'] = df[col] <= 500

print(df_1.head())

       SPEED1  valid
0  516.395756  False
1   14.643694   True
2  478.085372   True
3  592.831029  False
4    1.431332   True

You can then use df_1[df_1.valid] or df_1[df_1.valid == False] to select the valid or the invalid rows.
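
As a small illustration of that selection step, the sketch below rebuilds one frame with fixed values (the numbers are made up for the example, not taken from the question) and splits it on the valid column. The names valid_cycles and invalid_cycles are only illustrative.

```python
import pandas as pd

# Hypothetical fixed values so the result is reproducible
df_1 = pd.DataFrame({'SPEED1': [516.4, 14.6, 478.1, 592.8, 1.4]})
df_1['valid'] = df_1['SPEED1'] <= 500

valid_cycles = df_1[df_1['valid']]      # rows with SPEED1 <= 500
invalid_cycles = df_1[~df_1['valid']]   # rows with SPEED1 > 500

print(valid_cycles)
print(invalid_cycles)
```

Here ~df_1['valid'] is the usual way to negate a boolean Series, and it selects the same rows as df_1.valid == False.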

This solution fits your problem as posed. See the other solution below, which may be cleaner, and the notes further down for the explanations you need.

If possible, rethink your code. Each DataFrame has a single speed column, so name it SPEED in all of them:

dfs = dict(df_1=pd.DataFrame({'SPEED':np.random.uniform(0,600,100)}),
           df_2=pd.DataFrame({'SPEED':np.random.uniform(0,600,100)}))

That lets you write the following one-liner:

dfs = dict(map(lambda key_val: (key_val[0],
                                key_val[1].assign(valid = key_val[1]['SPEED'] <= 500)),
               dfs.items()))

print(dfs['df_1'])

        SPEED  valid
0  516.395756  False
1   14.643694   True
2  478.085372   True
3  592.831029  False
4    1.431332   True

Explanations:

dfs.items() returns the key/value pairs of the dictionary, i.e. the names and the DataFrames.
map(foo, bar) applies the function foo (see this answer, and DataFrame assign) to every element of bar, i.e. to every key/value pair of dfs.items().
dict() converts the result of map back into a dict.
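
For comparison, the same transformation can probably be written more readably as a dict comprehension. This is a sketch of an equivalent form, not part of the original answer. It assumes the dfs dictionary defined above, with a SPEED column in every frame.

```python
import numpy as np
import pandas as pd

dfs = {
    'df_1': pd.DataFrame({'SPEED': np.random.uniform(0, 600, 100)}),
    'df_2': pd.DataFrame({'SPEED': np.random.uniform(0, 600, 100)}),
}

# One new frame per key, each with an added boolean valid column;
# assign returns a copy, so the original frames are left untouched
dfs = {name: df.assign(valid=df['SPEED'] <= 500) for name, df in dfs.items()}

print(dfs['df_1'].head())
```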

Notice that your function modify does not return anything. I suggest reading more about mutability and immutability in Python. This article is interesting.

You can then test the following, for instance:

def modify(df):
    df=df[df.SPEED1<0.5]
    # Rebinding df only changes the local name inside the function;
    # it does not modify your input, so return the new df...
    return df

# ... and assign the output back to apply the change
df_1 = modify(df_1)
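
Applied to the original question, the same idea might look like the sketch below. The function name split_by_speed and its limit parameter are hypothetical, and the sample values are invented so that the result is easy to check. The function returns both the valid and the invalid rows, and the caller assigns them.

```python
import pandas as pd

def split_by_speed(df, nr, limit=500):
    """Return (valid_rows, invalid_rows) for the column SPEED<nr>."""
    mask = df['SPEED' + str(nr)] <= limit
    return df[mask], df[~mask]

# Hypothetical fixed values instead of random ones
df_1 = pd.DataFrame({'SPEED1': [516.4, 14.6, 478.1, 592.8, 1.4]})

# Rebind df_1 to the filtered frame and keep the rejected rows separately
df_1, invalid_cycles_1 = split_by_speed(df_1, 1)

print(invalid_cycles_1)
print(df_1)
```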

About accessing `df_1` using an iterator

Notice that when you write:

for i in range(1,n+1):
    df_i something

df_i in your loop refers to an object literally named df_i on every iteration (and not df_1, df_2, etc.). To look an object up by its name, use globals()['df_'+str(i)] instead (assuming that df_1 to df_n are located in globals()) - from this answer.

To my mind this is not a clean approach. I don't know how you create your DataFrames, but if possible I suggest storing them in a dictionary instead of assigning each one to its own variable by hand:

dfs = {}
dfs['df_1'] = ...

or, a bit more automatically if df_1 to df_n already exist - according to the first part of vestland's answer:

dfs = dict((var, eval(var)) for
           var in dir() if
           'df_' in var and isinstance(eval(var), pd.DataFrame))

It is then easier to iterate over your DataFrames:

for i in range(1,n+1):
    dfs['df_'+str(i)] something
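
A runnable version of that loop could look like the following sketch. It assumes the frames keep their numbered SPEED1, SPEED2, ... columns as in the question, and it uses small made-up frames in place of the real data.

```python
import pandas as pd

# Hypothetical frames keyed by name, each with its own SPEED<i> column
dfs = {
    'df_1': pd.DataFrame({'SPEED1': [516.4, 14.6]}),
    'df_2': pd.DataFrame({'SPEED2': [100.0, 592.8]}),
}
n = len(dfs)

for i in range(1, n + 1):
    df = dfs['df_' + str(i)]                   # look the frame up by its key
    df['valid'] = df['SPEED' + str(i)] <= 500  # adds the column in place

print(dfs['df_1'])
print(dfs['df_2'])
```

Because the column assignment changes the frame in place, the frames stored in dfs are updated without any reassignment.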
