Looping through a list of pandas dataframes


Question

Two quick pandas questions.

  1. I have a list of dataframes I would like to apply a filter to.

countries = [us, uk, france]
for df in countries:
    df = df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')] 

When I run this, the df's don't change afterwards. Why is that? If I loop through the dataframes to create a new column, as below, this works fine, and changes each df in the list.

for df in countries:
    df["Continent"] = "Europe"

  • As a follow up question, I noticed something strange when I created a list of dataframes for different countries. I defined the list then applied transformations to each df in the list. After I transformed these different dfs, I called the list again. I was surprised to see that the list still pointed to the unchanged dataframes, and I had to redefine the list to update the results. Could anybody shed any light on why that is?

    Recommended answer

    Taking a look at this answer, you can see that for df in countries: is equivalent to something like

    for idx in range(len(countries)):
        df = countries[idx]
        # do something with df
    

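    which won't actually modify anything in your list: the assignment df = ... only rebinds the loop variable to a new, filtered DataFrame and leaves the object stored in the list untouched. By contrast, df["Continent"] = "Europe" mutates the existing object in place, which is why that change shows up in every dataframe in the list. It is generally bad practice to modify a list while iterating over it in a loop like this.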
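To make the difference between rebinding and in-place mutation concrete, here is a minimal sketch. The sample frames and their dates are made up for illustration and are not the asker's data; the last loop shows one possible way to store the filtered result back into the list by index.

```python
import pandas as pd

# Hypothetical sample data, two rows per country (not from the original question).
us = pd.DataFrame({"Send Date": ["2016-10-15", "2016-11-10"]})
uk = pd.DataFrame({"Send Date": ["2016-11-20", "2016-12-05"]})
countries = [us, uk]

# Rebinding: `df = ...` makes the name `df` point at a new filtered object.
# The list still holds the original, unfiltered dataframes.
for df in countries:
    df = df[(df["Send Date"] > "2016-11-01") & (df["Send Date"] < "2016-11-30")]
rows_after_rebinding = [len(d) for d in countries]

# Mutation: column assignment changes the existing object in place,
# so the change is visible through the list (and through `us` / `uk`).
for df in countries:
    df["Continent"] = "Europe"
has_continent = ["Continent" in d.columns for d in countries]

# Assigning back by index replaces the list element itself.
for idx, df in enumerate(countries):
    countries[idx] = df[(df["Send Date"] > "2016-11-01") & (df["Send Date"] < "2016-11-30")]
rows_after_index_assignment = [len(d) for d in countries]
```

Replacing elements by index does not change the length of the list, so it is safe inside the loop. Note that the names us and uk still refer to the unfiltered frames afterwards.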

    A better approach would be a list comprehension. You can try something like

     countries = [us, uk, france]
     countries = [df[(df["Send Date"] > '2016-11-01') & (df["Send Date"] < '2016-11-30')]
                  for df in countries] 
    

    Notice that with a list comprehension like this, we aren't actually modifying the original list - instead we are creating a new list, and assigning it to the variable which held our original list.
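The same point explains the follow-up question: a list stores references to objects, not to variable names, so rebinding either side does not update the other. The sketch below uses made-up sample frames and a hypothetical helper, in_november, to show this; unpacking the new list back into the names is one possible way to keep them in sync.

```python
import pandas as pd

def in_november(df):
    """Hypothetical helper: keep rows whose Send Date lies strictly inside the window."""
    return df[(df["Send Date"] > "2016-11-01") & (df["Send Date"] < "2016-11-30")]

# Hypothetical sample data (not from the original question).
us = pd.DataFrame({"Send Date": ["2016-10-15", "2016-11-10"]})
uk = pd.DataFrame({"Send Date": ["2016-11-20", "2016-12-05"]})
france = pd.DataFrame({"Send Date": ["2016-11-05", "2016-11-25"]})

countries = [us, uk, france]
original_list = countries  # second name for the same list object

# The comprehension builds a new list; `countries` is rebound to it.
countries = [in_november(df) for df in countries]

# The old list and the names us/uk/france still refer to the unfiltered frames.
us_rows_before_unpacking = len(us)

# Rebind the individual names to the filtered frames if they are needed.
us, uk, france = countries
```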

    Also, you might consider placing all of your data in a single DataFrame with an additional country column or something along those lines - Python-level loops are generally slower and a list of DataFrames is often much less convenient to work with than a single DataFrame, which can fully leverage the vectorized pandas methods.
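As a rough sketch of the single-DataFrame suggestion, the frames can be combined with pd.concat and filtered once. The sample data and the column name Country are assumptions for illustration; converting Send Date to datetime is an extra step not shown in the question.

```python
import pandas as pd

# Hypothetical sample data (not from the original question).
us = pd.DataFrame({"Send Date": ["2016-10-15", "2016-11-10"]})
uk = pd.DataFrame({"Send Date": ["2016-11-20", "2016-12-05"]})
france = pd.DataFrame({"Send Date": ["2016-11-05", "2016-11-25"]})

frames = {"us": us, "uk": uk, "france": france}

# Tag each frame with its country, then stack them into one DataFrame.
combined = pd.concat(
    [df.assign(Country=name) for name, df in frames.items()],
    ignore_index=True,
)

# Parse the dates so the comparison is on timestamps rather than strings.
combined["Send Date"] = pd.to_datetime(combined["Send Date"])

# One vectorized filter covers every country at once.
start, end = pd.Timestamp("2016-11-01"), pd.Timestamp("2016-11-30")
november = combined[(combined["Send Date"] > start) & (combined["Send Date"] < end)]

# Per-country results remain available through groupby.
counts = november.groupby("Country").size()
```

With this layout, adding the Continent column or any other transformation is a single assignment on combined instead of a loop over a list.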
