Iterate through a list of DataFrames to drop particular rows in Pandas


Question

In my earlier question, I asked how to drop particular rows in Pandas.

With help, I was able to drop the rows before 1980. The 'Season' column (which holds the years) was in this format:

 2018-19
 2017-18
 This
 list would go
 till 1960

In the earlier (linked) question, @jezrael gave a solution that helped me drop the rows before 1980.

I have a list (called list) that holds 30 DataFrames. I want to iterate through the 30 DataFrames and drop all rows before 1980 in every one of them. For instance, one of the items in list is BOS. If BOS['Season'] has:

 2018-19
 2017-18
 1959-1960

I should get:

2018-19
2017-18

for every DataFrame in list.

This is what I tried, but I either got errors or nothing happened:

for df in list:
   df = df[df['Season'].str.split('-').str[0].astype(int) > 1980]

What's wrong with my code? I am new to Python. I thought that assigning the filtered result to df would apply the change to every df in the list.

Thanks!

UPDATE: I have a list named league. This list has 30 DataFrames. I looked at jezrael's and IMCoin's solutions, and both of them worked. But here is my requirement.

After dropping the rows before 1980 from every DataFrame, I want to be able to use each DataFrame directly, not through the list. Here is what I mean.


# Before appending to the list
BOS = pd.read_csv(dir+"Boston_Sheet")
# There are 30 cities, each with its own CSV file and its own DataFrame:
# Boston as `BOS`, Chicago as `CHI`, and so on.

All 30 of those city DataFrames have already been appended to the list league. After filtering each city DataFrame by the condition above, I want to be able to call BOS or CHI and get the filtered data. This is just to make it easier for me to develop other functions down the line.

Recommended answer

You need to create a new list of filtered DataFrames or reassign the old one:

Note: don't name a variable list, because that shadows the Python builtin of the same name.

L = [df[df['Season'].str.split('-').str[0].astype(int) > 1980] for df in L]

Loop version:

output = []
for df in L:
   df = df[df['Season'].str.split('-').str[0].astype(int) > 1980]
   output.append(df)
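
The loop in the question had no visible effect because `df = ...` only rebinds the loop variable to a new, filtered DataFrame; the list keeps pointing at the unfiltered objects. A minimal sketch of writing the result back by position instead, assuming a small made-up list `L` (the sample frames are illustrative, not the asker's data):

```python
import pandas as pd

# Hypothetical sample data standing in for the 30 city DataFrames.
L = [
    pd.DataFrame({"Season": ["2018-19", "2017-18", "1959-60"]}),
    pd.DataFrame({"Season": ["1985-86", "1975-76"]}),
]

# Rebinding the loop variable leaves the list elements untouched.
for df in L:
    df = df[df["Season"].str.split("-").str[0].astype(int) > 1980]
lengths_after_rebinding = [len(df) for df in L]

# Assigning by index replaces each list element with its filtered copy.
for i, df in enumerate(L):
    L[i] = df[df["Season"].str.split("-").str[0].astype(int) > 1980]
lengths_after_assignment = [len(df) for df in L]
```

After the first loop the frames still have 3 and 2 rows; after the second they have 2 and 1.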

If you only need to extract the first 4-digit integer:

L = [df, df]
L = [df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980] 
          for df in L]

print (L)
[    Season
0  2018-19
1  2017-18,     Season
0  2018-19
1  2017-18]
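
A short illustration of why the `extract` variant casts to float rather than int: rows without a 4-digit number come back as NaN, which cannot be cast to int, and a comparison with NaN is False, so those rows drop out. The sample column reuses the placeholder rows from the question:

```python
import pandas as pd

season = pd.Series(["2018-19", "2017-18", "This", "list would go", "till 1960"])

# First run of four digits per row; NaN where nothing matches.
years = season.str.extract(r"(\d{4})", expand=False).astype(float)

# NaN > 1980 evaluates to False, so non-matching rows are filtered out.
mask = years > 1980
kept = season[mask].tolist()
```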

If the data have the same structure, I suggest creating one big DataFrame with a new column to distinguish the cities:

import glob
import os
import pandas as pd

files = glob.glob('files/*.csv')
dfs = [pd.read_csv(fp).assign(City=os.path.basename(fp).split('.')[0]) for fp in files]
df = pd.concat(dfs, ignore_index=True)
print (df)
          Season           City
0        2018-19   Boston_Sheet
1           This   Boston_Sheet
2  list would go   Boston_Sheet
3      till 1960   Boston_Sheet
4        2018-19  Chicago_Sheet
5        2017-18  Chicago_Sheet
6           This  Chicago_Sheet

df1 = df[df['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980]
print (df1)
     Season           City
0   2018-19   Boston_Sheet
4   2018-19  Chicago_Sheet
5   2017-18  Chicago_Sheet

df2 = df1[df1['City'] == 'Boston_Sheet']
print (df2)
    Season          City
0  2018-19  Boston_Sheet

df3 = df1[df1['City'] == 'Chicago_Sheet']
print (df3)
     Season           City
4   2018-19  Chicago_Sheet
5   2017-18  Chicago_Sheet
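
If every city is needed rather than one or two, the repeated boolean masks could be replaced by a single `groupby`. A sketch, assuming a combined frame shaped like `df1` above (rebuilt inline here so the snippet runs on its own):

```python
import pandas as pd

# Stand-in for the filtered, combined DataFrame `df1`.
df1 = pd.DataFrame(
    {
        "Season": ["2018-19", "2018-19", "2017-18"],
        "City": ["Boston_Sheet", "Chicago_Sheet", "Chicago_Sheet"],
    },
    index=[0, 4, 5],
)

# One sub-DataFrame per city, keyed by the value in the City column.
by_city = {city: group for city, group in df1.groupby("City")}

boston = by_city["Boston_Sheet"]
chicago = by_city["Chicago_Sheet"]
```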


If you need each DataFrame separately, you can use a dictionary of DataFrames:

import glob
import os
import pandas as pd

files = glob.glob('files/*.csv')
dfs_dict = {os.path.basename(fp).split('.')[0] : pd.read_csv(fp) for fp in files}

print (dfs_dict)

print (dfs_dict['Boston_Sheet'])
          Season
0        2018-19
1           This
2  list would go
3      till 1960

print (dfs_dict['Chicago_Sheet'])
     Season
0   2018-19
1   2017-18
2      This

Then process it in a dictionary comprehension:

dfs_dict = {k:v[v['Season'].str.extract(r'(\d{4})', expand=False).astype(float) > 1980] 
                 for k, v in dfs_dict.items()}
print (dfs_dict)
{'Boston_Sheet':     Season
0  2018-19, 'Chicago_Sheet':      Season
0   2018-19
1   2017-18}

print (dfs_dict['Boston_Sheet'])
    Season
0  2018-19

print (dfs_dict['Chicago_Sheet'])
     Season
0   2018-19
1   2017-18
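
To use `BOS` or `CHI` directly, as the update asks, one option is to bind those names from the dictionary after it has been filtered. A sketch under stated assumptions: the keys match the file names used above, the frames are rebuilt inline so the example is self-contained, and `after_1980` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

# Stand-in for the unfiltered dictionary built from the CSV files.
dfs_dict = {
    "Boston_Sheet": pd.DataFrame(
        {"Season": ["2018-19", "This", "list would go", "till 1960"]}
    ),
    "Chicago_Sheet": pd.DataFrame({"Season": ["2018-19", "2017-18", "This"]}),
}


def after_1980(df):
    """Hypothetical helper: keep rows whose first 4-digit number exceeds 1980."""
    years = df["Season"].str.extract(r"(\d{4})", expand=False).astype(float)
    return df[years > 1980]


dfs_dict = {name: after_1980(df) for name, df in dfs_dict.items()}

# Bind the short names to the filtered frames, after the filtering step.
BOS = dfs_dict["Boston_Sheet"]
CHI = dfs_dict["Chicago_Sheet"]
```

Names bound before the filter keep pointing at the old, unfiltered objects, so the assignment has to come after the dictionary comprehension.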
