Iterate over different dataframes

Question

I am trying to compare three data frames to find the differences between them. I have a master data frame that contains everything, and two other data frames that each contain part of the master data frame. I am trying to write Python code to identify what is missing from the other two. The master data frame looks like this:

ID  Name
1   Mike
2   Dani
3   Scott
4   Josh
5   Nate
6   Sandy

The second data frame looks like this:

ID  Name
1   Mike
2   Dani
3   Scott
6   Sandy

The third data frame looks like this:

ID  Name
1   Mike
2   Dani
3   Scott
4   Josh
5   Nate

So there will be two output data frames. The desired output for the second data frame looks like this:

ID  Name
4   Josh
5   Nate

The desired output for the third data frame looks like this:

ID  Name
6   Sandy

I didn't find anything similar on Google. I tried this:

for i in second['ID'], third['ID']:
    if i not in master['ID']:
        print(i)

It returns all the data in the master file.

Also, if I try this code:

import pandas as pd

names = ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]
ids = [1, 2, 3, 4, 5, 6]
master = pd.DataFrame({"ID": ids, "Name": names})
# print(master)

names_second = ["Mike", "Dani", "Scott", "Sandy"]
ids_second = [1, 2, 3, 6]
second = pd.DataFrame({"ID": ids_second, "Name": names_second})
# print(second)

names_third = ["Mike", "Dani", "Scott", "Josh", "Nate"]
ids_third = [1, 2, 3, 4, 5]
third = pd.DataFrame({"ID": ids_third, "Name": names_third})
# print(third)
for i in master['ID']:
    if i not in second["ID"]:
        print("NOT IN SECOND", i)
    if i not in third["ID"]:
        print("NOT IN THIRD", i)

Output:

NOT IN SECOND 4
NOT IN SECOND 5
NOT IN THIRD 5
NOT IN SECOND 6
NOT IN THIRD 6

Why does it say NOT IN SECOND 6 and NOT IN THIRD 5?

Any suggestions? Thanks in advance.

Recommended answer

You can use .isin together with ~ to filter the dataframes. To compare with second, use master[~master.ID.isin(second.ID)], and do the same for third:

# Rows of master whose ID does not appear in second / third
cmp_master_second = master[~master.ID.isin(second.ID)]
cmp_master_third = master[~master.ID.isin(third.ID)]

print(cmp_master_second)
print('\n-------- Separate dataframes -----------\n')
print(cmp_master_third)

Result:

    Name
ID      
4   Josh
5   Nate

-------- Separate dataframes -----------

     Name
ID       
6   Sandy
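
As for why the loop in the question prints NOT IN SECOND 6 and NOT IN THIRD 5: the in operator on a pandas Series tests the index labels, not the stored values. second has default index labels 0 to 3 and third has 0 to 4, so 6 is "not in" second and 5 is "not in" third even though both appear as ID values. The sketch below rebuilds the frames from the question to show the difference. The helper name missing_from is hypothetical and only wraps the .isin filter above.

```python
import pandas as pd

# Frames rebuilt from the question's sample data.
master = pd.DataFrame({"ID": [1, 2, 3, 4, 5, 6],
                       "Name": ["Mike", "Dani", "Scott", "Josh", "Nate", "Sandy"]})
second = pd.DataFrame({"ID": [1, 2, 3, 6],
                       "Name": ["Mike", "Dani", "Scott", "Sandy"]})
third = pd.DataFrame({"ID": [1, 2, 3, 4, 5],
                      "Name": ["Mike", "Dani", "Scott", "Josh", "Nate"]})

# `in` on a Series looks at the index labels (0..3 for second), not the values.
print(6 in second["ID"])         # False: 6 is a value, but not an index label
print(0 in second["ID"])         # True: 0 is an index label, though not an ID
print(6 in second["ID"].values)  # True: membership tested against the values


def missing_from(master_df, partial_df, key="ID"):
    """Hypothetical helper: rows of master_df whose key is absent from partial_df."""
    return master_df[~master_df[key].isin(partial_df[key])]


missing_second = missing_from(master, second)
missing_third = missing_from(master, third)
print(missing_second)
print(missing_third)
```

Because ID is an ordinary column in these frames, the printed result keeps the original row labels from master next to the ID and Name columns.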
