Comparing columns of dataframes and returning the difference


Problem description

I have multiple dataframes (about 37) and would like to compare their column names to confirm that all of them have the same columns in the same order. The dataframes are stored in a list (e.g. tbl[0] is the first dataframe, tbl[1] is the second, and so on).

I wrote the following block of code, which compares the columns of each dataframe in tbl with those of every other dataframe. If there is a difference, the indexes of the two dataframes are appended to a 2D list, which I can inspect later to see where the columns mismatch.

a = [[]]
for i in range(0,len(tbl)):
    for j in range(i+1, len(tbl)):
        if(~(tbl[i].columns.equals(tbl[j].columns))):
            a.append([i, j])

But when I run the above code, it appends the indexes of every pair of dataframes I compare. Am I doing something wrong here?

Example:

import pandas as pd

tbl = []

for i in range(0,3):
    tbl.append(pd.DataFrame({'a':[1,2,3],'b':[3,4,5], 'c':[7,8,3], 'd':[1,5,3]}))

a = [[]]
for i in range(0,len(tbl)):
    for j in range(i+1, len(tbl)):
        if(~(tbl[i].columns.equals(tbl[j].columns))):
            a.append([i, j])

For the sake of this question, I have created 3 dummy dataframes that have the same column names (a, b, c, d). When I compare the column names using the code above, I get the following output:

[[], [0, 1], [0, 2], [1, 2]]

Shouldn't I be getting an empty list? What am I doing wrong here?

Recommended answer

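Use if not when the value is a scalar bool. columns.equals returns a plain Python bool, and ~ is bitwise inversion rather than logical negation, so applied to a bool it yields a nonzero integer, which is always truthy: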

print (~True)
-2
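
To make the failure mode concrete, here is a minimal sketch in plain Python covering both values that equals can return. The name inverted_values is only illustrative. Note that newer Python versions (3.12 and later, as far as I know) also emit a DeprecationWarning for ~ on a bool.

```python
# Bitwise inversion treats the bool as an integer: ~x == -x - 1.
inverted_values = {flag: ~flag for flag in (True, False)}

for flag, inverted in inverted_values.items():
    # Both results are nonzero integers, so "if ~flag:" always passes.
    print(flag, inverted, bool(inverted))

# Logical negation is what the comparison actually needs.
print(not True, not False)
```

Because both -2 and -1 are truthy, the original condition appended every pair regardless of whether the columns matched.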

a = [[]]
for i in range(0,len(tbl)):
    for j in range(i+1, len(tbl)):
        if not (tbl[i].columns.equals(tbl[j].columns)):
            a.append([i, j])

print (a)
[[]]
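
As a possible variation, the same comparison can be sketched with itertools.combinations, starting from an empty list instead of [[]]. The function name mismatched_column_pairs and the sample frames below are hypothetical and not part of the original question; the only pandas call assumed is Index.equals, which also takes column order into account.

```python
from itertools import combinations

import pandas as pd


def mismatched_column_pairs(frames):
    """Return [i, j] index pairs whose column labels or column order differ."""
    return [
        [i, j]
        for (i, left), (j, right) in combinations(enumerate(frames), 2)
        if not left.columns.equals(right.columns)
    ]


# Three frames with identical columns, as in the question.
same = [pd.DataFrame({'a': [1], 'b': [2]}) for _ in range(3)]
# Hypothetical fourth frame with the same columns in a different order.
reordered = same + [pd.DataFrame({'b': [2], 'a': [1]})]

print(mismatched_column_pairs(same))
print(mismatched_column_pairs(reordered))
```

With identical columns the result is an empty list, while the reordered frame is reported as mismatching each of the other three.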
