Mapping multiple dataframes based on the matching columns

Views: 79  Posted: 2020/5/18 23:07:26  Tags: python pandas numpy dataframe

Problem description

I have 25 data frames that I need to merge, and I want to find the rows that recur in all 25 of them. For example, my data frames look like the following:

df1
chr start   end     name
1   12334   12334   AAA
1   2342    2342    SAP
2   3456    3456    SOS
3   4537    4537    ABR
df2
chr start   end     name
1   12334   12334   DSF
1   3421    3421    KSF
2   7689    7689    LUF
df3 
chr start   end     name
1   12334   12334   DSF
1   3421    3421    KSF
2   4537    4537    LUF
3   8976    8976    BAR
4   6789    6789    AIN

In the end, I am aiming for an output data frame like the following:

chr start   end     name    Sample
1   12334   12334   AAA     df1
1   12334   12334   AAA     df2
1   12334   12334   AAA     df3

I can get there with the following solution, which first collects the three data frames in a dictionary, dfs:

dfs = {'df1': df1, 'df2': df2, 'df3': df3}

Then,

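# (chr, start, end) tuples that occur in every data frame
common_tups = set.intersection(*[set(df[['chr', 'start', 'end']].drop_duplicates().apply(tuple, axis=1).values) for df in dfs.values()])
# Keep only the rows with a shared key and tag each with the name of its source data frame
pd.concat([df[df[['chr', 'start', 'end']].apply(tuple, axis=1).isin(common_tups)].assign(Sample=name) for (name, df) in dfs.items()])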
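
For illustration, the same selection can also be sketched with a groupby count instead of tuple sets. This is an alternative, not the code from the question: it stacks the frames, counts how many distinct frames each (chr, start, end) key occurs in, and keeps the keys found in all of them. The frames below are typed in from the example tables above, and the names key, stacked, n_samples and common are made up for this sketch.

```python
import pandas as pd

cols = ['chr', 'start', 'end', 'name']
df1 = pd.DataFrame([[1, 12334, 12334, 'AAA'], [1, 2342, 2342, 'SAP'],
                    [2, 3456, 3456, 'SOS'], [3, 4537, 4537, 'ABR']], columns=cols)
df2 = pd.DataFrame([[1, 12334, 12334, 'DSF'], [1, 3421, 3421, 'KSF'],
                    [2, 7689, 7689, 'LUF']], columns=cols)
df3 = pd.DataFrame([[1, 12334, 12334, 'DSF'], [1, 3421, 3421, 'KSF'],
                    [2, 4537, 4537, 'LUF'], [3, 8976, 8976, 'BAR'],
                    [4, 6789, 6789, 'AIN']], columns=cols)
dfs = {'df1': df1, 'df2': df2, 'df3': df3}

key = ['chr', 'start', 'end']

# Stack all frames, tagging each row with the name of the frame it came from
stacked = pd.concat([df.assign(Sample=name) for name, df in dfs.items()],
                    ignore_index=True)

# For every row, the number of distinct frames that contain its key
n_samples = stacked.groupby(key)['Sample'].transform('nunique')

# Keep only the rows whose key occurs in every frame
common = stacked[n_samples == len(dfs)].reset_index(drop=True)
print(common)
```

Note that the name column keeps whatever value each source frame holds for the shared key, so it can differ from one Sample to the next.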

This gives the resulting data frame with the matching rows from all three data frames. However, I have 25 data frames, which I am reading as a list from a directory as follows:

import os

path = 'Fltered_vcfs/'
files = os.listdir(path)
results = [os.path.join(path, i) for i in files if i.startswith('vcf_filtered')]

So how can I turn the list 'results' into the dictionary and proceed from there to get the desired output? Any help or suggestions are greatly appreciated.

Thanks
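
One possible way to build the dictionary from a list of file paths is a small loader like the one below. It is only a sketch under stated assumptions: the question does not say what format the files are in, so this assumes tab-separated text with a header row containing chr, start, end and name. The helper name load_frames and the choice of the file name (without extension) as the dictionary key are hypothetical.

```python
import os
import pandas as pd

def load_frames(paths, sep='\t'):
    """Read each file into a DataFrame and key it by its file name.

    Assumption: every file is a delimited text table whose header row
    contains at least chr, start, end and name.
    """
    dfs = {}
    for p in paths:
        # Use the file name without its extension as the sample label
        sample = os.path.splitext(os.path.basename(p))[0]
        dfs[sample] = pd.read_csv(p, sep=sep)
    return dfs

# Usage with the list built in the question:
# dfs = load_frames(results)
```

The resulting dfs has the same shape as the hand-written dictionary, so the set.intersection and pd.concat lines from the question should work on it unchanged. Files that share a name but differ only in extension would overwrite each other under this keying.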
