Mapping multiple dataframes based on matching columns
Problem description
I have 25 data frames that I need to merge in order to find the rows that recur in all 25 of them. For example, my data frames look like the following:
df1
chr start end name
1 12334 12334 AAA
1 2342 2342 SAP
2 3456 3456 SOS
3 4537 4537 ABR
df2
chr start end name
1 12334 12334 DSF
1 3421 3421 KSF
2 7689 7689 LUF
df3
chr start end name
1 12334 12334 DSF
1 3421 3421 KSF
2 4537 4537 LUF
3 8976 8976 BAR
4 6789 6789 AIN
In the end, I am aiming for an output data frame like the following:
chr start end name Sample
1 12334 12334 AAA df1
1 12334 12334 AAA df2
1 12334 12334 AAA df3
I can get there with the following solution, which first collects the data frames in a dictionary dfs:
dfs = {'df1': df1, 'df2': df2, 'df3': df3}
Then,
common_tups = set.intersection(*[set(df[['chr', 'start', 'end']].drop_duplicates().apply(tuple, axis=1).values) for df in dfs.values()])
pd.concat([df[df[['chr', 'start', 'end']].apply(tuple, axis=1).isin(common_tups)].assign(Sample=name) for (name, df) in dfs.items()])
This yields a data frame with the matching rows from all three data frames. However, I have 25 data frames, whose file paths I collect into a list from a directory as follows:
path = 'Fltered_vcfs/'
files = os.listdir(path)
results = [os.path.join(path,i) for i in files if i.startswith('vcf_filtered')]
So how can I load the files listed in 'results' into such a dictionary and proceed to get the desired output? Any help or suggestions are greatly appreciated.
Thanks
Recommended answer
Using the glob module, you can collect the matching file names:
import os
from glob import glob
path = 'Fltered_vcfs'
f_names = glob(os.path.join(path, 'vcf_filtered*.*'))
Then the dictionary can be created with a dictionary comprehension, using each file name without its extension as the key:
import pandas as pd
# Key: file name without directory or extension; value: the data frame read from that file.
dfs = {os.path.splitext(os.path.split(f_name)[1])[0]: pd.read_csv(f_name, sep='\t') for f_name in f_names}
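To connect this back to the question, here is a minimal sketch of the full pipeline: build the dictionary, intersect the (chr, start, end) keys across all data frames, and tag the surviving rows with their source. The helper names load_frames and common_rows are hypothetical, and the tab separator is the same assumption the answer makes about the file format. The demo uses the three small data frames from the question instead of files on disk.

```python
import os
from glob import glob

import pandas as pd

KEYS = ['chr', 'start', 'end']


def load_frames(pattern):
    """Hypothetical helper: read every file matching `pattern` into a dict.

    Assumes tab-separated files, as in the answer above.
    """
    return {
        os.path.splitext(os.path.basename(f_name))[0]: pd.read_csv(f_name, sep='\t')
        for f_name in glob(pattern)
    }


def common_rows(dfs, keys=KEYS):
    """Return rows whose key columns occur in every data frame, tagged by source."""
    # One set of key tuples per data frame, intersected across all of them.
    key_sets = [set(df[keys].itertuples(index=False, name=None)) for df in dfs.values()]
    common = set.intersection(*key_sets)
    parts = []
    for name, df in dfs.items():
        # Boolean mask: True where the row's key tuple is shared by all frames.
        mask = pd.Series(
            [k in common for k in df[keys].itertuples(index=False, name=None)],
            index=df.index,
            dtype=bool,
        )
        parts.append(df[mask].assign(Sample=name))
    return pd.concat(parts, ignore_index=True)


# Demo with the data frames from the question.
cols = ['chr', 'start', 'end', 'name']
df1 = pd.DataFrame([[1, 12334, 12334, 'AAA'], [1, 2342, 2342, 'SAP'],
                    [2, 3456, 3456, 'SOS'], [3, 4537, 4537, 'ABR']], columns=cols)
df2 = pd.DataFrame([[1, 12334, 12334, 'DSF'], [1, 3421, 3421, 'KSF'],
                    [2, 7689, 7689, 'LUF']], columns=cols)
df3 = pd.DataFrame([[1, 12334, 12334, 'DSF'], [1, 3421, 3421, 'KSF'],
                    [2, 4537, 4537, 'LUF'], [3, 8976, 8976, 'BAR'],
                    [4, 6789, 6789, 'AIN']], columns=cols)

result = common_rows({'df1': df1, 'df2': df2, 'df3': df3})
print(result)
```

With real files, the dictionary would instead come from something like load_frames(os.path.join(path, 'vcf_filtered*.*')). Note that each row keeps the name value from its own data frame, so the name column may differ between samples even when chr, start and end match.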