Comparing two dataframes of different length row by row and adding columns for each row with equal value
Problem description
I have two dataframes of different length in python pandas like this:
    df1:                               df2:
       Column1 Column2 Column3            ColumnA ColumnB
    0        1       a       r         0        1       a
    1        2       b       u         1        1       d
    2        3       c       k         2        1       e
    3        4       d       j         3        2       r
    4        5       e       f         4        2       w
                                       5        3       y
                                       6        3       h
What I am trying to do now is compare Column1 of df1 with ColumnA of df2. For each "hit", where a row of ColumnA in df2 has the same value as a row of Column1 in df1, I want to append a column to df1 containing the ColumnB value from the df2 row where the hit was found, so that my result looks like this:
    df1:
       Column1 Column2 Column3 Column4 Column5 Column6
    0        1       a       r       a       d       e
    1        2       b       u       r       w
    2        3       c       k       y       h
    3        4       d       j
    4        5       e       f
What I have tried so far was:
    for row in df1, df2:
        if df1[Column1] == df2[ColumnA]:
            print 'yey!'
which gave me an error saying I could not compare two dataframes of different length. So I tried:
    for row in df1, df2:
        if def2[def2['ColumnA'].isin(def1['column1'])]:
            print 'lalala'
        else:
            print 'Nope'
This "works" in the sense that I get some output, but I do not think it iterates over the rows and compares them, since it only prints 'lalala' twice. So I researched some more and found a way to iterate over each row of a dataframe:
    for index, row in df1.iterrows():
        print(row['Column1'])
But I do not know how to use this to compare the columns of the two dataframes and get the output I desire.
Any help on how to do this would be really appreciated.
Solution
I recommend using the DataFrame API, which lets you work with DataFrames through operations such as join, merge, and groupby. My solution is below:
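    import pandas as pd

    df1 = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                        'Column2': ['a', 'b', 'c', 'd', 'e'],
                        'Column3': ['r', 'u', 'k', 'j', 'f']})
    # The key column is named 'Column1' in both frames here
    # (the question calls it 'ColumnA' in df2) so the merge can use on='Column1'
    df2 = pd.DataFrame({'Column1': [1, 1, 1, 2, 2, 3, 3],
                        'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})

    buffers = []
    for name, group in df2.groupby('Column1'):
        # One-row frame holding the key of this group
        buffer_df = pd.DataFrame({'Column1': group['Column1'][:1]})
        # One new column per ColumnB value: Column_0, Column_1, ...
        for i, value in enumerate(group['ColumnB']):
            buffer_df['Column_' + str(i)] = value
        buffers.append(buffer_df)

    # Stack the per-group frames; columns a group lacks are filled with NaN
    dfs = pd.concat(buffers)

    result = pd.merge(df1, dfs, how='left', on='Column1')
    print(result)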
The result is:
       Column1 Column2 Column3 Column_0 Column_1 Column_2
    0        1       a       r        a        d        e
    1        2       b       u        r        w      NaN
    2        3       c       k        y        h      NaN
    3        4       d       j      NaN      NaN      NaN
    4        5       e       f      NaN      NaN      NaN
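Rows 3 and 4 contain only NaN because df2 has no entries for the keys 4 and 5, and a left join keeps every row of the left frame whether or not it finds a match. The following is a minimal sketch of that behaviour using small made-up frames (`left`, `right`, and their column names are hypothetical, not the data from the question). It uses the `indicator` parameter of `pd.merge` to mark which rows found a partner.

```python
import pandas as pd

# Hypothetical toy frames, not the data from the question
left = pd.DataFrame({'key': [1, 2, 3], 'left_val': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [1, 3], 'right_val': ['x', 'z']})

# how='left' keeps every row of `left`;
# indicator=True adds a '_merge' column saying where each row came from
merged = pd.merge(left, right, how='left', on='key', indicator=True)
print(merged)

# Keys that exist only in the left frame (their right_val is NaN)
unmatched = merged.loc[merged['_merge'] == 'left_only', 'key'].tolist()
print(unmatched)
```

Filtering on `_merge` like this is one way to list the keys that had no "hit" in the other frame.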
P.S. More details:
1) I group df2 by 'Column1'. Each group is itself a DataFrame, for example:

       Column1 ColumnB
    0        1       a
    1        1       d
    2        1       e
2) For each group I build a one-row DataFrame buffer_df:

       Column1 Column_0 Column_1 Column_2
    0        1        a        d        e
3) After that I combine these into the DataFrame dfs:

       Column1 Column_0 Column_1 Column_2
    0        1        a        d        e
    3        2        r        w      NaN
    5        3        y        h      NaN
4) Finally, I left-join df1 with dfs to obtain the desired result.
2)* buffer_df is built up iteratively. For the group with Column1 == 3:

    step 0 (buffer_df = pd.DataFrame({'Column1': group['Column1'][:1]})):
       Column1
    5        3

    step 1 (buffer_df['Column_0'] = group['ColumnB'][5]):
       Column1 Column_0
    5        3        y

    step 2 (buffer_df['Column_1'] = group['ColumnB'][6]):
       Column1 Column_0 Column_1
    5        3        y        h
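The steps above can also be expressed without an explicit Python loop. The sketch below is an alternative to the loop-based solution, not a restatement of it: it numbers the rows inside each group with `cumcount`, spreads those positions into columns with `pivot`, and then performs the same left merge. The helper names `pos` and `wide` are my own, and the frames are the ones from the solution above.

```python
import pandas as pd

df1 = pd.DataFrame({'Column1': [1, 2, 3, 4, 5],
                    'Column2': ['a', 'b', 'c', 'd', 'e'],
                    'Column3': ['r', 'u', 'k', 'j', 'f']})
df2 = pd.DataFrame({'Column1': [1, 1, 1, 2, 2, 3, 3],
                    'ColumnB': ['a', 'd', 'e', 'r', 'w', 'y', 'h']})

# Position of each row within its Column1 group: 0, 1, 2, 0, 1, 0, 1
pos = df2.groupby('Column1').cumcount()

# One row per key, one column per position within the group
wide = (df2.assign(pos=pos)
           .pivot(index='Column1', columns='pos', values='ColumnB')
           .add_prefix('Column_')
           .reset_index())
wide.columns.name = None  # drop the leftover 'pos' label on the columns axis

# Same left join as in step 4
result = pd.merge(df1, wide, how='left', on='Column1')
print(result)
```

On large frames this usually avoids the cost of building one small DataFrame per group, though I have not benchmarked it here.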