Compare Pandas DataFrames and return rows that are missing from the first one
Problem description
I have two DataFrames and want to compare them and return the rows from the first one (df1) that are not in the second one (df2). I found a way to compare them and return the differences, but I can't figure out how to return only the rows of df1 that are missing from df2.
import pandas as pd

df1 = pd.DataFrame({
    "City": ["Chicago", "San Franciso", "Boston"],
    "State": ["Illinois", "California", "Massachusett"]})
df2 = pd.DataFrame({
    "City": ["Chicago", "Mmmmiami", "Dallas", "Omaha"],
    "State": ["Illinois", "Florida", "Texas", "Nebraska"]})

# Stack both frames, then group by every column: a row that occurs in
# only one of the frames ends up in a group of size 1.
df = pd.concat([df1, df2])
df = df.reset_index(drop=True)
df_gpby = df.groupby(list(df.columns))

# Keep the index label of each single-member group. This selects rows
# unique to either frame, not just the ones that come from df1.
idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
blah = df.reindex(idx)
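For reference, the snippet above keeps every row that occurs exactly once in the concatenated frame, so it returns the rows unique to either DataFrame rather than only those from df1. A minimal sketch of the same result, assuming neither frame contains duplicate rows of its own, uses `drop_duplicates(keep=False)`. The names `combined` and `only_in_one` are chosen here purely for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    "City": ["Chicago", "San Franciso", "Boston"],
    "State": ["Illinois", "California", "Massachusett"]})
df2 = pd.DataFrame({
    "City": ["Chicago", "Mmmmiami", "Dallas", "Omaha"],
    "State": ["Illinois", "Florida", "Texas", "Nebraska"]})

# Stack both frames and drop every row that has a duplicate anywhere.
# What survives is the symmetric difference: rows found in exactly one frame.
# Assumption: neither df1 nor df2 repeats a row internally.
combined = pd.concat([df1, df2], ignore_index=True)
only_in_one = combined.drop_duplicates(keep=False)

print(only_in_one)
```

The shared row (Chicago, Illinois) is dropped, but the rows that exist only in df2 are still present, which is exactly the part the question wants to exclude.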
Solution
Building on @EdChum's suggestion:
df = pd.merge(df1, df2, how='outer', indicator=True)
rows_in_df1_not_in_df2 = df.loc[df['_merge'] == 'left_only', df1.columns]
rows_in_df1_not_in_df2
|Index |City |State |
|------|------------|------------|
|1 |San Franciso|California |
|2 |Boston |Massachusett|
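The same idea can be wrapped in a small reusable function. The sketch below is a variant of the answer rather than the answer itself: it uses `how="left"` instead of `how="outer"`, so the result follows the row order of the first frame, and it deduplicates the second frame first so that a repeated match cannot multiply rows. The helper name `rows_missing_from` is hypothetical, and the sketch assumes both frames share the same column names:

```python
import pandas as pd


def rows_missing_from(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `left` that have no exact match in `right`.

    Hypothetical helper; assumes `right` contains every column of `left`.
    """
    cols = list(left.columns)
    # Deduplicate `right` so one row of `left` can match at most once.
    right_unique = right[cols].drop_duplicates()
    # A left merge keeps every row of `left`; the indicator column `_merge`
    # is "left_only" where no matching row exists in `right`.
    merged = left.merge(right_unique, how="left", on=cols, indicator=True)
    mask = merged["_merge"] == "left_only"
    return merged.loc[mask, cols].reset_index(drop=True)


df1 = pd.DataFrame({
    "City": ["Chicago", "San Franciso", "Boston"],
    "State": ["Illinois", "California", "Massachusett"]})
df2 = pd.DataFrame({
    "City": ["Chicago", "Mmmmiami", "Dallas", "Omaha"],
    "State": ["Illinois", "Florida", "Texas", "Nebraska"]})

missing = rows_missing_from(df1, df2)
print(missing)
```

Because the helper resets the index, the returned frame is numbered from 0 and does not carry over the index labels of the merged frame shown in the table above.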