Propagating values over non-unique (duplicate) cells in pandas


Problem description

I have the following DataFrame:

import pandas as pd

df=pd.DataFrame({'Players': [ 'Sam', 'Greg', 'Steve', 'Sam',
                 'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
                 'Address': ['112 Fake St','13 Crest St','14 Main St','112 Fake St','2 Morningwood','7 Cotton Dr','14 Main St','20 Main St','7 Cotton Dr','7 Cotton Dr'],
                 'Status': ['Infected','','Dead','','','','','','','Infected'],
                 })

print(df)

I want to propagate the Status value 'Infected' to everyone at the same Address.

That is, if more than one person lives at the same address and one of them has the status 'Infected', then everyone at that address should get this status.

So the result would look like this:

df2=pd.DataFrame({'Players': [ 'Sam', 'Greg', 'Steve', 'Sam',
                 'Jill', 'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
                 'Address': ['112 Fake St','13 Crest St','14 Main St','112 Fake St','2 Morningwood','7 Cotton Dr','14 Main St','20 Main St','7 Cotton Dr','7 Cotton Dr'],
                 'Status': ['Infected','','Dead','Infected','','Infected','','','Infected','Infected'],
                 })

print(df2)

How would I do this? So far I tried this:

df[df.duplicated("Address")]

But it only selects the later duplicates, not all of them.
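
As a side note on why only the later duplicates show up: `duplicated` defaults to `keep='first'`, which leaves the first occurrence of each value unflagged. Passing `keep=False` flags every row whose Address occurs more than once. The sketch below rebuilds the DataFrame from the question; the names `later_only` and `all_dupes` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Jill',
                'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Address': ['112 Fake St', '13 Crest St', '14 Main St', '112 Fake St',
                '2 Morningwood', '7 Cotton Dr', '14 Main St', '20 Main St',
                '7 Cotton Dr', '7 Cotton Dr'],
    'Status': ['Infected', '', 'Dead', '', '', '', '', '', '', 'Infected'],
})

# Default keep='first': the first occurrence of each address is not flagged,
# so only the second and later occurrences are selected.
later_only = df[df.duplicated('Address')]

# keep=False flags every row whose Address appears more than once.
all_dupes = df[df.duplicated('Address', keep=False)]

print(later_only)
print(all_dupes)
```

This only selects the shared addresses; it does not propagate the status by itself.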

Solution

Here is one method:

In [19]:    
infected = df[df['Status']=='Infected'].set_index('Address')
df.loc[df['Address'].isin(infected.index),'Status'] = df['Address'].map(infected['Status']).fillna('')
df

Out[19]:
         Address  Players    Status
0    112 Fake St      Sam  Infected
1    13 Crest St     Greg          
2     14 Main St    Steve      Dead
3    112 Fake St      Sam  Infected
4  2 Morningwood     Jill          
5    7 Cotton Dr     Bill  Infected
6     14 Main St      Nod          
7     20 Main St  Mallory          
8    7 Cotton Dr     Ping  Infected
9    7 Cotton Dr    Lamar  Infected

This first selects the rows of your df where the status is 'Infected' and sets the index of that subset to the address. The result acts as a lookup table: map then looks up each row's address in the infected index and returns the matching status (or NaN where the address is not found, hence the fillna).
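
To make the lookup step easier to follow, here is a rough breakdown of the intermediate values, rebuilt on the DataFrame from the question. The names `lookup` and `mapped` are only illustrative. The de-duplication line is an added assumption rather than part of the answer above: as far as I know, `map` with a Series needs a unique index, so it would matter if two infected people shared an address (with this sample data it changes nothing).

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Jill',
                'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Address': ['112 Fake St', '13 Crest St', '14 Main St', '112 Fake St',
                '2 Morningwood', '7 Cotton Dr', '14 Main St', '20 Main St',
                '7 Cotton Dr', '7 Cotton Dr'],
    'Status': ['Infected', '', 'Dead', '', '', '', '', '', '', 'Infected'],
})

# Rows whose status is 'Infected', indexed by address.
infected = df[df['Status'] == 'Infected'].set_index('Address')

# The lookup table: address -> status.
lookup = infected['Status']

# Assumption (not in the answer above): keep one entry per address so the
# lookup index is unique.
lookup = lookup[~lookup.index.duplicated()]

# Each row's address is looked up in the table; addresses that are not in
# the table come back as NaN.
mapped = df['Address'].map(lookup)

print(lookup)
print(mapped)
```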

I use loc here to select only the rows whose address is in the infected index, which leaves the other rows untouched.
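
Since the only value being propagated is 'Infected', a possible variation (a sketch, not the method given above) is to skip `map` and assign the constant through the same kind of `isin` mask. The names `infected_addresses`, `mask` and `result` are illustrative, and the work is done on a copy so the original df is left as it was.

```python
import pandas as pd

df = pd.DataFrame({
    'Players': ['Sam', 'Greg', 'Steve', 'Sam', 'Jill',
                'Bill', 'Nod', 'Mallory', 'Ping', 'Lamar'],
    'Address': ['112 Fake St', '13 Crest St', '14 Main St', '112 Fake St',
                '2 Morningwood', '7 Cotton Dr', '14 Main St', '20 Main St',
                '7 Cotton Dr', '7 Cotton Dr'],
    'Status': ['Infected', '', 'Dead', '', '', '', '', '', '', 'Infected'],
})

# Addresses that have at least one infected person.
infected_addresses = df.loc[df['Status'] == 'Infected', 'Address']

# True for every row that shares one of those addresses.
mask = df['Address'].isin(infected_addresses)

# Work on a copy and overwrite the status only for the masked rows.
result = df.copy()
result.loc[mask, 'Status'] = 'Infected'

print(result)
```

Like the approach above, this overwrites whatever status a row already had if its address contains an infected person; rows at other addresses (including the 'Dead' one in this sample) are not touched.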
