How to get the row-column name of a value from another data frame


Problem description

Given a table of ranges (start, end):

name     blue         green          yellow        purple              
a        1, 5                        654, 678       11, 15
b                     88761, 88776  
c        1211, 1215                  38, 47    
d        89, 95                                     1567, 1578

and a data frame df:

Supplier        colour                   
Abi             1                               
John            678          
Smith           120               
Tim             1570 
Don             87560                       

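How can I filter df so that it keeps only the rows whose colour value falls within one of the ranges in the table?

I also want to add a third column that links each value in colour to the row-column name from the range table. I would like the final df to look like this: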

Supplier        colour    Source                  
Abi             1         a-blue                      
John            678       a-yellow                   
Tim             1570      d-purple

Thanks!

s = df2.stack()
print(s)

gives:

a  1,      name          5
           blue       654,
           green       678
           yellow      11,
           purple       15
b  88761,  name      88776
c  1211,   name       1215
           blue        38,
           green        47
d  89,     name         95
           blue      1567,
           green      1578
dtype: object

and:

s = df2.stack()
s = s.str.split("[, ]+").apply(lambda x: pd.Interval(*map(int, x), closed="both"))
print(s)

gives:

Traceback (most recent call last):
  File "/Users/PycharmProjects/sup.py", line 12, in <module>
    s = s.str.split("[, ]+").apply(lambda x: pd.Interval(*map(int,x), closed="both"))
  File "/Users/.conda/envs/lib/python3.8/site-packages/pandas/core/series.py", line 4200, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/_libs/lib.pyx", line 2402, in pandas._libs.lib.map_infer
  File "/Users/PycharmProjects/sup.py", line 12, in <lambda>
    s = s.str.split("[, ]+").apply(lambda x: pd.Interval(*map(int,x), closed="both"))
TypeError: 'float' object is not iterable

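Recommended answer

First, replace the blank (empty or whitespace-only) cells with NaN using the replace() method. In the code below, df1 is the range table and df2 is the supplier data frame (called df in the question):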

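# ^\s*$ matches only cells that are empty or consist entirely of whitespace,
# so cells such as '1, 5' (which contain a space) are left untouched
df1=df1.replace(r'^\s*$',float('NaN'),regex=True)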
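To illustrate what this step is meant to do, here is a minimal sketch on a made-up two-row table. The frame `demo` and its contents are assumptions for illustration only, and the sketch assumes blank cells arrive as empty strings or runs of spaces.

```python
import pandas as pd

# Hypothetical miniature of the range table; blank cells hold "" or spaces
demo = pd.DataFrame({
    "name": ["a", "b"],
    "blue": ["1, 5", "   "],
    "green": ["", "88761, 88776"],
})

# Anchoring the pattern with ^ and $ limits the match to cells that are
# empty or whitespace-only; "1, 5" contains a space but is not replaced
cleaned = demo.replace(r"^\s*$", float("nan"), regex=True)
print(cleaned)
```

When the replacement value is NaN rather than a string, pandas replaces the whole matching cell, which is why an unanchored whitespace pattern would also wipe out cells like "1, 5".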

Then the idea is to turn each string range into the actual list of values that the range covers:

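# Stack the table into a Series indexed by (name, column);
# treat '' as missing, then drop the missing cells
s=df1.set_index('name').stack().replace('',float('NaN')).dropna()
# Expand each "start, end" string into every integer in that inclusive range
m=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode()
# All values covered by any range (used for filtering below)
s=m.unique()
# Collapse the (name, column) index into labels such as 'a-blue'
m.index=m.index.get_level_values(0)+'-'+m.index.get_level_values(1)
m=m.drop_duplicates()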
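The expansion step can be easier to follow on a tiny input. The sketch below uses a hypothetical two-cell Series (`s_demo`, with shortened ranges) in place of the stacked table, and a small helper `expand` that is not part of the answer.

```python
import pandas as pd

# Hypothetical stand-in for the stacked range table: (name, column) -> "start, end"
s_demo = pd.Series(
    ["1, 3", "11, 12"],
    index=pd.MultiIndex.from_tuples([("a", "blue"), ("a", "purple")]),
)

def expand(cell):
    """Turn a 'start, end' string into the list of integers it covers."""
    start, end = (int(part) for part in cell.split(","))
    return list(range(start, end + 1))

# One row per covered value; the (name, column) index entry is repeated
m_demo = s_demo.map(expand).explode().astype(int)

# Join the two index levels into a single 'name-column' label
m_demo.index = m_demo.index.get_level_values(0) + "-" + m_demo.index.get_level_values(1)
print(m_demo)
```

Each value now carries the label of the range it came from, so a lookup from value to label becomes straightforward. The cost is one row per integer in every range, which may matter for very wide ranges.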

Finally:

out=df2[df2['colour'].isin(s)].copy()
#Keep only the rows whose colour lies in one of the ranges; copy() so a column can be added safely

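Now create the Source column:

# Option 1: positional assignment; only correct when the matches in m
# appear in the same order as the rows of out
out['Source']=m[m.isin(out['colour'].tolist())].index

# Option 2 (more robust): look each colour value up in a value -> label mapping
out['Source']=out['colour'].map(dict(zip(m,m.index)))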

Output of out:

   Supplier     colour  Source
0   Abi         1       a-blue
1   John        678     a-yellow
3   Tim         1570    d-purple
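As an alternative closer to the pd.Interval attempt in the question, the lookup can be done without expanding every range. This is only a sketch, not the answer's method: it swaps in `melt`, `str.extract` and `IntervalIndex.get_indexer`, rebuilds the sample data by hand under the assumed names `ranges` and `suppliers`, and assumes the ranges do not overlap (`get_indexer` raises an error for overlapping intervals).

```python
import pandas as pd

# Sample data from the question, rebuilt by hand (None marks an empty cell)
ranges = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "blue": ["1, 5", None, "1211, 1215", "89, 95"],
    "green": [None, "88761, 88776", None, None],
    "yellow": ["654, 678", None, "38, 47", None],
    "purple": ["11, 15", None, None, "1567, 1578"],
})
suppliers = pd.DataFrame({
    "Supplier": ["Abi", "John", "Smith", "Tim", "Don"],
    "colour": [1, 678, 120, 1570, 87560],
})

# Long format: one row per (name, column) cell, empty cells dropped
long = ranges.melt(id_vars="name", var_name="column", value_name="range")
long = long.dropna(subset=["range"])

# Parse "start, end" into two integer columns (0 = start, 1 = end)
bounds = long["range"].str.extract(r"(\d+)\s*,\s*(\d+)").astype(int)
intervals = pd.IntervalIndex.from_arrays(bounds[0], bounds[1], closed="both")
labels = (long["name"] + "-" + long["column"]).to_numpy()

# Position of the interval containing each colour value, or -1 if none does
pos = intervals.get_indexer(suppliers["colour"].to_numpy())
matched = pos >= 0

out = suppliers[matched].copy()
out["Source"] = labels[pos[matched]]
print(out)
```

On the sample data this keeps Abi, John and Tim with the labels a-blue, a-yellow and d-purple. If the ranges could overlap, a different lookup (for example `get_indexer_non_unique`) would be needed.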
