Vectorized Lookups of Pandas Series to a Dictionary


Problem description

I need to create a boolean pandas DataFrame column, same_group, from the values of two existing columns, row and col. A row should show True if the two labels in that row share at least one value in the dictionary memberships (that is, their membership lists intersect), and False otherwise. How do I do this in a vectorized way (without using apply)?

import pandas as pd
import numpy as np 
n = np.nan
memberships = {
    'a':['vowel'],
    'b':['consonant'],
    'c':['consonant'],
    'd':['consonant'],
    'e':['vowel'],
    'y':['consonant', 'vowel']
}

congruent = pd.DataFrame.from_dict(  
         {'row': ['a','b','c','d','e','y'],
            'a': [ n, -.8,-.6,-.3, .8, .01],
            'b': [-.8,  n, .5, .7,-.9, .01],
            'c': [-.6, .5,  n, .3, .1, .01],
            'd': [-.3, .7, .3,  n, .2, .01],
            'e': [ .8,-.9, .1, .2,  n, .01],
            'y': [ .01, .01, .01, .01,  .01, n],
       }).set_index('row')
congruent.columns.names = ['col']

cs = congruent.stack().to_frame()
cs.columns = ['score']
cs.reset_index(inplace=True)
cs.head(6)

How do I create this new column based on a lookup in the dictionary?

Note that I'm looking for intersection, not equivalence. For example, row 4 should have a same_group of 1, since a and y are both vowels (even though y is "sometimes a vowel" and therefore belongs to both the consonant and vowel groups).

Recommended answer

# create a series to make it convenient to map
# make each member a set so I can intersect later
lkp = pd.Series(memberships).apply(set)

# get number of rows and columns
# map the sets to column and row indices
n, m = congruent.shape
c = congruent.columns.to_series().map(lkp).values
r = congruent.index.to_series().map(lkp).values


print(c)
[{'vowel'} {'consonant'} {'consonant'} {'consonant'} {'vowel'}
 {'consonant', 'vowel'}]


print(r)
[{'vowel'} {'consonant'} {'consonant'} {'consonant'} {'vowel'}
 {'consonant', 'vowel'}]
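
As a small illustration of the lookup step above, the sketch below maps a few labels to their membership sets and intersects two of them. The three-letter dictionary and the labels index are a cut-down example made up for illustration, not the full data from the question.

```python
import pandas as pd

# Cut-down memberships dictionary (illustrative subset only)
memberships = {
    'a': ['vowel'],
    'b': ['consonant'],
    'y': ['consonant', 'vowel'],
}

# Series indexed by letter; each value becomes a set of group names
lkp = pd.Series(memberships).apply(set)

# to_series() uses the labels as both index and values,
# and map() looks each value up in lkp's index
labels = pd.Index(['a', 'y', 'b'], name='col')
mapped = labels.to_series().map(lkp)

shared = mapped['a'] & mapped['y']          # groups common to a and y
disjoint = bool(mapped['a'] & mapped['b'])  # empty intersection is falsy
print(shared, disjoint)
```

An empty set is falsy in Python, which is why bool() on the intersection can serve directly as the same_group flag.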


# use np.repeat, np.tile, zip to create cartesian product
# this should match index after stacking
# apply set intersection for each pair
# empty sets are False, otherwise True
same = [
    bool(set.intersection(*tup))
    for tup in zip(np.repeat(r, m), np.tile(c, n))
]
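
To see why np.repeat on the row values and np.tile on the column values line up with the stacked index, here is a minimal sketch. The labels a, b and x, y, z and the small frame df are hypothetical and exist only to show the ordering.

```python
import numpy as np
import pandas as pd

# Hypothetical row and column labels, for illustration only
rows = np.array(['a', 'b'])
cols = np.array(['x', 'y', 'z'])
n, m = len(rows), len(cols)

# repeat: a a a b b b ; tile: x y z x y z
pairs = [
    (str(r), str(c))
    for r, c in zip(np.repeat(rows, m), np.tile(cols, n))
]

# A frame without NaN, so stacking keeps every (row, col) pair
df = pd.DataFrame(np.arange(n * m).reshape(n, m), index=rows, columns=cols)
stacked_pairs = df.stack().index.tolist()

print(pairs == stacked_pairs)
```

Each row label is repeated once per column while the column labels cycle, which is the same row-major order that stacking a full frame produces.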

# use dropna=False to ensure we maintain the
# cartesian product I was expecting
# then slice with boolean list I created
# and dropna
congruent.stack(dropna=False)[same].dropna()

row  col
a    e      0.80
     y      0.01
b    c      0.50
     d      0.70
     y      0.01
c    b      0.50
     d      0.30
     y      0.01
d    b      0.70
     c      0.30
     y      0.01
e    a      0.80
     y      0.01
y    a      0.01
     b      0.01
     c      0.01
     d      0.01
     e      0.01
dtype: float64
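
Depending on the pandas version, stack(dropna=False) may emit a deprecation warning or be rejected, since newer releases move to a stack implementation that keeps NaN by default (this is my understanding of recent releases, so check the documentation for your version). A version-independent sketch, assuming a reduced three-letter frame and a plain dict named lkp that are not part of the original answer, builds the full cartesian product by hand:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Reduced version of the question's frame (illustrative subset)
congruent = pd.DataFrame(
    {'a': [nan, -.8, .8], 'b': [-.8, nan, -.9], 'e': [.8, -.9, nan]},
    index=pd.Index(['a', 'b', 'e'], name='row'),
)
congruent.columns.name = 'col'

# Hypothetical lookup of sets for the three labels
lkp = {'a': {'vowel'}, 'b': {'consonant'}, 'e': {'vowel'}}

# Every (row, col) pair in row-major order, NaN cells included
full_index = pd.MultiIndex.from_product(
    [congruent.index, congruent.columns], names=['row', 'col']
)
stacked = pd.Series(congruent.to_numpy().ravel(), index=full_index)

# One flag per pair, aligned with full_index
same = np.array([bool(lkp[r] & lkp[c]) for r, c in full_index])

result = stacked[same].dropna()
print(result)
```

Because the index is built explicitly, the boolean array always has one entry per cell, regardless of how stack treats missing values.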


To produce the desired result:

congruent.stack(dropna=False).reset_index(name='Score') \
    .assign(same_group=np.array(same).astype(int)).dropna()
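
The list comprehension above still evaluates one set intersection per pair in Python. One alternative, which is not part of the original answer, is to encode memberships as a boolean indicator table and compare whole arrays at once. The names indicator and pairs, and the four sample pairs, are assumptions made for this sketch.

```python
import pandas as pd

memberships = {
    'a': ['vowel'],
    'b': ['consonant'],
    'c': ['consonant'],
    'd': ['consonant'],
    'e': ['vowel'],
    'y': ['consonant', 'vowel'],
}

labels = list(memberships)
groups = sorted({g for gs in memberships.values() for g in gs})

# indicator.loc[label, group] is True when the label belongs to the group
indicator = pd.DataFrame(
    [[g in memberships[lab] for g in groups] for lab in labels],
    index=labels, columns=groups,
)

# A few (row, col) pairs in the long format of the question (sample only)
pairs = pd.DataFrame({'row': ['a', 'a', 'a', 'b'],
                      'col': ['b', 'e', 'y', 'y']})

# Look up one indicator row per pair member, then AND them group by group
left = indicator.loc[pairs['row']].to_numpy()
right = indicator.loc[pairs['col']].to_numpy()
pairs['same_group'] = (left & right).any(axis=1).astype(int)
print(pairs)
```

Building the indicator table still loops over the dictionary once, but the per-pair work becomes NumPy array operations, which may matter when the number of pairs is much larger than the number of labels.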
