Filter a DataFrame using a dictionary with multiple values


Problem description

I have a dataframe with three columns. When the user selects the columns and values to filter on, the system builds a dictionary in which the keys are column names and the values are the selected cell values. I want to display the filtered dataframe as a table.

I want to compare the dictionary against the dataframe and display the filtered data.

import pandas as pd
df =pd.DataFrame({
            "source_number":[11199,11328,11287,32345,12342,1232,13456,123244,13456],
             "location":["loc2","loc1","loc3","loc1","loc2","loc2","loc3","loc2","loc1"],
             "category":["cat1","cat2","cat1","cat3","cat3","cat3","cat2","cat3","cat2"],
             })  

# suppose the generated dictionary has the following value
sidebars = {"location":["loc1","loc2"],"category":["cat1","cat3"]}

Expected result:

source_number   location    category
 32345             loc1       cat3
 11199             loc2       cat1
 12342             loc2       cat3
 1232              loc2       cat3
 123244            loc2       cat3

Streamlit code:

import numpy as np
import pandas as pd
import streamlit as st

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})

# optionally show the unfiltered data
is_check = st.checkbox("Display Data")
if is_check:
    st.table(df)

# let the user pick the columns to filter on
columns = st.sidebar.multiselect("Enter the variables", df.columns)

# one multiselect per chosen column; keys = column names, values = selected cell values
sidebars = {}
for y in columns:
    ucolumns = list(df[y].unique())
    sidebars[y] = st.sidebar.multiselect('Filter ' + y, ucolumns)

# one boolean mask per dictionary key, combined with a logical AND
L = [df[k].isin(v) if isinstance(v, list)
     else df[k].eq(v)
     for k, v in sidebars.items() if k in df.columns]
df = df[np.logical_and.reduce(L)]
st.table(df)

How can I get the expected result? I know that I need to iterate over the dictionary and compare.

I solved the comparison between the dictionary and the dataframe based on the answer from @jezrael, but the app still displays the following error at first:

KeyError: True
Traceback:
File "f:\aienv\lib\site-packages\streamlit\script_runner.py", line 333, in _run_script
    exec(code, module.__dict__)
File "F:\AIenv\streamlit\app.py", line 326, in <module>
    df = df[np.logical_and.reduce(L)]
File "f:\aienv\lib\site-packages\pandas\core\frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
File "f:\aienv\lib\site-packages\pandas\core\indexes\base.py", line 2893, in get_loc
    raise KeyError(key) from err

Recommended answer

Use a list comprehension over the dictionary items to select columns dynamically by name and test their values against the lists with Series.isin, then combine the resulting masks with the np.logical_and.reduce trick:

Note: if you use isin, every value in the dictionary has to be a list.

df = df[np.logical_and.reduce([df[k].isin(v) for k, v in sidebars.items()])]
print (df)
   source_number location category
0          11199     loc2     cat1
3          32345     loc1     cat3
4          12342     loc2     cat3
5           1232     loc2     cat3
7         123244     loc2     cat3
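
For readers who want to run the one-liner above in isolation, here is a minimal self-contained sketch using the data from the question. The intermediate names `masks`, `combined` and `filtered` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})
sidebars = {"location": ["loc1", "loc2"], "category": ["cat1", "cat3"]}

# One boolean Series per dictionary key: True where the cell is one of the allowed values.
masks = [df[col].isin(allowed) for col, allowed in sidebars.items()]

# Element-wise AND across all masks: a row survives only if every column matches.
combined = np.logical_and.reduce(masks)

filtered = df[combined]
print(filtered)
```

Values within one key are combined with OR (any of the listed values matches), while the keys themselves are combined with AND.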

If the dictionary may contain scalars as well as lists, use an if-else in the list comprehension and test the scalars with Series.eq:

# suppose the generated dictionary has the following value
sidebars = {"location":["loc1","loc2"],"category":"cat3"}

L = [df[k].isin(v) if isinstance(v, list) else df[k].eq(v) for k, v in sidebars.items()]
df = df[np.logical_and.reduce(L)]
print (df)
   source_number location category
3          32345     loc1     cat3
4          12342     loc2     cat3
5           1232     loc2     cat3
7         123244     loc2     cat3
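
The isinstance(v, list) test above only recognises lists. As a possible variation (not part of the original answer), the sketch below uses pandas.api.types.is_list_like so that tuples and sets are also sent to isin, while strings are still treated as scalars. The helper name `build_mask` is hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_list_like

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})

# A tuple and a plain string, to exercise both branches.
sidebars = {"location": ("loc1", "loc2"), "category": "cat3"}


def build_mask(series, wanted):
    """Hypothetical helper: membership test for list-likes, equality test for scalars."""
    if is_list_like(wanted):  # strings are not considered list-like
        return series.isin(list(wanted))
    return series.eq(wanted)


masks = [build_mask(df[k], v) for k, v in sidebars.items()]
filtered = df[np.logical_and.reduce(masks)]
print(filtered)
```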

If some dictionary keys may not match any column name, skip them with an if k in df.columns condition (the dataframe is then simply not filtered by the unmatched keys):

L = [df[k].isin(v) for k, v in sidebars.items() if k in df.columns]

L = [df[k].isin(v) if isinstance(v, list) 
     else df[k].eq(v) 
     for k, v in sidebars.items() if k in df.columns]


df = df[np.logical_and.reduce(L)]
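
To illustrate the key check, here is a small sketch in which the dictionary contains a key, "region", that is not a column of the dataframe. The "region" key and the `ignored` list are made up for this example; collecting the skipped keys is an optional extra that the original answer does not do.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})

# "region" is a hypothetical key with no matching column.
sidebars = {"location": ["loc1"], "region": ["north"]}

# Build masks only for keys that are real columns; remember the rest for reporting.
masks = [df[k].isin(v) for k, v in sidebars.items() if k in df.columns]
ignored = [k for k in sidebars if k not in df.columns]

# If no key matched at all there is nothing to combine, so keep the frame unfiltered.
filtered = df[np.logical_and.reduce(masks)] if masks else df
print(filtered)
print("ignored keys:", ignored)
```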

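This is my first time writing Streamlit code, so there may be better solutions. The problem here occurs when an empty dictionary is passed: the list of masks is then empty, np.logical_and.reduce returns the scalar True, and df[True] is treated as a column lookup, which raises KeyError: True.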

So guard the filtering step with if bool(sidebars):

is_check = st.checkbox("Display Data")
if is_check:
    st.table(df)


columns = st.sidebar.multiselect("Enter the variables", df.columns)

sidebars = {}
for y in columns:
    ucolumns=list(df[y].unique())
    print (ucolumns)

    sidebars[y]=st.sidebar.multiselect('Filter '+y, ucolumns)   

if bool(sidebars):
    L = [df[k].isin(v) if isinstance(v, list) 
         else df[k].eq(v) 
         for k, v in sidebars.items() if k in df.columns]
    
    df1 = df[np.logical_and.reduce(L)]
    st.table(df1)
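
The same guard can be sketched without Streamlit so that it is easy to test on its own. The function name `filter_by_dict` is hypothetical. Returning the unfiltered dataframe when there is nothing to filter on is an assumption made here; the snippet above instead shows no filtered table in that case.

```python
import numpy as np
import pandas as pd


def filter_by_dict(df, filters):
    """Hypothetical helper: AND together one mask per key; tolerate an empty dict."""
    masks = [
        df[k].isin(v) if isinstance(v, list) else df[k].eq(v)
        for k, v in filters.items()
        if k in df.columns
    ]
    if not masks:
        # Nothing to combine: skip the reduce step, which would not yield a row mask.
        return df
    return df[np.logical_and.reduce(masks)]


df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345, 12342, 1232, 13456, 123244, 13456],
    "location": ["loc2", "loc1", "loc3", "loc1", "loc2", "loc2", "loc3", "loc2", "loc1"],
    "category": ["cat1", "cat2", "cat1", "cat3", "cat3", "cat3", "cat2", "cat3", "cat2"],
})

unfiltered = filter_by_dict(df, {})  # empty dictionary: no KeyError
both = filter_by_dict(df, {"location": ["loc1", "loc2"], "category": ["cat1", "cat3"]})
print(len(unfiltered), len(both))
```

Note that a key whose list is empty (a column was chosen but no values were selected) still produces an all-False mask, so the result is an empty table, as in the original code.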
