How to set new columns in a multi-column index from a dict with partially specified tuple keys?
Question
I have a pandas dataframe initialized in the following way:
import pandas as pd

my_multi_index = pd.MultiIndex.from_tuples([('a', 'a1'), ('a', 'a2'),
                                            ('b', 'b1'), ('b', 'b2')],
                                           names=['key1', 'key2'])
df = pd.DataFrame(data=[[1, 2], [3, 4], [5, 6], [7, 8]],
                  columns=['col1', 'col2'],
                  index=my_multi_index)
print(df)
This gives:
#            col1  col2
# key1 key2
# a    a1       1     2
#      a2       3     4
# b    b1       5     6
#      b2       7     8
Now I'd like to add a new column desc1 to this dataframe using partial key slicing. However, I don't want to do this in code; I'd like to drive it from configuration, i.e. a dictionary with partial tuple keys:
# i'd like to externalize this and not hardcode it i.e. easier maintenance
df.loc[pd.IndexSlice['a', :], 'desc1'] = 'x'
df.loc[pd.IndexSlice['b', 'b1'], 'desc1'] = 'y1'
df.loc[pd.IndexSlice['b', 'b2'], 'desc1'] = 'y2'
print(df)
This gives:
#            col1  col2 desc1
# key1 key2
# a    a1       1     2     x
#      a2       3     4     x
# b    b1       5     6    y1
#      b2       7     8    y2
Notice that setting 'x' doesn't depend on the second component of the ('a', _) key, while setting 'y1' and 'y2' does depend on the second component of the ('b', 'b1') and ('b', 'b2') keys. A possible solution is to fully specify the mapping, but that is not desirable if I have 100 keys of the form ('a', _) whose assignment doesn't depend on the second component. I wish to reach the above result without hard-coding the sliced assignments; instead, I'd like to do it from an externalized dictionary.
My configuration dictionary would look like this:
my_dict = {
    ('a', None): 'x',
    ('b', 'b1'): 'y1',
    ('b', 'b2'): 'y2'
}
Is there a Pythonic, pandas-idiomatic way to apply this dictionary with partially specified keys to reproduce the sliced assignment shown above?
Recommended answer
We can leverage the fact that a tuple can be passed to .loc as a MultiIndex indexer, and that a tuple shorter than the number of index levels selects every row matching the leading levels. We also adjust your my_dict slightly, replacing ('a', None) with the one-element tuple ('a',). Then we apply a simple for loop:
my_dict = {
    ('a',): 'x',
    ('b', 'b1'): 'y1',
    ('b', 'b2'): 'y2'
}

for idx, value in my_dict.items():
    df.loc[idx, 'desc1'] = value
           col1  col2 desc1
key1 key2
a    a1       1     2     x
     a2       3     4     x
b    b1       5     6    y1
     b2       7     8    y2
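If the configuration has to keep its original shape, with None standing for "any value", the same loop idea can be written with boolean masks built per index level. The following is only a sketch under that assumption: the helper name apply_partial_keys is hypothetical, and treating None as a wildcard is an interpretation of the question, not pandas behaviour.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('a', 'a1'), ('a', 'a2'), ('b', 'b1'), ('b', 'b2')],
    names=['key1', 'key2'],
)
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                  columns=['col1', 'col2'], index=index)

# Original configuration format; None is assumed to mean "any value".
my_dict = {
    ('a', None): 'x',
    ('b', 'b1'): 'y1',
    ('b', 'b2'): 'y2',
}


def apply_partial_keys(frame, mapping, column):
    """Hypothetical helper: assign each value to the rows matching its key."""
    frame[column] = None  # object column; rows without a match stay None
    for key, value in mapping.items():
        mask = np.ones(len(frame), dtype=bool)
        for level, part in enumerate(key):
            if part is not None:  # None components place no constraint
                mask &= frame.index.get_level_values(level) == part
        frame.loc[mask, column] = value
    return frame


apply_partial_keys(df, my_dict, 'desc1')
print(df)
```

Because later entries overwrite earlier ones, listing wildcard keys before fully specified keys lets the specific entries take precedence.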
A second option is to use Index.map and to specify only the first key of each group in the dict, so that Series.ffill can fill in the remaining rows. Note that this relies on the rows of each group being contiguous, with the specified key first:
my_dict = {
    ('a', 'a1'): 'x',
    ('b', 'b1'): 'y1',
    ('b', 'b2'): 'y2'
}

df['desc1'] = df.index.map(my_dict)
df['desc1'] = df['desc1'].ffill()
           col1  col2 desc1
key1 key2
a    a1       1     2     x
     a2       3     4     x
b    b1       5     6    y1
     b2       7     8    y2
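Where row order cannot be guaranteed, a per-row lookup with a fallback avoids the forward fill altogether. This is a rough sketch rather than part of the original answer: the lookup helper is hypothetical, and it assumes the question's convention that a (first_level, None) entry is the default for that first-level key.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('a', 'a1'), ('a', 'a2'), ('b', 'b1'), ('b', 'b2')],
    names=['key1', 'key2'],
)
df = pd.DataFrame([[1, 2], [3, 4], [5, 6], [7, 8]],
                  columns=['col1', 'col2'], index=index)

my_dict = {
    ('a', None): 'x',
    ('b', 'b1'): 'y1',
    ('b', 'b2'): 'y2',
}


def lookup(key, mapping):
    """Hypothetical helper: exact key first, then the (key1, None) default."""
    if key in mapping:
        return mapping[key]
    return mapping.get((key[0], None))  # None if nothing matches


# Iterating a MultiIndex yields one tuple per row.
df['desc1'] = [lookup(key, my_dict) for key in df.index]
print(df)
```

This trades vectorisation for clarity, which is usually acceptable when the index is small relative to the rest of the processing.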