Pandas: Use multiple columns of a dataframe as index of another

Problem description

I've got a large dataframe with my data in it, and another dataframe of the same first dimension that contains metadata about each point in time (e.g., what trial number it was, what trial type it was).

What I want to do is slice the large dataframe using the values of the "metadataframe". I want to keep these separate (rather than storing the metadataframe as a multi-index of the larger one).

Right now, I am trying to do something like this:

def my_func(container):
   container.big_df.set_index(container.meta_df[['col1', 'col2']])
   container.big_df.loc['col1val', 'col2val'].plot()

However, this returns the following error:

ValueError: Must pass DataFrame with boolean values only

Note that this works fine if I only pass a single column to set_index.

Can anyone figure out what's going wrong here? Alternatively, can someone tell me that I'm doing this in a totally stupid and hacky way, and that there's a much better way to go about it? :)

MY SOLUTION

Thanks for the ideas. I played around with the indexing a little bit, and this seems to be the easiest / fastest. I didn't like having to strip the index of its name, and transposing the values etc. seemed cumbersome. I noticed something interesting (and probably easy to fix):

dfa.set_index(dfb[['col1', 'col2']]) 

doesn't work, but

dfa.set_index([dfb.col1, dfb.col2])

does.

So, you can basically turn dfb into a list of columns, making set_index work, by the following convention:

dfa.set_index([dfb[col] for col in ['col1', 'col2']])
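
A minimal sketch of that convention, using made-up frames (the `signal` column and all values are illustrative and not from the original post). Note that `set_index` returns a new dataframe rather than modifying `dfa` in place, so the result has to be assigned before slicing with `.loc`:

```python
import pandas as pd

# Hypothetical data: dfa holds measurements, dfb holds per-row metadata.
dfa = pd.DataFrame({"signal": [0.1, 0.2, 0.3, 0.4]})
dfb = pd.DataFrame({
    "col1": ["x", "x", "y", "y"],
    "col2": [1, 2, 1, 2],
})

# A list of Series builds a MultiIndex; the Series names become level names.
indexed = dfa.set_index([dfb[col] for col in ["col1", "col2"]])

# Select all rows whose metadata has col1 == "x".
subset = indexed.loc["x"]

# Select the single row with col1 == "y" and col2 == 2.
row = indexed.loc[("y", 2)]
```

Here `dfa` keeps its original index, so the data and the metadata stay separate, which is what the question asked for.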

Solution

Use MultiIndex.from_arrays() to create the index object:

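import pandas as pd

# df1 supplies the index values, df2 holds the data.
df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
df2 = pd.DataFrame({"C": [100, 200, 300]})
# Transpose so that each column of df1 becomes one index level.
df2.index = pd.MultiIndex.from_arrays(df1.values.T)

print(df2)

The result: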

       C
1 a  100
2 b  200
3 c  300
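
If the level names should be kept and the transpose avoided, one possible alternative is `MultiIndex.from_frame`. This is a sketch that assumes pandas 0.24 or later, where that method is available; it is not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": ["a", "b", "c"]})
df2 = pd.DataFrame({"C": [100, 200, 300]})

# from_frame keeps the column names of df1 as the index level names.
df2.index = pd.MultiIndex.from_frame(df1)

# Look up one cell by its (A, B) key.
value = df2.loc[(2, "b"), "C"]
```

The index levels are then named "A" and "B", so they can be referred to by name when slicing or grouping.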
