Setting a MultiIndex on an existing DataFrame in pandas


Problem description

I have a DataFrame that looks like this:

  Emp1    Empl2           date       Company
0    0        0     2012-05-01         apple
1    0        1     2012-05-29         apple
2    0        1     2013-05-02         apple
3    0        1     2013-11-22         apple
18   1        0     2011-09-09        google
19   1        0     2012-02-02        google
20   1        0     2012-11-26        google
21   1        0     2013-05-11        google

I want to set a MultiIndex on this DataFrame using the Company and date columns. Currently it has a default index. I am using df.set_index(['Company', 'date'], inplace=True):

import datetime

import numpy as np
import pandas as pd

# company_list, emp_list and the database connection con are defined elsewhere.
df = pd.DataFrame()
for c in company_list:
    row = pd.DataFrame([dict(company=c, date=datetime.date(2012, 5, 1))])
    df = pd.concat([df, row], ignore_index=True)
    for e in emp_list:
        # One count per (company, date, employee), with aliased column names
        query = ("select company, emp_name, date(date) as date, count(*) as count "
                 "from company_table where company = '" + c + "' and emp_name = '" + e + "' "
                 "group by company, date, emp_name LIMIT 5")
        dataset = pd.read_sql(query, con)
        if len(dataset) == 0:
            # No rows for this employee: add a placeholder row with a NaN count
            row = pd.DataFrame([dict(company=c, emp_name=e,
                                     date=datetime.date(2012, 5, 1), count=np.nan)])
            dataset = pd.concat([dataset, row], ignore_index=True)
        dataset = dataset.rename(columns={'count': e})
        dataset = dataset.groupby(['company', 'date', 'emp_name'], as_index=False).sum()
        dataset = dataset.drop(columns='emp_name')
        df = pd.merge(df, dataset, how='outer')  # how was left empty in the question; 'outer' is assumed
        df = df.sort_values('date')
        df.fillna(0, inplace=True)

df.set_index(['Company', 'date'], inplace=True)
print(df)

But when I print this DataFrame, it prints None. I got this solution from Stack Overflow itself. Is this not the correct way of doing it? Also, I want to swap the positions of the Company and date columns so that Company becomes the first index level and date becomes the second in the hierarchy. Any ideas?

Recommended answer

When you pass inplace=True, set_index makes the changes to the original DataFrame and returns None; it does not return the modified DataFrame.

is_none = df.set_index(['Company', 'date'], inplace=True)
df  # the dataframe you want
is_none # has the value None
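To check this without the asker's database code, here is a minimal sketch that builds a few rows of the sample table from the question by hand (the values are copied from that table; the variable names result and df2 are illustrative). It shows both that inplace=True returns None and what happens when that return value is assigned back:

```python
import pandas as pd

# A few rows copied from the sample DataFrame in the question.
df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "Empl2": [0, 1, 0, 0],
    "date": pd.to_datetime(["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"]),
    "Company": ["apple", "apple", "google", "google"],
})

# With inplace=True the call mutates df and returns None.
result = df.set_index(["Company", "date"], inplace=True)
print(result)                # None
print(list(df.index.names))  # ['Company', 'date']

# The pitfall: assigning the return value replaces the DataFrame with None.
df2 = pd.DataFrame({"Company": ["apple"], "date": ["2012-05-01"], "Emp1": [0]})
df2 = df2.set_index(["Company", "date"], inplace=True)
print(df2)                   # None
```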

So when you have a line like:

df = df.set_index(['Company', 'date'], inplace=True)

It first modifies df... but then it sets df to None!

That is, you should just use the line:

df.set_index(['Company', 'date'], inplace=True)
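Alternatively, drop inplace and assign the DataFrame that set_index returns. On the second part of the question: the order of the list passed to set_index already determines the level order, so ['Company', 'date'] makes Company the outer level and date the inner one. If the levels need to be flipped afterwards, DataFrame.swaplevel does that. A minimal sketch on a few rows of the sample data (the names indexed and swapped are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Emp1": [0, 0, 1, 1],
    "Empl2": [0, 1, 0, 0],
    "date": pd.to_datetime(["2012-05-01", "2012-05-29", "2011-09-09", "2012-02-02"]),
    "Company": ["apple", "apple", "google", "google"],
})

# Without inplace, set_index leaves df untouched and returns a new DataFrame.
# The list order sets the level order: Company is level 0, date is level 1.
indexed = df.set_index(["Company", "date"])
print(list(indexed.index.names))  # ['Company', 'date']

# To flip the hierarchy afterwards, swap the two levels by name.
swapped = indexed.swaplevel("Company", "date")
print(list(swapped.index.names))  # ['date', 'Company']

# Selecting on the outer level returns all rows for one company, indexed by date.
print(indexed.loc["google"])
```

Assigning the result keeps the original df available unchanged, which can be convenient while experimenting with different index orders.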
