Transposing one column in python pandas with the simplest index possible

Question

I have the following data (data_current):

import pandas as pd
import numpy as np

data_current=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation','meditation'],'disease':['acne','hypertension', 'cancer','lupus']})
data_current

I would like to transpose one of the columns so that, instead of several rows with the same medicine and different diseases, I get one row per medicine with several disease columns. It is also important to keep the index as simple as possible (0, 1, 2, ...): I don't want to set 'medicine' as the index, because I will later merge on some other key. So I need to get data_needed:

data_needed=pd.DataFrame({'medicine':['green tea','fried tomatoes','meditation'],'disease_1':['acne','hypertension','cancer'], 'disease_2':[np.nan,np.nan,'lupus']})
data_needed

Recommended answer

Here's one way to achieve the output.

First, group by medicine and collect each group's diseases into a list:

In [368]: md = (data_current.groupby('medicine')
                            .apply(lambda x: x['disease'].tolist())
                            .reset_index())

In [369]: md
Out[369]:
         medicine                0
0  fried tomatoes   [hypertension]
1       green tea           [acne]
2      meditation  [cancer, lupus]
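
As a side note, the same grouping can be written by selecting the disease column before aggregating. This is only a sketch of an equivalent spelling, not part of the answer above; the name md_alt is made up for illustration. The list column then comes out named 'disease' instead of 0, so the later steps that refer to md[0] would refer to md_alt['disease'] instead.

```python
import pandas as pd

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})

# Select the 'disease' column first, then collect each group's values into a list.
# reset_index() turns 'medicine' back into an ordinary column with a 0, 1, 2 index.
md_alt = (data_current.groupby('medicine')['disease']
                      .agg(list)
                      .reset_index())
print(md_alt)
```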

Then convert the lists in that column into separate columns:

In [370]: dval = pd.DataFrame(md[0].tolist())

In [371]: dval
Out[371]:
              0      1
0  hypertension   None
1          acne   None
2        cancer  lupus

Now you can concat md and dval, after dropping the list column from md:

In [372]: md = md.drop(0, axis=1)

In [373]: data_final = pd.concat([md, dval], axis=1)

Finally, rename the columns as you want:

In [374]: data_final.columns = ['medicine', 'disease_1', 'disease_2']

In [375]: data_final
Out[375]:
         medicine     disease_1 disease_2
0  fried tomatoes  hypertension      None
1       green tea          acne      None
2      meditation        cancer     lupus
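
The rename above hard-codes two disease columns. If the number of diseases per medicine is not known in advance, the steps can be wrapped into one helper that numbers the columns automatically. This is a minimal sketch under that assumption, not the answerer's code: the function name spread_column and its key/value parameters are hypothetical, and it selects the column before aggregating rather than using apply.

```python
import pandas as pd

def spread_column(df, key='medicine', value='disease'):
    """Return one row per `key`, with columns value_1, value_2, ..."""
    # Collect each group's values into a list (a Series indexed by `key`).
    lists = df.groupby(key)[value].agg(list)
    # Expand the lists into columns; shorter lists are padded with missing values.
    wide = pd.DataFrame(lists.tolist(), index=lists.index)
    # Name the columns value_1, value_2, ... for however many there are.
    wide.columns = [f'{value}_{i + 1}' for i in range(wide.shape[1])]
    # Move `key` back to an ordinary column, leaving a plain 0, 1, 2, ... index.
    return wide.reset_index()

data_current = pd.DataFrame({
    'medicine': ['green tea', 'fried tomatoes', 'meditation', 'meditation'],
    'disease': ['acne', 'hypertension', 'cancer', 'lupus'],
})
data_final = spread_column(data_current)
print(data_final)
```

Because 'medicine' stays an ordinary column and the index is the default range, the result can be merged on any other key without further resetting.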
