Categorical column after melt in pandas

Question

Is it possible to end up with a categorical variable column after a melt operation in pandas?

If I set up the data like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(
    np.random.randn(3, 5), 
    columns=["A", "B", "C", "D", "E"]
)
df["id"] = range(1, 4)
df

|    |         A |         B |         C |         D |          E |   id |
|----|-----------|-----------|-----------|-----------|------------|------|
|  0 | -0.406174 | -0.686917 | -0.172913 | -0.273074 | -0.0246714 |    1 |
|  1 |  0.323783 | -1.7731   |  1.57581  | -1.15671  | -1.23926   |    2 |
|  2 | -1.1426   | -0.591279 |  1.15265  |  0.326712 | -0.86374   |    3 |

and then apply melt:

melted_df = df.melt(id_vars="id", value_vars=["A", "B", "C", "D", "E"])
melted_df

|    |   id | variable   |      value |
|----|------|------------|------------|
|  0 |    1 | A          | -0.406174  |
|  1 |    2 | A          |  0.323783  |
|  2 |    3 | A          | -1.1426    |
|  3 |    1 | B          | -0.686917  |
|  4 |    2 | B          | -1.7731    |
|  5 |    3 | B          | -0.591279  |
|  6 |    1 | C          | -0.172913  |
|  7 |    2 | C          |  1.57581   |
|  8 |    3 | C          |  1.15265   |
|  9 |    1 | D          | -0.273074  |
| 10 |    2 | D          | -1.15671   |
| 11 |    3 | D          |  0.326712  |
| 12 |    1 | E          | -0.0246714 |
| 13 |    2 | E          | -1.23926   |
| 14 |    3 | E          | -0.86374   |

The dtype of the variable column is object:

melted_df.dtypes

id            int64
variable     object
value       float64
dtype: object

I'd like this to be category. I know I can convert it easily with:

melted_df["variable"] = melted_df["variable"].astype("category")

But for large datasets, I'd like to avoid this overhead. I didn't find such an option in the documentation, but since the resulting column contains categorical data by definition, I presume there must be a possibility.

Recommended answer

I don't think it's possible with melt, because when it creates that column it infers the dtype, and 'category' is not a dtype that pandas currently infers. (A related question, where melt does not correctly infer Int32 dtypes: "Why is pandas.melt messing with my dtypes?")
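If the only goal is to skip the object-to-category conversion, one possible workaround (not part of the original answer, and only a sketch) is to assemble the long frame by hand and build the variable column with pd.Categorical.from_codes. The names value_vars, codes and manual are illustrative, and the random data is seeded here only so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question, seeded for reproducibility.
rng = np.random.default_rng(0)
value_vars = ["A", "B", "C", "D", "E"]
df = pd.DataFrame(rng.standard_normal((3, 5)), columns=value_vars)
df["id"] = range(1, 4)

n_rows = len(df)

# Integer codes 0,0,0,1,1,1,... mirror melt's column-by-column layout.
codes = np.repeat(np.arange(len(value_vars)), n_rows)

manual = pd.DataFrame({
    # The id column is repeated once per melted column.
    "id": np.tile(df["id"].to_numpy(), len(value_vars)),
    # The categorical is built from codes, so no string column is created first.
    "variable": pd.Categorical.from_codes(codes, categories=value_vars),
    # Column-major flattening yields all of A, then all of B, and so on.
    "value": df[value_vars].to_numpy().ravel(order="F"),
})

# Cross-check against melt on the same data.
melted = df.melt(id_vars="id", value_vars=value_vars)
same_ids = melted["id"].tolist() == manual["id"].tolist()
same_vars = melted["variable"].astype(str).tolist() == manual["variable"].astype(str).tolist()
same_values = np.array_equal(melted["value"].to_numpy(), manual["value"].to_numpy())

print(manual.dtypes)
print(same_ids, same_vars, same_values)
```

This assumes all value columns share one dtype, as they do in the question; with mixed dtypes the flattened value array would be upcast.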

stack will keep the categorical dtype if you first convert the columns. stack produces a slightly different row ordering than melt (rows are grouped by id rather than by variable), but the data will be the same. stack is also a bit clunkier when it comes to naming the resulting columns.

df = df.set_index('id')
df.columns = df.columns.astype('category')

res = (df.stack()
         .rename_axis(['id', 'variable'])
         .rename('value')
         .reset_index())
#    id variable     value
#0    1        A  0.424781
#1    1        B -0.317107
#2    1        C  0.731121
#3    1        D  0.042642
#4    1        E  0.648352
#...
#13   3        D -0.889600
#14   3        E -1.822898

res.dtypes
#id             int64
#variable    category
#value        float64
#dtype: object
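The claim that stack and melt return the same data in a different order can be checked with a small sketch like the one below. It rebuilds both results from the same seeded frame, sorts them by id and variable, and compares them column by column. The names wide, stacked, a and b are illustrative, and recent pandas versions may emit a FutureWarning for the default stack implementation.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question, seeded for reproducibility.
rng = np.random.default_rng(0)
value_vars = ["A", "B", "C", "D", "E"]
df = pd.DataFrame(rng.standard_normal((3, 5)), columns=value_vars)
df["id"] = range(1, 4)

# melt result: variable column is not categorical.
melted = df.melt(id_vars="id", value_vars=value_vars)

# stack result, following the steps in the answer above.
wide = df.set_index("id")
wide.columns = wide.columns.astype("category")
stacked = (wide.stack()
               .rename_axis(["id", "variable"])
               .rename("value")
               .reset_index())

# Bring both results into the same row order before comparing.
a = melted.sort_values(["id", "variable"]).reset_index(drop=True)
b = stacked.sort_values(["id", "variable"]).reset_index(drop=True)

same_ids = a["id"].tolist() == b["id"].tolist()
same_vars = a["variable"].astype(str).tolist() == b["variable"].astype(str).tolist()
same_values = np.allclose(a["value"].to_numpy(), b["value"].to_numpy())

print(stacked["variable"].dtype)
print(same_ids, same_vars, same_values)
```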
