Parallel coordinates plot for continuous data in pandas

Question

The parallel_coordinates function from pandas is very useful:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates  # pandas.tools.plotting in old pandas

sampdata = pd.read_csv('/usr/local/lib/python3.3/dist-packages/pandas/tests/data/iris.csv')
parallel_coordinates(sampdata, 'Name')
plt.show()
```

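But with continuous data, its behavior is not what you would expect, because every distinct value of the class column is treated as a separate class with an arbitrary color (and a legend entry) of its own: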

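```python
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

mypos = np.random.randint(10, size=(100, 2))
mydata = pd.DataFrame(mypos, columns=['x', 'y'])
myres = np.random.rand(100)  # 1-D: one continuous value per row
mydata['res'] = myres
parallel_coordinates(mydata, 'res')
```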
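To see the problem concretely, the sketch below runs the stock pandas function and counts legend entries. pandas builds one class per distinct value of the class column, so 100 random floats become 100 classes, each with its own color and legend label rather than a position on a color scale. The fixed seed and the headless Agg backend are assumptions added only so the snippet runs reproducibly without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(0)  # fixed seed, assumed only for reproducibility
mydata = pd.DataFrame(rng.integers(10, size=(100, 2)), columns=['x', 'y'])
mydata['res'] = rng.random(100)

ax = parallel_coordinates(mydata, 'res')

# One class (and one legend entry) per distinct value of 'res'
n_classes = mydata['res'].nunique()
n_legend_entries = len(ax.get_legend().get_texts())
print(n_classes, n_legend_entries)
```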

I would like the color of the lines to reflect the magnitude of the continuous variable, e.g. a gradient from white to black, preferably with some transparency (alpha value) as well, and with a color bar beside the plot.

Solution

I had the exact same problem today. My solution was to copy the parallel_coordinates function from pandas and adapt it to my needs. Since I think it can be useful to others, here is my implementation:

```python
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt


def parallel_coordinates(frame, class_column, cols=None, ax=None, color=None,
                         use_columns=False, xticks=None, colormap=None,
                         **kwds):
    # `color` is kept only for signature compatibility with pandas; it is unused.
    n = len(frame)
    class_col = frame[class_column]
    class_min = np.amin(class_col)
    class_max = np.amax(class_col)

    if cols is None:
        df = frame.drop(class_column, axis=1)
    else:
        df = frame[cols]

    ncols = len(df.columns)

    # determine values to use for xticks
    if use_columns is True:
        if not np.all(np.isreal(list(df.columns))):
            raise ValueError('Columns must be numeric to be used as xticks')
        x = list(df.columns)
    elif xticks is not None:
        if not np.all(np.isreal(xticks)):
            raise ValueError('xticks specified must be numeric')
        elif len(xticks) != ncols:
            raise ValueError('Length of xticks must match number of columns')
        x = list(xticks)
    else:
        x = list(range(ncols))

    # Draw on the axes that were passed in, or on a new figure if none was given
    if ax is None:
        fig, ax = plt.subplots()
    else:
        fig = ax.figure

    Colorm = plt.get_cmap(colormap)

    # One line per row, colored by its class value rescaled to [0, 1]
    for i in range(n):
        y = df.iloc[i].values
        kls = class_col.iat[i]
        ax.plot(x, y, color=Colorm((kls - class_min) / (class_max - class_min)), **kwds)

    # Vertical line at each variable's axis
    for i in x:
        ax.axvline(i, linewidth=1, color='black')

    ax.set_xticks(x)
    ax.set_xticklabels(df.columns)
    ax.set_xlim(x[0], x[-1])
    ax.grid()

    # Color bar; the explicit Normalize keeps it on the same scale as the lines
    bounds = np.linspace(class_min, class_max, 10)
    norm = mpl.colors.Normalize(vmin=class_min, vmax=class_max)
    cax, _ = mpl.colorbar.make_axes(ax)
    mpl.colorbar.ColorbarBase(cax, cmap=Colorm, norm=norm, spacing='proportional',
                              ticks=bounds, boundaries=bounds, format='%.2f')

    return fig
```
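Each line's color comes from rescaling the row's class_column value linearly to [0, 1] and looking that number up in the colormap. A color bar only matches the lines if it rescales with the same minimum and maximum. The minimal sketch below, using made-up values, checks that the manual scaling and matplotlib's Normalize give identical colors for the binary colormap (white at 0, black at 1).

```python
import matplotlib.pyplot as plt
import numpy as np

values = np.array([2.0, 5.0, 8.0])  # hypothetical class-column values
class_min, class_max = values.min(), values.max()
cmap = plt.get_cmap('binary')  # white at 0.0, black at 1.0

# Manual linear rescaling to [0, 1], as done for each line
scaled = (values - class_min) / (class_max - class_min)
line_colors = [cmap(s) for s in scaled]

# The same mapping through Normalize, which a color bar relies on
norm = plt.Normalize(vmin=class_min, vmax=class_max)
bar_colors = [cmap(norm(v)) for v in values]
```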

I don't know whether it works with every option the original pandas function provides, but for your example it can be called like this:

```python
parallel_coordinates(mydata, 'res', colormap="binary")
```

You can add transparency (an alpha value) by replacing the ax.plot(...) call inside the loop of the previous function with:

```python
ax.plot(x, y, color=Colorm((kls - class_min) / (class_max - class_min)),
        alpha=(kls - class_min) / (class_max - class_min), **kwds)
```
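With alpha set to the normalized value, the row holding the smallest value gets alpha 0 and disappears completely. A minimal sketch of an alternative that maps values onto [min_alpha, 1] instead; the helper name value_to_alpha and the default floor of 0.2 are assumptions, not part of the answer above.

```python
import numpy as np


def value_to_alpha(values, min_alpha=0.2):
    """Hypothetical helper: map values linearly onto [min_alpha, 1]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values equal: draw every line fully opaque
        return np.ones_like(values)
    scaled = (values - values.min()) / span  # now in [0, 1]
    return min_alpha + (1.0 - min_alpha) * scaled
```

Inside the function, alphas = value_to_alpha(class_col) could be computed once before the loop and passed as alpha=alphas[i] to ax.plot.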

And for the original pandas iris example, with the Name column removed and the last column used as the values:

```python
import pandas as pd

sampdata = pd.read_csv('iris_modified.csv')
parallel_coordinates(sampdata, 'Value')  # the custom function defined above
```
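The file iris_modified.csv is not shown. One way to build a comparable frame in memory is sketched below; the helper make_value_frame is hypothetical, and it assumes the modification was simply dropping Name and renaming the last remaining column to Value.

```python
import pandas as pd


def make_value_frame(frame, drop_col='Name', value_name='Value'):
    """Hypothetical helper: drop the label column and rename the
    last remaining column so it can serve as the continuous class column."""
    out = frame.drop(columns=drop_col)
    return out.rename(columns={out.columns[-1]: value_name})
```

For example, sampdata = make_value_frame(pd.read_csv(path_to_iris_csv)) would then be passed to parallel_coordinates(sampdata, 'Value').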

I hope this will help you!

Christophe
