Python: Add calculated lines to a scatter plot with a nested categorical x-axis


Problem description

Cross-post: https://discourse.bokeh.org/t/add-calculated-horizontal-lines-corresponding-to-categories-on-the-x-axis/5544

I would like to duplicate this plot in Python:

Here is my attempt, using pandas and bokeh:

Imports:

import pandas as pd
from bokeh.io import output_notebook, show, reset_output
from bokeh.palettes import Spectral5, Turbo256
from bokeh.plotting import figure
from bokeh.transform import factor_cmap
from bokeh.models import Band, Span, FactorRange, ColumnDataSource

Create data:

fruits = ['Apples', 'Pears']
years = ['2015', '2016']

data = {'fruit' : fruits,
        '2015'   : [2, 1],
        '2016'   : [5, 3]}

fruit_df = pd.DataFrame(data).set_index("fruit")
tidy_df = (pd.DataFrame(data)
           .melt(id_vars=["fruit"], var_name="year")
           .assign(fruit_year=lambda df: list(zip(df['fruit'], df['year'])))
           .set_index('fruit_year'))

Create bokeh plot:

p = figure(x_range=FactorRange(factors=tidy_df.index.unique()),
           plot_height=400,
           plot_width=400,
           tooltips=[('Fruit', '@fruit'), # first string is user-defined; second string must refer to a column
                     ('Year', '@year'),
                     ('Value', '@value')])

cds = ColumnDataSource(tidy_df)

index_cmap = factor_cmap("fruit", 
                         Spectral5[:2], 
                         factors=sorted(tidy_df["fruit"].unique())) # this is a reference back to the dataframe

p.circle(x='fruit_year', 
         y='value', 
         size=20,
         source=cds,
         fill_color=index_cmap,
         line_color=None,
        )
# how do I add a median just to one categorical section?
median = Span(location=tidy_df.loc[tidy_df["fruit"] == "Apples", "value"].median(), # median value for Apples
              #dimension='height', 
              line_color='red',
              line_dash='dashed', 
              line_width=1.0
             )

p.add_layout(median)

# how do I add this standard deviation(ish) band to just the Apples or Pears section?
band = Band(
    base='fruit_year',
    lower=2,
    upper=4,
    source=cds,
)

p.add_layout(band)

show(p)

Output:

Am I up against this issue? https://github.com/bokeh/bokeh/issues/8592 Is there any other data visualization library for Python that can accomplish this? Altair, Holoviews, Matplotlib, Plotly... ?

Solution

Band is a single connected area, but your image of the desired output shows two disconnected areas, so you actually need two bands. Take a look at the example here to better understand bands: https://docs.bokeh.org/en/latest/docs/user_guide/annotations.html#bands

By using Band(base='fruit_year', lower=2, upper=4, source=cds) you ask Bokeh to plot one band in which, for each value of fruit_year, the lower coordinate is 2 and the upper coordinate is 4. That is exactly what you see on your Bokeh plot.

A bit unrelated but still a mistake: notice how your X axis is different from what you wanted. The major category has to come first in each factor tuple, so replace list(zip(df['fruit'], df['year'])) with list(zip(df['year'], df['fruit'])).
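As a rough illustration of that ordering, here is a minimal sketch in plain Python (no pandas or Bokeh needed) that builds the same four data points with the year as the major category. The names values_by_year, factors and values are made up for this example and do not appear in the original code.

```python
# Same numbers as the question's data dict, keyed by year.
fruits = ["Apples", "Pears"]
values_by_year = {"2015": [2, 1], "2016": [5, 3]}

# Nested categorical factors are (major, minor) tuples,
# so the year goes first and the fruit second.
factors = [(year, fruit) for year in values_by_year for fruit in fruits]

# Flatten the values in the same order as the factors.
values = [v for year in values_by_year for v in values_by_year[year]]

for factor, value in zip(factors, values):
    print(factor, value)
```

With the tuples in this order, the two 2015 points sit next to each other on the axis, followed by the two 2016 points, which matches the grouping the bands are meant to cover.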

Now, to the "how to" part. Since you need two separate bands, you cannot give them the same data source. Instead, create two extra data sources, one for each band. It ends up being something like this:

for year, sd in [('2015', 0.3), ('2016', 0.5)]:
    b_df = (tidy_df[tidy_df['year'] == year]
            .drop(columns=['year', 'fruit'])
            .assign(lower=lambda df: df['value'].min() - sd,
                    upper=lambda df: df['value'].max() + sd)
            .drop(columns='value'))
    p.add_layout(Band(base='fruit_year', lower='lower', upper='upper',
                      source=ColumnDataSource(b_df)))

Two issues remain, however. The first one is trivial: the automatic Y range (an instance of the DataRange1d class by default) does not take the bands' heights into account, so the bands can easily go out of bounds and be cropped by the plot. The solution is to set the Y range manually so that it accounts for the SD values.
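One way to work out such a manual range is sketched below in plain Python. It reuses the illustrative SD values from the loop above (0.3 and 0.5); the helper band_limits and the padding value are assumptions made for this example, not part of the original answer.

```python
# Values per year and the illustrative SDs used for the bands.
values_by_year = {"2015": [2, 1], "2016": [5, 3]}
sd_by_year = {"2015": 0.3, "2016": 0.5}


def band_limits(values, sd):
    """Return the (lower, upper) edges of a band built as min - sd and max + sd."""
    return min(values) - sd, max(values) + sd


# One (lower, upper) pair per band.
limits = [band_limits(values_by_year[year], sd_by_year[year])
          for year in values_by_year]

# Hypothetical extra margin so the bands do not touch the plot frame.
padding = 0.2
y_start = min(lower for lower, _ in limits) - padding
y_end = max(upper for _, upper in limits) + padding

print(round(y_start, 2), round(y_end, 2))
# The result could then be passed when creating the plot,
# e.g. figure(..., y_range=(y_start, y_end)).
```

The range is taken over all bands at once, so adding a third year only means adding one entry to each dictionary.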

The second issue is that the width of each band is limited to the X range factors, meaning that the circles will be partially outside of the band. This one is not that easy to fix. Usually the solution would be to use a transform to shift the coordinates a bit at the edges, but that is not possible on a categorical axis. One possible solution here is to create a custom Band model that adds an offset:

class MyBand(Band):
    # language=TypeScript
    __implementation__ = """
import {Band, BandView} from "models/annotations/band"

export class MyBandView extends BandView {
    protected _map_data(): void {
        super._map_data()
        const base_sx = this.model.dimension == 'height' ? this._lower_sx : this._lower_sy
        if (base_sx.length > 1) {
            const offset = (base_sx[1] - base_sx[0]) / 2
            base_sx[0] -= offset
            base_sx[base_sx.length - 1] += offset
        }
    }
}

export class MyBand extends Band {
    __view_type__: MyBandView

    static init_MyBand(): void {
        this.prototype.default_view = MyBandView
    }
}
    """

Just replace Band with MyBand in the code above and it should work. One caveat: you will need to have Node.js installed, and startup will take a second or two longer because the custom model code needs to be compiled. Another caveat: the custom model code relies on BokehJS internals, so while it works with Bokeh 2.0.2, I can't guarantee that it will work with any other Bokeh version.
