Custom cluster colors of SciPy dendrogram in Python (link_color_func?)

Problem description

I want to color my clusters with a color map that I made in the form of a dictionary (i.e. {leaf: color}).

I've tried following https://joernhees.de/blog/2015/08/26/scipy-hierarchical-clustering-and-dendrogram-tutorial/, but the colors get messed up for some reason. The default plot looks good; I just want to assign those colors differently. I saw that there is a link_color_func parameter, but when I tried passing my color map (the D_leaf_colors dictionary) I got an error because a dictionary isn't a function. I created D_leaf_colors to customize the colors of the leaves associated with particular clusters. In my actual dataset the colors mean something, so I'm steering away from arbitrary color assignments.

I don't want to use color_threshold, because my actual data has many more clusters and SciPy repeats the colors, hence this question.
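For context, here is a minimal sketch of where the repeated colors come from, assuming a recent SciPy: with `color_threshold`, SciPy gives each below-threshold cluster the next color from a fixed palette and starts over when the palette runs out. `scipy.cluster.hierarchy.set_link_color_palette` can replace that palette, but colors are still handed out in drawing order, so they cannot carry a meaning you assigned per leaf. The random data below is purely illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage, set_link_color_palette

# Illustrative data: 12 well-separated 2-D groups of 3 points each
rng = np.random.default_rng(0)
centers = np.arange(12)[:, None] * np.array([[10.0, 0.0]])
X = np.repeat(centers, 3, axis=0) + rng.normal(scale=0.1, size=(36, 2))
Z = linkage(X, method="average")

# Only two palette colors for 12 clusters, so SciPy has to reuse them
set_link_color_palette(["red", "blue"])
R = dendrogram(Z, no_plot=True, color_threshold=1.0, above_threshold_color="gray")
set_link_color_palette(None)  # restore SciPy's default palette

print(sorted(set(R["color_list"])))  # ['blue', 'gray', 'red']
```

The 12 clusters alternate between red and blue, while the 11 merges above the threshold are gray.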

How can I use a leaf color dictionary to customize the colors of the dendrogram clusters?

I opened a GitHub issue, https://github.com/scipy/scipy/issues/6346, where I elaborated further on the approach to coloring the leaves from "Interpreting the output of SciPy's hierarchical clustering dendrogram? (maybe found a bug...)", but I still can't figure out how to either (i) use the dendrogram output to reconstruct my dendrogram with my specified color dictionary, or (ii) reformat my D_leaf_colors dictionary for the link_color_func parameter.
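The TypeError below happens because SciPy calls link_color_func, and a dict is not callable. According to the SciPy docs, the function receives the id of each non-singleton cluster whose U-shaped link is being drawn (ids n to 2n-2 for n leaves) and must return a matplotlib color string. The following minimal sketch records which ids SciPy asks for on a toy four-point dataset; the data and the `link_color` function are illustrative only. It also shows why a dict keyed by leaf names cannot be plugged in directly: SciPy never asks about leaves, only about merged clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Four 1-D observations: leaves have ids 0-3, the three merges get ids 4-6
X = np.array([[0.0], [0.1], [5.0], [5.2]])
Z = linkage(X, method="average")

calls = []

def link_color(k):
    # k is the id of the merged cluster whose link is being colored;
    # the return value must be a matplotlib color string
    calls.append(k)
    return "#B061FF" if k < 6 else "#808080"

R = dendrogram(Z, no_plot=True, link_color_func=link_color)
print(sorted(set(calls)))  # [4, 5, 6]
```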

# Init
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()

# Load data
from sklearn.datasets import load_diabetes

# Clustering
from scipy.cluster.hierarchy import dendrogram, fcluster, leaves_list
from scipy.spatial import distance
from fastcluster import linkage # You can use SciPy one too

%matplotlib inline

# Dataset
A_data = load_diabetes().data
DF_diabetes = pd.DataFrame(A_data, columns = ["attr_%d" % j for j in range(A_data.shape[1])])

# Absolute value of the correlation matrix, subtracted from 1 to get a dissimilarity
DF_dism = 1 - np.abs(DF_diabetes.corr())

# Compute average linkage
A_dist = distance.squareform(DF_dism.values)  # .as_matrix() was removed in pandas 1.0
Z = linkage(A_dist,method="average")

# Color mapping
D_leaf_colors = {"attr_1": "#808080", # Unclustered gray

                 "attr_4": "#B061FF", # Cluster 1 indigo
                 "attr_5": "#B061FF",
                 "attr_2": "#B061FF",
                 "attr_8": "#B061FF",
                 "attr_6": "#B061FF",
                 "attr_7": "#B061FF",

                 "attr_0": "#61ffff", # Cluster 2 cyan
                 "attr_3": "#61ffff",
                 "attr_9": "#61ffff",
                 }

# Dendrogram
# Passing color_threshold=0.7 instead gives SciPy's default threshold-based coloring
D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=None, leaf_font_size=12, leaf_rotation=45, link_color_func=D_leaf_colors)
# TypeError: 'dict' object is not callable

I also tried the approach in "How to get the subtrees of dendrogram made by scipy.cluster.hierarchy".

Recommended answer

Here is a solution that uses the matrix Z returned by linkage() (described early in the docs, but a little hidden) together with link_color_func:

# see question for code prior to "color mapping"

# Color mapping
dflt_col = "#808080"   # Unclustered gray
D_leaf_colors = {"attr_1": dflt_col,

                 "attr_4": "#B061FF", # Cluster 1 indigo
                 "attr_5": "#B061FF",
                 "attr_2": "#B061FF",
                 "attr_8": "#B061FF",
                 "attr_6": "#B061FF",
                 "attr_7": "#B061FF",

                 "attr_0": "#61ffff", # Cluster 2 cyan
                 "attr_3": "#61ffff",
                 "attr_9": "#61ffff",
                 }

# notes:
# * rows in Z correspond to "inverted U" links that connect clusters
# * rows are ordered by increasing distance
# * ids 0..len(Z) are leaves; the cluster created by row i gets id len(Z) + 1 + i
# * if the colors of the two connected clusters match, use that color for the link
link_cols = {}
for i, i12 in enumerate(Z[:, :2].astype(int)):
    c1, c2 = (link_cols[x] if x > len(Z) else D_leaf_colors["attr_%d" % x]
              for x in i12)
    link_cols[i + 1 + len(Z)] = c1 if c1 == c2 else dflt_col

# Dendrogram
D = dendrogram(Z=Z, labels=DF_dism.index, color_threshold=None,
  leaf_font_size=12, leaf_rotation=45, link_color_func=lambda x: link_cols[x])

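Here is the output (image not reproduced): a link is drawn in a cluster's color when both of the clusters it joins have that color, and every other link is gray.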
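The same bottom-up propagation can be packaged as a reusable function. This is a sketch, not part of the original answer: the name `link_colors_from_leaves` is hypothetical, and it assumes leaf colors are given as a list indexed by observation number (leaf id), so it does not depend on the `attr_%d` naming. Because every row of Z only refers to leaves or to clusters created in earlier rows, a single pass over Z is enough.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def link_colors_from_leaves(Z, leaf_colors, default="#808080"):
    """Hypothetical helper: map every leaf and merged-cluster id to a color.

    leaf_colors[i] is the color of observation i (leaf id i).
    """
    n = len(Z) + 1  # number of leaves; merged clusters get ids n, n+1, ...
    colors = dict(enumerate(leaf_colors))
    for row, (a, b) in enumerate(Z[:, :2].astype(int)):
        c1, c2 = colors[a], colors[b]  # children always have smaller ids
        colors[n + row] = c1 if c1 == c2 else default
    return colors


# Toy example: two tight pairs plus one distant point
X = np.array([[0.0], [0.1], [5.0], [5.2], [20.0]])
Z = linkage(X, method="average")
leaf_colors = ["#B061FF", "#B061FF", "#61ffff", "#61ffff", "#808080"]

cols = link_colors_from_leaves(Z, leaf_colors)
R = dendrogram(Z, no_plot=True, link_color_func=lambda k: cols[k])
print(sorted(R["color_list"]))  # ['#61ffff', '#808080', '#808080', '#B061FF']
```

With the question's data, the list could be built as `[D_leaf_colors["attr_%d" % i] for i in range(len(Z) + 1)]`.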
