Matplotlib scatter color by categorical factors

Question

I have a basic scatter plot where x and y are floats, but I want to change the marker color based on a third, categorical variable. The categorical variable is stored as strings, and this seems to cause a problem.

Using the iris dataset, here is the code I thought I would use:

#Scatter of Petal
x=df['Petal Length']
y=df['Petal Width']
z=df['Species']
plt.scatter(x, y, c=z, s=15, cmap='hot')
plt.xlabel('Petal Width')
plt.ylabel('Petal Length')
plt.title('Petal Width vs Length')

But I get an error: could not convert string to float: iris-setosa

Do I have to convert the categorical variable to a numeric one before plotting, or is there something I can do with the data in its current format?

Thanks

Update: the whole traceback is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-47-d67ee3bffc3b> in <module>()
      3 y=df['Petal Width']
      4 z=df['Species']
----> 5 plt.scatter(x, y, c=z, s=15, cmap='hot')
      6 plt.xlabel('Petal Width')
      7 plt.ylabel('Petal Length')

/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.pyc in scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, hold, **kwargs)
   3198         ret = ax.scatter(x, y, s=s, c=c, marker=marker, cmap=cmap, norm=norm,
   3199                          vmin=vmin, vmax=vmax, alpha=alpha,
-> 3200                          linewidths=linewidths, verts=verts, **kwargs)
   3201         draw_if_interactive()
   3202     finally:

/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.pyc in scatter(self, x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, verts, **kwargs)
   3605 
   3606         if c_is_stringy:
-> 3607             colors = mcolors.colorConverter.to_rgba_array(c, alpha)
   3608         else:
   3609             # The inherent ambiguity is resolved in favor of color

/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/colors.pyc in to_rgba_array(self, c, alpha)
    420             result = np.zeros((nc, 4), dtype=np.float)
    421             for i, cc in enumerate(c):
--> 422                 result[i] = self.to_rgba(cc, alpha)
    423             return result
    424 

/Users/mpgartland1/anaconda/lib/python2.7/site-packages/matplotlib/colors.pyc in to_rgba(self, arg, alpha)
    374         except (TypeError, ValueError) as exc:
    375             raise ValueError(
--> 376                 'to_rgba: Invalid rgba arg "%s"\n%s' % (str(arg), exc))
    377 
    378     def to_rgba_array(self, c, alpha=None):

ValueError: to_rgba: Invalid rgba arg "Iris-setosa"
to_rgb: Invalid rgb arg "Iris-setosa"
could not convert string to float: iris-setosa

Recommended answer

As your traceback tells you, you can't pass arbitrary strings such as species names to the color parameter. You can pass either valid colors, or an array of numeric values that matplotlib will map to colors itself through the colormap.

See: http://matplotlib.org/api/pyplot_api.html?highlight=plot#matplotlib.pyplot.plot
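To make the two options concrete, here is a minimal sketch of both. It assumes pandas is available and uses a tiny made-up DataFrame in place of the real iris file; the `palette` dictionary, the `viridis` colormap, and the output filename are illustrative choices, not part of the original question or answer.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up sample standing in for the iris data (values are illustrative only)
df = pd.DataFrame({
    "PetalLength": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
             "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

# Option 1: map each category to a color name that matplotlib understands
palette = {"Iris-setosa": "red", "Iris-versicolor": "green", "Iris-virginica": "blue"}
point_colors = df["Name"].map(palette).tolist()

# Option 2: turn the categories into integer codes and let a colormap pick the colors
codes, categories = pd.factorize(df["Name"])

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(df["PetalLength"], df["PetalWidth"], c=point_colors, s=15)
ax2.scatter(df["PetalLength"], df["PetalWidth"], c=codes, cmap="viridis", s=15)
fig.savefig("categorical_scatter.png")  # or plt.show() in an interactive session
```

Either call avoids the error because `c` no longer contains raw category strings. Neither produces a legend on its own, which is one reason to plot each category separately instead.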

There is probably a more elegant way, but one implementation would be the following (I used this dataset: https://raw.githubusercontent.com/pydata/pandas/master/pandas/tests/data/iris.csv):

import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.cm as cmx
from pandas import read_csv

df = read_csv('iris.csv')

#Scatter of Petal
x=df['PetalLength']
y=df['PetalWidth']

# Get the unique species names (sorted so the color assignment is reproducible)
uniq = sorted(set(df['Name']))

# Map the species indices onto the 'hot' colormap; vmax=len(uniq) means the
# largest index stays below 1.0, so the white end of 'hot' is never used
hot = plt.get_cmap('hot')
cNorm = colors.Normalize(vmin=0, vmax=len(uniq))
scalarMap = cmx.ScalarMappable(norm=cNorm, cmap=hot)

# Plot each species
for i in range(len(uniq)):
    indx = df['Name'] == uniq[i]
    plt.scatter(x[indx], y[indx], s=15, color=scalarMap.to_rgba(i), label=uniq[i])

plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.title('Petal Width vs Length')
plt.legend(loc='upper left')
plt.show()

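This gives a scatter plot with each species drawn in its own color.

The labels are passed explicitly to each scatter call so that the legend can show them.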
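As one possibly tidier variant of the per-species loop, the sketch below uses pandas `groupby` and lets matplotlib assign a color from its default color cycle on each `scatter` call. This is an alternative and not the answer's method; the sample DataFrame and the output filename are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Tiny made-up sample standing in for the iris data (values are illustrative only)
df = pd.DataFrame({
    "PetalLength": [1.4, 1.3, 4.7, 4.5, 6.0, 5.1],
    "PetalWidth": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
             "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

fig, ax = plt.subplots()

# groupby yields (category, sub-frame) pairs in sorted category order;
# each scatter call without an explicit color takes the next color in the cycle
for name, group in df.groupby("Name"):
    ax.scatter(group["PetalLength"], group["PetalWidth"], s=15, label=name)

ax.set_xlabel("Petal Length")
ax.set_ylabel("Petal Width")
ax.set_title("Petal Width vs Length")
ax.legend(loc="upper left")
fig.savefig("iris_by_species.png")  # or plt.show() in an interactive session
```

The trade-off is less control over the exact colors: the explicit colormap approach is the better fit when the categories should follow a specific palette.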
