Color by Column Values in Matplotlib
Problem Description
One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and apply color associated with a specific column, and I would love to be able to do this with python/pandas/matplotlib. Are there any convenience functions that people use to map colors to values using pandas dataframes and Matplotlib?

    ## ggplot scatterplot example with R dataframe, `df`, colored by col3
    ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()

    ## ideal situation with pandas dataframe, 'df', where colors are chosen by col3
    df.plot(x=col1,y=col2,color=col3)
EDIT: Thank you for your responses but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I am thinking of will assign colors based on this value.
    np.random.seed(250)
    df = pd.DataFrame({'Height': np.append(np.random.normal(6, 0.25, size=5),
                                           np.random.normal(5.4, 0.25, size=5)),
                       'Weight': np.append(np.random.normal(180, 20, size=5),
                                           np.random.normal(140, 20, size=5)),
                       'Gender': ["Male", "Male", "Male", "Male", "Male",
                                  "Female", "Female", "Female", "Female", "Female"]})

         Height      Weight  Gender
    0  5.824970  159.210508    Male
    1  5.780403  180.294943    Male
    2  6.318295  199.142201    Male
    3  5.617211  157.813278    Male
    4  6.340892  191.849944    Male
    5  5.625131  139.588467  Female
    6  4.950479  146.711220  Female
    7  5.617245  121.571890  Female
    8  5.556821  141.536028  Female
    9  5.714171  134.396203  Female
Solution
Imports and Data
    import numpy
    import pandas
    import matplotlib.pyplot as plt
    import seaborn

    seaborn.set(style='ticks')

    numpy.random.seed(0)
    N = 37
    _genders = ['Female', 'Male', 'Non-binary', 'No Response']
    df = pandas.DataFrame({
        'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
        'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
        'Gender': numpy.random.choice(_genders, size=N)
    })
Update August 2021
- With seaborn 0.11.0, it's recommended to use new figure-level functions like seaborn.relplot rather than using FacetGrid directly.
    seaborn.relplot(data=df, x='Weight (kg)', y='Height (cm)', hue='Gender',
                    hue_order=_genders, aspect=1.61)
    plt.show()
Update October 2015
Seaborn handles this use-case splendidly:
- Map matplotlib.pyplot.scatter onto a seaborn.FacetGrid
    fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
    fg.map(plt.scatter, 'Weight (kg)', 'Height (cm)').add_legend()
Which immediately outputs a scatter plot of height against weight, with one color per gender and a legend.
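For readers without seaborn, a similar per-category plot can be sketched in plain matplotlib by drawing one scatter call per group. This is only a rough equivalent under stated assumptions, not seaborn's implementation: the small data frame below is made up for illustration, and the colors come from matplotlib's default color cycle rather than a seaborn palette.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Small illustrative frame (made-up values, not the answer's random data).
df = pd.DataFrame({
    'Weight (kg)': [55.0, 82.0, 61.0, 90.0, 47.0, 73.0],
    'Height (cm)': [160.0, 181.0, 158.0, 175.0, 166.0, 190.0],
    'Gender': ['Female', 'Male', 'Female', 'Male', 'Non-binary', 'Non-binary'],
})

fig, ax = plt.subplots()

# One scatter call per category: each call takes the next color from the
# default color cycle, and label= gives the legend one entry per group.
for name, group in df.groupby('Gender'):
    ax.scatter(group['Weight (kg)'], group['Height (cm)'], label=name)

ax.set_xlabel('Weight (kg)')
ax.set_ylabel('Height (cm)')
ax.legend(title='Gender')
```

Because groupby sorts its keys, the legend entries appear in alphabetical order; reorder the loop if a specific order (such as hue_order above) matters.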
Old Answer
In this case, I would use matplotlib directly.
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
        fig, ax = plt.subplots()
        categories = np.unique(df[catcol])
        colors = np.linspace(0, 1, len(categories))
        colordict = dict(zip(categories, colors))

        df["Color"] = df[catcol].apply(lambda x: colordict[x])
        ax.scatter(df[xcol], df[ycol], c=df.Color)
        return fig

    if 1:
        df = pd.DataFrame({'Height': np.random.normal(size=10),
                           'Weight': np.random.normal(size=10),
                           'Gender': ["Male", "Male", "Unknown", "Male", "Male",
                                      "Female", "Did not respond", "Unknown", "Female", "Female"]})
        fig = dfScatter(df)
        fig.savefig('fig1.png')
And that gives me a scatter plot in which each category is drawn in its own color.
As far as I know, that color column can be any matplotlib-compatible color (RGBA tuples, HTML names, hex values, etc).
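As a minimal sketch of that idea, the categorical column can be mapped to explicit color specifications before plotting. The palette dictionary and the sample values below are assumptions chosen for illustration; any matplotlib-compatible color could be substituted.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data (made-up values).
df = pd.DataFrame({
    'Height': [5.8, 5.7, 6.3, 5.6, 4.9, 5.6],
    'Weight': [159.0, 180.0, 199.0, 139.0, 146.0, 121.0],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Female'],
})

# Hypothetical palette: a named color and a hex value, one per category.
palette = {'Male': 'tab:blue', 'Female': '#ff7f0e'}

# Look up each row's category to get one color spec per point.
point_colors = df['Gender'].map(palette)

fig, ax = plt.subplots()
ax.scatter(df['Height'], df['Weight'], c=point_colors.tolist())
```

A category missing from the palette would map to NaN, so in practice the dictionary should cover every value in the column.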
I'm having trouble getting anything but numerical values to work with the colormaps.
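One way around this, sketched here under assumptions rather than offered as a definitive fix, is to convert the categories to integer codes first and pass those codes to the colormap. The sample data and the choice of the 'viridis' colormap are illustrative; pandas.factorize and PathCollection.legend_elements (available in matplotlib 3.1 and later) do the conversion and build the legend handles.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data (made-up values).
df = pd.DataFrame({
    'Height': [5.8, 5.7, 6.3, 5.6, 4.9, 5.6],
    'Weight': [159.0, 180.0, 199.0, 139.0, 146.0, 121.0],
    'Gender': ['Male', 'Male', 'Male', 'Female', 'Female', 'Female'],
})

# Turn the strings into integer codes; sort=True orders the categories
# alphabetically so code 0 is 'Female' and code 1 is 'Male'.
codes, categories = pd.factorize(df['Gender'], sort=True)

fig, ax = plt.subplots()
points = ax.scatter(df['Height'], df['Weight'], c=codes, cmap='viridis')

# legend_elements() returns one handle per distinct code, in ascending
# order, which matches the order of `categories`.
handles, _ = points.legend_elements()
ax.legend(handles, list(categories), title='Gender')
```

The numeric codes only select positions on the colormap, so the legend is relabeled with the original category names.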