Color by Column Values in Matplotlib
Question
One of my favorite aspects of using the ggplot2 library in R is the ability to easily specify aesthetics. I can quickly make a scatterplot and apply color associated with a specific column, and I would love to be able to do this with python/pandas/matplotlib. Are there any convenience functions that people use to map colors to values using pandas dataframes and Matplotlib?
##ggplot scatterplot example with R dataframe, `df`, colored by col3
ggplot(data = df, aes(x=col1, y=col2, color=col3)) + geom_point()
##ideal situation with pandas dataframe, 'df', where colors are chosen by col3
df.plot(x=col1,y=col2,color=col3)
Thank you for your responses but I want to include a sample dataframe to clarify what I am asking. Two columns contain numerical data and the third is a categorical variable. The script I am thinking of will assign colors based on this value.
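import numpy as np
import pandas as pd
# Two numeric columns (10 random samples each) and one categorical column
df = pd.DataFrame({'Height': np.random.normal(size=10),
                   'Weight': np.random.normal(size=10),
                   'Gender': ["Male", "Male", "Male", "Male", "Male",
                              "Female", "Female", "Female", "Female", "Female"]})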
Recommended answer
Update October 2015
Seaborn handles this use-case splendidly:
import numpy
import pandas
from matplotlib import pyplot
import seaborn
seaborn.set(style='ticks')

numpy.random.seed(0)
N = 37
_genders = ['Female', 'Male', 'Non-binary', 'No Response']
df = pandas.DataFrame({
    'Height (cm)': numpy.random.uniform(low=130, high=200, size=N),
    'Weight (kg)': numpy.random.uniform(low=30, high=100, size=N),
    'Gender': numpy.random.choice(_genders, size=N)
})

fg = seaborn.FacetGrid(data=df, hue='Gender', hue_order=_genders, aspect=1.61)
fg.map(pyplot.scatter, 'Weight (kg)', 'Height (cm)').add_legend()
Immediate output: [figure: scatter plot of Height (cm) against Weight (kg), colored by Gender]
In this case, I would use matplotlib directly.
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

def dfScatter(df, xcol='Height', ycol='Weight', catcol='Gender'):
    fig, ax = plt.subplots()
    # Assign each category an evenly spaced number in [0, 1];
    # scatter maps these numbers to colors through the default colormap.
    categories = np.unique(df[catcol])
    colors = np.linspace(0, 1, len(categories))
    colordict = dict(zip(categories, colors))
    # Note: this adds a "Color" column to the dataframe that was passed in.
    df["Color"] = df[catcol].apply(lambda x: colordict[x])
    ax.scatter(df[xcol], df[ycol], c=df.Color)
    return fig

if __name__ == '__main__':
    df = pd.DataFrame({'Height': np.random.normal(size=10),
                       'Weight': np.random.normal(size=10),
                       'Gender': ["Male", "Male", "Unknown", "Male", "Male",
                                  "Female", "Did not respond", "Unknown", "Female", "Female"]})
    fig = dfScatter(df)
    fig.savefig('fig1.png')
That gives me: [figure: scatter plot of Weight against Height, colored by Gender]
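The single scatter call above colors the points but gives no legend. One possible variation, sketched here under my own assumptions (the helper name `scatter_by_group` and the sample data are hypothetical, not from the original answer), is to draw one scatter series per category with `groupby`, so matplotlib cycles the colors and each category gets a legend entry:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_by_group(df, xcol, ycol, catcol):
    """Hypothetical helper: one scatter series per category value."""
    fig, ax = plt.subplots()
    # groupby sorts the category values, so the legend order is alphabetical
    for name, group in df.groupby(catcol):
        ax.scatter(group[xcol], group[ycol], label=name)
    ax.set_xlabel(xcol)
    ax.set_ylabel(ycol)
    ax.legend(title=catcol)
    return fig, ax


# Made-up sample data in the shape described in the question
rng = np.random.default_rng(0)
df = pd.DataFrame({'Height': rng.normal(size=10),
                   'Weight': rng.normal(size=10),
                   'Gender': ["Male"] * 5 + ["Female"] * 5})
fig, ax = scatter_by_group(df, 'Height', 'Weight', 'Gender')
```

Each category becomes its own collection on the axes, which is what lets `ax.legend` label them individually.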
As far as I know, that color column can be any matplotlib-compatible color (RGBA tuples, HTML names, hex values, etc.).
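As a rough illustration of that point, the sketch below maps each category to an explicitly chosen color given in a different notation. The palette and the sample data are made up for the example; `matplotlib.colors.to_rgba` is used to normalize every entry to an RGBA tuple before it is handed to `scatter`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba

# Made-up sample data
df = pd.DataFrame({'Height': [1.0, 2.0, 3.0, 4.0, 5.0],
                   'Weight': [2.0, 1.5, 3.5, 3.0, 4.0],
                   'Gender': ["Male", "Female", "Unknown", "Female", "Male"]})

# Hypothetical palette mixing an HTML name, a hex value, and an RGBA tuple
palette = {'Male': 'steelblue',
           'Female': '#d62728',
           'Unknown': (0.5, 0.5, 0.5, 1.0)}

# Normalize every color to RGBA, one row per data point
rgba = [to_rgba(palette[g]) for g in df['Gender']]

fig, ax = plt.subplots()
sc = ax.scatter(df['Height'], df['Weight'], c=rgba)
```

Normalizing first is a design choice for the sketch; it keeps the `c` argument a uniform list of 4-tuples regardless of how the palette was written.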
I'm having trouble getting anything but numerical values to work with the colormaps.
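One way around this, offered as a sketch rather than a definitive fix, is to convert the categories to integer codes first and pass those codes to the colormap. The example assumes `pandas.factorize` for the conversion and `PathCollection.legend_elements` (available in reasonably recent matplotlib releases) to rebuild a legend with the original labels; the sample data and the choice of `viridis` are arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample data
df = pd.DataFrame({'Height': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
                   'Weight': [2.0, 1.5, 3.5, 3.0, 4.0, 2.5],
                   'Gender': ["Male", "Female", "Unknown", "Female", "Male", "Unknown"]})

# Integer code per row, plus the sorted unique labels the codes refer to
codes, labels = pd.factorize(df['Gender'], sort=True)

fig, ax = plt.subplots()
# Numeric codes work with any colormap
sc = ax.scatter(df['Height'], df['Weight'], c=codes, cmap='viridis')

# One legend handle per distinct code, relabeled with the category names
handles, _ = sc.legend_elements()
ax.legend(handles, list(labels), title='Gender')
```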