pandas.plot argument c vs s
Question
I have the following code from a machine learning book in python:
copy_set.plot(kind="scatter", x="longitude", y="latitude", alpha=0.4,
              s=copy_set["population"], label="population", figsize=(10, 7),
              c="median_house_value", cmap=plt.get_cmap("jet"))
median_house_value and population are two columns in the copy_set dataframe. I don't understand why, for argument s, I have to use copy_set['population'], but for argument c it is possible to use only the column name median_house_value. When I try to use only the column name for parameter s, I get an error message:
TypeError: ufunc 'sqrt' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Recommended answer
Very good question. df.plot is a wrapper around several of matplotlib's plotting functions. For kind="scatter", matplotlib's scatter function is called. Most of the arguments to df.plot() are first converted to the data in the Series obtained from the dataframe column of the respective name.
For example,
df.plot(kind="scatter", x="lon", y="lat")
will be translated to
ax.scatter(x=df["lon"].values, y=df["lat"].values)
The remaining arguments are passed through to scatter, hence
df.plot(kind="scatter", x="lon", y="lat", some_argument_pandas_doesnt_know=True)
will result in
ax.scatter(x=df["lon"].values, y=df["lat"].values, some_argument_pandas_doesnt_know=True)
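The pass-through behaviour can be modelled with a few lines of plain Python. This is only a toy sketch, not pandas' actual implementation: toy_plot, toy_scatter and the RESOLVED tuple are hypothetical names, and the choice of x, y and c as the resolved arguments is an assumption made for illustration.

```python
# Toy model of a plotting wrapper; NOT pandas' real implementation.
RESOLVED = ("x", "y", "c")  # assumed set of arguments the wrapper looks up by name


def toy_scatter(**kwargs):
    """Stand-in for ax.scatter: simply reports what it received."""
    return kwargs


def toy_plot(table, **kwargs):
    """Resolve selected column names to data and pass everything else through."""
    call = {}
    for name, value in kwargs.items():
        if name in RESOLVED and isinstance(value, str) and value in table:
            call[name] = table[value]  # column name -> column data
        else:
            call[name] = value         # untouched, so s="pop" stays a string
    return toy_scatter(**call)


table = {"lon": [1.0, 2.0], "lat": [3.0, 4.0], "pop": [10, 20], "med": [0.5, 0.9]}
received = toy_plot(table, x="lon", y="lat", c="med", s="pop")
print(received["c"])  # [0.5, 0.9]
print(received["s"])  # pop
```

In this model the inner function receives real data for c but only the bare string "pop" for s, which it has no way of interpreting.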
So while pandas converts the arguments x, y and c, it doesn't do so for s. s is hence simply passed on to ax.scatter, but that matplotlib function doesn't know what a string like "population" would mean.
For arguments that are passed on to the matplotlib function, one needs to stick to matplotlib's signature and, in the case of s, supply the data directly.
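The wording of the TypeError in the question suggests where things go wrong: matplotlib appears to take the square root of the marker sizes when drawing, and NumPy has no square root for strings. That reading of the traceback is my assumption, not something stated above. A minimal sketch of the same failure using NumPy alone:

```python
import numpy as np

# Numeric marker sizes can be square-rooted without trouble.
sizes_ok = np.sqrt(np.array([4.0, 9.0]))

# A string where numbers are expected cannot.
try:
    np.sqrt(np.array(["population"]))
    failed = False
except TypeError as exc:
    failed = True
    message = str(exc)

print(failed)
print(message)
```

The exact message text differs between NumPy versions, so it is printed here rather than quoted.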
Note, however, that matplotlib's scatter itself also accepts strings for its arguments. This requires telling it which dataset they should be taken from, which is done via the data argument. Hence the following works fine and is the matplotlib equivalent of the pandas call in the question:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(42)
df = pd.DataFrame(np.random.rand(20,2), columns=["lon", "lat"])
df["pop"] = np.random.randint(5,300,size=20)
df["med"] = np.random.rand(20)*1e5
fig, ax = plt.subplots(figsize=(10,7))
sc = ax.scatter(x="lon", y="lat", alpha=0.4,
                s="pop", label="population",
                c="med", cmap="jet", data=df)
fig.colorbar(sc, label="med")
ax.set(xlabel="longitude", ylabel="latitude")
plt.show()
Finally, you may now ask whether supplying the data to matplotlib via the data argument would be equally possible by passing it through the pandas wrapper. Unfortunately not, because pandas uses data as an argument internally, so it will not be passed through.
Therefore your two options are:
- Use pandas as in the question and supply the data itself via the s argument instead of the column name.
- Use matplotlib as shown here and use column names for all arguments. (Or use the data itself, which is what you see most often in matplotlib code.)
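For completeness, the first option could look like the sketch below. It reuses the made-up lon/lat/pop/med columns from the matplotlib example, uses a NumPy Generator instead of np.random.seed (my substitution), and selects the Agg backend only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((20, 2)), columns=["lon", "lat"])
df["pop"] = rng.integers(5, 300, size=20)
df["med"] = rng.random(20) * 1e5

# c is given as a column name, s as the data itself.
ax = df.plot(kind="scatter", x="lon", y="lat", alpha=0.4,
             s=df["pop"], c="med", cmap="jet", figsize=(10, 7))

# The marker sizes stored on the scatter artist are the population values.
sizes = ax.collections[0].get_sizes()
print(len(sizes))
```

Depending on the pandas version installed, a column name for s may also be accepted; passing the Series should work either way, so it is the more portable choice.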