pandas.plot argument c vs s


Problem description

I have the following code from a machine learning book in Python:

copy_set.plot(kind = "scatter" , x = "longitude" , 
              y = "latitude" , alpha = 0.4 , 
              s = copy_set[ "population" ], 
              label = "population" , figsize=(10,7), 
              c = "median_house_value" , cmap = plt.get_cmap ( "jet" ) ) 

median_house_value and population are two columns in the copy_set dataframe. I don't understand why, for argument s, I have to use copy_set["population"], while for argument c it is enough to give the column name median_house_value. When I try to use only the column name for parameter s, I get an error message:

TypeError: ufunc 'sqrt' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Recommended answer

Very good question. df.plot is a wrapper around several of matplotlib's plotting functions. For kind="scatter", matplotlib's scatter function is called. Most of the arguments to df.plot() are first converted to data: the column name is looked up in the dataframe, and the values of the resulting Series are used.

For example,

df.plot(kind="scatter", x="lon", y="lat")

is converted to

ax.scatter(x=df["lon"].values, y=df["lat"].values)

The remaining arguments are passed through to scatter, hence

df.plot(kind="scatter", x="lon", y="lat", some_argument_pandas_doesnt_know=True)

results in

ax.scatter(x=df["lon"].values, y=df["lat"].values, some_argument_pandas_doesnt_know=True)

So while pandas converts the arguments x, y and c, it doesn't do so for s. s is therefore simply passed on to ax.scatter, and that matplotlib function doesn't know what a string like "population" is supposed to mean.
For arguments that are passed on to the matplotlib function, you need to stick to matplotlib's signature and, in the case of s, supply the data directly.
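The dispatch described above can be modelled with a minimal, dependency-free sketch. This is not pandas' or matplotlib's actual code: `fake_scatter` and `plot_scatter_wrapper` are hypothetical stand-ins, and a plain dict plays the role of the dataframe. It only illustrates why a column name works for c but a string for s ends in a TypeError once a square root is taken of the sizes.

```python
import math


def fake_scatter(x, y, s=None, c=None, **kwargs):
    """Hypothetical stand-in for matplotlib's scatter: expects data, not names."""
    # Marker scale is derived from the square root of the sizes; a string has
    # no square root, which mirrors the 'sqrt' TypeError from the question.
    widths = None if s is None else [math.sqrt(v) for v in s]
    return {"x": x, "y": y, "widths": widths, "c": c, **kwargs}


def plot_scatter_wrapper(frame, x, y, c=None, **kwargs):
    """Hypothetical, simplified model of df.plot(kind="scatter", ...)."""
    # x, y and c are looked up as column names in the frame ...
    c_values = frame[c] if isinstance(c, str) and c in frame else c
    # ... whereas everything else (including s) is forwarded untouched.
    return fake_scatter(frame[x], frame[y], c=c_values, **kwargs)


frame = {"lon": [0.1, 0.5], "lat": [0.2, 0.9],
         "pop": [25, 100], "med": [1e5, 2e5]}

# Passing the data for s works; c may stay a column name.
ok = plot_scatter_wrapper(frame, "lon", "lat", c="med", s=frame["pop"])

# Passing a column name for s reaches the square root unchanged and fails.
try:
    plot_scatter_wrapper(frame, "lon", "lat", c="med", s="pop")
    failed = False
except TypeError:
    failed = True
```

In this model the only difference between c and s is whether the wrapper looks the name up before forwarding, which is the asymmetry the answer describes.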

Note, however, that matplotlib's scatter itself accepts strings for its arguments, provided it is told which dataset to take them from. This is done via the data argument. The following therefore works and is the matplotlib equivalent of the pandas call in the question:

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np; np.random.seed(42)

df = pd.DataFrame(np.random.rand(20,2), columns=["lon", "lat"])
df["pop"] = np.random.randint(5,300,size=20)
df["med"] = np.random.rand(20)*1e5

fig, ax = plt.subplots(figsize=(10,7))
sc = ax.scatter(x = "lon", y = "lat", alpha = 0.4, 
                s = "pop", label = "population" , 
                c = "med" , cmap = "jet", data=df)
fig.colorbar(sc, label="med")
ax.set(xlabel="longitude", ylabel="latitude")

plt.show()

Finally, you might ask whether the data argument could equally be handed to matplotlib through the pandas wrapper. Unfortunately not: pandas uses data as an argument internally, so it is not passed through. That leaves two options:

  1. Use pandas as in the question and supply the data itself via the s argument instead of the column name.
  2. Use matplotlib as shown here and use column names for all arguments. (Or use the data itself, which is what you see most often in matplotlib code.)
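For comparison with the matplotlib version, option 1 might look like the sketch below. It reuses the column names lon, lat, pop and med from the example data; the Agg backend and the use of np.random.default_rng are assumptions made so the snippet runs without a display, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import numpy as np
import pandas as pd

# Small random dataset with the same column names as the example above.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.random((20, 2)), columns=["lon", "lat"])
df["pop"] = rng.integers(5, 300, size=20)
df["med"] = rng.random(20) * 1e5

# c is given as a column name and resolved by pandas;
# s is given as the data itself, which matplotlib's scatter accepts directly.
ax = df.plot(kind="scatter", x="lon", y="lat", alpha=0.4,
             s=df["pop"],
             c="med", cmap="jet", figsize=(10, 7))

# The marker sizes on the resulting collection are the population values.
sizes = ax.collections[0].get_sizes()
```

Passing the Series rather than its name works regardless of whether the wrapper resolves s, since the data arrives at scatter in the form it expects.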
