Matplotlib Histogram Time vs. Percentage (NBA stats)


Problem description

I'm having an issue that I'm trying to wrap my head around. I'm very new to pandas/matplotlib.

I want to show a histogram with the shot clock (0-24 sec) in bins on the X-axis and the percentage of makes/misses on the Y-axis.

My data has the shot clock in one column, and another column shows misses/makes (0 and 1). I'm having a hard time figuring out how to generate the percentage for each bin.

Thanks very much.

import matplotlib.pyplot as plt
fig = plt.figure()
x = nba_hist['SHOT_CLOCK']
y = nba_hist['FGM']
plt.hist(x)
plt.show()


SHOT_CLOCK  FGM
10.8        1
3.4         0
5.0         0
10.3        0
10.9        0
9.1         0
14.5        0
3.4         1
12.4        0
17.4        0
16          0
12.1        1
4.3         1

So with this code I'm getting the field goal percentage, but it's not spread across the bins. Any ideas?

df_miss=nba_hist[nba_hist['FGM'] == 0]
df_hits=nba_hist[nba_hist['FGM'] == 1]

bins=np.arange(0,25,6)
hist_hits, bins_ = np.histogram(df_hits['FGM'], bins=bins)
hist_miss, bins_ = np.histogram(df_miss['FGM'], bins=bins)

Recommended answer

The relative frequency of the events in each bin is obtained by dividing the absolute frequency (the count in that bin) by the total number of events.

You would therefore need to calculate the histogram first, e.g. with numpy:

hist, bins = np.histogram(x)
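
As a minimal sketch of that step, the snippet below runs np.histogram on the shot-clock values from the question's sample data and divides the counts by the total number of events. The bin edges (0 to 24 in steps of 6) and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Shot-clock values copied from the sample data in the question.
shot_clock = np.array([10.8, 3.4, 5.0, 10.3, 10.9, 9.1, 14.5,
                       3.4, 12.4, 17.4, 16.0, 12.1, 4.3])

# Assumed bin edges: four 6-second bins covering 0-24 s.
bins = np.arange(0, 25, 6)  # [0, 6, 12, 18, 24]

# Absolute frequency: number of shots that fall in each bin.
hist, edges = np.histogram(shot_clock, bins=bins)

# Relative frequency: share of all shots per bin, in percent.
rel = hist / hist.sum() * 100

print(hist)  # counts per bin
print(rel)   # percentages per bin, summing to 100
```

Note that this only describes how the shots are distributed over the clock; it does not yet separate hits from misses.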

Depending on whether you then divide by the number of events within each bin or by the total number of events, you get different plots.
From the left-hand plot (division per bin) you can easily see that, for example, the hit rate is higher for larger clock times (which may of course not hold for the real data). From the right-hand plot (division by the total) you would instead see that more attempts were made at medium clock times, something that is not visible at all if you only show the relative hits.
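
To make the two normalizations concrete, here is a small worked sketch on the 13 sample rows from the question. The 6-second bins and the zero fill for empty bins are assumptions, not part of the original answer. The histograms are computed on the SHOT_CLOCK values of each subset; binning the FGM column instead would only count zeros and ones.

```python
import numpy as np

# Sample rows from the question: shot clock and make (1) / miss (0).
shot_clock = np.array([10.8, 3.4, 5.0, 10.3, 10.9, 9.1, 14.5,
                       3.4, 12.4, 17.4, 16.0, 12.1, 4.3])
fgm = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1])

bins = np.arange(0, 25, 6)  # assumed 6-second bins

# Bin the clock times of made and missed shots separately.
hits, _ = np.histogram(shot_clock[fgm == 1], bins=bins)
miss, _ = np.histogram(shot_clock[fgm == 0], bins=bins)
total = hits + miss

# Option 1: percentage of makes within each bin.
# Empty bins would divide by zero, so they are filled with 0 here (an assumption).
pct_per_bin = np.divide(hits, total,
                        out=np.zeros(len(total), dtype=float),
                        where=total > 0) * 100

# Option 2: makes per bin as a percentage of all shots taken.
pct_of_total = hits / total.sum() * 100

print(pct_per_bin)
print(pct_of_total)
```

The first array answers "how often does a shot go in at this clock time", while the second also reflects how many shots were attempted in each bin.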

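from __future__ import division  # only needed on Python 2; harmless on Python 3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(2)  # fixed seed so the random example data is reproducible

# Synthetic example data: 100 shots with a random clock time and a random outcome
t = np.random.rand(100)*24
hit = np.random.randint(0,2, size=100)
df = pd.DataFrame({"time":t, "hits":hit})
df_miss=df[df.hits == 0]
df_hits=df[df.hits == 1]

# Count hits and misses separately in 4-second bins from 0 to 24
bins=np.arange(0,28,4)
hist_hits, bins_ = np.histogram(df_hits.time, bins=bins)
hist_miss, bins_ = np.histogram(df_miss.time, bins=bins)

# Percentages within each bin (hits and misses add up to 100 in every bin)
rel_hits = hist_hits/(hist_hits+hist_miss)*100.
rel_miss = hist_miss/(hist_hits+hist_miss)*100.

# Percentages of all events (all bars together add up to 100)
rel_hits_n = hist_hits/np.sum(hist_hits+hist_miss)*100.
rel_miss_n = hist_miss/np.sum(hist_hits+hist_miss)*100.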


fig , (ax, ax2) = plt.subplots(ncols=2, figsize=(7,3))

ax.bar(bins[:-1], rel_hits, width=4,  
       color="mediumseagreen", align="edge", ec="k")
ax.bar(bins[:-1], rel_miss,  bottom=rel_hits, width=4, 
       color="tomato", align="edge", ec="k")
ax.set_xticks(bins)
ax.set_ylabel("relative hits and misses [%]")
ax2.bar(bins[:-1], rel_hits_n, width=4,  
        color="mediumseagreen", align="edge", ec="k", label="hit")
ax2.bar(bins[:-1], rel_miss_n,  bottom=rel_hits_n, width=4, 
        color="tomato", align="edge", ec="k", label="miss")
ax2.set_xticks(bins)
ax2.set_ylabel("normalized hits and misses [%]")

plt.legend()
plt.tight_layout()
plt.show()
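
As an alternative to the np.histogram approach above (this is a different technique, not the answer's method), the per-bin field goal percentage can also be sketched with pandas.cut and groupby. Because FGM is 0 or 1, the mean of FGM within a bin is the share of made shots. The small nba_hist frame below is rebuilt from the question's sample rows, and the 6-second bins are an assumption.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, rebuilt as a stand-in for the real nba_hist.
nba_hist = pd.DataFrame({
    "SHOT_CLOCK": [10.8, 3.4, 5.0, 10.3, 10.9, 9.1, 14.5,
                   3.4, 12.4, 17.4, 16.0, 12.1, 4.3],
    "FGM": [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 1],
})

bins = np.arange(0, 25, 6)  # assumed 6-second bins

# Assign each shot to a half-open interval [left, right).
# Caveat: with right=False a clock value of exactly 24 gets no bin.
clock_bin = pd.cut(nba_hist["SHOT_CLOCK"], bins=bins, right=False)

# Mean of a 0/1 column per bin = fraction of makes; times 100 for percent.
# observed=False keeps empty bins in the result (their mean is NaN).
fg_pct = nba_hist.groupby(clock_bin, observed=False)["FGM"].mean() * 100

print(fg_pct)
```

The resulting Series is indexed by bin interval, so something like fg_pct.plot.bar() should draw it directly; empty bins show up as NaN rather than as a division-by-zero warning.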
