Dynamic histogram subplots with line to mark target


Problem description

I've been trying to get some similar posted solutions to work, with no luck.

I am trying to get histograms of Cost for every Step No in our manufacturing process. Each part has a different number of steps, so I want a set of histograms on one plot/image for each part.

My real data contains many parts, so ideally this would loop through all the parts and save the graphs.

Additionally, we have a target cost for each step that I want to overlay on the histogram. This is stored in a separate dataframe. I got stuck on the loop for the subplots, so I haven't tried this yet.

Here's the closest example I could find of what each step's histogram should look like:

Here is my code so far:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('Dist_Example.xlsx')
df1 = df[~df['Cost Type'].isin(['Material'])]
number_of_subplots = len(df1['Step No'].unique())
steps = df1['Step No'].unique()
fig, axs = plt.subplots(1, number_of_subplots, sharey = True, tight_layout=True)
for step in steps:
    df2 = df1[df1['Step No'].isin([step])]
    axs[step].hist(df2['Cost'])
plt.show()

Thanks in advance for your help!

Here is the Target Cost I'd like shown as a vertical line on the histogram:

PartNo  StepNo  TargetCost
ABC     10      12
ABC     20      20
ABC     30      13

Here's some sample historical data, which should be binned in the histogram:

PartNo  SerialNo    StepNo  CostType    Cost
ABC      123        10      Labor       11
ABC      123        10      Material    16
ABC      456        10      Labor       21
ABC      456        10      Material    26
ABC      789        10      Labor       21
ABC      789        10      Material    16
ABC      1011       10      Labor       11
ABC      1011       10      Material    6
ABC      1112       10      Labor       1
ABC      1112       10      Material    -4
ABC      123        20      Labor       11
ABC      123        20      Material    19
ABC      456        20      Labor       24
ABC      456        20      Material    29
ABC      789        20      Labor       24
ABC      789        20      Material    19
ABC      1011       20      Labor       14
ABC      1011       20      Material    9
ABC      1112       20      Labor       4
ABC      1112       20      Material    -1
ABC      123        30      Labor       11
ABC      123        30      Material    13
ABC      456        30      Labor       18
ABC      456        30      Material    23
ABC      789        30      Labor       18
ABC      789        30      Material    13
ABC      1011       30      Labor       8
ABC      1011       30      Material    3
ABC      1112       30      Labor       -2
ABC      1112       30      Material    -7

And a second sample dataset:

PartNo  SerialNo    StepNo  CostType    Cost
DEF     Aplha       10      Labor       2
DEF     Zed         10      Labor       3
DEF     Kelly       10      Labor       4
DEF     Aplha       20      Labor       3
DEF     Zed         20      Labor       2
DEF     Kelly       20      Labor       5
DEF     Aplha       30      Labor       6
DEF     Zed         30      Labor       7
DEF     Kelly       30      Labor       5
DEF     Aplha       40      Labor       3
DEF     Zed         40      Labor       4
DEF     Kelly       40      Labor       2
DEF     Aplha       50      Labor       8
DEF     Zed         50      Labor       9
DEF     Kelly       50      Labor       7

Recommended answer

You won't find a histogram function that solves this directly for your dataset. You'll need to aggregate the data in a way that suits your needs, and then represent your findings with a bar plot.

I find your objective and data a bit confusing, but I think I've figured out what you're after, given these assumptions:

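  1. You want to aggregate the cost for each StepNo.
  2. The cost type is irrelevant.
  3. The total target cost has to be calculated, since you are aggregating all costs within each StepNo.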
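
Under these three assumptions, the aggregation might look roughly like the sketch below. It is an illustration on four rows of the sample data, not the answer's original code; the frame names `hist` and `targets` are made up for this example, and `TargetTotal` is assumed to mean the per-row target multiplied by the number of rows aggregated.

```python
import pandas as pd

# Four rows of the sample data from the question (part ABC, step 10)
hist = pd.DataFrame({
    'PartNo':   ['ABC'] * 4,
    'SerialNo': [123, 123, 456, 456],
    'StepNo':   [10, 10, 10, 10],
    'CostType': ['Labor', 'Material', 'Labor', 'Material'],
    'Cost':     [11, 16, 21, 26],
})
targets = pd.DataFrame({'PartNo': ['ABC'], 'StepNo': [10], 'TargetCost': [12]})

# Assumptions 1 and 2: sum all costs per step, ignoring the cost type
per_step = (hist.groupby(['PartNo', 'StepNo'])['Cost']
                .agg(['sum', 'count'])
                .reset_index()
                .rename(columns={'sum': 'Cost', 'count': 'Count'}))

# Assumption 3: scale the per-row target by the number of rows aggregated
per_step = per_step.merge(targets, how='left', on=['PartNo', 'StepNo'])
per_step['TargetTotal'] = per_step['TargetCost'] * per_step['Count']
```

Each row of `per_step` then holds one total cost and one total target per step, which is the shape a bar plot needs.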


Edit

This turned out not to be what the OP wanted. After some back and forth, we arrived at a solution that seems to work.

(from the question) I am trying to get histograms for Cost for all the Step No

(from a comment) I actually want to have a histogram for the sum of the cost per serial no in each step.

Since a histogram needs a count or frequency on the y-axis, you will have to aggregate the data in some way that makes sense. Below you'll see the counts, for a number of bins of your choice, of the aggregated costs of each SerialNo at each step.
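
To make the aggregation concrete, here is a small sketch using step 10 of part ABC from the sample data. The variable names are illustrative, and `np.histogram` is used only to show the counts that a histogram with 4 equal-width bins would display.

```python
import numpy as np
import pandas as pd

# Step 10 of part ABC from the sample data (labor and material rows)
step10 = pd.DataFrame({
    'SerialNo': [123, 123, 456, 456, 789, 789, 1011, 1011, 1112, 1112],
    'Cost':     [11, 16, 21, 26, 21, 16, 11, 6, 1, -4],
})

# One value per serial number: labor and material costs summed
per_serial = step10.groupby('SerialNo')['Cost'].sum()

# Count how many serial numbers fall into each of 4 equal-width bins
counts, edges = np.histogram(per_serial, bins=4)
```

The five serial numbers sum to 27, 47, 37, 17 and -3, so the bins run from -3 to 47 and hold 1, 1, 1 and 2 serial numbers respectively.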

Code:

import pandas as pd
import matplotlib.pyplot as plt


# Load data in two steps (copy each table to the clipboard first):
# df1 = pd.read_clipboard(sep='\\s+')
# PartNo  SerialNo    StepNo  CostType    Cost
# ABC      123        10      Labor       11
# ABC      123        10      Material    16
# ABC      456        10      Labor       21
# ABC      456        10      Material    26
# ...

# df2 = pd.read_clipboard(sep='\\s+')
# PartNo  StepNo  TargetCost
# ABC     10      12
# ABC     20      20
# ABC     30      13

# Cost type is irrelevant
df11 = df1.drop(['CostType'], axis=1)

# Aggregate by PartNo, StepNo and SerialNo; find total cost and row count
df12 = df11.groupby(['PartNo', 'StepNo', 'SerialNo']).agg(['sum', 'count']).reset_index()

df12.columns = ['PartNo', 'StepNo', 'SerialNo', 'Cost', 'Count']
df3 = pd.merge(df2, df12, how='left', on=['PartNo', 'StepNo'])

# Calculate total target cost
df3['TargetTotal'] = df3['TargetCost'] * df3['Count']

# plt.rcParams['figure.figsize'] = (2, 1)

def multiHist(x_data, x_label, bins, target):

    # Histogram setup
    fig, ax = plt.subplots()
    ax.hist(x_data, bins=bins, color='blue', alpha=0.5, histtype='stepfilled',
            label='Cost per SerialNo')

    # Vertical line at the target (passed in rather than read from a global)
    ax.axvline(target, color='red', linewidth=2, label='Target')

    # Annotation
    ax.annotate('Target: {:0.2f}'.format(target), xy=(target, 1), xytext=(-15, 15),
                xycoords=('data', 'axes fraction'), textcoords='offset points',
                horizontalalignment='left', verticalalignment='center',
                arrowprops=dict(arrowstyle='-|>', fc='white', shrinkA=0, shrinkB=0,
                                connectionstyle='angle,angleA=0,angleB=90,rad=10'))

    # Labels
    ax.set_xlabel(x_label, color='grey')
    ax.legend(loc='upper left')
    plt.show()

# Identify and plot data for each StepNo
for step in df3['StepNo'].unique():
    dfs = df3[df3['StepNo'] == step]

    # Data to plot
    cost = dfs['Cost']
    labels = 'Part: ' + dfs['PartNo'].iloc[0] + ', Step: ' + str(dfs['StepNo'].iloc[0])

    # Plot
    multiHist(x_data=cost, x_label=labels, bins=4, target=dfs['TargetTotal'].iloc[0])
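
The question also asked for one image per part, with the steps side by side, saved in a loop. The code above draws a separate figure per step, so the following is only a sketch of how that layout could be done, not part of the original answer. It assumes a per-serial frame `costs` and a frame `targets` with a `TargetTotal` column; the helper `plot_part`, the demo values, and the `hist_<part>.png` file names are all hypothetical.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so figures save without a display
import matplotlib.pyplot as plt
import pandas as pd

def plot_part(part, part_costs, targets, bins=4):
    """Draw one histogram per StepNo of a single part, side by side."""
    steps = sorted(part_costs['StepNo'].unique())
    # squeeze=False keeps axs two-dimensional even when there is only one step
    fig, axs = plt.subplots(1, len(steps), sharey=True, squeeze=False,
                            figsize=(3 * len(steps), 3))
    for ax, step in zip(axs[0], steps):
        ax.hist(part_costs.loc[part_costs['StepNo'] == step, 'Cost'],
                bins=bins, color='blue', alpha=0.5)
        ax.set_title('Part {}, Step {}'.format(part, step))
        # Mark the target only if one exists for this part and step
        match = targets[(targets['PartNo'] == part) & (targets['StepNo'] == step)]
        if not match.empty:
            ax.axvline(match['TargetTotal'].iloc[0], color='red', linewidth=2)
    fig.tight_layout()
    return fig

# Hypothetical per-serial costs and targets, shaped like the aggregated data
costs = pd.DataFrame({
    'PartNo': ['ABC'] * 4 + ['DEF'] * 2,
    'StepNo': [10, 10, 20, 20, 10, 10],
    'Cost':   [27, 47, 30, 53, 2, 3],
})
targets = pd.DataFrame({'PartNo': ['ABC', 'ABC'], 'StepNo': [10, 20],
                        'TargetTotal': [24, 40]})

# One figure per part, written to disk
figures = {}
for part, part_costs in costs.groupby('PartNo'):
    fig = plot_part(part, part_costs, targets)
    fig.savefig('hist_{}.png'.format(part))
    figures[part] = fig
    plt.close(fig)
```

Iterating over the zipped axes and steps avoids indexing the axes array by the step value itself (10, 20, 30), which is where the loop in the question fails.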
