Making a timeline graph from a dataframe with grouped values, using a for loop


Question

If I have a dataframe:

values               start time   end time
Ed, Taylor, Liv       0:00:00      0:00:15 
Ed, Liv, Peter        0:00:15      0:00:30
Taylor, Liv, Peter    0:00:30      0:00:49
Ed, Liv, Peter        0:00:49      0:01:02

How could I iterate over the values and create a timeline (most likely in matplotlib, maybe with plt.broken_barh()) that plots the segments of time during which each name appears in the "values" column? For example, the X axis would span 0:00:00 to 0:01:02 (the min and max values present), and the bar for Ed would run from 0:00:00 to 0:00:15 and from 0:00:15 to 0:00:30, be absent from 0:00:30 to 0:00:49, and come back from 0:00:49 to 0:01:02. After iterating through Ed, it would do Taylor, Liv, and then Peter (the distinct names contained in the "values" column), finishing with a graph of 4 bars that have gaps wherever a name is not present in "values".

I'm fairly unfamiliar with time series data, especially when the value I'm looking to plot is just the presence of a string within a column as opposed to a value like money or temperature. Basically all I'm looking for is whether the value is present on a timeline or not.

Answer

The way the dataframe is set up is not so straightforward to use. As all the names are put together in a compound string, they need to be separated to be usable.

The time strings can be converted to pandas timestamps using pd.to_datetime.
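As a small illustration of that conversion, the sketch below parses the question's time strings two ways. The explicit format string and the use of pd.to_timedelta are assumptions added here, not part of the original answer: with a format of "%H:%M:%S" the timestamps get a placeholder date, while pd.to_timedelta treats the same strings as elapsed time from zero, which can be convenient when only durations matter.

```python
import pandas as pd

# Time strings taken from the question's dataframe
times = pd.Series(["0:00:00", "0:00:15", "0:00:30", "0:00:49", "0:01:02"])

# Parse as timestamps; the explicit format is an assumption of this sketch
# (the answer calls pd.to_datetime without one)
stamps = pd.to_datetime(times, format="%H:%M:%S")

# Alternative (not in the original answer): parse as elapsed time from zero
offsets = pd.to_timedelta(times)
seconds = offsets.dt.total_seconds().tolist()

# Subtracting two timestamps yields a Timedelta, which is what the
# plotting loop uses as the width of each bar segment
first_gap = stamps.iloc[1] - stamps.iloc[0]
print(first_gap)
print(seconds)
```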

Here is a way to display the data. Many other approaches are possible, such as creating a column for each person with a boolean to tell whether they are included in the values column.

from matplotlib import pyplot as plt
import pandas as pd
from pandas.plotting import register_matplotlib_converters

# Let matplotlib handle pandas timestamps on the axes
register_matplotlib_converters()

df = pd.DataFrame([['Ed, Taylor, Liv', '0:00:00', '0:00:15'],
                   ['Ed, Liv, Peter', '0:00:15', '0:00:30'],
                   ['Taylor, Liv, Peter', '0:00:30', '0:00:49'],
                   ['Ed, Liv, Peter', '0:00:49', '0:01:02']],
                  columns=['values', 'start time', 'end time'])
# Parse the time strings into pandas timestamps
df['start time'] = pd.to_datetime(df['start time'])
df['end time'] = pd.to_datetime(df['end time'])

# Collect every distinct name and give each one a row index on the y axis
persons_set = set(name.strip() for names in df['values'] for name in names.split(","))
persons = {p: i for i, p in enumerate(sorted(persons_set))}
print(persons)
for person in persons:
    # Gather (start, duration) pairs for the rows that mention this person
    periods = []
    for names, start, end in zip(df['values'], df['start time'], df['end time']):
        if person in set(name.strip() for name in names.split(",")):
            periods.append((start, end - start))
    # Draw the segments as one broken bar, 0.9 high, centered on the row index
    plt.broken_barh(periods, (persons[person] - 0.45, 0.9),
                    facecolors=plt.cm.plasma(persons[person] / len(persons)))

# Label each row with the person's name
plt.yticks(range(len(persons)), list(persons))
plt.show()
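
The alternative mentioned above, one boolean column per person, might look roughly like the following. This is only a sketch under some assumptions: the names presence and periods are made up for illustration, and times are converted to seconds from zero with pd.to_timedelta rather than to timestamps as in the answer's code.

```python
import pandas as pd

df = pd.DataFrame(
    [["Ed, Taylor, Liv", "0:00:00", "0:00:15"],
     ["Ed, Liv, Peter", "0:00:15", "0:00:30"],
     ["Taylor, Liv, Peter", "0:00:30", "0:00:49"],
     ["Ed, Liv, Peter", "0:00:49", "0:01:02"]],
    columns=["values", "start time", "end time"],
)

# Seconds from zero (an assumption of this sketch; the answer uses timestamps)
start = pd.to_timedelta(df["start time"]).dt.total_seconds()
end = pd.to_timedelta(df["end time"]).dt.total_seconds()

# Split each compound string into a set of clean names, one set per row
name_sets = [{n.strip() for n in names.split(",")} for names in df["values"]]
people = sorted(set().union(*name_sets))

# One boolean column per person: True where that row mentions them
presence = pd.DataFrame(
    {person: [person in names for names in name_sets] for person in people}
)

# (start, duration) pairs per person, in the shape broken_barh expects
periods = {
    person: list(zip(start[presence[person]].tolist(),
                     (end - start)[presence[person]].tolist()))
    for person in people
}
print(presence)
print(periods["Ed"])
```

Each periods[person] list could then be passed to plt.broken_barh in place of the list built in the loop, with the x axis reading in seconds instead of timestamps.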
