How do I add and define multiple lines in a plotly time series chart?

Question

I'm creating a line-based time series graph using the plotly library for Python. I'd like to connect it to a time series database, but for now I've been testing with CSV data.

Is it possible to have an x and y axis (time vs. value), load multiple lines based on the values of another CSV column (host), and add them all to the same x and y graph?

import pandas as pd
import plotly.express as px

# Load the test data from CSV
df = pd.read_csv('stats.csv')

# Single line: connections over time
fig = px.line(df, x='time', y='connections', title='connections')
fig.show()

I'd like to define more than one line on the same graph, one per distinct value in the CSV's host column, with every line sharing the same time vs. connections axes. Can the px.line method work for that use case, or should I be looking at another method?

Recommended answer

With plotly it shouldn't matter whether your sources are database connections or CSV files. You'll most likely handle that part through pandas dataframes either way. But since you're talking about databases, I'm going to show you how you can easily build a plotly chart on a dataset with a typical database structure, where you often have to rely on grouping and subsetting the data in order to show changes over time for different subcategories. Plotly Express ships with a few interesting datasets to try (see dir(px.data)), such as the gapminder dataset:

   country      continent  year  lifeExp  pop       gdpPercap   iso_alpha  iso_num
0  Afghanistan  Asia       1952  28.801   8425333   779.445314  AFG        4
1  Afghanistan  Asia       1957  30.332   9240934   820.853030  AFG        4
2  Afghanistan  Asia       1962  31.997   10267083  853.100710  AFG        4
3  Afghanistan  Asia       1967  34.020   11537966  836.197138  AFG        4
4  Afghanistan  Asia       1972  36.088   13079460  739.981106  AFG        4

If you use the correct approach, you can easily use px.line() to build a figure on such a dataset and let the figure function take care of the grouping for you. You can even use the same function to add data to that figure later. The figures below are built using a combination of px.line(), go.Figure() and add_traces().

Figure 1: Using px.line()

This plot shows the five countries with the highest gross domestic product per capita on the European continent. The data is grouped using arguments like color='country'.

Figure 2: Adding data to the same figure

This plot adds the five countries with the highest gross domestic product per capita on the American continent to the first plot. This creates the need to distinguish the data in one more way, so that you can see whether a line is European or American. This is handled using the argument line_dash='country', which assigns each added line a dash pattern from Plotly Express's default sequence so that the new data stands apart from the original plot. Note that the first pattern in that default sequence is solid, so one of the added lines is not dashed.

This is only one way to do it. If the end result is what you're looking for, we can discuss other approaches as well.

Complete code:

import plotly.graph_objects as go
import plotly.express as px

# Data: the gapminder sample dataset bundled with Plotly Express
gapminder = px.data.gapminder()

# Most productive European countries (as of 2007)
df_eur = gapminder[gapminder['continent'] == 'Europe']
df_eur_2007 = df_eur[df_eur['year'] == 2007]
eur_gdp_top5 = df_eur_2007.nlargest(5, 'gdpPercap')['country'].tolist()
df_eur_gdp_top5 = df_eur[df_eur['country'].isin(eur_gdp_top5)]

# Most productive countries on the American continent (as of 2007)
df_ame = gapminder[gapminder['continent'] == 'Americas']
df_ame_2007 = df_ame[df_ame['year'] == 2007]
ame_gdp_top5 = df_ame_2007.nlargest(5, 'gdpPercap')['country'].tolist()
df_ame_gdp_top5 = df_ame[df_ame['country'].isin(ame_gdp_top5)]

# Plotly figure 1: one line per European country
fig = px.line(df_eur_gdp_top5, x='year', y='gdpPercap',
              color='country',
              line_group='country', hover_name='country')
fig.update_layout(title='Productivity, Europe', showlegend=False)

# Plotly figure 2: copy figure 1 so it stays unchanged, then add the
# American traces taken from a second px.line() figure
fig2 = go.Figure(fig)
fig2.add_traces(px.line(df_ame_gdp_top5, x='year', y='gdpPercap',
                        color='country', line_group='country',
                        line_dash='country', hover_name='country').data)
fig2.update_layout(title='Productivity, Europe and America', showlegend=False)

# fig.show()
fig2.show()
