Parallel coordinates plot with skipped coordinates


Problem description

People race on 100 m, 400 m, and 1600 m tracks, and their finish times are recorded. I want to present the data for each racer in a parallel coordinates plot. Some racers may not finish a track. In that case I would like to mark it somehow, either with a point at infinity or with a color specific to that track.

As an example, I drew a parallel coordinates plot in Paint:

Lazyman has not finished the 1600 m track, and this is marked with an x.

An example data set is given in the following "racing.csv":

RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf

I have tried a solution with pandas:

import pandas
# pandas.tools.plotting was removed in later pandas releases; the function now lives in pandas.plotting
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

d = pandas.read_csv('racing.csv')

f = plt.figure()
parallel_coordinates(d, 'RACER')
f.axes[0].set_yscale('log')

plt.show()

This gives a plot that does not show the Inf value for Lazyman at 1600 m:
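Before deciding how to mark them, it can help to find out which entries are unfinished. The following is a minimal sketch, assuming the times have already been parsed into floats (with `Inf` read as `math.inf`); the helper name `unfinished_tracks` and the dictionary layout are illustrative only and are not part of the original question.

```python
import math


def unfinished_tracks(times_by_track):
    """Return the tracks whose recorded time is not a finite number."""
    return [track for track, t in times_by_track.items() if not math.isfinite(t)]


# Sample data from racing.csv, keyed by racer and then by track (assumed layout).
racers = {
    "Superman": {"100m": 0.1, "400m": 0.5, "1600m": 1.0},
    "Lazyman": {"100m": 200.0, "400m": 900.0, "1600m": math.inf},
}

# Map each racer to the list of tracks that need a special marker.
dnf = {name: unfinished_tracks(times) for name, times in racers.items()}
print(dnf)
```

A list like this can then drive whatever marker is chosen for unfinished races, for example an x drawn at the top of the axis.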

I also prepared a long-format CSV, "racing2.csv", for ggplot (there may be a better way to do this):

RACER,TRACK,TIME
Superman,100m,0.1
Superman,400m,0.5
Superman,1600m,1
Lazyman,100m,200
Lazyman,400m,900
Lazyman,1600m,Inf
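The long-format file does not have to be typed by hand; it can be derived from the wide one. Below is a small sketch using only the Python standard library, assuming the third wide column is meant to be `TRACK.1600m`. The function name `wide_to_long` and its parameters are hypothetical, chosen for this illustration.

```python
import csv
import io

# Wide-format sample, one row per racer (assumed column names).
WIDE = """RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf
"""


def wide_to_long(wide_text, id_col="RACER", prefix="TRACK."):
    """Turn one row per racer into one row per (racer, track) pair."""
    reader = csv.DictReader(io.StringIO(wide_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([id_col, "TRACK", "TIME"])
    for row in reader:
        for col, value in row.items():
            if col.startswith(prefix):
                # Strip the "TRACK." prefix so only the distance remains.
                writer.writerow([row[id_col], col[len(prefix):], value])
    return out.getvalue()


long_text = wide_to_long(WIDE)
print(long_text)
```

The values are copied through as text, so `Inf` stays `Inf` and is left for the plotting tool to interpret.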

Using ggplot:

require(ggplot2)
d <- read.csv('racing2.csv')
g <- ggplot(d) + geom_line(aes(x=TRACK,y=TIME,group=RACER, color=RACER))
g <- g + scale_y_log10()
ggsave('ggplot.png')

This gets close, as it shows an infinity value, but it does not annotate it in any way.

Any solution, in either Python or R, will be appreciated. Suggestions on how to mark unfinished races are also welcome.

Recommended answer

With R and ggplot2:

Build some fake data:

df <- data.frame(ID = factor(c(rep(1, 3), rep(2, 3), rep(3, 3)), labels = c('Realman', 'Lazyman', 'Superman')),
             race = factor(rep(seq(1,3,1), 3), labels = c('100m', '400m', '1600m')),
             runTime = c(8.9, 20.5, 150.9, 100.1, 300.3, +Inf, 1.2, 5, +Inf))

#         ID  race runTime
# 1  Realman  100m     8.9
# 2  Realman  400m    20.5
# 3  Realman 1600m   150.9
# 4  Lazyman  100m   100.1
# 5  Lazyman  400m   300.3
# 6  Lazyman 1600m     Inf
# 7 Superman  100m     1.2
# 8 Superman  400m     5.0
# 9 Superman 1600m     Inf

Result:

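library(ggplot2)
library(dplyr)         # provides filter()
library(RColorBrewer)  # provides brewer.pal()

# The base layers use only the finished races (finite times).
ggplot(filter(df, runTime != +Inf), aes(x = race, y = runTime, group = ID, color = ID)) + 
    geom_line(size = 2) +
    geom_point(size = 4) +

    # Dashed overlay built from the full data, including the Inf entries.
    geom_line(data = df, linetype = 'dashed', size = 1) +        
    geom_point(data = df, shape = 21, size = 1) +

    # Label each finished time slightly below its point.
    geom_text(aes(label = runTime), position = position_nudge(y = -.1)) +

    scale_y_continuous(trans = 'log10', breaks = c(1, 10, 100, 1000)) +
    scale_x_discrete('Track') +
    scale_color_manual('Racer', values = brewer.pal(length(levels(df$ID)), 'Set1')) +

    # Wide light-grey vertical grid lines act as the parallel axes.
    theme(panel.background = element_blank(),
          panel.grid.major.x = element_line(colour = 'lightgrey', size = 25),
          legend.position = 'top',
          axis.line.y = element_line('black', .5, arrow = arrow()))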
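Since the question also asked for Python, the same idea can be sketched with matplotlib. This is not the answer's code but a rough port under stated assumptions: unfinished times are placed at a placeholder value ten times the largest finite time (an arbitrary choice), and the names `cap_unfinished` and `plot_racers` are hypothetical.

```python
import math


def cap_unfinished(times, factor=10.0):
    """Replace non-finite times with a placeholder above every finite time."""
    finite = [t for row in times.values() for t in row if math.isfinite(t)]
    cap = max(finite) * factor
    capped = {
        name: [t if math.isfinite(t) else cap for t in row]
        for name, row in times.items()
    }
    return capped, cap


def plot_racers(times, tracks, path="racing.png"):
    """Draw finished races solid, unfinished ones dashed and marked with an x."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    capped, cap = cap_unfinished(times)
    xs = list(range(len(tracks)))
    fig, ax = plt.subplots()
    for name, row in times.items():
        # Thin dashed line through all points, including the placeholders.
        (line,) = ax.plot(xs, capped[name], linestyle="--", linewidth=1, label=name)
        color = line.get_color()
        # Thick solid line and round markers through the finished races only.
        done = [(x, t) for x, t in zip(xs, row) if math.isfinite(t)]
        ax.plot([x for x, _ in done], [t for _, t in done], "-o", color=color, linewidth=2)
        # An x at the placeholder height for each unfinished race.
        missed = [x for x, t in zip(xs, row) if not math.isfinite(t)]
        ax.plot(missed, [cap] * len(missed), "x", color=color, markersize=10)
    ax.set_yscale("log")
    ax.set_xticks(xs)
    ax.set_xticklabels(tracks)
    ax.legend(loc="upper left")
    fig.savefig(path)
    plt.close(fig)


# Same fake data as in the R answer, one list of times per racer.
tracks = ["100m", "400m", "1600m"]
times = {
    "Realman": [8.9, 20.5, 150.9],
    "Lazyman": [100.1, 300.3, math.inf],
    "Superman": [1.2, 5.0, math.inf],
}
capped, cap = cap_unfinished(times)
print(capped)
```

Calling `plot_racers(times, tracks)` should write the figure to `racing.png`. As in the R version, the solid line connects only finished races, so a racer who skips a middle track would get a solid segment that jumps over it.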
