gnuplot: Heatmap using character combinations


Question

I am currently analysing two-character combinations in texts, and I want to visualize their frequencies as a heatmap using gnuplot. My input file has the following format (COUNT stands for the actual number of occurrences of that combination):

a a COUNT
a b COUNT
...
z y COUNT
z z COUNT

Now I'd like to create a heatmap (like the first one that is shown on this site). On both the x axis and the y axis I'd like to display the characters from A-Z, i.e.

a
b
...
z
     a b ... z

I am pretty new to gnuplot, so I tried plot "input.dat" using 2:1:3 with images, which results in the error message "Can't plot with an empty x range". My naive attempt to run set xrange['a':'z'] did not help much.

There are a bunch of related questions on SO, but they either deal with numeric x-values (e.g. Heatmap with Gnuplot on a non-uniform grid) or with different input data formats (e.g. gnuplot: label x and y-axis of matrix (heatmap) with row and column names).

So my question is: What is the easiest way to transform my input file into a nice gnuplot heatmap?

Recommended answer

You need to convert the alphabet characters to integers. It might be possible to do this somehow in gnuplot, but it would probably be messy.

My solution would be to use a quick Python script to convert the data file (let's say it is called data.dat):

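#!/usr/bin/env python3
# Convert "char char count" lines into "int int count" lines for gnuplot.

with open('data.dat') as src, open('data2.dat', 'w') as dst:
    for line in src:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        # Map 'a'..'z' (case-insensitive) to 0..25.
        x = ord(fields[0].lower()) - ord('a')
        y = ord(fields[1].lower()) - ord('a')
        dst.write("%d %d %s\n" % (x, y, fields[2]))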

This takes a file like this:

a a 1
a b 2
a c 3
b a 4
b b 5
b c 6
c a 7
c b 8
c c 9

and converts it to:

0 0 1
0 1 2
0 2 3
1 0 4
1 1 5
1 2 6
2 0 7
2 1 8
2 2 9
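One thing to watch for before plotting: as far as I know, the image plotting style wants a complete regular grid, so a real input file that simply omits combinations that never occur may not plot cleanly. Below is a minimal sketch that pads the missing pairs with a count of 0. The function name fill_grid and the sample counts are made up for illustration, and treating an absent pair as 0 is an assumption about your data.

```python
import string


def fill_grid(counts, alphabet=string.ascii_lowercase):
    """Return (x, y, count) rows for every pair in the alphabet.

    counts maps (first_char, second_char) -> count; pairs that are
    absent from counts are assumed to have a count of 0.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for first in alphabet:
        for second in alphabet:
            rows.append((index[first], index[second],
                         counts.get((first, second), 0)))
    return rows


# Hypothetical sparse input: only two of the nine a-c pairs were observed.
sparse = {("a", "b"): 2, ("c", "a"): 7}
rows = fill_grid(sparse, alphabet="abc")
for x, y, count in rows:
    print(x, y, count)
```

With the default alphabet this yields 26 x 26 = 676 rows, in the same "x y count" layout as data2.dat.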

Then you can plot it in gnuplot:

#!/usr/bin/env gnuplot

set terminal pngcairo
set output 'test.png'

set xtics ("a" 0, "b" 1, "c" 2)
set ytics ("a" 0, "b" 1, "c" 2)

set xlabel 'First Character'
set ylabel 'Second Character'

set title 'Character Combination Counts'

plot 'data2.dat' with image

It's a little clunky to set the tics manually that way, but it works fine.
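If typing out all 26 labels by hand is too tedious, the tic commands can be generated instead. This is only a sketch: the helper name tics_command is made up for illustration, and it assumes the same a=0 ... z=25 mapping as the conversion script.

```python
import string


def tics_command(axis, alphabet=string.ascii_lowercase):
    """Build a gnuplot 'set xtics'/'set ytics' line labelling 0..n-1 with letters."""
    pairs = ", ".join('"%s" %d' % (ch, i) for i, ch in enumerate(alphabet))
    return "set %stics (%s)" % (axis, pairs)


# Print both commands so they can be pasted into the gnuplot script.
print(tics_command("x"))
print(tics_command("y"))
```

The two printed lines can replace the hand-written set xtics and set ytics lines, or be written to a separate file and pulled in with gnuplot's load command.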
