Using a sorted file to plot the X-axis with corresponding Y-values from the original file


Problem description

Sample data on GitHub

I have a CSV file with 2 columns. The first column is in the format name001.a.a and the second column is a 4-digit number (e.g. 0001).

I have another file that contains the sorted first column of the file above.

I index the first column because 1) I have many of these files, which I will plot in the same graph in the future, and 2) I need them to be sorted.

The actual file (us_csv_file), which has both columns, has the following format:

name002.a.a,0002
name001.a.a,0001
name005.a.a,0025

The sorted CSV file (hostnum.csv), which I use to sort the first column, looks as follows (the delimiter is a TAB):

"1    name001.a.a"
"2    name002.a.a"
"3    name005.a.a"

I have searched for other ideas to work around or solve this, but could not find any. Could anyone help me with the code, please?

My question is:

How can I use the sorted file to plot the X-axis labelled with the strings (without the index numbers), while showing the corresponding 4-digit numbers from the first file as Y-values?

The sample graph I created using Excel would look like this: Graph that was created as a model

------------------------------------------------------------ EDIT 1------------------------------------------------------------

*UPDATE: the graph is shown after the code below* After new code - GRAPH

from matplotlib import pyplot as plt
from matplotlib import ticker as ticker
from textwrap import wrap
import numpy as np
import csv

csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host.readlines():
        line = line.replace('"', '')
        line = line.strip('\n')
        rank, value = line.split("  ")
        csv_file.append(value)

us_csv_file = []
with open('firsFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)

us_csv_file1 = []
with open('secondFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file1.append(line)

us_csv_file2 = []
with open('thirdFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file2.append(line)        

us_csv_file.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file1.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file2.sort(key=lambda x: csv_file.index(x[0]))


plt.title("\n".join(wrap("ery very very very long long long title title title that that that wrapped wrapped wrapped")))
plt.xlabel("Node Names", fontsize = 8)
plt.ylabel("Run Times", fontsize = 8)



plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.plot([int(item[1]) for item in us_csv_file1], 'o-')
plt.plot([int(item[1]) for item in us_csv_file2], 'o-')

#plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])
plt.xticks(np.arange(len(csv_file))[::100], csv_file[::100])
plt.savefig('./test.png') #saves a picture of the graph to the file

plt.show()

------------------------------------------------------------ EDIT 2------------------------------------------------------------

Changed the plot to a scatter plot, but the values do not match the x-axis. I added a sample picture; instead of numbers on the x-axis there should be node names, the same as in my sample picture above. Updated lines:

plt.scatter(range(len(us_csv_file)), [int(item[1]) for item in us_csv_file], c='r')

#plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])
plt.xticks(np.arange(len(csv_file))[::1], csv_file[::1])
plt.savefig('./test.png')

What I am trying to get, with host names on the X-axis

------------------------------------------------------------ EDIT 3------------------------------------------------------------

Changed the code at the end to clean up the X-axis, but it is still not working. Additionally, I graphed the 3 files I have and added a different symbol for each.

Updated code

from matplotlib import pyplot as plt
import numpy as np
from textwrap import wrap
import csv

csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host.readlines():
        line = line.replace('"', '')
        line = line.strip('\n')
        rank, value = line.split("  ")
        csv_file.append(value)

us_csv_file = []
with open('firsFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)

us_csv_file1 = []
with open('secondFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file1.append(line)

us_csv_file2 = []
with open('thirdFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file2.append(line)


us_csv_file.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file1.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file2.sort(key=lambda x: csv_file.index(x[0]))


plt.scatter(range(len(us_csv_file)), [int(item[1]) for item in us_csv_file], c='r', marker='+', label="First")
plt.scatter(range(len(us_csv_file1)), [int(item[1]) for item in us_csv_file1], c='b', marker=(5,2), label="Second")
plt.scatter(range(len(us_csv_file2)), [int(item[1]) for item in us_csv_file2], c='g', marker=(5,1), label="Third")

plt.legend(loc='upper right') #where to indicate the labels of the signs
plt.grid(True) #Created grid for x-y axises

plt.title("\n".join(wrap("long long long long long long tittle ttitle ttitle that that fixed fixed ")))
plt.xlabel("Node Names", fontsize = 8)
plt.ylabel("Run Times", fontsize = 8)

#plt.xticks(np.arange(0,len(csv_file),1000)[::2], csv_file[::2])
plt.xticks(np.arange(len(csv_file))[::2], csv_file[::2])
plt.yticks(np.arange(0,11000,1000))

plt.show()

Graph with X-axis labels unclear (as it shows them along the gridlines as well)

*Final result*

Recommended answer

Note: sorting may not be the most efficient way to do this, but it is something to start with.

Load the CSV file with csv.reader() and iterate it into a list.

Load the sorted file (hostnum.csv) into another list as well (note: you can probably use csv.reader() again and set the delimiter to tab to keep it simple).

The syntax for loading a CSV file is as follows:

import csv
csv_file = []
with open('file.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        csv_file.append(line)

See the csv.reader() docs for more information, including how to use delimiters. Just to be safe, remember to change the variable names of the file and the reader when opening different files.
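Because the same loading loop is repeated for every data file, it could be wrapped in a small helper. The following is only a minimal sketch: the function name `load_rows` and the in-memory sample are illustrative assumptions, not part of the original answer.

```python
import csv
import io


def load_rows(fileobj, delimiter=","):
    """Read every non-empty row of a delimited file object into a list of lists."""
    return [row for row in csv.reader(fileobj, delimiter=delimiter) if row]


# In-memory stand-in for us_csv_file.csv, using the sample rows from the question.
sample = io.StringIO("name002.a.a,0002\nname001.a.a,0001\nname005.a.a,0025\n")
rows = load_rows(sample)
print(rows)
```

With a real file, the same helper would be called as `with open('us_csv_file.csv', newline='') as f: rows = load_rows(f)`.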

However, for your hostnum.csv, csv won't work, so you can write a parser by hand. I've done it for you:

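csv_file = []
with open('/Users/dash/Documents/hostnum.csv', 'r') as host:
    for line in host:
        line = line.replace('"', '')   # drop the quotes wrapping each line
        line = line.strip()
        if not line:
            continue   # skip blank lines
        rank, value = line.split(None, 1)   # split on the tab (or any run of whitespace)
        csv_file.append(value)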
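One plausible reason csv does not work here, assuming every line of hostnum.csv is wrapped in double quotes as in the sample shown in the question: csv.reader treats the whole line as a single quoted field, so the tab inside it is never seen as a delimiter. The short sketch below illustrates this on one made-up line and then applies the hand-written approach.

```python
import csv
import io

# One line of hostnum.csv as shown in the question: the whole line is quoted.
raw = '"1\tname001.a.a"\n'

# csv.reader sees a single quoted field, so the tab inside it is not a delimiter.
fields = next(csv.reader(io.StringIO(raw), delimiter="\t"))
print(fields)

# Stripping the quotes first and splitting on whitespace recovers both parts.
rank, value = raw.replace('"', "").split(None, 1)
value = value.strip()  # the remainder still carries the trailing newline
print(rank, value)
```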

Sort the list by each element's position in the sorted list:

us_csv_file.sort(key=lambda x: csv_file.index(x[0]))

This works by using a lambda (anonymous function) that takes the name string from each row of the CSV file and looks up its index in the sorted list. The lambda returns a number, which sort then uses to determine the new position of the element in the list.
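Since csv_file.index() rescans the list for every row, the lookup could instead go through a dictionary that is built once. This is a sketch of that variation on the sample data from the question; the name `position` is an assumption introduced for illustration.

```python
# Sorted host order, as it would be parsed from hostnum.csv (sample from the question).
csv_file = ["name001.a.a", "name002.a.a", "name005.a.a"]

# Unsorted rows, as loaded from us_csv_file.
us_csv_file = [
    ["name002.a.a", "0002"],
    ["name001.a.a", "0001"],
    ["name005.a.a", "0025"],
]

# Build the name -> position table once; each lookup is then a dict access,
# whereas csv_file.index() walks the list again for every row.
position = {name: i for i, name in enumerate(csv_file)}

us_csv_file.sort(key=lambda row: position[row[0]])
print(us_csv_file)
```

Both variants raise an exception for a name that is missing from the sorted file (KeyError here, ValueError with index()), so such rows would need to be filtered out first.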

See the Python wiki for a basic tutorial on sorting.

For plotting, use matplotlib.pyplot and set the x-ticks with matplotlib.pyplot.xticks().

For example:

from matplotlib import pyplot as plt
import numpy as np

plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])

plt.show()
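With many hosts, labelling every tick tends to make the axis unreadable, so it may help to label only every n-th position. The helper below is a hypothetical sketch (`thin_ticks` is not a matplotlib function) that computes the positions and labels to hand to plt.xticks(); the host names are made up.

```python
def thin_ticks(labels, step):
    """Return (positions, labels) keeping only every `step`-th label."""
    positions = list(range(0, len(labels), step))
    return positions, [labels[i] for i in positions]


csv_file = ["name%03d.a.a" % n for n in range(1, 11)]  # ten made-up host names
positions, shown = thin_ticks(csv_file, 4)
print(positions, shown)
```

The result could then be passed on as `plt.xticks(positions, shown, rotation=90)`, which keeps the long names from overlapping.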

Hope this helps!

Here is the full code:

from matplotlib import pyplot as plt
import numpy as np
import csv

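csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host:
        line = line.replace('"', '')   # drop the quotes wrapping each line
        line = line.strip()
        if not line:
            continue   # skip blank lines
        rank, value = line.split(None, 1)   # split on the tab (or any run of whitespace)
        csv_file.append(value)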

us_csv_file = []
with open('us_csv_file.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)

us_csv_file.sort(key=lambda x: csv_file.index(x[0]))

plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])

plt.show()


EDIT (again): After thinking about it, I think the best way would be to create a dict for each file that maps every node name to all of its values.

from matplotlib import pyplot as plt
import numpy as np
from textwrap import wrap
import csv

#Opens the sorted hostnum.csv file and reads it; removes the quotation marks.
csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host:
        line = line.replace('"', '')
        line = line.strip()
        if not line:
            continue   # skip blank lines
        rank, value = line.split(None, 1)   # split on the tab (or any run of whitespace)
        csv_file.append(value)

#Opens the file and reads it
us_csv_file = []
with open('fileFirst.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)

us_csv_file1 = []
with open('fileSecond.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file1.append(line)

us_csv_file2 = []
with open('fileThird.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file2.append(line)


runs = []

file_0 = {}
file_1 = {}
file_2 = {}

for result in us_csv_file:
    node_name = result[0]
    node_value = result[1]

    if file_0.get(node_name):   # If the node exists in the list
        file_0[node_name].append(node_value)
    else:
        file_0[node_name] = [node_value]

runs.append(file_0)

for result in us_csv_file1:
    node_name = result[0]
    node_value = result[1]

    if file_1.get(node_name):   # If the node exists in the list
        file_1[node_name].append(node_value)
    else:
        file_1[node_name] = [node_value]

runs.append(file_1)

for result in us_csv_file2:
    node_name = result[0]
    node_value = result[1]

    if file_2.get(node_name):   # If the node exists in the list
        file_2[node_name].append(node_value)
    else:
        file_2[node_name] = [node_value]

runs.append(file_2)


# all_plots = [[[], []],[[], []],[[], []]]

all_plots = [] # Make an array of 3 arrays, each with a pair of arrays inside
# Each pair holds the x and y coordinates of the datapoints

for x in range(3):
    all_plots.append([[],[]])


for run_number, run_group in enumerate(runs):

    for key, values in run_group.items():
        sorted_position = csv_file.index(key)
        for item in values:
            all_plots[run_number][0].append(sorted_position)
            all_plots[run_number][1].append(int(item))

#Creates grid for x-y axes
plt.grid(True)

#Creates wrapped title for the graph
plt.title("\n".join(wrap("longlonglonglonglonglonglonglonglonglonglonglonglonglongTITLETITLETITLETITLETITLETITLE")),size = 9.5)

#x-y labels for the graph
plt.xlabel("Node Names", fontsize = 8)
plt.ylabel("Run Times", fontsize = 8)

#Plots each file with its own colour and marker
plt.scatter(all_plots[0][0], all_plots[0][1], c='b', marker='+', label="First")
plt.scatter(all_plots[1][0], all_plots[1][1], c='g', marker=(5,2), label="Second")
plt.scatter(all_plots[2][0], all_plots[2][1], c='r', marker=(5,1), label="Third")

#Shows the legend; called after the scatter calls so that their labels exist
plt.legend(loc='upper right')

#ticks - x and y axes' data format


plt.xticks(range(len(csv_file))[::25], [item for item in csv_file][::25], rotation=90, size=8)


plt.yticks(np.arange(0,11000,1000), size=8)

# Enlarges the bottom margin (manually) so the rotated labels are not cut off
plt.gcf().subplots_adjust(bottom=0.23)

#Saves a PNG file of the current graph to the folder, overwriting it on every run
plt.savefig('./test.png', bbox_inches='tight')


plt.show()
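The three near-identical grouping loops above could be condensed with collections.defaultdict. The following is a minimal sketch under the assumption that every row has exactly two columns; the helper names `group_by_node` and `to_points` are introduced here for illustration only.

```python
from collections import defaultdict


def group_by_node(rows):
    """Map each node name to the list of values recorded for it."""
    grouped = defaultdict(list)
    for name, value in rows:
        grouped[name].append(value)
    return grouped


def to_points(grouped, order):
    """Turn grouped values into parallel x (sorted position) and y (int value) lists."""
    position = {name: i for i, name in enumerate(order)}
    xs, ys = [], []
    for name, values in grouped.items():
        for value in values:
            xs.append(position[name])
            ys.append(int(value))
    return xs, ys


csv_file = ["name001.a.a", "name002.a.a", "name005.a.a"]
rows = [["name002.a.a", "0002"], ["name001.a.a", "0001"], ["name002.a.a", "0007"]]
xs, ys = to_points(group_by_node(rows), csv_file)
print(xs, ys)
```

Each file's rows would go through the same two calls, and the resulting xs and ys could be passed directly to plt.scatter(). Because the x value is the host's position in the sorted list, points stay aligned with the tick labels even when a host is missing from a file or appears more than once.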
