Using a sorted file to plot the X-axis with corresponding Y-values from the original file
Question
I have a CSV file with two columns. The first column is in the format name001.a.a and the second column is a 4-digit number (e.g. 0001).
I have another file that contains the first column of the file above, sorted.
The reason for indexing the first column is that 1) I have many of these files that I will plot in the same graph in the future, and 2) I need them to be sorted.
The actual file (us_csv_file), which has both columns, is in the following format:
name002.a.a,0002
name001.a.a,0001
name005.a.a,0025
The sorted CSV file (hostnum.csv), which I use to sort the first column, looks like this (the delimiter is a TAB):
"1 name001.a.a"
"2 name002.a.a"
"3 name005.a.a"
I have tried searching for other ideas to work around or solve this, but could not find any. Could anyone help me with the code, please?
My question is:
How can I use the sorted file to plot the X-axis with the string labels (without the index numbers) while showing the corresponding 4-digit numbers from the first file as the Y-values?
The sample graph I created in Excel would look like this:
[Figure: graph that was created as a model]
------------------------------------------------------------ EDIT 1------------------------------------------------------------
*UPDATE: the graph is shown after the code below*
[Figure: graph after the new code]
from matplotlib import pyplot as plt
from matplotlib import ticker as ticker
from textwrap import wrap
import numpy as np
import csv
csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host.readlines():
        line = line.replace('"', '')
        line = line.strip('\n')
        rank, value = line.split(" ")
        csv_file.append(value)
us_csv_file = []
with open('firsFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)
us_csv_file1 = []
with open('secondFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file1.append(line)
us_csv_file2 = []
with open('thirdFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file2.append(line)
us_csv_file.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file1.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file2.sort(key=lambda x: csv_file.index(x[0]))
plt.title("\n".join(wrap("ery very very very long long long title title title that that that wrapped wrapped wrapped")))
plt.xlabel("Node Names", fontsize = 8)
plt.ylabel("Run Times", fontsize = 8)
plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.plot([int(item[1]) for item in us_csv_file1], 'o-')
plt.plot([int(item[1]) for item in us_csv_file2], 'o-')
#plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])
plt.xticks(np.arange(len(csv_file))[::100], csv_file[::100])
plt.savefig('./test.png') #saves a picture of the graph to the file
plt.show()
------------------------------------------------------------ EDIT 2------------------------------------------------------------
Changed the plot to a scatter plot. However, the values do not match the x-axis. I added a sample picture, but instead of numbers on the x-axis there should be node names, the same as in my sample picture above. Updated lines:
plt.scatter(range(len(us_csv_file)), [int(item[1]) for item in us_csv_file], c='r')
#plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])
plt.xticks(np.arange(len(csv_file))[::1], csv_file[::1])
plt.savefig('./test.png')
[Figure: what I am trying to get, with host names on the X-axis]
------------------------------------------------------------ EDIT 3------------------------------------------------------------
Changed the code at the end to make the X-axis clearer, but it is still not working. Additionally, I plotted the 3 files I have and used a different symbol for each.
Updated code:
from matplotlib import pyplot as plt
import numpy as np
from textwrap import wrap
import csv
csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host.readlines():
        line = line.replace('"', '')
        line = line.strip('\n')
        rank, value = line.split(" ")
        csv_file.append(value)
us_csv_file = []
with open('firsFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)
us_csv_file1 = []
with open('secondFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file1.append(line)
us_csv_file2 = []
with open('thirdFile.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file2.append(line)
us_csv_file.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file1.sort(key=lambda x: csv_file.index(x[0]))
us_csv_file2.sort(key=lambda x: csv_file.index(x[0]))
plt.scatter(range(len(us_csv_file)), [int(item[1]) for item in us_csv_file], c='r', marker='+', label="First")
plt.scatter(range(len(us_csv_file1)), [int(item[1]) for item in us_csv_file1], c='b', marker=(5,2), label="Second")
plt.scatter(range(len(us_csv_file2)), [int(item[1]) for item in us_csv_file2], c='g', marker=(5,1), label="Third")
plt.legend(loc='upper right') #where to indicate the labels of the signs
plt.grid(True) #Created grid for x-y axises
plt.title("\n".join(wrap("long long long long long long tittle ttitle ttitle that that fixed fixed ")))
plt.xlabel("Node Names", fontsize = 8)
plt.ylabel("Run Times", fontsize = 8)
#plt.xticks(np.arange(0,len(csv_file),1000)[::2], csv_file[::2])
plt.xticks(np.arange(len(csv_file))[::2], csv_file[::2])
plt.yticks(np.arange(0,11000,1000))
plt.show()
[Figure: graph with unclear X-axis labels (the labels are drawn along with the gridlines)]
*Final result*
Answer
Note: the sort may not be the most efficient method, but it is something to start from.
Load the CSV file with csv.reader() and iterate it into a list.
Load the sorted file (hostnum.csv) into another list as well (note: you can probably use csv.reader() again and set the delimiter to tab to keep it simple).
The syntax for loading a CSV file is as follows:
import csv
csv_file = []
with open('file.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        csv_file.append(line)
See the csv.reader() docs for more information, including how to use delimiters. Just to be safe, remember to change the variable names of the file and the reader when opening different files.
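As a rough illustration of the delimiter option, the sketch below reads tab-delimited text with csv.reader(). The sample strings are made up for the example and held in memory with io.StringIO rather than read from real files; the helper name read_tab_rows is hypothetical. It also shows what happens when each whole line is wrapped in double quotes, as in the hostnum.csv sample from the question: the reader treats the quoted line as a single field.

```python
import csv
import io

# Hypothetical in-memory samples standing in for real files.
plain = "1\tname001.a.a\n2\tname002.a.a\n"
quoted = '"1\tname001.a.a"\n"2\tname002.a.a"\n'


def read_tab_rows(text):
    """Parse tab-delimited text into a list of rows."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


plain_rows = read_tab_rows(plain)    # each line is split into rank and name
quoted_rows = read_tab_rows(quoted)  # each quoted line stays one field

print(plain_rows)
print(quoted_rows)
```

If the real file is quoted per line like the second sample, the tab inside the quotes is not treated as a separator, so some manual handling is still needed.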
However, for your hostnum.csv, csv won't work, so you can write a parser by hand. I've done it for you:
csv_file = []
with open('/Users/dash/Documents/hostnum.csv', 'r') as host:
    for line in host.readlines():
        line = line.replace('"', '')
        line = line.strip('\n')
        rank, value = line.split(" ")
        csv_file.append(value)
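The question says the delimiter in hostnum.csv is a TAB, while the parser above splits on a single space. A slightly more forgiving variant, sketched here under the assumption that each line is a rank followed by a name, splits on the first run of any whitespace and skips blank lines. The function name parse_sorted_names and the sample lines are illustrative only.

```python
def parse_sorted_names(lines):
    """Return the node names in file order, ignoring the leading rank."""
    names = []
    for line in lines:
        # Drop the newline and the surrounding double quotes.
        line = line.strip().replace('"', "")
        if not line:
            continue  # skip blank lines
        # split(None, 1) splits once on the first run of whitespace,
        # so it works for both a tab and one or more spaces.
        rank, name = line.split(None, 1)
        names.append(name)
    return names


# Made-up sample mixing tab and space separators, plus a blank line.
sample = ['"1\tname001.a.a"\n', '"2 name002.a.a"\n', "\n", '"3\tname005.a.a"\n']
names = parse_sorted_names(sample)
print(names)
```

With a real file, the same function could be called as parse_sorted_names(host) inside the with open(...) block, since a file object yields its lines when iterated.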
Sort the list by each element's position in the sorted list:
us_csv_file.sort(key=lambda x: csv_file.index(x[0]))
This works by using a lambda (anonymous function) to take the string from the CSV file and look up its row number in the sorted file. The lambda returns a number, which sort then uses to set the new position of the element in the list.
See the Python wiki for a basic tutorial on sorting.
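One possible refinement, since list.index() scans the list on every call: build a dictionary from name to position once and use it as the sort key. This is only a sketch with made-up sample data mirroring the rows in the question; the names sorted_names, rows and position are chosen for the example.

```python
# Names in the desired order (as parsed from the sorted file).
sorted_names = ["name001.a.a", "name002.a.a", "name005.a.a"]

# Rows as csv.reader() would return them from the unsorted file.
rows = [
    ["name002.a.a", "0002"],
    ["name001.a.a", "0001"],
    ["name005.a.a", "0025"],
]

# Map each name to its position once, instead of calling
# sorted_names.index() for every row during the sort.
position = {name: i for i, name in enumerate(sorted_names)}

# Sort in place by the position of the first column.
rows.sort(key=lambda row: position[row[0]])

print(rows)
```

A name that is missing from the sorted file raises KeyError here, much as list.index() would raise ValueError, so both versions assume every node appears in the sorted file.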
For plotting, use matplotlib.pyplot and set the xticks with matplotlib.pyplot.xticks().
For example:
from matplotlib import pyplot as plt
import numpy as np
plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])
plt.show()
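With many nodes, labelling every tick makes the axis unreadable. A minimal sketch of one way to thin the labels is shown below: a small helper picks every step-th position and its label, and a separate function passes them to plt.xticks() with rotated text. The helper names thinned_ticks and draw, the output file name, and the extra node names beyond those in the question are all assumptions made for the example.

```python
def thinned_ticks(labels, step):
    """Return (positions, labels) keeping only every `step`-th label."""
    positions = list(range(0, len(labels), step))
    return positions, [labels[p] for p in positions]


def draw(names, values, step=1, out_path="ticks_demo.png"):
    """Plot values against their index and label every `step`-th tick."""
    # Imported here so thinned_ticks() can be used without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # non-interactive backend: writes a file, opens no window
    from matplotlib import pyplot as plt

    positions, tick_labels = thinned_ticks(names, step)
    plt.plot(range(len(values)), values, "o-")
    plt.xticks(positions, tick_labels, rotation=90, fontsize=8)
    plt.tight_layout()  # leave room for the rotated labels
    plt.savefig(out_path)
    plt.close()


# Hypothetical node names; only the first three appear in the question.
names = ["name001.a.a", "name002.a.a", "name005.a.a", "name007.a.a", "name009.a.a"]
positions, tick_labels = thinned_ticks(names, 2)
print(positions, tick_labels)
```

Calling draw(names, [1, 2, 25, 7, 9], step=2) would then save a figure in which only every second node name is printed on the X-axis.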
Hope this helps!
Here is the complete code:
from matplotlib import pyplot as plt
import numpy as np
import csv
csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host.readlines():
        line = line.replace('"', '')
        line = line.strip('\n')
        rank, value = line.split(" ")
        csv_file.append(value)
us_csv_file = []
with open('us_csv_file.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)
us_csv_file.sort(key=lambda x: csv_file.index(x[0]))
plt.plot([int(item[1]) for item in us_csv_file], 'o-')
plt.xticks(np.arange(len(csv_file)), [item for item in csv_file])
plt.show()
EDIT (again): After thinking about it, I think the best way would be to create a dict for each file, keyed by node name, with all of that node's values stored in it.
from matplotlib import pyplot as plt
import numpy as np
from textwrap import wrap
import csv
#Opens the sorted hostnum.csv file and reads it; replaces the quotation marks.
csv_file = []
with open('hostnum.csv', 'r') as host:
    for line in host.readlines():
        line = line.replace('"', '')
        line = line.strip('\n')
        rank, value = line.split(" ")
        csv_file.append(value)
#Opens the file and reads it
us_csv_file = []
with open('fileFirst.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file.append(line)
us_csv_file1 = []
with open('fileSecond.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file1.append(line)
us_csv_file2 = []
with open('fileThird.csv', 'r') as f:
    csvreader = csv.reader(f)
    for line in csvreader:
        us_csv_file2.append(line)
runs = []
file_0 = {}
file_1 = {}
file_2 = {}
for result in us_csv_file:
    node_name = result[0]
    node_value = result[1]
    if file_0.get(node_name): # If the node exists in the list
        file_0[node_name].append(node_value)
    else:
        file_0[node_name] = [node_value]
runs.append(file_0)
for result in us_csv_file1:
    node_name = result[0]
    node_value = result[1]
    if file_1.get(node_name): # If the node exists in the list
        file_1[node_name].append(node_value)
    else:
        file_1[node_name] = [node_value]
runs.append(file_1)
for result in us_csv_file2:
    node_name = result[0]
    node_value = result[1]
    if file_2.get(node_name): # If the node exists in the list
        file_2[node_name].append(node_value)
    else:
        file_2[node_name] = [node_value]
runs.append(file_2)
# all_plots = [[[], []],[[], []],[[], []]]
all_plots = [] # Make an array of 3 arrays, each with a pair of arrays inside
# Each pair holds the x and y coordinates of the datapoints
for x in range(3):
    all_plots.append([[],[]])
for run_number, run_group in enumerate(runs):
    for key, values in run_group.items():
        sorted_position = csv_file.index(key)
        for item in values:
            all_plots[run_number][0].append(sorted_position)
            all_plots[run_number][1].append(int(item))
#Creates grid for x-y axes
plt.grid(True)
#Creates wrapped title for the graph
plt.title("\n".join(wrap("longlonglonglonglonglonglonglonglonglonglonglonglonglongTITLETITLETITLETITLETITLETITLE")),size = 9.5)
#x-y labels for the graph
plt.xlabel("Node Names", fontsize = 8)
plt.ylabel("Run Times", fontsize = 8)
#Plots each file's points with its own colour, marker and legend label
plt.scatter(all_plots[0][0], all_plots[0][1], c='b', marker='+', label="First")
plt.scatter(all_plots[1][0], all_plots[1][1], c='g', marker=(5,2), label="Second")
plt.scatter(all_plots[2][0], all_plots[2][1], c='r', marker=(5,1), label="Third")
#Shows the legend; called after the scatter calls so the labelled points exist
plt.legend(loc='upper right')
#ticks - x and y axes' data format
plt.xticks(range(len(csv_file))[::25], [item for item in csv_file][::25], rotation=90, size=8)
plt.yticks(np.arange(0,11000,1000), size=8)
#Saves a PNG file of the current graph to the folder and updates it every time
plt.savefig('./test.png', bbox_inches='tight')
# Not to cut-off bottom labels(manually) - enlarges bottom
plt.gcf().subplots_adjust(bottom=0.23)
plt.show()
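The three near-identical grouping loops in the code above could be folded into a pair of small helpers. The sketch below is one way to do that, not the original author's code: it groups values per node with collections.defaultdict and turns each file's rows into x/y lists using the node's position in the sorted list. The function names and the sample rows are invented for the example.

```python
from collections import defaultdict


def group_by_node(rows):
    """Map each node name to the list of its integer values."""
    grouped = defaultdict(list)
    for name, value in rows:
        grouped[name].append(int(value))
    return grouped


def plot_points(rows, sorted_names):
    """Return (xs, ys): x is the node's sorted position, y is its value."""
    position = {name: i for i, name in enumerate(sorted_names)}
    xs, ys = [], []
    for name, values in group_by_node(rows).items():
        for value in values:
            xs.append(position[name])
            ys.append(value)
    return xs, ys


sorted_names = ["name001.a.a", "name002.a.a", "name005.a.a"]

# Made-up rows standing in for the contents of two of the CSV files.
runs = {
    "First": [["name002.a.a", "0002"], ["name001.a.a", "0001"], ["name005.a.a", "0025"]],
    "Second": [["name001.a.a", "0010"], ["name001.a.a", "0011"]],
}

points = {label: plot_points(rows, sorted_names) for label, rows in runs.items()}
print(points)
```

Each entry of points can then be passed to plt.scatter(xs, ys, label=label) in a loop, so adding a fourth file means adding one more entry to runs rather than another copy of the loop.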