Parsing graph data file with Python
Problem description
I have one relatively small issue, but I can't seem to wrap my head around it. I have a text file that contains information about a graph, structured as follows:
- first line contains the number of nodes
- a blank line is used for separation
- information about the nodes follows, each chunk separated from the next by an empty line
- each chunk contains the node ID on one line, the type on the second, and then information about the edges
- there are two types of edges, up and down; the first number after the node type gives the number of "up" edges, and their IDs follow on the next line (if that number is 0, no "up" edges exist and the next number gives the count of "down" edges)
- the same goes for the "down" edges: their count, then their IDs on the line below
So, sample data with three nodes is:
3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
So, node 1 has type 1, two up edges (2 and 3), and no down edges. Node 2 has type 1, zero up edges, and two down edges (1 and 3). Node 3 has type 2, one up edge (1), and one down edge (2).
This info is easily readable by a human, but I am having trouble writing a parser that takes this information and stores it in a usable form.
I have written some sample code:
f = open('C:\\data', 'r')
lines = f.readlines()
num_of_nodes = lines[0]

nodes = {}
counter = 0
skip_next = False

for line in lines[1:]:
    new = False
    left = False
    right = False
    if line == "\n":
        counter += 1
        nodes[counter] = []
        new = True
        continue
    nodes[counter].append(line.replace("\n", ""))
This more or less gets me the lines split up per node. What I would like is something like a dictionary that holds, for each node, its ID and its up and down neighbors (or False if there are none). I suppose I could now go through this list of nodes again and process each one separately, but I am wondering whether I can modify the loop I already have so that it does this properly in the first place.
Solution
Is this what you want?
{1: {'downs': [], 'ups': [2, 3], 'node_type': 1},
2: {'downs': [1, 3], 'ups': [], 'node_type': 1},
3: {'downs': [2], 'ups': [1], 'node_type': 2}}
Then here's the code:
def parse_chunk(chunk):
    # A chunk is the list of non-empty lines describing one node.
    node_id = int(chunk[0])
    node_type = int(chunk[1])

    nb_up = int(chunk[2])
    if nb_up:
        # list() is needed on Python 3, where map() returns an iterator.
        ups = list(map(int, chunk[3].split()))
        next_pos = 4
    else:
        # No ID line follows a count of 0.
        ups = []
        next_pos = 3

    nb_down = int(chunk[next_pos])
    if nb_down:
        downs = list(map(int, chunk[next_pos + 1].split()))
    else:
        downs = []

    return node_id, dict(
        node_type=node_type,
        ups=ups,
        downs=downs
    )


def collect_chunks(lines):
    # Group consecutive non-empty lines; a blank line ends the current chunk.
    chunk = []
    for line in lines:
        line = line.strip()
        if line:
            chunk.append(line)
        elif chunk:
            # Only yield non-empty chunks, so repeated blank lines are harmless.
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def parse(stream):
    nb_nodes = int(next(stream).strip())
    if not nb_nodes:
        return {}
    next(stream)  # skip the blank line after the node count
    return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))


def main(*args):
    with open(args[0], "r") as f:
        print(parse(f))


if __name__ == "__main__":
    import sys
    main(*sys.argv[1:])
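To try the chunk-based idea without a file on disk, the sample data from the question can be fed through `io.StringIO`. The following is only a minimal sketch for Python 3, not the answer's exact code: the names `SAMPLE` and `parse_graph` are made up for illustration, and it assumes the input is well formed (every count is followed by an ID line unless the count is 0).

```python
import io

# Sample data from the question, with the blank separator lines in place.
SAMPLE = """3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
"""


def parse_graph(stream):
    """Read the count line, then one blank-line-separated chunk per node."""
    lines = [line.strip() for line in stream]
    if not lines or int(lines[0]) == 0:
        return {}
    nodes = {}
    chunk = []
    # The trailing "" acts as a sentinel that flushes the last chunk.
    for line in lines[1:] + [""]:
        if line:
            chunk.append(line)
        elif chunk:
            fields = iter(chunk)
            node_id = int(next(fields))
            node_type = int(next(fields))
            edges = []
            for _ in range(2):  # first the "up" edges, then the "down" edges
                count = int(next(fields))
                # An ID line is only present when the count is non-zero.
                edges.append([int(x) for x in next(fields).split()] if count else [])
            nodes[node_id] = {"node_type": node_type, "ups": edges[0], "downs": edges[1]}
            chunk = []
    return nodes


graph = parse_graph(io.StringIO(SAMPLE))
for node_id, info in sorted(graph.items()):
    print(node_id, info)
```

Because the function only iterates over its argument, it should work the same way with an open file object as with the in-memory stream used here.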