Parsing a graph data file with Python

Problem description

    I have one relatively small issue, but I can't seem to wrap my head around it. I have a text file which holds information about a graph, and the structure is as follows:

    • first line contains the number of nodes
    • a blank line is used for separation
    • information about nodes follows, each chunk is separated from another by the empty line
    • chunks contain the node id on one line, the type on the second, and information about edges follows
    • there are two types of edges, up and down; the first number after the node type is the number of "up" edges, and their IDs follow on the next line (if that number is 0, no "up" edges exist and the next number is the count of "down" edges)
    • same goes for the "down" edges, number of them and their ids in line below

    So, sample data with three nodes is:

    3
    
    1
    1
    2
    2 3
    0
    
    2
    1
    0
    2
    1 3
    
    3
    2
    1
    1
    1
    2
    

    So, node 1 has type 1, two up edges (2 and 3), and no down edges. Node 2 has type 1, zero up edges, and two down edges (1 and 3). Node 3 has type 2, one up edge (1), and one down edge (2).

    This info is clearly readable by a human, but I am having issues writing a parser to take this information and store it in a usable form.

    I have written some sample code:

    f = open('C:\\data', 'r')
    lines = f.readlines()
    num_of_nodes = lines[0]
    nodes = {}
    counter = 0
    skip_next = False
    for line in lines[1:]:
        new = False
        left = False
        right = False
        if line == "\n":
            counter += 1
            nodes[counter] = []
            new = True
            continue
        nodes[counter].append(line.replace("\n", ""))
    

    This kinda gets me the info split up per node. I would like something like a dictionary, which would hold the ID and the up and down neighbors for each node (or False if there are none). I suppose I could now go through this list of nodes again and parse each one on its own, but I am wondering whether I can modify the loop I have to do that nicely in the first place.

    Solution

    Is this what you want?

    {1: {'downs': [], 'ups': [2, 3], 'node_type': 1}, 
     2: {'downs': [1, 3], 'ups': [], 'node_type': 1}, 
     3: {'downs': [2], 'ups': [1], 'node_type': 2}}
    

    Then here's the code:

    def parse_chunk(chunk):
        """Turn one block of non-empty lines into (node_id, info)."""
        node_id = int(chunk[0])
        node_type = int(chunk[1])
    
        # chunk[2] is the number of "up" edges; the id line exists only if it is non-zero.
        nb_up = int(chunk[2])
        if nb_up:
            ups = list(map(int, chunk[3].split()))
            next_pos = 4
        else:
            ups = []
            next_pos = 3
    
        # Same layout for the "down" edges.
        nb_down = int(chunk[next_pos])
        if nb_down:
            downs = list(map(int, chunk[next_pos + 1].split()))
        else:
            downs = []
    
        return node_id, dict(
            node_type=node_type,
            ups=ups,
            downs=downs,
        )
    
    def collect_chunks(lines):
        """Yield lists of consecutive non-empty lines, split on blank lines."""
        chunk = []
        for line in lines:
            line = line.strip()
            if line:
                chunk.append(line)
            elif chunk:  # ignore repeated or trailing blank lines
                yield chunk
                chunk = []
        if chunk:
            yield chunk
    
    def parse(stream):
        # The first line is the node count.
        nb_nodes = int(next(stream).strip())
        if not nb_nodes:
            return {}
        next(stream, None)  # skip the blank separator line
        return dict(parse_chunk(chunk) for chunk in collect_chunks(stream))
    
    def main(*args):
        with open(args[0], "r") as f:
            print(parse(f))
    
    if __name__ == "__main__":
        import sys
        main(*sys.argv[1:])
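
Since the question asks whether everything can be done in one loop, here is a minimal alternative sketch, not the approach used in the answer above. It assumes the file holds nothing but whitespace-separated integers, so line breaks and blank lines can be ignored and the counts alone decide how many ids to read. The name `parse_tokens` and the `SAMPLE` string are made up for illustration, and the sketch does no validation of malformed input.

```python
def parse_tokens(text):
    """Parse the graph format by consuming integers one at a time.

    Assumption: every token in the file is an integer, so the layout
    (line breaks, blank separator lines) carries no extra information.
    """
    tokens = map(int, text.split())
    nb_nodes = next(tokens, 0)  # first number is the node count; 0 for an empty file
    nodes = {}
    for _ in range(nb_nodes):
        node_id = next(tokens)
        node_type = next(tokens)
        # Each edge list is stored as a count followed by that many ids.
        nb_up = next(tokens)
        ups = [next(tokens) for _ in range(nb_up)]
        nb_down = next(tokens)
        downs = [next(tokens) for _ in range(nb_down)]
        nodes[node_id] = {"node_type": node_type, "ups": ups, "downs": downs}
    return nodes


# The sample data from the question.
SAMPLE = """3

1
1
2
2 3
0

2
1
0
2
1 3

3
2
1
1
1
2
"""

result = parse_tokens(SAMPLE)
print(result)
```

On the sample data this produces the same ids, types and edge lists as the dictionary shown in the answer. The trade-off is that it relies on the counts being correct: a wrong count shifts every later value, whereas the chunk-based version keeps errors local to one block.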
    
