Parsing a text file and segregating the data in a dictionary

Problem description

I have a fairly complex problem parsing a text file.

What I need:

  1. Read through a text file.
  2. If a line matches a specific condition, create a key named after it (condition 1).
  3. Copy the lines that follow into a list. This list needs to be associated with that key (condition 1).
  4. When the condition is encountered again, create a new key, copy the lines that follow, and repeat step 3 until the end of the file.

Problem: I am having trouble appending new items to the list for a given key.

Sample text input file:

A1 letters characters jgjgjg
A2 letters numbers fgdhdhd
D1 letters numbers haksjshs
condition1, dhdjfjf
K2 letters characters jgjgjg
J1 alphas numbers fgdhdhd
L1 letters numbers haksjshs
condition2, dhdjfjf
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs

Expected dictionary:

dictone = {'condition1':['K2 letters characters jgjgjg','J1 alphas numbers fgdhdhd','L1 letters numbers haksjshs'], 'condition2':['J1 alphas numbers fgdhdhd','D1 letters numbers haksjshs','J1 alphas numbers fgdhdhd','D1 letters numbers haksjshs'..........}

Here is what I have done so far:

flagInitial = False  # flag to start copy after encountering condition

with open(inputFilePath, "r") as tfile:
    for item in tfile:
        gcmatch = gcpattern.match(item)
        if gcmatch:
            extr = re.split(' ', item)
            laynum = extr[2]

            newKey = item[2:7] + laynum[:-1]
            flagInitial = True
            gcdict[newKey] = item
            continue

        if flagInitial == True:
            gcdict[newKey].append(item)  # stuck here
            # print(gcdict[newKey])
            # print(newKey)

Am I missing some syntax or something else?

Recommended answer

Using the re.search function and a collections.defaultdict object:

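import re
import collections

with open('input.txt', 'rt') as f:
    pat = re.compile(r'^condition\d+')   # a line that starts a new group
    d = collections.defaultdict(list)    # missing keys start out as empty lists
    curr_key = None                      # no condition line seen yet

    for line in f:
        m = pat.search(line)
        if m:
            curr_key = m.group()         # e.g. 'condition1'
            continue
        if curr_key:                     # ignore lines before the first condition
            d[curr_key].append(line.strip())

print(dict(d))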

Output:

{'condition1': ['K2 letters characters jgjgjg', 'J1 alphas numbers fgdhdhd', 'L1 letters numbers haksjshs'], 'condition2': ['J1 alphas numbers fgdhdhd', 'D1 letters numbers haksjshs', 'J1 alphas numbers fgdhdhd', 'D1 letters numbers haksjshs']}
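
As for why the loop in the question gets stuck: assuming gcdict is a plain dict, the line gcdict[newKey] = item stores the matched line itself, which is a string, and strings have no append method, so the later gcdict[newKey].append(item) raises an AttributeError. The key has to map to a list before anything is appended. Below is a minimal sketch of the same idea with a plain dict instead of defaultdict. The function name group_by_condition, the CONDITION_PATTERN regex (a stand-in for the gcpattern the question does not show), and the shortened in-memory sample are assumptions made for illustration.

```python
import re

# Hypothetical stand-in for the asker's gcpattern, which the question does not show.
CONDITION_PATTERN = re.compile(r"^condition\d+")


def group_by_condition(lines):
    """Group the lines that follow each condition line under that condition's key."""
    groups = {}
    current_key = None
    for raw_line in lines:
        line = raw_line.strip()
        match = CONDITION_PATTERN.match(line)
        if match:
            current_key = match.group()
            # Make sure the key maps to a list. Storing the line itself (a str)
            # here is what makes a later .append() fail.
            groups.setdefault(current_key, [])
            continue
        if current_key is not None:
            groups[current_key].append(line)
    return groups


# Shortened sample taken from the input file in the question.
sample = """\
A1 letters characters jgjgjg
condition1, dhdjfjf
K2 letters characters jgjgjg
J1 alphas numbers fgdhdhd
condition2, dhdjfjf
D1 letters numbers haksjshs
""".splitlines()

result = group_by_condition(sample)
print(result)
```

One difference from the defaultdict version: setdefault creates the key as soon as the condition line is seen, so a condition with no lines after it still appears in the result with an empty list.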
