Parsing a text file and segregating the data into a dictionary
Problem description
I have a somewhat complex problem parsing a text file.
What I need:

1. Read through a text file.
2. If a line matches a specific condition, create a key named after it (condition 1).
3. Copy the lines that follow it into a list, and associate that list with the key (condition 1).
4. When the condition is encountered again, create a new key, copy the lines that follow it, and repeat step 3 until the end of the file.
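The steps above can be sketched as a loop that remembers the current key. This is a minimal sketch, not the answer's code. It assumes that a condition line starts with the word `condition` and that the key is the text before the first comma. The function name `group_by_condition` is hypothetical. The detail that matters for the appending problem is that each new key starts out holding an empty list, so the lines that follow can be appended to it.

```python
def group_by_condition(lines):
    """Group the lines that follow each 'condition...' line under that condition's name."""
    groups = {}
    current_key = None  # no condition seen yet, so leading lines are ignored
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("condition"):
            # Assumed key format: the text before the first comma, e.g. "condition1"
            current_key = line.partition(",")[0]
            # Create an empty list the first time this key is seen, so .append works
            groups.setdefault(current_key, [])
        elif current_key is not None:
            groups[current_key].append(line)
    return groups


sample = """A1 letters characters jgjgjg
A2 letters numbers fgdhdhd
D1 letters numbers haksjshs
condition1, dhdjfjf
K2 letters characters jgjgjg
J1 alphas numbers fgdhdhd
L1 letters numbers haksjshs
condition2, dhdjfjf
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs
"""

print(group_by_condition(sample.splitlines()))
```

A file object also yields lines, so the same function can be given an open file instead of `sample.splitlines()`. Because `setdefault` reuses an existing list, a condition name that appears twice collects the lines from both occurrences.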
Problem: I am having trouble appending new items to the list for a given key.
Sample text input file:

```
A1 letters characters jgjgjg
A2 letters numbers fgdhdhd
D1 letters numbers haksjshs
condition1, dhdjfjf
K2 letters characters jgjgjg
J1 alphas numbers fgdhdhd
L1 letters numbers haksjshs
condition2, dhdjfjf
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs
J1 alphas numbers fgdhdhd
D1 letters numbers haksjshs
```
Expected dictionary:

```
dictone = {'condition1':['K2 letters characters jgjgjg','J1 alphas numbers fgdhdhd','L1 letters numbers haksjshs'], 'condition2':['J1 alphas numbers fgdhdhd','D1 letters numbers haksjshs','J1 alphas numbers fgdhdhd','D1 letters numbers haksjshs'..........}
```
Here is what I have done so far:
```python
import re

gcdict = {}
flagInitial = False  # flag to start copying after encountering the condition
# inputFilePath and gcpattern are defined elsewhere (not shown in the question)
with open(inputFilePath, "r") as tfile:
    for item in tfile:
        gcmatch = gcpattern.match(item)
        if gcmatch:
            extr = re.split(' ', item)
            laynum = extr[2]
            newKey = item[2:7] + laynum[:-1]
            flagInitial = True
            gcdict[newKey] = item
            continue
        if flagInitial == True:
            gcdict[newKey].append(item)  # stuck here: gcdict[newKey] holds a str, and str has no .append
            # print(gcdict[newKey])
            # print(newKey)
```
Am I missing some syntax, or something else?
Recommended answer
Using the `re.search` function and a `collections.defaultdict` object:
```python
import re
import collections

with open('input.txt', 'rt') as f:
    pat = re.compile(r'^condition\d+')
    d = collections.defaultdict(list)
    curr_key = None
    for line in f:
        m = pat.search(line)
        if m:
            curr_key = m.group()
            continue
        if curr_key:
            d[curr_key].append(line.strip())

print(dict(d))
```
Output:

```
{'condition1': ['K2 letters characters jgjgjg', 'J1 alphas numbers fgdhdhd', 'L1 letters numbers haksjshs'], 'condition2': ['J1 alphas numbers fgdhdhd', 'D1 letters numbers haksjshs', 'J1 alphas numbers fgdhdhd', 'D1 letters numbers haksjshs']}
```
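The answer's loop never creates a list explicitly because `collections.defaultdict(list)` calls `list()` to build a value the first time a missing key is accessed. Here is a short comparison with a plain `dict`, where the same effect needs `dict.setdefault`. The keys and lines are taken from the sample input.

```python
import collections

# defaultdict(list): a missing key is created with an empty list on first access
d = collections.defaultdict(list)
d["condition1"].append("K2 letters characters jgjgjg")
d["condition1"].append("J1 alphas numbers fgdhdhd")

# Plain dict: setdefault inserts [] only if the key is missing, then returns the stored list
plain = {}
plain.setdefault("condition1", []).append("K2 letters characters jgjgjg")
plain.setdefault("condition1", []).append("J1 alphas numbers fgdhdhd")

print(dict(d) == plain)
```

Both approaches store the same data. The answer wraps the result in `dict(d)` before printing so the output shows a plain dictionary rather than the `defaultdict(<class 'list'>, {...})` form.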