Parsing a text file with JSON-like objects into CSV
Question
I have a text file containing key-value pairs, with the last two key-value pairs containing JSON-like objects that I would like to split out into columns and write with the other values, using the keys as column headings. The first three rows of the data file input.txt look like this:
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::44.6743867864386,Length3dCenterToCenter::44.6768028159989,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.53962362760298}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::57.8689351603823,Length3dCenterToCenter::57.8700464193429,Tag::<NULL>,{StartPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.075},{EndPoint::7822.85045874375[%2C]1730.80294308742[%2C]-3.43363070193163}
InnerDiameterOrWidth::0.1,InnerHeight::0.1,Length2dCenterToCenter::68.7161350545728,Length3dCenterToCenter::68.7172034962765,Tag::<NULL>,{StartPoint::7858.35924983374[%2C]1703.69341358077[%2C]-3.075},{EndPoint::7793.52927597915[%2C]1680.91224357457[%2C]-3.45819643838485}
We eventually came up with something that works, but there must be a much better way:
import csv

# Python 2 idiom: the csv module expects files opened in binary mode.
with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, line in enumerate(reader):
        # Split every "key::value" field into a [key, value] pair.
        mysplit = [item.split('::') for item in line if item.strip()]
        if not mysplit:  # blank line
            continue
        keys, vals = zip(*mysplit)
        # The last two fields hold three coordinates joined by '[%2C]'.
        start_vals = [item.split('[%2C]') for item in mysplit[-2]]
        end_vals = [item.split('[%2C]') for item in mysplit[-1]]
        a = list(keys[0:-2])
        a.extend(['start1', 'start2', 'start3', 'end1', 'end2', 'end3'])
        b = list(vals[0:-2])
        b.append(start_vals[1][0])
        b.append(start_vals[1][1])
        b.append(start_vals[1][2][:-1])  # drop the trailing '}'
        b.append(end_vals[1][0])
        b.append(end_vals[1][1])
        b.append(end_vals[1][2][:-1])  # drop the trailing '}'
        if i == 0:
            # if first line: write header
            writer.writerow(a)
        writer.writerow(b)
The file output.csv looks like this:
InnerDiameterOrWidth,InnerHeight,Length2dCenterToCenter,Length3dCenterToCenter,Tag,start1,start2,start3,end1,end2,end3
0.1,0.1,44.6743867864386,44.6768028159989,<NULL>,7858.35924983374,1703.69341358077,-3.075,7822.85045874375,1730.80294308742,-3.53962362760298
0.1,0.1,57.8689351603823,57.8700464193429,<NULL>,7793.52927597915,1680.91224357457,-3.075,7822.85045874375,1730.80294308742,-3.43363070193163
0.1,0.1,68.7161350545728,68.7172034962765,<NULL>,7858.35924983374,1703.69341358077,-3.075,7793.52927597915,1680.91224357457,-3.45819643838485
We don't want to write code like this in the future.
What is the best way to read data like this?
Recommended answer
I'd use:
from itertools import chain
import csv

_header_translate = {
    'StartPoint': ('start1', 'start2', 'start3'),
    'EndPoint': ('end1', 'end2', 'end3')
}

def header(col):
    header = col.strip('{}').split('::', 1)[0]
    return _header_translate.get(header, (header,))

def cleancolumn(col):
    col = col.strip('{}').split('::', 1)[1]
    return col.split('[%2C]')

def chainedmap(func, row):
    return list(chain.from_iterable(map(func, row)))

with open('input.txt', 'rb') as fin, open('output.csv', 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for i, row in enumerate(reader):
        if not i:  # first row, write header first
            writer.writerow(chainedmap(header, row))
        writer.writerow(chainedmap(cleancolumn, row))
The cleancolumn function takes any of your columns and returns a list (possibly with only one value) after removing the braces, removing everything up to and including the first :: and splitting on the embedded 'comma' ([%2C]). By using itertools.chain.from_iterable() we turn the series of lists generated from the columns into one flat list again for the csv writer.
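To make the flattening step concrete, here is a minimal sketch that runs cleancolumn over a single row. The row is a shortened, made-up example (not taken from input.txt), written the way csv.reader would hand it over.

```python
from itertools import chain

def cleancolumn(col):
    # Drop surrounding braces, discard the key before the first '::',
    # then split the value on the encoded comma.
    col = col.strip('{}').split('::', 1)[1]
    return col.split('[%2C]')

# Hypothetical, shortened row as csv.reader would yield it.
row = ['InnerHeight::0.1', 'Tag::<NULL>', '{StartPoint::1.5[%2C]2.5[%2C]-3.075}']

# One list per column: plain columns give one value, point columns give three.
per_column = [cleancolumn(col) for col in row]
print(per_column)  # [['0.1'], ['<NULL>'], ['1.5', '2.5', '-3.075']]

# chain.from_iterable() flattens the per-column lists into a single CSV row.
flat = list(chain.from_iterable(per_column))
print(flat)  # ['0.1', '<NULL>', '1.5', '2.5', '-3.075']
```

Because every column yields a list, plain and point columns can be handled by the same code path with no special-casing by position.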
When handling the first line we generate one header row from the same columns, replacing the StartPoint and EndPoint headers with the 6 expanded headers.
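The code above opens both files in binary mode, which is the Python 2 convention for the csv module. On Python 3 the csv module works on text files opened with newline=''. The following is a sketch of the same approach under that assumption. The convert helper, the blank-line check, and the sample row are additions for illustration only and are not part of the original answer. The example runs on in-memory text so it needs no files.

```python
import csv
import io
from itertools import chain

_header_translate = {
    'StartPoint': ('start1', 'start2', 'start3'),
    'EndPoint': ('end1', 'end2', 'end3'),
}

def header(col):
    # Key name without braces; point columns expand to three headers.
    name = col.strip('{}').split('::', 1)[0]
    return _header_translate.get(name, (name,))

def cleancolumn(col):
    # Value part of the column, split on the encoded comma.
    return col.strip('{}').split('::', 1)[1].split('[%2C]')

def chainedmap(func, row):
    return list(chain.from_iterable(map(func, row)))

def convert(fin, fout):
    # Hypothetical helper: reads key::value rows from fin, writes CSV to fout.
    reader = csv.reader(fin)
    writer = csv.writer(fout, lineterminator='\n')
    wrote_header = False
    for row in reader:
        if not row:  # skip blank lines (an addition, mirroring the question's code)
            continue
        if not wrote_header:
            writer.writerow(chainedmap(header, row))
            wrote_header = True
        writer.writerow(chainedmap(cleancolumn, row))

# Made-up sample line in the same shape as input.txt.
sample = (
    'InnerHeight::0.1,Tag::<NULL>,'
    '{StartPoint::1.5[%2C]2.5[%2C]-3.075},{EndPoint::4.5[%2C]5.5[%2C]-3.5}\n'
)
out = io.StringIO()
convert(io.StringIO(sample), out)
result = out.getvalue()
print(result)

# With real files on Python 3, open both in text mode with newline='':
# with open('input.txt', newline='') as fin, \
#         open('output.csv', 'w', newline='') as fout:
#     convert(fin, fout)
```

Taking file objects rather than file names keeps the conversion testable with io.StringIO and leaves the choice of encoding and file mode to the caller.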