How to convert a .txt to .xml in Python

Question

The problem I'm currently facing is converting a text file into an XML file. The text file is in this format:

Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2      TP1:  17.25    TP2:  2.46
Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2      TP1:  17.25    TP2:  2.46

I want to convert it into an XML file with this format:

<?xml version="1.0" encoding="utf-8"?>
<root>
 <filedata>
 <serialnumber></serialnumber>
 <operatorid>test</operatorid>
 <time>00:00:42 Test Step 2</time>
 <tp1>17.25</tp1>
 <tp2>2.46</tp2>
 </filedata>
...
</root>

I was using code like this to convert my previous text file to XML, but now I'm having trouble splitting the lines.

import xml.etree.ElementTree as ET
import fileinput
import os
import itertools as it

root = ET.Element('root')
with open('text.txt') as f:
    lines = f.read().splitlines()
celldata = ET.SubElement(root, 'filedata')
for line in it.groupby(lines):
    line=line[0]
    if not line:
        celldata = ET.SubElement(root, 'filedata')
    else:
        tag = line.split(":")
        el=ET.SubElement(celldata,tag[0].replace(" ",""))
        tag=' '.join(tag[1:]).strip()
        if 'File Name' in line:
            tag = line.split("\\")[-1].strip()
        elif 'File Size' in line:
            # list() is needed on Python 3, where filter() returns an iterator without .index().
            splist = list(filter(None, line.split(" ")))
            tag = splist[splist.index('Low:')+1]
            #splist[splist.index('High:')+1]
        el.text = tag
import xml.dom.minidom as minidom
formatedXML = minidom.parseString(
                          ET.tostring(
                                      root)).toprettyxml(indent=" ",encoding='utf-8').strip()

with open("test.xml","wb") as f:
    f.write(formatedXML)

I saw a similar question on Stack Overflow, "Python text file to xml", but I can't convert the file to .csv format because it is generated by a machine. If anyone knows how to solve this, please help. Thank you.

Recommended answer

Here is a better way to split the lines.

Note that the text variable stands in for the contents of your .txt file. I deliberately modified the sample data so that the output gives more context.

from collections import OrderedDict
from pprint import pprint

# Text would be our loaded .txt file.
text = """Serial Number:  test    Operator ID:  test1  Time:  00:03:47 Test Step 1      TP1:  17.25    TP2:  2.46
Serial Number:      Operator ID:  test2  Time:  00:03:48 Test Step 2      TP1:  17.24    TP2:  2.47"""

# Headers of the intended break-points in the text files.
headers = ["Serial Number:", "Operator ID:", "Time:", "TP1:", "TP2:"]

information = []

# Split our text by lines.
for line in text.split("\n"):

    # Split our text up so we only have the information per header.
    default_header = headers[0]
    for header in headers[1:]:
        line = line.replace(header, default_header)
    values = [i.strip() for i in line.split(default_header)][1:]

    # Compile our header+information together into OrderedDict's.
    # (Separate names avoid rebinding the list while zip() is iterating over it.)
    compiled_information = OrderedDict()
    for header, value in zip(headers, values):
        compiled_information[header] = value

    # Append to our overall information list.
    information.append(compiled_information)

# Pretty print the information (not needed, only for better display of data.)
pprint(information)

Output:

[OrderedDict([('Serial Number:', 'test'),
              ('Operator ID:', 'test1'),
              ('Time:', '00:03:47 Test Step 1'),
              ('TP1:', '17.25'),
              ('TP2:', '2.46')]),
 OrderedDict([('Serial Number:', ''),
              ('Operator ID:', 'test2'),
              ('Time:', '00:03:48 Test Step 2'),
              ('TP1:', '17.24'),
              ('TP2:', '2.47')])]

This method should generalize better than what you currently have. The idea comes from code I had saved from another project. I recommend going through the code and understanding its logic.

From here you should be able to loop through the information list and create your custom .xml file. I would also recommend checking out dicttoxml, as it might make the final step much easier.
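
As a rough illustration of that final step, here is a minimal sketch that loops over a list shaped like information and builds the XML with the standard library's xml.etree.ElementTree. The helper names header_to_tag, build_xml and to_pretty_bytes are hypothetical, and the lowercase tag naming is an assumption based on the format requested in the question.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample shaped like the `information` list built above.
information = [
    {"Serial Number:": "test", "Operator ID:": "test1",
     "Time:": "00:03:47 Test Step 1", "TP1:": "17.25", "TP2:": "2.46"},
    {"Serial Number:": "", "Operator ID:": "test2",
     "Time:": "00:03:48 Test Step 2", "TP1:": "17.24", "TP2:": "2.47"},
]


def header_to_tag(header):
    """Turn a header such as 'Serial Number:' into a tag such as 'serialnumber'."""
    return header.rstrip(":").replace(" ", "").lower()


def build_xml(records):
    """Build a <root> element holding one <filedata> element per record."""
    root = ET.Element("root")
    for record in records:
        filedata = ET.SubElement(root, "filedata")
        for header, value in record.items():
            ET.SubElement(filedata, header_to_tag(header)).text = value
    return root


def to_pretty_bytes(root):
    """Pretty-print through minidom, the same way the question's code does."""
    raw = ET.tostring(root)
    return minidom.parseString(raw).toprettyxml(indent=" ", encoding="utf-8")


root = build_xml(information)
pretty_xml = to_pretty_bytes(root)
```

To save the result, write pretty_xml to a file opened in binary mode, for example with open("test.xml", "wb"), since toprettyxml returns bytes when an encoding is given.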

As for your code, remember that breaking a problem down into fundamental tasks is easier than trying to handle them all at once. By trying to create the XML file while you split the .txt file, you've created a monster that is hard to tame when it fights back with bugs. Instead, take it one step at a time: create "checkpoints" that you are 100% certain work, and then move on to the next task.
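
One way to read the "checkpoint" advice is sketched below: the splitting step lives in its own function and is checked against a known line before any XML code is written. The function name parse_line and the HEADERS constant are hypothetical. The splitting logic is the same replace-and-split idea used in the answer's code.

```python
HEADERS = ["Serial Number:", "Operator ID:", "Time:", "TP1:", "TP2:"]


def parse_line(line, headers=HEADERS):
    """Split one line into {header: value}, using the headers as break-points."""
    marker = headers[0]
    # Rewrite every header to the same marker so a single split() is enough.
    for header in headers[1:]:
        line = line.replace(header, marker)
    values = [part.strip() for part in line.split(marker)][1:]
    return dict(zip(headers, values))


# Checkpoint: confirm the parsing step on a known line before touching XML.
sample = ("Serial Number:      Operator ID:  test  Time:  00:03:47 Test Step 2"
          "      TP1:  17.25    TP2:  2.46")
parsed = parse_line(sample)
assert parsed["Serial Number:"] == ""
assert parsed["Operator ID:"] == "test"
assert parsed["Time:"] == "00:03:47 Test Step 2"
```

Once these assertions pass, the XML-building step can be written and checked separately, without having to debug both at the same time.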
