如何使用python将.txt文件转换为xml文件? [英] How to convert .txt file into xml file using python?

查看:136
本文介绍了如何使用python将.txt文件转换为xml文件?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Latitude :23.1100348
Longitude:72.5364922
date&time :30:August:2014 05:04:31 PM
gsm cell id: 4993
Neighboring List- Lac : Cid : RSSI
15000     :    7072     :    25 dBm
15000     :    7073     :    23 dBm
15000     :    6102     :    24 dBm
15000     :    6101     :    24 dBm
15000     :    6103     :    17 dBm

Latitude :23.1120549
Longitude:72.5397988
date&time :30:August:2014 05:04:34 PM
gsm cell id: 4993
Neighboring List- Lac : Cid : RSSI
15000     :    7072     :    24 dBm
15000     :    7073     :    22 dBm
15000     :    6102     :    23 dBm
15000     :    6101     :    23 dBm
15000     :    2552     :    16 dBm

这是 my.txt 文件,我想将其转换为 xml 文件,例如

This is my.txt file I want convert it into xml file like

<celldata>
<time>        </time>
<latitude>    </latitude>
<longitude>   </longitude>

</celldata>

我试图列出所有组件,但我没有得到 o/p.我想在列表中存储纬度、经度、gsm 单元 ID、时间的所有值,这将在 xml 文件中添加类似的内容.我写下面的代码.

I tried to make list of all components but I didn't get o/p.I want to store all values of latitude,longitude,gsm cell id,time in list and this will add in xml file something like that. I write below code.

import re

pa = 'Longitude|Latitude|gsm cell id|Neighboring List- Lac : Cid : RSSI'

with open('cell.txt','rw') as file:
    for line in file:
        line.strip()    
        if re.search(pa, line):
            lineInfo = line.split(':')
            title = lineInfo[0]
            value = lineInfo[1]

推荐答案

尝试以下代码作为入门:

Try the following code as a starter:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, uggly)
        if line.isspace():
            celldata = ET.SubElement(root, 'celldata')
            celldata.text = '\n'
            celldata.tail = '\n\n'

        # If the line contains the wanted data, process it.
        m = rex.search(line)
        if m:
            # Fix some problems with the title as it will be used
            # as the tag name.
            title = m.group('title')
            title = title.replace('&', '')
            title = title.replace(' ', '')

            e = ET.SubElement(celldata, title.lower())
            e.text = m.group('value')
            e.tail = '\n'

# Display for debugging            
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

它显示您的示例数据:

<root>
<celldata>
<latitude>23.1100348</latitude>
<longitude>72.5364922</longitude>
<datetime>30:August:2014 05:04:31 PM</datetime>
<gsmcellid>4993</gsmcellid>
</celldata>

<celldata>
<latitude>23.1120549</latitude>
<longitude>72.5397988</longitude>
<datetime>30:August:2014 05:04:34 PM</datetime>
<gsmcellid>4993</gsmcellid>
</celldata>

</root>

通缉邻居名单更新:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                       |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    for line in f:
        # Empty line starts new celldata element (hack style, uggly)
        if line.isspace():
            celldata = ET.SubElement(root, 'celldata')
            celldata.text = '\n'
            celldata.tail = '\n\n'
        else:
            # If the line contains the wanted data, process it.
            m = rex.search(line)
            if m:
                # Fix some problems with the title as it will be used
                # as the tag name.
                title = m.group('title')
                title = title.replace('&', '')
                title = title.replace(' ', '')

                if line.startswith('Neighboring'):
                    neighbours = ET.SubElement(celldata, 'neighbours')
                    neighbours.text = '\n'
                    neighbours.tail = '\n'
                else:
                    e = ET.SubElement(celldata, title.lower())
                    e.text = m.group('value')
                    e.tail = '\n'
            else:
                # This is the neighbour item. Split it by colon,
                # and set the attributes of the item element.
                item = ET.SubElement(neighbours, 'item')
                item.tail = '\n'

                lac, cid, rssi = (a.strip() for a in line.split(':'))
                item.attrib['lac'] = lac
                item.attrib['cid'] = cid
                item.attrib['rssi'] = rssi.split()[0] # dBm removed

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

在邻居之前接受空行的更新 -- 也更好地用于一般目的:

Update for accepting empty line before neighbours -- also better implementation for general purposes:

#!python3

import re
import xml.etree.ElementTree as ET

rex = re.compile(r'''(?P<title>Longitude
                       |Latitude
                       |date&time
                       |gsm\s+cell\s+id
                       |Neighboring\s+List-\s+Lac\s+:\s+Cid\s+:\s+RSSI
                     )
                     \s*:?\s*
                     (?P<value>.*)
                     ''', re.VERBOSE)

root = ET.Element('root')
root.text = '\n'    # newline before the celldata element

with open('cell.txt') as f:
    celldata = ET.SubElement(root, 'celldata')
    celldata.text = '\n'    # newline before the collected element
    celldata.tail = '\n\n'  # empty line after the celldata element
    status = 0              # init status of the finite automaton
    for line in f:
        if status == 0:     # lines of the heading expected
            # If the line contains the wanted data, process it.
            m = rex.search(line)
            if m:
                # Fix some problems with the title as it will be used
                # as the tag name.
                title = m.group('title')
                title = title.replace('&', '')
                title = title.replace(' ', '')

                if line.startswith('Neighboring'):
                    neighbours = ET.SubElement(celldata, 'neighbours')
                    neighbours.text = '\n'
                    neighbours.tail = '\n'
                    status = 1  # empty line and then list of neighbours expected
                else:
                    e = ET.SubElement(celldata, title.lower())
                    e.text = m.group('value')
                    e.tail = '\n'
                    # keep the same status

        elif status == 1:   # empty line expected
            if line.isspace():
                status = 2  # list of neighbours must follow
            else:
                raise RuntimeError('Empty line expected. (status == {})'.format(status))
                status = 999 # error status

        elif status == 2:   # neighbour or the empty line as final separator

            if line.isspace():
                celldata = ET.SubElement(root, 'celldata')
                celldata.text = '\n'
                celldata.tail = '\n\n'
                status = 0  # go to the initial status
            else:
                # This is the neighbour item. Split it by colon,
                # and set the attributes of the item element.
                item = ET.SubElement(neighbours, 'item')
                item.tail = '\n'

                lac, cid, rssi = (a.strip() for a in line.split(':'))
                item.attrib['lac'] = lac
                item.attrib['cid'] = cid
                item.attrib['rssi'] = rssi.split()[0] # dBm removed
                # keep the same status

        elif status == 999: # error status -- break the loop
            break

        else:
            raise LogicError('Unexpected status {}.'.format(status))
            break

# Display for debugging
ET.dump(root)

# Include the root element to the tree and write the tree
# to the file.
tree = ET.ElementTree(root)
tree.write('cell.xml', encoding='utf-8', xml_declaration=True)

代码实现了所谓的有限自动机,其中 status 变量表示其当前状态.您可以使用铅笔和纸将其可视化——在内部绘制带有状态编号的小圆圈(在图论中称为节点).处于状态,您只允许某种输入(line).当输入被识别时,你将箭头(图论中的定向边)绘制到另一个状态(可能到相同的状态,作为一个循环返回到相同的节点).箭头标注为`条件|行动'.

The code implements so called finite automaton where the status variable represents its current status. You can visualize it using pencil and paper -- draw small circles with the status numbers inside (called nodes in the graph theory). Being at the status, you allow only some kind of input (line). When the input is recognized, you draw the arrow (oriented edge in the graph theory) to another status (possibly to the same status, as a loop returning back to the same node). The arrow is annotated `condition | action'.

结果一开始可能看起来很复杂;但是,从某种意义上说,您始终可以将注意力集中在属于特定状态的代码部分,这很容易.而且,代码可以很容易地修改.然而,有限自动机的能力有限.但它们非常适合此类问题.

The result may look complex at the beginning; however, it is easy in the sense that you can always focus ony on the part of the code that belongs to certain status. And also, the code can be easily modified. However, finite automatons have limited power. But they are just perfect for this kind of problems.

这篇关于如何使用python将.txt文件转换为xml文件?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆