Modify large file containing multiple XML files to create small file depending on condition

Views: 77 Published: 2020/10/28 21:14:54 Tags: python-3.x lxml elementtree


Problem description

I have a large file that contains multiple XML documents, one per line. I want to create a new file containing only the lines (XML documents) in which multiple tags match the columns of a spreadsheet. For example, I have a large XML file:

<?xml version="1.0" encoding="UTF-8"?><data><student><result><grade>A</grade></result><details><name>John</name><house>Red</house><id>100</id><age>16</age><email>john@mail.com</email></details></student></data>
<?xml version="1.0" encoding="UTF-8"?><data><student><result><grade>B</grade></result><details><name>Alice</name><house>Blue</house><id>101</id><age>17</age><email>alice@mail.com</email></details></student></data>
<?xml version="1.0" encoding="UTF-8"?><data><student><result><grade>F</grade></result><details><name>Bob</name><house>Blue</house><id>100</id><age>16</age><email>bob@mail.com</email></details></student></data>
<?xml version="1.0" encoding="UTF-8"?><data><student><result><grade>A</grade></result><details><name>Hannah</name><house>Blue</house><id>103</id><age>17</age><email>hannah@mail.com</email></details></student></data>
<?xml version="1.0" encoding="UTF-8"?><data><student><result><grade>C</grade></result><details><name>James</name><house>Red</house><id>101</id><age>18</age><email>james@mail.com</email></details></student></data>

I need to create a file where the house and id are picked from an xlsx file like the one below:

House  Id
Blue   100
Blue   103

and create a new file like the one below:

<?xml version="1.0" encoding="UTF-8"?><data><student><result><grade>F</grade></result><details><name>Bob</name><house>Blue</house><id>100</id><age>16</age><email>bob@mail.com</email></details></student></data>
<?xml version="1.0" encoding="UTF-8"?><data><student><result><grade>A</grade></result><details><name>Hannah</name><house>Blue</house><id>103</id><age>17</age><email>hannah@mail.com</email></details></student></data>

What I have tried:

from lxml import etree as ET
import pandas as pd

df = pd.read_excel(open('Student_data.xlsx','rb'),sheet_name="Sheet2")
df['House_Id']=df['House'].map(str)+'-'+df['Id'].map(str)
required_ids = df['House_Id'].tolist()
required_ids = [str(i) for i in required_ids]
for event, element in ET.iterparse('new_student.xml'):
    if element.tag == 'xml' and not(element.xpath('.//id/text()')[0] in required_ids):
        element.clear()
        element.getparent().remove(element)
    if element.tag == 'data':
        tree = ET.ElementTree(element)
        tree.write('student_output.xml')

I am able to build the required ids from the 2 columns of the xlsx file (i.e. ['Blue-100', 'Blue-103']) but don't know how to:

Create a similar pair id from the XML

Navigate to the pair id and create a new file that contains only the required lines

Please let me know a way to do this. Thanks in advance.

Recommended answer

I think, if you know that the encoding given in the XML declaration on each line is always UTF-8, then you can process the file line by line, where each line represents one XML document, using file-based IO and lxml etree as follows:

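from lxml import etree as ET

# (house, id) pairs to keep. Checking the pair as a unit avoids accepting
# combinations that only match one column each (e.g. a Red/103 record).
required_pairs = {('Blue', '100'), ('Blue', '103')}

with open('input.txt', 'r', encoding='utf-8') as f, open('output.txt', 'w', encoding='utf-8') as w:
    for line in f:  # iterate lazily; readlines() would load the whole file
        if not line.strip():
            continue  # skip blank lines
        # lxml rejects str input that carries an encoding declaration, so parse bytes
        root = ET.fromstring(line.encode('utf-8'))
        pair = (root.findtext('.//house'), root.findtext('.//id'))
        if pair in required_pairs:
            # write the original line so the XML declaration is preserved
            w.write(line if line.endswith('\n') else line + '\n')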
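
To connect this to the spreadsheet, the required (house, id) pairs can be built from its rows rather than hard-coded. The following is only a minimal sketch under some assumptions: it uses the standard library's xml.etree.ElementTree in place of lxml so that it runs without extra packages, the `rows` list is a hypothetical stand-in for the spreadsheet contents, and `filter_lines` and `make_line` are illustrative helper names that do not come from the question or the answer.

```python
import io
import xml.etree.ElementTree as ET  # stdlib stand-in for lxml.etree


def filter_lines(lines, required_pairs):
    """Yield each line whose (house, id) pair is in required_pairs."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        # Parse bytes so the UTF-8 declaration on each line is honoured.
        root = ET.fromstring(line.encode("utf-8"))
        pair = (root.findtext(".//house"), root.findtext(".//id"))
        if pair in required_pairs:
            yield line.rstrip("\n")


# Hypothetical stand-in for the spreadsheet rows. With pandas, a similar set
# could be built as set(zip(df['House'].astype(str), df['Id'].astype(str))).
rows = [{"House": "Blue", "Id": 100}, {"House": "Blue", "Id": 103}]
required_pairs = {(str(r["House"]), str(r["Id"])) for r in rows}


def make_line(name, house, id_):
    """Build one shortened sample document (illustrative helper only)."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?><data><student><details>'
        f"<name>{name}</name><house>{house}</house><id>{id_}</id>"
        "</details></student></data>"
    )


sample = "\n".join(
    [
        make_line("John", "Red", 100),
        make_line("Alice", "Blue", 101),
        make_line("Bob", "Blue", 100),
        make_line("Hannah", "Blue", 103),
        make_line("James", "Red", 101),
    ]
)

# io.StringIO plays the role of the opened input file here.
kept = list(filter_lines(io.StringIO(sample), required_pairs))
kept_names = [ET.fromstring(k.encode("utf-8")).findtext(".//name") for k in kept]
print(kept_names)
```

Comparing whole pairs, instead of testing the house and the id against two separate lists, keeps a record such as Red/100 from slipping through when the sheet lists both Blue/100 and Red/103.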
