在Python中使用BeautifulSoup识别和替换XML元素 [英] Identify and replace elements of XML using BeautifulSoup in Python

查看：247 发布时间：2020/9/20 7:52:36 python xml-parsing beautifulsoup

本文介绍了在Python中使用BeautifulSoup识别和替换XML元素的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我正在尝试使用BeautifulSoup4在XML中查找和替换特定元素.更具体地说，我想查找"file_name"的所有实例(在下面的示例中，文件名为"Cyp26A1_atRA_minus_tet_plus.txt")，并用该文档的完整路径替换它-保存在"file_name_replacement_dir"变量中.我要做的第一件事是隔离感兴趣的部分，以便可以使用replaceWith()方法替换它.

I am trying to use BeautifulSoup4 to find and replace specific elements within an XML. More specifically, I want to find all instances of 'file_name'(in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace it with the full path for that document - which is saved in the 'file_name_replacement_dir' variable. My first task, the bit i'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method.

XML

      <ParameterGroup name="Experiment_22">
        <Parameter name="Data is Row Oriented" type="bool" value="1"/>
        <Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
        <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
        <Parameter name="First Row" type="unsignedInteger" value="1"/>

实际上有44个实验使用4个不同的文件名(因此11个的文件名是1，11的文件名是2，依此类推).因此，上面的XML代码段重复了44次，只是在文件名"行中存储了不同的文件.

There are actually 44 experiments with 4 different file names (So 11 with file name 1, 11 with file name 2 and so on). So the above snippet of XML is repeated 44 times, just with different files stored in the "File Name" line.

到目前为止，我的代码

xml_dir = 'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models\Model_Line_2'
xml_file_name = 'RARa_RXR_M22.cps'
xml=model_dir+'\\'+model_name
file_name_replacement_dir = D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models
soup = BeautifulSoup(open(xml))
print soup.find_all('parametergroup name="Experiment_22"')

这将返回一个空列表.我还尝试了其他一些函数来代替"soup.findall()"，但仍无法找到文件名的句柄.有人知道我该怎么做吗?

This however returns an empty list. I've also tried a few other functions in place of 'soup.findall()' but still haven't been able to find a handle to the filename. Does anybody know how to do what I'm trying to do?

推荐答案

xml = '<ParameterGroup name="Experiment_22">\
<Parameter name="Data is Row Oriented" type="bool" value="1"/>\
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>\
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>\
<Parameter name="First Row" type="unsignedInteger" value="1"/>\
</ParameterGroup>'

from bs4 import BeautifulSoup
import os
soup = BeautifulSoup(xml)

for tag in soup.find_all("parameter", {'name': 'File Name'}):
    tag['value'] = os.path.join('new_dir', tag['value'])

print soup

关闭您的XML"ParameterGroup"标记.
标签的大写可能不会使用BeautifulSoup，请尝试使用小写字母parameter.
使用os.path来操纵路径，以便它可以跨平台工作.

Close your XML 'ParameterGroup' tag.
Capitalisation of tags may not work with BeautifulSoup, try parameter in lower case.
use os.path to manipulate paths so that it works cross-platforms.

这篇关于在Python中使用BeautifulSoup识别和替换XML元素的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在Python中使用BeautifulSoup识别和替换XML元素 [英] Identify and replace elements of XML using BeautifulSoup in Python

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

在Python中使用BeautifulSoup识别和替换XML元素 [英] Identify and replace elements of XML using BeautifulSoup in Python

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭