Identify and replace elements of XML using BeautifulSoup in Python


Question


I am trying to use BeautifulSoup4 to find and replace specific elements within an XML file. More specifically, I want to find all instances of 'file_name' (in the example below the file name is 'Cyp26A1_atRA_minus_tet_plus.txt') and replace each one with the full path for that document, which is saved in the 'file_name_replacement_dir' variable. My first task, the bit I'm stuck on, is to isolate the section of interest so that I can replace it using the replaceWith() method.

XML

      <ParameterGroup name="Experiment_22">
        <Parameter name="Data is Row Oriented" type="bool" value="1"/>
        <Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
        <Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
        <Parameter name="First Row" type="unsignedInteger" value="1"/>


There are actually 44 experiments using 4 different file names (11 with file name 1, 11 with file name 2, and so on), so the above snippet of XML is repeated 44 times, with a different file stored in the "File Name" line.

My code so far

from bs4 import BeautifulSoup

# Raw strings so the Windows backslashes are kept literally
xml_dir = r'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models\Model_Line_2'
xml_file_name = 'RARa_RXR_M22.cps'
xml = xml_dir + '\\' + xml_file_name
file_name_replacement_dir = r'D:\MPhil\Model_Building\Models\Retinoic_acid\[06]\RAR_Models'
soup = BeautifulSoup(open(xml))
print(soup.find_all('parametergroup name="Experiment_22"'))  # prints []


This, however, returns an empty list. I've also tried a few other functions in place of soup.find_all(), but still haven't been able to get a handle on the file name. Does anybody know how to do what I'm trying to do?
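The empty list comes from how find_all() reads its first argument: it is matched against tag names, so the whole string 'parametergroup name="Experiment_22"' is treated as the name of one tag, and no such tag exists. Attributes are passed separately, for example through the attrs dict. A minimal sketch, assuming a shortened copy of the snippet above (closed with </ParameterGroup>) parsed with 'html.parser', which lowercases tag names:

```python
from bs4 import BeautifulSoup

xml = '''<ParameterGroup name="Experiment_22">
<Parameter name="Data is Row Oriented" type="bool" value="1"/>
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
</ParameterGroup>'''

# html.parser lowercases tag names: ParameterGroup -> parametergroup
soup = BeautifulSoup(xml, 'html.parser')

# The whole string is taken as a tag name, so nothing matches
print(soup.find_all('parametergroup name="Experiment_22"'))  # []

# Tag name and attribute filters are given separately
group = soup.find('parametergroup', attrs={'name': 'Experiment_22'})
file_param = group.find('parameter', attrs={'name': 'File Name'})
print(file_param['value'])  # Cyp26A1_atRA_minus_tet_plus.txt
```

Because the file name lives in an attribute rather than in the tag's text, it can be changed by assigning to file_param['value'] instead of using replaceWith().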

Answer

from bs4 import BeautifulSoup
import os

xml = '''<ParameterGroup name="Experiment_22">
<Parameter name="Data is Row Oriented" type="bool" value="1"/>
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
<Parameter name="First Row" type="unsignedInteger" value="1"/>
</ParameterGroup>'''

# html.parser lowercases tag names, so search for "parameter", not "Parameter"
soup = BeautifulSoup(xml, 'html.parser')

# Prefix every "File Name" value with the new directory
for tag in soup.find_all('parameter', {'name': 'File Name'}):
    tag['value'] = os.path.join('new_dir', tag['value'])

print(soup)

  • Close your XML 'ParameterGroup' tag.
  • When the XML is parsed with an HTML parser (the default), BeautifulSoup lowercases tag names, so search for parameter in lower case.
  • Use os.path to manipulate paths so that the code works across platforms.
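If the modified file has to be written back for another program to read, the lowercasing can be a problem, because XML tag names are case-sensitive. One alternative, swapped in here and not part of the answer above, is the standard library's xml.etree.ElementTree, which keeps tag names as written. A minimal sketch, assuming the input is well-formed XML without a default namespace (with an xmlns declaration, the tag names would need to be namespace-qualified); new_dir is a hypothetical placeholder for the replacement directory:

```python
import os
import xml.etree.ElementTree as ET

xml = '''<ParameterGroup name="Experiment_22">
<Parameter name="Data is Row Oriented" type="bool" value="1"/>
<Parameter name="Experiment Type" type="unsignedInteger" value="0"/>
<Parameter name="File Name" type="file" value="Cyp26A1_atRA_minus_tet_plus.txt"/>
<Parameter name="First Row" type="unsignedInteger" value="1"/>
</ParameterGroup>'''

# Hypothetical placeholder for the directory that holds the data files
new_dir = 'new_dir'

root = ET.fromstring(xml)

# iter() visits the root and all descendants; tag names keep their original case
for param in root.iter('Parameter'):
    if param.get('name') == 'File Name':
        param.set('value', os.path.join(new_dir, param.get('value')))

result = ET.tostring(root, encoding='unicode')
print(result)
```

For a file on disk, the same loop works on tree = ET.parse(path) and root = tree.getroot(), followed by tree.write(path) to save the changes.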