Finding a nonrecursive DOM subnode in Python using BeautifulSoup
Problem description
Is there any way to find a nonrecursive DOM subnode in Python using BeautifulSoup?
For example, consider parsing a `pom.xml` file:
```xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>
    <modelVersion>2.0.0</modelVersion>
    <groupId>com.parent.somemodule</groupId>
    <artifactId>some_module</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>Some Module</name>
    ...
```
If I want to get `groupId` at the top level (specifically `project->groupId`, not `project->parent->groupId`), I use:
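```python
from bs4 import BeautifulSoup

# html.parser lowercases tag names, hence groupid; pom is the path to pom.xml
with open(pom) as pomHandle:
    soup = BeautifulSoup(pomHandle, "html.parser")
groupId = soup.groupid.text
```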
Unfortunately, that finds the first occurrence of `groupId` in the file regardless of its level in the hierarchy, which here is `project->parent->groupId`. I want a nonrecursive search: look for `groupId` only among the direct children of a specific node, not inside their descendants. Is there a way to do that in BeautifulSoup?
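To make the problem concrete, here is a small self-contained reproduction. It assumes a trimmed inline copy of the POM (the `POM` string is hypothetical and stands in for reading `pom.xml` from disk) and the `bs4` package. Attribute-style access such as `soup.groupid` is shorthand for `soup.find("groupid")`, which searches all descendants in document order, so the nested `<groupId>` inside `<parent>` wins.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the question's POM, used instead of a file.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.parent</groupId>
    <artifactId>parent</artifactId>
    <relativePath>../pom.xml</relativePath>
  </parent>
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""

# html.parser lowercases tag names, so groupId is looked up as "groupid".
soup = BeautifulSoup(POM, "html.parser")

# soup.groupid == soup.find("groupid"): a recursive, depth-first search
# that returns the first match in document order.
first = soup.groupid.text
print(first)  # com.parent

# find_all() with the default recursive=True collects every groupid,
# whatever its nesting depth.
all_ids = [tag.text for tag in soup.find_all("groupid")]
print(all_ids)  # ['com.parent', 'com.parent.somemodule']
```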
Recommended answer
You can search inside the `project` node with `recursive=False`, which makes `find()` examine only that node's direct children:
```python
# recursive=False: only the direct children of <project> are examined
groupId = soup.project.find('groupid', recursive=False).text
```
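A runnable sketch of the same idea, again assuming a hypothetical trimmed inline copy of the POM and `html.parser` (hence the lowercase tag names). It contrasts `recursive=False` with the default recursive search and shows that `find()` returns `None` when no direct child matches, so it is worth checking before calling `.text`.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the question's POM (not read from disk).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.parent</groupId>
    <artifactId>parent</artifactId>
    <relativePath>../pom.xml</relativePath>
  </parent>
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""

soup = BeautifulSoup(POM, "html.parser")  # tag names become lowercase
project = soup.project

# recursive=False: only the direct children of <project> are examined,
# so the <groupId> nested inside <parent> is skipped.
top_level = project.find("groupid", recursive=False).text
print(top_level)  # com.parent.somemodule

# Default recursive=True: the search descends into <parent> first.
nested = project.find("groupid").text
print(nested)  # com.parent

# relativePath exists only inside <parent>, so a direct-child search
# finds nothing and returns None.
rel = project.find("relativepath", recursive=False)
print(rel)  # None
```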
Hope that helps.
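The same direct-child constraint can also be written as a CSS selector using the child combinator `>`. This is a sketch assuming a reasonably recent `bs4` (its `select()`/`select_one()` methods are backed by the `soupsieve` package, which is installed alongside bs4 since 4.7) and the same hypothetical trimmed POM string.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed copy of the question's POM.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.parent</groupId>
    <artifactId>parent</artifactId>
  </parent>
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""

soup = BeautifulSoup(POM, "html.parser")  # tag names become lowercase

# "project > groupid" matches a <groupid> whose immediate parent is
# <project>; the one under project > parent does not match.
tag = soup.select_one("project > groupid")
print(tag.text)  # com.parent.somemodule

# select() returns every match; here there is exactly one.
matches = [t.text for t in soup.select("project > groupid")]
print(matches)  # ['com.parent.somemodule']
```

The `find(..., recursive=False)` form avoids the CSS dependency; the selector form is handy when the path has several levels, e.g. `project > parent > groupid`.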