BeautifulSoup find and find_all not working as expected

Question

I have just started using BeautifulSoup and have run into a problem. I set up the HTML snippet below and created a BeautifulSoup object from it:

from bs4 import BeautifulSoup  # assuming Beautiful Soup 4; the original post does not show its import

html_snippet = '<p class="course"><span class="text84">Ae 100. Research in Aerospace. </span><span class="text85">Units to be arranged in accordance with work accomplished. </span><span class="text83">Open to suitably qualified undergraduates and first-year graduate students under the direction of the staff. Credit is based on the satisfactory completion of a substantive research report, which must be approved by the Ae 100 adviser and by the option representative. </span> </p>'
subject = BeautifulSoup(html_snippet, 'html.parser')

I have tried several find and find_all calls like the ones below, but all I get back is None or an empty list:

subject.find(text='A')
subject.find(text='Research')
subject.next_element.find('A')
subject.find_all(text='A')

Earlier, when I created the BeautifulSoup object from an HTML file on my computer, find and find_all worked fine. The problems started when I took html_snippet from a web page that I read online through urllib2.

Can anyone point out where the issue is?

Recommended answer

Pass the argument like this:

import re
subject.find(text=re.compile('A'))

By default, a plain string passed to the text filter must match a text node's entire content. Passing in a regular expression lets you match on fragments.
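As a minimal sketch of that difference, the example below runs the same kind of search with a plain string and with a compiled pattern. It assumes Beautiful Soup 4 with the built-in html.parser and uses a shortened version of the snippet from the question; the variable names are only illustrative. Recent Beautiful Soup 4 releases also accept this filter under the name string.

```python
import re
from bs4 import BeautifulSoup

# Shortened version of the snippet from the question (trailing spaces removed).
html = (
    '<p class="course">'
    '<span class="text84">Ae 100. Research in Aerospace.</span>'
    '<span class="text85">Units to be arranged.</span>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# A plain string only matches a text node that is equal to it in full.
exact_miss = soup.find(text="Research")              # None
exact_hit = soup.find(text="Units to be arranged.")  # the whole node matches

# A compiled pattern is searched for anywhere inside each text node.
partial = soup.find(text=re.compile("Research"))

print(exact_miss)  # None
print(exact_hit)   # Units to be arranged.
print(partial)     # Ae 100. Research in Aerospace.
```

The value returned by a text search is the matching string itself, so its parent attribute can be used to get back to the enclosing tag.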

To match only text nodes that begin with A, you can use the following:

subject.find(text=re.compile('^A'))

To match only text nodes that contain a word beginning with A, you can use:

subject.find_all(text=re.compile(r'\bA'))
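To compare the two patterns side by side, here is a small sketch, again assuming Beautiful Soup 4 with html.parser. The three spans are abbreviated from the question's snippet (the third is reworded so that it does not start with A), and the variable names are hypothetical.

```python
import re
from bs4 import BeautifulSoup

html = (
    '<p class="course">'
    '<span class="text84">Ae 100. Research in Aerospace.</span>'
    '<span class="text85">Units to be arranged in accordance with work accomplished.</span>'
    '<span class="text83">Must be approved by the Ae 100 adviser.</span>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# '^A' anchors the match at the start of each text node.
starts_with_a = [str(s) for s in soup.find_all(text=re.compile(r"^A"))]

# r'\bA' matches a capital A at any word boundary inside the text node.
word_with_a = [str(s) for s in soup.find_all(text=re.compile(r"\bA"))]

print(starts_with_a)
# ['Ae 100. Research in Aerospace.']
print(word_with_a)
# ['Ae 100. Research in Aerospace.', 'Must be approved by the Ae 100 adviser.']
```

Both patterns are case-sensitive, which is why the second span, with its lowercase "arranged" and "accordance", is not matched. Passing re.IGNORECASE to re.compile would change that.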

It's difficult to tell more specifically what you're looking for, so let me know if I've misinterpreted what you're asking.
