Beautiful Soup .find can't find anything
Question
I am trying to scrape posts in a Facebook group:
import requests
from bs4 import BeautifulSoup

URL = 'https://www.facebook.com/groups/110354088989367/'
headers = {
    "User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36'
}

def checkSubletGroup():
    page = requests.get(URL, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')
    posts = soup.find_all("div", {"class_": "text_exposed_root"})
    print(soup.prettify())
    for post in posts:
        print(post)

checkSubletGroup()
The div with class="text_exposed_root" is clearly there, because I can find it with Ctrl+F when I search the output of print(soup.prettify()). But soup.find_all("div", {"class_": "text_exposed_root"}) returns an empty list, and the same happens with many other class names that are clearly present.
Please help.
Recommended answer
The problem is that all those <div> elements are inside a commented-out HTML block.
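A minimal sketch of the effect, using a made-up HTML snippet rather than Facebook's actual markup (the `html` string below is an assumption for illustration). Beautiful Soup parses a commented-out block as a single `Comment` text node, so the tags inside it never become elements in the tree:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical markup: the div of interest sits inside an HTML comment.
html = """
<div class="visible">shown</div>
<code><!-- <div class="text_exposed_root">hidden post</div> --></code>
"""

soup = BeautifulSoup(html, "html.parser")

# The commented-out div is not an element, so find_all does not see it.
found = soup.find_all("div", class_="text_exposed_root")

# The comment is still in the tree, as a text-like Comment node.
comments = soup.find_all(string=lambda s: isinstance(s, Comment))
```

This would also explain why the class name shows up in the `prettify()` output: the comment is printed back verbatim, so a text search finds it even though no matching element exists.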
Something like this can work around the issue:
soup = BeautifulSoup(page.text.replace('<!--', '').replace('-->', ''), 'html.parser')
After that you can simply do:
posts = soup.find_all('div', 'text_exposed_root')
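One further detail, sketched here on a made-up snippet (the `html` string is hypothetical): the call in the question passes `{"class_": ...}` as an attribute dictionary, which looks for an attribute literally named `class_`. The trailing underscore belongs only to the keyword argument `class_=`; inside a dictionary the key is plain `class`. So that call would still return nothing even after the comment markers are stripped, whereas the forms below should match:

```python
from bs4 import BeautifulSoup

# Hypothetical page text with the post wrapped in an HTML comment.
html = '<code><!-- <div class="text_exposed_root">post</div> --></code>'

# Workaround from the answer: strip the comment markers before parsing.
soup = BeautifulSoup(html.replace("<!--", "").replace("-->", ""), "html.parser")

# A bare string as the second argument is matched against the CSS class.
by_position = soup.find_all("div", "text_exposed_root")
# Keyword form: class_ avoids the reserved word "class".
by_keyword = soup.find_all("div", class_="text_exposed_root")
# Dictionary form: the key is the real attribute name, "class".
by_dict = soup.find_all("div", {"class": "text_exposed_root"})
# Form used in the question: searches for an attribute named "class_".
wrong_key = soup.find_all("div", {"class_": "text_exposed_root"})
```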
Hope that helps.