HTML does not reflect webpage content in browser for Beautiful Soup

Question

I am trying to scrape content from a website using Beautiful Soup. When doing some testing, I get the following output (this is just the last bit at the end):

<!-- 6. Load the app --> 
 <my-app>
    Loading...
 </my-app>

</body>

</html>

The "Loading..." part is where the content I want should be. Why is the HTML not loading for this? The same thing happens if I view source in Google. How can I scrape the page if I cannot see the code?

The page in question is:

https://searchusan.ama-assn.org/finder/usan/search/*/relevant/1

Thanks.

Recommended answer

Beautiful Soup only sees the HTML that the server returns on the initial request. Unfortunately, the page you are trying to scrape uses JavaScript to render the information you want after that initial load, so it never appears in the HTML you parse. JavaScript always creates problems for Beautiful Soup, and the only pure Beautiful Soup solution I got to work with JavaScript was frightfully hairy, slow, and crash/hang prone.
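
To illustrate the point, here is a minimal sketch that parses the fragment from the question with Beautiful Soup. It assumes the `bs4` package is installed and wraps the fragment in minimal `<html>`/`<body>` tags; the variable names are made up for illustration. The `<my-app>` element contains nothing but the placeholder text, because the data is only added later by JavaScript running in a browser.

```python
from bs4 import BeautifulSoup

# Tail of the HTML returned by the server, copied from the question
# (the opening <html> and <body> tags are added here for completeness).
STATIC_HTML = """
<html>
<body>
<!-- 6. Load the app -->
 <my-app>
    Loading...
 </my-app>
</body>
</html>
"""

soup = BeautifulSoup(STATIC_HTML, "html.parser")
app = soup.find("my-app")

# The element holds only the placeholder text and has no child tags:
# the real content has not been rendered yet.
placeholder = app.get_text(strip=True)
child_tags = app.find_all(True)

print(placeholder)  # Loading...
print(child_tags)   # []
```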

I recommend you use a tool like Selenium together with Beautiful Soup. Selenium drives a real browser, which allows the entire page to load before you parse it.

Here is an example: Python scraping JavaScript using Selenium and Beautiful Soup
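
A minimal sketch of the Selenium plus Beautiful Soup approach recommended above, not the code from the referenced example. It assumes Selenium 4.6 or later (which can locate a Chrome driver on its own), a local Chrome installation, and the `bs4` package. The function names are hypothetical, and the wait condition assumes the page replaces the "Loading..." text inside `<my-app>` once its JavaScript has finished, which may not hold for every site.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, placeholder="Loading...", timeout=20):
    """Load url in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: the app swaps the placeholder text for real content.
        # Wait until <my-app> shows something other than the placeholder.
        WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.TAG_NAME, "my-app").text.strip()
            not in ("", placeholder)
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


def extract_app_text(html):
    """Return the visible text inside <my-app>, or None if the tag is missing."""
    soup = BeautifulSoup(html, "html.parser")
    app = soup.find("my-app")
    return app.get_text(" ", strip=True) if app else None
```

Calling `extract_app_text(fetch_rendered_html(url))` with the page's URL would then hand Beautiful Soup the rendered HTML instead of the placeholder. From there the usual `find` and `find_all` calls can pick out the specific elements you need.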
