Python requests.get(url) returning JavaScript code instead of the page HTML

Question

I have a very simple problem. I'm trying to get the job description from the HTML of a LinkedIn page, but instead of the page's HTML I get a few lines that look like JavaScript code. I'm very new to this, so any help will be greatly appreciated! Thanks

Here is my code:

import requests
url = "https://www.linkedin.com/jobs/view/inside-sales-manager-at-stericycle-1089095836/"
page_html = requests.get(url).text
print(page_html)

When I run this, I don't get the HTML I expect, containing the job description. I just get a few lines of JavaScript code instead.

Recommended answer

Some websites serve different content depending on the client that requests the page, and LinkedIn is a good example. When the client is a browser with advanced capabilities, the site may serve "richer" content that is more dynamic and styled, typically built by JavaScript after the page loads. A plain HTTP client such as requests does not execute JavaScript, so it only receives the initial script and never sees the rendered page.
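To check whether a response is one of these script-only pages, a rough heuristic is to compare how much readable text it holds against how much script code. The sketch below uses only the standard library; `looks_like_js_shell`, the 200-character threshold, and both sample strings are made up for illustration and are not taken from LinkedIn.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Count characters of readable text versus characters inside <script>."""

    HIDDEN_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._hidden_tag = None  # name of the <script>/<style> we are inside
        self.visible_chars = 0
        self.script_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_tag = tag

    def handle_endtag(self, tag):
        if tag == self._hidden_tag:
            self._hidden_tag = None

    def handle_data(self, data):
        text = data.strip()
        if self._hidden_tag == "script":
            self.script_chars += len(text)
        elif self._hidden_tag is None:
            self.visible_chars += len(text)


def looks_like_js_shell(html, min_visible_chars=200):
    """Heuristic: True when the page has script code but little readable text."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_chars > 0 and counter.visible_chars < min_visible_chars


# Made-up samples: a script-only page and a page with real content.
shell_page = (
    "<html><head><script>window.location.href='/login';</script></head>"
    "<body></body></html>"
)
content_page = (
    "<html><body><h1>Inside Sales Manager</h1><p>"
    + "Job description text. " * 20
    + "</p></body></html>"
)

print(looks_like_js_shell(shell_page))
print(looks_like_js_shell(content_page))
```

If this returns True for `requests.get(url).text`, that suggests the page needs a real browser to render it. The threshold is a guess and will need tuning per site.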

To solve this problem, follow these steps:

  1. Download ChromeDriver. Choose the build that matches your OS and your installed Chrome version.
  2. Extract the driver and put it in a directory of your choice, for example /usr.
  3. Install Selenium, a Python module, by running pip install selenium. This answer originally said Selenium depends on another package called msgpack and that you should run pip install msgpack first; that is probably unnecessary with current Selenium releases (worth verifying for your installed version), though installing it does no harm.
  4. Now we are ready to run the following code.
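Before running the Selenium code, it can help to confirm that the driver from step 2 really is where you think it is. This is a small sketch; `resolve_driver_path` is a hypothetical helper, not part of Selenium. It checks an explicit path if you pass one and otherwise searches PATH for an executable named chromedriver.

```python
import shutil
from pathlib import Path


def resolve_driver_path(candidate=None):
    """Return a usable chromedriver path, or raise FileNotFoundError."""
    if candidate is not None:
        path = Path(candidate)
        if path.is_file():
            return str(path)
        raise FileNotFoundError(f"No chromedriver found at {candidate}")
    # No explicit path given: search the directories listed in PATH.
    found = shutil.which("chromedriver")
    if found is None:
        raise FileNotFoundError("chromedriver was not found on PATH")
    return found
```

For example, calling resolve_driver_path('/usr/chromedriver') either returns that path or fails right away with a clear error, which is easier to debug than a failure inside the browser startup.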

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service


def create_browser(webdriver_path):
    # Create a Selenium-driven Chrome instance that behaves like a real browser.
    browser_options = Options()
    # --headless runs Chrome without a visible window.
    browser_options.add_argument("--headless")
    browser_options.add_argument("--no-sandbox")
    # Selenium 4 style: the driver path goes into a Service object and the
    # options are passed as options= (the older chrome_options= is gone).
    browser = webdriver.Chrome(service=Service(webdriver_path), options=browser_options)
    print("Done Creating Browser")
    return browser


url = "https://www.linkedin.com/jobs/view/inside-sales-manager-at-stericycle-1089095836/"
browser = create_browser('/usr/chromedriver')  # change this to your own chromedriver path
browser.get(url)
page_html = browser.page_source
browser.quit()  # close Chrome and stop the chromedriver process
print(page_html[-10:])  # prints dy></html>

Now you have the whole page. I hope this answers your question!
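Once you have the rendered HTML, you still need to pull the job description out of it. Here is a minimal sketch using only the standard library. The class name "description" and the sample markup are assumptions for illustration, so inspect the real page to find the element that holds the text. `extract_text_by_class` is a hypothetical helper.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside the first-level element with a given class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self._depth = 0  # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth > 0:
            self._depth += 1
        else:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text_by_class(html, target_class):
    """Return the text found inside elements carrying target_class."""
    parser = ClassTextExtractor(target_class)
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Made-up markup standing in for page_html.
sample_html = (
    '<html><body><h1>Inside Sales Manager</h1>'
    '<div class="description"><p>Manage the inside sales team.</p>'
    '<ul><li>Coach reps</li><li>Track quotas</li></ul></div>'
    '<footer>Apply now</footer></body></html>'
)

print(extract_text_by_class(sample_html, "description"))
# prints: Manage the inside sales team. Coach reps Track quotas
```

This simple depth counter assumes well-formed markup with every tag closed. For real pages, a tolerant parser such as BeautifulSoup is likely the more robust choice.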
