Python PhantomJS not loading the webpage correctly


Problem description

I have an issue: extracting content from the link passed to getURLS in the code below returns data from the main page instead: http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=all

Any idea why this is occurring? I am using PhantomJS, Selenium, and Beautiful Soup to assist me with this.

# The standard library modules
import os
import sys
import re
import sqlite3
import locale
# The wget module
import wget
import time
import calendar
from datetime import datetime
# The BeautifulSoup module
from bs4 import BeautifulSoup

# The selenium module
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By


def getURLS(url):
    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
    driver.get(url) # load the web page
    src = driver.page_source
    #Get text and split it
    soup = BeautifulSoup(src, 'html5lib')

    print(soup)

link = 'http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=FA&sub_category=FA1&alphabetical=All&company=5250'
getURLS(link)

Alex Lucaci's solution

from selenium.webdriver.support.ui import Select

def getURLS(url):
    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
    driver.get(url) # load the web page
    # Pick the category, sub-category and company from the dropdowns
    category_select = Select(driver.find_element_by_xpath('//*[@id="bm_announcement_types"]'))
    category_select.select_by_visible_text("Financial Results")
    category_select2 = Select(driver.find_element_by_xpath('//*[@id="bm_sub_announcement_types"]'))
    category_select2.select_by_visible_text("Financial Results")
    category_select3 = Select(driver.find_element_by_xpath('//*[@id="bm_company_list"]'))
    category_select3.select_by_visible_text("7-ELEVEN MALAYSIA HOLDINGS BERHAD (5250)")
    # Submit the search form
    driver.find_element_by_xpath('//*[@id="bm_company_announcements_search_form"]/input[1]').click()
    time.sleep(5) # wait a few seconds for the results to load
    src = driver.page_source
    soup = BeautifulSoup(src, 'html5lib')
    return soup

# These two lines belong outside the function; indented, they made getURLS call itself
link = "http://www.bursamalaysia.com/market/listed-companies/company-announcements/#/?category=all"
getURLS(link)

Recommended answer

When you save the source, the page has not finished loading the results for your submitted request, so try waiting a couple of seconds before fetching the page source:

def getURLS(url):
    driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
    driver.get(url) # load the web page
    time.sleep(5) # wait 5 seconds before fetching the source
    src = driver.page_source
    # Parse the rendered HTML
    soup = BeautifulSoup(src, 'html5lib')

    print(soup)
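One likely reason the main page comes back (an assumption, not something confirmed for this site): everything after the # in a URL is a fragment, and a fragment is not sent to the server. The server would then only see the base announcements URL, and the filters would be applied later by JavaScript in the browser, which is why the page needs time to finish loading. A minimal Python 3 sketch that splits the link from the question this way; the helper name split_client_side_state is hypothetical:

```python
from urllib.parse import parse_qs, urldefrag


def split_client_side_state(url):
    """Return (url_sent_to_server, filters_kept_in_fragment).

    Hypothetical helper for illustration: it only parses the URL string
    and makes no network request.
    """
    server_url, fragment = urldefrag(url)
    # The fragment here looks like "/?category=FA&..."; keep what follows "?".
    _, _, fragment_query = fragment.partition("?")
    filters = {key: values[0] for key, values in parse_qs(fragment_query).items()}
    return server_url, filters


link = ('http://www.bursamalaysia.com/market/listed-companies/'
        'company-announcements/#/?category=FA&sub_category=FA1'
        '&alphabetical=All&company=5250')
server_url, filters = split_client_side_state(link)
print(server_url)
print(filters)
```

For the link in the question, the first value is the plain company-announcements URL and the second holds the category, sub_category, alphabetical and company filters, none of which reach the server in the initial request.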

To select from a dropdown, import the Select class with from selenium.webdriver.support.ui import Select, then select the dropdown element like this:

category_select = Select(driver.find_element_by_xpath('//*[@id="bm_announcement_types"]'))
category_select.select_by_visible_text('Financial Results')

In my example I've done it for the -Category- dropdown; follow the same steps for each of the other dropdowns. Note that selecting the dropdown by XPath is the best way, and you can get the XPath in Google Chrome: right-click the element -> Inspect -> right-click the <select> in the panel that appears -> Copy -> Copy XPath
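Repeating the same steps for every dropdown can be written as a loop. This is only a sketch under assumptions: the element ids and visible texts are the ones that appear in the question's code, the names xpath_for_id, DROPDOWN_CHOICES and select_all are hypothetical, and find_element_by_xpath is the Selenium 3 style call used throughout this thread. The Select class is passed in as a parameter so the helper can be tried without a browser.

```python
def xpath_for_id(element_id):
    """Build the same kind of XPath that Chrome's "Copy XPath" gives for an id."""
    return '//*[@id="%s"]' % element_id


# (element id, visible option text) pairs taken from the question's code.
DROPDOWN_CHOICES = [
    ("bm_announcement_types", "Financial Results"),
    ("bm_sub_announcement_types", "Financial Results"),
    ("bm_company_list", "7-ELEVEN MALAYSIA HOLDINGS BERHAD (5250)"),
]


def select_all(driver, choices, select_cls):
    """Pick one option in each dropdown, in the order given.

    select_cls is expected to behave like selenium's Select:
    select_cls(element).select_by_visible_text(text).
    """
    for element_id, visible_text in choices:
        element = driver.find_element_by_xpath(xpath_for_id(element_id))
        select_cls(element).select_by_visible_text(visible_text)
```

With a real driver this would be called as select_all(driver, DROPDOWN_CHOICES, Select) after importing Select from selenium.webdriver.support.ui.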

Once you've selected all the elements, click Submit and wait a couple of seconds for the results to load; after that, fetch the page source.
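A fixed wait can be too short on a slow connection and wastes time on a fast one. One alternative, sketched here rather than taken from the answer, is to poll until some condition shows that the results have loaded. The helper name wait_until and its parameters are hypothetical, and the condition is whatever check suits the page.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises RuntimeError once `timeout` seconds
    have passed without the condition being met.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.time() >= deadline:
            raise RuntimeError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)
```

With a driver, the condition could be a lambda that looks for an element that only exists once the search results are rendered, with page_source fetched after wait_until returns. Selenium's own WebDriverWait together with expected_conditions, both already imported in the question's code, follows the same polling idea.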

Let me know if my answer helped you.
