Can't scrape three fields from a table with complicated layout
Problem description
I've created a script in Python using Selenium to parse three fields (franking credit, gross dividend and further information) from a table on a website. The last two fields are revealed only when the browser clicks a circular yellow button with a plus sign in it.
When the buttons are clicked, they turn red, which indicates that the information has been displayed.
My script can click all the buttons, but it can't scrape the three fields from that table.
I've attached an image to show what it really looks like.
I know that if I send a POST request with the relevant payload to https://www.sharedividends.com.au/wp-content/custom/ajaxfile.php?code=MLT, I can get all the tabular fields as JSON, but that is not how I want to solve this.
I've tried:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.sharedividends.com.au/mlt-dividend-history/"

driver = webdriver.Chrome()
driver.get(url)
table = driver.find_element_by_css_selector("#divTable")
driver.execute_script("arguments[0].scrollIntoView();", table)
for items in driver.find_elements_by_css_selector("td.sorting_1"):
    driver.execute_script("arguments[0].scrollIntoView();", items)
    items.click()
for elems in driver.find_elements_by_css_selector("#divTable tbody tr"):
    franking_credit = elems.find_elements_by_css_selector("td")[5].text
    gross_divident = elems.find_elements_by_css_selector("td")[6].text
    further_info = elems.find_elements_by_css_selector("td")[7].text
    print(franking_credit, gross_divident, further_info)
driver.quit()
When I run the above script, it throws the error IndexError: list index out of range, pointing at the franking_credit = line.
This is what the table looks like. In the image below, I've marked the three fields in that table that I'm interested in.
How can I parse the three fields from that table?
Recommended answer
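You are getting this error because, when the automation script runs, the table contains 20 rows instead of 10, and the additional rows carry different attributes. These are presumably the rows added when the buttons are clicked; they do not hold the full set of td cells, so indexing them with [5] fails. Restricting the selector to tr[role='row'] skips them. The last two fields are read with get_attribute('textContent') rather than .text, since .text returns only visible text. Try the following code.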
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://www.sharedividends.com.au/mlt-dividend-history/"

driver = webdriver.Chrome()
driver.get(url)
# find_element(By.CSS_SELECTOR, ...) works in Selenium 3 and 4;
# the find_element_by_* helpers were removed in Selenium 4.3
table = driver.find_element(By.CSS_SELECTOR, "#divTable")
driver.execute_script("arguments[0].scrollIntoView();", table)
# Click every button cell to reveal the hidden fields
for items in driver.find_elements(By.CSS_SELECTOR, "td.sorting_1"):
    driver.execute_script("arguments[0].scrollIntoView();", items)
    items.click()
# Select only rows with role='row' so the extra rows are skipped
for elems in driver.find_elements(By.CSS_SELECTOR, "#divTable tbody tr[role='row']"):
    cells = elems.find_elements(By.CSS_SELECTOR, "td")
    franking_credit = cells[5].text
    # textContent also returns text that is not rendered, unlike .text
    gross_divident = cells[6].get_attribute('textContent')
    further_info = cells[7].get_attribute('textContent')
    print(franking_credit, gross_divident, further_info)
driver.quit()
Output on the console:
$ 0.0446 $ 0.1486 10.4C FRANKED @ 30%; DRP NIL DISCOUNT
$ 0.0107 $ 0.0357 2.5C FRANKED@30%; SP ECIAL; DRP SUSP
$ 0.0386 $ 0.1286 9C FRANKED @ 30%; DR P NIL DISCOUNT
$ 0.0437 $ 0.1457 10.2C FRANKED @ 30%; DRP NIL DISCOUNT
$ 0.0377 $ 0.1257 8.8C FRANKED @ 30%; DRP NIL DISCOUNT
$ 0.0429 $ 0.1429 10C FRANKED @ 30%; D RP NIL DISCOUNT
$ 0.0373 $ 0.1243 8.7C FRANKED @ 30%; DRP NIL DISCOUNT
$ 0.0424 $ 0.1414 9.9C FRANKED @ 30%; DRP NIL DISCOUNT
$ 0.0373 $ 0.1243 8.7C FRANKED @ 30%; DRP
$ 0.0441 $ 0.1471 10.3C FR@30%;0.4C SP ECIAL;DRP;NIL DIS
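The row-filtering idea behind the tr[role='row'] selector can be tried without a browser. The following is only a sketch under stated assumptions: SAMPLE is hypothetical markup (not the real page source) with one eight-cell data row followed by a short extra row, and RowCollector and extract_fields are illustrative names, not part of Selenium. It shows that indexing cells 5 to 7 is safe only for rows that actually have eight cells.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT copied from the real page: one data row with
# eight cells, followed by a short extra row like the ones that appear
# after a button is clicked.
SAMPLE = """
<table id="divTable"><tbody>
<tr role="row"><td>+</td><td>a</td><td>b</td><td>c</td><td>d</td>
<td>$ 0.0446</td><td>$ 0.1486</td><td>10.4C FRANKED @ 30%</td></tr>
<tr class="child"><td colspan="6">extra details</td></tr>
</tbody></table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_td = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        # Only text inside a <td> is kept
        if self._in_td:
            self.rows[-1][-1] += data


def extract_fields(html):
    """Return (franking credit, gross dividend, further info) per data row."""
    parser = RowCollector()
    parser.feed(html)
    # Rows with fewer than eight cells cannot be indexed at 5..7, so skip them
    return [
        tuple(cell.strip() for cell in cells[5:8])
        for cells in parser.rows
        if len(cells) >= 8
    ]


print(extract_fields(SAMPLE))
```

In the Selenium script, the same guard could be written as a check on the number of td cells in each row before indexing, as an alternative to filtering by role='row'.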