Getting text from the last hyperlink tag
Problem description
I'm trying to access hyperlink texts that are always stored as the last hyperlink tag nested in each of 100 div tags on a certain website. In the example below, "00A17" is what I'm trying to extract from a single div.
HTML code:
<div class="headlineText">
<a class="mrnum" title="Full MathSciNet Item" href="/mathscinet/search/publdoc.html?arg3=&co4=AND&co5=AND&co6=AND&co7=AND&dr=all&extend=1&l=100&pg4=AUCN&pg5=TI&pg6=PC&pg7=ALLF&pg8=ET&review_format=html&s4=&s5=&s6=&s7=%22Featured%20Review%22&s8=Journals&sort=Newest&vfpref=html&yearRangeFirst=&yearRangeSecond=&yrop=eq&r=9&mx-pid=2650657"><strong>MR2650657</strong></a>
<a class="item_status" href="/mathscinet/help/fullitem_help_full.html#indexed"><span>Indexed</span></a>
<a href="/mathscinet/search/author.html?mrauthid=115370">Logan, J. David</a> <span class="title"><span class="searchHighlight">Featured review</span>: <span class="it">Introduction to the foundations of applied mathematics</span> [<a href="/mathscinet/search/publdoc.html?pg1=MR&s1=2526777&loc=fromrevtext">MR2526777</a>].</span>
<a href="/mathscinet/search/journaldoc.html?id=5174"> <em>SIAM Rev.</em></a> <a href="/mathscinet/search/publications.html?pg1=ISSI&s1=281478"> 52 </a>
<a href="/mathscinet/search/publications.html?pg1=ISSI&s1=281478"> (2010), </a>
<a href="/mathscinet/search/publications.html?pg1=ISSI&s1=281478"> no. 1,</a> 173–178.
<a href="/mathscinet/search/mscdoc.html?code=00A17">00A17</a>
</div>
The code I wrote to attempt this is basically a mess.
Python code for remote access:
headlineTexts = []
mscLinks = []
for x in range(2,102):
    headlineTexts.extend(driver.find_elements_by_xpath("//*[@id='content']/form/div[3]/div[2]/div/div/div[%d]/div[2]"%x))
    mscLinks.extend(headlineTexts[x-2].find_elements_by_tag_name("a")[last()])
Essentially, there are 100 "headlineText" div tags (indexed from 2 to 101), and I need to get the aforementioned hyperlink text out of each one (hence the iteration). I'm creating a list of headlineText elements, and then for each headlineText element I'm attempting to extract the last subelement with the tag name "a". Unfortunately, when I run this I get
TypeError: 'WebElement' object is not iterable
which is strange to me, because I'm only using the plural find_elements() and that should return an iterable list. Am I perhaps using last() incorrectly?
Recommended answer
To retrieve the hyperlink text that is always stored as the last hyperlink tag nested in each of the 100 div tags on the website (e.g. 00A17), you can use the following line of code:
print(driver.find_element_by_xpath("//div[@class='headlineText']/a[last()]").get_attribute("innerHTML"))
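As for the error in the question: last() is an XPath function, not a Python one, so it belongs inside the XPath string; in Python the last item of a list is index -1. The TypeError itself comes from list.extend(), which expects an iterable, so passing it a single WebElement fails; append() is the call for a single item. The sketch below illustrates both points without a browser, using the standard library's xml.etree.ElementTree on a simplified, well-formed stand-in for the page. The PAGE markup, the second div and its values, and the variable names are assumptions made for illustration, not part of the original site.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed stand-in for the results page (hypothetical markup;
# the second div and its values are made up for illustration).
PAGE = """
<div id="content">
  <div class="headlineText">
    <a class="mrnum" href="#"><strong>MR2650657</strong></a>
    <span class="title">Featured review [<a href="#">MR2526777</a>].</span>
    <a href="#">00A17</a>
  </div>
  <div class="headlineText">
    <a class="mrnum" href="#"><strong>MR0000001</strong></a>
    <a href="#">35Q92</a>
  </div>
</div>
"""

root = ET.fromstring(PAGE)

# "/a[last()]" keeps only the last <a> that is a direct child of each div,
# so the link nested inside span.title is never picked up.
last_links = root.findall(".//div[@class='headlineText']/a[last()]")
msc_codes = [link.text for link in last_links]
print(msc_codes)

# Plain-Python equivalent: index -1 instead of last(), append instead of extend.
msc_codes_alt = []
for div in root.findall(".//div[@class='headlineText']"):
    anchors = div.findall("a")              # direct <a> children only
    msc_codes_alt.append(anchors[-1].text)  # a single item, so append
print(msc_codes_alt)
```

With Selenium, the same XPath should work in one call that returns every match, for example driver.find_elements(By.XPATH, "//div[@class='headlineText']/a[last()]") in Selenium 4, followed by reading .text on each element. That removes the need for the index loop from 2 to 101.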