How to get the text from a <p> tag using XPath, Selenium and Python

Question

I need to capture one line of text inside a <p> using XPath. I want to store the text Content-type: text/plain; charset=us-ascii in a Python variable, but I get the following error:

selenium.common.exceptions.WebDriverException: Message: TypeError: Expected an element or WindowProxy, got: [object Text] {}

This is the code I tried:

import selenium.webdriver as webdriver

browser = webdriver.Firefox()
browser.get('https://www.w3.org/Protocols/rfc1341/7_1_Text.html')

foo = browser.find_element_by_xpath('/html/body/p[5]/text()')
print(foo)

This is the relevant HTML from the page:

<h1>7.1  The Text Content-Type</h1>
<p>
The text Content-Type is intended for sending material which
is  principally textual in form.  It is the default Content-
Type.  A "charset" parameter may be  used  to  indicate  the
character set of the body text.  The primary subtype of text
is "plain".  This indicates plain (unformatted)  text.   The
default  Content-Type  for  Internet  mail  is  "text/plain;
charset=us-ascii".
<p>
Beyond plain text, there are many formats  for  representing
what might be known as "extended text" -- text with embedded
formatting and  presentation  information.   An  interesting
characteristic of many such representations is that they are
to some extent  readable  even  without  the  software  that
interprets  them.   It is useful, then, to distinguish them,
at the highest level, from such unreadable data  as  images,
audio,  or  text  represented in an unreadable form.  In the
absence  of  appropriate  interpretation  software,  it   is
reasonable to show subtypes of text to the user, while it is
not reasonable to do so with most nontextual data.
<p>
Such formatted textual  data  should  be  represented  using
subtypes  of text.  Plausible subtypes of text are typically
given by the common name of the representation format, e.g.,
"text/richtext".
<p>
<h3>7.1.1     The charset parameter</h3>
<p>
A critical parameter that may be specified in  the  Content-
Type  field  for  text  data  is the character set.  This is
specified with a "charset" parameter, as in:
<p>
     Content-type: text/plain; charset=us-ascii
<p>
Unlike some  other  parameter  values,  the  values  of  the
charset  parameter  are  NOT  case  sensitive.   The default
character set, which must be assumed in  the  absence  of  a
charset parameter, is US-ASCII.

Recommended answer

To print the text Content-type: text/plain; charset=us-ascii, induce WebDriverWait for visibility_of_element_located() and use either of the following locator strategies:

  • Using XPATH and the text attribute:

    driver.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[contains(., 'The charset parameter')]//following-sibling::p[2]"))).text)

  • Using XPATH and get_attribute():

    driver.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//h3[contains(., 'The charset parameter')]//following-sibling::p[2]"))).get_attribute("innerHTML"))
    

  • Console Output:

    Content-type: text/plain; charset=us-ascii
    

  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
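
The error in the question arises because the XPath ends in text(), which selects a Text node, while Selenium's element lookups can only return elements. The strategies above avoid this by locating the <p> element and reading its text. If you would rather keep an XPath that ends in text(), one possible alternative (not part of the original answer) is to evaluate the expression in the browser with document.evaluate and return its string value. The sketch below assumes a driver object that exposes execute_script(); read_xpath_string and demo are hypothetical helper names, not Selenium APIs.

```python
# Sketch: read the string value of an XPath expression, including text() nodes.
# read_xpath_string and demo are hypothetical names used for illustration only.

# JavaScript run in the page: evaluate the XPath and return its string value.
JS_XPATH_STRING = (
    "return document.evaluate(arguments[0], document, null, "
    "XPathResult.STRING_TYPE, null).stringValue;"
)


def read_xpath_string(driver, xpath):
    """Return the whitespace-trimmed string value of `xpath` in the current page."""
    value = driver.execute_script(JS_XPATH_STRING, xpath)
    # Guard against None so that .strip() is always safe.
    return (value or "").strip()


def demo():
    """Open the page from the question and print the text of the target <p>."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        driver.get("https://www.w3.org/Protocols/rfc1341/7_1_Text.html")
        xpath = (
            "//h3[contains(., 'The charset parameter')]"
            "/following-sibling::p[2]/text()"
        )
        print(read_xpath_string(driver, xpath))
    finally:
        # Always close the browser, even if the lookup fails.
        driver.quit()
```

Calling demo() needs Firefox and geckodriver. Unlike the WebDriverWait strategies above, this sketch does not wait for the element to become visible, so it is best suited to static pages such as this one.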
    
