Selenium WebDriver get_attribute returns truncated value of href attribute when value has entities

Problem description

I am trying to get the href attribute value from an anchor tag on a page in my application using Selenium WebDriver (Python), and the returned result has part of the value stripped off.

Here is the HTML snippet:

<a class="nla-row-text" href="/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0" data-reactid="790">

Here is the code I am using:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

driver = webdriver.Firefox()
driver.get("xxxx")

url_from_attr = driver.find_element(By.XPATH,"(//div[@class='nla-children mfr']/div/div/a)[1]").get_attribute("href")

url_from_attr_raw = "%r"%url_from_attr

print(" URL from attribute -->> " + url_from_attr)
print(" Raw string -->> " + url_from_attr_raw)

The output I get is:

/shopping/brands?search=kamera&page=0

instead of:

/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0 OR
/shopping/brands?search=kamera&nm=Canon&page=0

Is this because of the entity representation in the URL? The part between the two entities is what gets stripped. Any help or pointers would be appreciated.

Recommended answer

As per the given HTML, there is an issue with the Locator Strategy you have tried. You have used an index [1] along with find_element, which is error-prone. An index such as [1] can be applied when a List is returned through find_elements. In this use case an optimized expression would be:

url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']/div/div/a[@class='nla-row-text']").get_attribute("href")

The Locator Strategy can be optimized further as follows:

url_from_attr = driver.find_element(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']").get_attribute("href")


Update A

As per your comment, since you still need to use indexing, the optimized Locator Strategy can be:

# find_elements returns a list, so pick the first match before calling get_attribute
url_from_attr = driver.find_elements(By.XPATH,"//div[@class='nla-children mfr']//a[@class='nla-row-text']")[0].get_attribute("href")

get_attribute(attribute_name)

As per the Python API source:

    def get_attribute(self, name):
        """Gets the given attribute or property of the element.

        This method will first try to return the value of a property with the
        given name. If a property with that name doesn't exist, it returns the
        value of the attribute with the same name. If there's no attribute with
        that name, ``None`` is returned.

        Values which are considered truthy, that is equals "true" or "false",
        are returned as booleans.  All other non-``None`` values are returned
        as strings.  For attributes or properties which do not exist, ``None``
        is returned.

        :Args:
            - name - Name of the attribute/property to retrieve.

        Example::

            # Check if the "active" CSS class is applied to an element.
            is_active = "active" in target_element.get_attribute("class")

        """

        attributeValue = ''
        if self._w3c:
            attributeValue = self.parent.execute_script(
                "return (%s).apply(null, arguments);" % getAttribute_js,
                self, name)
        else:
            resp = self._execute(Command.GET_ELEMENT_ATTRIBUTE, {'name': name})
            attributeValue = resp.get('value')
            if attributeValue is not None:
                if name != 'value' and attributeValue.lower() in ('true', 'false'):
                    attributeValue = attributeValue.lower()
        return attributeValue
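
Because get_attribute reads the value from the parsed DOM, the browser has already decoded character references such as &amp; by the time the string reaches Python. Decoding alone should only turn &amp; into &, not drop a parameter. The following is a minimal standard-library sketch of that expectation, using the href from the question; the variable names are illustrative and it does not involve Selenium at all:

```python
from html import unescape
from urllib.parse import parse_qs, urlsplit

# Raw attribute text exactly as it appears in the HTML source of the question.
raw_href = "/shopping/brands?search=kamera&amp;nm=Canon&amp;page=0"

# Decode character references roughly the way an HTML parser would.
decoded_href = unescape(raw_href)

# Split the query string to see which parameters survive decoding.
params = parse_qs(urlsplit(decoded_href).query)

print(decoded_href)
print(params)
```

If nm=Canon is present here but missing from what get_attribute returns, entity decoding is unlikely to be the cause, which is consistent with looking at page timing instead.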


Update B

As you mentioned in your comment, the URL value returned by the method is not present anywhere on the page, which implies that you are trying to access the href attribute too early. So there can be 2 solutions as follows:

  • Traverse the DOM Tree and construct a Locator that uniquely identifies the element, induce WebDriverWait with expected_conditions set to element_to_be_clickable, and then extract the href attribute.

  • For debugging purposes you can add time.sleep(10) so that the element gets rendered properly in the HTML DOM, and then try to extract the href attribute.
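
The first option above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name wait_for_href and the 10-second default timeout are hypothetical, and the XPath is the one suggested earlier in this answer. The Selenium imports are placed inside the function so the snippet can be loaded even where Selenium is not installed.

```python
def wait_for_href(driver, xpath, timeout=10):
    """Wait until the element located by xpath is clickable, then return its href.

    wait_for_href is a hypothetical helper, not part of Selenium.
    """
    # Imported here so that merely defining the helper does not require Selenium.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # until() returns the WebElement once the condition holds,
    # or raises TimeoutException after `timeout` seconds.
    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable((By.XPATH, xpath))
    )
    return element.get_attribute("href")


# Possible usage, assuming `driver` is an already started WebDriver instance:
# href = wait_for_href(
#     driver, "//div[@class='nla-children mfr']//a[@class='nla-row-text']"
# )
```

Compared with time.sleep(10), an explicit wait returns as soon as the element is ready, so it is usually the better choice outside of quick debugging.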
