How to extract data from the HTML of a website through the PhantomJS driver


Question

I am trying to parse the following webpage, https://shop.sprouts.com/shop/flyer, using .NET, Selenium, and PhantomJS. The text I get from the elements is completely different from what I see on the screen. Is there a better way to parse the webpage?

using Microsoft.VisualStudio.TestTools.UnitTesting;
using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;

[TestClass]
public class UnitTest1
{
    const string PhantomDirectory = @"..\..\..\packages\PhantomJS.2.1.1\tools\phantomjs";

    [TestMethod]
    public void GetSproutsWeeklyAdDetails()
    {
        using (IWebDriver phantomDriver = new PhantomJSDriver(PhantomDirectory))
        {
            phantomDriver.Navigate().GoToUrl("https://shop.sprouts.com/shop/flyer");
            var elements = phantomDriver.FindElements(By.ClassName("cell-title-text"));
        }
    }
}

Recommended answer

To get the text you see on screen at https://shop.sprouts.com/shop/flyer, you need to use a WebDriverWait for the visibility of all the desired elements before reading them. The locator used here targets spans with an ng-bind-html attribute, which suggests the titles are filled in by script after the initial page load. You can use the following solution:

  • Solution:

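// Requires: using System; using System.Collections.Generic; using OpenQA.Selenium; using OpenQA.Selenium.Support.UI;
// `driver` is your IWebDriver instance (phantomDriver in the question's code).
// Wait up to 3 seconds until every matching product title is visible.
IList<IWebElement> elements = new WebDriverWait(driver, TimeSpan.FromSeconds(3)).Until(ExpectedConditions.VisibilityOfAllElementsLocatedBy(By.XPath("//span[@class='cell-title-text' and @ng-bind-html='productTitle()']")));
foreach (IWebElement element in elements)
{
    // Print the rendered markup of each title element.
    Console.WriteLine(element.GetAttribute("innerHTML"));
}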

  • Equivalent Python example:

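    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    # `driver` is assumed to be an already-created WebDriver instance.
    driver.get('https://shop.sprouts.com/shop/flyer')
    myList = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.XPATH, "//span[@class='cell-title-text' and @ng-bind-html='productTitle()']")))
    for item in myList:
        print(item.text)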
    

  • Console output:

    Sweet Corn, 1 EA
    Cantaloupe Melons, 1 LB
    Red Cherries
    Half Chicken Breast
    Roma Tomatoes
    100% Grass Fed Ground Beef Value Pack
    Colby Jack Rbst Free
    Walnut Halves & Pieces
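
To illustrate why the wait makes a difference: an explicit wait keeps re-evaluating a condition until it succeeds or a timeout expires, so elements that are rendered late are picked up once they appear. The following is a minimal sketch of that polling idea using only the standard library. It is not Selenium's implementation; `wait_until` and `FakePage` are hypothetical names introduced for illustration, and the fake page simply returns nothing for its first two lookups.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Raises TimeoutError if `timeout` seconds pass without success.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


class FakePage:
    """Hypothetical stand-in for a page whose titles are rendered after load."""

    def __init__(self, ready_after_calls):
        self._calls = 0
        self._ready_after_calls = ready_after_calls

    def find_titles(self):
        self._calls += 1
        if self._calls < self._ready_after_calls:
            return []  # nothing rendered yet
        return ["Sweet Corn, 1 EA", "Roma Tomatoes"]


page = FakePage(ready_after_calls=3)

# A single immediate lookup would have returned an empty list;
# polling keeps asking until the titles are available.
titles = wait_until(page.find_titles, timeout=2.0, poll_interval=0.01)
print(titles)
```

A single lookup right after navigation corresponds to the first call here, which finds nothing; the polling loop is what lets the later, populated result come back.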
    
