HTML Agility Pack screen scraping: XPath isn't returning data

Posted: 2021/5/15 18:36:29. Tags: c#, screen-scraping, html-agility-pack, web-scraping


Problem description


I'm attempting to write a screen scraper for Digikey that will let our company keep accurate track of pricing, part availability, and product replacements when a part is discontinued. There seems to be a discrepancy between the XPath I see in Chrome DevTools (and in Firebug on Firefox) and what my C# program sees.
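One frequent source of this kind of mismatch: DevTools and Firebug display the browser's DOM, not the HTML the server sent. Browsers repair markup while parsing (for example, they insert a tbody element into tables whose rows were written directly under table), and scripts may change the page further, while HTML Agility Pack parses the source text as received. The sketch below illustrates the effect on a small hypothetical snippet, using the BCL's System.Xml.XPath rather than HTML Agility Pack (whose SelectSingleNode also evaluates paths through System.Xml.XPath). The path in the code below already omits tbody, so this may not be the whole story here.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical raw markup standing in for what the server sends:
// the table has no <tbody>, because none was written in the source.
const string rawMarkup =
    "<html><body><div>header</div><div>" +
    "<table><tr><td>296-12602-1-ND</td></tr></table>" +
    "</div></body></html>";

// Path as a browser inspector typically reports it (the browser inserted <tbody>).
string browserPath = "/html/body/div[2]/table/tbody/tr/td";
// The same location written against the raw source.
string sourcePath = "/html/body/div[2]/table/tr/td";

// Returns the string value of the first node matched by xpath, or null if none.
string? FirstMatch(string markup, string xpath)
{
    XPathNavigator nav = new XPathDocument(new StringReader(markup)).CreateNavigator();
    return nav.SelectSingleNode(xpath)?.Value;
}

Console.WriteLine(FirstMatch(rawMarkup, browserPath) ?? "(null)"); // (null)
Console.WriteLine(FirstMatch(rawMarkup, sourcePath) ?? "(null)");  // 296-12602-1-ND
```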


The page that I'm scraping currently is http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND
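A server can vary the markup it returns by request headers such as User-Agent, so the document a program downloads may differ from the one the browser was given. Here is a minimal sketch using only the BCL's HttpClient. The helper names BuildRequest and FetchRawHtmlAsync and the User-Agent value are hypothetical placeholders. The raw string it returns is what HtmlDocument.LoadHtml would parse.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper: builds a GET request that sends an explicit User-Agent,
// so the program can ask for the same variant of the page a browser gets.
HttpRequestMessage BuildRequest(string url, string userAgent)
{
    var request = new HttpRequestMessage(HttpMethod.Get, url);
    request.Headers.TryAddWithoutValidation("User-Agent", userAgent);
    return request;
}

// Hypothetical helper: downloads the raw HTML exactly as served.
// This string, not the DevTools DOM, is what the scraper actually parses.
async Task<string> FetchRawHtmlAsync(HttpClient client, HttpRequestMessage request)
{
    using HttpResponseMessage response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync();
}

HttpRequestMessage pageRequest = BuildRequest(
    "http://search.digikey.com/scripts/DkSearch/dksus.dll?Detail&name=296-12602-1-ND",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"); // placeholder browser-like value
Console.WriteLine(pageRequest.RequestUri);
// string html = await FetchRawHtmlAsync(new HttpClient(), pageRequest);
```

Saving the downloaded string to a file and searching it for the part number (296-12602-1-ND) shows whether the target table is present in the markup the program actually receives.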


The code I'm currently using is pretty quick and dirty...

   using System.Collections.Generic;
   using HtmlAgilityPack;

   //This method (inside a class) retrieves part data from a Digikey product page
   private static List<string> ExtractProductInfo(HtmlDocument doc)
   {
       List<HtmlNode> m_unparsedProductInfoNodes = new List<HtmlNode>();
       List<string> m_unparsedProductInfo = new List<string>();

       //Base node for part info
       string m_baseNode = @"//html[1]/body[1]/div[2]";

       //Digikey part number; more lines of similar form will go here for more info
       m_unparsedProductInfoNodes.Add(doc.DocumentNode.SelectSingleNode(m_baseNode + @"/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]"));

       foreach (HtmlNode node in m_unparsedProductInfoNodes)
       {
           //SelectSingleNode returns null when nothing matches the path;
           //record null instead of throwing a NullReferenceException
           m_unparsedProductInfo.Add(node?.InnerText);
       }

       return m_unparsedProductInfo;
   }


Although the path I'm using appears to be "correct", I keep getting NULL when I look at the list "m_unparsedProductInfoNodes".
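SelectSingleNode returns null if any single step of a path fails to match, which makes it hard to tell where a long path goes wrong. Below is a minimal diagnostic sketch, assuming simple absolute paths like the one in the code above. The hypothetical helper DeepestMatchingPrefix evaluates successively longer prefixes and reports the last one that still matches. It is shown with System.Xml.XPath on hypothetical sample markup. With HTML Agility Pack, the navigator could come from doc.DocumentNode.CreateNavigator(), since HtmlNode implements IXPathNavigable.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical helper: returns the longest leading part of a simple absolute
// XPath (steps separated by single '/', no '/' inside predicates) that still
// matches at least one node, i.e. the point where the path stops matching.
string DeepestMatchingPrefix(XPathNavigator nav, string xpath)
{
    string lead = xpath.StartsWith("//", StringComparison.Ordinal) ? "//" : "/";
    string[] steps = xpath.TrimStart('/').Split('/');
    string lastMatch = "";
    string prefix = "";
    for (int i = 0; i < steps.Length; i++)
    {
        prefix = i == 0 ? lead + steps[0] : prefix + "/" + steps[i];
        if (nav.SelectSingleNode(prefix) == null)
            break;
        lastMatch = prefix;
    }
    return lastMatch;
}

// Hypothetical page where div[2] exists but holds no table.
XPathNavigator navigator = new XPathDocument(new StringReader(
    "<html><body><div>nav</div><div><p>placeholder</p></div></body></html>"))
    .CreateNavigator();

string target = "//html[1]/body[1]/div[2]/table[1]/tr[1]/td[1]/table[1]/tr[1]/td[1]";
Console.WriteLine(DeepestMatchingPrefix(navigator, target)); // //html[1]/body[1]/div[2]
```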


Any idea what's going on here? I'll also add that if I call SelectNodes on the base node, it returns only a div whose only significant child is "cs=####", a value that seems to vary with the browser's user agent. If I try to use this in any way (putting /cs=0 in the path for the unidentified browser), it throws an error insisting that my expression doesn't evaluate to a node-set, and leaving it out still leaves the problem that all data past div[2] is returned as NULL.
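In XPath, appending /cs=0 to a location path does not add a filter. The = is a comparison operator, so //html[1]/body[1]/div[2]/cs=0 is a boolean expression, and Select/SelectSingleNode (which HTML Agility Pack's SelectNodes/SelectSingleNode rely on) reject anything that is not a node-set. The sketch below uses System.Xml.XPath and assumes, hypothetically, that cs is a child element of the base div. If it is actually an attribute, the predicate would test @cs instead.

```csharp
using System;
using System.IO;
using System.Xml.XPath;

// Hypothetical markup: assumes "cs" is a child element of the base div.
XPathNavigator navigator = new XPathDocument(new StringReader(
    "<html><body><div>nav</div><div><cs>0</cs><p>part data</p></div></body></html>"))
    .CreateNavigator();

// "/cs=0" turns the path into a comparison: the result is a boolean, not nodes.
string comparison = "//html[1]/body[1]/div[2]/cs=0";
Console.WriteLine(navigator.Evaluate(comparison)); // True

// Node-selecting calls require a node-set, so they reject the comparison.
try
{
    navigator.SelectSingleNode(comparison);
}
catch (XPathException ex)
{
    Console.WriteLine("XPathException: " + ex.Message);
}

// To keep only nodes with that value, put the test in a predicate instead.
string predicate = "//html[1]/body[1]/div[2][cs='0']/p[1]";
Console.WriteLine(navigator.SelectSingleNode(predicate)?.Value); // part data
```

Evaluate is the call for expressions that yield a boolean, number, or string; a bracketed predicate is the form that filters nodes.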
