Html Agility Pack: get specific content from a <li> tag


Question

I need some text from this page: https://www.amazon.com/dp/B074J9SSPD. Specifically, I need to extract the data under the "About the Product" section.

I tried:

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = new HtmlDocument();
doc = web.Load("https://amazon.com/dp/B074J9SSPD");

foreach(var node in doc.DocumentNode.SelectNodes("//li[@class='showHiddenFeatureBullets']") {
  string ar = node.InnerText;
  HtmlAttribute att = node.Attributes["class"];
  MessageBox.Show(ar.ToString());
  if (att.Value.Contains("showHiddenFeatureBullets")) {

  }
}

Please suggest the right way to do this; I'm getting a blank string.

Recommended answer

Your original code (before that first edit) worked for me; it was just missing the closing parenthesis on the foreach loop. I also broke the node selection out into its own variable to make it easier to read, but this should work for you. I tested it locally and it worked.

HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load("https://amazon.com/dp/B074J9SSPD");

var aboutProductNodes = doc.DocumentNode.SelectNodes("//li[@class='showHiddenFeatureBullets']");

// SelectNodes returns null, not an empty collection, when nothing matches
if (aboutProductNodes != null)
{
    foreach (var node in aboutProductNodes)
    {
        string ar = node.InnerText;
        HtmlAttribute att = node.Attributes["class"];
        MessageBox.Show(ar.Trim());
        if (att.Value.Contains("showHiddenFeatureBullets"))
        {

        }
    }
}

However, I would suggest looking into the Amazon API. The code above worked about half the time; the other half, Amazon replied with a message telling me to use their API instead of scraping the site. So that might have been part of your problem too.

https://developer.amazon.com/services-and-apis
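
Because the page that comes back is not always the product page, it can help to separate the parsing step from the download and make it tolerate "no match". The following is only a sketch under assumptions: the class `FeatureBullets`, the method `Extract`, and the sample markup are made up for illustration and are not part of the original answer. It parses an HTML string with `LoadHtml`, matches the class as a token (so an `<li>` with several classes still matches), and returns an empty list when `SelectNodes` returns null.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class FeatureBullets
{
    // Hypothetical helper: returns the trimmed text of every <li> whose class
    // list contains className, or an empty list when nothing matches.
    public static List<string> Extract(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Token match on @class, so class="other showHiddenFeatureBullets" also matches.
        string xpath = "//li[contains(concat(' ', normalize-space(@class), ' '), ' "
                       + className + " ')]";

        var results = new List<string>();
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
        if (nodes == null)
        {
            // No match, for example because a block page was served instead.
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            // InnerText keeps entities such as &amp; so decode them before trimming.
            string text = HtmlEntity.DeEntitize(node.InnerText).Trim();
            if (text.Length > 0)
            {
                results.Add(text);
            }
        }
        return results;
    }

    public static void Main()
    {
        // Made-up markup standing in for a downloaded page.
        string html = "<ul><li class=\"showHiddenFeatureBullets\"> First point </li>"
                    + "<li class=\"other showHiddenFeatureBullets\">Second &amp; third</li>"
                    + "<li class=\"unrelated\">Ignored</li></ul>";

        foreach (string bullet in Extract(html, "showHiddenFeatureBullets"))
        {
            Console.WriteLine(bullet);
        }
    }
}
```

With a real page, the string passed to `Extract` would come from whatever download step you use; an empty result is then a hint to inspect the returned HTML rather than the XPath.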
