Trouble Scraping Web Page With Malformed Content

Posted: 2020/11/24 19:52:19 · Tags: c#, parsing, screen-scraping, html-agility-pack


Question

I have written C# code that uses the HtmlAgilityPack library to scrape a page located at: World's Largest Urban Areas (Page 2). Unfortunately, the page contains malformed content.

I'm at an impasse on how to scrape this page. My current code (below) freezes while parsing the HTML:

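 // Select every row of the 'boldtable' table except the first (header) row.
 // Note: by default SelectNodes returns null, not an empty collection, when nothing matches.
 HtmlNodeCollection cityRecords = _htmlDocument.DocumentNode.SelectNodes("//table[@class='boldtable']//tr[position() != 1]");
 // Flatten the selected rows into a single list of all their <td> cells.
 CityNodes = (from node in cityRecords.Descendants()
              where node.Name == "td"
              select node).ToList();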

The goal is to parse every city listed on the page, together with each of its data points; nothing more. I'm looking for recommendations on how to modify the code above, or for another freely available library to use instead.

Thanks!
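For illustration, here is a minimal sketch of one way to keep each city's data points together. It assumes the HtmlAgilityPack NuGet package and reuses the XPath from the question, but instead of flattening every `td` into one list it builds one record per table row. `CityRow`, `CityTableParser` and `ParseRows` are hypothetical names, the sketch has not been tested against the live page, and `OptionFixNestedTags` only repairs some kinds of malformed nesting. It does not diagnose the freeze described above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is referenced

// Hypothetical record type: the trimmed text of every cell in one city row.
public sealed record CityRow(IReadOnlyList<string> Cells);

public static class CityTableParser
{
    // Hypothetical helper: parses the given HTML and returns one CityRow per data row.
    public static List<CityRow> ParseRows(string html)
    {
        var doc = new HtmlDocument();
        // Ask the parser to repair some badly nested tags while loading.
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        // Same XPath as in the question: all rows of the table except the header row.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//table[@class='boldtable']//tr[position() != 1]");

        // SelectNodes returns null when nothing matches.
        if (rows == null)
        {
            return new List<CityRow>();
        }

        return rows
            .Select(tr => new CityRow(
                tr.Elements("td")                                   // direct <td> children only
                  .Select(td => HtmlEntity.DeEntitize(td.InnerText).Trim())
                  .ToList()))
            .Where(row => row.Cells.Count > 0)                      // skip rows without cells
            .ToList();
    }
}
```

Keeping one record per `tr` means a row with a missing or extra cell affects only that city, whereas a single flat list of `td` nodes would shift every following city's data out of alignment.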
