Web scraping xpath is returning null


Problem description

Hello,

I am scraping a web page, but my XPath queries return null.
I have been trying since yesterday. When I ran my code this morning it returned an unformatted result once, and I am not sure why, because since then it has been returning null again. I do not know what the problem is and would highly appreciate your help. My code is below.

private async Task<List<NameAndScore>> WebDateFromPage(int pagenum)
{
    string url = "http://www.realtor.com/realestateagents/New-York_NY/photo-1";

    if (pagenum != 0)
        url = "http://www.realtor.com/realestateagents/New-York_NY/photo-1/pg-" + pagenum.ToString();

    var doc = await Task.Factory.StartNew(() => web.Load(url));

    //var nameNodes = doc.DocumentNode.SelectNodes("//*[@id=\"agent_list_wrapper\"]/div[2]/div[2]/div/div[1]/a");
    //var scoreNodes = doc.DocumentNode.SelectNodes("//*[@id=\"agent_list_wrapper\"]//div//div//div//div//span");

    var nameNodes = doc.DocumentNode.SelectNodes("//*[@id=\"agent_list_wrapper\"]//div//div//div/div//a");
    var scoreNodes = doc.DocumentNode.SelectNodes("//*[@id=\"agent_list_wrapper\"]//div//div//div//div");

    if (nameNodes == null || scoreNodes == null)
        return new List<NameAndScore>();

    var names = nameNodes.Select(node => node.InnerText);
    var scores = scoreNodes.Select(node => node.InnerText);

    return names.Zip(scores, (name, score) => new NameAndScore() { Name = name, Score = score }).ToList();
}

private async void Form1_Load(object sender, EventArgs e)
{
    int pagenum = 0;
    var rankings = await WebDateFromPage(0);
    while (rankings.Count > 0)
    {
        foreach (var ranking in rankings)
            table.Rows.Add(ranking.Name, ranking.Score);
        pagenum = pagenum + 1;
        rankings = await WebDateFromPage(pagenum);
    }
}





What I have tried:

I have tried every possible combination of XPath expressions. I also tried copying the XPath of different tags from the website, but it returns null every time. I do not know what the problem is, as it returned a value just once.

Recommended answer

Maybe the document being returned has malformed HTML in it. Try putting your code inside a try/catch block to see what happens.

Also, try reinstantiating the web client INSIDE your WebDateFromPage method.

Finally, what's the point of using async code when you're waiting for the code to return anyway? I'm not sure there's any tangible benefit there.
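
The first two suggestions could look roughly like the sketch below. It assumes the `web` field in the question is an HtmlAgilityPack `HtmlWeb` instance (the question does not say so explicitly). `AgentScraper`, `BuildUrl`, and the console logging are hypothetical names added for illustration, and the XPath expressions are copied unchanged from the question, so this only makes failures visible rather than guaranteeing a match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using HtmlAgilityPack; // NuGet package; assumed to be the library behind web.Load

public class NameAndScore
{
    public string Name { get; set; }
    public string Score { get; set; }
}

public static class AgentScraper // hypothetical container class
{
    // Same URL scheme as the question: page 0 is the bare listing URL.
    public static string BuildUrl(int pagenum)
    {
        const string baseUrl = "http://www.realtor.com/realestateagents/New-York_NY/photo-1";
        return pagenum == 0 ? baseUrl : baseUrl + "/pg-" + pagenum;
    }

    public static async Task<List<NameAndScore>> WebDateFromPage(int pagenum)
    {
        string url = BuildUrl(pagenum);
        try
        {
            // A fresh client per call instead of a shared field.
            var web = new HtmlWeb();
            HtmlDocument doc = await Task.Run(() => web.Load(url));

            // Log what actually came back before blaming the XPath.
            Console.WriteLine("{0}: status {1}, {2} parse errors, {3} chars of HTML",
                url, web.StatusCode, doc.ParseErrors.Count(),
                doc.DocumentNode.OuterHtml.Length);

            var nameNodes = doc.DocumentNode.SelectNodes(
                "//*[@id=\"agent_list_wrapper\"]//div//div//div/div//a");
            var scoreNodes = doc.DocumentNode.SelectNodes(
                "//*[@id=\"agent_list_wrapper\"]//div//div//div//div");

            // SelectNodes returns null (not an empty collection) when nothing matches.
            if (nameNodes == null || scoreNodes == null)
            {
                Console.WriteLine("No nodes matched on " + url);
                return new List<NameAndScore>();
            }

            return nameNodes
                .Zip(scoreNodes, (n, s) => new NameAndScore { Name = n.InnerText, Score = s.InnerText })
                .ToList();
        }
        catch (Exception ex)
        {
            // Surface network or parsing failures instead of silently getting null.
            Console.WriteLine("Loading {0} failed: {1}", url, ex);
            return new List<NameAndScore>();
        }
    }
}
```

If the logged status is not `OK`, or the HTML is much shorter than the page seen in a browser, the server is probably returning a different document (for example an error or bot-check page), and no XPath will match it. Note that returning an empty list on failure also ends the paging loop in `Form1_Load`.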

