C# HtmlAgilityPack XPath problems: trouble finding H4 inner text


Problem description


I have a method that finds everything I am looking for in a section of a webpage, except that I am stuck trying to find an H4 inside the matched nodes. The XPath //div[@class='job '] correctly finds all 8 occurrences that I am looking for, but I hit problems as soon as I iterate over those 8 nodes.

Here is the HTML inside one of the matched nodes.

<div class="job_art ">
<div style="background: #444      url('https://a.akamaihd.net/mwfb/mwfb/graphics/jobs/chicago/meet_with_the_south_gang_family_    760x225_01.jpg') 50% 0 no-repeat;">
</div>
</div>
<div class="job_details clearfix">
<h4>Meet With the South Gang Family</h4>
<div class="mastery_bar" title="Indicates how much of this Job you&#39;ve mastered. Master Jobs to earn Skill Points.">
<div style="width: 0%" class="noHighlight"></div>
<p>100% Mastered</p>
<div style="width: 0%"><p>100% Mastered</p></div>
</div>
<ul class="uses clearfix" style="width:100px;">
<li class="energy" base_value="2" current_value="2" title="Spend 2 Energy to do this Job once.">2</li>
</ul>
<ul class="pays clearfix" style="width:120px" title="Earn XP, City Cash and Loot items while doing Jobs.">
<li class="experience" base_value="2" current_value="2">2</li>
<li class="cash_icon_jobs_8" base_value="2" current_value="2">2</li>
</ul>
<a id='btn_dojob_1' class='sexy_button_new sexy_energy_new medium orange impulse_buy' selector='#inner_page' requirements='{"energy":2}' precall='BrazilJobs.preDoJob' callback='BrazilJobs.doJob' href='remote/h.php?job=1&tab=1&clkdiv=btn_dojob_1'><span><span>Do Job</span></span></a>
</div>
<div class="job_additional_results">
<div id="loot-bandit-1" class="lootContainer"></div>
<div class="previous_loot"></div>
</div>
<div id="bandit-contextual-1" class="contextual bandit-contextual"></div>

Instead of this H4, it always returns something else, such as "Clams(Bank)", and I have no idea why. The problem starts with

  string MissionName = node.SelectSingleNode("//h4").InnerText;

I have tried numerous XPath expressions, such as //div[h4[1]] and h4[1]. I only need the first occurrence, since the H4 appears only once per node. Where does the problem start in my code?

I need the inner text "Meet With the South Gang Family".

public static List<string> GetMissions()
    {
        List<string> FoundMissions = new List<string>();

        HTML_CONTENT = HTML_CONTENT.Replace("\r", "");
        HTML_CONTENT = HTML_CONTENT.Replace("\t", "");
        HTML_CONTENT = HTML_CONTENT.Replace("\n", "");
        HTML_CONTENT = HTML_CONTENT.Replace("\\", "");

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.Load(new StringReader(HTML_CONTENT));

        if(doc.DocumentNode == null)
            return FoundMissions;
        var DivNodes = doc.DocumentNode.SelectNodes("//div[@class='job ']");
        if (DivNodes != null)
        {
            string Count = DivNodes.Count.ToString();

As I said, it finds all 8 occurrences fine. I got the HTML shown above from the debugger at this point, so I think this part is fine.

            foreach (HtmlNode node in DivNodes)
            {

                string MissionName = node.SelectSingleNode("//h4").InnerText;
            }
        }

        return FoundMissions;
    }
}

Solution

You need to state explicitly that the XPath query is relative to the current node by adding a single dot (.) at the beginning:

string MissionName = node.SelectSingleNode(".//h4").InnerText;

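Otherwise, the XPath is evaluated from the document root, no matter which node you call SelectSingleNode on. That is most likely why your attempts returned the wrong result. A bare h4[1], by contrast, is relative to the node but matches only its direct children, and here the h4 sits one level deeper, inside div.job_details.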
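
To make the difference concrete, here is a minimal sketch of the three XPath forms side by side, followed by the loop from the question rewritten with the relative query. The class name MissionScraper, the method GetMissionNames, and the sample HTML (including the header div and the "Second Job" entry) are made up for illustration; only the HtmlAgilityPack calls and the 'job ' class come from the question. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class MissionScraper
{
    // Hypothetical helper (not from the question): collects the <h4> text
    // found inside each job div.
    public static List<string> GetMissionNames(string html)
    {
        var names = new List<string>();
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var jobDivs = doc.DocumentNode.SelectNodes("//div[@class='job ']");
        if (jobDivs == null)
            return names;

        foreach (HtmlNode job in jobDivs)
        {
            // ".//h4" selects the first <h4> descendant of this div only.
            HtmlNode h4 = job.SelectSingleNode(".//h4");
            if (h4 != null)
                names.Add(HtmlEntity.DeEntitize(h4.InnerText).Trim());
        }
        return names;
    }

    public static void Main()
    {
        // Made-up sample: an unrelated <h4> appears before the job divs,
        // mimicking the "Clams(Bank)" result described in the question.
        const string html =
            "<div id='header'><h4>Clams(Bank)</h4></div>" +
            "<div class='job '><div class='job_details clearfix'>" +
            "<h4>Meet With the South Gang Family</h4></div></div>" +
            "<div class='job '><div class='job_details clearfix'>" +
            "<h4>Second Job</h4></div></div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode firstJob = doc.DocumentNode.SelectSingleNode("//div[@class='job ']");

        // Absolute path: evaluated from the document root, so the header's <h4> wins.
        Console.WriteLine(firstJob.SelectSingleNode("//h4").InnerText);   // Clams(Bank)

        // Child axis: only direct children are considered; the <h4> is a grandchild.
        Console.WriteLine(firstJob.SelectSingleNode("h4") == null);       // True

        // Relative descendant path: the <h4> inside this job div.
        Console.WriteLine(firstJob.SelectSingleNode(".//h4").InnerText);  // Meet With the South Gang Family

        foreach (string name in GetMissionNames(html))
            Console.WriteLine(name);
    }
}
```

The null checks matter because SelectSingleNode returns null when a job div has no h4, in which case calling InnerText directly, as in the original loop, would throw a NullReferenceException. The original loop also never adds MissionName to FoundMissions, so the returned list stays empty even with the corrected XPath; the sketch adds each name to the list.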
