HtmlAgilityPack isn't getting all the HTML code/text from a web page

Question

For starters, thank you in advance!

I am able to extract a section of code from a web page that looks similar to the following block of code.

<div id="playerStats">
  <div id="hp"><span class="title">HP:</span></div>
  <div id="mp"><span class="title">MP:</span></div>
  <div id="magicResist"><span class="title">Magic Resist</span></div>
  <div id="physicalDefend"><span class="title">Physical Defence</span></div>
  <div id="phyCriticalReduceRate"><span class="title">Strike Resist</span></div>
  <div id="phyCriticalDamageReduce"><span class="title">Strike fortitude</span></div>
  <div id="physicalRight"><span class="title">Main Hand Attack</span></div>
  <div id="accuracyRight"><span class="title">Main Hand Accuracy</span></div>
  <div id="criticalRight"><span class="title">Main Hand Critical</span></div>
  <div id="physicalLeft"><span class="title">Off Hand Attack</span></div>
  <div id="accuracyLeft"><span class="title">Off Hand Accuracy</span></div>
  <div id="criticalLeft"><span class="title">Off Hand Critical</span></div>
  <div id="attackSpeed"><span class="title">Attack Speed</span></div>
  <div id="magicalBoost"><span class="title">Magic Boost</span></div>
  <div id="magicalAccuracy"><span class="title">Magic Accuracy</span></div>
  <div id="magicalCriticalRight"><span class="title">Crit Spell</span></div>
  <div id="castingTimeRatio"><span class="title">Casting Speed</span></div>
  <div id="block"><span class="title">Block</span></div>
  <div id="dodge"><span class="title">Evasion</span></div>
</div>

It comes from the character statistics page of a video game. (You should clearly see the table of stats in the middle of the page.) If you inspect the page with your browser's developer tools (F12 in Google Chrome), you will notice values between the closing </span> and </div> tags, similar to the following:

<div id="playerStats">
  <div id="hp"><span class="title">HP:</span>"12213"</div>
  <div id="mp"><span class="title">MP:</span>"4000"</div>
  <div id="magicResist"><span class="title">Magic Resist</span>"4618"</div>
  <div id="physicalDefend"><span class="title">Physical Defence</span>"1725"</div>
  <div id="phyCriticalReduceRate"><span class="title">Strike Resist</span>"1518"</div>
  <div id="phyCriticalDamageReduce"><span class="title">Strike fortitude</span>"392"</div>
  <div id="physicalRight"><span class="title">Main Hand Attack</span>"201"</div>
  <div id="accuracyRight"><span class="title">Main Hand Accuracy</span>"201"</div>
  <div id="criticalRight"><span class="title">Main Hand Critical</span>"201"</div>
  <div id="physicalLeft"><span class="title">Off Hand Attack</span>"201"</div>
  <div id="accuracyLeft"><span class="title">Off Hand Accuracy</span>"201"</div>
  <div id="criticalLeft"><span class="title">Off Hand Critical</span>"201"</div>
  <div id="attackSpeed"><span class="title">Attack Speed</span>"201"</div>
  <div id="magicalBoost"><span class="title">Magic Boost</span>"201"</div>
  <div id="magicalAccuracy"><span class="title">Magic Accuracy</span>"201"</div>
  <div id="magicalCriticalRight"><span class="title">Crit Spell</span>"201"</div>
  <div id="castingTimeRatio"><span class="title">Casting Speed</span>"201"</div>
  <div id="block"><span class="title">Block</span>"201"</div>
  <div id="dodge"><span class="title">Evasion</span>"201"</div>
</div>

I am using the following code to parse the first block of HTML shown above.

HtmlDocument doc = new HtmlDocument();
doc.Load(MyTestFile);

foreach(var node in doc.DocumentNode.SelectNodes("//div[@id='playerStats']/div/span"))
{
    Console.WriteLine(node.InnerText + " " + (node.NextSibling != null ?  node.NextSibling.InnerText : null));
}

I have tried the WebRequest, WebClient, and WebBrowser classes, as well as HtmlAgilityPack's HtmlWeb class, to download the HTML document. However, the most important part, the values associated with HP, MP, etc., is missing from the downloaded document. The expected values are shown in the second block of HTML above.

How can I get my code to download this text as well, so that I can parse it?

Recommended answer

The player info is loaded dynamically: the page calls http://psykopats.net/loadAion.php with the POST method and a few parameters, one of which, player, identifies the player. Because a script makes this request after the page loads, the values are not part of the HTML that HtmlAgilityPack downloads. In your case, the parameters were:

server:66
type:1
player:299345

You can take a look at this question to see how to use POST with WebClient.

The response is a JSON string that, among other things, contains what you are looking for:

stat: {baseCriticalResist:0, magicCriticalResist:0, physicalDefend:1402, baseMagicalSpeed:1,…}
accuracyLeft: 2617
accuracyRight: 2617
agi: 110
airResist: 0
attackSpeed: 1.1
baseAccuracyLeft: 1705
baseAccuracyRight: 1705
baseAgi: 110
baseAirResist: 0
baseAttackSpeed: 1.1
baseBlock: 837
baseCastingTimeRatio: "1.0"
baseCriticalDefend: 0
baseCriticalLeft: 53
baseCriticalResist: 0
baseCriticalRight: 103
baseDex: 110
baseDodge: 1839
baseDp: 4000
baseEarthResist: 0
baseFireResist: 0
baseHealBoost: 0
baseHealSkillBoost: 0
baseHp: 6688
baseKno: 90
baseMagCriticalDamageReduce: 0
baseMagCriticalReduceRate: 0
baseMagicCriticalDefend: 0
baseMagicCriticalResist: 0
baseMagicResist: 1384
baseMagicalAccuracy: ""
baseMagicalAttack: 0
baseMagicalBoost: 0
baseMagicalCriticalLeft: 50
baseMagicalCriticalRight: 50
baseMagicalSpeed: 1
baseMoveSpeed: 6
baseMp: 4318
baseParry: 1847
basePhyCriticalDamageReduce: 0
basePhyCriticalReduceRate: 190
basePhysicalDefend: 1162
basePhysicalLeft: 255
basePhysicalRight: 234
baseStr: 110
baseVit: 100
baseWaterResist: 0
baseWill: ""
block: 837
castingTimeRatio: 0.98
criticalDefend: 0
criticalLeft: 602
criticalResist: 0
criticalRight: ""
dex: 110
dodge: 2272
dp: 4000
earthResist: ""
fireResist: 0
healBoost: 0
healSkillBoost: 0
hp: 11210
kno: 90
magCriticalDamageReduce: 0
magCriticalReduceRate: 38
magicCriticalDefend: 0
magicCriticalResist: 0
magicResist: 1725
magicalAccuracy: 1201
magicalAttack: 0
magicalBoost: 0
magicalCriticalLeft: 50
magicalCriticalRight: 50
magicalSpeed: "1.0"
moveSpeed: 7.56
mp: 4618
parry: ""
phyCriticalDamageReduce: 201
phyCriticalReduceRate: 392
physicalDefend: 1402
physicalLeft: 658
physicalRight: 658
str: 110
vit: 100
waterResist: 0
will: 0

Sample code:

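string json;
// WebClient is disposable, so release it as soon as the request completes.
using (var wc = new System.Net.WebClient())
{
    // UploadValues sends the fields as a form-encoded POST request by default.
    byte[] data = wc.UploadValues(
        "http://psykopats.net/loadAion.php",
        new System.Collections.Specialized.NameValueCollection {
            {"server", "66"},
            {"type", "1"},
            {"player", "299345"}});
    // Decode as UTF-8 rather than ASCII so non-ASCII characters are not lost.
    json = System.Text.Encoding.UTF8.GetString(data);
}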
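
Once the JSON string is available, the individual stats still have to be pulled out of it. The following is a minimal sketch of one way to do that, not part of the original answer. It assumes the response is a JSON object with a top-level "stat" property, as the listing above suggests, and it uses System.Text.Json (built into .NET Core 3.0 and later, available as a NuGet package for .NET Framework). The StatReader class, the ReadStats method, and the abbreviated sample string are hypothetical names and data introduced only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class StatReader
{
    // Hypothetical helper: returns every property of the "stat" object as text.
    // Values are kept as strings because the response mixes numbers and strings.
    public static Dictionary<string, string> ReadStats(string json)
    {
        var stats = new Dictionary<string, string>();
        using (JsonDocument doc = JsonDocument.Parse(json))
        {
            // Assumption: the stats live under a top-level "stat" object.
            if (doc.RootElement.ValueKind == JsonValueKind.Object &&
                doc.RootElement.TryGetProperty("stat", out JsonElement stat) &&
                stat.ValueKind == JsonValueKind.Object)
            {
                foreach (JsonProperty p in stat.EnumerateObject())
                {
                    stats[p.Name] = p.Value.ToString();
                }
            }
        }
        return stats;
    }

    public static void Main()
    {
        // Abbreviated, made-up sample shaped like the response shown above.
        string json = "{\"stat\":{\"hp\":11210,\"mp\":4618,\"castingTimeRatio\":0.98,\"parry\":\"\"}}";
        foreach (KeyValuePair<string, string> kv in ReadStats(json))
        {
            Console.WriteLine(kv.Key + ": " + kv.Value);
        }
    }
}
```

The property names in the response (hp, mp, magicResist, and so on) match the ids of the div elements in the question's HTML, so the resulting dictionary can be looked up by the same ids that the XPath query walks over.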
