Using HtmlAgilityPack to get specific data in C# and serialize it to JSON


Question

I've downloaded the HTML source of a page and I'm trying to extract some data from it and serialize that data to a JSON file.

This is the HTML source file: https://drive.google.com/file/d/0BzweTZsfeoxMTWk2LVdnYTJMRUE/view?usp=sharing

In the HTML there are two groups that I want to collect data from.

At the moment I have managed to get the markup inside these two groups and display it in two panels using labels. My code is as follows:

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using HtmlAgilityPack;

namespace Parser_Test_1._0
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {

        }

        private void button1_Click(object sender, EventArgs e)
        {
            HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
            doc.Load(@"C:...\bin\Debug\xbFrSourceCode.txt");

            string datacollected1 = doc.DocumentNode.SelectNodes("//*[@id=\"favoritesContent\"]/div[2]/div[2]/ul")[0].InnerHtml;
            string datacollected2 = doc.DocumentNode.SelectNodes("//*[@id=\"friendsContent\"]/div[2]/div[2]")[0].InnerHtml;
            label1.Text = datacollected1;
            label2.Text = datacollected2;
        }      

    }
}

From these two groups I want to collect the users and, for each user, their respective data, and then serialize it all to a JSON file.

Each user is wrapped in an <li ...></li> element.

For each user I want to get:

  • Gamertag: data-gamertag="this is the gamertag"
  • Gamerpic: inside the element with class="gamerpicWrapper", src="this is the gamerpic"
  • Realname: <div class="realName">this is the realname</div>
  • PrimaryInfo: <div class="primaryInfo">this is the primaryinfo</div>
  • isOnline: <div class="statusIcon">if there is any markup here, this value should be true in the JSON file</div>

This is an example of the desired JSON file format (note that the following is probably badly written):

{
    "favorites" :
    [
        {
            "gamertag" : "Gamertag1",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : false
        },
        {
            "gamertag" : "Gamertag2",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname2",
            "primaryInfo" : "primaryinfo2",
            "isOnline" : true
        },
        {
            "gamertag" : "Gamertag3",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : false
        },
        {
            "gamertag" : "Gamertag4",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname4",
            "primaryInfo" : "",
            "isOnline" : true
        }
    ],
    "friends" :
    [
        {
            "gamertag" : "Gamertag1",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : true
        },
        {
            "gamertag" : "Gamertag2",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname2",
            "primaryInfo" : "primaryinfo2",
            "isOnline" : false
        },
        {
            "gamertag" : "Gamertag3",
            "gamerpic" : "gamerpicURL",
            "realname" : "realname3",
            "primaryInfo" : "",
            "isOnline" : true
        },
        {
            "gamertag" : "Gamertag4",
            "gamerpic" : "gamerpicURL",
            "realname" : "",
            "primaryInfo" : "",
            "isOnline" : false
        }
    ]
}
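
One way to produce a file of this shape is to model it with plain classes and let a serializer emit the JSON, so the output is always syntactically valid. The sketch below is only an illustration: it assumes .NET Core 3.0 or later (where System.Text.Json is available), and the names Gamer, GamerLists and GamerJson are hypothetical, not part of the question.

```csharp
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical model: property names are chosen so that camel-casing them
// yields the keys from the desired JSON ("gamertag", "primaryInfo", "isOnline", ...).
public class Gamer
{
    public string Gamertag { get; set; } = "";
    public string Gamerpic { get; set; } = "";
    public string Realname { get; set; } = "";
    public string PrimaryInfo { get; set; } = "";
    public bool IsOnline { get; set; }
}

// Hypothetical root object holding the two groups.
public class GamerLists
{
    public List<Gamer> Favorites { get; set; } = new List<Gamer>();
    public List<Gamer> Friends { get; set; } = new List<Gamer>();
}

public static class GamerJson
{
    // Serializes the model to indented JSON with camel-cased property names.
    public static string Serialize(GamerLists lists)
    {
        var options = new JsonSerializerOptions
        {
            PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
            WriteIndented = true
        };
        return JsonSerializer.Serialize(lists, options);
    }
}
```

The resulting string could then be written to disk with File.WriteAllText, using whatever file name suits the application.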

I would greatly appreciate it if anyone could show me how to do this.

Recommended answer

The following code shows an appropriate use of XPath and HAP (HtmlAgilityPack). The XPath could be simplified, but you gave me a 4k HTML file and I don't feel like learning its entire structure. However, the code gets everything you want as variables. Putting those values into a JSON structure is now your job; if you have no knowledge of JSON, consider using XML instead.

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.OptionFixNestedTags = true;
        doc.Load("damn.html");

        // First we find the nodes we want to collect data from. Unlike your code, which takes the
        // InnerHtml of the whole list, this selects each <li> (one per user) under the favorites list.
        // The XPath could be shortened, but it is spelled out step by step for simplicity.
        HtmlNodeCollection favoritesContent = doc.DocumentNode.SelectNodes("//div[@id='favoritesContent']/div[@class='personListWrapper']/div[@class='gamerList']/ul//li");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (favoritesContent == null)
        {
            throw new InvalidOperationException("No <li> nodes matched the favorites XPath.");
        }

        foreach (HtmlNode x in favoritesContent)
        {
            // The gamertag is an attribute of <li>; if the attribute is missing,
            // the default value "" (empty string) is returned instead.
            string gamerTag = x.GetAttributeValue("data-gamertag", "");
            HtmlNode temp = x.SelectSingleNode("./a[@class='gamerpicWrapper']/*/img[@class='favorite']");
            string srcOnPic = temp != null ? temp.GetAttributeValue("src", "not found") : "not found";
            // SelectSingleNode returns null when nothing matches, so fall back to "".
            string realName = x.SelectSingleNode("./descendant::*//div[@class='realName']")?.InnerText ?? "";
            string primaryInfo = x.SelectSingleNode("./descendant::*//div[@class='primaryInfo']")?.InnerText ?? "";

            // Assigned at loop scope so the value can be used after the check;
            // the user counts as online when the statusIcon div has any non-whitespace content.
            HtmlNode status = x.SelectSingleNode("./div[@class='statusIcon']");
            bool online = status != null && status.InnerHtml.Trim().Length > 0;
        }
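
To cover the remaining step (both groups, then JSON), the extraction can be wrapped in a helper that takes the container id and is called once for favoritesContent and once for friendsContent. This is a minimal sketch, not the answerer's code: GamerScraper, ExtractUsers, TextOf and ToJson are hypothetical names, the relaxed XPath expressions (any li with a data-gamertag attribute, any img under gamerpicWrapper) are assumptions about the page structure, and System.Text.Json is assumed to be available alongside the HtmlAgilityPack NuGet package.

```csharp
using System.Collections.Generic;
using System.Text.Json;
using HtmlAgilityPack;

public static class GamerScraper
{
    // Collects one dictionary per <li data-gamertag="..."> found under the
    // element with the given id ("favoritesContent" or "friendsContent").
    public static List<Dictionary<string, object>> ExtractUsers(HtmlDocument doc, string containerId)
    {
        var users = new List<Dictionary<string, object>>();
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes(
            "//div[@id='" + containerId + "']//li[@data-gamertag]");
        if (items == null) return users; // SelectNodes returns null when nothing matches

        foreach (HtmlNode li in items)
        {
            // Assumption: the gamerpic is an <img> somewhere under gamerpicWrapper.
            HtmlNode img = li.SelectSingleNode(".//a[@class='gamerpicWrapper']//img");
            HtmlNode status = li.SelectSingleNode(".//div[@class='statusIcon']");
            users.Add(new Dictionary<string, object>
            {
                ["gamertag"] = li.GetAttributeValue("data-gamertag", ""),
                ["gamerpic"] = img == null ? "" : img.GetAttributeValue("src", ""),
                ["realname"] = TextOf(li, "realName"),
                ["primaryInfo"] = TextOf(li, "primaryInfo"),
                // Online when the statusIcon div contains anything besides whitespace.
                ["isOnline"] = status != null && status.InnerHtml.Trim().Length > 0
            });
        }
        return users;
    }

    // Returns the trimmed text of the first descendant div with the given class, or "".
    private static string TextOf(HtmlNode li, string cssClass)
    {
        HtmlNode node = li.SelectSingleNode(".//div[@class='" + cssClass + "']");
        return node == null ? "" : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }

    // Builds the { "favorites": [...], "friends": [...] } document as indented JSON.
    public static string ToJson(HtmlDocument doc)
    {
        var root = new Dictionary<string, object>
        {
            ["favorites"] = ExtractUsers(doc, "favoritesContent"),
            ["friends"] = ExtractUsers(doc, "friendsContent")
        };
        return JsonSerializer.Serialize(root, new JsonSerializerOptions { WriteIndented = true });
    }
}
```

After doc.Load has read the saved page, GamerScraper.ToJson(doc) returns the JSON text, which can be written out with File.WriteAllText. In a Windows Forms project the document type has to be written as HtmlAgilityPack.HtmlDocument, because System.Windows.Forms also defines an HtmlDocument class.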
