Parsing tables and cells with Html Agility Pack in C#

Problem description

I need to parse HTML code. More specifically, I need to parse each cell of every row in all tables. Each row represents a single object and each cell represents a different property. I want to parse these so that I can write an XML file containing all the data (without the useless HTML code). I have successfully been able to parse each column from the HTML file, but now I don't know what my options are for writing this to an XML file. I am baffled.

HTML:

<tr><tr> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF"> 
        1
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="left"> 
        <a href="/ice/player.htm?id=8471675">Sidney Crosby</a> 
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center"> 
        PIT
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="center"> 
        C
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        39
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        32
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        33
    </td> 
    <td class="statBox sorted" style="border-width:0px 1px 1px 0px; background-color: #E0E0E0" align="right"> 
        <font color="#000000"> 
            65
        </font> 
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        20
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        29
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        10
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        1
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        3
    </td> 
    <td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right"> 
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        0
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        154
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        20.8
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        21:54
    </td> 
    <td class="statBox" style="border-width:0px 1px 1px 0px; background-color: #FFFFFF" align="right"> 
        22.6
    </td> 
    <td class="statBox" style="border-width:0px 0px 1px 0px; background-color: #FFFFFF" align="right"> 
        55.7
    </td> 
</tr></tr>
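In this row, the cell holding 65 has class="statBox sorted" instead of class="statBox". An XPath test like @class='statBox' compares the whole attribute string, so it does not select that cell. A token-style test selects every cell whose class list includes statBox. The sketch below shows the difference using System.Xml's XmlDocument on a trimmed, well-formed copy of the row, so it runs without HtmlAgilityPack.

It assumes the same XPath 1.0 expression can be passed unchanged to HtmlAgilityPack's SelectNodes, which should hold because that library evaluates XPath through .NET's XPath engine. The class and member names are hypothetical.

```csharp
using System;
using System.Xml;

static class ClassPredicateSketch
{
    // Selects only cells whose class attribute is exactly "statBox".
    public const string ExactMatch = ".//td[@class='statBox']";

    // Selects cells whose space-separated class list contains the token "statBox",
    // so class="statBox sorted" matches too.
    public const string TokenMatch =
        ".//td[contains(concat(' ', normalize-space(@class), ' '), ' statBox ')]";

    // Loads a well-formed fragment and counts the nodes the XPath selects under its root.
    public static int Count(string xml, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.DocumentElement.SelectNodes(xpath).Count;
    }

    static void Main()
    {
        // Trimmed-down, well-formed copy of three cells from the sample row (styles removed).
        string row =
            "<tr><tr>" +
            "<td class='statBox'>33</td>" +
            "<td class='statBox sorted'><font color='#000000'>65</font></td>" +
            "<td class='statBox'>20</td>" +
            "</tr></tr>";

        Console.WriteLine(Count(row, ExactMatch)); // prints 2
        Console.WriteLine(Count(row, TokenMatch)); // prints 3
    }
}
```

A shorter contains(@class, 'statBox') would also select the sorted cell. However, it would also match a hypothetical class such as statBoxHeader. Padding the attribute and the token with spaces avoids that.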

C#:

using System;
using HtmlAgilityPack;

namespace Stats
{
    class StatsParser
    {
        private string htmlCode;
        private static string fileName = "[" + DateTime.Now.ToShortDateString() + " NHL Stats].xml";

        public StatsParser(string htmlCode)
        {
            this.htmlCode = htmlCode;
            this.ParseHtml();
        }

        public void ParseHtml()
        {
            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(htmlCode);

            // Get all tables in the document (SelectNodes returns null when nothing matches)
            HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");
            if (tables == null)
                return;

            // Iterate all rows in the first table
            HtmlNodeCollection rows = tables[0].SelectNodes(".//tr");
            if (rows == null)
                return;

            for (int i = 0; i < rows.Count; ++i)
            {
                // Iterate all columns in this row; rows without statBox cells are skipped
                HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='statBox']");
                if (cols == null)
                    continue;

                for (int j = 0; j < cols.Count; ++j)
                {
                    // Get the value of the column without surrounding whitespace and print it
                    string value = cols[j].InnerText.Trim();
                    if (value != "")
                        System.Windows.MessageBox.Show(value);
                }
            }
        }
    }
}

Desired XML:

<?xml version="1.0" encoding="utf-8" ?>

<Stats Date="2011-01-01">
  <Player Rank="1">
    <Name>Sidney Crosby</Name>
    <Team>PIT</Team>
    <Position>C</Position>
    <GamesPlayed>39</GamesPlayed>
    <Goals>32</Goals>
    <Assists>33</Assists>
  </Player>
</Stats>
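One option for writing the parsed values is LINQ to XML (System.Xml.Linq), which builds the tree in memory and saves it in one call. This is a minimal sketch, not the asker's code. It assumes the cell texts have already been extracted and trimmed, for example with cols[j].InnerText.Trim() as in the code above, and that the columns follow the sample row's order. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

static class DesiredXmlSketch
{
    // Builds one <Player> element from a row's trimmed cell texts.
    // ASSUMES the sample row's column order: Rank, Name, Team, Position, GP, G, A.
    public static XElement ToPlayer(string[] cells)
    {
        return new XElement("Player",
            new XAttribute("Rank", cells[0]),
            new XElement("Name", cells[1]),
            new XElement("Team", cells[2]),
            new XElement("Position", cells[3]),
            new XElement("GamesPlayed", cells[4]),
            new XElement("Goals", cells[5]),
            new XElement("Assists", cells[6]));
    }

    // Wraps the players in <Stats Date="..."> under a UTF-8 XML declaration.
    public static XDocument ToStats(string date, IEnumerable<string[]> rows)
    {
        var stats = new XElement("Stats", new XAttribute("Date", date));
        foreach (string[] row in rows)
            stats.Add(ToPlayer(row));
        return new XDocument(new XDeclaration("1.0", "utf-8", null), stats);
    }

    static void Main()
    {
        string[] crosby = { "1", "Sidney Crosby", "PIT", "C", "39", "32", "33" };
        XDocument doc = ToStats("2011-01-01", new[] { crosby });
        // XDocument.ToString() leaves out the declaration, so print it separately;
        // doc.Save(path) would write both to a file.
        Console.WriteLine(doc.Declaration);
        Console.WriteLine(doc);
    }
}
```

Compared with XmlWriter, this keeps the element structure visible in one expression. The trade-off is that the whole document is held in memory, which should not matter for a single stats table.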

Solution

After looking around MSDN, I finally found a solution to my problem using XmlWriter:

    using System;
    using System.Xml;
    using HtmlAgilityPack;

    namespace HockeyStats
    {
        class StatsParser
        {
            private string htmlCode;
            // "yyyy-MM-dd" avoids culture-specific date separators such as '/', which are not allowed in file names
            private static string fileName = "[" + DateTime.Now.ToString("yyyy-MM-dd") + " NHL Stats].xml";

            public StatsParser(string htmlCode)
            {
                this.htmlCode = htmlCode;

                this.ParseHtml();
            }

            public void ParseHtml()
            {
                HtmlDocument doc = new HtmlDocument();
                doc.LoadHtml(htmlCode);

                // Create an XmlWriterSettings object with the correct options.
                XmlWriterSettings settings = new XmlWriterSettings();
                settings.Indent = true;
                settings.IndentChars = "  ";
                settings.OmitXmlDeclaration = false;

                // Create the XmlWriter; the using block closes it even if an exception is thrown.
                using (XmlWriter writer = XmlWriter.Create(@"..\..\" + fileName, settings))
                {
                    writer.WriteStartElement("Stats");
                    writer.WriteAttributeString("Date", DateTime.Now.ToString("yyyy-MM-dd"));

                    // Iterate all rows nested within another row (the page nests <tr> inside <tr>)
                    HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(".//tr/tr");
                    if (rows != null)
                    {
                        for (int i = 0; i < rows.Count; ++i)
                        {
                            // Iterate all columns in this row; skip rows without statBox cells
                            HtmlNodeCollection cols = rows[i].SelectNodes(".//td[@class='statBox']");
                            if (cols == null)
                                continue;

                            for (int j = 0; j < cols.Count; ++j)
                            {
                                // Trim the whitespace and newlines around every cell's text
                                string value = cols[j].InnerText.Trim();
                                switch (j)
                                {
                                    case 0:
                                        writer.WriteStartElement("Player");
                                        writer.WriteAttributeString("Rank", value);
                                        break;
                                    case 1: writer.WriteElementString("Name", value); break;
                                    case 2: writer.WriteElementString("Team", value); break;
                                    case 3: writer.WriteElementString("Pos", value); break;
                                    case 4: writer.WriteElementString("GP", value); break;
                                    case 5: writer.WriteElementString("G", value); break;
                                    case 6: writer.WriteElementString("A", value); break;
                                    case 7: writer.WriteElementString("PlusMinus", value); break;
                                    case 8: writer.WriteElementString("PIM", value); break;
                                    case 9: writer.WriteElementString("PP", value); break;
                                    case 10: writer.WriteElementString("SH", value); break;
                                    case 11: writer.WriteElementString("GW", value); break;
                                    case 12: writer.WriteElementString("OT", value); break;
                                    case 13: writer.WriteElementString("Shots", value); break;
                                    case 14: writer.WriteElementString("ShotPctg", value); break;
                                    case 15: writer.WriteElementString("TOIPerGame", value); break;
                                    case 16: writer.WriteElementString("ShiftsPerGame", value); break;
                                    case 17: writer.WriteElementString("FOWinPctg", value); break;
                                }
                            }
                            writer.WriteEndElement(); // closes <Player>
                        }
                    }
                    writer.WriteEndElement(); // closes <Stats>
                }
            }
        }
    }

which produces the following XML file:

<?xml version="1.0" encoding="utf-8" ?> 
<Stats Date="2011-01-01">
 <Player Rank="1">
  <Name>Sidney Crosby</Name> 
  <Team>PIT</Team> 
  <Pos>C</Pos> 
  <GP>39</GP> 
  <G>32</G> 
  <A>33</A> 
  <PlusMinus>20</PlusMinus> 
  <PIM>29</PIM> 
  <PP>10</PP> 
  <SH>1</SH> 
  <GW>3</GW> 
  <Shots>0</Shots> 
  <ShotPctg>154</ShotPctg> 
  <TOIPerGame>20.8</TOIPerGame> 
  <ShiftsPerGame>21:54</ShiftsPerGame> 
  <FOWinPctg>22.6</FOWinPctg> 
 </Player>
</Stats>
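The switch maps each column index to an element name one case at a time. A lookup array expresses the same mapping in one place, so a changed column layout means editing only that array. This is a minimal sketch with hypothetical names, and its mapping is an ASSUMPTION inferred from the sample values, not something the page confirms.

The output above shows Shots as 0 and ShotPctg as 154, but 32 goals on 154 shots is 20.8%, the value that landed in TOIPerGame. That suggests the empty cell after the 3 (GW) carries no data and pushes the later names one position off. The array below therefore skips that cell (a null entry) and shifts the remaining names. The 65 is not among these cells, because td[@class='statBox'] does not match its class="statBox sorted".

```csharp
using System;
using System.IO;
using System.Xml;

static class ColumnTableSketch
{
    // ASSUMED mapping for the 19 cells matched by td[@class='statBox'] in the sample row.
    // Index 0 becomes the Rank attribute; null entries (index 0 and the empty
    // cell after GW) are not written as elements.
    public static readonly string[] ElementNames =
    {
        null, "Name", "Team", "Pos", "GP", "G", "A", "PlusMinus", "PIM", "PP",
        "SH", "GW", null, "OT", "Shots", "ShotPctg", "TOIPerGame", "ShiftsPerGame", "FOWinPctg"
    };

    // Writes one <Player> element; cells beyond the end of the table are ignored.
    public static void WritePlayer(XmlWriter writer, string[] cells)
    {
        writer.WriteStartElement("Player");
        writer.WriteAttributeString("Rank", cells[0].Trim());
        for (int j = 1; j < cells.Length && j < ElementNames.Length; ++j)
        {
            if (ElementNames[j] == null)
                continue; // column assumed to carry no data
            writer.WriteElementString(ElementNames[j], cells[j].Trim());
        }
        writer.WriteEndElement(); // </Player>
    }

    // Writes a <Stats> document to a string; the declaration is omitted because a
    // StringWriter would make it say encoding="utf-16".
    public static string WriteStats(string date, string[][] rows)
    {
        var settings = new XmlWriterSettings { Indent = true, IndentChars = "  ", OmitXmlDeclaration = true };
        var output = new StringWriter();
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            writer.WriteStartElement("Stats");
            writer.WriteAttributeString("Date", date);
            foreach (string[] row in rows)
                WritePlayer(writer, row);
            writer.WriteEndElement(); // </Stats>
        }
        return output.ToString();
    }

    static void Main()
    {
        // Trimmed cell texts of the sample row, as matched by td[@class='statBox'].
        string[] crosby =
        {
            "1", "Sidney Crosby", "PIT", "C", "39", "32", "33", "20", "29", "10",
            "1", "3", "", "0", "154", "20.8", "21:54", "22.6", "55.7"
        };
        Console.WriteLine(WriteStats("2011-01-01", new[] { crosby }));
    }
}
```

In ParseHtml, the same WritePlayer call could replace the switch by passing it each row's cell texts. Writing to a file instead of a string works by giving XmlWriter.Create a path, as the solution does.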
