HtmlAgilityPack query returning no value

Question

I've been struggling with this for two days. I'm using C# and HtmlAgilityPack in a .NET 4.5 WinForms project to extract data from a website (the fields I want are $ Flow and B/S ratio). I reach the field, but I get "flow: \n\t\t\t" instead of "flow 245 M", i.e. no value. I have no idea why my query returns no value even though I can see the value in the web page. I'd like to know whether anyone can see why my query returns nodes = null. This is the URL of the queried web page: http://finance.avafin.com/tradeFlow?type=BS_RATIO&date=06%2F14%2F2013&alertId=0&symbol=spy&sectorId=0&industryId=0

I pass the URL above to my query method.

Note that I used the method below, with a different query, on another web page and it worked. So either something is wrong with the current query, or I suspect the field on this web page is obfuscated.

Method used:

/// <summary>
///     Gets the $ Flow and B/S ratio values from the tf_data table.
/// </summary>
/// <param name="url"> The URL of the page to query. </param>
/// <returns> The extracted values, or a single "Ticker not available" entry when the query matches nothing. </returns>
public List<string> GetFlowData(string url)
{
    // ('//a[contains(@href, "genre")]')
    // <td class=" sorting_1">137.27B</td>
    // //*[@id="tf_data"]/tbody/tr[1]/td[8]  => XPath the browser shows for the first value; used as a query it returns nothing (nodes = null)
    // //*[@id="tf_data"]/tbody/tr[1]/td[9]  => XPath the browser shows for the second value; used as a query it returns nothing (nodes = null)
    // //td[@class='']  => nodes = null too

    // With //*[@id='tf_data']/tbody I do reach the node in the body, but its text is "\n\t\t\t" instead of the B/S ratio.
    // LoadHtmlDoc (helper not shown in the post) loads the page and applies the XPath query.
    var nodes = LoadHtmlDoc(url, "//*[@id='tf_data']/tbody");
    List<string> tickers = new List<string>();
    if (nodes == null)
    {
        return new List<string> { "Ticker not available" };
    }
    int i = 0;
    foreach (var v in nodes)
    {
        i++;

        // Debug output: show each matched node's text and its position.
        MessageBox.Show(v.InnerText + " " + i.ToString());
        //// The placement of the data containing bought/sold ratio
        //if (i == 7)
        //{
        //    tickers.Add(v.InnerText);
        //}
        //// The placement of the data containing $ Flow
        //if (i == 8)
        //{
        //    tickers.Add(CleanFlowData(v.InnerText));
        //}
    }

    return tickers;
}
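The method above calls a CleanFlowData helper that the post never shows. Once real cell text such as 137.27B or 245 M is available, a helper along these lines could turn it into a number. This is a hypothetical stand-in, not the original helper: the name ParseAbbreviatedAmount, the K/M/B suffix handling and returning null for blank or unparseable cells are all assumptions.

```csharp
using System;
using System.Globalization;

public static class FlowParsing
{
    // Hypothetical stand-in for the CleanFlowData helper referenced in the question.
    // Converts display strings such as "137.27B", "245 M" or "$1.5K" to a decimal.
    // Returns null for blank cells (e.g. "\n\t\t\t") or text it cannot parse.
    public static decimal? ParseAbbreviatedAmount(string raw)
    {
        if (string.IsNullOrWhiteSpace(raw)) return null;

        // Drop spaces, currency sign and thousands separators: "$ 1,245 M" -> "1245M".
        string s = raw.Trim().Replace(" ", "").Replace("$", "").Replace(",", "");
        if (s.Length == 0) return null;

        // Map an optional trailing K/M/B suffix to its multiplier.
        decimal multiplier = 1m;
        switch (char.ToUpperInvariant(s[s.Length - 1]))
        {
            case 'K': multiplier = 1000m; break;
            case 'M': multiplier = 1000000m; break;
            case 'B': multiplier = 1000000000m; break;
        }
        if (multiplier != 1m) s = s.Substring(0, s.Length - 1);

        // Parse the remaining number independently of the user's regional settings.
        decimal value;
        if (!decimal.TryParse(s, NumberStyles.Number, CultureInfo.InvariantCulture, out value))
            return null;
        return value * multiplier;
    }
}
```

Returning null for whitespace-only text makes the symptom described in the question (a cell whose text is \n\t\t\t) easy to detect instead of silently adding an empty entry.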

Answer

The page you are querying does not contain any data in the table with id tf_data. If you examine the page markup, you'll see:

<table cellpadding="0" cellspacing="0" border="0" class="display" id="tf_data">
    <thead>
        <tr height="10">
            <th align="center"></th>
            <th align="center" width="90">CHART</th>
            <th align="left" width="70">SYMBOL</th>
            <th align="left">MARKET CAP</th>
            <th align="right" width="65">PRICE</th>
            <th align="center" width="80">CHANGE</th>
            <th align="right">VOL</th>
            <th align="right">B/S RATIO</th>
            <th align="right" width="80">NET CASH FLOW</th>
        </tr>
    </thead>
    <tbody> <!-- empty! -->
    </tbody>
</table>

All the data is added to this table by the browser via JavaScript after the document has loaded (see the $(document).ready function). So the HTML you download from that URL contains no rows, because the rows only appear once a browser runs that JavaScript code. In other words, there is nothing for you to parse.
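One quick way to confirm this before reaching for HtmlAgilityPack is to inspect the raw string the server returns. The sketch below is a regex-based diagnostic, not an HTML parser, and the helper name TableHasRows is hypothetical; it only asks whether the <tbody> of the table with a given id contains any <tr> element.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MarkupCheck
{
    // Hypothetical diagnostic, NOT an HTML parser: reports whether the <tbody> of the
    // table with the given id contains at least one <tr> in the raw markup.
    public static bool TableHasRows(string html, string tableId)
    {
        const RegexOptions opts = RegexOptions.Singleline | RegexOptions.IgnoreCase;

        // Everything between <table ... id="tableId" ...> and the next </table>.
        Match table = Regex.Match(html,
            @"<table\b[^>]*\bid\s*=\s*[""']" + Regex.Escape(tableId) + @"[""'][^>]*>(.*?)</table>", opts);
        if (!table.Success) return false;

        // Contents of that table's <tbody>, if the server sent one.
        Match body = Regex.Match(table.Groups[1].Value, @"<tbody\b[^>]*>(.*?)</tbody>", opts);
        if (!body.Success) return false;

        // Rows in <thead> are ignored; only rows inside <tbody> count.
        return Regex.IsMatch(body.Groups[1].Value, @"<tr\b", opts);
    }
}
```

Run against the HTML downloaded from the URL in the question, it should report false for tf_data if the rows really are inserted by script, matching the empty <tbody> shown in the markup.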

I suggest you examine the script that loads the JSON data into the page, and simply call the same service from your code.

It's outside the scope of the question, but to retrieve the data you can use the HttpClient class from the System.Net.Http assembly. Here is a usage sample (it's up to you to work out how the query string should be composed):

using System;
using System.Net.Http;
HttpClient client = new HttpClient();
client.BaseAddress = new Uri("http://finance.avafin.com");
// Relative URL of the data service the page's script calls, with the table's paging/sorting/filter parameters
string url = "data?sEcho=2&iColumns=9&sColumns=&iDisplayStart=0&iDisplayLength=20&mDataProp_0=0&mDataProp_1=1&mDataProp_2=2&mDataProp_3=3&mDataProp_4=4&mDataProp_5=5&mDataProp_6=6&mDataProp_7=7&mDataProp_8=8&sSearch=&bRegex=false&sSearch_0=&bRegex_0=false&bSearchable_0=true&sSearch_1=&bRegex_1=false&bSearchable_1=true&sSearch_2=&bRegex_2=false&bSearchable_2=true&sSearch_3=&bRegex_3=false&bSearchable_3=true&sSearch_4=&bRegex_4=false&bSearchable_4=true&sSearch_5=&bRegex_5=false&bSearchable_5=true&sSearch_6=&bRegex_6=false&bSearchable_6=true&sSearch_7=&bRegex_7=false&bSearchable_7=true&sSearch_8=&bRegex_8=false&bSearchable_8=true&iSortCol_0=4&sSortDir_0=asc&iSortingCols=1&bSortable_0=true&bSortable_1=true&bSortable_2=true&bSortable_3=true&bSortable_4=true&bSortable_5=true&bSortable_6=true&bSortable_7=true&bSortable_8=true&type=BS_RATIO&date=06%2F14%2F2013&categoryName=&alertId=0&alertId2=&industryId=0&sectorId=0&symbol=spy&recom=&period=&perfPercent=";
// .Result blocks the calling thread; in a WinForms event handler prefer: var response = await client.GetStringAsync(url);
var response = client.GetStringAsync(url).Result;

The response will contain HTML that you can parse.
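The query string in the HttpClient sample is long and repetitive. Its parameter names (sEcho, iDisplayStart, mDataProp_N, bSearchable_N and so on) look like those sent by the legacy DataTables jQuery plugin in server-side processing mode, so, assuming that is what the page uses, they could be generated instead of hard-coded. The DataQuery.Build helper below is a hypothetical sketch that reproduces the captured request; whether the service still accepts these parameters is untested.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DataQuery
{
    // Hypothetical helper: rebuilds the DataTables-style query string seen in the sample request.
    // Parameter names are copied from that request; page-specific filters are passed in as-is.
    public static string Build(int echo, int columns, int start, int length, int sortColumn,
                               IEnumerable<KeyValuePair<string, string>> filters)
    {
        var p = new List<KeyValuePair<string, string>>
        {
            Kv("sEcho", echo.ToString()),
            Kv("iColumns", columns.ToString()),
            Kv("sColumns", ""),
            Kv("iDisplayStart", start.ToString()),   // index of the first row to return
            Kv("iDisplayLength", length.ToString())  // number of rows per request
        };
        for (int i = 0; i < columns; i++) p.Add(Kv("mDataProp_" + i, i.ToString()));

        // Global search, then per-column search flags.
        p.Add(Kv("sSearch", ""));
        p.Add(Kv("bRegex", "false"));
        for (int i = 0; i < columns; i++)
        {
            p.Add(Kv("sSearch_" + i, ""));
            p.Add(Kv("bRegex_" + i, "false"));
            p.Add(Kv("bSearchable_" + i, "true"));
        }

        // Single-column ascending sort, then per-column sortable flags.
        p.Add(Kv("iSortCol_0", sortColumn.ToString()));
        p.Add(Kv("sSortDir_0", "asc"));
        p.Add(Kv("iSortingCols", "1"));
        for (int i = 0; i < columns; i++) p.Add(Kv("bSortable_" + i, "true"));

        // Page-specific filters (type, date, symbol, ...) keep the caller's order.
        p.AddRange(filters);

        // Uri.EscapeDataString turns "06/14/2013" into "06%2F14%2F2013".
        return "data?" + string.Join("&", p.Select(kv =>
            Uri.EscapeDataString(kv.Key) + "=" + Uri.EscapeDataString(kv.Value)));
    }

    private static KeyValuePair<string, string> Kv(string key, string value)
    {
        return new KeyValuePair<string, string>(key, value);
    }
}
```

Paging through results would then mean changing start (iDisplayStart) while keeping length (iDisplayLength) fixed, assuming the server honours these parameters.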
