Parsing Financial Information from HTML


Question

This is my first attempt at learning to work with HTML in Visual Studio and C#. I am using the Html Agility Pack library to do the parsing.

From this page I am attempting to pull out the numbers from the "Net Income" row for each quarter.

Here is my current progress, but I am uncertain how to proceed further:

        String url = "http://www.google.com/finance?q=NASDAQ:TXN&fstype=ii";
        var webGet = new HtmlWeb();
        var document = webGet.Load(url);
        var body = document.DocumentNode.Descendants()
                            .Where(n => n.Name == "body")
                            .FirstOrDefault();

        if (body != null)
        {

        }


Recommended Answer

First of all, there is no need to get the body first; you can query the document directly for what you want. As for finding the values you are looking for, this is how you could do it:

HtmlNode tdNode = document.DocumentNode.DescendantNodes()
  .FirstOrDefault(n => n.Name == "td"
    && n.InnerText.Trim() == "Net Income");
if (tdNode != null)
{
  HtmlNode trNode = tdNode.ParentNode;
  foreach (HtmlNode node in trNode.DescendantNodes().Where(n => n.NodeType == HtmlNodeType.Element))
  {
    Console.WriteLine(node.InnerText.Trim());
  }
  // Output:
  // Net Income
  // 265.00
  // 298.00
  // 601.00
  // 672.00
  // 666.00
}

Also note the Trim calls: they are needed because there are newlines in the InnerText of some elements.
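If the goal is to work with the values as numbers rather than print them, the same idea can be extended roughly as follows. This is only a sketch under stated assumptions: the `SampleHtml` markup is hypothetical and merely mimics a row with a label cell followed by value cells (it is not the real page), and the class and method names `NetIncomeSketch` and `ExtractRowValues` are made up for illustration. It loads the markup with `HtmlDocument.LoadHtml` so it runs without network access, and it assumes the numbers use `.` as the decimal separator and `,` as the thousands separator.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class NetIncomeSketch
{
    // Hypothetical markup: a label cell followed by value cells,
    // with stray newlines around some of the cell text.
    public const string SampleHtml = @"
<table>
  <tr><td>Revenue</td><td>3,000.00</td><td>3,100.00</td></tr>
  <tr>
    <td>
      Net Income
    </td>
    <td>265.00</td>
    <td>
      298.00
    </td>
    <td>1,601.00</td>
  </tr>
</table>";

    // Returns the numeric cells of the first row whose label cell matches rowLabel.
    public static List<decimal> ExtractRowValues(string html, string rowLabel)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Find the label cell; Trim removes the surrounding newlines and spaces.
        HtmlNode labelCell = document.DocumentNode.Descendants("td")
            .FirstOrDefault(td => td.InnerText.Trim() == rowLabel);

        var values = new List<decimal>();
        if (labelCell == null)
        {
            return values; // label not found: return an empty list
        }

        // Walk only the direct <td> children of the row, skipping the label cell.
        foreach (HtmlNode cell in labelCell.ParentNode.Elements("td"))
        {
            if (cell == labelCell)
            {
                continue;
            }

            string text = cell.InnerText.Trim();
            // Cells that do not parse as a number are skipped silently.
            if (decimal.TryParse(text, NumberStyles.Number,
                                 CultureInfo.InvariantCulture, out decimal value))
            {
                values.Add(value);
            }
        }

        return values;
    }

    public static void Main()
    {
        foreach (decimal value in ExtractRowValues(SampleHtml, "Net Income"))
        {
            Console.WriteLine(value.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Iterating over `Elements("td")` instead of all descendants keeps nested elements inside a cell from being visited separately, and parsing with `CultureInfo.InvariantCulture` keeps the result independent of the machine's regional settings.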
