HtmlAgilityPack: obtaining the title and meta description


Question

I am trying to practice with HtmlAgilityPack, but I am running into some issues. Here is what I have coded; I cannot correctly get the title and the description of a web page. Could someone point out my mistake? :)

...
public static void Main(string[] args)
    {
        string link = null;
        string str;
        string answer;

        int curloc; // holds current location in response 
        string url = "http://stackoverflow.com/";

        try
        {

            do
            {
                HttpWebRequest HttpWReq = (HttpWebRequest)WebRequest.Create(url);
                HttpWReq.UserAgent = @"Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5";
                HttpWebResponse HttpWResp = (HttpWebResponse)HttpWReq.GetResponse();
                //url = null; // disallow further use of this URI 
                Stream istrm = HttpWResp.GetResponseStream();
                // Wrap the input stream in a StreamReader. 
                StreamReader rdr = new StreamReader(istrm);

                // Read in the entire page. 
                str = rdr.ReadToEnd();

                curloc = 0;
                //WebPage result;
                do
                {
                    // Find the next URI to link to. 
                    link = FindLink(str, ref curloc); //return the good link
                    Console.WriteLine("Title found: " + curloc);
                    //title = Title(str, ref curloc);

                    if (link != null)
                    {
                        Console.WriteLine("Link found: " + link);
                        using (System.Net.WebClient client = new System.Net.WebClient())
                        {
                            HtmlDocument htmlDoc = new HtmlDocument();
                            var html = client.DownloadString(url);
                            htmlDoc.LoadHtml(link); // load into HtmlAgilityPack
                            var htmlElement = htmlDoc.DocumentNode.Element("html");

                            HtmlNode node = htmlDoc.DocumentNode.SelectSingleNode("//meta[@name='description']");
                            if (node != null)
                            {
                                string desc = node.GetAttributeValue("content", "");
                                Console.Write("DESCRIPTION: " + desc);
                            }
                            else
                            {
                                Console.WriteLine("No description");
                            }

                            var titleElement =
                                                htmlDoc.DocumentNode
                                                   .Element("html")
                                                   .Element("head")
                                                   .Element("title");
                            if (titleElement != null)
                            {
                                string title = titleElement.InnerText;
                                Console.WriteLine("Titre: {0}", title);
                            }
                            else
                            {
                                Console.WriteLine("no Title");
                            }
                            Console.Write("Done");
                        }
                        Console.Write("Link, More, Quit?");
                        answer = Console.ReadLine();
                    }
                    else
                    {
                        Console.WriteLine("No link found.");
                        break;
                    }
                } while (link.Length > 0);

                // Close the Response.
                HttpWResp.Close();
            } while (url != null); 
        }
catch{ ...}

Thanks in advance :)

Recommended answer

Do it this way:

HtmlNode mdnode = htmlDoc.DocumentNode.SelectSingleNode("//meta[@name='description']");

if (mdnode != null)
{
    // Attributes["content"] is null when the attribute is absent, so check before reading Value
    HtmlAttribute desc = mdnode.Attributes["content"];
    if (desc != null)
    {
        string fulldescription = desc.Value;
        Console.Write("DESCRIPTION: " + fulldescription);
    }
}
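
A note on the question's code: it downloads `url` into `html`, but then calls `htmlDoc.LoadHtml(link)`. `LoadHtml` parses its argument as markup, so passing the link text means the document contains neither a title nor a meta tag; this is probably why nothing is found. The sketch below is one possible way to put the pieces together, not the original poster's final code. The class name `PageInfo` and the methods `Extract` and `FromUrl` are hypothetical names introduced for illustration, and it assumes the HtmlAgilityPack NuGet package is referenced and a C# 7 or later compiler.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper (not part of HtmlAgilityPack) for illustration only.
public static class PageInfo
{
    // Parses an HTML string and returns the page title and meta description.
    // Either value is null when the corresponding element is missing.
    public static (string Title, string Description) Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // expects markup, not a URL

        // First <title> element in document order.
        HtmlNode titleNode = doc.DocumentNode.SelectSingleNode("//title");
        HtmlNode metaNode = doc.DocumentNode.SelectSingleNode("//meta[@name='description']");

        string title = titleNode != null
            ? HtmlEntity.DeEntitize(titleNode.InnerText).Trim()
            : null;
        string description = metaNode != null
            ? metaNode.GetAttributeValue("content", string.Empty)
            : null;

        return (title, description);
    }

    // Downloads the page at the given link, then parses the downloaded markup.
    public static (string Title, string Description) FromUrl(string link)
    {
        using (var client = new WebClient())
        {
            return Extract(client.DownloadString(link));
        }
    }
}
```

Inside the question's loop, `var info = PageInfo.FromUrl(link);` would replace the `WebClient` block, with `info.Title` and `info.Description` checked for null before printing. Using XPath for the title avoids the `NullReferenceException` that the chained `.Element("html").Element("head").Element("title")` call raises when the page has no `html` or `head` element.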
