HtmlAgilityPack doc.LoadHtml can't get the whole HTML string

Views: 99 Published: 2019/6/11 13:36:56 C#


Problem description

Hello,

I'm trying to parse the page below:

The webpage I'm trying to parse [^]

When I download the HTML string using WebRequest, I don't get the whole HTML, so I can't parse the content part of the page.

Can anybody help me?

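using System.Net;
using HtmlAgilityPack;

private void get_cotents(string contents_url)
{
    string title = "";
    string contents = "";

    // Download the raw HTML as returned by the server.
    string html;
    using (WebClient client = new WebClient())
    {
        html = client.DownloadString(contents_url);
    }

    HtmlAgilityPack.HtmlDocument mydoc = new HtmlAgilityPack.HtmlDocument();
    mydoc.LoadHtml(html);

    // Full markup as parsed, useful for checking what was actually downloaded.
    string str = mydoc.DocumentNode.InnerHtml;

    // An XPath step needs a node test: "//*[@id='...']", not "//[@id='...']".
    var titleHeadline = mydoc.DocumentNode.SelectSingleNode("//*[@id='writeContents']");
    if (titleHeadline != null)
    {
        // SelectSingleNode returns null when nothing matches, so check before use.
        title = titleHeadline.InnerText;
    }

    contents = "I can't find the html code that has content";
}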

What I have tried:

I have tried getting the HTML string using both WebClient and HtmlWeb.

Recommended answer

I think your problem lies in getting the data stream. Here is an example adapted from a CodeProject article:

using System;
using System.IO;
using System.Net;

/// <summary>
/// Adapted from http://www.codeproject.com/Articles/18034/HttpWebRequest-Response-in-a-Nutshell-Part
/// </summary>
/// <param name="contents_url">The URL string.</param>
private static void get_cotents(string contents_url)
{
    // Placeholder request body of 1024 zero bytes, as in the adapted example.
    byte[] buffer = new byte[1024];
    HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(contents_url);
    WebReq.Method = "POST";
    WebReq.ContentType = "application/x-www-form-urlencoded";
    WebReq.ContentLength = buffer.Length;

    // Write the request body; the using block closes the stream, which is always important.
    using (Stream PostData = WebReq.GetRequestStream())
    {
        PostData.Write(buffer, 0, buffer.Length);
    }

    // Get the response handle; the body has not been read yet.
    using (HttpWebResponse WebResp = (HttpWebResponse)WebReq.GetResponse())
    {
        // Show some information about the response.
        Console.WriteLine(WebResp.StatusCode);
        Console.WriteLine(WebResp.Server);

        // Read the response body as a string and output it.
        using (Stream datastream = WebResp.GetResponseStream())
        using (StreamReader answer = new StreamReader(datastream))
        {
            Console.WriteLine(answer.ReadToEnd());
        }
    }
}

I think you can finish the rest of the code yourself...
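
For readers who want a starting point for "the rest of the code", here is a minimal sketch of feeding a response stream into HtmlAgilityPack. It is not the answerer's code: the class and method names (`PageLoader`, `ParseHtml`, `Download`) are hypothetical, it assumes the page can be fetched with an ordinary GET request, and it assumes the page is UTF-8 encoded.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original answer.
public static class PageLoader
{
    // Parses HTML from any stream; kept separate so it can be
    // exercised with a MemoryStream instead of a network call.
    public static HtmlDocument ParseHtml(Stream htmlStream)
    {
        var doc = new HtmlDocument();
        // Assumption: the page is UTF-8; pass another Encoding if it is not.
        doc.Load(htmlStream, Encoding.UTF8);
        return doc;
    }

    // Issues a plain GET request and parses the response body.
    public static HtmlDocument Download(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";

        using (var response = (HttpWebResponse)request.GetResponse())
        using (Stream body = response.GetResponseStream())
        {
            // The document is fully parsed before the stream is disposed.
            return ParseHtml(body);
        }
    }
}
```

A call such as `HtmlDocument doc = PageLoader.Download(contents_url);` would then give a document to query with `SelectSingleNode`. If the content is injected by JavaScript after the page loads, it will not appear in the downloaded markup regardless of how the request is made.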

The problem was in searching for the content div's id.

It seems the website hides the content area's id.

I solved it with the following XPath:

HtmlNode node = mydoc.DocumentNode.SelectSingleNode("//@id[.='sub_wkb_layout']");

Thank you all, and thank you CodeProject.

I love this site :)
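
The XPath above addresses the `id` attribute; a more conventional way to write the same lookup is `//*[@id='sub_wkb_layout']`, which selects the element that carries the id. The sketch below shows that form with a null check. The class `ContentFinder`, the method `FindTextById`, and the sample markup are hypothetical stand-ins, since the real page is not shown in the post.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper for illustration only.
public static class ContentFinder
{
    // Returns the inner text of the first element with the given id,
    // or null when no element matches.
    public static string FindTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//*[@id='...']" matches any element whose id attribute equals the value.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='" + id + "']");

        // SelectSingleNode returns null when nothing matches.
        return node == null ? null : node.InnerText;
    }

    public static void Main()
    {
        // Assumed markup; the structure of the real page is unknown.
        string html = "<html><body><div id='sub_wkb_layout'>Article text</div></body></html>";

        Console.WriteLine(FindTextById(html, "sub_wkb_layout") ?? "(not found)");
        Console.WriteLine(FindTextById(html, "writeContents") ?? "(not found)");
    }
}
```

Checking for null before reading `InnerText` avoids the `NullReferenceException` that occurs when the expected id is missing from the downloaded markup, which is the situation described in the question.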
