How to pull document files from a website using ASP.NET


Problem description

Hello dear friends. I want to get documents from a website: build an interface where I enter the URL of any website, and then fetch the documents from that site. Kindly help.

What I have tried:

using System;
using System.Net;
using System.IO;

public partial class Y : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        Text1.Text = GetWebSiteContents("http://localhost/WebSite1/X.aspx");
    }

    protected string GetWebSiteContents(string url) {
        WebRequest req = WebRequest.Create(url);
        // Get the stream from the returned web response
        StreamReader sr = new StreamReader(req.GetResponse().GetResponseStream());
        System.Text.StringBuilder sb = new System.Text.StringBuilder();
        string strLine;
        // Read the stream a line at a time and place each one into the stringbuilder
        while ((strLine = sr.ReadLine()) != null) {
            // Ignore blank lines
            if (strLine.Length > 0) sb.Append(strLine);
        }
        sr.Close();
        return sb.ToString();
    }
}

With this code I got the text of the page.

Solution

Hi,

The WebRequest class lets you get the HTML (source code) of the site.
Once you have the HTML, you can read its tags with the HtmlAgilityPack library.
Link: Html Agility Pack | HAP

Here is the code to read the HTML of the site:

private HtmlAgilityPack.HtmlDocument GetWebSiteContents(string url)
{
    // Form fields to post; empty here because nothing is submitted.
    string postString = "";

    HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(new Uri(url));

    // Our method is POST, otherwise the buffer (postString) would be useless.
    WebReq.Method = "POST";
    // We use the form content type for the posted variables.
    WebReq.ContentType = "application/x-www-form-urlencoded";

    WebReq.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";

    // The byte length of the buffer is used as the content length
    // (StreamWriter writes UTF-8 by default).
    WebReq.ContentLength = System.Text.Encoding.UTF8.GetByteCount(postString);

    // Send the post attributes.
    using (StreamWriter requestWriter = new StreamWriter(WebReq.GetRequestStream()))
    {
        requestWriter.Write(postString);
    }

    // Get the server's response stream and build an HTML document from it.
    HtmlAgilityPack.HtmlDocument htmldoc = new HtmlAgilityPack.HtmlDocument();
    using (WebResponse response = WebReq.GetResponse())
    using (Stream stream = response.GetResponseStream())
    {
        htmldoc.Load(stream);
    }
    // The using blocks close the stream and the response.

    return htmldoc;
}



IMPORTANT! If the site requires authentication, you need to send the authentication cookie.

....

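Uri uri = new Uri(url);
HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(uri);

// authentication
// Placeholder: enter the authentication cookie here, in "name=value" form.
string authCookie = "name=value";
WebReq.CookieContainer = new CookieContainer();
// SetCookies takes a Uri and a cookie string.
WebReq.CookieContainer.SetCookies(uri, authCookie);

// end authentication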

....
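
As a self-contained illustration of the cookie step, here is a minimal sketch. The class name AuthCookieSketch, the method name CreateAuthenticatedRequest, and the cookie value in the usage line are hypothetical; the real cookie has to come from a login to the target site. Note that CookieContainer.SetCookies takes a Uri rather than a string URL.

```csharp
using System;
using System.Net;

public static class AuthCookieSketch
{
    // Builds a request that carries one authentication cookie.
    // cookieHeader is a "name=value" string (an assumption for this sketch);
    // obtain the real value from a login to the target site.
    public static HttpWebRequest CreateAuthenticatedRequest(string url, string cookieHeader)
    {
        Uri uri = new Uri(url);
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);

        request.CookieContainer = new CookieContainer();
        // SetCookies expects a Uri, not a string URL.
        request.CookieContainer.SetCookies(uri, cookieHeader);
        return request;
    }
}
```

Usage might look like: HttpWebRequest req = AuthCookieSketch.CreateAuthenticatedRequest(url, "session=abc123"); the request can then be sent in the same way as in GetWebSiteContents.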



To read the HTML tag elements:

// Requires: using System.Linq; using HtmlAgilityPack;

// Get the HTML document.
HtmlAgilityPack.HtmlDocument html = this.GetWebSiteContents(url);

// Read a specific input tag of the site, matched by its name attribute.
HtmlNode[] elems = html.DocumentNode.Descendants("input")
    .Where(n => n.Attributes["name"] != null && n.Attributes["name"].Value == "name_html_tag")
    .ToArray();

// Take the value of the first match; null if no tag matched.
string myvalue = elems.Length > 0 ? elems[0].GetAttributeValue("value", "") : null;
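
The same tag-reading approach can be pointed at the original goal of collecting document files. The sketch below is one possible way to do it, not the answerer's method: it takes the href values already extracted from the page's a tags, resolves them against the page URL, keeps only links whose extension looks like a document, and downloads them with WebClient. The class name DocumentLinkSketch, the method names, and the extension list are all assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;

public static class DocumentLinkSketch
{
    // Extensions treated as "documents"; this list is an assumption.
    private static readonly string[] DocumentExtensions =
        { ".pdf", ".doc", ".docx", ".xls", ".xlsx" };

    // Resolves each href against the page URL and keeps only http(s) links
    // whose path ends in one of the document extensions.
    public static List<Uri> FindDocumentLinks(string pageUrl, IEnumerable<string> hrefs)
    {
        Uri baseUri = new Uri(pageUrl);
        List<Uri> result = new List<Uri>();
        foreach (string href in hrefs)
        {
            Uri absolute;
            if (string.IsNullOrWhiteSpace(href) || !Uri.TryCreate(baseUri, href, out absolute))
                continue;
            if (absolute.Scheme != Uri.UriSchemeHttp && absolute.Scheme != Uri.UriSchemeHttps)
                continue;

            string extension = Path.GetExtension(absolute.AbsolutePath);
            if (DocumentExtensions.Contains(extension, StringComparer.OrdinalIgnoreCase))
                result.Add(absolute);
        }
        return result;
    }

    // Downloads each document into targetFolder, named after the last URL segment.
    public static void DownloadAll(IEnumerable<Uri> documents, string targetFolder)
    {
        using (WebClient client = new WebClient())
        {
            foreach (Uri document in documents)
            {
                string fileName = Path.GetFileName(document.LocalPath);
                client.DownloadFile(document, Path.Combine(targetFolder, fileName));
            }
        }
    }
}
```

With HtmlAgilityPack, the hrefs could be gathered with html.DocumentNode.Descendants("a").Select(a => a.GetAttributeValue("href", "")) and passed to FindDocumentLinks together with the page URL. Files with the same name overwrite each other in this sketch, so a real page may need a naming scheme.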

