How to pull document files from a website using ASP.NET
Problem description
Hello, dear friends. I want to get documents from a website: build an interface where I enter the URL of any website and then retrieve the documents from that site. Kindly help.
What I have tried:
using System;
using System.IO;
using System.Net;
using System.Text;

public partial class Y : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        Text1.Text = GetWebSiteContents("http://localhost/WebSite1/X.aspx");
    }

    protected string GetWebSiteContents(string url)
    {
        WebRequest req = WebRequest.Create(url);
        StringBuilder sb = new StringBuilder();

        // Get the stream from the returned web response; the using blocks
        // close the response and the reader even if reading throws
        using (WebResponse resp = req.GetResponse())
        using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
        {
            string strLine;
            // Read the stream a line at a time and place each one into the StringBuilder
            while ((strLine = sr.ReadLine()) != null)
            {
                // Ignore blank lines
                if (strLine.Length > 0) sb.Append(strLine);
            }
        }
        return sb.ToString();
    }
}
With this code I get the text of the page.
Solution
Hi,
The WebRequest class lets you fetch the HTML (source code) of a site.
Once you have the HTML, you can read its tags with the HtmlAgilityPack library.
Link: Html Agility Pack | HAP
Here is the code to read the HTML of the site:
// Requires: using System; using System.IO; using System.Net;
private HtmlAgilityPack.HtmlDocument GetWebSiteContents(string url)
{
    string postString = "";
    byte[] postBytes = System.Text.Encoding.UTF8.GetBytes(postString);

    HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(new Uri(url));
    // Our method is POST, otherwise the buffer (postvars) would be useless
    WebReq.Method = "POST";
    // We use the form content type for the postvars
    WebReq.ContentType = "application/x-www-form-urlencoded";
    WebReq.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8";
    // The length of the buffer (postvars) in bytes is used as the content length
    WebReq.ContentLength = postBytes.Length;

    // Send the post attributes
    using (Stream requestStream = WebReq.GetRequestStream())
    {
        requestStream.Write(postBytes, 0, postBytes.Length);
    }

    // Get the response stream from the server and build an HTML document from it
    HtmlAgilityPack.HtmlDocument htmldoc = new HtmlAgilityPack.HtmlDocument();
    using (WebResponse response = WebReq.GetResponse())
    using (Stream stream = response.GetResponseStream())
    {
        htmldoc.Load(stream);
    }
    // The using blocks above close the opened objects
    return htmldoc;
}
IMPORTANT! If the site requires authentication, you need to send the authentication cookie.
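....
HttpWebRequest WebReq = (HttpWebRequest)WebRequest.Create(url);
// authentication
WebReq.CookieContainer = new CookieContainer();
// Placeholder: enter the authentication cookie here, in "name=value" form
string authCookie = "name=value";
// SetCookies expects a Uri, not a string, as its first argument
WebReq.CookieContainer.SetCookies(new Uri(url), authCookie);
// end authentication
....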
To read the html tag elements:
// Requires: using System.Linq; using HtmlAgilityPack;
// initialize variables
HtmlAgilityPack.HtmlDocument html = null;
HtmlNode[] elems = null;
// get html
html = this.GetWebSiteContents(url);
// read a specific input tag of the site
elems = html.DocumentNode.Descendants("input")
    .Where(n => n.Attributes["name"] != null && n.Attributes["name"].Value == "name_html_tag")
    .ToArray();
// elems is empty when no matching input exists, so check before indexing
string myvalue = elems.Length > 0 ? elems[0].GetAttributeValue("value", "") : null;
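Since the question is about document files rather than input values, the same HtmlAgilityPack approach could probably be pointed at the page's links instead. What follows is only a rough sketch, not a tested solution: the class name DocumentDownloader, its method names, and the list of extensions are all made up for illustration, and it assumes the documents are reachable through ordinary <a href="..."> links and that the HtmlAgilityPack NuGet package is installed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack or the original answer.
public static class DocumentDownloader
{
    // Assumed list of extensions treated as "documents"; adjust as needed.
    private static readonly string[] DocumentExtensions =
        { ".pdf", ".doc", ".docx", ".xls", ".xlsx" };

    // True when the URL is http/https and its path ends in one of the extensions above.
    public static bool IsDocument(Uri uri)
    {
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
            return false;
        string ext = Path.GetExtension(uri.AbsolutePath);
        return DocumentExtensions.Contains(ext, StringComparer.OrdinalIgnoreCase);
    }

    // Collects absolute URLs of document links found in <a href="..."> tags.
    public static List<Uri> FindDocumentLinks(HtmlDocument html, Uri pageUri)
    {
        var links = new List<Uri>();
        // SelectNodes returns null when nothing matches.
        HtmlNodeCollection anchors = html.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null) return links;

        foreach (HtmlNode a in anchors)
        {
            string href = a.GetAttributeValue("href", "");
            Uri absolute;
            // Resolve relative links against the page address; skip malformed ones.
            if (Uri.TryCreate(pageUri, href, out absolute) && IsDocument(absolute))
                links.Add(absolute);
        }
        return links;
    }

    // Downloads each document into targetFolder, named after the last URL segment.
    // Files with the same name overwrite each other in this sketch.
    public static void DownloadAll(IEnumerable<Uri> links, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder);
        using (var client = new WebClient())
        {
            foreach (Uri link in links)
            {
                string fileName = Path.GetFileName(link.LocalPath);
                client.DownloadFile(link, Path.Combine(targetFolder, fileName));
            }
        }
    }
}
```

From the page you would pass the document returned by GetWebSiteContents(url) together with new Uri(url) to FindDocumentLinks, and then hand the result to DownloadAll with a folder the site is allowed to write to. Sites that serve documents through scripts or query strings without a file extension would need a different check, for example on the response content type.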