How to read div content from a web page?


Problem description

Hi experts,

I have developed a Windows Forms application that opens a website. The website searches for a user's data by last name and birth date. My application opens the site automatically, enters the user's last name and birth date into the corresponding text boxes, clicks the search button, and gets the result.

Problem: once the result is shown on the web page, I have to read some data from it. The result is rendered inside a div. How can I get the content of that div?

Any help will be appreciated!

What I have tried:

using System;
using System.Drawing;
using System.Collections;
using System.ComponentModel;
using System.Windows.Forms;
using System.Data;
using mshtml;
using System.Collections.Generic;
using System.Configuration;
using System.IO;
using System.Text;
using System.Net;

namespace mshtml_automation_demo
{
	/// <summary>
	/// Summary description for Form1.
	/// </summary>
	public class MainForm : System.Windows.Forms.Form
	{
		private AxSHDocVw.AxWebBrowser axWebBrowser1;
		private int Task = 1; // global

		/// <summary>
		/// Required designer variable.
		/// </summary>
		private System.ComponentModel.Container components = null;

		public MainForm()
		{
			//
			// Required for Windows Form Designer support
			//
			InitializeComponent();

			//
			// TODO: Add any constructor code after InitializeComponent call
			//
		}

		/// <summary>
		/// Clean up any resources being used.
		/// </summary>
		protected override void Dispose( bool disposing )
		{
			if( disposing )
			{
				if (components != null) 
				{
					components.Dispose();
				}
			}
			base.Dispose( disposing );
		}

		#region Windows Form Designer generated code
		/// <summary>
		/// Required method for Designer support - do not modify
		/// the contents of this method with the code editor.
		/// </summary>
		private void InitializeComponent()
		{
			System.Resources.ResourceManager resources = new System.Resources.ResourceManager(typeof(MainForm));
			this.axWebBrowser1 = new AxSHDocVw.AxWebBrowser();
			((System.ComponentModel.ISupportInitialize)(this.axWebBrowser1)).BeginInit();
			this.SuspendLayout();
			// 
			// axWebBrowser1
			// 
			this.axWebBrowser1.Dock = System.Windows.Forms.DockStyle.Fill;
			this.axWebBrowser1.Enabled = true;
			this.axWebBrowser1.Location = new System.Drawing.Point(0, 0);
			this.axWebBrowser1.OcxState = ((System.Windows.Forms.AxHost.State)(resources.GetObject("axWebBrowser1.OcxState")));
			this.axWebBrowser1.Size = new System.Drawing.Size(616, 382);
			this.axWebBrowser1.TabIndex = 0;
			this.axWebBrowser1.DocumentComplete += new AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEventHandler(this.axWebBrowser1_DocumentComplete);
			// 
			// MainForm
			// 
			this.AutoScaleBaseSize = new System.Drawing.Size(5, 13);
			this.ClientSize = new System.Drawing.Size(616, 382);
			this.Controls.Add(this.axWebBrowser1);
			this.Name = "MainForm";
			this.Text = "Microsoft WebBrowser Automation";
			this.Load += new System.EventHandler(this.FrmMain_Load);
			((System.ComponentModel.ISupportInitialize)(this.axWebBrowser1)).EndInit();
			this.ResumeLayout(false);

		}
		#endregion

		/// <summary>
		/// The main entry point for the application.
		/// </summary>
		[STAThread]
		static void Main() 
		{
			Application.Run(new MainForm());
		}

		private void FrmMain_Load(object sender, System.EventArgs e)
		{
            object loc = "https://www.vca.ssvv.nl/";
			object null_obj_str = "";
			System.Object null_obj = 0;
			this.axWebBrowser1.Navigate2(ref loc , ref null_obj, ref null_obj, ref null_obj_str, ref null_obj_str);
		}
        public static void OutputText(string VcaTypeExpDate)
        {
            string output = "C:\\Export\\Temp\\Logs\\";
            //string output = "C:\\Export\\Raildocs\\Logs\\";
            // string output = "F:\\Test_Cord\\Log\\";
            StreamWriter Res;
            FileStream fileStream = null;
            DirectoryInfo logDirInfo = null;
            FileInfo logFileInfo;
            var ResFilePath = output;
            ResFilePath = ResFilePath + "Result.txt";
            logFileInfo = new FileInfo(ResFilePath);
            logDirInfo = new DirectoryInfo(logFileInfo.DirectoryName);
            if (!logDirInfo.Exists) logDirInfo.Create();
            if (!logFileInfo.Exists)
            {
                fileStream = logFileInfo.Create();
            }
            else
            {
                fileStream = new FileStream(ResFilePath, FileMode.Append);
            }
            Res = new StreamWriter(fileStream);
            Res.WriteLine(VcaTypeExpDate);
            Res.Close();
        }

		private void axWebBrowser1_DocumentComplete(object sender, AxSHDocVw.DWebBrowserEvents2_DocumentCompleteEvent e)
		{
            List<string> CVList = new List<string>();
            string inputCV = ConfigurationManager.AppSettings["inputFile"];
            string outputCV = ConfigurationManager.AppSettings["outputFile"];
            string[] lines = File.ReadAllLines(inputCV, Encoding.UTF8);
            foreach (var item in lines)
            {
                string CVData = item.ToString().Replace('@', ' ');
                CVList.Add(CVData);

            }
            foreach(var item in CVList)
            {
                string[] cvData = item.Split(',');
                switch(Task)
			{
				case 1:

					// Get the loaded page as an mshtml document.
					HTMLDocument myDoc = (HTMLDocument) axWebBrowser1.Document;

					// Fill in the last-name and birth-date boxes; the element ids
					// come from the page's HTML source.
                    HTMLInputElement otxtSearchBox = (HTMLInputElement)myDoc.all.item("ctl00_ContentPlaceHolder1_ctl00_txtAchternaam", 0);

                    otxtSearchBox.value = cvData[0].ToString();

                    HTMLInputElement dtxtSearchBox = (HTMLInputElement)myDoc.all.item("ctl00_ContentPlaceHolder1_ctl00_txtGeboortedatum", 0);

                     dtxtSearchBox.value = cvData[1].ToString();

					// Click the search button (id also taken from the page source).
                    HTMLInputElement btnSearch = (HTMLInputElement)myDoc.all.item("ctl00_ContentPlaceHolder1_ctl00_cmdOpvragen", 0);
					btnSearch.click();
                   



					Task++;

                        // write to text file 


					break;

				case 2:

					// continuation of automated tasks...
					break;
			}
            }
            MessageBox.Show("Operation Completed...!");
			
		}
	}
}

Recommended answer


Use HTMLDocument. All the functions you need are listed in the HTMLDocument documentation on MSDN.

You can use
mydoc.GetElementById(String)






Or

mydoc.GetElementsByTagName(String)
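
For the mshtml-based code in the question, the lookup might be sketched as below. This is a minimal sketch, not a tested solution for the site in question: the class name `DivReader` and the ids or class names passed to it are hypothetical, and the real values have to be taken from the result page's HTML source. It assumes the project already references the mshtml interop assembly, as the question's code does.

```csharp
using System.Collections.Generic;
using mshtml;

public static class DivReader
{
    // Returns the visible text of the element with the given id,
    // or null if the document is missing or no such element exists.
    public static string ReadDivText(object document, string divId)
    {
        IHTMLDocument3 doc = document as IHTMLDocument3;
        if (doc == null) return null;

        IHTMLElement div = doc.getElementById(divId);
        return div == null ? null : div.innerText;
    }

    // Fallback for a div without an id: collect the text of every
    // div whose class attribute equals className.
    public static List<string> ReadDivTextsByClass(object document, string className)
    {
        var results = new List<string>();
        IHTMLDocument3 doc = document as IHTMLDocument3;
        if (doc == null) return results;

        foreach (IHTMLElement element in doc.getElementsByTagName("div"))
        {
            if (element.className == className)
            {
                results.Add(element.innerText);
            }
        }
        return results;
    }
}
```

In the question's handler, a call such as `OutputText(DivReader.ReadDivText(axWebBrowser1.Document, "resultDivId"));` would belong in `case 2:`, because that branch runs when the result page has finished loading. `"resultDivId"` is a placeholder.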






So you need to know the element's ID to get the result. If you don't know the ID of the element you are looking for, work on the page's HTML as a string with C#'s find/replace functions and write it back with mydoc.Write(String). You can apply changes with

mydoc.CreateElement(String)
mydoc.Write(String)







Try:

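// Requires the HtmlAgilityPack NuGet package (HtmlDocument.LoadHtml is its API),
// plus: using System.Net; using HtmlAgilityPack;
string urlAddress = "http://google.com.ssvv.nl/";
var doc = new HtmlDocument();

using (WebClient client = new WebClient())
{
    string htmlCode = client.DownloadString(urlAddress);
    doc.LoadHtml(htmlCode);
}

// do whatever you want with the doc, for example look up an element by its id
var node = doc.GetElementbyId("yourId");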



Also, if you want a more robust solution, check out the Html Agility Pack project (htmlagilitypack). It offers many more options, but it is too complex to describe fully here; you should definitely take a look.
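
As a rough illustration of the Html Agility Pack route, the sketch below parses an HTML string and returns the text of one div. It assumes the HtmlAgilityPack NuGet package is installed. The class name `AgilityDivReader` and the ids or XPath expressions passed to it are hypothetical. Since the results only appear after the search form is submitted, the HTML passed in would have to be that of the result page, not a fresh download of the start URL.

```csharp
using HtmlAgilityPack;

public static class AgilityDivReader
{
    // Parses the given HTML and returns the trimmed, entity-decoded text
    // of the element with the given id, or null if it does not exist.
    public static string ReadDivText(string html, string divId)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode div = doc.GetElementbyId(divId);
        return div == null ? null : HtmlEntity.DeEntitize(div.InnerText).Trim();
    }

    // Variant for a div without an id: select the first node matching
    // an XPath expression such as "//div[@class='result']".
    public static string ReadFirstMatchText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node == null ? null : HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Note that the Html Agility Pack method is spelled `GetElementbyId`, with a lowercase "b".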

