How would I get the inputs from a certain form with HtmlAgilityPack? Lang: C#.net

Question

Code can explain this problem much better than I can. I have also included the alternate ways I've tried to do this. If possible, please explain why those other methods didn't work either. I've run out of ideas, and sadly there aren't many examples for HtmlAgilityPack. I'm currently going through the documentation looking for more ideas.

One thing I noticed was the .NextSibling property, and I was thinking I could use a while loop to walk through the form until there is no next sibling or the end of the form is reached.

Anyway, here is the code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using System.Collections;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string source = @"
                <form name='form1' action='action1' method='method1' id='id1'>
                <input type='text1.1' name='name1.1' value='value1.1' />
                <input type='text1.2' name='name1.2' value='value1.2' />
            </form>
            <form name='form2' action='action2' method='method2' id='id2'>
                <input type='text2.1' name='name2.1' value='value2.1' />
                <input type='text2.2' name='name2.2' value='value2.2' />
            </form>
                    ";
            List<HtmlAttribute> formAttributes = new List<HtmlAttribute>();//this is what i'm wanting to get for the current form.
            /**
             * I want to end up with a list that has
             * Name: type Value: text1.1
             * Name: name Value: name1.1
             * Name: value Value: value1.1
             * Name: type Value: text1.2
             * Name: name Value: name1.2
             * Name: value Value: value1.2
             * but I am ending up with the 2nd forms values as well
             * */
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(source);

            var forms = htmlDoc.DocumentNode.Descendants("form");
            foreach (var form in forms)
            {
                Console.WriteLine(form.Attributes[0].Value); // simply writes the form name to the console to keep track of things

                HtmlNodeCollection inputs = form.SelectNodes("/input"); // gets all the inputs in the selected form, or so I thought. This is where the problem lies. Result: Shows both forms inputs.
                //HtmlNodeCollection inputs = form.SelectNodes("//input"); // not the best at xpath, but perhaps this could make a difference? Result: no difference
                //var inputs = form.Elements("input"); // Maybe the inputs are referred to as elements? Result: shows no input outerhtml at all.
                foreach (var input in inputs) //this has all 4 inputs from both forms. I only want it to have 2 inputs from the selected form.
                {
                    Console.WriteLine(input.OuterHtml);
                    List<HtmlAttribute> attributes = new List<HtmlAttribute>();
                    attributes = input.Attributes.ToList<HtmlAttribute>();
                    foreach (var att in attributes)
                    {
                        //add attributes to allattributes list code that will be done once problem of getting only inputs for specified form is fixed
                    }
                }
                // here comes an alternate method! Edit: Didn't work :'(
                //var inputs = form.Descendants("input"); // perhaps using the "Descendants class will make a difference. Result: Nope, didn't have any items at all!
                //IEnumerator e = inputs.GetEnumerator();
                //while (e.MoveNext())
                //{
                //    Console.WriteLine("input: " + e.Current);

                //}
                Console.WriteLine(); // Simply making everything look pretty with a newline after each form name/input outerhtml display.
            }
            Console.Read();
        }

    }
}

Recommended answer

I found the answer! Look at the code below; it contains the solution and an explanation. :)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using HtmlAgilityPack;
using System.Collections;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string source = @"
                <form name='form1' action='action1' method='method1' id='id1'>
                <input type='text1.1' name='name1.1' value='value1.1' />
                <input type='text1.2' name='name1.2' value='value1.2' />
            </form>
            <form name='form2' action='action2' method='method2' id='id2'>
                <input type='text2.1' name='name2.1' value='value2.1' />
                <input type='text2.2' name='name2.2' value='value2.2' />
            </form>
                    ";
            List<HtmlAttribute> formAttributes = new List<HtmlAttribute>();
            IEnumerable<HtmlNode> inputs;
            /*
             * The line below is the major reason that this solution "worked" and the other didn't
             * */
            HtmlNode.ElementsFlags.Remove("form");
            /*
             * I was going through the HtmlAgilityPack forum, and stumbled upon this little tidbit of info:
             * 
             * "This is because by default, Forms are parsed as empty nodes - this is because forms are allowed to
             overlap other elements in the HTML spec.

                In other words, the following is technically legal HTML, even though it gives us developer hives:

                <table>
                <form>
                <some input elements>
                </table>
                </form>

                Here, the form overlaps the closing of the table and when properly rendered, will be contained inside the table.
                Since HtmlDocument attempts to allow this as valid without automatically correcting the HTML, HtmlDocument by default
                makes no attempt to populate the child nodes of the form.
                Ok. All that is merely an introduction. You can get around this default behavior by adding the following line:
                HtmlNode.ElementsFlags.Remove("form");
                before you make ANY use of HtmlDocument. This will allow it to parse the nodes of the form, but it sacrifices 
                the ability of the form to overlap other nodes. It will force the form to be closed properly."
             * 
             * HtmlAgilityPack didn't put the inputs as the childnode of each form because of "technically legal HTML" that could mess things up a bit,
             * so the only thing I had to do is remove the element flag! Enjoy the code below, it should be pretty self explanatory.
             * */

            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.OptionOutputAsXml = true;
            htmlDoc.OptionAutoCloseOnEnd = true;
            htmlDoc.LoadHtml(source);

            var forms = htmlDoc.DocumentNode.Descendants("form");
            foreach (var form in forms)
            {
                inputs = form.ChildNodes
                    .Where(a => a.Name == "input"); // exact match on the lowercased element name; Contains("input") would also match other names that merely contain "input". (woo hoo, finally figuring out what linq is. Sort of like mysql when I was coding php!)

                Console.WriteLine(form.Attributes[0].Value + " attributes:" + Environment.NewLine + "------------------");
                foreach (var input in inputs)
                {
                    IEnumerable<HtmlAttribute> attributes;
                    attributes = input.Attributes;
                    foreach (var att in attributes)
                    {
                        Console.WriteLine("Name: " + att.Name + Environment.NewLine
                               + "Value: " + att.Value + Environment.NewLine);
                        formAttributes.Add(att);
                    }
                }
                Console.WriteLine(); // Simply making everything look pretty with a newline after each form name/input outerhtml display.
            }
            Console.Read();
        }

    }
}
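
To address the "why didn't the other attempts work" part of the question: with the form parsed as an empty node, the inputs end up as siblings of the forms rather than children, so form.Elements("input") and form.Descendants("input") find nothing. Separately, an XPath that starts with "/" or "//" is evaluated from the document root even when SelectNodes is called on a form node, which is why "/input" and "//input" returned the inputs of both forms. The following is a minimal sketch of a relative-XPath variant, not the answer author's code. The class name FormInputs and the method AttributesOf are hypothetical, and the default flag for "form" may differ between HtmlAgilityPack versions, so treat the Remove call as a precaution.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper for illustration; not part of HtmlAgilityPack.
static class FormInputs
{
    // Returns the (name, value) pairs of every attribute on every <input>
    // nested inside the form whose name attribute equals formName.
    public static List<KeyValuePair<string, string>> AttributesOf(string html, string formName)
    {
        var result = new List<KeyValuePair<string, string>>();

        // ElementsFlags is a static dictionary, so this affects every document
        // parsed afterwards in the process. Removing a missing key is harmless.
        HtmlNode.ElementsFlags.Remove("form");

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode form = doc.DocumentNode.Descendants("form")
            .FirstOrDefault(f => f.GetAttributeValue("name", "") == formName);
        if (form == null)
            return result;

        // The leading "." makes the XPath relative to this form node.
        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection inputs = form.SelectNodes(".//input");
        if (inputs == null)
            return result;

        foreach (HtmlNode input in inputs)
            foreach (HtmlAttribute att in input.Attributes)
                result.Add(new KeyValuePair<string, string>(att.Name, att.Value));

        return result;
    }
}
```

Calling FormInputs.AttributesOf(source, "form1") with the source string from the question should yield only the attributes of the two inputs in form1. Looking the form up by its name attribute, rather than by form.Attributes[0], avoids depending on attribute order.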
