Where's the bug in this tree traversal code?


Problem Description




There's a bug in Traverse() that's causing it to iterate nodes more than once.

Bugged Code

public IEnumerable<HtmlNode> Traverse()
{
    foreach (var node in _context)
    {
        yield return node;
        foreach (var child in Children().Traverse())
            yield return child;
    }
}

public SharpQuery Children()
{
    return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
}

public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
{
    if (nodes == null) throw new ArgumentNullException("nodes");
    _previous = previous;
    _context = new List<HtmlNode>(nodes);
}

Test Code

    static void Main(string[] args)
    {
        var sq = new SharpQuery(@"
<a>
    <b>
        <c/>
        <d/>
        <e/>
        <f>
            <g/>
            <h/>
            <i/>
        </f>
    </b>
</a>");
        var nodes = sq.Traverse();
        Console.WriteLine("{0} nodes: {1}", nodes.Count(), string.Join(",", nodes.Select(n => n.Name)));
        Console.ReadLine();
    }

Output

19 nodes: #document,a,b,c,g,h,i,d,g,h,i,e,g,h,i,f,g,h,i

Expected Output

Each letter a-i printed once.

Can't seem to figure out where it's going wrong... node.ChildNodes does return just the direct children, right? (from HtmlAgilityPack)


Full class (condensed) if you want to try and run it yourself.

using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public class SQLite
{
    private readonly List<HtmlNode> _context = new List<HtmlNode>();
    private readonly SQLite _previous = null;

    public SQLite(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        _context.Add(doc.DocumentNode);
    }

    public SQLite(IEnumerable<HtmlNode> nodes, SQLite previous = null)
    {
        if (nodes == null) throw new ArgumentNullException("nodes");
        _previous = previous;
        _context = new List<HtmlNode>(nodes);
    }

    public IEnumerable<HtmlNode> Traverse()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().Traverse())
                yield return child;
        }
    }

    public SQLite Children()
    {
        return new SQLite(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
    }
}

Solution

First, note that everything goes according to plan as long as we're iterating over elements that don't have any siblings. As soon as we hit element <c>, things start going haywire. It's also interesting that <c>, <d> and <e> all think they contain <f>'s children.

Let's take a closer look at your call to SelectMany():

_context.SelectMany(n => n.ChildNodes)

That iterates through the items in _context and accumulates the child nodes of each item. Let's have a look at _context. Everything should be okay, since it's a List of length 1. Or is it?

I suspect your SharpQuery(string) constructor stores sibling elements inside the same list. In that case, _context may not be of length 1 anymore and, remember, SelectMany() accumulates the child nodes of each item in the list.

For example, suppose _context is a list containing <c>, <d>, <e> and <f>, and only <f> has children. If Children(), and therefore SelectMany(), is called once for each element, it will accumulate and return the children of <f> four times.
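The counting in that example can be reproduced with a minimal sketch that does not need HtmlAgilityPack. It assumes a hypothetical name-to-children dictionary (`ChildrenOf`) in place of HtmlNode.ChildNodes. Only "f" has children, as in the question's markup. The loop follows the shape of Traverse(): one pass per sibling. On each pass it flattens the children of the whole list with SelectMany(), just as Children() does.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SelectManyDemo
{
    // Hypothetical stand-in for HtmlNode.ChildNodes: only "f" has children.
    private static readonly Dictionary<string, string[]> ChildrenOf = new Dictionary<string, string[]>
    {
        ["c"] = new string[0],
        ["d"] = new string[0],
        ["e"] = new string[0],
        ["f"] = new[] { "g", "h", "i" },
    };

    public static List<string> ChildrenCollectedOncePerSibling()
    {
        var context = new List<string> { "c", "d", "e", "f" };
        var collected = new List<string>();

        // One pass per sibling, like the foreach in Traverse().
        foreach (var node in context)
        {
            // Like Children(): flattens the children of *every* item in context,
            // regardless of which node this pass is visiting.
            collected.AddRange(context.SelectMany(n => ChildrenOf[n]));
        }
        return collected;
    }
}
```

Four siblings, each triggering a pass that yields three grandchildren, gives twelve entries: g, h, i repeated four times. That is the same repetition visible in the question's output.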

I think that's your bug.

EDIT: Since you've posted the full class, I don't have to guess anymore. Look at the sequence of operations when you iterate over <b> (iterators replaced by lists for better comprehension):

  1. Call Children() on <b>, which returns <c>, <d>, <e> and <f>,
  2. Call Traverse() on the resulting list:

    1. Call Children() on <c>, but _context actually contains <c>, <d>, <e> and <f>, not only <c>, so that returns <g>, <h> and <i>,
    2. Call Children() on <d>, same thing since _context is the same for both <c> and <d> (and <e>, and <f>),
    3. Lather, rinse, repeat.
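Given that sequence, one possible fix is to leave Children() as it is and make Traverse() recurse only into the current node's own children. This is a sketch, not the asker's final code. It uses hypothetical TreeNode and NodeQuery classes as stand-ins for HtmlNode and the SQLite class, so it runs without HtmlAgilityPack. With the real library, you would build the recursive query from node.ChildNodes and keep the Where(n => n.NodeType == HtmlNodeType.Element) filter used in Children().

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for HtmlAgilityPack's HtmlNode (only Name and ChildNodes).
public class TreeNode
{
    public string Name { get; }
    public List<TreeNode> ChildNodes { get; } = new List<TreeNode>();

    public TreeNode(string name, params TreeNode[] children)
    {
        Name = name;
        ChildNodes.AddRange(children);
    }
}

// Hypothetical mirror of the SQLite/SharpQuery class: a wrapper around a list of nodes.
public class NodeQuery
{
    private readonly List<TreeNode> _context;

    public NodeQuery(IEnumerable<TreeNode> nodes)
    {
        if (nodes == null) throw new ArgumentNullException(nameof(nodes));
        _context = new List<TreeNode>(nodes);
    }

    // Same shape as the question's Children(): children of *every* node in _context.
    public NodeQuery Children() => new NodeQuery(_context.SelectMany(n => n.ChildNodes));

    // Buggy version (as in the question): for each node it calls Children(),
    // which collects the children of all nodes in _context, not only of `node`.
    public IEnumerable<TreeNode> Traverse()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().Traverse())
                yield return child;
        }
    }

    // Fixed version: recurse only into the current node's own children.
    public IEnumerable<TreeNode> TraverseFixed()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in new NodeQuery(node.ChildNodes).TraverseFixed())
                yield return child;
        }
    }

    // Hypothetical equivalent of the question's markup, rooted at a #document node.
    public static NodeQuery Sample() =>
        new NodeQuery(new[]
        {
            new TreeNode("#document",
                new TreeNode("a",
                    new TreeNode("b",
                        new TreeNode("c"), new TreeNode("d"), new TreeNode("e"),
                        new TreeNode("f", new TreeNode("g"), new TreeNode("h"), new TreeNode("i")))))
        });
}
```

On this sample tree, Traverse() returns the same 19 names as the reported output. TraverseFixed() returns #document,a,b,c,d,e,f,g,h,i, visiting each node once. Wrapping each node's children in a new query object keeps the recursion close to the original class structure. For very deep trees, an explicit stack would avoid nesting one iterator per level.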
