Where's the bug in this tree traversal code?
There's a bug in Traverse() that's causing it to iterate nodes more than once.

Bugged Code
    public IEnumerable<HtmlNode> Traverse()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().Traverse())
                yield return child;
        }
    }

    public SharpQuery Children()
    {
        return new SharpQuery(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
    }

    public SharpQuery(IEnumerable<HtmlNode> nodes, SharpQuery previous = null)
    {
        if (nodes == null) throw new ArgumentNullException("nodes");
        _previous = previous;
        _context = new List<HtmlNode>(nodes);
    }
Test Code
    static void Main(string[] args)
    {
        var sq = new SharpQuery(@"
    <a>
        <b>
            <c/>
            <d/>
            <e/>
            <f>
                <g/>
                <h/>
                <i/>
            </f>
        </b>
    </a>");
        var nodes = sq.Traverse();
        Console.WriteLine("{0} nodes: {1}", nodes.Count(), string.Join(",", nodes.Select(n => n.Name)));
        Console.ReadLine();
    }
Output
19 nodes: #document,a,b,c,g,h,i,d,g,h,i,e,g,h,i,f,g,h,i
Expected Output
Each letter a-i printed once.
Can't seem to figure out where it's going wrong... node.ChildNodes does return just direct children, right? (from HtmlAgilityPack)
Full class (condensed) if you want to try and run it yourself.
    public class SQLite
    {
        private readonly List<HtmlNode> _context = new List<HtmlNode>();
        private readonly SQLite _previous = null;

        public SQLite(string html)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);
            _context.Add(doc.DocumentNode);
        }

        public SQLite(IEnumerable<HtmlNode> nodes, SQLite previous = null)
        {
            if (nodes == null) throw new ArgumentNullException("nodes");
            _previous = previous;
            _context = new List<HtmlNode>(nodes);
        }

        public IEnumerable<HtmlNode> Traverse()
        {
            foreach (var node in _context)
            {
                yield return node;
                foreach (var child in Children().Traverse())
                    yield return child;
            }
        }

        public SQLite Children()
        {
            return new SQLite(_context.SelectMany(n => n.ChildNodes).Where(n => n.NodeType == HtmlNodeType.Element), this);
        }
    }
Solution

First, note that everything goes according to plan as long as we're iterating over elements that don't have any siblings. As soon as we hit element <c>, things start going haywire. It's also interesting that <c>, <d> and <e> all think they contain <f>'s children.

Let's take a closer look at your call to SelectMany():

    _context.SelectMany(n => n.ChildNodes)

That iterates through the items in _context and accumulates the child nodes of each item. Let's have a look at _context. Everything should be okay, since it's a List of length 1. Or is it?

I suspect your SharpQuery(string) constructor stores sibling elements inside the same list. In that case, _context may not be of length 1 anymore and, remember, SelectMany() accumulates the child nodes of each item in the list.

For example, if _context is a list containing <c>, <d>, <e> and <f>, only <f> has children, and SelectMany() is called once for each element, it will accumulate and return the children of <f> four times.

I think that's your bug.
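To make the accumulation concrete, here is a minimal sketch that uses plain strings instead of HtmlNode objects. The names SelectManyDemo, ChildrenOf, ChildrenOfContext and VisitPerSibling are hypothetical and exist only for this illustration; only Enumerable.SelectMany is real API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManyDemo
{
    // Hypothetical stand-in for the siblings under <b>: name -> child names.
    public static readonly Dictionary<string, string[]> ChildrenOf =
        new Dictionary<string, string[]>
        {
            ["c"] = new string[0],
            ["d"] = new string[0],
            ["e"] = new string[0],
            ["f"] = new[] { "g", "h", "i" },
        };

    // Same shape as _context.SelectMany(n => n.ChildNodes):
    // flattens the children of every item in the context into one sequence.
    public static List<string> ChildrenOfContext(IEnumerable<string> context)
    {
        return context.SelectMany(n => ChildrenOf[n]).ToList();
    }

    // Same shape as the loop in Traverse(): the context-wide query
    // is evaluated again for each sibling in the context.
    public static List<string> VisitPerSibling(IList<string> context)
    {
        var visited = new List<string>();
        foreach (var node in context)
        {
            visited.Add(node);
            visited.AddRange(ChildrenOfContext(context));
        }
        return visited;
    }

    public static void Main()
    {
        var context = new List<string> { "c", "d", "e", "f" };
        // prints "g,h,i"
        Console.WriteLine(string.Join(",", ChildrenOfContext(context)));
        // prints "c,g,h,i,d,g,h,i,e,g,h,i,f,g,h,i"
        Console.WriteLine(string.Join(",", VisitPerSibling(context)));
    }
}
```

The second line reproduces the tail of the output shown in the question: the children of <f> appear once per sibling, four times in total.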
EDIT: Since you've posted the full class, I don't have to guess anymore. Look at the sequence of operations when you iterate over <b> (iterators replaced by lists for better comprehension):

- Call Children() on <b>, which returns <c>, <d>, <e> and <f>,
- Call Traverse() on the resulting list:
  - Call Children() on <c>, but _context actually contains <c>, <d>, <e> and <f>, not only <c>, so that returns <g>, <h> and <i>,
  - Call Children() on <d>, same thing since _context is the same for both <c> and <d> (and <e>, and <f>),
  - Lather, rinse, repeat.
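The sequence above can be reproduced without HtmlAgilityPack, and it also suggests one possible fix: recurse into the children of the current node only, not into the children of the whole context. The sketch below is an illustration under assumptions, not the asker's final code. Node, Query, TraverseBuggy, TraverseFixed and TraverseDemo are hypothetical names, and Node is a minimal stand-in for HtmlNode that holds element nodes only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical minimal stand-in for HtmlNode (element nodes only).
public class Node
{
    public string Name { get; }
    public List<Node> ChildNodes { get; } = new List<Node>();

    public Node(string name, params Node[] children)
    {
        Name = name;
        ChildNodes.AddRange(children);
    }
}

public class Query
{
    private readonly List<Node> _context;

    public Query(IEnumerable<Node> nodes)
    {
        if (nodes == null) throw new ArgumentNullException(nameof(nodes));
        _context = new List<Node>(nodes);
    }

    // Children of every node in the context, as in the question.
    public Query Children()
    {
        return new Query(_context.SelectMany(n => n.ChildNodes));
    }

    // Shape of the original: Children() looks at the whole context,
    // so every sibling re-traverses the children of all its siblings.
    public IEnumerable<Node> TraverseBuggy()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in Children().TraverseBuggy())
                yield return child;
        }
    }

    // One possible fix: descend only into the current node's own children.
    public IEnumerable<Node> TraverseFixed()
    {
        foreach (var node in _context)
        {
            yield return node;
            foreach (var child in new Query(node.ChildNodes).TraverseFixed())
                yield return child;
        }
    }
}

public static class TraverseDemo
{
    // Builds the same tree as the test document in the question.
    public static Node BuildTree()
    {
        var f = new Node("f", new Node("g"), new Node("h"), new Node("i"));
        var b = new Node("b", new Node("c"), new Node("d"), new Node("e"), f);
        return new Node("#document", new Node("a", b));
    }

    public static void Main()
    {
        var sq = new Query(new[] { BuildTree() });
        // prints "#document,a,b,c,g,h,i,d,g,h,i,e,g,h,i,f,g,h,i"
        Console.WriteLine(string.Join(",", sq.TraverseBuggy().Select(n => n.Name)));
        // prints "#document,a,b,c,d,e,f,g,h,i"
        Console.WriteLine(string.Join(",", sq.TraverseFixed().Select(n => n.Name)));
    }
}
```

In the real class, the equivalent change would presumably be to build the inner query from node.ChildNodes (filtered to HtmlNodeType.Element as Children() already does) rather than from Children(). That is untested against HtmlAgilityPack here, so treat it as a direction and not a drop-in patch.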