Finding nodes with consecutive/incremented attribute values?


Problem description

I'm trying to find runs of consecutive nodes <xref ref-type="bibr" rid="ref...">...</xref> (three or more in a row) in some XML files, where the nodes are separated by a comma or a space, and write them to a log file or the console.

NOTE: In the runs I'm trying to identify, the rid values, once the text ref is stripped, should increase by 1 from one node to the next. xref nodes whose rid is not of the form refX do not need to be checked. Here is a small sample XML file and the desired output for it:

<xref ref-type="bibr" rid="ref9">[9]</xref>, <xref ref-type="bibr" rid="ref10">[10]</xref>, <xref ref-type="bibr" rid="ref11">[11]</xref> <xref ref-type="bibr" rid="ref12">[12]</xref>

<xref ref-type="bibr" rid="ref2">[2]</xref>, <xref ref-type="bibr" rid="ref3">[3]</xref>, <xref ref-type="bibr" rid="ref4">[4]</xref>

<xref ref-type="bibr" rid="ref1">[1]</xref>, <xref ref-type="bibr" rid="ref2">[2]</xref>, <xref ref-type="bibr" rid="ref3">[3]</xref>

<xref ref-type="bibr" rid="ref101">101</xref>, <xref ref-type="bibr" rid="ref102">102</xref>, <xref ref-type="bibr" rid="ref103">103</xref> <xref ref-type="bibr" rid="ref104">104</xref>

<xref ref-type="bibr" rid="ref11">[11]</xref>, <xref ref-type="bibr" rid="ref12">[12]</xref> <xref ref-type="bibr" rid="ref13">[13]</xref>


The code I've tried is below:

XmlReaderSettings settings = new XmlReaderSettings();
settings.XmlResolver = null;
settings.ProhibitDtd = false;
var xmlfiles=Directory.GetFiles(@"D:\test\xml","*.xml",SearchOption.AllDirectories);

foreach (var xmlfile in xmlfiles) {
    XDocument xdoc = XDocument.Load(XmlReader.Create(xmlfile, settings),LoadOptions.SetLineInfo);

    var cons = xdoc.Descendants("xref")
        .Where(x => x.Attribute("rid").Value.Contains("ref"))
        .GroupBy(x => x.Parent)
        .Select(grp => new
        {
            Parent = grp.Key,
            ConsecutiveNodes = grp.Select((n, i) => new
            {
                Index = i + 1,
                Node = n
            }),
            Count = grp.Count()
        })
        .ToList();


    foreach(var o in cons)
    {
        if (o.Count>2)
        {
        //Console.WriteLine(xmlfile+"\r\n"+new string('=',50)+"3 or more consecutive nodes: \r\nFound in line: "+((IXmlLineInfo)o.Parent).LineNumber+","+((IXmlLineInfo)o.Parent).LinePosition);
        }
    }
}


Console.ReadLine();


How do I print the output as described above, or at least get the line number and position of each match found, like this:

D:\Test\xml\123.xml
=====================
Found in line: 7,21

Found in line: 8,18

Found in line: 14,60

Found in line: 14,341

D:\Test\xml\221.xml
=====================
Found in line: ...
etc.

NOTE: I recently discovered another issue: if a parent node such as <title> contains something like

<title>METHODS AND <xref ref-type="bibr" rid="ref2">[2]</xref>, <xref ref-type="bibr" rid="ref3">[3]</xref> OBJECTS OF <xref ref-type="bibr" rid="ref4">[4]</xref> INVESTIGATION</title>

then that is also reported as a match, but it should not be, because between

<xref ref-type="bibr" rid="ref3">

and

<xref ref-type="bibr" rid="ref4">

there is text other than a single space or a comma followed by a single space.

Solution

Hello Don,

Try the code below, which calculates the start position of the first xref element.

Console.WriteLine(xmlfile + "\r\n" + new string('=', 50) + "\r\n" + "3 or more consecutive nodes: \r\n");

foreach (var o in cons)
{
    if (o.Count > 2)
    {
        Console.WriteLine(o.ToString());
        // IndexOf returns the offset of the first "<xref" within the parent's serialized text.
        Console.WriteLine("Found in line: " + ((IXmlLineInfo)o.Parent).LineNumber + "," + o.Parent.ToString().IndexOf("<xref"));
    }
}
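
Note that IndexOf in the snippet above is an offset into the parent's serialized string, not a column in the file. If the position in the file is what you need, one option is to read the line info from the xref element itself. The following is only a sketch: it assumes the document was loaded with LoadOptions.SetLineInfo, as in the question, and XrefPosition and Describe are names made up for this example.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class XrefPosition
{
    // Returns "line,position" for an element, or a placeholder when the
    // document was loaded without LoadOptions.SetLineInfo.
    public static string Describe(XElement element)
    {
        var info = (IXmlLineInfo)element;
        return info.HasLineInfo()
            ? info.LineNumber + "," + info.LinePosition
            : "no line info";
    }
}
```

Inside the loop it could be called as XrefPosition.Describe(o.ConsecutiveNodes.First().Node) to report where the first xref of the group starts.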

>>there are strings which are not a single space or a comma followed by a single space

As for checking the separator text between the xref elements, a simple way is to use LINQ to find the end position of one xref element and the start position of the next, and then take the substring between them. Alternatively, you can split with a regex if you are comfortable writing regular expressions.
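
A minimal sketch of such a check follows. Instead of substring positions it walks the parent's child nodes and looks at the text node between two xref elements, which is a different technique from the one described above but serves the same purpose. It assumes that the only valid separators are exactly " " and ", ", that rid always has the form refN, and that the document is loaded with LoadOptions.PreserveWhitespace (otherwise a separator consisting of a single space is dropped on load). XrefRunFinder, RidNumber and FindRuns are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of the original answer.
public static class XrefRunFinder
{
    // Returns N for rid="refN", or null when rid is missing or has another form.
    public static int? RidNumber(XElement xref)
    {
        string rid = (string)xref.Attribute("rid");
        if (rid == null || !rid.StartsWith("ref", StringComparison.Ordinal)) return null;
        return int.TryParse(rid.Substring(3), out int n) ? n : (int?)null;
    }

    // Assumed separator rule: a single space, or a comma followed by a single space.
    private static bool IsSeparator(string text) => text == " " || text == ", ";

    // Collects runs of at least minLength xref siblings whose rid numbers
    // increase by 1 and which are joined only by valid separators.
    public static List<List<XElement>> FindRuns(XDocument doc, int minLength = 3)
    {
        var runs = new List<List<XElement>>();
        foreach (XElement parent in doc.Descendants().Where(e => e.Elements("xref").Any()))
        {
            var run = new List<XElement>();
            string between = null; // text seen since the previous element
            foreach (XNode node in parent.Nodes())
            {
                if (node is XText text)
                {
                    between = (between ?? "") + text.Value;
                    continue;
                }
                var el = node as XElement;
                int? n = (el != null && el.Name == "xref") ? RidNumber(el) : null;
                bool extendsRun = n.HasValue && run.Count > 0
                    && between != null && IsSeparator(between)
                    && n == RidNumber(run[run.Count - 1]) + 1;
                if (extendsRun)
                {
                    run.Add(el);
                }
                else
                {
                    // The run is broken: keep it if long enough, then start over.
                    if (run.Count >= minLength) runs.Add(run);
                    run = new List<XElement>();
                    if (n.HasValue) run.Add(el);
                }
                between = null;
            }
            if (run.Count >= minLength) runs.Add(run);
        }
        return runs;
    }
}
```

With this rule the <title> example from the question yields no run, because the text between ref3 and ref4 is not a valid separator, while the first sample line (ref9 to ref12) yields one run of four elements.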

Best regards,

Neil Hu

