Is there a C# utility for matching patterns in (syntactic parse) trees?
Problem description
I'm working on a Natural Language Processing (NLP) project in which I use a syntactic parser to create a syntactic parse tree from a given sentence.
Example Input: I ran into Joe and Jill and then we went shopping
Example Output: [TOP [S [S [NP [PRP I]] [VP [VBD ran] [PP [IN into] [NP [NNP Joe] [CC and] [NNP Jill]]]]] [CC and] [S [ADVP [RB then]] [NP [PRP we]] [VP [VBD went] [NP [NN shopping]]]]]]
I'm looking for a C# utility that will let me run complex queries such as:
- Get the first VBD relative to "Joe"
- Get the NP closest to "shopping"
Here's a Java utility that does this; I'm looking for a C# equivalent.
Any help would be much appreciated.
Recommended answer
One option would be to parse the output into C# objects and then encode the tree as XML. Each node writes string.Format("<{0}>", this.Name); as its opening tag and string.Format("</{0}>", this._name); as its closing tag, and in between it writes all of its child nodes recursively.
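The parsing and encoding step described above could look roughly like the following. This is a minimal sketch, not the answerer's actual code: the `ParseNode` and `BracketParser` types and their members are hypothetical names chosen for illustration, and the sketch assumes every label is a valid XML element name (Penn Treebank labels such as `PRP$` or punctuation tags would need to be mapped to something else first).

```csharp
using System;
using System.Collections.Generic;
using System.Security;
using System.Text;

// Hypothetical node type for one bracketed constituent, e.g. [NNP Joe].
public sealed class ParseNode
{
    public string Name { get; }
    public string Text { get; set; } // the word(s) directly inside this node, or null
    public List<ParseNode> Children { get; } = new List<ParseNode>();

    public ParseNode(string name) { Name = name; }

    // Opening tag, then escaped text and all children recursively, then closing tag.
    public string ToXml()
    {
        var sb = new StringBuilder();
        sb.Append(string.Format("<{0}>", Name));
        if (Text != null) sb.Append(SecurityElement.Escape(Text));
        foreach (var child in Children) sb.Append(child.ToXml());
        sb.Append(string.Format("</{0}>", Name));
        return sb.ToString();
    }
}

// Hypothetical recursive-descent parser for the "[LABEL child child ...]" format.
public static class BracketParser
{
    public static ParseNode Parse(string input)
    {
        // Pad the brackets with spaces so they split into separate tokens.
        var tokens = input.Replace("[", " [ ").Replace("]", " ] ")
                          .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        int pos = 0;
        var root = ParseNodeAt(tokens, ref pos);
        if (pos != tokens.Length)
            throw new FormatException("Unexpected tokens after the root node.");
        return root;
    }

    private static ParseNode ParseNodeAt(string[] tokens, ref int pos)
    {
        if (pos >= tokens.Length || tokens[pos] != "[")
            throw new FormatException("Expected '['.");
        pos++; // consume "["
        if (pos >= tokens.Length || tokens[pos] == "[" || tokens[pos] == "]")
            throw new FormatException("Expected a node label.");
        var node = new ParseNode(tokens[pos++]);

        var words = new List<string>();
        while (pos < tokens.Length && tokens[pos] != "]")
        {
            if (tokens[pos] == "[") node.Children.Add(ParseNodeAt(tokens, ref pos));
            else words.Add(tokens[pos++]);
        }
        if (pos >= tokens.Length)
            throw new FormatException("Missing ']'.");
        pos++; // consume "]"

        if (words.Count > 0) node.Text = string.Join(" ", words);
        return node;
    }
}
```

With these assumptions, `BracketParser.Parse("[NP [NNP Joe] [CC and] [NNP Jill]]").ToXml()` yields `<NP><NNP>Joe</NNP><CC>and</CC><NNP>Jill</NNP></NP>`, which any XML or HTML query tool can then load.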
After you do this, I would use a tool for querying XML/HTML to query the tree. Thousands of people already use query selectors and jQuery to navigate tree-like structures based on the relationships between nodes. I think this is far superior to TRegex or other outdated and unmaintained Java utilities.
For example, this answers your first example:
// CQ appears to come from the CsQuery library (using CsQuery;); First/Last/Any need using System.Linq;
// d is the root node of the parsed tree, and ToXml() is the recursive serialization described above
var xml = CQ.Create(d.ToXml());
// This could be simpler with CSS selectors, but I chose LINQ since you'll probably find it easier
// Find Joe: in our case, the node whose text is "Joe"
var joe = xml["*"].First(x => x.InnerHTML.Equals("Joe"));
// Find the last (deepest) element that meets the criteria: it contains "Joe" and it contains a VBD
// In our case that is the VP
var closestToVbd = xml["*"].Last(x => x.Cq().Has(joe).Has("VBD").Any());
Console.WriteLine("Closest node to VBD:\n " + closestToVbd.OuterHTML);
// If we want the VBD itself, we can just find the VBD inside that element
Console.WriteLine("\n\n VBD itself is " + closestToVbd.Cq().Find("VBD")[0].OuterHTML);
And this is your second example:
// Now for the NP closest to "shopping": find the element with the text "shopping" and take its closest NP
var closest = xml["*"].First(x => x.InnerHTML.Equals("shopping")).Cq()
    .Closest("NP")[0].OuterHTML;
Console.WriteLine("\n\n NP closest to shopping is: " + closest);
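If adding a third-party dependency is not an option, the same two queries can probably be expressed with LINQ to XML (`System.Xml.Linq`), which ships with .NET. The sketch below swaps the jQuery-style library used above for `XElement` navigation. The `TreeQueries` class and its method names are hypothetical, and it assumes each word sits in a leaf element, as in the XML encoding described earlier.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helpers that mirror the two queries above using LINQ to XML.
public static class TreeQueries
{
    // The leaf element whose text is exactly `word`.
    public static XElement FindWord(XElement root, string word) =>
        root.DescendantsAndSelf().First(e => !e.HasElements && e.Value == word);

    // Walk outward from the word (nearest ancestor first) and return the first
    // `label` element inside the nearest ancestor that contains one.
    public static XElement FirstRelative(XElement root, string word, string label) =>
        FindWord(root, word).Ancestors()
            .SelectMany(a => a.Descendants(label).Take(1))
            .First();

    // The nearest enclosing element named `label`, similar to jQuery's closest().
    public static XElement Closest(XElement root, string word, string label) =>
        FindWord(root, word).AncestorsAndSelf(label).First();
}
```

Given the XML for the example sentence loaded with `XElement.Parse`, `TreeQueries.FirstRelative(root, "Joe", "VBD")` should return the `<VBD>ran</VBD>` element and `TreeQueries.Closest(root, "shopping", "NP")` the `<NP>` that wraps `shopping`. Both methods throw `InvalidOperationException` when nothing matches, so `FirstOrDefault` may be a better fit for real input.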