How to get words from a tagged file


Problem description


These are some tagged sentences from a text file written in Urdu.
How can I extract the words that carry a particular tag, such as CC or JJ, using C# .NET?

<w pos="NNMM1N">دردانہ</w>
<w pos="JDNU">60</w>
<w pos="NNUM1O NNUM1N">باب</w>
<w pos="CC RD">اور</w>
<w pos="JJU NNUM1N NNUM1O NNUM1V NNUM2N NNUF1N NNUF1O NNUF1V RR VV0 VVIT1">اصول</w>

Solution

Regular expressions may help here.

<w pos="(JJ|CC).*">(?<data>.*)</w>


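Make sure each <w> element sits on its own line: "." does not match a newline, so the greedy ".*" stays within one line, but it would run across several elements that share a line. The pattern has no ^ or $ anchors, so RegexOptions.Multiline has no effect on it.
The named group "data" contains the text of the elements whose pos attribute starts with CC or JJ.

Here is a C# example: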

string pattern = "<w pos=\"(JJ|CC).*\">(?<data>.*)</w>";

Regex regex = new Regex(pattern);

Match match = regex.Match("<w pos=\"JJU NNUM1N NNUM1O NNUM1V NNUM2N NNUF1N NNUF1O NNUF1V RR VV0 VVIT1\">اصول</w>");

if(match.Success)
{
    string result = match.Groups["data"].Value;

    MessageBox.Show(result);
}
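The example above tests a single element. As a minimal sketch of applying the same idea to the whole text, the code below collects every match with Regex.Matches. The helper name ExtractByRegex and the tightened pattern (using [^"]* and [^<]* instead of .*) are assumptions for illustration, not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text of every <w> element whose pos
// attribute starts with one of the given prefixes.
static List<string> ExtractByRegex(string text, string[] prefixes)
{
    // Escape each prefix so it is treated literally inside the pattern.
    string alternation = string.Join("|", Array.ConvertAll(prefixes, p => Regex.Escape(p)));

    // [^"]* and [^<]* keep a match inside one element, even when several
    // elements share a line.
    string pattern = "<w pos=\"(?:" + alternation + ")[^\"]*\">(?<data>[^<]*)</w>";

    var words = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern))
    {
        words.Add(m.Groups["data"].Value);
    }
    return words;
}

// Sample data copied from the question.
string sample =
    "<w pos=\"NNMM1N\">دردانہ</w>\n" +
    "<w pos=\"JDNU\">60</w>\n" +
    "<w pos=\"NNUM1O NNUM1N\">باب</w>\n" +
    "<w pos=\"CC RD\">اور</w>\n" +
    "<w pos=\"JJU NNUM1N NNUM1O NNUM1V NNUM2N NNUF1N NNUF1O NNUF1V RR VV0 VVIT1\">اصول</w>\n";

foreach (string word in ExtractByRegex(sample, new[] { "CC", "JJ" }))
{
    Console.WriteLine(word);
}
```

Like the original pattern, this only looks at the start of the pos attribute, so a wanted tag that appears later in the list (for example "RD CC") would not be found.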


This is XML data, and you want to match a keyword against the attribute values of the nodes (possibly child nodes in the document) and return the values of the matching nodes as the search result. To learn how to serialize and deserialize XML documents in .NET, you might want to read the corresponding document on MSDN. The good thing is that the actual classes (such as XmlReader) are documented there as well.

Finally, I recommend reading the CodeProject article on methods for working with XML to learn more.
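As a rough sketch of the XmlReader route mentioned above, the code below streams through the elements and keeps the text of those whose pos attribute contains a token starting with a wanted prefix. The helper name ExtractByReader and the prefix rule are assumptions for illustration. It also assumes each <w> element contains only text, and it uses ConformanceLevel.Fragment because the file has no single root element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper: reads <w> elements one by one and returns the text of
// those whose pos attribute has a token starting with one of the prefixes.
static List<string> ExtractByReader(TextReader input, string[] prefixes)
{
    // Fragment mode accepts several top-level elements without a root.
    var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
    var words = new List<string>();

    using (XmlReader reader = XmlReader.Create(input, settings))
    {
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "w")
            {
                string pos = reader.GetAttribute("pos") ?? "";
                bool wanted = Array.Exists(pos.Split(' '),
                    token => Array.Exists(prefixes,
                        p => token.StartsWith(p, StringComparison.Ordinal)));

                // Reads the text and moves past </w>, so no extra Read() here.
                string text = reader.ReadElementContentAsString();
                if (wanted)
                {
                    words.Add(text);
                }
            }
            else
            {
                reader.Read();
            }
        }
    }
    return words;
}

// Sample data copied from the question.
string sample =
    "<w pos=\"NNMM1N\">دردانہ</w>\n" +
    "<w pos=\"JDNU\">60</w>\n" +
    "<w pos=\"NNUM1O NNUM1N\">باب</w>\n" +
    "<w pos=\"CC RD\">اور</w>\n" +
    "<w pos=\"JJU NNUM1N NNUM1O NNUM1V NNUM2N NNUF1N NNUF1O NNUF1V RR VV0 VVIT1\">اصول</w>\n";

foreach (string word in ExtractByReader(new StringReader(sample), new[] { "CC", "JJ" }))
{
    Console.WriteLine(word);
}
```

Because the reader never holds the whole document in memory, this shape may suit large corpus files better than loading everything into a string first.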


Here is a LINQ to XML way of getting the desired results:

//requires: using System; using System.Linq; using System.Xml.Linq;
//setup data - note root element added to make the xml well formed
var xmlStr = "<root>";
xmlStr += "<w pos=\"NNMM1N\">دردانہ</w>";
xmlStr += "<w pos=\"JDNU\">60</w>";
xmlStr += "<w pos=\"NNUM1O NNUM1N\">باب</w>";
xmlStr += "<w pos=\"CC RD\">اور</w>";
xmlStr += "<w pos=\"JJU NNUM1N NNUM1O NNUM1V NNUM2N NNUF1N NNUF1O NNUF1V RR VV0 VVIT1\">اصول</w>";
xmlStr += "</root>";
var xmlEle = XElement.Parse(xmlStr);


//Populate the list - three alternatives that give the same output for this data:

//pos values - check if a match exists using IndexOf (substring search)
var EleList1 = (from ele in xmlEle.Descendants("w")
	where ele.Attribute("pos").Value.IndexOf("CC") >= 0 
		|| ele.Attribute("pos").Value.IndexOf("JJ") >= 0
	select ele.Value
).ToList();

//OR

//pos values - split into array and check if they exist depending on Predicate<T> match
var EleList2 = (from ele in xmlEle.Descendants("w")
	where Array.Exists(ele.Attribute("pos").Value.Split(' '), 
		//modify the Predicate<T> match condition as needed
		s => s.Equals("CC") || s.IndexOf("JJ") >= 0
	)
	select ele.Value
).ToList();

//OR

//same check in method syntax, matching the tags CC and JJU exactly
var EleList3 = xmlEle.Elements("w")
	.Where(ele => Array.Exists(ele.Attribute("pos").Value.Split(' '), 
		s => s.Equals("CC") || s.Equals("JJU")
	))
	.Select(ele => ele.Value)
	.ToList();


//print any one of the three lists
foreach (var s in EleList3){
	Console.WriteLine(s);
}	
//Output:
//	اور
//	اصول
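The queries above assume every <w> element has a pos attribute and that the data is already wrapped in a root element. As a hedged sketch of how the raw file text might be handled, the hypothetical helper ExtractByLinq below adds the root element itself and treats a missing pos attribute as empty. The prefix-matching rule is an assumption; adjust it to the tag set in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: wraps the element fragments in a root element, parses
// them, and returns the text of every <w> whose pos attribute contains a
// token starting with one of the prefixes.
static List<string> ExtractByLinq(string fragments, string[] prefixes)
{
    XElement root = XElement.Parse("<root>" + fragments + "</root>");

    return root.Elements("w")
        // The cast yields null for a missing attribute instead of throwing.
        .Where(w => ((string)w.Attribute("pos") ?? "")
            .Split(' ')
            .Any(token => prefixes.Any(
                p => token.StartsWith(p, StringComparison.Ordinal))))
        .Select(w => w.Value)
        .ToList();
}

// Sample data copied from the question. In a real program this text would
// come from the tagged file, for example via File.ReadAllText.
string sample =
    "<w pos=\"NNMM1N\">دردانہ</w>\n" +
    "<w pos=\"JDNU\">60</w>\n" +
    "<w pos=\"NNUM1O NNUM1N\">باب</w>\n" +
    "<w pos=\"CC RD\">اور</w>\n" +
    "<w pos=\"JJU NNUM1N NNUM1O NNUM1V NNUM2N NNUF1N NNUF1O NNUF1V RR VV0 VVIT1\">اصول</w>\n";

foreach (string word in ExtractByLinq(sample, new[] { "CC", "JJ" }))
{
    Console.WriteLine(word);
}
```

Unlike the regular expression, this checks every token in the pos attribute, so an element tagged "RD CC" would also be returned.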


Here is a link on how to use Exists: http://msdn.microsoft.com/en-us/library/yw84x8be(v=vs.110).aspx

