Identify and extract noun and modifier
Question
Any idea how to identify and extract a noun and its modifier using VBA (Excel)?
Example:
ball valve 2in for green pump with gasket
Should be: ball valve
Any help will be appreciated.
Solution
There are several approaches, depending on the type of sentence you expect. In your example, the two words you want to extract are at the beginning of the sentence and are separated by whitespace. If you expect this to always be the case, you could use something as simple as:
```vba
Function getNoun(ByVal sentence As String) As String
    Dim pos1 As Long, pos2 As Long
    getNoun = ""
    pos1 = InStr(1, sentence, " ")          'find the first whitespace
    If pos1 <= 0 Then
        getNoun = sentence                  'if no whitespace, then assume there is only the noun
        Exit Function
    End If
    pos2 = InStr(pos1 + 1, sentence, " ")   'find the second whitespace
    If pos2 <= 0 Then
        getNoun = sentence                  'if no second whitespace, then assume there is only the noun and qualifier
        Exit Function
    End If
    getNoun = Left(sentence, pos2 - 1)      'if there are two or more spaces, get all chars before the second one
End Function
```
Tests in immediate window:
```
? getNoun("ball valve 2in for green pump with gasket")
ball valve
? getNoun("ball valve")
ball valve
? getNoun("ball")
ball
```
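One weakness of the InStr version is that two consecutive spaces (for example "ball  valve 2in") make it return "ball " instead of "ball valve". Below is a minimal sketch of a variant that normalises the spacing first. It assumes the code runs in Excel, because it relies on Excel's worksheet TRIM function (Application.WorksheetFunction.Trim), which strips leading and trailing spaces and collapses internal runs of spaces to a single space. The name getNounSplit is hypothetical.

```vba
' Hypothetical variant of getNoun that tolerates repeated, leading or trailing spaces.
' Assumes Excel: WorksheetFunction.Trim collapses runs of spaces (char 32) to one.
' Tabs and non-breaking spaces are NOT normalised by this sketch.
Function getNounSplit(ByVal sentence As String) As String
    Dim wrds() As String
    sentence = Application.WorksheetFunction.Trim(sentence)
    If Len(sentence) = 0 Then Exit Function     ' empty input: returns ""
    wrds = Split(sentence, " ")                 ' words are now separated by exactly one space
    If UBound(wrds) = 0 Then
        getNounSplit = wrds(0)                  ' only one word: assume it is the noun
    Else
        getNounSplit = wrds(0) & " " & wrds(1)  ' noun and qualifier
    End If
End Function
```

Like the original, it still assumes the noun and qualifier are always the first two words; it only makes the word splitting more robust.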
If your scenario is more complex and you need specific criteria to decide which words are the desired noun and qualifier, you will probably find the regular expression COM class (VBScript.RegExp) useful (see this topic for an example).
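As an illustration only, here is a sketch that uses VBScript.RegExp through late binding (CreateObject, so no library reference is needed; this COM class is available on Windows). The matching rule is an assumption made for the example: the noun and qualifier are taken to be the first two purely alphabetic words at the start of the sentence, so a leading token such as "2in" prevents a match. The function name getNounRegex is hypothetical; in practice you would encode your own criteria in the pattern.

```vba
' Hypothetical regex-based extractor (Windows, late-bound VBScript.RegExp).
' Assumed rule: the first two letters-only words at the start of the sentence.
Function getNounRegex(ByVal sentence As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = "^\s*([A-Za-z]+\s+[A-Za-z]+)\b"   ' \b rejects words glued to digits, e.g. "valve2in"
    re.Global = False                              ' only the first match is needed
    Set matches = re.Execute(sentence)
    If matches.Count > 0 Then
        getNounRegex = matches(0).SubMatches(0)    ' text captured by the group
    Else
        getNounRegex = ""                          ' sentence does not fit the assumed rule
    End If
End Function
```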
EDIT: Based on the comments, I understand that the position of the noun in the sentence varies, and that it is acceptable to use the MS Word thesaurus as a reference. If the code will run in Microsoft Word, the following function tells you whether or not a word is a noun:
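```vba
Function is_noun(ByVal wrd As String) As Boolean
    Dim s As Object, l As Variant, i As Long
    is_noun = False
    Set s = SynonymInfo(wrd)             'look the word up in Word's thesaurus
    If s.MeaningCount <> 0 Then
        l = s.PartOfSpeechList           'parts of speech, one per meaning found
        For i = LBound(l) To UBound(l)
            If l(i) = wdNoun Then
                is_noun = True           'at least one meaning is a noun
                Exit For
            End If
        Next i
    End If
End Function
```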
If the code does not run in MS Word (your tags suggest MS Excel) but MS Word is installed on the target system, you can adapt the above code to use the MS Word COM automation object.
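A minimal, untested sketch of that adaptation, assuming it runs in Excel on Windows with Word installed. It late-binds to Word with CreateObject("Word.Application") and calls that instance's SynonymInfo property. Because Word's named constants are not available in Excel without a reference to the Word object library, wdNoun is hard-coded as 1 (its value in Word's WdPartOfSpeech enumeration; verify it in the Object Browser if in doubt). The names is_noun_xl, WD_NOUN and demo_is_noun_xl are hypothetical.

```vba
' Hypothetical Excel-side variant of is_noun, late-bound to Word via COM.
' Assumption: WD_NOUN mirrors Word's wdNoun constant (WdPartOfSpeech enumeration).
Private Const WD_NOUN As Long = 1

Function is_noun_xl(ByVal wrd As String, ByVal wdApp As Object) As Boolean
    Dim s As Object, l As Variant, i As Long
    Set s = wdApp.SynonymInfo(wrd)            ' thesaurus lookup in the Word instance
    If s.MeaningCount <> 0 Then
        l = s.PartOfSpeechList                ' parts of speech, one per meaning
        For i = LBound(l) To UBound(l)
            If l(i) = WD_NOUN Then
                is_noun_xl = True
                Exit Function
            End If
        Next i
    End If
End Function

Sub demo_is_noun_xl()
    Dim wdApp As Object
    Set wdApp = CreateObject("Word.Application")   ' starts a hidden Word instance
    Debug.Print is_noun_xl("valve", wdApp)
    wdApp.Quit                                      ' close Word when done
End Sub
```

The Word instance is passed in rather than created inside the function, so that checking every word of a sentence (or of a whole column) does not start Word once per word.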
Then you can extract the first noun, and the word that follows it (if any), from a sentence with something like this:
```vba
Function getNoun(ByVal sentence As String) As String
    Dim wrds() As String, i As Long
    getNoun = ""
    wrds = Split(sentence)                          'split the sentence on spaces
    For i = LBound(wrds) To UBound(wrds)
        If is_noun(wrds(i)) Then
            getNoun = wrds(i)                       'first word Word knows as a noun
            If i < UBound(wrds) Then
                getNoun = getNoun & " " & wrds(i + 1)   'append the next word, if any
            End If
            Exit Function
        End If
    Next i
End Function
```
Notice, however, that this approach blindly trusts MS Word's thesaurus, and you may get odd results if your sentences contain, for example, words that can be either a verb or a noun depending on context. Also, the above example uses the default language of your MS Word setup; you can use a different one (if it is installed) by passing a language parameter to SynonymInfo.
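For example, here is a sketch of is_noun that takes the language explicitly. It assumes it runs inside Word, where SynonymInfo accepts an optional LanguageID argument and the WdLanguageID constants such as wdEnglishUS are defined; the chosen language's proofing tools must be installed for the lookup to return meanings. The name is_noun_lang is hypothetical.

```vba
' Hypothetical variant of is_noun with an explicit thesaurus language.
' Runs inside Word; lang is a WdLanguageID value such as wdEnglishUS.
Function is_noun_lang(ByVal wrd As String, ByVal lang As Long) As Boolean
    Dim s As Object, l As Variant, i As Long
    Set s = SynonymInfo(Word:=wrd, LanguageID:=lang)   ' lookup in the given language
    If s.MeaningCount <> 0 Then
        l = s.PartOfSpeechList
        For i = LBound(l) To UBound(l)
            If l(i) = wdNoun Then
                is_noun_lang = True
                Exit Function
            End If
        Next i
    End If
End Function
```

In the Immediate window you would call it as `? is_noun_lang("valve", wdEnglishUS)`.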