Identify and extract noun and modifier


Problem description



Any idea how to identify and extract a noun and its modifier using VBA (Excel)?

Example:

ball valve 2in for green pump with gasket

Should be: ball valve

Any help will be appreciated

Solution

There are a few different approaches, depending on the type of sentence you expect. In your example, the two words you want to extract are at the beginning of the sentence and separated by spaces. If you expect this to always be the case, then you could use something as simple as

Function getNoun(ByVal sentence As String) As String
    Dim pos1 As Long, pos2 As Long
    getNoun = ""
    pos1 = InStr(1, sentence, " ") 'find the first space
    If pos1 <= 0 Then
        getNoun = sentence 'if there is no space, assume there is only the noun
        Exit Function
    End If
    pos2 = InStr(pos1 + 1, sentence, " ") 'find the second space
    If pos2 <= 0 Then
        getNoun = sentence 'if there is no second space, assume there is only the noun and qualifier
        Exit Function
    End If

    getNoun = Left(sentence, pos2 - 1) 'if there are two or more spaces, get all chars before the second one
End Function

Tests in the Immediate window:

? getNoun("ball valve 2in for green pump with gasket")
ball valve
? getNoun("ball valve")
ball valve
? getNoun("ball")
ball

If your scenario is more complex and you need specific criteria to decide which words are the desired noun and qualifier, you will probably find the Regex COM class useful (see this topic for example).
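As a rough illustration of the regex route, here is a minimal sketch that uses the `VBScript.RegExp` COM class through late binding. The function name `getNounRegex` and the matching rule (up to two leading words made of letters only, so a token such as `2in` ends the match) are assumptions made for this example, not something stated in the question.

```vba
' Hypothetical helper: returns the leading one or two alphabetic words.
Function getNounRegex(ByVal sentence As String) As String
    Dim re As Object, matches As Object
    Set re = CreateObject("VBScript.RegExp") 'late binding, no project reference needed
    re.IgnoreCase = True
    re.Global = False
    'assumed rule: one word of letters, optionally followed by a second one
    re.Pattern = "^\s*([a-z]+(\s+[a-z]+)?)\b"
    Set matches = re.Execute(sentence)
    If matches.Count > 0 Then
        getNounRegex = matches(0).SubMatches(0) 'first capture group
    Else
        getNounRegex = "" 'sentence does not start with a letter-only word
    End If
End Function
```

With this pattern, `? getNounRegex("ball valve 2in for green pump with gasket")` should print `ball valve`. A real data set would likely need a different pattern.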

EDIT: Based on the comments, I understand that positions are variable, and that it is acceptable to use the MS Word thesaurus as a reference. If the code will run in Microsoft Word, the following function will tell you whether or not a word is a noun:

Function is_noun(ByVal wrd As String) As Boolean
    Dim s As Object, l As Variant, i As Long
    is_noun = False
    Set s = SynonymInfo(wrd) 'look the word up in Word's thesaurus
    If s.MeaningCount <> 0 Then
        l = s.PartOfSpeechList 'one part-of-speech entry per meaning
        For i = LBound(l) To UBound(l)
            If l(i) = wdNoun Then
                is_noun = True
                Exit For
            End If
        Next i
    End If
End Function

If you are not running in MS Word (your tags suggest MS Excel) but MS Word is installed on the target system, then you can adapt the above code to use the MS Word COM automation object.
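One possible adaptation for Excel is sketched below. It assumes Word is installed, creates the Word application through late binding, and passes it to a helper. The names `is_noun_via_word`, `WD_NOUN` and `demo_is_noun` are made up for this example. Because late binding does not expose Word's constants, the value of `wdNoun` (1) is declared locally. Some setups may also require an open document before the thesaurus responds; that has not been verified here.

```vba
Private Const WD_NOUN As Long = 1 'numeric value of Word's wdNoun constant

' Hypothetical helper: same check as is_noun, but against a Word instance passed in.
Function is_noun_via_word(ByVal wordApp As Object, ByVal wrd As String) As Boolean
    Dim info As Object, posList As Variant, i As Long
    Set info = wordApp.SynonymInfo(wrd)
    If info.MeaningCount <> 0 Then
        posList = info.PartOfSpeechList
        For i = LBound(posList) To UBound(posList)
            If posList(i) = WD_NOUN Then
                is_noun_via_word = True
                Exit For
            End If
        Next i
    End If
End Function

Sub demo_is_noun()
    Dim wordApp As Object
    Set wordApp = CreateObject("Word.Application") 'starts a hidden Word instance
    Debug.Print is_noun_via_word(wordApp, "valve")
    wordApp.Quit
    Set wordApp = Nothing
End Sub
```

Starting Word is slow, so when checking many cells it is probably better to create the instance once and reuse it for every word, as the helper's signature allows.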

Then you can extract the first noun and the word that follows it, if any, from a sentence with something like this:

Function getNoun(ByVal sentence As String) As String
    Dim wrds() As String, i As Long
    getNoun = ""
    wrds = Split(sentence) 'split on spaces
    For i = LBound(wrds) To UBound(wrds)
        If is_noun(wrds(i)) Then
            getNoun = wrds(i) 'first word that Word reports as a noun
            If i < UBound(wrds) Then
                getNoun = getNoun & " " & wrds(i + 1) 'append the next word, if any
            End If
            Exit Function
        End If
    Next i
End Function

Notice, however, that with this approach you are blindly trusting MS Word's word database, and you may get odd results if your sentences contain, for example, words that can be either a verb or a noun depending on context. Also, the above example uses the default language of your MS Word setup (it is possible to use a different one, if installed, by including a language parameter in SynonymInfo).
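To illustrate the language parameter, the sketch below is a variant of the noun check that passes an explicit language to `SynonymInfo` through its `LanguageID` argument. The function name `is_noun_in` is hypothetical, the code is meant to run inside Word, and it assumes the thesaurus for the requested language is installed.

```vba
' Hypothetical variant of is_noun with an explicit thesaurus language.
Function is_noun_in(ByVal wrd As String, ByVal langId As Long) As Boolean
    Dim s As Object, l As Variant, i As Long
    Set s = Application.SynonymInfo(Word:=wrd, LanguageID:=langId)
    If s.MeaningCount <> 0 Then
        l = s.PartOfSpeechList
        For i = LBound(l) To UBound(l)
            If l(i) = wdNoun Then
                is_noun_in = True
                Exit For
            End If
        Next i
    End If
End Function
```

For example, `? is_noun_in("valve", wdEnglishUS)` in the Immediate window would query the US English thesaurus.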
