Extract required keywords from text


Problem description

Hi everyone,

I have text in the following formats:

Salary is 3.6L PA
Salary is 3.5 LPA
Salary is 30,000KPM
Salary is 30,000 KPM
Experience: 3-5years
Experience: 3+ years




Now I need to extract the salary amount (such as 3.5 or 30,000) and the experience or minimum experience (such as 3 years). For experience, my code works when the parts are separated by spaces, but it does not work for salary at all. And if there is no space between "3" and "+" (as in "3+"), I can't get the result.

Can anyone suggest the logic to extract both the salary and the experience?

The only guarantee is that the salary keyword and its amount are always on the same line, and the experience keyword and its value are also on the same line.

Thanks in advance.


This is my sample code for experience.

// Fragment from a loop over KeyWords (index i); emailInLowerCase is presumably
// EmailBodyForKeyWords in lower case. The other variable declarations are not shown.
if (emailInLowerCase.Contains(KeyWords[i].ToLower()))
{
    index_mime = emailInLowerCase.IndexOf(KeyWords[i].ToLower(), 0);
    if (index_mime != -1)
    {
        // The value starts two characters after the keyword (e.g. after ": ")
        // and runs to the end of the line.
        int valueStart = index_mime + KeyWords[i].Length + 2;
        index_Termination = EmailBodyForKeyWords.IndexOf("\r\n", valueStart);
        string _FetchExperience = EmailBodyForKeyWords.Substring(valueStart, index_Termination - valueStart).Trim();

        // Split the value on spaces and keep the tokens that parse as integers.
        string[] ExperienceStructure = _FetchExperience.Split(' ');
        if (ExperienceStructure.Length > 0)
        {
            int exp1 = 0;
            int exp2 = 0;
            for (int j = 0; j < ExperienceStructure.Length; j++)
            {
                if (int.TryParse(ExperienceStructure[j].Trim(), out int num))
                {
                    // The first number goes to exp1, later numbers to exp2.
                    if (exp1 == 0)
                    {
                        exp1 = num;
                    }
                    else
                    {
                        exp2 = num;
                    }

                    // Store the smaller value as From and the larger as To.
                    if (exp1 < exp2)
                    {
                        ExperienceFrom = exp1;
                        ExperienceTo = exp2;
                    }
                    else
                    {
                        ExperienceFrom = exp2;
                        ExperienceTo = exp1;
                    }

                    // Only one number so far: use it for both ends of the range.
                    if ((ExperienceFrom != 0 && ExperienceTo == 0) || (ExperienceTo != 0 && ExperienceFrom == 0))
                    {
                        if (ExperienceFrom != 0)
                        {
                            ExperienceTo = ExperienceFrom;
                        }
                        else
                        {
                            ExperienceFrom = ExperienceTo;
                        }
                    }
                }
            }
        }
    }
}
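The splitting step is what breaks both cases. Here is a minimal C# sketch that reproduces only that step, using a hypothetical helper IntTokens (not part of the original code): split a value on spaces and keep the tokens that int.TryParse accepts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper mimicking the question's approach: split on spaces,
// keep only the tokens that int.TryParse (default number style) accepts.
static int[] IntTokens(string value)
{
    var result = new List<int>();
    foreach (string token in value.Split(' '))
    {
        if (int.TryParse(token.Trim(), out int n))
            result.Add(n);
    }
    return result.ToArray();
}

// Values as they appear after the "Experience:" / "Salary is" keywords.
string[] samples = { "3 - 5 years", "3-5years", "3+ years", "3.5 LPA", "30,000 KPM" };
foreach (string s in samples)
{
    Console.WriteLine($"\"{s}\" -> [{string.Join(", ", IntTokens(s))}]");
}
```

Only the spaced range "3 - 5 years" yields numbers (3 and 5). "3-5years" stays a single token, and "3+", "3.5" and "30,000" are rejected because the default int.TryParse accepts neither a trailing sign, nor a decimal point, nor a thousands separator.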

Recommended answers

Have you considered using regular expressions?

For salary
.*Salary.*\s(?<salary>\d+(?:,\d+)*(?:\.\d+)?).*

For experience
.*Experience.*\s(?<experience>\d+\s*(?:\+|(?:\s*-\s*\d+)?)).*

From the match results you can easily extract the data through the named groups salary and experience.
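A minimal C# sketch of the regex approach, assuming .NET's System.Text.RegularExpressions (which supports the (?<name>...) named-group syntax). The helpers Extract, ParseSalary and ParseExperience are hypothetical. One deliberate change to the patterns: the .* after the keyword is made lazy (.*?). With the greedy version, "Experience: 3 - 5 years" backtracks to the last space followed by a digit and captures only "5"; the lazy version captures "3 - 5". The outer .* is dropped because Regex.Match searches anywhere in the text.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// The answer's patterns with a lazy ".*?" after the keyword, so the capture starts
// at the FIRST whitespace-preceded number after the keyword rather than the last.
var salaryRegex = new Regex(@"Salary.*?\s(?<salary>\d+(?:,\d+)*(?:\.\d+)?)");
var experienceRegex = new Regex(@"Experience.*?\s(?<experience>\d+\s*(?:\+|(?:\s*-\s*\d+)?))");

// Returns the trimmed text of a named group, or null when the pattern does not match.
static string? Extract(Regex regex, string text, string group)
{
    Match m = regex.Match(text);
    return m.Success ? m.Groups[group].Value.Trim() : null;
}

// Hypothetical: "30,000" -> 30000, "3.6" -> 3.6 (invariant culture: ',' groups, '.' is decimal).
static decimal ParseSalary(string value) =>
    decimal.Parse(value, NumberStyles.Number, CultureInfo.InvariantCulture);

// Hypothetical: "3-5" or "3 - 5" -> (3, 5); "3" -> (3, 3); "3+" -> (3, null), a minimum only.
static (int From, int? To) ParseExperience(string value)
{
    if (value.EndsWith("+"))
        return (int.Parse(value.TrimEnd('+').Trim()), null);
    string[] parts = value.Split('-');
    int min = int.Parse(parts[0].Trim());
    int max = parts.Length > 1 ? int.Parse(parts[1].Trim()) : min;
    return (min, max);
}

string[] lines =
{
    "Salary is 3.6L PA", "Salary is 30,000 KPM",
    "Experience: 3-5years", "Experience: 3+ years", "Experience: 3 - 5 years",
};
foreach (string line in lines)
{
    if (Extract(salaryRegex, line, "salary") is string salary)
        Console.WriteLine($"{line} -> salary {ParseSalary(salary)}");
    if (Extract(experienceRegex, line, "experience") is string experience)
    {
        var (low, high) = ParseExperience(experience);
        Console.WriteLine($"{line} -> experience from {low} to {(high.HasValue ? high.ToString() : "open")}");
    }
}
```

The groups return strings such as "3.6", "30,000", "3-5" and "3+"; the two parse helpers are one possible way to turn them into numbers, with "3+" treated as a minimum that has no upper bound.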


I didn't study the code completely.
What I noticed is that you split the entire string on spaces.
Instead, try taking the index of "Experience:" (or of the colon), say X, and the index of "years", say Y. Then take the substring of the main string from X up to Y, which gives you the experience value. You can then replace the - (hyphen), dot, or anything else in between with whatever you need.

Apply similar logic (find the index, then take the substring) for the salary as well.
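A minimal C# sketch of this index/substring idea. ExperienceBetween, ExperienceNumbers and SalaryAfterKeyword are hypothetical names. The salary helper is an adaptation, since the answer only says to use similar logic: it assumes the amount is the first number after the word Salary on the same line.

```csharp
using System;

// X = index of the ':' after "Experience", Y = index of "year"; the value lies between them.
static string? ExperienceBetween(string text)
{
    int keyword = text.IndexOf("Experience", StringComparison.OrdinalIgnoreCase);
    if (keyword < 0) return null;
    int x = text.IndexOf(':', keyword);
    if (x < 0) return null;
    int y = text.IndexOf("year", x, StringComparison.OrdinalIgnoreCase);
    int lineEnd = text.IndexOf('\n', x);
    // Give up if "year" is missing or only appears on a later line.
    if (y < 0 || (lineEnd >= 0 && y > lineEnd)) return null;
    return text.Substring(x + 1, y - (x + 1)).Trim();   // e.g. "3-5", "3+", "3 - 5"
}

// Splits "3-5", "3 - 5" or "3+" on '-' and '+' and parses each piece as an int.
static int[] ExperienceNumbers(string value) =>
    Array.ConvertAll(value.Split(new[] { '-', '+' }, StringSplitOptions.RemoveEmptyEntries),
                     part => int.Parse(part.Trim()));

// Start at the first digit after "Salary" (staying on that line) and read digits, ',' and '.'.
static string? SalaryAfterKeyword(string text)
{
    int keyword = text.IndexOf("Salary", StringComparison.OrdinalIgnoreCase);
    if (keyword < 0) return null;
    int start = keyword;
    while (start < text.Length && text[start] != '\n' && !char.IsDigit(text[start])) start++;
    if (start == text.Length || !char.IsDigit(text[start])) return null;
    int end = start;
    while (end < text.Length && (char.IsDigit(text[end]) || text[end] == ',' || text[end] == '.')) end++;
    return text.Substring(start, end - start).TrimEnd(',', '.');
}

string body = "Salary is 30,000 KPM\r\nExperience: 3-5years\r\n";
string? experience = ExperienceBetween(body);
Console.WriteLine($"Salary: {SalaryAfterKeyword(body)}");
Console.WriteLine($"Experience: {experience} -> [{string.Join(", ", ExperienceNumbers(experience ?? ""))}]");
```

Compared with splitting on spaces, the substring between X and Y keeps "3-5" and "3+" intact, and the final split only happens on the separators you actually expect.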

