Java name parse library?

Question

I'm searching for a library similar in functionality to the Perl Lingua::EN::NameParse module. Essentially, I'd like to parse strings like 'Mr. Bob R. Smith' into prefix, first name, last name, and name suffix components. Google hasn't been much help in finding something like this, and I'd prefer not to roll my own if possible. Does anyone know of an OSS Java library that can do this in a sophisticated way?

Recommended answer

I just can't believe someone hasn't shared a library for this. I looked on GitHub and found a JavaScript name parser that could easily be translated to Java: https://github.com/joshfraser/JavaScript-Name-Parser

I also modified the code in one of the answers to work a little better and have included a test case:

import java.util.ArrayList;
import java.util.List;

import org.apache.commons.lang.StringUtils;

public class NameParser {
    private String firstName = "";
    private String lastName = "";
    private String middleName = "";
    private List<String> middleNames = new ArrayList<String>();
    private List<String> titlesBefore = new ArrayList<String>();
    private List<String> titlesAfter = new ArrayList<String>();
    private String[] prefixes = { "dr", "mr", "ms", "atty", "prof", "miss", "mrs" };
    private String[] suffixes = { "jr", "sr", "ii", "iii", "iv", "v", "vi", "esq", "2nd", "3rd", "jd", "phd",
            "md", "cpa" };

    public NameParser() {
    }

    public NameParser(String name) {
        parse(name);
    }

    private void reset() {
        firstName = lastName = middleName = "";
        middleNames = new ArrayList<String>();
        titlesBefore = new ArrayList<String>();
        titlesAfter = new ArrayList<String>();
    }

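    // Case-insensitive exact match after dropping trailing '.' or ','.
    // A prefix test (startsWith) would treat a surname such as "Vaughn"
    // as the suffix "v", so compare the whole normalized word instead.
    private boolean isOneOf(String checkStr, String[] titles) {
        String normalized = checkStr.toLowerCase().replaceAll("[.,]+$", "");
        for (String title : titles) {
            if (normalized.equals(title))
                return true;
        }
        return false;
    }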

    public void parse(String name) {
        // Reset first so a blank input does not leave the previous result in place.
        this.reset();
        if (StringUtils.isBlank(name))
            return;
        String[] words = name.split(" ");
        boolean isFirstName = false;

        for (String word : words) {
            if (StringUtils.isBlank(word))
                continue;
            if (word.charAt(word.length() - 1) == '.') {
                if (!isFirstName && !this.isOneOf(word, prefixes)) {
                    firstName = word;
                    isFirstName = true;
                } else if (isFirstName) {
                    middleNames.add(word);
                } else {
                    titlesBefore.add(word);
                }
            } else {
                if (word.endsWith(","))
                    word = StringUtils.chop(word);
                if (isFirstName == false) {
                    firstName = word;
                    isFirstName = true;
                } else {
                    middleNames.add(word);
                }
            }
        }
        if (middleNames.size() > 0) {
            boolean stop = false;
            List<String> toRemove = new ArrayList<String>();
            for (int i = middleNames.size() - 1; i >= 0 && !stop; i--) {
                String str = middleNames.get(i);
                if (this.isOneOf(str, suffixes)) {
                    titlesAfter.add(str);
                } else {
                    lastName = str;
                    stop = true;
                }
                toRemove.add(str);
            }
            if (StringUtils.isBlank(lastName) && titlesAfter.size() > 0) {
                lastName = titlesAfter.get(titlesAfter.size() - 1);
                titlesAfter.remove(titlesAfter.size() - 1);
            }
            for (String s : toRemove) {
                middleNames.remove(s);
            }
        }
    }

    public String getFirstName() {
        return firstName;
    }

    public String getLastName() {
        return lastName;
    }

    public String getMiddleName() {
        if (StringUtils.isBlank(this.middleName)) {
            for (String name : middleNames) {
                middleName += (name + " ");
            }
            middleName = StringUtils.chop(middleName);
        }
        return middleName;
    }

    public List<String> getTitlesBefore() {
        return titlesBefore;
    }

    public List<String> getTitlesAfter() {
        return titlesAfter;
    }

}

Test case:

import org.junit.Assert;

import org.junit.Test;

public class NameParserTest {

    private class TestData {
        String name;

        String firstName;
        String lastName;
        String middleName;

        public TestData(String name, String firstName, String middleName, String lastName) {
            super();
            this.name = name;
            this.firstName = firstName;
            this.lastName = lastName;
            this.middleName = middleName;
        }

    }

    @Test
    public void test() {

        TestData td[] = { new TestData("Henry \"Hank\" J. Fasthoff IV", "Henry", "\"Hank\" J.", "Fasthoff"),
                new TestData("April A. (Caminez) Bentley", "April", "A. (Caminez)", "Bentley"),
                new TestData("fff lll", "fff", "", "lll"),
                new TestData("fff mmmmm lll", "fff", "mmmmm", "lll"),
                new TestData("fff mmm1      mm2 lll", "fff", "mmm1 mm2", "lll"),
                new TestData("Mr. Dr. Tom Jones", "Tom", "", "Jones"),
                new TestData("Robert P. Bethea Jr.", "Robert", "P.", "Bethea"),
                new TestData("Charles P. Adams, Jr.", "Charles", "P.", "Adams"),
                new TestData("B. Herbert Boatner, Jr.", "B.", "Herbert", "Boatner"),
                new TestData("Bernard H. Booth IV", "Bernard", "H.", "Booth"),
                new TestData("F. Laurens \"Larry\" Brock", "F.", "Laurens \"Larry\"", "Brock"),
                new TestData("Chris A. D'Amour", "Chris", "A.", "D'Amour") };

        NameParser bp = new NameParser();
        for (int i = 0; i < td.length; i++) {
            bp.parse(td[i].name);
            Assert.assertEquals(td[i].firstName, bp.getFirstName());
            Assert.assertEquals(td[i].lastName, bp.getLastName());
            Assert.assertEquals(td[i].middleName, bp.getMiddleName());
        }
    }

}
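
For anyone who wants the prefix and suffix components from the question without pulling in Commons Lang, here is a minimal sketch of the same heuristic using only the JDK. The class name `SimpleNameSplit`, the map keys, and the shortened word lists are all made up for illustration; it is not a port of Lingua::EN::NameParse and it will get plenty of real-world names wrong (multi-word surnames, "Ph.D.", and so on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SimpleNameSplit {
    // Hypothetical, deliberately short word lists; extend them for real data.
    private static final List<String> PREFIXES =
            Arrays.asList("dr", "mr", "ms", "mrs", "miss", "prof");
    private static final List<String> SUFFIXES =
            Arrays.asList("jr", "sr", "ii", "iii", "iv", "esq", "phd", "md");

    // Lowercase a token and drop trailing '.' or ',' so "Jr." and "Jr," both become "jr".
    private static String normalize(String token) {
        return token.toLowerCase(Locale.ROOT).replaceAll("[.,]+$", "");
    }

    public static Map<String, String> split(String name) {
        // Tokenize on runs of whitespace, skipping empty tokens.
        List<String> tokens = new ArrayList<String>();
        for (String t : name.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }

        List<String> prefix = new ArrayList<String>();
        List<String> suffix = new ArrayList<String>();

        // Peel titles off the front, always leaving at least one token as a name.
        while (tokens.size() > 1 && PREFIXES.contains(normalize(tokens.get(0)))) {
            prefix.add(tokens.remove(0));
        }
        // Peel suffixes off the back, keeping them in their original order.
        while (tokens.size() > 1
                && SUFFIXES.contains(normalize(tokens.get(tokens.size() - 1)))) {
            suffix.add(0, tokens.remove(tokens.size() - 1));
        }

        // What remains is first name, optional middle names, and last name.
        String first = tokens.isEmpty() ? "" : tokens.get(0);
        String last = tokens.size() > 1
                ? tokens.get(tokens.size() - 1).replaceAll(",$", "")
                : "";
        String middle = tokens.size() > 2
                ? String.join(" ", tokens.subList(1, tokens.size() - 1))
                : "";

        Map<String, String> parts = new LinkedHashMap<String, String>();
        parts.put("prefix", String.join(" ", prefix));
        parts.put("first", first);
        parts.put("middle", middle);
        parts.put("last", last);
        parts.put("suffix", String.join(" ", suffix));
        return parts;
    }
}
```

Calling `SimpleNameSplit.split("Mr. Bob R. Smith")` returns a map with the five components keyed as above. Matching whole normalized tokens rather than prefixes of tokens is a deliberate choice here, so that a surname beginning with "v" or "md" is not swallowed as a suffix.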
