SQL:从全名字段中解析名字、中间名和姓氏 [英] SQL: parse the first, middle and last name from a fullname field

查看:32
本文介绍了SQL:从全名字段中解析名字、中间名和姓氏的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

如何使用 SQL 从全名字段中解析名字、中间名和姓氏?

How do I parse the first, middle, and last name out of a fullname field with SQL?

我需要尝试匹配与全名不直接匹配的名称.我希望能够获取全名字段并将其分解为名字、中间名和姓氏.

I need to try to match up on names that are not a direct match on full name. I'd like to be able to take the full name field and break it up into first, middle and last name.

数据不包括任何前缀或后缀.中间名是可选的.数据格式为First Middle Last".

The data does not include any prefixes or suffixes. The middle name is optional. The data is formatted 'First Middle Last'.

我对一些实用的解决方案很感兴趣,可以让我完成 90% 的工作.如前所述,这是一个复杂的问题,所以我会单独处理特殊情况.

I'm interested in some practical solutions to get me 90% of the way there. As it has been stated, this is a complex problem, so I'll handle special cases individually.

推荐答案

这是一个自包含的示例,其中包含易于操作的测试数据.

Here is a self-contained example, with easily manipulated test data.

在此示例中,如果您的名称包含三个以上的部分,则所有额外"内容都将放入 LAST_NAME 字段中.标识为titles"的特定字符串除外,例如DR"、MRS"和MR".

With this example, if you have a name with more than three parts, then all the "extra" stuff will get put in the LAST_NAME field. An exception is made for specific strings that are identified as "titles", such as "DR", "MRS", and "MR".

如果缺少中间名,那么您只会得到 FIRST_NAME 和 LAST_NAME(MIDDLE_NAME 将为 NULL).

If the middle name is missing, then you just get FIRST_NAME and LAST_NAME (MIDDLE_NAME will be NULL).

你可以将它粉碎成一个巨大的嵌套的 SUBSTRING 块,但可读性已经够难了,因为它在 SQL 中这样做.

You could smash it into a giant nested blob of SUBSTRINGs, but readability is hard enough as it is when you do this in SQL.

编辑——处理以下特殊情况:

1 - NAME 字段为 NULL

2 - NAME 字段包含前导/尾随空格

3 - NAME 字段在名称中有 > 1 个连续空格

4 - NAME 字段只包含名字

5 - 将原始全名作为单独的列包含在最终输出中,以提高可读性

6 - 将特定的前缀列表作为单独的标题"列处理

SELECT
  FIRST_NAME.ORIGINAL_INPUT_DATA
 ,FIRST_NAME.TITLE
 ,FIRST_NAME.FIRST_NAME
 ,CASE WHEN 0 = CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
       THEN NULL  --no more spaces?  assume rest is the last name
       ELSE SUBSTRING(
                       FIRST_NAME.REST_OF_NAME
                      ,1
                      ,CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)-1
                     )
       END AS MIDDLE_NAME
 ,SUBSTRING(
             FIRST_NAME.REST_OF_NAME
            ,1 + CHARINDEX(' ',FIRST_NAME.REST_OF_NAME)
            ,LEN(FIRST_NAME.REST_OF_NAME)
           ) AS LAST_NAME
FROM
  (  
  SELECT
    TITLE.TITLE
   ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)
         THEN TITLE.REST_OF_NAME --No space? return the whole thing
         ELSE SUBSTRING(
                         TITLE.REST_OF_NAME
                        ,1
                        ,CHARINDEX(' ',TITLE.REST_OF_NAME)-1
                       )
    END AS FIRST_NAME
   ,CASE WHEN 0 = CHARINDEX(' ',TITLE.REST_OF_NAME)  
         THEN NULL  --no spaces @ all?  then 1st name is all we have
         ELSE SUBSTRING(
                         TITLE.REST_OF_NAME
                        ,CHARINDEX(' ',TITLE.REST_OF_NAME)+1
                        ,LEN(TITLE.REST_OF_NAME)
                       )
    END AS REST_OF_NAME
   ,TITLE.ORIGINAL_INPUT_DATA
  FROM
    (   
    SELECT
      --if the first three characters are in this list,
      --then pull it as a "title".  otherwise return NULL for title.
      CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,1,3)))
           ELSE NULL
           END AS TITLE
      --if you change the list, don't forget to change it here, too.
      --so much for the DRY prinicple...
     ,CASE WHEN SUBSTRING(TEST_DATA.FULL_NAME,1,3) IN ('MR ','MS ','DR ','MRS')
           THEN LTRIM(RTRIM(SUBSTRING(TEST_DATA.FULL_NAME,4,LEN(TEST_DATA.FULL_NAME))))
           ELSE LTRIM(RTRIM(TEST_DATA.FULL_NAME))
           END AS REST_OF_NAME
     ,TEST_DATA.ORIGINAL_INPUT_DATA
    FROM
      (
      SELECT
        --trim leading & trailing spaces before trying to process
        --disallow extra spaces *within* the name
        REPLACE(REPLACE(LTRIM(RTRIM(FULL_NAME)),'  ',' '),'  ',' ') AS FULL_NAME
       ,FULL_NAME AS ORIGINAL_INPUT_DATA
      FROM
        (
        --if you use this, then replace the following
        --block with your actual table
              SELECT 'GEORGE W BUSH' AS FULL_NAME
        UNION SELECT 'SUSAN B ANTHONY' AS FULL_NAME
        UNION SELECT 'ALEXANDER HAMILTON' AS FULL_NAME
        UNION SELECT 'OSAMA BIN LADEN JR' AS FULL_NAME
        UNION SELECT 'MARTIN J VAN BUREN SENIOR III' AS FULL_NAME
        UNION SELECT 'TOMMY' AS FULL_NAME
        UNION SELECT 'BILLY' AS FULL_NAME
        UNION SELECT NULL AS FULL_NAME
        UNION SELECT ' ' AS FULL_NAME
        UNION SELECT '    JOHN  JACOB     SMITH' AS FULL_NAME
        UNION SELECT ' DR  SANJAY       GUPTA' AS FULL_NAME
        UNION SELECT 'DR JOHN S HOPKINS' AS FULL_NAME
        UNION SELECT ' MRS  SUSAN ADAMS' AS FULL_NAME
        UNION SELECT ' MS AUGUSTA  ADA   KING ' AS FULL_NAME      
        ) RAW_DATA
      ) TEST_DATA
    ) TITLE
  ) FIRST_NAME

这篇关于SQL:从全名字段中解析名字、中间名和姓氏的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆