Sql,需要一个函数或查询,将名称解析为名字,姓氏和中间名: [英] Sql, need a function or query that would parse out the name into first name, last name and middle name:

查看:536
本文介绍了Sql,需要一个函数或查询,将名称解析为名字,姓氏和中间名:的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

命名中存在很多差异。我们需要在sql中进行查询,将其解析为头号,姓氏和中间名称和标题:



命名方差:

- ORDWAY,BRENDA



姓:BRENDA,名字:ORDWAY,中间名:NULL,标题:NULL



--METHENY,DORIS A



姓:METHENY,名字:DORIS,中间名:A,标题:NULL



--MARABELLA,LUCILLE C

姓:MARABELLA,名字:LUCILLE,中间名:C,标题:NULL



--BURROUGHS JR,JOSEPH

姓:BURROUGHS,名:JOSEPH,中间名:,标题:JR





- ZUNIGA,SLOAN ELIZABETH-ROSS

姓:ZUNIGA,名字:SLOAN,中间名:ELIZABETH-ROSS,标题:NULL





--ALDRIDGE III,VERNON G

姓:ALDRIDGE,名字:VERNON,中间名:G,标题:III



--ANDREWS-WATLEY,SAVANNAH

姓:ANDREWS-WATLEY ,名字:SAVANNAH,中间名:NULL,标题:NULL



--ASHELL BELL,CARRIE C

姓:ASHMORE BELL,名字:CARRIE,中间名:C,标题:NULL



--ATKINS ROBINSON,BARONETTE

姓:ATKINS ROBINSON,名字:BARONETTE,中间名:NULL,标题:NULL



--FARRELL II,BENJAMIN TARENCE

姓:FARRELL,名字:BENJAMIN,中间名: TARENCE,标题:II



-



我的尝试:



LTRIM(RTRIM(LEFT([ParticipantName],CHARINDEX(',',[ParticipantName]) - 1)))作为lastName,


$ LTRIM(RTRIM(LEFT(名字,CHARINDEX('',名字)-1)))作为FirstName,

SUBSTRING(名字,1,CHARINDEX时的情况) ('',名字)= 0然后LEN(名字),否则CHARINDEX('',名字)-1结束)作为FirstName,

there are alot of difference variance in the naming. We need a query in sql that will parse it out into first, last and middle name and title:

Naming Variance:
--ORDWAY,BRENDA

Lastname: BRENDA, Firstname: ORDWAY, Middle Name: NULL, Title: NULL

--METHENY,DORIS A

Lastname: METHENY, Firstname: DORIS, Middle Name: A, Title: NULL

--MARABELLA,LUCILLE C
Lastname: MARABELLA, Firstname: LUCILLE, Middle Name: C, Title: NULL

--BURROUGHS JR,JOSEPH
Lastname: BURROUGHS, Firstname: JOSEPH, Middle Name: , Title: JR


--ZUNIGA,SLOAN ELIZABETH-ROSS
Lastname: ZUNIGA, Firstname: SLOAN, Middle Name: ELIZABETH-ROSS , Title: NULL


--ALDRIDGE III,VERNON G
Lastname: ALDRIDGE, Firstname: VERNON, Middle Name: G , Title: III

--ANDREWS-WATLEY,SAVANNAH
Lastname: ANDREWS-WATLEY, Firstname: SAVANNAH, Middle Name:NULL , Title: NULL

--ASHMORE BELL,CARRIE C
Lastname: ASHMORE BELL, Firstname: CARRIE, Middle Name: C, Title: NULL

--ATKINS ROBINSON,BARONETTE
Lastname: ATKINS ROBINSON, Firstname: BARONETTE, Middle Name: NULL, Title: NULL

--FARRELL II,BENJAMIN TARENCE
Lastname: FARRELL, Firstname: BENJAMIN, Middle Name: TARENCE, Title: II

-

What I have tried:

LTRIM(RTRIM(LEFT([ParticipantName],CHARINDEX(',',[ParticipantName])-1))) as lastName,

LTRIM(RTRIM(LEFT(Firstname,CHARINDEX(' ',Firstname)-1)))as FirstName,
SUBSTRING(Firstname, 1 ,case when CHARINDEX(' ', Firstname ) = 0 then LEN(Firstname) else CHARINDEX(' ', Firstname) -1 end) as FirstName,

推荐答案

你找不到一个 - 或者至少你赢了没有找到一个没有大量标题的名单,所以你可以区分

--ALDRIDGE III,VERNON G

来自

--ASHORE BELL,CARRIE C



即便如此,并非所有文化都以与西方国家相同的方式列出姓氏。

你会对Richard Richard Francis Burton KCMG FRGS上尉做些什么?



如果你想准确地打破一个名字,你需要在输入点进行,而不是将其存储为单个字符串。
You won't find one - or at least you won't find one that doesn't involve a huge list of "titles" so you can distinguish
--ALDRIDGE III,VERNON G
from
--ASHMORE BELL,CARRIE C

And even then, not all cultures list "last name" in the same way Western countries do.
And what would you do with "Captain Sir Richard Francis Burton KCMG FRGS"?

If you want to "break out" a name accurately, you need to do it at the point of input, not when it's stored as a single string.


@OriginalGriff没有错,但我确实有一个类来执行此操作。它用于分解第三方提供的数据文件中的名称,然后尝试将这些名称与我们数据库中的名称(保持正确)相匹配。



它是不是用SQL编写的,我也不能与你分享,因为版权归我所工作的公司所有(尽管我写的是这样)。我可以分享我使用的技术......



我有几张桌子可以提供帮助:

- 可以使用的标题列表用过的。这一切都是小写的,包括非常尊敬和非常复兴之类的东西。我很幸运,因为这张表很小,并且基于我们的实际数据,而不是任何可能的数据。创建这样的表时,这些列表可能非常有用:

英语敬语 - 维基百科 [ ^ ]

标题列表 - 维基百科 [ ^ ],但我建议你限制你最有可能的标题。



- 可以后缀的列表使用。也是小写的,包括高级,iii,obe之类的东西。看看后缀(名称) - 维基百科 [ ^ ]



- 一系列的事情复合名称,例如von,van,van der,della



- 名字的替代品清单(你可能不需要这个)不试图匹配事物) - 例如Alex代表Alexander或Alexandra,Rob,Bob,Robbie,Robby代表Robert



- 有其他处理公司名称的表格。





我使用的方法是蛮力...



1.删除所有标点符号 - ,,。,;。如果我删除一个连字符,我会记下这个,所以如果需要,我可以把它放回去。



2.检查多个名字(例如先生和夫人)



3.删除所有后缀 - 我在每个名称的末尾添加一个空格,以便后缀被每端的空格包围 - 否则你最终可能会删除正版部分名称



4.删除标题 - 如上所述



5.应对任何化合物名称。



6.使用空格作为分隔符将零件分开



7.重建零件必要的名称 - 例如是否有连字符,我们认为它是姓氏或姓氏。注意安妮 - 玛丽·迪克森 - 布朗这样的事情



这可能会让你觉得很慢,尽管公平地认为这是最需要的匹配部分的时间。最重要的是,正如@OriginalGriff所暗示的那样,它在100%的时间内并不准确。特别是在你的情况下,因为没有足够的区分姓氏和姓氏 - 尽管如果模式总是在第一个逗号之后加上名字你可以利用这个事实。



无论如何,我认为SQL不是前进的方法。
@OriginalGriff is not wrong, but I do have a class that does this. It is used to break down names from data files supplied by third parties, I then try to match those names to names (held correctly) on our databases.

It is not written in SQL, I also cannot share it with you because copyright belongs with the company I work for (despite the fact I wrote it). I can share the techniques I used though...

I have several tables that assist:
- A list of titles that can be used. This is all in lower case and includes things like "the very reverend" and "the very revd". I am lucky in that this table is quite small and based on our actual data rather than anything possible. These lists might prove useful when creating such a table:
English honorifics - Wikipedia[^]
List of titles - Wikipedia[^], but I advise you to restrict what is in there to your most likely titles.

- A list of suffixes that can be used. Also in lowercase and including things like "senior", "iii", "obe". Have a look at Suffix (name) - Wikipedia[^]

- A list of things for compound names e.g. "von", "van", "van der", "della"

- A list of alternatives for names (you probably don't need this as you are not trying to match things) - e.g. "Alex" for "Alexander" or "Alexandra", "Rob", "Bob", "Robbie", "Robby" for "Robert"

- There are other tables that deal with corporate names.


The method I use is "brute force"...

1. Remove any punctuation - ",", ".", ";". If I remove a hyphen I make a note of that so I can put it back if need be later.

2. Check for multiple names (e.g. Mr and Mrs)

3. Remove any suffixes - I add a space to the end of each name so that suffixes are surrounded by a space at each end - otherwise you can end up removing genuine parts of names

4. Remove the title - as above

5. Cope with any compound names.

6. Separate out the parts using space as a separator

7. Rebuild the parts of the name as necessary - e.g. was there a hyphen and do we think it's in the surname or the forename(s). Watch out for things like "Anne-Marie Dixon-Brown"

This can be painfully slow as you can imagine, although to be fair it's the matching part that takes most of the time. Most importantly, as intimated by @OriginalGriff, it is not going to be accurate 100% of the time. Especially so in your case as there is not enough to distinguish between forenames and surnames - although if the pattern is always to put forenames after the first comma you could utilise that fact.

Whichever way, I don't think SQL is the way forward with this.


这篇关于Sql,需要一个函数或查询,将名称解析为名字,姓氏和中间名:的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆