Initcap 跳过小于 4 个字符的单词 [英] Initcap skip words smaller than 4 characters

查看:38
本文介绍了Initcap 跳过小于 4 个字符的单词的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个关于 initcap 的问题.是否可以创建一个 initcap 语句来跳过小于 4 个字符的单词的更改.

I've got a question about initcap. Is it posibible to create an initcap statement to skip the change of words that are smaller than 4 characters.

因为我必须在完成 initcap 后将少于 4 个字符的单词改回正常状态.
所以我认为有可能创建一个函数/过程/触发器来跳过单词?该词用于Son En Breugel"之类的地名中,中间的En"必须变低.
字符串的第一个字母不需要更改,只需空格后的第一个小词(如字符串中间)

Because i have to change the words with less than 4 characters back to normal, after i've finished the initcap.
So i tought mabye there is a possibility to create an function/procedure/trigger that will just skip the words?? The words are used in an location name like "Son En Breugel", the "En" in the middle must become lower.
The first letter of the string doesn't need to change, only the first small words after a space(Like in the middle of the string)


我已经开始创建一个程序,但它需要一些微调


I've started to create an procedure, but it needs a bit of finetuning

*所有不需要用initcap改变的字符串都改回来
*Initcap 与 xDutch 格式

*All strings that don't need to be changed with initcap are changed back
*Initcap with xDutch format

--还需要想办法把'S改成's,我想我已经用这个脚本删除了记录?

--Still need to find way to change 'S into 's, i think i've deleted the record with this script?

有人可以帮助我吗??

create or replace PROCEDURE Location_Name_Routine IS
    BEGIN
        DELETE
            FROM Location
            WHERE Name LIKE '%[^0-9a-zA-Z]%';
        UPDATE Location
          set Name = nls_initcap(Name, 'NLS_SORT=xDutch');
        UPDATE Location
            SET  Name = REGEXP_REPLACE(Name,' En',' en');
        UPDATE Location
            SET  Name = REGEXP_REPLACE(Name,' Van',' van');
        UPDATE Location
            SET  Name = REGEXP_REPLACE(Name,' De',' de');
        UPDATE Location
            SET  Name = REGEXP_REPLACE(Name,' Den',' den');
        UPDATE Location
            SET  Name = REGEXP_REPLACE(Name,' Over','over');
        UPDATE Location
            SET  Name = REGEXP_REPLACE(Name,' Aan',' aan');
        UPDATE Location
            SET  Name = REGEXP_REPLACE(Name,' Bij',' bij');     
    END;    

推荐答案

根本问题可能没有简单的答案.我假设您正在尝试正确地将荷兰语地址大写,这个问题与 昨天的另一个问题.

There may not be a simple answer to the underlying question. I assume you are trying to properly capitalize addresses in Dutch and this question is related to this other question from yesterday.

结合问题,目前至少有三种特殊情况:

Combining the questions, there are at least three special cases so far:

'S GRAVENHAGE   =>  's Gravenhage
IJSLAND         =>  IJsland
SON EN BREUGEL  =>  Son en Breugel

INITCAP 甚至 NLS_INITCAP('...', 'NLS_SORT=xDutch') 都无法正确处理它们.在开始编码之前,您应该收集所有需求.这些是荷兰语大写的唯一规则,还是还有更多规则?

INITCAP and even NLS_INITCAP('...', 'NLS_SORT=xDutch') fail to properly handle them. Before you start coding you should collect all the requirements. Are these the only rules for Dutch capitalization, or are there many more?

到目前为止发布的答案可能有助于解决一个特定的异常.但是很可能您不能简单地组合正则表达式并全部解决它们.您可能希望在此处采用更自上而下的方法.

The answers posted so far may help to solve one specific exception. But chances are you cannot simply combine regular expressions and solve them all. You may want to take a more top-down approach here.

更新

基于 wolφi 的想法,可以通过使用所有现有名称来暴力破解问题.NLS_INITCAP 单独工作 95% 的时间.使用 此链接 电子表格中的 431 个名称可以建立一个包含所有 25 个例外情况的列表.

Based on wolφi's idead, it is possible to brute-force the problem by using all existing names. NLS_INITCAP alone works 95% of the time. Using the 431 names from the spreadsheet at this link it is possible to build a list of all 25 exceptional cases.

运行此语句一次以构建一个 DECODE 表达式来处理所有非平凡的情况:

Run this statement once to build a DECODE expression to handle all non-trivial cases:

--Build decode for UPDATE.
select
  --Start the decode
  'decode(upper(name),'||
  --List all the exceptions.  Single quotes are a mess, no way around it.
  listagg(
    --Upper case version to match
    ''''||upper(replace(column_value, '''', ''''''))||
    --Pre-defined init-capped version
    ''','''||replace(column_value, '''', '''''')||''''
    , ','||chr(10)
  )
  within group (order by column_value)
  ||
  --Default to NLS_INITCAP
  ',nls_initcap(name, ''NLS_SORT=xDutch''))'
from table(sys.odcivarchar2list('Bellingwedde','Menterwolde','Oldambt','Pekela','Stadskanaal','Veendam','Vlagtwedde','Appingedam','Delfzijl','Loppersum','Bedum','Ten Boer','Eemsmond','Groningen','Grootegast','Haren','Hoogezand-Sappemeer','Leek','De Marne','Marum','Slochteren','Winsum','Zuidhorn','Achtkarspelen','Ameland','het Bildt','Boarnsterhim','Dantumadiel','Dongeradeel','Ferwerderadiel','Franekeradeel','Harlingen','Kollumerland en Nieuwkruisland','Leeuwarden','Leeuwarderadeel','Littenseradiel','Menaldumadeel','Schiermonnikoog','Terschelling','Tytsjerksteradiel','Vlieland','Bolsward','Gaasterlân-Sleat','Lemsterland','Nijefurd','Sneek','Wûnseradiel','Wymbritseradiel','Heerenveen','Ooststellingwerf','Opsterland','Skarsterlân','Smallingerland','Weststellingwerf','Aa en Hunze','Assen','Midden-Drenthe','Noordenveld','Tynaarlo','Borger-Odoorn','Coevorden','Emmen','Hoogeveen','Meppel','Westerveld','De Wolden','Dalfsen','Hardenberg','Kampen','Ommen','Staphorst','Steenwijkerland','Zwartewaterland','Zwolle','Deventer','Olst-Wijhe','Raalte','Almelo','Borne','Dinkelland','Enschede','Haaksbergen','Hellendoorn','Hengelo','Hof van Twente','Losser','Oldenzaal','Rijssen-Holten','Tubbergen','Twenterand','Wierden','Apeldoorn','Barneveld','Ede','Elburg','Epe','Ermelo','Harderwijk','Hattem','Heerde','Nijkerk','Nunspeet','Oldebroek','Putten','Scherpenzeel','Voorst','Wageningen','Buren','Culemborg','Geldermalsen','Lingewaal','Maasdriel','Neder-Betuwe','Neerijnen','Tiel','West Maas en Waal','Zaltbommel','Aalten','Berkelland','Bronckhorst','Brummen','Doetinchem','Lochem','Montferland','Oost Gelre','Oude IJsselstreek','Winterswijk','Zutphen','Arnhem','Beuningen','Doesburg','Druten','Duiven','Groesbeek','Heumen','Lingewaard','Millingen aan de Rijn','Nijmegen','Overbetuwe','Renkum','Rheden','Rijnwaarden','Rozendaal','Ubbergen','Westervoort','Wijchen','Zevenaar','Almere','Dronten','Lelystad','Noordoostpolder','Urk','Zeewolde','Abcoude','Amersfoort','Baarn','De Bilt','Breukelen','Bunnik','Bunschoten','Eemnes','Houten','IJsselstein','Leusden','Loenen','Lopik','Maarssen','Montfoort','Nieuwegein','Oudewater','Renswoude','Rhenen','De Ronde Venen','Soest','Utrecht','Utrechtse Heuvelrug','Veenendaal','Vianen','Wijk bij Duurstede','Woerden','Woudenberg','Zeist','Andijk','Anna Paulowna','Drechterland','Enkhuizen','Harenkarspel','Den Helder','Hoorn','Koggenland','Medemblik','Niedorp','Opmeer','Schagen','Stede Broec','Texel','Wervershoof','Wieringen','Wieringermeer','Zijpe','Alkmaar','Bergen (NH.)','Heerhugowaard','Heiloo','Langedijk','Schermer','Beverwijk','Castricum','Heemskerk','Uitgeest','Velsen','Bloemendaal','Haarlem','Haarlemmerliede en Spaarnwoude','Heemstede','Zandvoort','Wormerland','Zaanstad','Aalsmeer','Amstelveen','Amsterdam','Beemster','Diemen','Edam-Volendam','Graft-De Rijp','Haarlemmermeer','Landsmeer','Oostzaan','Ouder-Amstel','Purmerend','Uithoorn','Waterland','Zeevang','Blaricum','Bussum','Hilversum','Huizen','Laren','Muiden','Naarden','Weesp','Wijdemeren','Hillegom','Kaag en Braassem','Katwijk','Leiden','Leiderdorp','Lisse','Noordwijk','Noordwijkerhout','Oegstgeest','Teylingen','Voorschoten','Zoeterwoude','''s-Gravenhage','Leidschendam-Voorburg','Pijnacker-Nootdorp','Rijswijk','Wassenaar','Zoetermeer','Delft','Midden-Delfland','Westland','Alphen aan den Rijn','Bergambacht','Bodegraven','Boskoop','Gouda','Nieuwkoop','Reeuwijk','Rijnwoude','Schoonhoven','Vlist','Waddinxveen','Albrandswaard','Barendrecht','Bernisse','Binnenmaas','Brielle','Capelle aan den IJssel','Cromstrijen','Dirksland','Goedereede','Hellevoetsluis','Korendijk','Krimpen aan den IJssel','Lansingerland','Maassluis','Middelharnis','Nederlek','Oostflakkee','Oud-Beijerland','Ouderkerk','Ridderkerk','Rotterdam','Rozenburg','Schiedam','Spijkenisse','Strijen','Vlaardingen','Westvoorne','Zuidplas','Alblasserdam','Dordrecht','Giessenlanden','Gorinchem','Graafstroom','Hardinxveld-Giessendam','Hendrik-Ido-Ambacht','Leerdam','Liesveld','Nieuw-Lekkerland','Papendrecht','Sliedrecht','Zederik','Zwijndrecht','Hulst','Sluis','Terneuzen','Borsele','Goes','Kapelle','Middelburg','Noord-Beveland','Reimerswaal','Schouwen-Duiveland','Tholen','Veere','Vlissingen','Bergen op Zoom','Breda','Drimmelen','Etten-Leur','Geertruidenberg','Halderberge','Moerdijk','Oosterhout','Roosendaal','Rucphen','Steenbergen','Woensdrecht','Zundert','Aalburg','Alphen-Chaam','Baarle-Nassau','Dongen','Gilze en Rijen','Goirle','Hilvarenbeek','Loon op Zand','Oisterwijk','Tilburg','Waalwijk','Werkendam','Woudrichem','Bernheze','Boekel','Boxmeer','Boxtel','Cuijk','Grave','Haaren','''s-Hertogenbosch','Heusden','Landerd','Lith','Maasdonk','Mill en Sint Hubert','Oss','Schijndel','Sint Anthonis','Sint-Michielsgestel','Sint-Oedenrode','Uden','Veghel','Vught','Asten','Bergeijk','Best','Bladel','Cranendonck','Deurne','Eersel','Eindhoven','Geldrop-Mierlo','Gemert-Bakel','Heeze-Leende','Helmond','Laarbeek','Nuenen, Gerwen en Nederwetten','Oirschot','Reusel-De Mierden','Someren','Son en Breugel','Valkenswaard','Veldhoven','Waalre','Beesel','Bergen (L.)','Gennep','Horst aan de Maas','Mook en Middelaar','Peel en Maas','Venlo','Venray','Echt-Susteren','Leudal','Maasgouw','Nederweert','Roerdalen','Roermond','Weert','Beek','Brunssum','Eijsden','Gulpen-Wittem','Heerlen','Kerkrade','Landgraaf','Maastricht','Margraten','Meerssen','Nuth','Onderbanken','Schinnen','Simpelveld','Sittard-Geleen','Stein','Vaals','Valkenburg aan de Geul','Voerendaal'))
where column_value <> nls_initcap(column_value, 'NLS_SORT=xDutch');

使用该语句的结果构建一个 UPDATE,如下所示:

Use the result from that statement to build an UPDATE like this:

--Update names to properly init-capped name, as defined by:
--http://epp.eurostat.ec.europa.eu/portal/page/portal/nuts_nomenclature/local_administrative_units
update location
set name = 
  decode(upper(name),'''S-GRAVENHAGE','''s-Gravenhage',
  '''S-HERTOGENBOSCH','''s-Hertogenbosch',
  'AA EN HUNZE','Aa en Hunze',
  'ALPHEN AAN DEN RIJN','Alphen aan den Rijn',
  'BERGEN (NH.)','Bergen (NH.)',
  'BERGEN OP ZOOM','Bergen op Zoom',
  'CAPELLE AAN DEN IJSSEL','Capelle aan den IJssel',
  'GILZE EN RIJEN','Gilze en Rijen',
  'HAARLEMMERLIEDE EN SPAARNWOUDE','Haarlemmerliede en Spaarnwoude',
  'HOF VAN TWENTE','Hof van Twente',
  'HORST AAN DE MAAS','Horst aan de Maas',
  'KAAG EN BRAASSEM','Kaag en Braassem',
  'KOLLUMERLAND EN NIEUWKRUISLAND','Kollumerland en Nieuwkruisland',
  'KRIMPEN AAN DEN IJSSEL','Krimpen aan den IJssel',
  'LOON OP ZAND','Loon op Zand',
  'MILL EN SINT HUBERT','Mill en Sint Hubert',
  'MILLINGEN AAN DE RIJN','Millingen aan de Rijn',
  'MOOK EN MIDDELAAR','Mook en Middelaar',
  'NUENEN, GERWEN EN NEDERWETTEN','Nuenen, Gerwen en Nederwetten',
  'PEEL EN MAAS','Peel en Maas',
  'SON EN BREUGEL','Son en Breugel',
  'VALKENBURG AAN DE GEUL','Valkenburg aan de Geul',
  'WEST MAAS EN WAAL','West Maas en Waal',
  'WIJK BIJ DUURSTEDE','Wijk bij Duurstede',
  'HET BILDT','het Bildt',
  nls_initcap(name, 'NLS_SORT=xDutch'));

这篇关于Initcap 跳过小于 4 个字符的单词的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆