How to separate words in a "sentence" with spaces?
Problem Description
I am looking to automate the creation of Domains in JasperServer. Domains are a "view" of data for creating ad hoc reports. The names of the columns must be presented to the user in a human-readable fashion.
There are over 2,000 possible pieces of data that the organization could theoretically want to include on a report. The data come from columns with non-human-friendly names such as:
payperiodmatchcode labordistributioncodedesc dependentrelationship actionendoption actionendoptiondesc addresstype addresstypedesc historytype psaddresstype rolename bankaccountstatus bankaccountstatusdesc bankaccounttype bankaccounttypedesc beneficiaryamount beneficiaryclass beneficiarypercent benefitsubclass beneficiaryclass beneficiaryclassdesc benefitactioncode benefitactioncodedesc benefitagecontrol benefitagecontroldesc ageconrolagelimit ageconrolnoticeperiod
Question
How would you automatically change such names to:
- pay period match code
- labor distribution code desc
- dependent relationship
Use Google's "Did you mean" engine; however, I think that violates their TOS:
lynx -dump «url» | grep "Did you mean" | awk ...
Any language is fine, but text-processing languages such as Perl would probably be well suited. (The column names are English-only.)
The goal is not 100% perfection in breaking words apart; the following outcome is acceptable:
- enrollmenteffectivedate -> Enrollment Effective Date
- enrollmentenddate -> Enroll Men Tend Date
- enrollmentrequirementset -> Enrollment Requirement Set
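Once an identifier has been split into words, turning those words into a display label in title case (as in "Enrollment Effective Date") is the easy part. A minimal sketch, assuming the words have already been separated by some splitter; `to_label` is a hypothetical name, not part of JasperServer:

```perl
use strict;
use warnings;

# Hypothetical helper: join already-split words into a display label,
# lowercasing each word and then capitalising its first letter.
sub to_label {
    my @words = @_;
    return join ' ', map { ucfirst(lc($_)) } @words;
}

print to_label(qw(enrollment effective date)), "\n";
print to_label(qw(enroll men tend date)), "\n";
```

Imperfect splits pass straight through, which is why a human review step is still needed.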
No matter what, a human will need to double-check the results and correct many. Whittling a set of 2,000 results down to 600 edits would be a dramatic time savings. To fixate on some cases having multiple possibilities (e.g., therapistname) is to miss the point altogether.
Recommended Answer
Sometimes, brute-forcing is acceptable:
```perl
#!/usr/bin/perl
use strict; use warnings;
use File::Slurp;   # CPAN module that provides read_file

my $dict_file = '/usr/share/dict/words';

my @identifiers = qw(
    payperiodmatchcode labordistributioncodedesc dependentrelationship
    actionendoption actionendoptiondesc addresstype addresstypedesc
    historytype psaddresstype rolename bankaccountstatus
    bankaccountstatusdesc bankaccounttype bankaccounttypedesc
    beneficiaryamount beneficiaryclass beneficiarypercent benefitsubclass
    beneficiaryclass beneficiaryclassdesc benefitactioncode
    benefitactioncodedesc benefitagecontrol benefitagecontroldesc
    ageconrolagelimit ageconrolnoticeperiod
);

# Extra words that the system dictionary lacks.
my @mydict = qw( desc );

# One big alternation of all dictionary words longer than two letters,
# sorted longest first, then alphabetically.
my $pat = join('|',
    map quotemeta,
    sort { length $b <=> length $a || $a cmp $b }
    grep { 2 < length }
    (@mydict, map { chomp; $_ } read_file $dict_file)
);

my $re = qr/$pat/;

for my $identifier ( @identifiers ) {
    my @stack;
    print "$identifier : ";
    # Repeatedly strip the longest dictionary word that ends the identifier,
    # working from right to left.
    while ( $identifier =~ s/($re)\z// ) {
        unshift @stack, $1;
    }
    # mark suspicious cases: leftover text that matched no dictionary word
    unshift @stack, '*', $identifier if length $identifier;
    print "@stack\n";
}
```
Output:

```
payperiodmatchcode : pay period match code
labordistributioncodedesc : labor distribution code desc
dependentrelationship : dependent relationship
actionendoption : action end option
actionendoptiondesc : action end option desc
addresstype : address type
addresstypedesc : address type desc
historytype : history type
psaddresstype : * ps address type
rolename : role name
bankaccountstatus : bank account status
bankaccountstatusdesc : bank account status desc
bankaccounttype : bank account type
bankaccounttypedesc : bank account type desc
beneficiaryamount : beneficiary amount
beneficiaryclass : beneficiary class
beneficiarypercent : beneficiary percent
benefitsubclass : benefit subclass
beneficiaryclass : beneficiary class
beneficiaryclassdesc : beneficiary class desc
benefitactioncode : benefit action code
benefitactioncodedesc : benefit action code desc
benefitagecontrol : benefit age control
benefitagecontroldesc : benefit age control desc
ageconrolagelimit : * ageconrol age limit
ageconrolnoticeperiod : * ageconrol notice period
```
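To try the same right-to-left suffix stripping without /usr/share/dict/words or File::Slurp, here is a self-contained sketch. The tiny inline word list and the name `split_identifier` are illustrative assumptions. It also shows how greedy stripping can produce a split like "enroll men tend date":

```perl
use strict;
use warnings;

# Tiny stand-in dictionary (an assumption for illustration only).
my @words = qw(enroll enrollment men tend end date pay period match code);

# One alternation of all words, sorted longest first as in the answer.
my $pat = join '|', map { quotemeta } sort { length $b <=> length $a || $a cmp $b } @words;
my $re  = qr/$pat/;

# Hypothetical helper: peel the longest dictionary word that ends the
# identifier, repeat until nothing matches, and flag any leftover prefix.
sub split_identifier {
    my ($id) = @_;
    my @stack;
    while ( $id =~ s/($re)\z// ) {
        unshift @stack, $1;
    }
    unshift @stack, '*', $id if length $id;
    return "@stack";
}

print split_identifier($_), "\n" for qw(payperiodmatchcode enrollmentenddate);
```

Because each step commits to the longest suffix word, "enrollmentenddate" loses "tend" before "end" is ever considered.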
See also "A Spellchecker Used to Be a Major Feat of Software Engineering".
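Greedy suffix stripping commits to one word at a time, so it can produce splits such as "enroll men tend date". A different technique, not the one used in the answer, is dynamic programming over prefixes that picks the split with the fewest dictionary words. This is a minimal sketch that assumes a tiny inline dictionary and an arbitrary penalty of 10 per unrecognised character. Both are illustrative choices, and `split_fewest` is a hypothetical name:

```perl
use strict;
use warnings;

# Tiny stand-in dictionary (assumption for illustration).
my %dict = map { $_ => 1 } qw(enroll enrollment men tend end date);

# Fewest-words segmentation: each dictionary word costs 1 and each
# unrecognised character costs 10, so known words are always preferred.
sub split_fewest {
    my ($s) = @_;
    my $n = length $s;
    # $best[$i] = [cost, [words]] for the first $i characters of $s.
    my @best = ([0, []]);
    for my $i (1 .. $n) {
        # Fallback: treat character $i-1 as an unknown one-letter piece.
        my ($c, $w) = @{ $best[$i - 1] };
        my $cand = [$c + 10, [@$w, substr($s, $i - 1, 1)]];
        # Try every dictionary word that ends at position $i.
        for my $j (0 .. $i - 1) {
            my $piece = substr $s, $j, $i - $j;
            next unless $dict{$piece};
            my $cost = $best[$j][0] + 1;
            $cand = [$cost, [@{ $best[$j][1] }, $piece]] if $cost < $cand->[0];
        }
        $best[$i] = $cand;
    }
    return join ' ', @{ $best[$n][1] };
}

print split_fewest('enrollmentenddate'), "\n";
```

With this dictionary the sketch prints "enrollment end date". Unrecognised letters come out as single-character pieces, so merging adjacent ones would be a natural refinement before showing results to a reviewer.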