Perl - find and save in an associative array word and word context

Problem description

I have an array like this (this is just a small sample; the real one has more than 2000 lines like these):

@list = (
        "affaire,chose,question",
        "cause,chose,matière",
);

I want this output:

%te = (
affaire => "chose", "question",
chose => "affaire", "question", "cause", "matière", 
question => "affaire", "chose",
cause => "chose", "matière",
matière => "cause", "chose"
);

I've created this script, but it doesn't work very well and I think it is far too complicated.

use Data::Dumper;
@list = (
        "affaire,chose,question",
        "cause,chose,matière",
);

%te;

for ($a = 0; $a < @list; $a++){
    @split_list = split (/,/,$list[$a]);
}

foreach $elt (@split_list){
print "SPLIT ELT : $split_list[$elt]\n";

for ($i = 0; $i < @list; $i++){

    $test = $list[$i]; #$test = "affaire,chose,question"

    if (exists $te{$split_list[$elt]}){ #if exists affaire in %te

        @t = split (/,/,$test); # @t = affaire chose question
        print "T : @t\n";

        @temp = grep(!/$split_list[$elt]/, @t); 
        print "GREP : @temp\n"; #@temp = chose question

        @fin = join(', ', @temp); #@fin = chose, question;

        for ($k = 0; $k < @fin; $k++){
            $te{$split_list[$elt]} .= $fin[$k]; #affaire => chose, question
        }

    }
    else {

                @t = split (/,/,$test); # @t = affaire chose question
        print "T : @t\n";

        @temp = grep(!/$split_list[$elt]/, @t); 
        print "GREP : @temp\n"; #@temp = chose question

        @fin = join(', ', @temp); #@fin = chose, question;

        for ($k = 0; $k < @fin; $k++){
                $te{$split_list[$elt]} = $fin[$k];
                }
    }
}

}



print Dumper \%te;

Output:

SPLIT ELT : cause
T : affaire chose question
GREP : affaire chose question
T : cause chose matière
GREP : chose matière
SPLIT ELT : cause
T : affaire chose question
GREP : affaire chose question
T : cause chose matière
GREP : chose matière
SPLIT ELT : cause
T : affaire chose question
GREP : affaire chose question
T : cause chose matière
GREP : chose matière
$VAR1 = {
          'cause' => 'affaire, chose, questionchose, matièreaffaire, chose, questionchose, matièreaffaire, chose, questionchose, matière'
        };

Recommended answer

I think I see what you're trying to do: index semantic links between words followed by lists of synonyms. Am I correct? :-)

Where a word appears in more than one synonym list, you create a hash entry with that word as the key and, as its values, the keywords for which it was originally listed as a synonym (or something along those lines). Using a hash of arrays, as in the solution by @Lee Duhem, you get a list (array) of synonyms for each keyword. This is a common pattern, although you do end up with a lot of hash entries.
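
For reference, here is a minimal sketch of the hash-of-arrays pattern using only core Perl. It is not the code from @Lee Duhem's answer (which is not reproduced here); the helper name build_context and the variable names are made up for illustration. It assumes each line is a comma-separated list and that every word on a line counts as context for every other word on that line.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: map each word to the other words that share a line with it.
sub build_context {
    my @lines = @_;
    my (%context, %seen);
    for my $line (@lines) {
        my @words = split /,/, $line;
        for my $word (@words) {
            for my $other (@words) {
                next if $other eq $word;         # a word is not its own context
                next if $seen{$word}{$other}++;  # skip pairs already recorded
                push @{ $context{$word} }, $other;
            }
        }
    }
    return \%context;   # hash reference: word => [ context words ]
}

my @list = (
    "affaire,chose,question",
    "cause,chose,matière",
);

my $context = build_context(@list);
for my $word (sort keys %$context) {
    print "$word => ", join(", ", @{ $context->{$word} }), "\n";
}
```

With the two sample lines from the question, "chose" ends up with affaire, question, cause and matière, which is the shape of output the question asks for. Comparing with eq rather than a regex avoids accidental substring matches between words.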

I've been playing with a neat module by @miyagawa called Hash::MultiValue, which takes a different approach to accessing the list of values associated with each hash key: a multi-value hash. A few nice features: you can create a hash of array references on the fly from the multi-value hash, "flatten" the hash, write callbacks to go with the ->each() method, and other neat things, so it's pretty flexible. I believe the module has no dependencies (other than for testing). Plus it's by @miyagawa (and other contributors), so using it and reading it is good for you :-)
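
As a rough illustration of those features, here is a small sketch. It assumes Hash::MultiValue is installed from CPAN (it is not a core module); the data and variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Hash::MultiValue;   # CPAN module, assumed to be installed

my $mv = Hash::MultiValue->new;
$mv->add(affaire => 'chose', 'question');   # several values under one key
$mv->add(cause   => 'chose', 'matière');

# Every value stored under a key, in insertion order
my @all = $mv->get_all('affaire');

# A plain hash of array references, built on the fly
my $hoa = $mv->as_hashref_multi;

# The "flattened" form: one list of key/value pairs
my @flat = $mv->flatten;

# each() calls the callback once per (key, value) pair
my @keys_with_chose;
$mv->each(sub {
    my ($key, $value) = @_;
    push @keys_with_chose, $key if $value eq 'chose';
});

print "affaire: @all\n";
print "cause: @{ $hoa->{cause} }\n";
print "pairs: ", scalar(@flat) / 2, "\n";
print "keys listing chose: @keys_with_chose\n";
```

Because each() passes one key and one value per call, the callback can compare the value directly with eq instead of using a regex.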

I'm no expert and I'm not sure it's appropriate for what you want, but as a variation on Lee's approach you might have something like:

#!/usr/bin/env perl
use strict;
use warnings;
use Hash::MultiValue;

my $words_hash = Hash::MultiValue->new();

# set up the mvalue hash
for my $words (<DATA>) {
  my @synonyms = split (',' , $words) ; 
  $words_hash->add( shift @synonyms => (@synonyms[0..$#synonyms]) ) ;
};

for my $key (keys %{ $words_hash } ) {
  print "$key --> ", join(", ",  $words_hash->get_all($key)) ;
};

print "\n";

sub synonmize {
  my $bonmot = shift;
  my @bonmot_syns ;

  # check key "$bonmot" for word to search and show values
  push @bonmot_syns , $words_hash->get_all($bonmot);

  # now grab values but leave out synonym's synonyms
  foreach (keys %{ $words_hash } ) {
    if ($_ !~ /$bonmot/ && grep {/$bonmot/} $words_hash->get_all($_)) {
      push @bonmot_syns, grep {!/$bonmot/} $words_hash->get_all($_);
    }
  }

  # show the keys with values containing target word
  $words_hash->each(
    sub { push @bonmot_syns,  $_[0] if grep /$bonmot/ ,  @_[1..$#_] ; }
  );

  chomp @bonmot_syns ;
  print "synonymes pour \"$bonmot\": @bonmot_syns \n" ;
}

# find synonyms 
synonmize("chose");
synonmize("truc");
synonmize("matière");

__DATA__
affaire,chose,question
cause,chose,matière
chose,truc,bidule
fille,demoiselle,femme,dame

Output:

fille --> demoiselle, femme, dame
affaire --> chose, question
cause --> chose, matière
chose --> truc, bidule

synonymes pour "chose": truc bidule question matière affaire cause 
synonymes pour "truc": bidule chose 
synonymes pour "matière": chose cause

Tie::Hash::MultiValue is another alternative. Kudos to @Lee for a quick clean solution :-)
