How do I count multiple unique words in a Ruby string?

Question

I'm trying to write Ruby code that counts unique words and returns the total number of occurrences of each.

So suppose I want to find the number of occurrences of Sally, Marina and Tina in the following text: "Monday Tina will meet Sally and Harris. Then Tina will visit her mom Marina. Marina and Tina will meet David for dinner."

I tried the following, but it violates the DRY principle. Is there a better way?

string = "Monday Tina will meet Sally and Harris. Then Tina will visit her mom Marina. Marina and Tina will meet David for dinner. Sally will then take Tina out for a late night party."

# One hard-coded line per name: this repetition is what I want to avoid.
puts "Marina appears #{string.split.count("Marina")} times."
puts "Tina appears #{string.split.count("Tina")} times."
puts "Sally appears #{string.split.count("Sally")} times."
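Splitting on whitespace also leaves punctuation attached to the words, so this hard-coded approach undercounts a name that ends a sentence. A quick check against the same string (the word-boundary regex in the last line is an illustration, not part of the original code):

```ruby
string = "Monday Tina will meet Sally and Harris. Then Tina will visit her mom Marina. Marina and Tina will meet David for dinner. Sally will then take Tina out for a late night party."

# split keeps the period, so the tokens include both "Marina." and "Marina"
string.split.count("Marina")      #=> 1
# Matching on word boundaries ignores the trailing period
string.scan(/\bMarina\b/).size    #=> 2
```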

Expected result: the program looks through the text for the words I ask for and returns how many times each one occurs.

Actual: I had to hard-code each word on its own puts line and call string.split.count for that word.

Note: I tried the following but this gives me EVERY word. I need to refine it to give me just the ones I ask for. This is where I am struggling.

def cw(string)
  freq = Hash.new(0)
  # Count every whitespace-separated word, ignoring case
  string.split.each { |word| freq[word.downcase] += 1 }
  freq
end
puts cw(string)
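One way to refine cw along those lines is sketched below, assuming Ruby 2.5 or later (for Hash#slice). The method name cw_only and its wanted parameter are hypothetical, and scanning for \w+ instead of splitting on spaces is an assumption made so that "Marina." and "Marina" are counted as the same word.

```ruby
string = "Monday Tina will meet Sally and Harris. Then Tina will visit her mom Marina. Marina and Tina will meet David for dinner. Sally will then take Tina out for a late night party."

# Hypothetical refinement of cw: count every word, then keep only the
# requested ones. Hash#slice requires Ruby 2.5+.
def cw_only(string, wanted)
  freq = Hash.new(0)
  # scan(/\w+/) yields words without surrounding punctuation
  string.scan(/\w+/) { |word| freq[word.downcase] += 1 }
  # keep only the requested words (compared in lowercase)
  freq.slice(*wanted.map(&:downcase))
end

cw_only(string, %w[Sally Marina Tina])
  #=> {"sally"=>2, "marina"=>2, "tina"=>4}
```

Words from wanted that never occur are simply absent from the result rather than mapped to 0.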

Recommended answer

def count_em(str, who)
  str.gsub(/\b(?:#{who.join('|')})\b/i).
      each_with_object(Hash.new(0)) { |person,h| h[person] += 1 }
end

str = "Monday Tina will meet Sally and Harris. Then Tina will visit her " +
      "mom Marina. Marina and Tina will meet David for dinner. Sally will " +
      "then take Tina out for a late night party." 

who = %w| Sally Marina Tina |

count_em(str, who)
  #=> {"Tina"=>4, "Sally"=>2, "Marina"=>2}
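Because the regex carries the /i flag, differently capitalized matches are counted under separate keys. A small illustration, repeating count_em as defined above, plus a hypothetical count_em_merged variant that assumes names are conventionally written with a leading capital and folds case variants together with String#capitalize:

```ruby
def count_em(str, who)
  str.gsub(/\b(?:#{who.join('|')})\b/i).
      each_with_object(Hash.new(0)) { |person,h| h[person] += 1 }
end

# /i matches "tina" as well as "Tina", but each spelling becomes its own key
count_em("tina met Tina", %w[Tina])          #=> {"tina"=>1, "Tina"=>1}

# Hypothetical variant: normalize each match so case variants share one key
def count_em_merged(str, who)
  str.gsub(/\b(?:#{who.join('|')})\b/i).
      each_with_object(Hash.new(0)) { |person,h| h[person.capitalize] += 1 }
end

count_em_merged("tina met Tina", %w[Tina])   #=> {"Tina"=>2}
```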

The first step is as follows.

r = /\b(?:#{who.join('|')})\b/i
  #=> /\b(?:Sally|Marina|Tina)\b/i
enum = str.gsub(r)
  #=> #<Enumerator: "Monday Tina will meet Sally and Harris. Then
  #   ...
  #   for a late night party.":gsub(/\b(?:Sally|Marina|Tina)\b/i)>
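If the list of names could contain regex metacharacters, one variant of this step is to let Regexp.union escape them. This is a sketch, not the answer's code; note the call to .source, because interpolating the Regexp object itself would embed (?-mix:...) and switch the /i flag off inside the group.

```ruby
who = %w[Sally Marina Tina]

# Regexp.union escapes each string and joins them with "|";
# .source drops the (?-mix:...) wrapper that plain interpolation would add
r = /\b(?:#{Regexp.union(who).source})\b/i
r                                   #=> /\b(?:Sally|Marina|Tina)\b/i
"SALLY met Tina".scan(r)            #=> ["SALLY", "Tina"]

# A "." inside a name is escaped, so it only matches a literal period
Regexp.union(%w[J.R Tina]).source   #=> "J\\.R|Tina"
```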

We can convert this to an array to see the values that will be passed to each_with_object.

enum.to_a
  #=> ["Tina", "Sally", "Tina", "Marina", "Marina", "Tina", "Sally", "Tina"]

We then simply count how many times each unique value generated by enum occurs.

enum.each_with_object(Hash.new(0)) { |person,h| h[person] += 1 }
  #=> {"Tina"=>4, "Sally"=>2, "Marina"=>2}
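On Ruby 2.7 or later, Enumerable#tally performs this counting step in a single call; the object returned by gsub is an Enumerator, so it responds to tally. A sketch using the same str and pattern (tally is a swapped-in alternative, not what the answer uses):

```ruby
str = "Monday Tina will meet Sally and Harris. Then Tina will visit her " +
      "mom Marina. Marina and Tina will meet David for dinner. Sally will " +
      "then take Tina out for a late night party."
who = %w| Sally Marina Tina |
r = /\b(?:#{who.join('|')})\b/i

# Enumerable#tally (Ruby 2.7+) builds the same counts hash
counts = str.gsub(r).tally
counts  #=> {"Tina"=>4, "Sally"=>2, "Marina"=>2}
```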

See String#gsub, in particular the case when there is one argument and no block. This is admittedly an unusual use of gsub, as it is making no substitutions, but here I prefer it to String#scan because gsub returns an enumerator whereas scan produces a temporary array.
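For comparison, a sketch of the String#scan version of the same pipeline: it produces the same matches, but collects them all into an array before any counting happens.

```ruby
str = "Monday Tina will meet Sally and Harris. Then Tina will visit her " +
      "mom Marina. Marina and Tina will meet David for dinner. Sally will " +
      "then take Tina out for a late night party."
r = /\b(?:Sally|Marina|Tina)\b/i

# scan builds the whole array of matches up front
matches = str.scan(r)
matches  #=> ["Tina", "Sally", "Tina", "Marina", "Marina", "Tina", "Sally", "Tina"]
matches.each_with_object(Hash.new(0)) { |person, h| h[person] += 1 }
  #=> {"Tina"=>4, "Sally"=>2, "Marina"=>2}
```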

See also Hash::new, the case where new takes an argument and no block. The argument is called the default value. If h is the hash so defined, h[k] returns the default value when h does not have a key k. The hash is not altered.
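A short sketch of that behavior in isolation, including what happens without a default value:

```ruby
h = Hash.new(0)
h["Tina"]          #=> 0      (key absent, so the default value is returned)
h.key?("Tina")     #=> false  (reading the default does not add the key)
h.size             #=> 0

h["Tina"] += 1
h                  #=> {"Tina"=>1}

# Without a default value a missing key returns nil, so += 1 would raise
Hash.new["Tina"]   #=> nil
```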

Here the default value is zero. When the expression h[person] += 1 is parsed, it is converted to:

h[person] = h[person] + 1

If person equals "Tina", and it is the first time "Tina" is generated by the enumerator and passed to the block, h will not have a key "Tina", so the expression becomes:

h["Tina"] = 0 + 1

as 0 is the default value. The next time "Tina" is passed to the block the hash has a key "Tina" (with value 1), so the following calculation is performed.

h["Tina"] = h["Tina"] + 1 #=> 1 + 1 #=> 2
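The same trace can be reproduced with a few names. This sketch writes out the expanded form of h[person] += 1 and prints the running count after each step:

```ruby
h = Hash.new(0)
%w[Tina Sally Tina].each do |person|
  # Expanded form of h[person] += 1
  h[person] = h[person] + 1
  puts "#{person}: #{h[person]}"
end
# prints:
# Tina: 1
# Sally: 1
# Tina: 2
h  #=> {"Tina"=>2, "Sally"=>1}
```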
