Difference in word confidence in IBM Watson Speech to Text

Problem description

I am using the Node SDK to call the IBM Watson Speech to Text module. After sending an audio sample and receiving the response, the confidence values look odd.

{
  "results": [
    {
      "word_alternatives": [
        {
          "start_time": 3.31,
          "alternatives": [
            {
              "confidence": 0.7563,
              "word": "you"
            },
            {
              "confidence": 0.0254,
              "word": "look"
            },
            {
              "confidence": 0.0142,
              "word": "Lou"
            },
            {
              "confidence": 0.0118,
              "word": "we"
            }
          ],
          "end_time": 3.43
        },
...

and

...
      ],
      "alternatives": [
        {
          "word_confidence": [
            [
              "you",
              0.36485132893469713
            ],
...

I am requesting recognition with this config:

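  // Assumes fs is loaded via require('fs') and req.file.path points to the uploaded audio file.
  var params = {
    audio: fs.createReadStream(req.file.path),
    content_type: 'audio/wav',
    interim_results: false,
    word_confidence: true,             // per-word confidence inside "alternatives"
    timestamps: true,
    max_alternatives: 3,
    continuous: true,
    word_alternatives_threshold: 0.01, // report "word_alternatives" at or above this confidence
    smart_formatting: true
  };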

Notice that the confidence values for the word "you" differ between the two places. Does one of these numbers measure something different? What is going on here?

Solution

John, the confidence values in "word_alternatives" are derived from confusion networks and are computed at the word level, while the confidence values in the list of "alternatives" are computed over lattices, at the sentence level.

Confusion networks are derived from lattices but represent the hypothesis space differently, which is why the confidence values from the two sources can differ.

In this case the sentence contains only one word, which is why the difference is so visible.
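
To see the two values side by side, something like the following sketch may help. It is not part of the SDK: compareConfidences is a hypothetical helper, the sample object is trimmed to just the fields shown in the question, and words are matched by text only, which would be ambiguous if the same word appeared more than once in a transcript.

```javascript
// Hypothetical helper (not part of the Watson SDK): for each word in the best
// transcript, pair the lattice-based value from "word_confidence" with the
// confusion-network value from "word_alternatives".
function compareConfidences(result) {
  const best = (result.alternatives || [])[0] || {};
  const slots = result.word_alternatives || [];

  return (best.word_confidence || []).map(([word, latticeConfidence]) => {
    // Take the first time slot that lists this word among its candidates.
    // Assumption: matching by word text alone is good enough for a quick look.
    let confusionNetworkConfidence = null;
    for (const slot of slots) {
      const hit = slot.alternatives.find((alt) => alt.word === word);
      if (hit) {
        confusionNetworkConfidence = hit.confidence;
        break;
      }
    }
    return { word, latticeConfidence, confusionNetworkConfidence };
  });
}

// Sample response trimmed to the fields quoted in the question.
const sample = {
  results: [
    {
      word_alternatives: [
        {
          start_time: 3.31,
          alternatives: [
            { confidence: 0.7563, word: 'you' },
            { confidence: 0.0254, word: 'look' },
            { confidence: 0.0142, word: 'Lou' },
            { confidence: 0.0118, word: 'we' }
          ],
          end_time: 3.43
        }
      ],
      alternatives: [
        { word_confidence: [['you', 0.36485132893469713]] }
      ]
    }
  ]
};

const rows = compareConfidences(sample.results[0]);
console.log(rows);
```

For the sample above, the single row for "you" carries both 0.36485132893469713 (from "alternatives") and 0.7563 (from "word_alternatives"), which are the two figures the question asks about.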
