Disable token breaks on punctuation in LUIS.ai

Question

I am working with Microsoft Cognitive Service's Language Understanding Service API, LUIS.ai.

Whenever text is parsed by LUIS, whitespace tokens are always inserted around punctuation.

This behavior is intentional, according to the documentation.

"English, French, Italian, Spanish: token breaks are inserted at any whitespace, and around any punctuation."

For my project, I need to preserve the original query string, without these tokens, as some entities trained for my model will include punctuation, and it's annoying and a bit hacky to strip the extra whitespace from the parsed entities.

An example of this behavior: (image omitted)

Is there a way to disable this? It would save quite a bit of effort.

Thanks!

Recommended answer

Unfortunately, there's no way to disable that for now, but the good news is that the returned predictions refer to the original string, not the tokenized one you see in the example labeling process.

In the documentation on how to understand the JSON response, you can see that the example output preserves the original "query" string, and that the extracted entities carry zero-based character indices ("startIndex", "endIndex") into the original string; this lets you work with the indices instead of the parsed entity phrases.

{
  "query": "Book me a flight to Boston on May 4",
  "intents": [
    {
      "intent": "BookFlight",
      "score": 0.919818342
    },
    {
      "intent": "None",
      "score": 0.136909246
    },
    {
      "intent": "GetWeather",
      "score": 0.007304534
    }
  ],
  "entities": [
    {
      "entity": "boston",
      "type": "Location::ToLocation",
      "startIndex": 20,
      "endIndex": 25,
      "score": 0.621795356
    },
    {
      "entity": "may 4",
      "type": "builtin.datetime.date",
      "startIndex": 30,
      "endIndex": 34,
      "resolution": {
        "date": "XXXX-05-04"
      }
    }
  ]
}
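
As a minimal sketch of working with the indices rather than the "entity" phrase, the Python below slices each entity back out of the original "query" string. It assumes "endIndex" is inclusive, which is what the sample response suggests ("Boston" spans indices 20 to 25). The helper name `original_entity_text` and the second, punctuation-containing response are hypothetical illustrations, not real LUIS output.

```python
import json


def original_entity_text(query: str, entity: dict) -> str:
    """Return the entity's text exactly as it appears in the original query.

    Assumes endIndex is inclusive, as in the sample response.
    """
    return query[entity["startIndex"]: entity["endIndex"] + 1]


# Trimmed copy of the sample response (intents omitted).
sample = json.loads("""
{
  "query": "Book me a flight to Boston on May 4",
  "entities": [
    {"entity": "boston", "type": "Location::ToLocation",
     "startIndex": 20, "endIndex": 25},
    {"entity": "may 4", "type": "builtin.datetime.date",
     "startIndex": 30, "endIndex": 34}
  ]
}
""")

for entity in sample["entities"]:
    print(entity["type"], "->", original_entity_text(sample["query"], entity))
# Location::ToLocation -> Boston
# builtin.datetime.date -> May 4

# Hypothetical response (not real LUIS output) for an entity that
# contains punctuation; the "entity" phrase carries the extra spaces.
hypothetical = {
    "query": "Email bob@example.com now",
    "entities": [
        {"entity": "bob @ example . com", "type": "Email",
         "startIndex": 6, "endIndex": 20},
    ],
}

email = original_entity_text(hypothetical["query"], hypothetical["entities"][0])
print(email)  # bob@example.com
```

Slicing by index also keeps the original casing ("Boston" rather than "boston"), so no whitespace stripping or re-capitalization of the parsed phrase is needed.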
