Detect Japanese character input and "Romajis" (ASCII)


Problem description

I would like to be able to detect when the user:

  1. Inputs Japanese characters (Kanji or Kana)
  2. Inputs Roman characters (exclusively)

Currently I am using the ASCII range like this (C# syntax):

string searchKeyWord = Console.ReadLine();
var romajis = from c in searchKeyWord where c >= ' ' && c <= '~' select c;

if (romajis.Any())
{
    // Romajis
}
else
{
    // Japanese input
}

Is there a better, faster (stronger...) way to do this?

EDIT: the question can be generalized to any other language with a non-ascii character set.

Solution

Wikipedia shows the Unicode ranges for hiragana, katakana and kanji in the top right corner of the page. We can use these to refine your algorithm and to cover the other character sets as well.

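// Requires: using System.Collections.Generic; using System.Linq;
// Returns the characters of text whose code unit lies in [min, max] (inclusive).
private static IEnumerable<char> GetCharsInRange(string text, int min, int max)
{
    return text.Where(e => e >= min && e <= max);
}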

Usage:

var romaji = GetCharsInRange(searchKeyWord, 0x0020, 0x007E);   // printable ASCII
var hiragana = GetCharsInRange(searchKeyWord, 0x3040, 0x309F);
var katakana = GetCharsInRange(searchKeyWord, 0x30A0, 0x30FF);
var kanji = GetCharsInRange(searchKeyWord, 0x4E00, 0x9FFF);    // CJK Unified Ideographs block

Note that this should be about as fast as yours, just a little nicer/better imo :)
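
To cover several character sets in one pass, the ranges can be kept in a small table. The following is only a sketch: the class name ScriptRanges, the method CountByScript and the table layout are made up for illustration, and the kanji range is assumed to be the full CJK Unified Ideographs block (U+4E00 to U+9FFF). Characters outside the Basic Multilingual Plane are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ScriptRanges
{
    // Inclusive UTF-16 code unit ranges, taken from the Unicode block charts.
    // The names and this table are illustrative, not part of the original answer.
    private static readonly (string Name, int Min, int Max)[] Ranges =
    {
        ("Romaji",   0x0020, 0x007E), // printable ASCII
        ("Hiragana", 0x3040, 0x309F),
        ("Katakana", 0x30A0, 0x30FF),
        ("Kanji",    0x4E00, 0x9FFF), // CJK Unified Ideographs block (assumed)
    };

    // Counts how many characters of the text fall into each named range.
    public static Dictionary<string, int> CountByScript(string text)
    {
        return Ranges.ToDictionary(
            r => r.Name,
            r => text.Count(c => c >= r.Min && c <= r.Max));
    }
}
```

Calling ScriptRanges.CountByScript(searchKeyWord) gives one count per script, so another character set can be added by adding a row to the table.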

Determining general language sets

Yes, you can detect sets of characters like that, but not really languages: French, German, etc. share a lot of characters with English, and Japanese shares a lot of kanji with Chinese (obviously). For many characters you can't clearly say that a single character belongs to a single language without a giant lookup chart.

There is also the fact that Japanese text uses English words (and punctuation) quite a bit. Because your check is romajis.Any(), your method would consider anything that contains a romanised word or an emoticon to be romaji.
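
One possible way around the mixed-input problem is to test for Japanese characters first and only call the input romaji when every character is printable ASCII. This is a sketch under assumptions: the InputKind enum and the InputClassifier class are hypothetical names, and the ranges are the same Unicode blocks as above (hiragana and katakana combined as U+3040 to U+30FF, kanji as U+4E00 to U+9FFF).

```csharp
using System;
using System.Linq;

// Hypothetical result type for this sketch.
public enum InputKind { Empty, RomajiOnly, ContainsJapanese, Other }

public static class InputClassifier
{
    // Printable ASCII, the same range the question uses (' ' to '~').
    private static bool IsAscii(char c) => c >= 0x0020 && c <= 0x007E;

    // Hiragana and katakana (U+3040 to U+30FF) or CJK Unified Ideographs.
    private static bool IsJapanese(char c) =>
        (c >= 0x3040 && c <= 0x30FF) || (c >= 0x4E00 && c <= 0x9FFF);

    public static InputKind Classify(string text)
    {
        if (string.IsNullOrEmpty(text)) return InputKind.Empty;

        // Any kana or kanji marks the input as Japanese,
        // even when ASCII words or emoticons are mixed in.
        if (text.Any(IsJapanese)) return InputKind.ContainsJapanese;

        // Romaji only when every single character is printable ASCII.
        if (text.All(IsAscii)) return InputKind.RomajiOnly;

        // Accented Latin letters, other scripts, control characters, etc.
        return InputKind.Other;
    }
}
```

The difference from the question's code is the use of All instead of Any for the ASCII test. Kanji alone still cannot distinguish Japanese from Chinese, so ContainsJapanese here really means "contains kana or Han characters".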
