Programming tips with Japanese Language/Characters

Question

I am studying Japanese, and I have ideas for a few web apps that would help me, and maybe others, learn the language better.

My problem is that the site will be mostly in English, so it needs to mix in Japanese characters fluently: usually hiragana and katakana at first, and kanji later. I am getting closer to accomplishing this; I have figured out that the pages and source files need to be Unicode, with UTF-8 as the content-type encoding.

However, my real problem is in the actual coding. What I need is to manipulate strings of kana text. One example:

Given the verb けす, I need to convert it to its te-form, けして. I would prefer to do this in JavaScript, since that will help with further manipulation down the road, but if I have to, I will just make DB calls and keep everything in a database.

My question is not only how to do this in JavaScript, but also what tips and strategies apply to doing this kind of thing in other languages. I am hoping to get more into building language-learning apps, but I am lost when it comes to this.

Recommended answer

"My question is not only how to do it in javascript, but what are some tips and strategies to doing these kinds of things in other languages too."

What you want to do is pretty basic string manipulation, apart from the missing word separators, as Barry notes, though that is not a technical problem.

Basically, for a modern Unicode-aware programming language (which JavaScript has been since version 1.3, I believe), there is no real difference between a Japanese kana or kanji and a Latin letter: they are all just characters. And a string is just, well, a string of characters.
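To make the "it is just string manipulation" point concrete, here is a minimal sketch of the te-form conversion from the question. The function name `toTeForm`, the two lookup tables, and the `verbClass` parameter are all hypothetical names chosen for this illustration. The sketch assumes the input is a dictionary-form verb written entirely in kana, and that the caller already knows whether the verb is godan or ichidan, because the kana alone cannot tell them apart (かえる conjugates to かえって or かえて depending on the verb).

```javascript
// Te-form endings for godan verbs, keyed by the last kana of the dictionary form.
const GODAN_TE_ENDINGS = new Map([
  ["う", "って"], ["つ", "って"], ["る", "って"],
  ["む", "んで"], ["ぶ", "んで"], ["ぬ", "んで"],
  ["く", "いて"], ["ぐ", "いで"],
  ["す", "して"],
]);

// A few common irregular verbs, handled by direct lookup (not an exhaustive list).
const IRREGULAR_TE_FORMS = new Map([
  ["する", "して"],
  ["くる", "きて"],
  ["いく", "いって"],
]);

// verbClass is "godan" or "ichidan"; it must be supplied by the caller
// (for example from a dictionary table), since it cannot be derived from kana.
function toTeForm(verb, verbClass) {
  if (IRREGULAR_TE_FORMS.has(verb)) {
    return IRREGULAR_TE_FORMS.get(verb);
  }
  // Kana are in the Basic Multilingual Plane, so one kana is one UTF-16
  // code unit and slice() can safely cut off the last character.
  const stem = verb.slice(0, -1);
  const lastKana = verb.slice(-1);

  if (verbClass === "ichidan") {
    if (lastKana !== "る") {
      throw new Error(`Ichidan verbs must end in る: ${verb}`);
    }
    return stem + "て";
  }
  if (verbClass === "godan") {
    const ending = GODAN_TE_ENDINGS.get(lastKana);
    if (ending === undefined) {
      throw new Error(`Unexpected godan verb ending: ${verb}`);
    }
    return stem + ending;
  }
  throw new Error(`Unknown verb class: ${verbClass}`);
}

console.log(toTeForm("けす", "godan"));     // けして
console.log(toTeForm("たべる", "ichidan")); // たべて
```

Nothing here is specific to Japanese from the language's point of view: `slice`, string concatenation, and `Map` lookups behave the same way they would for Latin text.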

Where it gets difficult is when you have to convert between strings and bytes, because then you need to pay attention to which encoding you are using. Unfortunately, many programmers, especially native English speakers, tend to gloss over this problem, because ASCII is the de facto standard encoding for Latin letters and other encodings usually try to be compatible with it. If Latin letters are all you need, you can get along being blissfully ignorant of character encodings, believe that bytes and characters are basically the same thing, and write programs that mutilate anything that is not ASCII.
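A small sketch of why bytes and characters are not the same thing, using the standard `TextEncoder` and `TextDecoder` classes available in browsers and in Node.js. The variable names are made up for this illustration.

```javascript
const kanaText = "けす";
const asciiText = "kesu";

// TextEncoder always produces UTF-8 bytes (a Uint8Array).
const encoder = new TextEncoder();
const kanaBytes = encoder.encode(kanaText);
const asciiBytes = encoder.encode(asciiText);

// For ASCII text, character count and byte count happen to match,
// which is what makes it easy to confuse the two.
console.log(asciiText.length, asciiBytes.length); // 4 4

// Each of these kana takes three bytes in UTF-8.
console.log(kanaText.length, kanaBytes.length); // 2 6

// Treating bytes as if they were characters: cutting after 4 bytes
// splits the second kana in the middle.
const truncatedBytes = kanaBytes.slice(0, 4);
const damagedText = new TextDecoder("utf-8").decode(truncatedBytes);

// The decoder substitutes U+FFFD (the replacement character) for the
// incomplete byte sequence, so the second kana is lost.
console.log(damagedText.includes("\uFFFD")); // true
```

Code that only ever sees ASCII would never hit the truncation problem, which is how such bugs stay hidden until the first non-ASCII input arrives.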

So the "secret" of Unicode-aware programming is this: learn to recognize when and where strings/characters are converted to and from bytes, and make sure that in all those places the correct encoding is used, i.e. the same one that will be used for the reverse conversion, and one that can encode all the characters you are using. UTF-8 is slowly becoming the de facto standard and should normally be used wherever you have a choice.
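The two conditions above (same encoding in both directions, and an encoding that can represent every character) can be sketched with Node.js's `Buffer` class. This is a Node-specific illustration, not browser code, and the helper name `roundTrip` is hypothetical.

```javascript
// Convert text to bytes with one encoding, then back to text with another.
// Buffer is Node.js's built-in byte container.
function roundTrip(text, writeEncoding, readEncoding) {
  const bytes = Buffer.from(text, writeEncoding);
  return bytes.toString(readEncoding);
}

const teForm = "けして";

// Same encoding both ways, and UTF-8 can encode kana: the text survives.
console.log(roundTrip(teForm, "utf8", "utf8") === teForm); // true

// Written as UTF-8 but read as Latin-1: every byte becomes its own
// character, producing mojibake instead of the original three kana.
const mojibake = roundTrip(teForm, "utf8", "latin1");
console.log(mojibake === teForm); // false
console.log(mojibake.length);     // 9

// Same encoding both ways, but Latin-1 cannot represent kana,
// so information is lost at the encoding step.
console.log(roundTrip(teForm, "latin1", "latin1") === teForm); // false

// Pure ASCII text survives even a mismatched pair of encodings,
// which is why this class of bug goes unnoticed with English-only data.
console.log(roundTrip("kesu", "latin1", "utf8") === "kesu"); // true
```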

Typical examples (non-exhaustive):

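  • When writing source code with non-ASCII string literals (configure the encoding in the editor/IDE)
  • When compiling or interpreting such source code (the compiler/interpreter needs to know the encoding)
  • When reading/writing strings to a file (the encoding must be specified somewhere in the API or in the file's metadata)
  • When writing strings to a database (the encoding must be specified in the configuration of the database or table)
  • When delivering HTML pages via a web server (the encoding must be specified in the HTTP headers or the page's meta header; forms can be even more tricky)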
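As a rough illustration of two items from the list above, the following Node.js sketch names the encoding explicitly when writing and reading a file, and builds a `Content-Type` header that declares the charset of an HTML response. The helper names `saveAndLoad` and `htmlResponseHeaders` and the file name `verbs.txt` are hypothetical; the sketch assumes a writable system temp directory.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Write text to a temporary file and read it back, stating the
// encoding explicitly at both conversion points.
function saveAndLoad(text) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "kana-"));
  const file = path.join(dir, "verbs.txt");
  try {
    fs.writeFileSync(file, text, { encoding: "utf8" });
    return fs.readFileSync(file, { encoding: "utf8" });
  } finally {
    // Clean up the temporary directory whether or not the read succeeded.
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

// Headers a server could send with an HTML page so that the browser
// decodes the response bytes as UTF-8.
function htmlResponseHeaders() {
  return { "Content-Type": "text/html; charset=utf-8" };
}

console.log(saveAndLoad("けす → けして") === "けす → けして"); // true
console.log(htmlResponseHeaders()["Content-Type"]);
```

Inside the page itself, the equivalent declaration is a `<meta charset="utf-8">` element in the document head.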
