Bengali-language text not displayed in Unicode CSV file


Problem description

I have an Excel file in the Bengali language. To display the Bengali text properly I need Bengali fonts installed on the PC.

I converted the Excel file into CSV using Office 2010, but the result shows only '?' marks instead of the Bengali characters. Then I used Google Docs for the conversion and got the same problem, except with unreadable characters rather than '?'s. I pasted extracts from that file into an HTML file and tried to view it in my browser, unsuccessfully.

What should I do to get a CSV file from an .xlsx file in Bengali so that I can import that into a MySQL database?

Edit: The accepted answer to this SO question is what led me to try Google Docs.

Solution

According to the answers to the question "Excel to CSV with UTF8 encoding", Google Docs should save CSV properly, contrary to Excel, which destroys all characters that are not representable in the "ANSI" encoding being used. But maybe they changed this, or something went wrong, or the analysis of the situation is incorrect.

For properly encoded Bangla (Bengali) processed in MS Office programs, there should be no need for any "Bangla fonts", since the Arial Unicode MS font (shipped with Office) contains the Bangla characters. So is the data actually in some nonstandard encoding that relies on a specially encoded font? In that case, it should first be converted to Unicode, though possibly it can be somehow managed using programs that consistently use that specific font.
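
One rough way to test the point above is to check whether the cell text actually consists of characters from the Unicode Bengali block (U+0980 to U+09FF). The following is a minimal sketch, not a definitive detector: the function name `bengali_ratio` and the sample strings are made up for illustration, and the Latin-looking sample only stands in for what font-encoded data might look like.

```python
def bengali_ratio(text: str) -> float:
    """Return the fraction of non-whitespace characters that fall in
    the Unicode Bengali block (U+0980..U+09FF)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    in_block = sum(1 for c in chars if "\u0980" <= c <= "\u09ff")
    return in_block / len(chars)


# Properly encoded Bangla: every character is in the Bengali block.
print(bengali_ratio("বাংলা"))
# Hypothetical font-encoded sample: plain Latin letters, none in the block.
print(bengali_ratio("evsjv"))
```

A ratio near zero for text that looks like Bangla on screen would suggest that the display depends on a specially encoded font, so the data would need converting to Unicode before any CSV export can work.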

In Excel, when using Save As, you can select "Unicode text (*.txt)". It saves the data as TSV (tab-separated values) in UTF-16 encoding. You may then need to convert it to use comma as separator instead of tab, and/or from UTF-16 to UTF-8. But this only works if the original data is properly encoded.
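
The conversion step described above (tab to comma, UTF-16 to UTF-8) can be sketched with Python's standard `csv` module. This is only an illustration under a few assumptions: the function name `tsv_utf16_to_csv_utf8` is invented here, and the input is assumed to start with a byte order mark so that the generic `utf-16` codec can detect the byte order.

```python
import csv


def tsv_utf16_to_csv_utf8(src_path: str, dst_path: str) -> int:
    """Convert a UTF-16 tab-separated file into a UTF-8 comma-separated
    file and return the number of rows written."""
    rows = 0
    # The "utf-16" codec reads the BOM to pick the byte order.
    # newline="" leaves line-ending handling to the csv module.
    with open(src_path, "r", encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # comma-separated, quotes fields when needed
        for row in reader:
            writer.writerow(row)
            rows += 1
    return rows
```

With hypothetical file names, a call such as `tsv_utf16_to_csv_utf8("data.txt", "data.csv")` would produce the UTF-8 CSV. Going through `csv.reader` and `csv.writer` rather than replacing tabs with commas directly means that fields which themselves contain commas get quoted in the output.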
