How to export an umlaut (or any foreign character) in MATLAB EPS format?

Question

I'm trying to use an umlaut within a legend command in MATLAB. A quick Google tells me the form I want is char(146), and that works fine for displaying the file, or printing it to tif.

But when I print to EPS format (or epsc, eps2, epsc2), a different character is displayed in the file. I've tried printing the first 300-odd characters, and they certainly change (albeit very slowly; a good half of them are "A" with a symbol immediately afterward), but this seems a pretty slow approach, and I'm not guaranteed to actually find the symbol I want. So, does anyone here have any ideas on what I can try?

I'm using MATLAB R2011a, my default character set is UTF-8, and my legend line looks something like:

legend( plot_id , strcat('lala',char(146)) )

and my print line looks like:

print -depsc2 -tiff -r600 <filename>

(but switching off the tiff thumbnail generation doesn't have any effect)

Solution

The problem appears when MATLAB's character encoding is UTF-8, which is usually the case for Linux users (hence no problem with Amro's configuration, which uses CP1252). When MATLAB's character set encoding (returned by slCharacterEncoding()) is UTF-8, MATLAB's EPS export function is buggy (at least up to R2011b): it exports non-ASCII characters as octal-escaped UTF-8 (2 bytes for a character such as ö), whereas the PostScript interpreter is set up to decode a 1-byte format.

Let's illustrate the bug with the character ö (U+00F6), some of whose representations are:

  • UTF-16: 0x00F6
  • UTF-8: 0xC3 0xB6
  • C octal escaped UTF-8: \303\266
  • XML decimal entity: &#246;
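The representations listed above can be reproduced with a few lines of Python 3. This is handy for checking what MATLAB should write for any other character. The helper name `representations` is hypothetical, and UTF-16 is shown as big-endian code units.

```python
def representations(ch: str) -> dict:
    """Return the representations listed above for a single character."""
    utf8 = ch.encode("utf-8")
    return {
        # Big-endian UTF-16 code unit(s); a single unit for characters such as ö
        "UTF-16": "0x" + ch.encode("utf-16-be").hex().upper(),
        "UTF-8": " ".join("0x%02X" % b for b in utf8),
        # One \ddd escape per UTF-8 byte, as MATLAB writes it into the EPS file
        "C octal escaped UTF-8": "".join("\\%03o" % b for b in utf8),
        "XML decimal entity": "&#%d;" % ord(ch),
    }


for name, value in representations("ö").items():
    print(f"{name}: {value}")
```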

The eps file created by MATLAB contains:

/Helvetica /ISOLatin1Encoding 120 FMSR
(\303\266) s

MATLAB defines a function FMSR in the EPS file. FMSR re-encodes the Helvetica font into another encoding, here ISOLatin1Encoding. ISOLatin1Encoding is one of the two built-in encoding vectors and closely matches the ISO-8859-1 (Latin-1) standard (see pp. 329-330 of the PostScript Language Reference Manual for more details). Briefly, an encoding vector is a 256-element array that associates a character name with each character code, so the font reads only 1-byte character codes. In ISO-8859-1, \303 = 195 = Ã and \266 = 182 = ¶. As a result, ö is printed as Ã¶.
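Here is a minimal Python 3 sketch of what the PostScript interpreter does with MATLAB's output: it takes each octal escape as one byte and looks that byte up in an ISO-8859-1 vector. `ps_string_bytes` is a hypothetical helper. It decodes only three-digit octal escapes and assumes everything else is plain ASCII, which is enough for this example.

```python
import re


def ps_string_bytes(body: str) -> bytes:
    """Return the bytes denoted by the body of a PostScript string literal.

    Simplification: only three-digit octal escapes (\\ddd) are decoded;
    every other character is assumed to be plain ASCII.
    """
    return re.sub(rb"\\([0-7]{3})",
                  lambda m: bytes([int(m.group(1), 8)]),
                  body.encode("ascii"))


body = r"\303\266"               # what MATLAB writes for ö in a UTF-8 locale
data = ps_string_bytes(body)     # b'\xc3\xb6': two bytes, i.e. two character codes
print(data.decode("latin-1"))    # Ã¶ : one glyph per byte, as with ISOLatin1Encoding
print(data.decode("utf-8"))      # ö  : what was intended
```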

Options for exporting non-ASCII ISO-8859-1 characters with a UTF-8 locale environment

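  1. Convert the octal UTF-8 codes into octal ISO-8859-1 codes. This is easy because the non-ASCII ISO-8859-1 characters keep the same layout in UTF-8. U+00A0 to U+00BF become \302 followed by the original byte. U+00C0 to U+00FF become \303 followed by the original byte minus 64 (\3xy becomes \2xy). For example, you can do the conversion with the program sed, which can be run from the Command Window or from your export script: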

    !sed -i -e 's/\\302\(\\2[4-7][0-7]\)/\1/g' -e 's/\\303\\2\([0-7][0-7]\)/\\3\1/g' file.eps
    

    Thus, \303\266 becomes \366 = 246 = ö. With this approach, you can type the non-ASCII characters directly in MATLAB.

  2. Change the MATLAB character set encoding with slCharacterEncoding('ISO-8859-1') before adding text to the figure. If you add text from the Command Window, use char(number) for non-ASCII characters. If you add text directly in the figure with the plot tools, you can type the non-ASCII characters. This solution is not ideal for two reasons: the non-ASCII characters do not appear on the figure in the default font (Helvetica by default with MATLAB on Linux), and it requires using char(number) if you script the creation of the figure.

  3. Render the text later with LaTeX by using a user-submitted MATLAB function such as LaPrint or one of its forks. LaPrint creates a .tex file with the text of the figure and an .eps file with the non-text part of the figure. A similar solution is matlab2tikz, which creates a TikZ/PGFPlots file and a .tex file.

  4. Use MATLAB's LaTeX interpreter: \"{o}. MATLAB creates the character by combining the ASCII character with its diacritic, but the result is of low quality because of poor relative positioning (the diacritic sits slightly too far to the right of the character). MATLAB uses the glyphs from the Computer Modern font and embeds the font in the EPS file, which adds about 80 KB. Furthermore, the raw text in a PDF created from the EPS does not contain ö but o followed by a separate diaeresis (o ̈).
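If sed is not available (for example on Windows), the two substitutions of option 1 can be sketched in Python 3. This is an illustration rather than a drop-in tool. The names `utf8_octal_to_latin1_octal`, `convert_file` and `fix_eps.py` are hypothetical. The sketch also assumes a plain-text EPS file without a binary TIFF preview header: rewriting the PostScript part would likely invalidate the offsets stored in such a header, and the sed command has the same limitation.

```python
import re
import sys

# \302\240-\302\277 encode U+00A0-U+00BF: keep only the second byte.
LEAD_302 = re.compile(r"\\302(\\2[4-7][0-7])")
# \303\200-\303\277 encode U+00C0-U+00FF: drop \303 and turn \2xy into \3xy.
LEAD_303 = re.compile(r"\\303\\2([0-7][0-7])")


def utf8_octal_to_latin1_octal(text: str) -> str:
    """Apply the same two rewrites as the sed command of option 1."""
    text = LEAD_302.sub(r"\1", text)
    return LEAD_303.sub(r"\\3\1", text)


def convert_file(path: str) -> None:
    """Rewrite an EPS file in place (hypothetical helper)."""
    # latin-1 maps each byte to one character, so every byte round-trips;
    # newline="" keeps the original line endings untouched.
    with open(path, encoding="latin-1", newline="") as f:
        eps = f.read()
    with open(path, "w", encoding="latin-1", newline="") as f:
        f.write(utf8_octal_to_latin1_octal(eps))


if __name__ == "__main__":
    for arg in sys.argv[1:]:  # e.g. python3 fix_eps.py file.eps
        convert_file(arg)
```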

Exporting non-ISO-8859-1 characters

For exporting characters that are not in ISO-8859-1, which was asked about here, there is probably a reasonable solution if fewer than 256 characters are needed (an 8-bit format), ideally ones that form a standard encoding set. It involves the following steps:

  1. Convert the octal escape codes back into Unicode characters;
  2. Save the file in the target encoding standard (an 8-bit format);
  3. Add the encoding vector for the target encoding set.

For example, if you want to export Polish text, you need to convert the file into ISO-8859-2. Here is an implementation on Linux with Bash:

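#!/bin/bash
set -e
name=$(basename "$1" .eps)
# Turn octal-escaped UTF-8 (e.g. \303\266) back into real UTF-8 characters
ascii2uni -a K "$1" > /tmp/eps_uni.eps
# Re-encode the text as 8-bit ISO-8859-2
iconv -f UTF-8 -t ISO-8859-2 /tmp/eps_uni.eps -o "$name"_latin2.eps
# Insert the Latin-2 encoding vector and make the fonts use it
sed -i -e '/%EndPageSetup/ r ISOLatin2Encoding.ps' -e 's/ISOLatin1Encoding/MyEncoding/' "$name"_latin2.eps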

Save it as eps_lat2. Running the command sh eps_lat2 file.eps then creates file_latin2.eps with Latin-2 encoding. The file ISOLatin2Encoding.ps, which must be in the current directory, contains this:

/MyEncoding
% The first 144 entries are the same as the ISO Latin-1 encoding.
ISOLatin1Encoding 0 144 getinterval aload pop
% \22x
    /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
    /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef
% \24x
    /nbspace /Aogonek /breve /Lslash /currency /Lcaron /Sacute /section
    /dieresis /Scaron /Scedilla /Tcaron /Zacute /hyphen /Zcaron /Zdotaccent
    /degree /aogonek /ogonek /lslash /acute /lcaron /sacute /caron
    /cedilla /scaron /scedilla /tcaron /zacute /hungarumlaut /zcaron /zdotaccent
% \30x
    /Racute /Aacute /Acircumflex /Abreve /Adieresis /Lacute /Cacute /Ccedilla
    /Ccaron /Eacute /Eogonek /Edieresis /Ecaron /Iacute /Icircumflex /Dcaron
    /Dcroat /Nacute /Ncaron /Oacute /Ocircumflex /Ohungarumlaut /Odieresis /multiply
    /Rcaron /Uring /Uacute /Uhungarumlaut /Udieresis /Yacute /Tcedilla /germandbls
% \34x
    /racute /aacute /acircumflex /abreve /adieresis /lacute /cacute /ccedilla
    /ccaron /eacute /eogonek /edieresis /ecaron /iacute /icircumflex /dcaron
    /dcroat /nacute /ncaron /oacute /ocircumflex /ohungarumlaut /odieresis /divide
    /rcaron /uring /uacute /uhungarumlaut /udieresis /yacute /tcedilla /dotaccent
256 packedarray def
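To fill in an encoding vector for another 8-bit code page, you need to know which character each code from \240 to \377 stands for. A small Python 3 helper (hypothetical name `upper_half`) can list these characters with their Unicode names. It does not produce the PostScript glyph names themselves (such as /Aogonek), which follow Adobe's glyph naming conventions and still have to be filled in by hand.

```python
import unicodedata


def upper_half(encoding: str = "iso8859_2"):
    """Yield (octal code, character, Unicode name) for codes \\240-\\377."""
    for code in range(0xA0, 0x100):
        try:
            ch = bytes([code]).decode(encoding)
        except UnicodeDecodeError:
            # Code not defined in this code page: use /.notdef in the vector
            yield "\\%03o" % code, None, "undefined"
            continue
        yield "\\%03o" % code, ch, unicodedata.name(ch, "<unnamed>")


for octal, ch, name in upper_half():
    print(octal, ch, name)
```

For example, it reports \241 as LATIN CAPITAL LETTER A WITH OGONEK, which matches /Aogonek in the vector above.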

Here is another implementation in Python, so it also works on Windows and Mac. It is Python 2 code: the string_escape codec it relies on does not exist in Python 3.

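#!/usr/bin/env python2
# -*- coding: utf-8 -*-
# Python 2 only: the 'string_escape' codec was removed in Python 3.
import sys, codecs
input = sys.argv[1]
# The output file is written as 8-bit Latin-2 (ISO-8859-2)
fo = codecs.open(input[:-4] + '_latin2.eps', 'w', 'latin2')
# 'string_escape' turns octal escapes such as \303\266 into raw UTF-8 bytes
with codecs.open(input, 'r', 'string_escape') as fi:
    data = fi.readlines()
with open('ISOLatin2Encoding.ps') as fenc:
    for line in data:
        # Decode the UTF-8 bytes and switch the fonts to the custom vector
        fo.write(line.decode('utf-8').replace('ISOLatin1Encoding', 'MyEncoding'))
        # Insert the MyEncoding definition right after the page setup
        if line.startswith('%%EndPageSetup'):
            fo.write(fenc.read())
fo.close()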

Save it as eps_lat2.py. Running the command python2 eps_lat2.py file.eps then creates file_latin2.eps with Latin-2 encoding.
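Because the string_escape codec is gone in Python 3, here is a possible Python 3 port of eps_lat2.py. It is a sketch under stated assumptions, not a tested tool:

- It decodes only runs of octal escapes for bytes \200-\377 that form valid UTF-8. ASCII escapes such as \050 for "(" stay intact, so the PostScript string syntax is preserved.
- It assumes an ASCII EPS file without a binary TIFF preview.
- Its function names are hypothetical.

The `encoding` argument of `main` plays the role of the codecs.open encoding argument.

```python
#!/usr/bin/env python3
"""Sketch of a Python 3 port of eps_lat2.py (Latin-2 re-encoding of a MATLAB EPS)."""
import re
import sys
from pathlib import Path

# Runs of octal escapes for bytes \200-\377. ASCII escapes such as \050 "("
# are left alone so that PostScript string syntax stays intact.
HIGH_OCTAL_RUN = re.compile(r"(?:\\[23][0-7]{2})+")


def decode_utf8_escapes(text: str) -> str:
    """Replace octal-escaped UTF-8 sequences by the characters they encode."""
    def repl(m):
        raw = bytes(int(o, 8) for o in re.findall(r"[0-7]{3}", m.group(0)))
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            return m.group(0)  # not valid UTF-8: keep the escapes as they are
    return HIGH_OCTAL_RUN.sub(repl, text)


def convert(eps_text: str, vector_ps: str) -> str:
    """Decode escapes, switch fonts to MyEncoding and insert its definition."""
    out = []
    for line in eps_text.splitlines(keepends=True):
        out.append(decode_utf8_escapes(line).replace("ISOLatin1Encoding", "MyEncoding"))
        if line.startswith("%%EndPageSetup"):
            out.append(vector_ps)  # inserted verbatim, after the replacement
    return "".join(out)


def main(path: str, vector_file: str = "ISOLatin2Encoding.ps",
         encoding: str = "iso8859_2") -> None:
    src = Path(path)
    with open(src, encoding="ascii", newline="") as f:
        eps_text = f.read()
    vector_ps = Path(vector_file).read_text(encoding="ascii")
    if not vector_ps.endswith("\n"):
        vector_ps += "\n"
    out_path = src.with_name(src.stem + "_latin2.eps")
    # Raises UnicodeEncodeError if a character has no code in `encoding`
    with open(out_path, "w", encoding=encoding, newline="") as f:
        f.write(convert(eps_text, vector_ps))


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        main(arg)
```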

It can easily be adapted to other 8-bit encoding standards by changing the encoding vector and the iconv (or codecs.open) parameter in the script.
