Translating foreign text into html code - help


Question

I have a paragraph of text pasted into a Word document. It's in Polish,
complete with Polish characters. They show up just fine in Word, but
the program I use for web page programming, HomeSite, won't translate
it. When I paste the text into the code, the special characters are
missing. If they would show up there I could use the Replace Special
Characters feature to change it to the proper code, but it won't even
paste into it correctly. Is there a way to get Word to do this, or
another program or web site that can do this? I've been searching the
web and posting questions and asking everyone, but with no luck so far. I
did find a program for Mac, but I'm using a PC.

Recommended answer


gr***@kcls.org wrote:
> I have a paragraph of text pasted into a Word document, it's in Polish,
> complete with Polish characters. They show up just fine in Word, but
> the program I use for web page programming, HomeSite, won't translate
> it.

Assuming you want to have the Polish letters "translated" to &#xxx;
notation, rather than Polish text to English text:

> When I paste the text into the code, the special characters are
> missing. If they would show up there I could use the Replace Special
> Characters feature to change it to the proper code, but it won't even
> paste into it correctly.

Either get a new program for your web authoring work, or get a tool to
change all "special" characters to numeric references. UniRed is a good
choice: http://unired.sourceforge.net/. Paste your Polish text into
UniRed, and save it with the option "Unicode representation: &#DDDD;".
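
For readers who would rather script the conversion than install another editor, the same idea can be sketched in a few lines of Python. This is only an illustration and says nothing about how UniRed works internally: the function name `to_numeric_references` is made up for this example, and it relies on Python's standard `xmlcharrefreplace` error handler, which writes a decimal &#DDDD; reference for every character outside ASCII.

```python
def to_numeric_references(text: str) -> str:
    """Replace every non-ASCII character with a decimal &#DDDD; reference.

    Hypothetical helper for illustration only. ASCII characters,
    including markup such as < and &, are left untouched.
    """
    # "xmlcharrefreplace" is a standard codec error handler: any character
    # that does not fit the target encoding becomes &#NNN; instead.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(to_numeric_references("Lech Wałęsa"))  # Lech Wa&#322;&#281;sa
```

The output can be pasted into an editor that only handles Latin-1 or ASCII, since nothing but plain ASCII is left in it.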

By the way, there's nothing "special" about Polish letters, except that
they're not in the Latin-1 character set. If you intend to publish more
Polish text, you should consider serving your content as utf-8, so you
can just use the characters you need, rather than cluttering your page
with numeric references. It will keep your code more legible, and is
far more intuitive. It may mean you'll have to give up on HomeSite, if
it isn't Unicode-capable.
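
A minimal sketch of the utf-8 route, assuming the page is produced by a small script rather than typed into HomeSite. The function name `build_utf8_page` and the page skeleton are illustrative only. The point is that the encoding declared in the page has to match the bytes actually written, and if the server sends a charset in its Content-Type header, that should agree as well.

```python
from html import escape


def build_utf8_page(title: str, body_text: str) -> bytes:
    """Return a small HTML page as UTF-8 bytes (illustrative skeleton)."""
    html = (
        "<!DOCTYPE html>\n"
        '<html lang="pl">\n'
        "<head>\n"
        # Declare the same encoding that is used in encode() below.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">\n'
        f"<title>{escape(title)}</title>\n"
        "</head>\n"
        # escape() only touches &, < and > (and quotes); Polish letters
        # pass through as real characters, with no numeric references.
        f"<body><p>{escape(body_text)}</p></body>\n"
        "</html>\n"
    )
    return html.encode("utf-8")


if __name__ == "__main__":
    page = build_utf8_page("Przykład", "Lech Wałęsa")
    print(page.decode("utf-8"))
```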

Good luck,
Garmt de Vries.


Garmt de Vries wrote:
> Either get a new program for your web authoring work, or get a tool to
> change all "special" characters to numeric references.

If he is using a sufficiently new version of MS Word, he could just
select File/Save As and select a format like "Web page (filtered)" to
get an HTML version, with numeric references. The "filtered" thing or
something like that means that MS Word refrains from spitting out most
of its usual "Office XML" stuff and you get something reasonable like

<p class=MsoNormal><span lang=FI>This is Polish: Wałęsa</span></p>

That's what a version of Word produced. Of course, the lang attribute it
inserts is worse than nonsense. It's partly my fault, since I was lazy
and didn't set the language in Word. If I paint the text, set its
language to English, then click on the Polish name and set its language
to Polish, and save as above, I get (here I quote a little more):

<body lang=EN-US>

<div class=Section1>

<p class=MsoNormal>This is Polish: <span lang=PL>Wałęsa</span></p>

</div>

</body>

Not bad. The class attribute has of course no effect per se, and it
might even be useful at times. Some day someone might wish to use some
styling for paragraphs generated using MS Office software, and the class
name MsoNormal is in practice a rather reliable indicator.

Setting the language to Polish has hardly any noticeable effect at
present, but it's still the right thing to do. (I guess the most
probable situation where it is useful is when someone opens the HTML
document in MS Word or some compatible program, which recognizes the
lang markup and uses this information in its spelling or grammar checks.
Somewhat deceptively, my version of MS Word has no such checks available
for Polish, so anything I claim to be Polish will "pass", i.e. will not
be flagged by MS Word.)
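
Since a wrong lang value such as the FI above is easy to overlook, it may be worth listing what an export actually declares before publishing it. The following is a rough sketch using Python's standard `html.parser` module; the names `LangCollector` and `list_lang_attributes` are invented for this example, and the sample markup is the Word output quoted above.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class LangCollector(HTMLParser):
    """Collect (tag, lang value) pairs while parsing an HTML fragment."""

    def __init__(self) -> None:
        super().__init__()
        self.found = []  # filled with (tag, value) tuples

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "lang" and value:
                self.found.append((tag, value))


def list_lang_attributes(html: str) -> List[Tuple[str, str]]:
    """Return every lang attribute in document order (hypothetical helper)."""
    collector = LangCollector()
    collector.feed(html)
    collector.close()
    return collector.found


if __name__ == "__main__":
    sample = (
        "<body lang=EN-US><div class=Section1>"
        "<p class=MsoNormal>This is Polish: <span lang=PL>Wałęsa</span></p>"
        "</div></body>"
    )
    for tag, lang in list_lang_attributes(sample):
        print(tag, lang)
```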


On Wed, 26 Oct 2005, Jukka K. Korpela wrote:
> If he is using a sufficiently new version of MS Word, he could just
> select File/Save As and select a format like "Web page (filtered)"
> to get an HTML version, with numeric references.

Does Word still have this nasty habit of saving &#number; references
in the range 128-159 decimal, which W3C specifies to be UNUSED?
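
For anyone who has not seen such references: numbers 128-159 are control-character positions in Unicode, and they typically turn up because the authoring tool was thinking in Windows-1252, where those byte values hold curly quotes, dashes, the euro sign and similar. A rough Python sketch of a clean-up pass follows. The helper name `fix_cp1252_references` is invented for this example, and the code assumes the stray references really were meant as Windows-1252 bytes.

```python
import re

# Matches decimal numeric character references such as &#147;
_DECIMAL_REF = re.compile(r"&#(\d+);")


def fix_cp1252_references(html: str) -> str:
    """Rewrite &#128;..&#159; as the Unicode references they likely meant.

    Hypothetical helper; assumes the numbers are Windows-1252 byte values.
    """

    def repair(match):
        number = int(match.group(1))
        if not 128 <= number <= 159:
            return match.group(0)  # outside the problem range, keep as is
        try:
            char = bytes([number]).decode("cp1252")
        except UnicodeDecodeError:
            # A few positions (129 for example) are unassigned in
            # Windows-1252, so there is nothing sensible to map them to.
            return match.group(0)
        return f"&#{ord(char)};"

    return _DECIMAL_REF.sub(repair, html)


if __name__ == "__main__":
    print(fix_cp1252_references("&#147;Wa&#322;&#281;sa&#148; &#150; test"))
    # &#8220;Wa&#322;&#281;sa&#8221; &#8211; test
```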

Also, beware of Word's habit of allowing the user to insert symbols,
and then generating pseudo-HTML which uses Symbol font referring to
Latin-1 characters, instead of using the proper Unicode references. As
a web admin, I keep getting examples of this abuse from our Word-based
authors. Such pseudo-HTML won't work on www-compatible clients (as
you obviously know, but I'm writing this for any other readers who
haven't met this problem yet).

> The "filtered" thing or something like that means that MS Word
> refrains from spitting out most its usual "Office XML" stuff and you
> get something reasonable like
>
> <p class=MsoNormal><span lang=FI>This is Polish: Wałęsa</span></p>
>
> That's what a version of Word produced. Of course, the lang
> attribute it inserts is worse than nonsense. It's partly my fault,
> since I was lazy and didn't set the language in Word.

Which reminds me of the time that my Prof. created lots of Word
documents while his version of Word was set to its installation
default of US English; he finally complained about all the wrong
spellings it was proposing to him. But when he changed the Word
setting to British English, he found that it treated all the existing
documents as being in a foreign language (so it wouldn't add spellings
to his local dictionary). All rather confusing, really. But that was
a few years back.

> Setting the language to Polish has hardly any noticeable effect at
> present, but it's still the right thing to do.




它可以调整说话浏览器的发音(IBM HPR

支持这个概念,虽然它不支持波兰语我看过的最后一次





It could adjust the pronunciation of a speaking browser (IBM HPR
supports the concept, although it didn't support Polish the last time
that I looked).

