Ideas for converting straight quotes to curly quotes


Question

I have a file that contains "straight" (normal, ASCII) quotes, and I'm trying to convert them to real quotation mark glyphs ("curly" quotes, U+2018 to U+201D). Since the transformation from two different quote characters into a single one has been lossy in the first place, obviously there is no way to automatically perform this conversion; nevertheless I suspect a few heuristics will cover most cases. So the plan is a script (in Emacs) that does something like the following: for each straight quote character,

  1. guess which curly quote character to use, if possible
  2. ask the user (me) to confirm, or make a choice

This question is about the first step: what would be a good algorithm (a set of heuristics, more like) to use, for normal English text (a novel, for example)? Here are some preliminary ideas, which I believe work for double-quotes (counterexamples are welcome!):

  1. If a double-quote is at the beginning of a line, guess that it is an opening quote.
  2. If a double-quote is at the end of a line, guess a closing quote.
  3. If a double-quote is preceded by a space, guess an opening quote.
  4. If a double-quote is followed by a space, guess a closing quote.
  5. If a double-quote doesn't fit into one of the above categories, guess that it is the "opposite" of the most recently used kind of double-quote.
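The five rules above can be sketched in a few lines of Python (the question targets Emacs, so this is only an illustration of the logic, not the script itself). The function name `guess_double_quotes` is made up for this example, and the choice to guess an opening quote when rule 5 applies before any quote has been seen is an assumption.

```python
OPEN, CLOSE = "\u201c", "\u201d"  # left and right double quotation marks


def guess_double_quotes(text):
    """Replace each straight double quote, trying rules 1 to 5 in order."""
    out = []
    last = None  # kind chosen for the previous quote, used by rule 5
    for i, ch in enumerate(text):
        if ch != '"':
            out.append(ch)
            continue
        # Treat the start and end of the text as line boundaries.
        prev = text[i - 1] if i > 0 else "\n"
        nxt = text[i + 1] if i + 1 < len(text) else "\n"
        if prev == "\n":      # rule 1: beginning of a line
            guess = OPEN
        elif nxt == "\n":     # rule 2: end of a line
            guess = CLOSE
        elif prev == " ":     # rule 3: preceded by a space
            guess = OPEN
        elif nxt == " ":      # rule 4: followed by a space
            guess = CLOSE
        else:                 # rule 5: opposite of the last kind used
            # Assumption: with no earlier quote, guess an opening quote.
            guess = CLOSE if last == OPEN else OPEN
        last = guess
        out.append(guess)
    return "".join(out)
```

For instance, `guess_double_quotes('"Hi," she said. "Go."')` gives `“Hi,” she said. “Go.”`, and `("x")` falls through to rule 5 for both quotes and becomes `(“x”)`.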

Single quotes are trickier, because a ' might be either an opening quote, a closing quote, or an apostrophe, and we want to leave apostrophes alone (mustn't write "mustn’t"). Some of the same rules as above apply, but 'tis possible apostrophes are at the beginning of words (or lines), although it's less common than 'twas in the past. I can't offhand think of rules that would properly handle fragments like ["I like 'That '70s show'", she said]. It might require looking at more than just neighbouring characters, and computing distances between quotes, for example…
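One possible starting point for single quotes, sketched in Python under loud assumptions: a ' with a letter or digit on both sides is taken to be an apostrophe, and a ' before a word is taken to be an apostrophe only if the word is in a small hand-made list or looks like a decade ('70s). The names `ELISIONS` and `classify_single` and the word list are hypothetical, and possessive plurals such as dogs' would still be misread as closing quotes, which is why the confirmation step stays necessary.

```python
import re

# Hypothetical list of words that start with an apostrophe; extend as needed.
ELISIONS = {"tis", "twas", "em"}


def classify_single(text, i):
    """Guess the role of the straight single quote at text[i]."""
    prev = text[i - 1] if i > 0 else " "
    nxt = text[i + 1] if i + 1 < len(text) else " "
    if prev.isalnum() and nxt.isalnum():
        return "apostrophe"          # mustn't, it's
    if nxt.isalnum():
        # Look at the word that follows: a decade or a known elision?
        m = re.match(r"\d+s\b|[A-Za-z]+", text[i + 1:])
        if m and (m.group()[0].isdigit() or m.group().lower() in ELISIONS):
            return "apostrophe"      # '70s, 'tis, 'twas
        return "open"
    if prev.isalnum() or prev in ".,!?":
        return "close"
    return "unsure"                  # leave the decision to the user
```

On the fragment from the paragraph above, `"I like 'That '70s show'", she said`, the three single quotes come out as `open`, `apostrophe` and `close` in that order.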

Any more ideas? It is okay if not all possible cases are covered; the goal is to be as intelligent as possible but no further. :-)

Edit: Some more things that might be worth thinking about (or might be irrelevant, not sure):

  • Quotes might not always be in matching pairs: for single quotes the reason is obvious, as above. But even for double quotes, when a quotation extends over more than one paragraph, the usual typographic convention (don't ask me why) is to start each paragraph with a quotation mark, even though the quotation has not been closed in the previous one. So simply keeping a state machine that alternates between two states will not work!
  • Nested quotation (alluded to in the "I like 'That '70s show'" example above): this might make either kind of quote not be preceded or followed by a space.
  • British/American punctuation style: are commas inside the quotes or outside?
  • Many word processors (e.g. Microsoft Word) already do some sort of conversion like this. Although they are not perfect and can often be annoying, it might be instructive to learn how they work...
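The first point in the list above is easy to reproduce. The Python sketch below is a plain two-state alternator, and it puts a closing quote at the start of the second paragraph of a continued quotation. The `reset_on_paragraph` flag is a hypothetical workaround, not something proposed in the question: it assumes paragraphs are separated by a blank line and that every paragraph starts outside a quotation.

```python
def alternate(text, reset_on_paragraph=False):
    """Alternate between opening and closing double quotes."""
    out, opening, prev = [], True, ""
    for ch in text:
        if reset_on_paragraph and ch == "\n" and prev == "\n":
            opening = True  # assumption: a blank line starts a fresh paragraph
        if ch == '"':
            ch = "\u201c" if opening else "\u201d"
            opening = not opening
        out.append(ch)
        prev = ch
    return "".join(out)


speech = '"First paragraph of a long speech.\n\n"Second paragraph, which ends it."'
```

Here `alternate(speech)` starts the second paragraph with ” and ends it with “, both wrong, while `alternate(speech, reset_on_paragraph=True)` produces “ at the start of each paragraph and ” at the very end.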

Answer

guess which curly quote character to use, if possible

It is not, in the general case.

The simple algorithm that most automatic converters use is just to look at the character immediately before the ' or ". If it's a space, start of line, opening bracket or other opening quote, choose an opening quote; otherwise choose a closing one. The advantage of this method is that it can run as-you-type, so when it chooses the wrong one you can generally correct it.
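As a rough Python sketch of that previous-character rule (the name `smarten` and the exact set of characters counted as opening context are assumptions made for this example, not taken from any particular converter):

```python
OPENERS = {'"': "\u201c", "'": "\u2018"}
CLOSERS = {'"': "\u201d", "'": "\u2019"}
# Characters after which a quote is assumed to open: whitespace,
# opening brackets, and the opening quotes themselves.
OPENING_CONTEXT = set(" \t\n([{") | set(OPENERS.values())


def smarten(text):
    """Convert straight quotes by looking only at the preceding character."""
    out = []
    for ch in text:
        if ch in OPENERS:
            # The start of the text counts as the start of a line.
            prev = out[-1] if out else "\n"
            ch = OPENERS[ch] if prev in OPENING_CONTEXT else CLOSERS[ch]
        out.append(ch)
    return "".join(out)
```

This turns `mustn't` into `mustn’t`, which matches the usual typesetting practice, but it also turns `Fish 'n' Chips` into `Fish ‘n’ Chips` and gives `'70s` an opening quote, so word-initial apostrophes still need a manual fix.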

we want to leave apostrophes alone

I agree! But not many people do. It's normal typesetting practice to turn an apostrophe into a left-facing single quote. Personally I prefer to leave them as they are, to distinguish them from enclosing quotes, making the text easier (I find) to read, and possible to process automatically.

However this really is just my taste and is not generally considered justified merely because the character is defined by the Unicode standard as being APOSTROPHE.

'tis possible apostrophes are at the beginning of words

Indeed. There is no way to tell an apostrophe from a potential open quote in cases like the classic Fish 'n' Chips, short of enormous amounts of cultural context.

(Not to mention primes, okinas, glottal stops and various other uses of the apostrophe...)

The best thing to do, of course, is install a keyboard layout that can type smart quotes directly. I have ‘’ on AltGr+[], “” on AltGr+Shift+[], –— on AltGr+[Shift]+dash, and so on.
