What are Pascal Strings?


Question

Are they named after the programming language, or the mathematician?

What are the defining characteristics of Pascal strings? In Wikipedia's article on strings it seems like the defining characteristic is storing the length of the string in the first byte. In another article I get the impression that the memory layout of the strings is also important.

While perusing an unrelated SO thread somebody mentioned that Pascal strings make Excel fast. What are the advantages of Pascal strings over null-terminated strings? Or more generally, in what situations do Pascal strings excel?

Are Pascal strings implemented in any other languages?

Last, do I capitalize both words ("Pascal Strings") or only the first ("Pascal strings")? I'm a technical writer...

Solution

Pascal strings were made popular by one specific but hugely influential Pascal implementation, named UCSD. So "UCSD strings" is a better term. This is the same implementation that made bytecode interpreters popular.

In general it is not one specific type, but the basic principle of having the size prefixed to the character data. This makes getting the length a constant time operation (O(1)) instead of scanning the character data for a nul character.
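As a minimal sketch of that principle in C, assuming a one-byte prefix in the style of a classic short string (the `PStr` type and the `pstr_from_cstr` and `pstr_len` helpers are hypothetical names for illustration, not part of any Pascal runtime):

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical short string: one length byte followed by the characters. */
typedef struct {
    unsigned char len;   /* number of characters in use, 0..255 */
    char data[255];      /* character data, NOT nul-terminated */
} PStr;

/* Build a PStr from a C string, truncating to 255 characters. */
static PStr pstr_from_cstr(const char *s) {
    PStr p = {0};
    size_t n = strlen(s);        /* O(n): has to scan for the nul */
    if (n > 255) n = 255;
    p.len = (unsigned char)n;
    memcpy(p.data, s, n);
    return p;
}

/* O(1): the length is simply read from the prefix. */
static size_t pstr_len(const PStr *p) {
    return p->len;
}
```

With a single prefix byte the string is limited to 255 characters; wider prefixes lift that limit at the cost of a few more bytes per string.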

Not all Pascals used this concept. IIRC, the original (seventies) convention was to space-pad an allocation and scan backwards for a non-space character (making it impossible for strings to have a trailing space). Moreover, since software was mostly used in isolation, all kinds of schemes were used, often based on what was advantageous for that implementation/architecture.
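The space-padding convention can be sketched in C roughly as follows. This is an illustration of the idea only, and `padded_len` is a made-up helper name, not a routine from any historical compiler:

```c
#include <stddef.h>

/* Length of a space-padded, fixed-size field:
   scan backwards until a non-space character is found. */
static size_t padded_len(const char *field, size_t size) {
    while (size > 0 && field[size - 1] == ' ') {
        size--;
    }
    return size;
}
```

A field holding `"hi "` followed by padding reports a length of 2, which is why such strings cannot keep a trailing space.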

The most popular dialects, the Borland family (Turbo Pascal, Delphi, and the compatible Free Pascal), generally base themselves on the UCSD dialect, and thus have Pascal strings. Delphi currently has five such string types (short/ansi/wide/unicode/open).

On the other hand, this means that in a loop, you need some additional check based on indexes to check for the end of the string.

So instead of copying a string using

while (p2^ <> #0) do begin p^ := p2^; inc(p); inc(p2); end;
p^ := #0; { copy the terminator as well }

which is wholly equivalent to

while (*s++ = *t++);

in C when using an optimizing compiler.

you need to do, e.g.,

while (len>0) do begin p^:=p2^; inc(p); inc(p2); dec(len); end;

or even

i:=1;
while (i<=len) do begin p[i]:=p2[i]; inc(i); end;

This makes the number of instructions in a Pascal string loop slightly larger than in the equivalent zero-terminated loop, and adds one more live value (the count or index). Additionally, UCSD was a bytecode (p-code) interpreted language, and the latter, index-based code using Pascal strings is "safe".

On an architecture with built-in post-increment addressing, the machine-level counterpart of ++ (like the PDP-11, which C was originally developed for), the pointer version was even cheaper, especially without optimization. Nowadays optimizing compilers can easily detect any of these constructs and convert them to whatever is best.

More importantly, security has mattered a lot more since the early nineties, and relying solely on the null terminator is generally frowned upon, because small errors in validation can cause potentially exploitable buffer overflows. C practice has therefore moved away from the old unbounded string routines toward the "-n-" versions (strncpy etc.), which need a maximum length to be passed. This is pretty much a manually managed Pascal string, where the programmer must take care of passing the length (or maximum buffer size) around.
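A hedged sketch of what passing the buffer size around looks like in C. The wrapper name `copy_bounded` is hypothetical; only `strncpy` and `strlen` are standard library calls. Note that `strncpy` alone does not add a terminator when the source is too long, so the sketch writes one explicitly:

```c
#include <stddef.h>
#include <string.h>

/* Copy src into dst without ever writing more than dst_size bytes.
   Returns 1 if the whole string fit, 0 if it was truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst_size == 0) {
        return 0;                       /* no room, not even for the nul */
    }
    strncpy(dst, src, dst_size - 1);    /* copies at most dst_size - 1 chars */
    dst[dst_size - 1] = '\0';           /* strncpy may leave dst unterminated */
    return strlen(src) < dst_size;
}
```

The caller has to know and pass `dst_size` on every call, which is the bookkeeping a length-prefixed string carries with the data itself.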

Length-prefixed strings are also used extensively in file formats because, obviously, it is useful to know the number of bytes to read up front.
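For illustration, here is a small C sketch of reading such a record from a byte buffer. The record layout (a one-byte length followed by the payload) and the `read_record` name are assumptions made for this example, not any particular file format:

```c
#include <stddef.h>
#include <stdint.h>

/* Read one record: a 1-byte length followed by that many payload bytes.
   On success, stores the payload pointer and length and returns the number
   of bytes consumed. Returns 0 if the buffer is too short. */
static size_t read_record(const uint8_t *buf, size_t avail,
                          const uint8_t **payload, size_t *len) {
    if (avail < 1) {
        return 0;                 /* not even a length byte */
    }
    size_t n = buf[0];
    if (avail - 1 < n) {
        return 0;                 /* declared length exceeds available data */
    }
    *payload = buf + 1;
    *len = n;
    return 1 + n;
}
```

Because the length arrives first, the reader can validate it against the remaining data before touching the payload, and the payload may contain any byte value, including zero.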
