Why should strtok() be deprecated?

Question

I hear from a lot of programmers that the use of strtok may be deprecated in the near future. Some say it already is. Why is it a bad choice? strtok() works great for tokenizing a given string. Does it have anything to do with time and space complexity? The best link I found on the internet was this. But that doesn't seem to satisfy my curiosity. Please suggest alternatives if possible.

Recommended answer

Why is it a bad choice?

The fundamental technique for solving problems by programming is to construct abstractions which can be used reliably to solve sub-problems, and then compose solutions to those sub-problems into solutions to larger problems.

strtok's behaviour works directly against these goals in a variety of ways; it is a poor abstraction that is unreliable because it composes poorly.

The fundamental problem of tokenization is: given a position in a string, give the position of the end of the token beginning at that position. If strtok did only that, it would be great. It would have a clear abstraction, it would not rely on hidden global state, it would not modify its inputs.
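As a rough illustration of that "position in, position out" idea, here is a minimal sketch built on the standard `strspn` and `strcspn` functions. The names `token_start` and `token_end` are hypothetical helpers invented for this example, not part of any library, and both assume `pos` is no greater than the length of `s`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first non-delimiter
 * character at or after pos. Assumes pos <= strlen(s). */
static size_t token_start(const char *s, size_t pos, const char *delims)
{
    assert(s != NULL && delims != NULL);
    return pos + strspn(s + pos, delims);
}

/* Hypothetical helper: given the offset where a token begins, returns the
 * offset one past its last character. The input string is only read,
 * never written, and no state survives between calls. */
static size_t token_end(const char *s, size_t pos, const char *delims)
{
    assert(s != NULL && delims != NULL);
    return pos + strcspn(s + pos, delims);
}
```

Because each call takes everything it needs as arguments, the caller is free to switch delimiter sets, back up, or tokenize two strings at once without one loop disturbing another.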

To see the limitations of strtok, imagine trying to tokenize a language where we wish to separate tokens by spaces, unless the token is enclosed in " ", in which case we wish to apply a different tokenization rule to the contents of the quoted area, and then pick up with the space separation rule after. strtok composes very poorly with itself, and is therefore only useful for the most trivial of tokenization tasks.
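A simpler case than quoted strings already shows the composition problem. The sketch below uses a hypothetical function, `count_outer_fields`, that walks `;`-separated fields with strtok and optionally splits each field on `,` with a second strtok loop. Since both loops share the same hidden state, the inner loop is expected to cut the outer loop short.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical demo: counts how many ';'-separated fields the outer loop
 * visits. When split_inner is nonzero, each field is additionally split
 * on ',' with strtok, which overwrites the saved position that the outer
 * loop depends on. buf must be writable, because strtok modifies it. */
static int count_outer_fields(char *buf, int split_inner)
{
    assert(buf != NULL);
    int fields = 0;
    for (char *field = strtok(buf, ";"); field != NULL; field = strtok(NULL, ";")) {
        fields++;
        if (split_inner) {
            char *item = strtok(field, ",");
            while (item != NULL) {
                item = strtok(NULL, ","); /* consume the inner tokens */
            }
        }
    }
    return fields;
}
```

With the input `"a,b;c,d;e,f"`, the plain loop visits three fields, while the nested version visits only the first one: the inner loop leaves strtok's saved position at the end of `"a,b"`, so the outer loop's next call finds nothing. In both cases the buffer has been overwritten with NUL bytes along the way.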

Does it have anything to do with time and space complexity?

No.

Suggest any alternatives if possible.

Lexers are not hard to write; just write one!

Bonus points if you write an immutable lexer. An immutable lexer is a little struct that contains a reference to the string being lexed, the current position of the lexer, and any state needed by the lexer. To extract a token you call a "next token" method, pass in the lexer, and you get back the token and a new lexer. The new lexer can then be used to lex the next token, and you discard the previous lexer if you wish.
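One possible shape for such a lexer in C is sketched below. The answer does not prescribe an implementation, so the type and function names (`Lexer`, `Token`, `LexResult`, `lexer_new`, `lexer_next`) are assumptions made for illustration, and the only rule implemented is "tokens are separated by spaces".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical immutable lexer: a reference to the text plus a position. */
typedef struct {
    const char *text; /* string being lexed; never modified */
    size_t pos;       /* current offset into text */
} Lexer;

/* A token is a slice of the original text, not a NUL-terminated copy. */
typedef struct {
    const char *start;
    size_t len;       /* 0 means the input is exhausted */
} Token;

typedef struct {
    Token token;
    Lexer next;       /* lexer positioned just after the token */
} LexResult;

static Lexer lexer_new(const char *text)
{
    assert(text != NULL);
    Lexer lx = { text, 0 };
    return lx;
}

/* Takes the lexer by value and returns the token together with a new
 * lexer. The lexer passed in is left untouched and can be reused. */
static LexResult lexer_next(Lexer lx)
{
    size_t begin = lx.pos + strspn(lx.text + lx.pos, " ");
    size_t end = begin + strcspn(lx.text + begin, " ");
    LexResult r = { { lx.text + begin, end - begin }, { lx.text, end } };
    return r;
}
```

Calling `lexer_next` twice on the same `Lexer` value yields the same token both times, which is what makes backtracking or switching to a different rule for a quoted region straightforward: keep the old lexer and try again from it.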

The immutable lexer technique is easier to reason about than lexers which modify state. And you can debug them by saving the discarded lexers in a list, and now you have the complete history of tokenization operations open to inspection at once.
