How to check whether input is a string in Erlang?

Question

I would like to write a function that checks whether the input is a string, like this:

is_string(Input) ->
  case check_if_string(Input) of
    true  -> {ok, Input};
    false -> error
  end.

But I found it is tricky to check whether the input is a string in Erlang. The string definition in Erlang is here: http://erlang.org/doc/man/string.html.

Any suggestions?

Thanks.

Recommended answer

In Erlang a string can be actually quite a few things, so there are a few ways to do this depending on exactly what you mean by "a string". It is worth bearing in mind that every sort of string in Erlang is a list of character or lexeme values of some sort.

Encodings are not simple things, particularly when Unicode is involved. Characters can be almost arbitrarily high values, lexemes are globbed together in deep lists of integers, and Erlang iolist()s (which are super useful) are deep lists of mixed integer and binary values that get automatically flattened and converted during certain operations. If you are dealing with anything other than flat lists of printable ASCII values then I strongly recommend you read these:

  • Unicode module docs
  • String module docs
  • IO Library module docs

So... this is not a very simple question.
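
To make the deep-list point concrete, here is a minimal sketch. The module name `iolist_demo` and the sample data are made up for illustration. It shows a value that prints as ordinary text yet is not a flat list of characters, so a flat-list check rejects it until it has been flattened.

```erlang
-module(iolist_demo).
-export([flatten_demo/0]).

%% Hypothetical sample: nested lists mixing integers and binaries.
flatten_demo() ->
    Deep = ["abc", [$d, <<"ef">>], [[<<"g">>]]],
    %% Flatten to a binary, as happens when an iolist is written out.
    Bin = iolist_to_binary(Deep),
    %% Flatten to a plain list of code points.
    Flat = unicode:characters_to_list(Deep),
    %% The nested value is not a flat character list; the flattened one is.
    {Bin, Flat, io_lib:char_list(Deep), io_lib:char_list(Flat)}.
```

Calling `iolist_demo:flatten_demo()` should return `{<<"abcdefg">>, "abcdefg", false, true}`, which is why "is this a string?" depends on whether you check before or after flattening.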

How to deal with all this mess?

Quick answer that always works: Consider the origin of the data.

You should know what kind of data you are dealing with, whether it is coming over a socket or from a file, or especially if you are generating it yourself. On the edges of your system you may need some help purifying data, though, because network clients send all sorts of random trash from time to time.

Some helper functions for the most common cases live in the io_lib module:

  • io_lib:char_list/1: Returns true if the input is a list of characters in the unicode range.
  • io_lib:deep_char_list/1: Returns true if the input is a deep list of legal chars.
  • io_lib:deep_latin1_char_list/1: Returns true if the input is a deep list of Latin-1 characters (code points 0 to 255, a superset of ASCII).
  • io_lib:latin1_char_list/1: Returns true if the input is a flat list of Latin-1 characters (90% of the time this is what you're looking for)
  • io_lib:printable_latin1_list/1: Returns true if the input is a list of printable Latin-1 (If the above isn't what you wanted, 9% of the time this is the one you want)
  • io_lib:printable_list/1: Returns true if the input is a flat list of printable chars.
  • io_lib:printable_unicode_list/1: Returns true if the input is a flat list of printable unicode chars (for that 1% of the time that this is your problem -- except that for some of us, myself included here in Japan, this covers 99% of my input checking cases).
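
As a minimal sketch, the function from the question could be filled in with one of these helpers. The module name `string_check` is made up for illustration, and picking `io_lib:printable_unicode_list/1` assumes that "string" means a flat list of printable Unicode characters; swap in another predicate from the list if your data calls for it.

```erlang
-module(string_check).
-export([is_string/1]).

%% Assumption: "string" = flat list of printable Unicode characters.
%% Binaries, deep lists, and lists containing non-printable integers
%% are rejected.
is_string(Input) ->
    case io_lib:printable_unicode_list(Input) of
        true  -> {ok, Input};
        false -> error
    end.
```

With this definition `string_check:is_string("hello")` returns `{ok, "hello"}`, while `string_check:is_string(<<"hello">>)` and `string_check:is_string([1, 2, 3])` return `error`. Note that the empty list also passes, because `[]` and `""` are the same term in Erlang.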

For more particular cases you can either use a regex from the re module or write your own recursive function that zips through a string for those special cases where a regex either doesn't fit, is impossible, or could make you vulnerable to regex attacks.
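
A hand-written recursive check might look like the sketch below. The rule it enforces (a non-empty flat list of lowercase ASCII letters, digits, or underscores) and the names `string_rules` and `is_identifier/1` are hypothetical, chosen only to show the shape of such a function.

```erlang
-module(string_rules).
-export([is_identifier/1]).

%% Hypothetical rule: non-empty flat list of a-z, 0-9 or underscore.
is_identifier([C | Rest]) ->
    is_ident_char(C) andalso all_ident(Rest);
is_identifier(_) ->
    false.

%% Walk the rest of the list one character at a time.
all_ident([]) -> true;
all_ident([C | Rest]) -> is_ident_char(C) andalso all_ident(Rest);
all_ident(_) -> false.  %% improper list tail

%% Character-level test; anything that is not an allowed integer fails.
is_ident_char(C) when is_integer(C), C >= $a, C =< $z -> true;
is_ident_char(C) when is_integer(C), C >= $0, C =< $9 -> true;
is_ident_char($_) -> true;
is_ident_char(_) -> false.
```

Because each character is visited exactly once, the running time stays linear in the input length, which avoids the backtracking blow-up that some regular expressions suffer on hostile input.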
