Erlang: parse a string into data types using regex
Problem description
I'm trying to write a parser in Erlang that recognizes the data types inside a string. After searching, I couldn't find an existing question like mine:
Original string:
atom1,"string2,,\"\",",{tuple3, "s pa ces \"", {[test]},"_#",test},<<"binary4\",,>>">>, #{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>
String that needs to be parsed (the same text as an Erlang string literal):
"atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", {[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, #{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>"
Expected output:
+ number of params: 7
+ value ------> type
- atom1 ------> Atom
- "string2,,\"\"," ------> String
- {tuple3, "s pa ces \"", {[test]},"_#",test} ------> Tuple
- <<"binary4\",,>>">> ------> Binary
- #{map5=>5, element=>{e1,e2}} ------> Map
- #record6{r1 = 1, r2 = 2} ------> Record
- <<300:16>> ------> Binary
But my current code doesn't work as expected. Here it is:
comma_parser(Params) ->
    {ok, R} = re:compile("(\".*?\"|[^\",\\s]+)(?=\\s*,|\\s*$)"),
    {match, Matches} = re:run(Params, R, [{capture, [1], list}, global]),
    ?DEBUG("truonggv1 - comma_parser: Matches: ~p~n", [Matches]),
    [M || [M] <- Matches].
Current output:
+ number of params: 14
+ value ------> type
- atom1 ------> Atom
- "string2,,\"\" ------> String
- ",{tuple3, "s pa ces \"" ------> String
- {[test]} ------> Tuple
- "_#" ------> String
- test} ------> Atom
- "binary4\" ------> String
- >> ------> Atom
- #{map5=>5 ------> Map
- element=>{e1 ------> Atom
- e2}} ------> Atom
- 1 ------> Atom
- 2} ------> Atom
- <<300:16>> ------> Binary
Does anyone know how to fix this?
Update: here is my code, where Params is the "string that needs to be parsed" noted above:
check_params_by_comma(Params) ->
    case string:str(Params, ",") of
        0 ->
            Result = Params;
        1 ->
            Result = "param starts with character ',' ~n";
        _Comma_Pos ->
            Parse_String = comma_parser(Params),
            Result = "number of params: " ++ integer_to_list(length(Parse_String))
                ++ "\n\n\r\t value ------> type \n\r"
                ++ "\t*********************\n\r"
                ++ ["\t" ++ X ++ " ------> " ++ check_type(X) ++ "\n\r" || X <- Parse_String]
    end,
    Result.

check_type(X) ->
    Binary = string:str(X, "<<"),
    String = string:str(X, "\""),
    Tuple = string:str(X, "{"),
    List = string:str(X, "["),
    Map = string:str(X, "#{"),
    case X of
        _ when 1 == Binary -> "Binary";
        _ when 1 == String -> "String";
        _ when 1 == Tuple -> "Tuple";
        _ when 1 == List -> "List";
        _ when 1 == Map -> "Map";
        _ -> "Atom"
    end.

comma_parser(Params) ->
    {ok, R} = re:compile("(\".*?\"|[^\",\\s]+)(?=\\s*,|\\s*$)"),
    {match, Matches} = re:run(Params, R, [{capture, [1], list}, global]),
    [M || [M] <- Matches].
Recommended answer
I'm not entirely sure I understand what you're trying to achieve, but let me tell you what I did with your input and let's see if that helps you at all. Your situation seemed to be desperately calling for erl_scan:string and erl_parse:parse_exprs, so that's the first thing I tried.
This was my original version of the parsing:
-module(x).
-export([test/0, check_params_by_comma/1]).

test() ->
    Input =
        "atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", "
        "{[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, "
        "#{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>",
    io:format("~p~n", [check_params_by_comma(Input)]).

check_params_by_comma(Params) ->
    %% The parser expects the expression list to end with a dot token.
    {ok, Tokens, _} = erl_scan:string(Params ++ "."),
    %% Exprs is a list of abstract forms, one per comma-separated term.
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.
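Note that the pattern matches above crash with a badmatch on malformed input, such as a string that starts with a comma (a case the question's code checked for explicitly). A minimal sketch of a more forgiving variant follows; the module name safe_parse and the function parse_params/1 are hypothetical and not part of the original answer. It relies on the error tuples returned by erl_scan:string and erl_parse:parse_exprs, whose error info has the shape {Location, Module, Descriptor}.

```erlang
-module(safe_parse).
-export([parse_params/1]).

%% Hypothetical helper: returns {ok, Exprs} or {error, Message}
%% instead of crashing on input that cannot be scanned or parsed.
parse_params(Params) ->
    case erl_scan:string(Params ++ ".") of
        {ok, Tokens, _EndLoc} ->
            case erl_parse:parse_exprs(Tokens) of
                {ok, Exprs} ->
                    {ok, Exprs};
                {error, {_Loc, Mod, Desc}} ->
                    %% Mod is the module that knows how to describe the error.
                    {error, lists:flatten(Mod:format_error(Desc))}
            end;
        {error, {_Loc, Mod, Desc}, _EndLoc} ->
            {error, lists:flatten(Mod:format_error(Desc))}
    end.
```

With this variant, safe_parse:parse_params(",a") returns an {error, Message} tuple rather than raising, so the caller can decide how to report it.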
Of course that was not all, since you wanted a different kind of output, but we were almost there. Copying the presentation code from your original question, I had to use erl_prettypr:format/1 to render the terms, and I ended up with something like:
-module(x).
-export([test/0, check_params_by_comma/1]).

test() ->
    Input =
        "atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", "
        "{[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, "
        "#{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>",
    io:format("~s~n", [check_params_by_comma(Input)]).

check_params_by_comma(Params) ->
    Parse_String = comma_parser(Params),
    "number of params: " ++ integer_to_list(length(Parse_String))
        ++ "\n\n\r\t value ------> type \n\r"
        ++ "\t*********************\n\r"
        ++ ["\t" ++ erl_prettypr:format(X) ++ " ------> " ++ check_type(X) ++ "\n\r" || X <- Parse_String].

comma_parser(Params) ->
    {ok, Tokens, _} = erl_scan:string(Params ++ "."),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    Exprs.

check_type({Type, _, _}) -> atom_to_list(Type);
check_type({Type, _, _, _}) -> atom_to_list(Type).
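The check_type/1 clauses work because every parsed expression is an abstract-format tuple whose first element is a tag naming the kind of term, for example {atom, Anno, Value} or {record, Anno, Name, Fields}. The following sketch makes those tags visible; the module name forms_demo and the function kinds/1 are made up for illustration.

```erlang
-module(forms_demo).
-export([kinds/1]).

%% Hypothetical helper: parses a comma-separated list of Erlang terms and
%% returns the tag (first tuple element) of each resulting abstract form.
kinds(Str) ->
    {ok, Tokens, _EndLoc} = erl_scan:string(Str ++ "."),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    [element(1, E) || E <- Exprs].
```

For instance, forms_demo:kinds("a, \"s\", {t}, <<1>>, #{k => v}, #r{f = 1}, [x]") returns [atom, string, tuple, bin, map, record, cons]. Binaries are tagged bin and non-empty lists are tagged cons, so the tags do not always match the type names used in the question.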
I think this should be enough to solve your problem but, as a bonus track, let me refactor this a bit using iolists to obtain almost exactly the expected output:
-module(x).
-export([test/0, check_params_by_comma/1]).

test() ->
    Input =
        "atom1,\"string2,,\\\"\\\",\",{tuple3, \"s pa ces \\\"\", "
        "{[test]},\"_#\",test},<<\"binary4\\\",,>>\">>, "
        "#{map5=>5, element=>{e1,e2}}, #record6{r1 = 1, r2 = 2} , <<300:16>>",
    io:format("~s~n", [check_params_by_comma(Input)]).

check_params_by_comma(Params) ->
    {ok, Tokens, _} = erl_scan:string(Params ++ "."),
    {ok, Exprs} = erl_parse:parse_exprs(Tokens),
    [
        io_lib:format("+ number of params: ~p~n", [length(Exprs)]),
        "+ value ------> type \n"
        | lists:map(fun format_expr/1, Exprs)
    ].

format_expr(Expr) ->
    io_lib:format(
        "\t- ~s ------> ~s~n",
        [erl_prettypr:format(Expr), string:titlecase(type(Expr))]
    ).

%% or you can do type(Expr) -> atom_to_list(hd(tuple_to_list(Expr))).
type({Type, _, _}) -> atom_to_list(Type);
type({Type, _, _, _}) -> atom_to_list(Type).
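One reason the result is only "almost" the expected output is that title-casing the tag yields "Bin" for binaries, where the question asked for "Binary". If the exact labels matter, a small mapping could be layered on top. This is only a sketch: the module type_label and the function label/1 are hypothetical, and the choice of "List" for cons and nil is my assumption, since the question's expected output contains no top-level list.

```erlang
-module(type_label).
-export([label/1]).

%% Hypothetical helper: maps the tag of an abstract form to the label
%% used in the question's expected output.
label(Expr) when is_tuple(Expr) ->
    case element(1, Expr) of
        bin -> "Binary";
        %% Assumption: both non-empty and empty lists are labelled "List".
        cons -> "List";
        nil -> "List";
        %% Everything else: title-case the tag, e.g. atom -> "Atom".
        Tag -> string:titlecase(atom_to_list(Tag))
    end.
```

In format_expr/1, string:titlecase(type(Expr)) could then be swapped for type_label:label(Expr).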
Hope this helps :)