How to support Chinese in an HTTP request body? (Erlang)


Question

URL = "http://example.com",
Header = [],
Type = "application/json",
Content = "我是中文",

Body = lists:concat(["{\"type\":\"0\",\"result\":[{\"url\":\"test.cn\",\"content\":\"", unicode:characters_to_list(Content), "\"}]}"]),
lager:debug("URL:~p, Body:~p~n", [URL, Body]),
HTTPOptions = [],
Options = [],
Response = httpc:request(post, {URL, Header, Type, Body}, HTTPOptions, Options),

But the HTTP request body received by the HTTP server is not 我是中文. How can I solve this problem?

Recommended answer

Luck of the Encoding

You must take special care to ensure input is what you think it is because it may differ from what you expect.

This answer applies to the Erlang release that I'm running which is R16B03-1. I'll try to get all of the details in here so you can test with your own install and verify.

If you don't take specific action to change it, a string will be interpreted as follows:

TerminalContent = "我是中文",
TerminalContent = [25105,26159,20013,25991].

In the terminal the string is interpreted as a list of unicode characters.

BytewiseContent = "我是中文",
BytewiseContent = [230,136,145,230,152,175,228,184,173,230,150,135].

In a module, the default encoding is latin1, and strings containing unicode characters are interpreted as bytewise lists (of UTF8 bytes).

If you use data encoded like BytewiseContent, unicode:characters_to_list/1 will double-encode the Chinese characters, and mojibake such as ææ¯ä will be sent to the server where you expected 我是中文.
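
The double-encoding effect can be reproduced without depending on any source-file encoding by writing the characters as integer code points. The following is a minimal sketch; the module name `encoding_demo` and its function names are hypothetical and not part of the original answer.

```erlang
-module(encoding_demo).
-export([codepoints/0, utf8_bytes/0, double_encoded/0]).

%% "我是中文" written as Unicode code points, so the result does not
%% depend on how this source file is encoded.
codepoints() ->
    [25105, 26159, 20013, 25991].

%% The same text as a bytewise list: one integer per UTF-8 byte.
utf8_bytes() ->
    binary_to_list(unicode:characters_to_binary(codepoints())).

%% Treating those bytes as if they were characters and encoding them
%% again turns every byte above 127 into two bytes.
double_encoded() ->
    unicode:characters_to_binary(utf8_bytes()).
```

Here `encoding_demo:utf8_bytes()` yields the 12-element list shown for BytewiseContent, while `byte_size(encoding_demo:double_encoded())` is 24 rather than 12, which is the doubled data the server would end up receiving.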


To avoid double encoding:

  1. Specify the encoding for each source file and term file.
  2. If you run an erl command line, ensure it is set up to use unicode.
  3. If you read data from files, translate the bytes from the bytewise encoding to unicode before processing (this goes for binary data acquired using httpc:request/N as well).
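
Item 3 could look roughly like the sketch below, which reads a file as raw bytes and decodes it into a list of unicode code points. The module name `utf8_file`, the function `read_chars/1`, and the error atoms are hypothetical, and the sketch assumes the file is UTF-8 encoded.

```erlang
-module(utf8_file).
-export([read_chars/1]).

%% Read a file as raw bytes and decode it from UTF-8 into a list of
%% Unicode code points. Assumes the file content is UTF-8.
read_chars(Path) ->
    case file:read_file(Path) of
        {ok, Bin} ->
            case unicode:characters_to_list(Bin, utf8) of
                Chars when is_list(Chars) ->
                    {ok, Chars};
                {error, _Decoded, _Rest} ->
                    %% the bytes are not valid UTF-8
                    {error, invalid_utf8};
                {incomplete, _Decoded, _Rest} ->
                    %% the data ends in the middle of a character
                    {error, truncated_utf8}
            end;
        {error, Reason} ->
            {error, Reason}
    end.
```

The same decoding step applies to a binary response body returned by httpc:request/N: decode it once at the boundary, then work with code points everywhere else.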

If you embed unicode characters in your module, ensure that you indicate as much by commenting within the first two lines of your module:

%% -*- coding: utf-8 -*-

This will change the way the module interprets the string such that:

UnicodeContent = "我是中文",
UnicodeContent = [25105,26159,20013,25991].
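
As a quick check that the coding comment took effect, a tiny module along these lines can be compiled and queried. The module name `coding_demo` and its functions are hypothetical, and the sketch assumes the file itself is saved as UTF-8.

```erlang
%% -*- coding: utf-8 -*-
-module(coding_demo).
-export([content/0, is_codepoints/0]).

%% A string literal containing non-ASCII characters.
content() ->
    "我是中文".

%% true when the literal was read as four Unicode code points,
%% false when it was read as twelve UTF-8 bytes.
is_codepoints() ->
    content() =:= [25105, 26159, 20013, 25991].
```

If `coding_demo:is_codepoints()` returns false, the literal was read bytewise and the coding comment (or the file's actual encoding) needs another look.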

Once you have ensured that you are concatenating characters and not bytes, the concatenation is safe. Don't use unicode:characters_to_list/1 to convert your string/list until the whole thing has been built up.

The following function works as expected when given a Url and a list of unicode character Content:

http_post_content(Url, Content) ->
    ContentType = "application/json",
    %% Concat the list of (character) lists
    Body = lists:concat(["{\"content\":\"", Content, "\"}"]),
    %% Explicitly encode to UTF8 before sending
    UnicodeBin = unicode:characters_to_binary(Body),
    httpc:request(post,
        {
            Url,
            [],          % HTTP headers
            ContentType, % content-type
            UnicodeBin   % the body as binary (UTF8)
            },
        [],            % HTTP Options
        [{body_format,binary}] % indicate the body is already binary
        ).

To verify results I wrote the following HTTP server using node.js and express. The sole purpose of this dead-simple server is to sanity check the problem and solution.

var express = require('express'),
bodyParser = require('body-parser'),
util = require('util');

var app = express();

app.use(bodyParser.json()); // parse JSON bodies; the bare bodyParser() call is deprecated

app.get('/', function(req, res){
  res.send('You probably want to perform an HTTP POST');
});

app.post('/', function(req, res){
  util.log("body: "+util.inspect(req.body, false, 99));
  res.json(req.body);
});

app.listen(3000);

Again in Erlang, the following function will check to ensure that the HTTP response contains the echoed JSON, and ensures the exact unicode characters were returned.

verify_response({ok, {{_, 200, _}, _, Response}}, SentContent) ->
    %% use jiffy to decode the JSON response
    {Props} = jiffy:decode(Response),
    %% pull out the "content" property value
    ContentBin = proplists:get_value(<<"content">>, Props),
    %% convert the binary value to unicode characters,
    %% it should equal what we sent.
    case unicode:characters_to_list(ContentBin) of
        SentContent -> ok;
        Other ->
            {error, [
                {expected, SentContent},
                {received, Other}
                ]}
    end;
verify_response(Unexpected, _) ->
    {error, {http_request_failed, Unexpected}}.

The complete example.erl module is posted in a Gist.

Once you've got the example module compiled and an echo server running you'll want to run something like this in an Erlang shell:

inets:start().

Url = example:url().

Content = example:content().

Response = example:http_post_content(Url, Content).

If you've got jiffy set up you can also verify the content made the round trip:

example:verify_response(Response, Content).

You should now be able to confirm round-trip encoding of any unicode content.

While I explained the encodings above you will have noticed that TerminalContent, BytewiseContent, and UnicodeContent are all lists of integers. You should endeavor to code in a manner that allows you to be certain what you have in hand.

The oddball encoding is bytewise which may turn up when working with modules that are not "unicode aware". Erlang's guidance on working with unicode mentions this near the bottom under the heading Lists of UTF-8 Bytes. To translate bytewise lists use:

%% from http://www.erlang.org/doc/apps/stdlib/unicode_usage.html
utf8_list_to_string(StrangeList) ->
    unicode:characters_to_list(list_to_binary(StrangeList)).
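
To see the helper in action, it can be wrapped in a small module and fed the bytewise list from earlier. The module name `bytewise_demo` and the `demo/0` function are hypothetical additions for illustration only.

```erlang
-module(bytewise_demo).
-export([utf8_list_to_string/1, demo/0]).

%% Convert a list of UTF-8 bytes into a list of Unicode code points.
utf8_list_to_string(StrangeList) ->
    unicode:characters_to_list(list_to_binary(StrangeList)).

%% Feed in the bytewise form of "我是中文" and get code points back.
demo() ->
    utf8_list_to_string([230,136,145,230,152,175,
                         228,184,173,230,150,135]).
```

Calling `bytewise_demo:demo()` gives back the same four code points shown for TerminalContent and UnicodeContent, so the result can safely be concatenated with other character lists.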






My Setup

As far as I know, I don't have local settings that modify Erlang's behavior. My Erlang is R16B03-1, built and distributed by Erlang Solutions; my machine runs OS X 10.9.2.
