Python POST request encoding


Problem description

Here's the situation: I'm sending POST requests and trying to fetch the response with Python. The problem is that non-Latin letters come out distorted. This doesn't happen when I fetch the same page through a direct link (with no search results), but POST requests don't generate a link.

Here's what I do:

import urllib
import urllib2
url = 'http://donelaitis.vdu.lt/main_helper.php?id=4&nr=1_2_11'
data = 'q=bus&ieskoti=true&lang1=en&lang2=en+-%3E+lt+%28+71813+lygiagre%C4%8Di%C5%B3+sakini%C5%B3+%29&lentele=vertikalus&reg=false&rodyti=dalis&rusiuoti=freq' 
req = urllib2.Request(url, data)
response = urllib2.urlopen(req)
the_page = response.read()
file = open("pagesource.txt", "w")
file.write(the_page)
file.close()

Whenever I try

thepage = the_page.encode('utf-8')

I get this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc5 in position 1008: ordinal not in range(128)

Whenever I try to change the response header to Content-Type:text/html;charset=utf-8 by doing

response['Content-Type'] = 'text/html;charset=utf-8'

I get this error:

AttributeError: addinfourl instance has no attribute '__setitem__'

My question: is it possible to edit or remove response or request headers? If not, is there another way to solve this problem other than copying the source into Notepad++ and fixing the encoding manually?

I'm new to Python and data mining, and I really hope you'll let me know if I'm doing something wrong.

Thanks

Solution

Why don't you try thepage = the_page.decode('utf-8') instead of encode? What you want is to move from UTF-8 encoded text to unicode (encoding-agnostic) internal strings.
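
To make the difference concrete, here is a minimal sketch. It does not contact the server: the byte string below is a hypothetical stand-in for what response.read() returns, chosen so that it contains the 0xc5 byte from the traceback. It assumes the page really is UTF-8 encoded, which the question suggests but does not confirm.

```python
# -*- coding: utf-8 -*-

# Hypothetical stand-in for the_page = response.read().
# response.read() returns raw bytes, not text; here they are UTF-8 bytes.
the_page = u"sakini\u0173".encode("utf-8")  # the bytes 'sakini\xc5\xb3'

# In Python 2, calling .encode() on a byte string first decodes it with the
# default 'ascii' codec behind the scenes. This reproduces that hidden step
# explicitly, which is why the error is a UnicodeDecodeError.
implicit_error = None
try:
    the_page.decode("ascii")
except UnicodeDecodeError as exc:
    implicit_error = exc

# The suggested fix: decode the bytes into a unicode string.
text = the_page.decode("utf-8")

print(repr(text))
print(implicit_error)
```

To save the decoded text, io.open("pagesource.txt", "w", encoding="utf-8") is one option that accepts unicode strings in both Python 2.6+ and Python 3, so no manual re-encoding is needed before writing.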
