UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 34: unexpected end of data

Views: 5259 | Posted: 2016/11/19 15:56:19 | Tags: python, utf-8, character-encoding, decoding


Problem description

I'm trying to write a scraper, but I'm having issues with encoding. When I tried to copy the string I was looking for into my text file, Python 2.7 told me it didn't recognize the encoding, despite there being no special characters. I don't know whether that's useful info.

My code looks like this:

from urllib import FancyURLopener
import os

class MyOpener(FancyURLopener): #spoofs a real browser on Window
   version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

print "What is the webaddress?"
webaddress = raw_input("8::>")

print "Folder Name?"
foldername = raw_input("8::>")

if not os.path.exists(foldername):
    os.makedirs(foldername)

def urlpuller(start, page):
   while page[start]!= '"':
      start += 1
   close = start
   while page[close]!='"':
      close += 1
   return page[start:close]

myopener = MyOpener()

response = myopener.open(webaddress)
site = response.read()

nexturl = ''
counter = 0

while(nexturl!=webaddress):
   counter += 1
   start = 0

   for i in range(len(site)-35):
       if site[i:i+35].decode('utf-8') == u'<img id="imgSized" class="slideImg"':
         start = i + 40
         break
   else:
      print "Something's broken, chief. Error = 1"

   next = 0

   for i in range(start, 8, -1):
      if site[i:i+8] == u'<a href=':
         next = i
         break
   else:
      print "Something's broken, chief. Error = 2"

   nexturl = urlpuller(next, site)

   myopener.retrieve(urlpuller(start,site),foldername+'/'+foldername+str(counter)+'.jpg')

print("Retrieval of "+foldername+" completed.")

When I run it against the site I'm scraping, it raises this error:

Traceback (most recent call last):
  File "yada/yadayada/Python/scraper.py", line 37, in <module>
    if site[i:i+35].decode('utf-8') == u'<img id="imgSized" class="slideImg"':
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 34: unexpected end of data

When pointed at http://google.com, it worked just fine.

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">

So the page declares UTF-8, but when I try to decode using UTF-8, as you can see, it does not work.

Any suggestions?

Solution

site[i:i+35].decode('utf-8')

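You cannot cut the bytes you've received at arbitrary offsets and then ask the UTF-8 codec to decode each piece. UTF-8 is a multibyte encoding: one character takes anywhere from 1 to 4 bytes (the original design allowed up to 6, but RFC 3629 limits it to 4). Byte 0xc3 is the lead byte of a two-byte sequence, and at position 34 it is the last byte of your 35-byte slice, so its continuation byte has been cut off. If you chop a character in half like that and ask Python to decode it, you get the "unexpected end of data" error.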
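As a rough illustration, the sketch below reproduces the error on a made-up byte string and shows two ways around it: searching the raw bytes for an ASCII-only marker, or decoding the complete response once before searching. It assumes Python 3. The sample `page` content and all variable names are hypothetical, not taken from the site in the question.

```python
# Hypothetical page: 34 ASCII bytes, then "é" (bytes 0xC3 0xA9), then the tag.
page = (
    "x" * 34
    + "\u00e9"
    + '<img id="imgSized" class="slideImg" src="a.jpg">'
).encode("utf-8")

# A 35-byte window starting at 0 ends on 0xC3, the first half of "é".
try:
    page[0:35].decode("utf-8")
    error_reason, error_position = None, None
except UnicodeDecodeError as exc:
    error_reason, error_position = exc.reason, exc.start

# Option 1: the marker is pure ASCII, so search the bytes without decoding.
marker = b'<img id="imgSized" class="slideImg"'
byte_index = page.find(marker)

# Option 2: decode the whole document once, then search the resulting text.
text = page.decode("utf-8")
text_index = text.find(marker.decode("ascii"))
```

The two indexes differ because `byte_index` counts bytes while `text_index` counts characters, so an offset found in one form should not be used to slice the other.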

Look into a tool that has this built in. BeautifulSoup and lxml are two options.
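To give a feel for what a parser-based approach looks like, here is a minimal sketch that uses the standard library's `html.parser` in place of BeautifulSoup or lxml, purely so that it runs without extra installs. Those libraries offer higher-level search APIs for the same job. It assumes Python 3. The `SlideImageFinder` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class SlideImageFinder(HTMLParser):
    """Collects the src of every <img id="imgSized"> tag (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and attributes.get("id") == "imgSized":
            self.sources.append(attributes.get("src"))


# Made-up markup standing in for the downloaded page bytes.
raw = (
    '<a href="/next">'
    '<img id="imgSized" class="slideImg" src="/pics/caf\u00e9.jpg">'
    "</a>"
).encode("utf-8")

finder = SlideImageFinder()
finder.feed(raw.decode("utf-8"))  # decode the complete document once
image_sources = finder.sources
```

Because the parser works on the decoded text and matches on tag and attribute names, it does not depend on fixed byte offsets or on the exact spacing of the markup.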
