Python asyncio/aiohttp: ValueError: too many file descriptors in select() on Windows

Problem description



Hello everyone, I'm having trouble understanding asyncio and aiohttp and getting the two to work together properly. Not only do I not fully understand what I'm doing, but I've now run into a problem that I have no idea how to solve.

I'm using Windows 10 64-bit, with the latest updates.

The following code uses asyncio to return a list of pages whose Content-Type header does not contain "html".

import asyncio
import aiohttp

MAXitems = 30

async def getHeaders(url, session, sema):
    async with session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except:
                        return url, False
            except:
                return url, False


def checkUrlsWithoutHtml(listOfUrls):
    headersWithoutHtml = set()
    while(len(listOfUrls) != 0):
        blockurls = []
        print(len(listOfUrls))
        items = 0
        for num in range(0, len(listOfUrls)):
            if num < MAXitems:
                blockurls.append(listOfUrls[num - items])
                listOfUrls.remove(listOfUrls[num - items])
                items +=1
        loop = asyncio.get_event_loop()
        semaphoreHeaders = asyncio.Semaphore(50)
        session = aiohttp.ClientSession()
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if False == header[1]:
                headersWithoutHtml.add(header)
    return headersWithoutHtml


listOfUrls = ['http://www.google.com', 'http://www.reddit.com']
headersWithoutHtml = checkUrlsWithoutHtml(listOfUrls)

for header in headersWithoutHtml:
    print(header[0])

When I run it with, let's say, 2000 URLs, it sometimes fails with something like:

data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 454, in run_until_complete
    self.run_forever()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 421, in run_forever
    self._run_once()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 1390, in _run_once
    event_list = self._selector.select(timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 323, in select
    r, w, _ = self._select(self._readers, self._writers, [], timeout)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\selectors.py", line 314, in _select
    r, w, x = select.select(r, w, w, timeout)
ValueError: too many file descriptors in select()

Note 1: I replaced my user name with USER in the paths.

Note 2: For whatever reason, reddit.com is reported as not containing HTML. This is a completely separate problem that I will try to solve myself, but if you notice any other inconsistency in my code that would fix it, please point it out.

Note 3: My code is badly structured because I've changed many things while trying to debug this problem, but I've had no luck.

I've heard somewhere that this is a Windows restriction and there is no way around it. The problem is that:

a) I simply don't understand what "too many file descriptors in select()" means.

b) What am I doing wrong that Windows can't handle? I've seen people push thousands of requests with asyncio and aiohttp, yet even with my chunking I can't push 30-50 without getting a ValueError.

Edit: It turns out that with MAXitems = 10 it hasn't crashed yet, but since I can't see a pattern, I have no idea why, or what that tells me.

Edit 2: Never mind. It just needed more time to crash, and it eventually did, even with MAXitems = 10.

Solution

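By default, the asyncio event loop on Windows can only watch a limited number of sockets, because it is built on the underlying select() API call. Here "file descriptors" means the sockets the loop is waiting on. Windows' own default for this limit (FD_SETSIZE) is 64; CPython's select module raises it to 512. Going over it raises "ValueError: too many file descriptors in select()".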
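The limit counts sockets that are open at the same time, not requests made over the program's lifetime. As a complementary measure (not part of the original answer), you can cap the number of in-flight requests with one asyncio.Semaphore shared by every task in a single gather() call, on a single event loop. The sketch below is an illustration under assumptions: `fake_request` is a hypothetical stand-in for `session.head(url)`, and `MAX_CONCURRENT = 50` is a hypothetical cap. It records the peak number of simultaneous "requests" to show that the cap holds. It avoids asyncio.run() so it also works on Python 3.6, the version in the traceback.

```python
import asyncio

MAX_CONCURRENT = 50  # hypothetical cap; keep it well below the select() limit


async def fake_request(url, sema, stats):
    """Hypothetical stand-in for `session.head(url)`; holds a 'socket' briefly."""
    async with sema:  # at most MAX_CONCURRENT coroutines are inside this block
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        try:
            await asyncio.sleep(0.01)  # simulated network round trip
            return url, True
        finally:
            stats["active"] -= 1  # runs before the semaphore is released


async def check_all(urls):
    # One semaphore for all tasks, created inside the running loop.
    sema = asyncio.Semaphore(MAX_CONCURRENT)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(fake_request(u, sema, stats) for u in urls))
    return results, stats["peak"]


if __name__ == "__main__":
    urls = ["http://example.com/{}".format(i) for i in range(2000)]
    loop = asyncio.new_event_loop()
    try:
        results, peak = loop.run_until_complete(check_all(urls))
    finally:
        loop.close()
    print(len(results), peak)  # prints: 2000 50
```

All 2000 tasks are scheduled at once, but no more than 50 of them hold a "socket" at any moment. With real aiohttp code, the same idea means creating one ClientSession for the whole run rather than one per chunk.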

To raise the limit, use ProactorEventLoop, which is built on I/O completion ports (IOCP) instead of select(). Instructions for installing it can be found here.
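Here is a minimal sketch of switching to the proactor loop, assuming Python 3.6/3.7 as in the question. asyncio.ProactorEventLoop only exists on Windows, so the code checks sys.platform and falls back to a regular new loop elsewhere. The helper name `make_event_loop` and the `demo` coroutine are hypothetical. From Python 3.8 onward, ProactorEventLoop is already the default event loop on Windows, so this step is only needed on older versions.

```python
import asyncio
import sys


def make_event_loop():
    """Hypothetical helper: create a platform-appropriate loop and make it current."""
    if sys.platform == "win32":
        # IOCP-based loop; not bound by select()'s socket limit.
        loop = asyncio.ProactorEventLoop()
    else:
        loop = asyncio.new_event_loop()
    # Later asyncio.get_event_loop() calls outside a coroutine return this loop.
    asyncio.set_event_loop(loop)
    return loop


async def demo():
    await asyncio.sleep(0)
    return "ok"


if __name__ == "__main__":
    loop = make_event_loop()
    try:
        print(loop.run_until_complete(demo()))  # prints: ok
    finally:
        loop.close()
```

Calling the helper once at startup, before any ClientSession is created, means the question's `asyncio.get_event_loop()` calls pick up the proactor loop. On Python 3.7, `asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())` (Windows only) achieves the same thing via the policy.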
