How to limit concurrency with Python asyncio?

Question

Let's assume we have a bunch of links to download, and each link may take a different amount of time to download. I'm allowed to download using at most 3 connections. Now, I want to make sure I do this efficiently using asyncio.

Here's what I'm trying to achieve: at any point in time, try to ensure that I have at least 3 downloads running.

    Connection 1: 1---------7---9---
    Connection 2: 2---4----6-----
    Connection 3: 3-----5---8-----

The numbers represent the download links, while the dashes represent waiting for the download.

Here is the code that I'm using right now:

    from random import randint
    import asyncio

    count = 0


    async def download(code, permit_download, no_concurrent, downloading_event):
        global count
        downloading_event.set()
        wait_time = randint(1, 3)
        print('downloading {} will take {} second(s)'.format(code, wait_time))
        await asyncio.sleep(wait_time)  # I/O, context will switch to main function
        print('downloaded {}'.format(code))
        count -= 1
        if count < no_concurrent and not permit_download.is_set():
            permit_download.set()


    async def main(loop):
        global count
        permit_download = asyncio.Event()
        permit_download.set()
        downloading_event = asyncio.Event()
        no_concurrent = 3
        i = 0
        while i < 9:
            if permit_download.is_set():
                count += 1
                if count >= no_concurrent:
                    permit_download.clear()
                loop.create_task(download(i, permit_download, no_concurrent, downloading_event))
                await downloading_event.wait()  # To force context to switch to download function
                downloading_event.clear()
                i += 1
            else:
                await permit_download.wait()
        await asyncio.sleep(9)


    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        try:
            loop.run_until_complete(main(loop))
        finally:
            loop.close()

And the output is as expected:

    downloading 0 will take 2 second(s)
    downloading 1 will take 3 second(s)
    downloading 2 will take 1 second(s)
    downloaded 2
    downloading 3 will take 2 second(s)
    downloaded 0
    downloading 4 will take 3 second(s)
    downloaded 1
    downloaded 3
    downloading 5 will take 2 second(s)
    downloading 6 will take 2 second(s)
    downloaded 5
    downloaded 6
    downloaded 4
    downloading 7 will take 1 second(s)
    downloading 8 will take 1 second(s)
    downloaded 7
    downloaded 8

But here are my questions:

1. At the moment, I'm simply waiting for 9 seconds to keep the main function running until the downloads are complete. Is there an efficient way of waiting for the last download to complete before exiting the main function? (I know there's asyncio.wait, but I would need to store all the task references for it to work.)
2. What's a good library that does this kind of task? I know JavaScript has a lot of async libraries, but what about Python?

Edit: 2. What's a good library that takes care of common async patterns? (Something like https://www.npmjs.com/package/async)

user4815162342 · Accepted Answer

You basically need a fixed-size pool of download tasks. asyncio doesn't come with such functionality out of the box, but it is easy to create one: simply keep a set of tasks and don't allow it to grow past the limit. Although the question states your reluctance to go down that route, the code ends up much more elegant:

    import asyncio
    from random import randint


    async def download(code):
        wait_time = randint(1, 3)
        print('downloading {} will take {} second(s)'.format(code, wait_time))
        await asyncio.sleep(wait_time)  # I/O, context will switch to main function
        print('downloaded {}'.format(code))


    async def main(loop):
        no_concurrent = 3
        dltasks = set()
        i = 0
        while i < 9:
            if len(dltasks) >= no_concurrent:
                # Wait for some download to finish before adding a new one
                _done, dltasks = await asyncio.wait(
                    dltasks, return_when=asyncio.FIRST_COMPLETED)
            dltasks.add(loop.create_task(download(i)))
            i += 1
        # Wait for the remaining downloads to finish
        await asyncio.wait(dltasks)
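For readers on Python 3.7 or later, the same set-based pool can be sketched without passing the loop around. This is only an illustration: `run_bounded`, `fake_download` and `demo` are hypothetical names, not part of asyncio, and `fake_download` stands in for a real download while recording how many copies ran at once. Results come back in completion order, and `limit` is assumed to be at least 1.

```python
import asyncio
import functools


async def run_bounded(coro_funcs, limit):
    """Run zero-argument coroutine functions, at most `limit` (>= 1) at a time."""
    pending = set()
    results = []
    for func in coro_funcs:
        if len(pending) >= limit:
            # Pool is full: wait until at least one task finishes.
            done, pending = await asyncio.wait(
                pending, return_when=asyncio.FIRST_COMPLETED)
            results.extend(task.result() for task in done)
        pending.add(asyncio.create_task(func()))
    if pending:
        # Drain the tasks that are still running.
        done, _ = await asyncio.wait(pending)
        results.extend(task.result() for task in done)
    return results


async def fake_download(code, stats):
    """Hypothetical stand-in for a real download; records peak concurrency."""
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    await asyncio.sleep(0.01 * (code % 3 + 1))
    stats["running"] -= 1
    return code


def demo():
    stats = {"running": 0, "peak": 0}
    jobs = [functools.partial(fake_download, i, stats) for i in range(9)]
    results = asyncio.run(run_bounded(jobs, limit=3))
    return results, stats["peak"]


if __name__ == "__main__":
    print(demo())
```

Taking coroutine functions rather than coroutine objects means nothing is created before a slot is free, so no coroutine is left un-awaited if the loop exits early.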

An alternative is to create a fixed number of coroutines doing the downloading, much like a fixed-size thread pool, and feed them work using an asyncio.Queue. This removes the need to manually limit the number of downloads, which will be automatically limited by the number of coroutines invoking download():

    # download() defined as above

    async def download_from(q):
        while True:
            code = await q.get()
            if code is None:
                # pass on the word that we're done, and exit
                await q.put(None)
                break
            await download(code)


    async def main(loop):
        q = asyncio.Queue()
        dltasks = [loop.create_task(download_from(q)) for _ in range(3)]
        i = 0
        while i < 9:
            await q.put(i)
            i += 1
        # Inform the consumers there is no more work.
        await q.put(None)
        await asyncio.wait(dltasks)
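A variation on the queue approach, offered as a sketch rather than as what the answer prescribes, replaces the `None` sentinel with `Queue.join()` and `Queue.task_done()`, then cancels the idle workers. The names `worker`, `process_all` and `demo` are hypothetical, and the handler is assumed not to raise: a worker whose handler raises exits, and if every worker exits this way `join()` would never return.

```python
import asyncio


async def worker(queue, handler):
    """Pull items forever; cancelled by process_all() once the queue is drained."""
    while True:
        item = await queue.get()
        try:
            await handler(item)
        finally:
            # Always mark the item as handled so queue.join() can return.
            queue.task_done()


async def process_all(items, handler, num_workers=3):
    queue = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)
    workers = [asyncio.create_task(worker(queue, handler))
               for _ in range(num_workers)]
    await queue.join()  # resumes once task_done() was called for every item
    for task in workers:
        task.cancel()  # workers are now idle inside queue.get()
    # Collect the cancellations so no task is left pending.
    await asyncio.gather(*workers, return_exceptions=True)


def demo():
    seen, stats = [], {"running": 0, "peak": 0}

    async def handler(code):  # hypothetical stand-in for download()
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01)
        stats["running"] -= 1
        seen.append(code)

    asyncio.run(process_all(range(9), handler))
    return seen, stats["peak"]


if __name__ == "__main__":
    print(demo())
```

The concurrency limit is simply `num_workers`, since each worker handles one item at a time.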

As for your other question, the obvious choice would be aiohttp.

Mikhail Gerasimov · Answer

If I'm not mistaken, you're looking for asyncio.Semaphore. Example of usage:

    import asyncio
    from random import randint


    async def download(code):
        wait_time = randint(1, 3)
        print('downloading {} will take {} second(s)'.format(code, wait_time))
        await asyncio.sleep(wait_time)  # I/O, context will switch to main function
        print('downloaded {}'.format(code))


    sem = asyncio.Semaphore(3)


    async def safe_download(i):
        async with sem:  # semaphore limits num of simultaneous downloads
            return await download(i)


    async def main():
        tasks = [
            asyncio.ensure_future(safe_download(i))  # creating task starts coroutine
            for i in range(9)
        ]
        await asyncio.gather(*tasks)  # await moment all downloads done


    if __name__ == '__main__':
        loop = asyncio.get_event_loop()
        try:
            loop.run_until_complete(main())
        finally:
            loop.run_until_complete(loop.shutdown_asyncgens())
            loop.close()

Output:

    downloading 0 will take 3 second(s)
    downloading 1 will take 3 second(s)
    downloading 2 will take 1 second(s)
    downloaded 2
    downloading 3 will take 3 second(s)
    downloaded 1
    downloaded 0
    downloading 4 will take 2 second(s)
    downloading 5 will take 1 second(s)
    downloaded 5
    downloaded 3
    downloading 6 will take 3 second(s)
    downloading 7 will take 1 second(s)
    downloaded 4
    downloading 8 will take 2 second(s)
    downloaded 7
    downloaded 8
    downloaded 6
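The semaphore pattern can also be wrapped in a small reusable helper. The sketch below is one possible shape, not an asyncio API: `gather_limited` and `demo` are hypothetical names. It creates the semaphore inside the running coroutine, which avoids the case where a module-level semaphore ends up attached to a different loop when the program is started with `asyncio.run()` on Python versions before 3.10.

```python
import asyncio


async def gather_limited(limit, coros):
    """Await all coroutines, letting at most `limit` of them run at once."""
    # Created inside the running loop, so it is safe under asyncio.run().
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with sem:  # blocks while `limit` coroutines are already running
            return await coro

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(guarded(c) for c in coros))


def demo():
    stats = {"running": 0, "peak": 0}

    async def download(code):  # hypothetical stand-in for a real download
        stats["running"] += 1
        stats["peak"] = max(stats["peak"], stats["running"])
        await asyncio.sleep(0.01 * (code % 3 + 1))
        stats["running"] -= 1
        return code

    async def main():
        return await gather_limited(3, [download(i) for i in range(9)])

    results = asyncio.run(main())
    return results, stats["peak"]


if __name__ == "__main__":
    print(demo())
```

Unlike the pool and queue variants, all wrapper tasks are created up front and only the guarded body is throttled, which is fine for a handful of downloads but allocates one task per item for very long lists.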

An example of async downloading with aiohttp can be found here.

MadeR · Answer

The asyncio-pool library does exactly what you need.

https://pypi.org/project/asyncio-pool/

    from asyncio_pool import AioPool

    LIST_OF_URLS = ("http://www.google.com", "......")
    pool = AioPool(size=3)
    # must be awaited from inside a coroutine
    await pool.map(your_download_coroutine, LIST_OF_URLS)