Make Requests Fast

This project explores and tests different libraries for parallelizing I/O-bound tasks (such as HTTP, FTP, or SFTP requests) on a single machine.
There is a list of websites, each used as the target of a simple GET request. The default number of requests is 150, matching the 150 websites listed in the file; for now, changing this number means editing the file manually.

Install Poetry

This project was developed with Poetry version 1.1.0rc1. The latest version at the time of writing is 1.1.3, so just use that.

pip install --user poetry

poetry self update 1.1.3
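
You can confirm which version is active:

poetry --version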

Also, make sure your shell startup file is updated. I am using zsh with oh-my-zsh, so these are the commands for my setup.

poetry completions zsh > ~/.zfunc/_poetry

mkdir $ZSH_CUSTOM/plugins/poetry

poetry completions zsh > $ZSH_CUSTOM/plugins/poetry/_poetry

Check out the Poetry docs for helpers for other shells: Poetry shell helpers. Of course, you should restart your shell or source the startup file after making these modifications.

Poetry is a tool that makes dependency management cleaner and packaging easier. Poetry documentation

Install project

Download the code from GitHub and then use Poetry to install the dependencies on your machine.

git clone https://github.com/fartzy/make-requests-fast.git

Create a new Python environment with Poetry for this project. I'm using Python 3.9 (as of June 2021).

poetry env use 3.9

Install the dependencies for this project using Poetry

poetry install

Start the poetry shell

poetry shell

Run

  • This project uses the typer module for its command-line interface. To execute a type of Requestor, give the path of the URL file and the type of Requestor, without the "Requestor" suffix. Only absolute paths have been tested. A hypothetical sketch of such a CLI follows the examples below.
  • Examples:

    mrf -r ChunkedLoopedThreadPool -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv

    mrf -r BufferedChunkedThreadPool -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv

    mrf -r Sequential -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv

    mrf -r ChunkedProcessPool -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv

    mrf -r Aiohttp -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv
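
As a rough, hypothetical sketch (not the actual project source), a typer entry point behind the mrf command might look something like the following; the option names and the dispatch comment are assumptions based on the examples above:

    import typer

    app = typer.Typer()

    @app.command()
    def main(
        requestor: str = typer.Option(..., "--requestor", "-r",
                                      help="Requestor type, without the 'Requestor' suffix"),
        file: str = typer.Option(..., "--file", "-f",
                                 help="Absolute path to the URLs file"),
    ):
        # Hypothetical dispatch: e.g. "Aiohttp" -> AiohttpRequestor,
        # run against the URLs listed in `file`.
        typer.echo(f"Running {requestor}Requestor against {file}")

    if __name__ == "__main__":
        app()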

Requestors

Each Requestor uses a different way to parallelize HTTP requests (except for SequentialRequestor, which does not parallelize). Minimal code sketches of a few of these strategies follow the list.

  • SequentialRequestor
    • All requests are issued sequentially
  • ChunkedThreadPoolRequestor
    • Uses ThreadPoolExecutor from concurrent.futures
    • The futures are all returned when the whole chunk is done
    • A new chunk of futures is scheduled
    • Since the GIL is released, this can improve upon sequential
  • BufferedChunkedThreadPoolRequestor
    • Uses ThreadPoolExecutor from concurrent.futures
    • Each individual future is returned as soon as it is done
    • The program stays in a loop while any futures are not done
    • New future(s) are scheduled as they finish, up to the chunk size amount
    • Since the GIL is released, this can improve upon sequential
  • ChunkedProcessPoolRequestor
    • Uses ProcessPoolExecutor from concurrent.futures
    • The futures are all returned when the whole chunk is done
    • A new chunk of futures is scheduled
  • AiohttpRequestor
    • Uses aiohttp, which is built on asyncio
    • Not currently using the speedup libraries (will add them in the future)
      • cchardet, aiodns, brotlipy
    • Creates an event loop and adds tasks to the event loop
    • Each task is a coroutine which executes an individual http request
  • DaskStreamzRequestor
    • Uses the streamz reactive API
    • scatter() causes the stream to be distributed to a Dask cluster
    • buffer (the number of partitions) is set to the total number of cores / 2
    • The Dask cluster is local only
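
To make the thread-pool strategies concrete, here is a minimal sketch, not the project's actual code, of the chunked and buffered-chunked approaches described above. The fetch helper, CHUNK_SIZE, and the use of the requests library are illustrative assumptions:

    import concurrent.futures
    import itertools

    import requests

    CHUNK_SIZE = 10  # hypothetical chunk size


    def fetch(url):
        # A simple GET request; the real Requestors may differ.
        return url, requests.get(url, timeout=10).status_code


    def chunked(urls):
        # ChunkedThreadPool style: wait for a whole chunk of futures
        # to finish before scheduling the next chunk.
        results = []
        with concurrent.futures.ThreadPoolExecutor() as pool:
            for i in range(0, len(urls), CHUNK_SIZE):
                futures = [pool.submit(fetch, u) for u in urls[i:i + CHUNK_SIZE]]
                done, _ = concurrent.futures.wait(futures)
                results.extend(f.result() for f in done)
        return results


    def buffered_chunked(urls):
        # BufferedChunkedThreadPool style: keep up to CHUNK_SIZE futures
        # in flight and schedule a new one as soon as any future finishes.
        results = []
        remaining = iter(urls)
        with concurrent.futures.ThreadPoolExecutor() as pool:
            in_flight = {pool.submit(fetch, u)
                         for u in itertools.islice(remaining, CHUNK_SIZE)}
            while in_flight:
                done, in_flight = concurrent.futures.wait(
                    in_flight, return_when=concurrent.futures.FIRST_COMPLETED)
                results.extend(f.result() for f in done)
                for u in itertools.islice(remaining, len(done)):
                    in_flight.add(pool.submit(fetch, u))
        return results

The ChunkedProcessPoolRequestor follows the same chunked shape with ProcessPoolExecutor swapped in for ThreadPoolExecutor. Below is a similarly hedged sketch of the asyncio/aiohttp approach: one coroutine per request, all scheduled as tasks on a single event loop:

    import asyncio

    import aiohttp


    async def fetch(session, url):
        async with session.get(url) as response:
            return url, response.status


    async def fetch_all(urls):
        async with aiohttp.ClientSession() as session:
            tasks = [asyncio.create_task(fetch(session, url)) for url in urls]
            return await asyncio.gather(*tasks)


    # results = asyncio.run(fetch_all(urls))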
