This project explores and tests different libraries for parallelizing I/O tasks on a single machine.
There is a list of websites, each used as the target of a simple GET request. The default is 150 requests, one per website listed in the file. For now, this number can only be changed manually.
This project uses Poetry. It was developed against version 1.1.0rc1; the latest at the time of writing is 1.1.3, so just use that.
pip install --user poetry
poetry self update 1.1.3
Also, make sure your shell startup file is updated. I am using zsh with oh-my-zsh, so these are the commands to update my ~/.zshrc:
poetry completions zsh > ~/.zfunc/_poetry
mkdir $ZSH_CUSTOM/plugins/poetry
poetry completions zsh > $ZSH_CUSTOM/plugins/poetry/_poetry
Check out the Poetry documentation for shell helpers for other shells. Of course, restart your shell or source the startup file after making these modifications.
Poetry is a tool that makes dependency management cleaner and packaging easier. See the Poetry documentation.
Download the code from GitHub, then use Poetry to install the dependencies on your machine.
Create a new Python environment with Poetry for this project. I am using Python 3.9 (as of June 2021).
poetry env use 3.9
Install the dependencies for this project using Poetry
poetry install
Start the poetry shell
poetry shell
- This project uses the typer module. To run a Requestor, give the path to the URL file and the type of Requestor, without the "Requestor" suffix. Only absolute paths have been tested.
- Examples:
mrf -r ChunkedLoopedThreadPool -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv
mrf -r BufferedChunkedThreadPool -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv
mrf -r Sequential -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv
mrf -r ChunkedProcessPool -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv
mrf -r Aiohttp -f /path/to/make-requests-fast/make_requests_fast/resources/urls.csv
Each Requestor parallelizes HTTP requests in a different way (except for SequentialRequestor, which does not parallelize).
- SequentialRequestor
- All requests are issued sequentially
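As a reference point, the sequential baseline can be sketched as below. Note that `fetch` is a stand-in that sleeps instead of issuing a real GET, and all names are illustrative rather than the project's actual code:

```python
import time

def fetch(url):
    # Stand-in for a real HTTP GET (e.g. requests.get(url));
    # the sleep simulates network latency.
    time.sleep(0.01)
    return url, 200

def sequential(urls):
    # Issue one request at a time; total runtime grows
    # linearly with the number of URLs.
    return [fetch(url) for url in urls]
```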
- ChunkedThreadPoolRequestor
- Uses ThreadPoolExecutor from concurrent.futures
- The futures are all returned when the whole chunk is done
- A new chunk of futures is scheduled
- Since the GIL is released during blocking I/O, the threads can overlap requests and improve on the sequential version
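The chunked thread-pool pattern described above might look roughly like this sketch (again with a stand-in `fetch` and illustrative names, not the project's code):

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

def fetch(url):
    # Stand-in for an HTTP GET; sleeps to simulate I/O.
    time.sleep(0.01)
    return url, 200

def chunked_thread_pool(urls, chunk_size=4):
    results = []
    with ThreadPoolExecutor(max_workers=chunk_size) as pool:
        for i in range(0, len(urls), chunk_size):
            futures = [pool.submit(fetch, u) for u in urls[i:i + chunk_size]]
            # Block until the whole chunk is done before
            # scheduling the next chunk.
            done, _ = wait(futures)
            results.extend(f.result() for f in done)
    return results
```

The drawback this project highlights: a single slow URL holds up its entire chunk, since no new work is scheduled until every future in the chunk resolves.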
- BufferedChunkedThreadPoolRequestor
- Uses ThreadPoolExecutor from concurrent.futures
- Each individual future is returned as soon as it is done
- The program stays in a loop while any futures are not done
- New future(s) are scheduled as they finish, up to the chunk size amount
- Since the GIL is released during blocking I/O, the threads can overlap requests and improve on the sequential version
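The buffered variant can be sketched with `wait(..., return_when=FIRST_COMPLETED)`: as each future finishes, a replacement is scheduled, keeping the buffer topped up. The names and the stand-in `fetch` are illustrative:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def fetch(url):
    # Stand-in for an HTTP GET; sleeps to simulate I/O.
    time.sleep(0.01)
    return url, 200

def buffered_thread_pool(urls, buffer_size=4):
    results = []
    url_iter = iter(urls)
    with ThreadPoolExecutor(max_workers=buffer_size) as pool:
        # Fill the buffer with the first buffer_size requests.
        pending = {pool.submit(fetch, u)
                   for _, u in zip(range(buffer_size), url_iter)}
        while pending:
            # Wake up as soon as any single future completes.
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results.append(future.result())
                # Backfill the buffer with the next URL, if any remain.
                nxt = next(url_iter, None)
                if nxt is not None:
                    pending.add(pool.submit(fetch, nxt))
    return results
```

Compared to the chunked version, a slow URL only occupies one slot in the buffer instead of stalling a whole chunk.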
- ChunkedProcessPoolRequestor
- Uses ProcessPoolExecutor from concurrent.futures
- The futures are all returned when the whole chunk is done
- A new chunk of futures is scheduled
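The process-pool variant follows the same chunked pattern with `ProcessPoolExecutor` swapped in; a hypothetical sketch (stand-in `fetch`, illustrative names):

```python
import time
from concurrent.futures import ProcessPoolExecutor, wait

def fetch(url):
    # Must be a top-level function so it can be pickled and
    # sent to the worker processes.
    time.sleep(0.01)
    return url, 200

def chunked_process_pool(urls, chunk_size=4):
    results = []
    with ProcessPoolExecutor(max_workers=chunk_size) as pool:
        for i in range(0, len(urls), chunk_size):
            futures = [pool.submit(fetch, u) for u in urls[i:i + chunk_size]]
            # Block until the whole chunk finishes.
            done, _ = wait(futures)
            results.extend(f.result() for f in done)
    return results
```

Processes sidestep the GIL entirely, but pay process-startup and pickling costs on every task, which is why threads are usually sufficient for I/O-bound work like this.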
- AiohttpRequestor
- Uses aiohttp, which uses asyncio
- Not currently using the speedup libraries (will add them in the future)
- cchardet, aiodns, brotlipy
- Creates an event loop and adds tasks to the event loop
- Each task is a coroutine which executes an individual http request
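The event-loop pattern can be sketched with plain asyncio. To keep the sketch self-contained, `fetch` here only simulates the request; the real aiohttp call it stands in for is shown in the comment. Names are illustrative, not the project's code:

```python
import asyncio

async def fetch(url):
    # Stand-in for aiohttp; a real Requestor would do something like:
    #   async with session.get(url) as resp:
    #       return url, resp.status
    await asyncio.sleep(0.01)
    return url, 200

async def run_all(urls):
    # One coroutine per URL, all scheduled on the same event loop;
    # gather preserves the input order in its results.
    return await asyncio.gather(*(fetch(u) for u in urls))

def aiohttp_style(urls):
    return asyncio.run(run_all(urls))
```

Because every request is a coroutine on a single thread, concurrency here costs no extra threads or processes at all.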
- DaskStreamzRequestor
- Uses streamz reactive API
- scatter() causes the stream to be distributed to the Dask cluster
- buffer (the number of partitions) is set to the total number of cores / 2
- the Dask cluster is local only