Skip to content

The libary sometimes cannot send files over 800MB with local bot server.[BUG] #4339

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
daviddanielng opened this issue Jul 2, 2024 · 18 comments · Fixed by #4388
Closed

The libary sometimes cannot send files over 800MB with local bot server.[BUG] #4339

daviddanielng opened this issue Jul 2, 2024 · 18 comments · Fixed by #4388

Comments

@daviddanielng
Copy link

daviddanielng commented Jul 2, 2024

Steps to Reproduce

  1. Download any file that is more 1gb from the internet,
  2. I tried to upload a video, a .mkv file,
  3. try to upload to a group with FilesBot.send_docuemnt (I have put a link to FilesBot program or class at the end, it is in additional context).

Expected behaviour

Libary should send the file to Local server, then local server should process the file and upload it to group while libaray should return the result

Actual behaviour

No error nor exception, everthing just stop.

Operating System

ubuntu 22LTS on github codespace

Version of Python, python-telegram-bot & dependencies

python-telegram-bot 21.1.1 (0.1.5-18-g067c55a)
Bot API 7.2
Python 3.10.13 (main, Mar  5 2024, 18:35:01) [GCC 9.4.0]

Relevant log output

Log genearted by FilesBot.send_document

2024-07-10 09:31:56,459 - asyncio - DEBUG - Using selector: EpollSelector - /usr/local/python/3.10.13/lib/python3.10/asyncio/selector_events.py:54 function __init__ process:7013 module:selector_events
2024-07-10 09:31:56,461 - httpx - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpx/_config.py:80 function load_ssl_context process:7013 module:_config
2024-07-10 09:31:56,462 - httpx - DEBUG - load_verify_locations cafile='/home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/certifi/cacert.pem' - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpx/_config.py:146 function load_ssl_context_verify process:7013 module:_config
2024-07-10 09:31:56,483 - httpx - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpx/_config.py:80 function load_ssl_context process:7013 module:_config
2024-07-10 09:31:56,484 - httpx - DEBUG - load_verify_locations cafile='/home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/certifi/cacert.pem' - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpx/_config.py:146 function load_ssl_context_verify process:7013 module:_config
2024-07-10 09:31:56,489 - root - INFO - Getting bot delay for group - /workspaces/[redacted]/utils/database.py:362 function getBotDelay process:7013 module:database
2024-07-10 09:31:56,507 - root - INFO - Making request to http://[redacted]/?action=get-bot-delay&query=group&by=%40[redacted] with method GET - /workspaces/[redacted]/utils/utils.py:69 function make_request process:7013 module:utils
2024-07-10 09:31:56,517 - urllib3.connectionpool - DEBUG - Starting new HTTP connection (1): [redacted] - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/urllib3/connectionpool.py:244 function _new_conn process:7013 module:connectionpool
2024-07-10 09:31:56,605 - urllib3.connectionpool - DEBUG - http://[redacted] "GET /?action=get-bot-delay&query=group&by=%40[redacted] HTTP/1.1" 200 26 - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/urllib3/connectionpool.py:549 function _make_request process:7013 module:connectionpool
2024-07-10 09:31:56,606 - root - INFO - Bot delay for group is 0.0 - /workspaces/[redacted]/utils/database.py:369 function getBotDelay process:7013 module:database
2024-07-10 09:31:56,607 - root - INFO - Uploading file video.mvk[1.6gb file] to group - /workspaces/[redacted]/bot/filesbot.py:195 function send_document process:7013 module:filesbot
2024-07-10 09:32:06,078 - telegram.Bot - DEBUG - Calling Bot API endpoint `sendDocument` with parameters `{'chat_id': '@[redacted]', 'document': <telegram._files.inputfile.InputFile object at 0x7cb7f50d3bc0>, 'disable_content_type_detection': True, 'disable_notification': True, 'caption': 'video at lagos store 1.1.3'}` - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/telegram/_bot.py:643 function _do_post process:7013 module:_bot
2024-07-10 09:32:06,100 - httpcore.connection - DEBUG - connect_tcp.started host='[redacted]' port=8080 local_address=None timeout=1800 socket_options=None - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpcore/_trace.py:85 function atrace process:7013 module:_trace
2024-07-10 09:32:06,118 - httpcore.connection - DEBUG - connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x7cb7f5089210> - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpcore/_trace.py:85 function atrace process:7013 module:_trace
2024-07-10 09:32:06,119 - httpcore.http11 - DEBUG - send_request_headers.started request=<Request [b'POST']> - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpcore/_trace.py:85 function atrace process:7013 module:_trace
2024-07-10 09:32:06,120 - httpcore.http11 - DEBUG - send_request_headers.complete - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpcore/_trace.py:85 function atrace process:7013 module:_trace
2024-07-10 09:32:06,120 - httpcore.http11 - DEBUG - send_request_body.started request=<Request [b'POST']> - /home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/httpcore/_trace.py:85 function atrace process:7013 module:_trace

Additional Context

the filesbot program is this.send_document is the one uploading the video

I was able to upload the file after some time, i also used a third party libaray request toolbelt

    async def sendDocumentRaw(
        self, path: str, filename: str, caption="", retries: int = 0
    ) -> int:
        """
        Sends a document to a group chat.

        Args:
            path (str): The path to the document file `Warning: this will remove the file after sending it.`.
            filename (str): The name of the document file.
            caption (str, optional): The caption for the document. Defaults to "".
            retries (int, optional): The number of retries in case of failure. Defaults to 0.

        Returns:
            int: The message ID of the sent document, or 0 if an error occurred.
        """
        logger = self.logger
        logger.info(f"Uploading file {filename} to group")
        returnData: int = 0
        await asyncio.sleep(self.database.getBotDelay("group", self._group_name))

        try:
            if os.path.exists(path):
                url = f"{self._server}{self._token}/sendDocument"
                encoder = MultipartEncoder(
                    fields={
                        "chat_id": self._group_name,
                        "caption": caption,
                        "disable_notification": "true",
                        "document": (filename, open(path, "rb"), "text/plain"),
                    }
                )
                response = requests.post(
                    url, data=encoder, headers={"Content-Type": encoder.content_type}
                )
                if response.status_code == 200:
                    returnData = response.json()["result"]["message_id"]
                    logger.info(f"file {filename} uploaded to group")
                else:
                    respondJson = response.json()
                    if "description" in respondJson:
                        if "bad request" in respondJson["description"].lower():
                            if retries < self.maxRetries:
                                retries += 1
                                logger.warning(
                                    f"A retry after error occurred while trying to upload file , retrying in {self.retryIn} seconds"
                                )
                                await asyncio.sleep(self.retryIn)
                                returnData = await self.sendDocumentRaw(
                                    path, filename, caption, retries
                                )
                            else:
                                logger.error(
                                    f"An error occurred while trying to upload file max retries reached, could not get message id respond is : {response.text}"
                                )
                                returnData = 0
                        else:
                            logger.error(
                                f"An error occurred while trying to upload file, could not get message id respond is : {response.text}"
                            )
                            returnData = 0
                    else:
                        logger.error(
                            f"An error occurred while trying to upload file, could not get message id respond is : {response.text}"
                        )
                        returnData = 0
            else:
                returnData = 0
                logger.error(
                    f"An error occurred while trying to upload file, file {path} does not exist"
                )

        except Exception as e:
            if retries < self.maxRetries:
                retries += 1
                logger.warning(
                    f"A retry after error occurred while trying to upload file , retrying in {self.retryIn} seconds"
                )
                await asyncio.sleep(self.retryIn)
                returnData = await self.sendDocumentRaw(
                    path, filename, caption, retries
                )
            else:
                logger.exception(f"An error occurred while trying to upload file {e}")
            returnData = 0
        finally:
            try:
                os.remove(path)
                logger.info(f"file {path} deleted")
            except Exception as e:
                logger.warning(f"unable to delete file {path}")
        return returnData
@Poolitzer
Copy link
Member

Hi as asked in the group, please enable debug logging and share the output of a failed send

@daviddanielng
Copy link
Author

daviddanielng commented Jul 3, 2024

Hi as asked in the group, please enable debug logging and share the output of a failed send

I set logging level to DEBUG
I cleared the log file, then I called FilesBot.send_document.
The log in Relevant log output is solely made by FilesBot.send_document

Like I said, there is no error nor exception. FilesBot.send_document just stops executing.

In additional context, I included a function I used to upload the file successfully.

The local server log says it did not receive any data or it is receiving bits of data instead of megabytes of data.

I will include the local server log in some minutes.

@Poolitzer
Copy link
Member

The log you provided is not from PTB when it sends a file...

@daviddanielng
Copy link
Author

daviddanielng commented Jul 3, 2024

The log you provided is not from PTB when it sends a file...

I will check and update the issue.

@daviddanielng
Copy link
Author

The log you provided is not from PTB when it sends a file...

You are correct, the log is not from PTB. Within 2 hours (I am away from my PC) I will include the correct log, I will also include the log from local server.

@Bibo-Joshi Bibo-Joshi added the 📋 pending-reply work status: pending-reply label Jul 9, 2024
@daviddanielng
Copy link
Author

I have updated the log with an authentic for PTB.

@Bibo-Joshi Bibo-Joshi removed the 📋 pending-reply work status: pending-reply label Jul 11, 2024
@Bibo-Joshi
Copy link
Member

Hi. I see that you made multiple edits to your original issue discription. TBH it's rather hard to figure out which log is now generated from where and what code you're using. Please

  1. write a minimal reproducable example that shows the offending behavior. please remove as much unrelated code as possible - e.g. check if a simple call to bot.send_document without your FilesBot class works already.
  2. determine a file that can be used to reproduce the behavior and make it available either as file upload or link share
  3. collect the logs from running that example
  4. upload everything both here in a new comment so that we have within a single comment both the relevant log output and the code & data that was used to generate it.

@daviddanielng
Copy link
Author

Hi. I see that you made multiple edits to your original issue discription. TBH it's rather hard to figure out which log is now generated from where and what code you're using. Please

  1. write a minimal reproducable example that shows the offending behavior. please remove as much unrelated code as possible - e.g. check if a simple call to bot.send_document without your FilesBot class works already.
  2. determine a file that can be used to reproduce the behavior and make it available either as file upload or link share
  3. collect the logs from running that example
  4. upload everything both here in a new comment so that we have within a single comment both the relevant log output and the code & data that was used to generate it.

Hello Joshi. Here is the MME. I have also included the bot log and the local server bot log. In the local bot server log, you will notice that the server is having trouble receiving data from the client. I wanted to include the log generated by the server where I used the custom method 'send_document_raw' for comparison, but I can't as the file size rose to 150 MB. It would take some time to download to my device before I can upload it to gist.

'bot.send_document' did not work; I can upload files of up to 600 MB without the program abruptly stopping. I cannot provide the document I was trying to upload as it contains sensitive accounting data (we are trying to take advantage of Telegram unlimited storage as temporary storage). Nonetheless, I did try uploading HD movies I obtained from the internet which did not work with send_document but did with send_document_raw.

I have also inlcuded the binary for the bot sever, it is complied in ubuntu 24.04 (LTS) x64.

Thank You!

MME file

bot log

local bot server log

local bot binary

@Bibo-Joshi
Copy link
Member

Thanks for the updates and elaborated information!

I have also included the bot log and the local server bot log. In the local bot server log, you will notice that the server is having trouble receiving data from the client.

TBH I'm a bit a loss here, not having worked much with the local api server. IISC correctly the relevant section is

[ 4][t 6][1720721926.802261829][HttpReader.cpp:292][!HttpInboundConnection]	Parse headers in multipart form data: "Content-Disposition: form-data; name="document"; filename="text_upload.mp4"
Content-Type: video/mp4
"
[ 4][t 6][1720721926.802263736][HttpReader.cpp:260][!HttpInboundConnection]	Parsing multipart form data in state 3 with already read length 0
[ 3][t 6][1720721926.802266836][HttpConnectionBase.cpp:122][!HttpInboundConnection]	Slow HTTP connection: migrate to 6
[ 4][t 8][1720721926.803691148][HttpReader.cpp:260][!HttpInboundConnection]	Parsing multipart form data in state 3 with already read length 0
[ 4][t 8][1720721926.804478168][HttpReader.cpp:840][!HttpInboundConnection]	Created temporary file /root/temp/temp/text_upload.mp4
[ 4][t 8][1720721926.804539680][HttpReader.cpp:852][!HttpInboundConnection]	Save file part of size 21528 to file /root/temp/temp/text_upload.mp4
[ 4][t 8][1720721926.804592847][HttpConnectionBase.cpp:156][!HttpInboundConnection]	Can write to the connection
[ 4][t 8][1720721926.804618835][BufferedFd.h:140][!HttpInboundConnection]	Nothing to write to [fd:34]
[ 4][t 8][1720721926.804639577][HttpConnectionBase.cpp:102][!HttpInboundConnection]	Can read from the connection
[ 4][t 8][1720721926.804661750][BufferedFd.h:206][!HttpInboundConnection]	Flush read: +7240B[total:7240B]
[ 4][t 8][1720721926.804673194][HttpConnectionBase.cpp:156][!HttpInboundConnection]	Can write to the connection
[ 4][t 8][1720721926.804681539][BufferedFd.h:140][!HttpInboundConnection]	Nothing to write to [fd:34]
[ 4][t 8][1720721926.815855026][HttpConnectionBase.cpp:102][!HttpInboundConnection]	Can read from the connection
[ 4][t 8][1720721926.815989494][BufferedFd.h:206][!HttpInboundConnection]	Flush read: +17376B[total:24616B]
[ 4][t 8][1720721926.816084623][HttpReader.cpp:260][!HttpInboundConnection]	Parsing multipart form data in state 3 with already read length 0
[ 4][t 8][1720721926.816166162][HttpReader.cpp:852][!HttpInboundConnection]	Save file part of size 24616 to file /root/temp/temp/text_upload.mp4
[ 4][t 8][1720721926.816230297][HttpConnectionBase.cpp:156][!HttpInboundConnection]	Can write to the connection
[ 4][t 8][1720721926.816232919][BufferedFd.h:140][!HttpInboundConnection]	Nothing to write to [fd:34]
[ 4][t 8][1720721926.816243886][HttpConnectionBase.cpp:102][!HttpInboundConnection]	Can read from the connection
[ 4][t 8][1720721926.816259622][BufferedFd.h:206][!HttpInboundConnection]	Flush read: +7240B[total:7240B]
[ 4][t 8][1720721926.816262245][HttpConnectionBase.cpp:156][!HttpInboundConnection]	Can write to the connection
[ 4][t 8][1720721926.816262960][BufferedFd.h:140][!HttpInboundConnection]	Nothing to write to [fd:34]
[ 4][t 8][1720721926.829812765][HttpConnectionBase.cpp:102][!HttpInboundConnection]	Can read from the connection
[ 4][t 8][1720721926.829871654][BufferedFd.h:206][!HttpInboundConnection]	Flush read: +24616B[total:31856B]
[ 4][t 8][1720721926.829884290][HttpReader.cpp:260][!HttpInboundConnection]	Parsing multipart form data in state 3 with already read length 0
[ 4][t 8][1720721926.830017089][HttpReader.cpp:852][!HttpInboundConnection]	Save file part of size 31856 to file /root/temp/temp/text_upload.mp4
[ 4][t 8][1720721926.830084085][HttpConnectionBase.cpp:156][!HttpInboundConnection]	Can write to the connection
[ 4][t 8][1720721926.830085992][BufferedFd.h:140][!HttpInboundConnection]	Nothing to write to [fd:34]
[ 4][t 8][1720721926.830093383][HttpConnectionBase.cpp:102][!HttpInboundConnection]	Can read from the connection
[ 4][t 8][1720721926.830099582][BufferedFd.h:206][!HttpInboundConnection]	Flush read: +1448B[total:1448B]
[ 4][t 8][1720721926.830101728][HttpConnectionBase.cpp:156][!HttpInboundConnection]	Can write to the connection
[ 4][t 8][1720721926.830102205][BufferedFd.h:140][!HttpInboundConnection]	Nothing to write to [fd:34]
[ 2][t13][1720721927.598697662][Stats.cpp:34][!RunOnSchedulerWorker]	CPU usage: 0.806452%
[ 4][t 8][1720721928.586987972][HttpConnectionBase.cpp:102][!HttpInboundConnection]	Can read from the connection
[ 4][t 8][1720721928.587024450][HttpConnectionBase.cpp:186][!HttpInboundConnection]	Can close the connection
[ 3][t 8][1720721928.587025880][HttpConnectionBase.cpp:194][!HttpInboundConnection]	Close connection while reading request/response
[ 4][t 8][1720721928.587112903][HttpReader.cpp:868][!HttpInboundConnection]	Close temporary file /root/temp/temp/text_upload.mp4
[ 4][t 8][1720721928.587125062][HttpReader.cpp:877][!HttpInboundConnection]	Unlink temporary file /root/temp/temp/text_upload.mp4

but I'm not sure if I understand it correctly. I guess "Close connection while reading request/response" means that the server aborts the connection but I don't get where you see that the server has trouble reading data from the client … Not sure if seeing functioning logs will help me here, but at least for completeness it would interest me 😅

On PTB side, the relevant networking logs apparently are of the form

2024-07-12 21:40:06,751 - httpcore.http11 - DEBUG - send_request_body.started request=<Request [b'POST']>
2024-07-12 21:40:06,751 - httpcore.http11 - DEBUG - send_request_body.complete

where the second line is missing in your case, i.e. the request never completes.
If I understood correctly that the server is closing the connection and if the client doesn't notice that and indefinitely tries to upload the data, that would ofc be a problem …

To double check: You've specified a generous timeout of 1800 seconds = 30mins. How long does your send_document_raw method take to upload your test file? does your send_document method time out after 30mins or does it hang even then?

PTB is using the httpx library for the network communication by default. So if something is going wrong on client side, it could be either within httpx or within how PTB uses httpx. To narrow that down, we should find out if you can successfully send the file if you use httpx directly. Moreover, I would like to see a plain call to send_document without the wrapper logic of FilesBot.
Can you please run below MWE? Probably first try with the cloud-api-server and a small file to verify that it is working for you and then move to your local server. I would be interested to see the results of that as well as the logs.

httpx mwe
import asyncio
import logging
from pathlib import Path
from typing import Union

import httpx

from telegram import Bot


logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.DEBUG
)


async def send_with_ptb(
    token: str,
    base_url: str,
    chat_id: Union[str, int],
    filename: str,
    file_path: Path,
    kwargs: dict[str, Union[str, bool]],
    timeout: int,
):
    async with Bot(
        token=token,
        base_url=base_url,
    ) as bot:
        await bot.send_document(
            chat_id=chat_id,
            document=file_path.read_bytes(),
            filename=filename,
            read_timeout=timeout,
            write_timeout=timeout,
            connect_timeout=timeout,
            **kwargs
        )


async def send_with_httpx(
    token: str,
    base_url: str,
    chat_id: Union[str, int],
    filename: str,
    file_path: Path,
    kwargs: dict[str, Union[str, bool]],
    timeout: int,
):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url=f"{base_url}{token}/sendDocument",
            data={"chat_id": chat_id, **kwargs},
            files={"document": (filename, file_path.read_bytes(), "text/plain")},
            timeout=timeout,
        )
        response.raise_for_status()


async def main():
    token = "token"
    chat_id = 123456
    file_path = Path("tests/data/telegram.png")
    filename = "test_file_name.png"
    kwargs = {
        "caption": "caption",
        "disable_content_type_detection": True,
        "disable_notification": True,
    }
    base_url = "https://api.telegram.org/bot"
    timeout = 1800

    for callback, description in (
        (send_with_ptb, "send_with_ptb"),
        (send_with_httpx, "send_with_httpx"),
    ):
        try:
            await callback(
                timeout=timeout,
                base_url=base_url,
                token=token,
                chat_id=chat_id,
                filename=filename,
                file_path=file_path,
                kwargs=kwargs,
            )
        except Exception as exc:
            print(f"{description} failed: {exc}")


if __name__ == "__main__":
    asyncio.run(main())

One last thing that I noticed is that you set the mime type as text/plain in send_doucment_raw. while the file is an mp4 file. PTB currently infers the mime type from the file name. Does send_document_raw still work if you set the mime type to video/mp4?

@daviddanielng
Copy link
Author

TBH I'm a bit a loss here, not having worked much with the local api server. IISC correctly the relevant section is

Yes it is the relevant part.

I don't get where you see that the server has trouble reading data from the client

sorry, the client is the one having issue sending the file. i know don't much about networking but i did come to the conclusion that the client is the one having issue that was why i wrote send_document_raw. i got it mixed up when writing the comment.

Not sure if seeing functioning logs will help me here, but at least for completeness it would interest me 😅

Below is what the log should look like, i can't upload the full log as the server generates a big log per file upload. this is just 4mb of what is generated when i use send_document_raw or upload a file less than 700mb with send_document.
file upload success log
I will include the full log when i run the mme you provided.

where the second line is missing in your case, i.e. the request never completes.

Yes, that was when the program stops.

To double check: You've specified a generous timeout of 1800 seconds = 30mins. How long does your send_document_raw method take to upload your test file? does your send_document method time out after 30mins or does it hang even then?

The server is hosted on a digital ocean drop so i do have a very good upload and download speed.
using send_document_raw takes no more than 5 minutes to upload the 1.5gb file, the program doesn't hang when i use send_document , it just stops. ( Note: the file is already on google drive so i am just downloading from drive then uploading it to telegram for more backups just in case the business loss the google account and can't recover it, it already happened once)

it could be either within httpx or within how PTB uses httpx.

When i was stuck, i suspected that PTB or httpx was the issue so i tried using requests directly to upload the file, it didn't work (can't remember if it worked for small files). so i search on google and reddit. i came across something called stream uploading (can't remember where). i then search how i can make a stream upload with requests. An answer on a stackoverflow question gave me a way to solve it, then i wrote send_document_raw around it.

Can you please run below MWE?

I will, please bear with me as it could takes a few hours or 1 day.

One last thing that I noticed is that you set the mime type as text/plain in send_doucment_raw. while the file is an mp4 file. PTB currently infers the mime type from the file name. Does send_document_raw still work if you set the mime type to video/mp4?

Yes, it does work with video/mp4 or any mime type. i had mime type set to text/plain because the bot will only upload text files.

To be clear, i can upload file less than 700mb or somewhere between 0 mb - 756mb. i don't actually know where it stops. i do know that i was able to upload a 756mb file but failed for an 800mb one.

Please don't mind my grammar, i hope you understand me.

@Bibo-Joshi
Copy link
Member

I will, please bear with me as it could takes a few hours or 1 day.

not to worry

Yes, it does work with video/mp4 or any mime type. i had mime type set to text/plain because the bot will only upload text files.

👍

sorry, the client is the one having issue sending the file. i know don't much about networking but i did come to the conclusion that the client is the one having issue that was why i wrote send_document_raw. i got it mixed up when writing the comment.

👍

Thanks also for the pointes to the SO question, which also lead me to psf/requests#1584. Again, I'm not sure if I understand the issue with the requests library here, but the observation that I did make is that in the requests examples as well as the httpx examples, people usually pass an open file descripter rather than the loaded bytes and IISC that's also the basic idea of MultipartEncoder. This is indeed a difference that I see with how PTB handles files: In your send_document you first read the bytes and pass that to PTB. Even if you would pass the file handler, PTB will read the complete file content and pass that to httpx.

TL;DR: I'm interested to see if passing a file handle to httpx works better. I extended the MWE for that:

MWE extended for file handle
import asyncio
import logging
from pathlib import Path
from typing import Union

import httpx

from telegram import Bot

logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.DEBUG
)


async def send_with_ptb(
    token: str,
    base_url: str,
    chat_id: Union[str, int],
    filename: str,
    file_path: Path,
    kwargs: dict[str, Union[str, bool]],
    timeout: int,
):
    async with Bot(
        token=token,
        base_url=base_url,
    ) as bot:
        await bot.send_document(
            chat_id=chat_id,
            document=file_path.read_bytes(),
            filename=filename,
            read_timeout=timeout,
            write_timeout=timeout,
            connect_timeout=timeout,
            **kwargs,
        )


async def send_with_httpx_postponed_loading(
    token: str,
    base_url: str,
    chat_id: Union[str, int],
    filename: str,
    file_path: Path,
    kwargs: dict[str, Union[str, bool]],
    timeout: int,
):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url=f"{base_url}{token}/sendDocument",
            data={"chat_id": chat_id, **kwargs},
            files={"document": (filename, file_path.open("rb"), "text/plain")},
            timeout=timeout,
        )
        response.raise_for_status()


async def send_with_httpx_preponed_loading(
    token: str,
    base_url: str,
    chat_id: Union[str, int],
    filename: str,
    file_path: Path,
    kwargs: dict[str, Union[str, bool]],
    timeout: int,
):
    async with httpx.AsyncClient() as client:
        response = await client.post(
            url=f"{base_url}{token}/sendDocument",
            data={"chat_id": chat_id, **kwargs},
            files={"document": (filename, file_path.read_bytes(), "text/plain")},
            timeout=timeout,
        )
        response.raise_for_status()


async def main():
    token = "token"
    chat_id = 123456
    file_path = Path("tests/data/telegram.mp4")
    filename = "test_file_name.png"
    kwargs = {
        "caption": "caption",
        "disable_content_type_detection": True,
        "disable_notification": True,
    }
    base_url = "https://api.telegram.org/bot"
    timeout = 1800

    for callback in (
        send_with_httpx_postponed_loading,
        send_with_httpx_preponed_loading,
        send_with_ptb,
    ):
        try:
            await callback(
                timeout=timeout,
                base_url=base_url,
                token=token,
                chat_id=chat_id,
                filename=filename,
                file_path=file_path,
                kwargs=kwargs,
            )
        except Exception as exc:
            print(f"{callback.__name__} failed: {exc}")


if __name__ == "__main__":
    asyncio.run(main())

@daviddanielng
Copy link
Author

daviddanielng commented Jul 13, 2024

@Bibo-Joshi

MWE extended for file handle

I ran the mwe, the program stops after uploading the file once . so i added an info log to tell which function is causing the program to stop, the log indicated that the program stops after send_with_httpx_preponed_loading started.

see log
2024-07-13 16:23:14,874 - asyncio - DEBUG - Using selector: EpollSelector
2024-07-13 16:23:14,876 - httpx - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False
2024-07-13 16:23:14,878 - httpx - DEBUG - load_verify_locations cafile='/home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/certifi/cacert.pem'
2024-07-13 16:23:14,908 - root - INFO - send_with_httpx_postponed_loading started
2024-07-13 16:23:14,925 - httpcore.connection - DEBUG - connect_tcp.started host='[redacted]' port=8080 local_address=None timeout=1800 socket_options=None
2024-07-13 16:23:14,943 - httpcore.connection - DEBUG - connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x7111f2baf7f0>
2024-07-13 16:23:14,943 - httpcore.http11 - DEBUG - send_request_headers.started request=<Request [b'POST']>
2024-07-13 16:23:14,944 - httpcore.http11 - DEBUG - send_request_headers.complete
2024-07-13 16:23:14,944 - httpcore.http11 - DEBUG - send_request_body.started request=<Request [b'POST']>
2024-07-13 16:23:27,769 - httpcore.http11 - DEBUG - send_request_body.complete
2024-07-13 16:23:27,769 - httpcore.http11 - DEBUG - receive_response_headers.started request=<Request [b'POST']>
2024-07-13 16:25:07,830 - httpcore.http11 - DEBUG - receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Connection', b'keep-alive'), (b'Content-Type', b'application/json'), (b'Content-Length', b'499')])
2024-07-13 16:25:07,831 - httpx - INFO - HTTP Request: POST http://[redacted]:8080/bot6859301078:AAH-2zJuFAuCGl1M5Zk7uUUM3Uf_1ghGBds/sendDocument "HTTP/1.1 200 OK"
2024-07-13 16:25:07,831 - httpcore.http11 - DEBUG - receive_response_body.started request=<Request [b'POST']>
2024-07-13 16:25:07,831 - httpcore.http11 - DEBUG - receive_response_body.complete
2024-07-13 16:25:07,831 - httpcore.http11 - DEBUG - response_closed.started
2024-07-13 16:25:07,832 - httpcore.http11 - DEBUG - response_closed.complete
2024-07-13 16:25:07,832 - root - INFO - send_with_httpx_postponed_loading took 112.92 seconds
2024-07-13 16:25:07,832 - httpcore.connection - DEBUG - close.started
2024-07-13 16:25:07,832 - httpcore.connection - DEBUG - close.complete
2024-07-13 16:25:07,833 - httpx - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False
2024-07-13 16:25:07,833 - httpx - DEBUG - load_verify_locations cafile='/home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/certifi/cacert.pem'
2024-07-13 16:25:07,838 - root - INFO - send_with_httpx_preponed_loading started
2024-07-13 16:25:08,569 - httpcore.connection - DEBUG - connect_tcp.started host='[redacted]' port=8080 local_address=None timeout=1800 socket_options=None
2024-07-13 16:25:08,588 - httpcore.connection - DEBUG - connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x7111f29fb1f0>
2024-07-13 16:25:08,588 - httpcore.http11 - DEBUG - send_request_headers.started request=<Request [b'POST']>
2024-07-13 16:25:08,589 - httpcore.http11 - DEBUG - send_request_headers.complete
2024-07-13 16:25:08,589 - httpcore.http11 - DEBUG - send_request_body.started request=<Request [b'POST']>

so i removed send_with_httpx_preponed_loading from the loop so i can test send_with_ptb which also stops the program after it started.

see log
2024-07-13 16:32:30,326 - asyncio - DEBUG - Using selector: EpollSelector
2024-07-13 16:32:30,326 - httpx - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False
2024-07-13 16:32:30,328 - httpx - DEBUG - load_verify_locations cafile='/home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/certifi/cacert.pem'
2024-07-13 16:32:30,333 - root - INFO - send_with_httpx_postponed_loading started
2024-07-13 16:32:30,348 - httpcore.connection - DEBUG - connect_tcp.started host='[redacted]' port=8080 local_address=None timeout=1800 socket_options=None
2024-07-13 16:32:30,366 - httpcore.connection - DEBUG - connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x750d28bfb7f0>
2024-07-13 16:32:30,366 - httpcore.http11 - DEBUG - send_request_headers.started request=<Request [b'POST']>
2024-07-13 16:32:30,366 - httpcore.http11 - DEBUG - send_request_headers.complete
2024-07-13 16:32:30,366 - httpcore.http11 - DEBUG - send_request_body.started request=<Request [b'POST']>
2024-07-13 16:32:43,924 - httpcore.http11 - DEBUG - send_request_body.complete
2024-07-13 16:32:43,925 - httpcore.http11 - DEBUG - receive_response_headers.started request=<Request [b'POST']>
2024-07-13 16:34:24,360 - httpcore.http11 - DEBUG - receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Connection', b'keep-alive'), (b'Content-Type', b'application/json'), (b'Content-Length', b'499')])
2024-07-13 16:34:24,361 - httpx - INFO - HTTP Request: POST http://[redacted]:8080/[redacted]/sendDocument "HTTP/1.1 200 OK"
2024-07-13 16:34:24,362 - httpcore.http11 - DEBUG - receive_response_body.started request=<Request [b'POST']>
2024-07-13 16:34:24,362 - httpcore.http11 - DEBUG - receive_response_body.complete
2024-07-13 16:34:24,362 - httpcore.http11 - DEBUG - response_closed.started
2024-07-13 16:34:24,362 - httpcore.http11 - DEBUG - response_closed.complete
2024-07-13 16:34:24,362 - root - INFO - send_with_httpx_postponed_loading took 114.03 seconds
2024-07-13 16:34:24,362 - httpcore.connection - DEBUG - close.started
2024-07-13 16:34:24,363 - httpcore.connection - DEBUG - close.complete
2024-07-13 16:34:24,363 - httpx - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False
2024-07-13 16:34:24,363 - httpx - DEBUG - load_verify_locations cafile='/home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/certifi/cacert.pem'
2024-07-13 16:34:24,368 - httpx - DEBUG - load_ssl_context verify=True cert=None trust_env=True http2=False
2024-07-13 16:34:24,369 - httpx - DEBUG - load_verify_locations cafile='/home/codespace/.cache/pypoetry/virtualenvs/[redacted]/lib/python3.10/site-packages/certifi/cacert.pem'
2024-07-13 16:34:24,374 - telegram.Bot - DEBUG - Calling Bot API endpoint `getMe` with parameters `{}`
2024-07-13 16:34:24,374 - httpcore.connection - DEBUG - connect_tcp.started host='[redacted]' port=8080 local_address=None timeout=5.0 socket_options=None
2024-07-13 16:34:24,392 - httpcore.connection - DEBUG - connect_tcp.complete return_value=<httpcore._backends.anyio.AnyIOStream object at 0x750d28a475b0>
2024-07-13 16:34:24,393 - httpcore.http11 - DEBUG - send_request_headers.started request=<Request [b'POST']>
2024-07-13 16:34:24,393 - httpcore.http11 - DEBUG - send_request_headers.complete
2024-07-13 16:34:24,393 - httpcore.http11 - DEBUG - send_request_body.started request=<Request [b'POST']>
2024-07-13 16:34:24,393 - httpcore.http11 - DEBUG - send_request_body.complete
2024-07-13 16:34:24,393 - httpcore.http11 - DEBUG - receive_response_headers.started request=<Request [b'POST']>
2024-07-13 16:34:24,410 - httpcore.http11 - DEBUG - receive_response_headers.complete return_value=(b'HTTP/1.1', 200, b'OK', [(b'Connection', b'keep-alive'), (b'Content-Type', b'application/json'), (b'Content-Length', b'231')])
2024-07-13 16:34:24,410 - httpx - INFO - HTTP Request: POST http://[redacted]:8080/[redacted]/getMe "HTTP/1.1 200 OK"
2024-07-13 16:34:24,410 - httpcore.http11 - DEBUG - receive_response_body.started request=<Request [b'POST']>
2024-07-13 16:34:24,410 - httpcore.http11 - DEBUG - receive_response_body.complete
2024-07-13 16:34:24,410 - httpcore.http11 - DEBUG - response_closed.started
2024-07-13 16:34:24,411 - httpcore.http11 - DEBUG - response_closed.complete
2024-07-13 16:34:24,411 - telegram.Bot - DEBUG - Call to Bot API endpoint `getMe` finished with return value `{'id': 6859301078, 'is_bot': True, 'first_name': 'My test Bot', 'username': '[redacted]', 'can_join_groups': True, 'can_read_all_group_messages': False, 'supports_inline_queries': False, 'can_connect_to_business': False}`
2024-07-13 16:34:24,411 - root - INFO - send_with_ptb started
2024-07-13 16:34:25,815 - telegram.Bot - DEBUG - Calling Bot API endpoint `sendDocument` with parameters `{'chat_id': '@[redacted]', 'document': <telegram._files.inputfile.InputFile object at 0x750d28a5bf40>, 'disable_content_type_detection': True, 'disable_notification': True, 'caption': 'caption'}`
2024-07-13 16:34:25,816 - httpcore.http11 - DEBUG - send_request_headers.started request=<Request [b'POST']>
2024-07-13 16:34:25,816 - httpcore.http11 - DEBUG - send_request_headers.complete
2024-07-13 16:34:25,816 - httpcore.http11 - DEBUG - send_request_body.started request=<Request [b'POST']>

only send_with_httpx_postponed_loading was able to send the file successfully.

Unfortunately, i can't provide the server log file as my supervisor denied me access to the bot server.
All three functions are able to send files less than 700mb successfully.

@Bibo-Joshi
Copy link
Member

the log indicated that the program stops after send_with_httpx_preponed_loading started
only send_with_httpx_postponed_loading was able to send the file successfully.

Nice :) Then I guess we've found the problem and should now look for a way to allow passing file handlers directly to the networking backend. I guess already makes sense from the viewpoint that you don't want to always load large files into memory, especially if several of these requests are running in parallel.

We'll have to review how the connection between InputFile and BaseRequest currently works and how we can improve/extend that in a backward-compatible way. That'll probably take a bit …

@Bibo-Joshi
Copy link
Member

I looked into it a bit.

Not reading data in InputFile

This is techincally a small problem. We just need to also allow a file handle for the attribute InputFile.input_file_content and adjust the return type of InputFile.field_tuple accordingly. That cascades through RequestParameter.multipart_data and RequestData.multipart_data and ultimately be available in BaseRequest.do_request.

The more difficult parts are the next ones:

User interface

We somehow need to enable the user to decide if postponed reading is required. To clarify: IMO immediately reading the data is still the sane default behavior as we can't know how long a file handle will remain open. It also enables use cases like

with file.open('rb') as file:
    create_task(bot.send_document(file)

# after the context manager the file is closed but the coroutine might not even have started

Hence, I would see this as a more advanced use case, so I'd be okay with not providing too much convenience functionality for it. E.g. we could add a parameter flag read_file_handle(?) to InputFile that determines if file handles should be read or kept as-is. Users would then be required to explicitly pass an InputFile to send_*.
Additional convenience fonutionalty (e.g. corresponding flags in send_*) could be added later on based on demand.

Backward compatibility

This is the most tricky bit IMO. We currently document that RequestData.multipart_data contains only byte payloads and if a user has implemented a custom subclass of BaseRequset this is what they'll be relying on. So simply extending to also include file handles would be a breaking change.

While writing this I noticed that we actually do not currently document the type of RequestData.multipart_data (or any of the other parameters). While we should change that, this does allow us to simply introduce it 🤯 I'd also like to note that even if someone is explicitly relying on this currently, a simple .read() will solve it …

@Poolitzer
Copy link
Member

Backwards compatibility with a note in the release notes should be plenty.

@Bibo-Joshi
Copy link
Member

I've started implementing this in #4388. Haven't gotten around to unit testing yet, but it's already functional on my end.

@daviddanielng could you kindly

  • install that branch via pip install git+https://github.com/python-telegram-bot/python-telegram-bot.git@bug/large-files
  • run the below (reduced) MWE to verify that this now works for you with large files?

Thanks!

import asyncio
import logging
from pathlib import Path
from typing import Union

from telegram import Bot, InputFile

logging.basicConfig(
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s", level=logging.DEBUG
)


async def send_with_ptb(
    token: str,
    base_url: str,
    chat_id: Union[str, int],
    filename: str,
    file_path: Path,
    kwargs: dict[str, Union[str, bool]],
    timeout: int,
):
    async with Bot(
        token=token,
        base_url=base_url,
    ) as bot:
        await bot.send_document(
            chat_id=chat_id,
            document=InputFile(
                obj=file_path.open("rb"), filename=filename, attach=True, read_file_handle=False
            ),
            filename=filename,
            read_timeout=timeout,
            write_timeout=timeout,
            connect_timeout=timeout,
            **kwargs,
        )


async def main():
    token = "TOKEN"
    chat_id = 123
    file_path = Path("tests/data/telegram.mp4")
    filename = "test_file_name.png"
    kwargs = {
        "caption": "caption",
        "disable_content_type_detection": True,
        "disable_notification": True,
    }
    base_url = "https://api.telegram.org/bot"
    timeout = 1800

    for callback in (
        send_with_ptb,
    ):
        try:
            await callback(
                timeout=timeout,
                base_url=base_url,
                token=token,
                chat_id=chat_id,
                filename=filename,
                file_path=file_path,
                kwargs=kwargs,
            )
        except Exception as exc:
            print(f"{callback.__name__} failed: {exc}")


if __name__ == "__main__":
    asyncio.run(main())

@daviddanielng
Copy link
Author

@Bibo-Joshi

I ran it, It worked flawlessly. I noticed that you set the file as Path("/path/to/file") . I was curious so i changed it to "/path/to/file" to see if the file still be uploaded if i passed the path as a string. it didn't work. it stopped with error send_with_ptb failed: 'str' object has no attribute 'open'.
I understand what the error means i just wanted to let you know.

Thanks for the work you put into this library.

@Bibo-Joshi
Copy link
Member

Awesome, then I'll try to get the unit tests up to speed :)

Yes, in my implementation InputFile requires an explicit open file handler, i.e. obj=file_path.open("rb"). Allowing a string file path here is a bit premature IMO.

@github-actions github-actions bot locked and limited conversation to collaborators Aug 10, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants