-
More info: I added a log of the Prisma client metrics after the Prisma query,
then ran the job again. It failed on attempt 1 and passed on attempt 2 (exponential backoff delay of 1 second). On the first attempt, no queries succeeded, so nothing was logged (it timed out on the very first query attempt). On the 2nd attempt, all the queries were made. Here are the metrics after the 1st query of the successful run: first query:
As you can see, it's the 1st connection pool used, nothing is waiting on the pool, and it all runs 1s after the initial failed run. The only place where 2 runs is indicated is in the. The query that was run is the following (the same query that failed to run while waiting for the connection pool). In Prisma it's just a
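For readers who want to check the same thing ("nothing is waiting on the pool"), here is a minimal sketch of summarizing the pool gauges from a metrics snapshot. It assumes the snapshot has the shape returned by Prisma's metrics preview feature (a `gauges` array of `{ key, value }` entries) and uses gauge names from the Prisma metrics docs; `poolSummary` and the sample values are hypothetical and not taken from this thread.

```typescript
// Simplified, assumed shape of a Prisma metrics JSON snapshot.
interface Metric {
  key: string;
  value: number;
}

interface MetricsSnapshot {
  counters: Metric[];
  gauges: Metric[];
}

// Hypothetical helper: pulls the pool-related gauges out of a snapshot.
// A missing gauge is reported as 0.
function poolSummary(snapshot: MetricsSnapshot) {
  const gauge = (key: string): number =>
    snapshot.gauges.find((g) => g.key === key)?.value ?? 0;

  return {
    open: gauge("prisma_pool_connections_open"),
    busy: gauge("prisma_pool_connections_busy"),
    idle: gauge("prisma_pool_connections_idle"),
    waiting: gauge("prisma_client_queries_wait"),
  };
}

// Made-up sample values for illustration only.
const sampleSnapshot: MetricsSnapshot = {
  counters: [{ key: "prisma_client_queries_total", value: 1 }],
  gauges: [
    { key: "prisma_pool_connections_open", value: 1 },
    { key: "prisma_pool_connections_busy", value: 0 },
    { key: "prisma_pool_connections_idle", value: 1 },
    { key: "prisma_client_queries_wait", value: 0 },
  ],
};

const summary = poolSummary(sampleSnapshot);
console.log(summary);
```

A `waiting` value above 0 would mean queries are queued for a connection; in the run described above it was 0.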
-
Upon a little more debugging, I found that another Cloud Run service makes a bunch of db calls at the same time. At the time of the example failure above, 28 db calls were made from that other service. Each call takes 1-7ms, averaging around 2.5ms. None of them fail, and the requests are never stuck waiting as far as I can tell (or if so it's imperceptible; all these queries combined finish in <1s). I can't imagine how this would cause any of the failures above, given that it uses a different connection pool (also with a limit of 5 and a 10s timeout) and nothing fails in that service (nor has it ever, as far as I know). Just calling it out because it's more information.
-
Hi @arnmishra 👋 Thank you for taking the time to share all the details about your issue! This seems very strange indeed but we’re happy to look into this together with you 🙏. Could you try to increase the database
We find this weird because we do not expect a connection pool timeout when there hasn't been any activity. Have you considered increasing the pool_timeout parameter? Can you also check the Cloud SQL Postgres database's performance metrics and limits to see whether you are hitting the maximum connection limit?
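For reference, Prisma reads pool_timeout (in seconds) and connection_limit from the query string of the datasource URL. Below is a minimal sketch of setting both; the helper name `withPoolSettings`, the placeholder URL, and the values 10 and 20 are illustrative assumptions, not settings recommended in this thread.

```typescript
// Hypothetical helper: returns a copy of a PostgreSQL connection URL with
// Prisma's pool parameters set as query-string arguments.
function withPoolSettings(
  databaseUrl: string,
  connectionLimit: number,
  poolTimeoutSeconds: number
): string {
  const url = new URL(databaseUrl);
  url.searchParams.set("connection_limit", String(connectionLimit));
  url.searchParams.set("pool_timeout", String(poolTimeoutSeconds));
  return url.toString();
}

// Placeholder URL and values for illustration only.
const tunedUrl = withPoolSettings(
  "postgresql://user:password@localhost:5432/mydb?schema=public",
  10,
  20
);
console.log(tunedUrl);
```

The resulting string would typically be supplied as the DATABASE_URL environment variable (or whichever variable the schema's datasource block reads).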
-
@ludralph I'm happy to increase the timeout or pool size, but I didn't want to do that blindly, given that it doesn't feel like we should be dealing with these timeouts at all. The operation it keeps timing out on is pretty time sensitive, so just adding more delay felt like a dangerous solution for us in this case. Looking at the Cloud SQL Postgres stats, everything seems relatively standard (no issues with query latency, CPU, etc.). The connections to the db do peak at the time of the failure (from a usual 6-12 connections, it peaks at 13 at the time of the failure). However, if I run Honestly, there is no way for it to get anywhere near that 400 number: even if all 5 Cloud Run services were at max usage, the connection pools only allow 5 connections each, so we'd peak at 25 connections. This screenshot shows the connections to the db, its the
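The arithmetic above can be sketched as follows. It assumes Prisma's documented default pool size of physical CPUs * 2 + 1 (which gives the limit of 5 on 2 cores) and exactly one running instance per Cloud Run service; if a service scales out, each extra instance brings its own pool. The function names are hypothetical.

```typescript
// Prisma's documented default pool size: number of physical CPUs * 2 + 1.
function defaultPoolSize(physicalCpus: number): number {
  return physicalCpus * 2 + 1;
}

// Upper bound on database connections if every pool is fully used.
function maxPooledConnections(poolLimits: number[]): number {
  return poolLimits.reduce((sum, limit) => sum + limit, 0);
}

// 2 cores per container, 5 services, one instance each (assumption).
const perServiceLimit = defaultPoolSize(2);
const connectionCeiling = maxPooledConnections(
  new Array<number>(5).fill(perServiceLimit)
);

console.log(perServiceLimit); // 5
console.log(connectionCeiling); // 25
```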
-
Is there any way to get eyes on this a little more urgently? This is a very risky situation for us, I can't imagine what is going wrong here, and communicating once per day feels like it may take a while. I'm happy to be available later in the day if that helps with timezones.
-
Question
I've read pretty much every Prisma doc, GitHub issue, Q&A discussion, Stack Overflow post, etc. on this, but haven't been able to work out what is causing our issue.
We have been seeing the following error:
Timed out fetching a new connection from the connection pool. More info: http://pris.ly/d/connection-pool (Current connection pool timeout: 10, connection limit: 5)
This is happening in a process running in a Google Cloud Run container. I understand that this means we are running out of available connections. We are using Prisma Client 5 on a Postgres db in Cloud SQL. We have a set of 5 Cloud Run instances that all speak to our Postgres database. All 5 have their own Prisma client with a connection pool at the default limit of 5 (2 cores). So far, all the failures have happened in the same Cloud Run service, which runs async jobs for us. This service runs a small TypeScript app that triggers jobs using BullMQ (https://docs.bullmq.io/).
In the course of 17 attempts to trigger a job in the last 3 days:
(all failures were the same prisma client timeout)
Every time, the query that fails is the very first query made to the db in that container. At the time of the failure, there is no other activity in that container, and only a very light load, if any, on the other 4 Cloud Run services (which all have their own Prisma connection pool). Therefore, I can't imagine what would cause a timeout. I thought it might have to do with cold starts somehow, but we've configured all our Cloud Run containers with a minimum of 1 instance always running.
How to reproduce (optional)
No response
Expected behavior (optional)
No response
Information about Prisma Schema, Client Queries and Environment (optional)
The query it's failing on is a simple
findUniqueOrThrow
Database: PostgreSQL
Node.js version: 18.16
Prisma version (prisma -v): 5.2.0