Published on

Long-running processes in Serverless Environments

Authors

Last week I launched Gumcrm, and I wanted to share some issues I had with long-running functions on Vercel and Supabase.

SaasRock v0.7 -Building a Low-code CRUD generator on steroids for SaaS apps

Requirement - 25,128 DB calls under 1 min

My app syncs Gumroad sales and contacts. Since I'm using my SaaS boilerplate, every sale row needs a lot of Database queries:

  1. Insert sale row
  2. Check if the current subscription can add another contact
  3. Check if the current user has entity.contacts.create
  4. Grab the next contact folio (CTC-0001, CTC-0002...)
  5. Insert/update contact row
  6. Insert/update dynamic row values (custom properties)
  7. Insert log
  8. Call events.rows.created webhook event
  9. Insert company if any (repeats 2-8 steps but for company)
  10. Insert contact tags based on sale (churned, active....)
  11. Insert entity.contacts.tags if didn't exist (for filtering/reporting...)

18 database calls for each sale. I have 1,396 gumroad "sales" (some are $0), that's 1,396 x 18 = 25,128 DB calls.

Problem #1 - Database (Supabase) timeout

On local development (M1 + postgres.app), it took 15 seconds to insert those 1k+ sales + contacts + companies.

On "launch day" (last Friday), I was ready to test in production (Supabase), but the first problem I encountered was:

Error: Timed out fetching a new connection from the pool. Please consider reducing the number of requests or increasing the `connection_limit` parameter (https://www.prisma.io/docs/concepts/components/prisma-client/connection-management#connection-pool). Current limit: 10.

I thought doing what the error message said would fix it:

  • Set connection_limit to 0, 1, 2, 5, 10, 20...
  • Set pool_timeout to 0, 1, 2, 5, 10, 20...

But this led to the next problem.

Problem #2 - Host (Vercel) timeout

It seemed like I "fixed" the problem... connection pool now waited for all incoming DB calls. But now I got a Vercel timeout error:

504 Error 'FUNCTION_INVOCATION_TIMEOUT'.

After a bit of research, I remembered that Vercel's Hobby plans allow for 10-second functions and Pro plans give 60-second functions, but my function lasted more than that, and +60s-functions is only for Enterprise customers.

I tried everything:

  • Remove logs, webhook calls, company inserts...
  • Use Prisma.createMany function (does not return created IDs)
  • Go through each line of code to fix "N+1 select" problems
  • Played more with connection_limit and pool_timeout

Commits

I was really worried that my boilerplate was not production-ready, but then I thought of something in the shower at 4 am.

Solution - Batches of 250 rows

It may seem obvious now, but the solution was simple:

Send batches of 250 rows (yes, I tried batches of 100, 200, 500, 1,000...).

This is how it looks now on development (15s) and production (55s).

If you liked this post, follow me on Twitter or subscribe to the Newsletter for more 😃