Doc API

Batch PDF Generation: Generating Hundreds of PDFs at Scale

7 min read

Generating one PDF on demand is easy. Generating 500 PDFs for a monthly payroll run, 10,000 certificates after a conference, or nightly reports for every customer — that's a different problem.

This guide covers patterns for batch PDF generation: concurrency control, storage, queue-based processing, and error handling.

The naive approach and why it fails

The obvious implementation loops over records and generates one PDF at a time:

for (const employee of employees) {
  const pdf = await generatePdf(employee)
  await uploadToS3(pdf, employee.id)
}

At about 1.5s per PDF, 500 employees take 500 × 1.5s = 750s, or 12.5 minutes. And if any single request fails, the loop either aborts on the unhandled error or, if you catch it, carries on with no record of which PDFs actually completed.

Concurrent generation with a queue

Use a concurrency-limited queue to parallelize requests without overloading the API or your own infrastructure:

import PQueue from 'p-queue'
 
interface BatchResult {
  id: string
  success: boolean
  error?: string
}
 
async function generateBatch(
  items: { id: string; html: string }[],
  apiKey: string,
  concurrency = 10
): Promise<BatchResult[]> {
  const queue = new PQueue({ concurrency })
  const results: BatchResult[] = []
 
  await queue.addAll(
    items.map(item => async () => {
      try {
        const response = await fetch('https://api.docapi.co/v1/pdf', {
          method: 'POST',
          headers: {
            'x-api-key': apiKey,
            'Content-Type': 'application/json',
          },
          body: JSON.stringify({
            html: item.html,
            options: { format: 'A4', printBackground: true },
          }),
        })
 
        if (!response.ok) {
          const text = await response.text()
          results.push({ id: item.id, success: false, error: `HTTP ${response.status}: ${text}` })
          return
        }
 
        const buffer = Buffer.from(await response.arrayBuffer())
        await uploadToS3(buffer, `pdfs/${item.id}.pdf`)
        results.push({ id: item.id, success: true })
      } catch (err) {
        const msg = err instanceof Error ? err.message : String(err)
        results.push({ id: item.id, success: false, error: msg })
      }
    })
  )
 
  return results
}

concurrency = 10 means at most 10 requests are in flight at once. If each takes 1.5s and the API holds that latency under parallel load, 500 PDFs finish in about 75 seconds (50 rounds × 1.5s) instead of 12.5 minutes.
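The same arithmetic works for any batch size: with uniform latency, a concurrency-limited queue needs roughly ceil(items / concurrency) rounds. The sketch below is a hypothetical estimateBatchSeconds helper (not part of p-queue or the Doc API). It assumes every request takes the same time and that latency doesn't grow under parallel load, so treat its result as a lower bound.

```typescript
/**
 * Rough lower bound on wall-clock time for a concurrency-limited batch.
 * Hypothetical helper: assumes every request takes `secondsPerPdf`.
 */
function estimateBatchSeconds(
  itemCount: number,
  concurrency: number,
  secondsPerPdf: number,
): number {
  if (concurrency < 1) throw new RangeError('concurrency must be >= 1')
  // Each round runs up to `concurrency` requests in parallel
  const rounds = Math.ceil(itemCount / concurrency)
  return rounds * secondsPerPdf
}

console.log(estimateBatchSeconds(500, 1, 1.5))  // 750 (the naive loop, 12.5 minutes)
console.log(estimateBatchSeconds(500, 10, 1.5)) // 75
```

p-queue starts a new task as soon as a slot frees up rather than in lockstep rounds, but with equal latencies the two give the same total.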

Choosing the right concurrency

The right concurrency depends on your plan limits and PDF complexity:

  • Simple documents (invoices, pay stubs): 10–20 concurrent
  • Complex documents (charts, large images, web fonts): 5–10 concurrent
  • Huge batches (1000+): Use a queue system (see below)
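One way to encode these guidelines is a small lookup the batch runner consults before starting. This is a hypothetical pickConcurrency helper: the starting values (the low end of each range) and the 1,000-item cutoff come from the list above, and the right numbers for your plan are something to measure rather than assume.

```typescript
type DocComplexity = 'simple' | 'complex'

type RunPlan =
  | { mode: 'in-process'; concurrency: number }
  | { mode: 'job-queue' }

// Hypothetical helper encoding the guidelines above; tune against your plan limits.
function pickConcurrency(complexity: DocComplexity, itemCount: number): RunPlan {
  // At 1,000+ items, hand the work to a job queue instead of one process
  if (itemCount >= 1000) return { mode: 'job-queue' }
  // Start at the low end of each range and raise it while error rates stay flat
  return { mode: 'in-process', concurrency: complexity === 'simple' ? 10 : 5 }
}
```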

Monitor the X-Credits-Remaining header to avoid running out mid-batch:

const response = await fetch('https://api.docapi.co/v1/pdf', { ... })
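// `processed` is the number of items this batch has already handled (tracked by your code)
const header = response.headers.get('X-Credits-Remaining')
// A missing header means "unknown", not "plenty": skip the check rather than guess
const remaining = header === null ? Infinity : Number.parseInt(header, 10)
 
if (remaining < items.length - processed) {
  throw new Error(`Insufficient credits: ${remaining} remaining, ${items.length - processed} left to generate`)
}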
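A missing header shouldn't be read as plenty of credits left, and with several requests in flight the count you see already lags behind, so leave some headroom. Below is a minimal sketch of a hypothetical checkCredits helper that treats a missing or non-numeric X-Credits-Remaining value as unknown and lets the caller decide. The header name comes from this article; the helper and its return shape are assumptions. In a p-queue batch, calling pause() (stop starting new tasks) and clear() (drop pending ones) on an insufficient result is gentler than throwing from inside a single task.

```typescript
type CreditCheck =
  | { status: 'ok'; remaining: number }
  | { status: 'insufficient'; remaining: number; needed: number }
  | { status: 'unknown' }

/**
 * Hypothetical helper: compare the X-Credits-Remaining header to the work left.
 * Missing or non-numeric headers yield 'unknown' instead of a made-up number.
 */
function checkCredits(header: string | null, itemsLeft: number): CreditCheck {
  if (header === null) return { status: 'unknown' }
  const remaining = Number.parseInt(header, 10)
  if (Number.isNaN(remaining)) return { status: 'unknown' }
  return remaining >= itemsLeft
    ? { status: 'ok', remaining }
    : { status: 'insufficient', remaining, needed: itemsLeft }
}

// Usage inside a p-queue task (sketch):
// const check = checkCredits(response.headers.get('X-Credits-Remaining'), items.length - processed)
// if (check.status === 'insufficient') { queue.pause(); queue.clear() }
```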

Storing PDFs in S3

For batch jobs, upload each PDF to S3 as soon as it's generated rather than holding every buffer in memory until the batch finishes:

import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
 
const s3 = new S3Client({ region: 'us-east-1' })
 
async function uploadToS3(buffer: Buffer, key: string): Promise<string> {
  await s3.send(new PutObjectCommand({
    Bucket: process.env.PDF_BUCKET!,
    Key: key,
    Body: buffer,
    ContentType: 'application/pdf',
    ServerSideEncryption: 'AES256',
  }))
 
  return `s3://${process.env.PDF_BUCKET}/${key}`
}

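The code above requests SSE-S3 (ServerSideEncryption: 'AES256') explicitly. S3 has applied SSE-S3 to all new objects by default since January 2023, so leaving the flag off for certificates and public reports doesn't store them unencrypted. For pay stubs and other sensitive documents, consider SSE-KMS (ServerSideEncryption: 'aws:kms'), which adds key-level access control and logs key usage in CloudTrail.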

Progress tracking with a database

For long-running batch jobs, track progress so you can resume after failures and show users a progress bar:

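// Schema:
// batch_jobs(id, total, completed, failed, status, created_at, completed_at)
// batch_items(id, batch_id, item_id, status, error, pdf_url)
//
// db, BatchItem, and buildHtml stand in for your own data layer, item type,
// and HTML template renderer; PQueue is imported from 'p-queue' as above.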
 
async function runBatchJob(batchId: string, items: BatchItem[]) {
  await db.batchJobs.update(batchId, { status: 'running' })
 
  const queue = new PQueue({ concurrency: 10 })
 
  await queue.addAll(
    items.map(item => async () => {
      await db.batchItems.update(item.id, { status: 'processing' })
 
      try {
        const html = await buildHtml(item)
        const response = await fetch('https://api.docapi.co/v1/pdf', {
          method: 'POST',
          headers: { 'x-api-key': process.env.DOCAPI_KEY!, 'Content-Type': 'application/json' },
          body: JSON.stringify({ html, options: { format: 'A4', printBackground: true } }),
        })
 
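        // Without this check, an error body would be uploaded as a .pdf and marked done
        if (!response.ok) {
          throw new Error(`HTTP ${response.status}: ${await response.text()}`)
        }
 
        const buffer = Buffer.from(await response.arrayBuffer())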
        const pdfUrl = await uploadToS3(buffer, `batches/${batchId}/${item.id}.pdf`)
 
        await db.batchItems.update(item.id, { status: 'done', pdf_url: pdfUrl })
        await db.batchJobs.increment(batchId, 'completed')
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err)
        await db.batchItems.update(item.id, { status: 'failed', error })
        await db.batchJobs.increment(batchId, 'failed')
      }
    })
  )
 
  const stats = await db.batchJobs.get(batchId)
  await db.batchJobs.update(batchId, {
    status: stats.failed > 0 ? 'partial' : 'complete',
    completed_at: new Date(),
  })
}
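Resuming means rerunning only what didn't finish. A sketch, assuming batch_items rows start as 'pending' (an assumed initial status; the code above only sets 'processing', 'done', and 'failed'). Rows still marked 'processing' were in flight when the previous run died, so their outcome is unknown and they are retried. itemsToResume is a hypothetical helper.

```typescript
type ItemStatus = 'pending' | 'processing' | 'done' | 'failed'

interface BatchItemRow {
  id: string
  status: ItemStatus
}

/**
 * Select the rows a resumed run should process.
 * 'processing' rows were interrupted mid-request, so they are retried;
 * 'failed' rows are retried unless the caller opts out.
 */
function itemsToResume(rows: BatchItemRow[], retryFailed = true): BatchItemRow[] {
  return rows.filter(row => {
    if (row.status === 'done') return false
    if (row.status === 'failed') return retryFailed
    return true // 'pending' or 'processing'
  })
}
```

Pass the result to runBatchJob in place of the full item list. Because runBatchJob only increments counters, reset batch_jobs.failed (or recompute completed and failed from batch_items) before a retry run; otherwise items that failed once and then succeed are counted in both columns.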

Queue-based processing for very large batches

For batches over 1,000 items, don't run everything in a single process — use a job queue.

With BullMQ (Redis-backed):

import { Queue, Worker } from 'bullmq'
import Redis from 'ioredis'
 
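// BullMQ requires maxRetriesPerRequest: null on connections used by Workers
const redis = new Redis(process.env.REDIS_URL!, { maxRetriesPerRequest: null })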
const pdfQueue = new Queue('pdf-generation', { connection: redis })
 
// Producer: enqueue all items
async function enqueueBatch(batchId: string, items: BatchItem[]) {
  await pdfQueue.addBulk(
    items.map(item => ({
      name: 'generate-pdf',
      data: { batchId, itemId: item.id },
      opts: { attempts: 3, backoff: { type: 'exponential', delay: 2000 } },
    }))
  )
}
 
// Worker: process items
const worker = new Worker('pdf-generation', async (job) => {
  const { batchId, itemId } = job.data
  const item = await db.batchItems.get(itemId)
  const html = await buildHtml(item)
 
  const response = await fetch('https://api.docapi.co/v1/pdf', {
    method: 'POST',
    headers: { 'x-api-key': process.env.DOCAPI_KEY!, 'Content-Type': 'application/json' },
    body: JSON.stringify({ html, options: { format: 'A4', printBackground: true } }),
  })
 
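  // Throwing marks the job failed, so BullMQ retries it per the attempts/backoff options
  if (!response.ok) {
    throw new Error(`HTTP ${response.status}: ${await response.text()}`)
  }
 
  const buffer = Buffer.from(await response.arrayBuffer())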
  const pdfUrl = await uploadToS3(buffer, `batches/${batchId}/${itemId}.pdf`)
 
  await db.batchItems.update(itemId, { status: 'done', pdf_url: pdfUrl })
}, {
  connection: redis,
  concurrency: 20,  // 20 concurrent jobs per worker process
})

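With attempts and backoff set as in the producer, BullMQ retries failed jobs for you. A job that exhausts its attempts moves to the queue's failed set (BullMQ has no separate dead-letter queue), where you can inspect or retry it. Rate limiting is opt-in through the Worker's limiter option. Scale by adding more worker processes.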
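For reference, attempts: 3 in the producer means three attempts in total: the first run plus two retries. BullMQ documents its built-in 'exponential' backoff as waiting delay × 2^(attemptsMade − 1) before each retry. The sketch below reproduces that formula (without the optional jitter newer versions support) to show the resulting schedule; it's an illustration under that assumption, not BullMQ's own code.

```typescript
/**
 * Delays (ms) before each retry for backoff of the form delay * 2^(attemptsMade - 1).
 * attemptsMade counts failed attempts so far; there is no retry after the last attempt.
 */
function exponentialRetryDelays(attempts: number, delayMs: number): number[] {
  const delays: number[] = []
  for (let attemptsMade = 1; attemptsMade < attempts; attemptsMade++) {
    delays.push(Math.round(2 ** (attemptsMade - 1) * delayMs))
  }
  return delays
}

console.log(exponentialRetryDelays(3, 2000)) // [ 2000, 4000 ]
```

So with the producer's settings, a PDF that keeps failing lands in the failed set after about 6 seconds of total backoff. That may be too short to ride out a brief API outage; raise delay or attempts if it is.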

AWS Lambda batch handler

For serverless environments, trigger batch generation via a Lambda function invoked by SQS:

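import type { SQSHandler, SQSBatchResponse } from 'aws-lambda'
 
export const handler: SQSHandler = async (event) => {
  // Message IDs handed back to SQS for retry (requires ReportBatchItemFailures)
  const batchItemFailures: SQSBatchResponse['batchItemFailures'] = []
 
  for (const record of event.Records) {
    try {
      const { batchId, itemId } = JSON.parse(record.body)
 
      const item = await db.batchItems.get(itemId)
      const html = await buildHtml(item)
 
      const response = await fetch('https://api.docapi.co/v1/pdf', {
        method: 'POST',
        headers: { 'x-api-key': process.env.DOCAPI_KEY!, 'Content-Type': 'application/json' },
        body: JSON.stringify({ html, options: { format: 'A4', printBackground: true } }),
      })
      if (!response.ok) {
        throw new Error(`HTTP ${response.status}: ${await response.text()}`)
      }
 
      const buffer = Buffer.from(await response.arrayBuffer())
      await uploadToS3(buffer, `batches/${batchId}/${itemId}.pdf`)
      await db.batchItems.update(itemId, { status: 'done' })
    } catch (err) {
      console.error(`Failed to process message ${record.messageId}:`, err)
      // Only this message returns to the queue; the successful ones are deleted
      batchItemFailures.push({ itemIdentifier: record.messageId })
    }
  }
 
  return { batchItemFailures }
}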

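SQS provides retry behavior natively: a message that isn't deleted becomes visible again after the visibility timeout and is redelivered. To cap retries, set maxReceiveCount in the source queue's redrive policy; after that many receives, SQS moves the message to the dead-letter queue. By default, a Lambda invocation that throws returns every message in its batch to the queue, including the ones that succeeded. With ReportBatchItemFailures enabled on the event source mapping, returning the failed message IDs in batchItemFailures retries only those.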

Error handling and retries

Always build retry logic for transient failures (network timeouts, temporary API errors):

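// Thrown for responses that retrying can't fix (e.g. 400 Bad Request)
class NonRetryableError extends Error {}
 
const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms))
 
async function generateWithRetry(html: string, maxRetries = 3): Promise<Buffer> {
  let lastError: Error = new Error('generateWithRetry: no attempts made')
 
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await fetch('https://api.docapi.co/v1/pdf', {
        method: 'POST',
        headers: { 'x-api-key': process.env.DOCAPI_KEY!, 'Content-Type': 'application/json' },
        body: JSON.stringify({ html, options: { format: 'A4', printBackground: true } }),
        signal: AbortSignal.timeout(30_000),  // 30s timeout
      })
 
      if (response.status === 429) {
        // Rate limited: wait Retry-After seconds (default 5), then try again.
        // This still counts as an attempt, so persistent throttling eventually gives up.
        lastError = new Error('HTTP 429: rate limited')
        const retryAfter = Number.parseInt(response.headers.get('Retry-After') ?? '', 10)
        if (attempt < maxRetries) {
          await sleep((Number.isNaN(retryAfter) ? 5 : retryAfter) * 1000)
        }
        continue
      }
 
      if (!response.ok) {
        const message = `HTTP ${response.status}: ${await response.text()}`
        // Other 4xx errors mean the request itself is wrong; retrying won't help
        if (response.status < 500) throw new NonRetryableError(message)
        throw new Error(message)
      }
 
      return Buffer.from(await response.arrayBuffer())
    } catch (err) {
      if (err instanceof NonRetryableError) throw err
      lastError = err instanceof Error ? err : new Error(String(err))
      if (attempt < maxRetries) {
        await sleep(1000 * 2 ** (attempt - 1))  // exponential backoff: 1s, 2s, 4s, ...
      }
    }
  }
 
  throw lastError
}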
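Two refinements are worth considering, both sketched below as hypothetical helpers. First, per RFC 9110, Retry-After may hold either a number of seconds or an HTTP date, and parseInt alone returns NaN for the date form. Second, when many workers are throttled at the same moment, identical backoff delays make them retry in lockstep; "full jitter" (a random delay between 0 and the exponential ceiling) spreads them out. Full jitter is a swapped-in technique here, not something the Doc API is documented to require.

```typescript
/**
 * Convert a Retry-After header value to milliseconds.
 * Accepts delta-seconds ("120") or an HTTP date ("Wed, 21 Oct 2015 07:28:00 GMT").
 * Returns fallbackMs when the header is missing or unparseable.
 */
function parseRetryAfterMs(
  value: string | null,
  fallbackMs: number,
  now: number = Date.now(),
): number {
  if (value === null) return fallbackMs
  const trimmed = value.trim()
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000
  const dateMs = Date.parse(trimmed)
  if (Number.isNaN(dateMs)) return fallbackMs
  // A date in the past means "retry now"
  return Math.max(0, dateMs - now)
}

/**
 * Exponential backoff with full jitter:
 * a random delay in [0, min(capMs, baseMs * 2^(attempt - 1))).
 * `random` is injectable so the behavior can be tested deterministically.
 */
function fullJitterDelayMs(
  attempt: number,
  baseMs = 1000,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** (attempt - 1))
  return Math.floor(random() * ceiling)
}
```

In generateWithRetry, parseRetryAfterMs would replace the parseInt call on the Retry-After header, and fullJitterDelayMs(attempt) would replace the fixed backoff delay.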

Sending batch results to users

After a payroll run, email each employee a link to their pay stub:

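await runBatchJob(batchId, employees)
 
// runBatchJob records each item's outcome in batch_items; email only the successes.
// getSignedS3Url and sendEmail stand in for your own helpers (expiry is in seconds).
for (const employee of employees) {
  const item = await db.batchItems.get(employee.id)
  if (item.status !== 'done') continue
 
  const signedUrl = await getSignedS3Url(`batches/${batchId}/${employee.id}.pdf`, 7 * 24 * 3600)
 
  await sendEmail({
    to: employee.email,
    subject: 'Your pay stub is ready',
    body: `Download your pay stub: ${signedUrl}`,
  })
}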

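Signed S3 URLs let employees download their own document without making your bucket public. With AWS SDK v3, getSignedUrl from @aws-sdk/s3-request-presigner (wrapping a GetObjectCommand) produces them. Seven days (604,800 seconds) is the longest expiry S3 accepts for a SigV4 presigned URL, and a URL signed with temporary credentials, such as a Lambda or ECS task role, stops working when those credentials expire, which can be much sooner.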

Get started

npm install @docapi/sdk p-queue
DOCAPI_KEY=pk_your_key_here

Get a key at docapi.co/signup — 10 free PDFs, no credit card required.
