Production Deployment

Deploy djust with uvicorn, Redis state backend, and Nginx load balancing

Full documentation is on docs.djust.org

This page is a lightweight reference. The complete guide — with tutorials, theming, code examples, and more — lives on our dedicated docs site.

This guide covers deploying djust applications to production with horizontal scaling, Redis state backend, and WebSocket-aware load balancing.

Architecture Overview

                +--------------+
                | Load Balancer|
                |   (Nginx)    |
                +------+-------+
                       |
      +----------------+----------------+
      |                |                |
+-----v-----+   +-----v-----+   +-----v-----+
| Server 1  |   | Server 2  |   | Server 3  |
| (uvicorn) |   | (uvicorn) |   | (uvicorn) |
+-----+-----+   +-----+-----+   +-----+-----+
      |                |                |
      +----------------+----------------+
                       |
                 +-----v-----+
                 |   Redis   |
                 |  (State)  |
                 +-----------+

The diagrammed setup is Tier 1 (pilot / soft-launch, ≤50 concurrent active users). For larger deployments — auto-scaling, dedicated channel-layer Redis, RDS Proxy, CloudFront — see Sizing and Scaling Tiers below.

State Backend Configuration

In-Memory (Development Only)

DJUST_CONFIG = {
    'STATE_BACKEND': 'memory',
    'SESSION_TTL': 3600,
}

Single server only. State lost on restart.

Redis (Production)

import os

DJUST_CONFIG = {
    'STATE_BACKEND': 'redis',
    'REDIS_URL': os.environ.get('REDIS_URL', 'redis://localhost:6379/0'),
    'SESSION_TTL': 7200,  # 2 hours
}

Requires Redis 6.0+ and redis-py:

pip install redis

Deploy-time state invalidation

Each session's RustLiveView is cached in Redis (default TTL 1 hour) and used as the diff baseline on WebSocket reconnect. When you deploy a new release that changes a template's structure, attributes, or whitespace, a reconnecting client's cached pre-deploy view will not match the new render — the resulting patches target text-node positions that no longer exist, the patch fails, recovery HTML may be unavailable on a fresh consumer, and the user is forced through a window.location.reload().

Since v0.9.4 this is handled automatically. Per PR #1367, the Redis cache key includes an 8-hex template-source hash:

djust:state:<session>_liveview_<view_path>[_<query_hash>]_t<template_8hex>

Edit any byte of a primary template, the per-template hash flips, the cache key flips, the next reconnect misses the cache, and a fresh RustLiveView is constructed cleanly with no stale baseline.
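
The key-flip behavior can be sketched in a few lines. This is an illustration only: the hash function (SHA-256 truncated to 8 hex characters) and the helper names `template_hash` and `state_cache_key` are assumptions, since this page documents only the key layout and the 8-hex length.

```python
import hashlib
from typing import Optional


def template_hash(source: str) -> str:
    """Return an 8-hex-character digest of the template source.

    Assumption: SHA-256 truncated to 8 hex chars. The hash djust actually
    uses may differ; only the 8-hex length comes from the docs.
    """
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:8]


def state_cache_key(session: str, view_path: str, template_source: str,
                    query_hash: Optional[str] = None) -> str:
    """Build a key shaped like the documented layout."""
    key = f"djust:state:{session}_liveview_{view_path}"
    if query_hash:
        key += f"_{query_hash}"
    return f"{key}_t{template_hash(template_source)}"


# A one-byte whitespace edit to the primary template yields a different key,
# so the next reconnect misses the old cache entry.
before = state_cache_key("abc123", "myapp.views.Dashboard", "<div>{{ count }}</div>")
after = state_cache_key("abc123", "myapp.views.Dashboard", "<div> {{ count }}</div>")
print(before != after)
```

Because only the primary template's source feeds the hash in this sketch, an edit confined to an included sub-template would leave the key unchanged.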

Operators no longer need to manually rotate the prefix on every deploy. Earlier releases (pre-v0.9.4) of djust required a pattern like:

# settings.py — pre-v0.9.4 only; NO LONGER NEEDED
import os
_BUILD_ID = os.environ.get("BUILD_ID", "dev")[:12]  # truncated git SHA

DJUST_CONFIG = {
    "STATE_BACKEND": "redis",
    "REDIS_URL": "redis://...",
    "REDIS_KEY_PREFIX": f"djust:{_BUILD_ID}:",  # obsolete since v0.9.4
}

Multi-template caveat. The cache key uses the primary template's hash. If you use {% include %} / {% extends %} patterns and edit only a sub-template, the primary's source bytes don't change → the hash doesn't flip → the old key keeps hitting until the existing entries TTL out (default 1 hour). For an immediate invalidation in that edge case, run djust clear --all after the deploy. Stale entries from before the deploy still expire within default_ttl regardless.

Recovery HTML semantics

When a VDOM patch fails on the client, djust falls back to the server-rendered "recovery HTML" snapshot to repair the DOM without forcing a page reload. Two important properties to be aware of in production deployments:

  1. Recovery HTML is per-consumer and one-shot. After it's used once, the consumer resets it (self._recovery_html = None). Any subsequent patch failure within the same WebSocket lifetime falls through to the forced-reload error ("Recovery HTML unavailable — the server may have restarted. A page reload will fix this.").

  2. Fresh consumers start with no recovery state at all. A fresh consumer is spawned on every WebSocket reconnect — including the natural ones (rolling deploys, autoscaling events, network blips). The new consumer has no recovery HTML staged until the next successful render produces one. Any patch failure in the brief window before the first render lands skips the recovery branch entirely and goes straight to the forced reload.
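
The two properties above can be modeled with a toy state holder. This is not djust's consumer code: the class name `RecoveryState` and its methods are hypothetical, and only the `_recovery_html` attribute name comes from the text. Whether a later render re-stages a snapshot after one has been consumed is not specified here, so the sketch exercises only the documented sequences.

```python
from typing import Optional


class RecoveryState:
    """Toy model of the per-consumer, one-shot recovery snapshot."""

    def __init__(self) -> None:
        # A fresh consumer (one per WebSocket connect) has nothing staged.
        self._recovery_html: Optional[str] = None

    def on_render(self, html: str) -> None:
        # A successful render stages a snapshot for later repair.
        self._recovery_html = html

    def on_patch_failure(self) -> str:
        """Return 'recover' if a snapshot is available, else 'reload'."""
        if self._recovery_html is None:
            return "reload"
        self._recovery_html = None  # one-shot: consumed on use
        return "recover"


fresh = RecoveryState()
outcomes = [fresh.on_patch_failure()]      # failure before the first render
fresh.on_render("<div>ok</div>")
outcomes.append(fresh.on_patch_failure())  # snapshot is used once
outcomes.append(fresh.on_patch_failure())  # nothing left to recover from
print(outcomes)  # ['reload', 'recover', 'reload']
```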

Multi-task deployments amplify the user-visible impact. With auto-scaling and rolling-deploy patterns, the proportion of WebSocket frames hitting freshly-spawned consumers is non-trivial. Any patch-failure trigger that exists at all (stale state, structural shift, race) gets channeled into the forced-reload path more often than the per-failure rate would suggest.

Cross-reference. v0.9.4-1's keyed conditional VDOM diff (#1358 / PR #1365) is the architectural escape hatch: it eliminates the most common patch-failure trigger by making {% if %} structural changes diff cleanly via dj-if boundary markers, so the recovery path is exercised much less often in normal operation.

Channel Layer (for cross-process push)

DJUST_STATE_BACKEND and Django Channels' CHANNEL_LAYERS are two separate concerns that both happen to use Redis. Don't conflate them:

| Concern | Setting | What it does |
|---|---|---|
| State backend | DJUST_STATE_BACKEND (or DJUST_CONFIG['STATE_BACKEND']) | Stores per-view state (assigns) so a reconnecting client gets its view back. Single-process traffic. |
| Channel layer | CHANNEL_LAYERS (Django Channels) | Routes cross-process messages; required when ANY view uses push_to_view(), presence, cursor tracking, or DB-change notifications. |

Required configuration any time your app has multiple ASGI processes or uses cross-process push:

# settings/prod.py
import os

CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {"hosts": [os.environ["REDIS_CHANNEL_URL"]]},
    },
}

pip install channels_redis

For small deployments, point REDIS_URL and REDIS_CHANNEL_URL at the same Redis instance — it serves both concerns fine. Separate the instances when:

  • ElastiCache memory utilization climbs past 70% (the channel layer holds short-lived messages; the state backend holds longer-lived per-view state — eviction policies want to differ).
  • You want to scale the channel layer independently (e.g., to a Redis cluster) without touching the state backend.
  • You're auditing for blast radius and want a Redis outage to fail one concern at a time.
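
One way to start on a shared instance and split later without touching code is to let the channel-layer URL fall back to the state-backend URL. This is a sketch under assumptions: the helper `redis_urls` is hypothetical, and the fallback is a convenience that the stricter os.environ["REDIS_CHANNEL_URL"] lookup shown earlier does not provide.

```python
import os
from typing import Dict, Mapping


def redis_urls(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Resolve the state-backend and channel-layer Redis URLs.

    Assumption: REDIS_CHANNEL_URL falls back to REDIS_URL, so a small
    deployment shares one instance until a second variable is set.
    """
    state_url = env.get("REDIS_URL", "redis://localhost:6379/0")
    channel_url = env.get("REDIS_CHANNEL_URL", state_url)
    return {"state": state_url, "channel": channel_url}


urls = redis_urls()

DJUST_CONFIG = {"STATE_BACKEND": "redis", "REDIS_URL": urls["state"]}
CHANNEL_LAYERS = {
    "default": {
        "BACKEND": "channels_redis.core.RedisChannelLayer",
        "CONFIG": {"hosts": [urls["channel"]]},
    },
}
```

Setting REDIS_CHANNEL_URL to a second instance later separates the two concerns with no settings change.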

The InMemoryChannelLayer is development-only — it doesn't cross processes, so multi-worker / multi-server push_to_view silently no-ops.

Redis Setup

Installation

# Ubuntu/Debian
sudo apt update && sudo apt install redis-server
sudo systemctl enable redis-server && sudo systemctl start redis-server

# macOS
brew install redis && brew services start redis

# Docker
docker run -d -p 6379:6379 --name redis redis:7-alpine

Production Configuration

Edit /etc/redis/redis.conf:

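# Loopback only. When app servers run on other hosts (as in the diagram above),
# bind to a private-network interface instead so they can reach Redis.
bind 127.0.0.1 ::1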
requirepass your-strong-password-here

# Persistence
save 900 1
save 300 10
save 60 10000

# Memory management
maxmemory 256mb
maxmemory-policy allkeys-lru

# Disable dangerous commands
rename-command FLUSHDB ""
rename-command FLUSHALL ""

ASGI Server

uvicorn myproject.asgi:application \
    --host 0.0.0.0 \
    --port 8000 \
    --workers 4 \
    --loop uvloop \
    --ws websockets \
    --log-level warning \
    --access-log

Quantified Daphne → Uvicorn benchmark

Concrete numbers from a 1 vCPU / 2 GB Fargate task running a representative downstream-consumer app (mixed workload: health endpoint plus a DB-touching home page), with identical traffic and a single task per ASGI server:

| Endpoint | Daphne | gunicorn -k uvicorn.workers.UvicornWorker -w 2 |
|---|---|---|
| /health/ rps @ c=20 | 18.8 | 120.3 (6.4×) |
| /health/ p99 @ c=20 | 1815 ms | 218 ms (8.3× faster) |
| / (home) rps @ c=10 | 11.3 | 18.5 (1.6×) |

Disclaimer. Numbers are from a 1 vCPU / 2 GB Fargate task running a representative downstream-consumer app. Per-app variance is expected; absolute numbers will differ for your workload, but the relative improvement (~6× rps headroom on health-check endpoints) has been consistent across multiple deployments. The home-page row is a useful indicator that DB-touching pages also gain headroom, just not as dramatically, because the bottleneck shifts from ASGI server overhead to ORM/DB latency.

Gunicorn + Uvicorn Workers

For containerized production deployments (Fargate, GKE, Cloud Run, Fly.io machines, etc.), Gunicorn supervising Uvicorn workers gives you process-based supervision (worker recycling, graceful reloads, memory-based restarts) on top of Uvicorn's async event loop:

gunicorn -k uvicorn.workers.UvicornWorker \
    -w 2 \
    -b 0.0.0.0:8000 \
    --timeout 120 \
    --keep-alive 5 \
    --access-logfile - \
    --error-logfile - \
    myproject.asgi:application

Flag rationale:

  • -w 2 — for ASGI workers, the right rule is cpu_count, not (2*cpu)+1 (that's for synchronous Gunicorn workers). Each Uvicorn worker is itself async-concurrent; oversubscribing thrashes the event loop. On a 1 vCPU container, -w 2 gives you 2× headroom on a single CPU at the cost of context-switching — usually still a win for mixed workloads where one worker can be busy with sync_to_async while the other handles WS frames.
  • --timeout 120 — Gunicorn's default 30s is too short for AI/OCR/upload mutations. Pick a timeout long enough that legitimate slow handlers don't get killed but short enough that runaway tasks get cleaned up. Raise to 300s if you have multi-minute synchronous handlers (consider Celery instead, though).
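  • --keep-alive 5: the value Gunicorn suggests for servers that face clients directly. Behind a load balancer, compare it with the LB's idle timeout (AWS ALB defaults to 60s, Cloudflare to 100s). The mismatch direction matters: the app-side keep-alive should be LONGER than the LB-side idle timeout (for example, --keep-alive 65 behind a default ALB), otherwise the app closes an idle connection the LB still thinks is open and the next request sent on it fails with a 502.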
  • --access-logfile - and --error-logfile - — log to stdout/stderr so the container collector picks them up. No logrotate, no log volume, no permission issues.
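
The same flags can live in a Gunicorn config file, which is plain Python. A minimal sketch: the setting names are Gunicorn's own, while the GUNICORN_KEEPALIVE variable is a hypothetical override added for illustration (WEB_CONCURRENCY is the variable Gunicorn itself consults for its default worker count).

```python
# gunicorn.conf.py: loaded with `gunicorn -c gunicorn.conf.py myproject.asgi:application`
import os

# Uvicorn's async worker, supervised by Gunicorn.
worker_class = "uvicorn.workers.UvicornWorker"

# Equivalent of -w 2; overridable per environment.
workers = int(os.environ.get("WEB_CONCURRENCY", "2"))

# Equivalent of -b 0.0.0.0:8000.
bind = "0.0.0.0:8000"

# Equivalent of --timeout 120 (seconds before a silent worker is killed).
timeout = 120

# Equivalent of --keep-alive 5; GUNICORN_KEEPALIVE is a hypothetical env var.
keepalive = int(os.environ.get("GUNICORN_KEEPALIVE", "5"))

# "-" sends the access log to stdout and the error log to stderr.
accesslog = "-"
errorlog = "-"
```

Keeping the values in a file makes them reviewable alongside the rest of the deployment config instead of being buried in a container entrypoint string.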

When to use Uvicorn standalone (the previous section) vs Gunicorn+Uvicorn:

  • Uvicorn standalone: single-process dev or single-container with N processes via Uvicorn's own --workers flag.
  • Gunicorn+Uvicorn: production. Worker recycling (--max-requests, with --max-requests-jitter to stagger the restarts so workers don't all recycle at once) and graceful reloads on SIGHUP live at the Gunicorn layer; periodic recycling is also the usual way to contain slow memory growth. Most cloud-platform health checks expect a Gunicorn-supervised process tree.

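Worker count formula:

  • For pure-async workloads (-k uvicorn.workers.UvicornWorker): cpu_count as the baseline (e.g., -w 4 on 4 vCPU). The -w 2 on a 1 vCPU container used in this guide is the deliberate exception explained under Flag rationale.
  • For sync workloads (-k sync, not used by djust): (2 x cpu_count) + 1.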

WebSocket per-message compression (permessage-deflate)

VDOM patches are highly compressible: DEFLATE typically reduces wire size by 60-80% for repetitive HTML fragments and JSON patch structures. Both Uvicorn (with the websockets library) and Daphne support the permessage-deflate WebSocket extension out of the box and negotiate it with any modern browser client.

No code change needed in your djust app — the compression is transparent. To verify it's active:

# Confirm the djust config reflects compression state
from djust.config import config
config.get("websocket_compression")  # True by default

Memory tradeoff. Each active connection holds a zlib compression context of roughly 64 KB. For typical deployments (<10k concurrent WebSockets per worker) this is fine: the bandwidth savings dwarf the RSS cost. High-connection-density deployments (100k+ per worker) may want to disable compression:

# settings.py
DJUST_WS_COMPRESSION = False  # disable permessage-deflate negotiation
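
To put the per-connection figure from the memory tradeoff in perspective, a quick estimate helps. The 64 KB value is the rough number quoted above, not a measurement; the helper name is hypothetical.

```python
ZLIB_CONTEXT_BYTES = 64 * 1024  # rough per-connection figure quoted above


def compression_rss_mib(connections: int,
                        per_conn_bytes: int = ZLIB_CONTEXT_BYTES) -> float:
    """Estimated extra resident memory (MiB) held by compression contexts."""
    return connections * per_conn_bytes / (1024 * 1024)


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} connections -> {compression_rss_mib(n):,.1f} MiB")
# 1,000 -> 62.5 MiB; 10,000 -> 625.0 MiB; 100,000 -> 6,250.0 MiB
```

At 100k connections per worker the contexts alone would need about 6 GiB, which is why the high-density case is the one where turning compression off becomes attractive.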

Note that disabling compression in djust's config is advisory — the actual wire-level compression is negotiated by the ASGI server. To enforce the no-compression decision at the server level, pass the appropriate flag to Uvicorn / Daphne (e.g. Uvicorn's --ws-per-message-deflate=false when using websockets). See the Deployment Checklist for the full set of ASGI-server flags.

Do not combine with a compressing CDN. Cloudflare, AWS CloudFront, and similar CDNs will double-compress and burn CPU on both sides. Either turn off compression at the CDN for the /ws/ path, or disable it in djust.

Nginx Configuration

upstream djust_backend {
    server 10.0.1.10:8000;
    server 10.0.1.11:8000;
    server 10.0.1.12:8000;
    ip_hash;  # Sticky sessions for WebSocket
}

server {
    listen 443 ssl http2;
    server_name example.com;

    # WebSocket
    location /ws/ {
        proxy_pass http://djust_backend;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_read_timeout 86400;
    }

    # HTTP
    location / {
        proxy_pass http://djust_backend;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}

Database Connection Pooling

djust apps using Postgres need to manage DB connections carefully — every web process and every Celery worker opens its own connections, and the count multiplies fast. Three layers, in order of effort:

Layer 1: Django connection reuse (CONN_MAX_AGE)

Free, built into Django:

# settings/prod.py
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": ...,
        # Reuse the connection for 60s instead of opening a fresh one per request.
        "CONN_MAX_AGE": 60,
        # Django 4.1+: probe the connection on checkout; reconnect if dead.
        "CONN_HEALTH_CHECKS": True,
    },
}

Sufficient for small deployments (< 10 web processes total). Each process keeps its database connection warm between requests (one per thread), so the request rate amortizes the connect cost.
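
A back-of-the-envelope count shows how quickly connections multiply. This is a rough sketch with stated assumptions: one persistent connection per web worker process and one per Celery execution slot; threaded or async code paths can hold more. Both helper names are hypothetical.

```python
def estimated_connections(web_tasks: int, workers_per_task: int,
                          celery_tasks: int = 0,
                          celery_concurrency: int = 0) -> int:
    """Rough count of persistent Postgres connections the app may hold."""
    return web_tasks * workers_per_task + celery_tasks * celery_concurrency


def fits(max_connections: int, needed: int,
         reserve_fraction: float = 0.25) -> bool:
    """True if `needed` fits after reserving headroom for admin and migrations."""
    return needed <= max_connections * (1 - reserve_fraction)


# Pilot-sized: 2 web tasks x 2 workers, 1 Celery task with 20 gevent slots.
small = estimated_connections(2, 2, 1, 20)   # 24
# Scaled out: 6 web tasks x 2 workers, 4 Celery tasks with 20 slots each.
large = estimated_connections(6, 2, 4, 20)   # 92

print(small, fits(100, small))  # 24 True
print(large, fits(100, large))  # 92 False
```

With a max_connections of 100, the scaled-out shape no longer fits once headroom is reserved, which is the point at which an external pooler earns its keep.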

Layer 2: PgBouncer (cloud-agnostic external pooler)

When CONN_MAX_AGE isn't enough — typically when you have many small containers, or you start hitting Postgres's max_connections ceiling — run PgBouncer in front of Postgres:

# /etc/pgbouncer/pgbouncer.ini
[databases]
* = host=postgres-primary.internal port=5432

[pgbouncer]
listen_port = 6432
listen_addr = 0.0.0.0
auth_type = md5
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

App-side: point DATABASE_URL at the PgBouncer port (6432) instead of Postgres (5432).

Caveat — transaction-pooling mode breaks LISTEN/NOTIFY: PgBouncer transaction mode multiplexes connections across clients between transactions, so LISTEN registrations don't survive. djust's database-change notifications feature uses LISTEN/NOTIFY; if you use that feature, either run PgBouncer in pool_mode = session (less efficient) or run a dedicated direct-connection path for the listener. Same caveat applies to other Postgres features that rely on session-scoped state (advisory locks across statements, cursors, prepared statements).
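
A dedicated direct-connection path can be expressed as a second database alias. This is a sketch under assumptions: the alias name "listener", the environment variable names, and the hostnames are hypothetical, and how djust's notification listener is pointed at a given alias is not covered here. DISABLE_SERVER_SIDE_CURSORS and CONN_MAX_AGE are standard Django database settings.

```python
import os
from typing import Mapping


def build_databases(env: Mapping[str, str] = os.environ) -> dict:
    """Two aliases: pooled traffic via PgBouncer, LISTEN/NOTIFY direct."""
    base = {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": env.get("DB_NAME", "app"),
        "USER": env.get("DB_USER", "app"),
        "PASSWORD": env.get("DB_PASSWORD", ""),
    }
    return {
        "default": {
            **base,
            "HOST": env.get("PGBOUNCER_HOST", "pgbouncer.internal"),
            "PORT": "6432",
            # Server-side cursors hold session state, which transaction
            # pooling does not preserve between transactions.
            "DISABLE_SERVER_SIDE_CURSORS": True,
            # Let PgBouncer own the pooling.
            "CONN_MAX_AGE": 0,
        },
        "listener": {  # hypothetical alias for session-scoped features
            **base,
            "HOST": env.get("DB_HOST", "postgres-primary.internal"),
            "PORT": "5432",
        },
    }


DATABASES = build_databases()
```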

Layer 3: Cloud-managed pooler (AWS RDS Proxy / GCP Cloud SQL Proxy)

Cloud-specific managed alternative to PgBouncer. AWS RDS Proxy example:

# Terraform
resource "aws_db_proxy" "main" {
  name                   = "djust-app-proxy"
  engine_family          = "POSTGRESQL"
  role_arn               = aws_iam_role.rds_proxy.arn
  vpc_subnet_ids         = var.private_subnet_ids
  require_tls            = true

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "REQUIRED"
    secret_arn  = aws_secretsmanager_secret.db.arn
  }
}

Tuning:

  • max_connections_percent = 75: leave 25% of your DB's max_connections for ad-hoc admin connections, migrations, and emergency direct access.
  • idle_client_timeout: match your Django SESSION_COOKIE_AGE. When a logged-out user's session expires, their app-side connection cleans up in sync.
  • require_tls = true: clients must connect to the proxy over TLS; the proxy-to-DB hop runs in the VPC.

When using RDS Proxy, set CONN_MAX_AGE = 0 in Django — let the proxy pool. Double-pooling (Django pool + RDS Proxy pool) defeats the proxy's purpose and can cause stale-connection errors.

When to escalate

  • Tier 1: CONN_MAX_AGE = 60. Sufficient up to ~30 active Aurora connections.
  • Tier 2: PgBouncer or RDS Proxy. Required when max_connections ceiling is in sight or when your container count keeps growing.
  • Tier 3: same poolers, sized up; add Aurora read replicas for read/write split (separate concern; covered in Sizing and Scaling Tiers).

Docker Compose

version: '3.8'

services:
  redis:
    image: redis:7-alpine
    command: redis-server --requirepass ${REDIS_PASSWORD}
    volumes:
      - redis-data:/data

  web:
    build: .
    command: uvicorn myproject.asgi:application --host 0.0.0.0 --port 8000
    environment:
      - REDIS_URL=redis://:${REDIS_PASSWORD}@redis:6379/0
      - SESSION_TTL=7200
    depends_on:
      - redis
    deploy:
      replicas: 3

  nginx:
    image: nginx:alpine
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
    ports:
      - "80:80"
      - "443:443"
    depends_on:
      - web

volumes:
  redis-data:

Celery Integration

Most production djust apps use Celery for background work — long-running mutations, scheduled jobs, OCR/AI calls, email sending, scheduled DB cleanup. djust doesn't require Celery, but if you have it, deploy it correctly.

Why Celery for djust apps

  • Long-running mutations: handlers that need to call slow external APIs (Textract, S3, third-party AI). Better to enqueue and notify the user when done than block the WebSocket event handler.
  • Side-effect isolation: emails, audit logs, third-party webhooks. Failure of these shouldn't fail the user's request.
  • Scheduled jobs: deadline checks, periodic syncs, cleanup tasks via celery beat.

Enqueue from views: transaction.on_commit()

Always enqueue Celery tasks via transaction.on_commit(), never directly:

from django.db import transaction
from .tasks import process_document

@event_handler
def upload_document(self, file):
    doc = Document.objects.create(file=file)
    transaction.on_commit(lambda: process_document.delay(doc.id))

Why: when the handler runs inside a transaction (an atomic block or ATOMIC_REQUESTS), a direct .delay(doc.id) enqueue can fire BEFORE the transaction commits. The Celery worker then picks up the task, runs Document.objects.get(pk=doc.id), and gets DoesNotExist: a race between the web-side write and the worker-side read.
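
The ordering problem can be reproduced without a database. The following is a toy model in plain Python, not Django's implementation: `ToyTransaction` and `worker_lookup` are hypothetical names, and the "worker" is just a function that checks what has been committed.

```python
class ToyTransaction:
    """Plain-Python stand-in for a DB transaction."""

    def __init__(self, committed_rows: set) -> None:
        self._committed = committed_rows
        self._pending_rows = []
        self._callbacks = []

    def create(self, row_id: int) -> None:
        # Uncommitted rows are invisible to other connections.
        self._pending_rows.append(row_id)

    def on_commit(self, callback) -> None:
        # Deferred until the commit has actually happened.
        self._callbacks.append(callback)

    def commit(self) -> None:
        self._committed.update(self._pending_rows)
        for callback in self._callbacks:
            callback()


committed = set()       # what a worker on another connection can see
seen_by_worker = []     # outcome of each worker lookup


def worker_lookup(row_id: int) -> None:
    seen_by_worker.append(row_id in committed)


tx = ToyTransaction(committed)
tx.create(1)
worker_lookup(1)                         # direct enqueue: the worker runs too early
tx.on_commit(lambda: worker_lookup(1))   # deferred enqueue
tx.commit()                              # the callback fires after the write is visible
print(seen_by_worker)  # [False, True]
```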

Broker choice: Redis vs SQS

| Broker | Use case | Pros | Cons |
|---|---|---|---|
| Redis | Self-hosted; small/medium deployments; same Redis already in use | Low ops, fast, in-process at scale | Single point of failure unless clustered; message durability depends on persistence config |
| AWS SQS | AWS-native deployments | Managed, durable, scales horizontally for free, IAM-authed | AWS-specific, slightly higher latency, requires kombu[sqs] |

Redis broker config:

# settings/prod.py
import os

CELERY_BROKER_URL = os.environ["CELERY_BROKER_URL"]  # redis://:pw@host:6379/1
CELERY_RESULT_BACKEND = os.environ["CELERY_RESULT_BACKEND"]  # redis://:pw@host:6379/2

SQS broker config:

# settings/prod.py
if CELERY_BROKER_URL.startswith("sqs://"):
    CELERY_BROKER_TRANSPORT_OPTIONS = {
        "region": os.environ.get("AWS_REGION", "us-east-1"),
        "predefined_queues": {
            "celery": {"url": os.environ["CELERY_SQS_QUEUE_URL"]},
            "default": {"url": os.environ["CELERY_SQS_QUEUE_URL"]},
        },
    }

pip install 'celery[redis]'   # for Redis broker
pip install 'kombu[sqs]'      # for SQS broker (alongside celery)

Pool choice: prefork vs gevent

| Pool | Use case | Concurrency rule |
|---|---|---|
| prefork (default) | CPU-bound tasks, lots of RAM per task, can't run untrusted code in shared process | --concurrency=N, with N = vCPU * 1 (Django bootstrap costs RAM per process) |
| gevent | I/O-bound tasks: HTTP calls, S3, SES, Textract, third-party APIs | --concurrency=N, with N = 20-50 per vCPU for purely I/O |

For djust apps doing lots of network I/O (file uploads, AI/OCR, email), gevent is dramatically more efficient: 20+ in-flight tasks on a single vCPU vs 1 with prefork.

celery -A myproject worker -l info \
    -Q celery,default \
    --concurrency=20 \
    --pool=gevent \
    --without-gossip --without-mingle

pip install gevent
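
The concurrency rules for the two pools reduce to a small helper. This is a sketch with a hypothetical function name; the 20-50 range and the one-per-vCPU rule are the figures given in this section.

```python
def celery_concurrency(vcpus: int, pool: str, io_tasks_per_vcpu: int = 20) -> int:
    """Suggest a --concurrency value for a Celery worker.

    prefork: one process per vCPU.
    gevent:  20-50 greenlets per vCPU for purely I/O-bound work.
    """
    if vcpus < 1:
        raise ValueError("vcpus must be at least 1")
    if pool == "prefork":
        return vcpus
    if pool == "gevent":
        if not 20 <= io_tasks_per_vcpu <= 50:
            raise ValueError("io_tasks_per_vcpu should stay within 20-50")
        return vcpus * io_tasks_per_vcpu
    raise ValueError(f"unsupported pool: {pool!r}")


print(celery_concurrency(1, "gevent"))       # 20
print(celery_concurrency(2, "gevent", 50))   # 100
print(celery_concurrency(4, "prefork"))      # 4
```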

gevent monkey-patch gotcha: gevent patches socket, ssl, time, etc. at import time, and the patches MUST be applied before import django.*. The --pool=gevent flag handles this for the standard worker entrypoint, but if you have a custom startup script that imports Django modules before invoking Celery, you need to call gevent.monkey.patch_all() first explicitly. Symptom of this bug: occasional RecursionError or ssl: WRONG_VERSION_NUMBER from gevent-patched code calling pre-patched primitives.

Beat scheduler is a singleton — never scale it

celery -A myproject beat is the scheduler that fires periodic tasks. Run exactly one instance, ever. Two beat instances = duplicate task firing = duplicate side effects (double emails, double DB writes, double API calls).

celery -A myproject beat -l info --schedule=/tmp/celerybeat-schedule

If you're running on Fargate / ECS / Kubernetes, lock desired_count = 1 (or replicas: 1) in your infrastructure-as-code AND add a lifecycle guard / comment so a future "let's scale this up" change doesn't accidentally double-schedule. Some teams use django-celery-beat's database scheduler with a leader-election lock; that's overkill for most apps — just don't scale.

Worker auto-scaling: scale on queue depth, not CPU

I/O-bound workers using gevent have low CPU even when work is piling up — auto-scaling on CPU sees a flat 5% and never scales out. Scale on queue depth instead:

  • AWS SQS: ApproximateNumberOfMessagesVisible > 50 → scale-out target.
  • Redis broker: LLEN celery (queue list length) > 50 → scale-out target.
  • RabbitMQ: queue messages_ready count.

CPU-bound prefork workers can scale on CPU as usual.
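
The queue-depth rule can be written as a pure decision function. This is a sketch: the function name is hypothetical, the scale-out threshold of 50 comes from the list above, and the scale-in threshold of 5 and the 1-4 worker range are taken from the Tier 2 sizing later in this guide. Sustained-duration windows (60s out, 300s in) are left to the autoscaler and not modeled.

```python
def desired_workers(queue_depth: int, current: int, *,
                    minimum: int = 1, maximum: int = 4,
                    scale_out_above: int = 50,
                    scale_in_below: int = 5) -> int:
    """Return the worker count to run next, stepping by one at a time."""
    if queue_depth > scale_out_above:
        return min(current + 1, maximum)
    if queue_depth < scale_in_below:
        return max(current - 1, minimum)
    return current


# With a Redis broker the depth could be read via redis-py, e.g.
# redis.Redis.from_url(url).llen("celery"); here it is passed in directly.
print(desired_workers(120, current=1))  # 2
print(desired_workers(120, current=4))  # 4 (already at the ceiling)
print(desired_workers(2, current=3))    # 2
print(desired_workers(20, current=3))   # 3 (inside the dead band)
```

Keeping a gap between the scale-out and scale-in thresholds avoids flapping when the depth hovers around a single value.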

Beat schedule example

# settings/prod.py
from celery.schedules import crontab

CELERY_BEAT_SCHEDULE = {
    "cleanup-expired-sessions": {
        "task": "myapp.tasks.cleanup_expired_sessions",
        "schedule": crontab(minute=0, hour="*/2"),  # Every 2 hours
    },
    "sync-third-party-data": {
        "task": "myapp.tasks.sync_third_party",
        "schedule": crontab(minute=15, hour=2),  # Daily 2:15am
    },
}

Session Management

Session TTL Guidelines

| Use Case | Recommended TTL |
|---|---|
| E-commerce | 1-2 hours |
| Admin dashboards | 4-8 hours |
| Public content | 30-60 minutes |
| Real-time apps | 15-30 minutes |

Cleanup for In-Memory Backend

Redis handles expiration automatically. For the memory backend, set up periodic cleanup:

# management/commands/cleanup_sessions.py
from django.core.management.base import BaseCommand
from djust.live_view import cleanup_expired_sessions

class Command(BaseCommand):
    help = 'Clean up expired LiveView sessions'

    def handle(self, *args, **options):
        cleaned = cleanup_expired_sessions(ttl=3600)
        self.stdout.write(f'Cleaned {cleaned} expired sessions')

# Cron: run every hour
0 * * * * cd /path/to/project && python manage.py cleanup_sessions

Health Check Endpoint

from django.http import JsonResponse
from djust.state_backend import get_backend

def health_check(request):
    try:
        stats = get_backend().get_stats()
        return JsonResponse({
            'status': 'healthy',
            'backend': stats['backend'],
            'sessions': stats['total_sessions'],
        })
    except Exception as e:
        return JsonResponse({'status': 'unhealthy', 'error': str(e)}, status=503)

Deploying Behind an L7 Load Balancer (AWS ALB, Cloudflare, Fly.io)

When djust runs behind a trusted L7 load balancer that terminates TLS and health-checks task IPs directly (AWS ALB, Cloudflare, Fly.io, GCP External HTTP(S) LB, etc.), the task's private IP rotates on every redeploy or autoscale event. Enumerating them in ALLOWED_HOSTS is not feasible, and setting ALLOWED_HOSTS = ['*'] normally trips the djust.A010 system check.

djust provides an explicit opt-in for this topology: set both SECURE_PROXY_SSL_HEADER and a new DJUST_TRUSTED_PROXIES list/tuple, and djust.A010 / djust.A011 will recognize that the trusted proxy has already performed Host validation at the edge.

# settings.py — only when you really are behind a trusted L7 proxy

ALLOWED_HOSTS = ['*']  # task IPs rotate; edge proxy performs Host validation

SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')

# Identify the trusted proxy terminating requests. Any non-empty list/tuple
# opts in — contents are informational and can be used by your middleware.
DJUST_TRUSTED_PROXIES = ['aws-alb']  # or ['cloudflare'], ['fly-edge'], etc.

Security warning. Only use this escape hatch if you are actually behind a trusted L7 proxy that performs Host validation at the edge. On a raw VM / bare-metal with no proxy in front, ALLOWED_HOSTS = ['*'] opens you up to Host-header attacks — don't paper over the system check with this setting.

If only one of the two settings is present, djust.A010 / djust.A011 still fire — both are required so a single typo can't accidentally disable the check.
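
The both-settings-required rule can be written out as a predicate. This illustrates the behavior described above and is not djust's actual system-check code; the function name and the plain-dict settings argument are assumptions.

```python
def trusted_proxy_opt_in(settings: dict) -> bool:
    """True only when both opt-in settings are present and usable."""
    header = settings.get("SECURE_PROXY_SSL_HEADER")
    proxies = settings.get("DJUST_TRUSTED_PROXIES")
    has_header = bool(header)
    # Any non-empty list or tuple opts in; other types do not.
    has_proxies = isinstance(proxies, (list, tuple)) and len(proxies) > 0
    return has_header and has_proxies


both = {
    "SECURE_PROXY_SSL_HEADER": ("HTTP_X_FORWARDED_PROTO", "https"),
    "DJUST_TRUSTED_PROXIES": ["aws-alb"],
}
only_header = {"SECURE_PROXY_SSL_HEADER": ("HTTP_X_FORWARDED_PROTO", "https")}
empty_list = {**both, "DJUST_TRUSTED_PROXIES": []}

print(trusted_proxy_opt_in(both))         # True
print(trusted_proxy_opt_in(only_header))  # False
print(trusted_proxy_opt_in(empty_list))   # False
```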

WebSocket stickiness on AWS ALB

ALB stickiness keeps a client's WebSocket connection pinned to the same task across reconnects. The literal solution is app_cookie stickiness with an application-set cookie, but the simpler shortcut is to use Django's existing sessionid cookie:

# Terraform — ALB target group
stickiness {
  type            = "app_cookie"
  cookie_name     = "sessionid"
  cookie_duration = 86400  # 24h
  enabled         = true
}

How it works:

  1. User loads any page → Django's SessionMiddleware sets sessionid on the response.
  2. Browser opens WebSocket → upgrade request includes the sessionid cookie.
  3. ALB routes the upgrade by sessionid hash → pinned to the same task.
  4. If the task dies and the client reconnects, ALB picks a healthy task and the cookie now hashes to the new one; state survives via the Redis state backend.

Caveat: pages that open a WebSocket WITHOUT a prior session-creating request won't have the cookie set yet → the first WS connect goes to whichever target ALB picks. After that, subsequent reconnects have sessionid and pin correctly. This is fine when SessionMiddleware is in MIDDLEWARE (default for any Django app); if you've removed it, you'll want to set a fresh cookie on the WS upgrade response yourself.

For higher-stakes setups (financial, healthcare), a dedicated NLB in front of Daphne/Uvicorn for the WS path with source-IP stickiness gives stronger guarantees — see Sizing and Scaling Tiers Tier 3.

Static and Media Files

For containerized deployments, do NOT serve /static/ and /media/ from the ASGI server in production. Each request consumes worker capacity that should be reserved for live-view traffic.

Cloud-agnostic options

| Option | Use case |
|---|---|
| WhiteNoise | Small deployments, single container, no CDN budget |
| Nginx in front | Self-hosted bare-metal / VM |
| S3 + CloudFront | AWS deployments |
| GCS + Cloud CDN | GCP |
| Cloudflare R2 + Cloudflare CDN | Multi-cloud / cost-sensitive |

Cloud storage configuration (S3 example)

# settings/prod.py
import os

AWS_STORAGE_BUCKET_NAME = os.environ["AWS_STORAGE_BUCKET_NAME"]

STORAGES = {
    "default": {
        "BACKEND": "storages.backends.s3boto3.S3Boto3Storage",
        "OPTIONS": {
            "bucket_name": AWS_STORAGE_BUCKET_NAME,
            "region_name": os.environ.get("AWS_REGION", "us-east-1"),
            "default_acl": "private",       # User uploads — never public
            "file_overwrite": False,
            "querystring_auth": True,        # Presigned URLs for media
            "querystring_expire": 3600,
        },
    },
    "staticfiles": {
        "BACKEND": "storages.backends.s3boto3.S3StaticStorage",
        "OPTIONS": {
            "bucket_name": AWS_STORAGE_BUCKET_NAME + "-static",
            "region_name": os.environ.get("AWS_REGION", "us-east-1"),
            # Static files are public, long-cached
        },
    },
}

STATIC_URL = f"https://{os.environ['CDN_DOMAIN']}/static/"

pip install 'django-storages[s3]'

Offloading static-file serving from ASGI

When a CDN serves /static/ directly, your ASGI server doesn't need to. Set:

# Environment variable
ASGI_SERVE_STATIC=False

And in asgi.py, gate the static-files handler on this:

import os
from django.core.asgi import get_asgi_application
from django.contrib.staticfiles.handlers import ASGIStaticFilesHandler
from channels.routing import ProtocolTypeRouter, URLRouter

django_asgi_app = get_asgi_application()

if os.environ.get("ASGI_SERVE_STATIC", "true").lower() in ("true", "1", "yes"):
    http_app = ASGIStaticFilesHandler(django_asgi_app)
else:
    http_app = django_asgi_app  # CDN serves static; ASGI handles only Django views

application = ProtocolTypeRouter({
    "http": http_app,
    "websocket": ...,
})

CloudFront cache behavior split

Configure two cache behaviors on your CloudFront distribution:

  • /static/*: long TTL (1 year), versioned via ManifestStaticFilesStorage hashes; static assets never change once deployed.
  • /media/*: Cache-Control: no-cache, presigned-URL passthrough; media is per-user and presigned URLs already enforce expiry.

Don't combine /static/ and /media/ under one cache behavior — the long-TTL bucket would cache user-private media URLs.
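
The split can be expressed as a path-to-policy mapping, for instance when deciding which Cache-Control header an origin should emit. This is a sketch: the function name is hypothetical, and the "public" and "immutable" directives on static assets are assumptions added for illustration; only the one-year TTL and the no-cache policy for media come from the text above.

```python
from typing import Optional

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60  # 31,536,000


def cache_control_for(path: str) -> Optional[str]:
    """Return the Cache-Control value for a request path, if one applies."""
    if path.startswith("/static/"):
        # Hashed filenames never change once deployed, so cache them hard.
        return f"public, max-age={ONE_YEAR_SECONDS}, immutable"
    if path.startswith("/media/"):
        # Per-user content behind presigned URLs: never serve from cache blindly.
        return "no-cache"
    return None  # dynamic views decide for themselves


print(cache_control_for("/static/css/app.3f2a1b.css"))
print(cache_control_for("/media/uploads/report.pdf"))
print(cache_control_for("/dashboard/"))
```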

Deployment Checklist

Production checklist (one-page recipe)

The 8-line copy-pasteable recipe. Each line links to the relevant subsection of this guide for rationale and config:

☐ ASGI server: gunicorn -k uvicorn.workers.UvicornWorker -w <cpu_count> (-w 2 on a 1 vCPU task)
☐ Channel layer: channels_redis.core.RedisChannelLayer (not InMemoryChannelLayer)
☐ State backend: DJUST_CONFIG["STATE_BACKEND"] = "redis"
☐ State key invalidation: auto-derived from template hash (no env var needed since v0.9.4)
☐ ALB sticky sessions: app_cookie on "sessionid"
☐ CONN_MAX_AGE = 60, CONN_HEALTH_CHECKS = True
☐ App Auto Scaling registered; aws_ecs_service has lifecycle ignore_changes = [desired_count]
☐ Celery: --pool=gevent --concurrency=N for I/O work; queue-depth autoscaling

Where each line is covered:

Infrastructure

  • [ ] Redis 6.0+ installed with password protection
  • [ ] Redis bound to private network only
  • [ ] STATE_BACKEND='redis' in production settings
  • [ ] Sensitive config in environment variables
  • [ ] ASGI server (uvicorn) with multiple workers
  • [ ] Nginx with WebSocket proxy and sticky sessions
  • [ ] HTTPS enabled for all connections
  • [ ] proxy_read_timeout 86400 set for WebSocket routes

Application

  • [ ] Session cleanup configured (cron or Celery)
  • [ ] Health check endpoint added
  • [ ] Logging configured for djust.state_backend
  • [ ] CSRF protection enabled
  • [ ] SESSION_TTL tuned for your use case

Verification

  • [ ] Load test with concurrent WebSocket connections
  • [ ] Verify state sharing across multiple servers
  • [ ] Test reconnection after server restart
  • [ ] Monitor Redis memory under load
  • [ ] Confirm browser back/forward works after navigation

Sizing and Scaling Tiers

Three tiers indexed to concurrent active users. The numbers are heuristics for typical mixed-workload djust apps; CPU-heavy apps (large VDOM diffs, complex template rendering) and I/O-heavy apps (lots of push_to_view, presence, third-party APIs) shift the curves. Use these as a starting point and adjust based on your own load testing.

Don't pre-build for higher tiers — usage data should justify each escalation. Premature scaling adds cost and operational surface area without improving the user experience.

Tier 1: Pilot / soft-launch (≤50 concurrent active users)

| Component | Sizing |
|---|---|
| Web | 2 instances, 1 vCPU / 2 GB each, gunicorn -w 2 |
| Worker (if Celery) | 1 instance, 1 vCPU / 2 GB, --concurrency=20 --pool=gevent for I/O |
| Beat (if Celery) | 1 instance (singleton), 0.25 vCPU / 0.5 GB |
| Redis | 1 instance for state + channel layer (e.g., cache.t3.micro on AWS) |
| Database | Standard managed Postgres, no proxy, CONN_MAX_AGE=60 |
| Static/media | WhiteNoise OR Nginx OR small CDN |

Two web instances buy you HA + zero-downtime deploys. Single Redis instance is fine.

Tier 2: Production launch (50-500 concurrent active users)

Component | Sizing
Web | Auto-scale 2-6 instances on CPU 50% target; cooldown 60s scale-out / 300s scale-in
Worker | Auto-scale 1-4 on queue depth (NOT CPU; workers are I/O-bound)
Beat | Stays at 1
Redis | Either upsize the single instance (e.g., cache.t3.small) OR separate state vs channel-layer instances
Database | Add PgBouncer or RDS Proxy; set CONN_MAX_AGE=0 if pooling externally
Static/media | CDN required (CloudFront / Cloud CDN / Cloudflare); set ASGI_SERVE_STATIC=False
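
The database rows in Tier 1 and Tier 2 differ only in who owns connection pooling. A minimal sketch of that switch follows, assuming a hypothetical helper named database_settings and hypothetical DB_NAME / DB_HOST environment variables; CONN_MAX_AGE and CONN_HEALTH_CHECKS are standard Django database options.

```python
import os


def database_settings(external_pooler: bool) -> dict:
    """Build a Django DATABASES dict for Tier 1 or Tier 2.

    external_pooler=False: Tier 1, Django keeps each connection for 60 s.
    external_pooler=True:  Tier 2, PgBouncer / RDS Proxy owns pooling,
                           so Django closes its connection after each request.
    """
    return {
        "default": {
            "ENGINE": "django.db.backends.postgresql",
            # Hypothetical env var names; use whatever your deployment provides.
            "NAME": os.environ.get("DB_NAME", "app"),
            "HOST": os.environ.get("DB_HOST", "localhost"),
            "CONN_MAX_AGE": 0 if external_pooler else 60,
            # Health checks only matter for persistent connections.
            "CONN_HEALTH_CHECKS": True,
        }
    }


# In settings/prod.py (Tier 1):
DATABASES = database_settings(external_pooler=False)
```

When a pooler is added, point DB_HOST at the pooler endpoint and flip external_pooler to True; nothing else in the settings needs to change.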

Auto-scaling triggers (AWS CloudWatch examples; equivalent metrics exist on every cloud):

  • Web CPUUtilization > 50% for 60s → scale out by 1
  • Web CPUUtilization < 30% for 300s → scale in by 1
  • Worker ApproximateNumberOfMessagesVisible > 50 for 60s → scale out by 1
  • Worker ApproximateNumberOfMessagesVisible < 5 for 300s → scale in by 1
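
The four triggers above share one shape: a threshold that must be breached for a sustained period. The sketch below models that logic in plain Python so the thresholds can be checked against recorded metrics before they are wired into a cloud alarm. ScalingRule, desired_change, and the 60-second sample period are illustrative assumptions, not part of djust or any AWS API.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ScalingRule:
    scale_out_above: float  # breach level that triggers scale-out
    scale_out_after_s: int  # how long the breach must last
    scale_in_below: float   # level below which we scale in
    scale_in_after_s: int   # how long the metric must stay low


# Thresholds copied from the trigger list above.
WEB_CPU = ScalingRule(50, 60, 30, 300)
WORKER_QUEUE = ScalingRule(50, 60, 5, 300)


def desired_change(rule: ScalingRule, values: Sequence[float], period_s: int = 60) -> int:
    """Return +1 (scale out), -1 (scale in) or 0 for a metric series.

    `values` holds one sample per `period_s` seconds, newest last.
    """

    def sustained(predicate, duration_s: int) -> bool:
        needed = max(1, duration_s // period_s)
        recent = values[-needed:]
        # Not enough history means the condition is not yet sustained.
        return len(recent) == needed and all(predicate(v) for v in recent)

    if sustained(lambda v: v > rule.scale_out_above, rule.scale_out_after_s):
        return 1
    if sustained(lambda v: v < rule.scale_in_below, rule.scale_in_after_s):
        return -1
    return 0
```

Scale-out is checked first, so a single hot sample wins over a long quiet stretch, which matches the asymmetric cooldowns (fast out, slow in) in the Tier 2 table.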

Tier 3: High-scale (>500 concurrent active users)

Component | Sizing
WebSocket transport | Dedicated NLB in front of ASGI for WS-only path; ALB stays for HTTP
Redis | ElastiCache cluster mode (3+ shards) with separate state vs channel-layer instances
Database | Aurora read replica(s) + read/write split via DATABASE_ROUTERS
Caching | Application-level view cache for hot pages (django.core.cache Redis-backed)
Search | Move full-text search out of Postgres into Elasticsearch / OpenSearch

Tier 3 is genuinely different — the architecture changes, not just the numbers.
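
For the read/write split in the Database row, Django only requires a plain class with four router methods. The following is a minimal sketch, assuming a primary alias "default" and two replica aliases "replica_1" and "replica_2" that would have to exist in DATABASES; the class name and aliases are hypothetical.

```python
import random


class ReadReplicaRouter:
    """Send reads to a replica and writes to the primary.

    The aliases below are assumptions; they must match keys in DATABASES.
    """

    primary = "default"
    replicas = ["replica_1", "replica_2"]

    def db_for_read(self, model, **hints):
        # Spread reads across replicas at random.
        return random.choice(self.replicas)

    def db_for_write(self, model, **hints):
        return self.primary

    def allow_relation(self, obj1, obj2, **hints):
        # Every alias points at the same data, so relations are always allowed.
        return True

    def allow_migrate(self, db, app_label, model_name=None, **hints):
        # Migrate the primary only; replicas receive schema changes via replication.
        return db == self.primary
```

It would be enabled with DATABASE_ROUTERS = ["myproject.routers.ReadReplicaRouter"] (module path hypothetical). Replicas lag the primary slightly, so a read issued immediately after a write may return stale data unless that query is pinned to the primary.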

When to escalate

From | Trigger
Tier 1 → 2 | Web CPU > 60% sustained for an hour, OR Aurora active connections > 30, OR ALB target response p95 > 2s, OR memory > 80%
Tier 2 → 3 | Web tasks pegged at max for >2 hours, OR Aurora ACUs sustained > 8, OR ElastiCache memory > 70%, OR ALB request rate > 5k req/s

Set CloudWatch alarms (or equivalents) for each trigger. Don't escalate on a hunch — wait for data.
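
To make "wait for data" concrete, the escalation table can be encoded as thresholds and evaluated against a metrics snapshot. This is a sketch only: the metric key names are invented for illustration, and "sustained" is modelled as the minimum value over the stated window, which is an assumption about how the triggers should be read.

```python
# Thresholds from the escalation table. Keys are hypothetical metric names.
TIER_1_TO_2 = {
    "web_cpu_pct_1h_min": 60.0,     # CPU never dropped below this for an hour
    "db_active_connections": 30,
    "alb_p95_response_s": 2.0,
    "memory_pct": 80.0,
}

TIER_2_TO_3 = {
    "hours_at_max_web_tasks": 2,
    "aurora_acus_sustained": 8,
    "elasticache_memory_pct": 70.0,
    "alb_requests_per_s": 5000,
}


def breached(thresholds: dict, metrics: dict) -> list:
    """Return the names of triggers whose threshold is exceeded.

    A metric missing from `metrics` is treated as not breached.
    Any non-empty result justifies escalation, since the triggers are OR-ed.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )


snapshot = {"web_cpu_pct_1h_min": 65.0, "memory_pct": 40.0}
reasons = breached(TIER_1_TO_2, snapshot)
```

Returning the list of breached triggers rather than a boolean keeps a record of why an escalation was justified.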

What's Already Production-Ready in djust

Anti-recommendation list. Things you should NOT re-evaluate on every deployment — they're canonical patterns built into the framework:

  • djust state in Redis is the canonical multi-server pattern (DJUST_CONFIG["STATE_BACKEND"] = "redis", with REDIS_URL set to redis://...). View state survives container replacement; reconnects within SESSION_TTL get the user's view back.
  • channels_redis is required (not optional) when any view uses push_to_view, presence, cursor tracking, or any cross-process feature. Don't try to make it work with InMemoryChannelLayer in production.
  • sync_to_async for ORM in event handlers is built into the @event_handler decorator — no manual wrapping needed in your view code.
  • transaction.on_commit() for Celery enqueue is the right pattern; never .delay(...) directly from a view.
  • WebSocket Origin check (check_origin) is on by default since v0.4.1 (CSWSH protection). Don't disable.
  • HSTS preload, secure cookies, CSP nonce — all covered by djust's recommended settings/prod.py. Disabling for "convenience" opens real attack surface.
  • @event_handler decorator validates handler names + applies rate limits. Don't bypass with raw consumer methods.
  • VDOM diffing in Rust — there's nothing to "tune". The diff algorithm is O(N log N) keyed-LIS reconciliation; it's already as fast as you can reasonably get without compromising correctness.

If you find yourself building infrastructure to work around any of these (e.g., a custom cross-process push mechanism that bypasses channels_redis), step back and confirm the canonical path doesn't already do what you need.

Troubleshooting

Redis connection errors: Verify Redis is running (redis-cli ping), check URL and password, confirm firewall rules.

Session not found after restart: Ensure STATE_BACKEND='redis' and Redis persistence is enabled (save directives in redis.conf).

Memory issues: Increase maxmemory, enable allkeys-lru eviction, reduce SESSION_TTL, run cleanup more frequently.

Serialization errors: Ensure djust version matches across all servers, clear Redis cache, and verify Rust extension is compiled for the correct Python version.