
Real-time coordination for marketplaces: Socket.IO patterns that scale

Socket.IO patterns for production marketplaces — rooms per listing, auth middleware on handshake, Redis adapter, sticky sessions, and the gotchas that show up at scale.

Marketplaces are coordination problems disguised as web apps. Two users need to see the same booking state within the same second, a listing's availability has to update before a second buyer commits, chat messages have to arrive in order, and typing indicators have to not feel broken. Socket.IO handles the transport reasonably well out of the box, but the patterns that work on one server fall apart on four. Here are the Socket.IO patterns we actually ship for production marketplaces — rooms, authentication, the Redis adapter, sticky sessions — and the gotchas that appear the moment horizontal scaling becomes non-optional.

Rooms as the primary addressing primitive

A room is Socket.IO's way of saying broadcast-to-a-subset. Rooms are cheap, ephemeral, and the right abstraction for almost every marketplace coordination problem. The rule we follow is: one room per coordinatable entity. A listing page is a room, a booking thread is a room, a user's personal notification stream is a room, an admin feed is a room. When a listing's price changes, the server emits to the listing's room and every client currently watching that listing sees it. When a buyer messages a seller, both sockets join the conversation room and messages fan out from there.

  • listing:{listingId} — every client currently viewing a listing page joins on mount, leaves on unmount.
  • booking:{bookingId} — buyer, seller, and any admins on a dispute join for the lifetime of the booking.
  • user:{userId} — personal notification stream; a user can have multiple tabs open, all in the same room.
  • presence:{scope} — optional, if the marketplace shows online/offline state per seller.
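
Since room names are plain strings assembled in several places, a small shared builder keeps emit sites and join handlers from drifting — a minimal sketch (the module itself is illustrative, not part of Socket.IO):

```typescript
// Room-name builders: one shared module so server emitters and join
// handlers construct identical strings for the conventions above.
const rooms = {
  listing: (listingId: string) => `listing:${listingId}`,
  booking: (bookingId: string) => `booking:${bookingId}`,
  user: (userId: string) => `user:${userId}`,
  presence: (scope: string) => `presence:${scope}`,
};
```

The server then writes io.to(rooms.listing(id)).emit(...) instead of interpolating strings inline at every call site.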

Authentication on the handshake

Authenticate on connect, not on the first message. Socket.IO's middleware runs during the handshake, before any event handlers fire, which means an unauthenticated socket never gets attached to any room or emitter. The client passes a short-lived JWT in the auth field of the connection options; the server verifies it in middleware and attaches the decoded user to the socket instance so downstream handlers can trust it without re-checking. Session cookies also work for first-party apps; the pattern is identical, just swap the verification step.

// server/socket.ts — auth middleware + room setup
import { createServer } from "http";
import { Server } from "socket.io";
import { createAdapter } from "@socket.io/redis-adapter";
import { createClient } from "redis";
import jwt from "jsonwebtoken";

const httpServer = createServer();
const io = new Server(httpServer, {
  cors: { origin: process.env.APP_URL, credentials: true },
});

// Redis adapter — required once more than one Node process is running
const pub = createClient({ url: process.env.REDIS_URL });
const sub = pub.duplicate();
await Promise.all([pub.connect(), sub.connect()]);
io.adapter(createAdapter(pub, sub));

// Auth middleware runs on every handshake, before any event fires
io.use(async (socket, next) => {
  try {
    const token = socket.handshake.auth?.token;
    if (!token) return next(new Error("unauthorized"));
    const payload = jwt.verify(token, process.env.JWT_SECRET!) as {
      sub: string;
      role: "buyer" | "seller" | "admin";
    };
    socket.data.userId = payload.sub;
    socket.data.role = payload.role;
    next();
  } catch {
    next(new Error("unauthorized"));
  }
});

io.on("connection", (socket) => {
  // Auto-join the user's personal room for notifications
  socket.join(`user:${socket.data.userId}`);

  socket.on("listing:watch", async (listingId: string) => {
    // Authorize — does this user have the right to watch this listing?
    // (Public listings might skip this; private ones shouldn't.)
    socket.join(`listing:${listingId}`);
  });

  socket.on("listing:unwatch", (listingId: string) => {
    socket.leave(`listing:${listingId}`);
  });

  socket.on("booking:join", async (bookingId: string) => {
    const allowed = await canAccessBooking(socket.data.userId, bookingId);
    if (!allowed) return;
    socket.join(`booking:${bookingId}`);
  });

  socket.on("message:send", async ({ bookingId, body }, ack) => {
    // bookingId is client-supplied — authorize before touching state
    const allowed = await canAccessBooking(socket.data.userId, bookingId);
    if (!allowed) return ack?.({ error: "forbidden" });
    const message = await persistMessage({
      bookingId,
      senderId: socket.data.userId,
      body,
    });
    io.to(`booking:${bookingId}`).emit("message:new", message);
    // Ack returns the persisted message so the sender can reconcile its UI
    ack?.({ id: message.id });
  });
});

Don't authorize on the handshake alone. Handshake auth tells you who the user is; every room join still needs a per-room authorization check because room names are client-supplied strings. A malicious client can try to join booking:123 whether or not they're a party to it.
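
What that per-room check looks like depends on the data model; as a sketch, canAccessBooking might verify the user is a party to the booking. The in-memory map here stands in for a real database query:

```typescript
type Booking = { buyerId: string; sellerId: string };

// Illustrative store — a real implementation would query the bookings table.
const bookings = new Map<string, Booking>([
  ["b1", { buyerId: "u1", sellerId: "u2" }],
]);

async function canAccessBooking(
  userId: string,
  bookingId: string,
  role?: string
): Promise<boolean> {
  if (role === "admin") return true; // admins join dispute threads
  const booking = bookings.get(bookingId);
  if (!booking) return false;
  return booking.buyerId === userId || booking.sellerId === userId;
}
```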

Horizontal scaling with the Redis adapter

A single Socket.IO process tops out around 10,000–30,000 concurrent connections on typical Node hardware, depending on event rate and payload size. The moment a second process exists, in-memory room state fragments across them — a socket on process A and a socket on process B both joined to booking:123 don't know about each other. The Redis adapter solves this by publishing emits through Redis pub/sub so every process broadcasts to its own local sockets whenever any process emits to a room. From Redis 7.0 onward, the sharded adapter avoids the hot-channel problem of classic pub/sub and scales further.

  • One Redis client for publishing, one duplicated client for subscribing — don't share.
  • Use @socket.io/redis-adapter for most cases, @socket.io/redis-streams-adapter if message durability across restarts matters.
  • Set the adapter before io.on('connection') or early joins will miss cross-node propagation.
  • Redis Cluster works but requires the sharded adapter for correct behavior under resharding.
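
If Redis 7 is available, swapping in the sharded variant is a small change to the adapter wiring — a configuration sketch, assuming @socket.io/redis-adapter v8+, a node-redis client, and the io server from the snippet above:

```typescript
import { createShardedAdapter } from "@socket.io/redis-adapter";
import { createClient } from "redis";

const pub = createClient({ url: process.env.REDIS_URL });
const sub = pub.duplicate();
await Promise.all([pub.connect(), sub.connect()]);

// Sharded pub/sub (Redis 7.0+) spreads channels across cluster slots,
// avoiding the single hot channel of the classic adapter.
io.adapter(createShardedAdapter(pub, sub));
```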

Sticky sessions are not optional

Socket.IO's long-polling fallback and the HTTP upgrade to WebSocket both require the same backend process for the lifetime of a connection. If the load balancer routes a polling request to a different instance mid-handshake, the session dies. Sticky sessions solve this — ALB with cookie-based stickiness, NGINX ip_hash, or HAProxy source-hashing all work. Configure the TTL to comfortably outlive the longest expected connection; default TTLs on ALB are fine for most apps. The Redis adapter does not remove the sticky-session requirement; the two solve different problems.
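
For NGINX, a minimal sticky setup looks like this (server names are illustrative) — ip_hash pins each client IP to one backend, and the Upgrade headers let the WebSocket handshake through:

```nginx
# Sticky upstream for Socket.IO — same client IP always hits the same process.
upstream socket_nodes {
    ip_hash;
    server app-1:3000;
    server app-2:3000;
}

server {
    location /socket.io/ {
        proxy_pass http://socket_nodes;
        proxy_http_version 1.1;                      # required for WebSocket upgrade
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```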

If deploying to Kubernetes with multiple replicas, test connection upgrades under rolling deployments specifically. A pod terminating mid-handshake is a real-world failure mode and the client reconnect behavior needs to be graceful — exponential backoff with jitter, not a tight retry loop that DDoSes the replacement pod.
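
socket.io-client's built-in reconnection options (reconnectionDelay, reconnectionDelayMax, randomizationFactor) already implement this for Socket.IO itself; for anything hand-rolled around it, the shape of full-jitter backoff is a one-liner — a sketch:

```typescript
// Exponential backoff with "full jitter": the delay ceiling doubles per
// attempt up to a cap, then a random fraction of it is used so a fleet of
// reconnecting clients spreads out instead of stampeding the new pod.
function reconnectDelay(attempt: number, baseMs = 1000, capMs = 30000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}
```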

Presence, typing, and read receipts

Three ancillary features that marketplaces tend to ship together. Each has a gotcha worth knowing before the first implementation.

  • Presence: track last-seen in Redis with a TTL, not in memory. On connect, SET user:{id}:online with a 60-second TTL and refresh on a heartbeat. On disconnect, let the TTL handle cleanup — an explicit DEL races with brief reconnects and produces flickery online/offline states.
  • Typing: fire-and-forget emits to the conversation room, rate-limited to one emit per 3 seconds per sender. Do not persist — it's ephemeral UX, not state. Include an explicit typing:stop event or a client-side idle timer because typing:start events drop more often than anyone expects.
  • Read receipts: persist, don't emit-only. The receiver hitting the message updates the database and emits to the sender's user room. If the sender is offline, the receipt is still there when they reconnect because it was persisted on the message row.
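
The typing rate limit is pure bookkeeping and worth isolating from the socket plumbing — a minimal sketch of one emit per window per sender (the actual socket.emit sits behind it):

```typescript
// Per-sender rate limiter for typing emits: allow at most one emit per
// windowMs. Timestamps are passed in so the logic stays deterministic.
const lastTypingEmit = new Map<string, number>();

function shouldEmitTyping(senderId: string, now: number, windowMs = 3000): boolean {
  const last = lastTypingEmit.get(senderId);
  if (last !== undefined && now - last < windowMs) return false;
  lastTypingEmit.set(senderId, now);
  return true;
}
```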

Events and acks — the production checklist

Socket.IO emits are fire-and-forget by default. Acks (callbacks) give the client a confirmation the server processed the event, which matters for sends that affect real state. For anything that writes to the database, use acks and surface transport failures to the UI explicitly. For broadcasts (price updates, typing indicators), skip acks — the volume isn't worth the round-trip overhead.
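
On the client, an ack plus a timeout turns a stateful send into a promise the UI can await. A sketch against a narrowed interface — SocketLike is defined here so the helper is easy to fake in tests; the real object is the socket.io-client socket, whose timeout(ms) API it mirrors:

```typescript
type Ack = (err: Error | null, res?: { id?: string; error?: string }) => void;

// The slice of the socket.io-client API this helper actually uses.
interface SocketLike {
  timeout(ms: number): { emit(event: string, payload: unknown, ack: Ack): void };
}

function sendMessage(socket: SocketLike, bookingId: string, body: string): Promise<string> {
  return new Promise((resolve, reject) => {
    // 5s timeout: if no ack arrives, surface the failure to the UI explicitly.
    socket.timeout(5000).emit("message:send", { bookingId, body }, (err, res) => {
      if (err) return reject(err); // transport failure or timeout
      if (!res || res.error || !res.id) {
        return reject(new Error(res?.error ?? "rejected"));
      }
      resolve(res.id); // persisted message ID from the server's ack
    });
  });
}
```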

Event                  Direction        Needs ack?  Persist?           Notes
listing:watch          client → server  No          No                 Room join; idempotent
listing:priceChanged   server → room    No          Already in DB      Broadcast; high volume on hot listings
message:send           client → server  Yes         Yes                Ack returns the persisted message ID
message:new            server → room    No          Already persisted  Fan-out to both sides of a booking
typing:start           client → server  No          No                 Rate-limit per sender; broadcast to room
read:mark              client → server  Yes         Yes                Updates DB; emits read:updated to sender
booking:statusChanged  server → room    No          Already persisted  Triggered from backend state machine

Observability — the part everyone skips

Socket.IO is harder to debug than HTTP because there's no request log per event by default. Wire up four things early: connection count per instance (gauge), event rate per type (counter), event latency p50/p95 (histogram), and disconnect reasons (counter by reason). The Socket.IO Admin UI (a separate @socket.io/admin-ui package) is invaluable during development, but disable it or lock it down in production. Disconnect reasons in particular surface problems that nothing else will — a spike in 'transport error' disconnects usually means the load balancer is recycling connections, and a spike in 'ping timeout' means the client's network is flaky or the server is saturated.
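
As a sketch of the four signals' shapes — in production these would be prom-client gauges, counters, and histograms scraped by Prometheus, but the bookkeeping is the same. The wiring points are io.engine.clientsCount for the connection gauge and the disconnect handler's reason argument for the reason counter:

```typescript
// In-process stand-ins for the four signals. Swap each for the matching
// prom-client metric type once a /metrics endpoint exists.
const metrics = {
  connections: 0,                                 // gauge
  eventsByType: new Map<string, number>(),        // counter, labeled by event
  disconnectsByReason: new Map<string, number>(), // counter, labeled by reason
  eventLatenciesMs: [] as number[],               // histogram samples
};

const bump = (m: Map<string, number>, key: string) =>
  m.set(key, (m.get(key) ?? 0) + 1);

// Called from a wrapper around each event handler, with the handler's duration.
function recordEvent(type: string, latencyMs: number) {
  bump(metrics.eventsByType, type);
  metrics.eventLatenciesMs.push(latencyMs);
}

// Naive percentile over raw samples; a real histogram uses fixed buckets.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  return sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
}
```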

What we'd ship for a new marketplace today

Socket.IO 4.x on Node 22, behind an ALB with cookie stickiness and a 5-minute TTL, two Node replicas minimum behind the Redis adapter on a single-node Redis 7 (upgraded to Cluster if connection count pushes past ~50K), JWT auth on the handshake with per-room authorization checks, acks on any event that writes state, and Prometheus metrics on the four signals above. That stack handles six-figure concurrent connections without exotic architecture and fails gracefully when it doesn't.

Key takeaways

  • Use rooms as the primary addressing primitive — one room per coordinatable entity (listing, booking, user notification stream).
  • Authenticate on the handshake via middleware, then authorize on every room join — room names are client-supplied and can't be trusted.
  • The Redis adapter is required the moment a second Node process exists; sticky sessions are required independently and don't replace the adapter.
  • Use acks for events that write state, skip them for high-volume broadcasts.
  • Instrument connection count, event rate, event latency, and disconnect reasons from day one — debugging blind is painful.
  • Typing indicators and presence go in Redis with TTLs, not in memory. Read receipts persist on the message row.
#marketplaces #socket-io #websockets #real-time #redis #scaling #nodejs
Working on something similar?

Let's build it together.

We ship production SaaS, marketplaces, and web apps. If you want an engineering partner — not a consultancy — let's talk.