Message Queues and Pub-Sub
Queues and pub-sub systems decouple producers from consumers, absorb traffic spikes, and enable asynchronous workflows. They are foundational in notifications, analytics, billing, and event-driven systems.
Why Messaging Helps
Synchronous chains can be brittle. If one downstream service is slow, every caller waits. Queues let work be processed asynchronously and smooth out bursts. They also decouple producers from consumers so each can scale, deploy, and fail independently.
Queue vs Pub-Sub
- Queue: one worker processes each message, used for task distribution and work offloading
- Pub-sub: multiple subscribers each receive a copy, used for event broadcasting and fan-out
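The distinction can be made concrete with a minimal in-memory sketch (illustrative only; these class names are hypothetical, not any real broker's API):

```python
from collections import deque

class WorkQueue:
    """Point-to-point: each message is processed by exactly one consumer."""
    def __init__(self):
        self.messages = deque()
    def publish(self, msg):
        self.messages.append(msg)
    def consume(self):
        return self.messages.popleft() if self.messages else None

class PubSub:
    """Fan-out: every subscriber receives its own copy of each message."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self):
        inbox = deque()
        self.subscribers.append(inbox)
        return inbox
    def publish(self, msg):
        for inbox in self.subscribers:
            inbox.append(msg)

q = WorkQueue()
q.publish("task-1")
assert q.consume() == "task-1"   # first consumer takes the task
assert q.consume() is None       # gone: no other consumer sees it

ps = PubSub()
a, b = ps.subscribe(), ps.subscribe()
ps.publish("event-1")
assert list(a) == ["event-1"] and list(b) == ["event-1"]  # both got a copy
```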
Common Messaging Systems
- Kafka: high-throughput, durable log with replay support, ideal for event streaming at scale
- RabbitMQ: flexible routing with exchanges and bindings, good for task queues and RPC patterns
- AWS SQS: fully managed queue with at-least-once delivery and visibility timeouts
- AWS SNS: managed pub-sub for fan-out to multiple endpoints or queues
- Google Pub/Sub: managed pub-sub with optional ordering keys and replay via seek and snapshots
Key Concerns
- Ordering guarantees: most queues offer best-effort ordering; Kafka guarantees order within a partition
- Retries and dead-letter queues: failed messages are retried up to a limit, then moved to a DLQ for inspection
- Backpressure: consumers signal they are overwhelmed so producers slow down or buffer
- Retention and replay: Kafka retains messages for a configurable window, allowing consumers to reprocess past events
- Idempotent consumers: processing the same message twice must produce the same result as processing it once
Delivery Guarantees
- At-most-once: messages may be lost but are never redelivered, lowest latency
- At-least-once: messages are never lost but may be delivered more than once, most common default
- Exactly-once: no loss and no duplicates; in practice usually achieved as exactly-once processing via transactions or idempotency rather than true exactly-once delivery, hardest to implement and highest overhead
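The duplicates behind at-least-once come from acknowledging only after processing: if the consumer crashes between the side effect and the ack, the broker redelivers. A minimal sketch (a deque standing in for a queue with a visibility timeout):

```python
from collections import deque

queue = deque(["msg-1"])
processed = []

def consume(crash_before_ack: bool):
    msg = queue[0]             # receive without removing (like a visibility timeout)
    processed.append(msg)      # side effect happens first
    if crash_before_ack:
        return                 # crash: ack never sent, message stays visible
    queue.popleft()            # ack: remove the message for good

consume(crash_before_ack=True)   # first attempt dies after processing
consume(crash_before_ack=False)  # redelivery succeeds and acks
assert processed == ["msg-1", "msg-1"]  # the side effect ran twice
```

Acking *before* processing flips this into at-most-once: no duplicates, but a crash loses the message.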
Dead-Letter Queues
When a message fails processing repeatedly it is moved to a dead-letter queue instead of blocking the main queue. DLQs allow engineers to inspect, debug, and replay failed messages without losing them. Every production queue should have a DLQ configured.
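The retry-then-park flow can be sketched as a loop with an attempt counter (the message shape and `MAX_ATTEMPTS` are illustrative assumptions; real brokers track delivery counts for you):

```python
from collections import deque

MAX_ATTEMPTS = 3
main_queue = deque([{"id": "m1", "attempts": 0}])
dlq = []

def process(msg):
    # Always fails, to illustrate the path to the DLQ.
    raise RuntimeError("downstream unavailable")

while main_queue:
    msg = main_queue.popleft()
    try:
        process(msg)
    except RuntimeError:
        msg["attempts"] += 1
        if msg["attempts"] >= MAX_ATTEMPTS:
            dlq.append(msg)         # park for inspection and later replay
        else:
            main_queue.append(msg)  # requeue for another attempt

assert dlq == [{"id": "m1", "attempts": 3}]  # parked, not lost
```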
Idempotency
Because at-least-once delivery is the norm, consumers must handle duplicate messages safely. Common approaches include tracking processed message IDs in a database, using upsert operations instead of inserts, and designing state transitions that are safe to apply multiple times.
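The processed-IDs approach looks like this in miniature (an in-memory set stands in for what would be a unique-keyed database table in production):

```python
balances = {"alice": 0}
processed_ids = set()  # in production: a table with a unique constraint on message ID

def handle(msg):
    if msg["id"] in processed_ids:
        return                          # duplicate: skip the side effect
    balances[msg["user"]] += msg["amount"]
    processed_ids.add(msg["id"])        # record only after the effect succeeds

event = {"id": "evt-42", "user": "alice", "amount": 100}
handle(event)
handle(event)                           # redelivery of the same message
assert balances["alice"] == 100         # applied exactly once
```

In a real system the "check ID, apply effect, record ID" steps must share one transaction, or the crash window between them reintroduces the duplicate.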
Backpressure
When consumers fall behind, unbounded queues accumulate messages and exhaust memory. Backpressure mechanisms signal producers to slow down or pause. Kafka consumers pull at their own pace, so they apply backpressure naturally; operators monitor consumer lag and scale consumers horizontally when it grows. In SQS, queue depth metrics trigger autoscaling policies.
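A bounded buffer is the simplest backpressure primitive: a full buffer blocks the producer instead of growing without limit. A single-process sketch using the standard library:

```python
import queue

buf = queue.Queue(maxsize=2)  # bounded: put() blocks when full

buf.put("a")
buf.put("b")

blocked = False
try:
    buf.put("c", timeout=0.1)  # producer stalls until a consumer drains
except queue.Full:
    blocked = True             # backpressure: the producer felt the limit

assert blocked
assert buf.get() == "a"        # consumer drains, freeing capacity
buf.put("c", timeout=0.1)      # now the producer can proceed
```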
Partitioning and Ordering
Kafka partitions topics across brokers and guarantees order only within a single partition. Choosing a partition key such as user ID or entity ID ensures all events for the same entity are processed in order by the same consumer, avoiding race conditions in downstream state machines.
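Partition routing is just a stable hash of the key modulo the partition count, so the same key always lands on the same partition. A sketch (MD5 here is a stand-in; Kafka's default partitioner actually uses murmur2):

```python
import hashlib

NUM_PARTITIONS = 8

def partition_for(key: str) -> int:
    # Stable hash: the same key always maps to the same partition.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS

# All events for one user hit one partition, so one consumer
# sees them in order.
p = partition_for("user-123")
assert all(partition_for("user-123") == p for _ in range(100))
```

Note the trade-off: changing `NUM_PARTITIONS` remaps keys, so per-key ordering is only guaranteed while the partition count is stable.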
Consumer Groups
Multiple consumers can form a group to share the work of processing a queue or topic. Each message is delivered to exactly one member of the group, allowing horizontal scaling of consumers. Adding consumers increases throughput roughly linearly up to the number of partitions; beyond that, extra group members sit idle.
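Why throughput caps out at the partition count falls out of the assignment rule: each partition has exactly one owner within a group. A round-robin assignment sketch (real brokers use rebalancing protocols, but the ownership invariant is the same):

```python
NUM_PARTITIONS = 6
consumers = ["c0", "c1", "c2"]

# Round-robin: each partition is owned by exactly one group member.
assignment = {p: consumers[p % len(consumers)] for p in range(NUM_PARTITIONS)}

assert assignment == {0: "c0", 1: "c1", 2: "c2", 3: "c0", 4: "c1", 5: "c2"}
# Every consumer owns some partitions, and no partition has two owners,
# so each message is processed by exactly one member of the group.
assert set(assignment.values()) == set(consumers)
```

With a fourth consumer and still six partitions, two members would own one partition each and any seventh consumer would own none.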
Message Schema and Versioning
Producers and consumers must agree on message format. Schema registries like Confluent Schema Registry enforce compatibility rules so producers cannot publish breaking changes that crash consumers. Versioning strategies include backward compatibility, forward compatibility, and full compatibility depending on deployment requirements.
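Backward compatibility in practice means a reader on the new schema can still parse messages written with the old one, typically by giving new fields defaults. A sketch with hypothetical field names:

```python
def parse_order_v2(raw: dict) -> dict:
    """Reader for schema v2. Tolerates v1 messages that lack the new
    'currency' field by supplying a default (backward compatibility)."""
    return {
        "order_id": raw["order_id"],
        "amount": raw["amount"],
        "currency": raw.get("currency", "USD"),  # new optional field, defaulted
    }

v1_msg = {"order_id": "o-1", "amount": 50}                      # old producer
v2_msg = {"order_id": "o-2", "amount": 75, "currency": "EUR"}   # new producer
assert parse_order_v2(v1_msg)["currency"] == "USD"
assert parse_order_v2(v2_msg)["currency"] == "EUR"
```

A registry enforces the same idea at publish time: it rejects a new schema whose changes (such as removing a field without a default) would break existing readers.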
Interview Tip
Say how you handle duplicates. Retries happen all the time in real systems. Also mention dead-letter queues proactively; it signals that you think about failure modes and operational visibility, not just the happy path.