
WebSocket Protocol Basics: Full-Duplex Guide

The WebSocket protocol is a bidirectional communication standard built on top of TCP that lets servers and clients exchange messages in real time over a single long-lived connection, instead of opening a new one for each request. Defined in RFC 6455 (IETF, 2011), WebSocket upgrades an initial HTTP connection to a persistent, stateful socket on which either side can push data without polling. This is fundamentally different from HTTP, where the client always initiates: with WebSockets, a server can send a message to every connected client the moment new data is available.

How WebSockets Differ From HTTP Polling

HTTP is a request-response protocol: the client sends a request, the server responds, and the server can never speak first. To simulate real-time updates (e.g., a stock price), a client has to poll the server, say every 500 ms, which wastes bandwidth on unchanged responses and adds up to one polling interval of latency. WebSockets establish one persistent connection over which both sides can send messages at any time. A stock ticker can push each price change to all connected traders as a single WebSocket message, whereas polling every 500 ms costs 120 HTTP requests per minute per client, most of them redundant.

Traditional HTTP polling also carries protocol overhead: every request carries headers (typically 200–1,000 bytes), and so does every response. After the initial handshake, a WebSocket frame adds only 2–14 bytes of overhead, so for frequent, small messages the per-message overhead can drop by one to two orders of magnitude.

The WebSocket Handshake: HTTP Upgrade

A WebSocket connection begins with a standard HTTP upgrade request. The client sends:

GET /chat HTTP/1.1
Host: example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13

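The Sec-WebSocket-Key header carries a random 16-byte nonce, Base64-encoded. It is not an authentication mechanism; it lets the client confirm that the server actually processed this particular WebSocket handshake, since a cached or non-WebSocket response cannot produce the matching accept value. The server responds: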

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

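The server computes Sec-WebSocket-Accept by appending the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11 to the client's key, hashing the result with SHA-1, and Base64-encoding the 20-byte digest; a matching value proves the server understood the upgrade. After this exchange, the TCP connection stays open and both sides switch to WebSocket framing.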
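The whole handshake computation fits in a few lines. The Python sketch below shows both sides under simple assumptions: `make_client_key` and `compute_accept` are hypothetical helper names, while the GUID is the constant fixed by RFC 6455. Fed the sample key from the request above, `compute_accept` should reproduce the sample Accept value.

```python
import base64
import hashlib
import os

# Constant GUID from RFC 6455; the server appends it to the client's key.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def make_client_key() -> str:
    """Client side: a fresh random 16-byte nonce, Base64-encoded (24 characters)."""
    return base64.b64encode(os.urandom(16)).decode("ascii")


def compute_accept(sec_websocket_key: str) -> str:
    """Server side: derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    # The key is used exactly as sent (still Base64 text), not decoded first.
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The client runs the same computation on the key it sent and should abandon the connection if the server's value differs.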

WebSocket Frame Format

Every message sent over WebSocket is carried in one or more frames. A frame header includes:

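  • FIN (1 bit): Marks the final fragment of a message, so one message can span several frames.
  • RSV1, RSV2, RSV3 (1 bit each): Reserved for extensions (e.g., permessage-deflate sets RSV1 on compressed messages); zero otherwise.
  • Opcode (4 bits): Message type: 0x0 for continuation, 0x1 for text, 0x2 for binary, 0x8 for close, 0x9 for ping, 0xA for pong.
  • MASK (1 bit): If set, the payload is XOR-masked with the masking key. Clients must mask every frame they send to the server; servers must not mask frames they send to clients.
  • Payload length (7, 7+16, or 7+64 bits): Values 0–125 are the payload length itself; 126 means the next 2 bytes hold the length, and 127 means the next 8 bytes do.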
  • Masking key (4 bytes, if MASK=1): Used to XOR the payload.
  • Payload data: The actual message.
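Reading those fields back out of raw bytes is mostly bit masking. The following Python sketch assumes the whole header is already in the buffer; `parse_frame_header` and `FrameHeader` are hypothetical names, and the sample bytes are the masked "Hello" text frame that RFC 6455 gives as an example.

```python
import struct
from typing import NamedTuple, Optional


class FrameHeader(NamedTuple):
    fin: bool
    opcode: int
    masked: bool
    payload_len: int
    mask_key: Optional[bytes]
    header_len: int  # bytes consumed before the payload starts


def parse_frame_header(data: bytes) -> FrameHeader:
    """Decode one frame header (assumes all header bytes are present)."""
    b0, b1 = data[0], data[1]
    fin = bool(b0 & 0x80)      # top bit of byte 0
    opcode = b0 & 0x0F         # low 4 bits of byte 0 (RSV1-3 sit in between)
    masked = bool(b1 & 0x80)   # top bit of byte 1
    length = b1 & 0x7F         # 7-bit length, or 126/127 as an extended-length marker
    offset = 2
    if length == 126:          # next 2 bytes: 16-bit big-endian length
        (length,) = struct.unpack("!H", data[offset:offset + 2])
        offset += 2
    elif length == 127:        # next 8 bytes: 64-bit big-endian length
        (length,) = struct.unpack("!Q", data[offset:offset + 8])
        offset += 8
    mask_key = None
    if masked:                 # 4-byte masking key follows the length fields
        mask_key = data[offset:offset + 4]
        offset += 4
    return FrameHeader(fin, opcode, masked, length, mask_key, offset)


hdr = parse_frame_header(bytes.fromhex("818537fa213d7f9f4d5158"))
print(hdr.fin, hdr.opcode, hdr.payload_len, hdr.header_len)  # True 1 5 6
```

The payload starts at `header_len`; unmasking it means XORing payload byte i with `mask_key[i % 4]`.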

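For the 5-byte text message "Hello" sent from client to server, the frame is 11 bytes: a 2-byte header (the 7-bit length field holds 5 directly, so no extended length bytes are needed), a 4-byte masking key, and the 5-byte payload. An equivalent HTTP POST typically runs to 300 bytes or more once headers are included.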
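Building the "Hello" frame by hand shows where each byte comes from. This is a minimal Python sketch, assuming a single unfragmented frame; `encode_client_frame` is a hypothetical helper. The usage line pins the mask key to the one from the RFC 6455 example so the output is reproducible; a real client must draw a fresh random key for every frame.

```python
import os
import struct
from typing import Optional


def encode_client_frame(payload: bytes, opcode: int = 0x1,
                        mask_key: Optional[bytes] = None) -> bytes:
    """Build one unfragmented, masked client-to-server frame."""
    if mask_key is None:
        mask_key = os.urandom(4)            # fresh random key per frame
    header = bytearray([0x80 | opcode])     # FIN=1, RSV1-3=0, opcode
    n = len(payload)
    if n <= 125:
        header.append(0x80 | n)             # MASK=1 plus 7-bit length
    elif n <= 0xFFFF:
        header.append(0x80 | 126)
        header += struct.pack("!H", n)      # 16-bit extended length
    else:
        header.append(0x80 | 127)
        header += struct.pack("!Q", n)      # 64-bit extended length
    # XOR each payload byte with the key, cycling through its 4 bytes.
    masked = bytes(b ^ mask_key[i % 4] for i, b in enumerate(payload))
    return bytes(header) + mask_key + masked


frame = encode_client_frame(b"Hello", mask_key=bytes.fromhex("37fa213d"))
print(len(frame), frame.hex())  # 11 818537fa213d7f9f4d5158
```

Because XOR is its own inverse, the server recovers the payload by applying the same key to the masked bytes.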

Opcode Types and Message Flow

| Opcode | Type | Direction | Purpose |
|---|---|---|---|
| 0x0 | Continuation | Both | Continues a fragmented message |
| 0x1 | Text | Both | UTF-8 encoded text (chat, JSON) |
| 0x2 | Binary | Both | Raw bytes (images, protobuf) |
| 0x8 | Close | Both | Initiates connection closure |
| 0x9 | Ping | Both | Keepalive probe (server or client sends) |
| 0xA | Pong | Both | Response to Ping |
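The Continuation opcode works together with the FIN bit: the first fragment carries the real opcode (text or binary) with FIN=0, later fragments use 0x0, and the last one sets FIN=1. Below is a minimal Python sketch of reassembly, assuming frames have already been decoded into hypothetical `(fin, opcode, payload)` tuples. It also reflects the rule that control frames are never fragmented but may be interleaved between fragments of a data message.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical pre-decoded frame: (fin, opcode, payload).
Frame = Tuple[bool, int, bytes]


def reassemble(frames: Iterable[Frame]) -> List[Tuple[int, bytes]]:
    """Join fragmented data frames into complete (opcode, payload) messages."""
    messages: List[Tuple[int, bytes]] = []
    current_opcode: Optional[int] = None
    parts: List[bytes] = []
    for fin, opcode, payload in frames:
        if opcode >= 0x8:
            # Control frames (close/ping/pong) are unfragmented and may arrive
            # mid-message; this sketch simply emits them as their own entries.
            messages.append((opcode, payload))
            continue
        if opcode == 0x0:
            if current_opcode is None:
                raise ValueError("continuation frame without a message in progress")
        else:
            if current_opcode is not None:
                raise ValueError("new data frame before previous message finished")
            current_opcode = opcode
        parts.append(payload)
        if fin:  # last fragment: emit the whole message and reset
            messages.append((current_opcode, b"".join(parts)))
            current_opcode, parts = None, []
    return messages


print(reassemble([(False, 0x1, b"Hel"), (True, 0x9, b""), (True, 0x0, b"lo")]))
# [(9, b''), (1, b'Hello')]
```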

Sending a ping every 30 seconds or so keeps the connection alive across proxies and firewalls that would otherwise drop idle sockets. RFC 6455 requires the receiver to answer each ping with a pong carrying the same payload; most WebSocket libraries, and browsers, do this transparently.
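Answering a ping is simple because the pong must echo the ping's payload unchanged. Here is a minimal sketch of the server side in Python, assuming the ping payload has already been unmasked; `pong_for_ping` is a hypothetical helper. Server frames go out unmasked, so a client answering a ping would instead have to mask its pong like any other client frame.

```python
def pong_for_ping(ping_payload: bytes) -> bytes:
    """Build the unmasked (server-to-client) pong frame answering a ping."""
    if len(ping_payload) > 125:
        # Control frames may carry at most 125 bytes of payload.
        raise ValueError("control frame payload too long")
    # Byte 0: FIN=1 + opcode 0xA; byte 1: MASK=0 + 7-bit length; then the echo.
    return bytes([0x80 | 0xA, len(ping_payload)]) + ping_payload


print(pong_for_ping(b"Hello").hex())  # 8a0548656c6c6f
```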

State Transitions and Connection Lifecycle

A WebSocket moves through states:

  1. CONNECTING: Initial HTTP upgrade request in flight.
  2. OPEN: Upgrade succeeded; data frames exchanged.
  3. CLOSING: Close frame sent by one side; awaiting the other to respond.
  4. CLOSED: Both sides closed; connection terminated.
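The lifecycle is a small state machine. The Python sketch below encodes the allowed transitions; the numeric values match the `readyState` constants of the browser WebSocket API, but `Connection` and its `transition` method are hypothetical. The CONNECTING-to-CLOSED and OPEN-to-CLOSED edges are an assumption added to cover a failed handshake and an abrupt TCP drop, which skip the close handshake.

```python
from enum import Enum


class ReadyState(Enum):
    CONNECTING = 0
    OPEN = 1
    CLOSING = 2
    CLOSED = 3


# Allowed transitions for the lifecycle described above.
TRANSITIONS = {
    ReadyState.CONNECTING: {ReadyState.OPEN, ReadyState.CLOSED},  # handshake succeeds or fails
    ReadyState.OPEN: {ReadyState.CLOSING, ReadyState.CLOSED},     # close handshake, or abrupt loss
    ReadyState.CLOSING: {ReadyState.CLOSED},                      # peer's close frame arrives
    ReadyState.CLOSED: set(),                                     # terminal state
}


class Connection:
    """Hypothetical wrapper that only tracks and validates the state."""

    def __init__(self) -> None:
        self.state = ReadyState.CONNECTING

    def transition(self, new_state: ReadyState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise RuntimeError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


c = Connection()
c.transition(ReadyState.OPEN)
c.transition(ReadyState.CLOSING)
c.transition(ReadyState.CLOSED)
print(c.state.name)  # CLOSED
```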

When either side decides to close, it sends a close frame with an optional status code (1000 = normal closure, 1001 = going away, 1011 = server error). The receiver must respond with a close frame before the socket fully closes.
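The close frame's body has a fixed layout: a 2-byte big-endian status code followed by an optional UTF-8 reason, all within the 125-byte limit for control frames. A minimal Python sketch with hypothetical helper names; echoing the received code in the reply is the conventional, though not mandatory, way to answer.

```python
import struct
from typing import Optional, Tuple


def encode_close_payload(code: int = 1000, reason: str = "") -> bytes:
    """Close-frame body: 2-byte big-endian status code, then optional UTF-8 reason."""
    body = struct.pack("!H", code) + reason.encode("utf-8")
    if len(body) > 125:
        raise ValueError("close payload must fit in a 125-byte control frame")
    return body


def decode_close_payload(body: bytes) -> Tuple[Optional[int], str]:
    """Return (status code, reason); an empty body means no code was sent."""
    if not body:
        return None, ""
    if len(body) == 1:
        raise ValueError("a close body must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", body[:2])
    return code, body[2:].decode("utf-8")


received = encode_close_payload(1001, "going away")
code, reason = decode_close_payload(received)
reply = encode_close_payload(code)  # echo the peer's status code
print(code, reason)  # 1001 going away
```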

Why WebSocket Over Raw TCP?

You might ask: why add WebSocket framing on top of TCP? Raw TCP sockets work, but browsers cannot open them, and raw TCP offers no standard message framing or application-level close semantics. WebSocket provides:

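  • Infrastructure compatibility: Proxies, load balancers, and firewalls already handle HTTP, and the connection starts as an ordinary HTTP upgrade request on the usual web ports, so WebSocket traffic gets through where a custom TCP protocol often would not.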
  • Message boundaries: TCP is a byte stream; WebSocket frames delimit messages, so you know where one ends and another begins.
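  • Close semantics: The close handshake carries a status code and optional reason, so each side knows why the connection ended instead of just seeing the socket drop.
  • Subprotocol negotiation: Through the Sec-WebSocket-Protocol header, client and server can agree on an application-level protocol (e.g., STOMP for messaging).
  • Origin checks: Browsers send an Origin header with the handshake, so the server can reject connections initiated by untrusted sites. WebSocket is not covered by CORS, so this check is the server's responsibility.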
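Subprotocol negotiation happens entirely in the handshake headers: the client lists the protocols it speaks in Sec-WebSocket-Protocol, ordered by preference, and the server echoes back at most one. A minimal Python sketch of the server's choice; `choose_subprotocol` is a hypothetical helper, the "first offered protocol we support" policy is one reasonable choice rather than a requirement, and `v12.stomp`/`v11.stomp` serve only as example STOMP subprotocol names.

```python
from typing import Optional, Sequence


def choose_subprotocol(header_value: str, supported: Sequence[str]) -> Optional[str]:
    """Pick the first client-offered subprotocol that the server supports.

    header_value is the raw Sec-WebSocket-Protocol request header,
    e.g. "v12.stomp, v11.stomp".
    """
    offered = [p.strip() for p in header_value.split(",") if p.strip()]
    for proto in offered:      # client's order of preference
        if proto in supported:
            return proto       # send this back in the response's Sec-WebSocket-Protocol
    return None                # omit the header: no subprotocol was agreed


print(choose_subprotocol("v12.stomp, v11.stomp", ["v11.stomp", "v12.stomp"]))  # v12.stomp
```

If the server returns no subprotocol, the client decides whether to proceed without one or close the connection.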

Key Takeaways

  • WebSocket is a stateful, bidirectional protocol that persists a single TCP connection, reducing latency and overhead versus HTTP polling.
  • The handshake is HTTP-based; after the 101 Switching Protocols response, the connection switches to WebSocket framing.
  • Frames carry minimal overhead (2–14 bytes each) compared with HTTP's per-request headers (200–1,000 bytes), enabling efficient real-time messaging.
  • Opcodes (text, binary, close, ping, pong) govern message type and connection lifecycle.
  • Servers can push data to clients the moment it is available, with no polling delay.

Frequently Asked Questions

What is the difference between ws:// and wss://?

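ws:// is plaintext WebSocket (the analogue of HTTP, default port 80); wss:// is WebSocket over TLS (the analogue of HTTPS, default port 443). Always use wss:// in production to prevent interception and tampering. It also tends to traverse proxies more reliably: intermediaries cannot inspect encrypted traffic, so they are less likely to interfere with the upgrade.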

Can a single TCP connection carry multiple independent WebSocket connections?

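Not with the standard HTTP/1.1 handshake from RFC 6455: each WebSocket connection established that way owns its own TCP socket. RFC 8441 later defined how to bootstrap WebSockets over HTTP/2, where several WebSockets can run as separate streams of one TCP connection, but only if both client and server support it. Multiplexing multiple logical channels over one WebSocket is possible at the application layer (e.g., prefixing messages with a channel ID), but the WebSocket protocol itself does not provide it.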
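Application-layer multiplexing only needs an agreed envelope. The Python sketch below assumes a hypothetical JSON convention with `channel` and `data` keys, sent as text frames; `wrap` and `ChannelRouter` are illustrative names, not part of any standard or library.

```python
import json
from collections import defaultdict
from typing import Any, Callable, Dict, List


def wrap(channel: str, data: Any) -> str:
    """Encode one logical-channel message as a single text-frame payload."""
    return json.dumps({"channel": channel, "data": data})


class ChannelRouter:
    """Dispatch incoming text messages to per-channel handlers."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[Any], None]) -> None:
        self.handlers[channel].append(handler)

    def dispatch(self, text: str) -> None:
        msg = json.loads(text)
        # Messages for channels nobody subscribed to are silently dropped.
        for handler in self.handlers.get(msg["channel"], []):
            handler(msg["data"])


prices: List[Any] = []
router = ChannelRouter()
router.subscribe("prices", prices.append)
router.dispatch(wrap("prices", {"symbol": "XYZ", "price": 10.0}))
router.dispatch(wrap("chat", "hi"))  # no handler for "chat"; ignored
print(prices)  # [{'symbol': 'XYZ', 'price': 10.0}]
```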

Why is the client-to-server payload masked?

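Masking protects intermediaries, not the message content (the key travels in the clear). Without it, a malicious script could make the browser send bytes that a proxy unaware of WebSocket misreads as an HTTP request, tricking it into caching an attacker-chosen response (cache poisoning). Because the browser picks a fresh, unpredictable masking key for every frame, the script cannot control the bytes that actually appear on the wire.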

How often should I send ping frames?

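Every 20–30 seconds is typical. Load balancers and proxies close connections that stay idle past their timeout (an AWS Application Load Balancer's default idle timeout is 60 seconds, for example, and Cloudflare and other proxies enforce their own limits), so a ping every 25 seconds keeps the connection fresh without excessive overhead.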
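The timing logic behind such a keepalive can live apart from the socket code. A minimal Python sketch, assuming a 25-second ping interval and a hypothetical 10-second pong timeout; `Heartbeat` is an illustrative name, and the injectable clock exists only so the logic can be tested without sleeping.

```python
import time
from typing import Callable, Optional


class Heartbeat:
    """Decide when to send a ping and when to give up on a silent peer (times in seconds)."""

    def __init__(self, interval: float = 25.0, timeout: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval = interval  # ping after this much idle time
        self.timeout = timeout    # assumed: peer is dead if no frame arrives within this
        self.clock = clock
        self.last_activity = clock()
        self.ping_sent_at: Optional[float] = None

    def on_frame_received(self) -> None:
        # Any inbound frame, including a pong, proves the peer is alive.
        self.last_activity = self.clock()
        self.ping_sent_at = None

    def should_ping(self) -> bool:
        return self.ping_sent_at is None and self.clock() - self.last_activity >= self.interval

    def mark_ping_sent(self) -> None:
        self.ping_sent_at = self.clock()

    def is_dead(self) -> bool:
        return self.ping_sent_at is not None and self.clock() - self.ping_sent_at >= self.timeout
```

A caller would check `should_ping()` and `is_dead()` periodically from its event loop, send a ping frame when asked, and close the connection once the peer has gone silent.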

Further Reading