2026-06-02

TCP Keepalive and Idle Timeout PCAP Analysis: Firewalls, NAT, Load Balancers, and Long-Lived Connections

How to analyze TCP keepalive packets, idle timeout, NAT session expiry, firewall connection drops, load balancer resets, long-lived API connections, and packet capture evidence.

tcp keepalive, idle timeout, firewall timeout, nat timeout, load balancer reset, long lived connection, pcap analysis

Long-lived TCP connections can fail after minutes or hours of inactivity. SSH sessions freeze. Database connections reset. WebSocket connections drop. RTSP streams interleaved over TCP stop after idle periods. API clients see "broken pipe" errors. Users search for "TCP keepalive pcap", "firewall idle timeout", "NAT session timeout", "load balancer reset idle connection", and "long-lived TCP connection drops" because the application error often appears long after the real timeout decision was made.

PCAP Surgery is useful because idle-timeout investigations depend on timing. You need the last real data packet, any TCP keepalive probes, ACKs, FIN/RST packets, and the exact idle duration.

What TCP keepalive is

TCP keepalive is an optional mechanism that sends small probes on an idle connection to check whether the peer is still reachable. It can also keep NAT and firewall state alive if probes occur more frequently than the middlebox timeout.

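But defaults are often too slow for modern infrastructure. Keepalive is off unless the application (or its runtime) enables it on the socket with SO_KEEPALIVE, and on Linux the first probe is sent only after net.ipv4.tcp_keepalive_time, which defaults to 7200 seconds (two hours). A firewall that expires idle state after 60 seconds will have forgotten the connection long before that.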
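
To make probes start before a middlebox expires the flow, keepalive has to be enabled and tuned per socket. The following is a minimal Python sketch, assuming a Linux-like platform where the socket module exposes TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT (the hasattr guards skip them elsewhere). The helper names and the 30/10/3 values are hypothetical illustrations, not recommendations.

```python
import socket


def enable_keepalive(sock: socket.socket, idle_s: int, interval_s: int, count: int) -> None:
    """Turn on TCP keepalive and tune its timers where the platform allows it."""
    # Keepalive is off unless requested on the socket.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idle before the first probe (Linux default: 7200).
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    # Seconds between unanswered probes (Linux default: 75).
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes before the kernel declares the peer dead (Linux default: 9).
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


def first_probe_beats_timeout(idle_s: int, middlebox_timeout_s: int) -> bool:
    """True if the first probe is sent before a middlebox would expire the idle flow."""
    return idle_s < middlebox_timeout_s


# Hypothetical example values: probe after 30 s idle, every 10 s, give up after 3.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s, idle_s=30, interval_s=10, count=3)
```

With the Linux default of 7200 seconds, `first_probe_beats_timeout(7200, 60)` returns False: no probe would be sent before a 60-second firewall timeout.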

Idle timeout symptoms

Common symptoms include:

Connection works, then fails after a fixed idle time.
First request after idle gets reset.
WebSocket disconnects after exactly 60 seconds.
Database pool has stale connections.
SSH freezes through NAT.
Load balancer sends RST after timeout.
Client sends data after idle and receives no response.

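Exact timing is the clue: a failure that always follows the same idle interval (for example, exactly 60 seconds) points to a configured timeout rather than random packet loss.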
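
One way to test whether failures follow a fixed idle time is to collect, for several failed connections, the timestamp of the last data packet and of the failure, then check whether the gaps cluster. This is a sketch; the helper names, the 2-second tolerance, and the 80% threshold are arbitrary assumptions.

```python
from statistics import median


def idle_gaps(events):
    """events: (last_data_ts, failure_ts) pairs in seconds, one per failed connection."""
    return [failure - last_data for last_data, failure in events]


def looks_like_fixed_timeout(gaps, tolerance_s=2.0, min_fraction=0.8):
    """Return (verdict, center): verdict is True if most gaps sit near their median."""
    if not gaps:
        return False, None
    center = median(gaps)
    near = sum(1 for g in gaps if abs(g - center) <= tolerance_s)
    return near / len(gaps) >= min_fraction, center
```

Gaps of 60, 61, and 59 seconds yield `(True, 60)`, pointing at a 60-second timer somewhere on the path; widely scattered gaps suggest something other than a fixed idle timeout.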

FIN vs RST vs silent drop

Middleboxes and endpoints can close idle connections in different ways:

FIN: graceful close.
RST: abortive close.
Silent drop: no packet; later traffic is ignored.

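If a firewall or NAT device silently drops state, both endpoints may still think the connection exists. The next data packet is then either discarded, so the sender retransmits until it gives up, or answered with a RST by a device that no longer recognizes the flow.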
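
The three endings can be told apart mechanically once packets are reduced to simple records. This sketch assumes a hypothetical `Pkt` record (timestamp, sender label, sequence number, flag letters, payload length) already extracted from the capture for one connection. Treating repeated data segments with no close as a silent drop is a heuristic, not proof.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pkt:
    ts: float         # capture timestamp in seconds
    src: str          # sender label, e.g. "client", "server", or an IP:port
    seq: int          # TCP sequence number
    flags: str        # flag letters, e.g. "PA", "FA", "R"
    payload_len: int  # TCP payload bytes


def classify_close(pkts: List[Pkt]) -> Tuple[str, Optional[str]]:
    """Classify how one connection ended. Packets must be in capture order."""
    # The first FIN or RST seen is the close signal; report who sent it.
    for p in pkts:
        if "R" in p.flags:
            return "rst", p.src
        if "F" in p.flags:
            return "fin", p.src
    # No close at all: data segments sent more than once with the same
    # sequence number are retransmissions; the sender never saw an ACK.
    # Segments of 0 or 1 byte are skipped so keepalive probes are not counted.
    seen = set()
    for p in pkts:
        if p.payload_len > 1:
            key = (p.src, p.seq)
            if key in seen:
                return "silent_drop?", None
            seen.add(key)
    return "open", None
```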

Keepalive evidence

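In a trace, look for small packets during idle periods. A TCP keepalive probe usually carries zero or one byte of payload and uses a sequence number one less than the next expected byte, which prompts the peer to reply with an ACK. Wireshark labels such segments "TCP Keep-Alive" and the replies "TCP Keep-Alive ACK".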

Questions:

Were keepalives sent?
How often?
Did the peer ACK them?
Did a middlebox reset after a keepalive?
Did probes start too late?
Did the connection die before the first keepalive probe was due?
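
The probe heuristic can be applied per sender: a segment of zero or one byte, with a sequence number one below the next expected byte and no SYN, FIN, or RST, is treated as a keepalive, in the spirit of Wireshark's check. This sketch then answers the "how often" and "did the peer ACK them" questions. The `Seg` record, the 1-second ACK window, and the function names are assumptions; sequence wraparound and the sequence number consumed by SYN/FIN are ignored for simplicity.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Seg:
    ts: float         # capture timestamp in seconds
    src: str          # sender label
    seq: int          # TCP sequence number (wraparound ignored)
    ack: int          # TCP acknowledgment number
    flags: str        # flag letters, e.g. "A", "PA", "FA", "R", "S"
    payload_len: int  # TCP payload bytes


def find_keepalives(segs: List[Seg]) -> List[Seg]:
    """Flag keepalive probes: 0-1 byte, seq == next expected - 1, no SYN/FIN/RST."""
    next_seq: Dict[str, int] = {}  # per sender: next byte it is expected to send
    probes = []
    for s in segs:
        expected = next_seq.get(s.src)
        if (expected is not None and s.payload_len <= 1
                and s.seq == expected - 1
                and not any(f in s.flags for f in "SFR")):
            probes.append(s)
            continue  # a probe does not advance the byte stream
        end = s.seq + s.payload_len
        next_seq[s.src] = end if expected is None else max(end, expected)
    return probes


def probe_report(segs: List[Seg], ack_window_s: float = 1.0):
    """Return [(ts, src, acked)] per probe and the probe intervals per sender."""
    report: List[Tuple[float, str, bool]] = []
    for p in find_keepalives(segs):
        # The peer answers a probe with an ACK for the byte after the probe's seq.
        acked = any(s.src != p.src and "A" in s.flags and s.ack == p.seq + 1
                    and p.ts < s.ts <= p.ts + ack_window_s for s in segs)
        report.append((p.ts, p.src, acked))
    intervals: Dict[str, List[float]] = {}
    last: Dict[str, float] = {}
    for ts, src, _ in report:
        if src in last:
            intervals.setdefault(src, []).append(ts - last[src])
        last[src] = ts
    return report, intervals
```

A probe that goes unanswered while later probes keep being sent at a steady interval is the pattern to look for when state has already expired on the path.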

Load balancers and proxies

Load balancers often enforce idle timeouts. If the client expects a connection to survive 30 minutes but the load balancer closes idle connections after 60 seconds, the application must send heartbeats or reconnect.

Packet evidence can show who sent the FIN or RST and how long after the last data packet it arrived.
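
To attribute a close and measure the idle time before it, take the first FIN or RST and subtract the timestamp of the last data packet that preceded it. This is a sketch assuming hypothetical packet records with a timestamp, a sender label, TCP flag letters, and a payload length; the 1-second tolerance in `near_timeout` is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Pkt:
    ts: float         # capture timestamp in seconds
    src: str          # sender label, e.g. "client", "server", "lb"
    flags: str        # flag letters, e.g. "PA", "FA", "R"
    payload_len: int  # TCP payload bytes


def close_after_idle(pkts: List[Pkt]) -> Optional[Tuple[str, str, Optional[float]]]:
    """Return (sender, "FIN" or "RST", seconds since last data) for the first close."""
    last_data_ts: Optional[float] = None
    for p in pkts:  # packets in capture order
        if "R" in p.flags or "F" in p.flags:
            kind = "RST" if "R" in p.flags else "FIN"
            idle = None if last_data_ts is None else p.ts - last_data_ts
            return p.src, kind, idle
        # Count only segments over 1 byte as data, so 1-byte keepalive probes
        # do not reset the idle clock.
        if p.payload_len > 1:
            last_data_ts = p.ts
    return None  # no FIN or RST in the capture


def near_timeout(idle_s: float, timeout_s: float, tolerance_s: float = 1.0) -> bool:
    """True if the measured idle time matches a configured timeout."""
    return abs(idle_s - timeout_s) <= tolerance_s
```

A RST from the load balancer side arriving about 60 seconds after the last data, against a 60-second idle setting, is the kind of match `near_timeout` is meant to surface.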

Checklist

Use this workflow:

Identify the long-lived TCP connection.
Mark the last application data packet.
Measure idle time before failure.
Look for TCP keepalive probes.
Check whether probes are ACKed.
Identify who sent the FIN or RST.
If no close appears, look for a silent drop: retransmissions that get no response.
Compare the measured idle time with firewall, NAT, and load balancer timeout settings.
Preserve timing when trimming.
Correlate with application heartbeats.
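
The "preserve timing when trimming" step matters because a trimmed capture with rewritten timestamps loses the idle interval. The following is a minimal stdlib-only sketch that keeps one TCP connection by copying matching records, including their original timestamp headers, byte for byte. It assumes a classic pcap file (not pcapng) with Ethernet link type, IPv4, and no VLAN tags; `trim_pcap` is a hypothetical helper, not part of any tool.

```python
import socket
import struct

# Magic numbers of classic pcap (microsecond and nanosecond variants), by byte order.
_LITTLE = (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1")
_BIG = (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d")


def _is_conn(frame: bytes, want: set) -> bool:
    """True if an Ethernet/IPv4/TCP frame belongs to the wanted endpoint pair."""
    if len(frame) < 34 or frame[12:14] != b"\x08\x00":  # 14-byte Ethernet + IPv4
        return False
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4                       # IPv4 header length in bytes
    if ip[9] != 6 or len(ip) < ihl + 4:            # protocol 6 = TCP
        return False
    sport, dport = struct.unpack("!HH", ip[ihl:ihl + 4])
    return {(ip[12:16], sport), (ip[16:20], dport)} == want


def trim_pcap(data: bytes, ip_a: str, port_a: int, ip_b: str, port_b: int) -> bytes:
    """Keep one TCP connection (both directions); record headers are copied verbatim."""
    if data[:4] in _LITTLE:
        order = "<"
    elif data[:4] in _BIG:
        order = ">"
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled)")
    if struct.unpack(order + "I", data[20:24])[0] != 1:
        raise ValueError("only Ethernet link type (1) is handled")
    want = {(socket.inet_aton(ip_a), port_a), (socket.inet_aton(ip_b), port_b)}
    out = bytearray(data[:24])  # global header unchanged
    off = 24
    while off + 16 <= len(data):
        # Record header: ts_sec, ts_frac, incl_len, orig_len.
        incl_len = struct.unpack(order + "I", data[off + 8:off + 12])[0]
        record = data[off:off + 16 + incl_len]
        if _is_conn(record[16:], want):
            out += record  # original timestamp travels with the packet
        off += 16 + incl_len
    return bytes(out)
```

Read the capture in binary mode, pass both endpoints, and write the returned bytes to a new file; the gap between the last data packet and the failure is then identical to the original capture.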

Final diagnosis

TCP idle failures are timing problems. The packet evidence can distinguish endpoint close, firewall/NAT state expiry, load balancer timeout, missing keepalive, too-slow keepalive, and silent drop.
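
The mapping from evidence to diagnosis can be written down as a small decision function. This is one hypothetical set of rules, assuming the close type, whether an endpoint sent it, the idle time, the idle time before the first keepalive probe, and the configured middlebox timeout have already been measured; real cases may need more nuance.

```python
from typing import List, Optional


def diagnose(close_kind: str, closer_is_endpoint: bool, idle_s: float,
             first_probe_s: Optional[float], timeout_s: Optional[float]) -> List[str]:
    """Map measured evidence to candidate diagnoses (hypothetical rule set).

    close_kind: "fin", "rst", "silent_drop?", or "open"
    first_probe_s: idle seconds before the first keepalive probe, None if none seen
    timeout_s: configured firewall/NAT/load balancer idle timeout, None if unknown
    """
    if close_kind in ("fin", "rst") and closer_is_endpoint:
        return ["endpoint close"]
    findings = []
    if close_kind == "rst":
        findings.append("middlebox reset (e.g. load balancer idle timeout)")
    elif close_kind == "fin":
        findings.append("middlebox close (e.g. proxy idle timeout)")
    elif close_kind == "silent_drop?":
        findings.append("silent drop (firewall/NAT state expiry likely)")
    if first_probe_s is None:
        findings.append("missing keepalive")
    elif timeout_s is not None and first_probe_s >= timeout_s:
        findings.append("too-slow keepalive")
    if timeout_s is not None and abs(idle_s - timeout_s) <= 1.0:
        findings.append(f"idle time matches configured {timeout_s:g} s timeout")
    return findings
```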

PCAP Surgery helps preserve the idle interval and close/reset evidence so long-lived connection failures can be explained precisely.