RTP Timestamp Drift in RTSP Camera Streams: Clock Rate, Jitter, Audio Sync, and Frame Timing
How to diagnose RTP timestamp drift, wrong clock rate, jitter, audio/video sync problems, frame timing errors, and camera stream playback instability in RTSP sessions.
An RTSP camera stream can connect successfully, authenticate correctly, return valid SDP, send RTP packets, and still behave badly. The video may slowly fall behind real time. Audio and video may drift out of sync. Frames may arrive but play unevenly. A recorder may create files with strange duration. A player may show jitter, stutter, "non-monotonic timestamp", "invalid DTS", "RTP timestamp jump", or "clock rate mismatch" warnings.
Users search for "RTP timestamp drift", "RTSP camera audio video sync", "RTP clock rate wrong", "RTSP stream stutter timestamp", and "camera stream frame timing problem" when the network connection works but the media timeline does not.
This is exactly the kind of problem where a player-only test is too shallow. The player may hide the packet timeline behind buffering and decoding. RtspInspector is useful because RTP timing is protocol evidence: payload type, RTP timestamp, sequence number, marker bit, SDP clock rate, RTCP sender reports, jitter, and wall-clock mapping all matter.
RTP timestamps are not wall-clock timestamps
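An RTP timestamp is a media clock value, not a Unix timestamp. RFC 3550 recommends a random starting value, so a single timestamp says little on its own; the increments are what matter. H.264 video uses a 90 kHz RTP clock, which the SDP declares as: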
a=rtpmap:96 H264/90000
That means RTP timestamp increments are measured in 1/90000 second units. If a camera sends 30 fps video, the timestamp usually increases by about 3000 per frame:
90000 / 30 = 3000
For 25 fps video, the increment is usually 3600:
90000 / 25 = 3600
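The same arithmetic can be written as a small Python helper, which is handy when checking captured increments against a configured frame rate. This is a sketch, not RtspInspector code. The function name expected_increment is hypothetical. The 29.97 fps case (30000/1001 fps) is an added illustration of a non-integer frame rate; it works out to 3003 ticks per frame at 90 kHz.

```python
from fractions import Fraction

def expected_increment(clock_rate: int, fps) -> Fraction:
    """RTP timestamp ticks expected between consecutive video frames.

    fps may be an int or a Fraction, e.g. Fraction(30000, 1001) for 29.97 fps.
    Fraction keeps the result exact instead of rounding through floats.
    """
    return Fraction(clock_rate) / Fraction(fps)

print(expected_increment(90000, 30))                     # 3000
print(expected_increment(90000, 25))                     # 3600
print(expected_increment(90000, Fraction(30000, 1001)))  # 3003
```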
Audio uses different clock rates, usually equal to the sample rate. AAC may use 44100 or 48000 Hz. G.711 (PCMU/PCMA) uses 8000 Hz. If the client applies the wrong clock rate, playback timing drifts even when every packet arrives.
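A wrong clock rate causes gradual drift rather than an immediate failure because the same tick count is simply divided by a different rate. The sketch below shows this. It assumes a hypothetical 60-second capture of 48 kHz audio, so a client that wrongly assumes 44.1 kHz stretches media time by about 8.8 percent. The per-packet figures are typical values: 1024 samples per AAC-LC frame and 20 ms G.711 packets. Not every camera is guaranteed to use them. The helper names are illustrative.

```python
AAC_LC_FRAME_SAMPLES = 1024  # typical AAC-LC access unit size (assumption, some profiles use 960)

def media_seconds(ticks: int, clock_rate: int) -> float:
    """Convert an RTP timestamp span (in ticks) to seconds of media time."""
    return ticks / clock_rate

def samples_per_packet(packet_ms: int, sample_rate: int) -> int:
    """Expected RTP timestamp increment for one audio packet of packet_ms milliseconds.

    For audio, the RTP clock normally equals the sample rate, so one sample is one tick.
    """
    return sample_rate * packet_ms // 1000

# Hypothetical capture: 60 s of 48 kHz audio produces 2,880,000 ticks.
ticks = 48000 * 60
print(media_seconds(ticks, 48000))            # 60.0
print(round(media_seconds(ticks, 44100), 2))  # 65.31 (client wrongly assumes 44.1 kHz)

print(AAC_LC_FRAME_SAMPLES)                   # 1024 ticks per AAC-LC frame
print(samples_per_packet(20, 8000))           # 160 ticks per 20 ms G.711 packet
```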
SDP clock rate is the first clue
The SDP returned by RTSP DESCRIBE defines how dynamic payload types should be interpreted:
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
m=audio 0 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/48000/2
If the SDP says H.264 uses 90000, the client should interpret video RTP timestamps with that clock. If the SDP is missing, malformed, or inconsistent with payload behavior, the client may guess incorrectly.
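The following is a minimal sketch of reading the rtpmap lines from the SDP above into a payload-type table. The RtpMap class and parse_rtpmaps function are hypothetical names. The regular expression only covers the common "encoding/clock[/channels]" form and ignores every other SDP line.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class RtpMap:
    payload_type: int
    encoding: str
    clock_rate: int
    channels: Optional[int] = None

# a=rtpmap:<payload type> <encoding>/<clock rate>[/<channels>]
RTPMAP_RE = re.compile(r"^a=rtpmap:(\d+)\s+([^/\s]+)/(\d+)(?:/(\d+))?\s*$")

def parse_rtpmaps(sdp: str) -> dict[int, RtpMap]:
    """Map each payload type declared with a=rtpmap to its encoding and clock rate."""
    maps: dict[int, RtpMap] = {}
    for line in sdp.splitlines():  # splitlines() also handles CRLF line endings
        m = RTPMAP_RE.match(line.strip())
        if m:
            pt, enc, rate, ch = m.groups()
            maps[int(pt)] = RtpMap(int(pt), enc, int(rate), int(ch) if ch else None)
    return maps

sdp = """m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
m=audio 0 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/48000/2
"""
for pt, rm in parse_rtpmaps(sdp).items():
    print(pt, rm.encoding, rm.clock_rate, rm.channels)
# 96 H264 90000 None
# 97 MPEG4-GENERIC 48000 2
```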
Common SDP timing problems include:
- Missing a=rtpmap line for a dynamic payload type.
- Wrong audio clock rate.
- Payload type reused inconsistently.
- Camera firmware declares 90000 but sends timestamp increments that do not match frame rate.
- AAC configuration does not match actual sample rate.
- Multiple tracks use confusing or duplicate control attributes.
RtspInspector should help preserve SDP beside the RTP evidence because the RTP timeline cannot be interpreted correctly without it.
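Two of the problems listed above can be checked mechanically. The sketch below flags dynamic payload types (96 to 127) that appear on an m= line without a matching a=rtpmap in the same media section. It also flags a=control values that more than one track reuses. The names MediaSection and audit_sdp are hypothetical, and the parser handles only the m=, a=rtpmap and a=control lines it needs.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MediaSection:
    media: str
    payload_types: list[int]
    rtpmap_pts: set[int] = field(default_factory=set)
    control: Optional[str] = None

def audit_sdp(sdp: str) -> list[str]:
    """Flag dynamic payload types without a=rtpmap and a=control values shared by tracks."""
    sections: list[MediaSection] = []
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            fields = line[2:].split()  # m=<media> <port> <proto> <fmt> ...
            if fields:
                pts = [int(f) for f in fields[3:] if f.isdigit()]
                sections.append(MediaSection(fields[0], pts))
        elif sections and line.startswith("a=rtpmap:"):
            parts = line[len("a=rtpmap:"):].split()
            if parts and parts[0].isdigit():
                sections[-1].rtpmap_pts.add(int(parts[0]))
        elif sections and line.startswith("a=control:"):
            sections[-1].control = line[len("a=control:"):]

    problems = []
    owner_of_control: dict[str, str] = {}
    for s in sections:
        for pt in s.payload_types:
            # Dynamic payload types have no fixed meaning without a=rtpmap.
            if 96 <= pt <= 127 and pt not in s.rtpmap_pts:
                problems.append(f"{s.media}: dynamic payload type {pt} has no a=rtpmap")
        if s.control is not None:
            if s.control in owner_of_control:
                problems.append(
                    f"{s.media}: a=control:{s.control} already used by {owner_of_control[s.control]}")
            else:
                owner_of_control[s.control] = s.media
    return problems

broken = """m=video 0 RTP/AVP 96
a=control:trackID=1
m=audio 0 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/48000/2
a=control:trackID=1
"""
for p in audit_sdp(broken):
    print(p)
```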
Sequence number vs timestamp
RTP sequence numbers and timestamps answer different questions.
The sequence number helps detect packet loss and ordering:
- Did packet 1024 arrive?
- Did packet 1025 arrive?
- Did packet 1026 arrive before 1025?
- Are packets missing?
The RTP timestamp helps interpret media time:
- Which packets belong to the same video frame?
- How much media time passed between frames?
- Did the camera jump forward or backward?
- Is audio advancing at the expected rate?
- Does media time match wall-clock time?
A stream can have perfect sequence continuity and still have broken timestamps. It can also have some packet loss while timestamps remain otherwise consistent.
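The following is a minimal sketch of the sequence-number side. RTP sequence numbers are 16 bits and wrap from 65535 to 0, so distances are computed modulo 2^16 and interpreted as signed. A late packet is counted as reordered, not missing. The function names are hypothetical, and the input is assumed to be a non-empty list of sequence numbers in arrival order for one RTP stream.

```python
def seq_delta(prev: int, cur: int) -> int:
    """Signed distance between two 16-bit RTP sequence numbers, handling wraparound."""
    d = (cur - prev) & 0xFFFF
    return d - 0x10000 if d >= 0x8000 else d

def audit_sequence(seqs: list[int]) -> dict[str, int]:
    """Count missing, reordered and duplicate packets from sequence numbers in arrival order."""
    highest_seq = seqs[0]
    highest_ext = 0          # unwrapped index of the highest sequence number so far
    seen = {0}
    reordered = duplicates = 0
    for s in seqs[1:]:
        ext = highest_ext + seq_delta(highest_seq, s)
        if ext in seen:
            duplicates += 1
            continue
        if ext < highest_ext:
            reordered += 1   # arrived after a packet with a higher sequence number
        else:
            highest_ext, highest_seq = ext, s
        seen.add(ext)
    span = max(seen) - min(seen) + 1
    return {"missing": span - len(seen), "reordered": reordered, "duplicates": duplicates}

print(audit_sequence([1024, 1026, 1025]))    # late packet: reordered, not missing
print(audit_sequence([65534, 65535, 1, 2]))  # wraparound with sequence 0 lost
```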
Marker bit and video frame boundaries
For many RTP video payloads, the marker bit indicates a frame boundary. With H.264, several RTP packets may carry fragments of one video frame (access unit). They share the same RTP timestamp, and the marker bit should be set on the last packet of the access unit.
If timestamps change too often, not often enough, or marker behavior is inconsistent, frame reconstruction can become unstable.
Symptoms include:
- Video stutter with no visible packet loss.
- Decoder receives incomplete frames.
- Recorder creates incorrect frame duration.
- Playback speeds up or slows down.
- Frame timestamps are non-monotonic.
This is why a diagnostic tool should show packet-level RTP metadata, not only decoded frames.
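The following is a minimal sketch of the packet-level check described above. It assumes video packets already sorted by sequence number. It flags two inconsistencies: a marker bit set on a packet whose successor still carries the same timestamp, and a timestamp change that was not preceded by a marker. The RtpPacket type and check_frame_boundaries function are hypothetical. A lost final packet produces the same "no marker" symptom, so the result should be read together with sequence-gap evidence.

```python
from typing import NamedTuple

class RtpPacket(NamedTuple):
    seq: int
    timestamp: int
    marker: bool

def check_frame_boundaries(packets: list[RtpPacket]) -> list[str]:
    """Flag marker-bit inconsistencies in a run of video packets sorted by sequence number."""
    issues = []
    for prev, cur in zip(packets, packets[1:]):
        same_frame = cur.timestamp == prev.timestamp
        if same_frame and prev.marker:
            issues.append(f"seq {prev.seq}: marker set but seq {cur.seq} shares timestamp {cur.timestamp}")
        if not same_frame and not prev.marker:
            issues.append(f"seq {prev.seq}: timestamp changed to {cur.timestamp} "
                          f"without marker on previous packet")
    return issues

packets = [
    RtpPacket(100, 3000, False),
    RtpPacket(101, 3000, True),   # clean end of frame
    RtpPacket(102, 6000, False),
    RtpPacket(103, 9000, True),   # frame at 6000 ended without a marker
]
for issue in check_frame_boundaries(packets):
    print(issue)
```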
Audio/video sync drift
Audio and video sync depends on mapping each media track's RTP timestamp to a shared timebase. RTCP Sender Reports are often used for this. Each Sender Report pairs an NTP wall-clock timestamp with the RTP timestamp of the same instant:
RTCP SR:
NTP timestamp: wall-clock reference
RTP timestamp: media timestamp at that reference
If RTCP Sender Reports are missing, inconsistent, or wrong, the client may have to infer timing. Some streams remain acceptable for short live viewing but drift during long recording.
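The following is a minimal sketch of how a receiver can use one Sender Report to place any RTP timestamp of the same track on the wall clock. It relies on two facts from RFC 3550. The NTP timestamp is 32 bits of seconds since 1900 plus a 32-bit fraction. RTP timestamps are 32 bits and wrap. The SenderReport class, the helper names and the example values are hypothetical, and the clock rate is assumed to come from the SDP rtpmap.

```python
from dataclasses import dataclass

NTP_UNIX_OFFSET = 2_208_988_800  # seconds between 1900-01-01 and 1970-01-01

@dataclass
class SenderReport:
    ntp_seconds: int    # NTP timestamp, integer part (seconds since 1900)
    ntp_fraction: int   # NTP timestamp, fractional part in units of 2**-32 s
    rtp_timestamp: int  # RTP timestamp corresponding to the same instant

def ntp_to_unix(seconds: int, fraction: int) -> float:
    return seconds - NTP_UNIX_OFFSET + fraction / 2**32

def rtp_to_wallclock(rtp_ts: int, sr: SenderReport, clock_rate: int) -> float:
    """Map an RTP timestamp to Unix wall-clock seconds using one Sender Report."""
    # Signed 32-bit difference so timestamps before the SR, or across wraparound, still map.
    diff = (rtp_ts - sr.rtp_timestamp) & 0xFFFFFFFF
    if diff >= 0x8000_0000:
        diff -= 0x1_0000_0000
    return ntp_to_unix(sr.ntp_seconds, sr.ntp_fraction) + diff / clock_rate

# Hypothetical SR taken just before the 32-bit RTP timestamp wraps.
sr = SenderReport(ntp_seconds=NTP_UNIX_OFFSET + 1_700_000_000,
                  ntp_fraction=2**31,             # 0.5 s
                  rtp_timestamp=4_294_958_296)    # 9000 ticks before wraparound
print(rtp_to_wallclock(81_000, sr, 90000))        # 1 s after the SR instant
```

Comparing the wall-clock times that the audio and video Sender Reports assign to frames that should play together is what exposes audio/video offset.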
Audio/video drift can happen when:
- Camera audio clock is inaccurate.
- Video RTP timestamps are generated from a different clock than audio.
- RTCP Sender Reports are absent or sparse.
- Network jitter causes buffering decisions that expose clock mismatch.
- SDP declares the wrong audio sample rate.
- The camera changes frame rate without consistent timestamp behavior.
The phrase "RTSP audio out of sync" often belongs to this category.
Timestamp jumps
RTP timestamps should generally move forward within a stream. Large jumps can happen after a stream restart, a camera firmware reset, an encoder restart, or a session discontinuity. But unexpected jumps during a continuous session can break recorders and analyzers.
Look for:
- Timestamp moving backward.
- Timestamp jumping forward by seconds or minutes.
- Sequence number continuous while timestamp jumps.
- Timestamp unchanged while the sequence number keeps increasing for longer than one frame should take.
- RTCP Sender Report mapping changes abruptly.
Do not diagnose this from one packet. Measure before and after the jump and compare against expected frame rate.
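The following is a minimal sketch of that comparison. It takes one RTP timestamp per video frame and the expected increment for the configured frame rate. It flags steps that go backward, repeat, or jump forward by more than max_frames frames. The function names are hypothetical, and the default of 10 frames is an arbitrary assumption chosen so that ordinary dropped frames are not reported. Backward steps are only meaningful for streams without B-frames: with B-frames, presentation timestamps are legitimately non-monotonic in send order.

```python
def signed32(delta: int) -> int:
    """Interpret a 32-bit RTP timestamp difference as signed, so wraparound is not a jump."""
    d = delta & 0xFFFFFFFF
    return d - 0x1_0000_0000 if d >= 0x8000_0000 else d

def find_timestamp_jumps(frame_ts: list[int], expected_increment: int,
                         max_frames: int = 10) -> list[tuple[int, int]]:
    """Return (index, signed delta) for suspicious steps between consecutive frame timestamps."""
    jumps = []
    for i in range(1, len(frame_ts)):
        delta = signed32(frame_ts[i] - frame_ts[i - 1])
        # Backward or repeated: unexpected for streams that send no B-frames.
        # Forward by more than max_frames frames: larger than ordinary frame drops.
        if delta <= 0 or delta > max_frames * expected_increment:
            jumps.append((i, delta))
    return jumps

frames = [0, 3000, 6000, 9000, 900_000, 903_000, 900_000]
print(find_timestamp_jumps(frames, 3000))  # [(4, 891000), (6, -3000)]
```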
Jitter is not the same as timestamp drift
Network jitter means packets arrive with variable delay. RTP timestamp drift means the media clock itself does not match expected timing or does not map correctly to wall-clock time.
A jitter buffer can hide network jitter. It cannot fully fix a camera generating wrong RTP timestamps for a long session.
Questions to separate them:
- Are RTP timestamps increasing at the expected media rate?
- Are arrival times uneven while RTP timestamps are correct?
- Are RTCP jitter values increasing?
- Does drift continue even on a clean LAN?
- Does the same stream drift in recordings as well as live playback?
If the issue appears on a clean local network with no packet loss, inspect RTP timestamp behavior before blaming bandwidth.
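The distinction can be made numeric. RFC 3550 (section 6.4.1) defines interarrival jitter as a running average of the absolute transit-time difference D between consecutive packets. Summing the signed D values instead gives the total change in transit time, which exposes a steady trend. The sketch below computes both. The function names and test streams are hypothetical. It uses floating point, whereas the RTCP field is an integer.

```python
def signed32(delta: int) -> int:
    """Interpret a 32-bit RTP timestamp difference as signed."""
    d = delta & 0xFFFFFFFF
    return d - 0x1_0000_0000 if d >= 0x8000_0000 else d

def jitter_and_drift(packets: list[tuple[float, int]], clock_rate: int) -> tuple[float, float]:
    """Return (RFC 3550 interarrival jitter in ticks, cumulative transit change in seconds).

    packets: (arrival_time_seconds, rtp_timestamp) pairs in arrival order.
    """
    jitter = 0.0
    total_d = 0.0
    prev_arrival, prev_ts = packets[0]
    for arrival, ts in packets[1:]:
        # D(i, j) = (Rj - Ri) - (Sj - Si), expressed in RTP timestamp units.
        d = (arrival - prev_arrival) * clock_rate - signed32(ts - prev_ts)
        jitter += (abs(d) - jitter) / 16  # RFC 3550 smoothing with gain 1/16
        total_d += d                      # telescopes to the end-to-end transit change
        prev_arrival, prev_ts = arrival, ts
    return jitter, total_d / clock_rate

# Hypothetical 10 s of 30 fps video arriving exactly every 1/30 s.
arrivals = [i / 30 for i in range(301)]
clean = [(t, i * 3000) for i, t in enumerate(arrivals)]  # correct 90 kHz timestamps
fast = [(t, i * 3003) for i, t in enumerate(arrivals)]   # camera stamps 3003 per frame
print(jitter_and_drift(clean, 90000))
print(jitter_and_drift(fast, 90000))
```

For the clean stream, both values stay near zero. For the mis-stamped stream, jitter settles near a constant 3 ticks, while the transit change reaches about -0.01 s after 10 seconds and keeps growing. That growing offset is the drift that a jitter buffer cannot absorb.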
Checklist for RTP timestamp drift
Use this workflow:
- Capture SDP from RTSP DESCRIBE.
- Identify payload types and clock rates.
- Track RTP sequence numbers for loss and reordering.
- Track RTP timestamp increments per frame or audio packet.
- Compare increments against expected frame rate or sample rate.
- Inspect marker bits for video frame boundaries.
- Inspect RTCP Sender Reports for RTP-to-wall-clock mapping.
- Check whether audio and video tracks drift at different rates.
- Compare UDP and TCP-interleaved transport only after the timing evidence is clear.
- Test a short stream and a long stream; drift often needs time to show.
What to include in a useful report
If you need to report the issue to a camera vendor, include:
- RTSP URL pattern without credentials.
- SDP media sections.
- Payload type and rtpmap values.
- Expected frame rate and configured codec.
- RTP timestamp increment samples.
- RTCP Sender Report samples.
- Time when drift becomes visible.
- Whether packet loss is present.
- Whether the issue occurs on LAN.
This gives the vendor a media-timing defect to investigate instead of a vague "video stutters" complaint.
Final diagnosis
RTP timestamp drift is a media clock problem, not simply a connection problem. The stream may authenticate, describe, set up, play, and deliver packets while still producing a broken timeline. The useful evidence is SDP clock rate, RTP timestamp increments, sequence continuity, marker bits, RTCP Sender Reports, jitter, and long-session drift.
RtspInspector is designed for this layer-by-layer diagnosis: prove what the camera sends before assuming the player, network, or decoder is responsible.