For most of the web's history, video in the browser worked one way: you handed a URL to a <video> element and the browser handled everything. It fetched the bytes, demuxed the container, decoded the frames, composited them onto the page. The pipeline was completely opaque. If you needed to do anything outside it — process frames before display, encode video in the browser, build a player with custom buffering — you were fighting the platform rather than using it.

The WebCodecs API opens that up. It's a low-level interface to the browser's codec infrastructure — the same H.264 and VP9 decoders <video> uses internally, now reachable from JavaScript. You get individual frames. You control the decode pipeline. You can feed encoded chunks in any order, implement your own buffering, read pixel data off frames before they're composited, encode camera input to a compressed stream without routing through an intermediate canvas.

The API shipped in Chrome 94 in 2021 and landed in Firefox at version 130. It's not meant for ordinary media playback — <video> still does that better with less code. It's for applications that need to work below the abstraction: browser-based video editors, conferencing tools that need frame-level access for effects or background replacement, streaming platforms managing their own adaptive bitrate logic, and anything doing real-time analysis or transformation of video in the browser.

The core objects are VideoDecoder, VideoEncoder, AudioDecoder, and AudioEncoder. Each is constructed with two callbacks (one for output frames or chunks, one for errors) and then configured with codec parameters: the codec string, frame dimensions, bitrate, keyframe interval. For decoding, that means constructing the decoder with those callbacks, calling configure() with the codec parameters, then feeding it EncodedVideoChunk objects. Each chunk carries the encoded bytes, a timestamp in microseconds, an optional duration, and a type field that's either "key" for keyframes or "delta" for everything else. The decoder calls your output callback with a VideoFrame for each chunk it successfully decodes.
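A minimal sketch of that decode path, assuming a demuxer hands you samples; the sample shape here (data, timestampUs, durationUs, isKey) is a stand-in for whatever your demuxer actually produces, not part of WebCodecs, and the codec string is an example H.264 profile that must match your bitstream:

```javascript
// Map a hypothetical demuxed sample to an EncodedVideoChunk init object.
function sampleToChunkInit(sample) {
  return {
    type: sample.isKey ? "key" : "delta", // keyframe vs. dependent frame
    timestamp: sample.timestampUs,        // microseconds
    duration: sample.durationUs,
    data: sample.data,                    // encoded bytes (BufferSource)
  };
}

// Browser-only portion, guarded so the sketch also loads elsewhere.
if (typeof VideoDecoder !== "undefined") {
  const decoder = new VideoDecoder({
    output: (frame) => {
      // ...render or read the frame, then release it.
      frame.close();
    },
    error: (e) => console.error("decode error:", e),
  });

  decoder.configure({
    codec: "avc1.42E01E", // example: H.264 Baseline; must match the stream
    codedWidth: 1280,
    codedHeight: 720,
  });

  // `samples` would come from your demuxer:
  // for (const s of samples) {
  //   decoder.decode(new EncodedVideoChunk(sampleToChunkInit(s)));
  // }
}
```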

The VideoFrame you get back carries a timestamp and duration matching what you put in, a format field describing the pixel layout (usually I420 or NV12 for YUV, RGBA or BGRA otherwise), and separate codedWidth/codedHeight and displayWidth/displayHeight fields, which can differ if the encoder used crop offsets or non-square pixels. You read pixel data with copyTo(), passing a TypedArray and an optional layout descriptor to control stride. One thing that catches people: call close() on frames when you're done with them. Frames hold decoder and GPU resources that JavaScript's garbage collector doesn't manage. Let enough accumulate and the decoder's internal frame pool runs dry, which shows up as stalls or opaque errors rather than anything helpful.
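A sketch of a decoder output callback that copies pixels out and releases the frame. The I420 size helper assumes tightly packed planes with even dimensions and no row padding; real frames may carry stride padding, which is why copyTo() reports the actual per-plane layout:

```javascript
// Expected byte length of a tightly packed I420 frame: one full-res
// Y plane plus two quarter-res chroma planes (assumes even dimensions,
// no row padding).
function i420ByteLength(width, height) {
  return width * height + 2 * (width / 2) * (height / 2);
}

// Sketch of consuming a VideoFrame from the decoder's output callback.
async function handleFrame(frame) {
  try {
    const buf = new Uint8Array(frame.allocationSize()); // size for the frame's native format
    const layout = await frame.copyTo(buf); // resolves with per-plane { offset, stride }
    // ...process buf, using layout for plane offsets and strides...
  } finally {
    frame.close(); // always release; WebCodecs won't do it for you
  }
}
```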

Another thing that trips people up: flush(). When you're done feeding chunks to a decoder, you call flush(), which returns a Promise that resolves when all pending output has been delivered. If you're seeking or switching streams, you call reset() instead, which discards queued input and output immediately. The distinction matters because decoders buffer several frames internally for reference — a decoder won't necessarily output a frame the moment it receives the corresponding chunk. Without flush(), you can end up missing the last few frames of a clip.
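The end-of-stream and seek cases can be sketched as two small helpers. These are illustrative wrappers, not part of the API; `decoder` is anything with the VideoDecoder-shaped decode()/flush()/reset() methods:

```javascript
// End of stream: feed everything, then flush so the decoder's
// internal reference buffer drains into the output callback.
// Without the flush, the tail of the clip never arrives.
async function drainDecoder(decoder, chunks) {
  for (const chunk of chunks) decoder.decode(chunk);
  await decoder.flush();
}

// Seek or stream switch: discard queued input and output immediately.
// The next chunk fed in must be a keyframe, since the decoder's
// reference state is gone.
function seekDecoder(decoder) {
  decoder.reset();
}
```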

Decoding video in JavaScript — parsing compressed bytes, implementing motion compensation, reconstructing frames — was never practical beyond low resolutions and frame rates. That's the ceiling that WebCodecs removes. It hands decode work to the same native decoder the browser uses for <video>, with hardware acceleration where the device supports it. A browser-based editor can pull 4K footage at full frame rate on hardware where pure JS couldn't have gotten close.

Pair WebCodecs with the File System Access API and WebAssembly and you have enough to build tools that used to require a native app or a server. Demux the container with MP4Box.js or ffmpeg compiled to WASM, feed the encoded chunks to WebCodecs, process the output frames in a WASM module, write results back to disk via File System Access — no upload, no round-trip, no server involved. That stack didn't exist in a usable form two years ago.
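The demux-to-decode handoff in that stack can be sketched roughly as follows. Field names (cts, duration, timescale, is_sync, data) follow MP4Box.js's sample objects as of recent versions; verify them against the version you actually use. The timestamp conversion is the one fixed requirement: MP4 timestamps arrive in track timescale units, and WebCodecs wants microseconds:

```javascript
// Convert an MP4 timestamp (in timescale units) to microseconds.
function toMicros(value, timescale) {
  return Math.round((value / timescale) * 1_000_000);
}

// Rough wiring, commented out since it needs MP4Box.js and a browser:
// const file = MP4Box.createFile();
// file.onReady = (info) => {
//   const track = info.videoTracks[0];
//   file.setExtractionOptions(track.id);
//   file.start();
// };
// file.onSamples = (trackId, user, samples) => {
//   for (const s of samples) {
//     decoder.decode(new EncodedVideoChunk({
//       type: s.is_sync ? "key" : "delta",
//       timestamp: toMicros(s.cts, s.timescale),
//       duration: toMicros(s.duration, s.timescale),
//       data: s.data,
//     }));
//   }
// };
```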

None of this is easy to implement. The API is deliberately low-level, and there's real plumbing work involved: handling keyframe dependencies when seeking (you can't decode a delta frame without its reference frames), managing decoder state across stream switches, keeping encode and decode in sync during transcoding. Error handling is also more demanding than in higher-level APIs: you have to deal with both synchronous configuration errors and the asynchronous decode errors delivered to the error callback. But the pieces are there. What you can build in a browser without a backend has moved considerably.
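The keyframe-dependency problem on seek reduces to a small pure function. This is one possible shape, assuming you keep chunk metadata ({ timestamp, type }) sorted by timestamp: decoding has to restart at the last keyframe at or before the seek target, because every delta frame needs its reference chain:

```javascript
// Return the index of the last keyframe at or before targetTimestamp,
// or -1 if none precedes it. Chunks must be sorted by timestamp.
function keyframeIndexFor(chunks, targetTimestamp) {
  let idx = -1;
  for (let i = 0; i < chunks.length; i++) {
    if (chunks[i].timestamp > targetTimestamp) break;
    if (chunks[i].type === "key") idx = i;
  }
  return idx;
}
```

After reset(), you'd decode from this index forward and discard output frames until you reach the one the user actually asked for.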