In November 2008, Springer-Verlag published SMIL 3.0: Interactive Multimedia for the Web, Mobile Devices and Daisy Talking Books, a book this site was built to accompany. SMIL, pronounced "smile," stood for Synchronized Multimedia Integration Language — a W3C standard for describing time-coordinated media presentations in XML. Play this video, overlay this text at 00:30, branch to the Spanish audio track, pause if the viewport drops below 400 pixels. Declarative, structured, accessible by default.
The W3C had been working on SMIL since 1998. Version 3.0, finalized in 2008, was the fullest statement of the idea: a markup language where time was a first-class dimension, the same way HTML treats document structure. A SMIL file described a presentation the way a screenplay describes a film — timing relationships between elements, conditions for branching, accessibility metadata that let screen readers and talking-book players navigate non-linearly.
To get a sense of what that looked like in practice: a SMIL document used a <body> element containing a <seq> (sequence) or <par> (parallel) to arrange media objects in time. A <seq> played its children one after another; a <par> played them simultaneously. You'd nest these to build complex timing structures. A video element with begin="5s" and dur="10s" started five seconds after its parent time container began and ran for ten. A <switch> element let you declare multiple alternatives — different audio tracks, different subtitle files, different image resolutions — and the player picked the one that matched the system's language, bandwidth, or screen size. The whole thing was written in XML and validated against a schema. It was verbose, but it was precise in a way that HTML has never been about time.
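A minimal sketch of what such a document might look like — the file names and clip lengths here are invented, not taken from the book:

```xml
<!-- Illustrative SMIL fragment: a title card, then a video playing
     in parallel with an audio track chosen by language. -->
<smil xmlns="http://www.w3.org/ns/SMIL">
  <body>
    <seq>
      <!-- children of <seq> play one after another -->
      <img src="title.png" dur="5s"/>
      <par>
        <!-- children of <par> play simultaneously -->
        <video src="lecture.mp4" begin="0s" dur="10s"/>
        <switch>
          <!-- player picks the first alternative that matches -->
          <audio src="narration-es.mp3" systemLanguage="es"/>
          <audio src="narration-en.mp3"/>
        </switch>
      </par>
    </seq>
  </body>
</smil>
```

Everything about timing and selection lives in the markup; the player, not a script, resolves it.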
The Ambulant player, built at CWI Amsterdam where Dick Bulterman ran the distributed and interactive systems group, was the most complete open-source SMIL implementation. It ran on Windows, Mac, and Linux, covered the full SMIL 2.1 feature set and most of 3.0, and handled the layout, timing, and media fetching that a full SMIL runtime requires. Serious engineering. It never found much of an audience, though, because SMIL never did either.
The browser vendors didn't adopt it. Firefox had partial SMIL support for a while — enough to drive SVG animations — but it was never a priority. Chrome deprecated that support in 2015, then reversed course after pushback from developers who relied on it to animate SVG without JavaScript. Internet Explorer ignored SMIL entirely. The mobile web cycled through Flash, then HTML5 video, then native apps, none of which had any use for a declarative time-coordination layer. RealNetworks had SMIL support in RealPlayer, which tells you roughly when the window closed.
SMIL lost, but the ideas didn't disappear — they scattered. The DAISY talking books format, one of SMIL's primary use cases in the 3.0 spec, is still the standard for accessible digital publishing used by libraries for the blind worldwide. DAISY 3 uses SMIL directly: each audio clip in a talking book is a SMIL <audio> element with a begin and end time keyed to a text span, so a playback device can synchronize highlighted text with narration and let a reader jump to any paragraph without losing their place in the audio. The National Library Service, Bookshare, and Learning Ally all distribute content in formats that descend from this work.
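The shape of that synchronization is easy to show. A hedged sketch of a DAISY-style SMIL fragment — identifiers, file names, and clip times are all made up for illustration:

```xml
<!-- Each <par> ties one text span to one audio clip, so a player
     can highlight the paragraph while its narration plays and
     resume audio from any paragraph the reader jumps to. -->
<par id="p0001">
  <text src="book.xml#para0001"/>
  <audio src="chapter01.mp3"
         clipBegin="0:00:12.400" clipEnd="0:00:18.250"/>
</par>
```

The text/audio pairing is the whole trick: navigation happens in the text structure, and the audio follows.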
WebVTT, the subtitle and caption format baked into HTML5 video, carries SMIL's timing semantics in a simpler syntax — each cue has a start time, an end time, and a payload. The Web Animations API models animation as a sequence of keyframes on a timeline with a defined duration and fill mode, which is structurally what SMIL's animation elements were doing in SVG. The CSS working group's scroll-driven animations and the proposed Animation Worklet both treat time and progress as composable, controllable dimensions — again, the same problem.
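For comparison, a minimal WebVTT file (cue text invented here) shows how little of the model changed — each cue is still a begin, an end, and content:

```
WEBVTT

00:00:30.000 --> 00:00:34.500
Overlay this text at thirty seconds.

00:00:34.500 --> 00:00:40.000
Each cue is a start time, an end time, and a payload.
```

What's gone is the nesting: no <seq>, no <par>, no <switch> — just a flat list of timed spans against a single media timeline.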
SVG still has SMIL's animation elements — <animate>, <animateTransform>, <animateMotion> — and they work in every modern browser. Rarely used, mostly because CSS animations and JS libraries are more familiar. But the model is intact: timing, repetition, keyframes, synchronization to document events. SMIL's vocabulary, sitting in a corner of the spec.
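A complete, working example of that leftover vocabulary — this renders in current browsers as-is, no CSS or JavaScript:

```xml
<!-- SMIL animation inside SVG: the circle's radius grows from
     10 to 40 over two seconds and repeats indefinitely. -->
<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">
  <circle cx="50" cy="50" r="10" fill="steelblue">
    <animate attributeName="r" from="10" to="40"
             dur="2s" repeatCount="indefinite"/>
  </circle>
</svg>
```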
SMIL 1.0 arrived in 1998, when broadband wasn't common and streaming video was experimental. SMIL 3.0 arrived in 2008, just as mobile was starting to pull the web stack apart and reassemble it differently. Neither version landed when the platform was positioned to run with it. The problems were real. The approach was defensible. The timing was just wrong, twice.
We've kept this domain because the questions outlast the answers. The web still hasn't fully solved time-based media, accessibility across formats, or coordination between streams. This is where we write about it.