OTTO White Paper

A four-player, playback-only rhythm-generation instrument — played, not programmed. In four words: a live drum arrangement instrument.

What OTTO Is

OTTO is a four-player, playback-only rhythm-generation instrument. It plays pre-existing MIDI patterns through kits, where a kit may use sampled voices, procedurally synthesized voices, or a mix of both. Each of the four players is fully independent, with its own mode, kit, pattern, timeline lane, and mixer sub-bus, and live performance is treated as a first-class use case alongside studio work. OTTO is both a standalone product and an embeddable engine: the same compiled code serves both, with the difference expressed only as configuration, never as a separate build.

The shortest description is four words: a live drum arrangement instrument. Live is why energy and tempo react in real time. Drum is why there is no audio recording or waveform editing and why the four player roles are all rhythm-section roles. Arrangement is why "pills" exist as the playback unit and why each player's lane is independent. Instrument is why OTTO is played rather than programmed.

The First Principles

  • Playback-only. OTTO does not record audio or capture MIDI, and it does not generate MIDI at runtime. The real-time audio thread reads pre-resolved configurations and MIDI pattern data that has already been loaded off the audio thread, and emits sound. Everything else is worker-thread or message-thread work.
  • Patterns are role-bearing playback units. A "pill" placed on a lane carries a pattern, a phrase role, and an energy level. Pills are the sole playback authority — no pill covering a lane means designed silence, with no fallback pattern.
  • Four independent players. Each player owns its mode, kit, pattern lane, mixer sub-bus, and timeline. The four lanes are independent arrangements that happen to share a single transport clock.
  • Signed, transport-shaped time. Bar 1 is the first measure; the ruler skips 0 (a tape-machine convention), so bar -1 is immediately followed by bar 1, and negative bars are first-class positions. Pre-roll is simply the cursor sitting at bar -1.
  • Pro-audio convention is the silent tiebreaker. When a consumer shortcut and a pro-audio convention diverge, the convention wins.
  • An instrument, not a workstation. No waveform surgery, no track recording, no stitching of takes. The user's gesture vocabulary is musical.
  • One engine, two lives. The standalone product and the embedded-engine product are the same binary, differing only by configuration.
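The signed ruler and the pill rule can be sketched together. This is a minimal Python sketch under stated assumptions: the names `bar_to_index`, `Pill`, and `pill_at` are hypothetical, a pill is assumed to cover a half-open range of bars, and pills on one lane are assumed not to overlap. Skipping bar 0 becomes a one-line mapping to a contiguous index, and an uncovered position returns `None` rather than any fallback pattern.

```python
from dataclasses import dataclass
from typing import List, Optional


def bar_to_index(bar: int) -> int:
    """Map a ruler bar (there is no bar 0) to a contiguous zero-based index.

    Bar 1 -> 0, bar 2 -> 1, bar -1 -> -1, bar -2 -> -2.
    """
    if bar == 0:
        raise ValueError("bar 0 does not exist on the ruler")
    return bar - 1 if bar > 0 else bar


def index_to_bar(index: int) -> int:
    """Inverse of bar_to_index."""
    return index + 1 if index >= 0 else index


def bars_between(start_bar: int, end_bar: int) -> int:
    """Measures from start_bar to end_bar; the missing bar 0 is not counted."""
    return bar_to_index(end_bar) - bar_to_index(start_bar)


@dataclass(frozen=True)
class Pill:
    start_bar: int  # first bar covered (inclusive)
    end_bar: int    # first bar not covered (exclusive)
    pattern: str
    role: str       # phrase role, e.g. "Verse"
    energy: str     # energy level name, e.g. "Normal"


def pill_at(lane: List[Pill], bar: int) -> Optional[Pill]:
    """Return the pill covering `bar`, or None for designed silence (no fallback)."""
    i = bar_to_index(bar)
    for pill in lane:
        if bar_to_index(pill.start_bar) <= i < bar_to_index(pill.end_bar):
            return pill
    return None
```

With hypothetical pills spanning bars -1 to 1 and 1 to 5, the pre-roll bar -1 resolves to the first pill, bars 1 through 4 to the second, and bar 5 to designed silence.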

Modes, Styles, and the Library

A player's mode is one of four rhythm-section roles: Drums, Percs (auxiliary percussion), Shakers, and Hands (hand percussion such as claps and snaps). Each mode owns its own catalog of Styles, and the four catalogs are deliberately disjoint: no Style name appears in more than one mode. A Style is a coherent pattern collection with a fixed 36-file set: four section-tagged patterns (Intro, Verse, Chorus, Bridge) plus 32 fills. Cross-mode arrangements are assembled by the user, one Style per player, and are never imposed by the engine.
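A library-side check for these two invariants might look like the following Python sketch. The slot names (the four section names plus `Fill01` through `Fill32`) and the function names are hypothetical; the source specifies only the counts and the disjointness rule.

```python
from typing import Dict, Mapping, Set

SECTIONS = ("Intro", "Verse", "Chorus", "Bridge")
FILL_COUNT = 32
# Hypothetical slot names: the four sections plus "Fill01".."Fill32" (36 total).
EXPECTED_SLOTS = set(SECTIONS) | {f"Fill{n:02d}" for n in range(1, FILL_COUNT + 1)}


def validate_style(files: Mapping[str, str]) -> None:
    """Raise unless `files` (slot name -> MIDI path) is exactly the 36-file set."""
    names = set(files)
    missing = EXPECTED_SLOTS - names
    extra = names - EXPECTED_SLOTS
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")


def check_catalogs_disjoint(catalogs: Mapping[str, Set[str]]) -> None:
    """Raise if any Style name appears in more than one mode's catalog."""
    owner: Dict[str, str] = {}
    for mode, styles in catalogs.items():
        for style in styles:
            if style in owner:
                raise ValueError(f"{style!r} is in both {owner[style]} and {mode}")
            owner[style] = mode
```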

Internally, OTTO uses a single expanded articulation layout for all drum MIDI. The general drum-map convention exists only transiently at the import boundary; a translator stage converts imported material into the expanded layout before anything reaches the library. After import, the engine retains no general-map data and never converts back.
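The one-way import boundary can be illustrated with a small translator. This sketch assumes the general drum map is the General MIDI percussion key map (36 = Bass Drum 1, 38 = Acoustic Snare, 42 = Closed Hi-Hat, 46 = Open Hi-Hat); the expanded articulation identifiers such as `kick.center`, and the choice to report unmapped keys rather than guess, are hypothetical. No reverse table is defined, matching the rule that the engine never converts back.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

# General MIDI percussion keys -> hypothetical expanded-layout articulation ids.
GM_TO_EXPANDED = {
    36: "kick.center",     # Bass Drum 1
    38: "snare.center",    # Acoustic Snare
    42: "hat.closed.tip",  # Closed Hi-Hat
    46: "hat.open",        # Open Hi-Hat
}


@dataclass(frozen=True)
class ImportedNote:
    tick: int
    key: int       # key number in the general drum map
    velocity: int


@dataclass(frozen=True)
class ExpandedNote:
    tick: int
    articulation: str  # expanded-layout articulation id
    velocity: int


def translate(notes: Iterable[ImportedNote]) -> Tuple[List[ExpandedNote], Set[int]]:
    """Import-time, one-way translation. Returns (translated notes, unmapped keys)."""
    out: List[ExpandedNote] = []
    unmapped: Set[int] = set()
    for n in notes:
        articulation = GM_TO_EXPANDED.get(n.key)
        if articulation is None:
            unmapped.add(n.key)  # reported for review, never guessed
        else:
            out.append(ExpandedNote(n.tick, articulation, n.velocity))
    return out, unmapped
```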

Sound Generation

Each player runs two audio paths in parallel: a sampler path that plays sample-mapping files referencing audio on disk, and a synth path that procedurally generates voices. A per-note mask decides, per articulation, whether each note dispatches to the sampler or the synth, so a kit may send its kick to the sampler and its hat to the synth at the same time. Drum kits are built from a large pool of single-element primitive definitions that an offline tool composes into complete kits at content-build time; the runtime treats a composed kit as one cohesive unit.
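One plausible representation of the per-note mask is an integer bitmask indexed by articulation, sketched below in Python. The bit layout, the articulation ids, and the names `make_mask`, `route`, and `split_notes` are assumptions for illustration; in a compiled engine the per-note decision would reduce to a single bit test.

```python
from enum import Enum
from typing import Iterable, List, Mapping, Tuple


class Path(Enum):
    SAMPLER = "sampler"
    SYNTH = "synth"


def make_mask(synth_articulations: Iterable[str], articulation_ids: Mapping[str, int]) -> int:
    """Bit i set means articulation i is voiced by the synth path."""
    mask = 0
    for name in synth_articulations:
        mask |= 1 << articulation_ids[name]
    return mask


def route(articulation_id: int, mask: int) -> Path:
    """Per-note decision: test one bit of the mask."""
    return Path.SYNTH if (mask >> articulation_id) & 1 else Path.SAMPLER


def split_notes(articulation_ids: Iterable[int], mask: int) -> Tuple[List[int], List[int]]:
    """Partition one block's notes into (sampler notes, synth notes)."""
    sampler: List[int] = []
    synth: List[int] = []
    for a in articulation_ids:
        (synth if route(a, mask) is Path.SYNTH else sampler).append(a)
    return sampler, synth
```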

Energy and Feel

Energy levels are a per-pill performance modifier that changes how a pattern is performed without altering the pattern file. Seven named levels — Asleep, Chilled, Relaxed, Normal, Alert, Excited, Energetic — each carry per-element velocity scaling, per-element timing offset, articulation choices, and a tempo-offset percentage. Crucially, this is dispatch-time transformation of events that already exist; no new MIDI is invented at play time. The seven slot definitions are remembered per song, so editing what "Excited" means in one song does not change it in another.
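The dispatch-time transformation can be sketched as a pure function from one existing event to one modified event. This Python sketch assumes millisecond timing offsets, velocity clamped to the MIDI range 1 to 127, and articulation choices expressed as a substitution map; `NoteEvent`, `EnergySlot`, `apply_energy`, and `default_slots` are hypothetical names. The tempo-offset field is carried on the slot but applied at the transport level, not per note. `default_slots` returns a fresh table per song, so editing one song's "Excited" slot leaves every other song untouched.

```python
from dataclasses import dataclass, field, replace
from typing import Dict

ENERGY_LEVELS = ("Asleep", "Chilled", "Relaxed", "Normal", "Alert", "Excited", "Energetic")


@dataclass(frozen=True)
class NoteEvent:
    sample_time: int   # scheduled position, in samples
    element: str       # e.g. "kick", "hat"
    articulation: str
    velocity: int      # 1..127


@dataclass
class EnergySlot:
    name: str
    velocity_scale: Dict[str, float] = field(default_factory=dict)    # per element
    timing_offset_ms: Dict[str, float] = field(default_factory=dict)  # per element
    articulation_swap: Dict[str, str] = field(default_factory=dict)   # old -> new
    tempo_offset_pct: float = 0.0  # consumed by the transport, not by apply_energy


def default_slots() -> Dict[str, EnergySlot]:
    """A fresh, independent set of seven slots for one song."""
    return {name: EnergySlot(name) for name in ENERGY_LEVELS}


def apply_energy(ev: NoteEvent, slot: EnergySlot, sample_rate: int) -> NoteEvent:
    """Transform an existing event at dispatch time; never creates new events."""
    scale = slot.velocity_scale.get(ev.element, 1.0)
    velocity = max(1, min(127, round(ev.velocity * scale)))
    offset = round(slot.timing_offset_ms.get(ev.element, 0.0) * sample_rate / 1000)
    articulation = slot.articulation_swap.get(ev.articulation, ev.articulation)
    return replace(ev, sample_time=ev.sample_time + offset,
                   velocity=velocity, articulation=articulation)
```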

The Mixer and Output Routing

There is exactly one shared mixer, with channels allocated per mode rather than per player: two players in the same mode sum into the same element channels. The mixer resolves into a fixed table of stereo slots (banks of per-element channels for each mode, a set of effect returns, and per-player sub-buses) that sum into a master bus carrying a transparent limiter, which is on by default. The mixer is a steady-state signal graph whose topology is fixed at session load; at runtime the engine changes parameter values, never routing topology. An output router then maps mixer outputs to physical output pairs using one of three strategies (a stereo sum, a per-player split, or a per-element split) without ever altering the underlying mixer table.
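The fixed slot table and the output router can be sketched as below. The element names per mode, the number of effect returns, the slot naming scheme, and the exact set of slots each routing strategy sends out are all assumptions; the sketch only demonstrates that channels are keyed by mode, that routing produces a separate slot-to-output-pair map, and that the table itself is never modified.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

MODES = ("Drums", "Percs", "Shakers", "Hands")


@dataclass(frozen=True)
class SlotTable:
    slots: Tuple[str, ...]  # stereo slot names, fixed at session load
    index: Dict[str, int]   # slot name -> slot index


def build_slot_table(elements: Dict[str, List[str]], fx_returns: int, players: int = 4) -> SlotTable:
    """Lay out per-mode element banks, effect returns, player sub-buses, then master."""
    names: List[str] = []
    for mode in MODES:
        names += [f"{mode}/{el}" for el in elements[mode]]
    names += [f"FX{n}" for n in range(1, fx_returns + 1)]
    names += [f"Player{p}" for p in range(1, players + 1)]
    names.append("Master")
    return SlotTable(tuple(names), {name: i for i, name in enumerate(names)})


def element_slot(table: SlotTable, mode: str, element: str) -> int:
    """Keyed by mode, not player: two Drums players both feed 'Drums/kick'."""
    return table.index[f"{mode}/{element}"]


def route_outputs(table: SlotTable, strategy: str) -> Dict[int, int]:
    """Return slot index -> physical output pair; the table itself is only read."""
    if strategy == "stereo":
        return {table.index["Master"]: 0}
    if strategy == "per_player":
        picked = [i for i, n in enumerate(table.slots) if n.startswith("Player")]
    elif strategy == "per_element":
        picked = [i for i, n in enumerate(table.slots) if "/" in n]
    else:
        raise ValueError(f"unknown strategy {strategy!r}")
    return {slot: pair for pair, slot in enumerate(picked)}
```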

Threading and Real-Time Discipline

OTTO's concurrency is a contract among three kinds of thread that meet at well-defined seams. The audio thread fills the output buffer every block and may never allocate, lock, log, or perform file I/O. The message thread handles user gestures and orchestration. Worker threads handle disk I/O, parsing, import-time generation, and long compute. State crosses between threads only through three named mechanisms: an atomic snapshot for single values, a single-producer/single-consumer queue for event-shaped state, and a double-buffered configuration swap for broad changes such as a new kit or a re-routed mixer. When a real-time problem appears, it is fixed at the root rather than masked by a downstream workaround.
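Of the three mechanisms, the single-producer/single-consumer queue benefits most from a concrete picture, so here is a logic sketch of a bounded ring buffer. It is written in Python only for readability: Python itself cannot honor the audio-thread rules, and a real implementation would hold the head and tail indices in atomics with release stores and acquire loads. The class name `SpscRing` and the power-of-two capacity are assumptions.

```python
class SpscRing:
    """Fixed-capacity single-producer/single-consumer ring buffer (logic sketch).

    Storage is preallocated, so push and pop never grow the buffer. Only the
    producer writes `_tail`; only the consumer writes `_head`. A compiled
    version would make both indices atomics (release store, acquire load).
    """

    def __init__(self, capacity: int):
        if capacity < 1 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._buf = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to read (consumer-owned)
        self._tail = 0  # next slot to write (producer-owned)

    def push(self, item) -> bool:
        """Producer side. Returns False instead of blocking when full."""
        if self._tail - self._head == len(self._buf):
            return False
        self._buf[self._tail & self._mask] = item
        self._tail += 1  # publish only after the slot is written
        return True

    def pop(self):
        """Consumer side (e.g. the audio thread). Returns None when empty; never blocks."""
        if self._head == self._tail:
            return None
        item = self._buf[self._head & self._mask]
        self._head += 1  # release the slot back to the producer
        return item
```

Returning `False` on a full queue instead of waiting keeps the producer from ever stalling the consumer; in this sketch, what to do with a rejected event is left to the producer.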

Timing Philosophy

Timing accuracy outranks cosmetic smoothness: a dropped UI frame is acceptable, a late transport event is not. Under externally-driven tempo, OTTO never silently overrides the external authority to smooth a discontinuity — an audible glitch is preferable to inaudible drift the listener cannot diagnose. Per-event dispatch is sample-accurate; configuration changes are block-accurate, taking effect at the next block boundary.
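The split between sample-accurate events and block-accurate configuration can be illustrated with a single render step. In this Python sketch (the names `Event`, `BlockRenderer`, `request_config`, and `render_block` are hypothetical), a staged configuration is adopted only at the start of the next block, while each event is dispatched at its exact sample offset inside the block. The staging here is single-threaded; a real engine would hand the configuration over with an atomic pointer exchange or a double-buffered swap.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    sample_time: int  # absolute transport position, in samples
    name: str


class BlockRenderer:
    def __init__(self, config: Any):
        self.active_config = config
        self._pending = None

    def request_config(self, config: Any) -> None:
        # Message-thread side: stage a change; it is never applied mid-block.
        self._pending = config

    def render_block(self, block_start: int, block_size: int,
                     events: Iterable[Event]) -> List[Tuple[int, str, Any]]:
        # Block boundary: adopt any staged configuration before rendering.
        if self._pending is not None:
            self.active_config, self._pending = self._pending, None
        block_end = block_start + block_size
        dispatched = []
        for ev in events:
            if block_start <= ev.sample_time < block_end:
                # Offset within the buffer, so the voice starts on the exact sample.
                dispatched.append((ev.sample_time - block_start, ev.name, self.active_config))
        return dispatched
```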

Tempo and Sync

OTTO recognizes several tempo sources: internal (OTTO owns transport), host-driven (the plugin host owns it), incoming or outgoing clock synchronization, and a peer-to-peer network tempo protocol. The user's chosen source is the authority, and the engine surfaces the consequence in the UI rather than overriding it. A global tempo, modulated by the lead player's energy offset, becomes the playback rate when OTTO owns transport; under an external authority, energy still shapes velocity and feel but not the playback rate.
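The authority rule reduces to a small decision, sketched here in Python. The enum members and the function name are hypothetical, and treating outgoing clock as a case where OTTO still owns transport is an assumption. When an external source is selected but reports no tempo, the sketch raises an error for the UI to surface instead of silently falling back to the internal tempo.

```python
from enum import Enum
from typing import Optional


class TempoSource(Enum):
    INTERNAL = "internal"    # OTTO owns transport
    HOST = "host"            # plugin host owns transport
    CLOCK_IN = "clock_in"    # incoming clock synchronization
    CLOCK_OUT = "clock_out"  # outgoing clock (assumed: OTTO still owns transport)
    NETWORK = "network"      # peer-to-peer network tempo protocol


OTTO_OWNED = {TempoSource.INTERNAL, TempoSource.CLOCK_OUT}


def playback_bpm(source: TempoSource, global_bpm: float, lead_energy_offset_pct: float,
                 external_bpm: Optional[float] = None) -> float:
    if source in OTTO_OWNED:
        # The lead player's energy offset modulates the global tempo.
        return global_bpm * (100 + lead_energy_offset_pct) / 100
    if external_bpm is None:
        # Surface the problem rather than overriding the chosen authority.
        raise ValueError(f"{source.value} selected but no external tempo reported")
    # External authority wins unchanged; energy still shapes velocity and feel elsewhere.
    return external_bpm
```

At a global 120 BPM with a +5% lead-player offset, internal transport plays at 126 BPM, while a host reporting 98 BPM plays at exactly 98 BPM regardless of energy.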

Commercial Model

The full content bundle ships complete on every install regardless of tier; purchasing unlocks access to content already present rather than downloading new data. Locked content is refused at the picker and at load time — the audio thread never encounters it — so entitlement is invisible to playback. The interface keeps the same shape across tiers: locked controls render visibly disabled in place rather than hidden, and an upgrade prompt appears only when the user reaches for one. OTTO commits to no coercive upsell, no interruption of musical work, and no telemetry without consent.
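Both gates can be sketched in a few lines of Python. `PickerEntry`, `EntitlementError`, `picker_entries`, and `load_content` are hypothetical names, and the `loader` callback stands in for whatever worker-side loading path exists. Locked items remain in the picker, disabled in place, and the load gate refuses them before anything is handed to the audio thread.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Set


@dataclass(frozen=True)
class PickerEntry:
    name: str
    enabled: bool  # locked items stay listed, rendered disabled


class EntitlementError(Exception):
    pass


def picker_entries(installed: Sequence[str], owned: Set[str]) -> List[PickerEntry]:
    """Every installed item appears, in the same order; locked ones are disabled in place."""
    return [PickerEntry(name, name in owned) for name in installed]


def load_content(name: str, installed: Sequence[str], owned: Set[str],
                 loader: Callable[[str], Any]) -> Any:
    """Load-time gate: locked content is refused before it can reach playback."""
    if name not in installed:
        raise KeyError(name)
    if name not in owned:
        raise EntitlementError(f"{name} is not unlocked")
    return loader(name)
```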
