Product Vision & Architecture — Assembly Internal

1. Product Overview

1.1 The Paradigm Shift

The software industry has operated under a deeply flawed assumption for decades: that software is the primary artifact of product development. Business requirements are translated into technical specifications, handed to engineers, built in isolation, and handed back to business owners, who often find that the result does not match their original vision.

Assembly challenges this at the root. Software becomes invisible infrastructure. The product — the outcome, the user experience, the business value — is all that matters.

The Core Idea

A group of humans — business owners, PMs, designers, customers — enter a live Assembly session and simply talk about what they want to build. AI agents listen, probe, orchestrate, and construct the product in real time. The software writes itself. No IDE, no git commands, no deployment scripts visible to any non-technical participant.

1.2 Vision

A world where anyone with a business idea can bring a working product to life through natural conversation — where the distance between imagination and a live, deployed application collapses to a single Assembly session.

1.3 Mission

To eliminate the gap between business imagination and working software by creating the world's first AI-powered collaborative Assembly platform — where multi-human, multi-agent live sessions replace the entire traditional software development lifecycle.

1.4 Key Design Principles

Product First: The product outcome is the only artifact anyone should see or care about. Software is a black box.
Voice First: Natural speech is the primary input mode. Typing is optional; clicking is delegated to agents.
Session as Lifecycle: One evergreen session link is the entire life of a product — from ideation through production monitoring.
Timeline Not Versioning: Instead of branches and merges, Assembly uses an immutable time-based state log. Any state can be restored by navigating to a point in time.
Web Native: Assembly targets web applications (JavaScript/HTML DOM) exclusively in v1. Mobile, game engines, and embedded hardware are out of scope.
Leased Intelligence: Assembly does not train models. It maximizes value from frontier model subscriptions (Claude, GPT, Gemini) through expert agentic orchestration — which is the core IP.
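
The "Timeline Not Versioning" principle can be illustrated with a minimal sketch of an append-only log that supports point-in-time restore. The names `Timeline`, `TimelineEvent`, and `state_at`, and the choice to store a full snapshot per event, are assumptions made for illustration and are not a description of Assembly's actual implementation.

```python
from bisect import bisect_right
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class TimelineEvent:
    timestamp: float          # seconds since epoch (assumed unit)
    state: Mapping[str, Any]  # read-only snapshot of the product state


class Timeline:
    """Append-only log: events are added in time order and never edited."""

    def __init__(self) -> None:
        self._events: list[TimelineEvent] = []

    def append(self, timestamp: float, state: dict[str, Any]) -> None:
        if self._events and timestamp < self._events[-1].timestamp:
            raise ValueError("events must be appended in chronological order")
        # Shallow-copy and wrap the snapshot so callers cannot rewrite history
        # through the top-level mapping.
        self._events.append(TimelineEvent(timestamp, MappingProxyType(dict(state))))

    def state_at(self, timestamp: float) -> Optional[Mapping[str, Any]]:
        """Return the latest snapshot recorded at or before `timestamp`."""
        keys = [event.timestamp for event in self._events]
        index = bisect_right(keys, timestamp)
        return self._events[index - 1].state if index else None


timeline = Timeline()
timeline.append(100.0, {"button_color": "green"})
timeline.append(200.0, {"button_color": "blue"})
restored = timeline.state_at(150.0)  # the state as it was at t=150
```

Restoring is a read against the log, so nothing is overwritten; a production design would more likely store deltas than full snapshots.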

2. Critical User Journeys (CUJs)

The following seven journeys represent the complete lifecycle of a product built on Assembly. Each CUJ maps to one or more technical modules and product phases.

CUJ-1: Initiating a Product Assembly Session

Who: Business Owner, Product Manager, invited collaborators. Trigger: a business owner wants to build a new product or feature.

User creates a new Assembly session — receives an evergreen session link
User assigns participant roles and sends invitations
AI agents auto-join based on session context (PM Agent, UX Agent, Tech Agent active by default)
Session begins — Phase Barometer shows: Ideation

Success: all participants in session, agents listening, Phase Barometer visible.

CUJ-2: AI-Guided Ideation & Requirements Capture

Humans converse freely about their product idea. AI agents listen and extract structured product requirements in real time, speaking up only when a probing question is needed.

PM Agent: extracts product type, critical user journeys, pain points, feature priorities, success metrics, and risks
UX Agent: identifies user flows, accessibility needs, friction points, and delight opportunities
Tech Agent: notes infrastructure needs, integration risks, and module boundaries
Turn-Taking Engine: when conversation is one-sided (e.g., only UX discussed), the relevant agent interjects with a probing question
Phase Barometer: updates in real time showing readiness percentage toward prototype threshold
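
One way the Turn-Taking Engine might detect a one-sided conversation is sketched below. The keyword lists, the `min_share` value, and the function names are hypothetical; a real system would more plausibly classify utterances with a model than with keyword matching.

```python
from collections import Counter

# Hypothetical keyword lists per agent domain (illustrative only).
DOMAIN_KEYWORDS = {
    "pm": {"metric", "priority", "customer", "revenue", "risk"},
    "ux": {"flow", "screen", "button", "onboarding", "accessibility"},
    "tech": {"database", "api", "integration", "latency", "hosting"},
}


def domain_coverage(utterances: list[str]) -> Counter:
    """Count how many utterances touch each domain."""
    counts = Counter({domain: 0 for domain in DOMAIN_KEYWORDS})
    for text in utterances:
        words = set(text.lower().split())
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if words & keywords:
                counts[domain] += 1
    return counts


def agents_to_probe(utterances: list[str], min_share: float = 0.15) -> list[str]:
    """Return domains whose share of matched utterances is below `min_share`."""
    counts = domain_coverage(utterances)
    total = sum(counts.values())
    if total == 0:
        return []  # nothing classified yet, so it is too early to interject
    return sorted(d for d, n in counts.items() if n / total < min_share)


recent = [
    "The onboarding flow needs fewer steps",
    "Make the signup button bigger",
    "That screen feels cluttered",
    "Accessibility matters for our users",
]
under_covered = agents_to_probe(recent)
```

Here every utterance is about UX, so the PM and Tech domains are reported as under-covered and their agents become candidates to ask a probing question.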

Key Differentiator

Unlike any existing tool, multiple AI agents participate simultaneously as distinct expert personas — each with a different lens on the product — coordinated by the Agentic Orchestration Engine without interrupting each other or talking over humans.

CUJ-3: Live Prototype Generation

When the Phase Barometer crosses the prototype-ready threshold, the product materializes in front of the assembly while the conversation is still happening.

Agents synthesize all captured requirements into a structured product spec
Code Generation and UI/UX Generation models generate the web application code
Prototype Renderer embeds a live browser instance within the Assembly session
Participants see a working, interactive prototype rendered in real time — not a mockup
App Agent (Playwright) demonstrates the prototype by navigating it on screen
Phase Barometer advances to: Prototype

CUJ-4: Collaborative Feedback Loop

Participants react to the prototype verbally. The session becomes a feedback-driven iteration loop.

Participant says: "This button should be blue, not green" — UI Agent applies the change
Participant shares their screen highlighting a section — Screen Annotator captures visual context
Multiple participants provide concurrent feedback — Turn-Taking Engine manages the floor
App Agent clicks through the app demonstrating changes on screen
Timeline Manager records each state as an immutable checkpoint
Phase Barometer can advance to Enhancement or retreat to Feedback

CUJ-5: Milestone Progression — Prototype to Deployment

When the assembly is satisfied, the product advances through deployment phases. Environment Agent personas activate.

Environment Persona	Interaction Focus
Dev Agent	New feature requests, bug investigation, code changes in development environment
UAT Agent	Quality testing scenarios, user acceptance criteria, staging validation, edge cases
Prod Agent	Daily active users, geographic analytics, error rates, support tickets, SLA monitoring

Participants naturally direct their conversation to the appropriate environment agent. The system routes queries automatically based on intent — no participant needs to know which agent to address.
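
Intent-based routing to an environment agent could look roughly like the sketch below. The phrase lists, the `route_to_environment` name, and the fallback to the Dev Agent are assumptions for illustration; the document does not specify how intent is detected, and a production router would likely rely on a model rather than phrase matching.

```python
# Hypothetical trigger phrases per environment agent (illustrative only).
ENVIRONMENT_HINTS = {
    "dev": ("feature", "bug", "fix", "implement", "change"),
    "uat": ("test", "acceptance", "staging", "edge case", "validate"),
    "prod": ("active users", "error rate", "ticket", "sla", "outage"),
}


def route_to_environment(utterance: str, default: str = "dev") -> str:
    """Pick the environment whose phrases appear most often in the utterance."""
    text = utterance.lower()
    scores = {
        env: sum(phrase in text for phrase in phrases)
        for env, phrases in ENVIRONMENT_HINTS.items()
    }
    best = max(scores, key=scores.get)
    # Fall back to the default agent when no phrase matches (an assumption).
    return best if scores[best] > 0 else default


metrics_target = route_to_environment("How many daily active users did we have last week?")
testing_target = route_to_environment("Can we validate the checkout edge case on staging?")
bugfix_target = route_to_environment("There is a bug in the signup form")
```

The participant never names an agent; the router infers the target from what was said.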

CUJ-6: Ongoing Product Lifecycle via Evergreen Session

The session link is permanent. Any authorized participant can join at any time throughout the product's entire life.

Product Manager joins to check metrics — Prod Agent surfaces DAU, errors, ticket trends
Business owner returns after a month — sees current live state, reviews Timeline for what changed
Security vulnerability detected — Assembly session drives the patch through Dev → UAT → Prod
Year-end enhancement planning — new features discussed and built in the same session
Retirement of legacy features — managed through the same interface that created them

The Evergreen Promise

The Assembly session link is not a meeting link. It is the product itself. Every state the product has ever been in is accessible through the Timeline. The product's entire life — from first idea to decommission — lives in one place.

CUJ-7: Asynchronous Stakeholder Catch-Up

Stakeholders who were absent from a session can rejoin at any time and see the current state. No manual catch-up documents, no review meetings.

Stakeholder opens evergreen link — sees current product state and Phase Barometer
Reviews Timeline: "what changed since I was last here" — agents summarize delta
Provides new direction, raises concerns, or approves current state verbally
Their contributions are captured and attributed in the Timeline
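
The "what changed since I was last here" step can be sketched as a filter over attributed Timeline entries. The `Change` record and `changes_since` helper are hypothetical names; in practice an agent would turn the resulting groups into a spoken or written summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    timestamp: float  # when the change was recorded
    author: str       # participant or agent the change is attributed to
    summary: str      # short human-readable description


def changes_since(events: list[Change], last_seen: float) -> dict[str, list[str]]:
    """Group the summaries of changes made after `last_seen` by author."""
    grouped: dict[str, list[str]] = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > last_seen:
            grouped.setdefault(event.author, []).append(event.summary)
    return grouped


events = [
    Change(100.0, "Dana", "Created booking page"),
    Change(200.0, "Lee", "Changed button to blue"),
    Change(300.0, "Dana", "Added pricing table"),
]
delta = changes_since(events, last_seen=150.0)
```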

3. Solution Architecture

The Assembly platform is organized into six architectural layers. The Agentic Orchestration Engine is the core IP. All other layers either feed into it or execute its instructions.

3.1 Architecture Overview

LAYER 1 · Participants: Business Owner, PM, Designer, Stakeholder, Beta Customer

LAYER 2 · Session Engine: Streaming, voice, screen share, IAM, speaker diarization

★ LAYER 3 · Agentic Orchestration Engine (Core IP): Context Router, Phase Barometer, Timeline Manager, Turn-Taking, Notifications

LAYER 4 · AI Agent Personas: PM, UX, UI, Tech Architect, Dev, UAT, Prod agents

LAYER 5 · Frontier Models + Prototype Engine: Code Gen, UI/UX Gen, Voice Model, Prototype Renderer, App Agent (Playwright)

LAYER 6 · Data Foundation: Product Data Model, Session Store, State Snapshots, Artifact Store, Analytics

3.2 Data Flow

Voice and screen input from participants flows into the Session Engine, which converts it to structured context (speaker-attributed text with role metadata). The Agentic Orchestration Engine receives this context stream and routes it to the appropriate agent personas. Agents process their domain slice, collaborate with peer agents, and instruct the Frontier Model layer to generate or modify code. The Prototype Renderer reflects changes in the live browser. All state changes are persisted to the Data Foundation with immutable Timeline snapshots.

The loop is continuous: agent outputs return to the session (via voice/text response or prototype update), participants react, new context flows in. No step is manual. No deployment script is visible.
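
The "structured context" that the Session Engine hands to the Orchestration Engine might resemble the record below. The field names, the `ContextEvent` type, and the shape of the incoming transcript segment are assumptions for illustration, not a documented schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ContextEvent:
    session_id: str
    speaker: str
    role: str        # role assigned to the speaker in the session
    modality: str    # "voice", "screen", or "chat"
    text: str
    timestamp: float # seconds from session start (assumed unit)


def from_transcript_segment(
    session_id: str, segment: dict, roles: dict[str, str]
) -> ContextEvent:
    """Attach role metadata to one speaker-attributed transcript segment."""
    speaker = segment["speaker"]
    return ContextEvent(
        session_id=session_id,
        speaker=speaker,
        role=roles.get(speaker, "unknown"),
        modality="voice",
        text=segment["text"].strip(),
        timestamp=float(segment["start"]),
    )


roles = {"Dana": "business_owner", "Lee": "designer"}
segment = {"speaker": "Dana", "text": " We need a booking page. ", "start": 12.5}
event = from_transcript_segment("session-001", segment, roles)
payload = json.dumps(asdict(event))  # serialized form for the context stream
```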

3.3 The Agentic Orchestration Engine (Core IP)

This is what Assembly will patent, protect, and continuously improve. It is not a chatbot or a copilot. It is a multi-agent coordination system with the following responsibilities:

Context Routing: parses incoming context (voice, annotated screen, chat) and determines which agent(s) should process it and in what order
Phase Management: maintains the readiness model across 6 phases and triggers transitions automatically when threshold conditions are met
Turn-Taking: manages the conversation floor — prevents agents from interrupting humans, arbitrates between agents, enables probing questions at the right moment
Agent Orchestration: dispatches tasks to agents, collects their outputs, merges conflicting inputs (e.g., two participants disagree), and produces a unified instruction to the model layer
Timeline Custody: writes every state transition as an immutable event to the Timeline — the single source of truth for the entire product history
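
The turn-taking responsibility above can be sketched as a small floor arbiter: agents queue their requests, no agent is granted the floor while a human is speaking, and competing agents are ordered by priority. The `FloorArbiter` class and its integer priorities are hypothetical and only illustrate the idea.

```python
import heapq
import itertools
from typing import Optional


class FloorArbiter:
    """Grants the floor to one agent at a time, and never over a human."""

    def __init__(self) -> None:
        self._human_speaking = False
        self._queue: list[tuple[int, int, str]] = []
        self._order = itertools.count()  # tie-breaker: first come, first served

    def set_human_speaking(self, speaking: bool) -> None:
        self._human_speaking = speaking

    def request_floor(self, agent: str, priority: int) -> None:
        # heapq is a min-heap, so negate priority to pop the highest first.
        heapq.heappush(self._queue, (-priority, next(self._order), agent))

    def next_speaker(self) -> Optional[str]:
        """Return the agent allowed to speak now, or None if nobody may."""
        if self._human_speaking or not self._queue:
            return None
        _, _, agent = heapq.heappop(self._queue)
        return agent


arbiter = FloorArbiter()
arbiter.request_floor("ux_agent", priority=1)
arbiter.request_floor("pm_agent", priority=5)
arbiter.set_human_speaking(True)
blocked = arbiter.next_speaker()   # a human has the floor
arbiter.set_human_speaking(False)
first = arbiter.next_speaker()
second = arbiter.next_speaker()
```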

4. Technical Modules

The following 15 modules constitute the complete technical implementation of Assembly. Each module is annotated with the development milestone in which it is first introduced.

#	Module	Description & Responsibility	Milestone
M1	Assembly Session Engine	WebRTC/WebSocket-based real-time session management. Multi-participant audio/video/chat streams, session lifecycle, evergreen link infrastructure.	M-1
M2	Multi-modal Input Processor	Captures and normalizes voice (Whisper/Deepgram), screen share, and text chat into a unified, structured context object per utterance or event.	M-1
M3	Speaker Diarizer & Role Mapper	Identifies who is speaking via voice fingerprinting and maps each speaker to their assigned role. Enables agents to weight input by persona.	M-2
M4	Agentic Orchestration Engine	Core IP. Context routing, phase management, turn-taking arbitration, agent dispatching and output merging, Timeline custody. The central nervous system.	M-2
M5	Phase Barometer & Transition Mgr	Computes readiness score across 6 product phases using a weighted signal model. Triggers environment agent activation at deployment threshold.	M-2
M6	AI Agent Persona Framework	Plugin-style framework for defining, instantiating, and managing AI agent personas — domain scope, prompt strategy, probing question library, output schema.	M-2
M7	Environment Agent Personas	Dev Agent (features/bugs), UAT Agent (quality/testing), Prod Agent (metrics/incidents). Context routing directs queries to the right environment automatically.	M-3
M8	Code Generation Engine	Interfaces with frontier code models (Claude API, OpenAI) to generate and iteratively refine web application code from structured specs.	M-2
M9	UI/UX Generation Engine	Interfaces with design-capable frontier models to generate UI components and design-system-compliant output. Web (DOM) only.	M-2
M10	Voice Model Integration	STT (Whisper/Deepgram) for real-time transcription and intent extraction; TTS (ElevenLabs/OpenAI) for agent voice responses.	M-1
M11	Prototype Renderer & App Agent	Embeds a live browser in the session (Playwright/cloud browser). App Agent navigates the prototype on behalf of participants, streaming to all members.	M-2
M12	Product Development Data Model	Core entity graph: Product, Session, Phase, Feature, UserJourney, Component, AgentDecision, TimelineEvent, Participant. The schema everything reads and writes.	M-1
M13	Timeline Manager	Append-only event log of every state transition. Point-in-time restore ("show me February 2025"). Single timeline, immutable history.	M-2
M14	IAM & Session Control	Role-based access control for participants and agents. Evergreen link permissions, invitations, agent configuration per session.	M-1
M15	Analytics & Observability	Production metrics ingestion (DAU, errors, geo, tickets) for the Prod Agent; platform observability (agent latency, model costs, session health).	M-3
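
The "weighted signal model" in M5 is not specified further in this document. A minimal sketch, assuming each signal is a completeness value between 0 and 1 and that the weights and the prototype threshold shown here are placeholders, could be:

```python
# Hypothetical signal weights and threshold; real values would be tuned.
SIGNAL_WEIGHTS = {
    "user_journeys": 0.4,
    "feature_priorities": 0.3,
    "success_metrics": 0.2,
    "risks": 0.1,
}
PROTOTYPE_THRESHOLD = 0.7


def readiness(signals: dict[str, float]) -> float:
    """Weighted average of signal completeness, each clamped to [0, 1]."""
    total_weight = sum(SIGNAL_WEIGHTS.values())
    score = sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in SIGNAL_WEIGHTS.items()
    )
    return score / total_weight


def prototype_ready(signals: dict[str, float]) -> bool:
    return readiness(signals) >= PROTOTYPE_THRESHOLD


early = {"user_journeys": 1.0, "feature_priorities": 0.5, "success_metrics": 0.5}
later = {**early, "risks": 1.0}
early_score = readiness(early)  # 0.4 + 0.15 + 0.1 = 0.65 of a total weight of 1.0
later_score = readiness(later)  # adds 0.1 for risks, giving 0.75
```

With these placeholder numbers the session stays in Ideation until risks have also been discussed, which is the kind of gating the Phase Barometer is meant to provide.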

4.1 Screen Annotation Processor (Supporting Module)

When a participant shares their screen, this sub-module captures the stream, processes highlighted or pointed-at regions using computer vision, and generates structured visual context (e.g., "region: top navigation bar, instruction: change background to navy"). This feeds the UI Agent without requiring participants to type descriptions.
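
As an illustration of the last step only, the sketch below assumes an upstream vision stage has already produced a highlight rectangle and that the bounding boxes of known UI components are available; it then names the region with the largest overlap. `Box`, `resolve_region`, and `visual_context` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def overlap_area(self, other: "Box") -> float:
        """Area of the intersection with `other`, or 0.0 if they are disjoint."""
        width = min(self.right, other.right) - max(self.left, other.left)
        height = min(self.bottom, other.bottom) - max(self.top, other.top)
        return max(0.0, width) * max(0.0, height)


def resolve_region(highlight: Box, components: dict[str, Box]) -> Optional[str]:
    """Return the component that overlaps the highlight the most, if any."""
    best_name, best_area = None, 0.0
    for name, box in components.items():
        area = highlight.overlap_area(box)
        if area > best_area:
            best_name, best_area = name, area
    return best_name


def visual_context(highlight: Box, components: dict[str, Box], instruction: str) -> dict:
    return {"region": resolve_region(highlight, components), "instruction": instruction}


components = {
    "top navigation bar": Box(0, 0, 1280, 80),
    "hero section": Box(0, 80, 1280, 600),
}
context = visual_context(Box(100, 10, 900, 70), components, "change background to navy")
```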

4.2 Notification & Reconvene Coordinator (Supporting Module)

When the Agentic Engine completes a significant background task (prototype generated, vulnerability patched, UAT completed), it triggers the Notification Coordinator. This module sends calendar invites to the assembly group, emails, and in-app notifications — automatically scheduling the next review session so no human has to manage the iteration cycle.

5. Iterative Milestones (Crawl → Walk → Run → Fly)

The product will be built iteratively. No work is throwaway: each milestone is a foundation for the next. The goal is a delightful, demo-ready product for the first 5 beta customers (VC-adjacent) by Milestone 4.

M0 · CRAWL · Weeks 1–4

Manual Proof of Concept

No custom infrastructure. Humans meet on Google Meet, record the call, export the transcript, and manually upload it to a frontier model with a structured product-extraction prompt. The model generates the product type, CUJs, modules, and an initial HTML/JS prototype. Success: at least one prototype generated from a real conversation in under 2 hours.

M1 · EARLY WALK · Months 1–2

Live Listener Plugin

A Read.ai-equivalent meeting plugin listens live and auto-triggers the prototype pipeline at meeting end; the generated prototype deploys to a URL and a calendar invite is auto-sent. Modules: M1 (partial), M2, M10, M12 (schema), M14 (basic). Success: session → prototype → invite in under 15 minutes post-meeting.

M2 · WALK · Months 2–4

Interactive Prototype Session

Assembly's own session interface with a single orchestrator agent: listens, extracts, generates the prototype in real time. Phase Barometer UI visible; Timeline activated; App Agent demonstrates in-session. Modules: M1, M2, M4, M5, M8, M9, M11, M12, M13, M14. Success: 5 internal test sessions produce working prototypes without any human-written code.

M3 · RUN · Months 4–6

Multi-Agent Assembly

PM, UX, UI, and Tech Architect agents activate as distinct session participants. Speaker Diarizer maps voices to roles; Turn-Taking prevents agent pile-ons; agents collaborate behind the scenes. Modules: M3, M6 (full), M7 (partial), Screen Annotator. Success: a beta customer says "this is unbelievably different from anything I have seen."

M4 · FLY · Months 6–9

Environment Agents + Full Lifecycle

Dev / UAT / Prod personas fully active with an automated deployment pipeline. Prod Agent surfaces real production metrics; vulnerability management and enhancement planning run through the same session; Notification Coordinator fully operational. Modules: M7 (full), M15, Notification Coordinator. Success: the first customer's product goes live and is monitored entirely through Assembly.

6. Open Questions & Research Priorities

Open Question	Research Action
Multi-human to single/multi-agent voice management — who has the floor, how does the system arbitrate?	Research OpenAI Realtime API, Google Project Astra, production apps using multi-party voice AI. Identify underlying architecture.
Shared browser interaction — can multiple users interact with the same live DOM simultaneously?	Decided: App Agent mediates all clicks. Participants direct the agent by voice. Direct multi-user DOM interaction is not in scope for v1.
Agent-to-agent collaboration model — how do PM Agent and UX Agent coordinate without user-visible noise?	Design a structured inter-agent message protocol within the Orchestration Engine. Agent-to-agent communication is invisible to participants.
Model cost management — frontier model token costs at scale could be prohibitive.	Design smart context compression. Cache model outputs. Use tiered models (cheaper for simple tasks, frontier for complex generation).
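
The cost-management actions in the last row (caching plus tiered models) can be sketched as follows. The task names, tier labels, and the `call_model` callback are hypothetical stand-ins; no real provider API is used, and the fake model exists only so the example runs on its own.

```python
import hashlib
from typing import Callable

# Hypothetical mapping from task type to model tier.
TIER_BY_TASK = {
    "summarize": "small",
    "classify_intent": "small",
    "generate_code": "frontier",
}


class CachedModelRouter:
    """Routes each task to a model tier and reuses cached outputs."""

    def __init__(self, call_model: Callable[[str, str], str]) -> None:
        self._call_model = call_model       # (tier, prompt) -> output
        self._cache: dict[str, str] = {}
        self.calls = 0                      # number of uncached model calls

    def run(self, task: str, prompt: str) -> str:
        # Unknown tasks go to the most capable tier (an assumption).
        tier = TIER_BY_TASK.get(task, "frontier")
        key = hashlib.sha256(f"{tier}\n{prompt}".encode("utf-8")).hexdigest()
        if key not in self._cache:
            self._cache[key] = self._call_model(tier, prompt)
            self.calls += 1
        return self._cache[key]


def fake_model(tier: str, prompt: str) -> str:
    # Stand-in for a real model call, used only to make the sketch runnable.
    return f"[{tier}] {prompt}"


router = CachedModelRouter(fake_model)
first_answer = router.run("summarize", "Summarize today's session")
second_answer = router.run("summarize", "Summarize today's session")  # cache hit
code_answer = router.run("generate_code", "Build a booking page")
```

Exact-match caching only helps when prompts repeat verbatim, so it would complement context compression and not replace it.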