1. Studio Architecture

1.1 Physical Workstation (On-Prem Core)

  • Compute tiers

    • Studio Tower S (single GPU): RTX 4090/RTX 6000 Ada | 128 GB RAM | 16–28 core CPU | 4 TB NVMe | Whisper/ASR + vLLM serving | ideal for prototyping and small fine-tunes.

    • Studio Tower M (multi-GPU): 2× RTX 6000 Ada or 2× A6000 | 256 GB RAM | 8–12 TB NVMe scratch + 20 TB NAS | local vector DB | parallel RAG + Toolformer-style tool-use experiments.

    • Studio Rack E (enterprise): 4–8× H100/A100 | 512 GB–1 TB RAM | 50–200 TB NAS with RAID | on-prem model zoo (Llama-3.x, Mistral, Mixtral, Command-R+, Whisper large-v3) | Triton/TGI/vLLM.
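A rough way to match a model to one of the tiers above is to compare its weight memory (parameter count × bytes per parameter) against the tier's total GPU memory. The sketch below assumes vendor-spec VRAM figures (24 GB for RTX 4090, 48 GB for RTX 6000 Ada and A6000, 80 GB for H100 and for the larger A100 variant) and a hypothetical 20% headroom factor for KV cache and activations; the function names are illustrative, and real capacity depends on context length, batch size, and the serving engine.

```python
# Per-GPU memory in GB (vendor spec figures; A100 assumed to be the 80 GB variant).
GPU_VRAM_GB = {
    "RTX 4090": 24,
    "RTX 6000 Ada": 48,
    "A6000": 48,
    "A100": 80,
    "H100": 80,
}


def weights_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (2 bytes/param corresponds to FP16/BF16)."""
    return params_billion * bytes_per_param


def fits(params_billion: float, gpu: str, n_gpus: int = 1,
         bytes_per_param: float = 2.0, overhead: float = 1.2) -> bool:
    """Return True if weights plus the assumed headroom fit in pooled GPU memory.

    `overhead` is a hypothetical multiplier, not a measured value.
    """
    needed = weights_gb(params_billion, bytes_per_param) * overhead
    return needed <= GPU_VRAM_GB[gpu] * n_gpus


if __name__ == "__main__":
    print(fits(8, "RTX 4090"))                                     # 8B model, FP16, Tower S
    print(fits(70, "RTX 6000 Ada", n_gpus=2))                      # 70B model, FP16, Tower M
    print(fits(70, "RTX 6000 Ada", n_gpus=2, bytes_per_param=0.5)) # 70B model, 4-bit, Tower M
    print(fits(70, "H100", n_gpus=4))                              # 70B model, FP16, Rack E
```

Pooling memory across GPUs assumes the serving engine shards the model (for example, tensor parallelism), so treat the result as a first-pass filter rather than a guarantee.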

  • I/O & Multimodal Prompting Console

    • Text & Code: low-latency mechanical keyboards + macro pads (pre-mapped to agent commands).

    • Audio: studio interface (USB-C/Thunderbolt), XLR mic, preamp, soundboard (for voice prompting, “barge-in” interrupts, and real-time TTS monitoring).

    • Vision: 4K overhead camera for whiteboard/sketch capture; doc scanner.

    • Displays: 2–4 monitors for terminals & logs | LLM chat & eval dashboards | product previews | context pack viewer.

  • Networking & Storage

    • 10GbE switch, VLANs for dev/prod, jump host for remote vendor access.

    • ZFS/TrueNAS for high-throughput scratch; WORM (write once, read many) for master artifacts.

    • Secrets vault (e.g., HashiCorp Vault), HSM/TPM for signing master releases.
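Since master artifacts go to WORM storage and are signed, it can help to record a digest for each file before it is committed, so the copy can be verified later. The sketch below is one possible approach using only the Python standard library; the manifest layout and the function names (`sha256_file`, `build_manifest`, `verify`) are assumptions for illustration, and the actual signing step with an HSM/TPM is out of scope here.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB blocks so large checkpoints need not fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root, POSIX style) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose digest has changed."""
    return [
        rel
        for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_file(root / rel) != expected
    ]
```

The manifest itself would be the natural thing to sign, since a single signature then covers every file in the release.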

1.2 Software Stack (Studio OS)

  • Base OS: Linux (Ubuntu LTS) hardened; Windows dual-boot optional for design tools.

  • Containerization: Docker/Podman; Compose stacks per project; optional Kubernetes (k8s) on Studio Rack E.

  • Model Serving: vLLM, Text Generation Inference (TGI), Faster-Whisper; Triton Inference Server for multimodal models.

  • Orchestration: LangGraph-style DAGs, AutoGen-style conversational agents, CrewAI-style role agents.

  • RAG: pgvector/Postgres for on-prem; optional Milvus/Weaviate; document loaders (PDF, HTML, DOCX), chunkers, metadata taxonomy.

  • Eval & Telemetry:

    • Functional: unit prompts, tool-use traces, golden answers.

    • Grounding: source-citation checks, factuality (answer-to-source overlap).

    • Safety: jailbreak tests, PII leakage, brand guidelines.

    • Ops: latency, token cost, throughput, GPU utilization.

    • Tools: MLflow + Weights & Biases, Prometheus/Grafana, Sentry.
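The grounding check listed above ("answer-to-source overlap") can be approximated with a simple lexical measure: the fraction of distinct answer tokens that also appear in the retrieved sources. The sketch below is a minimal version under that assumption; the tokenizer, the function names, and the 0.8 threshold are hypothetical, and a production check would likely add stop-word handling or an entailment model.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_score(answer: str, sources: list[str]) -> float:
    """Fraction of distinct answer tokens found in at least one source."""
    answer_tokens = tokens(answer)
    if not answer_tokens:
        return 0.0
    source_tokens = set().union(*(tokens(s) for s in sources))
    return len(answer_tokens & source_tokens) / len(answer_tokens)


def is_grounded(answer: str, sources: list[str], threshold: float = 0.8) -> bool:
    """Flag an answer as grounded when overlap meets the (assumed) threshold."""
    return overlap_score(answer, sources) >= threshold
```

A low score does not prove the answer is wrong, only that its wording diverges from the sources, so it works best as a triage signal ahead of human or model-based review.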

  • Security: SSO/OIDC, RBAC, IP allowlists; egress controls; audit logs; DLP filters.

  • Dev Tooling: VS Code + remote containers; pre-built notebooks; API gateway for frontends.
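The RAG entry above mentions chunkers and a metadata taxonomy. As a rough illustration, the sketch below splits text into overlapping word windows and attaches metadata to each chunk before it would be embedded and stored (for example, in pgvector). The `Chunk` class, the metadata keys (`source`, `start_word`, `n_words`), and the default window sizes are all assumptions, not part of any specific library.

```python
from dataclasses import dataclass, field


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_text(text: str, source: str, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split text into windows of `size` words, each sharing `overlap` words
    with the previous window, and tag every chunk with provenance metadata."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks: list[Chunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + size]
        chunks.append(
            Chunk(
                text=" ".join(window),
                metadata={"source": source, "start_word": start, "n_words": len(window)},
            )
        )
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks
```

Word-based windows are the simplest option; token-based or heading-aware splitting may suit long structured documents better.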
