1. Studio Architecture
1.1 Physical Workstation (On-Prem Core)
Compute tiers
Studio Tower S (single GPU): RTX 4090/RTX 6000 Ada | 128 GB RAM | 16–28 core CPU | 4 TB NVMe | Whisper/ASR + vLLM serving | suited to prototyping and small fine-tunes.
Studio Tower M (multi-GPU): 2× RTX 6000 Ada or 2× A6000 | 256 GB RAM | 8–12 TB NVMe scratch + 20 TB NAS | local vector DB | parallel RAG and Toolformer-style tool-use experiments.
Studio Rack E (Enterprise): 4–8× H100/A100 | 512 GB–1 TB RAM | 50–200 TB NAS w/ RAID | on-prem model zoo (Llama-3.x, Mistral, Mixtral, Command-R+, Whisper large-v3) | Triton/TGI/vLLM.
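When matching a model to one of the tiers above, a rough weight-only memory estimate (parameter count times bytes per parameter) is a useful first filter. The sketch below is illustrative only: the per-GPU memory sizes are the vendor-published figures (the A100 is assumed to be the 80 GB variant), the 80% headroom rule is a hypothetical rule of thumb, and the estimate ignores KV cache and activation memory, so it is a lower bound rather than a sizing guarantee.

```python
# Rough weight-only VRAM estimate: parameters x bytes per parameter.
# Excludes KV cache, activations, and runtime overhead (lower bound only).

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Per-GPU memory in GB for the cards named in the tiers.
# Assumption: A100 is the 80 GB variant.
GPU_VRAM_GB = {"RTX 4090": 24, "RTX 6000 Ada": 48, "A6000": 48, "A100": 80, "H100": 80}


def weight_gb(params_billion: float, dtype: str = "fp16") -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billion * BYTES_PER_PARAM[dtype]


def fits(params_billion: float, dtype: str, gpu: str,
         gpu_count: int = 1, headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the pooled VRAM.

    The 0.8 default is a hypothetical rule of thumb, not a measured value.
    """
    budget_gb = GPU_VRAM_GB[gpu] * gpu_count * headroom
    return weight_gb(params_billion, dtype) <= budget_gb


if __name__ == "__main__":
    print(fits(8, "fp16", "RTX 4090"))            # Tower S, 8B model
    print(fits(70, "fp16", "RTX 6000 Ada", 2))    # Tower M, 70B at fp16
    print(fits(70, "int4", "RTX 6000 Ada", 2))    # Tower M, 70B at 4-bit
    print(fits(70, "fp16", "H100", 4))            # Rack E, 70B at fp16
```

Under these assumptions, an 8B model at fp16 needs about 16 GB for weights and fits a single RTX 4090, while a 70B model at fp16 (about 140 GB) exceeds the pooled memory of a two-GPU Tower M unless it is quantized.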
I/O & Multimodal Prompting Console
Text & Code: low-latency mechanical keyboards + macro pads (pre-mapped to agent commands).
Audio: studio interface (USB-C/Thunderbolt), XLR mic, preamp, soundboard (for voice prompting, “barge-in” interrupts, and real-time TTS monitoring).
Vision: 4K overhead camera for whiteboard/sketch capture; doc scanner.
Displays: 2–4 monitors showing terminals & logs | LLM chat & eval dashboards | product previews | context pack viewer.
Networking & Storage
10GbE switch, VLANs for dev/prod, jump host for remote vendor access.
ZFS/TrueNAS for high-throughput scratch; WORM (write once, read many) for master artifacts.
Secrets vault (e.g., HashiCorp Vault), HSM/TPM for signing master releases.
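Before a master release is signed and committed to WORM storage, it helps to fix its contents in a checksum manifest that can be re-verified later. The sketch below covers only that hashing step, using the Python standard library; the function names are hypothetical, and the actual signing with an HSM/TPM or Vault is deliberately left out because it depends on the hardware and policies in use.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large artifacts are not loaded into RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the sorted names of files that were added, removed, or changed."""
    current = build_manifest(root)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )
```

In use, the manifest would be serialized (for example as JSON), stored outside the artifact directory, and that serialized file is what gets signed. An empty list from verify_manifest means the artifacts still match the recorded state.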
1.2 Software Stack (Studio OS)
Base OS: hardened Linux (Ubuntu LTS); optional Windows dual-boot for design tools.
Containerization: Docker/Podman; compose stacks per project; optional k8s on Studio Rack E.
Model Serving: vLLM, Text Generation Inference (TGI), Faster-Whisper; Triton Inference Server for multimodal models.
Orchestration: LangGraph-style DAGs, AutoGen-style conversational agents, CrewAI-style role agents.
RAG: pgvector/Postgres for on-prem; optional Milvus/Weaviate; document loaders (PDF, HTML, DOCX), chunkers, metadata taxonomy.
Eval & Telemetry:
Functional: unit prompts, tool-use traces, golden answers.
Grounding: source-citation checks, factuality (answer-to-source overlap).
Safety: jailbreak tests, PII leakage, brand guidelines.
Ops: latency, token cost, throughput, GPU utilization.
Tools: MLflow + Weights & Biases, Prometheus/Grafana, Sentry.
Security: SSO/OIDC, RBAC, IP allowlists; egress controls; audit logs; DLP filters.
Dev Tooling: VS Code + remote containers; pre-built notebooks; API gateway for frontends.
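The grounding check listed under Eval & Telemetry (answer-to-source overlap) can be approximated with a simple token-overlap score. The sketch below is a minimal lexical version under stated assumptions: the function names and the tokenization rule are hypothetical, and a production check would likely use sentence-level or embedding-based matching rather than bag-of-words overlap.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (a deliberately crude rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(answer: str, sources: list[str]) -> float:
    """Fraction of distinct answer tokens that appear in at least one source.

    Returns 0.0 for an empty answer. A low score suggests the answer contains
    material that the retrieved sources do not support.
    """
    answer_tokens = set(tokenize(answer))
    if not answer_tokens:
        return 0.0
    source_tokens: set[str] = set()
    for source in sources:
        source_tokens.update(tokenize(source))
    return len(answer_tokens & source_tokens) / len(answer_tokens)


if __name__ == "__main__":
    score = overlap_score(
        "The NAS uses ZFS",
        ["Our NAS runs on ZFS storage"],
    )
    print(f"overlap: {score:.2f}")
```

A score like this is cheap enough to run on every response and log alongside latency and token cost; any pass/fail threshold would need to be calibrated against the golden answers rather than assumed.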