From cd88ed29837ae4d97dd5cf1d25c1e39a0540b81f Mon Sep 17 00:00:00 2001 From: root Date: Mon, 30 Mar 2026 16:09:13 -0500 Subject: [PATCH] Revise architecture doc to reflect actual data pipeline (Spotify audio features + LLM) --- ARCHITECTURE.md | 140 +++++++++++++++++++++++++----------------------- 1 file changed, 72 insertions(+), 68 deletions(-) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 98458f0..2bec387 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -1,22 +1,28 @@ -# Vynl - Audio Analysis Architecture +# Vynl - Recommendation Architecture -## Problem +## Data Sources -No LLM can actually listen to music. Text-based recommendations work from artist names, genre associations, and music critic knowledge — never from the actual sound. For genuine sonic analysis, we need a dedicated audio processing pipeline. +### Spotify Audio Features API (already integrated) +Pre-computed by Spotify for every track: +- **Tempo** (BPM) +- **Energy** (0.0–1.0, intensity/activity) +- **Danceability** (0.0–1.0) +- **Valence** (0.0–1.0, musical positivity) +- **Acousticness** (0.0–1.0) +- **Instrumentalness** (0.0–1.0) +- **Key** and **Mode** (major/minor) +- **Loudness** (dB) +- **Speechiness** (0.0–1.0) -## Audio Analysis: Essentia +### Metadata (from Spotify + supplementary APIs) +- Artist name, album, release date +- Genres and tags +- Popularity score +- Related artists -Essentia (open source, by Music Technology Group Barcelona) is the industry standard for music information retrieval. It analyzes actual audio and extracts: - -- Mood, genre, style classification -- BPM, key, scale -- Timbral descriptors (brightness, warmth, roughness) -- Instrumentation detection -- Song structure (verse/chorus/bridge) -- Vocal characteristics -- Audio embeddings for "this sounds like" similarity - -Free, self-hosted, used by Spotify/Pandora-type services under the hood. 
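The per-track features listed under Data Sources can be fetched and trimmed down in one pass. A minimal sketch, assuming `spotipy` as the client (the fetch call is commented out because it needs credentials; the `summarize_features` helper, `FIELDS` list, and sample response values are illustrative, not part of the codebase):

```python
# Sketch: reduce a Spotify audio-features response to the fields the
# pipeline uses. The spotipy fetch is shown but commented out since it
# requires API credentials; `sample` mimics the response shape.
# import spotipy
# from spotipy.oauth2 import SpotifyClientCredentials
# sp = spotipy.Spotify(auth_manager=SpotifyClientCredentials())
# features = sp.audio_features([track_id])[0]

FIELDS = ["tempo", "energy", "danceability", "valence",
          "acousticness", "instrumentalness", "key", "mode",
          "loudness", "speechiness"]

def summarize_features(features: dict) -> dict:
    """Keep only the fields listed in the Data Sources section."""
    return {f: features[f] for f in FIELDS}

# Illustrative response (values made up; extra fields are dropped):
sample = {"tempo": 118.2, "energy": 0.64, "danceability": 0.55,
          "valence": 0.41, "acousticness": 0.12, "instrumentalness": 0.03,
          "key": 5, "mode": 0, "loudness": -7.4, "speechiness": 0.04,
          "liveness": 0.11, "id": "placeholder"}

print(summarize_features(sample)["tempo"])  # 118.2
```

Keeping the summary to a fixed field list makes the stored fingerprint stable even if Spotify adds fields to the response.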
+### Supplementary APIs (to add) +- **MusicBrainz** — artist relationships, detailed genre/tag taxonomy, release info +- **Last.fm** — similar artists, user-generated tags, listener overlap stats ## Recommendation Pipeline @@ -24,66 +30,64 @@ Free, self-hosted, used by Spotify/Pandora-type services under the hood. User imports playlist │ ▼ -Spotify preview clips (30s MP3s) ──→ Essentia (Celery worker) - │ │ - │ Sonic fingerprint: - │ tempo, key, timbre, - │ mood, instrumentation - │ │ - ▼ ▼ - Metadata ──────────────────→ LLM (any cheap model) - (genres, tags, artist info) combines sonic data - + music knowledge - → recommendations - + explanations +Spotify API ──→ Track metadata + audio features + │ + ▼ +Build taste profile: + - Genre distribution + - Average energy/danceability/valence/tempo + - Mood tendencies + - Sample artists and tracks + │ + ▼ +LLM (cheap model) receives: + - Structured taste profile + - User's specific request/query + - List of tracks already in library (to exclude) + │ + ▼ +Returns recommendations with +"why you'll like this" explanations ``` -### Step 1: Audio Ingestion -- Spotify provides 30-second preview clips as MP3 URLs for most tracks -- On playlist import, queue preview downloads as Celery background tasks -- Store clips temporarily for analysis, delete after processing - -### Step 2: Essentia Analysis -- Runs as a Celery worker processing audio clips -- Extracts per-track sonic fingerprint: - - **Rhythm**: BPM, beat strength, swing - - **Tonal**: key, scale, chord complexity - - **Timbre**: brightness, warmth, roughness, depth - - **Mood**: happy/sad, aggressive/relaxed, electronic/acoustic - - **Instrumentation**: detected instruments, vocal presence - - **Embeddings**: high-dimensional vector for similarity matching -- Store fingerprints in the tracks table (JSON + vector column) - -### Step 3: Similarity Search -- Use cosine similarity on audio embeddings to find "sounds like" matches -- Query against a catalog of pre-analyzed 
tracks (build over time from all user imports) -- Filter by user preferences (mood shift, era, underground level) - -### Step 4: LLM Explanation -- Feed sonic data + metadata to a cheap LLM (Haiku, GPT-4o-mini, Gemini Flash) -- The LLM's job is just natural language: turning structured sonic data into "why you'll like this" explanations -- The intelligence is in the audio analysis, not the text generation - ## Model Choice -Since the LLM is reasoning over structured data (not doing the analysis), the cheapest model wins: +The LLM reasons over structured audio feature data + metadata. It needs broad music knowledge but not heavy reasoning. Cheapest model wins: -| Model | Cost (per 1M tokens) | Good enough? | -|-------|---------------------|--------------| -| Claude Haiku 4.5 | $0.25 input / $1.25 output | Yes — best value | -| GPT-4o-mini | $0.15 input / $0.60 output | Yes | -| Gemini 2.5 Flash | $0.15 input / $0.60 output | Yes | -| Claude Sonnet | $3 input / $15 output | Overkill | +| Model | Cost (per 1M tokens) | Notes | +|-------|---------------------|-------| +| Claude Haiku 4.5 | $0.25 in / $1.25 out | Best value, great music knowledge | +| GPT-4o-mini | $0.15 in / $0.60 out | Cheapest option | +| Gemini 2.5 Flash | $0.15 in / $0.60 out | Also cheap, good quality | +| Claude Sonnet | $3 in / $15 out | Overkill for this task | -Note: Gemini 2.5 can accept raw audio input directly, but Essentia's structured output is more reliable and reproducible for a production pipeline. +## Taste Profile Structure -## Competitive Advantage +Built from a user's imported tracks: -This approach means Vynl does what Spotify does internally (audio analysis) but exposes it transparently — users see exactly WHY a song was recommended based on its actual sonic qualities, not just "other listeners also liked this." 
+```json +{ + "top_genres": [{"name": "indie rock", "count": 12}, ...], + "avg_energy": 0.65, + "avg_danceability": 0.55, + "avg_valence": 0.42, + "avg_tempo": 118.5, + "track_count": 47, + "sample_artists": ["Radiohead", "Tame Impala", ...], + "sample_tracks": ["Radiohead - Everything In Its Right Place", ...] +} +``` -## Tech Requirements +The LLM uses this profile to understand what the user gravitates toward sonically (high energy? melancholy? upbeat?) and find new music that matches or intentionally contrasts those patterns. -- **Essentia**: `pip install essentia-tensorflow` (includes pre-trained models) -- **Storage**: Temporary audio clip storage during analysis (~500KB per 30s clip) -- **Celery worker**: Dedicated worker for audio processing (CPU-bound) -- **Vector storage**: PostgreSQL with pgvector extension for embedding similarity search +## Platform Support + +### Currently Implemented +- Spotify (OAuth + playlist import + audio features) + +### Planned +- YouTube Music (via `ytmusicapi`, unofficial Python library) +- Apple Music (MusicKit API, requires Apple Developer account) +- Last.fm (scrobble history import + similar artist data) +- Tidal (official API) +- Manual entry / CSV upload (fallback for any platform)
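The taste-profile JSON shown above can be produced by a short aggregation pass over a user's imported tracks. A minimal sketch, assuming a flat `tracks` list of per-track dicts; the `build_taste_profile` name and input shape are assumptions, not the actual codebase API:

```python
from collections import Counter

def build_taste_profile(tracks: list[dict], sample_size: int = 5) -> dict:
    """Aggregate per-track audio features + metadata into the
    taste-profile structure (hypothetical input shape)."""
    genre_counts = Counter(g for t in tracks for g in t.get("genres", []))
    n = len(tracks)

    def avg(field: str) -> float:
        # Mean of a numeric audio feature across the library
        return round(sum(t[field] for t in tracks) / n, 2) if n else 0.0

    return {
        "top_genres": [{"name": g, "count": c}
                       for g, c in genre_counts.most_common(5)],
        "avg_energy": avg("energy"),
        "avg_danceability": avg("danceability"),
        "avg_valence": avg("valence"),
        "avg_tempo": avg("tempo"),
        "track_count": n,
        "sample_artists": sorted({t["artist"] for t in tracks})[:sample_size],
        "sample_tracks": [f'{t["artist"]} - {t["name"]}'
                          for t in tracks[:sample_size]],
    }

# Illustrative two-track library:
tracks = [
    {"artist": "Radiohead", "name": "Reckoner", "genres": ["art rock"],
     "energy": 0.52, "danceability": 0.48, "valence": 0.30, "tempo": 104.0},
    {"artist": "Tame Impala", "name": "Let It Happen",
     "genres": ["psych rock"],
     "energy": 0.78, "danceability": 0.65, "valence": 0.55, "tempo": 125.0},
]
profile = build_taste_profile(tracks)
print(profile["avg_tempo"])  # 114.5
```

Serialized to JSON, this profile plus the user's query is the entire LLM input, which keeps prompts small and cheap regardless of library size.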