Atlas Architecture
Data flow through the Atlas platform
Layer 1
Svelte Frontend (UX)
Port: 3003
Interactive dashboard with HMR
Layer 2
FastAPI BFF
Port: 8000
Backend For Frontend - Python proxy layer
Layer 3
AtlasCore API (Go)
Port: 8500
Gin framework - Central API gateway
Layer 4
PostgreSQL
Port: 5432
MinIO
Port: 9000
Redpanda
Port: 19092
OpenSearch
Port: 9200
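The layers above can be captured as a small lookup table, which may be handy for health checks or local configuration. This is a minimal Python sketch, not code from Atlas itself: the `Service` class, the helper names, and the `localhost` default are assumptions made for illustration. Only the service names, layers, and ports come from the list above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    name: str
    layer: int
    port: int


# Names, layers, and ports as listed in the architecture overview.
SERVICES = [
    Service("Svelte Frontend", 1, 3003),
    Service("FastAPI BFF", 2, 8000),
    Service("AtlasCore API", 3, 8500),
    Service("PostgreSQL", 4, 5432),
    Service("MinIO", 4, 9000),
    Service("Redpanda", 4, 19092),
    Service("OpenSearch", 4, 9200),
]


def address(service: Service, host: str = "localhost") -> str:
    """Return host:port for a service. The default host is an assumption."""
    return f"{host}:{service.port}"


def services_in_layer(layer: int) -> List[Service]:
    """Return the services that belong to one layer, in listed order."""
    return [s for s in SERVICES if s.layer == layer]


def request_chain() -> List[str]:
    """Names of layers 1 to 3, the hops a dashboard request passes through
    before reaching the layer 4 stores."""
    return [s.name for s in SERVICES if s.layer < 4]
```

Keeping the ports in one place like this makes it easy to spot collisions when a new service is added to a layer.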
Data Processing Pipeline
1. Ingestion Service
Reads 39.38 GB JSONL → Publishes to Kafka → Stores raw in MinIO
atlas.raw.ingested
2. Normalization (Spark + Flink)
Spark batch transforms → Flink streams to Iceberg → Writes Parquet tables
atlas.normalized
3. Enrichment (Hugging Face AI)
Text generation & translation → Enhance descriptions → Update Iceberg tables
atlas.enriched
4. Validation
OTA compliance checks → Quality scoring → Record metrics
atlas.validated
5. Distribution
Index to OpenSearch → Sync to Jupiter → API endpoints ready
atlas.published
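The five stages and the label listed under each one can be modelled as an ordered enumeration. The sketch below is illustrative only: the `Stage` enum and both helper functions are hypothetical names. The source does not say whether the `atlas.*` labels are topic names or event names, so the code treats them as opaque strings.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Pipeline stages in order, each paired with the label listed under it."""
    INGESTION = "atlas.raw.ingested"
    NORMALIZATION = "atlas.normalized"
    ENRICHMENT = "atlas.enriched"
    VALIDATION = "atlas.validated"
    DISTRIBUTION = "atlas.published"


def next_stage(stage: Stage) -> Optional[Stage]:
    """Return the stage that follows, or None after the final stage."""
    stages = list(Stage)  # Enum members iterate in definition order
    position = stages.index(stage)
    if position + 1 < len(stages):
        return stages[position + 1]
    return None


def labels_from(stage: Stage) -> List[str]:
    """Collect the labels a record would pass through, starting at `stage`."""
    labels = []
    current: Optional[Stage] = stage
    while current is not None:
        labels.append(current.value)
        current = next_stage(current)
    return labels
```

For example, `labels_from(Stage.VALIDATION)` yields the two labels that remain for a record that has already been enriched.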
Data Storage Architecture
📦 Iceberg Data Lake
• atlas.raw_payloads - Immutable source data
• atlas.hotels - Normalized hotel products
• atlas.rooms - Room type inventory
• atlas.amenities - Facility metadata
• atlas.quality_scores - Data quality metrics
Format: Parquet + Zstandard compression
🌊 Streaming Layer
• Apache Flink - Real-time streaming to Iceberg
• Redpanda - Kafka-compatible event streaming
• Apache Spark - Batch processing + transformations
• Iceberg REST - Catalog service (port 8181)
Flink writes to Iceberg with exactly-once semantics
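The exactly-once guarantee can be pictured with a toy model of checkpoint-aligned commits, the general idea behind exactly-once sinks. This sketch is not Flink's or Iceberg's actual code. The `CheckpointedSink` class and its methods are hypothetical, and the row values are made up for illustration.

```python
class CheckpointedSink:
    """Toy sink: rows become visible only when a checkpoint commits."""

    def __init__(self):
        self.committed = []          # rows visible to readers
        self.pending = []            # rows written since the last commit
        self.last_checkpoint_id = 0  # highest checkpoint already committed

    def write(self, row):
        """Buffer a row; it is not visible until the next commit."""
        self.pending.append(row)

    def commit(self, checkpoint_id):
        """Publish pending rows once per checkpoint; repeated ids are ignored."""
        if checkpoint_id <= self.last_checkpoint_id:
            return False
        self.committed.extend(self.pending)
        self.pending.clear()
        self.last_checkpoint_id = checkpoint_id
        return True

    def recover(self):
        """Simulate a failure: uncommitted rows are discarded."""
        self.pending.clear()


sink = CheckpointedSink()
sink.write("hotel-1")
sink.write("hotel-2")
sink.commit(checkpoint_id=1)  # both rows become visible

sink.write("hotel-3")
sink.recover()                # failure before checkpoint 2 drops hotel-3
sink.write("hotel-3")         # the source replays it after recovery
sink.commit(checkpoint_id=2)
sink.commit(checkpoint_id=2)  # a retried commit is ignored, so no duplicates
```

The two properties that matter are that uncommitted rows never reach readers and that a commit for an already-seen checkpoint id is a no-op. Together they mean a replay after failure neither loses nor duplicates rows.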
🔍 Search & Metadata
• OpenSearch - Full-text search indices
• PostgreSQL - Pipeline event tracking
• MinIO - Object storage (S3-compatible)
All data versioned with time travel support
Complete Data Flow
1. Supplier JSONL file → Ingestion → MinIO raw bucket + Redpanda (Kafka-compatible)
2. Flink streams from Kafka → Normalizes to OTA → Writes Parquet to Iceberg (exactly-once)
3. Spark batch jobs → Additional transformations → Update Iceberg tables
4. AI services enhance data → Update the existing Iceberg tables (each update commits a new snapshot)
5. Validation scores quality → Writes metrics to PostgreSQL
6. Distribution indexes to OpenSearch → Jupiter consumes via API
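The six steps above can be summarised as a map from each step to the stores it writes to, which makes it easy to answer questions such as "which steps touch Iceberg?". This is a rough sketch: the `FLOW` table and the helper functions are hypothetical names, and the store assignments are read directly from the numbered list rather than from any Atlas code.

```python
from typing import Dict, List

# Step number -> stores that step writes to, as described in the flow above.
FLOW: Dict[int, List[str]] = {
    1: ["MinIO", "Redpanda"],   # ingestion: raw bucket plus event stream
    2: ["Iceberg"],             # Flink normalization
    3: ["Iceberg"],             # Spark batch transformations
    4: ["Iceberg"],             # AI enrichment
    5: ["PostgreSQL"],          # validation metrics
    6: ["OpenSearch"],          # distribution index
}


def steps_writing_to(store: str) -> List[int]:
    """Return the step numbers that write to the given store, in order."""
    return [step for step, sinks in FLOW.items() if store in sinks]


def stores_touched() -> List[str]:
    """Return every store written by the flow, in first-use order."""
    seen: List[str] = []
    for sinks in FLOW.values():
        for store in sinks:
            if store not in seen:
                seen.append(store)
    return seen
```

A table like this also highlights that three separate steps write to the same Iceberg tables, which is where snapshot history becomes useful for auditing.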