Atlas Architecture

Data flow through the Atlas platform

Layer 1

Svelte Frontend (UX)

Port: 3003

Interactive dashboard with HMR

Layer 2

FastAPI BFF

Port: 8000

Backend For Frontend - Python proxy layer

Layer 3

AtlasCore API (Go)

Port: 8500

Gin framework - Central API gateway

Layer 4

PostgreSQL

Port: 5432

MinIO

Port: 9000

RedPanda

Port: 19092

OpenSearch

Port: 9200
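
The layers and ports listed above can be captured in a small lookup table. This is a minimal sketch, assuming every service runs on `localhost` (the page does not state a host); `SERVICES`, `REQUEST_CHAIN`, and `address` are hypothetical names used only for illustration.

```python
# Ports are copied from the layer listing above; the default host is an assumption.
SERVICES = {
    "svelte-frontend": 3003,  # Layer 1
    "fastapi-bff": 8000,      # Layer 2
    "atlascore-api": 8500,    # Layer 3
    "postgresql": 5432,       # Layer 4
    "minio": 9000,            # Layer 4
    "redpanda": 19092,        # Layer 4
    "opensearch": 9200,       # Layer 4
}

# Order in which a dashboard request passes through the first three layers.
REQUEST_CHAIN = ["svelte-frontend", "fastapi-bff", "atlascore-api"]


def address(service: str, host: str = "localhost") -> str:
    """Return 'host:port' for a service (hypothetical helper)."""
    return f"{host}:{SERVICES[service]}"


if __name__ == "__main__":
    for name in REQUEST_CHAIN:
        print(name, "->", address(name))
```

Returning a plain `host:port` string rather than a URL avoids implying a protocol, since PostgreSQL and RedPanda do not speak HTTP.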

Data Processing Pipeline

1. Ingestion Service
Reads the 39.38 GB JSONL file → Publishes to RedPanda (Kafka API) → Stores raw data in MinIO
atlas.raw.ingested

2. Normalization (Spark + Flink)
Spark batch transforms → Flink streams to Iceberg → Writes Parquet tables
atlas.normalized

3. Enrichment (Hugging Face AI)
Text generation & translation → Enhances descriptions → Updates Iceberg tables
atlas.enriched

4. Validation
OTA compliance checks → Quality scoring → Records metrics
atlas.validated

5. Distribution
Indexes to OpenSearch → Syncs to Jupiter → API endpoints ready
atlas.published
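
The five stages above can be sketched as an ordered chain. This is an illustration only: it assumes the `atlas.*` names listed under each stage are the events (or topics) that stage emits, and that each stage consumes the output of the one before it. The page does not confirm either point, and `PIPELINE`, `input_topic`, and `output_topic` are hypothetical names.

```python
from typing import Optional

# Stage names and event names are copied from the pipeline list above.
PIPELINE = [
    ("ingestion", "atlas.raw.ingested"),
    ("normalization", "atlas.normalized"),
    ("enrichment", "atlas.enriched"),
    ("validation", "atlas.validated"),
    ("distribution", "atlas.published"),
]


def output_topic(stage: str) -> str:
    """Return the event name listed under the given stage."""
    return dict(PIPELINE)[stage]


def input_topic(stage: str) -> Optional[str]:
    """Return the previous stage's event name, or None for the first stage.

    Assumption: stages are chained strictly in the listed order.
    """
    names = [name for name, _ in PIPELINE]
    idx = names.index(stage)
    return PIPELINE[idx - 1][1] if idx > 0 else None


if __name__ == "__main__":
    for name, _ in PIPELINE:
        print(f"{name}: {input_topic(name)} -> {output_topic(name)}")
```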

Data Storage Architecture

📦 Iceberg Data Lake

atlas.raw_payloads - Immutable source data
atlas.hotels - Normalized hotel products
atlas.rooms - Room type inventory
atlas.amenities - Facility metadata
atlas.quality_scores - Data quality metrics
Format: Parquet + Zstandard compression

🌊 Streaming Layer

Apache Flink - Real-time streaming to Iceberg
RedPanda - Kafka-compatible event streaming
Apache Spark - Batch processing + transformations
Iceberg REST - Catalog service (port 8181)
Flink writes to Iceberg with exactly-once semantics
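
Exactly-once delivery into a table is commonly achieved by tying each commit to a checkpoint and skipping checkpoints that were already committed. The toy model below illustrates that idea in memory. It is not Flink's or Iceberg's actual code, and `ToyTable` is a hypothetical class.

```python
from typing import List


class ToyTable:
    """In-memory stand-in for a table that commits once per checkpoint."""

    def __init__(self) -> None:
        self.rows: List[str] = []
        self.last_committed_checkpoint = 0

    def commit(self, checkpoint_id: int, rows: List[str]) -> bool:
        """Apply rows for a checkpoint; return False if it was already committed."""
        if checkpoint_id <= self.last_committed_checkpoint:
            return False  # replay after a restart: skip to avoid duplicates
        self.rows.extend(rows)
        self.last_committed_checkpoint = checkpoint_id
        return True


table = ToyTable()
table.commit(1, ["hotel-a", "hotel-b"])
table.commit(2, ["hotel-c"])
# Simulate a restart that replays checkpoint 2: nothing is written twice.
replayed = table.commit(2, ["hotel-c"])
```

The key design point is that the sink remembers the highest committed checkpoint, so a retry after failure is harmless.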

🔍 Search & Metadata

OpenSearch - Full-text search indices
PostgreSQL - Pipeline event tracking
MinIO - Object storage (S3-compatible)
All data versioned with time travel support

Complete Data Flow

1. Supplier JSONL file → Ingestion → MinIO raw bucket + RedPanda (Kafka)

2. Flink streams from Kafka → Normalizes to OTA → Writes Parquet to Iceberg (exactly-once)

3. Spark batch jobs → Additional transformations → Update Iceberg tables

4. AI services enhance data → Update Iceberg tables (each update is committed as a new snapshot)

5. Validation scores quality → Writes metrics to PostgreSQL

6. Distribution indexes to OpenSearch → Jupiter consumes via API
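
The six steps above can also be read as a table of which component writes to which store. The sketch below encodes only what the list states; `FlowStep`, `FLOW`, and `steps_writing_to` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FlowStep:
    number: int
    actor: str
    writes_to: Tuple[str, ...]


# Each entry mirrors one numbered step in the data flow list above.
FLOW = [
    FlowStep(1, "Ingestion", ("MinIO", "RedPanda")),
    FlowStep(2, "Flink", ("Iceberg",)),
    FlowStep(3, "Spark", ("Iceberg",)),
    FlowStep(4, "AI services", ("Iceberg",)),
    FlowStep(5, "Validation", ("PostgreSQL",)),
    FlowStep(6, "Distribution", ("OpenSearch",)),
]


def steps_writing_to(store: str) -> List[int]:
    """Return the step numbers that write to the given store."""
    return [step.number for step in FLOW if store in step.writes_to]


if __name__ == "__main__":
    print(steps_writing_to("Iceberg"))
```

A lookup like this makes it easy to see that three separate steps write to the Iceberg tables, which is where concurrent-update concerns would concentrate.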