Atlas Data Platform

Atlas Architecture

Data flow through the Atlas platform

Layer 1: Svelte Frontend (UX)

Port: 3003

Interactive dashboard with hot module replacement (HMR)

Layer 2: FastAPI BFF

Port: 8000

Backend for Frontend (BFF): Python proxy layer

Layer 3: AtlasCore API (Go)

Port: 8500

Central API gateway built on the Gin framework

Layer 4

• PostgreSQL (port 5432)

• MinIO (port 9000)

• RedPanda (port 19092)

• OpenSearch (port 9200)
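The four layers can be captured as a small service map. A minimal Python sketch, assuming every service runs on `localhost` with the ports listed above; the dictionary keys and the `upstream_url` rule (the BFF forwarding request paths to AtlasCore unchanged) are hypothetical illustrations, not the platform's actual routing.

```python
from dataclasses import dataclass
from urllib.parse import urljoin


@dataclass(frozen=True)
class Service:
    name: str
    layer: int
    port: int
    host: str = "localhost"  # assumption: every service runs on one host


# Ports come from the layer diagram; the dictionary keys are hypothetical.
SERVICES = {
    "frontend": Service("Svelte Frontend", 1, 3003),
    "bff": Service("FastAPI BFF", 2, 8000),
    "core": Service("AtlasCore API", 3, 8500),
    "postgres": Service("PostgreSQL", 4, 5432),
    "minio": Service("MinIO", 4, 9000),
    "redpanda": Service("RedPanda", 4, 19092),
    "opensearch": Service("OpenSearch", 4, 9200),
}


def services_in_layer(layer: int) -> list[str]:
    """Names of the services that sit in the given layer."""
    return [s.name for s in SERVICES.values() if s.layer == layer]


def upstream_url(path: str) -> str:
    """Hypothetical BFF routing rule: forward the request path to AtlasCore unchanged."""
    core = SERVICES["core"]
    return urljoin(f"http://{core.host}:{core.port}/", path.lstrip("/"))


print(upstream_url("/hotels/42"))  # http://localhost:8500/hotels/42
```

Keeping every port in one table also makes a port clash between layers easy to catch, since each port must be unique.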

Data Processing Pipeline

Ingestion Service

Reads 39.38 GB of supplier JSONL → Publishes to Kafka → Stores raw payloads in MinIO

Emits: atlas.raw.ingested
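At 39.38 GB, the supplier file cannot reasonably be loaded into memory at once, so ingestion has to stream it. A minimal sketch of that reading loop, assuming each JSONL record carries a `hotel_id` field used as the Kafka message key (a hypothetical field name); the producer call and the MinIO upload are left out, so the function just yields batches of key/value pairs.

```python
import json
from typing import Iterable, Iterator


def read_jsonl_batches(
    lines: Iterable[str], batch_size: int = 500
) -> Iterator[list[tuple[bytes, bytes]]]:
    """Yield batches of (key, value) pairs ready to hand to a Kafka producer.

    `lines` can be an open text file, which Python iterates line by line,
    so memory use depends on batch_size rather than on the file size.
    Blank lines are skipped; a malformed line raises ValueError.
    """
    batch: list[tuple[bytes, bytes]] = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid JSON on line {line_no}") from exc
        # Hypothetical key field: keying by hotel sends one hotel's records
        # to the same partition, preserving their relative order.
        key = str(record.get("hotel_id", "")).encode("utf-8")
        batch.append((key, line.encode("utf-8")))
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the final partial batch
```

With a real file, the open handle can be passed directly (`with open(path, encoding="utf-8") as f: for batch in read_jsonl_batches(f): ...`), so the reader itself holds at most one batch at a time.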

Normalization (Spark + Flink)

Spark batch transforms → Flink streams to Iceberg → Writes Parquet tables

Emits: atlas.normalized
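Normalization maps heterogeneous supplier records onto one consistent row shape. A minimal per-record sketch, assuming hypothetical supplier field names (`name` or `hotelName`, `stars`, `country`) and a hypothetical target layout; the actual OTA mapping and the Spark/Flink job that would call it are not shown.

```python
from typing import Any, Optional


def normalize_hotel(raw: dict[str, Any]) -> dict[str, Any]:
    """Map one raw supplier record to a flat, typed hotel row.

    All field names, on both the supplier and the target side, are
    hypothetical; suppliers are assumed to disagree on naming and types.
    """
    name = raw.get("name") or raw.get("hotelName") or ""
    stars = raw.get("stars")
    country: Optional[str] = raw.get("country")
    return {
        "hotel_id": str(raw["hotel_id"]),                 # IDs may arrive as int or str
        "name": " ".join(str(name).split()),              # collapse stray whitespace
        "star_rating": float(stars) if stars not in (None, "") else None,
        "country_code": country.strip().upper() if country else None,
    }
```

In the pipeline, a function of this shape would run once per record inside the Spark or Flink job.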

Enrichment (Hugging Face AI)

Text generation & translation → Enhance descriptions → Update Iceberg tables

Emits: atlas.enriched
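Enrichment is cheapest when the model is only called for records that need it. A minimal sketch of that selection step, assuming a hypothetical `min_length` threshold and a caller-supplied `generate` function standing in for the Hugging Face model call; no Hugging Face API is used or implied.

```python
from typing import Callable


def enrich_descriptions(
    rows: list[dict],
    generate: Callable[[dict], str],
    min_length: int = 40,
) -> tuple[list[dict], int]:
    """Fill in missing or short descriptions; return (new_rows, changed_count).

    `generate` stands in for the model call. Input rows are never mutated:
    rows that need enrichment are copied, the rest are passed through.
    """
    enriched: list[dict] = []
    changed = 0
    for row in rows:
        desc = (row.get("description") or "").strip()
        if len(desc) < min_length:
            # Hypothetical marker so downstream stages can tell generated text apart.
            row = {**row, "description": generate(row), "description_source": "generated"}
            changed += 1
        enriched.append(row)
    return enriched, changed
```

Returning new dicts instead of mutating the input keeps the pre-enrichment rows intact, and the (hypothetical) `description_source` marker records which descriptions were machine-generated.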

Validation

OTA compliance checks → Quality scoring → Record metrics

Emits: atlas.validated
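One simple form of quality scoring is a weighted completeness check. A minimal sketch, assuming hypothetical field names and weights; the actual OTA compliance rules are not specified here and would sit alongside a check like this.

```python
# Hypothetical field weights; they sum to 1.0, so scores fall in [0, 1].
FIELD_WEIGHTS = {
    "name": 0.3,
    "address": 0.2,
    "country_code": 0.1,
    "star_rating": 0.1,
    "description": 0.3,
}
EMPTY = (None, "", [])


def missing_fields(row: dict) -> list[str]:
    """Fields from FIELD_WEIGHTS that are absent or empty in `row`."""
    return [f for f in FIELD_WEIGHTS if row.get(f) in EMPTY]


def quality_score(row: dict) -> float:
    """Weighted share of populated fields, rounded to three decimals."""
    missing = set(missing_fields(row))
    score = sum(w for f, w in FIELD_WEIGHTS.items() if f not in missing)
    return round(float(score), 3)
```

Rounding hides floating-point noise from summing the weights, so a fully populated row scores exactly 1.0.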

Distribution

Index to OpenSearch → Sync to Jupiter → API endpoints ready

Emits: atlas.published
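The five event names above define a fixed stage order. A minimal sketch of how a consumer might check that order, assuming each event marks the completion of its stage and that records pass through the stages strictly in sequence; `next_stage` and `is_valid_history` are hypothetical helpers.

```python
from typing import Optional

# Event names from the pipeline diagram, in processing order.
STAGES = (
    "atlas.raw.ingested",
    "atlas.normalized",
    "atlas.enriched",
    "atlas.validated",
    "atlas.published",
)


def next_stage(current: Optional[str]) -> Optional[str]:
    """Event a record should emit next; None once it has been published.

    Pass None for a record that has not been ingested yet.
    Raises ValueError for an unknown event name.
    """
    if current is None:
        return STAGES[0]
    idx = STAGES.index(current)
    return STAGES[idx + 1] if idx + 1 < len(STAGES) else None


def is_valid_history(events: list[str]) -> bool:
    """True if `events` is a gap-free prefix of the stage order."""
    return tuple(events) == STAGES[: len(events)]
```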

Data Storage Architecture

📦 Iceberg Data Lake

• atlas.raw_payloads - Immutable source data

• atlas.hotels - Normalized hotel products

• atlas.rooms - Room type inventory

• atlas.amenities - Facility metadata

• atlas.quality_scores - Data quality metrics

Format: Parquet + Zstandard compression
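The storage format can be pinned per table with Iceberg table properties. A minimal sketch that renders Spark SQL DDL for one of the tables above: `write.format.default` and `write.parquet.compression-codec` are standard Iceberg table properties, while the `iceberg_ddl` helper and the column list are hypothetical.

```python
def iceberg_ddl(table: str, columns: dict[str, str]) -> str:
    """Render a Spark SQL CREATE TABLE statement for an Iceberg table
    stored as Parquet with Zstandard compression."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    props = {
        "write.format.default": "parquet",
        "write.parquet.compression-codec": "zstd",
    }
    tblprops = ",\n  ".join(f"'{k}' = '{v}'" for k, v in props.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "USING iceberg\n"
        f"TBLPROPERTIES (\n  {tblprops}\n)"
    )


# Hypothetical columns for atlas.hotels.
print(iceberg_ddl("atlas.hotels", {"hotel_id": "STRING", "name": "STRING", "star_rating": "DOUBLE"}))
```

Setting the codec explicitly documents the intent even where it matches the Iceberg default for the version in use.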

🌊 Streaming Layer

• Apache Flink - Real-time streaming to Iceberg

• RedPanda - Kafka-compatible event streaming

• Apache Spark - Batch processing + transformations

• Iceberg REST - Catalog service (port 8181)

Flink writes to Iceberg with exactly-once semantics, committing data files only when a checkpoint completes
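Flink's exactly-once delivery into Iceberg rests on tying commits to checkpoints: data files written between checkpoints become visible only when the checkpoint completes, and the Iceberg sink records the last committed checkpoint ID so that a restored job does not commit the same checkpoint twice. The Python model below simulates that behavior; it is a conceptual sketch with hypothetical names, not Flink's or Iceberg's API.

```python
class CheckpointedSink:
    """Toy model of a checkpoint-aligned committer (hypothetical names).

    Records are staged per checkpoint and only become visible when
    commit(checkpoint_id) runs; a commit for an already committed
    checkpoint is a no-op, so replays after a failure add no duplicates.
    """

    def __init__(self) -> None:
        self.table: list[dict] = []               # committed, visible rows
        self.staged: dict[int, list[dict]] = {}   # uncommitted rows per checkpoint
        self.max_committed_checkpoint = -1

    def write(self, checkpoint_id: int, record: dict) -> None:
        self.staged.setdefault(checkpoint_id, []).append(record)

    def commit(self, checkpoint_id: int) -> None:
        if checkpoint_id <= self.max_committed_checkpoint:
            self.staged.pop(checkpoint_id, None)  # already committed: drop replayed data
            return
        self.table.extend(self.staged.pop(checkpoint_id, []))
        self.max_committed_checkpoint = checkpoint_id

    def fail(self) -> None:
        """Simulate a crash: staged, uncommitted data is lost."""
        self.staged.clear()


sink = CheckpointedSink()
sink.write(1, {"id": "a"})
sink.commit(1)                 # checkpoint 1 completes: "a" becomes visible
sink.write(2, {"id": "b"})
sink.fail()                    # crash before checkpoint 2 completes: "b" is lost
sink.write(2, {"id": "b"})     # source replays from checkpoint 1's offsets
sink.commit(2)
sink.commit(2)                 # a repeated commit for checkpoint 2 is a no-op
print([r["id"] for r in sink.table])  # ['a', 'b']
```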

🔍 Search & Metadata

• OpenSearch - Full-text search indices

• PostgreSQL - Pipeline event tracking

• MinIO - Object storage (S3-compatible)

Iceberg tables are versioned through snapshots, with time-travel support
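Iceberg versions a table as a sequence of snapshots, and a time-travel read selects the newest snapshot at or before the requested point. The toy model below, with hypothetical names, illustrates only that lookup; it stores a copy of the rows per snapshot for simplicity, whereas real Iceberg snapshots reference shared data files. Against real Iceberg tables, Spark SQL exposes the same idea through `TIMESTAMP AS OF` and `VERSION AS OF` clauses.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Toy snapshot log: each commit appends a copy of the table's rows."""

    timestamps: list[int] = field(default_factory=list)  # commit times, ascending
    snapshots: list[tuple[dict, ...]] = field(default_factory=list)

    def commit(self, ts: int, rows: list[dict]) -> None:
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError("commit timestamps must increase")
        self.timestamps.append(ts)
        self.snapshots.append(tuple(rows))

    def as_of(self, ts: int) -> tuple[dict, ...]:
        """Return the newest snapshot committed at or before `ts`."""
        i = bisect.bisect_right(self.timestamps, ts)
        if i == 0:
            raise LookupError("no snapshot exists at or before that time")
        return self.snapshots[i - 1]
```

Because snapshots are only ever appended, older versions stay readable after later commits, which is what makes the time-travel lookup a simple binary search.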

Complete Data Flow

1. Supplier JSONL file → Ingestion → MinIO raw bucket + RedPanda (Kafka)

2. Flink streams from Kafka → Normalizes to the OTA format → Writes Parquet to Iceberg (exactly-once)

3. Spark batch jobs → Additional transformations → Update Iceberg tables

4. AI services enhance data → Update Iceberg tables (each update commits a new table snapshot)

5. Validation scores quality → Writes metrics to PostgreSQL

6. Distribution indexes to OpenSearch → Jupiter consumes via API
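Step 6 indexes documents into OpenSearch, whose `_bulk` endpoint takes newline-delimited JSON: an action line, then the document source, with the body ending in a newline. A minimal sketch that builds such a body, assuming a hypothetical index name `atlas-hotels` and `hotel_id` as the document ID; sending the HTTP request is left out.

```python
import json
from typing import Iterable


def build_bulk_body(rows: Iterable[dict], index: str = "atlas-hotels") -> str:
    """Build an NDJSON body for the OpenSearch _bulk API.

    Each document becomes two lines: an `index` action naming the target
    index and document ID, followed by the document source.
    """
    lines: list[str] = []
    for row in rows:
        action = {"index": {"_index": index, "_id": str(row["hotel_id"])}}
        lines.append(json.dumps(action, separators=(",", ":")))
        lines.append(json.dumps(row, separators=(",", ":")))
    # The bulk API requires a trailing newline after the last line.
    return ("\n".join(lines) + "\n") if lines else ""
```

The resulting string would be sent as a POST to `/_bulk` on port 9200 with the `application/x-ndjson` content type; using `_id` from the hotel ID makes re-indexing the same hotel overwrite its document rather than duplicate it.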