NBA Project

Author: jyablonski

Updated: Jun 14, 2026

Tags: architecture, aws, infrastructure, overview

This project is an end-to-end data platform delivering insights and predictions for the NBA season via a custom-built interactive dashboard. The system is fully containerized and deployed on AWS using best practices, including CI/CD pipelines, Terraform-managed infrastructure, and automated testing.

User-Facing Services

Frontend Dashboard – https://nbadashboard.jyablonski.dev
REST API – https://api.jyablonski.dev

Core Components

Ingestion Script – Scrapes, loads, and stores raw NBA data.
dbt Project – Cleans, transforms, and models data.
ML Pipeline – Generates daily win probability predictions.
Terraform – Manages infrastructure as code.

Operational costs are kept minimal (around $1/month), primarily through the AWS Free Tier and other third-party cloud services.

Architecture Diagram

System Components

1. Data Ingestion

Python script that uses pandas, SQLAlchemy, and other packages to gather data from multiple sources
Data is ingested into the Bronze Layer of a cloud-hosted Postgres database and also backed up to S3 for redundancy
Uses a feature flag table to determine which data to pull on each run
Runs on ECS Fargate as part of the daily pipeline orchestrated with AWS Step Functions

Note: The NBA blocks AWS IP addresses from accessing its API, which necessitates custom scraping solutions.
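As a rough illustration of the feature flag pattern, the sketch below assumes a hypothetical `feature_flags` table with `flag` and `is_enabled` columns and hypothetical scraper names; the project's actual table layout is not documented here. It uses the standard library's `sqlite3` as a stand-in for Postgres so it runs without a database server.

```python
import sqlite3


def get_enabled_sources(conn: sqlite3.Connection) -> list[str]:
    """Return the names of data sources whose feature flag is switched on."""
    # Hypothetical schema: feature_flags(flag TEXT, is_enabled INTEGER)
    rows = conn.execute(
        "SELECT flag FROM feature_flags WHERE is_enabled = 1 ORDER BY flag"
    ).fetchall()
    return [flag for (flag,) in rows]


def run_ingestion(conn: sqlite3.Connection, scrapers: dict) -> list[str]:
    """Call only the scrapers whose flag is enabled and return their names."""
    enabled = set(get_enabled_sources(conn))
    ran = []
    for name, scrape in scrapers.items():
        if name in enabled:
            scrape()
            ran.append(name)
    return ran


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature_flags (flag TEXT PRIMARY KEY, is_enabled INTEGER)")
    # Hypothetical flags: two sources enabled, one disabled
    conn.executemany(
        "INSERT INTO feature_flags VALUES (?, ?)",
        [("boxscores", 1), ("injuries", 1), ("odds", 0)],
    )
    scrapers = {name: (lambda: None) for name in ("boxscores", "injuries", "odds")}
    print(run_ingestion(conn, scrapers))  # ['boxscores', 'injuries']
```

Keeping the on/off switches in a table means a source can be paused by flipping a row rather than redeploying the Fargate task.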

2. dbt Transformations

Transforms raw Bronze data through a medallion architecture (Bronze -> Silver -> Gold)
Silver Layer: Fact and dimension tables standardize column names, enforce data types, and perform light cleaning; intermediate tables build custom models for downstream services
Gold Layer: Analytics-ready marts optimized for consumption by the REST API and Frontend Dashboard
Utilizes dbt-expectations for comprehensive data quality testing
Intermediate tables in the Silver layer isolate and validate transformed models early, preventing data quality issues from propagating to the Gold layer and downstream services
Runs on ECS Fargate as part of the daily pipeline orchestrated with AWS Step Functions

graph LR
    A[Bronze Layer<br/>Raw Sources] --> B[Fact Tables]
    A --> C[Dimension Tables]
    subgraph Silver[Silver Layer]
        B
        C
        D[Intermediate Tables]
        E[ML Features]
    end
    B --> D
    C --> D
    B --> E
    C --> E
    D --> F[Gold Layer<br/>Analytics Marts]
    E --> F
    style A fill:#cd7f32,stroke:#8b5a2b,stroke-width:1.5px,color:#fff
    style Silver fill:#d1d5db,stroke:#6b7280,stroke-width:2px,color:#333
    style F fill:#ffd700,stroke:#b8860b,stroke-width:1.5px,color:#333
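The Silver Layer's renaming, type enforcement, and testing steps can be sketched in plain Python. This is only an analogy, since dbt itself runs SQL models; the column names, sample row, and `SCHEMA` mapping are hypothetical, and `expect_column_values_between` merely mimics the idea behind dbt-expectations' `expect_column_values_to_be_between` test.

```python
import re
from datetime import date


def to_snake_case(name: str) -> str:
    """Standardize a raw column name, e.g. 'Team Name' -> 'team_name'."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    return name.strip("_").lower()


# Hypothetical target types for a Silver fact table (column -> cast function)
SCHEMA = {
    "team_name": str,
    "game_date": date.fromisoformat,
    "pts": int,
}


def clean_row(raw: dict) -> dict:
    """Rename columns to snake_case, then cast each value to its declared type."""
    renamed = {to_snake_case(k): v for k, v in raw.items()}
    return {col: cast(renamed[col]) for col, cast in SCHEMA.items()}


def expect_column_values_between(rows: list[dict], column: str, min_value, max_value) -> bool:
    """Plain-Python analogue of a range test: every value must fall in [min, max]."""
    return all(min_value <= row[column] <= max_value for row in rows)


if __name__ == "__main__":
    # Hypothetical Bronze row: scraped headers and string-typed values
    bronze = [{"Team Name": "Boston Celtics", "Game Date": "2025-01-15", "PTS": "112"}]
    silver = [clean_row(r) for r in bronze]
    print(silver[0])
    print(expect_column_values_between(silver, "pts", 0, 250))  # True
```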

3. ML Pipeline

Python script that pulls purpose-built ML feature datasets from the Silver Layer and generates win probability predictions for upcoming games using a logistic regression model
Model features include recent team performance, rest days, and active injuries for both teams
Predictions are written back to the Gold Layer in Postgres and served by the REST API & Frontend Dashboard
Runs on ECS Fargate as part of the daily pipeline orchestrated with AWS Step Functions
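A logistic regression turns a weighted sum of features into a probability through the sigmoid function, P(home win) = 1 / (1 + e^-(b0 + Σ w_i·x_i)). The sketch below assumes the features are expressed as home-minus-away differences; its intercept, weights, and feature names are hypothetical placeholders, not the trained model's values.

```python
import math

# Hypothetical parameters; the real model is fit on the Silver Layer feature datasets
INTERCEPT = 0.1
WEIGHTS = {
    "recent_win_pct_diff": 2.0,     # home minus away win % over recent games
    "rest_days_diff": 0.15,         # home rest days minus away rest days
    "injured_players_diff": -0.3,   # home active injuries minus away active injuries
}


def win_probability(features: dict) -> float:
    """Return P(home team wins) = sigmoid(intercept + sum of weight * feature)."""
    z = INTERCEPT + sum(w * features[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    game = {"recent_win_pct_diff": 0.2, "rest_days_diff": 1, "injured_players_diff": -1}
    home = win_probability(game)
    print(f"home {home:.3f}, away {1 - home:.3f}")
```

Encoding features as differences means the away team's probability is simply `1 - win_probability(features)`, so the two predictions for a game always sum to 1.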

4. REST API

Python application that pulls transformed and enriched data from the Gold Layer in Postgres and serves it over public HTTP endpoints
Includes a lightweight web app where users can sign in and make betting predictions for upcoming games
Also includes admin pages for managing parts of the project, such as feature flags
Deployed as a serverless application on AWS Lambda for $0/month
Utilizes CloudFront & Route 53 for distribution and routing to https://api.jyablonski.dev.

Query Example

curl -H "Accept: application/json" https://api.jyablonski.dev/v1/league/game_types
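The same query can be made from Python using only the standard library. This is a minimal sketch: `BASE_URL`, `build_request`, and `get_json` are illustrative names, and no particular response shape is assumed.

```python
import json
import urllib.request

BASE_URL = "https://api.jyablonski.dev"


def build_request(path: str) -> urllib.request.Request:
    """Build a GET request equivalent to the curl example (Accept: application/json)."""
    return urllib.request.Request(
        f"{BASE_URL}/{path.lstrip('/')}",
        headers={"Accept": "application/json"},
    )


def get_json(path: str, timeout: float = 10.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(path), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json("/v1/league/game_types")` mirrors the curl call above.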

5. Frontend Dashboard (Dash)

Python application built with Dash that pulls data from Postgres to serve tables, metrics, and graphs
Fully interactive with filtering and drill-down capabilities.
Hosted on free-tier resources and routed via Route 53 to https://nbadashboard.jyablonski.dev.

Dashboard Screenshot

Infrastructure

Terraform

Entire AWS stack provisioned via Terraform using custom-built modules for:
- S3 Buckets
- IAM Roles
- ECS Tasks & Services
- Lambda Functions
- PostgreSQL Infrastructure

Modules Repo

Common Modules

Custom internal Python package jyablonski_common_modules used by various services for:

AWS utilities (S3, Secrets Manager helpers)
Standardized logging
Postgres connection management & upsert functions

Keeps the services DRY and their code consistent.
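A Postgres upsert is usually expressed with `INSERT ... ON CONFLICT ... DO UPDATE`. The package's real function names and signatures are not shown here, so `build_upsert_sql` is a hypothetical sketch that builds such a statement with psycopg2-style `%(name)s` placeholders. Table and column names are interpolated directly and must come from trusted code, never from user input.

```python
def build_upsert_sql(table: str, columns: list[str], conflict_columns: list[str]) -> str:
    """Build an INSERT ... ON CONFLICT statement that updates non-key columns."""
    if not set(conflict_columns) <= set(columns):
        raise ValueError("conflict columns must be a subset of columns")
    cols = ", ".join(columns)
    # Named placeholders; values are passed separately to the driver
    values = ", ".join(f"%({c})s" for c in columns)
    # On conflict, overwrite every non-key column with the incoming (EXCLUDED) value
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in columns if c not in conflict_columns)
    sql = f"INSERT INTO {table} ({cols}) VALUES ({values}) ON CONFLICT ({', '.join(conflict_columns)})"
    # If every column is part of the key there is nothing to update
    return sql + (f" DO UPDATE SET {updates}" if updates else " DO NOTHING")


if __name__ == "__main__":
    print(build_upsert_sql("bronze.boxscores", ["player", "game_date", "pts"], ["player", "game_date"]))
```

Re-running an ingestion day with this kind of statement overwrites existing rows instead of duplicating them, which keeps daily reruns idempotent.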

Orchestration (Step Functions)

AWS Step Functions orchestrates the daily pipeline (Ingestion -> dbt -> ML) for $0/month
It triggers each step to run as an ECS Fargate task on free-tier resources
Apache Airflow would be the preferred orchestrator, but Step Functions was chosen for its cost efficiency
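The Ingestion -> dbt -> ML chain maps naturally onto an Amazon States Language definition in which each state runs an ECS task through the `arn:aws:states:::ecs:runTask.sync` integration, so the next step starts only after the previous task finishes. The sketch below builds that definition as a Python dict; the cluster and task definition names are hypothetical, and a real Fargate task would also need a `NetworkConfiguration` (subnets and security groups), which is omitted here.

```python
import json

# Hypothetical names; the project's real cluster and task definitions are not shown
CLUSTER = "nba-pipeline-cluster"
STEPS = [("Ingestion", "ingestion-task"), ("dbt", "dbt-task"), ("ML", "ml-task")]


def build_state_machine(cluster: str, steps: list[tuple[str, str]]) -> dict:
    """Chain ECS Fargate tasks sequentially in Amazon States Language."""
    states = {}
    for i, (name, task_def) in enumerate(steps):
        state = {
            "Type": "Task",
            # .sync makes Step Functions wait for the ECS task to complete
            "Resource": "arn:aws:states:::ecs:runTask.sync",
            "Parameters": {
                "Cluster": cluster,
                "TaskDefinition": task_def,
                "LaunchType": "FARGATE",
            },
        }
        # Point each state at the next one; the last state ends the execution
        if i + 1 < len(steps):
            state["Next"] = steps[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {"Comment": "Daily NBA pipeline", "StartAt": steps[0][0], "States": states}


if __name__ == "__main__":
    print(json.dumps(build_state_machine(CLUSTER, STEPS), indent=2))
```

Because a failed `.sync` task fails its state, a broken ingestion run stops the execution before dbt or the ML step can consume bad data.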

Database Management

Postgres serves as the core database
All schemas, users, roles, and permissions are managed via Terraform
The principle of least privilege is enforced through strict role-based access control

module "reporting_schema" {
  source = "./modules/postgresql/schema"

  schema_name   = "reporting"
  database_name = var.jacobs_rds_db
  schema_owner  = var.postgres_username

  read_access_roles  = [module.rest_api_role_prod.role_name, module.dash_role_prod.role_name]
  write_access_roles = [module.dbt_role_prod.role_name]
  admin_access_roles = [var.postgres_username]
}
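To make the read/write/admin split concrete, here is a sketch of the kind of Postgres GRANT statements such role lists could translate to. The custom module's actual behavior is not shown in this document, so this mapping and the role names used below are assumptions.

```python
def schema_grants(schema: str, read_roles: list[str], write_roles: list[str],
                  admin_roles: list[str]) -> list[str]:
    """Map read/write/admin role lists to least-privilege GRANT statements (assumed mapping)."""
    stmts = []
    for role in read_roles:
        # Readers (e.g. the REST API and Dash) can see the schema and SELECT only
        stmts += [
            f"GRANT USAGE ON SCHEMA {schema} TO {role};",
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};",
        ]
    for role in write_roles:
        # Writers (e.g. dbt) can also create objects and modify data
        stmts += [
            f"GRANT USAGE, CREATE ON SCHEMA {schema} TO {role};",
            f"GRANT SELECT, INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA {schema} TO {role};",
        ]
    for role in admin_roles:
        stmts.append(f"GRANT ALL PRIVILEGES ON SCHEMA {schema} TO {role};")
    return stmts


if __name__ == "__main__":
    for stmt in schema_grants("reporting", ["rest_api_role_prod", "dash_role_prod"],
                              ["dbt_role_prod"], ["postgres_admin"]):
        print(stmt)
```

`ON ALL TABLES IN SCHEMA` only affects tables that already exist; covering tables that dbt creates later would additionally require `ALTER DEFAULT PRIVILEGES`.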

Although Postgres is an OLTP database rather than a true data warehouse, it handles the project's analytical workloads effectively while being the most cost-effective option available for this use case.