Supercharge Your Model Training
Updated Nov 12, 2025 - Python
Efficient Deep Learning Systems course materials (HSE, YSDA)
A control plane for concurrent LLM RL on shared GPUs
Designing IT and ML Applications using Systems Thinking Approach at IIT Bhilai (CS559)
LICITRA v1 evidence — superseded by licitra-mmr-evidence
LICITRA v1 — superseded by licitra-mmr-core
Structured notes on designing scalable and fault-tolerant ML systems, to refresh your knowledge and help you prepare for a system design interview. Covers system design, MLOps, and case studies.
End-to-end personalized feed ranking system demonstrating retrieval → ranking pipelines, offline evaluation, realistic simulation, and business-aligned diagnostics inspired by large-scale social platforms.
Cloud-native machine learning platform for real-time macroeconomic inflation nowcasting
Evidence-based roadmap to becoming an AI System Engineer. Mathematical foundations, ML systems, production habits, and proof-backed progression.
Experimental web application demonstrating how an offline-trained financial fraud detection model can be exposed through a web interface. Built with Flask and a pre-trained XGBoost model to showcase ML inference flow, feature engineering, and result communication — not a production fraud prevention system.
Production-style real-time ML feature store with low-latency inference
Profile-first ML systems project optimizing a multi-camera end-to-end driving model for hardware efficiency using PyTorch, CUDA streams, NVTX instrumentation, and Nsight Systems.
Deterministic decision gate for AI/ML systems. Risk-Gate enforces strict, schema-driven admissibility boundaries between AI/LLM intent and real system actions. It provides a fixed, human-owned decision structure with deterministic allow/block outcomes, explicit audit logging, and environment-specific policy via configuration — no ML, no heuristics,
Benchmarking and optimizing transformer inference across PyTorch, ONNXRuntime, and TensorRT with latency/throughput analysis on GPU and CPU.
End-to-end fraud anomaly detection system using FastAPI, Isolation Forest, Streamlit, Docker, and a CI/CD pipeline.
Production-style ML inference system for pneumonia detection from chest X-rays, featuring custom CNN architectures, versioned model serving, preprocessing parity, observability, drift detection, and rollback using FastAPI and Docker.
Research investigation into training stability and optimization dynamics under reduced precision.