ml-systems

dahlp94 / feed-ranking-engine

End-to-end personalized feed ranking system demonstrating retrieval → ranking pipelines, offline evaluation, realistic simulation, and business-aligned diagnostics inspired by large-scale social platforms.

machine-learning recommendation-system learning-to-rank ml-systems feed-ranking

Updated Dec 31, 2025
Python

finnpwalsh / macro-nowcast

Cloud-native machine learning platform for real-time macroeconomic inflation nowcasting

aws terraform data-engineering macroeconomics mlops ml-systems

Updated Mar 11, 2026
Python

dileepkreddy5 / real-time-ml-feature-store

Production-style real-time ML feature store with low-latency inference

redis streaming kafka prometheus low-latency feature-store fastapi real-time-ml ml-systems ml-inference

Updated Feb 22, 2026
Python

kuttivicky / Waymo-e2e-profiler

Profile-first ML systems project optimizing a multi-camera end-to-end driving model for hardware efficiency using PyTorch, CUDA streams, NVTX instrumentation, and Nsight Systems.

performance-engineering deep-learning async cuda pytorch gpu-optimization nvtx ml-systems nsight-systems automomous-driving

Updated Feb 12, 2026
Python

fractal360 / risk-gate-api

Deterministic decision gate for AI/ML systems. Risk-Gate enforces strict, schema-driven admissibility boundaries between AI/LLM intent and real system actions. It provides a fixed, human-owned decision structure with deterministic allow/block outcomes, explicit audit logging, and environment-specific policy via configuration — no ML, no heuristics,

terraform aws-ecs system-architecture policy-enforcement fastapi applied-ml ml-systems ai-governance llm-safety deterministic-systems auditability decision-gate

Updated Jan 14, 2026
Python

karun2328 / inference_pipeline

Benchmarking and optimizing transformer inference across PyTorch, ONNXRuntime, and TensorRT with latency/throughput analysis on GPU and CPU.

gpu inference pytorch quantization tensorrt onnxruntime ml-systems

Updated Jan 16, 2026
Python

Krishna13k / Fraud-Anomaly-System

End-to-end fraud anomaly detection system using FastAPI, Isolation Forest, Streamlit, Docker, and a CI/CD pipeline.

docker machine-learning backend pipeline ci-cd fraud-detection anomaly-detection fastapi streamlit ml-systems

Updated Jan 29, 2026
Python

Thiyaga1586 / PneumoAI

Production-style ML inference system for Pneumonia detection from chest X-rays, featuring custom CNN architectures, versioned model serving, preprocessing parity, observability, drift detection, and rollback using FastAPI and Docker.

docker machine-learning computer-vision deep-learning pytorch medical-imaging model-serving chest-xrays mlops pneumonia-detection fastapi ml-systems

Updated Jan 8, 2026
Python

crasofuentes-hub / swarm-forge

Autonomous training optimizer for nanoGPT using multi-agent patch search, empirical validation, and rollback-safe execution. TinyShakespeare val_loss improved from ~4.17 to ~1.8454.

engineering ci rollback transformers pytorch multi-agent reproducibility ray governance multi-agent-systems automl autonomous-agents checkpointing ml-systems nanogpt training-optimization auditability tinyshakespeare

Updated Mar 16, 2026
Python

fortyfive-labs / ml-dash

Scalable Training Telemetry and Metrics Visualization

visualization machine-learning robot-learning ml-systems rlhf physical-ai

Updated Mar 17, 2026
Python

Arnav-Ajay / rag-failure-modes

Failure-first analysis of retrieval-augmented and agentic systems, focused on isolating and attributing failures across retrieval, planning, execution, memory, and policy layers.

evaluation-framework rag failure-analysis agent-systems ai-observability ml-systems retrieval-augmented-generation system-debug agent-architecture llm-systems

Updated Feb 1, 2026
Python

kalopez / ml-inference-service

Containerized ML inference service exposing a churn prediction model via FastAPI, with Docker-based deployment and AWS-ready architecture.

python docker machine-learning scikit-learn fastapi ai-engineering ml-systems ml-inference

Updated Mar 16, 2026
Python

heyyubov / traffic-analytics

Real-time traffic analytics platform demonstrating ML systems design: detection, tracking, event logging, observability, and reporting.

machine-learning real-time computer-vision backend analytics yolo systems-engineering fastapi ml-systems

Updated Mar 1, 2026
Python

ml-systems

Here are 17 public repositories matching this topic...

mosaicml / composer

rlops / rlix

narendrakumarnutalapati / licitra-evidence

narendrakumarnutalapati / licitra-core

dahlp94 / feed-ranking-engine

finnpwalsh / macro-nowcast

dileepkreddy5 / real-time-ml-feature-store

kuttivicky / Waymo-e2e-profiler

fractal360 / risk-gate-api

karun2328 / inference_pipeline

Krishna13k / Fraud-Anomaly-System

Thiyaga1586 / PneumoAI

crasofuentes-hub / swarm-forge

fortyfive-labs / ml-dash

Arnav-Ajay / rag-failure-modes

kalopez / ml-inference-service

heyyubov / traffic-analytics
