Dev Tools · Intermediate · Project

Run AI models locally with edge inference

Set up local-first AI inference on your own hardware — no cloud, no API keys, full privacy.

90 min
Ollama · llama.cpp · vLLM
10xCareer Team

Choose your training style

Pick the format that matches the level of support you want.

Self-paced (Available) · Free

Start immediately and work through the training on your own schedule.

Human trainer (Coming soon) · $99 · Guided by an instructor

Join a guided cohort or workshop format when live delivery is available.

AI trainer (Coming soon) · $9 · Personalized guidance

Practice with an AI-guided trainer experience tailored to the course topic.

What you'll learn
  • Run open-weight LLMs on your own hardware
  • Choose appropriate model sizes and quantization for your use case
  • Serve local models via OpenAI-compatible APIs
  • Evaluate trade-offs between local and cloud inference

Overview

Local-first edge AI inference lets you run capable models directly on your own machine or edge devices. With over 633K combined GitHub stars across the ecosystem, tools like llama.cpp, Ollama, and vLLM have made local inference accessible to everyone.

What you'll build

  • A local inference setup running open-weight models
  • A private AI assistant that never sends data to the cloud
  • A benchmark comparison across different quantization levels (see the sketch after this list)
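
One way the quantization benchmark might look. This is a minimal sketch, not course material: it assumes a local Ollama server on its default port (11434) and model tags you have already pulled; the exact tag names below are illustrative. It uses the eval_count and eval_duration fields Ollama reports in its generate response to compute a rough tokens-per-second figure.

```python
# Rough tokens/sec comparison across quantization levels via a local
# Ollama server. Model tags are illustrative; substitute ones you have
# pulled (e.g. with `ollama pull <tag>`).
import requests

MODELS = [
    "llama3.1:8b-instruct-q4_K_M",  # ~4.5 bits/weight
    "llama3.1:8b-instruct-q8_0",    # ~8 bits/weight
]
PROMPT = "Explain quantization in one paragraph."

for model in MODELS:
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=600,
    )
    data = resp.json()
    # Ollama's final response includes eval_count (generated tokens)
    # and eval_duration (nanoseconds).
    tps = data["eval_count"] / (data["eval_duration"] / 1e9)
    print(f"{model}: {tps:.1f} tokens/sec")
```

Lower-bit quantization generally trades a little output quality for less memory and often higher throughput; measuring on your own hardware is the point of the exercise.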

Topics covered

  1. Choosing the right model: size vs. quality vs. hardware requirements
  2. Quantization: what it is and how to pick the right level (Q4, Q5, Q8)
  3. Setting up Ollama or llama.cpp on your machine
  4. Serving models via OpenAI-compatible local APIs (sketched after this list)
  5. When local beats cloud (and vice versa)
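
To make topics 3 and 4 concrete, here is a minimal sketch of the serving step, assuming Ollama is installed and a model has been pulled (the model tag is illustrative). Ollama exposes an OpenAI-compatible endpoint under /v1, so the standard OpenAI client works unchanged; the client requires an API key, but the local server ignores it. As a rule of thumb for topic 1, weight memory scales with bits per parameter: an 8B model at roughly 4.5 bits per weight needs about 4-5 GB for weights alone, while Q8 roughly doubles that.

```python
# Query a locally served model through Ollama's OpenAI-compatible API.
# Setup (run once in a shell):
#   ollama pull llama3.1   # downloads a default quantized build
#   ollama serve           # usually started automatically after install
from openai import OpenAI

# Point the standard OpenAI client at the local server. No cloud, no
# real API key: the value below is a placeholder the server ignores.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="llama3.1",  # any tag you have pulled locally
    messages=[{"role": "user", "content": "Why run models locally?"}],
)
print(response.choices[0].message.content)
```

Because the interface matches the cloud API, code written against a hosted provider can often be pointed at the local server by changing only the base URL and model name, which makes the local-vs-cloud trade-off in topic 5 easy to test directly.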

Why this matters

Privacy-sensitive work, offline environments, and cost control all demand local inference skills. This is a must-know for anyone serious about AI.