Nicholas Bannister

Recent

Speculative Decoding and the Model Choice: Lessons

Speculative decoding model differences.

Inference-Aware AI, AI Engineering, Engineering Best Practices

Standing Up vLLM on a Single A10G: From First Boot to Dual-Model Deployment

Deploying vLLM with Docker on AWS using Terraform.

AI Engineering, Inference-Aware AI

More Articles

How to Lead High-Confidence, High-Certainty People Without Crushing Their Value

This article shows how leaders can channel strong, dominating confidence into team-strengthening collaboration.

Rough Notes - Product Observability

A part of product strategy that isn’t just about shipping features fast.

Leadership, Engineering Best Practices, Software Philosophy

Inference-Aware AI: Working Definitions

A glossary of terms that define the concept of inference-aware agents, breaking down the core ideas, agent types, awareness dimensions, and platform components behind cost-efficient AI systems.

AI Engineering, Inference-Aware AI, Software Philosophy

A Hypothesis: Inference-Aware Agents Could Be the Next Big Leap in AI Efficiency

An introduction to the hypothesis that AI agents can be made faster, cheaper, and more effective through an inference-aware platform that optimizes how they decide, act, and use resources.

AI Engineering, Inference-Aware AI, Software Philosophy

A VECTR-Guided Refactor with Cursor.

Let's review a real example I had to refactor to make it easier to work with.

Software Philosophy, Engineering Best Practices, Coding Practices

The Duplication Dilemma: A VECTR Guide to Repeating Yourself

This article clarifies the "Don't Repeat Yourself" (DRY) principle within the context of VECTR.

Software Philosophy, Engineering Best Practices, Coding Practices

Scaling Engineering with AI from 0 to 50

What it really takes to scale an engineering team from 0 to 50 inside a 100+ person company in today’s AI-native world.

AI Engineering, Engineering Team Scaling, Leadership

Logistic Regression from Scratch with Python (Full Implementation)

Logistic regression from scratch with notes and learnings.

ML & Data Science

VECTR: Velocity-Engineered Code for Rapid Teams

VECTR is a pragmatic software philosophy focused on speed, clarity, and context. It favors useful code, justified abstractions, and adaptive design over rigid rules.

Software Philosophy, Engineering Best Practices

Notes to Self: Hiring Playbook

A no-fluff hiring guide for engineering leaders.

Engineering Team Scaling, Hiring & Talent

Multiple Linear Regression from Scratch (with Diagnostics)

A from-scratch implementation of multiple linear regression using gradient descent, with full diagnostic plots and batch prediction on test data.

ML & Data Science

Simple Linear Regression on Housing Data (Notes)

Just some notes on linear regression to come back to later.

ML & Data Science