Distance Engine
The Distance Engine defines how similarity between samples is computed in SmartKNN.
Unlike classical KNN, where distance is treated as a fixed mathematical formula, SmartKNN treats distance computation as a first-class system component with explicit guarantees around numerical stability, memory safety, and production behavior.
Role of the Distance Engine
The Distance Engine is responsible for a single task:
Given two samples, compute a meaningful, stable, and consistent measure of similarity.
It does not:
- Retrieve neighbors
- Select backends
- Perform prediction aggregation
- Learn feature weights
This strict separation ensures that distance behavior remains explicit, inspectable, and predictable.
Core Distance Definition
SmartKNN uses a numerically safe, weighted Euclidean distance kernel as its core similarity computation.
Feature contributions are scaled by learned importance weights, allowing distance behavior to adapt to the data while preserving a clear geometric interpretation.
A key property of this formulation is that near-zero feature weights effectively remove a dimension from distance computation, enabling soft feature selection without altering data representation.
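As a minimal sketch, assuming a learned per-feature weight vector w, the weighted distance between two samples x and y is d(x, y) = sqrt(sum_j w_j * (x_j - y_j)^2). The function below illustrates this formulation and is not SmartKNN's actual kernel:

```python
import numpy as np

def weighted_euclidean(x, y, weights, eps=1e-8):
    """Weighted Euclidean distance: sqrt(sum_j w_j * (x_j - y_j)^2).

    Illustrative sketch: float32 inputs, learned per-feature weights,
    and an epsilon floor matching the stability guarantees described below.
    """
    x = np.asarray(x, dtype=np.float32)
    y = np.asarray(y, dtype=np.float32)
    w = np.maximum(np.asarray(weights, dtype=np.float32), np.float32(eps))
    diff = x - y
    return float(np.sqrt(np.sum(w * diff * diff)))

# A near-zero weight effectively drops a feature without touching the data:
print(weighted_euclidean([1.0, 5.0], [2.0, -3.0], [1.0, 1e-12]))  # ~1.0
```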
Numerical Stability Guarantees
The Distance Engine is designed with production safety in mind and enforces several numerical guarantees:
- Float32-only execution: ensures consistent performance characteristics and predictable memory usage.
- NaN and Inf sanitization: invalid values are handled explicitly to prevent undefined behavior during distance computation.
- Epsilon-floored weights: feature weights are bounded away from zero to avoid numerical instability and degenerate distances.
These safeguards ensure that distance computation remains stable even in noisy or imperfect real-world datasets.
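One way to picture these safeguards is a sanitization pass applied before the kernel runs. The helper name, NaN/Inf replacement values, and epsilon below are assumptions for illustration, not SmartKNN's API:

```python
import numpy as np

def sanitize_inputs(features, weights, eps=1e-8):
    """Illustrative pre-pass: cast to float32, replace NaN/Inf with finite
    values, and floor the learned weights at epsilon."""
    x = np.asarray(features, dtype=np.float32)
    x = np.nan_to_num(x, nan=0.0,
                      posinf=np.finfo(np.float32).max,
                      neginf=np.finfo(np.float32).min)
    w = np.maximum(np.asarray(weights, dtype=np.float32), np.float32(eps))
    return x, w
```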
Memory Safety and Execution Constraints
Distance computation can be memory-intensive at scale.
Before allocating large intermediate buffers or distance matrices, SmartKNN estimates memory usage upfront and fails fast if limits are exceeded. This prevents out-of-memory conditions and avoids partial or undefined execution states.
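A hypothetical fail-fast check might look like the following; the byte budget and function name are assumptions used only to show the idea of estimating before allocating:

```python
import numpy as np

def check_distance_budget(n_queries, n_samples, max_bytes=2 * 1024**3):
    """Estimate the size of a full float32 distance matrix up front and
    raise before any allocation if it would exceed the budget."""
    required = n_queries * n_samples * np.dtype(np.float32).itemsize
    if required > max_bytes:
        raise MemoryError(
            f"distance matrix would need {required / 1024**2:.0f} MiB, "
            f"budget is {max_bytes / 1024**2:.0f} MiB"
        )
    return required
```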
Additional execution guarantees include:
- Contiguous memory enforcement for efficient access patterns
- Explicit control over allocation boundaries
- Predictable memory footprint during inference
These constraints make distance computation safer to deploy in production services.
Parallel Execution Model
The Distance Engine is designed to leverage data-level parallelism where appropriate.
Distance computation is parallelized using a JIT-compiled execution model, enabling efficient utilization of available CPU cores while maintaining deterministic behavior.
Parallelism affects how fast distance is computed, not what distance means.
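A sketch of what such a data-parallel kernel can look like, assuming a Numba-style JIT; this is illustrative and not SmartKNN's implementation:

```python
import numpy as np
from numba import njit, prange  # assumes a Numba-style JIT is available

@njit(parallel=True)
def pairwise_weighted_distances(X, Q, w):
    """Distances from every query in Q to every sample in X.
    Query rows are independent, so the outer loop parallelizes cleanly;
    parallelism changes throughput, never the resulting values."""
    out = np.empty((Q.shape[0], X.shape[0]), dtype=np.float32)
    for i in prange(Q.shape[0]):
        for j in range(X.shape[0]):
            acc = np.float32(0.0)
            for k in range(X.shape[1]):
                d = Q[i, k] - X[j, k]
                acc += w[k] * d * d
            out[i, j] = np.sqrt(acc)
    return out
```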
Consistency Across Backends
A core design requirement of the Distance Engine is semantic consistency.
Whether SmartKNN uses:
- Brute-force retrieval
- Approximate nearest-neighbor backends

the definition and behavior of distance remain unchanged.
Backend choice influences how candidates are retrieved, not how similarity is computed. This ensures that scaling decisions do not silently alter prediction semantics.
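A small illustration of that separation, reusing the weighted_euclidean sketch from above; candidate retrieval is left abstract, since only the rescoring step matters here:

```python
def rank_candidates(query, X, candidate_idx, w):
    """Rescore whatever candidates a backend returned with the same
    distance kernel, so brute-force and ANN retrieval agree on meaning."""
    scored = [(j, weighted_euclidean(query, X[j], w)) for j in candidate_idx]
    return sorted(scored, key=lambda item: item[1])
```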
Relationship to Other Components
The Distance Engine interacts with other SmartKNN components in a strictly defined manner:
- Feature Weight Learning determines feature importance
- Feature Pruning Engine may suppress inactive dimensions
- Backend Strategy supplies candidate neighbors
- Prediction Logic consumes distance outputs
At inference time, the Distance Engine operates independently and does not modify or depend on other components.
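For orientation, an inference pass could be wired together roughly as follows; the component names and signatures are hypothetical and reuse the sketches above:

```python
import numpy as np

def predict_one(query, X, y, k, w, backend):
    """Hypothetical end-to-end flow: the backend proposes candidates,
    the distance kernel ranks them, and prediction logic votes."""
    candidates = backend.candidates(query)             # Backend Strategy
    ranked = rank_candidates(query, X, candidates, w)  # Distance Engine
    top_k = [j for j, _ in ranked[:k]]
    labels, counts = np.unique(y[top_k], return_counts=True)
    return labels[np.argmax(counts)]                   # Prediction Logic
```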
Summary
The SmartKNN Distance Engine transforms distance computation from an implicit assumption into a numerically stable, memory-safe, and production-aware system component.
By enforcing explicit constraints and guarantees, SmartKNN ensures that similarity computation remains reliable, interpretable, and scalable across real-world deployment scenarios.