Benchmarks

This page summarizes the performance characteristics of SmartKNN under realistic deployment constraints.

The goal of these benchmarks is not to win leaderboard competitions, but to evaluate latency behavior, stability, and accuracy trade-offs that matter in real production environments.


Benchmarking Philosophy

SmartKNN benchmarks are designed around the following principles:

  • CPU-first evaluation
    All benchmarks are conducted on commodity CPU hardware without GPU acceleration.

  • Latency-aware measurement
    Mean latency is reported, but tail latency (p95 and p99) is treated as a first-class metric.

  • Realistic workloads
    Benchmarks use real-world datasets and inference patterns rather than synthetic or toy setups.

  • Transparent trade-offs
    No system is optimal for all scenarios. Performance benefits and limitations are reported explicitly.


Evaluation Methodology

Benchmarks are conducted using a consistent and reproducible methodology:

  • Hardware: Commodity x86 CPU (exact configuration documented per run)

  • Execution modes:

    • Single-sample inference
    • Batch inference

  • Metrics:

    • Accuracy / F1 score (task-dependent)
    • Mean inference latency
    • p95 inference latency

  • Baselines:

    • Classical KNN
    • Linear models
    • Tree-based models (where applicable)
    • Gradient-boosted models

Exact configurations, dataset details, and benchmark scripts are available in the benchmark repository.
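
For illustration, the sketch below shows one way the latency metrics above (mean and tail percentiles) can be collected on CPU. The predict_fn callable and the dummy model are placeholders for the model under test, not SmartKNN APIs; the actual benchmark scripts live in the benchmark repository.

    # Minimal latency-measurement sketch. `predict_fn` is any callable that
    # scores one sample; it is a placeholder, not a SmartKNN API.
    import time

    import numpy as np

    def measure_latency(predict_fn, samples, warmup=100):
        # Warm-up pass so first-call overheads do not inflate the tail.
        for x in samples[:warmup]:
            predict_fn(x)

        latencies_ms = []
        for x in samples:
            start = time.perf_counter()
            predict_fn(x)
            latencies_ms.append((time.perf_counter() - start) * 1e3)

        lat = np.asarray(latencies_ms)
        return {
            "mean_ms": float(lat.mean()),
            "p95_ms": float(np.percentile(lat, 95)),
            "p99_ms": float(np.percentile(lat, 99)),
        }

    # Toy usage with a stand-in model: replace `dummy_predict` with the
    # model under test (e.g. a trained classifier's predict call).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 32))
    dummy_predict = lambda x: float(x.sum() > 0)
    print(measure_latency(dummy_predict, X))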


Summary Results

The table below highlights representative results across selected datasets and workloads.

Note: Results are indicative rather than exhaustive. Performance varies with dataset size, dimensionality, feature structure, and hardware characteristics.

Model                      Accuracy / F1    Mean Latency       p95 Latency
SmartKNN                   Competitive      Sub-millisecond    Stable and bounded
Classical KNN              Comparable       Higher             Less predictable
Gradient-boosted models    Higher           Variable           Higher tail latency

Detailed per-dataset results and raw measurements are available in the full benchmark logs.


Observations

Across evaluated workloads:

  • SmartKNN delivers predictable inference latency, often operating in the sub-millisecond range on commodity CPUs depending on configuration.
  • Learned feature weighting improves robustness on noisy and high-dimensional datasets (illustrated in the sketch after this list).
  • Approximate execution backends reduce latency at scale, with a tunable accuracy trade-off.
  • Tree-based models may achieve higher peak accuracy on some datasets, but frequently exhibit higher or less predictable tail latency on CPU.
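
For readers unfamiliar with feature-weighted distances, the sketch below illustrates the general idea behind learned feature weighting in a KNN setting: per-feature weights scale each dimension's contribution to the distance, so noisy features can be down-weighted. It is a generic illustration, not SmartKNN's actual weighting scheme or API.

    # Illustrative feature-weighted KNN vote; not SmartKNN's actual
    # weighting scheme or API.
    import numpy as np

    def weighted_knn_predict(X_train, y_train, x, w, k=5):
        # Per-feature weights scale each dimension's contribution to the
        # distance, so noisy features contribute less.
        d = np.sqrt(((X_train - x) ** 2 * w).sum(axis=1))
        nearest = np.argsort(d)[:k]
        return np.bincount(y_train[nearest]).argmax()  # majority vote

    # Toy usage; in practice the weights would be learned, not random.
    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(500, 20))
    y_train = rng.integers(0, 2, size=500)
    w = rng.uniform(0.0, 1.0, size=20)
    print(weighted_knn_predict(X_train, y_train, X_train[0], w))
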

Limitations and Trade-offs

SmartKNN is not always the optimal choice:

  • On very small datasets, simpler models may be faster due to lower overhead.
  • On highly nonlinear problems, tree-based or neural models may outperform in accuracy.
  • Approximate backends introduce controlled approximation error that must be tuned carefully (see the recall sketch at the end of this section).

These trade-offs are documented to support informed system design decisions.
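
One common way to quantify the approximation error mentioned above is recall@k: the fraction of true nearest neighbors that an approximate search recovers relative to an exact brute-force search. The sketch below is generic; the approximate neighbor IDs are assumed to come from whatever backend is under test and do not reflect a specific SmartKNN interface.

    # Generic recall@k check for an approximate neighbor search; the
    # approximate backend is hypothetical and not a SmartKNN interface.
    import numpy as np

    def exact_neighbors(X, Q, k):
        # Brute-force k nearest neighbors (squared Euclidean) as the reference.
        d = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        return np.argsort(d, axis=1)[:, :k]

    def recall_at_k(exact_ids, approx_ids):
        # Fraction of true neighbors recovered per query, averaged over queries.
        hits = [len(set(e) & set(a)) / len(e)
                for e, a in zip(exact_ids, approx_ids)]
        return float(np.mean(hits))

    # approx_ids would come from the approximate backend under test:
    # recall_at_k(exact_neighbors(X, Q, k=10), approx_ids)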


Reproducibility

All benchmarks are conducted using versioned code and documented configurations.

Results may vary across hardware and environments and should be interpreted accordingly.
Benchmark scripts, configuration files, and raw logs are available for independent verification.
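
As a minimal illustration of the kind of environment metadata that makes results interpretable across machines, a run script might record something like the following (the file name and fields are examples, not part of the SmartKNN tooling):

    # Illustrative run-metadata logging; file name and fields are examples,
    # not part of the SmartKNN tooling.
    import json
    import platform
    import sys

    run_metadata = {
        "python": sys.version,
        "machine": platform.machine(),
        "processor": platform.processor(),
        "system": f"{platform.system()} {platform.release()}",
    }

    with open("benchmark_run_metadata.json", "w") as f:
        json.dump(run_metadata, f, indent=2)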