Limitations and Important Notes
SmartML is intentionally constrained to preserve benchmark correctness and reproducibility.
The following limitations are by design and must be understood before use.
Task–Dataset Mismatch
SmartML does not validate semantic correctness between the chosen task and the dataset.
- Using classification on a regression dataset will not raise an error
- Using regression on a classification dataset will not raise an error
- Models will still train and produce outputs
However, the results will be meaningless or misleading.
Important
It is the user’s responsibility to:
- Select the correct task type
- Ensure the target variable matches the task
SmartML assumes expert usage; it provides no guardrails.
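A quick pre-flight check can catch obvious mismatches before you pick a task type. The sketch below assumes a pandas DataFrame and an illustrative target column and cardinality threshold; none of it is part of SmartML's API:

```python
import pandas as pd

def suggest_task(df: pd.DataFrame, target_col: str) -> str:
    """Rough heuristic for choosing between classification and regression.

    Illustrative only: the 20-value cutoff is arbitrary and should be
    adjusted to your data.
    """
    y = df[target_col]
    # Non-numeric or low-cardinality targets usually indicate classification.
    if not pd.api.types.is_numeric_dtype(y) or y.nunique() <= 20:
        return "classification"
    return "regression"

# Hypothetical usage:
# df = pd.read_csv("my_dataset.csv")
# print(suggest_task(df, target_col="label"))
```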
No Custom Train/Test Splits
- External train/test datasets are not supported
- Cross-validation is not supported
- Split ratios are fixed
These constraints ensure a fair comparison across models.
No Hyperparameter Tuning
- All models run with fixed defaults
- No grid search or Bayesian optimization
- No per-model tuning
SmartML compares model families, not optimized instances.
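If you need an optimized instance of the family that wins the benchmark, tune it outside SmartML. A minimal sketch using scikit-learn's GridSearchCV; the estimator and parameter grid are illustrative, not SmartML defaults:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Tuning is deliberately left to the user: SmartML only identifies a
# promising model family, not its best hyperparameters.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 30],
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
# search.fit(X_train, y_train)   # supply your own data and split
# print(search.best_params_, search.best_score_)
```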
Not a Production Pipeline
SmartML is not suitable for production use.
- No pipeline export
- No inference serving
- No model persistence guarantees
Results are intended for analysis only.
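If you need a deployable artifact, retrain the chosen model family in your own pipeline and persist it yourself. A sketch with scikit-learn and joblib, independent of SmartML's internals:

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Rebuild the model outside SmartML so you control training and persistence.
model = LogisticRegression(max_iter=1000)
# model.fit(X_train, y_train)    # your own data and split

joblib.dump(model, "model.joblib")   # explicit, user-managed persistence
restored = joblib.load("model.joblib")
```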
Limited Dataset Validation
SmartML does not enforce:
- Target type validation
- Label distribution sanity checks
- Feature leakage detection
Users must validate datasets independently.
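A few lightweight checks are usually worth running before benchmarking. This sketch assumes a pandas DataFrame with a known target column; the thresholds and the correlation-based leakage screen are illustrative heuristics, not SmartML features:

```python
import pandas as pd

def quick_dataset_checks(df: pd.DataFrame, target_col: str) -> None:
    y = df[target_col]
    X = df.drop(columns=[target_col])

    # Target type: does it match the task you intend to run?
    print("target dtype:", y.dtype, "| unique values:", y.nunique())

    # Label distribution: heavy class imbalance can make raw metrics misleading.
    if y.nunique() <= 20:
        print(y.value_counts(normalize=True))

    # Crude leakage screen: a numeric feature almost perfectly correlated
    # with a numeric target deserves a closer look before benchmarking.
    if pd.api.types.is_numeric_dtype(y):
        corr = X.select_dtypes("number").corrwith(y).abs()
        print(corr.sort_values(ascending=False).head())
```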
Deep Learning Constraints
Deep learning models:
- Run on CPU by default
- Use conservative training limits
- May be slow on large datasets
- Are intended for comparison, not performance tuning
GPU acceleration is not managed automatically.
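If you want to confirm whether a GPU is visible before running the deep models, check your framework directly. The sketch below assumes a PyTorch backend, which is an assumption about your environment rather than a documented SmartML detail:

```python
import torch

# SmartML will not move training to the GPU for you; verify availability yourself.
if torch.cuda.is_available():
    print("CUDA device:", torch.cuda.get_device_name(0))
else:
    print("No GPU visible; deep models will run on CPU and may be slow.")
```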
Dependency-Based Model Availability
- Some models require optional dependencies
- Missing dependencies silently disable models
- Disabled models are excluded from execution
SmartML does not install dependencies automatically.
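To avoid being surprised by silently disabled models, you can check which optional backends are importable before a run. The package names below are examples; consult your installation for the actual optional dependencies:

```python
from importlib import util

# Example optional backends; substitute the ones your benchmark relies on.
optional = ["xgboost", "lightgbm", "torch"]

for name in optional:
    status = "available" if util.find_spec(name) else "MISSING"
    print(f"{name}: {status}")
```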
Memory and Scale Constraints
- Large datasets may cause high memory usage
- Deep models can be resource intensive
- SmartML does not perform automatic dataset sharding
Users are responsible for resource management.
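If a dataset is too large for your machine, subsample it yourself before benchmarking; SmartML will not do this for you. A minimal pandas sketch with an illustrative row cap:

```python
import pandas as pd

df = pd.read_csv("large_dataset.csv")   # hypothetical input file

# Cap the benchmark input at an arbitrary 100k rows; the sampling strategy
# (random, stratified, time-based) is your decision, not SmartML's.
if len(df) > 100_000:
    df = df.sample(n=100_000, random_state=42)

df.to_csv("benchmark_sample.csv", index=False)
```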
Benchmark Interpretation
SmartML reports:
- Raw performance metrics
- Latency measurements
It does not provide:
- Confidence intervals
- Statistical significance tests
Results should be interpreted as directional guidance, not absolute rankings.
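If you need uncertainty estimates, produce them yourself, for example by repeating runs with different seeds and bootstrapping the resulting scores. The accuracies below are hypothetical placeholders for your own per-run results:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical metric values from repeated benchmark runs of one model.
scores = np.array([0.81, 0.83, 0.80, 0.84, 0.82])

# Percentile bootstrap for a 95% confidence interval on the mean score.
boot_means = [rng.choice(scores, size=len(scores), replace=True).mean()
              for _ in range(10_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={scores.mean():.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```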
Summary
SmartML's limitations are intentional.
It assumes:
- Correct task selection
- Clean datasets
- Informed interpretation of results
If strict validation, automation, or safety checks are required, SmartML may not be the appropriate tool.