Usage Guide

This page shows practical, copy-paste templates for using SmartML correctly.

All examples reflect the actual SmartEco.SmartML API and recommended workflows.

Installing SmartML

pip install SmartEco

Importing SmartML

SmartML is exposed through the SmartEco package.

from SmartEco.SmartML import load_dataset, run_training, SmartML_Inspect

Inspect Available Models

SmartML_Inspect()

This will show:

Available classification models
Available regression models
Models disabled due to missing dependencies

Unavailable models are automatically excluded during execution.
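
To see in advance which optional backends might be reported as disabled, you can check whether their packages are importable. This is a minimal sketch using only the standard library. The package list is a hypothetical example, and the names SmartML actually checks may differ.

```python
from importlib.util import find_spec

# Hypothetical list of optional backends; adjust to the models you plan to run.
OPTIONAL_PACKAGES = ["xgboost", "lightgbm", "catboost"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# Prints whichever of the listed packages are not installed here.
print("Not installed:", missing_packages(OPTIONAL_PACKAGES))
```

Installing a missing package and running SmartML_Inspect() again should then show whether the corresponding model has become available.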

Using OpenML Datasets

SmartML can load datasets directly from OpenML.

X, y = load_dataset(
    openml_id=562,
    target="usr",
    subset=None,
)

print(f"Dataset loaded: X={X.shape}, y={y.shape}")

OpenML datasets are treated the same as CSV datasets after loading.

Using Local CSV Datasets

SmartML supports standard CSV files.

X, y = load_dataset(
    csv_path="data/dataset.csv",
    target="label",
)

CSV and OpenML datasets follow the same internal pipeline.
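
Before loading a CSV, it can help to confirm that the target column is actually present in the header. The helper below is a hypothetical convenience written with the standard library. It is not part of SmartML, and it assumes the first row of the file is the header.

```python
import csv


def csv_has_column(csv_path, column):
    """Return True if the header row of the CSV contains `column`."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # An empty file yields an empty header instead of raising StopIteration.
        header = next(csv.reader(handle), [])
    return column in header


# Example usage (paths and column names are placeholders):
# if not csv_has_column("data/dataset.csv", "label"):
#     raise ValueError("Target column 'label' not found in data/dataset.csv")
```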

Subsampling Large Datasets

For large datasets, pass the subset argument to load_dataset to work on a sample of the data and get quick baseline results.

X, y = load_dataset(
    csv_path="data/large_dataset.csv",
    target="label",
    subset=50000,
)
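
If you need the same kind of reduction on data that is already in memory, a random row sample with pandas is one option. This sketch is an assumption about what subsampling could look like. The guide does not state whether SmartML samples randomly or takes the first rows, and the subsample helper and its seed parameter are hypothetical.

```python
import pandas as pd


def subsample(X, y, n_rows, seed=0):
    """Draw up to n_rows rows at random, keeping X and y aligned by index."""
    if n_rows is None or n_rows >= len(X):
        return X, y
    X_small = X.sample(n=n_rows, random_state=seed)
    # Select the matching target rows by index label (assumes a unique index).
    return X_small, y.loc[X_small.index]


# Tiny illustrative frame; real datasets would come from load_dataset.
X = pd.DataFrame({"a": range(10), "b": range(10, 20)})
y = pd.Series(range(10), name="label")

X_small, y_small = subsample(X, y, n_rows=4)
print(X_small.shape, y_small.shape)  # (4, 2) (4,)
```

Fixing the seed keeps the sample reproducible, which matters when comparing baseline runs.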

Selecting Specific Models

Run Only Selected Models

results = run_training(
    X_df=X,
    y_ser=y,
    task="classification",
    models=[
        "random_forest",
        "xgboost",
        "lightgbm",
        "smartknn",
    ],
)

Exclude Models

results = run_training(
    X_df=X,
    y_ser=y,
    task="regression",
    exclude=[
        "svr",
        "knn",
    ],
)

Model names are:

Case-insensitive
Normalized internally
Task-specific
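
As an illustration of what case-insensitive, normalized names can mean in practice, the sketch below lowercases a name, trims whitespace, and maps spaces and hyphens to underscores. These rules are assumptions made for the example. SmartML's internal normalization is not documented here and may differ.

```python
def normalize_model_name(name):
    """Lowercase, trim, and replace hyphens and spaces with underscores (assumed rules)."""
    return name.strip().lower().replace("-", "_").replace(" ", "_")


requested = ["Random_Forest", "XGBoost", " lightgbm "]
print([normalize_model_name(n) for n in requested])
# ['random_forest', 'xgboost', 'lightgbm']
```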

Sorting Results

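SmartML does not auto-sort results. Sort the returned table by the metric that matters for your task. Note that sort_values returns a sorted copy, so assign or print the result.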

Classification

results.sort_values("macro_f1", ascending=False)

Regression

results.sort_values("r2", ascending=False)
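
The sort_values calls above suggest that results behaves like a pandas DataFrame. Under that assumption, the sketch below ranks a stand-in table and keeps the top rows. The "model" column name and all values are placeholders for illustration, not real SmartML output.

```python
import pandas as pd

# Placeholder values for illustration only; not real benchmark output.
results = pd.DataFrame(
    {
        "model": ["model_a", "model_b", "model_c"],
        "r2": [0.71, 0.88, 0.64],
    }
)

# sort_values returns a new frame; the original order of `results` is unchanged.
ranked = results.sort_values("r2", ascending=False).reset_index(drop=True)
print(ranked.head(2))
```

For error metrics where lower is better, pass ascending=True instead.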

Full Code

from SmartEco.SmartML import load_dataset, run_training, SmartML_Inspect

SmartML_Inspect()


X, y = load_dataset(
    openml_id=562,
    target="usr",
    subset=None,
)

print(f"Dataset loaded: X={X.shape}, y={y.shape}")


results = run_training(
    X_df=X,
    y_ser=y,
    task="regression",
    exclude=[
        "tabnet",
        "nam",
    ],
    output_csv="results/benchmark.csv",
)

print(
    results.sort_values(
        "r2",
        ascending=False,
    )
)
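
The example writes to results/benchmark.csv. This guide does not say whether run_training creates a missing output directory, so creating it beforehand is a safe precaution. The ensure_parent_dir helper below is hypothetical and uses only the standard library.

```python
from pathlib import Path


def ensure_parent_dir(path):
    """Create the parent directory of `path` if needed and return it as a Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


# Example usage before calling run_training:
# output_csv = str(ensure_parent_dir("results/benchmark.csv"))
```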

Important Notes

SmartML does not validate task–dataset correctness
Wrong task selection will produce misleading results
SmartML is not a production pipeline
Benchmarks are dataset-specific
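
Because SmartML does not validate the task against the dataset, a quick check of the target column before training can catch obvious mismatches. The guess_task helper below is a hypothetical heuristic, not part of SmartML, and the max_classes threshold is an arbitrary assumption. Treat its answer as a hint, not a decision.

```python
import pandas as pd


def guess_task(y, max_classes=20):
    """Heuristic guess: 'classification' or 'regression' from the target's dtype and cardinality."""
    # Strings, categories, and booleans are treated as class labels.
    if pd.api.types.is_bool_dtype(y) or not pd.api.types.is_numeric_dtype(y):
        return "classification"
    # Integer targets with few distinct values usually encode classes.
    if pd.api.types.is_integer_dtype(y) and y.nunique() <= max_classes:
        return "classification"
    return "regression"


print(guess_task(pd.Series(["cat", "dog", "cat"])))  # classification
print(guess_task(pd.Series([0.12, 3.4, 2.7])))       # regression
```

If the guess disagrees with the task you pass to run_training, inspect the target column before trusting the benchmark.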