asp2286 commented Sep 4, 2025

Related issues

Fixes #3043

Add Isolation Forest Anomaly Detection Trainer (Experimental)

This PR adds an Isolation Forest anomaly detection trainer for ML.NET.
Isolation Forest (Liu, Ting, Zhou, 2008) is a tree-ensemble algorithm for unsupervised anomaly detection that isolates outliers via random partitioning. It complements existing ML.NET anomaly detectors (e.g., SR-CNN, IID) with a density-agnostic approach.
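For readers unfamiliar with the algorithm, the scoring rule from the cited paper can be sketched as follows. This is an illustration only, not code from this PR: the class and method names are hypothetical, and it returns the paper's raw score in (0, 1]. How this PR maps its own score onto the 0–100 scale is not shown here.

```csharp
using System;

// Hypothetical helper (not part of this PR): the anomaly score defined in
// Liu, Ting, Zhou (2008), s(x, n) = 2^(-E[h(x)] / c(n)).
public static class IsolationScoreSketch
{
    private const double EulerGamma = 0.5772156649015329;

    // c(n): average path length of an unsuccessful search in a binary search
    // tree built from n points, used to normalise the observed path length.
    public static double AveragePathLength(int n)
    {
        if (n <= 1) return 0.0;
        if (n == 2) return 1.0;
        double harmonic = Math.Log(n - 1) + EulerGamma; // approximates H(n - 1)
        return 2.0 * harmonic - 2.0 * (n - 1) / n;
    }

    // meanPathLength is E[h(x)], the path length of x averaged over all trees.
    // Short paths (easy to isolate) push the score towards 1; a path length
    // equal to c(n) gives exactly 0.5.
    public static double Score(double meanPathLength, int sampleSize)
    {
        double c = AveragePathLength(sampleSize);
        if (c <= 0.0)
            throw new ArgumentOutOfRangeException(nameof(sampleSize), "Sample size must be at least 2.");
        return Math.Pow(2.0, -meanPathLength / c);
    }
}
```

With a sample size of 256, c(n) is roughly 10.24, so a point isolated after about four splits on average scores well above 0.5, while a point that needs ten or more splits scores at or below 0.5.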


Motivation

  • Provide a widely-used, general-purpose anomaly detection method.
  • Works without strong distribution assumptions.
  • Produces both a continuous anomaly score and a binary label.
  • Brings ML.NET closer to parity with popular libraries such as scikit-learn, which already ships an Isolation Forest implementation.

Design (v1, Experimental)

  • Core engine: IsolationForestModel (pure C#) implements random partitioning trees, scoring, and SHAP-like path contributions.
  • Pipeline integration: IsolationForestTrainer : IEstimator<ITransformer> appends:
    • Score (float, scaled 0–100; higher = more anomalous),
    • PredictedLabel (bool), thresholded by Contamination or explicit override.
  • Options:
    • Trees
    • SampleSize (psi)
    • Seed
    • Contamination
    • ParallelBuild
    • ThresholdOverride
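
One plausible way to turn Contamination and ThresholdOverride into a PredictedLabel cutoff is sketched below. This is an assumption about the approach, not the PR's actual code: the class and method names are hypothetical, and the rounding rule is chosen only for illustration. The idea is to take the k-th highest training score as the threshold, where k is the expected number of anomalies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not part of this PR): derive a score threshold from a
// contamination fraction, unless an explicit override is supplied.
public static class ContaminationThresholdSketch
{
    public static float Resolve(
        IReadOnlyList<float> trainingScores,
        double contamination,
        float? thresholdOverride = null)
    {
        // An explicit override wins over the contamination-derived cutoff.
        if (thresholdOverride.HasValue) return thresholdOverride.Value;

        if (trainingScores == null || trainingScores.Count == 0)
            throw new ArgumentException("At least one training score is required.", nameof(trainingScores));
        if (contamination <= 0.0 || contamination >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(contamination), "Expected a fraction in (0, 1).");

        // Expected number of anomalies in the training set (at least one).
        int expectedAnomalies = Math.Max(1, (int)Math.Round(contamination * trainingScores.Count));

        // The threshold is the lowest score among the top-k most anomalous rows.
        return trainingScores.OrderByDescending(s => s).ElementAt(expectedAnomalies - 1);
    }

    // Higher score = more anomalous, matching the Score column described above.
    public static bool IsAnomaly(float score, float threshold) => score >= threshold;
}
```

For ten training scores and a contamination of 0.2, this picks the second-highest score as the cutoff, so exactly the two highest-scoring rows are labelled anomalous (ties aside).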

⚠️ Experimental note: v1 uses CustomMapping internally. Models trained with this trainer cannot currently be persisted with mlContext.Model.Save(). A follow-up will introduce a proper IsolationForestTransformer with save/load and efficient row-mapping.


Usage

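// Assumes `using Microsoft.ML;` plus the namespace that defines IsolationForestTrainer.
var ml = new MLContext();

// Combine the two input columns into a single "Features" vector column.
var pipeline = ml.Transforms.Concatenate("Features", "X1", "X2")
    .Append(new IsolationForestTrainer(new IsolationForestTrainer.Options
    {
        Trees = 200,          // number of isolation trees
        SampleSize = 256,     // psi: rows subsampled per tree
        Contamination = 0.02  // expected fraction of anomalies, used to threshold PredictedLabel
    }));

// `data` is an IDataView that contains the X1 and X2 columns.
var model = pipeline.Fit(data);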


asp2286 commented Sep 4, 2025 via email

codecov bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 87.38255% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.05%. Comparing base (fb39755) to head (adab91a).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...crosoft.ML.IsolationForest/IsolationForestModel.cs | 84.69% | 30 Missing and 32 partials ⚠️ |
| ...osoft.ML.IsolationForest/IsolationForestTrainer.cs | 77.86% | 21 Missing and 8 partials ⚠️ |
| ...L.IsolationForest.Tests/IsolationForestAllTests.cs | 98.56% | 0 Missing and 3 partials ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7497      +/-   ##
==========================================
+ Coverage   69.01%   69.05%   +0.03%     
==========================================
  Files        1482     1485       +3     
  Lines      273999   274744     +745     
  Branches    28258    28388     +130     
==========================================
+ Hits       189093   189717     +624     
- Misses      77520    77594      +74     
- Partials     7386     7433      +47     

| Flag | Coverage Δ |
|---|---|
| Debug | 69.05% <87.38%> (+0.03%) ⬆️ |
| production | 63.34% <83.02%> (+0.03%) ⬆️ |
| test | 89.49% <98.56%> (+0.03%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...L.IsolationForest.Tests/IsolationForestAllTests.cs | 98.56% <98.56%> (ø) |
| ...osoft.ML.IsolationForest/IsolationForestTrainer.cs | 77.86% <77.86%> (ø) |
| ...crosoft.ML.IsolationForest/IsolationForestModel.cs | 84.69% <84.69%> (ø) |

... and 6 files with indirect coverage changes

Successfully merging this pull request may close these issues.

Add support for Isolation Forests