A Python project for detecting network intrusions using a LightGBM classifier and serving predictions via a Flask API.
- Load and preprocess network session data
- Train and evaluate a LightGBM model with balanced class weights
- Lower the decision threshold to boost recall
- Expose a /predict endpoint for real-time inference
Python 3.8+
Packages listed in requirements.txt:
  numpy
  pandas
  seaborn
  matplotlib
  xgboost
  lightgbm
  scikit-learn
  Flask
  joblib
- Input CSV: cybersecurity_intrusion_data.csv
- Key steps:
  - Drop unused columns
  - Fill missing encryption_used values with the string "None"
  - One-hot encode protocol_type and encryption_used
  - Split into train/test (80/20)
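
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration on a tiny inline DataFrame, not the project's actual code: the column names session_id and network_packet_size, the target name attack_detected (borrowed from the API response field), the category values, and random_state=42 for the split are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split


def preprocess(df, target="attack_detected", drop_cols=("session_id",)):
    """Apply the four preprocessing steps to a raw session DataFrame."""
    # Drop unused columns (hypothetical list; absent columns are ignored).
    df = df.drop(columns=list(drop_cols), errors="ignore")
    # Treat a missing encryption_used value as its own category.
    df["encryption_used"] = df["encryption_used"].fillna("None")
    # One-hot encode the two categorical columns.
    df = pd.get_dummies(df, columns=["protocol_type", "encryption_used"], dtype=int)
    return df.drop(columns=[target]), df[target]


# Toy stand-in for cybersecurity_intrusion_data.csv (values are made up).
raw = pd.DataFrame({
    "session_id": [f"SID_{i}" for i in range(10)],
    "network_packet_size": [599, 472, 629, 804, 453, 512, 700, 330, 915, 640],
    "protocol_type": ["TCP", "UDP", "TCP", "ICMP", "TCP", "UDP", "TCP", "TCP", "ICMP", "UDP"],
    "encryption_used": ["AES", np.nan, "DES", "AES", np.nan, "AES", "DES", np.nan, "AES", "DES"],
    "attack_detected": [1, 0, 1, 0, 0, 1, 0, 0, 1, 0],
})

X, y = preprocess(raw)
# 80/20 split; the seed is an assumption chosen for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(list(X.columns))
```

With the real CSV, `raw` would come from `pd.read_csv("cybersecurity_intrusion_data.csv")` and `drop_cols` would list whichever columns the project actually discards.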
 
Run python train.py (or the notebook):
- Imports and settings
- Preprocessing as above
- Model instantiation:
from lightgbm import LGBMClassifier
model = LGBMClassifier(class_weight='balanced', random_state=42)
model.fit(X_train, y_train)
- Predictions and evaluation (precision, recall, F1)
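
One way the evaluation step might look, using scikit-learn's metric helpers. The label arrays below are toy stand-ins for y_test and model.predict(X_test), and the class names "normal" and "attack" are assumptions made for readability.

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

# Toy stand-ins for y_test and model.predict(X_test).
y_test = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 1]

# Metrics for the positive (attack) class only.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary"
)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
# → precision=0.750 recall=0.750 f1=0.750

# Per-class breakdown, useful when classes are imbalanced.
print(classification_report(y_test, y_pred, target_names=["normal", "attack"]))
```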
- Lower the decision threshold from the default 0.5 to 0.3 to increase recall:
probs = model.predict_proba(X_test)[:, 1]  # probability of the positive (attack) class
y_pred = (probs > 0.3).astype(int)
- View results summary table in notebook
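
The effect of the lower threshold can be illustrated with a small summary table that compares metrics at 0.5 and 0.3. This is a sketch on made-up probabilities, so the numbers do not correspond to the project's results; threshold_summary is a hypothetical helper, not part of the repository.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score, precision_score, recall_score


def threshold_summary(y_true, probs, thresholds=(0.5, 0.3)):
    """Return precision, recall and F1 for each decision threshold."""
    rows = []
    for t in thresholds:
        # Same rule as in the training script: positive if probability > t.
        y_pred = (np.asarray(probs) > t).astype(int)
        rows.append({
            "threshold": t,
            "precision": precision_score(y_true, y_pred, zero_division=0),
            "recall": recall_score(y_true, y_pred, zero_division=0),
            "f1": f1_score(y_true, y_pred, zero_division=0),
        })
    return pd.DataFrame(rows)


# Toy labels and predicted attack probabilities (illustrative only).
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
probs = np.array([0.9, 0.7, 0.4, 0.2, 0.35, 0.1, 0.05, 0.6, 0.15, 0.25])

summary = threshold_summary(y_true, probs)
print(summary.round(3))
```

On this toy data, dropping the threshold to 0.3 raises recall from 0.50 to 0.75 while precision falls from about 0.67 to 0.60, which is the same trade-off the threshold adjustment targets.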
- Save model:
import joblib
joblib.dump(model, 'Lightgbm_model.pkl')
- File: app.py (or flask_app.py)
- Load model:
import joblib
model = joblib.load('Lightgbm_model.pkl')
Endpoint:
POST /predict
Request JSON: {"features": [v1, v2, ..., v10]}
Response JSON: {"attack_detected": true|false}
Runs on port 5000 by default:
flask run --host 0.0.0.0 --port 5000
- Install dependencies:
pip install -r requirements.txt
- Train model:
python train.py
- Start API:
python app.py
- Test prediction:
curl -X POST http://localhost:5000/predict \
     -H "Content-Type: application/json" \
     -d '{"features":[599,4,492.98,0.6068,1,0,1,0,1,0]}'
Results
Final comparison of models (precision, recall, F1, in %):
Model                     Precision  Recall  F1
Random Forest             98.6       71.9    83.2
XGBoost (Default)         94.1       73.7    82.6
XGBoost (Tuned)           90.3       74.3    81.5
LightGBM (Default)        97.8       72.7    83.4
LightGBM (Threshold 0.3)  82.7       78.2    80.4

License: MIT