This repository contains an end-to-end project for predicting customer churn in a telecom dataset. We explore customer behavior through Exploratory Data Analysis (EDA), build a predictive machine learning model, and deploy it as an interactive web application using Streamlit.
Customer churn occurs when customers stop using a company's services, for example by canceling a telecom subscription. In the telecom industry, churn is a critical metric because retaining existing customers is often more cost-effective than acquiring new ones. Factors like high monthly bills, poor service quality, or attractive competitor offers can drive churn. By understanding these triggers, businesses can develop targeted retention strategies, such as offering discounts, improving customer support, or enhancing service reliability.
Churn can occur under various circumstances, each requiring a different approach to mitigation. Common scenarios include voluntary churn, where customers choose to leave because of dissatisfaction with cost or service; involuntary churn, where accounts are closed because of unpaid bills; competitive churn, where customers switch to rival providers for better deals; and life-event churn, driven by personal changes such as relocation or financial constraints. Recognizing these scenarios enables companies to tailor interventions, such as loyalty programs for competitive churn or flexible payment plans for involuntary churn.
A telecom subscriber’s decision to stay or leave follows a cyclical process. It begins with onboarding, where first impressions are formed by ease of setup and welcome offers. Next comes usage, where customers evaluate service quality, such as call clarity or internet speed. Billing shapes perceptions of value, influenced by contract terms and costs. Support interactions, such as resolving technical issues, play a pivotal role in satisfaction. Finally, customers re-evaluate their commitment and decide to renew, upgrade, or churn. Addressing pain points at each stage, for example by streamlining onboarding or improving support, can boost retention.
Customers can be grouped into segments based on their likelihood to churn, allowing for precise retention efforts. High-risk segments include short-tenure customers on month-to-month contracts and customers with frequent complaints. Medium-risk customers show mixed signals, such as average tenure combined with recent billing disputes. Low-risk segments consist of loyal, long-term customers with stable payment histories. At-risk newcomers are recent subscribers who are still forming opinions and are highly sensitive to early experiences. By identifying these segments, businesses can prioritize resources, for example by offering incentives to high-risk groups or personalized onboarding to newcomers.
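One hypothetical way to operationalize these segments is to combine a model's churn probability with tenure. In the sketch below, the function name `assign_segment`, the 0.6 and 0.3 probability cutoffs, and the 3-month newcomer window are assumptions chosen for illustration, not values taken from this project; the "mixed signals" of the medium-risk group are approximated by a middle probability band.

```python
def assign_segment(churn_probability: float, tenure_months: int) -> str:
    """Map a churn probability and tenure to a risk segment (hypothetical rules)."""
    # Hypothetical thresholds; in practice these would be tuned against retention costs.
    NEWCOMER_MAX_TENURE = 3
    HIGH_RISK_CUTOFF = 0.6
    MEDIUM_RISK_CUTOFF = 0.3

    # New subscribers are still forming opinions, so they are flagged first.
    if tenure_months <= NEWCOMER_MAX_TENURE:
        return "At-risk newcomer"
    if churn_probability >= HIGH_RISK_CUTOFF:
        return "High risk"
    if churn_probability >= MEDIUM_RISK_CUTOFF:
        return "Medium risk"
    return "Low risk"


customers = [
    {"id": "A", "churn_probability": 0.82, "tenure_months": 10},
    {"id": "B", "churn_probability": 0.45, "tenure_months": 30},
    {"id": "C", "churn_probability": 0.05, "tenure_months": 60},
    {"id": "D", "churn_probability": 0.50, "tenure_months": 2},
]
for c in customers:
    print(c["id"], assign_segment(c["churn_probability"], c["tenure_months"]))
# A High risk
# B Medium risk
# C Low risk
# D At-risk newcomer
```

Checking the newcomer rule before the probability bands reflects the point above that early experiences dominate for new subscribers, regardless of what the model predicts.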
This project delivers a comprehensive solution for churn prediction:
- Exploratory Data Analysis (EDA): Analyzed the telecom dataset to identify patterns and characteristics of churners, using visualizations and summary statistics.
- Model Building: Developed a RandomForestClassifier trained on data rebalanced with SMOTEENN (SMOTE oversampling combined with Edited Nearest Neighbours cleaning) to handle the imbalance between churners and non-churners.
- Model Deployment: Created an interactive Streamlit web app that allows users to input customer details and receive real-time churn predictions with probabilities.
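The EDA notebook itself is not reproduced here. As a minimal sketch of the kind of grouping such an analysis relies on, the pandas snippet below computes the churn rate per contract type and compares churners with non-churners. The toy rows are invented for illustration, and the column names (`Contract`, `tenure`, `MonthlyCharges`, `Churn`) are assumptions modeled on common telecom churn datasets rather than taken from this repository.

```python
import pandas as pd

# Toy rows standing in for the telecom dataset (column names are assumed).
df = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year",
                 "Month-to-month", "One year", "Two year", "Month-to-month"],
    "tenure": [1, 5, 24, 60, 3, 30, 70, 12],
    "MonthlyCharges": [70.5, 89.1, 55.0, 25.3, 95.2, 60.0, 20.1, 80.4],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No", "No", "No"],
})

# Encode the target as 0/1 so that a group mean equals the churn rate.
df["churned"] = (df["Churn"] == "Yes").astype(int)

# Churn rate per contract type, highest first.
churn_by_contract = (
    df.groupby("Contract")["churned"].mean().sort_values(ascending=False)
)
print(churn_by_contract)

# Average tenure and monthly charges of churners vs. non-churners.
profile = df.groupby("Churn")[["tenure", "MonthlyCharges"]].mean()
print(profile)
```

On real data, the same two groupings are a quick way to check whether short tenure and month-to-month contracts stand out among churners before moving on to modeling.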
The repository is organized as follows:
- EDA: [Churn Analysis - EDA.ipynb](Churn%20Analysis%20-%20EDA.ipynb) – Jupyter notebook detailing data exploration and visualization.
- Model Building: [Churn Analysis - Model Building.ipynb](Churn%20Analysis%20-%20Model%20Building.ipynb) – Notebook covering data preprocessing, model training, and evaluation.
- Deployment: [Application.py](Application.py) – Streamlit app script for interactive churn prediction.
The predictive model is deployed as a Streamlit web application, offering an intuitive interface for users to explore churn likelihood. Key features include:
- Input Interface: Users enter 19 customer features (e.g., demographics, services, billing details) via checkboxes, dropdowns, and number inputs, organized into expandable sections for clarity.
- Preprocessing: The app applies one-hot encoding and tenure binning to align inputs with the model’s training data.
- Prediction Output: Displays the predicted churn label (Yes/No) and its probability, with visual indicators (e.g., a warning message for high risk, a success message for low risk).
- User Experience: Includes a sidebar with project context, a header with branding, and a footer linking to my website and GitHub.
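To illustrate the preprocessing and prediction steps, here is a hypothetical pandas sketch of how one form submission could be turned into a model-ready row. The helper names (`preprocess_input`, `describe_prediction`), the shortened `TRAINING_COLUMNS` list, and the 12-month tenure bins are assumptions for illustration; the real app would reuse the exact column list and bins from the training notebook. `describe_prediction` works with any fitted scikit-learn classifier that exposes `predict_proba` and encodes churn as class 1.

```python
import pandas as pd

# Hypothetical, shortened list of the one-hot columns seen during training.
TRAINING_COLUMNS = [
    "SeniorCitizen", "MonthlyCharges", "TotalCharges",
    "Contract_Month-to-month", "Contract_One year", "Contract_Two year",
    "tenure_group_1 - 12", "tenure_group_13 - 24", "tenure_group_25 - 36",
    "tenure_group_37 - 48", "tenure_group_49 - 60", "tenure_group_61 - 72",
]

# Hypothetical 12-month tenure bins (months).
TENURE_BINS = [0, 12, 24, 36, 48, 60, 72]
TENURE_LABELS = ["1 - 12", "13 - 24", "25 - 36", "37 - 48", "49 - 60", "61 - 72"]


def preprocess_input(raw: dict) -> pd.DataFrame:
    """Turn one customer's form inputs into a one-row, model-ready DataFrame."""
    df = pd.DataFrame([raw])
    # Bin tenure into 12-month groups; include_lowest puts tenure 0 in the first bin.
    df["tenure_group"] = pd.cut(
        df["tenure"], bins=TENURE_BINS, labels=TENURE_LABELS, include_lowest=True
    )
    df = df.drop(columns=["tenure"])
    # One-hot encode the categorical inputs as 0/1 integers.
    df = pd.get_dummies(df, columns=["Contract", "tenure_group"], dtype=int)
    # Match the training layout: missing dummies become 0, extra columns are dropped.
    return df.reindex(columns=TRAINING_COLUMNS, fill_value=0)


def describe_prediction(model, features: pd.DataFrame, threshold: float = 0.5):
    """Return ("Yes" or "No", churn probability) for a single preprocessed row."""
    # Assumes model.classes_ == [0, 1], so column 1 is the churn probability.
    probability = float(model.predict_proba(features)[0, 1])
    return ("Yes" if probability >= threshold else "No"), probability


# Example: preprocess one hypothetical customer.
row = preprocess_input({
    "SeniorCitizen": 0, "MonthlyCharges": 70.35, "TotalCharges": 703.5,
    "Contract": "Month-to-month", "tenure": 10,
})
print(row.T)
```

Reindexing against the training columns is what keeps the app consistent with the model: a single customer only produces dummies for the categories they selected, so the missing ones must be filled with 0 in the training order. In the Streamlit script, the returned label could then drive `st.warning` for likely churners and `st.success` otherwise.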
The app is live at: Customer Churn Prediction App
To run the Streamlit app locally:
- Clone this repository:
  `git clone https://github.com/Mutsinz1/Churn_Prediction_Project.git`
- Navigate to the project directory:
  `cd Churn_Prediction_Project`
- Install the required libraries:
  `pip install streamlit pandas scikit-learn numpy`
- Run the Streamlit app:
  `streamlit run Application.py`
- Open the provided URL (e.g., `http://localhost:8501`) in your browser to use the app.