This project provides a complete pipeline for fraud detection, from data engineering to model training and serving. It is split into two core repositories, one per service:
- ➡️ dataops_pipeline: contains all code for the data engineering pipeline, including data ingestion, cleaning, and preparation (ETL), orchestrated by Airflow (a DAG sketch follows this list).
- ➡️ model_training: contains the notebooks and scripts for training, evaluating, and serving the classification model for fraud detection (a minimal training sketch also follows below).
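For orientation, the sketch below shows roughly how an Airflow DAG in dataops_pipeline might chain the ETL steps. The DAG id, task names, and callables here are illustrative assumptions, not code from the repository:

```python
# Hypothetical ETL DAG sketch; the DAG id, task names, and callables
# are illustrative assumptions, not taken from dataops_pipeline.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_transactions(**context):
    # Placeholder: pull raw payment records from the source system.
    ...


def clean_transactions(**context):
    # Placeholder: drop duplicates, fix types, handle missing values.
    ...


def load_features(**context):
    # Placeholder: write model-ready features to the training store.
    ...


with DAG(
    dag_id="fraud_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_transactions", python_callable=extract_transactions)
    clean = PythonOperator(task_id="clean_transactions", python_callable=clean_transactions)
    load = PythonOperator(task_id="load_features", python_callable=load_features)

    # Linear dependency: ingest, then clean, then prepare features.
    extract >> clean >> load
```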
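Similarly, here is a heavily simplified sketch of the training side. The input file, label column, and model choice are assumptions for illustration; the real notebooks and scripts live in model_training:

```python
# Hypothetical training sketch; the file path, column names, and model
# choice are illustrative assumptions, not the repository's code.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("transactions.csv")  # assumed output of the ETL pipeline
X = df.drop(columns=["is_fraud"])     # assumed label column name
y = df["is_fraud"]

# Stratified split keeps the (typically rare) fraud class balanced
# across train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Evaluate: precision/recall per class matter more than raw accuracy
# for imbalanced fraud data.
print(classification_report(y_test, model.predict(X_test)))
```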
To run this project locally, build and run each service independently with Docker Compose. This requires two separate terminal windows.
1. Clone both repositories to your local machine:

   ```bash
   git clone git@github.com:arthurcornelio88/stripe_model_training.git
   git clone git@github.com:arthurcornelio88/stripe_dataops_pipeline.git
   ```
2. Open two terminal windows.
3. In the first terminal, navigate into the cloned stripe_dataops_pipeline directory and start its services with Docker Compose:

   ```bash
   cd stripe_dataops_pipeline
   docker-compose up --build
   ```
4. In the second terminal, navigate into the cloned stripe_model_training directory and start its services:

   ```bash
   cd stripe_model_training
   docker-compose up --build
   ```
Each service will now be running in its own isolated, containerized environment.
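Once both stacks are up, you can smoke-test the model-serving side. The port, route, and payload below are placeholders; check model_training's Compose file and API docs for the actual values:

```python
# Hypothetical smoke test; the port, route, and payload fields are
# placeholders, check the service's compose file and API docs.
import requests

payload = {"amount": 120.50, "currency": "usd"}  # assumed feature names
resp = requests.post("http://localhost:8000/predict", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g. a fraud probability or predicted class label
```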
For detailed instructions on deploying these services to a production environment (such as GCP VMs), please refer to each repository's README.md or /docs folder.