
🩺⚡ SynthMed: Generating and Detecting Multimodal Deepfakes for Healthcare Communication


Healthcare communication increasingly relies on digital platforms, creating new vulnerabilities for misinformation through sophisticated deepfake technologies. The proliferation of synthetic medical content poses serious risks to patient safety, public health policies, and trust in healthcare institutions.

We introduce SynthMed, a comprehensive framework for generating and detecting multimodal deepfakes specifically designed for healthcare communication scenarios. Our approach combines state-of-the-art generative models with advanced detection mechanisms using multimodal late-fusion strategies across video, audio, and textual fact-checking modalities.

Our evaluation demonstrates that SynthMed achieves robust detection capabilities through multimodal fusion, significantly outperforming single-modality approaches. These results highlight the framework's potential as both a research tool for understanding deepfake vulnerabilities and a defense mechanism against malicious synthetic healthcare content.

📊 Data Sources

  • PUBHEALTH: Public health fact-checking dataset
  • COVID-Fact: COVID-19 related claims and fact-checks
  • SciFact: Scientific claim verification dataset
  • HealthVer: Health-related claim verification corpus

The framework processes over 31k healthcare claims through multiple synthesis pipelines and evaluation protocols:

  • 510 Generated Deepfakes: Using advanced TTS/voice cloning and lip-sync technologies
  • 40 In-the-Wild Deepfakes: Web-sourced videos from Duke Reporters' Lab and YouTube (reported in datasets/HealthcareDeepfakesInTheWild.xlsx)
  • 150 Real Videos: Authentic content for balanced evaluation
  • Balanced Dataset Splits: Synthetic (D₁), in-the-wild (D₂), and combined (D₃) configurations
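
As a concrete illustration, the claim-curation step that merges these corpora could be sketched as below. The column names, label vocabularies, and mapping are assumptions for illustration, not the repository's actual schema.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the four fact-checking corpora;
# the real column names and label sets may differ.
pubhealth = pd.DataFrame({"claim": ["Vitamin C cures the common cold."], "label": ["false"]})
covidfact = pd.DataFrame({"claim": ["Masks reduce droplet transmission."], "label": ["SUPPORTED"]})
scifact = pd.DataFrame({"claim": ["Zinc shortens flu duration."], "label": ["REFUTED"]})
healthver = pd.DataFrame({"claim": ["Hand washing lowers infection risk."], "label": ["Supports"]})

# Map each corpus's label vocabulary onto a shared binary scheme.
LABEL_MAP = {
    "false": "fake", "refuted": "fake",
    "true": "real", "supported": "real", "supports": "real",
}

def unify(df: pd.DataFrame, source: str) -> pd.DataFrame:
    """Normalize labels and tag each claim with its source corpus."""
    out = df.copy()
    out["label"] = out["label"].str.lower().map(LABEL_MAP)
    out["source"] = source
    return out

claims = pd.concat(
    [unify(pubhealth, "PUBHEALTH"), unify(covidfact, "COVID-Fact"),
     unify(scifact, "SciFact"), unify(healthver, "HealthVer")],
    ignore_index=True,
).dropna(subset=["label"]).drop_duplicates(subset=["claim"])
```

The unified pool can then feed the synthesis pipelines, with `source` retained for per-corpus analysis.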

πŸ› οΈ Technologies Used

📑 Methodology

Our SynthMed framework operates through a dual-phase architecture addressing both generation and detection of healthcare deepfakes:

Core Components

  1. Deepfake Generation Pipeline:

    • Claim Dataset Curation: Merging multiple healthcare fact-checking datasets
    • LLM-based Elaboration: Converting claims into persuasive spoken sentences
    • Synthetic Audio Production: TTS/voice cloning with speaker diarization
    • Video Lip-Sync: Photorealistic mouth movement alignment
  2. Single-task Models Framework:

    • Video Forensics: Spatial, geometric, and frequency-domain artifact analysis
    • Audio Forensics: TTS/voice-cloning detection and audio-visual synchrony
    • Textual Fact-Checking: Semantic veracity verification of spoken claims
  3. Late-Fusion Models Framework:

    • Meta-Classifier Engines: Logistic Regression, Random Forest, XGBoost
    • Monomodal Fusion: Within-modality (video, audio, and text) ensemble of multiple detectors
    • Multimodal Fusion: Cross-modal integration using meta-classifiers
    • Feature Importance Analysis: Understanding modality contributions to decisions
  4. Evaluation Protocol:

    • Incremental Assessment: Sequential video → audio → text analysis pipeline
    • Cross-Domain Validation: Testing across different distribution scenarios
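
The incremental assessment above can be pictured as a cascade that consults each modality in turn. The detector callables, thresholds, and early-exit rule in this sketch are hypothetical stand-ins, not the framework's actual implementation.

```python
from typing import Callable, List, Tuple

# A stage is a (name, detector) pair; the detector returns a score in [0, 1],
# where higher means "more likely fake". Real detectors would replace the lambdas.
Stage = Tuple[str, Callable[[str], float]]

def incremental_assess(sample: str, stages: List[Stage],
                       hi: float = 0.9, lo: float = 0.1) -> Tuple[str, str]:
    """Run stages in order (video -> audio -> text), stopping early once a
    stage is confident; otherwise fall back to the mean of all scores."""
    scores = []
    for name, detector in stages:
        s = detector(sample)
        scores.append(s)
        if s >= hi:
            return "fake", name          # confidently fake at this stage
        if s <= lo:
            return "real", name          # confidently real at this stage
    mean = sum(scores) / len(scores)     # undecided: aggregate all stages
    return ("fake" if mean >= 0.5 else "real"), "fusion"

# Toy usage with fixed stand-in detectors.
stages = [("video", lambda x: 0.6), ("audio", lambda x: 0.95), ("text", lambda x: 0.2)]
label, decided_by = incremental_assess("clip_001.mp4", stages)
# the audio stage scores 0.95 >= 0.9, so the cascade stops there
```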

Architecture Overview

(Figure: SynthMed methodology diagram; see img/ISM_Methodology.png)

The system processes healthcare content through generation and detection pipelines, with late fusion combining complementary modality-specific signals for robust deepfake identification.

Deepfake Generation Process

The generation workflow demonstrates the multi-stage approach for creating realistic synthetic healthcare communications from curated claims to final video output.

Feature Importance Analysis

Late-fusion feature importance analysis reveals how different meta-classifiers (LR, RF, XGBoost) exploit distinct synergy patterns across video, audio, and text modalities.
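
One common way to obtain such an analysis is to read importances off a fitted meta-classifier. The sketch below uses scikit-learn's RandomForestClassifier on synthetic modality scores; the data and feature names are illustrative assumptions, not the paper's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical per-sample detector scores for three modalities; the video
# score is deliberately made the most informative here.
y = rng.integers(0, 2, n)
video = y * 0.7 + rng.normal(0.15, 0.1, n)   # strongly label-correlated
audio = y * 0.3 + rng.normal(0.35, 0.2, n)   # weakly label-correlated
text = rng.normal(0.5, 0.2, n)               # pure noise
X = np.column_stack([video, audio, text])

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = dict(zip(["video", "audio", "text"], rf.feature_importances_))
# with this construction, "video" should dominate and "text" contribute least
```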

✨ Key Features

  • Healthcare-Specific Pipeline: Specialized framework for medical communication synthesis and detection
  • Multimodal Late Fusion: Decision-level aggregation exploiting complementary error profiles
  • Comprehensive Benchmarking: Integration of 15 video, 9 audio, and 6 text detection systems
  • Cross-Domain Evaluation: Testing across synthetic, in-the-wild, and combined datasets
  • Interpretable Fusion: Feature importance analysis revealing modality contributions
  • Scalable Architecture: Subject-agnostic generation requiring no per-identity training

πŸ“ Repository Structure

```
SynthMed/
├── README.md                        # This file
├── LICENSE.txt                      # CC BY-NC 4.0 license
├── code/                            # Implementation code (coming soon)
├── datasets/                        # Healthcare datasets and synthetic content
│   ├── Dataset_llama.csv            # Llama-generated elaborations
│   ├── Dataset_Palmyra.csv          # Palmyra-generated elaborations
│   ├── Elaborated_DatasetClaim_BART.csv
│   ├── Elaborated_DatasetClaim_T5.csv
│   └── HealthcareDeepfakesInTheWild.xlsx
└── img/                             # Methodology diagrams and visualizations
    ├── ISM_DeepfakeGeneration.png
    ├── ISM_LF_FeatureImportanceAnalysis.png
    └── ISM_Methodology.png
```

πŸ† Experimental Results

Our comprehensive evaluation demonstrates significant advances in healthcare deepfake detection:

  • Multimodal Superiority: Late fusion consistently outperformed single-modality approaches across all evaluation metrics
  • Cross-Domain Robustness: Effective detection across synthetic and in-the-wild distribution scenarios
  • Complementary Modalities: Each modality captures distinct artifact patterns, enabling synergistic detection
  • Meta-Classifier Performance: XGBoost achieved optimal balance with ~89% accuracy and 0.84 AUC
  • Balanced Classification: Strong performance on both authentic and synthetic content detection

The results confirm that multimodal integration provides a more reliable and robust approach for medical deepfake detection, demonstrating the importance of ensemble strategies in high-stakes healthcare communication scenarios.

🔧 Configuration Options

Generation Models

  • Video Synthesis: VideoReTalking for subject-agnostic lip-sync generation
  • Audio Synthesis: OpenVoice (IVC), WhisperSpeech (TTS), OuteTTS (TTS) with controllable parameters
  • Text Elaboration: PALMYRA-MED-70B-32K and LLAMA-3.1-NEMOTRON-70B-INSTRUCT

Detection Strategies

  • Video Models: Spatial, frequency-aware, and forensic detection architectures
  • Audio Models: Feature-based approaches combining spectral and neural representations
  • Text Models: Transformer-based fact-checkers and semantic verification systems

Fusion Engines

  • Logistic Regression: Interpretable linear combination for transparent decisions
  • Random Forest: Ensemble approach robust to noisy modality scores
  • XGBoost: Advanced gradient boosting for complex cross-modal pattern learning
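
A minimal version of such a fusion engine, assuming each monomodal detector emits a per-sample score, could be trained as below. The synthetic scores and variable names are illustrative assumptions, not the framework's code.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 600
y = rng.integers(0, 2, n)

# Hypothetical held-out scores from the monomodal detectors; each column
# (video, audio, text) is a noisy view of the true label.
X = (y + rng.normal(0.0, 0.8, size=(3, n))).T

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)
meta = LogisticRegression().fit(X_tr, y_tr)
acc = meta.score(X_te, y_te)

# The learned coefficients act as interpretable per-modality weights,
# which is why logistic regression makes for a transparent fusion engine.
weights = dict(zip(["video", "audio", "text"], meta.coef_[0]))
```

Swapping `LogisticRegression` for `RandomForestClassifier` or an XGBoost classifier changes only the meta-model line; the stacking interface stays the same.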

📊 Evaluation Metrics

The framework includes comprehensive assessment tools:

  • Classification Metrics: Precision, Recall, F1-score per class (True/Fake)
  • Macro-Averaged Metrics: Macro-P, Macro-R, Macro-F1 for balanced evaluation
  • Threshold-Independent Metrics: AUC and Equal Error Rate (EER), complemented by overall Accuracy
  • Stratified Splits: 70/15/15% train/validation/test partitions that preserve class balance
  • Feature Importance: Model-agnostic analysis of modality contributions
  • Domain Analysis: Performance across synthetic (D₁), in-the-wild (D₂), and combined (D₃) scenarios
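
For reference, the threshold-independent metrics are computable with scikit-learn; EER has no direct helper, but it falls out of the ROC curve as the point where the false-positive rate equals the false-negative rate. The labels and scores below are toy values, not results from the paper.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, roc_curve

# Toy ground truth (1 = fake) and detector scores.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.3, 0.2, 0.6, 0.4, 0.8, 0.9, 0.7])
y_pred = (y_score >= 0.5).astype(int)

macro_f1 = f1_score(y_true, y_pred, average="macro")  # balanced over both classes
auc = roc_auc_score(y_true, y_score)

# EER: pick the ROC point where FPR is closest to FNR (= 1 - TPR).
fpr, tpr, _ = roc_curve(y_true, y_score)
fnr = 1 - tpr
eer = fpr[np.nanargmin(np.abs(fpr - fnr))]
```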

🚨 Ethical Considerations

This research addresses the critical need for deepfake detection in healthcare while acknowledging the dual-use nature of synthetic content generation:

  • Research Purpose Only: All synthetic content generated solely for detection research
  • Responsible Disclosure: Results shared to enhance healthcare cybersecurity awareness
  • Safeguards Required: Clear labeling and guidelines for any data release
  • Privacy Protection: All datasets ethically sourced with proper anonymization
  • Healthcare Context: Special attention to patient safety and public health implications
  • Transparency: Open methodology enabling validation and responsible deployment

🔬 Code Release

The complete implementation code including generation pipelines, detection models, and fusion frameworks will be released upon paper acceptance. The codebase will include:

  • Multimodal deepfake generation scripts with healthcare-specific prompting
  • Comprehensive detection model implementations and benchmarking tools
  • Late-fusion framework with interpretability analysis
  • Evaluation protocols and dataset preprocessing utilities
  • Jupyter notebooks for reproducible experiments and analysis

🤝 Contributing

We welcome contributions to advance healthcare deepfake detection research! Please feel free to submit pull requests, report issues, or suggest improvements. All contributions should align with our ethical guidelines for responsible AI in healthcare.

πŸ‘¨β€πŸ’» This project was developed by Mariano Barone, Francesco Di Serio, Antonio Romano, Giuseppe Riccio, Marco Postiglione, and Vincenzo Moscato at University of Naples Federico II – PRAISE Lab - PICUS

📜 License

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

