PlatyPII is a modular, extensible Python framework for detecting and anonymizing Personally Identifiable Information (PII) in unstructured text. PlatyPII makes it easy to integrate PII detection into your applications.
π PII Detection: Detects emails, phone numbers, names, addresses, SSNs, and more.
π§ Hybrid Detection Engine: Combines rule-based regex detectors with NLP-based detectors (spaCy).
π‘ Anonymization Options: Supports multiple anonymization methods: mask, redact, hash, replace, and synthetic.
βοΈ Configurable Settings: Fine-tune detection thresholds, enable/disable PII types, and customize output.
π¦ Batch Processing: Run detection and anonymization across multiple documents at once.
π§ͺ Test Suite: Built-in test runner to verify core functionality, configuration, and individual detectors.
π¦ Installation
git clone https://github.com/NimeeshS/platypii.git
cd platypii
pip install -r requirements.txt
β οΈ Make sure you're using Python 3.7+.
π§ How It Works
PlatyPII works by orchestrating multiple detectors that each scan text for specific kinds of PII. It then applies a selected anonymization method to redact or transform the matched values. Architecture Overview:
Your Text β PIIEngine β Detectors β Anonymized Text
π§ͺ Running Tests
You can verify the functionality of the system using the provided test.py script:
python3 test.py
Test Coverage:
β
Basic PII detection + anonymization
β
Batch document processing
β
Configuration system
β
Individual detector testing (Regex + NLP)
π Usage Detecting and Masking PII
from platypii import detect_pii, mask_pii
text = "Email me at alice@example.com or call 123-456-7890."
matches = detect_pii(text)
for match in matches:
print(f"{match.pii_type}: {match.value} (confidence: {match.confidence:.2f})")
masked = mask_pii(text)
print(masked)
Using the PIIEngine
from platypii.core.engine import PIIEngine
engine = PIIEngine()
result = engine.process_text("SSN: 123-45-6789", anonymize=True, method="redact")
print(result["anonymized_text"])
π§ Configuration
The system is configurable via the Config class:
from platypii.config import Config
config = Config()
print(config.get("detection.confidence_threshold")) # Default threshold
config.set("detection.confidence_threshold", 0.9) # Update threshold
π§© Supported Anonymization Methods Method Description
mask Replaces characters with *
redact Replaces values with [REDACTED]
hash Replaces values with SHA-256 hash
replace Replaces with dummy placeholder values
synthetic Replaces with realistic synthetic data (optional)