🦫 PlatyPII

PlatyPII is a modular, extensible Python framework for detecting and anonymizing Personally Identifiable Information (PII) in unstructured text. PlatyPII makes it easy to integrate PII detection into your applications.

🚀 Features

🔍 PII Detection: Detects emails, phone numbers, names, addresses, SSNs, and more.

🧠 Hybrid Detection Engine: Combines rule-based regex detectors with NLP-based detectors (spaCy).

🛡 Anonymization Options: Supports five anonymization methods: mask, redact, hash, replace, and synthetic.

⚙️ Configurable Settings: Fine-tune detection thresholds, enable/disable PII types, and customize output.

📦 Batch Processing: Run detection and anonymization across multiple documents at once.

🧪 Test Suite: Built-in test runner to verify core functionality, configuration, and individual detectors.

📦 Installation

git clone https://github.com/NimeeshS/platypii.git

cd platypii

pip install -r requirements.txt

⚠️ Make sure you're using Python 3.7+.

🧠 How It Works

PlatyPII orchestrates multiple detectors, each of which scans the text for a specific kind of PII. It then applies the selected anonymization method to redact or transform the matched values.

Architecture Overview:

Your Text → PIIEngine → Detectors → Anonymized Text
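The flow above can be sketched in a few lines. This is a minimal illustration of the orchestration idea, not PlatyPII's actual implementation: the `Match` dataclass, the `regex_detector` helper, the `run_pipeline` function, and the two simplified patterns are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Match:
    """One detected span of PII (hypothetical structure)."""
    pii_type: str
    value: str
    start: int
    end: int


# A detector is any callable that maps text to a list of matches.
Detector = Callable[[str], List[Match]]


def regex_detector(pii_type: str, pattern: str) -> Detector:
    """Build a detector from a regular expression (hypothetical helper)."""
    compiled = re.compile(pattern)

    def detect(text: str) -> List[Match]:
        return [Match(pii_type, m.group(), m.start(), m.end())
                for m in compiled.finditer(text)]

    return detect


# Deliberately simplified patterns, for illustration only.
DETECTORS = [
    regex_detector("email", r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    regex_detector("ssn", r"\b\d{3}-\d{2}-\d{4}\b"),
]


def run_pipeline(text: str, detectors: List[Detector] = DETECTORS) -> str:
    """Run every detector, then redact the matches from right to left."""
    matches = [m for detect in detectors for m in detect(text)]
    # Replacing from the end keeps the offsets of earlier matches valid.
    for m in sorted(matches, key=lambda m: m.start, reverse=True):
        text = text[:m.start] + "[REDACTED]" + text[m.end:]
    return text


print(run_pipeline("Mail alice@example.com, SSN 123-45-6789."))
```

The sketch does not resolve overlapping matches from different detectors, which a real engine would need to handle before rewriting the text.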

🧪 Running Tests

You can verify the functionality of the system using the provided test.py script:

python3 test.py

Test Coverage:

✅ Basic PII detection + anonymization

✅ Batch document processing

✅ Configuration system

✅ Individual detector testing (Regex + NLP)

🛠 Usage

Detecting and Masking PII

from platypii import detect_pii, mask_pii

text = "Email me at alice@example.com or call 123-456-7890."
matches = detect_pii(text)

for match in matches:
    print(f"{match.pii_type}: {match.value} (confidence: {match.confidence:.2f})")

masked = mask_pii(text)
print(masked)

Using the PIIEngine

from platypii.core.engine import PIIEngine

engine = PIIEngine()
result = engine.process_text("SSN: 123-45-6789", anonymize=True, method="redact")

print(result["anonymized_text"])

🔧 Configuration

The system is configurable via the Config class:

from platypii.config import Config

config = Config()
print(config.get("detection.confidence_threshold"))  # Default threshold
config.set("detection.confidence_threshold", 0.9)    # Update threshold
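Dotted keys such as `detection.confidence_threshold` suggest a nested settings structure. As a rough sketch of how such lookups can work, assuming a plain nested dictionary underneath (the `DottedConfig` class and the starting value of 0.5 are hypothetical and are not taken from PlatyPII's `Config`):

```python
from typing import Any, Dict, Optional


class DottedConfig:
    """Toy nested config addressed by dotted keys (hypothetical)."""

    def __init__(self, data: Optional[Dict[str, Any]] = None) -> None:
        self._data: Dict[str, Any] = data if data is not None else {}

    def get(self, key: str, default: Any = None) -> Any:
        """Walk the nested dicts one key segment at a time."""
        node: Any = self._data
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default
            node = node[part]
        return node

    def set(self, key: str, value: Any) -> None:
        """Set a value, creating intermediate sections on demand."""
        *parents, leaf = key.split(".")
        node = self._data
        for part in parents:
            # Assumes any existing intermediate value is itself a dict.
            node = node.setdefault(part, {})
        node[leaf] = value


config = DottedConfig({"detection": {"confidence_threshold": 0.5}})
config.set("detection.confidence_threshold", 0.9)
print(config.get("detection.confidence_threshold"))
```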

🧩 Supported Anonymization Methods

| Method | Description |
| --- | --- |
| mask | Replaces characters with * |
| redact | Replaces values with [REDACTED] |
| hash | Replaces values with SHA-256 hash |
| replace | Replaces with dummy placeholder values |
| synthetic | Replaces with realistic synthetic data (optional) |
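To make the first three methods concrete, here is a small sketch using only the standard library. It is not PlatyPII's code: the function names are hypothetical, and details such as masking every character or emitting the full hexadecimal digest are assumptions.

```python
import hashlib


def mask(value: str, mask_char: str = "*") -> str:
    """Replace every character with the mask character."""
    return mask_char * len(value)


def redact(value: str) -> str:
    """Replace the whole value with a fixed marker."""
    return "[REDACTED]"


def sha256_hash(value: str) -> str:
    """Replace the value with its SHA-256 digest in hexadecimal."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


ssn = "123-45-6789"
print(mask(ssn))
print(redact(ssn))
print(sha256_hash(ssn))
```

Note that an unsalted hash of a low-entropy value such as an SSN can be reversed by brute force, so hashing alone may not be enough where re-identification is a concern.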
