A platform for automated aggregation, curation, and publication of AI knowledge. It combines an Astro frontend with a Python content pipeline that ingests, normalizes, deduplicates, enriches, and publishes content, so the published resources stay continuously updated.
- Node.js 18+ and npm 8+
- Python 3.9+ with pip
- PostgreSQL 14+
- Redis 7+
- Git
# Clone the repository (private - requires authentication)
git clone https://github.com/gianlucamazza/website_ai-knowledge.git
cd website_ai-knowledge
# Install dependencies
make install
# Set up environment
cp .env.example .env
# Edit .env with your configuration
# Initialize database
make db-setup
# Start development services
make dev
Access the site at http://localhost:4321
website_ai-knowledge/
├── apps/site/ # Astro frontend application
│ ├── src/
│ │ ├── components/ # Reusable UI components
│ │ ├── content/ # Content collections (articles, glossary)
│ │ ├── layouts/ # Page layout templates
│ │ └── pages/ # Route definitions
│ └── tests/ # Frontend tests
├── pipelines/ # Python content processing pipeline
│ ├── ingest/ # Content source ingestion
│ ├── normalize/ # Data cleaning and standardization
│ ├── dedup/ # Duplicate detection algorithms
│ ├── enrich/ # Content enhancement
│ ├── publish/ # Output generation
│ └── orchestrators/ # LangGraph workflow management
├── security/ # Security modules and compliance
├── tests/ # Python test suite
├── scripts/ # Automation and utility scripts
└── docs/ # Comprehensive documentation
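The `pipelines/` layout above suggests a staged flow (ingest, normalize, dedup, enrich, publish). As a rough illustration only, the sketch below chains plain Python functions in that order. The `Document` shape, the stage function names, and `run_pipeline` are hypothetical and are not taken from the repository; the actual orchestration is described as LangGraph-based, which this sketch does not use.

```python
from typing import Callable, Dict, List

# Hypothetical document shape: a flat dict of string fields.
Document = Dict[str, str]
Stage = Callable[[List[Document]], List[Document]]


def normalize(docs: List[Document]) -> List[Document]:
    # Collapse runs of whitespace and trim each document body.
    return [{**d, "text": " ".join(d["text"].split())} for d in docs]


def dedup(docs: List[Document]) -> List[Document]:
    # Exact-match dedup on case-folded text, keeping the first occurrence.
    # This is a stand-in; the project describes SimHash/LSH for this stage.
    seen = set()
    unique = []
    for d in docs:
        key = d["text"].casefold()
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique


def enrich(docs: List[Document]) -> List[Document]:
    # Attach a trivial derived field as a placeholder for summarization/tagging.
    return [{**d, "word_count": str(len(d["text"].split()))} for d in docs]


def run_pipeline(docs: List[Document], stages: List[Stage]) -> List[Document]:
    # Feed the output of each stage into the next one, in order.
    for stage in stages:
        docs = stage(docs)
    return docs


raw = [
    {"url": "https://example.com/a", "text": "  Attention is   all you need. "},
    {"url": "https://example.com/b", "text": "attention is all you need."},
]
result = run_pipeline(raw, [normalize, dedup, enrich])
```

Keeping each stage as a function from a list of documents to a list of documents makes the stages independently testable and easy to reorder.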
- Automated Ingestion: Ethical web scraping with rate limiting and robots.txt compliance
- Duplicate Detection: SimHash and LSH (locality-sensitive hashing) algorithms with a target precision above 98%
- Content Enrichment: AI-powered summarization, tagging, and cross-linking
- Quality Assurance: Multi-stage validation and schema compliance
- Workflow Orchestration: LangGraph-based pipeline management
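To make the "robots.txt compliance" and "rate limiting" points above concrete, here is a minimal sketch using only the Python standard library. It is not the project's ingestion code: the inline `ROBOTS_TXT`, the `ai-knowledge-bot` user agent, and the `MinIntervalLimiter` class are assumptions made for illustration. A real crawler would fetch each site's robots.txt rather than parse an inline string.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example runs offline.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

USER_AGENT = "ai-knowledge-bot"  # hypothetical user agent string


def build_parser(robots_txt: str) -> RobotFileParser:
    # Parse robots.txt rules from text instead of fetching them over HTTP.
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class MinIntervalLimiter:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._last = None        # time of the previous request, if any

    def wait(self) -> float:
        # Sleep just long enough to honor the minimum interval,
        # then record the time of this request. Returns the delay applied.
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval_s - elapsed)
        if delay > 0:
            self._sleep(delay)
        self._last = self._clock()
        return delay


parser = build_parser(ROBOTS_TXT)
public_ok = parser.can_fetch(USER_AGENT, "https://example.com/articles/intro")
private_ok = parser.can_fetch(USER_AGENT, "https://example.com/private/drafts")
```

In a fetch loop, one would call `limiter.wait()` before each request and skip any URL for which `parser.can_fetch(...)` returns `False`.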
- Static Site Generation: Optimized Astro-based site with excellent performance
- Content Collections: Zod-validated content with structured metadata
- Search & Navigation: Full-text search and intelligent content discovery
- Responsive Design: Mobile-first design with accessibility compliance
- SEO Optimization: Structured data and meta tag management
- Security: Zero-trust architecture with comprehensive input validation
- Monitoring: Prometheus metrics, structured logging, and alerting
- Scalability: Horizontal scaling with Kubernetes deployment
- Compliance: GDPR, copyright, and ethical AI compliance
- CI/CD: Automated testing, quality gates, and deployment pipelines
- Development Guide - Local development setup and workflow
- Deployment Guide - Production deployment procedures
- Architecture Overview - Comprehensive system architecture
- API Documentation - Pipeline API reference
- Code Standards - Code quality and style guidelines
- CI/CD Documentation - Continuous integration and deployment
- Monitoring Guide - System monitoring and alerting
- Troubleshooting - Common issues and solutions
- Maintenance Schedule - Regular maintenance tasks
- Security Overview - Security architecture and practices
- Incident Response - Security incident procedures
- Compliance Guide - Regulatory compliance procedures
- Framework: Astro 4.x with TypeScript
- Styling: Tailwind CSS
- Validation: Zod schemas
- Testing: Vitest, Playwright
- Language: Python 3.9+
- Orchestration: LangGraph
- Database: PostgreSQL 14+
- Cache: Redis 7+
- Processing: Celery, FastAPI
- Containerization: Docker
- Orchestration: Kubernetes
- CI/CD: GitHub Actions
- Monitoring: Prometheus, Grafana
- Logging: Structured JSON logging
- Duplicate Detection: >98% precision, <2% false positives
- Pipeline Processing: <30 minutes for full content refresh
- Site Build Time: <5 minutes for incremental builds
- API Response Time: <200ms (95th percentile)
- Uptime: >99.9% availability target
- Test Coverage: >95% code coverage requirement
- Security: Zero critical vulnerabilities policy
- Performance: Core Web Vitals compliance
- Accessibility: WCAG 2.1 AA compliance
- Code Quality: Automated linting and type checking
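The numeric targets above (duplicate-detection precision and 95th-percentile API response time) can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes "false positives" means the share of flagged pairs that are not true duplicates (that is, 1 minus precision), and it uses the nearest-rank definition of a percentile. The function names and sample numbers are made up for the example and are not measurements from this project.

```python
from typing import Sequence


def precision(true_positives: int, false_positives: int) -> float:
    # Share of flagged duplicate pairs that really are duplicates.
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no pairs were flagged")
    return true_positives / flagged


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    # Nearest-rank percentile: smallest value with at least pct% of data at or below it.
    if not values or not 0 < pct <= 100:
        raise ValueError("need non-empty values and 0 < pct <= 100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceil(pct * n / 100)
    return ordered[rank - 1]


# Made-up evaluation numbers: 985 correct flags, 15 incorrect flags.
p = precision(true_positives=985, false_positives=15)
false_positive_share = 1 - p

# Made-up latency samples in milliseconds: 1, 2, ..., 100.
latencies_ms = list(range(1, 101))
p95 = percentile_nearest_rank(latencies_ms, 95)

meets_targets = p > 0.98 and false_positive_share < 0.02 and p95 < 200
```

Under this reading, the ">98% precision" and "<2% false positives" targets are two statements of the same constraint.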
# Trigger content ingestion
make ingest
# Validate existing content
make validate
# Check for broken links
make link-check
# Run duplicate detection
make dedup-check
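# The duplicate check is described as SimHash-based. The following is a generic
# textbook-style SimHash sketch in Python, not the code behind `make dedup-check`.
# The 3-word shingling, the BLAKE2b feature hash, and the `max_distance=3`
# threshold are assumptions chosen for illustration; LSH bucketing is not shown.

```python
import hashlib
import re
from typing import List


def _features(text: str) -> List[str]:
    # Lowercase word tokens, grouped into overlapping 3-word shingles.
    words = re.findall(r"\w+", text.lower())
    if len(words) < 3:
        return words
    return [" ".join(words[i:i + 3]) for i in range(len(words) - 2)]


def simhash(text: str, bits: int = 64) -> int:
    # Each feature votes +1 or -1 on every bit position; the sign of the
    # total vote decides the corresponding bit of the fingerprint.
    votes = [0] * bits
    for feature in _features(text):
        digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=bits // 8).digest()
        h = int.from_bytes(digest, "big")
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = 0
    for i, vote in enumerate(votes):
        if vote > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    # Number of bit positions where the two fingerprints differ.
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a: str, text_b: str, max_distance: int = 3) -> bool:
    # Hypothetical threshold: treat fingerprints within 3 bits as near-duplicates.
    return hamming_distance(simhash(text_a), simhash(text_b)) <= max_distance


a = simhash("Transformers use self-attention to weigh tokens in a sequence.")
b = simhash("transformers use self-attention to weigh tokens in a sequence")
```

# With this tokenizer, texts that differ only in case or punctuation produce
# identical fingerprints, so `a` and `b` above are equal. Texts that share most
# shingles tend to land within a small Hamming distance of each other.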
# Run all tests
make test
# Run specific test suite
make test-unit
make test-integration
make test-performance
# Code quality checks
make lint
make type-check
make security-check
# Deploy to staging
make deploy-staging
# Deploy to production (requires approval)
make deploy-production
# Rollback deployment
make rollback
We welcome contributions from the community. Please read our Contributing Guide for detailed information on:
- Code style and standards
- Development workflow
- Testing requirements
- Pull request process
- Issue reporting
- Documentation: Comprehensive guides in the docs/ directory
- Issue Tracking: GitHub Issues for bug reports and feature requests
- Security Issues: Report to security@example.com (not public issues)
This project is licensed under the MIT License. See LICENSE for details.
- OpenAI for GPT models used in content processing
- Anthropic for Claude models used in summarization
- The open-source community for the excellent tools and libraries
Project Status: Production Ready
Documentation Version: 1.0.0