WebMD (Web to Markdown) transforms web pages into clean, portable Markdown documents with surgical precision. This Chrome extension extracts content intelligently, preserves formatting, and exports to GitHub Flavored Markdown instantly.
Press Cmd+Shift+K (Mac) or Ctrl+Shift+K (Windows/Linux) to diagnose any webpage and get the cure: clean, portable Markdown.
- Smart Article Extraction: Uses Mozilla's Readability.js to intelligently identify and extract main content
- Surgical Precision: Preserves formatting with perfect fidelity:
  - Headings (H1-H6)
  - Lists (ordered, unordered, nested)
  - Tables with proper alignment
  - Code blocks with syntax highlighting hints
  - Links, images, and embedded content
- GitHub Flavored Markdown: Full support for:
  - Tables with column alignment
  - Task lists with checkboxes
  - Strikethrough text
  - Fenced code blocks
- Metadata Extraction: Automatically captures:
  - Page title and author
  - Publication date (when available)
  - Source URL
  - Site name
- Instant Actions:
  - One-click copy to clipboard
  - Download as `.md` file
  - Auto-copy option available
- Privacy-First: All processing happens locally in your browser - no data is ever sent to external servers
To install the extension from source:
- Clone or download this repository
- Run the setup script to download dependencies: `./setup.sh`
- Create or convert the icon files to PNG format
- Open Chrome and navigate to `chrome://extensions/`
- Enable "Developer mode" in the top right
- Click "Load unpacked" and select this directory
- The extension is now installed!
To convert a page:
- Navigate to any web page
- Press Cmd+Shift+K (Mac) or Ctrl+Shift+K (Windows/Linux)
- A new tab opens with the converted Markdown
- Click "Copy to Clipboard" or "Download" as needed
Alternatively, click the extension icon and press "Convert Current Page".
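As a rough illustration of what happens behind the shortcut, a Manifest V3 service worker can listen for the command and inject the converter scripts only when asked. This is a minimal sketch, not the extension's actual source: the command name `convert-page` and the file names in `INJECTED_FILES` are hypothetical, and `chromeApi` is passed in as a parameter so the handler can be exercised outside the browser.

```javascript
// Sketch of a Manifest V3 service worker (background script).
// ASSUMPTIONS: the command name and the injected file names are hypothetical;
// the real extension may use different ones.
const COMMAND_NAME = "convert-page";
const INJECTED_FILES = ["lib/Readability.js", "lib/turndown.js", "content.js"];

// Internal pages such as chrome://extensions/ cannot be scripted by extensions.
function isConvertibleUrl(url) {
  return typeof url === "string" && /^(https?|file):/i.test(url);
}

// `chromeApi` is injected for testability; in the extension, pass the global `chrome`.
function registerShortcut(chromeApi) {
  chromeApi.commands.onCommand.addListener(async (command, tab) => {
    // tab.url is only populated when the extension holds activeTab or tabs permission.
    if (command !== COMMAND_NAME || !tab || !isConvertibleUrl(tab.url)) return;
    // Programmatic injection: scripts are loaded on demand, not on every page.
    await chromeApi.scripting.executeScript({
      target: { tabId: tab.id },
      files: INJECTED_FILES,
    });
  });
}

// Only register when running inside the browser.
if (typeof chrome !== "undefined" && chrome.commands) {
  registerShortcut(chrome);
}
```

Because injection happens at command time, the handler also works on tabs that were already open before the extension was installed.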
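WebMD converts web content in two stages: it first extracts the main content (diagnosis, extraction, and a fallback for non-article pages), then converts the result to Markdown (treatment):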
- Initial Diagnosis: Examines the page structure using Mozilla's Readability.js
  - Identifies article content vs. navigation/ads
  - Extracts metadata and authorship information
  - Determines the main content boundaries
- Content Extraction: Surgically removes the main content from surrounding clutter
  - Preserves semantic HTML structure
  - Removes scripts, styles, and hidden elements
  - Maintains content hierarchy and relationships
- Intelligent Fallback: For non-article pages, performs smart full-page conversion
  - Searches for main content areas (main, article, [role="main"])
  - Falls back to body content when needed
  - Cleans up navigation and footer elements
- Treatment: Applies Turndown.js with GitHub Flavored Markdown
  - Custom rules for code block preservation
  - Smart table formatting
  - Maintains link references
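The steps above can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the extension's actual code: the function names `extractHtml` and `toMarkdown` are hypothetical, and the `Readability`, `TurndownService`, and GFM plugin objects are passed in as parameters (in the extension they would come from the bundled libraries). The library calls used are `new Readability(doc).parse()`, `new TurndownService(options)`, `.use(plugin)`, and `.turndown(html)`.

```javascript
// Selector list taken from the fallback description above.
const FALLBACK_SELECTOR = 'main, article, [role="main"]';

// Extraction: try Readability first, then fall back to a main-content container.
function extractHtml(doc, Readability, useReadability = true) {
  if (useReadability) {
    // Readability modifies the DOM it is given, so parse a clone.
    const article = new Readability(doc.cloneNode(true)).parse();
    // parse() returns null when no article could be identified.
    if (article && article.content) {
      return { title: article.title, html: article.content, source: "readability" };
    }
  }
  // Fallback: prefer a main-content area, else the whole body.
  const main = doc.querySelector(FALLBACK_SELECTOR) || doc.body;
  return { title: doc.title, html: main.innerHTML, source: "fallback" };
}

// Conversion: HTML to GitHub Flavored Markdown via Turndown.
function toMarkdown(html, TurndownService, gfmPlugin) {
  const service = new TurndownService({
    headingStyle: "atx",      // "# Heading" rather than underlined headings
    codeBlockStyle: "fenced", // fenced blocks rather than indented code
  });
  if (gfmPlugin) service.use(gfmPlugin); // tables, strikethrough, task lists
  return service.turndown(html);
}
```

In a content script, a call might look like `toMarkdown(extractHtml(document, Readability).html, TurndownService, turndownPluginGfm.gfm)`, assuming those globals are provided by the injected library files.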
Access settings through the extension popup:
- Use Readability for articles: Enable/disable article extraction
- Include metadata: Add YAML frontmatter with page metadata
- Auto-copy to clipboard: Automatically copy result when conversion completes
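To make the "Include metadata" option concrete, here is a minimal sketch of how YAML frontmatter could be prepended to the converted Markdown. The setting keys, their default values, and the frontmatter field names (`title`, `author`, `date`, `url`, `site`) are assumptions chosen for illustration; the extension's real keys may differ.

```javascript
// Hypothetical defaults mirroring the three popup options above.
const DEFAULT_SETTINGS = {
  useReadability: true,
  includeMetadata: true,
  autoCopy: false,
};

// Build a YAML frontmatter block, skipping fields that are missing or empty.
function buildFrontMatter(meta) {
  const fields = ["title", "author", "date", "url", "site"]; // assumed key names
  const lines = fields
    .filter((key) => meta[key] != null && meta[key] !== "")
    // JSON string syntax is valid YAML, so this safely escapes quotes and colons.
    .map((key) => `${key}: ${JSON.stringify(String(meta[key]))}`);
  return lines.length ? `---\n${lines.join("\n")}\n---\n\n` : "";
}

// Prepend the frontmatter only when the option is enabled.
function withMetadata(markdown, meta, settings = DEFAULT_SETTINGS) {
  return settings.includeMetadata ? buildFrontMatter(meta) + markdown : markdown;
}
```

For example, `withMetadata("# Hello", { title: "Hello", url: "https://example.com/post" })` yields a document that starts with a `---` block containing `title: "Hello"` and `url: "https://example.com/post"`, followed by the Markdown body.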
Architecture:
- Manifest V3: Built on Chrome's latest extension platform for security and performance
- Service Worker: Background script handles commands and tab management
- Content Scripts: Injected on-demand for better performance and compatibility
- Programmatic Injection: Works on all pages, including those loaded before installation
- Content Security: Respects page CSP policies

Privacy:
- Local Processing: All conversion happens in your browser
- No External Requests: Zero network traffic for conversion
- No Tracking: No analytics or telemetry

Performance:
- Lazy Loading: Scripts injected only when needed
- Efficient Processing: Handles large documents without UI blocking
- Memory Management: Cleans up resources after conversion
- Optimized Rendering: Fast display of converted content

Dependencies:
- Readability.js (v0.4.4) - Article extraction
- Turndown (v7.1.2) - HTML to Markdown conversion
- Turndown GFM Plugin (v1.0.2) - GitHub Flavored Markdown support
We welcome contributions! Please see our Contributing Guide for details on:
- Setting up the development environment
- Code style and standards
- Submitting pull requests
- Reporting issues
This project is licensed under the MIT License - see the LICENSE file for details.
Thanks to the projects this extension builds on:
- Mozilla Readability for intelligent content extraction
- Turndown for excellent HTML to Markdown conversion
- Turndown GFM Plugin for GitHub Flavored Markdown support
Made with ❤️ for better web content portability