nicolasiscoding

Summary

Fixes #99 - Images with width/height HTML attributes now render at the specified dimensions instead of always using actual image size.

Problem

When HTML contains images with explicit width/height attributes (common from WYSIWYG editors like TinyMCE), these dimensions were being ignored in favor of the actual image dimensions.

Solution

  • Added checks for vNode.properties.width and vNode.properties.height
  • Use HTML-specified dimensions when present
  • Fall back to actual image dimensions when attributes are not specified
  • Also fixed ESLint configuration to prevent parent repo config interference

Example

<img src="image.png" width="100" height="100">

Will now render as 100x100 in the DOCX, regardless of the actual image size.
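
The fallback described under Solution can be sketched roughly as follows. This is a minimal illustration, not the actual `buildImage` code: the helper name `resolveImageSize` and the shape of `actualSize` are assumptions, and only unitless pixel values are handled here.

```javascript
// Hypothetical helper (not from the repository): choose the HTML-specified
// size when present, otherwise fall back to the measured image size.
function resolveImageSize(vNode, actualSize) {
  const props = (vNode && vNode.properties) || {};
  const width = Number.parseInt(props.width, 10);
  const height = Number.parseInt(props.height, 10);
  return {
    // parseInt(undefined) is NaN, so a missing attribute falls through
    width: Number.isFinite(width) && width > 0 ? width : actualSize.width,
    height: Number.isFinite(height) && height > 0 ? height : actualSize.height,
  };
}

// <img src="image.png" width="100" height="100"> on a 640x480 image
const sized = resolveImageSize(
  { properties: { src: 'image.png', width: '100', height: '100' } },
  { width: 640, height: 480 }
);
console.log(sized); // { width: 100, height: 100 }
```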

Changes

  • Modified buildImage function in src/helpers/render-document-file.js
  • Added root: true to .eslintrc.json to isolate submodule ESLint config

Testing

  • Test with images that have width/height attributes
  • Test with images without dimensions (should use actual size)
  • Verify no regression in existing functionality

🤖 Generated with Claude Code

nicolasiscoding self-assigned this on Aug 24, 2025
nicolasiscoding added the bug (Something isn't working) label on Aug 24, 2025
- `preProcessing` <?[Object]>
  - `skipHTMLMinify` <?[Boolean]> flag to skip minification of HTML. Defaults to `false`.
- `imageProcessing` <?[Object]>
  - `maxRetries` <?[Number]> maximum number of retry attempts for failed image downloads. Defaults to `2`.
Do the TypeScript typings as well.

nicolasiscoding force-pushed the fix/honor-html-image-dimensions branch from 684762c to f030781 on September 12, 2025 17:12
Srajan-Sanjay-Saxena marked this pull request as ready for review on September 15, 2025 16:06
nicolasiscoding force-pushed the fix/honor-html-image-dimensions branch from 96d6007 to c5eccae on September 26, 2025 13:20
nicolasiscoding and others added 14 commits September 26, 2025 11:41
- Check for width/height attributes in vNode.properties
- Use HTML-specified dimensions when available
- Fall back to actual image dimensions when not specified
- Particularly important for WYSIWYG editors like TinyMCE
- Added root:true to ESLint config to prevent parent config interference

Fixes #99

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add HTML attribute processing in computeImageDimensions function
- HTML attributes without units default to pixels (e.g., width="100" → "100px")
- Support aspect ratio preservation when only width or height is specified
- Fix fallback logic that was overriding HTML attributes with original dimensions
- Add comprehensive test cases in example files

Fixes issue where TinyMCE image width/height attributes were ignored,
causing all images to render at original size instead of specified dimensions.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
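
The pixel default and aspect-ratio behaviour described in the commit above might look something like this. It is a sketch under assumptions: `withDefaultUnit` and `fillMissingDimension` are illustrative names, not functions from `computeImageDimensions`, and dimensions are treated as plain pixel numbers.

```javascript
// Hypothetical helper: a bare number such as "100" is treated as pixels.
function withDefaultUnit(value) {
  const text = String(value).trim();
  return /^\d+(\.\d+)?$/.test(text) ? `${text}px` : text;
}

// Hypothetical helper: derive the missing side from the original aspect ratio.
function fillMissingDimension({ width, height }, original) {
  const ratio = original.width / original.height;
  if (width != null && height == null) {
    return { width, height: Math.round(width / ratio) };
  }
  if (height != null && width == null) {
    return { width: Math.round(height * ratio), height };
  }
  if (width == null && height == null) {
    return { ...original }; // nothing specified: keep the original size
  }
  return { width, height }; // both specified: use them as given
}

console.log(withDefaultUnit('100')); // "100px"
console.log(fillMissingDimension({ width: 100 }, { width: 400, height: 200 }));
// { width: 100, height: 50 }
```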
- Extend calculateAbsoluteValues to support all units: px, pt, cm, in, %
- Update HTML attribute processing to detect all unit types
- Add comprehensive test cases covering all supported units:
  * Explicit pixel units (180px x 90px)
  * Point units (144pt x 72pt)
  * Centimeter units (4cm x 2cm)
  * Inch units (1.5in x 0.75in)
  * Percentage units (10% x 10%)
  * Mixed units (3cm width, 1in height)

This ensures full compatibility with TinyMCE and other rich text editors
that may specify dimensions in various measurement units.

Test cases added to both example-node.js and example.js files for
comprehensive validation of unit conversion accuracy.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
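
For illustration, the unit handling listed in the commit above can be approximated as below. This is not the actual `calculateAbsoluteValues` implementation; `toEmu` is a hypothetical name, the output unit is assumed to be EMUs (914400 per inch, as used by DOCX drawings), pixels are assumed to be 96 per inch, and percentages are resolved against a caller-supplied reference size.

```javascript
const EMU_PER_INCH = 914400;
const EMU_PER_UNIT = {
  px: EMU_PER_INCH / 96, // 9525, assuming 96 px per inch
  pt: EMU_PER_INCH / 72, // 12700
  in: EMU_PER_INCH,
  cm: EMU_PER_INCH / 2.54, // 360000
};

// Hypothetical converter: "180px", "144pt", "4cm", "1.5in", "10%" or a bare number.
function toEmu(value, referenceEmu) {
  const match = /^(\d*\.?\d+)(px|pt|cm|in|%)?$/.exec(String(value).trim());
  if (!match) return null; // unsupported or malformed value
  const amount = Number.parseFloat(match[1]);
  const unit = match[2] || 'px'; // bare numbers default to pixels
  if (unit === '%') return Math.round((amount / 100) * referenceEmu);
  return Math.round(amount * EMU_PER_UNIT[unit]);
}

console.log(toEmu('144pt')); // 1828800
console.log(toEmu('4cm')); // 1440000
console.log(toEmu('1.5in')); // 1371600
```

Rounding at the end keeps the result an integer, which is what DOCX extent attributes expect.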
- Add validation to detect when image URLs return HTML error pages instead of image data
- Add comprehensive buffer validation before calling sizeOf function
- Prevent "unsupported file type: undefined" errors with proper error detection
- Add graceful handling for invalid/corrupted image responses
- Provide clearer error messages for debugging image processing issues

This fixes the cryptic "unsupported file type: undefined (file: undefined)"
errors by detecting when URLs return HTML error pages (common with Wikimedia)
and providing meaningful error messages instead.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
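
One way to picture the validation described in the commit above is a check that runs before the buffer reaches `sizeOf`. The function names and the exact heuristics here are assumptions for illustration; the real checks may be stricter.

```javascript
// Hypothetical check: does the response body start like an HTML document?
function looksLikeHtml(buffer) {
  const head = buffer.subarray(0, 256).toString('utf8').trimStart().toLowerCase();
  return head.startsWith('<!doctype html') || head.startsWith('<html');
}

// Hypothetical validator: returns a reason instead of letting sizeOf throw
// "unsupported file type: undefined".
function validateImageBuffer(buffer) {
  if (!Buffer.isBuffer(buffer) || buffer.length === 0) {
    return { ok: false, reason: 'empty or non-buffer response' };
  }
  if (looksLikeHtml(buffer)) {
    return { ok: false, reason: 'server returned an HTML page instead of image data' };
  }
  return { ok: true };
}

console.log(validateImageBuffer(Buffer.from('<!DOCTYPE html><html><body>429</body></html>')));
// { ok: false, reason: 'server returned an HTML page instead of image data' }
```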
…d rate limiting

- Implement in-memory image cache with Map to store successful downloads
- Cache both successful downloads and failures to prevent retry spam within same document
- Clear cache between document generations to allow retry of failed URLs in new runs
- Add comprehensive cache statistics and logging for monitoring performance
- Prevent rate limiting by avoiding duplicate downloads of same image URLs
- Smart retry logic: cache failures per document, but allow retries across documents

Cache statistics show significant performance improvement:
- Only unique URLs are downloaded once per document generation
- Duplicate image references use cached data instantly
- Failed downloads are cached to prevent retry storms within same document
- Fresh attempts allowed for new document generations

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
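
The caching behaviour in the commit above (cache successes and failures per document, then clear) could be sketched like this. `ImageCache` and its method names are hypothetical, and `download` stands in for whatever function actually fetches the image.

```javascript
// Hypothetical per-document cache keyed by image URL.
class ImageCache {
  constructor() {
    this.entries = new Map();
    this.stats = { hits: 0, misses: 0 };
  }

  async get(url, download) {
    if (this.entries.has(url)) {
      this.stats.hits += 1;
      const cached = this.entries.get(url);
      if (cached.error) throw cached.error; // cached failure: do not retry in this document
      return cached.data;
    }
    this.stats.misses += 1;
    try {
      const data = await download(url);
      this.entries.set(url, { data });
      return data;
    } catch (error) {
      this.entries.set(url, { error });
      throw error;
    }
  }

  // Called between document generations so failed URLs get a fresh attempt.
  clear() {
    this.entries.clear();
    this.stats = { hits: 0, misses: 0 };
  }
}
```

Caching the failure as well as the success is what prevents a retry storm when the same broken URL appears many times in one document.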
- Add comprehensive buffer validation before sizeOf calls in all locations
- Add HTML response detection to prevent "unsupported file type" errors
- Replace crashes with graceful error handling and continue processing
- Add retry mechanism with exponential backoff for image downloads
- Implement intelligent image caching to prevent rate limiting
- Clear cache between document generations to allow fresh retry attempts
- Add detailed logging for debugging image processing issues

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
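
As a rough picture of the retry mechanism mentioned in the commit above, the following sketch retries a task with exponentially growing delays. The name `retryWithBackoff`, the base delay, and the doubling factor are assumptions; the PR only states that retries use exponential backoff and that `maxRetries` defaults to 2.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: one initial attempt plus up to maxRetries retries.
async function retryWithBackoff(task, maxRetries = 2, baseDelayMs = 500) {
  let lastError;
  for (let attempt = 0; attempt <= maxRetries; attempt += 1) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < maxRetries) {
        // Delay doubles each time: baseDelayMs, 2x, 4x, ...
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError; // all attempts failed
}
```

With the defaults, a download that keeps failing is attempted three times in total, waiting 500 ms and then 1000 ms between attempts.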
- Add imageProcessing config section to defaultDocumentOptions in constants.js
- Make maxRetries configurable (default: 2) via documentOptions.imageProcessing.maxRetries
- Add verboseLogging option (default: false) for conditional debug output
- Remove duplicate constants - all reference centralized defaults from constants.js
- Update buildImage, convertVTreeToXML, and findXMLEquivalent to accept imageOptions
- Replace console.log with conditional logVerbose helper function
- Add comprehensive test script demonstrating configuration options
- Users can now configure: { imageProcessing: { maxRetries: 3, verboseLogging: true } }

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
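
The configuration flow in the commit above can be illustrated with a small sketch. The default values come from the commit message; `resolveImageOptions` is a hypothetical name, and the signature of `logVerbose` is an assumption since the message names the helper but not its parameters.

```javascript
// Defaults as stated in the commit message.
const defaultImageProcessing = { maxRetries: 2, verboseLogging: false };

// Hypothetical merge step: user options override the centralized defaults.
function resolveImageOptions(documentOptions = {}) {
  return { ...defaultImageProcessing, ...(documentOptions.imageProcessing || {}) };
}

// Assumed shape of the conditional logger: silent unless verboseLogging is set.
function logVerbose(imageOptions, ...args) {
  if (imageOptions.verboseLogging) {
    console.log('[image]', ...args);
  }
}

const imageOptions = resolveImageOptions({
  imageProcessing: { maxRetries: 3, verboseLogging: true },
});
logVerbose(imageOptions, 'retry budget:', imageOptions.maxRetries);
```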
- Add data URL detection in buildParagraph functions to handle cached images
- Prevent redundant imageToBase64 calls on already-processed data URLs
- Remove duplicate data URL parsing logic in buildParagraph
- Both buildParagraph sections now check if imageSource starts with 'data:'
- Resolves issue where cached images were being reprocessed causing sizeOf errors
- Images are now processed consistently across all code paths

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
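
The data URL check described in the commit above amounts to a guard in front of the conversion step. A minimal sketch, with `isDataUrl` and `ensureDataUrl` as hypothetical names and `convert` standing in for the real base64 conversion:

```javascript
// An image source that is already a data URL needs no further processing.
function isDataUrl(source) {
  return typeof source === 'string' && source.startsWith('data:');
}

// Hypothetical guard: only call the converter for sources not yet processed.
async function ensureDataUrl(imageSource, convert) {
  if (isDataUrl(imageSource)) {
    return imageSource; // cached or inline image: skip reprocessing
  }
  return convert(imageSource);
}
```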
- Document maxRetries and verboseLogging configuration options
- Add practical TypeScript example showing image processing configuration
- Include options in main API documentation section
- Provide clear defaults and usage examples for new image processing features

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- buildImage now updates vNode.properties.src with cached/converted data URLs
- Eliminates dual processing of the same image through different code paths
- Prevents buildParagraph from reprocessing already-cached images
- Resolves "unsupported file type: undefined" errors completely
- All image processing paths now see consistent, processed data URLs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add imageProcessing interface with maxRetries and verboseLogging properties
- Keep preprocessing lowercase to match actual implementation
- TypeScript definitions now correctly reflect the implementation in constants.js

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…ge processing

ISSUE 1 - Duplicate Buffer Creation (render-document-file.js):
- BEFORE: Buffer.from(response.fileContent, 'base64') called twice
  * For ZIP file creation
  * For image size analysis
- AFTER: Create imageBuffer once, reuse for both operations
- IMPACT: Reduces memory allocation and CPU usage for base64 decoding

ISSUE 2 - Code Duplication (render-document-file.js):
- BEFORE: Identical lineRule attribute setting code in two locations
  * figure > img case
  * direct img case
- AFTER: Extract into addLineRuleToImageFragment() helper function
- IMPACT: DRY principle, single source of truth for lineRule logic

ISSUE 3 - Double Regex Execution (docx-document.js):
- BEFORE: matches[1].match(/\/(.*?)$/) called twice in same expression
- AFTER: Execute once, store in mimeTypePart variable, reuse result
- IMPACT: Eliminates redundant regex execution and potential null reference

Applied to Nicolas's html-honor branch with retry/caching functionality intact.
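
ISSUE 3 above can be illustrated with a short sketch. The inner regex and the `mimeTypePart` variable come from the commit message; the outer data URL pattern and the function name `extensionFromDataUrl` are assumptions, not the code in docx-document.js.

```javascript
// Hypothetical helper: extract the subtype ("png") from a base64 data URL.
function extensionFromDataUrl(dataUrl) {
  const matches = /^data:([A-Za-z0-9+.\/-]+);base64,(.+)$/.exec(dataUrl);
  if (!matches) return null;
  // Run the subtype regex once and reuse the result, which also lets us
  // guard against a null match instead of indexing into it blindly.
  const mimeTypePart = matches[1].match(/\/(.*?)$/);
  return mimeTypePart ? mimeTypePart[1] : null;
}

console.log(extensionFromDataUrl('data:image/png;base64,AAAA')); // "png"
```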
…controls

- Replace image-to-base64 library with axios for better control over HTTP requests
- Add configurable timeout (default 5s) to prevent hung image downloads
- Add maximum image size limits (default 10MB) to prevent memory issues
- Implement exponential backoff on timeouts for retry attempts
- Add proper error handling for HTTP status codes and network errors
- Update TypeScript definitions and documentation

Security improvements:
- Prevents DoS attacks from slow/unresponsive image servers
- Bounded resource usage with size and timeout limits
- Better error categorization (timeout vs network vs HTTP errors)

DRY improvements:
- Extracted downloadImageToBase64 to src/utils/image.js
- Eliminated duplicate function definitions between helper files
- Added input validation for timeout/size parameters
- Enhanced error messages with structured logging

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
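
The input validation and error categorization listed in the commit above might be sketched as follows. None of this is taken from src/utils/image.js: the option names `timeout` and `maxSize`, both function names, and the interpretation of 10MB as 10 * 1024 * 1024 bytes are assumptions. The error shape follows the common axios convention (a `response` object for HTTP errors, `code` set to `ECONNABORTED` on timeout), so plain objects are used here to keep the sketch dependency-free.

```javascript
const DEFAULT_TIMEOUT_MS = 5000; // "default 5s" from the commit message
const DEFAULT_MAX_BYTES = 10 * 1024 * 1024; // "default 10MB", assumed binary megabytes

// Hypothetical validation: fall back to defaults for invalid values.
function validateDownloadOptions({ timeout = DEFAULT_TIMEOUT_MS, maxSize = DEFAULT_MAX_BYTES } = {}) {
  const isPositiveInt = (n) => Number.isInteger(n) && n > 0;
  return {
    timeout: isPositiveInt(timeout) ? timeout : DEFAULT_TIMEOUT_MS,
    maxSize: isPositiveInt(maxSize) ? maxSize : DEFAULT_MAX_BYTES,
  };
}

// Hypothetical categorization: timeout vs HTTP vs network errors.
function categorizeDownloadError(error) {
  if (error.code === 'ECONNABORTED' || error.code === 'ETIMEDOUT') return 'timeout';
  if (error.response) return `http_${error.response.status}`;
  return 'network';
}

console.log(categorizeDownloadError({ response: { status: 404 } })); // "http_404"
```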
nicolasiscoding force-pushed the fix/honor-html-image-dimensions branch from 8155dcd to 7fda70e on September 26, 2025 15:43
2 participants