
v3.2.0


@umarbutler released this on 20 Mar at 04:46

Changed

  • Significantly improved the quality of chunks produced when chunking with low chunk sizes, or when chunking documents with little variation in whitespace, by adding a new rule to the semchunk algorithm: when splitting at single whitespace characters, it now prefers whitespace characters preceded by hierarchically meaningful non-whitespace characters (for example, sentence-ending punctuation) over any single whitespace character in general (#17).
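The rule above can be illustrated with a minimal Python sketch. It is not semchunk's implementation. It assumes the text has already reached the single-whitespace level (no runs of newlines, tabs, or multiple spaces left to split on), and the character groups in `PRECEDING_CHARS` and the function `split_at_single_whitespace` are hypothetical choices made for illustration. It performs a single level of splitting only.

```python
import re

# Hypothetical precedence of "meaningful" characters that may precede a
# single whitespace character (an illustrative assumption, not semchunk's list).
PRECEDING_CHARS = [
    ".?!",  # sentence terminators
    ";,",   # clause separators
    ":",    # sentence interrupters
]


def split_at_single_whitespace(text: str) -> list[str]:
    """Split text at single whitespace characters, preferring those that
    follow a meaningful character, in the order given by PRECEDING_CHARS."""
    for chars in PRECEDING_CHARS:
        # The one-character lookbehind only matches whitespace that follows
        # one of `chars`, so the punctuation stays with the preceding piece.
        pattern = rf"(?<=[{re.escape(chars)}])\s"
        if re.search(pattern, text):
            return re.split(pattern, text)
    # Fallback: no whitespace follows a meaningful character, so split at
    # every whitespace character.
    return re.split(r"\s", text)


print(split_at_single_whitespace("Hello world. How are you? Fine, thanks"))
```

Here the spaces after `.` and `?` win over the space after `,` and the spaces between ordinary words, so the pieces follow sentence boundaries rather than arbitrary word boundaries. Only when no higher-priority boundary exists does the sketch fall back to splitting at every whitespace character.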