A lightweight, intuitive wrapper around Intl.Segmenter for seamless segment-aware string operations in TypeScript and JavaScript.
- Intuitive Intl.Segmenter Wrapper: Simplifies text segmentation with a clean API.
- Standards-Based: Built on native Intl.Segmenter for robust compatibility.
- Lightweight & Tree-Shakeable: Minimal footprint with optimal bundling.
- Highly Performant: Uses iterators for efficient, on-demand processing.
- Full TypeScript Support: Strict types for safe, predictable usage.

```shell
npm install segment-string
```

segment-string is a lightweight wrapper for Intl.Segmenter, designed to simplify locale-sensitive text segmentation in JavaScript and TypeScript. It lets you segment and manipulate text by graphemes, words, or sentences, which is ideal for handling complex cases like multi-character emojis or language-specific boundaries.

```typescript
import { SegmentString } from "segment-string";

const str = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");

// Segment by grapheme
console.log([...str.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
```

The SegmentString class encapsulates a string and provides methods for segmentation, counting, and retrieving segments at specified indices with locale and granularity options.

```typescript
new SegmentString(str: string, locales?: Intl.LocalesArgument);
```

- `str`: The string to segment.
- `locales`: Optional locales argument for segmentation.
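
As a rough illustration of what the optional locales argument influences, the sketch below calls the native Intl.Segmenter directly rather than this package. The helper name `wordsFor` is hypothetical. Word boundaries in languages written without spaces depend on the engine's locale data, so no specific segmentation is shown for the Japanese sample.

```javascript
// Hypothetical helper (not part of segment-string): word segments for a
// given locale, produced with the native Intl.Segmenter.
function wordsFor(text, locales) {
	const segmenter = new Intl.Segmenter(locales, { granularity: "word" });
	return Array.from(segmenter.segment(text), (data) => data.segment);
}

const japanese = "吾輩は猫である。";
const segments = wordsFor(japanese, "ja");
console.log(segments);

// Segmentation never drops characters: the pieces join back into the input.
console.log(segments.join("") === japanese); // true
```
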

- `segments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Iterable<string>`: Segments the string by the specified granularity and returns the segments as strings.
- `rawSegments(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.Segments | Iterable<Intl.SegmentData>`: Returns raw `Intl.SegmentData` objects based on granularity and options.
- `segmentCount(granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): number`: Counts segments in the string based on the specified granularity.
- `segmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): string | undefined`: Retrieves the segment at a specific index, supporting negative indices.
- `rawSegmentAt(index: number, granularity: Granularity, options?: SegmentationOptions | WordSegmentationOptions): Intl.SegmentData | undefined`: Returns the raw segment data at a specific index, supporting negative indices.

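
The negative-index behaviour described above can be pictured with a small sketch built on the native Intl.Segmenter. This is an assumption about the semantics (counting back from the last segment, as Array.prototype.at does), not the package's actual source, and `segmentAtSketch` is a hypothetical name.

```javascript
// Hypothetical sketch (not the package's source): resolve a possibly
// negative index against the list of segments.
function segmentAtSketch(text, index, granularity, locales) {
	const segmenter = new Intl.Segmenter(locales, { granularity });
	const all = Array.from(segmenter.segment(text), (data) => data.segment);
	// Negative indices count back from the end; out of bounds gives undefined.
	const resolved = index < 0 ? all.length + index : index;
	return all[resolved];
}

console.log(segmentAtSketch("Hello, world!", 0, "word")); // 'Hello'
console.log(segmentAtSketch("Hello, world!", -1, "word")); // '!'
console.log(segmentAtSketch("Hello, world!", 10, "word")); // undefined
```

This version materialises every segment before indexing, which keeps the logic short; a lazier variant could stop early for non-negative indices.
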
Returns an iterable of grapheme segments as strings.
Returns an iterable of raw grapheme segments.
Counts grapheme segments in the string.
Returns the grapheme at a specific index, supporting negative indices.
Returns the raw grapheme data at a specific index, supporting negative indices.
Returns an iterable of word segments, with optional filtering for word-like segments.
Returns an iterable of raw word segments, with optional filtering for word-like segments.
Counts word segments in the string.
Returns the word at a specific index, supporting negative indices.
Returns the raw word data at a specific index, supporting negative indices.
Returns an iterable of sentence segments.
Returns an iterable of raw sentence segments.
Counts sentence segments in the string.
Returns the sentence at a specific index, supporting negative indices.
Returns the raw sentence data at a specific index, supporting negative indices.
Returns an iterator over the graphemes of the string.
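
To show what iterating over graphemes means in practice, here is a minimal sketch of an iterable class built on the native Intl.Segmenter. `GraphemeIterable` is a hypothetical stand-in and makes no claim about how SegmentString is implemented.

```javascript
// Hypothetical class (not the package's SegmentString): iterating an
// instance yields one grapheme cluster at a time.
class GraphemeIterable {
	constructor(text, locales) {
		this.text = text;
		this.segmenter = new Intl.Segmenter(locales, { granularity: "grapheme" });
	}

	*[Symbol.iterator]() {
		for (const { segment } of this.segmenter.segment(this.text)) {
			yield segment;
		}
	}
}

// A four-person family emoji: four emoji joined by zero-width joiners.
const familyEmoji = "\u{1F469}\u200D\u{1F469}\u200D\u{1F467}\u200D\u{1F466}";
const sample = `a${familyEmoji}b`;

console.log([...new GraphemeIterable(sample)].length); // 3 grapheme clusters
console.log([...sample].length); // 9 code points
```
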

```typescript
import { SegmentString } from "segment-string";

const text = new SegmentString("Hello, world! 👩‍👩‍👧‍👦🌍🌈");

// Segmenting by words
for (const word of text.words()) {
	console.log(word); // logs in turn: 'Hello', ',', ' ', 'world', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈'
}

// Segmenting graphemes and counting
console.log([...text.graphemes()]); // ['H', 'e', 'l', 'l', 'o', ',', ' ', 'w', 'o', 'r', 'l', 'd', '!', ' ', '👩‍👩‍👧‍👦', '🌍', '🌈']
console.log("Grapheme count:", text.graphemeCount()); // 17
console.log("String length:", text.toString().length); // 29

// Accessing a specific word
const secondWord = text.wordAt(1, { isWordLike: true }); // 'world'
console.log(secondWord);
```

Alternatively, the SegmentSplitter class lets you create an instance that can be passed directly to JavaScript's String.prototype.split method for basic segmentation.

```typescript
new SegmentSplitter<T extends Granularity>(granularity: T, options?: SegmentationOptions<T>);
```

- `granularity`: Specifies the segmentation granularity level (`'grapheme'`, `'word'`, `'sentence'`, etc.).
- `options`: Optional settings to customize the segmentation for the given granularity.
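
For background on why an object can be handed to split at all: String.prototype.split delegates to the `[Symbol.split]` method of its argument when one exists. The sketch below shows that mechanism with the native Intl.Segmenter. `WordSplitterSketch` is a hypothetical class, and it ignores the optional `limit` argument for brevity.

```javascript
// Hypothetical splitter (not the package's SegmentSplitter).
class WordSplitterSketch {
	constructor(locales) {
		this.segmenter = new Intl.Segmenter(locales, { granularity: "word" });
	}

	// Called by String.prototype.split with the string being split.
	[Symbol.split](text) {
		const words = [];
		for (const data of this.segmenter.segment(text)) {
			if (data.isWordLike) words.push(data.segment);
		}
		return words;
	}
}

console.log("Hello, world!".split(new WordSplitterSketch("en"))); // ['Hello', 'world']
```
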

```typescript
import { SegmentSplitter } from "segment-string";

const str = "Hello, world!";
// Keep only word-like segments, dropping punctuation and whitespace.
const wordSplitter = new SegmentSplitter("word", { isWordLike: true });
const words = str.split(wordSplitter);
console.log(words); // ["Hello", "world"]
```

```typescript
function getRawSegments(
	str: string,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): Intl.Segments | Iterable<Intl.SegmentData>;
```

- Description: Returns raw `Intl.SegmentData` objects based on granularity and options.
- Parameters:
  - `str`: The string to segment.
  - `granularity`: Specifies the segmentation level (`'grapheme'`, `'word'`, or `'sentence'`).
  - `options`: Includes `locales` for specifying the locale and `isWordLike` for filtering word-like segments.
- Returns: An iterable of raw `Intl.SegmentData`.
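
For readers unfamiliar with the raw data, this short example uses the native Intl.Segmenter (not this package) to print the fields that each `Intl.SegmentData` object carries for word granularity: the segment text, its starting index in the input, and the `isWordLike` flag.

```javascript
const rawSegmenter = new Intl.Segmenter("en", { granularity: "word" });

for (const data of rawSegmenter.segment("Hi, you")) {
	console.log(data.index, JSON.stringify(data.segment), data.isWordLike);
}
// 0 "Hi" true
// 2 "," false
// 3 " " false
// 4 "you" true
```
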

```typescript
function getSegments(
	str: string,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): Iterable<string>;
```

- Description: Returns segments of the string as plain strings.
- Parameters: Similar to `getRawSegments`.
- Returns: An iterable of segments as strings.
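
One plausible way to produce strings on demand is a generator that maps raw segment data to its text. The sketch below assumes that approach and uses the native Intl.Segmenter. `segmentsSketch` is a hypothetical name, not the package's implementation.

```javascript
// Hypothetical generator (not the package's getSegments): yields each
// segment's text only when the consumer asks for it.
function* segmentsSketch(text, granularity, locales) {
	const segmenter = new Intl.Segmenter(locales, { granularity });
	for (const data of segmenter.segment(text)) {
		yield data.segment;
	}
}

// Destructuring pulls only the first sentence from the generator.
const [firstSentence] = segmentsSketch("One. Two. Three.", "sentence", "en");
console.log(JSON.stringify(firstSentence)); // "One. "
```

Note that sentence segments keep their trailing whitespace, so joining all segments reproduces the input exactly.
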

```typescript
function segmentCount(
	str: string,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): number;
```

- Description: Returns the count of segments based on granularity and options.
- Parameters: Similar to `getRawSegments`.
- Returns: Number of segments.
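
To make the difference between a segment count and the string's `length` concrete, this sketch counts segments by walking the native Intl.Segmenter output once. `countSegmentsSketch` is a hypothetical helper and may differ from how the package counts internally.

```javascript
// Hypothetical counter (not the package's segmentCount).
function countSegmentsSketch(text, granularity, locales) {
	const segmenter = new Intl.Segmenter(locales, { granularity });
	let count = 0;
	for (const _segment of segmenter.segment(text)) count++;
	return count;
}

// Two regional-indicator symbols (J + P) that render as a single flag.
const flag = "\u{1F1EF}\u{1F1F5}";

console.log(flag.length); // 4 UTF-16 code units
console.log(countSegmentsSketch(flag, "grapheme")); // 1
console.log(countSegmentsSketch("Hello, world!", "word")); // 5
```
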

```typescript
function rawSegmentAt(
	str: string,
	index: number,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): Intl.SegmentData | undefined;
```

- Description: Returns the raw segment data at a specified index, supporting negative indices.
- Parameters: Similar to `getRawSegments`, plus an `index` parameter.
- Returns: The `Intl.SegmentData` at the specified index, or `undefined` if out of bounds.

```typescript
function segmentAt(
	str: string,
	index: number,
	granularity: Granularity,
	options?: SegmentationOptions | WordSegmentationOptions,
): string | undefined;
```

- Description: Returns the segment at a specified index, supporting negative indices.
- Parameters: Similar to `getRawSegments`, plus an `index` parameter.
- Returns: The segment at the specified index, or `undefined` if out of bounds.

```typescript
function filterRawWordLikeSegments(
	segments: Intl.Segments,
): Iterable<Intl.SegmentData>;
```

- Description: Filters and returns an iterable of raw word-like segment data where `isWordLike` is true.
- Parameters:
  - `segments`: The segments to filter.
- Returns: An iterable of `Intl.SegmentData` for each word-like segment.

```typescript
function filterWordLikeSegments(segments: Intl.Segments): Iterable<string>;
```

- Description: Filters and returns an iterable of word-like segments as strings where `isWordLike` is true.
- Parameters:
  - `segments`: The segments to filter.
- Returns: An iterable of strings for each word-like segment.
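
The two filters described above could be written as generators over a native `Intl.Segments` object. The sketch below assumes that shape. The names `rawWordLikeSketch` and `wordLikeSketch` are hypothetical and do not come from the package.

```javascript
// Hypothetical generators (not the package's source).
function* rawWordLikeSketch(segments) {
	for (const data of segments) {
		if (data.isWordLike) yield data;
	}
}

function* wordLikeSketch(segments) {
	for (const data of rawWordLikeSketch(segments)) {
		yield data.segment;
	}
}

const wordSegments = new Intl.Segmenter("en", { granularity: "word" }).segment(
	"Stop, look & listen.",
);

// Punctuation, the ampersand, and whitespace are not word-like.
console.log([...wordLikeSketch(wordSegments)]); // ['Stop', 'look', 'listen']
```
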

This package was templated with create-typescript-app.