Utility functions for Unicode codepoints and grapheme clusters.
Provides powerful utilities for handling Unicode codepoints and grapheme clusters.
Matches the speed of Zig's standard library and outperforms competitors in benchmarks.
Every function is rigorously tested, making the library safe, reliable, and ready for production.
Designed with efficiency in mind, avoiding unnecessary allocations while maintaining flexibility.
🔹 Quick Start: A quick guide to get you started with the library.
🔹 API Reference: Detailed documentation of available functions.
If you have not already added the library to your project, please review the installation guide for more information.
```zig
const unicode = @import("io").unicode;
```
The `unicode` module provides powerful utilities for handling Unicode codepoints and grapheme clusters. Let's explore some of its features.
```zig
const std = @import("std");

const ZWJ_array: [3]u8 = .{ 0xE2, 0x80, 0x8D }; // U+200D, zero-width joiner
const MOD_array: [2]u8 = .{ 0xCC, 0x81 }; // U+0301, combining acute accent

// Initialize a codepoint.
const codepoint = try unicode.Codepoint.init("A");
const codepointZWJ = try unicode.Codepoint.init(&ZWJ_array);
const codepointMOD = try unicode.Codepoint.init(&MOD_array);

// Get the mode of the codepoint.
_ = codepoint.mode; // 👉 .None
_ = codepointZWJ.mode; // 👉 .ZWJ
_ = codepointMOD.mode; // 👉 .Mod

// Get the length of the codepoint in bytes.
_ = codepoint.len; // 👉 1
_ = codepointZWJ.len; // 👉 3
_ = codepointMOD.len; // 👉 2

// Initialize an iterator over codepoints and grapheme clusters.
var it = try unicode.Iterator.init("Aأ你🌟☹️👨‍🏭");

// Get the next grapheme cluster slice.
_ = it.nextGraphemeClusterSlice(); // 👉 "A"
_ = it.nextGraphemeClusterSlice(); // 👉 "أ"
_ = it.nextGraphemeClusterSlice(); // 👉 "你"
_ = it.nextGraphemeClusterSlice(); // 👉 "🌟"
_ = it.nextGraphemeClusterSlice(); // 👉 "☹️"
_ = it.nextGraphemeClusterSlice(); // 👉 "👨‍🏭"

// Any of the `next*` functions can drive a loop until the input is exhausted.
var rest = try unicode.Iterator.init("Zig!");
while (rest.nextGraphemeClusterSlice()) |slice| {
    std.debug.print("grapheme cluster: {s}\n", .{slice});
}

// ... and much more!
```
Index |
---|
Codepoint |
Iterator |
Utils |
Field | Type | Description |
---|---|---|
mode | Mode | The mode of the codepoint |
len | usize | The length of the codepoint in bytes |
Function | Description |
---|---|
init | Initializes a Codepoint instance with the specified slice. |
Field | Type | Description |
---|---|---|
src | []const u8 | The input slice to iterate over |
pos | usize | The current position of the iterator |
Function | Description |
---|---|
init | Initializes an Iterator with the given input slice. |
initUnchecked | Initializes an Iterator with the given input slice without validation. |
nextCodepointSlice | Returns the next codepoint slice and advances the iterator. |
nextGraphemeClusterSlice | Returns the next grapheme cluster slice and advances the iterator. |
next | Decodes and returns the next codepoint and advances the iterator. |
peek | Decodes and returns the next codepoint without advancing the iterator. |
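
To make the difference between advancing and peeking concrete, here is a minimal sketch of a codepoint iterator. `SketchIterator` and `peekCodepointSlice` are hypothetical names used for illustration only, not the library's implementation; the sketch borrows the documented `src` and `pos` fields and relies on `std.unicode.utf8ByteSequenceLength` from Zig's standard library. Like an unchecked iterator, it trusts the lead byte and does not verify continuation bytes.

```zig
const std = @import("std");

/// Hypothetical stand-in for the documented iterator: same `src` and `pos`
/// fields, but only codepoint-level stepping and no input validation.
const SketchIterator = struct {
    src: []const u8,
    pos: usize = 0,

    /// Returns the bytes of the next codepoint and advances `pos`.
    fn nextCodepointSlice(self: *SketchIterator) ?[]const u8 {
        if (self.pos >= self.src.len) return null;
        // The lead byte alone says how many bytes the codepoint spans.
        const len = std.unicode.utf8ByteSequenceLength(self.src[self.pos]) catch return null;
        if (self.pos + len > self.src.len) return null;
        const start = self.pos;
        self.pos += len;
        return self.src[start..self.pos];
    }

    /// Returns what `nextCodepointSlice` would return, leaving `pos` untouched.
    fn peekCodepointSlice(self: *SketchIterator) ?[]const u8 {
        const saved = self.pos;
        defer self.pos = saved;
        return self.nextCodepointSlice();
    }
};

test "peek does not advance, next does" {
    var it = SketchIterator{ .src = "A\u{E9}" }; // 'A' followed by U+00E9 (2 bytes)
    try std.testing.expectEqualStrings("A", it.peekCodepointSlice().?);
    try std.testing.expectEqualStrings("A", it.nextCodepointSlice().?);
    try std.testing.expectEqualStrings("\u{E9}", it.nextCodepointSlice().?);
    try std.testing.expect(it.nextCodepointSlice() == null);
}
```

Implementing the peek as "step, then restore `pos`" keeps the two code paths identical, which is one simple way to guarantee they never disagree.
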
Function | Description |
---|---|
getLengthOfStartByte | Returns the byte length of a codepoint, determined from its first byte. |
getFirstCodepointSlice | Returns the first codepoint slice. |
getFirstCodepoint | Returns the first codepoint. |
getLastCodepointSlice | Returns the last codepoint slice. |
getLastCodepoint | Returns the last codepoint. |
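
The helpers in this table all rest on one property of UTF-8: the lead byte encodes the sequence length, and continuation bytes always look like `0b10xxxxxx`. The sketch below shows how that could work. The names `lengthOfStartByte`, `firstCodepointSlice`, and `lastCodepointSlice` are hypothetical and the library's actual signatures and error handling may differ. The length check only inspects the bit pattern, so it does not reject overlong encodings.

```zig
const std = @import("std");

/// Hypothetical helper: byte length of a UTF-8 sequence, read from its lead byte.
/// Returns null for continuation bytes (0b10xxxxxx) and other invalid patterns.
fn lengthOfStartByte(first: u8) ?usize {
    if ((first & 0b1000_0000) == 0) return 1; // 0xxxxxxx: ASCII
    if ((first & 0b1110_0000) == 0b1100_0000) return 2; // 110xxxxx
    if ((first & 0b1111_0000) == 0b1110_0000) return 3; // 1110xxxx
    if ((first & 0b1111_1000) == 0b1111_0000) return 4; // 11110xxx
    return null;
}

/// Hypothetical helper: the bytes of the first codepoint in `s`.
fn firstCodepointSlice(s: []const u8) ?[]const u8 {
    if (s.len == 0) return null;
    const len = lengthOfStartByte(s[0]) orelse return null;
    if (len > s.len) return null;
    return s[0..len];
}

/// Hypothetical helper: the bytes of the last codepoint in `s`.
fn lastCodepointSlice(s: []const u8) ?[]const u8 {
    if (s.len == 0) return null;
    var start = s.len - 1;
    // Walk back over continuation bytes until the lead byte is reached.
    while (start > 0 and (s[start] & 0b1100_0000) == 0b1000_0000) {
        start -= 1;
    }
    const len = lengthOfStartByte(s[start]) orelse return null;
    if (start + len != s.len) return null;
    return s[start..];
}

test "first and last codepoint slices" {
    const s = "A\u{E9}"; // 'A' followed by U+00E9 (bytes 0xC3 0xA9)
    try std.testing.expectEqual(@as(?usize, 2), lengthOfStartByte(0xC3));
    try std.testing.expectEqualStrings("A", firstCodepointSlice(s).?);
    try std.testing.expectEqualStrings("\u{E9}", lastCodepointSlice(s).?);
}
```
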
Function | Description |
---|---|
getFirstGraphemeClusterSlice | Returns the first grapheme cluster slice. |
getLastGraphemeClusterSlice | Returns the last grapheme cluster slice. |
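
A grapheme cluster can span several codepoints: a base character, any modifiers attached to it, and further characters glued on with a zero-width joiner (U+200D). The following is a deliberately simplified sketch of that idea, not the library's algorithm. The `isModifier` ranges are an assumption chosen for illustration, and a complete implementation would follow the Unicode text segmentation rules. It uses `std.unicode.Utf8View` from Zig's standard library for decoding.

```zig
const std = @import("std");

/// Assumed set of codepoints that attach to the preceding one.
/// This is a small illustrative subset, not the full Unicode definition.
fn isModifier(cp: u21) bool {
    return switch (cp) {
        0x0300...0x036F, // combining diacritical marks
        0xFE00...0xFE0F, // variation selectors
        0x1F3FB...0x1F3FF, // emoji skin-tone modifiers
        => true,
        else => false,
    };
}

/// Hypothetical sketch: the first grapheme cluster of `s`, or null if `s`
/// is empty or not valid UTF-8.
fn firstGraphemeClusterSlice(s: []const u8) ?[]const u8 {
    const view = std.unicode.Utf8View.init(s) catch return null;
    var it = view.iterator();
    _ = it.nextCodepoint() orelse return null; // base codepoint
    var end = it.i;
    var joined = false; // true right after a zero-width joiner
    while (it.nextCodepoint()) |cp| {
        if (cp == 0x200D) {
            joined = true;
        } else if (isModifier(cp) or joined) {
            joined = false;
        } else break;
        end = it.i; // the codepoint just read belongs to the cluster
    }
    return s[0..end];
}

test "clusters keep modifiers and ZWJ sequences together" {
    // 'e' + combining acute accent, then a separate 'x'
    try std.testing.expectEqualStrings("e\u{301}", firstGraphemeClusterSlice("e\u{301}x").?);
    // man + ZWJ + factory, then a separate '!'
    try std.testing.expectEqualStrings(
        "\u{1F468}\u{200D}\u{1F3ED}",
        firstGraphemeClusterSlice("\u{1F468}\u{200D}\u{1F3ED}!").?,
    );
}
```
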
Function | Description |
---|---|
getRealPosition | Returns the real position in the array based on the visual position. |
getVisualPosition | Returns the visual position in the array based on the real position. |
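
The two position functions convert between a byte offset and a count of visible characters. The sketch below illustrates the conversion under two assumptions: that "real position" means a byte index into the slice, and that one codepoint counts as one visual unit. The library may well count grapheme clusters instead, and `realPosition` and `visualPosition` are hypothetical names rather than its actual signatures.

```zig
const std = @import("std");

/// True for UTF-8 continuation bytes (0b10xxxxxx).
fn isContinuation(byte: u8) bool {
    return (byte & 0b1100_0000) == 0b1000_0000;
}

/// Hypothetical: byte offset at which the codepoint with index `visual` starts.
/// An index equal to the codepoint count maps to `s.len`; anything larger is null.
fn realPosition(s: []const u8, visual: usize) ?usize {
    var seen: usize = 0;
    for (s, 0..) |byte, i| {
        if (isContinuation(byte)) continue;
        if (seen == visual) return i;
        seen += 1;
    }
    return if (seen == visual) s.len else null;
}

/// Hypothetical: number of codepoints that start before byte offset `real`.
fn visualPosition(s: []const u8, real: usize) usize {
    var count: usize = 0;
    for (s[0..@min(real, s.len)]) |byte| {
        if (!isContinuation(byte)) count += 1;
    }
    return count;
}

test "byte offsets versus codepoint positions" {
    const s = "a\u{E9}b"; // bytes: 'a', 0xC3, 0xA9, 'b'
    try std.testing.expectEqual(@as(?usize, 3), realPosition(s, 2)); // 'b' starts at byte 3
    try std.testing.expectEqual(@as(usize, 2), visualPosition(s, 3));
}
```
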
Function | Description |
---|---|
Utf8Validate | Returns true if the input consists entirely of valid UTF-8 sequences. |
Utf8Decode | Decodes a UTF-8 codepoint slice into a codepoint value. |
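
For comparison, Zig's standard library offers the same validate-then-decode workflow. This sketch uses only `std.unicode.utf8ValidateSlice` and `std.unicode.Utf8View`; it does not call the library's `Utf8Validate` or `Utf8Decode`, whose exact signatures are not shown on this page. `decodeFirst` is a hypothetical helper name.

```zig
const std = @import("std");

/// Validates `s` as UTF-8 and decodes its first codepoint.
/// Returns null for an empty slice and error.InvalidUtf8 for malformed input.
fn decodeFirst(s: []const u8) !?u21 {
    const view = try std.unicode.Utf8View.init(s);
    var it = view.iterator();
    return it.nextCodepoint();
}

test "validate, then decode" {
    try std.testing.expect(std.unicode.utf8ValidateSlice("\u{4F60}"));
    try std.testing.expect(!std.unicode.utf8ValidateSlice("\xE4\xBD")); // truncated sequence
    try std.testing.expectEqual(@as(?u21, 0x4F60), try decodeFirst("\u{4F60}"));
    try std.testing.expectError(error.InvalidUtf8, decodeFirst("\xFF"));
}
```
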