Utility functions for Unicode codepoints and grapheme clusters.
Built for power. Designed for speed. Ready for production.
Quick Start
If you have not already added the library to your project, please review the installation guide for more information.
```zig
const unicode = @import("io").unicode;
```
The `unicode` module provides utilities for handling Unicode codepoints and grapheme clusters. Let's explore some of its features.
```zig
const std = @import("std");

// Raw UTF-8 bytes for U+200D ZERO WIDTH JOINER and U+0301 COMBINING ACUTE ACCENT.
const ZWJ_array: [3]u8 = .{ 0xE2, 0x80, 0x8D };
const MOD_array: [2]u8 = .{ 0xCC, 0x81 };

// Initialize a codepoint.
const codepoint = try unicode.Codepoint.init("A");
const codepointZWJ = try unicode.Codepoint.init(&ZWJ_array);
const codepointMOD = try unicode.Codepoint.init(&MOD_array);

// Get the mode of the codepoint.
_ = codepoint.mode; // .None
_ = codepointZWJ.mode; // .ZWJ
_ = codepointMOD.mode; // .Mod

// Get the length of the codepoint in bytes.
_ = codepoint.len; // 1
_ = codepointZWJ.len; // 3
_ = codepointMOD.len; // 2

// Initialize an iterator over codepoints/grapheme clusters.
var it = try unicode.Iterator.init("Aأ你πβΉοΈπ¨βπ");

// Get the next grapheme cluster slice.
_ = it.nextGraphemeClusterSlice(); // "A"
_ = it.nextGraphemeClusterSlice(); // "أ"
_ = it.nextGraphemeClusterSlice(); // "你"
_ = it.nextGraphemeClusterSlice(); // "π"
_ = it.nextGraphemeClusterSlice(); // "βΉοΈ"
_ = it.nextGraphemeClusterSlice(); // "π¨βπ"

// Or loop until the iterator is exhausted. nextCodepointSlice() or next()
// can drive the loop the same way (peek() does not advance, so it cannot).
it.pos = 0; // rewind to the start of src so the loop sees every cluster
while (it.nextGraphemeClusterSlice()) |slice| {
    std.debug.print("grapheme cluster: {s}\n", .{slice});
}
```
API
-
Codepoint
-
Fields

| Field | Type | Description |
| --- | --- | --- |
| `mode` | `Mode` | The mode of the codepoint, e.g. `.None`, `.ZWJ`, or `.Mod` |
| `len` | `usize` | The length of the codepoint in bytes |

Initialization

| Function | Description |
| --- | --- |
| `init` | Initializes a `Codepoint` instance from the given slice. |

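To make the two fields concrete, here is a minimal sketch that derives a byte length and a mode for the first codepoint of a slice using only Zig's standard library. The `Mode` enum, `Codepoint` struct, and `classify` function below are hypothetical stand-ins rather than this library's definitions, and the classification rule is an assumption: only U+200D ZERO WIDTH JOINER counts as `.ZWJ`, and only the combining diacritical marks U+0300 through U+036F count as `.Mod`.

```zig
const std = @import("std");

/// Hypothetical mirror of the library's `Mode`; only the three values
/// seen in the Quick Start example are modeled here.
const Mode = enum { None, ZWJ, Mod };

/// Hypothetical stand-in for `unicode.Codepoint`.
const Codepoint = struct {
    mode: Mode,
    len: usize,

    /// Reads the first codepoint of `slice` and records its byte length and mode.
    fn init(slice: []const u8) !Codepoint {
        if (slice.len == 0) return error.EmptyInput;
        // The leading byte encodes how many bytes the sequence occupies (1 to 4).
        const len: usize = try std.unicode.utf8ByteSequenceLength(slice[0]);
        if (slice.len < len) return error.TruncatedInput;
        const cp = try std.unicode.utf8Decode(slice[0..len]);
        return .{ .mode = classify(cp), .len = len };
    }
};

/// Simplified classification (assumption): U+200D is a joiner and
/// U+0300 through U+036F (combining diacritical marks) are modifiers.
fn classify(cp: u21) Mode {
    if (cp == 0x200D) return .ZWJ;
    if (cp >= 0x0300 and cp <= 0x036F) return .Mod;
    return .None;
}
```

The library's own `Codepoint.init` may recognize a wider set of modifiers, so treat this rule as illustrative only.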
Iterator
-
Fields

| Field | Type | Description |
| --- | --- | --- |
| `src` | `[]const u8` | The input slice to iterate over |
| `pos` | `usize` | The current position of the iterator |

Initialization

| Function | Description |
| --- | --- |
| `init` | Initializes an `Iterator` with the given input slice after validating it. |
| `initUnchecked` | Initializes an `Iterator` with the given input slice without validation. |

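The difference between `init` and `initUnchecked` can be sketched as follows, assuming `init` performs a full UTF-8 validation pass (here with `std.unicode.utf8ValidateSlice`) and `initUnchecked` simply trusts its input. The `Iterator` type and the `error.InvalidUtf8` name are hypothetical.

```zig
const std = @import("std");

/// Hypothetical stand-in for `unicode.Iterator`, with the two documented fields.
const Iterator = struct {
    src: []const u8,
    pos: usize = 0,

    /// Validates `src` as UTF-8 before accepting it (assumed behavior of `init`).
    fn init(src: []const u8) error{InvalidUtf8}!Iterator {
        if (!std.unicode.utf8ValidateSlice(src)) return error.InvalidUtf8;
        return initUnchecked(src);
    }

    /// Skips validation; the caller promises `src` is already valid UTF-8.
    fn initUnchecked(src: []const u8) Iterator {
        return .{ .src = src };
    }
};
```

Validating once up front lets the iteration methods assume well-formed input; `initUnchecked` skips that pass when the caller already knows the bytes are valid, for example because they came from another validated buffer.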
Methods

| Function | Description |
| --- | --- |
| `nextCodepointSlice` | Returns the next codepoint slice and advances the iterator. |
| `nextGraphemeClusterSlice` | Returns the next grapheme cluster slice and advances the iterator. |
| `next` | Decodes and returns the next codepoint and advances the iterator. |
| `peek` | Decodes and returns the next codepoint without advancing the iterator. |

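Zig's standard library offers a comparable codepoint-level iterator, `std.unicode.Utf8Iterator`, which illustrates the advance-versus-peek contract in the table. This sketch uses the standard library, not this library; note that `Utf8Iterator.peek(n)` returns the bytes of the next `n` codepoints rather than a decoded value, unlike `peek` above. The standard library works at the codepoint level only, so it has no counterpart to `nextGraphemeClusterSlice`.

```zig
const std = @import("std");

/// Walks `text` codepoint by codepoint using std.unicode (not this library),
/// printing each codepoint's bytes and scalar value.
pub fn main() !void {
    // Utf8View.init validates the input, much like Iterator.init.
    const view = try std.unicode.Utf8View.init("Aأ你");
    var it = view.iterator();

    // peek(1) returns the next codepoint's bytes without advancing.
    std.debug.print("peek: {s}\n", .{it.peek(1)});

    // nextCodepointSlice() returns the raw bytes and advances; null at the end.
    while (it.nextCodepointSlice()) |slice| {
        const cp = try std.unicode.utf8Decode(slice);
        std.debug.print("{s} = U+{X:0>4}\n", .{ slice, cp });
    }
}
```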
Utils
-
Codepoint

| Function | Description |
| --- | --- |
| `getLengthOfStartByte` | Returns the byte length of a codepoint, determined by its first (leading) byte. |
| `getFirstCodepointSlice` | Returns the first codepoint slice. |
| `getFirstCodepoint` | Returns the first codepoint. |
| `getLastCodepointSlice` | Returns the last codepoint slice. |
| `getLastCodepoint` | Returns the last codepoint. |

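The codepoint helpers rely on a property of UTF-8: the leading byte announces the sequence length, and every continuation byte has the bit pattern `0b10xxxxxx`. A minimal sketch of how `getLengthOfStartByte`, `getFirstCodepointSlice`, `getLastCodepointSlice`, and `getLastCodepoint` could work follows; the function names and optional (`?`) return types are hypothetical, not the library's signatures.

```zig
const std = @import("std");

/// Hypothetical getLengthOfStartByte: number of bytes in the UTF-8 sequence
/// that starts with `byte`, or null if `byte` cannot start a sequence.
fn lengthOfStartByte(byte: u8) ?usize {
    const n = std.unicode.utf8ByteSequenceLength(byte) catch return null;
    return @as(usize, n);
}

/// Hypothetical getFirstCodepointSlice: the bytes of the first codepoint.
fn firstCodepointSlice(s: []const u8) ?[]const u8 {
    if (s.len == 0) return null;
    const n = lengthOfStartByte(s[0]) orelse return null;
    if (n > s.len) return null;
    return s[0..n];
}

/// Hypothetical getLastCodepointSlice: step back over continuation bytes
/// (0b10xxxxxx) until a leading byte is found.
fn lastCodepointSlice(s: []const u8) ?[]const u8 {
    if (s.len == 0) return null;
    var start = s.len - 1;
    while (start > 0 and (s[start] & 0xC0) == 0x80) : (start -= 1) {}
    const n = lengthOfStartByte(s[start]) orelse return null;
    if (start + n != s.len) return null; // truncated or malformed tail
    return s[start..];
}

/// Hypothetical getLastCodepoint: decode the last codepoint slice.
fn lastCodepoint(s: []const u8) ?u21 {
    const slice = lastCodepointSlice(s) orelse return null;
    return std.unicode.utf8Decode(slice) catch null;
}
```

Scanning backwards from the end avoids walking the whole string, which is why a dedicated "last" helper is cheaper than iterating forward.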
Grapheme Cluster

| Function | Description |
| --- | --- |
| `getFirstGraphemeClusterSlice` | Returns the first grapheme cluster slice. |
| `getLastGraphemeClusterSlice` | Returns the last grapheme cluster slice. |

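Grapheme clusters are harder because a single user-perceived character can span several codepoints. The sketch below approximates `getFirstGraphemeClusterSlice` with a deliberately simplified rule, not the full Unicode UAX #29 algorithm the library may implement: a cluster is one base codepoint plus any following combining marks (U+0300 through U+036F), variation selectors (U+FE00 through U+FE0F), or emoji skin-tone modifiers (U+1F3FB through U+1F3FF), and a ZERO WIDTH JOINER glues the next codepoint into the same cluster. The function names are hypothetical.

```zig
const std = @import("std");

/// Simplified "extend" test (assumption, far smaller than UAX #29):
/// combining diacritics, variation selectors, and emoji skin-tone modifiers.
fn isExtender(cp: u21) bool {
    return (cp >= 0x0300 and cp <= 0x036F) or
        (cp >= 0xFE00 and cp <= 0xFE0F) or
        (cp >= 0x1F3FB and cp <= 0x1F3FF);
}

/// Hypothetical getFirstGraphemeClusterSlice; rejects invalid UTF-8.
fn firstGraphemeClusterSlice(s: []const u8) ![]const u8 {
    var it = (try std.unicode.Utf8View.init(s)).iterator();
    // The first codepoint always starts the cluster; empty input yields "".
    _ = it.nextCodepoint() orelse return s[0..0];
    var after_zwj = false;
    while (true) {
        const before = it.i; // byte offset where the next codepoint starts
        const cp = it.nextCodepoint() orelse break;
        if (cp == 0x200D) {
            after_zwj = true; // ZWJ joins the next codepoint to this cluster
        } else if (after_zwj or isExtender(cp)) {
            after_zwj = false;
        } else {
            return s[0..before]; // a new cluster begins at `before`
        }
    }
    return s;
}
```

Real text needs the full rule set (Hangul syllable sequences and regional-indicator flag pairs, among others), which is what a dedicated grapheme helper is for.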
Position

| Function | Description |
| --- | --- |
| `getRealPosition` | Returns the real position in the array that corresponds to a given visual position. |
| `getVisualPosition` | Returns the visual position that corresponds to a given real position in the array. |

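The table does not define "visual position". The sketch below assumes it counts codepoints and that the real position is a byte index; the library may count grapheme clusters or display columns instead. Under that assumption, converting between the two coordinate systems is a linear walk over the string. `realPosition` and `visualPosition` are hypothetical names.

```zig
const std = @import("std");

/// Hypothetical getRealPosition: byte index of the `visual`-th codepoint
/// (assumes visual positions count codepoints). Null if past the end.
fn realPosition(s: []const u8, visual: usize) !?usize {
    var it = (try std.unicode.Utf8View.init(s)).iterator();
    var count: usize = 0;
    while (count < visual) : (count += 1) {
        _ = it.nextCodepointSlice() orelse return null;
    }
    return it.i;
}

/// Hypothetical getVisualPosition: number of codepoints that start
/// before byte index `real`.
fn visualPosition(s: []const u8, real: usize) !usize {
    var it = (try std.unicode.Utf8View.init(s)).iterator();
    var count: usize = 0;
    while (it.i < real) : (count += 1) {
        _ = it.nextCodepointSlice() orelse break;
    }
    return count;
}
```

In "Aأ你" the three codepoints occupy 1, 2, and 3 bytes, so visual position 2 maps to byte 3, and byte 6 (the end) maps back to visual position 3.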
More

| Function | Description |
| --- | --- |
| `Utf8Validate` | Returns `true` if the input is entirely valid UTF-8. |
| `Utf8Decode` | Decodes a UTF-8 codepoint slice into a codepoint value. |

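For intuition about what `Utf8Decode` does, here is a hand-written decoder for a single, already-validated codepoint slice: the leading byte keeps its low 7, 5, 4, or 3 bits depending on sequence length, and each continuation byte adds 6 more. It is a sketch under that validity assumption, not the library's implementation; the hypothetical `decodeChecked` wrapper delegates validation to Zig's `std.unicode.utf8ValidateSlice`, which plays the role `Utf8Validate` plays here.

```zig
const std = @import("std");

/// Hand-written sketch of UTF-8 decoding for one well-formed codepoint
/// slice of 1 to 4 bytes (assumption: the caller has validated it).
fn decode(bytes: []const u8) u21 {
    // The leading byte keeps 7, 5, 4, or 3 payload bits depending on length.
    var cp: u21 = switch (bytes.len) {
        1 => bytes[0],
        2 => bytes[0] & 0x1F,
        3 => bytes[0] & 0x0F,
        4 => bytes[0] & 0x07,
        else => unreachable,
    };
    // Each continuation byte (0b10xxxxxx) contributes 6 more bits.
    for (bytes[1..]) |b| {
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp;
}

/// Hypothetical safe wrapper: validate first, then require exactly one codepoint.
fn decodeChecked(bytes: []const u8) ?u21 {
    if (bytes.len == 0 or !std.unicode.utf8ValidateSlice(bytes)) return null;
    const n = std.unicode.utf8ByteSequenceLength(bytes[0]) catch return null;
    if (n != bytes.len) return null;
    return decode(bytes);
}
```

Keeping validation separate from decoding lets hot loops decode pre-validated input without re-checking every byte.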

- Chars: Utility functions for char arrays.
- Viewer: Immutable fixed-size string type that supports Unicode.
- String: Managed dynamic-size string type that supports Unicode.
- Buffer: Mutable fixed-size string type that supports Unicode.
- uString: Unmanaged dynamic-size string type that supports Unicode.