Codepoint
When simplicity meets efficiency
🍃 Zero dependencies—meticulously crafted code.
🚀 Blazing fast—almost as fast as light!
🌍 Universal compatibility—Windows, Linux, and macOS.
🛡️ Battle-tested—ready for production.
If you have not already added the library to your project, please review the installation guide for more information.
const codepoint = @import("io").string.utils.codepoint;
Initializes a Codepoint from a Unicode codepoint value or a UTF-8 slice.
_ = codepoint.init(0x1F31F).?; // 👉 .{ .src = 0x1F31F, .len = 4 }
_ = codepoint.fromUtf8("🌟").?; // 👉 .{ .src = 0x1F31F, .len = 4 }
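For comparison, the two values shown in the comments above (0x1F31F and a length of 4) can be reproduced with the standard library alone. This is a minimal sketch that assumes Zig 0.14 and uses only std.unicode. The helper names utf8Len and firstCodepoint are hypothetical and are not part of io.

```zig
const std = @import("std");

/// UTF-8 length of a code point value, or null if it is above U+10FFFF.
/// (Hypothetical helper built on std.unicode, not part of io.)
fn utf8Len(value: u21) ?u3 {
    return std.unicode.utf8CodepointSequenceLength(value) catch null;
}

/// First code point of a UTF-8 slice, or null if the slice is empty
/// or is not valid UTF-8. (Hypothetical helper, not part of io.)
fn firstCodepoint(bytes: []const u8) ?u21 {
    const view = std.unicode.Utf8View.init(bytes) catch return null;
    var it = view.iterator();
    return it.nextCodepoint();
}

pub fn main() void {
    const value = firstCodepoint("🌟").?;
    const len = utf8Len(value).?;
    std.debug.print("value=0x{X} len={d}\n", .{ value, len });
}
```

Both helpers return an optional, mirroring the `.?` unwrapping used in the snippet above.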
Iterates over a UTF-8 slice, yielding either UTF-8 slices or Codepoints.
var iter = codepoint.Utf8Iterator.init("..").?; // 👉 .{ .src = "..", .pos = 0 }
while(iter.nextSlice()) |slice| { .. }
while(iter.nextCodepoint()) |cp| { .. }
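The same two iteration styles exist in the standard library, which is the baseline used in the benchmarks further down. The sketch below assumes Zig 0.14 and uses std.unicode.Utf8View only. countCodepoints is a hypothetical helper added for illustration. Each loop gets its own iterator, because an iterator that has been exhausted yields nothing more.

```zig
const std = @import("std");

/// Number of code points in a UTF-8 slice, or error.InvalidUtf8.
/// (Hypothetical helper for illustration, not part of io.)
fn countCodepoints(text: []const u8) !usize {
    const view = try std.unicode.Utf8View.init(text);
    var it = view.iterator();
    var count: usize = 0;
    while (it.nextCodepointSlice()) |_| count += 1;
    return count;
}

pub fn main() !void {
    const view = try std.unicode.Utf8View.init("aé🌟");

    // Slice by slice: each item is the raw UTF-8 bytes of one code point.
    var slices = view.iterator();
    while (slices.nextCodepointSlice()) |slice| {
        std.debug.print("{s} ({d} bytes)\n", .{ slice, slice.len });
    }

    // Code point by code point: each item is the decoded u21 value.
    var cps = view.iterator();
    while (cps.nextCodepoint()) |cp| {
        std.debug.print("U+{X:0>4}\n", .{cp});
    }

    std.debug.print("total: {d}\n", .{try countCodepoints("aé🌟")});
}
```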
Field | Type | Description |
---|---|---|
src | u21 | Numeric value of the Unicode codepoint (U+0000 to U+10FFFF). |
len | u3 | Length of this codepoint in UTF-8 (1-4 bytes). |

Function | Return | Description |
---|---|---|
init | ?Self | Initializes a Codepoint from a Unicode codepoint value if valid. |
unsafe_init | Self | Initializes a Codepoint from a Unicode codepoint value. |
fromUtf8 | ?Self | Initializes a Codepoint from a UTF-8 encoded slice if valid. |
unsafe_fromUtf8 | Self | Initializes a Codepoint from a UTF-8 encoded slice. |
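To make the field and function tables concrete, here is a small stand-in type with the same field names and the same checked and unchecked constructor shapes. It is a sketch under stated assumptions, not the library's implementation. MiniCodepoint is a hypothetical name. Treating surrogates and values above U+10FFFF as invalid is an assumption about what "if valid" means. Having unsafe_init skip the check is also an assumption.

```zig
const std = @import("std");

/// Hypothetical stand-in for the documented Codepoint type (not io's code).
const MiniCodepoint = struct {
    src: u21, // numeric value of the code point
    len: u3, // UTF-8 length in bytes, 1 to 4

    const Self = @This();

    /// Checked constructor. Assumption: values above U+10FFFF and the
    /// surrogate range U+D800..U+DFFF count as invalid.
    pub fn init(value: u21) ?Self {
        if (value > 0x10FFFF) return null;
        if (value >= 0xD800 and value <= 0xDFFF) return null;
        return unsafe_init(value);
    }

    /// Unchecked constructor. Assumption: the caller has already validated
    /// `value`, so only the UTF-8 length is computed here.
    pub fn unsafe_init(value: u21) Self {
        const len: u3 = if (value < 0x80) 1 else if (value < 0x800) 2 else if (value < 0x10000) 3 else 4;
        return .{ .src = value, .len = len };
    }
};

pub fn main() void {
    const cp = MiniCodepoint.init(0x1F31F).?;
    std.debug.print("src=0x{X} len={d}\n", .{ cp.src, cp.len });
}
```

The length thresholds (0x80, 0x800, 0x10000) are the standard UTF-8 boundaries between 1, 2, 3, and 4 byte encodings.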

Field | Type | Description |
---|---|---|
src | []const u8 | The UTF-8 encoded string that the iterator will traverse. |
pos | usize | The current byte position in the string. |

Function | Return | Description |
---|---|---|
init | ?Self | Initializes a new Utf8Iterator from the given UTF-8 slice if valid. |
unsafe_init | Self | Initializes a new Utf8Iterator from the given UTF-8 slice. |

Function | Return | Description |
---|---|---|
nextCodepoint | ?Codepoint | Returns the next Codepoint and increments the position. |
nextSlice | ?[]const u8 | Returns the next UTF-8 slice and increments the position. |
nextLength | ?Codepoint | Returns the next Codepoint length and increments the position. |

Function | Return | Description |
---|---|---|
peekCodepoint | ?Codepoint | Returns the next Codepoint without incrementing the position. |
peekSlice | ?[]const u8 | Returns the next UTF-8 slice without incrementing the position. |
peekLength | ?Codepoint | Returns the next Codepoint length without incrementing the position. |
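The difference between the next and peek families is whether the byte position moves. The sketch below shows the same idea with the standard library's std.unicode.Utf8Iterator (Zig 0.14 assumed), whose peek and nextCodepointSlice methods and position field i play the roles that peekSlice, nextSlice, and pos appear to play in the tables above. It does not call io itself.

```zig
const std = @import("std");

pub fn main() !void {
    const view = try std.unicode.Utf8View.init("é🌟");
    var it = view.iterator();

    // Peeking returns the bytes of the next code point but leaves `it.i` alone.
    const peeked = it.peek(1);
    std.debug.print("peek: {s}, position {d}\n", .{ peeked, it.i });

    // Taking the same code point advances the position by its byte length.
    const taken = it.nextCodepointSlice().?;
    std.debug.print("next: {s}, position {d}\n", .{ taken, it.i });
}
```

Peeking is useful for lookahead, for example when a parser must decide how to handle the next character before consuming it.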
A quick summary with sample performance test results comparing the SuperZIG io.string.utils.codepoint implementation with its popular competitor, std.unicode.
In summary, io is up to about 5 times faster than std in release builds (x3.86 to x5.02 in the results below) and about 1.3 to 2.2 times faster in debug builds, thanks to its optimized implementation. ✨
Debug build (zig build run -- codepoint)
Benchmark | Runs | Total Time | Avg Time | Speed |
---|---|---|---|---|
std_x10 | 100000 | 87.4ms | 874ns | x1.00 |
io_x10 | 100000 | 65.6ms | 656ns | x1.33 |
std_x100 | 23412 | 2.108s | 90.082us | x1.00 |
io_x100 | 46583 | 1.952s | 41.918us | x2.15 |
std_x1000 | 234 | 2.061s | 8.81ms | x1.00 |
io_x1000 | 457 | 2.1s | 4.596ms | x1.92 |
Release build (zig build run --release=fast -- codepoint)
Benchmark | Runs | Total Time | Avg Time | Speed |
---|---|---|---|---|
std_x10 | 100000 | 84.9ms | 849ns | x1.00 |
io_x10 | 100000 | 22ms | 220ns | x3.86 |
std_x100 | 25531 | 1.967s | 77.053us | x1.00 |
io_x100 | 100000 | 1.56s | 15.608us | x4.94 |
std_x1000 | 263 | 2.107s | 8.012ms | x1.00 |
io_x1000 | 1233 | 1.966s | 1.594ms | x5.02 |
It is normal for the values to differ each time the benchmark is run, but in general the relative speedups will remain close to these.
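The Speed column is consistent with a simple ratio of average times, std divided by io, for each input size. The sketch below recomputes two of the release figures from the table. The speedup helper is hypothetical and only illustrates the arithmetic; the benchmark's own code may compute it differently.

```zig
const std = @import("std");

/// Speedup of a candidate over a baseline, given their average times
/// per run in the same unit. (Hypothetical helper for illustration.)
fn speedup(baseline_avg: f64, candidate_avg: f64) f64 {
    return baseline_avg / candidate_avg;
}

pub fn main() void {
    // Release rows from the table: std_x10 = 849ns, io_x10 = 220ns.
    std.debug.print("x10:   x{d:.2}\n", .{speedup(849, 220)});
    // Release rows from the table: std_x1000 = 8.012ms, io_x1000 = 1.594ms.
    std.debug.print("x1000: x{d:.2}\n", .{speedup(8.012, 1.594)});
}
```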
The benchmarks were run on Windows 11 v24H2 with an 11th Gen Intel® Core™ i5-1155G7 × 8 processor and 32GB of RAM.
The version of Zig used is 0.14.0.
The source code of this benchmark is in bench/string/utils/codepoint.zig.