UTF-8
When simplicity meets efficiency
🍃 Zero dependencies—meticulously crafted code.
🚀 Blazing fast—almost as fast as light!
🌍 Universal compatibility—Windows, Linux, and macOS.
🛡️ Battle-tested—ready for production.
If you have not already added the library to your project, please review the installation guide for more information.
const utf8 = @import("io").string.utils.utf8;
Convert slice to codepoint
_ = utf8.decode("🌟").?; // 👉 0x1F31F
Convert codepoint to slice
var buf: [4]u8 = undefined; // 👉 holds "🌟" after encode
_ = utf8.encode(0x1F31F, &buf).?; // 👉 4
Get codepoint length
_ = utf8.getCodepointLength(0x1F31F); // 👉 4
Get UTF-8 sequence length
_ = utf8.getSequenceLength("🌟"[0]); // 👉 4
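The benchmarks below compare against Zig's standard library, so it may help to see roughly the same four operations written with std.unicode (Zig 0.14). This is a sketch for comparison, not a drop-in replacement: the std functions report bad input through error unions, which this example simply propagates with try.

```zig
const std = @import("std");

pub fn main() !void {
    // Slice -> codepoint (returns an error instead of null on bad input).
    const cp = try std.unicode.utf8Decode("🌟"); // 0x1F31F

    // Codepoint -> slice; returns the number of bytes written.
    var buf: [4]u8 = undefined;
    const n = try std.unicode.utf8Encode(0x1F31F, &buf); // 4

    // Bytes needed to encode a codepoint.
    const cp_len = try std.unicode.utf8CodepointSequenceLength(0x1F31F); // 4

    // Sequence length announced by the first (lead) byte.
    const seq_len = try std.unicode.utf8ByteSequenceLength("🌟"[0]); // 4

    std.debug.print("U+{X} {s} {d} {d} {d}\n", .{ cp, buf[0..n], n, cp_len, seq_len });
}
```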
Function | Return | Description |
---|---|---|
encode | u3 | Encodes a single Unicode codepoint into a UTF-8 sequence and returns the number of bytes written. |
decode | u21 | Decodes a UTF-8 sequence into a Unicode codepoint and returns the decoded codepoint. |
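The table does not describe how encoding and decoding work internally. The following is a minimal sketch of the standard UTF-8 bit layout, not the io implementation: encodeSketch and decodeSketch are hypothetical names, and returning null for surrogates, overlong forms, truncated input, and values above U+10FFFF is an assumption about the intended semantics.

```zig
const std = @import("std");

// Hypothetical sketch (not the io implementation): writes the UTF-8 form of `cp`
// into `out` and returns the byte count, or null if `cp` is a surrogate, is above
// U+10FFFF, or `out` is too small.
fn encodeSketch(cp: u21, out: []u8) ?u3 {
    if (cp >= 0xD800 and cp <= 0xDFFF) return null; // surrogate halves are not encodable
    if (cp < 0x80) { // 0xxxxxxx
        if (out.len < 1) return null;
        out[0] = @intCast(cp);
        return 1;
    } else if (cp < 0x800) { // 110xxxxx 10xxxxxx
        if (out.len < 2) return null;
        out[0] = @intCast(0xC0 | (cp >> 6));
        out[1] = @intCast(0x80 | (cp & 0x3F));
        return 2;
    } else if (cp < 0x10000) { // 1110xxxx 10xxxxxx 10xxxxxx
        if (out.len < 3) return null;
        out[0] = @intCast(0xE0 | (cp >> 12));
        out[1] = @intCast(0x80 | ((cp >> 6) & 0x3F));
        out[2] = @intCast(0x80 | (cp & 0x3F));
        return 3;
    } else if (cp <= 0x10FFFF) { // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        if (out.len < 4) return null;
        out[0] = @intCast(0xF0 | (cp >> 18));
        out[1] = @intCast(0x80 | ((cp >> 12) & 0x3F));
        out[2] = @intCast(0x80 | ((cp >> 6) & 0x3F));
        out[3] = @intCast(0x80 | (cp & 0x3F));
        return 4;
    }
    return null; // above U+10FFFF
}

// Hypothetical sketch: decodes the sequence at the start of `bytes`, or returns null
// if it is truncated, malformed, overlong, a surrogate, or above U+10FFFF.
fn decodeSketch(bytes: []const u8) ?u21 {
    if (bytes.len == 0) return null;
    const b0 = bytes[0];
    if (b0 < 0x80) return @as(u21, b0); // ASCII: one byte, no payload bits to merge
    var len: usize = undefined;
    var cp: u21 = undefined;
    if ((b0 & 0xE0) == 0xC0) {
        len = 2;
        cp = b0 & 0x1F;
    } else if ((b0 & 0xF0) == 0xE0) {
        len = 3;
        cp = b0 & 0x0F;
    } else if ((b0 & 0xF8) == 0xF0) {
        len = 4;
        cp = b0 & 0x07;
    } else return null; // continuation byte or invalid lead byte
    if (bytes.len < len) return null; // truncated sequence
    for (bytes[1..len]) |b| {
        if ((b & 0xC0) != 0x80) return null; // every trailing byte must be 10xxxxxx
        cp = (cp << 6) | (b & 0x3F); // append 6 payload bits
    }
    const min: u21 = switch (len) {
        2 => 0x80,
        3 => 0x800,
        else => 0x10000,
    };
    if (cp < min) return null; // overlong: would fit in fewer bytes
    if (cp > 0x10FFFF or (cp >= 0xD800 and cp <= 0xDFFF)) return null;
    return cp;
}

pub fn main() void {
    var buf: [4]u8 = undefined;
    const n = encodeSketch(0x1F31F, &buf).?;
    std.debug.print("{d} bytes: {s}, decoded back to U+{X}\n", .{ n, buf[0..n], decodeSketch(buf[0..n]).? });
}
```

Checking the minimum value per length is what rejects overlong forms such as C0 80 (a two-byte spelling of U+0000).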
Function | Return | Description |
---|---|---|
getCodepointLength | u3 | Returns the number of bytes (1-4) needed to encode a codepoint in UTF-8. |
getCodepointLengthOrNull | ?u3 | Returns the number of bytes (1-4) needed to encode a codepoint in UTF-8, or null if the codepoint is invalid. |
getSequenceLength | u3 | Returns the number of bytes (1-4) in a UTF-8 sequence, based on its first byte. |
getSequenceLengthOrNull | ?u3 | Returns the number of bytes (1-4) in a UTF-8 sequence, based on its first byte, or null if that byte cannot start a valid sequence. |
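The two length families answer different questions, and the difference is easy to miss. The sketch below shows plausible logic for the OrNull variants; codepointLengthOrNullSketch and sequenceLengthOrNullSketch are hypothetical names, and treating surrogates as invalid is an assumption rather than documented io behavior.

```zig
const std = @import("std");

// Hypothetical sketch (names are illustrative, not the io API): how many bytes
// `cp` needs in UTF-8, or null for a surrogate half or a value above U+10FFFF.
fn codepointLengthOrNullSketch(cp: u21) ?u3 {
    return switch (cp) {
        0x0000...0x007F => 1,
        0x0080...0x07FF => 2,
        0xD800...0xDFFF => null, // surrogates cannot be encoded
        0x0800...0xD7FF, 0xE000...0xFFFF => 3,
        0x10000...0x10FFFF => 4,
        else => null, // above U+10FFFF
    };
}

// Hypothetical sketch: the sequence length announced by a lead byte, or null
// for a continuation byte (10xxxxxx) or a byte that never starts a sequence.
fn sequenceLengthOrNullSketch(first_byte: u8) ?u3 {
    return switch (first_byte) {
        0x00...0x7F => 1, // 0xxxxxxx
        0xC0...0xDF => 2, // 110xxxxx
        0xE0...0xEF => 3, // 1110xxxx
        0xF0...0xF7 => 4, // 11110xxx
        else => null, // 10xxxxxx or 11111xxx
    };
}

pub fn main() void {
    // The same value, two questions: as a codepoint, U+00F0 needs 2 bytes;
    // as a lead byte, 0xF0 starts a 4-byte sequence.
    std.debug.print("{?d} {?d}\n", .{ codepointLengthOrNullSketch(0xF0), sequenceLengthOrNullSketch(0xF0) });
}
```

This lead-byte check looks only at the bit pattern. Stricter decoders also reject 0xC0, 0xC1, and 0xF5 to 0xF7, which can only begin overlong or out-of-range sequences, and leave that to the full decode step.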
Function | Return | Description |
---|---|---|
isValidSlice | bool | Returns true if the provided slice consists entirely of valid UTF-8 sequences. |
isValidCodepoint | bool | Returns true if the provided codepoint is valid for UTF-8 encoding. |
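A sketch of what the two validation checks might involve, assuming "valid for UTF-8 encoding" means a Unicode scalar value (at most U+10FFFF and not a surrogate). isValidCodepointSketch and isValidSliceSketch are hypothetical names; the slice check walks the input one sequence at a time and delegates per-sequence checks to std.unicode, so it illustrates the technique rather than reproducing io's optimized code.

```zig
const std = @import("std");

// Hypothetical sketch: a codepoint is encodable if it is a Unicode scalar value.
fn isValidCodepointSketch(cp: u21) bool {
    return cp <= 0x10FFFF and !(cp >= 0xD800 and cp <= 0xDFFF);
}

// Hypothetical sketch: validate a slice sequence by sequence.
fn isValidSliceSketch(s: []const u8) bool {
    var i: usize = 0;
    while (i < s.len) {
        // The lead byte tells us how many bytes this sequence should span.
        const len = std.unicode.utf8ByteSequenceLength(s[i]) catch return false;
        if (i + len > s.len) return false; // truncated at end of input
        // Rejects bad continuation bytes, overlong forms, surrogates, and > U+10FFFF.
        _ = std.unicode.utf8Decode(s[i .. i + len]) catch return false;
        i += len;
    }
    return true;
}

pub fn main() void {
    std.debug.print("{} {}\n", .{ isValidSliceSketch("héllo 🌟"), isValidCodepointSketch(0xD800) });
}
```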
A quick summary with sample performance test results comparing the SuperZIG io.string.utils.utf8 implementation with its popular competitor, std.unicode.
In summary, io was roughly 3 to 5 times faster than std in the runs below (x2.91 to x5.16), with the larger gains on the x100 and x1000 workloads, thanks to its optimized implementation. ✨
zig build run --release=safe -- utf8
Benchmark | Runs | Total Time | Avg Time | Speed |
---|---|---|---|---|
std_x10 | 100000 | 92.7ms | 927ns | x1.00 |
io_x10 | 100000 | 31.9ms | 319ns | x2.91 |
std_x100 | 21485 | 1.959s | 91.188us | x1.00 |
io_x100 | 96186 | 1.997s | 20.768us | x4.39 |
std_x1000 | 218 | 2.067s | 9.482ms | x1.00 |
io_x1000 | 961 | 1.87s | 1.946ms | x4.87 |
zig build run --release=fast -- utf8
Benchmark | Runs | Total Time | Avg Time | Speed |
---|---|---|---|---|
std_x10 | 100000 | 102.6ms | 1.026us | x1.00 |
io_x10 | 100000 | 29.1ms | 291ns | x3.53 |
std_x100 | 20653 | 1.915s | 92.771us | x1.00 |
io_x100 | 100000 | 1.796s | 17.962us | x5.16 |
std_x1000 | 232 | 2.028s | 8.742ms | x1.00 |
io_x1000 | 1176 | 2.07s | 1.76ms | x4.96 |
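The Speed column is not defined explicitly. From the figures it appears to be the ratio of the std average time to the io average time for the same workload, with the std rows as the x1.00 baseline. For example, using the x100 rows of the --release=safe run:

$$
\text{Speed}_{\texttt{io}} = \frac{\overline{t}_{\texttt{std}}}{\overline{t}_{\texttt{io}}} = \frac{91.188\ \mu\text{s}}{20.768\ \mu\text{s}} \approx 4.39
$$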
Values will naturally differ each time the benchmark is run, but the speed ratios generally stay close to these.
The benchmarks were run on Windows 11 (version 24H2) with an 11th Gen Intel® Core™ i5-1155G7 × 8 processor and 32GB of RAM.
The Zig version used is 0.14.0.
The source code of this benchmark is in bench/string/utils/utf8.zig.