UTF-8
When simplicity meets efficiency
🍃 Zero dependencies—meticulously crafted code.
🚀 Blazing fast—almost as fast as light!
🌍 Universal compatibility—Windows, Linux, and macOS.
🛡️ Battle-tested—ready for production.
If you have not already added the library to your project, please review the installation guide for more information.
```zig
const utf8 = @import("io").string.utils.utf8;
```
Convert slice to codepoint
```zig
_ = utf8.decode("🌟").?; // 👉 0x1F31F
```
Convert codepoint to slice
```zig
var buf: [4]u8 = undefined;
_ = utf8.encode(0x1F31F, &buf).?; // 👉 4, buf now holds "🌟"
```
Get codepoint length
```zig
_ = utf8.getCodepointLength(0x1F31F); // 👉 4
```
Get UTF-8 sequence length
```zig
_ = utf8.getSequenceLength("🌟"[0]); // 👉 4
```

| Function | Return | Description |
|---|---|---|
| `encode` | `u3` | Encodes a single Unicode codepoint as a UTF-8 sequence and returns the number of bytes written. |
| `decode` | `u21` | Decodes a UTF-8 sequence into a Unicode codepoint and returns the decoded codepoint. |
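The rules behind `encode` and `decode` can be illustrated with a small, self-contained sketch of the standard UTF-8 bit layout. This is not the library's implementation: `encodeSketch` and `decodeSketch` are hypothetical names, the encoder assumes invalid codepoints (surrogate halves and values above U+10FFFF) should be reported as `null`, and the decoder assumes its input is exactly one well-formed sequence, so it performs no validation.

```zig
const std = @import("std");

/// Hypothetical helper (not the library's code): writes the UTF-8 encoding of
/// `cp` into `out` and returns the byte count, or null for surrogate halves
/// (U+D800..U+DFFF) and values above U+10FFFF. `out` must hold at least 4 bytes.
fn encodeSketch(cp: u21, out: []u8) ?u3 {
    switch (cp) {
        0x00...0x7F => { // 0xxxxxxx
            out[0] = @intCast(cp);
            return 1;
        },
        0x80...0x7FF => { // 110xxxxx 10xxxxxx
            out[0] = @intCast(0xC0 | (cp >> 6));
            out[1] = @intCast(0x80 | (cp & 0x3F));
            return 2;
        },
        0xD800...0xDFFF => return null,
        0x800...0xD7FF, 0xE000...0xFFFF => { // 1110xxxx 10xxxxxx 10xxxxxx
            out[0] = @intCast(0xE0 | (cp >> 12));
            out[1] = @intCast(0x80 | ((cp >> 6) & 0x3F));
            out[2] = @intCast(0x80 | (cp & 0x3F));
            return 3;
        },
        0x10000...0x10FFFF => { // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out[0] = @intCast(0xF0 | (cp >> 18));
            out[1] = @intCast(0x80 | ((cp >> 12) & 0x3F));
            out[2] = @intCast(0x80 | ((cp >> 6) & 0x3F));
            out[3] = @intCast(0x80 | (cp & 0x3F));
            return 4;
        },
        else => return null, // above U+10FFFF
    }
}

/// Hypothetical helper: decodes exactly one well-formed sequence (1-4 bytes).
/// No validation is done; malformed input gives a meaningless result.
fn decodeSketch(bytes: []const u8) u21 {
    return switch (bytes.len) {
        1 => bytes[0],
        // Keep the payload bits of each byte and shift them into place.
        2 => (@as(u21, bytes[0] & 0x1F) << 6) | (bytes[1] & 0x3F),
        3 => (@as(u21, bytes[0] & 0x0F) << 12) | (@as(u21, bytes[1] & 0x3F) << 6) |
            (bytes[2] & 0x3F),
        4 => (@as(u21, bytes[0] & 0x07) << 18) | (@as(u21, bytes[1] & 0x3F) << 12) |
            (@as(u21, bytes[2] & 0x3F) << 6) | (bytes[3] & 0x3F),
        else => unreachable,
    };
}

pub fn main() void {
    var buf: [4]u8 = undefined;
    const n = encodeSketch(0x1F31F, &buf).?;
    const cp = decodeSketch(buf[0..n]);
    std.debug.print("{d} bytes, U+{X}\n", .{ n, cp }); // prints "4 bytes, U+1F31F"
}
```

Returning an optional from the encoder keeps the error path cheap (no error set), which is one plausible reason an API would expose `null` rather than an error union; whether the library does exactly this is not stated here.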

| Function | Return | Description |
|---|---|---|
| `getCodepointLength` | `u3` | Returns the number of bytes (1-4) needed to encode a codepoint in UTF-8. |
| `getCodepointLengthOrNull` | `?u3` | Returns the number of bytes (1-4) needed to encode a codepoint in UTF-8, or `null` if the codepoint is invalid. |
| `getSequenceLength` | `u3` | Returns the number of bytes (1-4) in a UTF-8 sequence, based on its first byte. |
| `getSequenceLengthOrNull` | `?u3` | Returns the number of bytes (1-4) in a UTF-8 sequence, based on its first byte, or `null` if that byte is invalid. |
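Both kinds of length helper reduce to simple range checks: one on the codepoint value, one on the leading bit pattern of the first byte. The sketch below is an illustration under assumptions, not the library's code; `codepointLengthSketch` and `sequenceLengthSketch` are hypothetical names, both return optionals in the style of the `...OrNull` variants, and treating surrogate halves as unencodable is an assumption of the sketch.

```zig
const std = @import("std");

/// Hypothetical helper: bytes needed to encode `cp` in UTF-8, or null if
/// `cp` is a surrogate half or above U+10FFFF.
fn codepointLengthSketch(cp: u21) ?u3 {
    return switch (cp) {
        0x00...0x7F => 1,
        0x80...0x7FF => 2,
        0xD800...0xDFFF => null, // surrogate halves cannot be encoded
        0x800...0xD7FF, 0xE000...0xFFFF => 3,
        0x10000...0x10FFFF => 4,
        else => null, // beyond U+10FFFF
    };
}

/// Hypothetical helper: length of the sequence that starts with `first_byte`,
/// or null if that byte cannot start a sequence.
fn sequenceLengthSketch(first_byte: u8) ?u3 {
    return switch (first_byte) {
        0x00...0x7F => 1, // 0xxxxxxx
        0xC0...0xDF => 2, // 110xxxxx
        0xE0...0xEF => 3, // 1110xxxx
        0xF0...0xF7 => 4, // 11110xxx
        else => null, // 10xxxxxx continuation byte, or 0xF8..0xFF
    };
}

pub fn main() void {
    const star = "🌟"; // stored as the bytes F0 9F 8C 9F
    // `orelse` unwraps the optional; here both inputs are known to be valid.
    const cp_len = codepointLengthSketch(0x1F31F) orelse unreachable;
    const seq_len = sequenceLengthSketch(star[0]) orelse unreachable;
    std.debug.print("{d} {d}\n", .{ cp_len, seq_len }); // prints "4 4"
}
```

Keeping the two functions apart matters: the byte `0xF0` that starts "🌟", read as the codepoint U+00F0, needs only 2 bytes, whereas as a leading byte it announces a 4-byte sequence. Stricter decoders additionally reject the leading bytes `0xC0`, `0xC1` and `0xF5` to `0xF7`, which can only begin overlong or out-of-range sequences.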

| Function | Return | Description |
|---|---|---|
| `isValidSlice` | `bool` | Returns `true` if the provided slice contains only valid UTF-8 sequences. |
| `isValidCodepoint` | `bool` | Returns `true` if the provided codepoint is valid for UTF-8 encoding. |
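The table does not spell out what counts as "valid", so the sketch below assumes the strict definition from the Unicode Standard and RFC 3629: a codepoint is valid if it is at most U+10FFFF and not a surrogate half, and a slice is valid if every sequence is well formed, with no overlong forms, no encoded surrogates, and no truncated tail. `isValidCodepointSketch` and `isValidSliceSketch` are hypothetical helpers, not the library's functions.

```zig
const std = @import("std");

/// Hypothetical helper: true if `cp` is a Unicode scalar value
/// (at most U+10FFFF and not a surrogate half).
fn isValidCodepointSketch(cp: u21) bool {
    return cp <= 0x10FFFF and !(cp >= 0xD800 and cp <= 0xDFFF);
}

/// Hypothetical helper: strict validation following the Unicode Standard's
/// table of well-formed UTF-8 byte sequences.
fn isValidSliceSketch(s: []const u8) bool {
    // Sequence length plus the allowed range of its second byte.
    const Rule = struct { len: usize, lo: u8 = 0x80, hi: u8 = 0xBF };
    var i: usize = 0;
    while (i < s.len) {
        const b0 = s[i];
        if (b0 < 0x80) { // ASCII: single byte
            i += 1;
            continue;
        }
        const rule: Rule = switch (b0) {
            0xC2...0xDF => .{ .len = 2 },
            0xE0 => .{ .len = 3, .lo = 0xA0 }, // no overlong 3-byte forms
            0xE1...0xEC, 0xEE, 0xEF => .{ .len = 3 },
            0xED => .{ .len = 3, .hi = 0x9F }, // no encoded surrogates
            0xF0 => .{ .len = 4, .lo = 0x90 }, // no overlong 4-byte forms
            0xF1...0xF3 => .{ .len = 4 },
            0xF4 => .{ .len = 4, .hi = 0x8F }, // nothing above U+10FFFF
            else => return false, // stray continuation byte, 0xC0, 0xC1, 0xF5..0xFF
        };
        if (s.len - i < rule.len) return false; // truncated sequence
        const b1 = s[i + 1];
        if (b1 < rule.lo or b1 > rule.hi) return false;
        // Any remaining bytes must be plain continuation bytes (10xxxxxx).
        for (s[i + 2 .. i + rule.len]) |b| {
            if (b < 0x80 or b > 0xBF) return false;
        }
        i += rule.len;
    }
    return true;
}

pub fn main() void {
    // prints "true false"
    std.debug.print("{} {}\n", .{ isValidSliceSketch("héllo 🌟"), isValidCodepointSketch(0xD800) });
}
```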
A quick summary of sample performance test results comparing the `SuperZIG.io.string.utils.utf8` implementation with `std.unicode` from the Zig standard library.

In summary, `io` is up to about 5 times faster than `std` in these benchmarks (from about 2.9x on the smallest workload to about 5.2x on the larger ones), thanks to its optimized implementation. ✨
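For comparison, the same operations look roughly like this with `std.unicode` from the Zig 0.14.0 standard library, which reports failures through error unions rather than optionals. This only illustrates the competing API; it is not the benchmark source, and the exact workload it measures is not reproduced here.

```zig
const std = @import("std");
const unicode = std.unicode;

pub fn main() !void {
    // Decode one complete sequence into a codepoint (error union, not optional).
    const cp = try unicode.utf8Decode("🌟");

    // Encode the codepoint back into a caller-provided buffer.
    var buf: [4]u8 = undefined;
    const n = try unicode.utf8Encode(cp, &buf);

    // Length helpers: codepoint-based and first-byte-based.
    const cp_len = try unicode.utf8CodepointSequenceLength(cp);
    const seq_len = try unicode.utf8ByteSequenceLength(buf[0]);

    // Validation helpers.
    const slice_ok = unicode.utf8ValidateSlice(buf[0..n]);
    const cp_ok = unicode.utf8ValidCodepoint(cp);

    std.debug.print("U+{X} n={d} cp_len={d} seq_len={d} slice_ok={} cp_ok={}\n", .{
        cp, n, cp_len, seq_len, slice_ok, cp_ok,
    });
}
```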
Results in `ReleaseSafe` mode (`zig build run --release=safe -- utf8`):

| Benchmark | Runs | Total Time | Avg Time | Speed |
|---|---|---|---|---|
| std_x10 | 100000 | 92.7ms | 927ns | x1.00 |
| io_x10 | 100000 | 31.9ms | 319ns | x2.91 |
| std_x100 | 21485 | 1.959s | 91.188us | x1.00 |
| io_x100 | 96186 | 1.997s | 20.768us | x4.39 |
| std_x1000 | 218 | 2.067s | 9.482ms | x1.00 |
| io_x1000 | 961 | 1.87s | 1.946ms | x4.87 |

Results in `ReleaseFast` mode (`zig build run --release=fast -- utf8`):

| Benchmark | Runs | Total Time | Avg Time | Speed |
|---|---|---|---|---|
| std_x10 | 100000 | 102.6ms | 1.026us | x1.00 |
| io_x10 | 100000 | 29.1ms | 291ns | x3.53 |
| std_x100 | 20653 | 1.915s | 92.771us | x1.00 |
| io_x100 | 100000 | 1.796s | 17.962us | x5.16 |
| std_x1000 | 232 | 2.028s | 8.742ms | x1.00 |
| io_x1000 | 1176 | 2.07s | 1.76ms | x4.96 |

It is normal for the values to differ between runs, but the relative speedups generally stay close to those shown.
The benchmarks were run on Windows 11 (version 24H2) with an 11th Gen Intel® Core™ i5-1155G7 × 8 processor and 32 GB of RAM.
The Zig version used is 0.14.0.
The source code of this benchmark is in `bench/string/utils/utf8.zig`.