cpu.mode fastest code on the internet
01 specification

Alyze UAX #29 Word Break

Implement turbopuffer/alyze's UAX #29 word-break tokenizer. The judge builds your source as libsolution.so and calls the exported function from a verifier executable.

The required symbol is extern "C" uint64_t alyze_word_break_v1(const uint8_t *input, uint64_t input_len, AlyzeBreak *output, uint64_t output_cap).
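The raw pointer and length pairs in this signature are usually turned into Rust slices before any tokenizing happens. The sketch below is minimal and rests on a few assumptions. The specification does not say whether a zero-length input can arrive as a null pointer, so both helpers guard against that case. It also assumes `output_cap` counts AlyzeBreak records rather than bytes. `input_bytes` and `output_records` are hypothetical helper names.

```rust
#[repr(C)]
#[derive(Clone, Copy, Debug, Default, PartialEq)]
pub struct AlyzeBreak {
    pub byte_offset: u32,
    pub flags: u8,
    pub reserved: [u8; 3],
}

/// Hypothetical helper: view the input buffer as a byte slice.
/// `slice::from_raw_parts` requires a non-null pointer even for length 0,
/// so a null pointer or zero length yields an empty slice instead.
///
/// # Safety
/// If non-null, `input` must point to `input_len` readable bytes that outlive `'a`.
pub unsafe fn input_bytes<'a>(input: *const u8, input_len: u64) -> &'a [u8] {
    if input.is_null() || input_len == 0 {
        return &[];
    }
    // Assumes input_len fits in usize (true on 64-bit targets).
    unsafe { std::slice::from_raw_parts(input, input_len as usize) }
}

/// Hypothetical helper: view the output buffer as a mutable slice of records.
/// Assumes `output_cap` is a record count, not a byte count.
///
/// # Safety
/// If non-null, `output` must point to `output_cap` writable records that outlive `'a`.
pub unsafe fn output_records<'a>(output: *mut AlyzeBreak, output_cap: u64) -> &'a mut [AlyzeBreak] {
    if output.is_null() || output_cap == 0 {
        return &mut [];
    }
    unsafe { std::slice::from_raw_parts_mut(output, output_cap as usize) }
}
```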

For every breakpoint, write one 8-byte AlyzeBreak record: a little-endian u32 byte_offset, a u8 flags, and three reserved bytes set to zero. Bit 0 of flags is word_like and bit 1 is ascii. Both match alyze's TokenProperties for the span ending at that breakpoint.
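The record layout can be pinned down in code. On a little-endian target, the `#[repr(C)]` struct from the starter already has exactly this byte layout, so writing structs directly into `output` is equivalent. The sketch below adds a compile-time size check and spells out the byte order explicitly. `WORD_LIKE`, `ASCII`, `new` and `to_bytes` are hypothetical names, not part of alyze or the judge.

```rust
/// Flag bits from the specification: bit 0 = word_like, bit 1 = ascii.
pub const WORD_LIKE: u8 = 1 << 0;
pub const ASCII: u8 = 1 << 1;

#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct AlyzeBreak {
    pub byte_offset: u32,
    pub flags: u8,
    pub reserved: [u8; 3],
}

// The verifier reads 8-byte records; fail compilation if the layout drifts.
const _: () = assert!(std::mem::size_of::<AlyzeBreak>() == 8);

impl AlyzeBreak {
    /// Hypothetical constructor: packs the two properties into `flags`
    /// and zeroes the reserved bytes.
    pub fn new(byte_offset: u32, word_like: bool, ascii: bool) -> Self {
        let mut flags = 0;
        if word_like {
            flags |= WORD_LIKE;
        }
        if ascii {
            flags |= ASCII;
        }
        AlyzeBreak { byte_offset, flags, reserved: [0; 3] }
    }

    /// The record as raw bytes: little-endian u32 offset, then flags,
    /// then the three reserved bytes.
    pub fn to_bytes(self) -> [u8; 8] {
        let mut out = [0u8; 8];
        out[..4].copy_from_slice(&self.byte_offset.to_le_bytes());
        out[4] = self.flags;
        out[5..].copy_from_slice(&self.reserved);
        out
    }
}
```

For example, a word-like ASCII span ending at offset 0x01020304 encodes as the bytes `04 03 02 01 03 00 00 00`.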

Each input is a single valid UTF-8 slice taken from a large Wikipedia-derived text corpus. The verifier compares every emitted breakpoint record exactly against the output of alyze 0.1.3.
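Every input is guaranteed to be valid UTF-8, so a solution may skip re-validation and decode code points directly. A word-break state machine typically consumes (byte offset, code point) pairs. The sketch below uses the standard library's `str::from_utf8_unchecked` and `char_indices` to produce them. `code_points` is a hypothetical helper. The debug-only check is an optional safety net and is not part of the specification. A speed-focused solution would likely decode inline rather than collect into a `Vec`.

```rust
/// Hypothetical helper: decode code points together with their starting byte offsets.
///
/// # Safety
/// `bytes` must be valid UTF-8, which the specification guarantees for every input.
pub unsafe fn code_points(bytes: &[u8]) -> Vec<(usize, char)> {
    // In debug builds, verify the guarantee instead of trusting it.
    debug_assert!(std::str::from_utf8(bytes).is_ok());
    // Skips the O(n) validation pass that std::str::from_utf8 would perform.
    let text = unsafe { std::str::from_utf8_unchecked(bytes) };
    text.char_indices().collect()
}
```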

minimal Rust starter (ABI skeleton only)
#[repr(C)]
#[derive(Clone, Copy)]
pub struct AlyzeBreak {
    pub byte_offset: u32,
    pub flags: u8,
    pub reserved: [u8; 3],
}

// Flags:
//   bit 0 = the span ending at this breakpoint is word-like
//   bit 1 = the span ending at this breakpoint is ASCII
//
// The required behavior matches alyze::uax29::word::tokenize. This starter is
// only an ABI skeleton; a complete solution must implement the Unicode word
// boundary DFA and TokenProperties.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn alyze_word_break_v1(
    _input: *const u8,
    _input_len: u64,
    _output: *mut AlyzeBreak,
    _output_cap: u64,
) -> u64 {
    0
}
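The skeleton above leaves segmentation entirely to the solver. The sketch below shows how breakpoints and flags fit together, using a deliberately simplified segmenter. It is NOT UAX #29 and will not match alyze 0.1.3. It treats runs of ASCII letters, digits and underscores as word-like segments and keeps runs of ASCII spaces together. Every other code point becomes its own segment, including non-ASCII letters. The sketch also makes several assumptions:

- Each segment produces one record at its end offset, with no record at offset 0.
- The ascii flag means every byte of the span is below 0x80.

`toy_word_breaks` is a hypothetical name.

```rust
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct AlyzeBreak {
    pub byte_offset: u32,
    pub flags: u8,
    pub reserved: [u8; 3],
}

const WORD_LIKE: u8 = 1; // bit 0
const ASCII: u8 = 2; // bit 1

fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Toy segmenter (NOT UAX #29): returns one record per segment, at the segment's end.
pub fn toy_word_breaks(text: &str) -> Vec<AlyzeBreak> {
    let bytes = text.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        let start = i;
        let b = bytes[i];
        let word_like = is_word_byte(b);
        if word_like {
            // Extend over a run of ASCII word bytes.
            while i < bytes.len() && is_word_byte(bytes[i]) {
                i += 1;
            }
        } else if b == b' ' {
            // Keep a run of spaces together as one segment.
            while i < bytes.len() && bytes[i] == b' ' {
                i += 1;
            }
        } else {
            // Any other code point is its own segment; advance by its UTF-8 length.
            // `i` is always on a char boundary because the branches above only skip ASCII bytes.
            i += text[i..].chars().next().map_or(1, char::len_utf8);
        }
        let mut flags = 0;
        if word_like {
            flags |= WORD_LIKE;
        }
        if bytes[start..i].is_ascii() {
            flags |= ASCII;
        }
        // Offsets are u32 in the record format, so inputs are assumed to be under 4 GiB.
        out.push(AlyzeBreak { byte_offset: i as u32, flags, reserved: [0; 3] });
    }
    out
}
```

For `"hello, world"` this toy emits breakpoints at 5, 6, 7 and 12 with flags 3, 2, 2 and 3. A real solution would replace the branch logic with the Unicode word boundary rules, or a DFA compiled from them, and would derive the flags the way alyze's TokenProperties does.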