JavaScript-Duktape-XS
*
* The main function duk__advance_bytes() is called at least once for every
* token, so it has a major impact on lexer/compiler performance. There are two
* variants for the main duk__advance_bytes() algorithm: a sliding window
* approach which is slightly faster at the cost of larger code footprint,
* and a simple copying one.
*
* Decoding directly from the source string would be another lexing option,
* but the lookup-window approach effectively hides the source string and
* its encoding, which gives more flexibility going forward, e.g. to
* support chunked streaming of source from flash.
*
* Decodes UTF-8/CESU-8 leniently with support for code points from U+0000 to
* U+10FFFF, causing an error if the input is unparseable. Leniency means:
*
* * Unicode code point validation is intentionally not performed,
* except to check that the codepoint does not exceed 0x10ffff.
*
* * In particular, surrogate pairs are allowed and not combined, which
* allows source files to represent all SourceCharacters with CESU-8.
* Broken surrogate pairs are allowed, as ECMAScript does not mandate
*
* * Reject other invalid Unicode sequences (see Wikipedia entry for examples)
* in strict UTF-8 mode.
*
* * Optimize for size. An attempt to use a 16-byte lookup table for the
* first byte resulted in a code size increase, though.
*
* * Is checking against maximum 0x10ffff really useful? 4-byte encoding
* imposes a certain limit anyway.
*
* * Support chunked streaming of source code. Can be implemented either
* by streaming chunks of bytes or chunks of codepoints.
*/
#if defined(DUK_USE_LEXER_SLIDING_WINDOW)
DUK_LOCAL void duk__fill_lexer_buffer(duk_lexer_ctx *lex_ctx, duk_small_uint_t start_offset_bytes) {
duk_lexer_codepoint *cp, *cp_end;
duk_ucodepoint_t x;
duk_small_uint_t contlen;
const duk_uint8_t *p, *p_end;
#if defined(DUK_USE_STRICT_UTF8_SOURCE)
duk_ucodepoint_t mincp;
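The lenient decoding described in the header comment can be illustrated with a small standalone sketch. This is not the Duktape implementation: `lenient_decode()` is a hypothetical helper that decodes one code point at a time, uses plain `<stdint.h>` types instead of the `duk_*` typedefs, and signals failure with a return value of 0 rather than throwing an error. It assumes only what the comment states: continuation bytes must be well formed, surrogate code points pass through uncombined, and the only code point check is the upper bound 0x10ffff.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Duktape): decode one code point from
 * [p, p_end). Returns the number of bytes consumed, or 0 if the input is
 * unparseable. */
static size_t lenient_decode(const uint8_t *p, const uint8_t *p_end, uint32_t *out_cp) {
    uint32_t x;
    size_t contlen; /* number of continuation bytes expected */
    size_t i;

    if (p >= p_end) {
        return 0; /* no input left */
    }

    x = p[0];
    if (x < 0x80) {
        contlen = 0;            /* 0xxxxxxx: ASCII */
    } else if (x < 0xc0) {
        return 0;               /* 10xxxxxx: stray continuation byte */
    } else if (x < 0xe0) {
        contlen = 1; x &= 0x1f; /* 110xxxxx */
    } else if (x < 0xf0) {
        contlen = 2; x &= 0x0f; /* 1110xxxx */
    } else if (x < 0xf8) {
        contlen = 3; x &= 0x07; /* 11110xxx */
    } else {
        return 0;               /* 5+ byte forms are not accepted */
    }

    if ((size_t) (p_end - p) < contlen + 1) {
        return 0; /* truncated sequence */
    }

    for (i = 1; i <= contlen; i++) {
        uint8_t y = p[i];
        if ((y & 0xc0) != 0x80) {
            return 0; /* not a continuation byte */
        }
        x = (x << 6) | (uint32_t) (y & 0x3f);
    }

    /* The only code point validation: the upper bound. Surrogates
     * (U+D800 to U+DFFF) are deliberately passed through as-is. */
    if (x > 0x10ffffUL) {
        return 0;
    }

    *out_cp = x;
    return contlen + 1;
}
```

With this sketch, the CESU-8 bytes `ED A0 BD` decode to the lone surrogate U+D83D instead of being rejected, while `F4 90 80 80` (which would be 0x110000) fails the range check.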