ESP-MicroQuickJS
============

## Overview
**ESP-MQuickJS** is a port of the [MicroQuickJS](https://bellard.org/mquickjs/) (MQuickJS) JavaScript engine to the **Espressif ESP-IDF** platform.

MicroQuickJS is a tiny JavaScript engine designed for embedded systems, capable of running with as little as **10 kB of RAM** and roughly 100 kB of ROM. This project packages it as an **ESP-IDF Component**, making it easy to integrate lightweight JavaScript scripting capabilities into your ESP32, ESP32-C3, ESP32-S3, and other ESP-IDF based projects.

### Key Features
*   **ESP-IDF Component Support**: Ready to be used with the IDF Component Manager or as a submodule.
*   **Extremely Low Footprint**: Ideal for resource-constrained ESP chips.
*   **ES5 Subset**: Supports a strict subset of ES5, optimized for reliability and size.
*   **Native Integration**: (Planned/Implemented) Easy binding between C functions and JavaScript.

*This project is based on the original work by Fabrice Bellard and Charlie Gordon.*

## Introduction

MicroQuickJS (aka. MQuickJS) is a JavaScript engine targeted at
embedded systems. It compiles and runs JavaScript programs using as little
as 10 kB of RAM. The whole engine requires about 100 kB of ROM (ARM
Thumb-2 code) including the C library. The speed is comparable to
QuickJS.

MQuickJS only supports a [subset](#javascript-subset-reference) of JavaScript close to ES5. It
implements a **stricter mode** where some error prone or inefficient
JavaScript constructs are forbidden.

Although MQuickJS shares much code with QuickJS, its internals are
different in order to consume less memory. In particular, it relies on
a tracing garbage collector, the VM does not use the CPU stack, and
strings are stored in UTF-8.

## REPL

The REPL is `mqjs`. Usage:

```
usage: mqjs [options] [file [args]]
-h  --help            list options
-e  --eval EXPR       evaluate EXPR
-i  --interactive     go to interactive mode
-I  --include file    include an additional file
-d  --dump            dump the memory usage stats
    --memory-limit n  limit the memory usage to 'n' bytes
--no-column           no column number in debug information
-o FILE               save the bytecode to FILE
-m32                  force 32 bit bytecode output (use with -o)
-b  --allow-bytecode  allow bytecode in input file
```

Compile and run a program using 10 kB of RAM:

```sh
./mqjs --memory-limit 10k tests/mandelbrot.js
```


In addition to normal script execution, `mqjs` can write the compiled
bytecode to persistent storage (file or ROM):

```sh
./mqjs -o mandelbrot.bin tests/mandelbrot.js
```

Then you can run the compiled bytecode as a normal script:

```sh
./mqjs -b mandelbrot.bin
```

The bytecode format depends on the endianness and word length (32 or
64 bit) of the CPU. On a 64 bit CPU, it is possible to use the option
`-m32` to generate 32 bit bytecode that can run on an embedded 32 bit
system.

Use the option `--no-column` to remove the column number debug info
(only line numbers remain) if you want to save some storage.

## Stricter mode

MQuickJS only supports a subset of JavaScript (mostly ES5). It is
always in **stricter** mode where some error prone JavaScript features
are disabled. The general idea is that the stricter mode is a subset
of JavaScript, so it still works as usual in other JavaScript
engines. Here are the main points:

- Only **strict mode** constructs are allowed, hence no `with` keyword
  and global variables must be declared with the `var` keyword.

- Arrays cannot have holes. Writing an element after the end is not
  allowed:
```js
    var a = [];
    a[0] = 1; // OK to extend the array length
    a[10] = 2; // TypeError
```
  If you need an array like object with holes, use a normal object
  instead:
```js
    var a = {};
    a[0] = 1;
    a[10] = 2;
```
  `new Array(len)` still works as expected, but the array elements are
  initialized to `undefined`.
  Array literals with holes are a syntax error:
```js
    [ 1, , 3 ] // SyntaxError
```
- Only global `eval` is supported, so it can neither access nor modify
  local variables:
```js
    eval('1 + 2'); // forbidden
    (1, eval)('1 + 2'); // OK
```
- No value boxing: `new Number(1)` is not supported and never
  necessary.
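
The rules above can be exercised with a short script. This is a minimal sketch that tries to stay inside the stricter subset, so it should also run unchanged in other engines such as Node.js. The helper names `makeFilled` and `evalGlobally` are hypothetical and not part of MQuickJS.

```js
"use strict";

// Build a dense array by always writing at index === length,
// which is the only out-of-bound write the stricter mode allows.
function makeFilled(len, value) {
    var a = [];
    var i;
    for (i = 0; i < len; i++) {
        a[a.length] = value; // extends the array by exactly one element
    }
    return a;
}

// Sparse data goes into a plain object instead of an array with holes.
var sparse = {};
sparse[0] = 1;
sparse[10] = 2;

// Indirect (global) eval: the source is evaluated in the global scope,
// so it cannot see the local variables of the caller.
function evalGlobally(src) {
    return (1, eval)(src);
}

var filled = makeFilled(3, 0);
var sum = evalGlobally("1 + 2");
```

Writing at `a.length` rather than at a precomputed index keeps the array dense by construction, which is the property the stricter mode enforces.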

## JavaScript Subset Reference
 
- Only strict mode is supported with emphasis on ES5 compatibility.

- `Array` objects:

    - They have no holes.
    
    - Numeric properties are always handled by the array object and not
      forwarded to its prototype.
  
    - Out-of-bound sets are an error except when they are at the end of
      the array.
      
    - The `length` property is a getter/setter in the array prototype.

- All properties are writable, enumerable and configurable.

- `for in` only iterates over the object own properties. It should be
  used with this common pattern to have a consistent behavior with
  standard JavaScript:
  
```js
    for(var prop in obj) {
        if (obj.hasOwnProperty(prop)) {
            ...
        }
    }
```    
Prefer `for of` where possible. It is supported on arrays, so it can iterate over the result of `Object.keys()`:

```js
    for(var prop of Object.keys(obj)) {
        ...
    }
```

- `prototype`, `length` and `name` are getters/setters in function objects.

- C functions cannot have their own properties (but C constructors
  behave as expected).

- The global object is supported, but its use is discouraged. It
  cannot contain getter/setters and properties directly created in it
  are not visible as global variables in the executing script.

- The variable associated with the `catch` keyword is a normal
  variable.

- Direct `eval` is not supported. Only indirect (=global) `eval` is
  supported.

- No value boxing (e.g. `new Number(1)` is not supported).

- Regexp:

    - Case folding only works with ASCII characters.

    - The matching is always Unicode based, i.e. `/./` matches a Unicode
      code point instead of a UTF-16 code unit, as with the `u` flag.

- String: `toLowerCase` / `toUpperCase` only handle ASCII characters.

- Date: only `Date.now()` is supported.
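
As an illustration of two of the points above (own-property iteration and `Date.now()` being the only `Date` facility), here is a minimal sketch. The function names `describe` and `elapsedMs` are hypothetical helpers introduced for this example only.

```js
"use strict";

// Collect "key=value" pairs for the object's own properties only.
function describe(obj) {
    var parts = [];
    var keys = Object.keys(obj); // own enumerable properties, as an array
    for (var key of keys) {      // for-of over an array
        parts.push(key + "=" + obj[key]);
    }
    return parts.join(",");
}

// With only Date.now() available, elapsed time is a plain subtraction.
function elapsedMs(fn) {
    var start = Date.now();
    fn();
    return Date.now() - start;
}

var text = describe({ x: 1, y: 2 });
var ms = elapsedMs(function () { describe({ a: 1 }); });
```

Because `Object.keys()` returns a real array, this pattern behaves the same way in MQuickJS and in a standard engine, without the `hasOwnProperty` guard that `for in` needs.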

ES5 extensions:
  
- `for of` is supported but iterates only over arrays. No custom
   iterator is supported (yet).

- Typed arrays.

- `\u{hex}` is accepted in string literals.

- Math functions: `imul`, `clz32`, `fround`, `trunc`, `log2`, `log10`.

- The exponentiation operator (`**`).

- Regexp: the dotall (`s`), sticky (`y`) and unicode (`u`) flags are
  accepted. In unicode mode, the unicode properties are not supported.

- String functions: `codePointAt`, `replaceAll`, `trimStart`, `trimEnd`.

- The `globalThis` global property.
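
The following sketch touches several of the extensions listed above in one place. It uses only standard JavaScript features, so it can be checked in any recent engine; whether each call behaves identically in a given MQuickJS build is an assumption to verify on the target.

```js
"use strict";

var product = Math.imul(0x7fffffff, 2);   // 32 bit integer multiply, wraps around
var leading = Math.clz32(1);              // count of leading zero bits in 32 bits
var power = 2 ** 10;                      // exponentiation operator
var face = "\u{1F600}";                   // code point escape in a string literal
var codePoint = face.codePointAt(0);      // full code point, not a surrogate
var replaced = "a-b-c".replaceAll("-", "+");
var trimmed = "  x  ".trimStart().trimEnd();

// Typed array: values are stored modulo 256 in a Uint8Array.
var bytes = new Uint8Array(4);
bytes[0] = 300;

// Sticky flag: the match must start exactly at lastIndex.
var sticky = /b/y;
sticky.lastIndex = 1;
var stickyHit = sticky.test("ab");
```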

## C API

### Engine initialization

MQuickJS has almost no dependency on the C library. In particular it
does not use `malloc()`, `free()` or `printf()`. When creating a
MQuickJS context, a memory buffer must be provided. The engine only
allocates memory in this buffer:
```c
    JSContext *ctx;
    uint8_t mem_buf[8192];
    ctx = JS_NewContext(mem_buf, sizeof(mem_buf), &js_stdlib);
    ...
    JS_FreeContext(ctx);
```
`JS_FreeContext(ctx)` is only necessary to call the finalizers of user
objects as no system memory is allocated by the engine.

### Memory handling

The C API is very similar to QuickJS (see `mquickjs.h`). However,
since there is a compacting garbage collector, there are important
differences:

1. Explicitly freeing values is not necessary (no `JS_FreeValue()`).

2. The address of objects can move each time a JS allocation is
performed. The general rule is to avoid having variables of type
`JSValue` in C. They may be present only for temporary use between
MQuickJS API calls. In the other cases, always use a pointer to a
`JSValue`. `JS_PushGCRef()` returns a pointer to a temporary opaque
`JSValue` stored in a `JSGCRef` variable. `JS_PopGCRef()` must be used
to release the temporary reference. The opaque value in `JSGCRef` is
automatically updated when objects move. Example:

```c
JSValue my_js_func(JSContext *ctx, JSValue *this_val, int argc, JSValue *argv)
{
        JSGCRef obj1_ref, obj2_ref;
        JSValue *obj1, *obj2, ret;

        ret = JS_EXCEPTION;
        obj1 = JS_PushGCRef(ctx, &obj1_ref);
        obj2 = JS_PushGCRef(ctx, &obj2_ref);
        *obj1 = JS_NewObject(ctx);
        if (JS_IsException(*obj1))
            goto fail;
        *obj2 = JS_NewObject(ctx); // obj1 may move
        if (JS_IsException(*obj2))
            goto fail;
        JS_SetPropertyStr(ctx, *obj1, "x", *obj2);  // obj1 and obj2 may move
        ret = *obj1;
     fail:
        // Release the temporary references in reverse order of the pushes.
        JS_PopGCRef(ctx, &obj2_ref);
        JS_PopGCRef(ctx, &obj1_ref);
        return ret;
}
```

When running on a PC, the `DEBUG_GC` define can be used to force the
JS allocator to move objects at every allocation. It is a good
way to check that no invalid `JSValue` is used.

### Standard library

The standard library is compiled by a custom tool (`mquickjs_build.c`)
to C structures that may reside in ROM. Hence the standard library
instantiation is very fast and requires almost no RAM. An example of
standard library for `mqjs` is provided in `mqjs_stdlib.c`. The result
of its compilation is `mqjs_stdlib.h`.

`example.c` is a complete example using the MQuickJS C API.

### Persistent bytecode

The bytecode generated by `mqjs` may be executed from ROM. In this
case, it must be relocated before being flashed into ROM (see
`JS_RelocateBytecode()`). It is then instantiated with
`JS_LoadBytecode()` and run as a normal script with `JS_Run()` (see
`mqjs.c`).

As with QuickJS, no backward compatibility is guaranteed at the
bytecode level. Moreover, the bytecode is not verified before being
executed. Only run JavaScript bytecode from trusted sources.

### Mathematical library and floating point emulation

MQuickJS contains its own tiny mathematical library (in
`libm.c`). Moreover, in case the CPU has no floating point support, it
contains its own floating point emulator which may be smaller than the
one provided with the GCC toolchain.

## Internals and comparison with QuickJS

### Garbage collection

A tracing and compacting garbage collector is used instead of
reference counting. It allows smaller objects. The GC adds an overhead
of a few bits per allocated memory block. Moreover, memory
fragmentation is avoided.

The engine has its own memory allocator and does not depend on the C
library malloc.

### Value and object representation

The value has the same size as a CPU word (hence 32 bits on a 32 bit
CPU). A value may contain:

  - a 31 bit integer (1 bit tag)

  - a single Unicode code point (hence a string of one or two 16 bit code units)

  - a 64 bit floating point number with a small exponent (only with 64 bit CPU words)

  - a pointer to a memory block. Memory blocks have a tag stored in
    memory.

JavaScript objects require at least 3 CPU words (hence 12 bytes on a
32 bit CPU). Additional data may be allocated depending on the object
class. The properties are stored in a hash table. Each property
requires at least 3 CPU words. Properties may reside in ROM for
standard library objects.
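
The figures above give a lower bound that is easy to compute. The sketch below is only that arithmetic: it assumes 3 words per object and 3 words per property as stated, and deliberately ignores class-specific data, hash table overhead and the storage of the values themselves. The function name `minObjectBytes` is hypothetical.

```js
"use strict";

// Lower bound only: 3 words for the object plus 3 words per property.
function minObjectBytes(propertyCount, wordBytes) {
    var OBJECT_WORDS = 3;
    var PROPERTY_WORDS = 3;
    return (OBJECT_WORDS + PROPERTY_WORDS * propertyCount) * wordBytes;
}

var emptyOn32 = minObjectBytes(0, 4); // empty object, 32 bit CPU
var pointOn32 = minObjectBytes(2, 4); // e.g. { x, y } on a 32 bit CPU
var pointOn64 = minObjectBytes(2, 8); // same object with 64 bit words
```

On a 32 bit target this gives at least 12 bytes for an empty object and at least 36 bytes for an object with two properties, which is a useful first estimate when sizing the buffer passed to `JS_NewContext()`.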

Property keys are JSValues unlike QuickJS where they have a specific
type. They are either a string or a positive 31 bit integer. String
property keys are internalized (unique).

Strings are internally stored in WTF-8 (UTF-8 + unpaired surrogates)
instead of 8 or 16 bit arrays in QuickJS. Surrogate pairs are not
stored explicitly but are still visible when iterating through 16 bit
code units in JavaScript. Hence full compatibility with JavaScript and
UTF-8 is maintained.
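
The script below shows what this means from the JavaScript side: a code point outside the Basic Multilingual Plane still appears as two 16 bit code units, even though the text states that it is stored as a single UTF-8 sequence. The helper `utf8Length` is a hypothetical function that applies the standard UTF-8 length rules; it does not inspect the engine's actual storage.

```js
"use strict";

var s = "a\u{1F600}";

var units = s.length;        // 16 bit code units seen by JavaScript
var high = s.charCodeAt(1);  // high surrogate of U+1F600
var low = s.charCodeAt(2);   // low surrogate of U+1F600
var cp = s.codePointAt(1);   // the full code point

// Number of bytes a code point occupies in UTF-8.
function utf8Length(codePoint) {
    if (codePoint < 0x80) return 1;
    if (codePoint < 0x800) return 2;
    if (codePoint < 0x10000) return 3;
    return 4;
}

// Bytes needed for the whole string when encoded as UTF-8.
var storedBytes = utf8Length(s.codePointAt(0)) + utf8Length(cp);
```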

C Functions can be stored as a single value to reduce the overhead. In
this case, no additional properties can be added. Most standard
library functions are stored this way.

### Standard library

The whole standard library resides in ROM. It is generated at compile
time. Only a few objects are created in RAM. Hence the engine
instantiation time is very low.

### Bytecode

It is a stack based bytecode (similar to QuickJS). However, the
bytecode references atoms through an indirect table.

Line and column number information is compressed with 
[exponential-Golomb codes](https://en.wikipedia.org/wiki/Exponential-Golomb_coding).
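
For readers unfamiliar with the scheme, here is a minimal sketch of order-0 exponential-Golomb coding for non-negative integers. It works on strings of `"0"` and `"1"` for readability. The exact variant used by the engine (order, handling of signed deltas, bit packing) is not described in this document, so the functions below are illustrative and not the engine's implementation.

```js
"use strict";

// Encode a non-negative integer as an order-0 exponential-Golomb bit string.
function expGolombEncode(n) {
    var bits = (n + 1).toString(2);      // binary form of n + 1
    var prefix = "";
    for (var i = 1; i < bits.length; i++) {
        prefix += "0";                   // one zero per bit after the first
    }
    return prefix + bits;
}

// Decode one value from the start of a bit string.
function expGolombDecode(str) {
    var zeros = 0;
    while (str[zeros] === "0") {
        zeros++;                         // count the zero prefix
    }
    var bits = str.slice(zeros, 2 * zeros + 1);
    return parseInt(bits, 2) - 1;
}

var codeForThree = expGolombEncode(3);
var roundTrip = expGolombDecode(expGolombEncode(41));
```

Small values get short codes (0 takes a single bit), which suits line and column deltas that are usually small.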

### Compilation

The parser is very close to the QuickJS one but it avoids recursion so
the C stack usage is bounded. There is no abstract syntax tree. The
bytecode is generated in one pass with several tricks to optimize it
(QuickJS has several optimization passes).

## Tests and benchmarks

Running the basic tests:
```sh
make test
```

Running the QuickJS micro benchmark:
```sh
make microbench
```

Additional tests and a patched version of the Octane benchmark running
in stricter mode can be downloaded
[here](https://bellard.org/mquickjs/mquickjs-extras.tar.xz).

Running the V8 Octane benchmark:
```sh
make octane
```

## License

MQuickJS is released under the MIT license.

Unless otherwise specified, the MQuickJS sources are copyright Fabrice
Bellard and Charlie Gordon.

To add this component to your project, run `idf.py add-dependency "makgordon/esp-mquickjs^0.1.0-beta"`.
