The name is a blend: Quick + C(K)aml, a parody of Objective Caml (OCaml). Interestingly, although Caml is regarded as a member of the ML family, the two "ML"s don't share the same meaning: in Caml it stands for Machine Language rather than Meta Language. The name expresses my wish for a quick language in ML style. Oh, that will be a hard task.
Currently, we have an interpreter that executes compiled QuicKaml bytecode. It is fairly well known that there are technical obstacles to writing a fast interpreter in pure standard C. For example, register allocation is usually poor in a large function with many hot and cold paths (as pointed out by Mike Pall). My interpreter uses the same trick described here to solve this problem. The basic idea is to split the interpreting code for the bytecode into multiple functions, instead of one large interpreter loop (usually written with a switch or computed goto). The difference between this approach and subroutine threading is that instruction dispatch is done with tail calls guaranteed by the "musttail" attribute, which the compiler optimizes into plain jumps, so we save the cost of function call and return. I think the main benefit is that we now effectively have a "macro function" in which each "macro basic block" (which is actually a function) has its own context for register allocation and branch reordering, and inter-block arguments are passed through a common interface fixed by the calling convention.

The only thing I can optimize further is to give the existing compiler a calling convention better suited to our interpreter. I added a new calling convention that passes as many arguments as possible in registers to a branch of LLVM. I never expected such a change to be accepted upstream. A workaround is to use a custom LLVM build to compile the interpreter code into assembly before distributing the source code; later, a normal compiler can link it against the other parts.
The source code is on GitHub.