Skip to main content

Converting synced C/C++ into asynced JavaScript

Emscripten is a C/C++ -> LLVM -> JavaScript compiler. It's useful and interesting, but one of its biggest limitations is about the sync/async model: JavaScript is single-threaded event-driven languages, therefore you cannot do a sleep() in the hope of receiving and processing extra events.

Demo


First of all, please enjoy this demo, on which I really spent some time.

tl;dr

  • Motivation: to make `sleep` and other similar functions actually work in our Runtime, without lots of labor work porting the C/C++ code
  • I actually made it worked
  •  `sync->async` is doable and I think it's not too hard - there are already implementations for JS
  • I think this is a feature that cannot be supported by merely an external library
  • Does it fit emscripten? Or is it possible to write a plugin for emscripten?

Demo explained(?)


The demo is basically a stop-time animation:

//draw something
//wait for some while
//draw something else
//wait...


You can verify this in the source code.

We all know that it is hard to convert it into the async model of JavaScript. Now would you please take a look at my ported code, it's almost identical to the original code, except for:
  • A few macros are introduced, which are defined in asyn2.h — actually the macros are pretty much just placeholders. 
  • A js function js_napms defined  — which is a wrapper of setTimeout

I fed emscripten with the code and blablabla — and the demo works. But wait! The code should be identical to the original code, which is synced!
Well, please let me explain a few more things before I reveal the secrets.

Another Demo

Here's another demo, which is... the same as above. So what's the deal?

We may imagine that, to really 'go to sleep', we need to store all the context and restore it when we come back again. Indeed, I did so in the source code, whenever you see a `ASYNC_` macro, it involves pushing and poping to maintain the async stack.

The actual functions behind those macros are defined in async.h.

Well, I'm NOT going to propose a set of API or a library, instead I'm proposing a way of pre-processing the code, and I did that myself manually. It's doable and there're patterns, you may see how a for-loop is broken down according to the comments. I'll put technical details in the end.

The porting experience may not be as smooth as it looks like, actually `xmas` is rather straightforward, where there are rarely recursive for-loops or branches. But if you take a look at other demos, it is a nightmare to define callbacks and maintain the stack manually, just imagine that there's no `call` and `ret ASM macros, and you have to do `push`, `pop` and `jump` manually.

My point is that: the sync->async process can, and should be done by the pre-processor/compiler


The Secrets of the 1st Demo

You didn't skip the previous section did you?

Actually I made the second demo at first, before I knew the the secret weapon — streamlinejs, and here is an intuitive demo.

It's not a library, but a parser/compiler instead. I didn't go too deep into its mechanism, but from the results it generated, the mechanism should be similar as what I'll mentioned below. You may read  this article for more details.

To build the first demo, all the placeholders are replace with underscores, which will be recognized by streamlinejs (as placeholders for calback), fortunately un-optimized JS generated by emscripten can be parsed without any problem — at lesat my demo.

Technical stuffs

Imagine that there a stack dedicated for async function calls, it is different from traditional stacks in that this stack is not cleared when a function exits.

Async function calls are different from (normal) sync funtion calls, an async call pushes the context into the async stack, including the callback (similar as the return address in the synced case) and returns. The central event dispatcher (the JS engine in our case) will call the callback eventually.

So the central idea is to identify all the async function calls, which are usually casuse by two reasons:

  • Calling an async function
  • `jump` over an async call

The first one should be easy: some functions are async natively, e.g. `SDL_Delay`. And if a function calls any other async funtions inside, it is async.

The second one is usually originated from loops and branches, which will be explained later.

I think that these can be identified by the compiler, in one of following stages:

- Pre-processing C/C++ — I did that manually myself
- LLVM bitcode — which I'm not so sure
- JavaScript — streamline itself is an example

There are advantages and disadvantages in different stages, for example it might be easier to optimize the code when parsing the C code; while it may be more light-weighted to store the local variables using JavaScript closures.


Identify and transfrom async functions


Here's an example:


// sync version
void work()
{
    int j = 99;
    SDL_Delay(1000);
    printf("result %d\n", j);
}


Since SDL_Delay is natively async, we have to transform `work` into its async counterpart, as follows:


// async version
// context: stack for async calls
int work(context_t * context)
{
    int j = 99;

    push_callback(context, work__cb1);  // set up callback
    put_variable(context, j); // save local variables

    SDL_Delay(1000, context); // async version of SDL_Delay
 
    return 0; // get out and wait for SDL_Delay
}
int work__cb1(context_t * context)
{
    get_variable(context, j);
    pop(context);  // prepare to return the previous chained callback
    printf("result %d\n", j);
    context->callback();
}
```

For-loops make the situation more complicated, which causes another type of async calls:


int f()
{
    for(int i = 0; i < 10; ++i)
    {
        printf("hi ");
        SDL_Delay(10);
        printf("%d\n", i);
    }
}



f() can be flattened as


int f()
{
   int i = 0;
start:
   if(i >= 10) goto end;
   printf("hi");
   SDL_Delay(10);
   printf("%d\n", i);
   ++ i;
   goto start;
end:
   // nothing
}


Now it is clear that we can split the function and make async calls


int f(context)
{
  int i = 0;
  // save i to the stack
  // async call f_start();
}
int f_start(context)
{
  // restore i
  //pop stack

  if(i >= 10) // async call f_end();

  printf("hi ");

  // save i
  // push f_start2 into the stack
  SDL_Delay(10, context);
  return 0;
}
int f_start2(context)
{
   // restore i
   //pop stack

   printf("%d\n", i);

  // push i
  // async call f_start() to continue the loop
   return 0;
}
int f_end(context)
{
   // pop stack
   // async call callback of f()
}


Braches (if, switch etc)  are similar, as long as we consider them as `goto`'s.


local variables and return values

local variables may be stored and retrieved when we push/pop the async stack,
and so are return values.

Compiler/Preprocessor Integration: Step 1

It should be clear now that this feature is kind of transformation, which cannot be supported by linking to an external library. Of course the pre-condition is that the transformation should be (almost) transparent, it should not be necessary for developers to maintain the stack manually.

The first step, I'd imagine, is that the async functions are explicitly marked through some mechanism. In my example, a placeholder is used.

Developers may still write programs in the sync fashion, for two reasons: one for the convenience writing new program, and the other for porting existing ones.

The compiler should detect, split and setup async functions automatically, the async stack should be managed by standard library while some API might be exposed.

There are two ways  of managing the local variables, let me call them the C style and the JavaScript style:

The C style: Local variables of async functions are stored in dedicated area in the memory (HEAP or a special stack for async functions), instead of the normal stack. To avoid lots of memcpy's, the variables may be directly allocate there. Some push/pop operations may be optimized if the caller/callee is known (e.g. loops/branches)

The JavaScript style: streamlinejs is a good example. Async functions are broken into a series of resursive functions, and local variables are stored into the closures.

The JavaScript style is easy and intuitive, but the hidden overhead might not be negligible. It may be too late to optimize when the LLVM bitcode have been transformed into JavaScript.


Compiler/Preprocessor Integration: Step 2

It might be possible to further reduce the work of writing/porting, as even marking async functions and define the placeholders for every async function declaration and every async function call is boring and error-prone.

My (naive & wild) imagination is that by defining a few essential async functions (such as SDL_Delay), the compiler would automatically recognize async functions, and set up the hidden parameter. It's not perfect, especially when we need to link a number libraries, but at least I think a C/C++ transformer would be possible and nice, perhaps based on LLVM?

Limitations

  • It might not work for muti-threading. Indeed I've been only thinking about single-threaded programs, especially most ones for terminal — But this should not affect the importance of this issue I think.
  • Lots of overhead might be introduced in this way — But I guess the performance should not be affected much if well optimized
  • C++: ctr/copy/dectr  of objects might be a problem, or maybe not since they can be `flattened` into C-style?
  • C++: try-catch might not work, the control flow is already diverted
  • There a few limitations of streamlinejs, but I think many of them can be addressed if we process in the C phase.

Discussion

Please follow this thread on GitHub.

Comments

Popular posts from this blog

Determine Perspective Lines With Off-page Vanishing Point

In perspective drawing, a vanishing point represents a group of parallel lines, in other words, a direction. For any point on the paper, if we want a line towards the same direction (in the 3d space), we simply draw a line through it and the vanishing point. But sometimes the vanishing point is too far away, such that it is outside the paper/canvas. In this example, we have a point P and two perspective lines L1 and L2. The vanishing point VP is naturally the intersection of L1 and L2. The task is to draw a line through P and VP, without having VP on the paper. I am aware of a few traditional solutions: 1. Use extra pieces of paper such that we can extend L1 and L2 until we see VP. 2. Draw everything in a smaller scale, such that we can see both P and VP on the paper. Draw the line and scale everything back. 3. Draw a perspective grid using the Brewer Method. #1 and #2 might be quite practical. #3 may not guarantee a solution, unless we can measure distances/p...

[转] UTF-8 and Unicode FAQ for Unix/Linux

这几天,这个东西把我搞得很头疼 而且这篇文章好像太大了,blogger自己的发布系统不能发 只好用mail了 //原文 http://www.cl.cam.ac.uk/~mgk25/unicode.html UTF-8 and Unicode FAQ for Unix/Linux by Markus Kuhn This text is a very comprehensive one-stop information resource on how you can use Unicode/UTF-8 on POSIX systems (Linux, Unix). You will find here both introductory information for every user, as well as detailed references for the experienced developer. Unicode has started to replace ASCII, ISO 8859 and EUC at all levels. It enables users to handle not only practically any script and language used on this planet, it also supports a comprehensive set of mathematical and technical symbols to simplify scientific information exchange. With the UTF-8 encoding, Unicode can be used in a convenient and backwards compatible way in environments that were designed entirely around ASCII, like Unix. UTF-8 is the way in which Unicode is used under Unix, Linux, and similar systems. It is now time to make sure that you are well familiar ...

Moving Items Along Bezier Curves with CSS Animation (Part 2: Time Warp)

This is a follow-up of my earlier article.  I realized that there is another way of achieving the same effect. This article has lots of nice examples and explanations, the basic idea is to make very simple @keyframe rules, usually just a linear movement, then use timing function to distort the time, such that the motion path becomes the desired curve. I'd like to call it the "time warp" hack. Demo See the Pen Interactive cubic Bezier curve + CSS animation by Lu Wang ( @coolwanglu ) on CodePen . How does it work? Recall that a cubic Bezier curve is defined by this formula : \[B(t) = (1-t)^3P_0+3(1-t)^2tP_1+3(1-t)t^2P_2+t^3P_3,\ 0 \le t \le 1.\] In the 2D case, \(B(t)\) has two coordinates, \(x(t)\) and \(y(t)\). Define \(x_i\) to the be x coordinate of \(P_i\), then we have: \[x(t) = (1-t)^3x_0+3(1-t)^2tx_1+3(1-t)t^2x_2+t^3x_3,\ 0 \le t \le 1.\] So, for our animated element, we want to make sure that the x coordiante (i.e. the "left" CSS property) is \(...