2022-09-22

Thoughts on Herb Sutter's cppfront

Recently I learned about cppfront, an experimental new syntax for C++ by Herb Sutter. It was nicely explained in his CppCon talk, and I really enjoyed watching the video.

Roughly, I'd view it as Rust with C++ interop. It's part syntax sugar, part preprocessor, part dialect. It also comes with an enforced style guide and compiler-aware annotations. Mostly I like it, and I feel excited.

Let me try to explain my thoughts in a logical way.

The Syntax

I don't like it, nor do I hate it. I asked myself: do I dislike it just because I am not familiar with it? The answer is yes, so it's my problem, not cpp2's. It didn't take me too much time to get comfortable (but not fluent) with Rust, so I don't think cpp2 will be a problem.

Comparing with Rust

The following two ideas came up several times while I was reading the docs and watching the video:

  • If something is bad, remove it from the language instead of endlessly teaching "don't do this, don't do that". Examples: NULL, union, unsafe casts.
  • If something is good, let's make it by default. Examples: std::move, std::unique_ptr, [[nodiscard]].

Well, these ideas are not new to me; I have seen almost exactly the same words before. Rust is a good example of a design with such goals in mind.

Suppose we really want to achieve these goals with current C++:

Some may be achieved by adding new flags to existing compilers, say, a new flag to make [[nodiscard]] default for all functions.

Some may be achieved by teaching and engineer-enforced style guides, e.g. the usage of std::move or std::unique_ptr.

Some may be achieved by static analyzers and linters, e.g. detecting use before initialization (only partially achievable).

Some may be achieved by libraries, including macros, e.g. returning multiple variables via tuple or a temporary struct.

However, it'd never be as good as native support in compilers.

A small example is absl::Status in abseil vs the question mark syntax in Rust. A large example would be general memory safety and ownership in Rust.

I always think that C, C++ and Rust are essentially the same. All the technology for building compilers and analyzers is there; it's just a matter of:

  • How much information do we provide to compilers? E.g. ownership, types.
  • How much freedom do we have? E.g. raw pointer arithmetic, arbitrary type casts.

I believe that by inventing enough annotations (e.g. ownership, lifetimes) and limiting some language features (e.g. raw pointer arithmetic), it'd be possible to make C++ "Rust-like". But is it worth it?

The Biggest Obstacle 

I agree with Herb that "backwards compatibility" is the biggest and probably the only obstacle. Again, Rust is a perfect example of a language without such a constraint.

Honestly the usage of "<<" and ">>" for streams already sounds a bit odd, but I've never had strong feelings because that was in the hello world code while I was learning C++, and (luckily?) I didn't have a C background. 

It surprised me that the meaning of "auto" was (successfully) modified in C++11. And the usage of "&&" for rvalue references also sounds a bit strange to me. It feels like the process is "we have a shiny new feature, let's see what existing symbols we can borrow and give a new meaning". A similar case is the usage of "const" at the end of class member function declarations. I remember asking someone, why there? And the answer was: probably they didn't have a better place to put it.

My guess (which could be really random, since I don't really know much about the behind-the-scenes design thoughts) would be that it causes minimal changes to existing compilers. Or even better, no change at all: operator overloading was already a proper C++ feature, so why not repurpose it for streams, since "left shift" does not make sense there anyway? (Therefore nobody does, or should, use it as a shift operator.)

I like the idea that cpp2 and cpp1 can interoperate at the source level. If cpp2 becomes successful, some features may be left in cpp1 as the "unsafe features", like C# and Rust do.

I like Rust, but I've been hesitating to use it because I know most of the time I'd have to start from scratch. C and C++ are good examples here. If I need a PNG library, I'd download libpng, read the doc and start coding. Python is another good example. 

I feel that Rust is still too young. The language itself (mostly syntax & compiler, without the ecosystem) is probably mature enough for the Linux kernel, but I'm not sure about the libraries; after all, I have not seen many lib*-rust packages in the Ubuntu repository.

Yes, I probably should look at the Cargo registry, and yes, my knowledge is quite outdated, so the libraries are probably already enough for me, but I have yet to double-check. And this is the problem: I'm already path-dependent on C++ (and Python), so "easy migration from C++" would be a huge selling point for me.

The Preprocessor

Quoting Herb: "there is no preprocessor in cpp2". But actually I think cppfront itself, or at least part of it, is a preprocessor.

My favorite part is the optional out-of-bounds check for the "x[i]" operator. The check may be turned on or off by compiler flags; it's not as ugly/long as "x.super_safe_index_subscript_with_optional_boundary_check(i)", nor is it as wild as "MAGICAL_MACRO_SUBSCRIPT(x, i)". I also like that it's a loose protocol with the compiler: it automatically works as long as x.size() is defined, somewhat like how range-based for works.

So if we want this feature, either we redefine the "[]" operator in C++, or we translate every single "x[i]" into "x.super_safe_index_subscript_with_optional_boundary_check(i)", which is a preprocessor. (Maybe "compiler" is the correct term, but it feels like a preprocessor to me.)
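As a minimal illustration (the helper name and the flag are made up, and this is not cppfront's actual output), the translated form could look roughly like this in today's C++:

#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical helper: works for anything that defines size() and operator[],
// loosely mirroring the "loose protocol" idea described above.
template <typename C>
decltype(auto) checked_subscript(C&& c, std::size_t i) {
#ifdef ENABLE_BOUNDS_CHECK  // made-up stand-in for a compiler flag
    if (i >= c.size()) {
        std::fprintf(stderr, "out-of-bounds subscript\n");
        std::abort();
    }
#endif
    return std::forward<C>(c)[i];
}

int main() {
    std::vector<int> x{1, 2, 3};
    // A translator could rewrite "x[2]" into the call below:
    return checked_subscript(x, 2);
}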

I still remember reading about a technique in a game programming book, before Lua became popular. The technique was to define a set of C/C++ macros to create a mini scripting language. Engineers could then write lots of game logic in this scripting language, which was way more readable and less error-prone than raw C++. Somehow cppfront resembles that technique a bit.

Conclusions

I like almost everything about cpp2 so far. But it is (too) early, and there could be lots of hidden or implied issues that I don't know about yet, or that have not been revealed yet.

Rust would be a very good competitor here, if only it had C++ interop at the source level. It probably never will, but that's OK.

Carbon sounds very similar as well. Its README mentions the analogy of "JavaScript and TypeScript". Interestingly, Herb mentioned the same thing for cpp2. And this is why I see huge overlaps between cpp2 and Carbon. I do hope the two projects will not diverge too far from each other, such that most knowledge, technologies or code can be shared.

Overall I feel excited about the recent changes and proposals around C++. My overly simplified view is that C++11 was driven by nice features in languages like C# and Java, and this new wave is probably driven by nice features in languages like Go and Rust.

Hopefully, one day, the reason that I like C++ will no longer be Stockholm syndrome.

Cleaning Up Ubuntu Packages

My little server has had Ubuntu Desktop installed all along, but I have never used the GUI since the installation, and the packages it pulls in are sometimes annoying, e.g. the systemd user services shipped with gvfs and tracker, which I had to disable manually for several users.

Originally I wanted to keep the Desktop so that in an emergency I could look up commands in a browser. But as long as I have network access, in the worst case I should be able to temporarily install X and a browser, so it shouldn't be a big problem.

So I decided to convert Ubuntu Desktop into Ubuntu Server, which mostly means removing all the GNOME packages.
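Roughly the kind of commands involved (a sketch from memory, not an exact recipe; package names may differ between releases):

# count installed packages
dpkg -l | grep -c '^ii'

# drop the desktop meta package and the GNOME stack, then clean up leftovers
sudo apt-get purge ubuntu-desktop gnome-shell gvfs tracker
sudo apt-get autoremove --purge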

After some fiddling, the number of installed packages dropped from about 1800 to below 800. So satisfying!

2022-09-20

Moving Items Along Bezier Curves with CSS Animation (Part 2: Time Warp)

This is a follow-up to my earlier article. I realized that there is another way of achieving the same effect.

This article has lots of nice examples and explanations. The basic idea is to write very simple @keyframes rules, usually just a linear movement, and then use the timing function to distort time, such that the motion path becomes the desired curve.

I'd like to call it the "time warp" hack.


Demo




How does it work?


Recall that a cubic Bezier curve is defined by this formula:

\[B(t) = (1-t)^3P_0+3(1-t)^2tP_1+3(1-t)t^2P_2+t^3P_3,\ 0 \le t \le 1.\]

In the 2D case, \(B(t)\) has two coordinates, \(x(t)\) and \(y(t)\). Define \(x_i\) to be the x coordinate of \(P_i\); then we have:

\[x(t) = (1-t)^3x_0+3(1-t)^2tx_1+3(1-t)t^2x_2+t^3x_3,\ 0 \le t \le 1.\]

So, for our animated element, we want to make sure that the x coordinate (i.e. the "left" CSS property) is \(x(t)\) at time \(t\).

Because \(x(0)=x_0\) and \(x(1)=x_3\), we know that the @keyframes rule must be defined as

@keyframes move-x {
  from { left: x0; }
  to { left: x3; }
}
Now to determine the timing function, suppose the function is
cubic-bezier(u1, v1, u2, v2)
Note that this is again a 2D cubic Bezier curve, defined by four points \((0, 0), (u_1, v_1), (u_2, v_2), (1, 1)\). And the function for each coordinate would be:

\[ u(t) = 3(1-t)^2tu_1 + 3(1-t)t^2u_2 + t^3 \]
\[ v(t) = 3(1-t)^2tv_1 + 3(1-t)t^2v_2 + t^3 \]

Recall that, according to the CSS spec, at any time \(t\), the animated value \(\text{left}(t)\) is calculated as:

\[ \text{left}(t) = x_0 + v(t')(x_3 - x_0), \text{where}\ u(t')=t \]

Our first step is to set \(u_1=1/3\) and \(u_2=2/3\), such that \(u(t)=t\) for all \(t\).

Then we set:
\[ v_1 = \frac{x_1-x_0}{x_3-x_0}, v_2 = \frac{x_2-x_0}{x_3-x_0}  \]

This way we have

\[ v(t) = \frac{x(t) - x_0}{x_3-x_0} \]

Combining everything together, we know that if we set the animation-timing-function as

\[ \text{cubic-bezier}(\frac{1}{3}, \frac{x_1-x_0}{x_3-x_0}, \frac{2}{3},  \frac{x_2-x_0}{x_3-x_0}) \]

then we have \(\text{left}(t)=x(t)\) as desired.

Similarly we can define @keyframes and animation-timing-function for \(y(t)\), and then our CSS animation is complete.
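As a concrete sketch, take the (made-up) control points \(P_0=(0,0)\), \(P_1=(60,300)\), \(P_2=(240,0)\), \(P_3=(300,300)\), in px. The x ratios are 60/300 = 0.2 and 240/300 = 0.8, and the y ratios are 300/300 = 1 and 0/300 = 0, so the CSS could look like:

@keyframes move-x {
  from { left: 0px; }
  to { left: 300px; }
}
@keyframes move-y {
  from { top: 0px; }
  to { top: 300px; }
}
.ball {
  position: absolute;
  animation: move-x 3s infinite, move-y 3s infinite;
  /* x: cubic-bezier(1/3, 0.2, 2/3, 0.8), y: cubic-bezier(1/3, 1, 2/3, 0) */
  animation-timing-function:
    cubic-bezier(0.3333, 0.2, 0.6667, 0.8),
    cubic-bezier(0.3333, 1, 0.6667, 0);
}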

Note: obviously the method does not work when \(x_0=x_3\) or \(y_0=y_3\), but in practice we can add a tiny offset in such cases.

Animation Timing


Observe that \(u(t)\) controls the mapping between the animation progress and the variable \(t\) of the curve. \(u_1=1/3\) and \(u_2=2/3\) are chosen to achieve the default linear timing. We can tweak the values of \(u_1\) and \(u_2\) to alter the timing.

Note that the method from the previous article supports any timing function, including "steps()" and "cubic-bezier()".

It's easy to see that a "cubic-bezier(u1, 1/3, u2, 2/3)" timing function for the previous article would be the same as setting the same values of \(u_1\) and \(u_2\) in the "time warp" version. In other words, animation timing is limited here: we only have the input progress mapping, not the output progress mapping.

Of course the reason is we are already using the output progress mapping for the time warp effect.

2022-09-15

Restricting Network Access of Processes

I recently read this article, which talks about restricting (proactive) internet access of a process.

It is easy to completely disable internet/network access, by throwing a process into a new private network namespace. I think all popular sandboxing tools support it nowadays:

  • unshare -n
  • bwrap --unshare-net
  • systemd.service has PrivateNetwork=yes
  • docker has internal network
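For example, a quick sanity check with unshare (example.com is just a placeholder):

# the command runs in a fresh, empty network namespace, so the request fails
sudo unshare -n -- curl -s https://example.com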
But the trickier, and more realistic scenario is:
  • [Inbound] The process needs to listen on one or more ports, and/or
  • [Outbound] The process needs to access one or more specific IP addresses/domains
I can think of a few options.

Option 1: Firewall Rules


Both iptables and nftables support filtering packets by uid and gid. So the steps are clear:
  • Run the process with a dedicated uid and/or gid
  • Filter packets in the firewall
  • If needed, regularly query DNS and update the allowed set of IP addresses.
This option is not very complicated, and I think the overhead is low. While the DNS part is a bit ugly, it is flexible and solves both inbound and outbound filtering.
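For example, the outbound part could be sketched in nftables like this (the uid and addresses are made up):

table inet restrict {
  chain output {
    type filter hook output priority 0; policy accept;
    # the dedicated user (uid 4242) may only reach one address and port
    meta skuid 4242 ip daddr 203.0.113.10 tcp dport 443 accept
    meta skuid 4242 drop
  }
}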

On the other hand, it might be a bit difficult to maintain, because the constraints (firewall rules) and the processes live in different places.

Option 2: Systemd Service with Socket Activation


Recently I've been playing with the sandboxing flags in systemd, especially via systemd-analyze. Our problem can be solved with systemd + socket activation like this:
  • Create my-service.socket that listens on the desired address and port
  • Create my-service.service for the process, with PrivateNetwork=yes.
    • The process has no access to network, it receives a socket from systemd instead, i.e. socket activation
However, it only works if the process supports socket activation. If not, there is a handy tool systemd-socket-proxyd designed for this case. There are nice examples in the manual.

I tested the following setup:
  • my-service-proxy.socket, which activates the corresponding service
  • my-service-proxy.service, which runs systemd-socket-proxyd.
    • The service must have PrivateNetwork=yes and JoinsNamespaceOf=my-service.service
  • my-service.service, the real process, with PrivateNetwork=yes
This way, the process can accept connections at a pre-defined address/port, but has no network access otherwise.
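A rough sketch of the three units (the port, the proxyd path and the service command are made up; see the systemd-socket-proxyd manual for the authoritative examples):

# my-service-proxy.socket
[Socket]
ListenStream=0.0.0.0:443

[Install]
WantedBy=sockets.target

# my-service-proxy.service
[Unit]
Requires=my-service.service
After=my-service.service
JoinsNamespaceOf=my-service.service

[Service]
PrivateNetwork=yes
ExecStart=/usr/lib/systemd/systemd-socket-proxyd 127.0.0.1:8080

# my-service.service
[Service]
PrivateNetwork=yes
ExecStart=/usr/bin/my-server --listen 127.0.0.1:8080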

It works for me, but with a few shortcomings:
  • It only worked for system services (running with root systemd). I suspected that it might work with PrivateUsers=yes, but it didn't.
  • It is quite some hassle to write and maintain all these files.
For outbound traffic, systemd can filter by IP addresses, but I'm not sure about ports. For domain filtering, it might be possible to borrow ideas from the other two options, but I suppose it won't be easy.

Option 3: Docker with Proxy


If the process in question is in a Docker container, inbound traffic is already handled by Docker (via iptables rules).

For outbound traffic, the firewall option also works well for IP addresses. Actually it might be easier to filter packets this way.

For domains, there is another interesting solution: use a proxy. Originally I had some vague ideas about this option, then I found this article. I learned a lot from it and I also extended it.

To explain how it works, here's an example docker compose snippet:

networks:
  network-internal:
    internal: true
  network-proxy:
    ...

services:
  my-service:
    # needs to access https://my-domain.com
    networks:
      - network-internal
    ...
  my-proxy:
    # forwards 443 to my-domain.com:443
    networks:
      - network-internal
      - network-proxy
    ...


The idea is that my-service runs in network-internal, which has no Internet access. But my-service may access selected endpoints via my-proxy.

There are two detailed problems to solve:
  • Which proxy to use?
  • How to make my-service talk to my-proxy?


Choosing the Proxy


In the article the author uses nginx. Originally I had thought it'd be a mess of setting up SSL (root) certificates, but later I learned that nginx can act as a stream proxy that forwards TCP/UDP ports, which makes things much easier.

On the other hand, I often use socat to forward ports as well, and it can also be used here.

Comparing both:
  • socat is lighter-weight: the alpine/socat docker image is about 5MB, while the nginx docker image is about 55MB.
  • socat can be configured via command line flags, but nginx needs a configuration file.
  • socat can support only one port, but nginx can manage multiple ports with one instance.
So in practice I'd use socat for one or two ports, but I'd switch to nginx for more. It'd be a hassle to create one container for each port.
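For reference, the two setups could look roughly like this (my-domain.com is a placeholder):

# socat: forward local port 443 to my-domain.com:443
socat TCP-LISTEN:443,fork,reuseaddr TCP:my-domain.com:443

# nginx.conf: the same forwarding as a stream proxy, one server block per port
events {}
stream {
  server {
    listen 443;
    proxy_pass my-domain.com:443;
  }
}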


Enabling the Proxy


If my-service needs to be externally accessible, the ports must be forwarded and exposed by my-proxy.

For outbound traffic, we want to trick my-service, such that it will see my-proxy when it wants to resolve, for example, my-domain.com.

I'm aware of three options:

#1 That article uses links, but this option is designed for inter-container communication, and it is deprecated.

#2 Another option is to assign a static IP to my-proxy, then add an entry to extra_hosts of my-service.

#3 Add an aliases entry of my-proxy on network-internal.

While #3 seems better, it does not just work like that: when my-proxy wants to send the real traffic to my-domain.com, it will actually send it to itself because of the alias.

To fix it, I have a very hacky solution:

networks:
  network-internal:
    internal: true
  network-proxy:
    ...

services:
  my-service:
    networks:
      - network-internal
    ...
  my-proxy1:
    # forwards 443 to my-proxy2:443
    networks:
      network-internal:
        aliases:
          - my-domain.com
      network-proxy:
    ...
  my-proxy2:
    # forwards 443 to my-domain.com:443
    networks:
      - network-proxy
    ...


In this version, my-proxy1 injects the domain and thus hijacks traffic from my-service. Then my-proxy1 forwards traffic to my-proxy2. Finally my-proxy2 forwards traffic to the real my-domain.com. Note that my-proxy2 can correctly resolve the domain because it is not in network-internal.

On the other hand, it might be possible to tweak the process to ignore local hosts, but I'm not aware of any easy solution.

I use #3 in practice, despite it being ugly and hacky, mostly because I don't want to set up a static IP for #2.

One more note on Docker (and Docker Compose): it is possible to specify the network used when building images, which could be handy.
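A minimal sketch in docker-compose.yml (I'd double-check which values work in a given setup, e.g. none, host, or an existing network name):

services:
  my-service:
    build:
      context: .
      # network used by RUN steps during the image build
      network: none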

Conclusions


In practice I use option 3 with a bit of option 1. 

With option 3, if I already have a Docker container/image, it'd be just adding a few lines in docker-compose.yml, maybe plus a short nginx.conf file.

With option 1, the main concern is that the rules may become out of sync with the processes. For example, if the environment of the process changes (e.g. uid, pid, IP address, etc.), I may need to update the firewall rules, and this could easily be missed. So I'd set up firewall rules only for stable services, plus some generic rules.

Option 2 could be useful in some cases, but I don't enjoy writing the service files. And it seems harder to extend (e.g. add a proxy).

2022-09-13

Migrating from iptables to nftables

nftables has been enabled by default in the latest Ubuntu and Debian releases, but it is not yet fully supported by Docker.

I've been hesitating about migrating from iptables to nftables, but I managed to do it today.

Here are my thoughts.

Scripting nftables

The syntaxes of iptables and nftables are different, but not that different; both are more or less human-readable. However, nftables is clearly more friendly for scripting.

I had spent quite some time on a Python script that generates an iptables rule set, and I was worried that I would need lots of time to migrate the script. After studying the syntax of nftables, I realized that I could just write /etc/nftables.conf directly.

In the conf file I can manage tables and chains in a structured way. I'm free to use indentations and new lines, and I no longer need to write "-I CHAIN" for every rule.

Besides, I can group similar rules (e.g. same rule for different tcp ports) easily, and I can define variables and reuse them. 

Eventually I was able to write a nice nftables rule set quickly with basic scripting syntax. It is not as powerful as my custom Python script, but it is definitely easier to write. Further, I think it might be worth learning maps in the future.
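As an illustration (a minimal sketch, not my actual rule set), variables and grouped ports look like this:

#!/usr/sbin/nft -f
flush ruleset

define web_ports = { 80, 443 }
define lan = 192.168.1.0/24

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif "lo" accept
    tcp dport $web_ports accept
    ip saddr $lan tcp dport 22 accept
  }
}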

Tables & Chains in nftables

Unlike iptables, nftables is decentralized. Instead of pre-defined tables (e.g. filter) and chains (e.g. INPUT), nftables uses hooks and priorities. It sounds like event listeners in JavaScript.

One big difference is: a packet is dropped if any matching rule drops it, and a packet is accepted only if all relevant chains accept it. Again, this is similar to event listeners. In iptables, on the other hand, a packet is accepted if it is accepted by any rule. It sounds a bit confusing at the beginning, but I think nftables is more flexible, especially in my case; see below.

Docker & nftables

Docker does not support nftables, but it adds rules via iptables-nft. It was painful to manage iptables rules with Docker:
  • Docker creates its own DOCKER and DOCKER-USER chains, which may accept some packets.
  • If I need to control the traffic from/to containers, I need to make sure that my rules are defined before or in DOCKER-USER.
  • Docker may or may not be started at boot, and Docker wires its DOCKER chain into the built-in chains, so I need to make sure that my rules are in effect in all cases.
Well, all the mess exists because in iptables, a packet is accepted if it is accepted by any rule. That means I must insert my REJECT rules before DOCKER/DOCKER-USER, which might accept the packet.

This is no longer an issue in nftables! I can simply define my own tables and reject some packets as I like.

Finally, I don't need to touch the tables created by Docker via iptables-nft, instead I can create my own nft tables.
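A sketch of what that looks like (addresses are made up); even if Docker's own chains accept a packet, a drop verdict in this separate table is final:

table inet myfilter {
  chain forward {
    type filter hook forward priority -10; policy accept;
    # block containers (172.17.0.0/16) from reaching the LAN,
    # regardless of what Docker's iptables-nft chains decide
    ip saddr 172.17.0.0/16 ip daddr 192.168.1.0/24 drop
  }
}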

Conclusions

I had lots of worries about nftables, about scripting and working with Docker. As it turned out, none was actually an issue thanks to the new design of nftables!

2022-09-11

Migrating to Rootless Docker

 There are three ways of running Docker:

  • Privileged: dockerd runs as root, container root = host root
  • Unprivileged: dockerd runs as root, container root = a mapped user
  • Rootless: dockerd runs as a normal user, container root = that user
I've been hesitating between Unprivileged and Rootless. On one hand, rootless sounds like a great idea; on the other hand, some consider unprivileged user namespaces a security risk.

Today I decided to migrate all my unprivileged containers to rootless ones. I had to enable unprivileged user namespaces for a rootless LXC container anyway.


A Cryptic Issue


The migration was overall smooth, except for one cryptic issue: sometimes DNS does not work inside the container.

The symptom is rather unusual: curl works but apt-get does not work. For quite a while I'd thought that apt-get uses some special DNS mechanism.

After some debugging, especially comparing the files under /etc/ between an unprivileged container and a rootless container, I realized that non-root users cannot access /etc/resolv.conf. This is also quite hidden, because apt-get uses a non-root user to fetch HTTP.

Digging further, I eventually figured out that there are special POSIX ACLs on ~/.local/share/docker/containers, and I should set o+rx by default.
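Roughly what I mean (a sketch; the exact paths and ACL entries may differ on another setup):

# inspect the current ACLs
getfacl ~/.local/share/docker/containers

# let "other" users traverse the directory, and make that the default
# for newly created entries
setfacl -m o::rx ~/.local/share/docker/containers
setfacl -d -m o::rx ~/.local/share/docker/containers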

Pros

It is definitely an advantage to eliminate root processes. It is also now easier to manage the containers: I no longer need special visudo files to call maintenance scripts.

With rootless containers, all network interfaces are in a dedicated namespace. A nice side effect is that all iptables rules are constrained to this namespace as well. Services running on the host are no longer accessible by the containers if they listen on 0.0.0.0 or localhost. Further, Docker will no longer pollute my iptables rules, which will also make it easier to migrate to nftables (on the host).

Cons

There is another side effect of network namespaces: it is trickier to manage port forwarding and firewall rules between the host and the containers. slirp4netns and docker-proxy handle most parts well, but it is still a bit ugly. Perhaps lxc-user-nic might work better, but it is only experimentally supported in rootlesskit at the moment.