2022-09-22

Thoughts on Herb Sutter's cppfront

Recently I learned about cppfront, an experimental new syntax for C++ by Herb Sutter. It was nicely explained at CppCon, and I really enjoyed watching the video.

Roughly, I'd view it as Rust with C++ interop. It's part syntax sugar, part preprocessor, part dialect, with an enforced style guide and annotations that are compiler-aware. Mostly I like it, and I feel excited.

Let me try to explain my thoughts in a logical way.

The Syntax

I don't like it, nor do I hate it. I asked myself: do I dislike it just because I am not familiar with it? The answer is yes. So it's my problem, not cpp2's. It didn't take me too much time to get comfortable (though not fluent) with Rust, so I think cpp2 won't be a problem.

Comparing with Rust

I got this several times while reading the docs or watching the video: 

  • If something is bad, let's remove it from the language instead of keeping teaching "don't do this, don't do that". Examples: NULL, union, unsafe casts.
  • If something is good, let's make it by default. Examples: std::move, std::unique_ptr, [[nodiscard]].

Well, these are not new to me; I have seen almost exactly the same words before. Rust is a good example of a design with such goals in mind.

Suppose we really want to achieve these goals with current C++:

Some may be achieved by adding new flags to existing compilers, say, a new flag to make [[nodiscard]] default for all functions.

Some may be achieved by teaching and engineer-enforced style guides, e.g. the usage of std::move or std::unique_ptr.

Some may be achieved by static analyzers and linters, e.g. detecting use before initialization (only partially achievable).

Some may be achieved by libraries, including macros, e.g. returning multiple values via a tuple or a temporary struct.

However it'd never be as good as native support in compilers. 

A small example is absl::Status in abseil vs the question mark syntax in Rust. And a large example would be the general memory safety, ownership in Rust.

I always think that C, C++ and Rust are essentially the same. All the technology for building compilers and analyzers is there; it's just a matter of:

  • How much information do we provide to compilers? E.g. ownership, types.
  • How much freedom do we have? E.g. raw pointer arithmetic, arbitrary type casts.
I believe that by inventing enough annotations (e.g. ownership, lifetimes) and limiting some language features (e.g. raw pointer arithmetic), it'd be possible to make C++ "Rust-like". But is it worth it?

The Biggest Obstacle 

I agree with Herb that "backwards compatibility" is the biggest and probably the only obstacle. Again, Rust is a perfect example of a language without such a constraint.

Honestly the usage of "<<" and ">>" for streams already sounds a bit odd, but I've never had strong feelings about it, because it appeared in the hello-world code while I was learning C++, and (luckily?) I didn't have a C background.

It surprised me that the meaning of "auto" was (successfully) changed in C++11. The usage of "&&" for rvalue references also sounds a bit strange to me. It feels like the process is: "we have a shiny new feature, let's see which existing symbols we can borrow to give a new meaning". A similar case is the usage of "const" at the end of class member function declarations. I remember asking someone: why here? And the answer was: probably they didn't have a better place to put it.

My guess (which could be really off, since I don't really know the behind-the-scenes design thoughts) would be that it'd cause minimal changes to existing compilers. Or even better, no change at all: operator overloading has been a proper C++ feature, so why not repurpose it for streams, since "left shift" does not make sense there anyway? (Therefore nobody is, or should be, using it as a shift operator.)

I like the idea that cpp2 and cpp1 can interoperate at the source level. If cpp2 becomes successful, some features may be left in cpp1 as the "unsafe features", like C# and Rust do.

I like Rust, but I've been hesitant to use it because I know most of the time I'd have to start from scratch. C and C++ are good examples here: if I need a PNG library, I'd download libpng, read the docs and start coding. Python is another good example.

I feel that Rust is still too young. The language itself (mostly syntax & compiler, without the ecosystem) is probably mature enough for the Linux kernel, but I'm not sure about the libraries; after all, I have not seen many lib*-rust packages in the Ubuntu repository.

Yes, I probably should look at the Cargo registry, and yes, my knowledge is quite outdated, so the libraries are probably already enough for me, but I have yet to double-check. And this is the problem: I'm already path-dependent on C++ (and Python), so "easy migration from C++" would be a huge selling point for me.

The Preprocessor

Quoting Herb: "there is no preprocessor in cpp2". But actually I think cppfront itself, or at least part of it, is a preprocessor.

My favorite part is the optional out-of-bounds check for the "x[i]" operator. The check may be turned on or off by compiler flags; it's not as ugly/long as "x.super_safe_index_subscript_with_optional_boundary_check(i)", nor is it as wild as "MAGICAL_MACRO_SUBSCRIPT(x, i)". I also like that it's a loose protocol with the compiler: it automatically works as long as x.size() is defined, somewhat like how range-based for works.

So if we want this feature, either we redefine the "[]" operator in C++, or we translate every single "x[i]" into "x.super_safe_index_subscript_with_optional_boundary_check(i)", which is a preprocessor. (Maybe "compiler" is the correct term, but it feels like a preprocessor to me.)

I still remember reading about a technique in a game programming book, before Lua became popular. The technique was to define a set of C/C++ macros to create a mini scripting language. Engineers could then write lots of game logic in this scripting language, which was way more readable and less error-prone than C++. Somehow cppfront resembles that technique a bit.

Conclusions

I like almost everything about cpp2 so far. But it is (too) early, and there could be lots of hidden/implied issues that I don't know yet, or that are not revealed yet.

Rust would be a very good competitor here, if only it had C++ interop at the source level. It probably never will, but that's OK.

Carbon sounds very similar as well. Its README mentions the analogy of "JavaScript and TypeScript". Interestingly, Herb mentioned the same thing for cpp2, and this is why I see huge overlaps between cpp2 and Carbon. I do hope the two projects will not diverge too far from each other, such that most knowledge, technology and code can be shared.

Overall I feel excited about the recent changes and proposals for C++. My overly-simplified view is that C++11 was driven by nice features in languages like C# and Java, and this new wave is probably driven by nice features in languages like Go and Rust.

Hopefully, one day, the reason that I like C++ will no longer be Stockholm syndrome.

Cleaning Up Ubuntu Packages

My little server has had Ubuntu Desktop installed all along, but I have never used the GUI since the installation, and the various dependency packages can be annoying at times, e.g. the systemd user services shipped with gvfs and tracker, which I have to disable manually for several users.

Originally I kept the Desktop thinking that in an emergency I could use it to look up commands on the web. But as long as I have network access, in the worst case I should be able to temporarily install X and a browser, so that shouldn't be a big problem.

So I decided to turn this Ubuntu Desktop into Ubuntu Server, mainly by removing all the GNOME packages.

After some fiddling, the number of installed packages dropped from about 1800 to below 800. Nice!

2022-09-20

Moving Items Along Bezier Curves with CSS Animation (Part 2: Time Warp)

This is a follow-up of my earlier article. I realized that there is another way of achieving the same effect.

This article has lots of nice examples and explanations. The basic idea is to define very simple @keyframes rules, usually just a linear movement, then use the timing function to distort time, such that the motion path becomes the desired curve.

I'd like to call it the "time warp" hack.


Demo




How does it work?


Recall that a cubic Bezier curve is defined by this formula:

\[B(t) = (1-t)^3P_0+3(1-t)^2tP_1+3(1-t)t^2P_2+t^3P_3,\ 0 \le t \le 1.\]

In the 2D case, \(B(t)\) has two coordinates, \(x(t)\) and \(y(t)\). Define \(x_i\) to be the x coordinate of \(P_i\); then we have:

\[x(t) = (1-t)^3x_0+3(1-t)^2tx_1+3(1-t)t^2x_2+t^3x_3,\ 0 \le t \le 1.\]

So, for our animated element, we want to make sure that the x coordinate (i.e. the "left" CSS property) is \(x(t)\) at time \(t\).

Because \(x(0)=x_0\) and \(x(1)=x_3\), we know that the @keyframes rule must be defined as

@keyframes move-x {
  from { left: x0; }
  to { left: x3; }
}
Now to determine the timing function, suppose the function is
cubic-bezier(u1, v1, u2, v2)
Note that this is again a 2D cubic Bezier curve, defined by four points \((0, 0), (u_1, v_1), (u_2, v_2), (1, 1)\). And the function for each coordinate would be:

\[ u(t) = 3(1-t)^2tu_1 + 3(1-t)t^2u_2 + t^3 \]
\[ v(t) = 3(1-t)^2tv_1 + 3(1-t)t^2v_2 + t^3 \]

Recall that, according to the CSS spec, at any time \(t\), the animated value \(\text{left}(t)\) is calculated as:

\[ \text{left}(t) = x_0 + v(t')(x_3 - x_0), \text{where}\ u(t')=t \]

Our first step is to set \(u_1=1/3\) and \(u_2=2/3\), such that \(u(t)=t\) for all \(t\).

Then we set:
\[ v_1 = \frac{x_1-x_0}{x_3-x_0}, v_2 = \frac{x_2-x_0}{x_3-x_0}  \]

This way we have

\[ v(t) = \frac{x(t) - x_0}{x_3-x_0} \]

Combining everything together, we know that if we set the animation-timing-function as

\[ \text{cubic-bezier}(\frac{1}{3}, \frac{x_1-x_0}{x_3-x_0}, \frac{2}{3},  \frac{x_2-x_0}{x_3-x_0}) \]

then we have \(\text{left}(t)=x(t)\) as desired.

Similarly, we can define @keyframes and an animation-timing-function for \(y(t)\), and then our CSS animation is complete.

Note: obviously the method does not work when \(x_0=x_3\) or \(y_0=y_3\), but in practice we can add a tiny offset in such cases.
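As a concrete sketch, suppose (with made-up control points) \(P_0=(0,0)\), \(P_1=(100,200)\), \(P_2=(300,50)\) and \(P_3=(400,400)\), in pixels. Then the whole animation could look like this:

@keyframes move-x {
  from { left: 0px; }    /* x0 */
  to   { left: 400px; }  /* x3 */
}
@keyframes move-y {
  from { top: 0px; }     /* y0 */
  to   { top: 400px; }   /* y3 */
}
.ball {
  position: absolute;
  animation:
    /* v1 = (100-0)/400 = 0.25, v2 = (300-0)/400 = 0.75 */
    move-x 3s cubic-bezier(0.3333, 0.25, 0.6667, 0.75) forwards,
    /* v1 = (200-0)/400 = 0.5,  v2 = (50-0)/400 = 0.125 */
    move-y 3s cubic-bezier(0.3333, 0.5, 0.6667, 0.125) forwards;
}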

Animation Timing


Observe that \(u(t)\) controls the mapping between the animation progress and the variable \(t\) of the curve. \(u_1=1/3\) and \(u_2=2/3\) are chosen to achieve the default linear timing. We can tweak the values of \(u_1\) and \(u_2\) to alter the timing.

Note that the method from the previous article supports any timing function, including "steps()" and "cubic-bezier()".

It's easy to see that a "cubic-bezier(u1, 1/3, u2, 2/3)" timing function applied to the previous article's method would be the same as setting the same values of \(u_1\) and \(u_2\) in the "time warp" version. In other words, animation timing is limited here: we only have the input progress mapping, not the output progress mapping.

Of course the reason is we are already using the output progress mapping for the time warp effect.

2022-09-15

Restricting Network Access of Processes

I recently read this article, which talks about restricting (proactive) internet access of a process.

It is easy to completely disable internet/network access by throwing a process into a new private network namespace. I think all popular sandboxing tools support this nowadays:

  • unshare -n
  • bwrap --unshare-net
  • systemd.service has PrivateNetwork=yes
  • Docker has internal networks
But the trickier, and more realistic, scenario is:
  • [Inbound] The process needs to listen on one or more ports, and/or
  • [Outbound] The process needs to access one or more specific IP addresses/domains
I can think of a few options.

Option 1: Firewall Rules


Both iptables and nftables support filtering packets by uid and gid. So the steps are clear:
  • Run the process with a dedicated uid and/or gid
  • Filter packets in the firewall
  • If needed, regularly query DNS and update the allowed set of IP addresses.
This option is not very complicated, and I think the overhead is low. While the DNS part is a bit ugly, it is flexible and solves both inbound and outbound filtering.

On the other hand, it might be a bit difficult to maintain it, because the constraints (firewall rules) and the processes are in different places.
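As a rough nftables sketch of this option (the user name and address are made up; assume the process runs as the dedicated user "myapp" and only needs to reach 203.0.113.10 on port 443):

# dedicated table with an output hook chain
nft add table inet myapp
nft 'add chain inet myapp output { type filter hook output priority 0; policy accept; }'
# allow the dedicated user to reach the whitelisted address/port...
nft add rule inet myapp output meta skuid "myapp" ip daddr 203.0.113.10 tcp dport 443 accept
# ...and drop everything else that user sends
nft add rule inet myapp output meta skuid "myapp" drop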

Option 2: Systemd Service with Socket Activation


Recently I've been playing with sandboxing flags in systemd, especially via systemd-analyze. Our problem can be solved with systemd + socket activation like this:
  • Create my-service.socket that listens on the desired address and port
  • Create my-service.service for the process, with PrivateNetwork=yes.
    • The process has no access to the network; it receives a socket from systemd instead, i.e. socket activation
However, it only works if the process supports socket activation. If not, there is a handy tool systemd-socket-proxyd designed for this case. There are nice examples in the manual.

I tested the following setup:
  • my-service-proxy.socket, which activates the corresponding service
  • my-service-proxy.service, which runs systemd-socket-proxyd.
    • The service must have PrivateNetwork=yes and JoinsNamespaceOf=my-service.service
  • my-service.service, the real process, with PrivateNetwork=yes
This way, the process can accept connections at a pre-defined address/port, but has no network access otherwise.
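A minimal sketch of these units (names, port and paths are assumptions, loosely following the examples in the systemd-socket-proxyd manual):

# my-service-proxy.socket
[Socket]
ListenStream=0.0.0.0:8080

[Install]
WantedBy=sockets.target

# my-service-proxy.service
[Unit]
Requires=my-service.service
After=my-service.service
JoinsNamespaceOf=my-service.service

[Service]
ExecStart=/usr/lib/systemd/systemd-socket-proxyd 127.0.0.1:8080
PrivateNetwork=yes

# my-service.service
[Service]
ExecStart=/usr/local/bin/my-service --port 8080
PrivateNetwork=yes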

It works for me, but with a few shortcomings:
  • It only worked for system services (running under the root systemd instance). I suspected that it might work with PrivateUsers=yes, but it didn't.
  • It is quite some hassle to write and maintain all these files.
For outbound traffic, systemd can filter by IP addresses, but I'm not sure about ports. For domain filtering, it might be possible to borrow ideas from the other two options, but I suppose it won't be easy.

Option 3: Docker with Proxy


If the process in question is in a Docker container, inbound traffic is already handled by Docker (via iptables rules).

For outbound traffic, the firewall option also works well for IP addresses. Actually it might be easier to filter packets this way.

For domains, there is another interesting solution: use a proxy. Originally I had some vague ideas about this option, then I found this article. I learned a lot from it and I also extended it.

To explain how it works, here's an example docker compose snippet:

networks:
  network-internal:
    internal: true
  network-proxy:
    ...

services:
  my-service:
    # needs to access https://my-domain.com
    networks:
      - network-internal
    ...
  my-proxy:
    # forwards 443 to my-domain.com:443
    networks:
      - network-internal
      - network-proxy
    ...


The idea is that my-service runs in network-internal, which has no Internet access. But my-service may access selected endpoints via my-proxy.

There are two detailed problems to solve:
  • Which proxy to use?
  • How to make my-service talk to my-proxy?


Choosing the Proxy


In the article the author uses nginx. Originally I had thought it'd be a mess of setting up SSL (root) certificates, but later I learned that nginx can act as a stream proxy that forwards TCP/UDP ports, which makes things much easier.

On the other hand, I often use socat to forward ports as well, which can also be used here.

Comparing both:
  • socat is lighter weight: the alpine/socat docker image is about 5MB, while the nginx docker image is about 55MB.
  • socat can be configured via command line flags, but nginx needs a configuration file.
  • socat supports only one port per instance, but nginx can manage multiple ports with one instance.
So in practice I'd use socat for one or two ports, but I'd switch to nginx for more, since it'd be a hassle to create one container for each port. Both are sketched below.
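For illustration, the two forwarders could look roughly like this (the domain and ports are placeholders):

# socat: one process per forwarded port
socat TCP-LISTEN:443,fork,reuseaddr TCP:my-domain.com:443

# nginx: a stream block in nginx.conf can hold many such servers
stream {
  server {
    listen 443;
    proxy_pass my-domain.com:443;
  }
}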


Enabling the Proxy


If my-service needs to be externally accessible, the ports must be forwarded and exposed by my-proxy.

For outbound traffic, we want to trick my-service, such that it will see my-proxy when it wants to resolve, for example, my-domain.com.

I'm aware of three options:

#1 That article uses links, but that option is designed for inter-container communication, and it is deprecated.

#2 Another option is to assign a static IP to my-proxy, then add an entry to extra_hosts of my-service.

#3 Add an aliases entry of my-proxy on network-internal.

While #3 seems better, it does not just work like that, because when my-proxy wants to send the real traffic to my-domain.com, it will actually send it to itself because of the alias.

To fix it, I have a very hacky solution:

networks:
  network-internal:
    internal: true
  network-proxy:
    ...

services:
  my-service:
    networks:
      - network-internal
    ...
  my-proxy1:
    # forwards 443 to my-proxy2:443
    networks:
      network-internal:
        aliases:
          - my-domain.com
      network-proxy:
    ...
  my-proxy2:
    # forwards 443 to my-domain.com:443
    networks:
      - network-proxy
    ...


In this version, my-proxy1 injects the domain and thus hijacks traffic from my-service. Then my-proxy1 forwards traffic to my-proxy2. Finally my-proxy2 forwards traffic to the real my-domain.com. Note that my-proxy2 can correctly resolve the domain because it is not in network-internal.

On the other hand, it might be possible to tweak the proxy process to ignore the local alias, but I'm not aware of any easy solution.

I use #3 in practice despite it being ugly and hacky, mostly because I don't want to set up a static IP for #2.

One more note on Docker (or Docker Compose): it is possible to specify the network used when building containers, which could be handy.

Conclusions


In practice I use option 3 with a bit of option 1. 

With option 3, if I already have a Docker container/image, it'd be just adding a few lines in docker-compose.yml, maybe plus a short nginx.conf file.

With option 1, the main concern is that the rules may become out of sync with the processes. For example, if the environment of the process changes (e.g. uid, pid, IP address etc.), I may need to update the firewall rules accordingly, and this could easily be missed. So I'd only set up firewall rules for stable services, plus a few generic rules.

Option 2 could be useful in some cases, but I don't enjoy writing the service files. And it seems harder to extend (e.g. add a proxy).

2022-09-13

Migrating from iptables to nftables

nftables has been enabled by default in the latest Ubuntu and Debian releases, but it is not fully supported by Docker.

I've been hesitating about migrating from iptables to nftables, but I managed to do it today.

Here are my thoughts.

Scripting nftables

The syntaxes of iptables and nftables are different, but not that different; both are more or less human readable. However, nftables is clearly more friendly for scripting.

I spent quite some time on a Python script to generate an iptables rule set, and I was worried that I would need lots of time to migrate the script. After studying the syntax of nftables, I realized that I could just write /etc/nftables.conf directly.

In the conf file I can manage tables and chains in a structured way. I'm free to use indentation and new lines, and I no longer need to write "-I CHAIN" for every rule.

Besides, I can group similar rules (e.g. same rule for different tcp ports) easily, and I can define variables and reuse them. 

Eventually I was able to quickly write a nice nftables rule set with basic scripting syntax, roughly as sketched below. It is not as powerful as my custom Python script, but it is definitely easier to write. Further, I think it might be worth learning nftables maps in the future.
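As a rough illustration (the interface name and ports are made up), such a conf file can look like this:

#!/usr/sbin/nft -f
flush ruleset

define wan = "eth0"
define web_ports = { 80, 443 }

table inet filter {
  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif "lo" accept
    iifname $wan tcp dport $web_ports accept
    iifname $wan tcp dport 2222 accept comment "ssh"
  }
}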

Tables & Chains in nftables

Unlike iptables, nftables is decentralized. Instead of pre-defined tables (e.g. filter) and chains (e.g. INPUT), nftables uses hooks and priorities. It sounds like event listeners in JavaScript.

One big difference: a packet is dropped if any matching rule drops it, and a packet is accepted only if all relevant chains accept it. Again, this is similar to event listeners. In iptables, on the other hand, a packet is accepted if any rule accepts it. It sounds a bit confusing at first, but I think nftables is more flexible, especially in my case; see below.

Docker & nftables

Docker does not support nftables, but it adds rules via iptables-nft. It was painful to manage iptables rules alongside Docker:
  • Docker creates its own DOCKER and DOCKER-USER chains, which may accept some packets.
  • If I need to control the traffic from/to containers, I need to make sure that my rules are applied before or inside DOCKER-USER.
  • Docker may or may not be started at boot, and it hooks its own chains into the built-in ones, so I need to make sure that my rules are in effect in all cases.
All the mess boils down to this: in iptables, a packet is accepted if it is accepted by any rule. That means I must insert my REJECT rules before DOCKER/DOCKER-USER, which might accept the packet.

This is no longer an issue in nftables! I can simply define my own tables and reject some packets as I like.

Finally, I don't need to touch the tables created by Docker via iptables-nft; instead I can create my own nft tables.

Conclusions

I had lots of worries about nftables, about scripting and about working with Docker. As it turned out, none of them was actually an issue, thanks to the new design of nftables!

2022-09-11

Migrating to Rootless Docker

There are three ways of running Docker:

  • Privileged: dockerd runs as root, container root = host root
  • Unprivileged: dockerd runs as root, container root = mapped user
  • Rootless: dockerd runs as some user, container root = that user
I've been hesitating between Unprivileged and Rootless. On one hand, rootless sounds like a great idea; on the other hand, some consider unprivileged user namespaces a security risk.

Today I decided to migrate all my unprivileged containers to rootless ones. I had to enable unprivileged user namespaces for a rootless LXC container anyway.


A Cryptic Issue


The migration was overall smooth, except for one cryptic issue: sometimes DNS did not work inside the container.

The symptom was rather unusual: curl worked but apt-get did not. For quite a while I'd thought that apt-get uses some special DNS mechanism.

After some debugging, especially comparing files in /etc/ between an unprivileged container and a rootless container, I realized that non-root users could not access /etc/resolv.conf. This was also quite hidden, because apt-get uses a non-root user to fetch HTTP.

Digging further, I eventually figured out that there are special POSIX ACLs on ~/.local/share/docker/containers, and that I should set o+rx by default.
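A sketch of that kind of fix (the exact ACL entries to set may differ):

# grant "other" users read/execute on the containers directory...
setfacl -m o::rx ~/.local/share/docker/containers
# ...and make that the default ACL for newly created entries
setfacl -d -m o::rx ~/.local/share/docker/containers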

Pros

It is definitely an advantage to eliminate root processes. It is also easier to manage the containers now: I no longer need special sudoers files to call maintenance scripts.

With rootless containers, all network interfaces are in a dedicated namespace. A nice side effect is that all iptables rules are constrained to this namespace as well: services running on the host are no longer accessible from the containers if they listen on 0.0.0.0 or localhost. Further, Docker will no longer pollute my iptables rules, which will also make it easier to migrate to nftables (on the host).

Cons

There is another side effect of network namespaces: it is trickier to manage port forwarding and firewall rules between the host and the containers. slirp4netns and docker-proxy handle most of it well, but it is still a bit ugly. Perhaps lxc-user-nic might work better, but it is only experimentally supported in rootlesskit at the moment.

2022-08-02

Moving Items Along Bezier Curves with CSS Animation (Part 1: Constructions)

TLDR: This article is NOT about cubic-bezier(), and it is not about layered CSS animations (for the X and Y axes respectively). It is about a carefully crafted combination of animations that moves an element along any quadratic/cubic Bezier curve.

UPDATE: Here's the link to part 2.

Following my previous post, I continued investigating competing CSS animations, which are two or more CSS animations affecting the same property.

I observed that two animations may "compete". In the following example, the box has two simple linear animations, move1 and move2. It turns out the box actually moves along a curved path:

So clearly it must be the combined effect of both animations. Note that in move2, the `from` keyframe was not specified. It'd look like this if it were specified:

In this case, it seems only the second animation takes effect.

Actually this is not surprising: in the first case, the starting point of move2 is "the current location from move1". But in the second case, move2 does not need any "current location", so move1 has no effect.

I further examined the actual behavior of the first case. Say move1 is "move from \(P_0\) to \(P_1\)" and move2 is "move to \(P_2\)". At time \(t\):

- The animated location of move1 is \(Q_1=(1-t)P_0 + tP_1\)

- The animated location of move1+move2 is \(Q_2=(1-t)Q_1 + tP_2\)

This formula actually looks very similar to a Bezier curve, but I double-checked on Wikipedia: they are not the same.

Fun fact: I came up with this string art pattern in high school. I realized that it is not a circular arc, but I didn't know what kind of curve it was. Now I understand that it is a Bezier curve.

Quadratic Beziers in string art


Build a quadratic Bezier animation path with two simple animations

The quadratic Bezier curve is defined by this formula:

\[B(t) = (1-t)^2P_0 + 2(1-t)tP_1 + t^2P_2, 0\le t \le 1\]

But our curve looks like \(f(t) = (1-t)^2 P_0 + t(1-t)P_1' + tP_2\). Note that I use \(P_1'\) to distinguish it from the \(P_1\) above.

If we set \(P_1'=2P_1-P_2\), we'll see that \(f(t)=B(t)\) for all \(t\). So it is a Bezier curve, just with a slightly different control point.

Here's an interactive demo, which is based on this codepen.
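As a sketch of what the two animations look like, with made-up points \(P_0=(0,0)\), \(P_1=(150,200)\), \(P_2=(200,100)\), so \(P_1'=2P_1-P_2=(100,300)\):

@keyframes move1 {
  from { left: 0px;   top: 0px;   }  /* P0 */
  to   { left: 100px; top: 300px; }  /* P1' = 2*P1 - P2 */
}
@keyframes move2 {
  /* no "from": it starts from wherever move1 currently puts the box */
  to   { left: 200px; top: 100px; }  /* P2 */
}
.box {
  position: absolute;
  animation: move1 2s linear forwards, move2 2s linear forwards;
}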


An Alternative Version

Another option is to follow the construction of a Bezier curve:

[Figure: construction of a quadratic Bézier curve]

This version needs slightly more code, but it does not require much math. Just observe that a quadratic Bezier curve is a linear interpolation between two moving points, which are in turn obtained by another two linear interpolations.

All these linear interpolation can be easily implemented with CSS animation on custom properties. Here is an example:

Here I just have one animation, which does multiple linear interpolation at the same time. In this case I have to make sure all animated custom properties are defined with @property, which was not the case in the previous example.
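A rough sketch of that idea, for the x coordinate only (property names and points are made up):

@property --ax { syntax: "<length>"; inherits: false; initial-value: 0px; }
@property --bx { syntax: "<length>"; inherits: false; initial-value: 0px; }
@property --t  { syntax: "<number>"; inherits: false; initial-value: 0; }

@keyframes construct {
  /* A moves from P0 to P1, B moves from P1 to P2, t goes from 0 to 1 */
  from { --ax: 0px;   --bx: 150px; --t: 0; }
  to   { --ax: 150px; --bx: 200px; --t: 1; }
}

.box {
  position: absolute;
  /* the final interpolation (1-t)*A + t*B is done with calc() */
  left: calc((1 - var(--t)) * var(--ax) + var(--t) * var(--bx));
  animation: construct 2s linear forwards;
}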

How about cubic Bezier curves?

Both versions can be extended to make animation along cubic Bezier paths.

The first version needs a bit more math, but it is doable. The adjusted control points become:

\[ P_1' = 3P_1 - 3P_2 + P_3\]
\[ P_2' = 3P_2 - 2P_3\]


The second version just involves more custom properties.


Actually both versions may be extended to even higher-degree Bezier curves, and 3D versions.

For the first version, I suppose there would be a generic formula for the \(P_i'\) of any \(N\)-th order curve, but I did not spend time on it.

2022-08-01

Studying CSS Animation State and Multiple (Competing) Animations

[2022-08-04 Update] Most of my observations seem to be confirmed by the CSS animations spec.

I stumbled upon this YouTube video. Then, after some discussion with colleagues, I got really into CSS-only projects.

Of course I'd start with a rotating cube:

Then I built this CSS-only first person (rail) shooter:

It was lots of fun. In particular, I learned about the checkbox hack.

However there were two problems that took me long to solve.


1. Moving on from the middle of an animation

The desired effect is:
  • An element is moving from point A to point B.
  • If something is clicked, the element should move from its current state (somewhere between A and B, in the middle of the animation/transition) to another point C.
This is for the last hit on the boss: the boss moves quickly, and I'd like it to slowly move to the center when hit.

The first try: just set a new animation, "move-to-new-direction", when triggered. But it does not work: the new animation starts from the "original location" instead of the "current location".

This is because the old animation is removed in the new rule, so the animation state is instantly lost.

As a side note, it turned out this is easier to achieve with CSS transition:


However, transitions do not work well for me, because I need non-trivial scripted animations.

In order to let the browser "remember the state", I tried to "append a new animation" instead of "setting a new animation", which kind of works.

However, it is clear that the first animation is not stopped. Both animations compete for the transform property, and eventually the box will always stop, due to the second animation. Obviously the order of the animations matters.

Then I tried to play with the first animation in the triggered rule.

The original first animation is "default-move 2s infinite 0s". It turns out that it does not matter too much if I
  • change infinite to 100 (or another large number), and trigger the change before the animation finishes, or
  • change the delay from 0s to 2s (or another multiple of 2s), and trigger the change after that delay.
However, there will be an obvious change if I modify the duration, or change the delay to 1s:


So I figured that the browser keeps the state of "animation name" and "play time", and re-computes the state once a new value is set for the animation property.

To stop the first animation from "competing", I figured that I could just set its play state to paused, which works well:
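Roughly, the triggered rule looks like this (the selector and the new animation's settings are made up):

#hit:checked ~ .boss {
  animation:
    /* same settings as before, but frozen in place */
    default-move 2s infinite 0s paused,
    /* the appended animation takes over from the current location */
    move-to-new-direction 1s forwards;
}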


And similarly, I should keep all the settings (duration, delay etc.) of the first animation, or carefully change them to a "compatible" set.

This basically solves my problem. In the end, I actually realized that the boss should just die where he's shot, so I'd simply pause the current animation. That is also easier than copying all existing animations and adding a new one in the CSS ruleset.


2. Replaying Animation with Multiple Triggers

This is needed for the weapon-firing animation: whenever an enemy is shot, the "weapon firing" animation should play.

This simple implementation does not work well:


Both input elements work individually. However, if the first checkbox is checked before clicking on the second, the animation does not replay.

Now it should be clear why this happens: the browser remembers the state of "animation name" and "play time". This can be verified by tweaking the duration and delay:

In particular, this would be a good solution if I knew exactly when the animation should be played (regardless of the trigger time). This is not the case for my game, because I want to play the weapon-firing animation immediately after the trigger.

To make both triggers work, one solution is to "append the animation":


But this could be repetitive and hard to maintain for a large number of triggers (on the same element).

A better solution is to define multiple identical @keyframes with different names. Then different triggers can just set individual animation names, and the browser will replay the animation because the animation name changes:
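A sketch of that idea (selectors and keyframes are placeholders):

/* identical keyframes, different names */
@keyframes fire-1 { from { opacity: 1; } to { opacity: 0; } }
@keyframes fire-2 { from { opacity: 1; } to { opacity: 0; } }

/* each trigger sets its own name, so the animation restarts */
#enemy-1:checked ~ .weapon { animation: fire-1 0.3s; }
#enemy-2:checked ~ .weapon { animation: fire-2 0.3s; }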


I think this one is better because the animation rules can be generated with a @for loop in SCSS.

In my game I ended up using multiple (duplicate) elements with individual triggers. All elements are hidden by default. When triggered, each element shows up, plays the animation, then hides.

One nice thing about this solution: I can still control the "non-firing weapon" with scripted animations, because the triggers will not override the animation CSS property.

Note that it is assumed here that the triggers fire in a pre-defined order, otherwise the CSS rules would not work properly.


Conclusion

I find most CSS-only projects fascinating. Pure animation is one thing, but some interactive projects, especially games, are quite inspiring.

I'm also wondering whether there is any practical value beyond the "fun factor". It'd be interesting if we could export some simple 3D models/animations from Blender to CSS.

2022-05-06

Setting up sslh as transparent proxy for a remote container

 I have an NGINX server that is publicly accessible. It has been deployed in the following manner:

  • Machine A
    • Port forwarding with socat: localhost:4443 ==>  0.0.0.0:443
  • Machine B
    • Running NGINX in a Docker container
    • Port forwarding by Docker: <container_ip>:443 ==> localhost:4443
    • Port forwarding by SSH to Machine A: localhost(B):4443 ==> localhost(A):4443
This in general works. Machine A is published to my domain, and the traffic to 443 is forwarded to NGINX in a few hops.

However, there is a problem: the NGINX server never sees the real IP of the client, so it is impossible to deploy fail2ban or other IP-address-based tools. I wanted to fix that.


Step 1: VPN

The first step is to connect machines A and B with a VPN. I feel that it would also work without one, but the iptables rules could be trickier.

WireGuard is my choice. I made a simple setup:
  • Machine A has IP: 10.0.0.2/24
  • Machine B has IP: 10.0.0.1/24
  • On both machines, the interface is called wg0, and AllowedIPs for the other peer is <other_peer_ip>/32
  • wg-quick and systemd are used to manage the interface.
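For reference, the wg0.conf on machine B looks roughly like this (keys, port and endpoint are placeholders; machine A mirrors it with the addresses swapped):

[Interface]
Address = 10.0.0.1/24
PrivateKey = <machine_b_private_key>

[Peer]
PublicKey = <machine_a_public_key>
AllowedIPs = 10.0.0.2/32
Endpoint = <machine_a_public_address>:51820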

Step 2: Machine A

Configure sslh:

sslh --user sslh --transparent --listen 0.0.0.0:443 --tls 10.0.0.1:4443

This way sslh will create a transparent socket that talks to Machine B. When the reply packets come back, we need to redirect them to the transparent socket:

iptables -t mangle -N MY-SERVER
iptables -t mangle -I PREROUTING -p tcp -m socket --transparent -j MY-SERVER
iptables -t mangle -A MY-SERVER -j MARK --set-mark 0x1
iptables -t mangle -A MY-SERVER -j ACCEPT
ip rule add fwmark 0x1 lookup 100
ip route add local 0.0.0.0/0 dev lo table 100

Here I'm redirecting all traffic that belongs to transparent sockets, which is OK because sslh is the only thing that creates such sockets.

Step 3: Machine B

Now machine A will start routing packets to machine B, and the source address will be that of the real HTTP client, not machine A. However, WireGuard will block them because of AllowedIPs.

To unblock:

wg set wg0 peer MACHINE_A_PUB_KEY allowed-ips 10.0.0.2/32,0.0.0.0/0

Note that I cannot simply add 0.0.0.0/0 to AllowedIPs in the conf file, because wg-quick would then automatically set up IP routing accordingly.

My Linux distro and Docker already set up some good defaults for forwarding traffic towards containers:
  • IP forwarding is enabled
  • -j DNAT is set up to translate the destination IP address and port.
Now NGINX can see the real IP addresses of clients. It will also send response traffic back to those real IPs, so I need to make sure that this traffic is sent back via machine A.

Note that if NGINX proactively initiates traffic to the Internet, I still want it to go through the default routing on machine B. But I suppose it is also OK to route all traffic to machine A if preferred/needed.

iptables -N MY-SERVER
# Tag incoming traffic towards NGINX
iptables -I FORWARD -i wg0 -o docker0 -m conntrack --ctorigdst 10.0.0.1 --ctorigdstport 4443 -j MY-SERVER
iptables -A MY-SERVER -j CONNMARK --set-xmark 0x01/0x0f
iptables -A MY-SERVER -j ACCEPT
# Tag response traffic from NGINX
iptables -t mangle -I PREROUTING -i docker0 -m connmark --mark 0x01/0x0f -j CONNMARK --restore-mark --mask 0x0f

# Route all tagged traffic via wg0
ip rule add fwmark 0x1 lookup 100
ip route add 0.0.0.0/0 dev wg0 via 10.0.0.2 table 100

Now everything should work.

Notes

I mainly referred to the official guide of sslh. I also referred to a few other sources like Arch Wiki. 

In practice, some instructions did not apply to my case:

  • I did not need to grant CAP_NET_RAW or CAP_NET_ADMIN to sslh, although this is mentioned in an sslh doc and a manpage. Maybe the sslh package already handles it automatically.
  • On machine A I did not need to enable IP forwarding. This actually makes sense, because the routing happens on machine B.
  • I did not need to enable route_localnet on machine A.

2022-04-02

Home Server Tinkering

Weeks ago I purchased a secondhand machine, and since then I have been tinkering with this little box.

The Perfect Media Server site is a good place to start. The Arch Linux Wiki is my go-to learning resource, even though I use Ubuntu.

Filesystem

I'd be super paranoid and careful, as this is my first time manually configuring a disk array. Basically my options include:

  • ZFS
  • btrfs
  • SnapRAID (or even combined with ZFS/btrfs)
  • Unraid
My considerations include:
  • Data integrity, which is the most important.
  • Maintenance. I want everything easy to set up and maintain.
  • Popularity. There will be more doc/tutorial/discussions if the technology is more popular.
Eventually I decided to use ZFS with raidz2 on 4 disks. 

I also took this chance to learn how to configure disk encryption. I decided to use LUKS beneath ZFS. I could have just used ZFS's built-in encryption, but I thought LUKS would be fun to learn. It really was; the commands are way more user-friendly than I had expected.

Hardening SSH

Most popular best practices include:
  • Use a non-guessable port.
  • Use public key authentication and disable password authentication.
  • Optionally use an OTP (e.g. Google Authenticator) authentication.
  • Set up chroot and command restrictions if applicable. E.g. for backup users.

Various Routines

  • Set up remote disk decryption via SSH, with dropbear.
  • Set up mail/postfix, so I will receive all kinds of system errors/warnings. E.g. from cron.
  • Set up ZED. Schedule scrubbing with sanoid.
  • Set up samba and other services.
  • Set up backup routines.

Containers

I also took the chance to learn about Docker, and tried a couple of images. Not all of them are useful, but I found a few very useful:
  • Grafana + Prometheus. Monitoring system, UPS, air quality etc.
  • Photoprism. Managing personal photos
  • Pi-hole. Well I do have it running on my Pi, but I guess it's nice to have another option.
  • Hosting GUI software with web access. E.g. firefox. 
However there may be security concerns. See below.

Security Considerations

While I'd like to run userful software and services, I'd also want to keep my data safe. 

I want to protect my data from two scenarios:
  • Malicious/untrusted code. I have heard so much news about malicious NPM packages in the last few years.
  • Human/script errors. It happened with a popular package, where a whitespace was unintentionally added to the install script, such that the command became "rm -rf / usr/lib/...". Horrible. For similar reasons, I don't trust scripts that I wrote myself either.
At this moment I am not worried about DoS attacks.

User and File Permissions

The easiest option is to use different users for different tasks. Avoid using root when possible. Also limit the resources that each user can access.

This is a natural choice when I want other devices to back up data to my server. It is also useful when I need to run code in a "sandbox like environment". This is explained well in Gentoo Wiki.

There are two issues with this approach:
  1. It is not really a sandbox. It is straightforward to prevent a user from reading/writing some files, but it'd be trickier to limit other resources, like network, memory etc.
  2. It is tricky to maintain permissions for multiple users, especially when they need to access the same files with different scopes. ACLs are better than the classic Linux permission bits, yet they can still become too complicated. I believe that complicated rules equal security holes.

I created and applied a different user for each docker image, but it was not enough. More on that below.


Mandatory Access Control (MAC)

Examples include AppArmor and SELinux.

Funny enough, years ago I thought AppArmor was quite annoying, because it kept showing popup messages. And now I proactively write AppArmor profiles.

I decided to write AppArmor profiles for docker images and all my scripts in crontab. I feel more assured knowing that my backup scripts cannot silently delete all my data.

Sandboxing

I had thought that chroot was a nice security tool, until I learned that it isn't. I found a couple of sandboxing options on the Arch Wiki.

However, I don't see them fitting well in my case. They should work for my scripts, but a dedicated user + MAC sounds simpler to me. I also want to protect against malicious install scripts (e.g. NPM packages), and I feel that firejail/bubblewrap won't help much there.

Sandboxing seems useful for heavyweight software like web browsers. However, I also need to access such software remotely, so I'd just go for containers or VMs.

I suppose I may find useful scenarios later.

Docker / Container / VM

I use Docker when
  • User + MAC is not enough.
  • I do not trust the code.
  • It is difficult to deploy to the host.
Security best practices include (see the sketch after this list):
  • Do not run as root
  • Drop all unnecessary capabilities (most images don't need any at all)
  • Set no-new-privileges to true.
  • Apply AppArmor profiles.
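In docker run terms, this roughly translates to something like the following (image name, uid and profile name are placeholders):

docker run \
  --user 1000:1000 \
  --cap-drop ALL \
  --security-opt no-new-privileges=true \
  --security-opt apparmor=my-app-profile \
  my-image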
[UPDATE: Obviously I did not see the whole story. Added a new section below]
I was quite surprised when I learned that root in the container == root on the host, unless I'm running rootless Docker. What's worse, almost all docker images that I found use root by default.

While I managed to run most containers without root, many GUI-related containers really want root. I really hate that, so I started looking for rootless options.

Instead of multiple GUI containers, I decided to run an entire OS. This will be my playground, which has no access to my data. Docker is not designed for this task, although it can probably still do the job if configured correctly.

VMs (e.g. VirtualBox) are my last resort. They are quite laggy on my box, and it is tricky to dynamically balance the load, e.g. I have to specify the max CPU/RAM beforehand.

I learned that Kata Containers is good for this task: it is fast and considered very secure. However, I didn't find an easy way of deploying it. (Somehow I don't like Snap and disabled it on my machine. Well, now Snap is almost required by Kata Containers, LXD and Firefox; maybe I should give it a go some time?)

Eventually I turned to LXC. It was quite easy to deploy an Ubuntu box, and I am very happy with the toolchain and the design choices. For example, the root filesystem of the container is exposed as a plain directory tree on the host, instead of as (virtual disk) images.

[UPDATE] Containers without root@host

I really dislike the fact that processes in containers can run as root on the host. Therefore I was looking for "rootless options". There are in fact two:

  1. The container daemon runs as root; containers run as non-root.
  2. Both the container daemon and the containers run as non-root.

#1 means turning on user namespace mapping for containers in Docker or LXC. #2 means further configuring the daemon of Docker or LXC itself.

#2 seems more secure, but it requires another kernel feature, CONFIG_USER_NS_UNPRIVILEGED, which might have its own security concerns. So, funnily enough, it is both "more secure" and "less secure" than #1.

While I don't know the deeper details, I'm slightly leaning towards #1. I will keep an eye on #2 and maybe switch to it when the security concerns are resolved.

What's Next

Probably I will try to improve the box to reload services/containers on failure or reboot. Maybe systemd is enough, or maybe I need something like Kubernetes or Ansible. Or maybe I can live well without them.


2022-03-29

Fix broken sudoers files

Lesson learned today: an invalid sudoers file can break the sudo command, which in turn prevents the sudoers file from being edited via sudo.

The good practice is to always use visudo to modify sudoers file. In my case I needed to modify a file inside /etc/sudoers.d, where I should have used `visudo -f`.


To recover from an invalid sudoers file, it is possible to run `pkexec bash` to gain root access. However, I got an error: "polkit-agent-helper-1: error response to PolicyKit daemon: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: No session for cookie".


Solution to this error:

Source: https://github.com/NixOS/nixpkgs/issues/18012#issuecomment-335350903

- Open two terminals. (tmux also works)

- In terminal #1, get PID by running `echo $$`

- In terminal #2, run `pkttyagent --process <PID>`

- In terminal #1, run `pkexec bash`



2022-01-16

On Data Backup

Around 2013, every year I'd burn all my important data onto a single DVD, that is, 4.7GB. Nowadays I have ~5TB of data, and I don't even bother optimizing 5GB of it.

Background

I realized that it is time to consider backups. I guess I have seen enough signs.
  • The NAS shows that the disks are quite full.
  • I just happened to see article and videos about data backup.
  • I found corrupted data in my old DVDs.
  • I realized that most of my important data are not properly backed up.
  • I have a few scripts that manage different files, which might contain bugs.
The goal is to have good coverage under acceptable cost.

The Plan

All my data are categorized into 4 classes.

Class 1: Most Important + Frequently Accessed

Roughly ~50GB in total. Average file size is ~5MB.
Examples include official documents, my artworks and source code.

The plan: sync into multiple locations to maximize robustness. Sometimes I choose a smaller subset when I don't have enough free space.
In case some copies are down/corrupted, I can still access the data quickly.

Class 2: Important + Frequently Modified

Roughly ~500GB in total. Average file size is ~500KB.
Typically there are groups of small files, which must be used together.
Examples include source code, git repo and backup repo.
Note that it overlaps with Class 1.

The plan: hot backup with versioning/snapshots,  yearly cold archives.

Class 3: Important + Frozen

Roughly ~1TB in total. Average file size is ~50MB.
Frozen means they are never (or at least rarely) changed once created.
Most data of this class are labeled, for example /Media/Video/2020/2020-01-03.mp4
Examples include raw GoPro footages.
Note that it overlaps with Class 1.

The plan: hot backup with versioning/snapshots; labeled data are directly synced to cold storage; unlabeled data go into yearly cold archives.

Class 4: Unimportant

The rest of the data are not important, I wouldn't worry too much if they are lost, but I'm happy to keep them with minimum cost.
Examples include downloaded Steam games.

The plan: upload some to hot backup storage, should I have enough quota.
No cold archive is planned.

Thoughts

I have put lots of thought into designing the plan, and I'm happy with the result.
On the other hand, I had too many headaches throughout the process. 
To name a few:


Hot Backup or Cold Archive
I had a hard time choosing between hot backups and cold archives. Hot backups are more up-to-date, but cold archives are safer.

Originally I had planned to use only one per data class (and the classes were defined slightly differently), but I just couldn't decide.

The decision is to try both then revisit later.


Format of Cold Archives
There are two possibilities:
  1. Directly uploading the files, with the same local file structure
  2. Create an archive and upload it. This also includes chunk-based backup methods.
Note that cold storage are special:
  • Files on cold storage cannot be modified/moved/renamed, or more precisely, it is more expensive to do so.
  • There is typically a cost per API call/object, so we get a penalty for too many small objects.
With option 1 I can easily access individual files on the storage, but if I rename or move some files locally, it will be a disaster in the next backup cycle.
With option 2 there is no problem with too many files, but I have to download the whole archive in order to access a single file inside it. Also, I will need to make sure that the archives do not overlap (too much), or they will just waste space.

My solution is to organize and label the data, mostly by year. The good news is that most frozen data can be labeled this way, and they are often large files. This makes it mostly safe to upload them directly: files may be added or removed, but they are unlikely to be renamed or modified.

For unlabeled data, I'd just create archives every year; the total size is small compared with the labeled data, so I wouldn't worry about it.


Format of Hot Backups
There are also two possibilities:
  1. File-based. Every time a file is modified or removed, the old version is saved somewhere else.
  2. Chunk-based. All files are broken up and stored in chunks, like in git repos.
There are lots of things to consider, e.g. size, speed, safety/robustness, easiness to access.

The decision is to go with #1 for all relevant data classes. My thoughts are:
  • In the worst case, a whole chunk-based repo may be affected by a few rotten bits. This is not the case for file-based solutions.
  • I want to be able to access individual files without special tools.
  • The benefits of chunk-based approaches include deduplication and smaller sizes (mostly for changed files), but that does not really apply to my data. Most big files are large video files; they are rarely changed and cannot be compressed much.
On the other hand, I do plan to try out some chunk-based software in the future.


Backup for Repos
In my NAS I have a few git repos and (chunk-based) backup repos. So how should I back them up?

On one hand, the repos already saved versions of source files, so simply syncing them into cloud storage should work well enough.
On the other hand, should there be local data corruption, the cloud version will also be damaged after one data sync.

The decision is to keep versions in the repo backups as well. Fortunately they are not very big.
I plan to revisit this later. And hopefully I never need to recover a repo like this. 



Backup Storage

I don't have enough spare HDDs to back up all my data, and anyway I prefer cloud storage for this task.
Hot and cold storage need to be discussed individually.


Hot Backup

Most cloud storage services would work well as a hot backup target. In general the files are always available for reading and writing, which makes them suitable for simple rsync-like backups, or for chunk-based backups.

It is not too difficult to fit Class 1 data into free quota, although I do need to subset it.

It is tricky to choose one service for the other classes, as I'd like to keep all backup data together.
I have checked a number of services, and found the following especially interesting:
  • Backblaze B2
  • Amazon S3
  • Google Cloud Storage
  • Google Storage
I'd just pick one while balancing cost, speed, reputation etc. I wouldn't worry too much about software support, since all of them are popular.


Cold Archive

I only learned recently about cold archives from Jeff Geerling's backup plan. After some readings I find the concept really interesting.

Mostly I'd narrow down to the following:
  • Amazon S3 Glacier Deep Archive
  • Google Archival Cloud Storage
  • Azure Archive
I remember also seeing similar storage classes from Huawei and Tencent, but the support among the open source tools that I have found is not as good.


Software

I'd like to manage all backup tasks on my Raspberry Pi. 

rclone, the so-called Swiss Army knife of cloud storage, is an easy winner; I just couldn't find anything else that matches it. The more I learn about the tool, the more I like it. To name a few observations:
  • Great coverage on cloud storage providers.
  • Comprehensive and well-written documents.
  • Active community.
  • Lots of useful features and safety checks
  • Output both human-readable and machine-readable information.
So I ended up writing my own scripts that call rclone. It is easy to express a task like "copy all files from A to B, and in case some files in B need to be modified, save a copy in C", so I just needed to focus on defining my tasks, scoping the data and setting up the routines. Well, it is not trivial though; more on that later.
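For example, a task like the one above maps to a single rclone invocation (paths and remote names are made up):

# sync A to B; anything that would be overwritten or deleted in B
# is moved into a dated directory on C instead
rclone sync /data/class1 remote:backup/class1 \
  --backup-dir remote:backup-history/class1/$(date +%Y-%m-%d) \
  --checksum --verbose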

I also spent quite some time researching chunk-based backup tools, for example BorgBackup, restic and Duplicacy. I also checked a few others, but not as extensively as these three.
I pulled myself out of the rabbit hole as soon as I realized that I don't need them at the moment, though I still cannot decide which one I would use, should I need a chunk-based backup today. Here's a summary of my 2-page notes on these tools:
  • BorgBackup is mature (a fork of Attic, which dates back to 2010), but it has limited backend support.
  • restic is relatively new (first GitHub commit in 2014). It used to have performance issues with pruning, which seem to have been fixed. The backup format is not finalized (yet).
  • Duplicacy is even younger (first GitHub commit in 2016). Its license is not standard, which concerns many people. Due to its lock-free design it is measured to be faster than the others, especially when multiple clients connect to the same repo. However, it might waste some space to achieve that.
Maybe things will change in a few years. I will keep an eye on them.


Technical Issues

I had quite a few issues when using OneDrive + WebDAV.
  • Limit on max path length
  • Limit on max file length
  • No checksums
  • No quota metrics.
Fortunately, most of them are not big problems.

Another issue, with 7-Zip, is that I cannot add empty directories to an archive without adding files to those directories. This is particularly important for my yearly cold archives.

Eventually I used Python and tarfile to achieve it, roughly as sketched below. I probably could do the same with a 7z Python library, but I used tarfile anyway because it is natively available in Python, plus I realized that most of the archives cannot be effectively compressed.
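The core of it is just a few lines (paths are made up):

import tarfile

with tarfile.open("cold-archive-2021.tar", "w") as tar:
    # recursive=False adds only the directory entry itself,
    # so empty directories are preserved in the archive
    tar.add("photos/2021/empty-album", recursive=False)
    # regular files are added as usual
    tar.add("photos/2021/notes.txt")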


Next Steps

I probably will add a few more scripts to monitor and verify the backups, e.g. download and verify ~10GB of data randomly selected from the backup repo.

I will also keep an eye on chunk-based solutions.