“An example of a really responsible system is the system the Romans used when they built an arch. The guy who created the arch stood under it as the scaffolding was removed. It’s like packing your own parachute.”
― Charles T. Munger
systems
I run most of my containers with LXC. I recently discovered that it can take snapshots automatically and expire them, and it only takes a few commands. You can set these options on a single container or on a profile. In my case the profile is called “prod-nvme”:
lxc profile set prod-nvme snapshots.pattern 'snapshot-{{creation_date.Format("20060102")}}-%d'
lxc profile set prod-nvme snapshots.expiry "14d"
lxc config set prod-nvme-wordpress snapshots.schedule "0 2 * * SAT"
The snapshot schedule uses cron syntax, so it’s very easy to use. See https://crontab.guru/#0_2_*_*_SAT for an explanation of the schedule above. All newly created snapshots will follow snapshots.pattern and expire after 14 days. This doesn’t apply to snapshots created before these options were set, but it does apply to manual snapshots created afterwards.
$ lxc info prod-nvme-wordpress
Name: prod-nvme-wordpress
Created: 2018/06/23 16:33 UTC
Status: Running
Type: container
Profiles: default, prod-nvme
.............
Snapshots:
  prod-nvme-wordpress-30-04-2020 (taken at 2020/04/30 20:08 UTC) (stateless)
  backup-1.05.2020 (taken at 2020/05/01 20:28 UTC) (stateless)
  backup-23.05.2020 (taken at 2020/05/23 21:07 UTC) (stateless)
  snapshot-20200606-0 (taken at 2020/06/06 00:00 UTC) (expires at 2020/06/20 00:00 UTC) (stateless)
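If you don’t want to wait for the next scheduled run to see whether the settings took effect, a quick check along these lines should do. This is only a sketch: the function name check_snapshot_settings is made up for illustration, and it assumes that an unnamed manual snapshot picks up the pattern and expiry the same way a scheduled one does.

```shell
# Sketch: print the snapshot settings for a profile and a container, then
# take one unnamed manual snapshot so the pattern and expiry can be checked.
# The function name is hypothetical; the lxc subcommands are standard.
check_snapshot_settings() {
  profile="$1"
  container="$2"

  # Values stored on the profile
  lxc profile get "$profile" snapshots.pattern
  lxc profile get "$profile" snapshots.expiry

  # Value stored directly on the container
  lxc config get "$container" snapshots.schedule

  # Unnamed manual snapshot, so LXC picks the name itself
  lxc snapshot "$container"

  # The new snapshot should be listed with an "expires at" date
  lxc info "$container"
}
```

With the names from this post, the call would be: check_snapshot_settings prod-nvme prod-nvme-wordpress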
I wanted simple and secure shell access to my home lab, which runs many containers. I have a physical U2F key from Yubico, so I wanted to use it as a second factor. Recording SSH sessions would also be nice. I found all of that in Teleport from Gravitational: https://gravitational.com/teleport/
Usage
From the user’s perspective, you can access Teleport via the web interface or the command line:


The user manual is easy to follow: https://gravitational.com/teleport/docs/user-manual/
After a successful login, you can see all machines. The “Login as” column lists usernames. For example, you might be allowed to use the “tjarosik” username, but if that user doesn’t exist on a machine, you will not be able to log in there.
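On the command line, a typical session with the tsh client looks roughly like the sketch below. The function name and all four arguments are placeholders for illustration (for example a proxy address such as teleport.example.com:443); none of them come from my actual setup.

```shell
# Sketch of a command-line session with tsh. The function name and the
# argument values are hypothetical; the tsh subcommands are standard.
teleport_cli_session() {
  proxy="$1"          # proxy address with an explicit port
  teleport_user="$2"  # Teleport account name
  login="$3"          # OS username on the target machine
  node="$4"           # node name as printed by "tsh ls"

  # Authenticate against the proxy; with U2F enabled this asks for the key
  tsh login --proxy="$proxy" --user="$teleport_user"

  # List the machines this account can reach
  tsh ls

  # Open a shell on one machine as the given OS user
  tsh ssh "$login@$node"
}
```

As in the web interface, the last step only works if the OS user actually exists on the target machine.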

Clicking a user in the “Login as” column opens a shell in your browser. You can also join an existing session, or view and replay previous sessions from the sessions list:

From smartphone
It is also possible to log in securely with U2F from your smartphone:


Configuration
In my case, the Teleport proxy web interface sits behind a load balancer. One thing to remember is that ports, even the default 443, must be specified explicitly in the config files. Setup is really simple. Here are sample configs:
Auth server and proxy config:
Single node config (in my case it’s a container running Ubuntu):
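For orientation, a minimal sketch of what the two files might contain is below, written as shell functions that generate them. It assumes the teleport.yaml keys from the Teleport 4.x era. The hostname teleport.example.com, the address 10.0.0.10, the node names, the token placeholder and the function names are all hypothetical and not taken from my setup.

```shell
# Sketch only: every hostname, address, token and node name is a placeholder.

# Auth server and proxy on one machine, with U2F as the second factor.
write_auth_proxy_config() {
  cat > "$1" <<'EOF'
teleport:
  nodename: teleport-auth
  data_dir: /var/lib/teleport
auth_service:
  enabled: yes
  listen_addr: 0.0.0.0:3025
  authentication:
    type: local
    second_factor: u2f
    u2f:
      # Port written out even though 443 is the HTTPS default
      app_id: https://teleport.example.com:443
      facets:
        - https://teleport.example.com:443
proxy_service:
  enabled: yes
  web_listen_addr: 0.0.0.0:3080
  # Address exposed by the load balancer, again with an explicit port
  public_addr: teleport.example.com:443
ssh_service:
  enabled: no
EOF
}

# A single node that only runs the SSH service and joins the auth server.
write_node_config() {
  cat > "$1" <<'EOF'
teleport:
  nodename: example-node
  data_dir: /var/lib/teleport
  auth_token: REPLACE_WITH_JOIN_TOKEN
  auth_servers:
    - 10.0.0.10:3025
auth_service:
  enabled: no
proxy_service:
  enabled: no
ssh_service:
  enabled: yes
EOF
}
```

Calling write_auth_proxy_config ./auth-proxy.yaml and write_node_config ./node.yaml writes both files to the current directory, where they can be reviewed before being copied to /etc/teleport.yaml on the respective machines.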
Uptime 15,364 days
There is a fascinating talk on YouTube: https://www.youtube.com/watch?v=H62hZJVqs2o It’s rare to see a system work for so long while being operated completely remotely. Worth watching!
I’m usually curious about how something like that came to be: what kinds of choices were made to make the project possible. A few things in this talk can be applied in many domains (well, not everyone can build a spaceship ;)).
“Don’t make engineering choices which could limit the lifetime of a spacecraft.”
Also, there are 3 main design principles which helped this project last that long, and which I think can be applied to other systems as well:
1. Reliability
2. Redundancy
3. Reconfigurability