As part of our never-ending pursuit of staying secure, we recently built an SSH jumpbox as a central, secure way to access our production instances on AWS. A fairly standard affair, although in this instance we solved the problem using Docker and a handful of supporting services (such as rsyslog and fail2ban), and mapped the jumpbox users to our AWS users for seamless management... so we thought we'd share how we did it!
What is an SSH Jumpbox?
A jumpbox is a host you connect/tunnel through to access a target (hidden) host. It performs no function beyond helping create a secure tunnel between the user and the target host. SSH is the underlying technology used between the user and the jumpbox to form the tunnel. Using SSH, any port on a target host can be forwarded back to the user; to do so, we need to allow the jumpbox to talk to that host on the port in question via an EC2 Security Group.
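As a concrete illustration, tunnelling through a jumpbox from the user's machine might look like this (the hostnames, users and ports below are made up):

```shell
# Forward local port 5433 to a private database host, via the jumpbox.
# Only the jumpbox (jump.example.com) is publicly reachable.
ssh -N -L 5433:db.internal.example.com:5432 jumpuser@jump.example.com

# Or, with a recent OpenSSH client, hop straight onto an internal host:
ssh -J jumpuser@jump.example.com appuser@app.internal.example.com
```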
Why an SSH Jumpbox?
- Simplicity - we don't yet need the overhead of a VPN, a Jumpbox is enough for our current needs and far cheaper
- Reduced Attack Surface - fewer hosts are publicly exposed. Only the jumpbox is publicly accessible
- Auditing - logging access is simpler as all users access internal hosts via the jumpbox
- Management - we have a single public host to secure/maintain/update instead of numerous hosts
- Single Responsibility - the jumpbox performs a single function and performs it well
Building an SSH Jumpbox
We settled on using Docker for creating our jumpbox, hosted on an EC2 instance via ECS. Docker was largely chosen for its fast feedback loop: we can test-drive the container (via `testinfra`), run and debug it locally (and consistently), and spin it up or tear it down on AWS easily.
The container is based on an Alpine image for a smaller size and attack surface, and runs a handful of processes:
- `openssh` for our SSH server. We have locked down `/etc/ssh/sshd_config`:
- not allowing interactive mode (there's no reason to be on the jumpbox)
- not allowing password auth (as it is less secure than key-based)
- not allowing root login (as there should be no need to login as root)
- forcing SSH protocol 2
- `rsyslog` for managing our logs (and sending them to our logging platform)
- `fail2ban` for banning malicious activity per IP address
- `s6` overlay as our process supervisor, managing all the above processes
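The `sshd_config` lockdown described above boils down to a handful of directives. A sketch of the relevant excerpt (exact directives may vary by OpenSSH version):

```
# /etc/ssh/sshd_config (excerpt)
Protocol 2
PermitRootLogin no
PasswordAuthentication no
ChallengeResponseAuthentication no
PermitTTY no              # no interactive sessions on the jumpbox
AllowTcpForwarding yes    # tunnelling is the jumpbox's whole job
X11Forwarding no
```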
Here is our `Dockerfile`:
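A minimal sketch of such a `Dockerfile` (the s6-overlay release, package names and file paths are illustrative assumptions):

```dockerfile
FROM alpine:3.18

# openssh + rsyslog + fail2ban, supervised by the s6 overlay
RUN apk add --no-cache openssh rsyslog fail2ban

# s6-overlay as PID 1 / process supervisor (version is illustrative)
ADD https://github.com/just-containers/s6-overlay/releases/download/v2.2.0.3/s6-overlay-amd64.tar.gz /tmp/
RUN tar xzf /tmp/s6-overlay-amd64.tar.gz -C / \
    && rm /tmp/s6-overlay-amd64.tar.gz

# Locked-down sshd configuration and s6 service definitions
COPY sshd_config /etc/ssh/sshd_config
COPY services.d /etc/services.d

EXPOSE 22
ENTRYPOINT ["/init"]
```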
As part of building our jumpbox, we create a (password-disabled) user on the jumpbox for each user on AWS. This is easily done using a bash script and `boto`. For each user created on the jumpbox, we fetch the public SSH key associated with the respective AWS user and add it to that user's `~/.ssh/authorized_keys` (so the user is allowed to connect via SSH). This maps our AWS users to the jumpbox users, which means there is no user/key sharing and we get cleaner, clearer auditing as a consequence! If someone new starts or leaves, we simply update our AWS users and kick off a CI build and deploy (which takes no more than 2 minutes) to refresh the container.
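The post mentions bash and `boto`; a sketch of the same idea in Python with `boto3` might look like the following (function names are our own, and key retrieval assumes keys are uploaded via IAM's SSH public key feature):

```python
import subprocess


def provision_commands(username, public_keys):
    """Shell commands to create a password-disabled user and install their keys."""
    auth_keys = "\n".join(public_keys)
    return [
        f"adduser -D -s /bin/sh {username}",  # -D: disabled password (Alpine)
        f"mkdir -p /home/{username}/.ssh",
        f"printf '%s\\n' '{auth_keys}' > /home/{username}/.ssh/authorized_keys",
        f"chown -R {username}:{username} /home/{username}/.ssh",
        f"chmod 600 /home/{username}/.ssh/authorized_keys",
    ]


def sync_users_from_iam():
    """Create a jumpbox user for every IAM user with an uploaded SSH key."""
    import boto3  # deferred import so the helper above works without AWS

    iam = boto3.client("iam")
    for user in iam.list_users()["Users"]:
        name = user["UserName"]
        keys = [
            iam.get_ssh_public_key(
                UserName=name,
                SSHPublicKeyId=k["SSHPublicKeyId"],
                Encoding="SSH",
            )["SSHPublicKey"]["SSHPublicKeyBody"]
            for k in iam.list_ssh_public_keys(UserName=name)["SSHPublicKeys"]
        ]
        if keys:
            for cmd in provision_commands(name, keys):
                subprocess.run(cmd, shell=True, check=True)
```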
Testing our container
We use the rather awesome `testinfra` Python package to help us test-drive our container. With it we can test numerous things, from packages being installed to more complex checks such as verifying the logging platform connection is 'ESTABLISHED' via `netstat`. We have over 20 tests; here's a snippet of some:
Running our container
As `fail2ban` uses iptables under the hood to ban IP addresses, the container needs to run with slightly elevated privileges. This means granting the container the `NET_ADMIN` Linux capability, the minimum needed for the container to be able to modify the host's `iptables` rules. In a `docker run` command this translates to `--cap-add=NET_ADMIN`.
The container can also run with the `--read-only` flag, meaning its filesystem can't be modified at runtime, just for that added bit of security. We mount our log directory into the container, meaning we also retain our log files after a deploy.
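Putting those flags together, a `docker run` invocation might look something like this (the image name, ports and paths are illustrative, and the tmpfs mounts are our assumption for the writable scratch space sshd and fail2ban expect in a read-only container):

```shell
# --cap-add=NET_ADMIN : minimum capability fail2ban needs to manage iptables
# --read-only         : the container filesystem cannot be modified at runtime
docker run -d \
  --name jumpbox \
  --cap-add=NET_ADMIN \
  --read-only \
  --tmpfs /run --tmpfs /tmp \
  -v /var/log/jumpbox:/var/log \
  -p 22222:22 \
  example/jumpbox:latest
```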