Root Conf is a conference on DevOps and Cloud Infrastructure. The 2017 edition's theme is service reliability. The following are my notes from Day 1.
-
State of the open source monitoring landscape
- The speaker is the co-founder of the Icinga monitoring system. I missed the first ten minutes of the talk.
- The talk compares the available OSS options for monitoring and visualization.
- Auto-discovery is hard.
- According to a 2015 monitoring tool usage survey, Nagios is widely used.
- Nagios is reliable and stable.
- Icinga 2 is a fork of Nagios, rewritten in C++. It's modern ("web 2.0"), with APIs, extensions and multiple backends.
- Sensu has limited features in its OSS version and many more in the enterprise version. The OSS version isn't very useful.
- Zabbix is a full-featured, out-of-the-box monitoring system written in C. It provides logging and graphing features. Scaling is hard since all writes go to a single Postgres DB.
- Riemann is a stream processor written in Clojure. Its stream-processing DSL requires knowledge of Clojure. The system is stateless.
- OpenNMS is a network monitoring tool written in Java and is good at auto-discovery. Using plugins in a non-Java environment is slow.
- Graphite is a flexible, popular monitoring tool built around a time series database.
- Prometheus offers flexible rule-based alerting and a time series database for metrics.
- Elastic comes with Elasticsearch, Logstash, and Kibana. It's picking up a lot of traction. The Elastic Stack is extensible using X-Pack.
- Grafana is best for visualizing time series data. It's easy to get started with and can combine multiple backends.
- Grafana annotations are easy to use for tagging events.
- There is no one tool that fits everyone's case. You have to start somewhere, so pick a monitoring tool, see if it works for you, and if not, try the next one until you settle on one.
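To make "time series database" a bit more concrete: Graphite's Carbon daemon accepts data points over a simple plaintext protocol, one "<path> <value> <timestamp>" line per point, conventionally on TCP port 2003. The Python sketch below formats and pushes such lines. The host, port default and metric names are assumptions for illustration, not something shown in the talk.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in Graphite's plaintext protocol.

    Each line is "<metric path> <value> <unix timestamp>\n"; if no
    timestamp is given, the current time (whole seconds) is used.
    """
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Push already-formatted lines to a Carbon plaintext listener over TCP.

    The host and port are assumptions; adjust them to your Carbon setup.
    """
    payload = "".join(lines).encode("ascii")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, send_to_graphite([graphite_line("web.requests", 42)]) would record one point for a hypothetical "web.requests" series, which Grafana could then plot from Graphite as a backend.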
-
Deployment strategies with Kubernetes
- This was a talk with a live demo.
- Canary deployment: route a small amount of traffic to a new host to test that it functions correctly.
- If the new hosts don't behave normally, roll back the deployment.
- Blue-green deployment is a procedure to minimize deployment downtime. The idea is to have two sets of machines with identical configuration, one running the latest code (rev 2) and the other running rev 1. Once the machines with the latest code behave correctly, spin down the machines running rev 1.
- Then came a demo of kubectl adding a new host to the cluster and rolling back.
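The two strategies above can be sketched in Python, assuming a hypothetical router that can be told which pool serves each request. Canary routing hashes a request ID into one of 100 buckets so that a fixed share goes to rev 2, a simple error-rate comparison decides whether to roll back, and blue-green flips all traffic at once. The function names and the 1% tolerance are illustrative assumptions, not how kubectl or Kubernetes work internally.

```python
import hashlib


def route(request_id: str, canary_percent: int) -> str:
    """Canary: send roughly canary_percent% of request IDs to the new (rev 2) pool.

    Hashing the ID instead of calling random() keeps a given ID on the same pool.
    """
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in [0, 100)
    return "canary" if bucket < canary_percent else "stable"


def should_roll_back(canary_error_rate: float, stable_error_rate: float,
                     tolerance: float = 0.01) -> bool:
    """Roll back if the canary's error rate exceeds the stable pool's by more than tolerance."""
    return canary_error_rate > stable_error_rate + tolerance


def blue_green_target(live: str, idle_is_healthy: bool) -> str:
    """Blue-green: switch all traffic to the idle color only once it checks out healthy."""
    idle = "green" if live == "blue" else "blue"
    return idle if idle_is_healthy else live
```

Hashing makes the canary share deterministic per ID, which helps when comparing metrics between the two pools; raising canary_percent step by step is the usual way to widen the rollout.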
-
- The talk is a story about developing, pushing to GitHub, merging and releasing. And then shit hits the fan. Now, what do you do?
- The problem is that the developer didn't get the code reviewed.
- How can automation help here?
- Enforce standards, for example by reverting an unreviewed merge, using the GitHub API, a Slack bot, or Hubot.
- As soon as a developer opens a PR, alice, the bot, adds a comment to the PR with a checklist. When the code is merged, the bot verifies the checklist; if any items are unchecked, the bot reverts the merge.
- The bot can do more work: it receives commands via Slack direct message and can interact with Jenkins to roll back the deployed code.
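The checklist check that alice performs on merge could look roughly like the Python sketch below. It assumes the checklist is a GitHub-style Markdown task list ("- [ ]" for open items, "- [x]" for done) inside the PR comment; fetching the comment and reverting the merge through the GitHub API are deliberately left out, and the function names are hypothetical.

```python
import re

# Matches Markdown task-list items such as "- [ ] Tests added" or "* [x] Reviewed".
CHECKBOX = re.compile(r"^\s*[-*]\s+\[( |x|X)\]\s+(.*)$")


def unchecked_items(comment_body: str) -> list:
    """Return the text of every checklist item that is still unchecked."""
    items = []
    for line in comment_body.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            items.append(match.group(2).strip())
    return items


def should_revert(comment_body: str) -> bool:
    """The bot reverts the merge if any checklist item was left unchecked."""
    return bool(unchecked_items(comment_body))
```

Keeping the decision logic separate from the GitHub and Slack calls makes it easy to test, and the bot could reuse unchecked_items to tell the developer exactly which items were skipped.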
-
Necessary tooling and monitoring for performance critical applications
- The talk is about collecting metrics at Otto, a German e-commerce company.
- The company receives two orders/sec and a million visitors per day. On average, it takes eight clicks/pages to complete an order.
- Monitor the database, response time, throughput, requests/second, and measure the state of the system.
- Metrics everywhere! We talk in terms of metrics to decide and to diagnose problems.
- Metrics is a Clojure library to measure data and record it to an external system.
- The library offers counters, gauges, meters, timers, and histogram percentiles.
- Rather than extracting data from log files, measure details in the code and write them to the data store.
- Third-party libraries are available for visualization.
- The demo used an in-house d3.js application for annotation and visualization.
- Measure from all possible places and store the results separately. If the web application calls the recommendation engine, collect metrics from both the web application and the recommendation engine for that single task and push them to the data store.
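The metric types listed above can be sketched in a few lines of Python. This is a hypothetical in-process registry for illustration, not the API of the Clojure Metrics library. Percentiles here use the nearest-rank method over every recorded sample; real libraries usually sample to bound memory.

```python
import math
import time
from contextlib import contextmanager


class MetricsRegistry:
    """Hypothetical registry showing counters, gauges, timers and percentiles."""

    def __init__(self) -> None:
        self.counters = {}  # name -> int
        self.gauges = {}    # name -> float
        self.samples = {}   # name -> list of recorded values

    def inc(self, name: str, n: int = 1) -> None:
        """Counter: a monotonically increasing count, e.g. orders placed."""
        self.counters[name] = self.counters.get(name, 0) + n

    def set_gauge(self, name: str, value: float) -> None:
        """Gauge: a point-in-time value, e.g. current queue length."""
        self.gauges[name] = value

    def record(self, name: str, value: float) -> None:
        """Histogram input: keep raw samples so percentiles can be computed."""
        self.samples.setdefault(name, []).append(value)

    @contextmanager
    def timer(self, name: str):
        """Timer: record how long the wrapped block took, in seconds."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(name, time.perf_counter() - start)

    def percentile(self, name: str, p: float) -> float:
        """Nearest-rank percentile (0 < p <= 100) over all recorded samples."""
        data = sorted(self.samples[name])
        rank = max(1, math.ceil(p * len(data) / 100))
        return data[rank - 1]
```

Following the advice to measure from all possible places, the web application and the recommendation engine would each record under their own prefix (for example "web.recommendations.latency" and "recommender.latency", both hypothetical names), so the two views of a single request can be compared side by side in the data store.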
-
What should be PID 1 in a container?
- In older versions of Docker, child processes weren't reaped correctly. As a result, for every request a new process is spawned and, once it exits, it is never reaped. This is called the PID 1 zombie problem.
- These zombies will eat up all available PIDs in the container.
- Use sysctl -a | grep pid_max to find the maximum number of PIDs available in the container.
- On a bare-metal machine, PID 1 is systemd or another init program.
- If the first process in the container is bash, the PID 1 zombie problem doesn't occur.
- Using bash to handle all the signals is messy, though.
- Yelp came up with Yelp/dumb-init. Now dumb-init is PID 1 and there are no more zombie processes.
- Docker 1.13 introduced the --init flag.
- Another solution uses systemd as PID 1.
- Docker allows running systemd without privileged mode.
- Running systemd as PID 1 has other useful features, like managing logs.
-
‘Razor’ sharp provision for bare metal servers
- I attended only the first half of the talk (fifteen minutes).
- When you buy physical rack space in a data center, how will you install the OS? You're in Bangalore and the server is in Amsterdam.
- Installing the first OS on bare metal is hard.
- Enter network boot!
- PXELinux is a Syslinux derivative used to boot an OS over the network from the NIC.
- Once the machine comes up, it broadcasts a DHCP request and the DHCP server responds.
- Cobbler helps manage all the services running on the network.
- A DHCP server, a TFTP server, and configuration are required to complete the installation.
- A microkernel is placed on the TFTP server.
- Razor is a tool to automate bare-metal provisioning.
- Razor's philosophy: consume hardware resources like virtual resources.
- Razor components: nodes, tags, repositories, policies, brokers, and hooks.
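Razor's idea of matching nodes to policies via tags can be modelled roughly as follows. In this Python sketch, tags are predicates over the facts a node reports after booting the microkernel, and a node binds to the first policy, in order, whose tags all match and which has not reached its max_count. The class names, facts and repository names are assumptions for illustration, not Razor's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Facts = Dict[str, object]  # hardware facts reported by a booted node (assumed shape)


@dataclass
class Tag:
    name: str
    rule: Callable[[Facts], bool]  # True if the node's facts match this tag


@dataclass
class Policy:
    name: str
    repo: str                        # hypothetical OS repository to install from
    tags: List[str]                  # every tag must match for the policy to apply
    max_count: Optional[int] = None  # None means unlimited nodes
    bound: int = 0                   # nodes bound to this policy so far


def node_tags(facts: Facts, tags: List[Tag]) -> set:
    """Evaluate every tag rule against one node's facts."""
    return {tag.name for tag in tags if tag.rule(facts)}


def bind(facts: Facts, tags: List[Tag], policies: List[Policy]) -> Optional[Policy]:
    """Bind the node to the first eligible policy; order matters, first match wins."""
    matched = node_tags(facts, tags)
    for policy in policies:
        full = policy.max_count is not None and policy.bound >= policy.max_count
        if not full and set(policy.tags) <= matched:
            policy.bound += 1
            return policy
    return None  # no policy applies, so the node stays unprovisioned
```

This is what "consume hardware like virtual resources" amounts to: you declare rules once, and any machine that PXE-boots and reports matching facts gets provisioned without someone in Bangalore touching the server in Amsterdam.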
-
FreeBSD is not a Linux distribution
- FreeBSD is a complete OS, not a distribution
- Who uses it? Netflix, WhatsApp, Yahoo!, NetApp and more.
- Great tools, mature release model, excellent documentation, friendly license.
- There are now a lot of BSD forks: NetBSD, FreeBSD, OpenBSD and a few more.
- Good file systems: UFS and ZFS. UFS is high-performance and reliable.
- If you don't want to lose data, use ZFS!
- Jails: GNU/Linux copied this and called it containers!
- No GCC, only LLVM/Clang.
- FreeBSD is at the forefront of developing next-generation tools.
- Pluggable TCP stacks: BBR, RACK, CUBIC, NewReno
- Firewalls: ipfw, PF
- Dummynet: a live network emulation tool
- FreeBSD can run Linux binaries in userspace. It maps GNU/Linux system calls to FreeBSD ones.
- It can run on machines with 256 cores.
- Hardware: NUMA, ARM64, Secure Boot/UEFI
- Politics: a democratically elected core team
- Join the mailing list and send patches, and you will get a commit bit.
- Excellent mentor program. GSoC copied our idea.
- FreeBSD uses SVN and Git for revision control.
- Took a dig at GPLv2 as not a business-friendly license.
- Read out the BSD license on stage.
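The Linux binary compatibility point can be illustrated with a toy translation table in Python. It maps a few Linux x86-64 system call numbers to the FreeBSD numbers for the same calls and returns -ENOSYS for anything not emulated. This is a heavily simplified sketch: the real compatibility layer has its own Linux system call table and also has to convert arguments, structures and error codes.

```python
import errno

# Toy mapping: Linux x86-64 syscall number -> FreeBSD syscall number.
LINUX_TO_FREEBSD = {
    0: 3,    # read
    1: 4,    # write
    2: 5,    # open
    3: 6,    # close
    39: 20,  # getpid
    60: 1,   # exit
}


def translate(linux_nr: int) -> int:
    """Return the FreeBSD syscall number, or -ENOSYS if the call isn't emulated."""
    return LINUX_TO_FREEBSD.get(linux_nr, -errno.ENOSYS)
```

The same call gets a different number on each kernel, which is why a Linux binary can't simply be run as-is and needs this mapping layer in between.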

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.