Root Conf is a conference on DevOps and Cloud Infrastructure. 2017 edition’s theme is service reliability. Following is my notes from Day 1.
- The speaker of the session is the co-founder of Icinga monitoring system. I missed first ten minutes of the talk.-The talk is a comparison of all available OSS options for monitoring, visualization.
- Auto-discovery is hard.
- As per 2015 monitoring tool usage survey, Nagios is widely used.
- Nagios is reliable and stable.
- Icinga 2 is a fork of Nagios, rewrite in c++. It’s modern, web 2.0 with APIs, extensions and multiple backends.
- Sensu has limited features on OSS side and a lot of features on enterprise version. OSS version isn’t useful much.
- Zabbix is full featured, out of box monitoring system written in C. It provides logging and graphing features. Scaling is hard since all writes are written to single Postgres DB.
- Riemann is stream processor and written in Clojure. The DSL stream processing language needs knowledge of Clojure. The system is stateless.
- OpenNMS is a network monitoring tool written in Java and good for auto discovery. Using plugins for a non-Java environment is slow.
- Graphite is flexible, a popular monitoring tool for time series database.
- Prometheus is flexible rule-based alerting and time series database metrics.
- Elastic comes with Elastic search, log stash, and kibana. It’s picking up a lot of traction. Elastic Stack is extensible using X-PACK feature.
- Grafana is best for visualizing time series database. Easy to get started and combine multiple backends. - - Grafana annotations easy to use and tag the events.
- There is no one tool which fits everyone’s case. You have to start somewhere. So pick up a monitoring tool, see if it works for you else try the next one til you settle down.
<a href="https://rootconf.talkfunnel.com/2017/17-deployment-strategies-with-kubernetes" target="_blank">Deployment strategies with Kubernetes</a> * This was talk with a live demo. * Canary deployment: Route a small amount of traffic to a new host to test functioning. * If new hosts don’t act normal roll back the deployment. * <a href="https://www.martinfowler.com/bliki/BlueGreenDeployment.html" target="_blank">Blue Green Deployment</a> is a procedure to minimize the downtime of the deployment. The idea is to have two set of machines with identical configuration but one with the latest code, rev 2 and other with rev 1. Once the machines with latest code act correctly, spin down the machines with rev 1 code. * Then a demo of `` kubectl `` with adding a new host to the cluster and roll back.
<a href="https://rootconf.talkfunnel.com/2017/7-a-little-bot-for-big-cause" target="_blank">A little bot for big cause</a> * The talk is on a story on developing, push to GitHub, merge and release. And shit hits the fan. Now, what to do? * The problem is developer didn’t get the code reviewed. * How can automation help here? * Enforcing standard like I unreviewed merge is reverted using GitHub API, Slack Bot, Hubot. * As soon as developer opens a PR, <a href="https://github.com/moengage/alice" target="_blank">alice</a>, the bot adds a comment to the PR with the checklist. When the code is merged, bot verifies the checklist, if items are unchecked, the bot reverts the merge. * The bot can do more work. DM the bot in the slack to issue commands and bot can interact with Jenkins to roll back the deployed code. * The bot can receive commands via slack personal message.
<a href="https://rootconf.talkfunnel.com/2017/18-necessary-tooling-and-monitoring-for-performance-c" target="_blank">Necessary tooling and monitoring for performance critical applications</a> * The talk is about collecting metrics for German E-commerce company Otto. * The company receives two orders/sec, million visitors per day.On an average, it takes eight clicks/pages to complete an order. * Monitor database, response time, throughput, requests/second, and measure state of the system * Metrics everywhere! We talk about metrics to decide and diagnose the problem. * <a href="http://metrics-clojure.readthedocs.io/en/latest/" target="_blank">Metrics</a> is a Clojure library to measure and record the data to the external system. * The library offers various features like Counter, gauges, meters, timers, histogram percentile. * Rather than extracting data from the log file, measure details from the code and write to the data store. * Third party libraries are available for visualization. * The demo used d3.js application for annotation and visualization. In-house solution. * While measuring the metrics, measure from all possible places and store separately. If the web application makes a call to the recommendation engine, collect the metrics from the web application and recommendation for a single task and push to the data store.
<a href="https://rootconf.talkfunnel.com/2017/51-what-should-be-pid-1-in-a-container" target="_blank">What should be PID 1 in a container?</a> * In older version of Docker, Docker doesn’t reap child process correctly. As a result, for every request, docker spawns a new application and never terminated. This is called <a href="https://rootconf.talkfunnel.com/2017/51-what-should-be-pid-1-in-a-container" target="_blank">PID 1 zombie problem</a>. * This will eat all available PIDs in the container. * Use `` Sysctl-a | grep pid_max `` to find maximum available PIDs in the container. * In the bare metal machine, PID 1 is `` systemd `` or any init program. * If the first process in the container is bash, then is PID 1 zombie process doesn’t occur. * Using bash is to handle all signal handlers is messy. * Yelp came up with <a href="https://github.com/Yelp/dumb-init" target="_blank">Yelp/dumb-init</a>. Now, `` dumb-init `` is PID 1 and no more zombie processes. * Docker-1.13, introduced the flag, `` --init ``. * Another solution uses `` system `` as PID 1 * Docker allows running `` system `` without privilege mode. * Running system as PID 1 has other useful features like managing logs.
<a href="https://rootconf.talkfunnel.com/2017/9-razor-sharp-provisioning-for-baremetal-servers" target="_blank">‘Razor’ sharp provision for bare metal servers</a> * I attended only first half of the talk, fifteen minutes. * When you buy physical rack space in a data server how will you install the OS? You’re in Bangalore and server is in Amsterdam. * First OS installation on bare metal is hard. * There comes Network boot! * <a href="http://www.syslinux.org/wiki/index.php?title=PXELINUX" target="_blank">PXELinux</a> is a syslinux derivative to boot OS from NIC card. * Once the machine comes up, DHCP request is broadcasted, and DHCP server responds. * <a href="https://cobbler.github.io/" target="_blank">Cobbler</a> helps in managing all services running the network. * DHCP server, TFTP server, and config are required to complete the installation. * Microkernel in placed in TFTP server. * <a href="https://puppet.com/blog/introducing-razor-a-next-generation-provisioning-solution" target="_blank">Razor</a> is a tool to automate provisioning bare metal installation. * Razor philosophy, consume the hardware resource like the virtual resource. * Razor components - Nodes, Tags, Repository, policy, Brokers, Hooks
<a href="https://rootconf.talkfunnel.com/2017/77-freebsd-is-not-a-linux-distribution" target="_blank">FreeBSD is not a Linux distribution</a> * FreeBSD is a complete OS, not a distribution * Who uses? NetFlix, WhatsApp, Yahoo!, NetApp and more * Great tools, mature release model, excellent documentation, friendly license. * Now a lot of forks NetBSD, FreeBSD, OpenBSD and few more * Good file system. UFS, and ZFS. UFS high performance and reliable. - If you don’t want to lose data use ZFS! * Jails - GNU/Linux copied this and called containers! * No GCC only llvm/clang. * FreeBSD is forefront in developing next generation tools. * Pluggable TCP stacks - BBR, RACK, CUBIC, NewReno * Firewalls - Ipfw , PF * Dummynet - live network emulation tool * FreeBSD can run Linux binaries in userspace. It maps GNU/Linux system call with FreeBSD. * It can run on 256 cores machine. * Hard Ware - <a href="https://en.wikipedia.org/wiki/Non-uniform_memory_access" target="_blank">NUMA</a>, ARM64, Secure boot/UEFI * Politics - Democratically elected core team * Join the Mailing list and send patches, you will get a commit bit. * Excellent mentor program - GSoC copied our idea. * FreeBSD uses SVN and Git revision control. * Took a dig at GPLV2 and not a business friendly license. * Read out BSD license on the stage.
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.