---
title: "TIL: Running SQLite Queries from Python Command Line"
date: 2025-03-21T01:14:01Z
draft: false
tags:
  - Python
  - Sqlite3
  - CLI
  - TIL
---
![Sqlite command output](/images/simple-sqlite-cli-from-python/sqlite_from_python.png)

Python provides a simple interface to run SQLite queries from the command line. The syntax is straightforward: `python3 -m sqlite3 database query`.
Here's an example using a GitHub database with the table `repos` and a query to return five rows:


```bash
$python3 -m sqlite3 ../../Downloads/github.db "select name, html_url from repos limit 5"
('dotfiles', 'https://github.com/marksteve/dotfiles')
('djangopackages', 'https://github.com/djangopackages/djangopackages')
('honzajavorek.cz', 'https://github.com/honzajavorek/honzajavorek.cz')
('dashboard', 'https://github.com/openelections/dashboard')
('dotFiles', 'https://github.com/harish2704/dotFiles')
```

### Command Line Options

```bash
python3 -m sqlite3 --help
usage: python -m sqlite3 [-h] [-v] [filename] [sql]

Python sqlite3 CLI

positional arguments:
  filename       SQLite database to open (defaults to ':memory:'). A new database is created if the file does not previously exist.
  sql            An SQL query to execute. Any returned rows are printed to stdout.

options:
  -h, --help     show this help message and exit
  -v, --version  Print underlying SQLite library version
```

You can view the implementation in the [CPython Source Code](https://github.com/python/cpython/blob/main/Lib/sqlite3/__main__.py).
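
Under the hood, the CLI is a thin wrapper over the `sqlite3` module. Here's a minimal sketch of the idea behind the non-interactive path (not the actual CPython implementation):

```python
import sqlite3
import sys

def run_query(filename: str, sql: str) -> None:
    # Connect to the database file (":memory:" also works).
    con = sqlite3.connect(filename)
    try:
        # Execute the query and print each returned row,
        # mirroring the tuple-per-line output shown above.
        for row in con.execute(sql):
            print(row)
    finally:
        con.close()

if __name__ == "__main__":
    run_query(sys.argv[1], sys.argv[2])
```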

---
title: "Notes: AI Copilot Code Quality"
date: 2025-02-15T18:56:15Z
draft: false
tags:
- paper
- AI
- LLM
---
[GitClear](https://www.gitclear.com/) published *[AI Copilot Code Quality](https://gitclear-public.s3.us-west-2.amazonaws.com/AI-Copilot-Code-Quality-2025.pdf)*;
I discovered it via *[The PrimeTime](https://www.youtube.com/c/ThePrimeTime)* YouTube channel.

The paper focuses on the less discussed topic of software maintainability,
in contrast to the more frequent discourse on the internet about boosting developer productivity.


**Abstract**

>The data in this report contains multiple signs of eroding code quality. This
is not to say that AI isn’t incredibly useful. But it is to say that the frequency
of copy/pasted lines in commits grew 6% faster than our 2024 prediction.
Meanwhile, the percent of commits with duplicated blocks grew even faster.
Our research suggests a path by which developers can continue to
generate distinct value from code assistants into the foreseeable future.


**Key Points:**
>The sharp
upward curve of AI adoption seemingly guaranteed that, if the identified trends were
really correlated with AI use, they would get worse in 2024. That led us to predict, in
January 2024, that the annual Google DORA Research (eventually released in
October 2024) would show “Defect rate” on the rise. Fortunately for our prediction
record, unfortunately for Dev Team Managers, the Google data bore out the notion
that a rising defect rate correlates with AI adoption.

- The rise of AI code assistants correlates with an increase in bugs.

>2024 marked the first year GitClear has ever measured where the
number of “Copy/Pasted” lines exceeded the count of “Moved” lines. Moved
lines strongly suggest refactoring activity. If the current trend continues, we believe it
could soon bring about a phase change in how developer energy is spent, especially
among long-lived repos. Instead of developer energy being spent principally on
developing new features, in coming years we may find “defect remediation” as the
leading day-to-day developer responsibility.

![Lines Moved over years](/images/ai_copilot_code_quality/line_moved.png)

This suggests that developers prioritize shipping code,
demonstrating impact, contributing to FOSS, and experiencing a sense of productivity.
However, they are focusing less on refactoring and creating general, reusable code.
I would like to know how maintainers feel about this trend in contributions and the
quality of pull requests. If AI can generate code quickly,
then there must also be efforts to develop tools that enhance code quality.

>Even when managers focus on more substantive productivity metrics, like “tickets
solved” or “commits without a security vulnerability,” AI can juice these metrics by
duplicating large swaths of code in each commit. Unless managers insist on finding
metrics that approximate “long-term maintenance cost,” the AI-generated work their
team produces will take the path of least resistance: expand the number of lines
requiring indefinite maintenance.

This perspective resonates well—at higher levels within an organization,
key metrics often revolve around increasing profits, accelerating feature deployment,
and minimizing incidents and bugs. Discussions about code quality are comparatively less common.

>The combination of these trends leaves little room to doubt that the current
implementation of AI Assistants makes us more productive at the expense of
repeating ourselves (or our teammates), often without knowing it. Instead of
refactoring and working to DRY ("Don't Repeat Yourself") code, we’re constantly
tempted to duplicate.

![Increase in created line](/images/ai_copilot_code_quality/increase_in_created_lines.png)

The process has become easier, as assistants and agents can now generate code,
edit files, and write test cases.
I experimented with Cline, a VS Code extension,
and found that a well-structured, detailed prompt can produce code remarkably quickly.
An interesting observation is that most AI benchmarks focus on solving LeetCode
problems and GitHub issues, yet no benchmark currently exists to assess code quality and maintainability.

>According to our duplicate block detection method [A8], 2024 was without precedent
in the likelihood that a commit would contain a duplicated code block. The prevalence
of duplicate blocks in 2024 was observed to be approximately 10x higher than it had
been two years prior.

![Duplicate code block](/images/ai_copilot_code_quality/duplicate_code_block.png)

AI-generated suggestions are free and quick to obtain.
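
The duplicate block detection method [A8] is GitClear's own. As a naive illustration of the general idea (not their actual method), one could hash sliding windows of lines and flag windows that appear more than once:

```python
import hashlib

def duplicated_blocks(lines: list[str], window: int = 5) -> set[int]:
    """Return start indices of line-windows that occur more than once.

    A naive illustration only; GitClear's method [A8] is more involved.
    """
    seen: dict[str, int] = {}
    duplicates: set[int] = set()
    for i in range(len(lines) - window + 1):
        # Normalise whitespace so trivially reformatted copies still match.
        block = "\n".join(line.strip() for line in lines[i : i + window])
        digest = hashlib.sha1(block.encode()).hexdigest()
        if digest in seen:
            duplicates.update({seen[digest], i})
        else:
            seen[digest] = i
    return duplicates
```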

>Google DORA’s 2024 survey included 39,000 respondents–enough sample size to
evaluate how the reported AI benefit of “increased developer productivity” mixed with
the AI liability of “lowered code quality.” That research has since been released, with
Google researchers commenting:
>AI adoption brings some detrimental effects. We have observed reductions to
software delivery performance, and the effect on product performance is
uncertain.

>But the 2024 ratios for “what type of code is being revised” do not paint an
encouraging picture. During the past year, only 20% of all modified lines were
changing code that was authored more than a month earlier. Whereas, in 2020, 30%
of modified lines were in service of refactoring existing code.

This trend implies that new pull requests are often created to fix issues introduced by previous pull requests.

>The trend line here is a little cagey, with 2023 faking a return toward pre-AI levels. But
if we consider 2021 as the “pre-AI” baseline, this data tells us that, during 2024, there
was a 20-25% increase in the percent of new lines that get revised within a month.

This raises an important question about finely crafted software.
Are new developers actively thinking about improving their craft?
I have observed engineers with ambitions to write compilers, design new programming languages,
or even rewrite the TCP stack in Rust.

>The never-ending rollout of more powerful AI systems will continue to transform the
developer ecosystem in 2025. In an environment where change will be constant, we
would suggest that developers emphasize their still-uniquely human ability to “simplify”
and “consolidate” code they understand. There is art, skill and experience that gets
channeled into creating well-named, well-documented modules. Executives that want
to maximize their throughput in the “Age of AI” will discover new ways to incentivize
reuse. Devs proficient in this endeavor stand to reap the benefits.

That observation aligns with my previous point.

**Conclusion**

The report also provides a use case for companies to adopt GitClear.
I find it worthwhile to consider the long-term advantages and effects of AI coding assistants.

A couple of months ago, I was following *Crafting Interpreters* and using VS Code to write Go code.
I had to disable Copilot so that I could fully understand what was happening in the codebase and
avoid ten lines of auto-completion.

My personal take is that most of these tools primarily offer suggestions, with little
to no emphasis on leveraging existing code within the codebase to achieve coherence.
This lack of contextual awareness is one of the contributing factors.
While these tools may help rank pull request quality, detect duplication, and assess other metrics,
unless they are integrated into the development process early to reuse existing utility functions or
suggest refactoring of existing code, the problem will persist.

It is also worth considering whether to deploy LLMs fine-tuned to a specific
codebase with the goal of improving code quality by providing suggestions that
prioritize reusability and maintainability. However, the cost of maintaining and updating
LLMs remains a significant challenge.

Another consideration is to use a reasoning model to determine whether to analyze the codebase and
offer suggestions or generate new code.

---
title: "ChatGPT Shambles for Gary Marcus Prompt"
date: 2025-02-08T22:14:34Z
draft: false
tags:
- chatgpt
- llm
- AI
---

Gary Marcus recently wrote an article titled [ChatGPT in Shambles](https://garymarcus.substack.com/p/chatgpt-in-shambles).
The prompt instructed ChatGPT to produce a table of median household income across [U.S. states](https://chatgpt.com/share/67a20c79-bfa4-8001-8cdf-4ee60d42df5f).

```
Make a table of every state in U.S., including population, area, median house hold income, sorted in order of median household income.
```


![chatgpt-original-output](/images/chatgpt-shambles/chatgpt_original.png)
The output contained only twenty states before being interrupted. The final row contained only the name of the state.

### ChatGPT - My Attempt
When I tried the [same prompt](https://chatgpt.com/share/67a7d85c-6af0-8001-bb5e-d01d816b59f7) while logged in to ChatGPT, it returned all the states and their incomes. I skipped verifying the data quality and checked only the structure.

![chatgpt-1](/images/chatgpt-shambles/chatgpt_1.png)
![chatgpt-2](/images/chatgpt-shambles/chatgpt_2.png)
![chatgpt-3](/images/chatgpt-shambles/chatgpt_3.png)

My guess is either fine-tuning (unlikely in such a short interval) or non-deterministic output that differs between logged-in users and anonymous asks.

I tried the same prompt in other models.

### Claude

Claude produced well-structured output with an extra summary and a follow-up asking for more tasks.

![claude-output](/images/chatgpt-shambles/claude.png)

### Deepseek

![deepseek-output](/images/chatgpt-shambles/deepseek.png)

Similar to Claude's output, DeepSeek produced all the states, including a summary.


### [Gemini 2.0 Flash](https://g.co/gemini/share/ea864e8105c1)

![gemini-flash-pro-2.0-output](/images/chatgpt-shambles/gemini.png)
By far, the Gemini output is the best structured, with a rank column, an option to export the results to Google Sheets, and a summary at the end.

### Le Chat
![lechat](/images/chatgpt-shambles/le_chat.png)

[Le Chat](https://chat.mistral.ai/chat) produced all fifty states along with sources.

Across the exercise, it's clear there are small variations between models, and the other models clearly produce better output than ChatGPT.
It's confusing to see such different behaviour from ChatGPT.

---
title: "Notes on Four Blog Posts on How I use LLM"
date: 2025-01-30T22:06:26+05:30
draft: false
tags:
  - AI
  - LLM
  - Practices
---
Over the past few weeks, several top software engineers have published blog posts about how they use AI. Here are a few of the posts I came across in various forums:
- [Why I use Cline for AI Engineering](https://addyo.substack.com/p/why-i-use-cline-for-ai-engineering) by Addy Osmani
- [How I use AI](https://nicholas.carlini.com/writing/2024/how-i-use-ai.html) by Nicholas Carlini
- [How I Use AI: Meet My Promptly Hired Model Intern](https://lucumr.pocoo.org/2025/1/30/how-i-ai/) by Armin Ronacher
- [Building personal software with Claude](https://blog.nelhage.com/post/personal-software-with-claude/) by Nelson Elhage

Below, I’ve compiled my personal notes on these posts. I’ll highlight key points, share my thoughts, and reflect on what stood out to me as particularly interesting or novel.

### [Why I use Cline for AI Engineering](https://addyo.substack.com/p/why-i-use-cline-for-ai-engineering) by Addy Osmani

**Author's Bio**: Addy Osmani is an Irish Software Engineer and leader currently working on the Google Chrome web browser.

[Cline is a coding agent VS Code extension](https://github.com/cline/cline). The description from the GitHub repo:

> Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser,
> and more with your permission every step of the way.

In this blog post, Addy Osmani presents an interesting mental model for thinking about
Cline not as an interactive Q&A system, but as a system tool for suggesting or modifying code blocks.

> Cline approaches AI assistance differently from most tools in the market.
> Rather than focusing solely on code generation or completion, it operates as a systems-level tool that can
> interact with your entire development environment. This becomes particularly valuable when dealing with complex debugging
> scenarios, large-scale refactoring, or integration testing.

**The DeepSeek-R1 + Sonnet hybrid approach**

> Recent benchmarks and user experiences have shown that combining DeepSeek-R1 for planning with Claude 3.5 Sonnet
> for implementation can reduce costs by up to 97% while improving overall output quality.

The combination is interesting and looks similar to plumbing various Unix commands through pipes
to achieve the desired output rather than using a single command.

> Cline's ability to switch between models seamlessly makes this hybrid approach practical.
> With the v3.2.6 update, the system even remembers your preferred model for each mode,
> making it effortless to maintain optimal model selection for different types of tasks.
> You're not stuck with a single model's trade-offs - you can optimize for cost, capability,
> or speed depending on the specific task at hand.

**Checkpoints: Version control beyond git**

> The system operates independently of your regular git workflow, preventing the need to pollute commit
> history with experimental changes.

This is the first time I have come across the concept, and I am intrigued to try it out.

**Computer Use: Runtime awareness**

> Above, Cline was able to connect to launch Chrome to verify that a set of changes correctly rendered.
> It notices that there was a Next.js error and can proactively address this without me copy/pasting
> issues back and forth. This is a game-changer.

> This bridges a crucial gap between static code analysis and runtime behavior - something particularly
> valuable when dealing with complex web applications or distributed systems.

This looks promising if you're doing web development and a lot of front-end development.

**Conclusion**

> The trade-off of additional complexity for greater control and capability makes sense for serious development work.
> While simpler tools might be sufficient for basic tasks, Cline's system-level approach provides unique value for
> complex engineering challenges.

Cline's philosophy of being a coding agent is what stands out.

### [How I Use "AI"](https://nicholas.carlini.com/writing/2024/how-i-use-ai.html) by Nicholas Carlini

**Author Bio**: Nicholas Carlini is a research scientist at Google DeepMind.

> But the reason I think that the recent advances we've made aren't just hype is that, over the past year,
> I have spent at least a few hours every week interacting with various large language models,
> and have been consistently impressed by their ability to solve increasingly difficult tasks
> I give them. And as a result of this, I would say I'm at least 50% faster at writing code
> for both my research projects and my side projects as a result of these models.

The approach of tinkering with and using LLMs to solve coding problems on a regular basis is noteworthy.

> If I were to categorize these examples into two broad categories,
> they would be “helping me learn” and “automating boring tasks”.
> Helping me learn is obviously important because it means that I can now do things
> I previously would have found challenging; but automating boring tasks is (to me)
> actually equally important because it lets me focus on what I do best, and solve the hard problems.

Rather than thinking of an LLM as replacing you in your job, treating it as a companion tool
to improve your skillset and enhance your knowledge seems to be a common pattern.

**As a tutor for new technologies**

>But today, [I'll just ask a language model to teach me Docker](https://chatgpt.com/share/40dcc017-9cc6-4a99-8eac-959a171fbb2f). So here's me doing just that.

This is a recurring theme, and a lot of folks are doing it. Last week, I was using DeepSeek to do something similar
and was impressed by the accuracy and reliability (though there’s still a long way to go for unpopular languages).
A year back, LLMs had high false positive rates for suggestions (anecdotal). Recently, at least for the top six languages,
the quality of the suggestions has significantly improved.

**To simplify code**

> Now golly has a CLI tool that does what I want---all I needed was a way to call into it correctly.
> The first step of this was to take the C++ code that supports something like 50 different
> command line options and just get it to do exactly the one thing I wanted.
> So I just dumped all 500 lines of C++ into the LLM and asked for a shorter file that would do the same thing.

> And you know what? It worked flawlessly. Then I just asked for a Python wrapper around the C++ code.
> And that worked too.

This is a fabulous testimonial. The concept of using it for code reviews, combined with a reasoning model, can significantly enhance one's journey in mastering a particular language.
Overall, I can see the scientist at work here. It’s an excellent use case for automating mundane
tasks and increasing utilitarian value. The article is perfect for anyone
hesitant to try LLMs but looking for ways to improve their quality of life through automation.

### [How I Use AI: Meet My Promptly Hired Model Intern](https://lucumr.pocoo.org/2025/1/30/how-i-ai/) by Armin Ronacher

**Author Bio**: Armin is a well-known software engineer who has created various popular libraries like [Flask](https://flask.palletsprojects.com/en/stable/)
and [Jinja](https://jinja.palletsprojects.com/en/stable/), and is a co-founder of [Sentry](https://sentry.io/welcome/), a SaaS product.

```bash
#!/bin/sh
MODEL=phi4:latest
if ping -q -c1 google.com &>/dev/null; then
  MODEL=claude-3-5-sonnet-latest
fi
OLD_TEXT="$(cat)"
llm -m $MODEL "$OLD_TEXT" -s "fix spelling and grammar in the given text,
    and reply with the improved text and no extra commentary.
    Use double spacing."
```
> This script can automatically switch between a local model (phi4 via Ollama)
> and a remote one (claude-3-5-sonnet-latest) based on internet connectivity.
> With a command like !llm-spell in Vim, I can fix up sentences with a single step.

This is relatable to me because I use grammar correction tools both at work and
for personal blog posts—ensuring my writing is clear and polished.
Like Armin, I face a similar challenge as a non-native English speaker:
maintaining a consistent voice and keeping the same level of engagement throughout a post.
To address this, I use the `llm` command and also invoke it through Raycast as a script command.
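
For reference, roughly the same correction step can be driven from Python via the `llm` package's API. A minimal sketch, assuming the matching model plugin (e.g. `llm-claude-3`) and API key are already configured:

```python
import llm

def fix_grammar(text: str, model_name: str = "claude-3-5-sonnet-latest") -> str:
    # Look up the configured model and send the text with a system prompt,
    # mirroring the shell script quoted above.
    model = llm.get_model(model_name)
    response = model.prompt(
        text,
        system="fix spelling and grammar in the given text, and reply with "
               "the improved text and no extra commentary.",
    )
    return response.text()

print(fix_grammar("Their is too mistake in this sentense."))
```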

**Writing with AI**


> Here are some of the things I use AI for when writing:

> Grammar checking: I compare the AI’s suggested revisions side by side
> with my original text and pick the changes I prefer.

> Restructuring: AI often helps me see when my writing is too wordy.
> In the days before AI, I often ended up with super long articles that did not read well
> and that I did not publish. Models like o1 are very helpful in identifying things that don't need to be said.

> Writing Notes and finding key points: Here, I ask the AI to read through
> a draft “like a Computer Science 101 student” and take notes.
> This helps me see if what it absorbed matches what I intended to convey.

> Roast my Article: I have a few prompts that asks the AI to “roast” or criticize my article,
> as if commenting on Reddit, Twitter, or Hacker News. Even though these critiques seem shallow,
> they can sting, and they often highlight weaknesses in my argument or lack of clarity.
> Even if they don't necessarily impact the writing, they prime me for some of the feedback I inevitably receive.

> Identifying jargon: If I worry there's too much jargon, I use AI to resolve acronyms
> and point out technical terms I've used without explanation, helping me make the text more accessible.

I find three use cases particularly helpful:
(1) writing notes and identifying key points,
(2) having my article critiqued, and
(3) identifying jargon.

Writing notes and identifying key points: This approach provides valuable feedback on
your article by placing the LLM in the reader’s shoes.

**Talking to Her**

> ChatGPT is also incredibly helpful when having to work with multiple languages.
> For a recent example, my kids have Greek friends and we tried to understand the
> difference between some Greek words that came up. I have no idea how to write it,
> Google translate does not understand my attempts of pronouncing them either. However,
> ChatGPT does. If I ask it in voice mode what “pa-me-spee-tee”
> in Greek means it knows what I tried to mumble and replies in a helpful manner.

Lately, I’ve been thinking about improving my pronunciation
of English words using LLMs. For context, I grew up in Tamil Nadu,
in southern India, and I speak with a thick accent. I’ve often had to
repeat myself multiple times due to my pronunciation. I hate it when my
jokes fall flat because of it. Now, it’s time to experiment with LLMs to improve this.

**Final Thoughts**

> My approach isn't about outsourcing thinking, but augmenting it: using LLMs to accelerate grunt work,
> untangle mental knots, and prototype ideas faster. Skepticism is healthy, but dismissing AI outright
> risks missing its potential as a multiplier for those willing to engage critically.

I like the usage of the word `augmenting`; it feels apt and fits well.

### [Building personal software with Claude](https://blog.nelhage.com/post/personal-software-with-claude/) by Nelson Elhage

**Working between defined interfaces**

> When working with Claude, I found myself instinctively choosing to break down problems into ones
> with relatively well-defined and testable interfaces. For instance, instead of asking it to make
> changes to the Rust and elisp code in one query, I would ask for a feature to be added to the Rust side,
> which I would then spot-check by inspecting the output JSON, and then ask for the corresponding elisp changes.


This is something I often do in my codebase. When I don’t like certain parts of a larger task, I ask the LLM to handle tasks A, B, and C separately. For example, when writing a long SQL query, I iteratively develop smaller pieces and then combine them—often starting with the most exciting parts first!

The post details how the author fixed a performance issue with an Emacs Lisp function that interacted with [Obsidian.md](https://obsidian.md/).
It’s a fantastic showcase of using LLMs and their coding capabilities.

### Conclusion

I enjoyed reading all these articles, especially seeing how everyone perceives and utilises the power of LLMs to improve their work and life.

One thing is clear: to get real value out of LLMs, you need curiosity and a willingness to invest time in learning. The rewards come with consistent effort.

Each of the four blog posts took a unique approach:

  - Addy’s use case of using Cline for complex engineering tasks was backed by solid reasoning and practical examples.
  - Nicholas provided an exhaustive list of use cases and detailed experiments, showcasing the breadth of LLM applications and their impressive results.
  - Armin shared personal and technical experiences, including how his kids use LLMs—highlighting their utilitarian value for everyone.
  - Nelson’s post was a clear demonstration of using LLMs to fix performance issues. It reminded me of my own experiments with [ChatGPT](https://kracekumar.com/post/chatgpt-gh-profile-lookup/) for coding tasks back in 2022. Over the past two years, significant improvements in LLMs have become evident, and I expect to see even more cases of LLMs enhancing code quality.

**Disclaimer**: This post is not AI-generated slop (no summarization), but an LLM was used to improve grammar.

---
title: "DeepSeek R1 Aider Benchmark"
date: 2025-01-26T00:52:45+05:30
draft: false
tags: ["LLM", "benchmark", "aider"]
---

[DeepSeek recently released its R1 model](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf), a state-of-the-art LLM that outperforms all available reasoning models on the market.
The accompanying paper includes a comprehensive comparison across 21 benchmarks in four categories:
`English, Code, Math, and Chinese`.

![R1 benchmark results](/images/aider/deepseek-comparision)

As a software engineer, I was particularly curious about the Code category and
decided to explore the datasets and evaluation criteria.
While many benchmarks in this category were either poorly documented or required extensive dataset downloads,
aider-polyglot stood out for its clear documentation, ease of use, and simple [benchmark script](https://github.com/Aider-AI/aider/blob/main/benchmark/benchmark.py).

### What is Aider?

[Aider](https://aider.chat/) is an open-source AI pair-programming tool that runs in the terminal. Its polyglot benchmark
is based on programming problems from exercism.io and covers six popular languages: `Python, Java, JavaScript, C++, Rust, and Go`.
The [README](https://github.com/Aider-AI/aider/blob/main/benchmark/README.md) provides step-by-step instructions for running the benchmarks,
making it accessible even for those new to the field.

### Running the benchmark

Set the `DEEPSEEK_API_KEY` while running the benchmark command.
I used the hosted version of DeepSeek to run the benchmark.
Here’s the command I executed for the Python benchmarks:

```bash
$ ./benchmark/benchmark.py test-deepseek-r1-run --model r1 --edit-format whole --threads 10 --exercises-dir polyglot-benchmark --verbose --new --languages python
```

**Key CLI Parameters:**

- `model: r1` (indicating the DeepSeek R1 model).
- `edit-format: whole` (the other option is edit, which was used in the original paper).
- `threads: 10` (number of Python threads to run in parallel).
- `languages: python` (by default, all languages are benchmarked).

**Output:**

```bash
- dirname: 2025-01-25-19-03-46--test-deepseek-r1-run
  test_cases: 34
  model: deepseek/deepseek-reasoner
  edit_format: whole
  commit_hash: b276d48
  pass_rate_1: 35.3
  pass_rate_2: 64.7
  pass_num_1: 12
  pass_num_2: 22
  percent_cases_well_formed: 100.0
  error_outputs: 0
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 0
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 1
  total_tests: 225
  command: aider --model deepseek/deepseek-reasoner
  date: 2025-01-25
  versions: 0.72.3.dev
  seconds_per_case: 226.0
  total_cost: 0.9313

costs: $0.0274/test-case, $0.93 total, $6.16 projected
```

Most fields are self-explanatory, but two key metrics stand out: `pass_rate_1` and `pass_rate_2`,
which indicate the percentage of problems solved on the first and second attempts, respectively.
The R1 model achieved a `64.7%` pass rate across 34 exercises.
The [official leaderboard](https://aider.chat/2024/12/21/polyglot.html) reports a pass rate of `56.9%` across all languages.
This is not a like-for-like comparison, but it is useful for illustration.
Notably, the official website does not distinguish between pass rates for the first and second attempts.
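
As a quick sanity check, the pass rates are simply `pass_num / test_cases`:

```python
test_cases = 34
pass_num_1, pass_num_2 = 12, 22

# First- and second-attempt pass rates, matching the report output above.
print(f"pass_rate_1: {pass_num_1 / test_cases:.1%}")  # pass_rate_1: 35.3%
print(f"pass_rate_2: {pass_num_2 / test_cases:.1%}")  # pass_rate_2: 64.7%
```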

![Polyglot benchmark leaderboard](/images/aider/polyglot-benchmark)

### Conclusion

During the benchmark, I encountered a temporary issue where the DeepSeek API returned a 503 error.
While Aider employs exponential backoff to retry failed exercises, recovery can be time-consuming.

The following are some results from the other language benchmarks, except Java.

### C++
```
$./benchmark/benchmark.py test-deepseek-r1-run-cpp --model r1 --edit-format whole --threads 10 --exercises-dir polyglot-benchmark --verbose --new --languages cpp
- dirname: 2025-01-25-19-26-20--test-deepseek-r1-run-cpp
  test_cases: 26
  model: deepseek/deepseek-reasoner
  edit_format: whole
  commit_hash: b276d48
  pass_rate_1: 19.2
  pass_rate_2: 69.2
  pass_num_1: 5
  pass_num_2: 18
  percent_cases_well_formed: 100.0
  error_outputs: 0
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 0
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 0
  total_tests: 225
  command: aider --model deepseek/deepseek-reasoner
  date: 2025-01-25
  versions: 0.72.3.dev
  seconds_per_case: 410.2
  total_cost: 0.4168

costs: $0.0160/test-case, $0.42 total, $3.61 projected
```

### Go
```
$./benchmark/benchmark.py test-deepseek-r1-run-go --model r1 --edit-format whole --threads 10 --exercises-dir polyglot-benchmark --verbose --new --languages go
- dirname: 2025-01-26-07-44-16--test-deepseek-r1-run-go
  test_cases: 39
  model: deepseek/deepseek-reasoner
  edit_format: whole
  commit_hash: b276d48
  pass_rate_1: 41.0
  pass_rate_2: 66.7
  pass_num_1: 16
  pass_num_2: 26
  percent_cases_well_formed: 100.0
  error_outputs: 0
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 3
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 1
  total_tests: 225
  command: aider --model deepseek/deepseek-reasoner
  date: 2025-01-26
  versions: 0.72.3.dev
  seconds_per_case: 204.4
  total_cost: 0.8196

costs: $0.0210/test-case, $0.82 total, $4.73 projected
```

### Javascript

```
./benchmark/benchmark.py test-deepseek-r1-run-javascript --model r1 --edit-format whole --threads 10 --exercises-dir polyglot-benchmark --verbose  --languages javascript --new
- dirname: 2025-01-26-14-52-31--test-deepseek-r1-run-javascript
  test_cases: 49
  model: deepseek/deepseek-reasoner
  edit_format: whole
  commit_hash: b276d48
  pass_rate_1: 22.4
  pass_rate_2: 57.1
  pass_num_1: 11
  pass_num_2: 28
  percent_cases_well_formed: 100.0
  error_outputs: 0
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 2
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 1
  total_tests: 225
  command: aider --model deepseek/deepseek-reasoner
  date: 2025-01-26
  versions: 0.72.3.dev
  seconds_per_case: 236.6
  total_cost: 1.2589

costs: $0.0257/test-case, $1.26 total, $5.78 projected
```

### Rust
```
./benchmark/benchmark.py test-deepseek-r1-run-rust --model r1 --edit-format whole --threads 10 --exercises-dir polyglot-benchmark --verbose  --languages rust --new

- dirname: 2025-01-26-15-18-05--test-deepseek-r1-run-rust
  test_cases: 30
  model: deepseek/deepseek-reasoner
  edit_format: whole
  commit_hash: b276d48
  pass_rate_1: 50.0
  pass_rate_2: 63.3
  pass_num_1: 15
  pass_num_2: 19
  percent_cases_well_formed: 100.0
  error_outputs: 0
  num_malformed_responses: 0
  num_with_malformed_responses: 0
  user_asks: 3
  lazy_comments: 0
  syntax_errors: 0
  indentation_errors: 0
  exhausted_context_windows: 0
  test_timeouts: 0
  total_tests: 225
  command: aider --model deepseek/deepseek-reasoner
  date: 2025-01-26
  versions: 0.72.3.dev
  seconds_per_case: 174.1
  total_cost: 0.7162

costs: $0.0239/test-case, $0.72 total, $5.37 projected
```

---
title: ndjson
date: 2025-01-14T23:16:25+05:30
draft: false
tags:
  - HTTP
  - TIL
---
![ndjson](https://raw.githubusercontent.com/ndjson/ndjson-spec/refs/heads/master/ndjson.png)
```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Where is Dublin? Answer in a six words"
}'
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.15898Z","response":"Located","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.183229Z","response":" on","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.206942Z","response":" the","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.230918Z","response":" east","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.254533Z","response":" coast","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.278113Z","response":" Ireland","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.301689Z","response":".","done":false}
{"model":"llama3.2","created_at":"2025-01-14T17:48:33.3255Z","response":"","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,9241,374,33977,30,22559,304,264,4848,4339,128009,128006,78191,128007,271,48852,389,279,11226,13962,14990,13],"total_duration":2392671125,"load_duration":575523041,"prompt_eval_count":34,"prompt_eval_duration":1649000000,"eval_count":8,"eval_duration":167000000}
```

I was playing around with the ollama API to explore its capabilities and noticed the HTTP response was streaming JSON, which prompted me to look into the response headers.

```bash
curl -v http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Where is Dublin? Answer in a six words"
}'
* Host localhost:11434 was resolved.
* IPv6: ::1
* IPv4: 127.0.0.1
*   Trying [::1]:11434...
* connect to ::1 port 11434 from ::1 port 49217 failed: Connection refused
*   Trying 127.0.0.1:11434...
* Connected to localhost (127.0.0.1) port 11434
> POST /api/generate HTTP/1.1
> Host: localhost:11434
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Length: 250
> Content-Type: application/x-www-form-urlencoded
>
* upload completely sent off: 250 bytes
< HTTP/1.1 200 OK
< Content-Type: application/x-ndjson
< Date: Tue, 14 Jan 2025 17:49:29 GMT
< Transfer-Encoding: chunked
<
...
```

The content type is `application/x-ndjson`, and a quick search hinted that it's newline-separated JSON that can be used in streaming protocols. Also, the `Transfer-Encoding` is chunked, which fits well for LLM responses over the wire.

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Where is Dublin? Answer in a six words"
}' | jq .response
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1399    0  1149  100   250   3291    716 --:--:-- --:--:-- --:--:--  3997
"Located"
" on"
" the"
" east"
" coast"
" Ireland"
"."
""
```

Also, `jq` can handle newline-delimited JSON.
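
Consuming the stream from Python works the same way: read the response line by line and `json.loads` each line. A minimal sketch using `requests`, assuming ollama is running locally:

```python
import json

import requests

payload = {"model": "llama3.2", "prompt": "Where is Dublin? Answer in a six words"}

# stream=True keeps the connection open and yields lines as they arrive.
with requests.post("http://localhost:11434/api/generate", json=payload, stream=True) as resp:
    for line in resp.iter_lines():
        if not line:  # skip keep-alive blank lines
            continue
        chunk = json.loads(line)  # one JSON object per line
        print(chunk["response"], end="", flush=True)
        if chunk.get("done"):
            break
```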

### JSON Streaming formats

While researching further on [JSON streaming](https://en.wikipedia.org/wiki/JSON_streaming), I found several other approaches to streaming JSON objects. Notable ones are `ndjson`, `jsonl`, and `json-seq`. All these formats are useful for processing and parallelising large JSON documents without loading the entire dataset into memory.


**Syntax**
- `ndjson`: Uses a newline character (`\n`) to separate each JSON object, and no whitespace is allowed between objects or values. Example: `{"some":"thing\n"}`. Only a single `\n` separates objects.

- `jsonl` ([JSON Lines](https://jsonlines.org/on_the_web/)): Similar to `ndjson`, but allows optional whitespace around the `\n` separator and `\r\n` on Windows. Example: `{"name": "John", "age": 30}\r\n`

- `json-seq` (JSON Sequence): Each JSON object is prefixed by an ASCII Record Separator (0x1E) and ends with an ASCII Line Feed character (0x0A). Example: `␞{"d":"2014-09-22T21:58:35.270Z","value":6}␤`

It's quite interesting to see the different use cases for these variations of JSON formats.
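
The parsing differences between the three are tiny. A sketch of splitting each format into JSON objects:

```python
import json

def parse_ndjson(data: str) -> list:
    # One JSON object per "\n"-separated line.
    return [json.loads(line) for line in data.split("\n") if line]

def parse_jsonl(data: str) -> list:
    # Same idea, but tolerate "\r\n" line endings and surrounding whitespace.
    return [json.loads(line) for line in data.splitlines() if line.strip()]

def parse_json_seq(data: str) -> list:
    # Records are prefixed with the ASCII Record Separator (0x1E)
    # and terminated with a Line Feed (0x0A).
    return [json.loads(record) for record in data.split("\x1e") if record.strip()]
```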


---
title: Subtitle Generator Using Whisper
date: 2025-01-12T22:30:34+05:30
draft: false
tags:
  - AI
  - language_model
---

I want to generate the subtitles for the `Normal People` TV series on my laptop using an LLM. After searching a bit, whisper from OpenAI was a proper fit.

### Step 1: Extracting Audio from Video

The first step is to extract the audio from the video file using `ffmpeg` and store it separately.

```bash
ffmpeg -i /Users/kracekumar/Movies/TV/Normal.People.S01/Normal.People.S01E01.mp4 -vn -acodec copy /Users/kracekumar/Movies/TV/Normal.People.S01/audio/Normal.People.S01E01.aac
```

### Step 2: Converting Audio to Text

The second step is to run the audio file through the [whisper model](https://github.com/openai/whisper) from OpenAI. I use uv to install and run inside a project.

```bash
uv run whisper /Users/kracekumar/Movies/TV/Normal.People.S01/audio/Normal.People.S01E01.aac --model turbo -f srt --output_dir /Users/kracekumar/Movies/TV/Normal.People.S01/generated_subs/
```

Here are the first ten subtitles generated by the model:
```
1
00:00:00,000 --> 00:00:24,000
It's a simple game. You have 15 players. Give one of them the ball. Get it into the net.

2
00:00:24,000 --> 00:00:26,000
Very simple. Isn't it?

3
00:00:26,000 --> 00:00:31,000
Brilliant. How's it going, Rachel? Talking tactics there for the big game.

4
00:00:31,000 --> 00:00:35,000
We're getting a masterclass. How incredibly boring of you.

5
00:00:35,000 --> 00:00:39,000
Yeah. Did you use your hair though? I did, yeah.

6
00:00:39,000 --> 00:00:44,000
It's very pretty. Thanks. Can I use my locker? By any chance?

7
00:00:44,000 --> 00:00:50,000
Yeah. Yeah, I sorta need you to move, Connell.

8
00:00:50,000 --> 00:00:55,000
Oh, sorry. Excuse me. Sorry. Excuse me. Right, relax, will ya?

9
00:00:55,000 --> 00:01:00,000
Okay, now that's important because it's turned up in the exam twice out of the last three years.

10
00:01:02,000 --> 00:01:03,000
Marianne.

```


Here are the subtitles from another site:

```
1
00:00:18,989 --> 00:00:20,269
It's a simple game.

2
00:00:20,320 --> 00:00:22,642
You have 15 players.
Give one of them the ball,

3
00:00:22,693 --> 00:00:24,048
get it into the net.

4
00:00:24,099 --> 00:00:25,708
- Very simple.
- Isn't it?

5
00:00:26,052 --> 00:00:27,192
Oh, what?

6
00:00:27,415 --> 00:00:28,535
Brilliant.

7
00:00:28,833 --> 00:00:31,520
How's it going, Rachel?
Talking tactics, there, for the game.

8
00:00:31,571 --> 00:00:33,200
We're getting a master class.

9
00:00:33,598 --> 00:00:35,965
- How incredibly boring of you.
- Yeah.

10
00:00:36,601 --> 00:00:38,570
- Did you get your hair done?
- I did, yeah.

```

The complete generated subtitles can be found in this [gist](https://gist.github.com/kracekumar/efe9da9ea0d13e42b10f1fc7eaad5c50).

### Comparison with original subtitles

The LLM produced close-to-perfect subtitles in terms of text and is highly useful, but with certain annoying behaviours:

1. **Text appears before characters start to speak**: The first generated text appears as soon as the video starts, whereas in the original file it starts at the 18th second. When there is a long pause in the video, the next dialogue appears immediately.
2. **Subtitle Length**: The first subtitle, `It's a simple game. You have 15 players. Give one of them the ball. Get it into the net.`, is long and represents 6 seconds of dialogue. Splitting these into multiple sequences would be useful, especially for movie subtitles, though it may not matter for speech-to-text (I couldn't find any CLI options for this; see the sketch after this list).
3. **Inconsistent Punctuation**: While some generated text includes proper punctuation, other sections lack it. The CLI offers `--append_punctuations` and `--prepend_punctuations` to address this, but I haven't tried them.
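
Since the CLI doesn't seem to offer splitting, here's a rough post-processing sketch of the idea: a hypothetical helper (not part of whisper) that splits an overlong entry on sentence boundaries and divides its time range proportionally to text length:

```python
import re

def split_subtitle(start_ms: int, end_ms: int, text: str, max_len: int = 45):
    """Split one overlong subtitle into smaller pieces on sentence
    boundaries, allocating the time range proportionally to text length."""
    sentences = [s for s in re.split(r"(?<=[.?!])\s+", text) if s]

    # Greedily merge sentences into pieces of at most max_len characters.
    pieces: list[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_len:
            pieces.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        pieces.append(current)

    # Distribute the duration proportionally to each piece's length.
    total_chars = sum(len(piece) for piece in pieces)
    duration = end_ms - start_ms
    result, cursor = [], start_ms
    for piece in pieces:
        share = round(duration * len(piece) / total_chars)
        result.append((cursor, cursor + share, piece))
        cursor += share
    return result

# The 24-second opening line from the generated file splits into three pieces.
print(split_subtitle(0, 24000, "It's a simple game. You have 15 players. "
                               "Give one of them the ball. Get it into the net."))
```
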
### Script to bulk convert


```python
import os
import subprocess
import argparse

def extract_audio(input_dir, output_dir):
    """
    Extracts audio from video files in the input directory and saves them to the output directory.

    Args:
        input_dir: Path to the directory containing video files.
        output_dir: Path to the directory where audio files will be saved.
    """
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.mp4', '.avi', '.mov')):  # Add more video extensions if needed
            input_path = os.path.join(input_dir, filename)
            output_path = os.path.join(output_dir, os.path.splitext(filename)[0] + '.aac')
            command = f"ffmpeg -i {input_path} -vn -acodec copy {output_path}"
            # Add logging to track
            print(f"Running the command: {command}")
            subprocess.run(command, shell=True)

def generate_subtitles(input_dir, output_dir):
    """
    Generates subtitles for audio files using the Whisper LLM model.

    Args:
        input_dir: Path to the directory containing audio files.
        output_dir: Path to the directory where subtitle files will be saved.
    """
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)

    for filename in os.listdir(input_dir):
        if filename.endswith(('.aac', '.wav')):
            input_path = os.path.join(input_dir, filename)
            command = f"whisper {input_path} --model turbo -f srt --output_dir {output_dir}"
            # Adjust model size ('tiny', 'base', 'small', 'medium', 'large') as needed
            
            print(f"Running the command: {command}")
            subprocess.run(command, shell=True)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Extract audio and generate subtitles.")
    parser.add_argument("input_dir", help="Path to the directory containing video files.")
    parser.add_argument("audio_dir", help="Path to the directory to save extracted audio files.")
    parser.add_argument("subtitle_dir", help="Path to the directory to save generated subtitles.")

    args = parser.parse_args()

    extract_audio(args.input_dir, args.audio_dir)
    generate_subtitles(args.audio_dir, args.subtitle_dir)
```


I asked Gemini to generate the code for this task; the script above is what the model produced. The prompt was basic, as follows:

```
Write a Python program that takes a command line arguments to do following tasks

1) Get a directory that contains video files and extracts the audio from the video and stores the audio in a separate directory using ffmpeg command. If the output directory is missing, new directory should be created.

2) Then audio file is passed on to the command of whisper llm model to produce the sub ttitles for the audio file. The output should be stored in a new directory
```

From the generated code, I modified two things:

1) The command line arguments to the ffmpeg and whisper commands.
2) Added a log line to print the current command to track progress.

### Chinese language

After successfully generating English subtitles, I was tempted to try non-English audio with the movie `In the Mood for Love`. The whisper model failed to translate the Chinese audio to English.

```bash
$uv run whisper /Users/kracekumar/Movies/In.the.Mood.for.Love/audio/In.the.Mood.for.Love.mp4 --model turbo -f srt --output_dir /Users/kracekumar/Movies/In.the.Mood.for.Love/generated_sub --language zh --task translate
[00:00.000 --> 00:00.180]
[00:30.000 --> 00:30.180]
[01:00.000 --> 01:00.160]  The frustrating is for thoseatks. It's beautiful and adorable and significant. It's adorable and typical. There's a blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank blank
[01:30.000 --> 01:30.180]
[02:00.000 --> 02:05.200] 謝謝你, 那我先走了
[02:05.260 --> 02:06.680] of息, 再見
[02:07.800 --> 02:11.360] 請問你們有嗎?
[02:11.520 --> 02:15.200] 對不起, 房間剛剛租給一位太太
[02:15.520 --> 02:16.520] 謝謝你
...
[01:28:35.420 --> 01:28:36.420] 謝謝
[01:28:36.420 --> 01:28:37.420]
Traceback (most recent call last):
  File "/Users/kracekumar/code/s2t/.venv/lib/python3.12/site-packages/whisper/transcribe.py", line 598, in cli
    writer(result, audio_path, **writer_args)
  File "/Users/kracekumar/code/s2t/.venv/lib/python3.12/site-packages/whisper/utils.py", line 101, in __call__
    self.write_result(result, file=f, options=options, **kwargs)
  File "/Users/kracekumar/code/s2t/.venv/lib/python3.12/site-packages/whisper/utils.py", line 257, in write_result
    for i, (start, end, text) in enumerate(
                                 ^^^^^^^^^^
  File "/Users/kracekumar/code/s2t/.venv/lib/python3.12/site-packages/whisper/utils.py", line 197, in iterate_result
    for subtitle in iterate_subtitles():
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/kracekumar/code/s2t/.venv/lib/python3.12/site-packages/whisper/utils.py", line 147, in iterate_subtitles
    last: float = get_start(result["segments"]) or 0.0
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kracekumar/code/s2t/.venv/lib/python3.12/site-packages/whisper/utils.py", line 72, in get_start
    return next(
           ^^^^^
  File "/Users/kracekumar/code/s2t/.venv/lib/python3.12/site-packages/whisper/utils.py", line 73, in <genexpr>
    (w["start"] for s in segments for w in s["words"]),
                                           ~^^^^^^^^^
KeyError: 'words'
```


### Conclusion

The whisper model was able to provide usable and close-to-accurate subtitles for the audio. There are a lot of rough edges, like producing long text without proper truncation, that hamper the experience. I'm pretty sure there are tweaks to get perfect subtitles with enough effort.

---
title: Open-webui in personal laptop
date: 2024-12-31T23:00:05+05:30
draft: false
tags:
  - AI
  - chatgpt
  - language_model
---


In 2024, Large Language Models (LLMs) and Generative AI (GenAI) exploded at an unimaginable rate. I didn't follow the trend. Currently, there is news every day about new models. The explosion of models has also reached a stage where local MacBooks can run a decent enough model. I wanted a local model with decent UI support, through the web or terminal, that provides a clean user interface.

I stumbled upon [open-webui](https://github.com/open-webui/open-webui). 

>Open WebUI is an [extensible](https://github.com/open-webui/pipelines), feature-rich, and user-friendly self-hosted WebUI designed to operate entirely offline. It supports various LLM runners, including Ollama and OpenAI-compatible APIs. For more information, be sure to check out our [Open WebUI Documentation](https://docs.openwebui.com/).

I have previously tried the [llm Python package](https://github.com/simonw/llm) to try out standalone models.

### Installing open-webui

The llm package was set up using `uv` and Python 3.12. Adding open-webui to the existing project failed because of a `ctranslate` version incompatibility, so I had to run the llm package and open-webui on Python 3.11. After installing open-webui, I expected it to pick up the llama model from the llm package installation in `~/Library/Application\ Support/io.datasette.llm/`. That didn't work, so I installed the ollama mac package with the [llama 3.2 model](https://ollama.com/library/llama3.2).

Then open-webui picked up the model (see the top left corner of the image) without any changes. I used a simple `uv run open-webui serve` to run open-webui on the local machine.

![Open WebUI  running on a laptop](/images/openwebui/openwebui.png)


I tried out a simple question: `When did new year became important global fesatival? Explain the key historical events.`

Here is the answer

![New year  answer - Part 1](/images/openwebui/newyear_part1.png)


![New year  answer - Part 2](/images/openwebui/newyear_part2.png)

The interface looks similar to ChatGPT and usable for long chat.

The voice-to-text translation was sub-par on the home page. I asked, `explain the beginning of new year and major historical events around it`. The transcription was outright wrong and treated new year as the Holi festival. It also skipped the first part of the voice message.


![Holi](/images/openwebui/holi.png)
---
title: "Chatgpt Generate Ruby Code to Check User Exists in Github"
date: 2022-12-04T21:30:15+00:00
draft: false
tags: ["chatgpt", "AI", "language_model"]
---

On Saturday night, I wanted to work on a side project. To pick a name for the project, I wanted to create a GitHub organization with the same name. I tried names one after the other, but all of them were taken, so I thought about writing a small script in Ruby. Then, out of nowhere, I decided to let ChatGPT write the code for me.

In this blog post, I'll share some of the code generated by ChatGPT for checking whether a profile name exists on GitHub, with increasing complexity.

[You can sign up for ChatGPT on the OpenAI site.](https://chat.openai.com/chat)

# Case 1: Simple case 

![Case 1](/images/chatgpt/Screenshot_2022-12-04_at_17.59.46.png)

**Prompt**

```
Can you write a ruby program to check a a list of profile names exists in github?
```

**Generated Code**

```ruby
require 'octokit'

# Replace with your access token
ACCESS_TOKEN = "ACCESS_TOKEN"

# Replace with the list of profile names
profiles = ["profile1", "profile2", "profile3"]

client = Octokit::Client.new(access_token: ACCESS_TOKEN)

profiles.each do |profile|
  user = client.user(profile)
  if user
    puts "#{profile} exists on GitHub"
  else
    puts "#{profile} does not exist on GitHub"
  end
end


```

The code looks legit and may run, but there are two issues:

- There is no instruction to install `octokit`.
- Not reading the secret from the environment variable is a bad sign.


Let's enhance the instruction by explicitly stating it.

**Prompt**

```
Can you write a ruby program to check a a list of profile names exists in github? 
If you use external gem, specify the installation instruction. Also read secrets from environment variable.
```

**Generated code**

```ruby
require 'octokit'

# Replace with the list of profile names
profiles = ["profile1", "profile2", "profile3"]

# Read access token from environment variable
access_token = ENV["ACCESS_TOKEN"]

client = Octokit::Client.new(access_token: access_token)

profiles.each do |profile|
  user = client.user(profile)
  if user
    puts "#{profile} exists on GitHub"
  else
    puts "#{profile} does not exist on GitHub"
  end
end
```

Along with the code, there is an installation line, `To install the octokit gem, run gem install octokit in your terminal.`

**Output**

```
$ ruby case_1.rb
To use retry middleware with Faraday v2.0+, install `faraday-retry` gem
profile1 exists on GitHub
profile2 exists on GitHub
profile3 exists on GitHub
```

The code actually works!

Now let's slightly modify the requirements to read list of profile names from a file.

# Case 2: Read list of profile names from a file, check the profiles and write to JSON file

**Prompt**

```
Can you write a ruby program to read a list of github profile from a text file, 
check each profile exists in github and write the result to a JSON file? 
If you use external gem, specify the installation instruction. Also read secrets from environment variable.
```

**Generated code**

```ruby
require 'octokit'
require 'json'

# Read profile names from text file
profiles = File.readlines("profiles.txt").map(&:chomp)

# Read access token from environment variable
access_token = ENV["ACCESS_TOKEN"]

client = Octokit::Client.new(access_token: access_token)

# Create a new hash to store the result
result = {}

profiles.each do |profile|
  user = client.user(profile)
  if user
    result[profile] = true
  else
    result[profile] = false
  end
end

# Write the result to a JSON file
File.open("result.json","w") do |f|
  f.write(result.to_json)
end

```

**Output**

```
$cat profiles.txt
kracekumar
zzzzzz
404
$ruby case_2.rb
To use retry middleware with Faraday v2.0+, install `faraday-retry` gem
Traceback (most recent call last):
        15: from case_2.rb:15:in `<main>'
        14: from case_2.rb:15:in `each'
        13: from case_2.rb:16:in `block in <main>'
        12: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/octokit-6.0.1/lib/octokit/client/users.rb:34:in `user'
        11: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/octokit-6.0.1/lib/octokit/connection.rb:19:in `get'
        10: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/octokit-6.0.1/lib/octokit/connection.rb:156:in `request'
         9: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/sawyer-0.9.2/lib/sawyer/agent.rb:99:in `call'
         8: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/faraday-2.7.1/lib/faraday/connection.rb:200:in `get'
         7: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/faraday-2.7.1/lib/faraday/connection.rb:445:in `run_request'
         6: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/faraday-2.7.1/lib/faraday/rack_builder.rb:153:in `build_response'
         5: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/octokit-6.0.1/lib/octokit/middleware/follow_redirects.rb:61:in `call'
         4: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/octokit-6.0.1/lib/octokit/middleware/follow_redirects.rb:73:in `perform_with_redirection'
         3: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/faraday-2.7.1/lib/faraday/middleware.rb:17:in `call'
         2: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/faraday-2.7.1/lib/faraday/response.rb:42:in `on_complete'
         1: from /Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/faraday-2.7.1/lib/faraday/middleware.rb:18:in `block in call'
/Users/krace/.rbenv/versions/2.7.2/lib/ruby/gems/2.7.0/gems/octokit-6.0.1/lib/octokit/response/raise_error.rb:14:in `on_complete': GET https://api.github.com/users/404: 404 - Not Found // See: https://docs.github.com/rest/reference/users#get-a-user (Octokit::NotFound)
```

Aha, the profile `404` gets a `404` response and the program fails. Now let's enhance the prompt to handle it.

**Prompt**

```
Can you write a ruby program to read a list of github profile from a text file, 
check each profile exists in github and write the result to a JSON file? The code should handle exceptions. 

If you use external gem, specify the installation instruction. Also read secrets from environment variable.
```

**Generated code**

```ruby
require 'octokit'
require 'json'

# Read profile names from text file
profiles = File.readlines("profiles.txt").map(&:chomp)

# Read access token from environment variable
access_token = ENV["ACCESS_TOKEN"]

client = Octokit::Client.new(access_token: access_token)

# Create a new hash to store the result
result = {}

profiles.each do |profile|
  begin
    user = client.user(profile)
    if user
      result[profile] = true
    else
      result[profile] = false
    end
  rescue Octokit::NotFound
    result[profile] = false
  end
end

# Write the result to a JSON file
File.open("result.json","w") do |f|
  f.write(result.to_json)
end
```

Now the model generates the code with exception handling, and it handles the 404 case specifically, without the prompt explicitly naming the exception. Neat!

**Output**

```
$ruby case_2.rb
To use retry middleware with Faraday v2.0+, install `faraday-retry` gem
$cat result.json 
{"kracekumar":true,"zzzzzz":true,"404":false}
```

# Case 3: Read from the dictionary file, check each profile on GitHub, handle rate limits, and write the results to a file

**Prompt** 

```
Can you write a ruby program to read a list of github profile from a text file in the location /usr/share/dict/words, 
check each profile exists in github, check for rate limit and write the result to a JSON file? 
Print result of each user check in the terminal. The code should handle exceptions. 

If you use external gem, specify the installation instruction. Also read secrets from environment variable.
```

**Generated Code**

```ruby
require 'octokit'
require 'json'

# Read profile names from text file
profiles = File.readlines("/usr/share/dict/words").map(&:chomp)

# Read access token from environment variable
access_token = ENV["ACCESS_TOKEN"]

client = Octokit::Client.new(access_token: access_token)

# Create a new hash to store the result
result = {}

profiles.each do |profile|
  # Check for rate limit
  rate_limit = client.rate_limit
  if rate_limit.remaining == 0
    puts "Rate limit exceeded. Waiting for reset at #{rate_limit.reset_at}..."
    sleep rate_limit.resets_in
  end

  begin
    user = client.user(profile)
    if user
      result[profile] = true
      puts "#{profile} exists on GitHub"
    else
      result[profile] = false
      puts "#{profile} does not exist on GitHub"
    end
  rescue Octokit::NotFound
    result[profile] = false
    puts "#{profile} does not exist on GitHub"
  end
end

# Write the result to a JSON file
File.open("result.json","w") do |f|
  f.write(result.to_json)
end

```

**Output**

```
$ruby case_3.rb 
To use retry middleware with Faraday v2.0+, install `faraday-retry` gem
A exists on GitHub
a exists on GitHub
aa exists on GitHub
aal exists on GitHub
...
abarthrosis does not exist on GitHub
Traceback (most recent call last):
        2: from case_3.rb:15:in `<main>'
        1: from case_3.rb:15:in `each'
case_3.rb:19:in `block in <main>': undefined method `reset_at' for #<Octokit::RateLimit:0x0000000107ba8760> (NoMethodError)
Did you mean?  resets_at
               resets_at=
```

Oops, the code breaks at line 19 - `rate_limit.reset_at`.

Let's give feedback to ChatGPT, saying the code fails at line 19.

**Prompt**

```
the above code failed with the following error, undefined method `reset_at' for #<Octokit::RateLimit:0x000000010498f210> (NoMethodError)
Did you mean?  resets_at
               resets_at=

Can you fix it?
```

**Generated code**

```ruby
require 'octokit'
require 'json'

# Read profile names from text file
profiles = File.readlines("/usr/share/dict/words").map(&:chomp)

# Read access token from environment variable
access_token = ENV["ACCESS_TOKEN"]

client = Octokit::Client.new(access_token: access_token)

# Create a new hash to store the result
result = {}

profiles.each do |profile|
  # Check for rate limit
  rate_limit = client.rate_limit
  if rate_limit.remaining == 0
    puts "Rate limit exceeded. Waiting for reset at #{rate_limit.resets_at}..."
    sleep rate_limit.resets_in
  end

  begin
    user = client.user(profile)
    if user
      result[profile] = true
      puts "#{profile} exists on GitHub"
    else
      result[profile] = false
      puts "#{profile} does not exist on GitHub"
    end
  rescue Octokit::NotFound
    result[profile] = false
    puts "#{profile} does not exist on GitHub"
  end
end

# Write the result to a JSON file
File.open("result.json","w") do |f|
  f.write(result.to_json)
end

```

The model generated the above code with the message:
`The error occurred because the Octokit::RateLimit object does not have a method named reset_at. The correct method is resets_at.` Great!

**Output** 

```
$ruby case_3.rb 
To use retry middleware with Faraday v2.0+, install `faraday-retry` gem
Rate limit exceeded. Waiting for reset at 2022-12-04 19:46:29 +0000...
```

Now the bug is fixed and the code works as expected! That's not to say there are no issues. Of course, there are all sorts of exceptions that still need to be handled - socket timeouts, slow response times, 500s, etc. And saving the results to a file only after checking all the profiles is problematic.
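For illustration, here's a minimal sketch of the incremental-write idea (in Python rather than Ruby, with invented file and function names): append one record per profile as soon as it's checked, so a crash mid-run doesn't lose all progress.

```python
import json

# Hypothetical helper: append one JSON object per line (JSON Lines),
# flushing each result to disk as soon as the profile is checked.
def record_result(path: str, profile: str, exists: bool) -> None:
    with open(path, "a") as fp:
        fp.write(json.dumps({"profile": profile, "exists": exists}) + "\n")

record_result("results.jsonl", "kracekumar", True)
record_result("results.jsonl", "404", False)
```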

# Case 4: Let's tell the model to refactor the code into functions.

**Prompt**

```
Thanks, that fixed exception. Can you refactor the code into functions?
```

**Generated Code** 

```ruby
require 'octokit'
require 'json'

# Read profile names from text file
def read_profiles
  File.readlines("/usr/share/dict/words").map(&:chomp)
end

# Check if user exists on GitHub
def check_user(user)
  begin
    client.user(user)
    true
  rescue Octokit::NotFound
    false
  end
end

# Check for rate limit and wait if necessary
def check_rate_limit
  rate_limit = client.rate_limit
  if rate_limit.remaining == 0
    puts "Rate limit exceeded. Waiting for reset at #{rate_limit.resets_at}..."
    sleep rate_limit.resets_in
  end
end

# Read access token from environment variable
access_token = ENV["ACCESS_TOKEN"]

client = Octokit::Client.new(access_token: access_token)

# Create a new hash to store the result
result = {}

profiles = read_profiles

profiles.each do |profile|
  check_rate_limit

  result[profile] = check_user(profile)
  puts "#{profile} exists on GitHub" if result[profile]
  puts "#{profile} does not exist on GitHub" unless result[profile]
end

# Write the result to a JSON file
File.open("result.json","w") do |f|
  f.write(result.to_json)
end

```

The generated code fails at runtime with a `NameError`, since the `client` variable is undefined inside the functions (line 21).

**Prompt** 

```
I get the error when I run the above code, Traceback (most recent call last):
        3: from case_3.rb:38:in `<main>'
        2: from case_3.rb:38:in `each'
        1: from case_3.rb:39:in `block in <main>'
case_3.rb:21:in `check_rate_limit': undefined local variable or method `client' for main:Object (NameError)
```

**Generated Code** 

```ruby
require 'octokit'
require 'json'

# Read profile names from text file
def read_profiles
  File.readlines("/usr/share/dict/words").map(&:chomp)
end

# Check if user exists on GitHub
def check_user(client, user)
  begin
    client.user(user)
    true
  rescue Octokit::NotFound
    false
  end
end

# Check for rate limit and wait if necessary
def check_rate_limit(client)
  rate_limit = client.rate_limit
  if rate_limit.remaining == 0
    puts "Rate limit exceeded. Waiting for reset at #{rate_limit.resets_at}..."
    sleep rate_limit.resets_in
  end
end

# Read access token from environment variable
access_token = ENV["ACCESS_TOKEN"]

client = Octokit::Client.new(access_token: access_token)

# Create a new hash to store the result
result = {}

profiles = read_profiles

profiles.each do |profile|
  check_rate_limit(client)

  result[profile] = check_user(client, profile)
  puts "#{profile} exists on GitHub" if result[profile]
  puts "#{profile} does not exist on GitHub" unless result[profile]
end

# Write the result to a JSON file
File.open("result.json","w") do |f|
  f.write(result.to_json)
end

```

The fix message is better: `The error occurred because the check_rate_limit and check_user functions did not have access to the client object. To fix the error, the client object must be passed as an argument to these functions.`

**Output** 

```
$ruby case_3.rb 
To use retry middleware with Faraday v2.0+, install `faraday-retry` gem
Rate limit exceeded. Waiting for reset at 2022-12-04 19:46:29 +0000...
```

# Case 5: Write the test cases

**Prompt**

```
Can you write the test cases for the above code? 
```

**Generated Code**

```ruby
require 'minitest/autorun'
require 'octokit'
require 'json'

class TestGitHubProfileChecker < Minitest::Test
  def setup
    @access_token = ENV["ACCESS_TOKEN"]
    @client = Octokit::Client.new(access_token: @access_token)
  end

  def test_read_profiles
    expected_profiles = ["test1", "test2"]
    File.stub :readlines, expected_profiles do
      assert_equal expected_profiles, read_profiles
    end
  end

  def test_check_user_exists
    user = "octocat"
    assert check_user(@client, user)
  end

  def test_check_user_does_not_exist
    user = "not_a_real_user"
    refute check_user(@client, user)
  end

  def test_check_rate_limit
    rate_limit = @client.rate_limit
    rate_limit.stub :remaining, 0 do
      rate_limit.stub :resets_at, Time.now do
        rate_limit.stub :resets_in, 0 do
          assert_output "Rate limit exceeded. Waiting for reset at #{rate_limit.resets_at}...\n" do
            check_rate_limit(@client)
          end
        end
      end
    end
  end
end

```

After adding the previously generated code, one test case failed.

```
ruby test_case.rb 
To use retry middleware with Faraday v2.0+, install `faraday-retry` gem
Run options: --seed 34383

# Running:

..F.

Finished in 0.782971s, 5.1087 runs/s, 5.1087 assertions/s.

  1) Failure:
TestGitHubProfileChecker#test_check_rate_limit [test_case.rb:83]:
In stdout.
--- expected
+++ actual
@@ -1,2 +1 @@
-"Rate limit exceeded. Waiting for reset at 2022-12-04 20:24:05 +0000...
-"
+""


4 runs, 4 assertions, 1 failures, 0 errors, 0 skips
```

Looking at the code, the rate-limit call was stubbed but doesn't behave as expected. After updating the prompt a couple of times,
the model produced the same test code and was unable to fix it. After multiple prompts, the model stopped generating complete code.

**Final generated code** 

```ruby
require 'minitest/autorun'
require 'octokit'
require 'json'

class TestGitHubProfileChecker < Minitest::Test
  def setup
    @access_token = ENV["ACCESS_TOKEN"]
    @client = Octokit::Client.new(access_token: @access_token)
  end

  def test_read_profiles
    expected_profiles = ["test1", "test2"]
    File.stub :readlines, expected_profiles do
      assert_equal expected_profiles, read_profiles
    end
  end

  def test_check_user_exists
    user = "octocat"
    assert check_user(@client, user)
  end

  def test_check_user_does_not_exist
    user = "not_a_real_user"
    refute check_user(@client, user)
  end

  def test_check_rate_limit
    now = Time.now
    rate_limit = Octokit::RateLimit.new(remaining: 0, resets_at: now.to_i, resets_in: 0)
    @client.stub :rate_limit, rate_limit do
      assert_output "Rate limit exceeded. Waiting for reset at #{now}...\n" do
        check_rate_limit(@client)

```

**Final conversation**

```
Me: How many lines of code are you allowed to generate in a given time?
Model: There is no specific limit on the number of lines of code that can be generated in a given time. 
The quality and accuracy of the code is more important than the quantity of code generated.
Me: But your previous code generation was incomplete. can you tell me what happened?
Model: I apologize if my previous code generation was incomplete. There can be a number of reasons why this may have happened. 
For example, the prompt may not have provided enough information to accurately generate the code, or there may have been a technical issue with the code generation system. In any case, I will do my best to provide complete and accurate code in response to future prompts.
```

# Takeaway

- Given enough context and iteration, the model is able to correct simple errors.
- It takes many attempts to get something working, and the result is far from production-quality code (which could be a deliberate model safeguard).
- It's able to remember the previous context and improvise based on feedback.
- Once things get slightly complicated, the model fails to generate correct, complete code.

It was definitely fun to play around with the model, but getting it to produce foolproof code seems very far away.

- [Python 3.11 micro-benchmark](https://kracekumar.com/post/micro-benchmark-python-311/index.md): Python 3.11 micro-benchmark
---
title: "Python 3.11 micro-benchmark"
date: 2022-10-31T12:14:14Z
draft: false
---
[speed.python.org](https://speed.python.org/comparison/) tracks Python performance
improvements for several modules across Python versions.
In the real world, module-level speed improvements don't
directly translate to application performance improvements.
An application is composed of several hundred dependencies, and
the performance of one specific module doesn't improve total
application performance. Nonetheless, it can improve the performance of
parts of the API or certain flows.

When I first heard of the [faster CPython initiative](https://github.com/faster-cpython/ideas), I was intrigued to
find out how it translates to small-application performance across
versions, since a lot of critical components, like the PostgreSQL driver,
are already written in C. The faster CPython presentation clearly states that
the performance boost is only guaranteed for pure Python code and
not C extensions.

In this post, I'll share my benchmark results on a few hand-picked snippets.
Each snippet takes PyPI package data and does some transformation, network
operations, or file operations. How do these perform across different Python versions?

# Setup
- The benchmark was run on `Intel 11th Gen, i7 @ 2.30GHz` with 16 cores.
During the entire benchmark, no other user-initiated programs, like a browser or text editor, were running.

- The benchmark result was measured using `hyperfine` [command line tool](https://github.com/sharkdp/hyperfine) with `--warmup 1` flag and 10 runs for each version.

- No CPU pinning during benchmark.

- Python 3.9.13, Python 3.10.5, Python 3.11.0 versions were used for benchmarking.

- The median of 10 runs is used rather than the mean.

Here is the result of the benchmark.

``` bash
                           Python performance - 3.9 vs 3.10 vs 3.11
┏━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ Name                     ┃ Median 3.9 (s) ┃ Median 3.10 (s) ┃ Median 3.11 (s) ┃ 3.11 Change ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ pypicache                │ 7.4096         │ 7.2654          │ 6.9122          │ 6.71%       │
│ pypi_compression         │ 57.2634        │ 57.3878         │ 57.3969         │ -0.23%      │
│ pypi_postgres            │ 11.4657        │ 11.3525         │ 11.1345         │ 2.89%       │
│ pypi_sqlite_utils        │ 35.6113        │ 34.8789         │ 34.3522         │ 3.54%       │
│ pypi_write_file          │ 17.7075        │ 17.2318         │ 16.7363         │ 5.48%       │
│ pypi_write_file_parallel │ 12.7005        │ 13.0702         │ 12.5040         │ 4.33%       │
│ pypi_zstd_compression    │ 1.4794         │ 1.4687          │ 1.4643          │ 1.02%       │
└──────────────────────────┴────────────────┴─────────────────┴─────────────────┴─────────────┘
```
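The `3.11 Change` column is derived from the medians. As a minimal sketch, here is how such a comparison can be computed from hyperfine's JSON exports (assuming the `py_3_9.json`/`py_3_11.json` file names used by the runner script later in the post, and hyperfine's `results[0]["times"]` layout):

``` python
import json
import statistics

# Read a hyperfine --export-json file and return the median of the runs.
def median_runtime(path: str) -> float:
    with open(path) as fp:
        return statistics.median(json.load(fp)["results"][0]["times"])

py39 = median_runtime("py_3_9.json")
py311 = median_runtime("py_3_11.json")
# A positive change means 3.11 is faster, matching the table above.
print(f"3.11 change vs 3.9: {(py39 - py311) / py39 * 100:.2f}%")
```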

# Experiments

## PyPI Cache

``` python
import json
from operator import itemgetter
from urllib.parse import urlparse
from rich.console import Console
from rich.table import Table
from collections import defaultdict


def get_domain(addr):
    domain = urlparse(addr)
    return domain.netloc


def count_domains(data):
    result = defaultdict(int)
    for package in data:
        domain = get_domain(package['info']['home_page'])
        result[domain] += 1

    return result


def count_licenses(data):
    result = defaultdict(int)
    for package in data:
        classifiers = package['info'].get('classifiers', '')
        license = ''
        if classifiers:
           for classifier in classifiers:
               if 'License' in classifier:
                  license = classifier.split('::')[-1].strip()
                  result[license] += 1
    return result


def get_top(data, n=10):
    return sorted(data.items(), key=itemgetter(1), reverse=True)[:n]


def main():
    data = json.load(open('../pypicache/pypicache.json'))
    domains = count_domains(data)
    top_domains = get_top(domains, 10)
    licenses = count_licenses(data)
    top_licenses = get_top(licenses, 10)

    # Print result in a table
    table = Table(title="Project domains")
    table.add_column("Domain")
    table.add_column("Count")
    table.add_column("Percentage")

    for domain, count in top_domains:
        table.add_row(str(domain), str(count), str(count/len(data) * 100))

    console = Console()
    console.print(table)

    table = Table(title="Project licenses")
    table.add_column("License")
    table.add_column("Count")
    table.add_column("Percentage")


    for license, count in top_licenses:
        table.add_row(str(license), str(count), str(count/len(data) * 100))

    console.print(table)


if __name__ == "__main__":
   main()

```

The snippet loads the PyPI JSON file (720 MB) downloaded from [https://pypicache.repology.org/pypicache.json.zst](https://pypicache.repology.org/pypicache.json.zst) and performs IO operations. Extract the zstd file and place it in the pypicache directory.

The snippet performs five activities:
- Find home page domain frequencies across various packages.
- Get top 10 home page domains.
- Find license frequencies of the packages.
- Get top 10 used licenses.
- Print the results in the table.

**Python 3.11 is faster than Python 3.9 by `6.71%`. The median execution times: `Python 3.9 - 7.40s, Python 3.10 - 7.26s, Python 3.11 - 6.91s`.**

## PyPI Compression

``` python
import bz2
import json
import pathlib


def main():
    with open('../pypicache/pypicache.json', 'rb') as fp:
        filename = '../pypicache/pypicache.json.bz2'
        with bz2.open(filename, 'wb') as wfp:
            wfp.write(bz2.compress(fp.read()))

        pathlib.Path(filename).unlink()


if __name__ == "__main__":
    main()

```

The snippet compresses the decompressed PyPI JSON data to `bz2` format
and deletes the compressed file.

**Python 3.11 was the slowest; performance degraded by `0.23%` compared to 3.9.
The median execution times: `Python 3.9 - 57.26 s, Python 3.10 - 57.38 s, Python 3.11 - 57.39 s`.**

The interesting part is that Python 3.9 is faster than Python 3.10, and Python 3.10 is faster than 3.11. (I have no hunch why that is.)

## PyPI Postgres

``` python
import json
import psycopg2


def main():
    data = json.load(open('../pypicache/pypicache.json'))
    conn = psycopg2.connect("dbname=scratch user=admin password=admin")
    cur = conn.cursor()
    stop = 100000
    for idx, package in enumerate(data[:stop]):
        info = package['info']
        cur.execute("""insert into
        pypi(author, author_email, bugtrack_url, license, maintainer, maintainer_email, name, summary, version)
        values(%s, %s, %s, %s, %s, %s, %s, %s, %s)""",
                    (info['author'], info['author_email'],
                     info['bugtrack_url'], info['license'], info['maintainer'],
                     info['maintainer_email'], info['name'], info['summary'],
                     info['version']))
        if idx % 100 == 1 or idx == stop:
            conn.commit()
    conn.commit()
    cur.execute('select count(*) from pypi')
    print("Total rows: ", cur.fetchone())
    cur.execute('delete from pypi')
    conn.commit()


if __name__ == "__main__":
    main()
```

The snippet uses psycopg2 to insert a hundred thousand packages into a Postgres database.

- Insert the PyPI package details and commit 100 records at a time to the `pypi` table.
- Find the total number of inserted records in the `pypi` table.
- Delete all the rows from the `pypi` table.

**Python 3.11 is faster than Python 3.9 by `2.89%`. The median execution times: `Python 3.9 - 11.46s, Python 3.10 - 11.35s, Python 3.11 - 11.13s`.**

Since most of the time goes into network calls, it's surprising to see even a small performance improvement in Python 3.11.

## PyPI SQLite Utils

``` python
import json
from sqlite_utils.db import Database, Table
from pathlib import Path

def main():
    data = json.load(open('../pypicache/pypicache.json'))
    db_name = 'pypi.db'
    db = Database(db_name)
    table = Table(db, 'pypi')

    for idx in range(1000):
        table.insert_all(data[idx * 100:idx * 100 + 100])


    print("Rows: ", table.count)
    Path(db_name).unlink()

if __name__ == "__main__":
   main()

```

The snippet inserts a hundred thousand PyPI package records into a SQLite database over a thousand iterations (100 rows per batch) using the `sqlite_utils` package, and deletes the SQLite file.

**Python 3.11 is faster than Python 3.9 by 3.54%. The median execution times: `Python 3.9 - 35.61s, Python 3.10 - 34.87s, Python 3.11 - 34.35s`.**

## PyPI Write To File

``` python
import json
from pathlib import Path


def write_to_file(directory, package):
    name = package['info']['name']

    with open(directory / (name + ".json"), "w") as fp:
        fp.write(json.dumps(package))


def delete_files(directory):
    for filename in list(directory.iterdir()):
        filename.unlink()

    directory.rmdir()


def main():
    data = json.load(open('../pypicache/pypicache.json'))
    directory = Path("/tmp/pypi")
    directory.mkdir()
    for package in data:
        write_to_file(directory=directory, package=package)
    delete_files(directory)


if __name__ == "__main__":
   main()

```

The snippet writes each PyPI package's info to a separate JSON file and deletes all the files.

**Python 3.11 is faster than Python 3.9 by 5.48%. The median execution times: `Python 3.9 - 17.70 s, Python 3.10 - 17.23 s, Python 3.11 - 16.73 s`.**

## PyPI Parallel Write To File

``` python
import json
from pathlib import Path
from multiprocessing import Pool
from functools import partial


def write_to_file(directory, package):
    name = package['info']['name']

    with open(directory / (name + ".json"), "w") as fp:
        fp.write(json.dumps(package))


def delete_files(directory):
    for filename in list(directory.iterdir()):
        filename.unlink()

    directory.rmdir()


def main():
    data = json.load(open('../pypicache/pypicache.json'))
    directory = Path("/tmp/pypi")
    directory.mkdir()
    with Pool(33) as p:
        p.map(partial(write_to_file, directory), data)
    delete_files(directory)


if __name__ == "__main__":
   main()

```

The snippet uses Python multiprocessing to write each PyPI package to a separate JSON file and then deletes all the JSON files serially, with 33 workers in the pool.

**Python 3.11 is faster than Python 3.9 by 4.33%. The median execution times: `Python 3.9 - 12.70 s, Python 3.10 - 13.07 s, Python 3.11 - 12.50 s`.**

## PyPI zstd Compression

``` python
import zstandard as zstd
import json
import pathlib


def main():
    with open('../pypicache/pypicache.json', 'rb') as fp:
        filename = '../pypicache/pypicache_benchmark.json.zstd'
        with zstd.open(filename, 'wb') as wfp:
            wfp.write(zstd.compress(fp.read()))

        pathlib.Path(filename).unlink()


if __name__ == "__main__":
    main()

```

The snippet compresses the PyPI JSON file to the zstd format using the `zstandard` library.

**Python 3.11 is faster than Python 3.9 by 1.02%. In general, it's safe to say there is no useful performance improvement here. The median execution times: `Python 3.9 - 1.47s, Python 3.10 - 1.46 s, Python 3.11 - 1.46s`.**

**Compared to `bz2` compression, zstd is roughly 40 times faster here (57.4 s vs 1.46 s).**

## Benchmark runner

The benchmark runner is similar for all experiments.


``` fish
#!/usr/bin/env fish
ls -lat | grep venv | xargs rm -rf
echo "Running 3.9 benchmark"
python3.9 -m venv .venv_3_9
source .venv_3_9/bin/activate.fish
pip install -r requirements.txt
hyperfine --warmup 1 'python run_benchmark.py' --export-json py_3_9.json
echo "Running 3.10 benchmark"
python3.10 -m venv .venv_3_10
source .venv_3_10/bin/activate.fish
pip install -r requirements.txt
hyperfine --warmup 1 'python run_benchmark.py' --export-json py_3_10.json
echo "Running 3.11 benchmark"
python3.11 -m venv .venv_3_11
source .venv_3_11/bin/activate.fish
pip install -r requirements.txt
hyperfine --warmup 1 'python run_benchmark.py' --export-json py_3_11.json

```
- `python3.9`, `python3.10`, and `python3.11` are aliases to pyenv-managed Python versions.

# Conclusion

- If the code uses the standard library `json` module, switching to Python 3.11 will provide a significant performance improvement.
- If you're using `bz2` for compression, consider using `zstd`, which was roughly 40 times faster here. (I wasn't aware of the zstd format until I downloaded the PyPI cache data.)
- If you're using Python 3.8 or 3.9, it's better to upgrade to Python 3.11 rather than Python 3.10.
- Even when the code leans entirely on a C extension like psycopg2, there is a performance improvement.
So it's good to benchmark against 3.11 once all the dependencies run on Python 3.11.
- Since all the programs load JSON data, the performance gain may largely be due to JSON improvements alone.

- [Bazel Build System Introduction for Java](https://kracekumar.com/post/bazel-build-system-intro-for-java/index.md): Bazel Build System Introduction for Java
---
title: "Bazel Build System Introduction for Java"
date: 2022-10-23T16:52:00+01:00
draft: false
tags: ["java", "bazel", "tutorial"]
---
> You can find the source code of this tutorial in the [bazel-101 branch](https://github.com/kracekumar/jplay/tree/bazel-101).

### What will you learn?

- Introduction to bazel build system
- How to build and run Java package?
- How to add maven dependency to bazel build files?
- How to add protobuf compiler to bazel build?

### Introduction

[Bazel](https://bazel.build/) is an imperative build system that can build packages for Java, C++, Python, Ruby, Go, etc.
The two main advantages of Bazel:

1. One build tool can build packages for a variety of languages, which makes it easier for
platform teams to support builds across all of them.
Consider the alternative: learning many different build systems - pip, Bundler, Maven, etc.
2. Bazel can cache already-built packages in a remote or local environment and
reuse them without recompiling, be it for a binary, a library, or tests.

The main difference between Bazel and other build/dependency management systems
is the imperative vs. declarative approach.

Consider a Java package `sample` with the following structure, containing one file, `Sample.java`:

``` shell
$ls_custom

.
BUILD
src/main/java/com/example/Sample.java
WORKSPACE

```

Every project contains one `WORKSPACE` [file](https://bazel.build/reference/glossary#workspace) and one or many `BUILD` [files](https://bazel.build/reference/glossary#build-file) -
one `BUILD` file per package. In the example project, there is a single `BUILD` file at the workspace root. From the Bazel glossary:
> WORKSPACE: A directory containing a WORKSPACE file and source code for the software you want to build. Labels that start with // are relative to the workspace directory.

> WORKSPACE FILE: Defines a directory to be a workspace. The file can be empty, although it usually contains external repository declarations to fetch additional dependencies from the network or local filesystem.

> A BUILD file is the main configuration file that tells Bazel what software outputs to build, what their dependencies are, and how to build them. Bazel takes a BUILD file as input and uses the file to create a graph of dependencies and to derive the actions that must be completed to build intermediate and final software outputs. A BUILD file marks a directory and any sub-directories not containing a BUILD file as a package, and can contain targets created by rules. The file can also be named BUILD.bazel.

The `Sample.java` file looks like

``` java
package com.example;

public class Sample{
    public static void main(String[] args) {
        String label = "Krace";
        System.out.println(String.format("Hello: %s", label));

    }
}
```

### Build the target

Now let's build the Java binary and execute it. The `Sample.java` file has no external dependencies.

Assuming Bazel is installed, let's write the imperative code to build the binary package (the `BUILD` file).

``` shell
java_binary(
    name = "Sample",
    srcs = glob(["src/main/java/com/example/*.java"]),
)

```

In the `BUILD` file, declare a Java binary by invoking the `java_binary` function, name the package `Sample`,
and specify the source files as `srcs=glob(["src/main/java/com/example/*.java"])`.
All the Java files inside the `src/main/java/com/example` directory are part of the `Sample` package.

Now build the package using `bazel build <target>` syntax.

``` shell
$bazel build Sample
INFO: Analyzed target //:Sample (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:Sample up-to-date:
  bazel-bin/Sample.jar
  bazel-bin/Sample
INFO: Elapsed time: 0.046s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action

```

The command built the package without any errors, and the target was generated in the `bazel-bin` directory.

### Run the target

Now run the target using `bazel run <target>` or `./bazel-bin/Sample`

``` shell
 ./bazel-bin/Sample
Hello: Krace

```

When you invoke `bazel run <target>`, Bazel builds the package and executes it (using the cache if there is no change).

``` shell
$ bazel run Sample
INFO: Analyzed target //:Sample (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:Sample up-to-date:
  bazel-bin/Sample.jar
  bazel-bin/Sample
INFO: Elapsed time: 0.046s, Critical Path: 0.00s
INFO: 1 process: 1 internal.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
Hello: Krace

```

Similar to `java_binary`, other notable functions are `java_library` and `java_test`.

### Add a dependency from the maven repository

Bazel has rules and definitions for downloading and building packages that are distributed
through repositories like [Maven](https://bazel.build/migrate/maven) or as zip files.

To download a file from a Maven repository, Bazel needs some information about the repository and its structure.

In the `WORKSPACE` file, you declare the Maven Bazel rules and which packages are required.

Let's add `okhttp3` from Maven as a dependency to the `Sample` project.


### Update WORKSPACE file

``` shell
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

RULES_JVM_EXTERNAL_TAG = "4.2"

RULES_JVM_EXTERNAL_SHA = "cd1a77b7b02e8e008439ca76fd34f5b07aecb8c752961f9640dea15e9e5ba1ca"

http_archive(
    name = "rules_jvm_external",
    sha256 = RULES_JVM_EXTERNAL_SHA,
    strip_prefix = "rules_jvm_external-%s" % RULES_JVM_EXTERNAL_TAG,
    url = "https://github.com/bazelbuild/rules_jvm_external/archive/%s.zip" % RULES_JVM_EXTERNAL_TAG,
)

load("@rules_jvm_external//:repositories.bzl", "rules_jvm_external_deps")

rules_jvm_external_deps()

load("@rules_jvm_external//:setup.bzl", "rules_jvm_external_setup")

rules_jvm_external_setup()

load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    artifacts = [
        # https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp
        "com.squareup.okhttp3:okhttp:jar:4.10.0",
    ],
    repositories = [
        "https://repo1.maven.org/maven2",
    ],
)

```
That's a lot of copy-paste code!

1. First, the workspace loads the `http_archive` function from the `bazel_tools` repo.
2. It then loads and runs the `rules_jvm_external` dependency and setup functions.
3. Then the workspace loads the `maven_install` function.
4. `maven_install` specifies the dependencies and the repository to install them from.


### Update the BUILD file
Now that the workspace knows what to load for the project, let's update the `BUILD` file.

Most of the heavy lifting happens in the `WORKSPACE` file.
In the `BUILD` file, mention the dependency to load using the `deps` parameter of the `java_binary` function.

``` python
java_binary(
    name = "Sample",
    srcs = glob(["src/main/java/com/example/*.java"]),
    deps = [
        "@maven//:com_squareup_okhttp3_okhttp",
    ],
)
```

`@maven` indicates the dependency comes from the Maven install. In the dependency name, the `.` and `:` in the artifact coordinates become `_`.
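As a purely illustrative sketch (this is not how `rules_jvm_external` is implemented), the mapping from Maven coordinates to Bazel labels looks like this:

``` python
import re

# Map a "group:artifact" Maven coordinate to the @maven//: label form,
# replacing non-alphanumeric characters with underscores.
def maven_label(coordinate: str) -> str:
    return "@maven//:" + re.sub(r"[^A-Za-z0-9]", "_", coordinate)

print(maven_label("com.squareup.okhttp3:okhttp"))
# @maven//:com_squareup_okhttp3_okhttp
```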

``` shell
bazel build Sample
INFO: Analyzed target //:Sample (41 packages loaded, 753 targets configured).
INFO: Found 1 target...
Target //:Sample up-to-date:
  bazel-bin/Sample.jar
  bazel-bin/Sample
INFO: Elapsed time: 5.997s, Critical Path: 2.27s
INFO: 20 processes: 7 internal, 10 linux-sandbox, 3 worker.
INFO: Build completed successfully, 20 total actions

```

### Add protobuf as dependency

Similar to the Maven rules, it's possible to download any dependency from the internet and add it as a dependency.

[Protobuf](https://developers.google.com/protocol-buffers/) is a binary data serialization format
for communication between services. Since it's a binary format,
the protocol buffer compiler generates Java classes to encode and decode messages.

Let's add a protobuf definition to the project and use it.

Create a new directory `protos` in the example directory and add `label.proto`.

It's a simple `Label` message with a list of names. The extra `option` lines configure the Java class generation from the proto definition.

``` proto
syntax = "proto3";
package example;

option java_multiple_files = true;
option java_package = "com.example.protos";
option java_outer_classname = "LabelProtos";

message Label {
  repeated string names = 1;
}

```

Add the [following lines](https://github.com/cgrushko/proto_library) to the `WORKSPACE` file:

``` python
# proto
# rules_cc defines rules for generating C++ code from Protocol Buffers.
http_archive(
    name = "rules_cc",
    sha256 = "35f2fb4ea0b3e61ad64a369de284e4fbbdcdba71836a5555abb5e194cf119509",
    strip_prefix = "rules_cc-624b5d59dfb45672d4239422fa1e3de1822ee110",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_cc/archive/624b5d59dfb45672d4239422fa1e3de1822ee110.tar.gz",
        "https://github.com/bazelbuild/rules_cc/archive/624b5d59dfb45672d4239422fa1e3de1822ee110.tar.gz",
    ],
)

http_archive(
    name = "rules_java",
    sha256 = "ccf00372878d141f7d5568cedc4c42ad4811ba367ea3e26bc7c43445bbc52895",
    strip_prefix = "rules_java-d7bf804c8731edd232cb061cb2a9fe003a85d8ee",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_java/archive/d7bf804c8731edd232cb061cb2a9fe003a85d8ee.tar.gz",
        "https://github.com/bazelbuild/rules_java/archive/d7bf804c8731edd232cb061cb2a9fe003a85d8ee.tar.gz",
    ],
)

# rules_proto defines abstract rules for building Protocol Buffers.
http_archive(
    name = "rules_proto",
    sha256 = "2490dca4f249b8a9a3ab07bd1ba6eca085aaf8e45a734af92aad0c42d9dc7aaf",
    strip_prefix = "rules_proto-218ffa7dfa5408492dc86c01ee637614f8695c45",
    urls = [
        "https://mirror.bazel.build/github.com/bazelbuild/rules_proto/archive/218ffa7dfa5408492dc86c01ee637614f8695c45.tar.gz",
        "https://github.com/bazelbuild/rules_proto/archive/218ffa7dfa5408492dc86c01ee637614f8695c45.tar.gz",
    ],
)

load("@rules_cc//cc:repositories.bzl", "rules_cc_dependencies")

rules_cc_dependencies()

load("@rules_java//java:repositories.bzl", "rules_java_dependencies", "rules_java_toolchains")

rules_java_dependencies()

rules_java_toolchains()

load("@rules_proto//proto:repositories.bzl", "rules_proto_dependencies", "rules_proto_toolchains")

rules_proto_dependencies()

rules_proto_toolchains()

```

So many rules and so much setup for proto conversion and Java-specific instructions!

### protobuf BUILD instructions and example

Now add the proto build instructions to the `BUILD` file.

1. Load the Bazel definitions for the proto library and the Java proto library:

``` python

load("@rules_proto//proto:defs.bzl", "proto_library")
load("@rules_java//java:defs.bzl", "java_proto_library")

```

2. Convert the proto definition and generate the Java code:

``` python

proto_library(
    name = "label_proto",
    srcs = ["src/main/java/com/example/protos/label.proto"],
)

java_proto_library(
    name = "label_java_proto",
    deps = [":label_proto"],
)

```

3. Update the `deps` in the `java_binary` function call to include the generated Java code:

``` python
deps = [
        ":label_java_proto",
        "@maven//:com_squareup_okhttp3_okhttp",
    ],
```

4. Modify the `Sample.java` code to use the generated Java class:

``` java
package com.example;
import com.example.protos.Label;
import java.util.ArrayList;

public class Sample{
    public static void main(String[] args) {
        ArrayList<String> names = new ArrayList<String>();
        names.add("Adult!");
        names.add("Programmer");

        Label.Builder builder = Label.newBuilder();
        builder.addAllNames(names);
        Label label = builder.build();
        System.out.println(String.format("Hello: %s", label));

    }
}
```
5. Run the target.

``` shell
bazel run Sample
INFO: Analyzed target //:Sample (0 packages loaded, 0 targets configured).
INFO: Found 1 target...
Target //:Sample up-to-date:
  bazel-bin/Sample.jar
  bazel-bin/Sample
INFO: Elapsed time: 0.501s, Critical Path: 0.45s
INFO: 5 processes: 1 internal, 2 linux-sandbox, 2 worker.
INFO: Build completed successfully, 5 total actions
INFO: Build completed successfully, 5 total actions
Hello: names: "Adult!"
names: "Programmer"


```

### Common beginner mistakes

1. Using the wrong function in the `BUILD` and `WORKSPACE` files.
2. Not loading the relevant functions or rules via `load`.
3. Missing a dependency in `deps`.


### Conclusion

There are more important concepts, like `visibility` and local dependencies, that are skipped here.
Another tutorial for another day.

Bazel is definitely a confusing yet powerful build system that can make you hate building packages.
In my opinion, learning Bazel is like learning a new programming language, with a steep learning curve.


### References

- [Bazel build for java](https://bazel.build/start/java)
- [Bazel http_archive](https://bazel.build/rules/lib/repo/http#http_archive)
- [Bazel Maven integration](https://bazel.build/migrate/maven)
- [OkHTTP Maven repository](https://mvnrepository.com/artifact/com.squareup.okhttp3/okhttp/4.10.0)
- [Bazel JVM rules](https://github.com/bazelbuild/rules_jvm_external)
- [Protobuf in Bazel](https://blog.bazel.build/2017/02/27/protocol-buffers.html)
- [Bazel Proto library](https://github.com/cgrushko/proto_library)
- [Bazel proto rules](https://github.com/bazelbuild/rules_proto/releases)
- [Repo with source code](https://github.com/kracekumar/jplay/tree/bazel-101)
- [Proto buf](https://developers.google.com/protocol-buffers/)

- [Notes from Tail Latency Aware Caching Paper by RobinHood](https://kracekumar.com/post/tail_latency_aware_caching/index.md): Notes from Tail Latency Aware Caching Paper by RobinHood
---
title: "Notes from Tail Latency Aware Caching Paper by RobinHood"
date: 2022-09-24T04:30:00+01:00
draft: false
tags: ["paper", "notes"]
---
**The problem**

![Application](/images/tail_latency/introduction_figure_1.png)

When web service latency increases, the first suggested technique is caching.
A cache is a good solution when your system is read-heavy.

The common technique is to cache the most frequently used objects.
The method generally reduces latency, but doesn’t help much with tail latency (P99).
The paper “RobinHood: Tail Latency Aware Caching - Dynamic Reallocation from Cache-Rich to Cache-Poor” proposes a novel solution for maintaining low request tail latency.

![Latency of backends](/images/tail_latency/introduction_latency_backend_figure_2.png)

In a multi-tier architecture, each service gets a cache resource, and then,
depending on the need to optimize for latency or throughput, the cache resources
can be increased or decreased. The main drawback of this method is that the policy
determining the cache resources is static. Static policies are mostly computed offline
and based on estimates that tend to serve 90% of users, not the tail end of users.

**What’s the proposal?**

![Latency of backends](/images/tail_latency/proposal_figure_5.png)

The paper suggests a novel approach of creating the policies dynamically, aimed
at solving tail latency. New policies are made on the fly at a fixed
time interval, based on each service's contribution to tail latency in the previous cycle.
If `service A` contributes 10% to tail latency, the service gets X% of the cache resources; in the next cycle, if `service A`
contributes 5% to tail latency, service A may occupy Y% of the caching resources.

The key idea behind RobinHood is to identify backend queries that are
responsible for high P99 request latency, which they call “cache-poor” backends.
RobinHood then shifts cache resources from the other “cache-rich” backends to the cache-poor backends.


**What’s new in the proposal?**

The design of the system focuses on solving tail latency (p99).
Rather than considering cache resource policy as static policy, the cache policies are dynamic adjusted based on statistics.


**Challenges**

- The latency of each system varies over time.
- Latency is not correlated with specific queries nor with query rate.
- Latency depends on request structure, which varies greatly.

**How does it work?**

![Working](/images/tail_latency/working_figure_6.png)

RobinHood collects the response time of each request and filters for the requests whose response times fall between `P98.5 and P99.5`.
Next, the system collects each backend's query response times for the request IDs in that tail bucket.
Then the system counts each backend's contribution to tail latency.
This metric is called RBC (Request Blocking Count). The backends with the higher RBC values are the cache-poor ones.
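To make the RBC idea concrete, here is a toy sketch (the data layout and names are invented; this is not the paper's code). Each request's latency is taken to be its slowest query, and each tail request charges one blocking count to the backend that served that slowest query:

```python
from collections import Counter

# request ID -> list of (backend, query latency in ms)
requests = {
    "r1": [("user-db", 120), ("search", 40)],
    "r2": [("user-db", 30), ("search", 160)],
    "r3": [("ads", 200), ("search", 20)],
}

# Sort requests by latency and keep the tail bucket between P98.5 and
# P99.5 (with only three requests this degenerates to the slowest one).
ordered = sorted((max(lat for _, lat in qs), rid) for rid, qs in requests.items())
lo = int(len(ordered) * 0.985)
hi = max(int(len(ordered) * 0.995), lo + 1)
tail_ids = [rid for _, rid in ordered[lo:hi]]

# RBC: charge each tail request to the backend with its slowest query.
rbc = Counter(max(requests[rid], key=lambda bq: bq[1])[0] for rid in tail_ids)
print(rbc.most_common())  # backends with the highest RBC are "cache-poor"
```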

Once the cache-poor backends are identified, the system can redistribute the available cache resources.
The cache policy is recalculated every 5 minutes.
The basic RobinHood algorithm assumes that redistributed cache space is filled
immediately by each backend's queries. In reality, some backends are slow to make use of
the additional cache space because their hit ratios are already high.

The statistics for each request are forwarded from the application servers to the RBC server.
The RBC server then generates a new policy every five minutes and updates the cache controllers.
Each application server has one controller, and the controller enforces the cache resource resizing.

**Implementation details**

- The RobinHood controller is a lightweight Python process.
- The RBC server and application servers are highly concurrent and implemented in Go.
- The caching layer is composed of off-the-shelf `memcached` instances, capable of
dynamic resizing via the memcached API. Each application server has a local cache with 32 GB cache capacity.
- On average, a request to the application server spawns 50 queries. A query is
first looked up in the local memcached instance; cache misses are then forwarded to the corresponding backend system.
- During the experiments, the average query rate of the system is 200,000 queries per second (over 500,000 peak).
- The experimental test bed consists of 16 application servers and 34 backend servers divided among 20 backend services.
These components are deployed across 50 Microsoft Azure D16 v3 VMs.

**Evaluation**

![SLOs](/images/tail_latency/eval_figure_8.png)

The empirical evaluation of RobinHood focuses on five key questions. Throughout this section, the goal is to meet a P99 request latency Service Level Objective (SLO) of 150ms.
RobinHood brings SLO violations down to 0.3%, compared to 30% SLO violations under the next best policy.
For quickly increasing backend load imbalances, RobinHood maintains SLO violations below 1.5%, compared to 38% SLO violations under the next best policy.
RobinHood maintains less than 5% SLO violations, while other policies do significantly worse.
The best clairvoyant static allocation requires 73% more cache space in order to provide each backend with its maximum allocation under RobinHood.
RobinHood introduces negligible overhead on network, CPU, and memory usage.

![Six caching system result](/images/tail_latency/eval_table_2.png)

The evaluation compares RobinHood against two existing production systems (OneRF, TAO++) and three research caching systems (Cliffhgr++, FAIR++, LAMA++).

**Result**

The RobinHood algorithm is capable of meeting a 150ms SLO for the OneRF workload even under challenging conditions where backends simultaneously become overloaded.
Many other systems, such as those at Facebook, Google, Amazon, and Wikipedia, use a similar multi-tier architecture where a request depends on many queries.
However, these other systems may have different optimization goals, more complex workloads, or slight variations in system architecture compared to OneRF.

**Conclusion**

RobinHood is also lightweight, scalable, and can be deployed on top of an off-the-shelf software stack.
The RobinHood caching system demonstrates how to effectively identify the root cause of P99 request latency in the presence of structured requests.

**Links:**
- Paper: https://www.usenix.org/system/files/osdi18-berger.pdf
- Summary from morning paper: https://blog.acolyer.org/2018/10/26/robinhood-tail-latency-aware-caching-dynamic-reallocation-from-cache-rich-to-cache-poor/
- Source Code: https://github.com/dasebe/robinhoodcache

- [Dia Duit Dublin, Bye Bengaluru](https://kracekumar.com/post/dia-duit-dublin-bye-bengaluru/index.md): Dia Duit Dublin, Bye Bengaluru
---
title: "Dia Duit Dublin, Bye Bengaluru"
date: 2022-08-05T12:13:15+00:00
draft: false
tags: ["Bengaluru", "Dublin", "Stripe", "Life"]
---

**Dia Duit Dublin, Bye Bengaluru**

```
TL;DR: After working for a decade and a year in Bengaluru, I decided to join Stripe in
Dublin, Ireland as a software engineer.

```

![Bangalore Collage](/images/dia-duit-dublin-bye-bengaluru/bangalore_collage.jpg)

When I was in my final year of college, I had to choose a job location preference between Chennai and Bengaluru. I chose Bengaluru for two reasons - startups and weather. After working for a decade and a year in Bengaluru across seven companies, I decided to leave the startup scene, the city, and the country.

I enjoyed my stay in Bengaluru. I spent half a decade building the Python community in Bengaluru (of course, with immense support from everyone) and felt every second of it was worth it. Furthermore, I made a lot of acquaintances and some friends. My world view changed every couple of years, and I have puzzled over my own beliefs and opinions over time.

I’m happy to call Bengaluru my home and happy being a houseplant. Before leaving the city, I wrote a [semi-satirical tweet](https://twitter.com/kracetheking/status/1546795418079883264?s=21&t=sVruBcsejUChvX0AVJ5A4g) on how to survive in Bangalore. It kind of went viral.

<blockquote class="twitter-tweet" data-dnt="true" data-theme="dark"><p lang="en" dir="ltr">1. 5 tips to survive in <a href="https://twitter.com/hashtag/bangalore?src=hash&amp;ref_src=twsrc%5Etfw">#bangalore</a> for the next decade<br><br>Be a house plant. Cycle/walk for small purchases. Order clothes online. Call up the next door liquor shop for home delivery. Don’t expect Uber or bangalore traffic to get sorted in next 5 to 10 years.</p>&mdash; kracekumar (@kracetheking) <a href="https://twitter.com/kracetheking/status/1546795418079883264?ref_src=twsrc%5Etfw">July 12, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

```
You measure the distance in time on the road
Like life is measured in memories.

You measure the best day by
the quietness of the day and
sipping hot tea
yet looking for your cup of tea.

There are roads and gates you can pass through
But you will never belong there.
There are hands you can never hold
But you can cross the shadow.

There are times when clock moved backwards
And day is dark in the broad daylight,
smokes when nothing was lit
yet you kept moving forwards
you’re tall and short at the same time.
```

<br/>
Now I’m here in Dublin. A small (don’t mistake me, I’m speaking purely in terms of population) island, 8,500 km by air from Bengaluru. It’s a different kind of summer here: the sun sets at 9:15 PM, and there’s a chilly wind all day along the canal! I never thought I’d see sunlight after 7 PM, but I’ve heard of the northern lights!

![Dublin Collage](/images/dia-duit-dublin-bye-bengaluru/dublin_collage.jpg)

The main reason for choosing Dublin is Stripe. I joined Stripe as a Software Engineer. I had never set foot in Ireland before, yet I made the choice. After working for a decade in startups, Stripe is my first big-company job.

It's been a week in the city, and I already had an interesting encounter. I spotted a calibration problem in one of the weighing machines in a fresh food store.
Customers can choose their combination from a wide range of items:
you fill a bowl with salad and weigh it at the pick-up table. At the billing counter, the assistant weighs
the bowl and bills by weight. I picked up 490 g of salad, and at the counter it weighed 518 g.
The weight was off by more than 5%. The store manager acknowledged it but never offered any recourse 🤷‍♂️
[Here is the Google Maps review about the incident](https://goo.gl/maps/GReUHrFVMUeCmMZPA).

It’s scary and anxiety-inducing to relocate to a country you’ve never visited or stayed in, especially during a turbulent economic downturn and a war between Russia and Ukraine, with layoffs in the air every day. I’m scared. Anxious. Excited. Happy.

The home search is a nightmare in Dublin. Recently, [r/Dublin](https://www.reddit.com/r/Dublin/comments/vd4tlh/new_subreddit_rrentingindublin/) came up with a separate subreddit dedicated to discussing the rental scene in Dublin! I'm told that to find a home, one needs to apply to at least 100 ads to get a single viewing request. I have applied to fifty ads on [daft.ie](https://daft.ie). The housing market is so hot that a listing created last night isn't visible the next day. You set up alerts on the site and try to reply within the next 5 minutes; otherwise, you're out of luck getting a reply or even finding the listing again. Fingers crossed.

The folks over here are nice. The city is walkable, clean, and cyclable. The crowds are nothing (tiny by Indian street standards) compared to any
big city in India.

Amid the uncertainty, only the weather app and Google Maps suggestions have held up all the time so far.

I don’t know a lot of things here. Haven’t even looked up the location of the nearest hospital or pharmacy.

**A song dedicated to Bengaluru**

<iframe width="560" height="315" src="https://www.youtube.com/embed/GLvohMXgcBo" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

**Dia Duit Dublin!**

- [Rafting Raft Workshop](https://kracekumar.com/post/rafting_raft/index.md): Rafting Raft Workshop
---
title: "Rafting Raft Workshop"
date: 2022-05-08T14:30:27+05:30
draft: false
Tags: ["Raft", "workshop"]
---

Last week, May 2-6, 2022, I attended the [Rafting Raft](http://www.dabeaz.com/raft.html) workshop by [David Beazley](http://www.dabeaz.com/index.html).
The workshop focused on building a Raft implementation, and it was intense and exhausting.

A few folks had asked me about the workshop and how it works. This post focuses on what happens before and during the workshop,
so future attendees can decide.

# Day -43

Someday when you're contemplating attending the workshop, you register on [the website](http://www.dabeaz.com/raft.html).
You get a confirmation email from David Beazley to confirm your availability to attend the workshop.

Email looks like

> Thanks for your interest in the upcoming Raft course.   I can confirm that space is available.  I simply need you to confirm your email by replying to this message.  Once I hear back, I will provide more details about logistics and other details.   Hopefully, I'll see you in the course!

# Day -30

You will receive a Stripe link to pay the workshop fee. The workshop cost $1250. Is it worth paying $1250 for a 5-day workshop? It's subjective!

# Day -15

You will receive an email asking for a Github username for [collaboration](https://github.com/dabeaz-course) to a private repo.

# Day -7

Later, you will be added to a GitHub repo with instructions to follow and preparation materials. The live discussion during the sessions happens in a Gitter channel.

# Day -1

A reminder about the workshop, materials to read up on, and programming warm-up tasks - socket programming and concurrency primitives.
You introduce yourself in a GitHub thread.

# Day 0, -3:00:00

Reminder about the workshop.

# Day 0

The workshop started at 8:00 PM India time and went on till 4:00 AM, over Zoom.

The first day was divided into a presentation session and an implementation session.
In the presentation session, Beazley walked us through the housekeeping items - introductions, the structure of the workshop, and warm-up tasks.

In the implementation session, David shared his setup and started coding the tasks.
The first day covered warm-up tasks on `socket programming` and `concurrency primitives`, and `implementing a simple Key-Value server`
that would be used as the application for the forthcoming Raft implementation.

Another implementation session was focused on building a traffic system with two traffic lights and some constraints.

The first day was filled with the necessary concepts for the next four days.

All the presentation sessions were recorded and uploaded to Vimeo by the end of the day.

These presentation sessions were lively and hands-on; participants asked questions during the session.

The format

- Presentation/Discussion - 1 hour
- Project implementation - 2 hours
- Presentation/Discussion - 1 hour
- Break - 1 hour
- Implementation - 2 hours
- Discussion - 1 hour


At the end of the day, it was clear what to expect and how you would go about implementing Raft.

The entire implementation was split into multiple projects

- Project 1 - Key-Value Server
- Project 2 - Event Driven Programming
- Project 3 - Starting Out/Planning
- Project 4 - The Log
- Project 5 - Log Replication
- Project 6 - Consensus
- Project 7 - Leader Election
- Project 8 - Testing
- Project 9 - Simulation
- Project 10 - The System
- Project 11 - The Rest of Raft


# Day 1

The day started with a discussion on how to lay out the project - different message types, components, timers, elections, log replication, etc. How should we approach implementing the algorithm?
All the participants started building an outline of the project - `handlers, messages, enums, classes, etc...`
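
For illustration, here's a minimal sketch of what such an outline might look like in Python. The names and fields are mine (following the Raft paper's `AppendEntries` RPC), not the workshop's reference implementation:

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Role(Enum):
    FOLLOWER = "follower"
    CANDIDATE = "candidate"
    LEADER = "leader"


@dataclass
class LogEntry:
    term: int
    command: bytes


# Fields mirror the AppendEntries RPC arguments from the Raft paper
@dataclass
class AppendEntries:
    term: int
    leader_id: int
    prev_log_index: int
    prev_log_term: int
    entries: List[LogEntry] = field(default_factory=list)
    leader_commit: int = 0
```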

After the break, David discussed the topic of `log and log replication.`
He gave the rationale why this is core and the first and most important piece to implement.

Everyone, including David, pushed their code to a separate branch in a shared GitHub repository,
so it was easy to cross-verify the implementation and look it up.

There was discussion on how to test the components, especially unit tests.

Everyone used Python, but you're free to use any language.

# Day 2

We were halfway through the workshop. At this point, the project structure, the core modules, and the style of programming were more or less clear.
By now, the use of event-driven programming and the simplicity of using queues to communicate were also clear.

The main discussion of the day was `consensus` while applying the requests. Without consensus from the Raft cluster, the application cannot commit the changes. The implementation was focused on getting consensus working.

# Day 3

David started the day with TLA+ and implemented a simple TLA+ spec for the traffic system built earlier.
He shared his experience with using TLA+ for distributed systems.

While looking at the [Raft TLA+](https://github.com/ongardie/raft.tla/blob/master/raft.tla), we noticed extra fields in the
RPC messages.

Then we continued implementing the pending tasks in log replication and consensus.

After the break, David presented the leader election along with various cases and edge cases that aren't clear from the paper.
Then we continued to implement the leader election for the rest of the day.

# Day 4

On the last day, David discussed how to design the application to communicate with the Raft server.
For all practical purposes, Raft was implemented as a module inside the Key-Value server.

Everyone agreed the code looked like a big ball of mud.
Designing a distributed system in 4 days with
perfect separation of concerns and testable code is hard. The question of testing came up during every component's implementation - especially, how do we do integration tests?

We continued to implement the rest of the pending pieces in the project.
I was tired by the middle of the day and slept off :P

# Conclusion

- It was neatly organized and focused on implementing Raft **correctly**.
I stress correctly because it is easy to overlook certain portions of the paper and end up implementing something only close enough. How do you verify the correctness of an implementation?

- David has run the workshop more than 10 times, and focussing on one topic a day helped in
understanding and implementing it.

- I haven't completed the entire Raft implementation. I still need to complete the leader election edge cases and backtracking of out-of-sync followers.

- The workshop was attended by 13 folks - the participants were from India, the UK, France, and the USA.

- The workshop format may change in the future; it's better to email any questions to David Beazley.

- It's a fun and intense workshop. Give it a try!

# Raft Links

- [Raft web page](https://raft.github.io/)
- [Raft TLA+](https://github.com/ongardie/raft.tla/blob/master/raft.tla)
- [Implementing Raft by Eli Bendersky](https://eli.thegreenplace.net/2020/implementing-raft-part-0-introduction/)
- [Chelsea Troy's Raft Tutorial](https://chelseatroy.com/tag/raft/)
- [Indradhanush Notes on Raft](https://indradhanush.github.io/blog/notes-on-raft/)

- [Profiling Django App](https://kracekumar.com/post/profiling_django/index.md): Profiling Django App
---
title: "Profiling Django App"
date: 2021-10-17T09:37:52+05:30
draft: false
tags: ["Python", "Django", "Profiling", "Perf"]
---

![](/images/profiling_django/tracing.jpeg)

# TL;DR

-  [Pyinstrument](https://pyinstrument.readthedocs.io/en/latest/home.html) is a call stack sampling profiler with low overhead to find out time spent in your Django application.
- [QueryCount](https://github.com/bradmontgomery/django-querycount) is a simplistic ORM query count middleware that counts the number of ORM queries, finds duplicate queries, and prints them in the terminal.
- [Django Silk](https://github.com/jazzband/django-silk) is an extensive Django profiler that records the entire execution, SQL queries, source of origin, and persists the recordings. The complete Django profiler.

# 🔬 What's Profiling? 🔬



[Profiling](https://en.wikipedia.org/wiki/Profiling_(computer_programming)) is a dynamic program analysis that measures a running program's time and/or memory consumption. The profiler can instrument the entire running program or record samples for a fixed duration.

# 🚀 Why Profile Django code? 🚀

A Django application can consume more memory or time than expected. This blog post focuses on profiling performance. Some of the reasons why a Django app can be slow:

- The ORM could be making [N+1 select queries](https://stackoverflow.com/questions/97197/what-is-the-n1-selects-problem-in-orm-object-relational-mapping). Example: fetching the parent model separately for every child model instance (see the sketch after this list).
- Custom Python code can be making DB queries inside a for-loop.
- A service or utility function may be making an HTTP or network call that consumes a lot of time.
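
Here's a minimal sketch of the N+1 pattern and its fix, using the `Shelf` and `Book` models introduced below:

```python
# N+1: one query for the shelves, then one extra query per shelf for its books
for shelf in Shelf.objects.all():
    names = [book.name for book in shelf.books.all()]

# Fix: prefetch_related loads all the related books in a single extra query,
# so the inner .all() call is served from the prefetched cache
for shelf in Shelf.objects.prefetch_related("books"):
    names = [book.name for book in shelf.books.all()]
```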

# Application Setup

Here is a sample book counter application that contains three models - Book, Author, and Shelf. A shelf contains a list of books. A book has many authors.

## Models

```python
# book/models.py

from django.contrib.auth.models import User
from django.db import models


class Author(models.Model):
    first_name = models.CharField(max_length=255, blank=False, null=False)
    middle_name = models.CharField(max_length=255, blank=True, null=True,
                                   default='')
    last_name = models.CharField(max_length=255, blank=True, null=True,
                                 default='')
    profile_photo = models.URLField(default='', blank=True, null=True)


class Book(models.Model):
    name = models.CharField(max_length=255, blank=False, null=False)
    cover_photo = models.URLField(default='', blank=True, null=True)
    language = models.CharField(max_length=255, blank=True, null=False)
    # Relation assumed from BookSerializer's nested `authors` field below
    authors = models.ManyToManyField(Author, related_name="books")

#shelf/models.py
from django.db import models

# Create your models here.
from book.models import Book


class Shelf(models.Model):
    name = models.CharField(max_length=255, unique=True,
                            db_index=True, null=False, blank=False)
    books = models.ManyToManyField(Book, related_name="shelves")


```



## Views

```python
# shelf/views.py
from rest_framework import viewsets
from rest_framework.response import Response

from shelf.models import Shelf
from shelf.serializers import ShelfSerializer


class ShelfViewSet(viewsets.ViewSet):
    http_method_names = ["get"]
    queryset = Shelf.objects.all()

    def list(self, request):
        name = request.GET.get('name')
        qs = self.queryset

        if name:
            qs = qs.filter(name=name)

        data = ShelfSerializer(qs, many=True).data
        return Response(data)

```



## Serializers

```python
#book/serializers.py
from rest_framework import serializers
from book.models import Author, Book


class AuthorSerializer(serializers.ModelSerializer):
    class Meta:
        model = Author
        fields = ("first_name", "middle_name", "last_name", "profile_photo")


class BookSerializer(serializers.ModelSerializer):
    authors = AuthorSerializer(many=True)

    class Meta:
        model = Book
        fields = ("name", "cover_photo", "language", "authors")
        
#shelf/serializers.py
from rest_framework import serializers
from shelf.models import Shelf
from book.serializers import BookSerializer
import time

classic_books = ['War And Peace', 'Lolita', 'The Stranger']


class ShelfSerializer(serializers.ModelSerializer):
    books = BookSerializer(many=True)

    class Meta:
        model = Shelf
        fields = ("books", "name")

    def to_representation(self, obj):
        # Contrived slow path so the profilers have a hotspot to find
        shelf = super().to_representation(obj)
        if shelf['name'] in classic_books:
            shelf['is_classic'] = True
        else:
            shelf['is_classic'] = False
            time.sleep(1)
        return shelf

```



The application has an endpoint `/shelf/` that returns all the shelves along with books and authors using DRF model serializer.

# 🚆 Pyinstrument 🚆

![](https://raw.githubusercontent.com/joerick/pyinstrument/main/docs/img/screenshot.jpg)

[Pyinstrument](https://pyinstrument.readthedocs.io/en/latest/home.html) is a call-stack Python profiler to find performance bottlenecks in your Python programs. It doesn't track every function call but samples the call stack every millisecond. So it's a statistical profiler (not a tracer), and it may lead to optimizing the wrong part of the program.

Since the profiler only records the call stack at a fixed interval (which can be configured to a smaller value), it has low overhead.

`Profiling time = program execution time + profiler recording time `

If the program takes `0.33s` to run, pyinstrument profiling takes 0.43s (30% extra), and cProfile takes 0.61s (84% extra).

Pyinstrument offers a Django middleware to record a profile for each request.

## How to use it?

- The Pyinstrument middleware records the request-response time. The middleware class goes into the `MIDDLEWARE` list variable: `MIDDLEWARE += ['pyinstrument.middleware.ProfilerMiddleware']`.
- The profiler generates profile data for every request and stores it in the directory configured in the settings file. The `PYINSTRUMENT_PROFILE_DIR` setting names the directory that stores the profile data; a minimal settings sketch follows this list.
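
A minimal settings sketch; the `profiles` directory name is illustrative:

```python
# settings.py
MIDDLEWARE += ["pyinstrument.middleware.ProfilerMiddleware"]

# Each profiled request writes an HTML report into this directory
PYINSTRUMENT_PROFILE_DIR = "profiles"
```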

## Output

![](/images/profiling_django/pyinstrument_output.png)

- By default, pyinstrument produces HTML profile information.

- The right corner contains the metadata.
  - SAMPLES - number of samples collected during the execution.
  - DURATION - total duration of the profiling.

- The tree structure captures the flow of the code along with the time spent.

- Each line is a frame that contains the duration, function/method name, file path, and line number.

- `data = ShelfSerializer(qs, many=True).data` in the view took 2.09 seconds.

- The profiler auto-hides standard library and third-party calls to provide a brief overview of the program's call graph.

- From the profile, the `sleep` function inside the serializer takes most of the time.

## Upside

- Fast and easy to use. 
- Requires no code modification other than configuring `settings variables`.
- Less overhead to measure performance.

## Downside

- The library can only collect Python code and not SQL queries executed during the request-response cycle.
- Since the library samples the call stack every millisecond, there can be false positives. It's possible to configure the sample rate.

# 🧮 Django Query Count 🧮

![](https://raw.githubusercontent.com/bradmontgomery/django-querycount/61a380d98bc55e926c011367ecc2031102c3484c/screenshot.png)

[Django QueryCount](https://github.com/bradmontgomery/django-querycount) is a middleware that prints the number of database queries made during the request processing.

## How to use it?

- Add `querycount.middleware.QueryCountMiddleware` to the `MIDDLEWARE` settings variable.

- The middleware supports printing duplicate queries. Following are some extra settings options.

  ```python
  QUERYCOUNT = {
      'THRESHOLDS': {
          'MEDIUM': 50,
          'HIGH': 200,
          'MIN_TIME_TO_LOG':0,
          'MIN_QUERY_COUNT_TO_LOG':0
      },
      'IGNORE_REQUEST_PATTERNS': [],
      'IGNORE_SQL_PATTERNS': [],
      'DISPLAY_DUPLICATES': True,
      'RESPONSE_HEADER': 'X-DjangoQueryCount-Count'
  }
  ```

## Output

```bash
# output in the server console
[17/Oct/2021 11:05:43] "GET /shelf/ HTTP/1.1" 200 1031

http://localhost:8000/shelf/
|------|-----------|----------|----------|----------|------------|
| Type | Database  |   Reads  |  Writes  |  Totals  | Duplicates |
|------|-----------|----------|----------|----------|------------|
| RESP |  default  |    8     |    0     |    8     |     1      |
|------|-----------|----------|----------|----------|------------|
Total queries: 8 in 2.0803s


Repeated 1 times.
SELECT "book_book"."id", "book_book"."name",
"book_book"."cover_photo", "book_book"."language" FROM "book_book"
INNER JOIN "shelf_shelf_books" ON ("book_book"."id" =
"shelf_shelf_books"."book_id") WHERE "shelf_shelf_books"."shelf_id" =
1


Repeated 1 times.
SELECT "shelf_shelf"."id", "shelf_shelf"."name" FROM "shelf_shelf"

# output in the client
$http http://localhost:8000/shelf/ --header
HTTP/1.1 200 OK
Allow: GET
Content-Length: 1031
Content-Type: application/json
Date: Sun, 17 Oct 2021 11:05:43 GMT
Referrer-Policy: same-origin
Server: WSGIServer/0.2 CPython/3.9.7
Vary: Accept, Cookie
X-Content-Type-Options: nosniff
X-DjangoQueryCount-Count: 8
X-Frame-Options: DENY
```



- The middleware records the total number of SQL queries made and prints it in tabular form in the terminal.
- The table contains the total number of reads, the total number of writes, total queries, and total duplicates. The number of duplicates is a good indicator of N+1 select queries.
- When `DISPLAY_DUPLICATES` is True, the middleware prints the duplicate queries.

## Upside

- Simple to use.
- Fast and accurate.
- Low overhead to find out the SQL queries.

## Downside

- Simple, hence hard to find the origin of the SQL queries.
- No option to print all the SQL queries.

# 🔥 Django Silk 🔥

![](https://raw.githubusercontent.com/jazzband/django-silk/master/screenshots/1.png)

>  [Silk](https://github.com/jazzband/django-silk) is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before presenting them in a user interface for further inspection.

## Installation

```bash
pip install git+https://github.com/jazzband/django-silk.git#egg=django-silk
```

**The latest PyPI version doesn't support Django 3.2.**



## How to use it?

- Add `'silk.middleware.SilkyMiddleware'` to `MIDDLEWARE` settings variable.
- Add `silk` to `INSTALLED_APPS` settings variable.
- Add silk endpoints to url patterns. `urlpatterns += [url(r'^silk/', include('silk.urls', namespace='silk'))]`.
- Run `migrate` to create tables. `python manage.py migrate`.
- Run `collectstatic` to collect static files, `python manage.py collectstatic`.

## Output

![](/images/profiling_django/silk_dashboard.png)

- `/silk/` endpoint is the dashboard and lists all the recorded requests.
- The quick view under `Most Time Overall` shows the `response status, total response time, total time spent in queries, and total number of queries`.

![](/images/profiling_django/silk_details.png)

- Clicking on a request profile takes you to the detail view. The detail view displays `request header details, response headers, and details on how to use the client to connect to the endpoint`.

![](/images/profiling_django/silk_sql.png)

- The SQL tab displays all the queries made during the request-response cycle. 
- Clicking on a single query displays the `query plan and Python traceback` to find its source of origin. The `SILKY_ANALYZE_QUERIES` variable controls the feature to analyze the SQL query.

![](/images/profiling_django/silk_sql_detail.png)



Silk also provides an option to profile a view or any Python function using a decorator or a context manager. Below is the modified view using the decorator; a context-manager sketch follows it.

```python
from silk.profiling.profiler import silk_profile

class ShelfViewSet(viewsets.ViewSet):
    http_method_names = ["get"]
    queryset = Shelf.objects.all()


    @silk_profile(name='Shelf list')
    def list(self, request):
        name = request.GET.get('name')
        qs = self.queryset

        if name:
            qs = qs.filter(name=name)

        data = ShelfSerializer(qs, many=True).data
        return Response(data)

```
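
The context-manager form is handy when only a portion of a function needs profiling. A minimal sketch (the block name is illustrative):

```python
from silk.profiling.profiler import silk_profile

class ShelfViewSet(viewsets.ViewSet):
    http_method_names = ["get"]
    queryset = Shelf.objects.all()

    def list(self, request):
        with silk_profile(name='Shelf serialization'):
            # Only this block shows up as a recorded profile
            data = ShelfSerializer(self.queryset, many=True).data
        return Response(data)
```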



![](/images/profiling_django/silk_profile_graph.png)

- The Profile tab displays the call graph and Python traceback. The page also shows a link to download the profile file (`.prof`). [snakeviz](https://jiffyclub.github.io/snakeviz/) is an excellent tool to visualize the profile data.
- Below is the profile visualized using snakeviz.

![](/images/profiling_django/snakeviz_graph.png)



- Snakeviz graph is visually pleasing and gives a breakdown of the function call based on time spent.

## Other features

- Dynamic Profiling - profile a specific function or class at runtime.

  ```python
  SILKY_DYNAMIC_PROFILING = [{
      'module': 'shelf.views',
      'function': 'ShelfViewSet.list'
  }] 
  ```

- Silk can also display the time taken by the Silk profiler itself. The `SILKY_META` setting variable controls it. When set to True, the dashboard displays the time taken by Silk in red in the display card.

## Upside

- Exhaustive profiling option and completeness.
- UI dashboard to visualise and browse the profiling data.
- Storing all the data in the database helps to analyze performance improvement on the same endpoint over time.
- Highly customizable and extensible.

## Downside

- The library is not well maintained; there are a lot of open issues, and the PyPI release doesn't support Django 3.2.
- A bit of a learning curve and a lot of configuration.
- It can be slow on slow endpoints since it stores all the details in the database.

# Conclusion



Performance is a feature that lets the user of the application perform certain types of operations more quickly. Profiling tools help developers measure the application's performance and make decisions accordingly.

There are other Django/Python profilers, but these three profilers help measure and understand the Django application performance. Use these tools to make your Django application faster.



<blockquote class="twitter-tweet" data-theme="dark"><p lang="en" dir="ltr">1. 🚀Profiling Django App 🚀- <a href="https://t.co/EifIAiW8ba">https://t.co/EifIAiW8ba</a>. In this blog post, I discuss three tools. <a href="https://twitter.com/hashtag/Django?src=hash&amp;ref_src=twsrc%5Etfw">#Django</a> <a href="https://twitter.com/hashtag/Python?src=hash&amp;ref_src=twsrc%5Etfw">#Python</a></p>&mdash; kracekumar || கிரேஸ்குமார் (@kracetheking) <a href="https://twitter.com/kracetheking/status/1449813820151513088?ref_src=twsrc%5Etfw">October 17, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

# Image Credits

- Foot Print Photo by [Martin Widenka](https://unsplash.com/@widenka?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/foot-print?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText).  
- Pyinstrument cover image is from the source repository.
- Query Count cover image is from the repository.
- Django Silk cover image is from the repository.

- [HTTPie and Print HTTP Request](https://kracekumar.com/post/print_http_request/index.md): HTTPie and Print HTTP Request
---

title: "HTTPie and Print HTTP Request"
date: 2021-10-11T00:03:27+05:30
draft: false
Tags: ["Python", "HTTP", "CLI", "HTTPie"]

---

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/HTTP_logo.svg/2880px-HTTP_logo.svg.png">



[HTTPie](https://httpie.io/cli) is a command-line utility for making HTTP requests with a more straightforward syntax (controversial, I agree). The interesting feature is the `--offline` flag, which prints the raw HTTP request text. The client sends the HTTP request to the server, and the server responds to the request. It's an alternative to [curl](https://curl.se/).



## HTTP Syntax



[HTTP flow and syntax from Wikipedia.](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol)

>  A client sends *request messages* to the server, which consist of
>
>  - a request line, consisting of the case-sensitive request method, a [space](https://en.wikipedia.org/wiki/Space_(punctuation)), the request target, another space, the protocol version, a [carriage return](https://en.wikipedia.org/wiki/Carriage_return), and a [line feed](https://en.wikipedia.org/wiki/Line_feed) (e.g. *GET /images/logo.png HTTP/1.1*)
>  - zero or more [request header fields](https://en.wikipedia.org/wiki/HTTP_request_header_field), each consisting of the case-insensitive field name, a colon, optional leading [whitespace](https://en.wikipedia.org/wiki/Whitespace_(computer_science)), the field value, and optional trailing whitespace (e.g. *Accept-Language: en*), and ending with a carriage return and a line feed.
>  - an empty line, consisting of a carriage return and a line feed;
>  - an optional [message body](https://en.wikipedia.org/wiki/HTTP_message_body).
>  - In the HTTP/1.1 protocol, all header fields except *Host* are optional.
>  - A request line containing only the path name is accepted by servers to maintain compatibility with HTTP clients before the HTTP/1.0 specification in [RFC](https://en.wikipedia.org/wiki/RFC_(identifier)) [1945](https://datatracker.ietf.org/doc/html/rfc1945).

Throughout the post, I'll use the `--offline` feature to understand how the HTTP request structure looks, for educational purposes.



## 🉑 Accept only JSON response data 🉑



- HTTPie uses `:` to separate header keys and values in the terminal: `Accept:application/json`.

- The `Accept` [header tells the server that the client accepts only specific MIME types, here `JSON`.](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept)

```bash
$http --offline httpbin.org/get Accept:application/json
GET /get HTTP/1.1
Accept: application/json
Accept-Encoding: gzip, deflate
Connection: keep-alive
Host: httpbin.org
User-Agent: HTTPie/2.5.0
# Sample request sent to the server
$ http httpbin.org/get Accept:application/json
HTTP/1.1 200 OK
Access-Control-Allow-Credentials: true
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 310
Content-Type: application/json
Date: Sun, 10 Oct 2021 07:21:40 GMT
Server: gunicorn/19.9.0

{
    "args": {},
    "headers": {
        "Accept": "application/json",
        "Accept-Encoding": "gzip, deflate",
        "Host": "httpbin.org",
        "User-Agent": "HTTPie/2.5.0",
        "X-Amzn-Trace-Id": "Root=1-61629484-3b25a3631e2a89bf60f2600e"
    },
    "origin": "xxx.xxx.xxx.xxx",
    "url": "http://httpbin.org/get"
}
```

- You can pass more than one value in the `Accept` header, separated by a comma.

- The response from the server contains `Content-Type` as JSON. The server can choose to ignore the `Accept` header.



## 📚 Request the Tamil language version of duckduckgo 📚

- The `Accept-Language` [header instructs the web server to deliver a particular language version of the page.](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language)
- Here `Accept-Language` is set to `ta`. `ta` is the [ISO 639-1 code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for Tamil.

```bash
$http --offline  https://duckduckgo.com Accept-Language:ta
GET / HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Accept-Language: ta
Connection: keep-alive
Host: duckduckgo.com
User-Agent: HTTPie/2.5.0

$http  https://duckduckgo.com Accept-Language:ta
HTTP/1.1 200 OK
Cache-Control: no-cache
Connection: keep-alive
Content-Encoding: gzip
Content-Security-Policy: default-src 'none' ; connect-src  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ https://duck.co ; manifest-src  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ ; media-src  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ ; script-src blob:  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ 'unsafe-inline' 'unsafe-eval' ; font-src data:  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ ; img-src data:  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ ; style-src  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ 'unsafe-inline' ; object-src 'none' ; worker-src blob: ; child-src blob:  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ ; frame-src blob:  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ ; form-action  https://duckduckgo.com https://*.duckduckgo.com https://3g2upl4pq6kufc4m.onion/ https://duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion/ https://spreadprivacy.com/ https://duck.co ; frame-ancestors 'self' ; base-uri 'self' ; block-all-mixed-content ;
Content-Type: text/html; charset=UTF-8
Date: Sat, 09 Oct 2021 21:44:33 GMT
ETag: W/"6161a338-16b8"
Expect-CT: max-age=0
Expires: Sat, 09 Oct 2021 21:44:32 GMT
Permissions-Policy: interest-cohort=()
Referrer-Policy: origin
Server: nginx
Strict-Transport-Security: max-age=31536000
Transfer-Encoding: chunked
Vary: Accept-Encoding
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1;mode=block

<!DOCTYPE html>
...
<body id="pg-index" class="page-index body--home">
	<script type="text/javascript" src="/tl5.js"></script>
<script type="text/javascript" src="/lib/l123.js"></script>
<script type="text/javascript" src="/locale/ta_IN/duckduckgo50.js"></script>
<script type="text/javascript" src="/util/u588.js"></script>
<script type="text/javascript" src="/d3012.js"></script>
...
<noscript>
    <div class="tag-home">
        <div class="tag-home__wrapper">
            <div class="tag-home__item">
                Privacy, simplified&period;
                <span class="hide--screen-xs"><a href="/about" class="tag-home__link"> மேலும் கற்க</a></span>
            </div>
        </div>
    </div>
</noscript>
...
</html>
```

- The header can carry more than one value separated by a comma.
- When more than one value is present, an extra parameter like `q=0.5` represents the weightage among the values. Example: `Accept-Language: ta,fr;q=0.50`.
- One of the link descriptions is in Tamil: `மேலும் கற்க`.

## 🔑 Post Authorization token as part of login 🔑

- The `Authorization` [header can be used to provide credentials to authenticate the user](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization).
- Example: `Authorization:"Bearer my-dear-darkness"`.
- Because of the space between `Bearer` and the token value, the quotes are mandatory.

```bash
$http --offline POST httpbin.org/auth Authorization:"Bearer my-dear-darkness"
POST /auth HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Authorization: Bearer my-dear-darkness
Connection: keep-alive
Content-Length: 0
Host: httpbin.org
User-Agent: HTTPie/2.5.0

```



## 🍪 Send Cookie as part of a request 🍪

- The server can send an HTTP cookie as part of the response, and the client sends the cookies back in subsequent requests.
- The `Cookie` [header carries the cookies previously sent by the server.](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cookie)
- A semicolon separates multiple cookie pairs.
- Example: `"name=not-unique;value=23"`

```bash
$http --offline http://pie.dev/cookies Cookie:"name=not-unique;value=23"
GET /cookies HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Cookie: name=not-unique;value=23
Host: pie.dev
User-Agent: HTTPie/2.5.0
$http http://pie.dev/cookies Cookie:"name=not-unique;value=23"
HTTP/1.1 200 OK
CF-Cache-Status: DYNAMIC
CF-RAY: 69be1b984c743c0c-BLR
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Sun, 10 Oct 2021 07:24:13 GMT
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=2Zt9wQcKQYStY5dPukoEXFwn7K6pKNaVTiKljZ5h9oL3mcQZj0khYq8Kmzo8PmQinPncStZeHASetQqHzCODe0wbrljPEIJxCWGdRWMbry9rWOG%2FBheDvJs7"}],"group":"cf-nel","max_age":604800}
Server: cloudflare
Transfer-Encoding: chunked
access-control-allow-credentials: true
access-control-allow-origin: *
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400

{
    "cookies": {
        "name": "not-unique",
        "value": "23"
    }
}
```



## ✉️ Send JSON Request ✉️

- HTTPie supports sending form-encoded values or JSON values. The `=` sign indicates a JSON key-value pair.
- `=` is useful for primitive values like `number, string, null, boolean`.
- Example: `lang=py`

```bash
$http --offline PUT http://pie.dev/put lang=py version=3.10
PUT /put HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 33
Content-Type: application/json
Host: pie.dev
User-Agent: HTTPie/2.5.0

{
    "lang": "py",
    "version": "3.10"
}
$http PUT http://pie.dev/put lang=py version=3.10
HTTP/1.1 200 OK
CF-Cache-Status: DYNAMIC
CF-RAY: 69be1e017fb53c12-BLR
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Sun, 10 Oct 2021 07:25:52 GMT
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=GMW5PQHMwjxpVUeDG7uWo6M%2FiKLMzmQLd1e5BIG3AXljmuQgwCP9nzrFPdaidR2wL14eisfiViDhumDHpepgNB6yIrsVKXRybHa5tRqmH7lUDoQdRqCK1Ijg"}],"group":"cf-nel","max_age":604800}
Server: cloudflare
Transfer-Encoding: chunked
access-control-allow-credentials: true
access-control-allow-origin: *
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400

{
    "args": {},
    "data": "{\"lang\": \"py\", \"version\": \"3.10\"}",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "application/json, */*;q=0.5",
        "Accept-Encoding": "gzip",
        "Cdn-Loop": "cloudflare",
        "Cf-Connecting-Ip": "49.207.222.139",
        "Cf-Ipcountry": "IN",
        "Cf-Ray": "69be1e017fb53c12-FRA",
        "Cf-Visitor": "{\"scheme\":\"http\"}",
        "Connection": "Keep-Alive",
        "Content-Length": "33",
        "Content-Type": "application/json",
        "Host": "pie.dev",
        "User-Agent": "HTTPie/2.5.0"
    },
    "json": {
        "lang": "py",
        "version": "3.10"
    },
    "origin": "xxx.xxx.xxx.xxxx",
    "url": "http://pie.dev/put"
}

```



## ✉️ Send non-primitive JSON values ✉️

- `:=` means the JSON value is a non-primitive value like an `array/list or dictionary`.
- Single quotes carry the value.
- Example: `os:='["GNU/Linux", "Mac OSX"]'`.

```bash
$http --offline  PUT http://pie.dev/put lang=py version=3.10 os:='["GNU/Linux", "Mac OSX"]'
PUT /put HTTP/1.1
Accept: application/json, */*;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 65
Content-Type: application/json
Host: pie.dev
User-Agent: HTTPie/2.5.0

{
    "lang": "py",
    "os": [
        "GNU/Linux",
        "Mac OSX"
    ],
    "version": "3.10"
}
$http  PUT http://pie.dev/put lang=py version=3.10 os:='["GNU/Linux", "Mac OSX"]'
HTTP/1.1 200 OK
CF-Cache-Status: DYNAMIC
CF-RAY: 69be224ce97c3c0c-BLR
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Sun, 10 Oct 2021 07:28:48 GMT
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=dgY0R%2FqomBbumKs0IVB7GjPR2chhHJ9wVKAzW33IOnvZ%2Fs4l2LYnvKKeXo6Xhd162AYKGzyIrK4pNrYdH8SEs1NvGmYfJqEDIPmfUOELqpC6HK9iP1zIENa0"}],"group":"cf-nel","max_age":604800}
Server: cloudflare
Transfer-Encoding: chunked
access-control-allow-credentials: true
access-control-allow-origin: *
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400

{
    "args": {},
    "data": "{\"lang\": \"py\", \"version\": \"3.10\", \"os\": [\"GNU/Linux\", \"Mac OSX\"]}",
    "files": {},
    "form": {},
    "headers": {
        "Accept": "application/json, */*;q=0.5",
        "Accept-Encoding": "gzip",
        "Cdn-Loop": "cloudflare",
        "Cf-Connecting-Ip": "49.207.222.139",
        "Cf-Ipcountry": "IN",
        "Cf-Ray": "69be224ce97c3c0c-FRA",
        "Cf-Visitor": "{\"scheme\":\"http\"}",
        "Connection": "Keep-Alive",
        "Content-Length": "65",
        "Content-Type": "application/json",
        "Host": "pie.dev",
        "User-Agent": "HTTPie/2.5.0"
    },
    "json": {
        "lang": "py",
        "os": [
            "GNU/Linux",
            "Mac OSX"
        ],
        "version": "3.10"
    },
    "origin": "xx.xxx.xxx.xxx",
    "url": "http://pie.dev/put"
}

```



## 📤 Upload files 📤



- The `-f` flag indicates the data is form-encoded values.
- The `@` symbol indicates the value is a file.
- Example: `cv@hello-world.txt`, where `cv` is the form field name.

```bash
$cat hello-world.txt
hello-world
$http --offline -f POST pie.dev/post name='Krace' cv@hello-world.txt
POST /post HTTP/1.1
Accept: */*
Accept-Encoding: gzip, deflate
Connection: keep-alive
Content-Length: 277
Content-Type: multipart/form-data; boundary=183f70d3da41432d95bcd839e2cc20e2
Host: pie.dev
User-Agent: HTTPie/2.5.0

--183f70d3da41432d95bcd839e2cc20e2
Content-Disposition: form-data; name="name"

Krace
--183f70d3da41432d95bcd839e2cc20e2
Content-Disposition: form-data; name="cv"; filename="hello-world.txt"
Content-Type: text/plain

hello-world

--183f70d3da41432d95bcd839e2cc20e2--
$ http -f POST pie.dev/post name='Krace' cv@hello-world.txt
HTTP/1.1 200 OK
CF-Cache-Status: DYNAMIC
CF-RAY: 69be2564a8fc3c0c-BLR
Connection: keep-alive
Content-Encoding: gzip
Content-Type: application/json
Date: Sun, 10 Oct 2021 07:30:54 GMT
NEL: {"success_fraction":0,"report_to":"cf-nel","max_age":604800}
Report-To: {"endpoints":[{"url":"https:\/\/a.nel.cloudflare.com\/report\/v3?s=yNaTSZJ7ouYH%2FSJZw6LIR5vNl6KNTFeDKF1u8V60abb3ClKLzdOj0zkchcIAWqTvZDbxXm5MnffDkCLdqviMQuAo7DqFA2GH%2Bm%2FiZ6sH90oOr0HyFGAuS4Gp"}],"group":"cf-nel","max_age":604800}
Server: cloudflare
Transfer-Encoding: chunked
access-control-allow-credentials: true
access-control-allow-origin: *
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400, h3-28=":443"; ma=86400, h3-27=":443"; ma=86400

{
    "args": {},
    "data": "",
    "files": {
        "cv": "hello-world\n"
    },
    "form": {
        "name": "Krace"
    },
    "headers": {
        "Accept": "*/*",
        "Accept-Encoding": "gzip",
        "Cdn-Loop": "cloudflare",
        "Cf-Connecting-Ip": "49.207.222.139",
        "Cf-Ipcountry": "IN",
        "Cf-Ray": "69be2564a8fc3c0c-FRA",
        "Cf-Visitor": "{\"scheme\":\"http\"}",
        "Connection": "Keep-Alive",
        "Content-Length": "277",
        "Content-Type": "multipart/form-data; boundary=252c6fcd1dcc40e09de54958660d672d",
        "Host": "pie.dev",
        "User-Agent": "HTTPie/2.5.0"
    },
    "json": null,
    "origin": "xxx.xxx.xxx.xxxx",
    "url": "http://pie.dev/post"
}

```



[Explore more on the HTTPie website.](https://httpie.io/)

# 🐈 How to verify the generated request works? 🐈



[Netcat](https://en.wikipedia.org/wiki/Netcat) is a utility for sending and receiving data over a network connection using TCP or UDP. Netcat takes the hostname and port as arguments, sends the request body to the server, and displays the response.

```bash
$http --offline  PUT http://httpbin.org/put lang=py version=3.10 os:='["GNU/Linux", "Mac OSX"]' | nc httpbin.org 80
HTTP/1.1 200 OK
Date: Sun, 10 Oct 2021 08:27:58 GMT
Content-Type: application/json
Content-Length: 631
Connection: keep-alive
Server: gunicorn/19.9.0
Access-Control-Allow-Origin: *
Access-Control-Allow-Credentials: true

{
  "args": {},
  "data": "{\"lang\": \"py\", \"version\": \"3.10\", \"os\": [\"GNU/Linux\", \"Mac OSX\"]}",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "application/json, */*;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "Content-Length": "65",
    "Content-Type": "application/json",
    "Host": "httpbin.org",
    "User-Agent": "HTTPie/2.5.0",
    "X-Amzn-Trace-Id": "Root=1-6162a40e-34b9a83f40868b4a73e8fa09"
  },
  "json": {
    "lang": "py",
    "os": [
      "GNU/Linux",
      "Mac OSX"
    ],
    "version": "3.10"
  },
  "origin": "xxx.xxx.xxx.xxx",
  "url": "http://httpbin.org/put"
}
```



Netcat doesn't support encrypted network connections.

<blockquote class="twitter-tweet" data-dnt="true" data-theme="dark"><p lang="en" dir="ltr">❓How to understand what HTTP request the client sends to the server?<br>💡Use HTTPie `--offline` feature to print the HTTP request.<br><br>Details on how to send various HTTPie options in the blog post - <a href="https://t.co/mIEG2FpBkP">https://t.co/mIEG2FpBkP</a><a href="https://twitter.com/hashtag/TIL?src=hash&amp;ref_src=twsrc%5Etfw">#TIL</a> <a href="https://twitter.com/hashtag/Python?src=hash&amp;ref_src=twsrc%5Etfw">#Python</a> <a href="https://twitter.com/hashtag/CLI?src=hash&amp;ref_src=twsrc%5Etfw">#CLI</a> <a href="https://t.co/kBvfc9QQAt">pic.twitter.com/kBvfc9QQAt</a></p>&mdash; kracekumar || கிரேஸ்குமார் (@kracetheking) <a href="https://twitter.com/kracetheking/status/1447289495234105345?ref_src=twsrc%5Etfw">October 10, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

# References

- HTTPie - https://httpie.io/
- HTTPBin - http://httpbin.org
- NetCat - https://en.wikipedia.org/wiki/Netcat
- HTTP MDN Docs: https://developer.mozilla.org/en-US/docs/Web/HTTP
- HTTP Header Image: https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/HTTP_logo.svg/2880px-HTTP_logo.svg.png
- [Type Check Your Django Application](https://kracekumar.com/post/type_check_your_django_app/index.md): Type Check Your Django Application
---
title: "Type Check Your Django Application"
date: 2021-09-18T18:03:27+05:30
draft: false
Tags: ["Python", "Django", "type-hints", "gradual-typing", "pycon-india", "talk"]
---

Recently, I gave a talk, **Type Check your Django app**, at two conferences - [Euro Python 2021](https://ep2021.europython.eu/talks/BsaKGk4-type-check-your-django-app/) and [PyCon India 2021](https://in.pycon.org/cfp/2021/proposals/type-check-your-django-app~ejRql/). The talk was about adding Python [gradual typing](http://mypy-lang.org/) to Django using the third-party package [Django-stubs](https://github.com/TypedDjango/django-stubs), focussed heavily on [Django Models](https://docs.djangoproject.com/en/3.2/topics/db/models/). This blog post is the write-up of the talk. Here is the [unofficial recording of the PyCon India talk](https://youtu.be/efs3RXaLJ4I).

<iframe width="560" height="315" src="https://www.youtube.com/embed/efs3RXaLJ4I" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>



Here is the link to the [PyCon India slides](https://slides.com/kracekumar/type-hints-in-django/fullscreen) and the [Euro Python talk slides](https://slides.com/kracekumar/type-hints-in-django-euro-pycon-2021) (both are similar).

<iframe src="https://slides.com/kracekumar/type-hints-in-django/embed" width="576" height="420" scrolling="no" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>



# Gradual Typing

![](/images/type_check_django_app/highway.jpeg)

Photo by [John Lockwood](https://unsplash.com/@justjohnl?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/city-junction?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)



Python, from version 3.5 onwards, started supporting optional static typing, or gradual typing. Some parts of the source code contain type annotations, and some parts may have none. The Python interpreter doesn't complain about the lack of type hints. The third-party library mypy does the type check.
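
For example, annotated and unannotated code can coexist in the same module; mypy checks the annotated parts and treats the rest as `Any` (a small sketch of mine, not from the talk):

```python
def greet(name: str) -> str:  # annotated: mypy checks every call to greet
    return f"Hello, {name}"


def shout(name):  # unannotated: mypy treats `name` as Any and stays quiet
    return greet(name.upper())
```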

Throughout the post, the source code examples follow Python 3.8+ syntax and Django version 3.2. "Static type checker" refers to mypy throughout, even though there are other type checkers like Pyre from Facebook and Pylance/Pyright from Microsoft.

# Types

![](/images/type_check_django_app/fruits.jpeg)

Photo by [Alexander Schimmeck](https://unsplash.com/@alschim?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/many-objects?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

## Types at Run Time

```python
>>> type(lambda x: x)
<class 'function'>
>>> type(type)
<class 'type'>
>>> type(23)
<class 'int'>
>>> type(("127.0.0.1", 8000))
<class 'tuple'>
```

Python's built-in function [type](https://docs.python.org/3/library/functions.html#type) returns the type of the argument. When the argument is `("127.0.0.1", 8000)`, the function returns the type tuple.

```python
>>>from django.contrib.auth.models import User
>>>type(User.objects.filter(
	email='not-found@example.com'))
django.db.models.query.QuerySet

```

On the result of a Django filter method, the type function returns the type as `django.db.models.query.QuerySet`.

## Types at Static Checker Time

```python
addr = "127.0.0.1"
port = 8000

reveal_type((addr, port))

```

Similar to the `type` function, the static type checker provides a `reveal_type` function that returns the type of the argument at static type-checking time. The function is not present during the Python runtime but is part of mypy.

```python
$mypy filename.py
note: Revealed type is
  'Tuple[builtins.str, builtins.int]'
```



`reveal_type` returns the type of the tuple as `Tuple[builtins.str, builtins.int]`. **The `reveal_type` function also returns the types of the tuple's elements. In contrast, the `type` function only returns the object's type at the first level.**



```python
# filename.py
from django.contrib.auth.models import User

reveal_type(User.objects.filter(
  email='not-found@example.com'))

```



```python
$ mypy filename.py
note: Revealed type is
  'django.contrib.auth.models.UserManager
  [django.contrib.auth.models.User]'


```

Similarly, on the result of the Django User manager's filter method, `reveal_type` returns the type as `UserManager[User]`. Mypy is interested in the type of objects at all levels.

# Mypy config

```ini
# mypy.ini
[mypy]
exclude = [a-zA-Z_]+.migrations.|[a-zA-Z_]+.tests.|[a-zA-Z_]+.testing.

allow_redefinition = false

plugins =
    mypy_django_plugin.main

[mypy.plugins.django-stubs]
django_settings_module = "yourapp.settings"
```



The [Django project](https://code.djangoproject.com/ticket/29299) does not contain type annotations in its source code, and adding them is not on the roadmap. Mypy needs information to infer the types in the Django source code. The mypy configuration needs to know the Django stubs' entry point and the Django project's settings: pass the Django stubs plugin in the `plugins` variable, and the settings module location of the Django project as the `django_settings_module` variable in the `[mypy.plugins.django-stubs]` section.

# Annotation Syntax

![](/images/type_check_django_app/leather-jacket.jpeg)

Photo by [Lea Øchel](https://unsplash.com/@lealea_leaa?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/jacket?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

```python
from datetime import date

# Example variable annotation
lang: str = "Python"
year: date = date(1989, 2, 1)

# Example annotation on input arguments
# and return values
def sum(a: int, b: int) -> int:
  return a + b

class Person:
  # Class/instance method annotation
  def __init__(self, name: str, age: int,
               is_alive: bool):
    self.name = name
    self.age = age
    self.is_alive = is_alive
```



Type annotation can happen in three places:

1. During variable declaration/definition. Example: `lang: str = "Python"`. The grammar is `name: <type> = <value>`.
2. The function declaration, with input arguments and return value types annotated: `sum(a: int, b: int) -> int`. The input argument annotations of `sum` look similar to variable annotations. The return value annotation syntax is the `->` arrow followed by the return value's type; in the `sum` definition, it's `-> int`.
3. The method declaration. The syntax is similar to annotating a function. The `self` or `cls` argument needs no annotation since mypy understands the semantics of the declaration. Except for the `__init__` method, when a function or method does not return a value, the explicit annotation should be `-> None`.

# Annotating Django Code



<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/HTTP_logo.svg/2880px-HTTP_logo.svg.png">

# Views

Django supports `class-based views` and `function-based views`. Since function and method annotations are similar, the example will focus on function-based views.

```python
from django.http import (HttpRequest, HttpResponse,
			HttpResponseNotFound)

def index(request: HttpRequest) -> HttpResponse:
    return HttpResponse("hello world!")


```

The view function takes in an `HttpRequest` and returns an `HttpResponse`. Annotating a view function is straightforward after importing the relevant classes from the `django.http` module.

```python
def view_404(request:
               HttpRequest) -> HttpResponseNotFound:
    return HttpResponseNotFound(
      '<h1>Page not found</h1>')

def view_404(request: HttpRequest) -> HttpResponse:
    return HttpResponseNotFound(
      '<h1>Page not found</h1>')


# bad - not precise and not useful
def view_404(request: HttpRequest) -> object:
	return HttpResponseNotFound(
      '<h1>Page not found</h1>')
```

Here is another view function, `view_404`. The function returns `HttpResponseNotFound` - HTTP status code 404. The return value annotation can take three possible values - `HttpResponseNotFound`, `HttpResponse`, or `object`. Mypy accepts all three annotations as valid, though `object` is imprecise and not useful.

### Why and How? MRO

[Method Resolution Order (MRO)](https://www.python.org/download/releases/2.3/mro/) is the linearization of parent classes under multiple inheritance. To know the MRO of a class, call its `mro` method.

```python
>>>HttpResponse.mro()
[django.http.response.HttpResponse,
 django.http.response.HttpResponseBase,
 object]


>>>HttpResponseNotFound.mro()
[django.http.response.HttpResponseNotFound,
 django.http.response.HttpResponse,
 django.http.response.HttpResponseBase,
 object]
```



`HttpResponseNotFound` inherits `HttpResponse`, `HttpResponse` inherits `HttpResponseBase`, and `HttpResponseBase` inherits `object`.

### LSP - Liskov substitution principle

The [Liskov substitution principle](https://en.wikipedia.org/wiki/Liskov_substitution_principle) states that in an object-oriented program, substituting a superclass object reference with an object of any of its subclasses should not break the program.

`HttpResponseNotFound` is a specialized class of `HttpResponse` and `object`; hence mypy doesn't complain about the type mismatch.

## Django Models

![](/images/type_check_django_app/store-box.jpeg)

Photo by [Jan Antonin Kolar](https://unsplash.com/@jankolar?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/database?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

## Create

```python
from django.db import models
from django.utils import timezone


class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField("date published")


def create_question(question_text: str) -> Question:
    qs = Question(question_text=question_text,
                  pub_date=timezone.now())
    qs.save()
    return qs

```



`Question` is a Django model with two explicit fields: `question_text`, a `CharField`, and `pub_date`, a `DateTimeField`. `create_question` is a simple function that takes in `question_text` as an argument and returns a `Question` instance.

When a function returns an object, the return annotation should be the class's reference or the class's name as a string.
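
The string form acts as a forward reference, useful when the class is defined later in the module. A small sketch:

```python
# Equivalent annotation using the class's name as a string;
# mypy resolves "Question" even if the class is defined further down
def create_question(question_text: str) -> "Question":
    qs = Question(question_text=question_text,
                  pub_date=timezone.now())
    qs.save()
    return qs
```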

### Read

```python

def get_question(question_text: str) -> Question:
    return Question.objects.filter(
      question_text=question_text).first()
```

`get_question` takes a string as an argument and filters the Question model, and returns the first instance.

```python
error: Incompatible return value type
(got "Optional[Any]", expected "Question")
```

Mypy is unhappy about the return type annotation. The type checker says the return value can be None or a Question instance, but the annotation claims it's always a Question.

### Two solutions

```python
from typing import Optional

def get_question(question_text: str) -> Optional[Question]:
    return Question.objects.filter(
      question_text=question_text).first()

```



1. Annotate the return type to allow a None value. The typing module contains an `Optional` type; the return value `Optional[Question]` means None type or Question type.
2. Relax the check: instruct mypy to ignore the missing None in annotations with `strict_optional = False`, as shown below.

```python
# with strict_optional = False set in mypy.ini

def get_question(question_text: str) -> Question:
    return Question.objects.filter(
        question_text=question_text).first()

```



By default, mypy runs with `strict_optional` enabled. Setting the `strict_optional` variable to False instructs mypy to ignore the None type in annotations (in return values, in variable assignments, ...). There are a lot of such [mypy config variables](https://mypy.readthedocs.io/en/stable/config_file.html#none-and-optional-handling) to run it in a lenient mode.

The lenient config values can help to get type coverage quicker.
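
For example, a lenient starting configuration might look like the sketch below; the flag choices are mine, not from the talk:

```ini
# mypy.ini - a lenient starting point
[mypy]
strict_optional = False
ignore_missing_imports = True
disallow_untyped_defs = False
```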

### Filter method

```python
In [8]: Question.objects.all()
Out[8]: <QuerySet [<Question: Question object (1)>,
                   <Question: Question object (2)>]>

In [9]: Question.objects.filter()
Out[9]: <QuerySet [<Question: Question object (1)>,
                   <Question: Question object (2)>]>
```



The Django object manager's filter method returns a QuerySet, which is iterable. All bulk read and filter operations return a QuerySet. A QuerySet carries instances of the same model; it's a box (container) type.

![](/images/type_check_django_app/plate-cup-cake.jpeg)

 Photo by [Alexandra Kusper](https://unsplash.com/@alexandrakusper?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/box-with-cup-cake?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

```python

def filter_question(text: str) -> QuerySet[Question]:
    return Question.objects.filter(
      question_text__startswith=text)

def exclude_question(text: str) -> QuerySet[Question]:
    return Question.objects.exclude(
      question_text__startswith=text)


```



Other object manager methods that return queryset are `all, reverse, order_by, distinct, select_for_update, prefetch_related, ...`

### Aggregate

```python

class Publisher(models.Model):
    name = models.CharField(max_length=300)

class Book(models.Model):
    name = models.CharField(max_length=300)
    pages = models.IntegerField()
    # use integer field in production
    price = models.DecimalField(max_digits=10,
                                decimal_places=2)
    rating = models.FloatField()
    publisher = models.ForeignKey(Publisher,
                                  on_delete=models.CASCADE)
    pubdate = models.DateField()

```



The aggregate query is a way of summarizing the data to get a high-level understanding of the data. `Publisher` model stores the data of the book publisher with `name` as an explicit character field.

The `Book` model contains six explicit model fields:

- `name` - Character field of maximum length 300
- `pages` - Integer field
- `price` - Decimal field with a maximum of 10 digits and 2 decimal places
- `rating` - Float field
- `publisher` - Foreign key to the `Publisher` model
- `pubdate` - Date field



```python
>>>def get_avg_price():
      return Book.objects.all().aggregate(
      	avg_price=Avg("price"))

>>>print(get_avg_price())
{'avg_price': Decimal('276.666666666667')}
```



The function `get_avg_price` returns the average price of all the books. `avg_price` is a Django query expression passed to the aggregate method. From the `get_avg_price` function's output, the return value is a dictionary.

```python
from decimal import Decimal

from django.db.models import Avg


def get_avg_price() -> dict[str, Decimal]:
    return Book.objects.all().aggregate(
      avg_price=Avg("price"))

```



Type annotation is simple here. The return value is a `dictionary`, so `dict[str, Decimal]` is the return type annotation. The first type argument (`str`) is the type of the dictionary's keys. The second type argument (`Decimal`) is the type of its values.



### Annotate Method

![](/images/type_check_django_app/records.jpeg)

Photo by [Joseph Pearson](https://unsplash.com/@josephtpearson?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/collections/47883473/old-things?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

[From Django doc's on annotate queryset method](https://docs.djangoproject.com/en/3.2/ref/models/querysets/#annotate)

> Annotates each object in the `QuerySet` with the provided list of [query expressions](https://docs.djangoproject.com/en/3.2/ref/models/expressions/). An expression may be a simple value, a reference to a field on the model (or any related models), or an aggregate expression (averages, sums, etc.) that has been computed over the objects that are related to the objects in the `QuerySet`.

```python
def count_by_publisher():
    return Publisher.objects.annotate(
      num_books=Count("book"))


def print_pub(num_books=0):
    if num_books > 0:
        res = count_by_publisher().filter(
          num_books__gt=num_books)
    else:
        res = count_by_publisher()
    for item in res:
        print(item.name, item.num_books)

```

The `count_by_publisher` function counts the books published by each publisher. The `print_pub` function filters the publisher counts based on the `num_books` function argument and prints the result.

```python
>>># after importing the function
>>>print_pub()
Penguin 2
vintage 1
```



`print_pub` prints each publication house's name and its book count. Next is adding annotations to both functions.

```python
from typing import TypedDict
from collections.abc import Iterable

class PublishedBookCount(TypedDict):
    name: str
    num_books: int

def count_by_publisher() -> Iterable[PublishedBookCount]:
    ...
```

`count_by_publisher` returns more than one value, and the result is iterable. `TypedDict` is useful when the dictionary's keys are known in advance. The attribute names of the class are the key names (which should be strings), and each value's type annotates its key. `count_by_publisher`'s annotation is `Iterable[PublishedBookCount]`.

```bash
$# mypy output
scratch.py:46: error: Incompatible return value
    type (got "QuerySet[Any]", expected
"Iterable[PublishedBookCount]")
        return Publisher.objects.annotate(
          num_books=Count("book"))
               ^
scratch.py:51: error:
      "Iterable[PublishedBookCount]" has no attribute "filter"
       res = count_by_publisher().filter(
         num_books__gt=num_books)
```



Mypy found two errors.

1. error: Incompatible return value type (got "QuerySet[Any]", expected "Iterable[PublishedBookCount]")

   Mypy says the `.annotate` method returns `QuerySet[Any]` whereas annotation says return type as  `Iterable[PublishedBookCount]`.

2. "Iterable[PublishedBookCount]" has no attribute "filter"

   `print_pub` uses return value from `count_by_publisher` to filter the values. Since the return value is iterable and the filter method is missing, mypy complains.

### How to fix these two errors?

```python
from django.db.models import QuerySet


def count_by_publisher() -> QuerySet[Publisher]:
   ...


def print_pub(num_books: int = 0) -> None:
    ...
    for item in res:
        print(item.name, item.num_books)
```



Modify the return value annotation for `count_by_publisher` to `QuerySet[Publisher]` as suggested by mypy. Now the first error is fixed, but a new error shows up.

```bash
# mypy output
$ mypy scratch.py
scratch.py:55: error: "Publisher" has no attribute "num_books"
```

Django dynamically adds the `num_books` attribute to the returned QuerySet. The `Publisher` model explicitly declares only the `name` attribute; `num_books` is declared nowhere, so mypy complains.

[This was a bug in the Django Stubs project and was fixed recently.](https://github.com/typeddjango/django-stubs/pull/398) The newer version of Django stubs provides a nice way to annotate the function.

### Option 1 - Recommended

```python
from typing import TypedDict

from django_stubs_ext import WithAnnotations


class TypedPublisher(TypedDict):
    num_books: int


def count_by_publisher() -> WithAnnotations[Publisher, TypedPublisher]:
    ...
```

`WithAnnotations` takes two arguments: the model and a `TypedDict` describing the on-the-fly fields.
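
A small type-checking sketch that follows the signature above (`reveal_type` is evaluated by mypy during a check, not at runtime; django-stubs and django-stubs-ext are assumed to be installed):

```python
pub = count_by_publisher()
reveal_type(pub.num_books)  # mypy: builtins.int, from TypedPublisher
print(pub.name)             # the Publisher model fields remain visible to mypy
```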

### Option 2 - Good solution

Another solution is to create a new model `TypedPublisher` inside a `TYPE_CHECKING` block, which is visible only to mypy during static type checking. `TypedPublisher` inherits the `Publisher` model and declares the `num_books` attribute as a Django field; then mypy will not complain about the missing attribute.

```python
# Postponed evaluation lets the TYPE_CHECKING-only name appear in annotations
from __future__ import annotations

from typing import TYPE_CHECKING

from django.db import models
from django.db.models import Count, QuerySet

if TYPE_CHECKING:
    # Publisher model defined earlier in the post
    class TypedPublisher(Publisher):
        num_books = models.IntegerField()

        class Meta:
            abstract = True


def count_by_publisher() -> QuerySet[TypedPublisher]:
    return Publisher.objects.annotate(
      num_books=Count("book"))


def print_pub(num_books: int = 0) -> None:
    if num_books > 0:
        res = count_by_publisher().filter(
          num_books__gt=num_books)
    else:
        res = count_by_publisher()
    for item in res:
        print(item.name, item.num_books)

```

The earlier solution, Option 1, is more elegant and works well with the simple data types that group by/annotate returns.

# Tools

![](/images/type_check_django_app/toolboard.jpeg)

Board Photo by [Nina Mercado](https://unsplash.com/@nina_mercado?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/tool-box?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

It's hard to start annotating when the project has a significant amount of code, because of the surface area and the number of topics to learn. Except for the Django ORM parts, most of the custom code in a project will be plain Python data flow.

## Pyannotate

[Pyannotate](https://github.com/dropbox/pyannotate) is a tool to auto-generate type hints for a given Python project. Pyannotate captures the types during code execution and writes them to an annotation file. [Pytest-annotate](https://pypi.org/project/pytest-annotate/) is a pytest plugin to infer types during test runs. In an anecdotal micro-benchmark, the tests took 2x as long to complete with pytest-annotate enabled.

### Phase 0 - Preparation

```python
from django.http import (HttpResponse,
		HttpResponseNotFound)

# Create your views here.
# annotate the return value
def index(request):
    return HttpResponse("hello world!")

def view_404_0(request):
    return HttpResponseNotFound(
      '<h1>Page not found</h1>')

```

Here is a simple Python file with no type annotations.

```python
from polls.views import *
from django.test import RequestFactory

def test_index():
    request_factory = RequestFactory()
    request = request_factory.post('/index')
    index(request)


def test_view_404_0():
    request_factory = RequestFactory()
    request = request_factory.post('/404')
    view_404_0(request)
```

Then add relevant test cases for the files.

### Phase 1 - Invoking Pyannotate

```bash
$DJANGO_SETTINGS_MODULE="mysite.settings" PYTHONPATH='.' poetry run pytest -sv polls/tests.py --annotate-output=./annotations.json
```

While running pytest, pass the extra option `--annotate-output` to store the inferred annotations.

### Phase 2 - Apply the annotations

```bash
$ cat annotations.json
[...
    {
        "path": "polls/views.py",
        "line": 7,
        "func_name": "index",
        "type_comments": [
            "(django.core.handlers.wsgi.WSGIRequest) -> django.http.response.HttpResponse"
        ],
        "samples": 1
    },
    {
        "path": "polls/views.py",
        "line": 10,
        "func_name": "view_404_0",
        "type_comments": [
            "(django.core.handlers.wsgi.WSGIRequest) -> django.http.response.HttpResponseNotFound"
        ],
        "samples": 1
    }
]
```



After running the test, `annotations.json` file contains the inferred annotations.

```bash
$poetry run pyannotate --type-info ./annotations.json -w polls/views.py --py3
```

Now, apply the annotations from `annotations.json` to the source code in `polls/views.py`. The `--py3` flag indicates that the type annotations should follow Python 3 syntax.

```python
from django.http import HttpResponse, HttpResponseNotFound
from django.core.handlers.wsgi import WSGIRequest
from django.http.response import HttpResponse
from django.http.response import HttpResponseNotFound

def index(request: WSGIRequest) -> HttpResponse:
    return HttpResponse("hello world!")

def view_404_0(request: WSGIRequest) -> HttpResponseNotFound:
    return HttpResponseNotFound('<h1>Page not found</h1>')

```



After applying the annotations, the file contains the available annotations and required imports.

One major shortcoming of pyannotate is that the types at test time and at runtime can differ (a dummy email provider in tests is a classic example). That's what happened in the current case: the Django test client doesn't use `HttpRequest` directly; it produces a `WSGIRequest`, so the request argument's type annotation became `WSGIRequest`.

For edge cases like these, running the actual Django server under pyannotate (instead of relying on tests alone) exercises the real code paths and infers the type correctly.
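
Alternatively, a manual fix-up sketch: `WSGIRequest` subclasses `HttpRequest`, so widening the annotation keeps the view correct for both test-time and production requests.

```python
from django.http import HttpRequest, HttpResponse


# HttpRequest accepts the test-time WSGIRequest as well, since
# WSGIRequest is a subclass of HttpRequest
def index(request: HttpRequest) -> HttpResponse:
    return HttpResponse("hello world!")
```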

## Python Typing Koans

![](/images/type_check_django_app/knot.jpeg)

 Photo by [John Lockwood](https://unsplash.com/@justjohnl?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/city-junction?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

Demo

<script id="asciicast-419119" src="https://asciinema.org/a/419119.js" async></script>

[The Python Typing Koans repository](https://github.com/kracekumar/python-typing-koans) contains standalone Python programs to learn gradual typing in Python. The programs contain partial or no type hints. By adding type hints and fixing the existing type-hint errors, the learner will understand how type checkers evaluate types.

![](https://kracekumar.com/images/python_typing_koans/python_callables.png)

The project contains koans for Python, Django, and Django Rest Framework. By removing the errors in each file, the learner will understand the typing concepts.

![](/images/type_check_django_app/koan_output.png)

[The detailed write up about the project in the blog post.](https://kracekumar.com/post/python-typing-koans/)

# Conclusion

**Disclaimer**: Gradual typing is evolving and not complete yet. For example, it's still hard to annotate decorators (the Python 3.10 release should make it easier), so it's hard to annotate all dynamic behaviors. Adding type hints to a project comes with its own cost, and not all projects need it.

I hope you learned about type-checking Python and Django, and if you're using type hints, I'd like to hear about it.

If you're struggling with type hints in your projects or need some advice, I'll be happy to help. Shoot me an email!

[Found a bug or typo, and have spare time to fix, send a PR and the file is here!](https://gitlab.com/kracekumar/kracekumar.gitlab.io/-/blob/master/content/post/type_check_your_django_app.md)

### References

1. Euro PyCon - https://ep2021.europython.eu/talks/BsaKGk4-type-check-your-django-app/
2. PyCon India - https://in.pycon.org/cfp/2021/proposals/type-check-your-django-app~ejRql/
3. Mypy - http://mypy-lang.org/
4. Django Stubs - https://github.com/TypedDjango/django-stubs
5. Django Models - https://docs.djangoproject.com/en/3.2/topics/db/models/
6. Video recording - https://youtu.be/efs3RXaLJ4I
7. PyCon India Slides - https://slides.com/kracekumar/type-hints-in-django/fullscreen
8. LSP - https://en.wikipedia.org/wiki/Liskov_substitution_principle
9. Method Resolution Order - https://www.python.org/download/releases/2.3/mro/
10. Mypy Config variables - https://mypy.readthedocs.io/en/stable/config_file.html#none-and-optional-handling
11. Django Stubs Annotate fix - https://github.com/typeddjango/django-stubs/pull/398
12. Pyannotate - https://github.com/dropbox/pyannotate
13. Pytest-annotate - https://pypi.org/project/pytest-annotate/
14. Python Typing Koans - https://github.com/kracekumar/python-typing-koans
15. Python Typing Koans blog post - https://kracekumar.com/post/python-typing-koans

### Images References

- Fruits Photo by [Alexander Schimmeck](https://unsplash.com/@alschim?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/many-objects?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
- Highway Photo by [John Lockwood](https://unsplash.com/@justjohnl?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/city-junction?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
- Leather Jacket Photo by [Lea Øchel](https://unsplash.com/@lealea_leaa?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/jacket?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
- Cup cake Photo by [Alexandra Kusper](https://unsplash.com/@alexandrakusper?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/box-with-cup-cake?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

- Tool Board Photo by [Nina Mercado](https://unsplash.com/@nina_mercado?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/tool-box?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
- Store Photo by [Jan Antonin Kolar](https://unsplash.com/@jankolar?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/database?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
- Knot Photo by [John Lockwood](https://unsplash.com/@justjohnl?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/s/photos/city-junction?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
- Records Photo by [Joseph Pearson](https://unsplash.com/@josephtpearson?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) on [Unsplash](https://unsplash.com/collections/47883473/old-things?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)

# Discussions

1. Lobste.rs - https://lobste.rs/s/exvuuc/type_check_your_django_application
2. Hacker News - https://news.ycombinator.com/item?id=28640033
3. Reddit - [r/python](https://www.reddit.com/r/Python/comments/pufsg7/type_check_your_django_app/) and [r/Django](https://www.reddit.com/r/django/comments/pufsxm/type_check_your_django_application/)
4. Twitter Thread - https://twitter.com/kracetheking/status/1441329460754595846

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">1/3. Blog Post of <a href="https://twitter.com/pyconindia?ref_src=twsrc%5Etfw">@pyconindia</a> and <a href="https://twitter.com/europython?ref_src=twsrc%5Etfw">@europython</a> talk, Type Check your Django App is out. <a href="https://t.co/hAWhBljSYD">https://t.co/hAWhBljSYD</a> <a href="https://twitter.com/hashtag/Python?src=hash&amp;ref_src=twsrc%5Etfw">#Python</a> <a href="https://twitter.com/hashtag/Django?src=hash&amp;ref_src=twsrc%5Etfw">#Django</a></p>&mdash; kracekumar || கிரேஸ்குமார் (@kracetheking) <a href="https://twitter.com/kracetheking/status/1441329460754595846?ref_src=twsrc%5Etfw">September 24, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

**Notes:**

1. Some of the django-stubs bugs mentioned were captured during preparation of the talk; by the time you read this blog post, the bugs might be fixed.

- [Pulse Plus](https://kracekumar.com/post/pulse-plus/index.md): Pulse Plus
---
title: "Pulse Plus"
date: 2021-09-06T01:37:00+05:30
draft: false
Tags: ["CLI", "pet-project", "PhonePe"]
---

[PhonePe](https://twitter.com/PhonePe_/status/1434054060148084736) recently released the Pulse repo built from their [payment data](https://github.com/PhonePe/pulse). It was hard to get an overview of the data without doing some data transformation.

The data is nested eight levels deep, with multiple files holding similar-purpose data. That makes it hard to run any command-line aggregate queries for data exploration.

![](/images/pulse/pulse_data_tree.png)

It's hard to do any analysis with 2000+ files. So I created an [SQLite database](https://github.com/kracekumar/pulse-plus/blob/main/data/v1/pulse.db) of the data using Python sqlite-utils.

The SQLite database holds the aggregated data and top data in five tables: aggregated_user, aggregated_user_device, aggregated_transaction, top_user, and top_transaction. Link to the schema - https://github.com/kracekumar/pulse-plus#all-tables-schema.

`python pulse/cli.py ../pulse/data --output pulse.db` creates the SQLite file from the pulse repo data.

The same five tables are available as five CSV files in [data/v1/ sub-directory of the repo](https://github.com/kracekumar/pulse-plus/tree/main/data/v1). [All aggregated transaction CSV file](https://github.com/kracekumar/pulse-plus/blob/main/data/v1/aggregated_transaction.csv).

![](/images/pulse/csv_file_list.png)

The data is now flat in the CSV files and the SQLite file, and easy to explore in notebooks, Metabase, or any data exploration tool. If you're comfortable with SQL, analyze it using the [Datasette tool](https://datasette.io/).

![](/images/pulse/datasette_example.png)
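
If you prefer plain Python over extra tooling, here is a quick-look sketch using the standard library (assuming `pulse.db` built by the CLI step above; the `state` and `amount` column names are assumptions based on the linked schema):

```python
import sqlite3

conn = sqlite3.connect("pulse.db")
query = """
    SELECT state, SUM(amount) AS total
    FROM aggregated_transaction
    GROUP BY state
    ORDER BY total DESC
    LIMIT 5
"""
# Print the five states with the highest total transaction amount
for row in conn.execute(query):
    print(row)
conn.close()
```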

## High-level data quality observations. 

- There is no currency unit in any of the datasets for the amount field. 🤦 Is the transaction represented in rupees or paise? E.g.: transaction data

- The amount field is a float with arbitrary precision (poor JSON conversion), for example 6611459.8729725825. Typically money is represented as an integer or a decimal (a float in JSON) with two-digit precision; see the sketch after this list. What do ten digits after the decimal point represent?

  ![](/images/pulse/float_precision.png)

- In some datasets, "from" and "to" date information is available (transaction) and missing in others (user_device). The only reliable way to get dates is from the directory and file location.

- Two entries in the top transactions by pin code for the state of Ladakh have no name - the pin codes are missing.

  ![](/images/pulse/ladakh_pincode_missing_2019.png)

  ![](/images/pulse/ladakh_pincode_missing_2020.png)
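
For reference, a minimal sketch of the conventional two-digit money representation mentioned above:

```python
from decimal import ROUND_HALF_UP, Decimal

# Round the arbitrary-precision amount to two decimal places
amount = Decimal("6611459.8729725825")
print(amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
# 6611459.87
```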

When releasing datasets, keep them simple and keep the users (data scientists, analysts) in mind.

Tweet Thread

<blockquote class="twitter-tweet"><p lang="en" dir="ltr">1. <a href="https://twitter.com/PhonePe_?ref_src=twsrc%5Etfw">@PhonePe_</a> recently released Pulse data from their payment data. It was hard to get an overview of the data without doing some data transformation. Here is a thread about data format, transformation, and feedback about data quality. <a href="https://t.co/7QP0RwnL1p">https://t.co/7QP0RwnL1p</a> 🧵</p>&mdash; kracekumar || கிரேஸ்குமார் (@kracetheking) <a href="https://twitter.com/kracetheking/status/1434613848347082756?ref_src=twsrc%5Etfw">September 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

## References:

- Pulse Repo: https://github.com/PhonePe/pulse
- Pulse Announcement Tweet: https://twitter.com/PhonePe_/status/1434054060148084736
- Pulse Plus Repo: https://github.com/kracekumar/pulse-plus
- Pulse SQLite DB: https://github.com/kracekumar/pulse-plus/blob/main/data/v1/pulse.db
- Datasette: https://datasette.io/

- [TIL - A new site](https://kracekumar.com/post/til/index.md): TIL - A new site
---
title: "TIL - A new site"
date: 2021-08-28T23:15:00+05:30
draft: false
tags: ["TIL"]
---

Quite often, as a programmer, I learn something new. Some learnings are utilitarian; some are philosophical; some are opinions on programming. I want to document these learnings for later use and also to remember them. So I'm starting a new site, [til.kracekumar.com](https://til.kracekumar.com), to record this learning. So far, there are [six posts](https://til.kracekumar.com).

The inspiration comes from [Simon Willison's TIL website](https://til.simonwillison.net).

![](/images/til.png)

## Why the new site?

Two reasons

1. I'm planning to write often; sometimes, the post will fit in just two tweets.
2. I don't want the existing followers of the site to see a lot of small and new content.

In case you'd like to follow the learnings and tips, [you can subscribe to the RSS feed](https://til.kracekumar.com/index.xml).

## Website Setup

- The site uses a static site generator, [Hugo](https://gohugo.io/).
- The repo resides in [Gitlab](https://gitlab.com/kracekumar/til).
- The analytics and visitor tracking using [plausible.io](https://plausible.io/).
- The custom website dark theme, [photophobia](https://github.com/kracekumar/photophobia) forked from [setsevireon/photophobia](https://github.com/setsevireon/photophobia).

## Links

- TIL: https://til.kracekumar.com
- Simon Willison's TIL: https://til.simonwillison.net
- RSS feed: https://til.kracekumar.com/index.xml
- Hugo: https://gohugo.io/
- Gitlab Repo: https://gitlab.com/kracekumar/til
- Plausible: https://plausible.io/
- Photophobia theme: https://github.com/setsevireon/photophobia

- [Python Typing Koans](https://kracekumar.com/post/python-typing-koans/index.md): Python Typing Koans
---
title: "Python Typing Koans"
date: 2021-06-27T16:00:00+05:30
draft: false
tags: ["python", "Django", "gradual-typing", "type-hints"]
---

Python 3 introduced type annotation syntax. [PEP 484](https://www.python.org/dev/peps/pep-0484/) introduced a provisional module to provide these standard definitions and tools, along with some conventions for situations where annotations are not available.

Python is a dynamic language and supports gradual typing. When a static type checker analyzes Python code, it treats code without type hints as `Any`.

```python
def print_name(name):
    print(name)

planet: str = "earth"
```

In the above example, the `name` argument's type will be `Any` since the type hint is missing, while the type hint for the `planet` variable is `str`.
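
Continuing the snippet above, a sketch of what mypy would report:

```python
print_name(42)  # accepted: `name` is implicitly Any
planet = 42     # error: Incompatible types in assignment
                # (expression has type "int", variable has type "str")
```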

Gradual typing is still an emerging topic in Python, and there is a gap in resources to educate the Python developers about the utility and Python typing concepts.

On the surface, it looks easy to annotate the code. But the dynamic nature makes a certain part of the code harder to annotate. I have been using type-hints over the past three years and find it hard sometimes. A lot of new developers also face the same problem.

## Koans

To make the learning easier and simpler, I have created a [GitHub repository](https://github.com/kracekumar/python-typing-koans).

The repository contains standalone Python programs with partial or no type hints. By adding new type hints and fixing the existing type-hint errors, the learner will understand how type checkers evaluate types and how the types at run time can differ.

Here is a simple demo of the command-line usage.

[![asciicast](https://asciinema.org/a/419119.svg)](https://asciinema.org/a/419119)

## Steps

1. Clone the repository. `git clone git@github.com:kracekumar/python-typing-koans.git`
2. Install all the dependencies (Python Poetry is advised; a virtualenv should also work): `poetry install`. It requires Python 3.9.4.
3. List all the koans using the command line program. `poetry run python cli.py list`
4. Pick up a file to learn.
5. Run the file with the command line program. `poetry run python cli.py one koans/py/100-easy-variable-wrong-type.py`
6. Repeat the process till there are no type errors.

*One central missing piece: how will the learner know how to fix the type errors?*

The comments in the files carry the links to relevant concepts, which aids the learner in understanding the ideas to use.

Screenshots of a few koans

![Python Callables](/images/python_typing_koans/python_callables.png)

![Easy Protocol](/images/python_typing_koans/protocol.png)

![Medium Protocol](/images/python_typing_koans/python_protocol_medium.png)


## Topics

Python topics covered are
- Primitive Types
- dictionaries - dict/typedict
- Callables
- Design pattern - factory pattern, the builder pattern
- Decorators
- Type Alias
- Protocol, classes, objects

20 Python programs (koans) help the learner to understand gradual typing. The filenames indicate the learning level: `easy`, `medium`, and `hard`.

The repository also contains Django and Django Rest Framework examples.

The Django koans teach annotating `views, models, model methods, queryset methods like filter, all, annotate, aggregate, Q object`, etc.

![Django Annotate](/images/python_typing_koans/django_annotate_aggregate.png)

![Django Queryset](/images/python_typing_koans/django_queryset.png)


The DRF koans teach how to annotate  `DRF serializers and DRF Views`.


![DRF View](/images/python_typing_koans/drf_view.png)

![DRF Serializer](/images/python_typing_koans/drf_serializer.png)


If you face any issues while solving the koans, please open an issue in the GitHub repository; I'd be happy to answer and explain the relevant concepts.

## Links

- PEP 484 - https://www.python.org/dev/peps/pep-0484/
- Github Repository - https://github.com/kracekumar/python-typing-koans
- Python typing documentation - https://docs.python.org/3/library/typing.html

- [Model Field - Django ORM Working - Part 2](https://kracekumar.com/post/django_model_fields_orm_part2/index.md): Model Field - Django ORM Working - Part 2
---
title: "Model Field - Django ORM Working - Part 2"
date: 2021-05-31T01:30:00+05:30
draft: false
tags: ["python", "Django", "series"]
---


The last post covered [the structure of the Django Model](https://kracekumar.com/post/structure_django_orm_working_part1/). This post covers how the model field works, along with some important methods, functionality, and properties of the field.

[Object-Relational Mapping](https://en.wikipedia.org/wiki/Object%E2%80%93relational_mapping) is a technique for declaring and querying database tables using object relationships in the programming language. Here is a sample model declaration in Django.


```python

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
```

Each class that inherits from `models.Model` becomes a table inside the SQL database unless explicitly marked as abstract. The `Question` model becomes the `<app_name>_question` table in the database. `question_text` and `pub_date` become columns in the table. The properties of each field are declared by instantiating the respective class. Below is the method resolution order for `CharField`.

```python
In [5]: models.CharField.mro()
Out[5]:
[django.db.models.fields.CharField,
 django.db.models.fields.Field,
 django.db.models.query_utils.RegisterLookupMixin,
 object]
```



`CharField` inherits `Field`, and `Field` inherits `RegisterLookupMixin`.



## High-level role of Field class



1. The role of the field class is to map the field type to a SQL database type.
2. Serialization - converting the Python object into the relevant SQL database value.
3. Deserialization - converting the SQL database value into a Python object.
4. Running the declared field-level validations and built-in checks before serializing the data. For example, in a `PositiveIntegerField` the value should be non-negative - a built-in constraint. (A small sketch of roles 2-4 follows this list.)
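
A tiny interactive sketch of roles 2, 3, and 4, assuming a configured Django environment:

```python
from django.db import models

field = models.IntegerField()
field.to_python("42")       # deserialization: '42' -> 42
field.get_prep_value(42.0)  # serialization towards the database: 42
field.run_validators(42)    # raises ValidationError if any validator fails
```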



## Structure of Field Class



```python
# Find out all the classes inheriting the Field

In [7]: models.Field.__subclasses__()
Out[7]:
[django.db.models.fields.BooleanField,
 django.db.models.fields.CharField,
 django.db.models.fields.DateField,
 django.db.models.fields.DecimalField,
 django.db.models.fields.DurationField,
 django.db.models.fields.FilePathField,
 django.db.models.fields.FloatField,
 django.db.models.fields.IntegerField,
 django.db.models.fields.IPAddressField,
 django.db.models.fields.GenericIPAddressField,
 django.db.models.fields.TextField,
 django.db.models.fields.TimeField,
 django.db.models.fields.BinaryField,
 django.db.models.fields.UUIDField,
 django.db.models.fields.json.JSONField,
 django.db.models.fields.files.FileField,
 django.db.models.fields.related.RelatedField,
 django.contrib.postgres.search.SearchVectorField,
 django.contrib.postgres.search.SearchQueryField,
 fernet_fields.fields.EncryptedField,
 enumchoicefield.fields.EnumChoiceField,
 django.contrib.postgres.fields.array.ArrayField,
 django.contrib.postgres.fields.hstore.HStoreField,
 django.contrib.postgres.fields.ranges.RangeField]
```



Here `fernet_fields` is a third-party library which implements `EncryptedField` by inheriting the `Field` class. These are only the direct subclasses; Django implements other higher-level fields which inherit the above fields.

For example, `EmailField` inherits `CharField`.

```python
In [10]: models.CharField.__subclasses__()
Out[10]:
[django.db.models.fields.CommaSeparatedIntegerField,
 django.db.models.fields.EmailField,
 django.db.models.fields.SlugField,
 django.db.models.fields.URLField,
 django.contrib.postgres.fields.citext.CICharField,
 django_extensions.db.fields.RandomCharField,
 django_extensions.db.fields.ShortUUIDField]
```

Here is the `Field` class initializer signature


```python
In [11]: models.Field?
Init signature:
models.Field(
    verbose_name=None,
    name=None,
    primary_key=False,
    max_length=None,
    unique=False,
    blank=False,
    null=False,
    db_index=False,
    rel=None,
    default=<class 'django.db.models.fields.NOT_PROVIDED'>,
    editable=True,
    serialize=True,
    unique_for_date=None,
    unique_for_month=None,
    unique_for_year=None,
    choices=None,
    help_text='',
    db_column=None,
    db_tablespace=None,
    auto_created=False,
    validators=(),
    error_messages=None,
)
```



The `Field` initializer accepts 22 arguments. Most of the arguments relate to SQL database column properties, and the rest are for the Django admin and model forms.

For example, Django provides an admin interface to browse the database records and edit them. The `blank` parameter determines whether the field is required while filling in data in the admin interface and custom forms. The `help_text` value is displayed alongside the form field.

The most commonly used arguments are `max_length, unique, blank, null, db_index, validators, default, auto_created`. The `null` attribute is a boolean; when set to `True`, it allows a null value while saving to the database. `db_index=True` creates a `B-Tree` index on the column. The `default` attribute stores the default value passed on to the database when the value for the field is missing.

The `validators` attribute contains the list of validators passed in by the user plus Django's internal validators. The job of a validator is to determine whether the value is valid. For example, in our `question_text` field declaration, `max_length` is set to `200`. When the field value is longer than 200 characters, Django raises `ValidationError`. The `max_length` attribute is useful only for text fields, and `MaxLengthValidator` will be missing in non-text fields.

```python
In [29]: from django.core.exceptions import ValidationError

In [30]: def allow_odd_validator(value):
    ...:     if value % 2 == 0:
    ...:         raise ValidationError(f'{value} is not odd number')
    ...:

In [31]: int_field = models.IntegerField(validators=[allow_odd_validator])

In [32]: int_field.validators
Out[32]:
[<function __main__.allow_odd_validator(value)>,
 <django.core.validators.MinValueValidator at 0x1305fdac0>,
 <django.core.validators.MaxValueValidator at 0x1305fda30>]

In [33]: # let's look into question_text field validators

In [38]: question_text.validators
Out[38]: [<django.core.validators.MaxLengthValidator at 0x12e767fa0>]
```



As long as the validator function or custom class doesn't raise an exception, the value is considered valid.

The details of each field can be found in the [Django model field reference](https://docs.djangoproject.com/en/3.2/ref/models/fields/).

## Field Methods



```python
In [41]: import inspect

In [44]: len(inspect.getmembers(models.Field, predicate=inspect.isfunction))
Out[44]: 59

In [45]: len(inspect.getmembers(models.Field, predicate=inspect.ismethod))
Out[45]: 6
```



The `Field` class consists of (along with inherited ones) 65 methods. Let's look at some of the important ones.

### to_python

The `to_python` method is responsible for converting the value passed to the model during initialization. For example, `to_python` for `IntegerField` will convert the value to a Python integer; the original value could be a `string` or `float`. Every field overrides the `to_python` method. Here is an example of a `to_python` invocation on an `IntegerField`.

```python
In [46]: int_field.to_python
Out[46]: <bound method IntegerField.to_python of <django.db.models.fields.IntegerField>>

In [47]: int_field.to_python('23')
Out[47]: 23

In [48]: int_field.to_python(23)
Out[48]: 23

In [49]: int_field.to_python(23.56)
Out[49]: 23
```



### get_db_prep_value

The `get_db_prep_value` method is responsible for converting a Python value to an SQL database-specific value. Each field may have a different implementation depending on the field type. For example, `Postgres` has a native `UUID` type, whereas in `SQLite` and `MySQL` Django uses `varchar(32)`. Here is the implementation of `get_db_prep_value` from `UUIDField`.

```python
def get_db_prep_value(self, value, connection, prepared=False):
    if value is None:
        return None
    if not isinstance(value, uuid.UUID):
        value = self.to_python(value)

    if connection.features.has_native_uuid_field:
        return value
    return value.hex
```

`connection` is a database connection or wrapper object for the underlying database. Below is example output from a `Postgres Connection` and an `SQLite Connection` for the native UUID field check.

```python
In [50]: from django.db import connection
    ...:

In [51]: connection
Out[51]: <django.utils.connection.ConnectionProxy at 0x10e3c8970>

In [52]: connection.features
Out[52]: <django.db.backends.postgresql.features.DatabaseFeatures at 0x1236a6a00>

In [53]: connection.features.has_native_uuid_field
Out[53]: True
```

```python
In [1]: from django.db import connection

In [2]: connection
Out[2]: <django.utils.connection.ConnectionProxy at 0x10fe3b4f0>

In [3]: connection.features
Out[3]: <django.db.backends.sqlite3.features.DatabaseFeatures at 0x110ba5d90>

In [4]: connection.features.has_native_uuid_field
Out[4]: False
```



One thing to note: Django uses the `psycopg2` driver for Postgres, and the driver takes care of UUID handling specific to Postgres, because a Python UUID object needs to be converted to a string or bytes before being sent to the Postgres server.

Similar to `get_db_prep_value`, `get_prep_value` converts a Python value to a query value.
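
For example, a quick sketch of `get_prep_value` on a `UUIDField` (interactive session, Django configured):

```python
from django.db import models

field = models.UUIDField()
# get_prep_value normalizes the string through to_python
field.get_prep_value("12345678123456781234567812345678")
# -> UUID('12345678-1234-5678-1234-567812345678')
```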

### formfield

Django supports `ModelForm`, which is a one-to-one mapping of an HTML form to a Django model. The Django admin uses `ModelForm`. The form consists of several fields, and each field in the form maps to a field in the model. So Django can automatically construct the form, with the list of validators, from the model fields.

Here is the implementation for the UUIDField.

```python
def formfield(self, **kwargs):
    return super().formfield(**{
        'form_class': forms.UUIDField,
        **kwargs,
    })
```

When you create a custom database field, you need to create a custom form field for it to work with the Django admin, and pass it as an argument to the superclass method.
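
A hedged sketch of that pattern; `HandField` and `HandFormField` are hypothetical names, following the same shape as the `UUIDField` implementation above:

```python
from django import forms
from django.db import models


class HandFormField(forms.CharField):
    """Hypothetical custom form field used by the model field below."""


class HandField(models.TextField):
    def formfield(self, **kwargs):
        # Hand the custom form class to the parent implementation
        return super().formfield(**{
            'form_class': HandFormField,
            **kwargs,
        })
```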

### deconstruct

The `deconstruct` method returns enough information to create an exact copy of the field. The method returns a tuple with four values:

- The first value is the `name` of the field passed during initialization. The default value is `None`.
- The import path of the field.
- The list of positional arguments passed during the field creation.
- The dictionary of keyword arguments passed during the field creation.

```python
In [62]: # Let's see the question_text deconstruct method return value

In [63]: question_text.deconstruct()
Out[63]: (None, 'django.db.models.CharField', [], {'max_length': 200})

In [65]: # let's create a new integer field with a name

In [66]: int_field = models.IntegerField(name='int_field', validators=[allow_odd_validator])

In [67]: int_field.deconstruct()
Out[67]:
('int_field',
 'django.db.models.IntegerField',
 [],
 {'validators': [<function __main__.allow_odd_validator(value)>]})

In [68]: models.IntegerField(**int_field.deconstruct()[-1])
Out[68]: <django.db.models.fields.IntegerField>

In [69]: int_2_field = models.IntegerField(default=2)

In [70]: int_2_field.deconstruct()
Out[70]: (None, 'django.db.models.IntegerField', [], {'default': 2})

```

Also when you implement a custom field, you can override the deconstruct method. Here is the `deconstruct` implementation for `UUIDField`.

```python
def deconstruct(self):
    name, path, args, kwargs = super().deconstruct()
    del kwargs['max_length']
    return name, path, args, kwargs
```



### `__init__`

The `__init__` method is a good place to override some of the default values. For example, a `UUIDField`'s `max_length` should always be `32` irrespective of the value passed in. In `DecimalField`, `max_digits` can be modified during initialization.

Here is the `UUIDField` initializer method implementation.

```python
def __init__(self, verbose_name=None, **kwargs):
    kwargs['max_length'] = 32
    super().__init__(verbose_name, **kwargs)
```

### db_type

The `db_type` method takes a Django connection as an argument and returns the database-specific implementation type for the field. Here is the output of `db_type` for Postgres and SQLite.

```python
In [72]: # Postgres

In [73]: uuid_field = models.UUIDField()

In [74]: uuid_field.db_type(connection)
Out[74]: 'uuid'
```



```python
In [8]: # Sqlite

In [9]: uuid_field = models.UUIDField()

In [10]: uuid_field.db_type(connection)
Out[10]: 'char(32)'
```

The `get_internal_type` method returns the internal field type name and is a companion to the `db_type` method. In practice, the mapping between Django field types and database column types is maintained as a class variable in `DatabaseWrapper`. You can find the [Django fields to Postgres types mapping in the backends module](https://github.com/django/django/blob/main/django/db/backends/postgresql/base.py). Below is the mapping taken from the source code.

```python
class DatabaseWrapper(BaseDatabaseWrapper):
    vendor = 'postgresql'
    display_name = 'PostgreSQL'
    # This dictionary maps Field objects to their associated PostgreSQL column
    # types, as strings. Column-type strings can contain format strings; they'll
    # be interpolated against the values of Field.__dict__ before being output.
    # If a column type is set to None, it won't be included in the output.
    data_types = {
        'AutoField': 'serial',
        'BigAutoField': 'bigserial',
        'BinaryField': 'bytea',
        'BooleanField': 'boolean',
        'CharField': 'varchar(%(max_length)s)',
        'DateField': 'date',
        'DateTimeField': 'timestamp with time zone',
        'DecimalField': 'numeric(%(max_digits)s, %(decimal_places)s)',
        'DurationField': 'interval',
        'FileField': 'varchar(%(max_length)s)',
        'FilePathField': 'varchar(%(max_length)s)',
        'FloatField': 'double precision',
        'IntegerField': 'integer',
        'BigIntegerField': 'bigint',
        'IPAddressField': 'inet',
        'GenericIPAddressField': 'inet',
        'JSONField': 'jsonb',
        'OneToOneField': 'integer',
        'PositiveBigIntegerField': 'bigint',
        'PositiveIntegerField': 'integer',
        'PositiveSmallIntegerField': 'smallint',
        'SlugField': 'varchar(%(max_length)s)',
        'SmallAutoField': 'smallserial',
        'SmallIntegerField': 'smallint',
        'TextField': 'text',
        'TimeField': 'time',
        'UUIDField': 'uuid',
    }
```

In the `data_types` dictionary, the keys are `get_internal_type` return values and the values are the Postgres column types.
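
A quick interactive check (assuming a configured Django shell):

```python
from django.db import models

# Each call returns the key used to look up data_types above
models.UUIDField().get_internal_type()               # 'UUIDField'
models.CharField(max_length=10).get_internal_type()  # 'CharField'
```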

I have skipped the implementation of the rest of the methods like `__reduce__` and `check`. You can go through the source code of Django fields on [GitHub](https://github.com/django/django/blob/main/django/db/models/fields/__init__.py), where you will also find the class variables and private method usages.

Django documentation has an excellent page on [how-to write custom model field](https://docs.djangoproject.com/en/3.2/howto/custom-model-fields/).

# Summary

1. `models.Field` is the root of all the model fields.
2. Field initializer takes configuration details like `name, default, db_index, null` for the database columns and `blank, help_text` for non-column features like Django model form and Django admin.
3. `__init__` method in the child class can override the user passed value and(or) set custom default value.
4. `validators` attribute in the field contains the user-defined validators and default validators specific to the field.
5. Every field needs to implement a few methods to work with the specific databases. Some of the useful methods are `to_python, get_db_prep_value, get_prep_value, deconstruct, formfield,  db_type`.
6. The Django `Connection` object or wrapper contains details and features of the underlying database.

- [Structure - Django ORM Working - Part 1](https://kracekumar.com/post/structure_django_orm_working_part1/index.md): Structure - Django ORM Working - Part 1
---
title: "Structure - Django ORM Working - Part 1"
date: 2021-05-15T18:30:00+05:30
draft: false
tags: ["python", "Django", "series"]
---

Django ORM hides a lot of complexity while developing a web application. The data model declaration and querying pattern are simplified, whereas it's structured differently behind the scenes. This series of blog posts will explain how the Django ORM works (not just converting Python code to SQL): model declaration, querying (manager, queryset), supporting multiple drivers, writing custom queries, migrations, etc.

Consider a model definition from the [Django tutorial](https://docs.djangoproject.com/en/3.2/intro/tutorial02/).

```python
from django.db import models


class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')


class Choice(models.Model):
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)
```



The Question and Choice model classes derive from `models.Model`. Inheriting `Model` signals to Django at run time that the class is a database model. The `Question` model (later converted to a table) contains two extra class variables, `question_text` and `pub_date`, which will be two columns in the table. Their type is indicated by creating an instance of the respective field type, here `models.CharField` and `models.DateTimeField`. Similar behavior applies to the Choice model.

### Querying

Let's display all the instances of `Question` (all the rows in the table - polls_question).

````python
>>> Question.objects.all()
<QuerySet []>
````

As expected, the table is empty, and the method call returns an empty result. The two things to notice are `objects.all()` and `QuerySet`. All the data flowing in and out of the database through the Django ORM goes through the `objects` interface. All the results are wrapped inside a QuerySet, even empty ones.

### From where does the `objects` instance come?

```python
>>> class Foo:
...     pass
...
>>> class Bar(Foo):
...     pass
...
>>> Bar.mro()
[<class 'Bar'>, <class 'Foo'>, <class 'object'>]
```



[Every Python class has a Method Resolution Order, which determines the behavior of class/method invocation](https://docs.python.org/3.8/library/stdtypes.html#class.mro). In the above example, the Bar class inherits the Foo class. The `mro` method returns the order from child to parent class: Bar inherits Foo, Foo inherits object. As you can see, the Foo class has no inherited class in the code, but the `mro` method says Foo inherits object. Every Python class inherits from `object`; hence the `mro` method returns `object` as the last parent in the list.

```python
>>> Question.mro()
[<class 'polls.models.Question'>, <class 'django.db.models.base.Model'>, <class 'object'>]
```

The MRO for `Question` is clear and returns the result as expected.

```python
>>> Bar
<class 'Bar'>
>>> Question
<class 'polls.models.Question'>
>>> Question.objects
<django.db.models.manager.Manager object at 0x10bd7f1c0>
>>> Question.objects.mro()
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'Manager' object has no attribute 'mro'
>>> Question.objects.__class__.mro()
[<class 'django.db.models.manager.Manager'>, <class 'django.db.models.manager.BaseManagerFromQuerySet'>, <class 'django.db.models.manager.BaseManager'>, <class 'object'>]
```



The [representation](https://docs.python.org/3/reference/datamodel.html#object.__repr__) of `Question.objects` is different from the representation of the `Bar` and `Question` classes. As the name indicates, `objects` is an instance of `Manager`. The `Manager` class inherits `BaseManagerFromQuerySet` and `BaseManager`.

```python
>>> Choice
<class 'polls.models.Choice'>
>>> Choice.objects
<django.db.models.manager.Manager object at 0x10bd7f0a0>
>>> Question.objects
<django.db.models.manager.Manager object at 0x10bd7f1c0>
>>> Choice.objects is Question.objects
False
>>> Choice.objects == Question.objects
True
```

What? Even though the instances have different ids, the equality test returns `True`.

```python
# It's declared in the Manager class

def __eq__(self, other):
    return (
        isinstance(other, self.__class__) and
        self._constructor_args == other._constructor_args
    )
```



The logic for checking equality has two pieces: both operands should be of the same type, and their constructor args should be the same. In this case (behind the scenes), both managers were constructed with empty arguments.

```python
>>> Question.objects._constructor_args
((), {})
>>> Choice.objects._constructor_args
((), {})
```



In the next post, I'll cover how `models.*Field` works.

### Summary

1. A Django model inherits `models.Model`, and all the class variables initialized with `models.*Field` automatically behave like columns in the table.
2. The interactions with the database happen through the model `Manager` via the `objects` attribute.

- [jut - render jupyter notebook in the terminal](https://kracekumar.com/post/jut/index.md): jut - render jupyter notebook in the terminal
---
title: "jut - render jupyter notebook in the terminal"
date: 2021-03-19T01:00:00+05:30
draft: false
tags: ["python", "CLI", "rich", "jupyter", "pet-project"]
---


The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more. The definition copied from the [official website](https://jupyter.org/).

It's becoming common to use Jupyter notebooks to [write books](https://github.com/fastai/fastbook), do data analysis, run reproducible experiments, etc. The file produced by the notebook follows a [JSON Schema](https://github.com/jupyter/nbformat/blob/master/nbformat/v4/nbformat.v4.schema.json). Yet to view the file, the user needs a [web application](https://github.com/jupyter/nbviewer), a local notebook instance, or a browser.

Here are the first 20 lines of a notebook file.

```shell
$cat 02.02-The-Basics-Of-NumPy-Arrays.ipynb | head -20
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--BOOK_INFORMATION-->\n",
    "<img align=\"left\" style=\"padding-right:10px;\" src=\"figures/PDSH-cover-small.png\">\n",
    "\n",
    "*This notebook contains an excerpt from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do) by Jake VanderPlas; the content is available [on GitHub](https://github.com/jakevdp/PythonDataScienceHandbook).*\n",
    "\n",
    "*The text is released under the [CC-BY-NC-ND license](https://creativecommons.org/licenses/by-nc-nd/3.0/us/legalcode), and code is released under the [MIT license](https://opensource.org/licenses/MIT). If you find this content useful, please consider supporting the work by [buying the book](http://shop.oreilly.com/product/0636920034919.do)!*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Understanding Data Types in Python](02.01-Understanding-Data-Types.ipynb) | [Contents](Index.ipynb) | [Computation on NumPy Arrays: Universal Functions](02.03-Computation-on-arrays-ufuncs.ipynb) >\n",
```

It's hard to follow the **content along with the schema**. When the code is checked into the repository, it's hard to view the content from the **command line**.

Over the weekend, I wrote a small command-line tool to render notebooks in the terminal with the help of the [fantastic Python library, rich](https://github.com/willmcgugan/rich).

### Tool

[jut](https://github.com/kracekumar/jut) is a command-line tool to display a Jupyter notebook in the terminal. The tool takes a Jupyter notebook file or URL (raw GitHub or any ipynb file link) and renders each cell's content in the terminal. You can install the package by running `pip install jut`.

Here is a quick **asciinema demo**

[![asciicast](https://asciinema.org/a/400349.svg)](https://asciinema.org/a/400349)

### Usage

### Display first five cells

Example: `jut --input-file foo.ipynb --head 5`

![https://raw.githubusercontent.com/kracekumar/jut/main/images/jut-head.png](https://raw.githubusercontent.com/kracekumar/jut/main/images/jut-head.png)

### Display last five cells

Example: `jut --input-file foo.ipynb --tail 5`

![https://raw.githubusercontent.com/kracekumar/jut/main/images/jut-tail.png](https://raw.githubusercontent.com/kracekumar/jut/main/images/jut-tail.png)

### Download the file and display first five cells

Example: `jut --url https://raw.githubusercontent.com/fastai/fastbook/master/06_multicat.ipynb --head 5`

![https://raw.githubusercontent.com/kracekumar/jut/main/images/jut-download.png](https://raw.githubusercontent.com/kracekumar/jut/main/images/jut-download.png)

### Limitation

The tool only displays **text content like markdown, Python code, Python output, raw text, and JSON**, and ignores multimedia content like images and PDF documents.

I hope the tool is a useful utility for folks working with Jupyter notebooks. The code is available on [GitHub](https://github.com/kracekumar/jut); feel free to file bug reports and feature requests.


### References

- Jupyter Project - https://jupyter.org/
- Github Page - https://github.com/kracekumar/jut
- Python Rich Library - https://github.com/willmcgugan/rich

- [Five reasons to use Py.test](https://kracekumar.com/post/five-reason-to-use-pytest/index.md): Five reasons to use Py.test
---
title: "Five reasons to use Py.test"
date: 2020-12-20T00:46:38+05:30
draft: false
tags: ["python", "pytest"]
---

[Pytest](https://docs.pytest.org/) library provides a better way to write tests, run the tests, and report the test results. This post is a comparison between the Python unit test standard library and pytest features and leaves out other libraries like nose2.

### TL;DR

- A single assert statement (`assert a == b`) over 40 different assert methods (`self.assertEqual, self.assertIsInstance, self.assertDictEqual`)
- Better and more detailed error messages on failure.
- Useful command-line options for discovering, running, and reporting tests like `--last-failed`, `--collect-only`.
- Pytest plugins for extending pytest's functionality and modifying default behavior: `pytest-mon, pytest-clarity, pytest-cov`
- Pytest fixtures for seed data and implementing custom test behaviors


# 1. Single assert statement over 40 different assert methods

Here is a sample unittest code

``` python
import unittest


class TestUnitTestShowCase(unittest.TestCase):
    def test_equal(self):
        v1 = "start"
        v2 = "start+"

        self.assertEqual(v1, v2)

    def test_dictionary(self):
        rust = {'name': 'Rust', 'released': 2010}
        python = {'name': 'Python', 'released': 1989}

        self.assertDictEqual(rust, python)

    def test_list(self):
        expected_genres = ['Novel', 'Literary Fiction']
        returned_genres = ['Novel', 'Popular Fiction']

        self.assertListEqual(expected_genres, returned_genres)
```

The `TestCase` class supports 40 different assert methods ([https://docs.python.org/3/library/unittest.html](https://docs.python.org/3/library/unittest.html)): the `assertDictEqual` method for comparing the equality of two dictionaries, the `assertListEqual` method for comparing the equality of two lists, and the `assertEqual` method as a superset of all comparisons. `assertEqual` can act on any two values that implement an equality check, so it's technically possible to use `assertEqual` over `assertDictEqual` and `assertListEqual`. It becomes quite daunting to remember which assert method to use. The one advantage of the specialized assert methods is that they check the arguments' types before comparing the values. For example, `self.assertDictEqual("", "")` fails because the first argument is not a dictionary.


Pytest recommends using the assert statement over any specialized function or method. Here is an example of pytest test cases.

``` python
def test_dictionary():
    rust = {'name': 'Rust', 'released': 2010}
    python = {'name': 'Python', 'released': 1989}

    assert rust == python

def test_list():
    expected_genres = ['Novel', 'Literary Fiction']
    returned_genres = ['Novel', 'Popular Fiction']

    assert expected_genres == returned_genres
```

One assert statement can check all types of equality. It's simple and easy to use and remember.



Pytest also supports executing tests that inherit `unittest.TestCase`, with either unittest assert methods or plain assert statements. You can also write pytest tests as plain functions.


# 2. Better Failure messages in Pytest


Consider two unittest tests: one checks that two dictionaries are equal, and the other that two data class instances are equal.

``` python
import unittest

from dataclasses import dataclass

@dataclass
class Book:
    name: str
    year: int
    author: str


class TestUnitTestShowCase(unittest.TestCase):
    def test_assert_equal(self):
        expected_json = {'name': 'Haruki Murakami',
                         'language': 'Japanese',
                         'title': 'Windup Bird Chronicle',
                         'year_of_release': 1994,
                         'page_count': 607,
                         'genres': ['Novel', 'Science Fiction',
                                    'Pyschological Fiction']}
        return_json = {'name': 'Haruki Murakami',
                       'language': 'Japanese',
                       'title': 'Kafka on the shore',
                       'year_of_release': 2002,
                       'page_count': 505,
                       'genres': ['Novel', 'Magical Realism',
                                  'Fantasy Fiction']}

        self.assertDictEqual(return_json, expected_json)

    def test_dataclass(self):
        windup = Book(name='Windup Bird Chronicle', year=1994,
                      author='Haruki Murakami')
        kafka = Book(name='Kafka on the shore', year=2002,
                     author='Haruki Murakami')

        self.assertEqual(windup, kafka)


if __name__ == "__main__":
    unittest.main()
```

Output

``` python
$python test_unittest.py
FF
======================================================================
FAIL: test_assert_equal (__main__.TestUnitTestShowCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_unittest.py", line 41, in test_assert_equal
    self.assertDictEqual(return_json, expected_json)
AssertionError: {'nam[52 chars]e': 'Kafka on the shore', 'year_of_release': 2[77 chars]on']} != {'nam[52 chars]e': 'Windup Bird Chronicle', 'year_of_release'[86 chars]on']}
- {'genres': ['Novel', 'Magical Realism', 'Fantasy Fiction'],
+ {'genres': ['Novel', 'Science Fiction', 'Pyschological Fiction'],
   'language': 'Japanese',
   'name': 'Haruki Murakami',
-  'page_count': 505,
?                ^ ^

+  'page_count': 607,
?                ^ ^

-  'title': 'Kafka on the shore',
+  'title': 'Windup Bird Chronicle',
-  'year_of_release': 2002}
?                     ^^^^

+  'year_of_release': 1994}
?                     ^^^^


======================================================================
FAIL: test_dataclass (__main__.TestUnitTestShowCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_unittest.py", line 49, in test_dataclass
    self.assertEqual(windup, kafka)
AssertionError: Book(name='Windup Bird Chronicle', year=1994, author='Haruki Murakami') != Book(name='Kafka on the shore', year=2002, author='Haruki Murakami')

----------------------------------------------------------------------
Ran 2 tests in 0.002s

FAILED (failures=2)
```

The differing values of the first argument start with a minus sign, and those of the second argument start with a plus sign.

When two dictionary values differ, the error message prints each key and value. Even though the deviating values are printed, it's not straightforward to grasp the test runner's output.

When the diff gets longer, unittest truncates the error message. The behavior is painful when a test fails in CI, which takes a long time to run and rerun after changing the configuration.

Here is the modified `test_assert_equal`.

``` python
    def test_assert_equal(self):
        expected_json = {'name': 'Haruki Murakami',
                         'language': 'Japanese',
                         'title': 'Windup Bird Chronicle',
                         'year_of_release': 1994,
                         'page_count': 607,
                         'genres': ['Novel', 'Science Fiction',
                                    'Pyschological Fiction'],
                         'translations': {'en': ['Jay Rubin'],
                                          'translators': {'name': 'Jay Rubin',
                                                          'location': 'tokyo'}}}
        return_json = {'name': 'Haruki Murakami',
                       'language': 'Japanese',
                       'title': 'Kafka on the shore',
                       'year_of_release': 2002,
                       'page_count': 505,
                       'genres': ['Novel', 'Magical Realism',
                                  'Fantasy Fiction'],
                       'translations': {'ta': ['Nilavan']}}

        self.assertDictEqual(return_json, expected_json)
```


Output

``` python
$python test_unittest.py
FF
======================================================================
FAIL: test_assert_equal (__main__.TestUnitTestShowCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test_unittest.py", line 45, in test_assert_equal
    self.assertDictEqual(return_json, expected_json)
AssertionError: {'nam[52 chars]e': 'Kafka on the shore', 'year_of_release': 2[114 chars]n']}} != {'nam[52 chars]e': 'Windup Bird Chronicle', 'year_of_release'[184 chars]o'}}}
Diff is 696 characters long. Set self.maxDiff to None to see it.
...
```

Output after setting `maxDiff=None`

``` python
...
AssertionError: {'nam[52 chars]e': 'Kafka on the shore', 'year_of_release': 2[114 chars]n']}} != {'nam[52 chars]e': 'Windup Bird Chronicle', 'year_of_release'[184 chars]o'}}}
- {'genres': ['Novel', 'Magical Realism', 'Fantasy Fiction'],
+ {'genres': ['Novel', 'Science Fiction', 'Pyschological Fiction'],
   'language': 'Japanese',
   'name': 'Haruki Murakami',
-  'page_count': 505,
?                ^ ^

+  'page_count': 607,
?                ^ ^

-  'title': 'Kafka on the shore',
+  'title': 'Windup Bird Chronicle',
-  'translations': {'ta': ['Nilavan']},
?                    ^^     ^ ----   -

+  'translations': {'en': ['Jay Rubin'],
?                    ^^     ^^^^^^^

+                   'translators': {'location': 'tokyo', 'name': 'Jay Rubin'}},
-  'year_of_release': 2002}
?                     ^^^^

+  'year_of_release': 1994}
?                     ^^^^

```

Now there is a new key, `translations`, inside the dictionary; hence the error message is longer.
The `translators` details are missing in one of the dictionaries, and it's hard to spot that from the error message.

The failure message for the second test, `test_dataclass`, doesn't say which attributes differ.

``` python
AssertionError: Book(name='Windup Bird Chronicle', year=1994, author='Haruki Murakami')
!= Book(name='Kafka on the shore', year=2002, author='Haruki Murakami')

```

It's hard to figure out which attribute values differ when the data class contains 15 attributes.

On the other hand, pytest error messages are clear and explicitly state the differences, with more detail at each verbosity level.

Here are the same test cases written for pytest.

``` python
from dataclasses import dataclass

@dataclass
class Book:
    name: str
    year: int
    author: str


def test_assert_equal():
    expected_json = {'name': 'Haruki Murakami',
                     'language': 'Japanese',
                     'title': 'Windup Bird Chronicle',
                     'year_of_release': 1994,
                     'page_count': 607,
                     'genres': ['Novel', 'Science Fiction',
                                'Pyschological Fiction']}
    return_json = {'name': 'Haruki Murakami',
                   'language': 'Japanese',
                   'title': 'Kafka on the shore',
                   'year_of_release': 2002,
                   'page_count': 505,
                   'genres': ['Novel', 'Magical Realism',
                              'Fantasy Fiction']}
    assert  return_json == expected_json

def test_dataclass():
    windup = Book(name='Windup Bird Chronicle', year=1994,
                  author='Haruki Murakami')
    kafka = Book(name='Kafka on the shore', year=2002,
                 author='Haruki Murakami')

    assert windup == kafka
```

Output

``` python
$pytest test_pytest.py
======================================================================================================================================== test session starts =========================================================================================================================================
platform darwin -- Python 3.8.5, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/user/code/personal/why-pytest
collected 2 items

test_pytest.py FF                                                                                                                                                                                                                                                                              [100%]

============================================================================================================================================== FAILURES ==============================================================================================================================================
_________________________________________________________________________________________________________________________________________ test_assert_equal __________________________________________________________________________________________________________________________________________

    def test_assert_equal():
        expected_json = {'name': 'Haruki Murakami',
                         'language': 'Japanese',
                         'title': 'Windup Bird Chronicle',
                         'year_of_release': 1994,
                         'page_count': 607,
                         'genres': ['Novel', 'Science Fiction',
                                    'Pyschological Fiction']}
        return_json = {'name': 'Haruki Murakami',
                       'language': 'Japanese',
                       'title': 'Kafka on the shore',
                       'year_of_release': 2002,
                       'page_count': 505,
                       'genres': ['Novel', 'Magical Realism',
                                  'Fantasy Fiction']}
>       assert  return_json == expected_json
E       AssertionError: assert {'genres': ['...nt': 505, ...} == {'genres': ['...nt': 607, ...}
E         Omitting 2 identical items, use -vv to show
E         Differing items:
E         {'genres': ['Novel', 'Magical Realism', 'Fantasy Fiction']} != {'genres': ['Novel', 'Science Fiction', 'Pyschological Fiction']}
E         {'year_of_release': 2002} != {'year_of_release': 1994}
E         {'page_count': 505} != {'page_count': 607}
E         {'title': 'Kafka on the shore'} != {'title': 'Windup Bird Chronicle'}
E         Use -v to get the full diff

test_pytest.py:25: AssertionError
___________________________________________________________________________________________________________________________________________ test_dataclass ___________________________________________________________________________________________________________________________________________

    def test_dataclass():
        windup = Book(name='Windup Bird Chronicle', year=1994,
                      author='Haruki Murakami')
        kafka = Book(name='Kafka on the shore', year=2002,
                     author='Haruki Murakami')

>       assert windup == kafka
E       AssertionError: assert Book(name='Wi...uki Murakami') == Book(name='Ka...uki Murakami')
E
E         Omitting 1 identical items, use -vv to show
E         Differing attributes:
E         ['name', 'year']
E
E         Drill down into differing attribute name:
E           name: 'Windup Bird Chronicle' != 'Kafka on the shore'...
E
E         ...Full output truncated (6 lines hidden), use '-vv' to show

test_pytest.py:46: AssertionError
====================================================================================================================================== short test summary info =======================================================================================================================================
FAILED test_pytest.py::test_assert_equal - AssertionError: assert {'genres': ['...nt': 505, ...} == {'genres': ['...nt': 607, ...}
FAILED test_pytest.py::test_dataclass - AssertionError: assert Book(name='Wi...uki Murakami') == Book(name='Ka...uki Murakami')
========================================================================================================================================= 2 failed in 0.05s ==========================================================================================================================================
```
The failure output contains not only the error message but also the entire function definition, along with enough mismatch information to understand the failure.

Let's see the `test_dataclass` error.

- `Omitting 1 identical items, use -vv to show` - Out of all the attributes, one attribute holds an identical value in both objects.

- `Differing attributes: ['name', 'year']` - Names the differing attributes.

- `Drill down into differing attribute name:` - Next, the error message drills down into the differing attribute's values.

- `name: 'Windup Bird Chronicle' != 'Kafka on the shore'...` - States the attribute name and prints the two compared values.

- `...Full output truncated (6 lines hidden), use '-vv' to show` - Pytest truncates the remaining error message but says
how many more lines are hidden, which is quite useful; the developer can decide to rerun with the `-vv` flag.
After passing the `-vv` flag, the output is richer.

``` python
    def test_dataclass():
        windup = Book(name='Windup Bird Chronicle', year=1994,
                      author='Haruki Murakami')
        kafka = Book(name='Kafka on the shore', year=2002,
                     author='Haruki Murakami')

>       assert windup == kafka
E       AssertionError: assert Book(name='Windup Bird Chronicle', year=1994, author='Haruki Murakami') == Book(name='Kafka on the shore', year=2002, author='Haruki Murakami')
E
E         Matching attributes:
E         ['author']
E         Differing attributes:
E         ['name', 'year']
E
E         Drill down into differing attribute name:
E           name: 'Windup Bird Chronicle' != 'Kafka on the shore'
E           - Kafka on the shore
E           + Windup Bird Chronicle
E
E         Drill down into differing attribute year:
E           year: 1994 != 2002
E           +1994
E           -2002
```

For a dictionary mismatch, pytest prints each key and value in the same way. With these varying verbosity levels, it's quicker for developers to fix the errors.

``` python
# without verbose error

E       AssertionError: assert {'genres': ['...nt': 505, ...} == {'genres': ['...nt': 607, ...}
E         Omitting 2 identical items, use -vv to show
E         Differing items:
E         {'genres': ['Novel', 'Magical Realism', 'Fantasy Fiction']} != {'genres': ['Novel', 'Science Fiction', 'Pyschological Fiction']}
E         {'year_of_release': 2002} != {'year_of_release': 1994}
E         {'page_count': 505} != {'page_count': 607}
E         {'title': 'Kafka on the shore'} != {'title': 'Windup Bird Chronicle'}
E         Use -v to get the full diff

# with verbose error

E       AssertionError: assert {'genres': ['Novel', 'Magical Realism', 'Fantasy Fiction'],\n 'language': 'Japanese',\n 'name': 'Haruki Murakami',\n 'page_count': 505,\n 'title': 'Kafka on the shore',\n 'translations': {'ta': ['Nilavan']},\n 'year_of_release': 2002} == {'genres': ['Novel', 'Science Fiction', 'Pyschological Fiction'],\n 'language': 'Japanese',\n 'name': 'Haruki Murakami',\n 'page_count': 607,\n 'title': 'Windup Bird Chronicle',\n 'translations': {'en': ['Jay Rubin'],\n                  'translators': {'location': 'tokyo', 'name': 'Jay Rubin'}},\n 'year_of_release': 1994}
E         Common items:
E         {'language': 'Japanese', 'name': 'Haruki Murakami'}
E         Differing items:
E         {'year_of_release': 2002} != {'year_of_release': 1994}
E         {'genres': ['Novel', 'Magical Realism', 'Fantasy Fiction']} != {'genres': ['Novel', 'Science Fiction', 'Pyschological Fiction']}
E         {'title': 'Kafka on the shore'} != {'title': 'Windup Bird Chronicle'}
E         {'page_count': 505} != {'page_count': 607}
E         {'translations': {'ta': ['Nilavan']}} != {'translations': {'en': ['Jay Rubin'], 'translators': {'location': 'tokyo', 'name': 'Jay Rubin'}}}
E         Full diff:
E           {
E            'genres': ['Novel',
E         +             'Magical Realism',
E         -             'Science Fiction',
E         ?              ^^^^ ^^
E         +             'Fantasy Fiction'],
E         ?              ^^ ^^^^         +
E         -             'Pyschological Fiction'],
E            'language': 'Japanese',
E            'name': 'Haruki Murakami',
E         -  'page_count': 607,
E         ?                ^ ^
E         +  'page_count': 505,
E         ?                ^ ^
E         -  'title': 'Windup Bird Chronicle',
E         +  'title': 'Kafka on the shore',
E         -  'translations': {'en': ['Jay Rubin'],
E         ?                    ^^     ^ ^^^^^^
E         +  'translations': {'ta': ['Nilavan']},
E         ?                    ^^     ^^^ ^^   +
E         -                   'translators': {'location': 'tokyo',
E         -                                   'name': 'Jay Rubin'}},
E         -  'year_of_release': 1994,
E         ?                     ^^^^
E         +  'year_of_release': 2002,
E         ?                     ^^^^
E           }
```

# 3. Useful command line options

Pytest comes with a lot of useful command-line options for `execution, discovering tests, reporting, debugging, and logging`.

Pytest has an option `--last-failed` which runs only the tests that failed during the last execution.

Here is an example where one test fails.

```python
$ pytest test_pytest.py
======================================================================================================================================== test session starts =========================================================================================================================================
platform darwin -- Python 3.8.5, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/user/code/personal/why-pytest
collected 2 items

test_pytest.py F.                                                                                                                                                                                                                                                                              [100%]

============================================================================================================================================== FAILURES ==============================================================================================================================================
_________________________________________________________________________________________________________________________________________ test_assert_equal __________________________________________________________________________________________________________________________________________

    def test_assert_equal():
..
E         ...Full output truncated (2 lines hidden), use '-vv' to show

test_pytest.py:29: AssertionError
====================================================================================================================================== short test summary info =======================================================================================================================================
FAILED test_pytest.py::test_assert_equal - AssertionError: assert {'genres': ['...nt': 505, ...} == {'genres': ['...nt': 607, ...}
==================================================================================================================================== 1 failed, 1 passed in 0.05s =====================================================================================================================================
```

The short summary says `1 failed, 1 passed in 0.05s`. Next time, running `pytest --last-failed test_pytest.py` executes only the failed test from the previous run.

```python
$pytest --last-failed test_pytest.py
======================================================================================================================================== test session starts =========================================================================================================================================
platform darwin -- Python 3.8.5, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/user/code/personal/why-pytest
collected 2 items / 1 deselected / 1 selected
run-last-failure: rerun previous 1 failure

test_pytest.py F                                                                                                                                                                                                                                                                               [100%]

============================================================================================================================================== FAILURES ==============================================================================================================================================
_________________________________________________________________________________________________________________________________________ test_assert_equal __________________________________________________________________________________________________________________________________________

    def test_assert_equal():
...
test_pytest.py:29: AssertionError
====================================================================================================================================== short test summary info =======================================================================================================================================
FAILED test_pytest.py::test_assert_equal - AssertionError: assert {'genres': ['...nt': 505, ...} == {'genres': ['...nt': 607, ...}
================================================================================================================================== 1 failed, 1 deselected in 0.04s ===================================================================================================================================
```

The short summary now says `1 failed, 1 deselected in 0.04s`.
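
Two related cache-backed flags are worth knowing alongside `--last-failed`:

```bash
$pytest --failed-first test_pytest.py   # run previous failures first, then the rest
$pytest --cache-clear test_pytest.py    # forget recorded failures and start fresh
```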

`pytest --collect-only` collects all the test files and test functions/classes in the path.
While reporting, `pytest` flags files with the same name across different directories.
For example, if there is a file `test_models.py` in both the unit test directory and the integration test directory,
pytest refuses to run the tests and complains during the collection phase.
Test file names should be unique across the project.

```python
$tree
.
├── test
│   └── test_unittest.py
├── test_pytest.py
└── test_unittest.py

1 directory, 3 files

$pytest --collect-only
======================================================================================================================================== test session starts =========================================================================================================================================
platform darwin -- Python 3.8.5, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: /Users/user/code/personal/why-pytest
collected 4 items / 1 error / 3 selected

<Module test_pytest.py>
  <Function test_assert_equal>
  <Function test_dataclass>
<Module test_unittest.py>
  <UnitTestCase TestUnitTestShowCase>
    <TestCaseFunction test_assert_equal>
    <TestCaseFunction test_dataclass>

=============================================================================================================================================== ERRORS ===============================================================================================================================================
_______________________________________________________________________________________________________________________________ ERROR collecting test/test_unittest.py _______________________________________________________________________________________________________________________________
import file mismatch:
imported module 'test_unittest' has this __file__ attribute:
  /Users/user/code/personal/why-pytest/test_unittest.py
which is not the same as the test file we want to collect:
  /Users/user/code/personal/why-pytest/test/test_unittest.py
HINT: remove __pycache__ / .pyc files and/or use a unique basename for your test file modules
====================================================================================================================================== short test summary info =======================================================================================================================================
ERROR test/test_unittest.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
```
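
Apart from renaming the file, a common fix is to turn the test directories into packages so the module names no longer clash; pytest then imports the nested file as `test.test_unittest`:

```bash
$touch test/__init__.py
$pytest --collect-only    # the nested module now collects without the mismatch error
```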

# 4. Pytest plugins

Pytest is built on the [pluggy framework](https://pluggy.readthedocs.io/en/latest/)
as a set of composable and extensible plugins.
As a result, any developer can extend the functionality of the pytest runner and of the tests themselves.
There are more than 300 plugins available.

Earlier in the blog post, I showed how the pytest error message text is better compared to `unittest`.
Still, the default error rendering doesn't support colorized diffs or side-by-side comparison.

The two plugins `pytest-icdiff` and `pytest-clarity` improve the error message rendering and text for better and quicker understanding.

A sample pytest-icdiff output

<img src="https://raw.githubusercontent.com/hjwp/pytest-icdiff/master/example_output.png">

A sample `pytest-clarity` output with enhanced error messages

<img src="https://raw.githubusercontent.com/darrenburns/pytest-clarity/master/pytest-clarity.png">
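
Both are ordinary pytest plugins: installing them is enough for pytest to auto-discover them, and `pytest-clarity` renders its enhanced output when the tests run with `-vv`:

```bash
$pip install pytest-icdiff pytest-clarity
$pytest -vv test_pytest.py
```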

### Reporting

- `pytest-cov`, [https://github.com/pytest-dev/pytest-cov](https://github.com/pytest-dev/pytest-cov)
is the pytest integration for the coverage plugin. It supports generating coverage reports, including HTML reports.

- `pytest-html`, [https://github.com/pytest-dev/pytest-html/](https://github.com/pytest-dev/pytest-html/) is a plugin for generating HTML test results.
On failure, the plugin takes screenshots of the selenium test browser.

- `pytest-xdist`, [https://pypi.org/project/pytest-xdist/](https://pypi.org/project/pytest-xdist/)
runs the tests across multiple CPUs rather than in a single process.
The fixtures may need a little change depending on the implementation.

- The default test discovery process works in most cases, but for larger
projects that take significant time to run, running the new tests first saves a lot of time.
When a test fails, the run can be stopped and rerun after fixing the error.
When running on a CI server, this functionality comes in handy for three reasons - quicker feedback,
saving a few dollars, and reducing CI wait time for pending PRs.
`pytest-testmon`, [https://testmon.org/determining-affected-tests.html](https://testmon.org/determining-affected-tests.html)
uses coverage data to find the modified code and schedules the affected tests to run first.

# 5. Fixtures

Fixtures, [https://docs.pytest.org/en/latest/fixture.html#fixture](https://docs.pytest.org/en/latest/fixture.html#fixture),
are seed data and resources for the tests. For example, to test an API endpoint,
the test may need a `user account, its associated company, and billing details`.
With fixtures, setting up the data and deleting it after the tests is easier.
Pytest supports five different scopes for these fixtures - `function, class, module, package, and session`.
Here is an example from the docs of an SMTP connection fixture with module scope.

```python
# conftest.py
import smtplib

import pytest


@pytest.fixture(scope="module")
def smtp_connection():
    return smtplib.SMTP("smtp.gmail.com", 587, timeout=5)

# test_module.py

def test_ehlo(smtp_connection):
    response, msg = smtp_connection.ehlo()
    assert response == 250
    assert b"smtp.gmail.com" in msg
    assert 0  # for demo purposes
```
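
The same fixture can also own its teardown by yielding the resource instead of returning it - a minimal sketch of the pattern:

```python
import smtplib

import pytest

@pytest.fixture(scope="module")
def smtp_connection():
    connection = smtplib.SMTP("smtp.gmail.com", 587, timeout=5)
    yield connection      # every test in the module reuses this connection
    connection.close()    # teardown runs once, after the last test
```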

Apart from creating seed data for the database, it's possible to create mocks or factories for testing with the help of fixtures.

- `pytest-asyncio` provides a set of asyncio markers and fixtures for running asyncio tests,
https://pypi.org/project/pytest-asyncio/.

- `pytest-django` provides a set of Django-specific helpers and fixtures for
writing Django application test cases. The `pytest.mark.django_db` marker allows
only the marked test cases to access the database.
This is a handy feature for separating database access. The `django_assert_max_num_queries`
helper asserts that a test function or method makes at most `n` database queries.
There are quite a few handy helpers in the package, [pytest-django](https://pytest-django.readthedocs.io/en/latest/index.html).
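
A short sketch of the `pytest.mark.django_db` marker in use, assuming a Django project with the stock `User` model:

```python
import pytest
from django.contrib.auth.models import User

@pytest.mark.django_db
def test_create_user():
    # without the marker, any database access in the test raises an error
    user = User.objects.create_user(username="murakami")
    assert user.username == "murakami"
```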

Overall, pytest is a powerful, feature-rich library for writing better test cases.
The library favors functions and decorators to implement and extend its core features.
Writing and understanding fixtures involves a bit of a learning curve;
at the same time, pytest fixtures are scalable and robust.
The other powerful feature is function [parameterization](https://docs.pytest.org/en/stable/parametrize.html), which can save a lot of boilerplate code.
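
As a taste of it, a minimal `parametrize` sketch - each tuple becomes a separately reported test case:

```python
import pytest

@pytest.mark.parametrize("word, expected", [
    ("malayalam", True),
    ("murakami", False),
])
def test_is_palindrome(word, expected):
    assert (word == word[::-1]) == expected
```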

In general, pytest is a far more powerful, extensible, and configurable testing
framework compared to the `unittest` framework in the standard library.
You can still inherit from `TestCase` and use pytest as the test runner.
It's worth investing the time to learn and use it.
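
For example, the `unittest.TestCase`-based file from earlier in the post runs under the pytest runner unchanged:

```bash
$pytest test_unittest.py    # pytest collects and runs TestCase subclasses too
```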



### Links from the post

- Pytest, https://docs.pytest.org/
- Standard unittest, https://docs.python.org/3/library/unittest.html
- Pluggy Framework, https://pluggy.readthedocs.io/en/latest/
- pytest-icdiff, https://github.com/hjwp/pytest-icdiff
- pytest-clarity, https://github.com/darrenburns/pytest-clarity
- pytest-testmon, https://testmon.org/determining-affected-tests.html
- pytest-cov, https://github.com/pytest-dev/pytest-cov
- pytest-xdist, https://pypi.org/project/pytest-xdist
- Fixtures, https://docs.pytest.org/en/latest/fixture.html#fixture
- pytest-asyncio, https://pypi.org/project/pytest-asyncio/
- pytest-django, https://pytest-django.readthedocs.io/en/latest/index.html
- parameterization, https://docs.pytest.org/en/stable/parametrize.html

- [Build Plugins with Pluggy](https://kracekumar.com/post/build_plugins_with_pluggy/index.md): Build Plugins with Pluggy
---
title: "Build Plugins with Pluggy"
date: 2020-10-18T20:46:38+05:30
draft: false
tags: ["python", "pluggy", "plugin architecture"]
---

### Introduction

The blog post is a write-up of my two talks from [PyGotham](http://pygotham.tv) and [PyCon India](https://in.pycon.org/2020/) titled `Build Plugins with Pluggy`. The write-up covers a trivial use-case, discusses why a plugin-based architecture is a good fit, what plugin-based architecture is, how to develop a plugin-based architecture using pluggy, and how pluggy works.

[Link to PyCon India 2020 Talk](https://www.youtube.com/watch?v=uwM7OgWLCPE)

<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/uwM7OgWLCPE" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>


### Trivial Use Case

For the scope of the blog post, consider a command-line application that queries the `gutenberg` service, processes the data, and displays the relevant information. Let's see how to build such an application using pluggy.

Here is the JSON output from the application.

```python
 $python host.py search -t  "My Bondage and My Freedom"
[
    {
        "bookshelves": [
            "African American Writers",
            "Slavery"
        ],
        "copyright": false,
        "download_count": 1538,
        "media_type": "Text",
        "name": "Douglass, Frederick",
        "title": "My Bondage and My Freedom",
        "xml": "http://www.gutenberg.org/ebooks/202.rdf"
    }
]
```

### Normal code

![Build%20Plugins%20with%20Pluggy%203e282afb83124aa3a24625f192178932/Normal_Architecture.png](/images/plugin/Normal_Architecture.png)

The application has three parts - **user input processor, details gatherer, and result renderer.**

Below is the code:

```python
import click
import requests
import json
from pygments import highlight, lexers, formatters

def colorize(formatted_json):
    return highlight(
        formatted_json.encode("UTF-8"),
        lexers.JsonLexer(),
        formatters.TerminalFormatter(),
    )

def print_output(resp, kwargs):
    data = resp.json()
    table = [
        {
            "name": result["authors"][0]["name"],
            "bookshelves": result["bookshelves"],
            "copyright": result["copyright"],
            "download_count": result["download_count"],
            "title": result["title"],
            "media_type": result["media_type"],
            "xml": result["formats"]["application/rdf+xml"],
        }
        for result in data["results"]
    ]
    if kwargs.get('format', '') == 'json':
        indent = kwargs.get("indent", 4)
        formatted_json = json.dumps(table, sort_keys=True, indent=indent)
        if kwargs.get('colorize'):
            print(colorize(formatted_json))
        else:
            print(formatted_json)
    # TODO: Add YAML Format
    # TODO: Add Tabular Format

class Search:
    def __init__(self, term, kwargs):
        self.term = term
        self.kwargs = kwargs

    def make_request(self):
        resp = requests.get(f"http://gutendex.com/books/?search={self.term}")
        return resp

    def run(self):
        resp = self.make_request()
        print_output(resp, self.kwargs)

@click.group()
def cli():
    pass

@cli.command()
@click.option("--title", "-t", type=str, help="Title to search")
@click.option("--author", "-a", type=str, help="Author to search")
@click.option("--format", "-f", type=str, help="Output format", default='json')
def search(title, author, **kwargs):
    if not (title or author):
        print("Pass either --title or --author")
        exit(-1)
    else:
        search = Search(title or author, kwargs)
        search.run()

if __name__ == '__main__':
    cli()
```

The `print_output` function supports one output format. It's easy to add one more format. But when the application is a library, `print_output` suffers from a few issues while supporting more output renderers. It's hard for the developer to support every format end-users might request, and painful to extend the function for each new format. One way to extend the functionality is to re-architect the code to follow a plugin-based architecture.

### What are plugins?

**noun**: plug-in is a software component that adds a specific feature to an existing computer program.

A plugin is a software component that enhances or modifies the behavior of a program at run-time. For example, a Google Chrome extension or a Firefox add-on changes the behavior of the browser or adds functionality to it. Browser extensions are a good example of plugin-based architecture.

![Build%20Plugins%20with%20Pluggy%203e282afb83124aa3a24625f192178932/Plugin_Architecture_.png](/images/plugin/Plugin_Architecture_.png)

In general, plugin architecture has two main components - **host/caller/core system** and **plugin/hook**. The host or core system is responsible for calling the plugin or hook at registered functionality.

### Pluggy introduction

Pluggy is a Python library that provides a structured way to manage and discover plugins, and enables hooks to change the host program's behavior at runtime.

Here is the code structure of the application.

```python
$tree
.
├── LICENSE
├── README.md
├── hookspecs.py
├── host.py
├── output.py
├── requirements.txt
└── tests.py
```

Apart from the test file, there are three Python files. Before getting to know what these three files do, let's get familiar with the pluggy concepts.

- **Host Program/Core system** - `host.py` is the core system that orchestrates the program flow by discovering, registering, and calling the plugins.

    ```python
    class Search:
        def __init__(self, term, hook, kwargs):
            # initializes the attrs

        def make_request(self):
            # makes the request to gutenberg URL

        def run(self):
            # co-ordinates the flow

    def get_plugin_manager():
        # plugin spec, implementation registration

    @click.group()
    def cli():
        pass

    @cli.command()
    # click options
    def search(title, author, **kwargs):
        # validates the user input, manages search workflow

    def setup():
        pm = get_plugin_manager()
        pm.hook.get_click_group(group=cli)

    if __name__ == "__main__":
        setup()
        cli()
    ```

- **Plugin** - The file `output.py` implements the plugin[s] logic.
- **Plugin Manager (instance in host.py)** - The plugin manager is responsible for registering and managing the plugins.
- **Hook Specification (hookspecs.py)** - The hook specification is the blueprint or contract for the plugin: a Python function or method with an empty body.
- **Hook Implementation (function/method in output.py)** - The hook implementation carries the hook logic.

### Pluggy walkthrough

![Build%20Plugins%20with%20Pluggy%203e282afb83124aa3a24625f192178932/Plugin_Flow.png](/images/plugin/Plugin_Flow.png)

The plugin workflow happens on a single machine. The registration and the hook calling occur in the same process as the host program. The above image represents the logical flow of the plugin-based architecture. Each colored block represents a different functionality, and the arrows represent the direction of the flow.


**Hook Spec**

```python
# hookspecs.py

import pluggy
hookspec = pluggy.HookspecMarker(project_name="gutenberg")

@hookspec
def print_output(resp, config):
    """Print formatted output"""
```

A hook specification is a contract for the hook to implement. The first step in declaring a hook specification is to create an instance of `HookspecMarker` with the desired project name. The second step is to mark the Python function as a hookspec using the marker as a decorator.

The hook is named `print_output` and defines two arguments in its function signature - the **response object and the configuration object.**

**Hook Implementation**

```python
# output.py - the marker's project name should match the hookspec marker
import json

import pluggy

# `colorize` is the same pygments-based helper defined earlier in the post

hookimpl = pluggy.HookimplMarker(project_name="gutenberg")

@hookimpl
def print_output(resp, config):
    """Print output"""
    data = resp.json()
    table = [
        {
            "name": result["authors"][0]["name"],
            "bookshelves": result["bookshelves"],
            "copyright": result["copyright"],
            "download_count": result["download_count"],
            "title": result["title"],
            "media_type": result["media_type"],
            "xml": result["formats"]["application/rdf+xml"],
        }
        for result in data["results"]
    ]
    indent = config.get("indent", 4)
    if config.get('format', '') == 'json':
        print(f"Using the indent size as {indent}")
        formatted_json = json.dumps(table, sort_keys=True,
                                    indent=indent)
        if config.get('colorize'):
            print(colorize(formatted_json))
        else:
            print(formatted_json)
```

The function `print_output` is the hook implementation.
The hook spec and hook implementation functions should carry the same function signature.

The first step is to create an instance of `HookimplMarker` with the same project name used in the `HookspecMarker`. The second step is to mark the Python function as a hook implementation using the marker as a decorator.

The `print_output` function performs a series of operations - reads the JSON data from the response, collects the relevant details from the JSON data, reads the configuration options passed to the plugin, and at last prints the details.

**Plugin Manager**

```python
import pluggy

import hookspecs
import output

def get_plugin_manager():
    pm = pluggy.PluginManager(project_name="gutenberg")
    pm.add_hookspecs(hookspecs)
    # Add a Python file
    pm.register(output)
    # Or add a load from setuptools entrypoint
    pm.load_setuptools_entrypoints("gutenberg")
    return pm
```

The plugin manager is responsible for discovering and registering the hook specifications and hook implementations.

The first step is to create the `PluginManager` instance with the common project name. The second step is to add the hook specifications to the plugin manager. The final step is to register or discover the hook implementations. The implementations can live in Python files on the import path or be registered using the setuptools entry point. In the example, `output.py` resides in the same directory.
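
For third-party plugins shipped as separate packages, registration happens through a setuptools entry point whose group name matches the string passed to `load_setuptools_entrypoints`. A sketch with an illustrative package and module name:

```python
# setup.py of a hypothetical third-party plugin package
from setuptools import setup

setup(
    name="gutenberg-yaml-output",
    py_modules=["yaml_output"],  # the module containing @hookimpl functions
    # the "gutenberg" group matches pm.load_setuptools_entrypoints("gutenberg")
    entry_points={"gutenberg": ["yaml_output = yaml_output"]},
)
```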

**Invoke the hook**

```python
# host.py
class Search:
    def __init__(self, term, hook, kwargs):
        self.term = term
        self.hook = hook
        self.kwargs = kwargs

    def make_request(self):
        resp = requests.get(f"http://gutendex.com/books/?search={self.term}")
        return resp

    def run(self):
        resp = self.make_request()
        self.hook.print_output(resp=resp, config=self.kwargs)


@cli.command()
@click.option("--title", "-t", type=str, help="Title to search")
@click.option("--author", "-a", type=str, help="Author to search")
def search(title, author, **kwargs):
    if not (title or author):
        print("Pass either --title or --author")
        exit(-1)
    else:
        pm = get_plugin_manager()
        search = Search(title or author, pm.hook, kwargs)
        search.run()
```

After setting up all the parts for hook calling, the final step in the workflow is to call the hook at the proper time. The `run` method, after receiving the response, calls the `print_output` hook.

**Output**

![Build%20Plugins%20with%20Pluggy%203e282afb83124aa3a24625f192178932/output_indent_2.png](/images/plugin/output_indent_2.png)

![Build%20Plugins%20with%20Pluggy%203e282afb83124aa3a24625f192178932/output_indent_4.png](/images/plugin/output_indent_4.png)

The two screenshots are from the same search term, `My freedom and My bondage`, rendered with two different indent values.

### Internal details

It's possible to register multiple hook implementations for a single hook specification. In our case, there can be two `print_output` implementations - one for JSON rendering and another for YAML rendering. Pluggy calls each hook implementation one after the other in `Last In, First Out` order of registration.

Hooks can return values. When they do, the caller receives the return values as a list. In our case, `self.hook.print_output(resp=resp, config=self.kwargs)`, the hook doesn't return any value, and only one plugin is registered.

It's sub-optimal to call the remaining hooks when an earlier hook has already returned a value. To short-circuit the flow, pluggy provides an option while declaring the specification:
`@hookspec(firstresult=True)` notifies the plugin manager
to stop calling the hooks once a **return value** is available.
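
A minimal sketch of a `firstresult` declaration, reusing the same `gutenberg` marker; the hook name and signature here are illustrative:

```python
import pluggy

hookspec = pluggy.HookspecMarker(project_name="gutenberg")

@hookspec(firstresult=True)
def get_renderer(fmt):
    """Return a renderer callable for `fmt`.

    Pluggy stops calling further implementations as soon as one returns
    a non-None value, and the caller receives that single value instead
    of a list.
    """
```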

### Testing the Plugin

Testing a hook implementation is the same as testing any other Python function.

Here is what the unit test looks like:

```python
import json

import requests

from output import print_output  # hook implementation from output.py


def test_print_output(capsys):
    resp = requests.get("http://gutendex.com/books/?search=Kafka")
    print_output(resp, {})

    captured = capsys.readouterr()
    assert len(json.loads(captured.out)) >= 1
```

Here is what the integration test looks like:

```python
from click.testing import CliRunner

from host import search, setup  # CLI pieces from host.py shown above


def test_search():
    setup()
    runner = CliRunner()
    result = runner.invoke(
        search,
        ["-t", "My freedom and My bondage",
        "--indent", 8, "--colorize", "false"],
    )

    expected_output = """
[
        {
                "bookshelves": [
                        "African American Writers",
                        "Slavery"
                ],
                "copyright": false,
                "download_count": 1201,
                "media_type": "Text",
                "name": "Douglass, Frederick",
                "title": "My Bondage and My Freedom",
                "xml": "http://www.gutenberg.org/ebooks/202.rdf"
        }
]
    """
    assert result
    assert result.output.strip() == expected_output.strip()
```

### Conclusion

- [Pytest](https://github.com/pytest-dev/pytest) test runner uses pluggy extensively. There are 100+ pytest plugins that use the pluggy framework to develop testing features like `test coverage`.
- [Tox](https://tox.readthedocs.io/en/latest/plugins.html) aims to automate and standardize testing in Python. It is part of a larger vision of easing the packaging, testing, and release process of Python software.
- [Datasette](https://docs.datasette.io/en/stable/plugins.html) is a Python tool to publish and explore datasets.
- The plugin is a powerful concept; it has a lot of advantages when managing highly configurable and extensible systems.

### Important links from the blog post:

- Slides - [https://slides.com/kracekumar/pluggy](https://slides.com/kracekumar/pluggy)
- Pluggy Documentation - [https://pluggy.readthedocs.io/en/latest/](https://pluggy.readthedocs.io/en/latest/)
- GitHub Source Code - [https://github.com/kracekumar/pluggy_talk](https://github.com/kracekumar/pluggy_talk/blob/master/normal_code.py)

- [Render local images in datasette using datasette-render-local-images](https://kracekumar.com/post/datasette-render-local-images/index.md): Render local images in datasette using datasette-render-local-images
---
title: "Render local images in datasette using datasette-render-local-images"
date: 2020-10-17T22:46:38+05:30
draft: false
tags: ["python", "datasette"]
---

The [Datasette](https://docs.datasette.io) Python library lets you publish a dataset as a SQLite database and
inspect it using a web browser. The database can store text, numbers, arrays,
images, binary data - all possible formats. `datasette-render-images`
lets you save images in the database and render them using `data-uris`.

Sometimes, while publishing the dataset, you may want to store the images in a
separate directory for various reasons and include the relative path to the
images in the table.

The [datasette-render-local-images](https://github.com/kracekumar/datasette-render-local-images/)
plugin renders those images from the local file system.

When the cell value is `images/train/001_inbox.jpg`,
the plugin checks for the path; if the path exists,
the plugin reads the file. Next, it checks whether the file has content and
belongs to one of the image formats - jpg, png, jpeg - and renders the image in the individual cell.

Below is the sample image.

![](/images/datasette-render-local-images/sample.png)

The sample image dataset consists of a table `books` with
four columns: `id, name, author, author_image`.

The `author_image` column contains the path to the respective book author's image.
The first four rows had images satisfying the display criteria.

The next three rows are examples of failure cases.
The `No Exit` book's author image path doesn't exist; hence the plugin displays the raw value.
The `Dummy` book's author image path exists, but the file is empty; thus the value is left unprocessed.
The `Scalable_Vector_Graphics` book's author image is in SVG format; therefore the value is also left unprocessed.

Sometimes the dataset images can be of different sizes, so you may want
to view all the images in the same dimensions. You can modify the image dimensions in `metadata.json`.

The configuration JSON looks like the one below.

``` javascript

{
    "plugins": {
        "datasette-render-local-images": {
            "height": 150,
            "width": 150
        }
    }
}

```

The `plugins` configuration dictionary takes a plugin name as a key and
the configuration for the plugin as a dictionary.

In our case, the plugin name is `datasette-render-local-images`, and
the two configurable options are height and width.
As of now, the plugin supports only these two options.

### Using it in a project

I have a dataset of receipts stored in the table `input`.
Here is a sample row queried using `sqlite-utils`.

``` bash
$sqlite-utils query ocr.db "select * from input limit 1"

[{"id": "b2df4606e368783ff2d024d0e4e58a620dacfa71",
"filename": "uber_receipts_with_driver_name.png",
"path": "/Users/user/code/work/open_dataset/uber/uber_receipts_with_driver_name.png",
"type": "uber",
"dataset_name": "open_dataset"}]

```

The input table contains five columns, `id, filename, path, type, and dataset_name`.
Let's see the output with and without the plugin.

#### Install the plugin

``` bash

$datasette install datasette-render-local-images==0.0.7
...
Successfully installed datasette-render-local-images-0.0.7

```


#### Run the datasette

`$datasette ocr.db`

#### Screenshot of the input table

![](/images/datasette-render-local-images/datasette-image-render-with-no-heigh-width.png)

The images take their original height and width and are not compact to view.

#### Let's set the default height and width.

Configure height and width in metadata.json

``` javascript


{
    "plugins": {
        "datasette-render-local-images": {
            "height": 100,
            "width": 100
        }
    }
}
```

#### Run datasette command again

`$datasette ocr.db -m metadata.json`

#### Screenshot of the input table

![](/images/datasette-render-local-images/datasette-image-render-local-images-with-metadata.png)

You can play around with different heights and widths.
Of course, you can use `Photo Zoom Plus` or any other browser extension
to zoom in while hovering over an image to see it in full size.

You can report issues, bugs, or feature requests in the project's [GitHub repo](https://github.com/kracekumar/datasette-render-local-images)

### Links from the post:

- Datasette Project Docs - https://docs.datasette.io
- Datasette render local images GitHub - https://github.com/kracekumar/datasette-render-local-images/

- [Tamil 1k Tweets For Binary Sentiment Analysis](https://kracekumar.com/post/tamil_1k_tweets_binary_sentiment/index.md): Tamil 1k Tweets For Binary Sentiment Analysis
---
title: "Tamil 1k Tweets For Binary Sentiment Analysis"
date: 2020-06-09T22:46:38+05:30
draft: false
---

Finding labeled data for a Tamil NLP task is difficult. Some papers talk about Tamil neural translation, but the articles don't release the code. If you're working part-time or possess an interest in Tamil NLP, you have a tough time finding data.

When I was looking for labeled data for simple sentiment analysis, I couldn't find any. It's understandable because hardly anyone is working on it. So I decided to build my own dataset. Twitter seemed a perfect place with lots of data. I scraped data using the [Twint Python library](https://github.com/twintproject/twint).

After spending a whole weekend annotating 1000 tweets as "Happy/Sad", I'm releasing the data into the public domain. You can find the data on [Kaggle](https://www.kaggle.com/kracekumar/tamil-binary-classification-1k-tweets-labels-v1). The dataset has two columns, `tweet and sentiment.` The `tweet` column contains the Tamil text, and the `sentiment` column contains the corresponding sentiment.

```
 $head -10 tamil_binary_sentiment_1k_tweets_v1.csv
tweet,sentiment
உன்னைத்தொட்டால் உன்னுள்ளத்தை நொருக்கமாட்டியோ!! என்னைப் போல பெண்ணைப்ப் பார்த்து மயங்க மாட்டியோ!!  #RaOne #chammakChallo #tamilLyrics,Happy
"நதியா நதியா நயில் நதியா
…
இடை தான் கொடியா
கொடி மேல் கனியா
#RDBurnam #HindMusic #TamilLyrics",Happy
"உறக்கம் விற்று கனவுகள் வாங்கலையா?! #TamilLyrics RT @JanuShath: கனவுகள் விற்றுக் கவிதைகள் வாங்குவதும், கவிதைகள் விற்றுக் காதலை வாங்குவதுமாய்.",Sad
மீண்டும் உன்னை காணும் மனமே ... வேண்டும் எனக்கே மனமே மனமே !!! #TamilLyrics,Sad
உயிரை தொலைத்தேன் அது உன்னில் தானோ ... இது நான் காணும் கனவோ நிஜமோ...அன்பே உயிரை தொடுவேன் உன்னை தாலாட்டுதே பார்வைகள் ! #TamilLyrics,Sad
```

The dataset includes 1011 tweets. If you do sentiment analysis on the dataset, consider uploading the kernel to Kaggle. If you're using it in your research work, mention the DOI.

```
Kracekumar, “Tamil Binary Classification 1K tweets Labels V1.” Kaggle, doi: 10.34740/KAGGLE/DSV/1226691.
```

You can download the data from [GitHub](https://github.com/kracekumar/tamil-dataset) as well.

### Points to remember
- The sentiment is labeled based on the tweet and not on the multi-media or hyperlink attachment in the tweet.
- While creating the dataset, I have not looked at the image attached or the user handle for adult content. There may be NSFW attachment.
- The attached link or URL in the tweet may or may not exist.
- The content of the tweet may contain English sentences, words, emojis, etc.


### Labeling process

It took the entire weekend to label the tweets. Google Sheets was the annotation tool. Even though there are only a thousand tweets, I had to read two to three thousand tweets to assign the labels. Roughly, the whole process took 17 hours (~1 tweet per minute). It's hard to read even a few hundred tweets at a stretch and label them. Fun fact: I was targeting to label a minimum of 10K tweets :-) I don't know whether I'll do more labeling any time soon. There are close to 4 lakh tweets in the DB :-)

Happy NLP!

### Important Links in the blog post

- Kaggle Dataset - https://www.kaggle.com/kracekumar/tamil-binary-classification-1k-tweets-labels-v1
- Twint - https://github.com/twintproject/twint
- Github Repo - https://github.com/kracekumar/tamil-dataset

- [Parameterize Python Tests](https://kracekumar.com/post/618264170735009792/parameterize-python-tests/index.md): Parameterize Python Tests
+++
date = "2020-05-16 09:51:00+00:00"
draft = false
tags = ["python", "testing", "pytest"]
title = "Parameterize Python Tests"
url = "/post/618264170735009792/parameterize-python-tests"
+++
### Introduction

A single test case follows a pattern: set up the data, invoke the function or method with the arguments, and assert on the return data or the state changes. A function will have one or more test cases covering the various success and failure cases.

Here is an example implementation of `` wc `` command for a single file that returns `` number of words, lines, and characters `` for an ASCII text file.

    def wc_single(path, words=False, chars=False, lines=False):
        """Implement GNU/Linux `wc` command
           behavior for a single file.
        """
        res = {}
        try:
            with open(path, 'r') as fp:
                # Consider, the file is a small file.
                content = fp.read()

                if words:
                # Use a different name to avoid shadowing the `lines` flag.
                text_lines = content.split('\n')
                res['words'] = sum([1
                                    for line in text_lines
                                    for word in line.split(' ')
                                    if len(word.strip()) >= 1])

                if chars:
                    res['chars'] = len(content)

                if lines:
                    res['lines'] = len(content.strip().split('\n'))

                return res
        except FileNotFoundError as exc:
            print(f'No such file {path}')
            return res

For the scope of the blog post, consider the implementation sufficient rather than complete. I’m aware, for example, that the code would return the number of lines as 1 for an empty file. For simplicity, consider the following six test cases for the `` wc_single `` function.

1.   A test case with the file missing.
2.   A test case with a file containing some data, with `` words, chars, lines `` set to `` True. ``
3.   A test case with a file containing some data, with `` words `` alone set to `` True. ``
4.   A test case with a file containing some data, with `` lines `` alone set to `` True. ``
5.   A test case with a file containing some data, with `` chars `` alone set to `` True. ``
6.   A test case with a file containing some data, with `` words, chars, lines `` alone set to `` True. ``

I’m skipping the combinatorics values for two of the three arguments to be `` True `` for simplicity.

### Test Code

The `` file.txt `` content (don’t worry about the comma after `` knows ``)

    Welcome to the new normal.
    No one knows, what is new normal.


    Hang on!

`` wc `` output for `` file.txt ``

    $wc file.txt
    5      14      72 file.txt

Here is the implementation of the six test cases.

    class TestWCSingleWithoutParameterized(unittest.TestCase):
        def test_with_missing_file(self):
            with patch("sys.stdout", new=StringIO()) as output:
                path = 'missing.txt'
                res = wc_single(path)

                assert f'No such file {path}' in output.getvalue().strip()

        def test_for_all_three(self):
            res = wc_single('file.txt', words=True, chars=True, lines=True)

            assert res == {'words': 14, 'lines': 5, 'chars': 72}

        def test_for_words(self):
            res = wc_single('file.txt', words=True)

            assert res == {'words': 14}

        def test_for_chars(self):
            res = wc_single('file.txt', chars=True)

            assert res == {'chars': 72}

        def test_for_lines(self):
            res = wc_single('file.txt', lines=True)

            assert res == {'lines': 5}

        def test_default(self):
            res = wc_single('file.txt')

            assert res == {}


Each test case follows the pattern: set up the data, invoke the function with arguments, and assert the return or printed value. Most of the testing code is copy-pasted from the previous test case.

### Parameterize

It’s possible to reduce all six methods into a single test method with the <a href="https://github.com/wolever/parameterized" target="_blank">parameterized library</a>. Rather than having six methods, a decorator can inject the _test data and expected values_ into a single test. Here is how the code looks after parameterization.

    def get_params():
        """Return a list of parameters for each test case."""
        missing_file = 'missing.txt'
        test_file = 'file.txt'
        return [('missing_file',
                {'path': missing_file},
                f'No such file {missing_file}',
                True),

                ('all_three',
                {'path': test_file, 'words': True, 'lines': True, 'chars': True},
                 {'words': 14, 'lines': 5, 'chars': 72},
                 False),

                ('only_words',
                {'path': test_file, 'words': True},
                {'words': 14},
                False),

                ('only_chars',
                {'path': test_file, 'chars': True},
                {'chars': 72}, False),

                ('only_lines',
                {'path': test_file, 'lines': True},
                {'lines': 5},
                False),

                ('default',
                {'path': test_file},
                {},
                False)
        ]

    class TestWCSingleWithParameterized(unittest.TestCase):
        @parameterized.expand(get_params())
        def test_wc_single(self, _, kwargs, expected, stdout):
            with patch("sys.stdout", new=StringIO()) as output:
                res = wc_single(**kwargs)

                if stdout:
                    assert expected in output.getvalue().strip()
                else:
                    assert expected == res


The pytest runner output

     pytest -s -v wc.py::TestWCSingleWithParameterized
    ============================================================================================================ test session starts =============================================================================================================
    platform darwin -- Python 3.6.9, pytest-5.4.1, py-1.8.1, pluggy-0.13.1 -- /usr/local/Caskroom/miniconda/base/envs/hotbox/bin/python
    cachedir: .pytest_cache
    rootdir: /Users/user/code/snippets
    plugins: kubetest-0.6.4, cov-2.8.1
    collected 6 items

    wc.py::TestWCSingleWithParameterized::test_wc_single_0_missing_file PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_1_all_three PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_2_only_words PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_3_only_chars PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_4_only_lines PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_5_default PASSED

### What did `` @parameterized.expand `` do?

The decorator collected all the arguments to pass to the test method, `` test_wc_single ``. When the pytest runner ran the test class, the decorator injected a new function name following the default rule, `` <func_name>_<param_number>_<first_argument_to_pass> ``. Then it passed each item in the list returned by `` get_params `` to the test case, `` test_wc_single ``.

### What did `` get_params `` return?

`` get_params `` returns a list (an iterable). Each item in the list is the bundle of parameters for _one test case_.

`` ('missing_file', {'path': missing_file}, f'No such file {missing_file}', True) ``

Each item in the list is a tuple containing the parameters for a test case. Let’s take the first tuple as an example.

The first item in the tuple is the function suffix used while printing the function name (`` 'missing_file' ``). The second item is a dictionary containing the arguments to call the _wc\_single_ function (`` {'path': missing_file} ``). Each test case passes a different number of arguments to `` wc_single ``; hence the second item in the first and second tuples has different keys. The third item is the _expected_ value to assert after calling the function under test (`` f'No such file {missing_file}' ``). The fourth item determines what to assert after the function call - the return value or stdout (`` True ``).

The principle: the decorator expands and passes each item in the iterable to the test method. How to receive the parameters and structure the code is up to the programmer.

### Can I change the function name printed?

Yes, the `` parameterized.expand `` accepts a function as a value to the argument `` name_func ``, which can change each function name.

Here is the custom name function which suffixes the first argument for each test in the function name

    def custom_name_func(testcase_func, param_num, param):
        return f"{testcase_func.__name__}_{parameterized.to_safe_name(param.args[0])}"

    class TestWCSingleWithParameterized(unittest.TestCase):
        @parameterized.expand(get_params(),
                              name_func=custom_name_func)
        def test_wc_single(self, _, kwargs, expected, stdout):
            with patch("sys.stdout", new=StringIO()) as output:
                res = wc_single(**kwargs)

                if stdout:
                    assert expected in output.getvalue().strip()
                else:
                    assert expected == res

Test runner output

    $pytest -s -v wc.py::TestWCSingleWithParameterized
    ...
    wc.py::TestWCSingleWithParameterized::test_wc_single_all_three PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_default PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_missing_file PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_only_chars PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_only_lines PASSED
    wc.py::TestWCSingleWithParameterized::test_wc_single_only_words PASSED

### Is it possible to run a single test after parameterization?

Yes. You should give the full generated name rather than the _actual method name in the code_.

    $pytest -s -v "wc.py::TestWCSingleWithParameterized::test_wc_single_all_three"
    ...
    wc.py::TestWCSingleWithParameterized::test_wc_single_all_three PASSED

Not like these

    $pytest -s -v wc.py::TestWCSingleWithParameterized::test_wc_single
    ...
    =========================================================================================================== no tests ran in 0.24s ============================================================================================================
    ERROR: not found: /Users/user/code/snippets/wc.py::TestWCSingleWithParameterized::test_wc_single
    (no name '/Users/user/code/snippets/wc.py::TestWCSingleWithParameterized::test_wc_single' in any of [<UnitTestCase TestWCSingleWithParameterized>])
    $pytest -s -v "wc.py::TestWCSingleWithParameterized::test_wc_single_*"
    ...
    =========================================================================================================== no tests ran in 0.22s ============================================================================================================
    ERROR: not found: /Users/user/code/snippets/wc.py::TestWCSingleWithParameterized::test_wc_single_*
    (no name '/Users/user/code/snippets/wc.py::TestWCSingleWithParameterized::test_wc_single_*' in any of [<UnitTestCase TestWCSingleWithParameterized>])


### Does test failure give enough information to debug?

Yes. Here is an example of a test failure.

    $pytest -s -v wc.py
    ...
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    self = <wc.TestWCSingleWithParameterized testMethod=test_wc_single_only_lines>, _ = 'only_lines', kwargs = {'lines': True, 'path': 'file.txt'}, expected = {'lines': 6}, stdout = False

        @parameterized.expand(get_params(), name_func=custom_name_func)
        def test_wc_single(self, _, kwargs, expected, stdout):
            with patch("sys.stdout", new=StringIO()) as output:
                res = wc_single(**kwargs)

                if stdout:
                    assert expected in output.getvalue().strip()
                else:
    >               assert expected == res
    E               AssertionError: assert {'lines': 6} == {'lines': 5}
    E                 Differing items:
    E                 {'lines': 6} != {'lines': 5}
    E                 Full diff:
    E                 - {'lines': 5}
    E                 ?           ^
    E                 + {'lines': 6}
    E                 ?           ^

    wc.py:96: AssertionError

### Pytest parametrize

<a href="https://docs.pytest.org/en/latest/parametrize.html" target="_blank">pytest</a> supports parametrizing test functions and not subclass methods of `` unittest.TestCase ``. <a href="https://github.com/wolever/parameterized" target="_blank">parameterize</a> library support all Python testing framework. You can mix and play with test functions, test classes, test methods. And pytest _only supports UK spelling_ `` paramterize `` whereas _parameterize library supports American spelling `` parameterize ``_. Pytest refused to support both the spellings.

<a href="https://www.youtube.com/watch?v=2R1HELARjUk" target="_blank">In recent PyCon 2020, there was a talk about pytest parametrize</a>. It’s crisp and provides a sufficient quick introduction to parametrization.

### Why parameterize tests?

1.   It follows the DRY (Don't Repeat Yourself) principle.
2.   Changing and managing the tests is easier.
3.   You can test the same code in fewer lines. At work, for a small sub-module, the unit tests took 660 LoC; after parameterization, the tests take only 440 LoC.

Happy parameterization!

### Important links from the post:

1.   Parameterized - <a href="https://github.com/wolever/parameterized" target="_blank">https://github.com/wolever/parameterized</a>
2.   Pytest Parametrize - <a href="https://docs.pytest.org/en/latest/parametrize.html" target="_blank">https://docs.pytest.org/en/latest/parametrize.html</a>
3.   PyCon 2020 Talk on Pytest Parametrize - <a href="https://www.youtube.com/watch?v=2R1HELARjUk" target="_blank">https://www.youtube.com/watch?v=2R1HELARjUk</a>
4.   Complete code - <a href="https://gitlab.com/snippets/1977169" target="_blank">https://gitlab.com/snippets/1977169</a>

- [Incomplete data is useless - COVID-19 India data](https://kracekumar.com/post/615946721097285632/incomplete-data-is-useless-covid-19-india-data/index.md): Incomplete data is useless - COVID-19 India data

+++
date = "2020-04-20 19:56:04+00:00"
draft = false
tags = ["covid-19", "data analysis"]
title = "Incomplete data is useless - COVID-19 India data"
url = "/post/615946721097285632/incomplete-data-is-useless-covid-19-india-data"
+++
Data is a representation of reality. When values are missing, the data becomes less useful and less reliable. Every day, articles and news reports about COVID-19 discuss the new cases, recovered cases, and deceased cases. This information gives you a sense of hope or reality or confusion.

Regarding COVID-19, everyone believes or accepts specific details as fact: the mortality rate is `` 2 to 3 percent ``; over the age of fifty, the chance of death is `` 30 to 50 percent ``. These figures are established from previously affected places, and some come out of simulations. The mortality rate, deceased age distribution, patient age distribution, and mode of spread differ from region to region and country to country. With accurate and complete data, one can understand the situation, make decisions, and update the facts.

<a href="https://api.covid19india.org/" target="_blank">Covid19India</a> provides an API to details of all the COVID-19 cases. The dataset for the entire post comes from an API response on 18th April 2020. The APIs endpoints for the analysis are <code><a href="https://api.covid19india.org/raw_data.json" target="_blank">https://api.covid19india.org/raw_data.json</a></code>, and <code><a href="https://api.covid19india.org/deaths_recoveries.json" target="_blank">https://api.covid19india.org/deaths_recoveries.json</a></code>.

When one moves away from cumulative data like total cases to specific data like age and gender, the details are largely missing, which makes it difficult to comprehend what's going on in the states. For all patients in India, only `` 11% `` of age brackets and `` 19.2% `` of gender details are available.
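
A minimal sketch of how such missing-value percentages can be computed with pandas, assuming the `` raw_data.json `` response carries a `` raw_data `` list whose entries use empty strings for unknown `` agebracket `` and `` gender `` values:

    import pandas as pd
    import requests

    resp = requests.get("https://api.covid19india.org/raw_data.json")
    df = pd.DataFrame(resp.json()["raw_data"])

    # Empty strings denote missing values; compute the available percentage per field.
    available = (df[["agebracket", "gender"]] != "").mean() * 100
    print(available)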

<figure class="tmblr-full" data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_patients_field_details.png" data-orig-width="700"><img data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_patients_field_details.png" data-orig-width="700" src="https://66.media.tumblr.com/c55c0b6c9281913ca08445b2f41f2448/5f9aca6138e8031e-36/s540x810/1867eb78cd3697d8f4a20e46913bb73582c9ae3d.png"/></figure>

Analyzing missing data (empty value = '') state-wise reveals how each state releases data. The below image compares missing data for the states of Karnataka and Maharashtra.

<figure class="tmblr-full" data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_patients_field_details_karnataka.png" data-orig-width="700"><img data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_patients_field_details_karnataka.png" data-orig-width="700" src="https://66.media.tumblr.com/e42d829d66c17e0de67265fd39a8564b/5f9aca6138e8031e-5e/s540x810/abd55514ff56a96d54b7d95ad3283023b1049845.png"/></figure>

<figure class="tmblr-full" data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_patients_field_details_maharashtra.png" data-orig-width="700"><img data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_patients_field_details_maharashtra.png" data-orig-width="700" src="https://66.media.tumblr.com/865422ab789890d6c97b6284f9a8f214/5f9aca6138e8031e-f7/s540x810/94901b5ce08528b9c8066ecd4e974332e5344926.png"/></figure>

Looking further into each case, it's clear <a href="https://twitter.com/ANI/status/1251399310270279683" target="_blank">Karnataka officials</a> release data at the individual level, such as age bracket, gender, and other details, in a tabular format (not machine-readable), compared to the <a href="https://twitter.com/rajeshtope11/status/1251194830962724865?s=19" target="_blank">State of Maharashtra</a>. <a href="https://arogya.maharashtra.gov.in/pdf/epressnote31.pdf" target="_blank">Maharashtra</a> releases only cumulative data like the total number of new cases and the total number of recovered patients. Each state follows its own format (not so useful). Next to Karnataka, Andhra Pradesh data contains more than `` 50% `` of age bracket and gender details. The rest of the states, except Kerala to a certain extent, all have close to `` 90% `` of _missing data_ for gender and age bracket.

With missing data, we can't identify which age group is dying in Maharashtra, which age group is most affected, whether the deceased age group varies across states, or whether there is a state where a considerable number of young people are dying.

### Karnataka Case

<figure class="tmblr-full" data-orig-height="450" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/histogram_of_all_cases_in_karnataka.png" data-orig-width="700"><img data-orig-height="450" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/histogram_of_all_cases_in_karnataka.png" data-orig-width="700" src="https://66.media.tumblr.com/df32ef17c874fefd3d37aae55c6fd863/5f9aca6138e8031e-d0/s540x810/dcb3ac70b4b1f724b1ac358e869ef905abf82b3c.png"/></figure>

If we divide the age bracket into ranges of ten like `` 40 - 50 ``, all the age groups are more or less equally affected; there is no significant variation. You can also use arbitrary age groups like `` 0 to 45, 45 to 60, 60 to 75, 75+ `` as mentioned in the <a href="https://twitter.com/EconomicTimes/status/1251484936889970689" target="_blank">tweet</a>.
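
A sketch of bucketing ages into such arbitrary groups with pandas, reusing the `` df `` from the earlier snippet (the `` agebracket `` field is assumed to be mostly numeric; other values are coerced to NaN):

    # Bucket ages into the arbitrary groups mentioned in the tweet.
    ages = pd.to_numeric(df["agebracket"], errors="coerce")
    groups = pd.cut(ages, bins=[0, 45, 60, 75, 120],
                    labels=["0-45", "45-60", "60-75", "75+"])
    print(groups.value_counts())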

<figure class="tmblr-full" data-orig-height="500" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/age_group_of_all_cases_in_karnataka.png" data-orig-width="700"><img data-orig-height="500" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/age_group_of_all_cases_in_karnataka.png" data-orig-width="700" src="https://66.media.tumblr.com/26a02d34e5dfd5cca055d629fa6fec52/5f9aca6138e8031e-f5/s540x810/9b733a16955aa67d5c674d19e4361324a652700b.png"/></figure>

The `` raw_data.json `` API provides the status of the patients. The patient status change numbers don't match those on the primary web application (covid19india.org). My guess is that this is because of the API update frequency.

<figure class="tmblr-full" data-orig-height="450" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/histogram_of_recovered_patients_karnataka.png" data-orig-width="700"><img data-orig-height="450" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/histogram_of_recovered_patients_karnataka.png" data-orig-width="700" src="https://66.media.tumblr.com/41e4525c4cb32a23e29a477e0202d56c/5f9aca6138e8031e-68/s540x810/e80b7f619ccc06778d9b232def28504e9f99f6a3.png"/></figure>

The patients in the age group 0-10 take more time to recover compared to the rest of the age groups; the age group 10-20 takes less time to recover compared to the rest. There are two issues here. First, the dataset is small: only 60 cases. Second, the accuracy of dates makes a considerable difference; i.e., `` dateannounced `` and `` statuschangedata `` in the API are crucial. If the data were available for the top three affected states, Maharashtra, Delhi, and Gujarat, it would reveal which age group and gender are recovering fast and deceasing, and in how many days the acute patients die.

### Deceased Case

From the `` deaths_recoveries.json `` API, the age bracket is available for `` 39.6% `` of deceased cases.

<figure class="tmblr-full" data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_vs_available_of_deceased.png" data-orig-width="700"><img data-orig-height="400" data-orig-src="http://anthology.kracekumar.com/images/18th_april_covid19_analysis/missing_vs_available_of_deceased.png" data-orig-width="700" src="https://66.media.tumblr.com/010d390803f6e1be9d48f3361afffdca/5f9aca6138e8031e-5c/s540x810/e2f1a107c81e45c4fbe53c53109da0e2ad22c814.png"/></figure>

In the available data, the most deaths are in the age groups `` 40-50 (44 deaths) and 20-30 (25 deaths) ``. To get a better picture, one needs to compare with the population distribution as well. Without complete details, it's challenging to say the age group `` 20-30 `` mortality rate is `` 0.1% ``.

The mere number of deceased patients doesn't represent anything close to reality. Out of `` 13 deaths `` in Karnataka, the age bracket and gender are available for 10 cases. All deceased cases are in the age bracket of `` 65 to 80 ``: two females and eight males.

The small dataset suggests all corona-related deaths happen within seven days of identification. Is this true for all states?

### Conclusion

I'm aware volunteers maintain the API. Their effort deserves a special mention, and by analyzing the two sets of APIs, it's clear how hard it is to mark patients' status changes. When the Government doesn't release a unique patient ID, it's confusing, and local volunteer intuition and group knowledge take over.

Every data point helps us know more about the pandemic. All the state governments need to release complete data in a usable format. Remember, age and gender are necessary information; with more details like existing respiratory conditions and the hospital allocated, a model can help in prioritizing resources later. There is no replacement for testing, and testing early. Without clean data, it's impossible to track the epidemic and the associated rampage. We haven't seen cases of re-infection yet. Several other factors contribute to one's survival, like place of residence, access to insurance, and socioeconomic status. We haven't moved beyond the numbers; details like how critical a patient is when identified, and how each patient is recovering, are never released in public.

By furnishing incomplete data, the Government denies us the choice of making an informed decision, and there is no independent verification of the claims on the pattern of the spread. In the prevailing conditions, the only way to understand the scenario is through both qualitative and quantitative (data) stories.

Code: <a href="https://github.com/kracekumar/covid19_india/blob/master/Data%20Analysis%2018th%20April.ipynb" target="_blank">Age Analysis</a>, <a href="https://github.com/kracekumar/covid19_india/blob/master/Death%20Recoveries.ipynb" target="_blank">Death Recoveries</a>

- [“Don’t touch your face” - Neural Network will warn you](https://kracekumar.com/post/612817685627191296/dont-touch-your-face-neural-network-will-warn/index.md): “Don’t touch your face” - Neural Network will warn you
+++
date = "2020-03-17 07:01:22+00:00"
draft = false
tags = ["python", "yolov3", "covid-19", "pytorch"]
title = "“Don’t touch your face” - Neural Network will warn you"
url = "/post/612817685627191296/dont-touch-your-face-neural-network-will-warn"
+++
A few days back, Keras creator, <a href="https://twitter.com/fchollet/status/1234883862385156098?s=21" target="_blank">Francois Chollet tweeted</a>

> A Keras/TF.js challenge: make a model that processes a webcam feed and detects when someone touches their face (triggering a loud beep).

The very next day, I tried the Keras yolov3 model available on GitHub. It was trained on the `` coco80 `` dataset and could detect a person but not a face touch.

<figure class="tmblr-full" data-orig-height="941" data-orig-src="https://github.com/kracekumar/facetouch/blob/master/demo_images/output/malcolm-x.jpeg?raw=true" data-orig-width="1200"><img data-orig-height="941" data-orig-src="https://github.com/kracekumar/facetouch/blob/master/demo_images/output/malcolm-x.jpeg?raw=true" data-orig-width="1200" src="https://66.media.tumblr.com/e3d87b5c06a42b08bc0b4f4d156e18d6/2db36526cdb5e07c-2c/s540x810/a8c55db5ee8071e3165cfa37377d90616d9a3031.jpg" width="500px"/></figure>

### V1 Training

The original Keras implementation lacked documentation to train from scratch or do transfer learning. While looking for an alternative implementation, I came across the PyTorch <a href="https://github.com/ultralytics/yolov3/wiki/Example:-Transfer-Learning" target="_blank">implementation with complete documentation</a>.

I wrote a <a href="https://github.com/kracekumar/facetouch/blob/master/face_touch_download.py" target="_blank">simple Python program</a> to download images based on a keyword.

    python face_touch_download.py --keyword "chin leaning" --limit 100 --dest chin_leaning

The script downloads 100 images and stores them in the directory. Next, I annotated the images using <a href="https://github.com/tzutalin/labelImg" target="_blank">LabelImg</a> in Yolo format. With close to 80 images, I trained the network on a single class, `` facetouch ``, for 100 epochs. The best of the weights couldn't identify any face touch in four test images. It wasn't clear to me whether I should train all the classes (80 + 1) or whether I needed more annotated images.

### V2 Training

Next, I downloaded two more sets of images using two different queries, `` face touch `` and `` black face touch ``. After annotation, the dataset was close to 250 images. Then I decided to train along with a few of the original coco label classes. I selected the first nine classes plus face touch. Now the annotated `` facetouch `` images needed a class change. In V1, there was only one class (0); those annotations became misleading because another class took up the number: now 0 means person and 9 means facetouch. I also picked 500 random images from the original coco dataset for training. After remapping the existing annotations and creating new ones, I trained the network for 200 epochs.

After training the network, I again ran inference on four sample images: Dhanush rolling up his mustache (1), another person rolling up a mustache (2), Proust sitting in a chair leaning his face on his chin (3), and Virginia Woolf leaning her face on her chin (4). The network labeled the second image, the person rolling up a mustache, as face touch. First victory! At the same time, the network was not predicting the classes person and face touch in the other images.

<figure class="tmblr-full" data-orig-height="715" data-orig-src="https://github.com/kracekumar/facetouch/blob/master/four_images.png?raw=true" data-orig-width="794"><img data-orig-height="715" data-orig-src="https://github.com/kracekumar/facetouch/blob/master/four_images.png?raw=true" data-orig-width="794" src="https://66.media.tumblr.com/af5744c2f5e8c678e624b2bd5935be35/2db36526cdb5e07c-92/s540x810/cc4288fc115b1b4f8435892d7db17609f6413f6f.png" width="500px"/></figure>

### V3 Training

After re-reading the same documentation, it was clear to me that the network needed more training images and _needn't be trained_ along with the rest of the classes. I deleted the entire v2 dataset and annotations and came up with a list of keywords: `` face touch ``, `` black face touch ``, `` chin leaning ``. After annotating close to 350 images and running the training, this time the network was able to label two images: the other person rolling up a mustache (2) and Proust sitting in a chair, leaning his face on his chin (3).

By now, it was clear to me that yolov3 can detect chin leaning and mustache touching as long as the bounding box is of considerable size, and that informed the next step.

### Final Training

Now the list of keywords grew to ten. The new ones, apart from the previous ones, are `` finger touching face ``, `` ear touch ``, `` finger touching forehead ``, `` headache ``, `` human thinking face ``, `` nose touching ``, `` rubbing eyes ``. The <a href="https://drive.google.com/drive/u/0/folders/17-rLAQ9GLda7M5mvDiitINqSLBUj_wbm" target="_blank">final dataset</a> was close to 1000 images. After training the network for 400 epochs on the dataset, it was able to identify face touches in videos, images, and the webcam feed.

Classification (bounding box) on 17 images

<figure class="tmblr-full" data-orig-height="715" data-orig-src="https://github.com/kracekumar/facetouch/raw/master/test_images.png" data-orig-width="790"><img data-orig-height="715" data-orig-src="https://github.com/kracekumar/facetouch/raw/master/test_images.png" data-orig-width="790" src="https://66.media.tumblr.com/b09d2d542e8c85ec510ba269bf79f8d8/2db36526cdb5e07c-0e/s540x810/b4db4f13648a2cc2d81f2f1e1932ce6b7d890402.png" width="500px"/></figure>

Out of 14 face-touching images, the network was able to box 11. It couldn't box the two Murakami images.

Here is a YouTube link to identifying face touch in a Slavoj Žižek clip: <a href="https://www.youtube.com/embed/n44WsmRiAvY" target="_blank">https://www.youtube.com/embed/n44WsmRiAvY</a>


### Webcam

Now that the network predicts anyone touching their face or trying to, you can run the program in the background, and it warns you with the sound `` Don't touch the face ``.

After installing the <a href="https://github.com/kracekumar/facetouch#installation-mac" target="_blank">requirements</a>, run the program,

    python detect.py --cfg cfg/yolov3-1cls.cfg --weights best_dataset_v5.pt --source 0

You're all set to hear the sound whenever you try to touch your face.

The <a href="https://github.com/kracekumar/facetouch" target="_blank">repo</a> contains a link to the trained model, dataset, and other instructions. Let me know your comments and issues.

I'm yet to understand how yolo works; so far, I have only read the yolo v1 paper. The model's precision is around ~0.6, and I think it's possible to improve it. Somewhere while transferring images from the server, the `` results.png `` went missing.

### Few Issues

* If you run `` detect.py `` with the webcam source, the sound playback is blocking code, so you may notice a lag in the webcam feed.


* Sometimes, the model predicts a mobile phone held in front of the face as a face touch.


* As shown in the images, if the face touch region is tiny, like a single index finger over the chin, the model doesn't predict it; look at the image titled 6 below.



<figure class="tmblr-full" data-orig-height="715" data-orig-src="https://github.com/kracekumar/facetouch/raw/master/test_images.png" data-orig-width="790"><img data-orig-height="715" data-orig-src="https://github.com/kracekumar/facetouch/raw/master/test_images.png" data-orig-width="790" src="https://66.media.tumblr.com/b09d2d542e8c85ec510ba269bf79f8d8/2db36526cdb5e07c-0e/s540x810/b4db4f13648a2cc2d81f2f1e1932ce6b7d890402.png" width="500px"/></figure>

* The quality of the images was slightly degraded because of watermarks and focus angles. If the end deployment targets the webcam, then it's better to train on images focused on the front of the face. There was no clean way to ensure or scrape high-quality images. Do you know how to solve this problem, or whether the assumption is even true? Of course, a good model should detect face touches from all sides and angles.


* I haven’t tried augmenting the dataset and training it.



The network may not run on edge devices; you may want to train using yolov3-tiny weights.

*   Yolo site: <a href="https://pjreddie.com/darknet/yolo/" target="_blank">https://pjreddie.com/darknet/yolo/</a>
*   PyTorch yolov3 implementation: <a href="https://github.com/ultralytics/yolov3" target="_blank">https://github.com/ultralytics/yolov3</a>
*   Facetouch Repo: <a href="https://github.com/kracekumar/facetouch" target="_blank">https://github.com/kracekumar/facetouch</a>

While writing the code and debugging, I must have touched my face more than 100 times. It's part of evolution and habit; it can't die out in a few days. Does wearing a mask with spikes help, then?

- [1000 more whitelist sites in Kashmir, yet no Internet](https://kracekumar.com/post/190951734050/1000-more-whitelist-sites-in-kashmir-yet-no/index.md): 1000 more whitelist sites in Kashmir, yet no Internet
+++
date = "2020-02-21 20:13:15+00:00"
draft = false
tags = ["kashmir", "censor"]
title = "1000 more whitelist sites in Kashmir, yet no Internet"
url = "/post/190951734050/1000-more-whitelist-sites-in-kashmir-yet-no"
+++
Kashmir is under lockdown for more than

``` bash
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
200,200,200,200,200,200,200,200,200,200,
```


200 days.

Last Friday (14th Feb 2020), the Government released a <a href="http://www.jkhome.nic.in/13TSTSof2020.pdf" target="_blank">document</a> with a list of whitelisted sites allowed at 2G speed. The text file with URLs extracted from the PDF: <a href="https://gitlab.com/snippets/1943725" target="_blank">https://gitlab.com/snippets/1943725</a>

In case you’re not interested in tech details, analysis, or short of time, look at the `` summary `` section at the bottom.

### DNS Query

The browser converts a domain name or URL into an IP address. This step is called DNS querying.

I wrote the <a href="https://gitlab.com/snippets/1941923" target="_blank">script</a> to emulate the DNS request of the browser.
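
A minimal sketch of such a check using the dnspython library (not the original script; the hostnames are examples from the list):

    import dns.resolver  # pip install dnspython

    def resolve(hostname):
        """Return the 'A' record addresses for a hostname, or None when the query fails."""
        try:
            return [r.address for r in dns.resolver.resolve(hostname, "A")]
        except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
            return None

    print(resolve("unacademy.com"))      # a list of IP addresses
    print(resolve("www.unacademy.com"))  # None: no 'A' record at the time of the post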

Out of 1464 entries in the PDF, only `` 90.9% (1331) `` of the URLs receive an answer (IP address or sub-domain) to the DNS query. `` 9.1% (133) `` are either not domain names, like (Airtel TV), or non-existent entities, like <a href="http://www.unacademy.com" target="_blank">www.unacademy.com</a>.

<figure class="tmblr-full" data-orig-height="450" data-orig-src="https://i.imgur.com/1KCsXdJ.png" data-orig-width="700"><img data-orig-height="450" data-orig-src="https://i.imgur.com/1KCsXdJ.png" data-orig-width="700" src="https://66.media.tumblr.com/0a12232a67aed68c844a0a3eda426cdf/03ff75705f2ef1b0-2e/s540x810/d13c55c079889710bd8f521714ca7e3ec678d49b.png" width="600"/></figure>

Out of the 1331 DNS answers, all but one contain at least one IP address. That means no further DNS query is needed, and the web servers are reachable.

<figure class="tmblr-full" data-orig-height="934" data-orig-src="https://i.imgur.com/0xshUU8.png" data-orig-width="2448"><img data-orig-height="934" data-orig-src="https://i.imgur.com/0xshUU8.png" data-orig-width="2448" src="https://66.media.tumblr.com/1747cce97ec7e45cd9400d89af3c9826/03ff75705f2ef1b0-ab/s540x810/0bbcdf99dbb38eb74764ba9fee5bad08475ea645.png" width="600"/></figure>

Out of the `` 133 `` failed DNS queries, `` 113 `` are non-existent URLs/URIs (for simplicity, you can assume non-existent hostnames/sub-domains) like <code><a href="http://www.unacademy.com" target="_blank">www.unacademy.com</a> (the application is available at unacademy.com), <a href="http://www.testzone.smartkeeda.com" target="_blank">http://www.testzone.smartkeeda.com</a>, <a href="http://164.100.44.112/ficn/" target="_blank">http://164.100.44.112/ficn/</a></code>, and 22 entries are invalid, like `` Gradeup (App), 122.182.251.118 precision biometrics ``.

<figure class="tmblr-full" data-orig-height="898" data-orig-src="https://i.imgur.com/NbCvFyM.png" data-orig-width="1506"><img data-orig-height="898" data-orig-src="https://i.imgur.com/NbCvFyM.png" data-orig-width="1506" src="https://66.media.tumblr.com/5d0534e41f6fb2c03b83fb47b5292cdc/03ff75705f2ef1b0-3b/s540x810/9a97a0712ea55d47df6985e0ddb017bad8a6aebb.png" width="600"/></figure>

<code><a href="http://www.unacademy.com" target="_blank">www.unacademy.com</a></code> doesn’t have ‘A’ address associated, and browser or resolver will fail, and <code><a href="http://unacademy.com" target="_blank">http://unacademy.com</a></code> only has the A address.

<figure class="tmblr-full" data-orig-height="182" data-orig-src="https://i.imgur.com/Xak8tt7.png" data-orig-width="2730"><img data-orig-height="182" data-orig-src="https://i.imgur.com/Xak8tt7.png" data-orig-width="2730" src="https://66.media.tumblr.com/626b65c4cc13cb25e697d052cc7a0452/03ff75705f2ef1b0-e2/s540x810/8480e846807ffbb42be21009609cbd7f91f69d9e.png" width="600"/></figure>

<figure class="tmblr-full" data-orig-height="1672" data-orig-src="https://i.imgur.com/rVUYjmV.png" data-orig-width="2730"><img data-orig-height="1672" data-orig-src="https://i.imgur.com/rVUYjmV.png" data-orig-width="2730" src="https://66.media.tumblr.com/6d7ca79a1c1add58afc60ce5ae029586/03ff75705f2ef1b0-03/s540x810/6806db6b9294f9818dfe104c05949364b6f6da70.png" width="600"/></figure>

`` 11 `` entries in the list are IP addresses without a hostname, like <code><a href="http://164.100.44.112/ficn/" target="_blank">http://164.100.44.112/ficn/</a></code>.

Four of the eleven IP addresses refuse to accept an `` HTTP GET `` request. Two of the IP addresses don't host any page, and two refuse the connection due to an SSL error, `` SSLCertVerificationError("hostname '59.163.223.219' doesn't match either of 'tin.tin.nsdl.com', 'onlineservices.tin.egov-nsdl.com" ``.

All of the explicit IPs in the list are some sort of government service.

### Whitelist group

Some of the entries in the PDF are from the same domain but with different sub-domains, like `` example.com, api.example.com ``.

<figure class="tmblr-full" data-orig-height="886" data-orig-src="https://i.imgur.com/pkoASK4.png" data-orig-width="2214"><img data-orig-height="886" data-orig-src="https://i.imgur.com/pkoASK4.png" data-orig-width="2214" src="https://66.media.tumblr.com/48fc1db33861302e3b2a5183aa0d26d2/03ff75705f2ef1b0-68/s540x810/87559bdf9845e3f54a5dcf6b089e34e91df8bc6c.png" width="600"/></figure>

Analysis of the domain name associations reveals there are 23 URLs of the trendmicro domain (a cybersecurity company), 19 URLs of rbi (Reserve Bank of India), and 10 URLs belonging to Google (Alphabet) products, yet google.co.in is not on the list.

<figure class="tmblr-full" data-orig-height="1360" data-orig-src="https://i.imgur.com/tKgery3.png" data-orig-width="2056"><img data-orig-height="1360" data-orig-src="https://i.imgur.com/tKgery3.png" data-orig-width="2056" src="https://66.media.tumblr.com/2d673c39c7e23be6bd6a187f9d52abc4/03ff75705f2ef1b0-56/s540x810/f3c7e64bcf450a56db9035be1ec1322a61aa20b0.png" width="600"/></figure>

### Spellings

In the last two PDFs, the haj committee domain was misspelled. The misspelling still exists alongside the correct spelling and the irrelevant entry Haj\_Committee.gov.in.

The new misspelled domains are <a href="http://www.scientificsmerican.com" target="_blank">www.scientificsmerican.com</a> (Scientific American?) and flixcart.com (Flipkart?).

### Product Names

It's unclear how the ISP will interpret the URL `` http://Netflix ``. Netflix loads resources from the following places: `` netflix.com, codex.nflxext.com, assets.nflxext.com, nflximg.com ``. Does the ISP download netflix.com, figure out all the network calls, and whitelist all the network resources? Or do they allow only the Netflix domain?

JIO chat, mentioned in previous PDFs, is missing from the current list.

A classification of URLs from the PDF

<figure class="tmblr-full" data-orig-height="450" data-orig-src="https://i.imgur.com/pLTdFHm.png" data-orig-width="700"><img data-orig-height="450" data-orig-src="https://i.imgur.com/pLTdFHm.png" data-orig-width="700" src="https://66.media.tumblr.com/26fa80710ab8f5354caecde27cf07304/03ff75705f2ef1b0-d9/s540x810/a0e994f7f72c99f43005ae82ce3e1c16f0ebb8db.png" width="600"/></figure>

### Site Reachability

Say all the URLs' DNS requests pass through (even though they don't): how many URLs will return a response to the browser? The source code is available on <a href="https://gitlab.com/snippets/1943713" target="_blank">GitLab</a>.
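
A minimal sketch of such a reachability check with the requests library (not the original script):

    import requests

    def is_reachable(url):
        """True when a GET request to the URL returns HTTP 200."""
        try:
            return requests.get(url, timeout=10).status_code == 200
        except requests.RequestException:
            return False

    print(is_reachable("https://www.wikipedia.org"))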

Only `` 74.6% `` of entries are reachable - the browser receives a proper response (HTTP 200 for a `` GET `` request). `` 20% `` of the websites fail to respond to HTTP GET requests due to various errors like SSL issues, non-existent sites, technical misconfiguration, or an uninterpretable URL like Airtel TV.

`` 6% `` of URLs may or may not work: some don't exist (<code><a href="https://www.khanacademy.org/math/in" target="_blank">https://www.khanacademy.org/math/in</a></code>) or don't accept GET requests (<code><a href="https://blazenet.citruspay.com/" target="_blank">https://blazenet.citruspay.com/</a></code>), but the domain, sub-domain, or IP address exists, or the URL issues a redirect.

<figure class="tmblr-full" data-orig-height="450" data-orig-src="https://i.imgur.com/ysbmYOD.png" data-orig-width="700"><img data-orig-height="450" data-orig-src="https://i.imgur.com/ysbmYOD.png" data-orig-width="700" src="https://66.media.tumblr.com/6383ecd4dfb79b850581ef86bf2b80a9/03ff75705f2ef1b0-7d/s540x810/62630b2d5b2b9cb4a906b03ea25319c345bf5d67.png" width="600"/></figure>

The domain <a href="http://aiman.co" target="_blank">http://aiman.co</a> redirects to tumblr.com.

The list doesn't contain the domain `` safexpay ``, but the load balancer (going by the naming convention), `` safexpay-341596490.ap-south-1.elb.amazonaws.com ``, is in the list.

Do human agents verify these URLs before adding them to the list?

### Missing static content hosts

Websites host JavaScript, fonts, stylesheets, and other static assets on third-party services for faster load times. All cloud providers offer similar services. Some important CDNs are present in the whitelist, like `` googleapis.com, cdnjs.cloudflare.com, gstatic.com, fbcdn.net (facebook CDN) ``.

Other popular services are missing, like `` s3.amazonaws.com, nflxext.com, akamai ``.

All the problems mentioned in the earlier <a href="https://kracekumar.com/post/190341665270/153-sites-allowed-in-kashmir-but-no-internet" target="_blank">post</a> about loading static assets, and in general, continue.

### 2G speed

I picked random sites as a sample set (fewer than 100 websites; the entire list is worth analysing). On average, each website makes 68 calls to load the home page, and the average page size is 2.5 MB. These 68 network calls are a mixed bag: the same host, sub-domains, and third-party sites for JS, CSS, images, etc.

Theoretical 2G speed is 40 kilobits per second; 2.5G speed is 384 kilobits per second. Even when the browser loads resources in parallel, load time can grow disproportionately, not linearly, when dropping from a higher speed to a lower one.

The browser opens only six connections to a host. If the server times out, the browser doesn't even retry.

In the Chrome developer tools, there is no option to test a site at 2G; the available throttling presets are 3G and 4G.

At 20 Mbps, the browser takes 5 seconds to load a 1.9 MB web page (assuming all DNS queries pass through); on a slow 3G connection, Chrome takes 13.79s.

The real speed will be worse on 2G. Since the developer tools can't even simulate it, it is clear no one tests websites at 2G speed. Airtel plans to shut down its 2G service by mid-2020, and Reliance has already shut down its 2G service.

The process of whitelisting websites raises pertinent questions.

*   Who chooses these websites?
*   What is the methodology for choosing the websites?
*   What is the rationale for choosing 2G speed and banning VPN usage?
*   Why whitelist at all?
*   And more.

### Summary

1.   A DNS query resolves a domain name into an IP address. Out of 1464 entries in the PDF, only 90.9% (1331) of the URLs receive an answer (IP address or sub-domain) to the DNS query. 9.1% (133) are either not domain names, like (Airtel TV), or non-existent entities, like <a href="http://www.unacademy.com" target="_blank">www.unacademy.com</a>.
2.   Out of the 1331 DNS answers, all but one contain at least one IP address.
3.   Out of 133 failed DNS queries, 113 are non-existent URLs/URIs (for simplicity, you can assume non-existent hostnames/sub-domains) like <code><a href="http://www.unacademy.com" target="_blank">www.unacademy.com</a> (the application is available at unacademy.com), <a href="http://www.testzone.smartkeeda.com" target="_blank">http://www.testzone.smartkeeda.com</a>, <a href="http://164.100.44.112/ficn/" target="_blank">http://164.100.44.112/ficn/</a></code>, and 22 entries are invalid, like `` Gradeup (App), 122.182.251.118 precision biometrics ``.
4.   11 entries in the list are IP addresses without a hostname, like <code><a href="http://164.100.44.112/ficn/" target="_blank">http://164.100.44.112/ficn/</a></code>. Four of the eleven IP addresses refuse to accept HTTP GET requests. Two of the IP addresses don't host any page, and two refuse the connection due to an SSL error, `` SSLCertVerificationError("hostname '59.163.223.219' doesn't match either of 'tin.tin.nsdl.com', 'onlineservices.tin.egov-nsdl.com" ``.
5.   Three entries are just hard-to-match keywords, unlike Airtel TV: umbrella, lancope, flexnetoperations.
6.   In the last two PDFs, the haj committee domain was misspelled. The misspelling still exists alongside the correct spelling and the irrelevant entry Haj\_Committee.gov.in.
7.   Scientific American and Flipkart are misspelled: <a href="http://www.scientificsmerican.com" target="_blank">www.scientificsmerican.com</a>, flixcart.com.
8.   There are 21 URLs of the trendmicro domain, 19 URLs of rbi, and 10 URLs belonging to Google (Alphabet) products, yet google.co.in is not on the list.
9.   74.6% of entries are reachable - the browser receives a proper response (HTTP 200 for a `` GET `` request). 20% of the websites fail to respond to HTTP GET requests due to various errors like SSL issues and non-existent sites. 6% of URLs may or may not work since some of the URLs don't exist, but the domain, sub-domain, or IP address exists, or the URL issues a redirect or doesn't accept GET requests.
10.   Among the sample of URLs, on average, each website makes 68 calls to load the home page, with an average page size of 2.5 MB.
11.   Some of the CDNs that serve JavaScript, fonts, stylesheets, and static assets are present in the whitelist, like `` googleapis.com, cdnjs.cloudflare.com, gstatic.com, fbcdn.net (facebook CDN) ``. Other popular services are missing, like `` s3.amazonaws.com, nflxext.com, akamai ``.
12.   The list doesn't contain the domain `` safexpay ``, but the load balancer (going by the naming convention), `` safexpay-341596490.ap-south-1.elb.amazonaws.com ``, is in the list.
13.   All the problems mentioned in the post <a href="https://kracekumar.com/post/190341665270/153-sites-allowed-in-kashmir-but-no-internet" target="_blank">https://kracekumar.com/post/190341665270/153-sites-allowed-in-kashmir-but-no-internet</a> continue: whitelisted sites can't provide complete access to a site and prevent a wholesome experience of the internet.
14.   2G speed is 40 kilobits per second; 2.5G speed is 384 kilobits per second. In the Chrome developer tools, there is no option to test a site at 2G; the allowed throttling presets are 3G and 4G. On a slow 3G connection, Chrome takes 13.79s to load a 1.9 MB web page. It is clear no one tests sites at 2G speed. Airtel plans to shut down its 2G service by mid-2020, and Reliance has already shut down its 2G service.

### Conclusion

> Internet (noun): an electronic communications network that connects computer networks and organizational computer facilities around the world

Overall, the internet cannot work by allowing only whitelisted sites. Under the hood, the browser makes calls to a plethora of locations specified by the developers; without them, sites cannot function and are unusable from beginning to end.

Credits:

*   HashFyre for converting the PDF to a text file using OCR.
*   Kingsly for pointing out DNS specific analysis - <a href="https://twitter.com/kracetheking/status/1229155303062745088?s=20" target="_blank">https://twitter.com/kracetheking/status/1229155303062745088?s=20</a>

- [Capture all browser HTTP[s] calls to load a web page](https://kracekumar.com/post/190478935610/capture-all-browser-https-calls-to-load-a-web/index.md): Capture all browser HTTP[s] calls to load a web page

+++
date = "2020-01-26 17:31:40+00:00"
draft = false
tags = ["python", "proxy", "HTTP"]
title = "Capture all browser HTTP[s] calls to load a web page"
url = "/post/190478935610/capture-all-browser-https-calls-to-load-a-web"
+++
How does one find out what network calls a browser makes to load a web page?

The simple method: download the HTML page, parse it, and find all the network calls using an HTML parser like BeautifulSoup.

The shortcoming of this method: what about the network calls made by the browser before requesting the web page? For example, Firefox makes a call to `` ocsp.digicert.com `` to obtain the revocation status of digital certificates. The protocol is the <a href="https://en.wikipedia.org/wiki/Online_Certificate_Status_Protocol" target="_blank">Online Certificate Status Protocol</a>.

The network tab in the browser dev tools doesn't display the network call to ocsp.digicert.com.

One of the ways to find out all the requests originating from the machine is by using a proxy.

### Proxy

<a href="https://mitmproxy.readthedocs.io/en/v2.0.2/" target="_blank">MITMProxy</a> is a python interactive man-in-the-middle proxy for HTTP and HTTPS calls. Installation instructions <a href="https://mitmproxy.readthedocs.io/en/v2.0.2/install.html" target="_blank">here</a>.

After installing the proxy, you can run the command `` mitmweb `` or `` mitmdump ``. Both run the proxy on localhost at port 8080; with `` mitmweb ``, you get an intuitive web UI to browse the request/response cycle and the order of the web requests.

<figure class="tmblr-full" data-orig-height="1624" data-orig-src="https://i.imgur.com/clvrOHz.png" data-orig-width="2852"><img data-orig-height="1624" data-orig-src="https://i.imgur.com/clvrOHz.png" data-orig-width="2852" src="https://66.media.tumblr.com/93077115fcf6344fefdfa8381a37b0e7/0e08bfea412a18df-e1/s540x810/f620e229b6e5e8c1e8f2839a15f6665a5a5f4ecf.png" width="600"/></figure>

As the name indicates, "man in the middle," the proxy gives the ability to modify the request and response. A simple Python script with custom functions does the trick. The script can define two Python functions, `` request `` and `` response ``; the proxy calls `` request `` after every request and `` response `` after every response. There are other supported concepts, like addons.

Here is a small script to extract the details of the request and response: <a href="https://gitlab.com/snippets/1933443" target="_blank">https://gitlab.com/snippets/1933443</a>

<figure class="tmblr-full" data-orig-height="1644" data-orig-src="https://i.imgur.com/u4Tlvv5.png" data-orig-width="2868"><img data-orig-height="1644" data-orig-src="https://i.imgur.com/u4Tlvv5.png" data-orig-width="2868" src="https://66.media.tumblr.com/aa5b111e46116ccf93f16aa9f8236150/0e08bfea412a18df-78/s540x810/df18e4a309b2774c221318e3c96fc52b1d051181.png" width="600"/></figure>

After receiving a successful response, MITMProxy invokes the function `` response ``; the function collects the details and dumps them to a JSON file. `` uuid4 `` ensures a unique file name for every call.
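
A minimal sketch of such a response hook (assuming an existing `` output/ `` directory; the field names are illustrative, not the gist's exact schema):

    import json
    import uuid

    def response(flow):
        """Called by mitmproxy after every response; dump the details to a JSON file."""
        details = {
            "url": flow.request.url,
            "method": flow.request.method,
            "status_code": flow.response.status_code,
            "headers": dict(flow.response.headers),
        }
        with open(f"output/{uuid.uuid4()}.json", "w") as fp:
            json.dump(details, fp)

Save it as a script and pass it to the proxy with `` mitmdump -s script.py ``.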

Even though MITMProxy provides an option to dump the responses, it's in a binary format that isn't convenient for data analysis. Next is to simulate the browser requests.

### Selenium

Selenium is one of the tools used by web developers and testers for end-to-end web testing. Selenium helps developers drive the code in the browser and assert on the HTML elements of the page.

To get the Selenium Python web driver working, one needs to install Java first, followed by <a href="https://github.com/mozilla/geckodriver/releases" target="_blank">geckodriver</a>, and finally the Python selenium driver (`` pip install selenium ``). Don't forget to place geckodriver in `` $PATH ``.

Code: <a href="https://gitlab.com/snippets/1933454" target="_blank">https://gitlab.com/snippets/1933454</a>

<figure class="tmblr-full" data-orig-height="990" data-orig-src="https://i.imgur.com/uw9tmKg.png" data-orig-width="1956"><img data-orig-height="990" data-orig-src="https://i.imgur.com/uw9tmKg.png" data-orig-width="1956" src="https://66.media.tumblr.com/4cd6a49bd2df0e85cce3133721970de8/0e08bfea412a18df-be/s540x810/49105b664a6f0b573f55e72af1ab9ecec04878a0.png" width="600"/></figure>

Run the MITM proxy in one terminal and Selenium in another; the output directory will be filled with JSON files.

Here is how the directory looks

<figure class="tmblr-full" data-orig-height="716" data-orig-src="https://i.imgur.com/3Jp0FvI.png" data-orig-width="2702"><img data-orig-height="716" data-orig-src="https://i.imgur.com/3Jp0FvI.png" data-orig-width="2702" src="https://66.media.tumblr.com/aea3bc56c0b3ece1455bde9911904e97/0e08bfea412a18df-48/s540x810/7aa463e4c9144e30494f66f5635c87ed0dd6b57a.png" width="600"/></figure>

Sample JSON file output

<figure class="tmblr-full" data-orig-height="286" data-orig-src="https://i.imgur.com/i3A62F9.png" data-orig-width="2852"><img data-orig-height="286" data-orig-src="https://i.imgur.com/i3A62F9.png" data-orig-width="2852" src="https://66.media.tumblr.com/5bf5aab579c02e06bdd506e7d67237cf/0e08bfea412a18df-1e/s540x810/0f5b718ee85f4f61d60fb06b0d1a37e71a312785.png" width="600"/></figure>

- [153 sites allowed in Kashmir but no internet](https://kracekumar.com/post/190341665270/153-sites-allowed-in-kashmir-but-no-internet/index.md): 153 sites allowed in Kashmir but no internet
+++
date = "2020-01-19 11:22:57+00:00"
draft = false
tags = ["kashmir", "censor"]
title = "153 sites allowed in Kashmir but no internet"
url = "/post/190341665270/153-sites-allowed-in-kashmir-but-no-internet"
+++
Kashmir has been locked down without the internet for more than 167 days as of 19th Jan 2020, since 5th Aug 2019. The Wire recently published <a href="https://thewire.in/government/kashmir-internet-whitelisted-websites" target="_blank">an article</a> wherein the Government of India whitelisted 153 websites for access in Kashmir. The list, extracted from the official document, is available as a <a href="https://gitlab.com/snippets/1931378" target="_blank">GitLab snippet</a>.

<script src="https://gitlab.com/snippets/1931378.js"></script>

Internet shutdowns are becoming common during protests these days.

Anyone with a little knowledge of how web applications are built can say that every web application makes network calls to other sites to load JavaScript, style sheets, maps, videos, images, etc.

### Accessing a Wikipedia page

wikipedia.org is one of the whitelisted sites. If a user accesses the <a href="https://en.wikipedia.org/wiki/Freedom" target="_blank">Freedom page</a>, they will see all the text jumbled, and the <a href="https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/OURS_TO_FIGHT_FOR._4_FREEDOMS_ON_ONE_SHEET_-_NARA_-_513635.jpg/440px-OURS_TO_FIGHT_FOR._4_FREEDOMS_ON_ONE_SHEET_-_NARA_-_513635.jpg" target="_blank">image</a> will be missing, since it loads from _wikimedia.org_, which is not in the whitelisted sites. Even though you type one URL, your browser makes under-the-hood requests to other sites.

<figure class="tmblr-full" data-orig-height="1814" data-orig-src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/OURS_TO_FIGHT_FOR._4_FREEDOMS_ON_ONE_SHEET_-_NARA_-_513635.jpg/1280px-OURS_TO_FIGHT_FOR._4_FREEDOMS_ON_ONE_SHEET_-_NARA_-_513635.jpg" data-orig-width="1280"><img data-orig-height="1814" data-orig-src="https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/OURS_TO_FIGHT_FOR._4_FREEDOMS_ON_ONE_SHEET_-_NARA_-_513635.jpg/1280px-OURS_TO_FIGHT_FOR._4_FREEDOMS_ON_ONE_SHEET_-_NARA_-_513635.jpg" data-orig-width="1280" src="https://66.media.tumblr.com/d11dbe320adc99b6c929c822beafa733/ac977494485d3af8-10/s540x810/47a228615a0350e32ded9d6aed9c5c0fb3bd3a0c.jpg" width="600"/></figure>

### Analysing these URLs

I wrote <a href="https://gitlab.com/snippets/1931380" target="_blank">a small script to download</a> all these sites' home pages and parsed the HTML to extract the potential network calls for loading JavaScript, stylesheets, images, media, WebSockets, fonts, etc.
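
A minimal sketch of the extraction idea (not the original script), assuming requests and beautifulsoup4 are installed:

    import requests
    from bs4 import BeautifulSoup

    def external_resources(url):
        """Collect the URLs a page references for scripts, stylesheets, and images."""
        html = requests.get(url, timeout=10).text
        soup = BeautifulSoup(html, "html.parser")
        resources = set()
        for tag, attr in (("script", "src"), ("link", "href"), ("img", "src")):
            for element in soup.find_all(tag):
                if element.get(attr):
                    resources.add(element[attr])
        return resources

    print(external_resources("https://www.wikipedia.org"))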

These whitelisted sites make network calls to 1469 unique locations, including their own and external sites like cdn.jsdelivr.net, googletagmanager.com, and maps.googleapis.com. That means, to make these whitelisted sites work, the ISP needs to whitelist a thousand more sites.

The whitelist contains top-level domains like ac.in, gov.in, and nic.in, meaning users are allowed to access all government sites. Can ISP filters pass on traffic to specific top-level domains?

For example, IRCTC uses CDNs (Content Delivery Networks) to load bootstrap CSS (styling the web elements), fonts from cdn.jsdelivr.net and googleapis, its theme from cdn.jsdelivr.net, the Google Analytics service, and an AI-enabled chatbot from <a href="https://corover.mobi/" target="_blank">https://corover.mobi/</a>. Failure to load any of these components makes the user unable to use the website: if not rendered by the browser, the site will be jumbled and the click actions on buttons will fail.

Other popular services that are missing in the whitelist - maps.googleapis.com, codex.nflxext.com, cdnjs.cloudflare.com, s.yimg.com, cdn.optimizely.com, code.jquery.com, gstatic.com, static.uacdn.net, cdn.sstatic.net, pixel.quantserve.com, nflxext.com, ssl-images-amazon.com, s3.amazonaws.com, rdbuz.com, oyoroomscdn.com, akamaized.net, etc.

Here is how amazon.in may look

<figure class="tmblr-full" data-orig-height="1784" data-orig-src="https://i.imgur.com/0gVszCi.png" data-orig-width="2868"><img alt="Amazon Home Page 1" data-orig-height="1784" data-orig-src="https://i.imgur.com/0gVszCi.png" data-orig-width="2868" src="https://66.media.tumblr.com/99e3373775d7ebd84c444ee8c060d318/ac977494485d3af8-28/s540x810/3dc5601c5e51edd0b394494af2a1e2dfb680c03b.png" width="600"/></figure>

<figure class="tmblr-full" data-orig-height="1784" data-orig-src="https://i.imgur.com/gAmyt5g.png" data-orig-width="2868"><img alt="Amazon Home Page 2" data-orig-height="1784" data-orig-src="https://i.imgur.com/gAmyt5g.png" data-orig-width="2868" src="https://66.media.tumblr.com/de9497d498a9326567a0fa01317e21f2/ac977494485d3af8-7c/s540x810/71cb69d044afa084746cc104a7f2d495fee09aac.png" width="600"/></figure>

There is a typo in one of the whitelisted sites, <a href="http://www.hajcommitee.gov.in" target="_blank">www.hajcommitee.gov.in</a>; with the missing 't', the correct URL is <a href="http://www.hajcommittee.gov.in" target="_blank">www.hajcommittee.gov.in</a>. The browser fails to resolve these two sites: <a href="https://www.jkpdd.gov.in/" target="_blank">https://www.jkpdd.gov.in/</a>, <a href="https://www.jkpwdrb.nic.in" target="_blank">https://www.jkpwdrb.nic.in</a>.

Facebook, Twitter, Instagram, YouTube, WhatsApp, and other social media sites are blocked, whereas JioChat is allowed. The document mentions "JIO chat" but does not specify the domain or the application.

### Conclusion

> Internet (noun): an electronic communications network that connects computer networks and organizational computer facilities around the world

Overall, the internet cannot work by allowing only whitelisted sites. As said earlier, under the hood, the browser makes calls to a plethora of locations specified by the developers; without them, sites cannot function and are unusable from beginning to end.

- [How long do Python Postgres tools take to load data?](https://kracekumar.com/post/170492996890/how-long-do-python-postgres-tools-take-to-load/index.md): How long do Python Postgres tools take to load data?
+++
date = "2018-02-04 13:15:49+00:00"
draft = false
tags = ["python", "postgres", "performance"]
title = "How long do Python Postgres tools take to load data?"
url = "/post/170492996890/how-long-do-python-postgres-tools-take-to-load"
+++
Data is crucial for all applications. When fetching a significant amount of data from the database multiple times, faster data load times improve performance.

The post measures the latency of tools like `` SQLAlchemy statement, SQLAlchemy ORM, psycopg2, psql ``. To measure the Python tool timings, `` jupyter notebook's timeit `` is used. Psql serves as the reference for the lowest achievable time.

### Table Structure

    annotation=> \d data;
                          Table "public.data"
    Column |   Type    |                     Modifiers
    --------+-----------+---------------------------------------------------
    id     | integer   | not null default nextval('data_id_seq'::regclass)
    value  | integer   |
    label  | integer   |
    x      | integer[] |
    y      | integer[] |
    Indexes:
        "data_pkey" PRIMARY KEY, btree (id)
        "ix_data_label" btree (label)

    annotation=> select count(*) from data;
       count
    ---------
    1050475
    (1 row)

### SQLAlchemy ORM Declaration

    class Data(Base):
        __tablename__ = 'data'
        id = Column(Integer, primary_key=True)
        value = Column(Integer)
        # 0 => Training, 1 => test
        label = Column(Integer, default=0, index=True)
        x = Column(postgresql.ARRAY(Integer))
        y = Column(postgresql.ARRAY(Integer))
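
The snippets below call a `` create_session `` helper that the post doesn't show; here is a minimal sketch of the assumed setup (the connection string is a placeholder):

    from sqlalchemy import Column, Integer, create_engine, select
    from sqlalchemy.dialects import postgresql
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()
    engine = create_engine("postgresql://user:password@localhost/db")
    Session = sessionmaker(bind=engine)

    def create_session():
        return Session()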

### SQLAlchemy ORM

    def sa_orm(limit=20):
        sess = create_session()
        try:
            return sess.query(Data.value, Data.label).limit(limit).all()
        finally:
            sess.close()

### Time taken

    %timeit sa_orm(1)
    28.9 ms ± 4.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The time taken in milliseconds to fetch `` 1, 20, 100, 1000, 10000 `` rows:

    {1: 28.9, 20: 26.6, 100: 26.2, 1000: 29.5, 10000: 70.2}

![](https://lh4.googleusercontent.com/RVTwghoKivcQDD6Fhx0HVj6Z65mcu2roe5zouMvyGI-OypghYcX23-v-fXXRVUngUtj3loYeaAxp8g=w1920-h990)

### SQLAlchemy Select statement

    def sa_select(limit=20):
        tbl = Data.__table__
        sess = create_session()
        try:
            stmt = select([tbl.c.value, tbl.c.label]).limit(limit)
            return sess.execute(stmt).fetchall()
        finally:
            sess.close()

### Time Taken

The time taken in milliseconds to fetch `` 1, 20, 100, 1000, 10000 `` rows:

     {1: 24.7, 20: 24.5, 100: 24.9, 1000: 26.8, 10000: 39.6}

![](https://lh3.googleusercontent.com/lhtoHYN0bL9ZiG2lzOt_YJeIikKrhIhHRXCVjFc7_Jhtr6WgNOBSPPC03e1UdkE8bgl_sl7RSqnyNA=w1920-h990)

### Psycopg2

Psycopg2 is one of the Postgres drivers. You can use psycopg2 along with SQLAlchemy or as a stand-alone tool.

    import psycopg2

    def pscopg2_select(limit=20):
        conn = psycopg2.connect("dbname=db user=user password=password host=localhost")
        cur = conn.cursor()
        try:
            # Note: In prod, escape SQL queries.
            stmt = f"select value, label from data limit {limit}"
            cur.execute(stmt)
            return cur.fetchall()
        finally:
            cur.close()
            conn.close()

### Psycopg2 Time Taken

The time taken in milliseconds to fetch `` 1, 20, 100, 1000, 10000 `` rows:

    {1: 17.0, 20: 16.9, 100: 17.3, 1000: 18.1, 10000: 30.1}

![](https://lh3.googleusercontent.com/xuobKLuMaq3w00mOiIb2cpWzsFI_cfDixDUeZp1fG9sfA-W1fKsjH-99m_qKJSm-YnD6gmsgrHV6Uw=w1920-h990)

### Psql

    annotation=> explain (analyze, timing off) select label, value from data limit 10000;
                                           QUERY PLAN
    ------------------------------------------------------------------------------------------------
    Limit  (cost=0.00..322.22 rows=10000 width=8) (actual rows=10000 loops=1)
     ->  Seq Scan on data  (cost=0.00..33860.40 rows=1050840 width=8) (actual rows=10000 loops=1)
    Total runtime: 7.654 ms
    (3 rows)
    Time: 7.654 ms

### Psql time taken

The time taken in milliseconds to fetch `` 1, 20, 100, 1000, 10000 `` rows:

    {1: 0.9, 20: 0.463, 100: 1.029, 1000: 1.643, 10000: 7.654}

### All timings

    {'pscopg2': {1: 17.0, 20: 16.9, 100: 17.3, 1000: 18.1, 10000: 30.1},
    'sa_orm': {1: 28.9, 20: 26.6, 100: 26.2, 1000: 29.5, 10000: 70.2},
    'sa_select': {1: 24.7, 20: 24.5, 100: 24.9, 1000: 26.8, 10000: 39.6},
    'sql_select': {1: 0.9, 20: 0.463, 100: 1.029, 1000: 1.643, 10000: 7.654}}

### Chart of all the timings

![](https://lh6.googleusercontent.com/yfK3ne2netEyVdNs1px9hO9VRSPwI8mb8DoNOkqZa-iGZmW8QWg7PdGsCDRJEmdccd6ZK6jaa5r41g=w1920-h990)

The lower the bar, the better the performance.

As you can see, the SQLAlchemy ORM is the slowest, and psycopg2 is the fastest. The SQLAlchemy select statement is close to psycopg2’s performance and provides a sweet spot: high-level data handling without writing raw SQL queries.

- [Debugging Python multiprocessing program with strace](https://kracekumar.com/post/168364964555/debugging-python-multiprocessing-program-with/index.md): Debugging Python multiprocessing program with strace

+++
date = "2017-12-09 19:19:08+00:00"
draft = false
tags = ["python", "debugging", "multiprocessing"]
title = "Debugging Python multiprocessing program with strace"
url = "/post/168364964555/debugging-python-multiprocessing-program-with"
+++
Debugging is a time-consuming and brain-draining process. It’s an essential part of learning and writing maintainable code. Every person has their own way of debugging, their approaches and tools. Sometimes you can view the traceback, pull up the code from memory, and find a quick fix. At other times, you opt for different tricks like the print statement, a debugger, or the rubber duck method.

Debugging a multiprocessing bug in Python is hard for various reasons.

*   The print statement is a simple way to start. When all the processes write logs to the same file, there is no guarantee of ordering without synchronization. You need to use `` -u ``, `` sys.stdout.flush() ``, or log statements synchronized and tagged with process IDs or another identifier. Without the PID, it’s hard to know which process is stuck (see the sketch after this list).
*   You can’t put `` pdb.set_trace() `` inside a multiprocessing pool target function.
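
Here is a tiny hedged sketch of PID-tagged, flushed logging; the worker and its arguments are made-up placeholders:

    import multiprocessing
    import os
    import sys

    def work(item):
        # Tag every log line with the PID and flush immediately so the
        # interleaved output of many processes stays attributable.
        print("[pid {}] processing {}".format(os.getpid(), item))
        sys.stdout.flush()
        return item * 2

    if __name__ == "__main__":
        with multiprocessing.Pool(processes=4) as pool:
            print(pool.map(work, range(8)))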

Let’s say a program reads a metadata file which contains a list of JSON files and their total records. A file may have a total of 100 JSON records, but you may need only 5. In either case, the function can return the first five or a random five records.

Sample code

<div class="gist"><a href="https://gist.github.com/kracekumar/20cfeec5dc99b084c879ce737f7c7214" target="_blank">https://gist.github.com/kracekumar/20cfeec5dc99b084c879ce737f7c7214</a></div>

The code just demonstrates the workflow; production code is not as simple as the one above.

Consider a `` multiprocessing.Pool.starmap `` call with `` read_data `` as the target function and the number of processes set to 40.

Let’s say there are 80 files to process. Out of 80, 5 are problematic (the function takes forever to finish reading the data). Whatever the position of those five files, the processes handling them continue forever, while the other processes enter the `` sleep `` state after completing their tasks.
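
Here is a hedged sketch of such a pool call; the body of `` read_data `` and the file list are assumptions, and the gist above holds the actual demo:

    import multiprocessing

    def read_data(path, limit):
        # Return at most `limit` records from the JSON-lines file at `path`.
        records = []
        with open(path) as f:
            for index, line in enumerate(f):
                if index >= limit:
                    break
                records.append(line)
        return records

    if __name__ == "__main__":
        # 80 (file, limit) argument tuples fanned out over 40 processes.
        args = [("data_{}.json".format(index), 5) for index in range(80)]
        with multiprocessing.Pool(processes=40) as pool:
            results = pool.starmap(read_data, args)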

Once you know the PID of the running process, you can see which system calls the process makes using `` strace -p $PID ``. The process in the `` running `` state was calling the system call `` read `` with the same file name again and again. The while loop went on forever since the file had zero records and the queue had only one file.

The strace output looks like the one below.

<div class="gist">
<a href="https://gist.github.com/kracekumar/8cd44ff27b81e200343bc58d194dd595" target="_blank">https://gist.github.com/kracekumar/8cd44ff27b81e200343bc58d194dd595</a>
</div>

You may argue that a well-placed print statement could solve the problem. But you won’t always have the luxury of modifying the running program or replicating the code in a local environment.

- [Notes from Root Conf Day 2 - 2017](https://kracekumar.com/post/160622791375/notes-from-root-conf-day-2-2017/index.md): Notes from Root Conf Day 2 - 2017
+++
date = "2017-05-13 15:51:29+00:00"
draft = false
tags = ["devops", "conference", "notes"]
title = "Notes from Root Conf Day 2 - 2017"
url = "/post/160622791375/notes-from-root-conf-day-2-2017"
+++
On day 2, I spent a considerable amount of time networking and attended only four sessions.

<a href="https://rootconf.talkfunnel.com/2017/72-spotswap-running-production-apis-on-spot-instances" target="_blank">Spotswap: running production APIs on Spot instance</a>

*   Amazon EC2 spot instances are cheaper than on-demand server costs. Spot instances run when the bid price is greater than market/spot instance price.
*   Mapbox API server uses spot instances which are part of auto-scaling server
*   Auto scaling group is configured with min, desired, max parameters.
*   Latency should be low and cost effective
*   EC2 has three types of instances: on-demand, reserved and spot. The spot instance comes from unused capacity and has unstable pricing.
*   Spot market starts with bid price and market price.
*   In winter 2015, traffic increased and the price also increased.
*   To spin up a new machine with code takes almost two minutes
*   Our machine fleet encompasses spot and on-demand instances.
*   When one spot machine from the fleet goes down, the auto scaling group spins up an on-demand machine.
*   Race condition: several instances go down at the same time.
*   Aggressive spin-up of on-demand machines when the market is volatile.
*   Tag EC2 machines going down and then spin up an AWS Lambda. When the spot instance returns, shut down the Lambda or the on-demand instance. The Auto Scaling group can take care of this.
*   Savings 50% to 80%
*   Source code: <a href="https://github.com/mapbox/spotswap" target="_blank">https://github.com/mapbox/spotswap</a>
*   No latency because over-provisioned
*   Set bid price as on-demand price.
*   Didn’t try to increase spot instance before going on-demand
*   Cfconfig to deploy and Cloud formation template from AWS

<a href="https://rootconf.talkfunnel.com/2017/1-adventures-in-postgres-management" target="_blank">Adventures with Postgres</a>

*   Speaker: I’m an Accidental DBA
*   The talk is a story of a Postgres debugging.
*   Our services include Real-time monitoring, on demand business reporting to e-commerce players. 4000 stores and 10 million events per day. Thousands of customers in a single database.
*   Postgres 9.4, M4.xlarge,16GB, 750 GB disk space with Extensive monitoring
*   Reads don’t block writes, Multi-Version Concurrency Model.
*   Two clients, A and B, read X as 3. B updates X to 4; until B’s transaction commits, A still reads X as 3. A reads X as 4 once B’s transaction succeeds.
*   Every transaction has a unique ID - XID.
*   XID - 32 bit, max transaction id is 4 billion.
*   After 2 billion no transaction happens.
*   All writes stop and the server shuts down. It restarts in single-user mode.
*   Read replicas work without any issue.
*   Our server reached 1 billion IDs. At 600k transactions per hour, the transaction ID would hit the maximum limit in 40 days.
*   How to prevent?
*   Promote standby to master? But XID is also replicated.
*   Estimate the damage - txid\_current - Current Transaction ID
*   Every insert and update is wrapped inside a transaction
*   Now add begin and commit for a group of statements, this bought some time.
*   At the current rate, 60 days were left to hit the max transaction limit.
*   TOAST - The Oversized Attribute Storage Technique
*   Aggressive maintenance. Config tweaks: autovacuum\_workers, maintenance\_work\_mem, autovacuum\_naptime - a knife to a gun fight. Didn’t help.
*   `` rds_superuser `` prevented from modifying pg system tables
*   Never thought `` rds_superuser `` could be an issue.
*   VACUUM – garbage-collect and optionally analyze a database
*   vacuum freeze (\*) worked. Yay!
*   What may have caused issues - DB had a large number of tables. Thousands of tables
*   Better `` shard `` per customer
*   Understand the `` schema `` better
*   Configuration tweaks - max\_workers, nap\_time, cost\_limit, maintenance\_work\_mem
*   Keep an eye on XID; long-lived transactions are a problem.
*   Parallel vacuum introduced in 9.5
*   `` pg_visibility `` improvements in 9.6
*   Other companies like GetSentry have faced similar problems.

<a href="https://rootconf.talkfunnel.com/2017/60-mysql-troubleshooting-tldr" target="_blank">MySQL troubleshooting</a>

*   Step 1 - Define the problem, know what is normal, read the manual
*   Step 2: collect diagnostics data (OS, MySQL). The `` pt_stalk `` tool collects the diagnostics data.
*   Lookup MySQL error log when DB misbehaves.
*   Check OOM killer
*   General performance issues - show global variables, show global status, show indexes, profile the query
*   Table corruption in InnoDB: the system can’t start up. Worst-case strategy: force recovery and start from a backup.
*   The log message for table corruption says the table is marked as crashed.
*   Replication issues - show master status, my.cnf/my.ini, show global variables, show slave status

OTR Session - Micro Service

*  OTR - Off The Record session is a group discussion. A few folks come together and moderate the session. Ramya, Venkat, Ankit and Anand C were key in answering and moderating the session.
*  What is a service and a micro service? A micro service is independent, self-contained and owned by a single team. A growing code base is unmanageable, and the number of deploys increases. So break them down at a small scale. Ease of coupling with other teams. No clear boundary.
*   Advantages of Microservices - team size, easy to understand, scale it. Security aspects. Two pizza team, eight-member team. Able to pick up right tools for the job, and change the data store to experiment, fix perf issues.
*   How to verify your app needs micro service?
*   Functional boundary, behavior which is clear. Check out and Delivery
*   PDF/Document parsing is a good candidate for Micro Service, and parsing is CPU intensive. Don’t create nano-service :-)
*   Failure is inevitable. Have logic for handling failures on another service. Say when MS 1 fails MS2 code base should handle gracefully.
*   Message queue vs. simple REST service architecture. Sync vs. async. The choice depends on the needs and functionality.
*   Service discovery? Service registry and discover from them.
*   Use swagger for API
*   Overwhelming tooling - you can start simple and add as per requirements
*   Good to think about from the beginning: how you deploy, and build pipelines.
*   Auth for internal services - internal auth say Service level auth and user token for certain services. Convert monolithic to modular and then micro level.
*   API gateway to maintain different versions and rate limiting. When to use role-based access and where does scope originate? Hard, and there is no correct way. Experiment with one and move on.
*   Debugging in monolithic and micro service is different.
*   When you use vendor-specific software, use a mock service to test it. Also, you use someone else’s micro service. Integration tests for microservices are hard.
*   Use continuous delivery and don’t make a large number of service deployments in one release.
*   The discussion went on for 2 hours! I moved out after an hour. A very exhaustive discussion on the topic.

- [Notes from Root Conf Day 1 - 2017](https://kracekumar.com/post/160561774060/notes-from-root-conf-day-1-2017/index.md): Notes from Root Conf Day 1 - 2017
+++
date = "2017-05-11 21:11:23+00:00"
draft = false
tags = ["devops", "notes", "conference"]
title = "Notes from Root Conf Day 1 - 2017"
url = "/post/160561774060/notes-from-root-conf-day-1-2017"
+++
<a href="https://rootconf.in/2017/" target="_blank">Root Conf</a> is a conference on DevOps and Cloud Infrastructure. 2017 edition’s theme is service reliability. Following is my notes from Day 1.

1. <a href="https://rootconf.talkfunnel.com/2017/63-state-of-the-open-source-monitoring-landscape" target="_blank">State of the open source monitoring landscape</a>



    *   The speaker of the session is the co-founder of the Icinga monitoring system. I missed the first ten minutes of the talk. The talk is a comparison of all available OSS options for monitoring and visualization.
    *   Auto-discovery is hard.
    *   As per 2015 monitoring tool usage survey, Nagios is widely used.
    *   <a href="https://en.wikipedia.org/wiki/Nagios" target="_blank">Nagios</a> is reliable and stable.
    *   <a href="https://en.wikipedia.org/wiki/Icinga" target="_blank">Icinga 2</a> is a fork of Nagios, rewrite in c++. It’s modern, web 2.0 with APIs, extensions and multiple backends.
    *   <a href="https://github.com/sensu/sensu" target="_blank">Sensu</a> has limited features on OSS side and a lot of features on enterprise version. OSS version isn’t useful much.
    *   <a href="https://en.wikipedia.org/wiki/Zabbix" target="_blank">Zabbix</a> is full featured, out of box monitoring system written in C. It provides logging and graphing features. Scaling is hard since all writes are written to single Postgres DB.
    *   <a href="http://riemann.io/" target="_blank">Riemann</a> is stream processor and written in Clojure. The DSL stream processing language needs knowledge of Clojure. The system is stateless.
    *   <a href="https://www.opennms.org/en" target="_blank">OpenNMS</a> is a network monitoring tool written in Java and good for auto discovery. Using plugins for a non-Java environment is slow.
    *   <a href="http://graphiteapp.org/" target="_blank">Graphite</a> is flexible, a popular monitoring tool for time series database.
    *   <a href="https://github.com/prometheus/prometheus" target="_blank">Prometheus</a> is flexible rule-based alerting and time series database metrics.
    *   <a href="https://www.elastic.co/products/x-pack/monitoring" target="_blank">Elastic</a> comes with Elastic search, log stash, and kibana. It’s picking up a lot of traction. Elastic Stack is extensible using X-PACK feature.
    *   <a href="https://grafana.com/" target="_blank">Grafana</a> is best for visualizing time series database. Easy to get started and combine multiple backends. - - Grafana annotations easy to use and tag the events.
    *   There is no one tool which fits everyone’s case. You have to start somewhere, so pick up a monitoring tool and see if it works for you; else try the next one till you settle down.



2. <a href="https://rootconf.talkfunnel.com/2017/17-deployment-strategies-with-kubernetes" target="_blank">Deployment strategies with Kubernetes</a>



    *   This was a talk with a live demo.
    *   Canary deployment: Route a small amount of traffic to a new host to test functioning.
    *   If new hosts don’t act normal roll back the deployment.
    *   <a href="https://www.martinfowler.com/bliki/BlueGreenDeployment.html" target="_blank">Blue Green Deployment</a> is a procedure to minimize the downtime of the deployment. The idea is to have two set of machines with identical configuration but one with the latest code, rev 2 and other with rev 1. Once the machines with latest code act correctly, spin down the machines with rev 1 code.
    *   Then a demo of `` kubectl `` with adding a new host to the cluster and roll back.



3. <a href="https://rootconf.talkfunnel.com/2017/7-a-little-bot-for-big-cause" target="_blank">A little bot for big cause</a>



    *   The talk is a story of developing, pushing to GitHub, merging and releasing. And then shit hits the fan. Now, what to do?
    *   The problem is that the developer didn’t get the code reviewed.
    *   How can automation help here?
    *   Enforcing standards: an unreviewed merge is reverted using the GitHub API, a Slack bot, and Hubot.
    *   As soon as developer opens a PR, <a href="https://github.com/moengage/alice" target="_blank">alice</a>, the bot adds a comment to the PR with the checklist. When the code is merged, bot verifies the checklist, if items are unchecked, the bot reverts the merge.
    *   The bot can do more work. DM the bot in the slack to issue commands and bot can interact with Jenkins to roll back the deployed code.
    *   The bot can receive commands via slack personal message.



4. <a href="https://rootconf.talkfunnel.com/2017/18-necessary-tooling-and-monitoring-for-performance-c" target="_blank">Necessary tooling and monitoring for performance critical applications</a>



    *   The talk is about collecting metrics for German E-commerce company Otto.
    *   The company receives two orders/sec and a million visitors per day. On average, it takes eight clicks/pages to complete an order.
    *   Monitor database, response time, throughput, requests/second, and measure state of the system
    *   Metrics everywhere! We talk about metrics to decide and diagnose the problem.
    *   <a href="http://metrics-clojure.readthedocs.io/en/latest/" target="_blank">Metrics</a> is a Clojure library to measure and record the data to the external system.
    *   The library offers various features like Counter, gauges, meters, timers, histogram percentile.
    *   Rather than extracting data from the log file, measure details from the code and write to the data store.
    *   Third party libraries are available for visualization.
    *   The demo used d3.js application for annotation and visualization. In-house solution.
    *   While measuring the metrics, measure from all possible places and store separately. If the web application makes a call to the recommendation engine, collect the metrics from the web application and recommendation for a single task and push to the data store.



5. <a href="https://rootconf.talkfunnel.com/2017/51-what-should-be-pid-1-in-a-container" target="_blank">What should be PID 1 in a container?</a>



    *   In older versions of Docker, Docker doesn’t reap child processes correctly. As a result, Docker spawns a new process per request and never reaps it. This is called the <a href="https://rootconf.talkfunnel.com/2017/51-what-should-be-pid-1-in-a-container" target="_blank">PID 1 zombie problem</a>.
    *   This will eat all available PIDs in the container.
    *   Use `` sysctl -a | grep pid_max `` to find the maximum available PIDs in the container.
    *   In the bare metal machine, PID 1 is `` systemd `` or any init program.
    *   If the first process in the container is bash, then the PID 1 zombie problem doesn’t occur.
    *   But using bash to handle all the signal handlers is messy.
    *   Yelp came up with <a href="https://github.com/Yelp/dumb-init" target="_blank">Yelp/dumb-init</a>. Now, `` dumb-init `` is PID 1 and no more zombie processes.
    *   Docker-1.13, introduced the flag, `` --init ``.
    *   Another solution uses `` systemd `` as PID 1.
    *   Docker allows running `` systemd `` without privileged mode.
    *   Running `` systemd `` as PID 1 has other useful features like managing logs.



6. <a href="https://rootconf.talkfunnel.com/2017/9-razor-sharp-provisioning-for-baremetal-servers" target="_blank">‘Razor’ sharp provisioning for bare metal servers</a>



    *   I attended only first half of the talk, fifteen minutes.
    *   When you buy physical rack space in a data center, how will you install the OS? You’re in Bangalore and the server is in Amsterdam.
    *   First OS installation on bare metal is hard.
    *   There comes Network boot!
    *   <a href="http://www.syslinux.org/wiki/index.php?title=PXELINUX" target="_blank">PXELinux</a> is a syslinux derivative to boot OS from NIC card.
    *   Once the machine comes up, a DHCP request is broadcast, and the DHCP server responds.
    *   <a href="https://cobbler.github.io/" target="_blank">Cobbler</a> helps in managing all services running the network.
    *   DHCP server, TFTP server, and config are required to complete the installation.
    *   The microkernel is placed in the TFTP server.
    *   <a href="https://puppet.com/blog/introducing-razor-a-next-generation-provisioning-solution" target="_blank">Razor</a> is a tool to automate provisioning bare metal installation.
    *   Razor philosophy, consume the hardware resource like the virtual resource.
    *   Razor components - Nodes, Tags, Repository, policy, Brokers, Hooks



7. <a href="https://rootconf.talkfunnel.com/2017/77-freebsd-is-not-a-linux-distribution" target="_blank">FreeBSD is not a Linux distribution</a>



    *   FreeBSD is a complete OS, not a distribution
    *   Who uses it? Netflix, WhatsApp, Yahoo!, NetApp and more.
    *   Great tools, mature release model, excellent documentation, friendly license.
    *   Now a lot of forks NetBSD, FreeBSD, OpenBSD and few more
    *   Good file systems: UFS and ZFS. UFS is high-performance and reliable. If you don’t want to lose data, use ZFS!
    *   Jails - GNU/Linux copied this and called containers!
    *   No GCC only llvm/clang.
    *   FreeBSD is forefront in developing next generation tools.
    *   Pluggable TCP stacks - BBR, RACK, CUBIC, NewReno
    *   Firewalls - Ipfw , PF
    *   Dummynet - live network emulation tool
    *   FreeBSD can run Linux binaries in userspace. It maps GNU/Linux system calls to FreeBSD ones.
    *   It can run on 256 cores machine.
    *   Hardware - <a href="https://en.wikipedia.org/wiki/Non-uniform_memory_access" target="_blank">NUMA</a>, ARM64, Secure boot/UEFI
    *   Politics - Democratically elected core team
    *   Join the Mailing list and send patches, you will get a commit bit.
    *   Excellent mentor program - GSoC copied our idea.
    *   FreeBSD uses SVN and Git revision control.
    *   Took a dig at GPLv2 for not being a business-friendly license.
    *   Read out BSD license on the stage.

- [Book Review: The Culture Map](https://kracekumar.com/post/160523067765/book-review-the-culture-map/index.md): Book Review: The Culture Map

+++
date = "2017-05-10 19:22:01+00:00"
draft = false
tags = ["culture", "book-review"]
title = "Book Review: The Culture Map"
url = "/post/160523067765/book-review-the-culture-map"
+++
<a href="http://erinmeyer.com/" target="_blank">The Culture Map: Breaking Through the Invisible Boundaries of Global Business</a> is a book on cultural differences in communication by Erin Meyer.

Last year, I spent three months in NYC. Whenever I entered a food outlet or made eye contact, the conversation started with “How are you doing?”. I replied, “I’m good, and how are you doing?”. Most of the time, I didn’t receive a response. It was a sign. Pedestrians smile at you if you make eye contact in the US. This doesn’t happen back in India. Nor would I dare to do it; that’s considered crazy. You smile at someone you know or are about to talk to. Anything else is inviting trouble.

The Culture Map is a book about how different cultures perceive communication, persuasion, respect, politeness, leadership, decision-making, and trust. The book documents all the little details of how neighboring countries like the UK and France differ. The book provides a relative scale of how one culture compares to another. For example, confrontation in a debate is considered outrageous in India, whereas it is an essential aspect of a sound argument in France. French culture emphasizes `` thesis-antithesis-synthesis ``. Asian cultures don’t encourage open negative feedback in public.

If you’re operating a business or working with multi-ethnicity people, understanding the cultural difference is a key ingredient of a fruitful outcome and pleasant experience. Knowing the difference will save you future trouble.

The book covers cultural differences across many countries, how to work in a multi-cultural environment, and how to decode cultural differences. The book is filled with a good number of real-life stories and relative comparisons of various cultures.

Having worked and working with Canadians, Americans, Dutch, and Indians, I’m surprised by the author’s effort in narrating the cultural differences with solid backing. The next time you encounter a communication gap, you will start thinking: does culture play a role here? The book is an absolute treasure trove on various cultures.

- [Book Review: Software Architecture with Python](https://kracekumar.com/post/160521154075/book-review-software-architecture-with-python/index.md): Book Review: Software Architecture with Python

+++
date = "2017-05-10 18:13:33+00:00"
draft = false
tags = ["python", "book-review"]
title = "Book Review: Software Architecture with Python"
url = "/post/160521154075/book-review-software-architecture-with-python"
+++
The book <a href="https://www.packtpub.com/application-development/software-architecture-python" target="_blank">Software Architecture with Python</a> is by <a href="https://twitter.com/skeptichacker" target="_blank">Anand B Pillai</a>. The book explains various aspects of software architecture like testability, performance, scaling, concurrency and design patterns.

The book has ten chapters. The first chapter speaks about different architect roles like solution architect, enterprise architect, and technical architect, what the role of an architect is, and the difference between design and architecture. The book covers two lesser-spoken topics, debugging and code security, which I liked. There is very little literature available on debugging. The author has provided real use cases of different debugging tips and tools without picking sides. The book has some good examples of `` OverflowError `` in Python.

My favorite chapter in the book is ‘Writing Applications that Scale’. The author explains all the available concurrency primitives like threading, multiprocessing, eventlet, twisted, gevent, asyncio, queues, semaphores, locks, etc. The author doesn’t stop at explaining how to use them but paves the path to figuring out how to profile the code, find out where the bottleneck lies, and decide when to use which concurrency primitive. This chapter is the longest in the book and is deluged with examples and insights. The author’s approach of using the `` time `` command to measure performance, rather than sticking with wall-clock time, gives the developer an understanding of where the program spends most of its time. The infamous GIL is explained!

The book covers vital details on implementing design patterns in Python and how Python’s dynamic nature brings joy while creating creational, structural, and behavioral patterns. The showcased examples teach how Python metaclasses work and their simplicity. The author avoids the infamous Gang of Four patterns.

The book documents significantly scattered wisdom from the Python and software architecture world in one place. The book strongly focuses on design patterns and concurrency. The in-depth coverage given to concurrency is missing from the other chapters. The book is a must-read for all Python developers.

The book is indeed a long read and a solid one in size and content. Having said that, the book is pretty hands-on and loaded with trivial and non-trivial examples. I won’t be surprised if code snippets cover half the book. The author hasn’t shied away from using code snippets to explain any concept. The book introduces a plethora of tools for developing software, and at the same time references the required literature.

- [RC checklist for Indian Applicants](https://kracekumar.com/post/157858156340/rc-checklist-for-indian-applicants/index.md): RC checklist for Indian Applicants

+++
date = "2017-03-01 10:51:23+00:00"
draft = false
tags = ["recursecenter"]
title = "RC checklist for Indian Applicants"
url = "/post/157858156340/rc-checklist-for-indian-applicants"
+++
One sunny Sunday morning, one can get up and question their self-existence, or one can ask every few days or months: what am I doing at my current job? The answer will push you to a place you have never been.

One can meticulously plan an extravagant programming tour or a programmer’s pilgrimage for three months. Yes, that’s what my outlook of <a href="https://recurse.com" target="_blank">RC</a> is! RC is a different place from the usual workplace, meetup, college, or any educational institute. The two striking reasons are the peers and the social rules. If you haven’t thought of attending it, give it a thought. I am jotting down a list of steps to ease the planning.

### What is RC?

The Recurse Center is a free, three-month self-directed program for people who want to get better at programming.

### Application Process

The RC application process is straightforward. The applicant fills in an online form with a few details like programming experience, plans at RC, and a few other things. The exact details are available on the <a href="https://www.recurse.com/apply/retreat" target="_blank">Apply Page</a>. Once you submit the application, an RC interviewer gets back to you in the next few days about the application status.

The first round for shortlisted candidates is a casual Skype chat, a step to get to know the applicant. The conversation lasts from a few minutes to half an hour. The next round is a pair programming round: you pair up with an interviewer to add a simple feature to the code you submitted as part of the application process. The ideal duration is half an hour. Finally, you get the green card or the red card. If you get a red card, you need to wait six more months to apply again. If you get a green card, you’re entitled to jump out of the chair or open a beer bottle and celebrate. You can read about a recurser’s experience of the <a href="https://blog.stanzheng.com/recurse/the-application-process/" target="_blank">interview process</a>.

### Choosing a batch

The application form has a drop-down list of batches you’d like to attend. Normally, the drop-down list contains the next four batches with timelines. At the bottom, you will see how many days in advance to apply for an RC batch.

### Book flight tickets and accommodation

The first and foremost expense for the trip is purchasing flight tickets. You should book a two-way flight ticket, with or without the cancellation option. On average, the flight tickets cost 70K INR (~1050 USD) for a two-way journey.

The trip is for three months, and a hotel stay is out of the equation since it’s expensive. The other viable option is Airbnb. RC has a community mailing list for housing. You may get access to the list a few weeks before the batch. But if you don’t have a US visa, you can’t wait until the last moment; I’ll explain why this is important in a bit. Airbnb charges for the first month’s stay in advance for a three-month stay. If the user cancels the trip, Airbnb will refund the money with some deduction, depending on the terms and conditions. One can book Airbnb for the first month and upgrade/shift depending on the experience. Most of the time, Airbnb has a flat discount for a one-month stay.

By now you must have noticed the savings disappear like melting ice.

### B-1/B-2 Visa - walking on the rope

Every year, lakhs of people from the Indian Union travel to the USA for different purposes like education, business, family visits, conferences, relocation, tourism, etc. As per the 2015 stats available in the <a href="https://en.wikipedia.org/wiki/Tourism_in_the_United_States" target="_blank">Wikipedia article</a>, 1,175,153 tourists visited the USA. If you already have a US visa, skip this section.

Every purpose of the tour has a different class of visa. <a href="https://travel.state.gov/content/visas/en/general/all-visa-categories.html" target="_blank">Here is an exhaustive list of visa types</a> for immigrants and non-immigrants. All grad school students apply for the non-immigrant <a href="https://travel.state.gov/content/visas/en/study-exchange/student.html" target="_blank">F1 visa</a>. The Recurse Center isn’t a typical educational institute since you don’t get a degree at the end of the stay. The motto of RC is `` Never Graduate. `` So you can’t apply for the F1 visa. The alternative option in front of you is the B-1/B-2 visa.

The B-1 visa is for business visitors, and B-2 is for pleasure. If your company is paying for the trip, apply for B-1; else, apply for the B-2 visa. In the end, your visa carries a stamp stating the B-1/B-2 visa type. The permit is valid for ten years.

### Visa application

<a href="https://www.youtube.com/watch?v=38cSiKp5jJY" target="_blank">In the Tamil movie Vaaranam Aayiram  - A thousand elephants, the hero shows up on US visa Center when he decides to fly to meet the girl whom he encountered on a train journey and fell in love with head over heels</a>. That’s one heck of a crazy scene! But the real life picture is entirely different. You need to apply for a visa online. The application process is time-consuming and takes multiple hours to fill. The application requires a bag full of references. Let me put down points I remember.

*   Date of journey and return
*   Purpose of visit - the RC admission letter suffices
*   Accommodation information
*   Flight details
*   Your last two employers’ names
*   Your point of contact in the USA
*   Your relatives’ or friends’ address\[es\] in the USA
*   Your trip sponsor’s information, if you’re paying for the trip yourself
*   Your past crime records :-) Hopefully void!

During the online application process, your session may expire multiple times. The applicant needs to select two dates for the in-person interview. Everyone wants to have their interview as soon as possible. The number of visa interviewers in a center is less than 20, and one interviewer can speak to approximately 20 to 40 applicants in a day. Depending on the day of the year and the season, the next available slot will fall in one to four weeks. During the education season, the interview slots fill up faster, with a minimum of three weeks from the date of application. The <a href="https://travel.state.gov/content/visas/en/general/wait-times.html/" target="_blank">US travel site</a> displays the turnaround based on location.

On the first day, the applicant submits paper documents of residence and passport information. The Center takes the candidate’s photo and collects fingerprints. The first day is tension-free. On the second day, the applicant meets the interviewer for 10 minutes, who will judge your intent and all the carried paper documents. This day is boxing day.

### Criteria

Some people take time off from the job and go to RC; some resign from the job and go to RC.

The visa clearance depends on which of the two cases applies to you and other factors like marital status, loans, plans after RC, etc.

The US embassy loves people who travel for business purposes. That should be a simple guess. What does that mean here? If your company pays for the trip, the chance of visa approval is high.

The prime concern for the interviewer is that the applicant may disappear in the USA and never fly back. Hopefully you heard similar lines in the linked video. To break the fear, you need to answer the questions and show proof of return. Now you must have understood why booking the flight ticket is a must. The return flight ticket can be canceled, so it isn’t conclusive proof of return, but it showcases your plan.

So, What else can be a substantial evidence for flying back?

*   A loan like education loan, car loan, housing loan or personal loan.
*   The latest marriage certificate.
*   A statement from the employer that you will join the current job after the short visit.

If you don’t satisfy the above three criteria and you’re in your twenties, your visa request has a high probability of rejection. That’s the price you pay for being a free bird!

### DS-160

An applicant can try twice consecutively for the visa application. AFAIK, for the third time, the applicant needs to wait for six months to apply. The interviewers for subsequent interviews are different, and there is a high chance of conversion on the second attempt. You need to fill out the <a href="http://www.ustraveldocs.com/in/in-niv-ds160info.asp" target="_blank">DS-160 application</a> to get started on the application process and then pay 160 USD. You will receive email notifications at all phases. Once your payment is through, log in to the <a href="http://www.ustraveldocs.com/in/" target="_blank">ustraveldocs portal</a> and pick an interview slot. You can read the step-by-step procedure on <a href="http://www.ustraveldocs.com/in/in-niv-visaapply.asp" target="_blank">the site</a>. <a href="https://paper.dropbox.com/doc/First-time-in-the-US-9FGsN7v1bkS9TshniZZNU" target="_blank">Here</a> is a document for the first-time traveler.

It took two attempts for me to get a US visa. So one of the safest bets is to obtain a US visa for a conference or business trip and later apply for RC if you have time. If you work at an India-focused startup, there is a high chance the company won’t send you to the USA, so you won’t have a US visa.

The current political climate in the US may bring new rules during your transit to the USA. So the odds keep shifting on all fronts.

At the end of the interview, the concerned officer informs you of the status of the visa request. For selected candidates, the US embassy sends the passport with the visa affixed to one of the specified collection centers within a week.

### Timelines:

*   RC application process: 2 - 3 weeks
*   Visa application process first attempt: 2 weeks
*   Visa application process second attempt: 2 weeks
*   Visa delivery time: 1 week
*   Time to put down paper at the current company or relieve: 1 week to 3 months

So the applicant needs two to three months to complete all the paperwork to attend RC. If you already have a US visa, you can fly next month to attend RC!

### How much does it cost?

Economics played a crucial role in planning the trip.

I will provide two cost breakups. The first one is from a fellow recurser, Satabdi Das, and the second is mine.

She is not a vegetarian and stayed with an RC alumnus who rented out the other bedroom in his 2-bedroom apartment in Brooklyn. Her commute time was between 25-30 minutes (25 minutes when she got the express train - A line) on the <a href="http://web.mta.info/nyct/service/aline.htm" target="_blank">A</a> and <a href="http://web.mta.info/nyct/service/cline.htm" target="_blank">C</a> lines. Usually, she prepared her breakfast. In the beginning, she tried to cook dinner too, but that turned out to be a little time consuming, and there was no dearth of good and cheap options. Most of the meals used to be around $5 to $10 (rarely $10, if she felt like splurging) and sometimes even $2 (thanks to those yummy Chinese buns!). And she used to buy fruits almost weekly.

List of her most frequented places: Halal cart, Falafel sandwich, Hummus sandwich, NY Dosa, Chinese buns from Golden Steamer, Sweet Green.

## Expenses:

*   Rent - 1185 per month = 3555
*   Commute - 115 per month = 345
*   Mobile Bill - 40 per month = 120
*   Food - 2000
*   Laundry - $3 per wash and dry.

Total $6020 (excluding the visa fee, plane fare, and laundry cost)

Thanks Satabdi for sharing the expenses.

The second story is mine.

I am an atheist, vegetarian by choice, and eat eggs. I stayed in Brooklyn and traveled to RC by subway on the <a href="http://web.mta.info/nyct/service/aline.htm" target="_blank">A</a> and <a href="http://web.mta.info/nyct/service/cline.htm" target="_blank">C</a> lines. The subway is operational around the clock. My Airbnb stay was a single 10×16 room with a queen-size bed and a dressing table in a three-floored building. The room came with a shared kitchen, fridge, wifi, and toilet. Only two out of the four rooms on the floor were occupied.

I mostly lived on fatty bread with peanut butter, <a href="https://goo.gl/maps/RdBru89CbmP2" target="_blank">French Sandwich</a>, <a href="https://goo.gl/maps/RuAgKTAfeqz" target="_blank">Falafel Sandwich</a>, <a href="https://goo.gl/maps/QFAATGDjzxF2" target="_blank">Dosa from NY Dosas</a>,<a href="https://goo.gl/maps/rjtu7mgLPj22" target="_blank">Dosa from Hampton Chutney</a>,<a href="https://goo.gl/maps/nPHvz6dn2cm" target="_blank">Veg Bowl from Sweet Green</a>, <a href="https://goo.gl/maps/CoE2Yqpvyd12" target="_blank">Dollar Pizzas</a> and fruits.

## Three months expenses

*   Room Rent - 3000 (1000/month)
*   Subway - 345(115/month)
*   Food - 1700
*   Mobile Bill - 105 (35$/per month with unlimited calls, 4GB/4G data)
*   Misc - 200 (Laundry, Haircut, Sightseeing, etc …)
*   Visa fee - 340 (Two attempts, Travel to Chennai)
*   Flights - 1000

Total = 6690 USD

At 67 INR to 1 USD, the trip cost approximately 4.5 lakhs.

Hope the article gave a sense of the trip planning. If you have any other doubts, feel free to drop me an email.

Thank you Satabdi and Alicja Raszkowska for reviews and comments on the blog post.

- [Return Postgres data as JSON in Python](https://kracekumar.com/post/156769849745/return-postgres-data-as-json-in-python/index.md): Return Postgres data as JSON in Python

+++
date = "2017-02-03 20:47:13+00:00"
draft = false
tags = ["python", "postgres"]
title = "Return Postgres data as JSON in Python"
url = "/post/156769849745/return-postgres-data-as-json-in-python"
+++
Postgres has supported `` JSON `` and `` JSONB `` for a couple of years now. Support for JSON functions landed in version 9.2. These functions let the Postgres server return JSON-serialized data, which is a handy feature. Consider a case: a Python client fetches 20 records from Postgres. The client converts the data returned by the server to a tuple/dict/proxy. The application or web server then converts the tuple back to JSON and sends it to the client. The mentioned case is common in a web application. Not all APIs fit this pattern, but there is a use case.

### Postgres Example

Consider two tables, `` author `` and `` book `` with the following schema.

<div class="gist"><a href="https://gist.github.com/kracekumar/322a2fd5ea09ee952e8a7720fd386184" target="_blank">https://gist.github.com/kracekumar/322a2fd5ea09ee952e8a7720fd386184</a></div>

The Postgres function `` row_to_json `` converts a particular row to JSON data. Here is a list of authors in the table.

<div class="gist"><a href="https://gist.github.com/kracekumar/3f4bcdd16d080b5a36436370823e0495" target="_blank">https://gist.github.com/kracekumar/3f4bcdd16d080b5a36436370823e0495</a></div>

This is simple; let me show a query with an inner join. The `` book `` table contains a foreign key to the `` author `` table. While returning a list of books, including the author name in the result is useful.

<div class="gist"><a href="https://gist.github.com/kracekumar/eb9f1009743ccb47df2b3a5f078a4444" target="_blank">https://gist.github.com/kracekumar/eb9f1009743ccb47df2b3a5f078a4444</a></div>

As you can see, the query construction is verbose. The query has an extra select statement compared to a normal query. The idea is simple: first do an inner join, then select the desired columns, and finally convert to JSON using `` row_to_json ``. `` row_to_json `` is available since version `` 9.2 ``. The same functionality can be achieved using other functions like `` json_build_object `` in 9.4. You can read more about it in the <a href="https://www.postgresql.org/docs/9.4/static/functions-json.html" target="_blank">docs</a>.

### Python Example

The Postgres drivers `` psycopg2 `` and `` pg8000 `` handle the JSON response, but the result is parsed and returned as a tuple/dictionary. That means if you execute raw SQL, the returned JSON data is converted to a Python dictionary using `` json.loads ``. Here are the functions that facilitate the conversion in <a href="https://github.com/psycopg/psycopg2/blob/51aa166d5219bf6bcda1f68f33399c930113a1f1/lib/_json.py#L109" target="_blank">psycopg2</a> and <a href="https://github.com/mfenniak/pg8000/blob/4712bd870fec11b10b961a12b54ef7ccb0f70790/pg8000/core.py#L1433" target="_blank">pg8000</a>.

<div class="gist"><a href="https://gist.github.com/kracekumar/2d1d0b468cafa5197f5e21734047c46d" target="_blank">https://gist.github.com/kracekumar/2d1d0b468cafa5197f5e21734047c46d</a></div>

psycopg2 converts the returned JSON data to a list of tuples containing dictionaries.

One way to circumvent the problem is to cast the result as text. The Python drivers don’t parse text, so the JSON format is preserved.

<div class="gist"><a href="https://gist.github.com/kracekumar/b8a832cd036b54075a2715acf2086d62" target="_blank">https://gist.github.com/kracekumar/b8a832cd036b54075a2715acf2086d62</a></div>

Carefully view the printed results. The printed result is a list of tuples, each holding a string.
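
Here is a minimal sketch of the text-cast workaround; the connection details and the `` id ``/`` name `` columns are assumptions, the gists above carry the exact schema:

    import psycopg2

    conn = psycopg2.connect("dbname=db user=user password=password host=localhost")
    with conn.cursor() as cur:
        # Without the cast, psycopg2 parses the JSON into dicts.
        cur.execute("select row_to_json(t) from (select id, name from author) t")
        print(cur.fetchall())   # e.g. [({'id': 1, 'name': '...'},), ...]
        # With ::text, the JSON arrives as a plain string, untouched.
        cur.execute("select row_to_json(t)::text from (select id, name from author) t")
        print(cur.fetchall())   # e.g. [('{"id":1,"name":"..."}',), ...]
    conn.close()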

For SQLAlchemy folks, here is how you do it:

<div class="gist"><a href="https://gist.github.com/kracekumar/287178bcb26462a1b34ead4de10f0529" target="_blank">https://gist.github.com/kracekumar/287178bcb26462a1b34ead4de10f0529</a></div>

Another way to run SQL statement is to use <a href="http://docs.sqlalchemy.org/en/latest/core/sqlelement.html?highlight=text#sqlalchemy.sql.expression.text" target="_blank">`` text `` function</a>.

The other workaround is to unregister the JSON converter. These two lines should do it:

    import psycopg2.extensions as ext
    # Remove the JSON typecaster so the driver returns the raw JSON string.
    ext.string_types.pop(ext.JSON.values[0], None)

Here is a relevant issue in <a href="https://github.com/psycopg/psycopg2/issues/172" target="_blank">psycopg2</a>.

- [RFCS We Love](https://kracekumar.com/post/156493415835/rfcs-we-love/index.md): RFCS We Love

+++
date = "2017-01-28 17:47:02+00:00"
draft = false
tags = ["rfcs_we_love"]
title = "RFCS We Love"
url = "/post/156493415835/rfcs-we-love"
+++
<figure class="tmblr-full" data-orig-height="400" data-orig-src="https://pbs.twimg.com/profile_images/824954998295404545/lN8AVM2L_400x400.jpg" data-orig-width="400"><img data-orig-height="400" data-orig-src="https://pbs.twimg.com/profile_images/824954998295404545/lN8AVM2L_400x400.jpg" data-orig-width="400" src="https://66.media.tumblr.com/5859dd19b30a1f6ae958ab2a8285d452/tumblr_inline_pjzofzv8AP1qc390z_540.jpg"/></figure>

A simple question can open a door for new exploration. While I was at <a href="https://recurse.com" target="_blank">RC</a>, I tweeted “Is there any meetup group similar to papers we love for discussing RFCS?”. In an interval of nine minutes, <a href="https://twitter.com/jaseemabid" target="_blank">Jaseem</a> replied: “Let’s start one :)”

<blockquote class="twitter-tweet" data-lang="en"><p dir="ltr" lang="en">Is there any meetup group similar to <a href="https://twitter.com/papers_we_love" target="_blank">@papers_we_love</a> for discussing RFCs?</p>— kracekumar (@kracetheking) <a href="https://twitter.com/kracetheking/status/803851332973105152" target="_blank">November 30, 2016</a></blockquote>

<script async="" charset="utf-8" src="//platform.twitter.com/widgets.js"></script>

The above discussion on Twitter led to a new focus group, <a href="https://twitter.com/rfcs_we_love" target="_blank">RFCS We Love</a>, in Bangalore by <a href="https://twitter.com/captn3m0" target="_blank">Nemo</a>, <a href="https://twitter.com/jaseemabid" target="_blank">Jaseem</a>, <a href="https://twitter.com/iavins" target="_blank">Avinash</a> and <a href="https://twitter.com/kracetheking" target="_blank">Kracekumar</a>. The first meetup was held <a href="https://github.com/rfcswelove/rfcs_we_love#meetup-1" target="_blank">today, 28-01-2017</a>. 15 interested and enthusiastic people attended the meetup. <a href="https://twitter.com/ChillarAnand" target="_blank">Anand</a> and <a href="https://twitter.com/xgovindx" target="_blank">Govind</a> presented <a href="https://tools.ietf.org/html/rfc3629" target="_blank">RFC 3629 (UTF-8)</a> and <a href="https://tools.ietf.org/html/rfc6238" target="_blank">RFC 6238 (TOTP)</a> respectively. The slides and video links are available in the <a href="https://github.com/rfcswelove/rfcs_we_love#meetup-1" target="_blank">GitHub README</a>.

The Indian Union recognises <a href="https://en.wikipedia.org/wiki/Languages_with_official_status_in_India#Official_languages_of_the_Union" target="_blank">two official languages</a> and <a href="https://en.wikipedia.org/wiki/Languages_with_official_status_in_India#Eighth_Schedule_to_the_Constitution" target="_blank">twenty-two scheduled languages</a>, with no national language. For that reason, presenting Unicode as the first talk of the new experiment is apt, and I can’t ask for a better start. It was pure coincidence to have the Unicode RFC in the first-ever meetup. I don’t know whether Anand thought along the above lines.

If RFCS interests you, and you’re in Bangalore, follow the group and join the meetup. If you’re in any other city, you can start one too!

A journey of a thousand miles begins with a single step - <a href="https://en.wiktionary.org/wiki/a_journey_of_a_thousand_miles_begins_with_a_single_step" target="_blank">Lao</a>.

- [Expose jupyter notebook over the network](https://kracekumar.com/post/156322146345/expose-jupyter-notebook-over-the-network/index.md): Expose jupyter notebook over the network

+++
date = "2017-01-24 20:21:21+00:00"
draft = false
tags = ["python", "jupyter", "ipython"]
title = "Expose jupyter notebook over the network"
url = "/post/156322146345/expose-jupyter-notebook-over-the-network"
+++
What is the Jupyter notebook?

>  
> The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualisations and explanatory text.
> 

The definition is from the <a href="https://jupyter.org/" target="_blank">official site</a>. I use `` IPython/Jupyter `` shell all time. If you haven’t tried, spend 30 minutes and witness the power!

At times, I want to share code snippets with folks in the same building during work, a workshop, or a training session. The Jupyter notebook configuration allows the user to expose the notebook, cluster, and terminal over the network or the internet. The notebook becomes available over the network with two changes in the configuration file. The first config value is an IP other than the default localhost. The second one is the password for users connecting to the notebook.

*   Run the command `` jupyter notebook --generate-config `` to generate the config file. The command creates the configuration file at the path `` ~/.jupyter/jupyter_notebook_config.py ``. By default, the server listens on localhost. Modify the IP address to the local machine IP. You can use `` ifconfig `` to find out the IP and change the value of `` c.NotebookApp.ip `` to that IP, like `` c.NotebookApp.ip = '192.168.0.101' ``.
*   Next, find the line which contains the `` c.NotebookApp.password `` declaration. Then execute `` from notebook.auth import passwd; passwd() `` in the IPython/Jupyter shell to generate a password hash. Enter the password, copy the output, and assign it to the attribute `` c.NotebookApp.password ``. Now run the server and open the URL `` http://IP:8888 `` from another PC or mobile to browse notebooks after entering the correct password. The resulting config lines look like the sketch below.
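
Here is a minimal sketch of the two edited lines in `` ~/.jupyter/jupyter_notebook_config.py ``; the IP and the hash are placeholders, so generate your own with `` passwd() ``:

    # Listen on the machine's LAN IP instead of the default localhost.
    c.NotebookApp.ip = '192.168.0.101'
    # Hashed password generated by notebook.auth.passwd().
    c.NotebookApp.password = 'sha1:<salt>:<hashed-password>'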

The IP address can also be overridden using the `` --ip `` flag, like `` jupyter notebook --ip 192.168.0.100 ``.

- [RC Return Statement](https://kracekumar.com/post/155718711280/rc-return-statement/index.md): RC Return Statement

+++
date = "2017-01-11 14:31:45+00:00"
draft = false
tags = ["recursecenter"]
title = "RC Return Statement"
url = "/post/155718711280/rc-return-statement"
+++
<a href="https://www.recurse.com/" target="_blank">The Recurse Center</a> is a free, three-month self-directed program for people who want to get better at programming. I attended fall batch 2 in 2016 from September to December. If you aren’t aware of Recurse Centre, take a couple of minutes to go through recurse.com and read the rest of the piece.

<img height="500" src="https://lh3.googleusercontent.com/VICVgbLCX3Io0EB_GLnKwBxsQ5SaRKmhuaDnhu3nIHwbTJq8smWmOUhd2OntANm2Euihozka1rH2TlCwkBWMZP-P6OkTz1BN5Abg9e8NQC09jtgPtkszI9iKZCn8-ZLQKWfCs_r7bU6q_mMfiq3c_jeTtycZNCJeyOE9IklVQdZFDBhDNHlzHYiULner4RLYKHTs5rTCL3zDwv6FOwVUDAIQ6oVFifx7Bh8S-_B6dBuCDFSgvkY40rlmyzQWaCXv9U_t4VmCi360VUGIGhUBUX-jLWq5ujp1WXekU9ePdznh3IUObRubgh_2hX_6LPFYqw8wE8BgbRHw1Nr8kteG5bJbtYk5SiqbgZWAc3fMa5CwjFYPDMzDYeURum2jaCRaa_85qHu5D_RxtUgypry4MMc8VkjHI-hzsUJCHDlfL9TChj4HX6bATt3ldopx_CkHg4EguRoc_5rvRyP-X3ULysr9zjc2tczvsovWZfiTIC2pX0ezz8DhdfzKa2qMw9F-lEPFIMAZgedwj6jWt0kTMbkrfczKqOzGQ1QjrPjohHMl5ynpjIXbuM5fgLXbxM77vbrOo1K_hK6-UnQCx4Ih_m4q0LMjw9LpMzUzq17D2fz5vdlK4pD6NVL-dvClX-wS2g45e99NhMZdLgKkJfdN2oX-Rb54w4tbPi8hD3CltrQ=w743-h990-no" width="500"/>

Every day we open a significant number of doors to get to desired places. Some doors are unique but look dull and plain, yet the incredible universe behind the door hides excitement, adventures, gems, friends, and insights.

When I first walked through the door on the first day, I was blank and didn’t know what to expect from RC. Over the period, I programmed more, learned programming languages and lessons, picked up tools and thought processes, made friends, etc…

### Work

<img height="500" src="https://lh3.googleusercontent.com/uUyc-3yCw7-egBFhSkOv-qFAUFED5ZEXK4_Elwoq7THicrGfx2dXo7Nj_JssnW96wNk1xwB69ks=w1251-h990-no" width="500"/>

During the twelve-week experimentation, I worked on four projects.

*   I wrote a <a href="https://github.com/kracekumar/bt" target="_blank">BitTorrent Client</a> in Python.
*   <a href="https://github.com/kracekumar/imon" target="_blank">Imon</a> - Internet bandwidth monitor in Rust
*   I worked on <a href="https://github.com/zulip/zulip/pull/2615" target="_blank">Zulip</a> to add reaction backend missing API.
*   Incomplete implementation of <a href="https://github.com/kracekumar/edht" target="_blank">Amazon’s DynamoDB paper</a> in Erlang.

### Learning

<img height="500" src="https://lh3.googleusercontent.com/71CF7rwXs5tAZ3lHgRIrpfNbc4PosiEf8BKBBFHlfGfMr8W8-JBDFYQ2pgLr54l88M6VYdAaFo4Fo8ZyT80WUodJtjgbPciHFz6AE-ccormsBJf5wFyaJ5ZVNPw5dAqIVEzPhg2IX3AlNe8tvGGvAHy91HhZgKdEW2YBtN4oimcOTmlfqib0z7zt2_BwC23aSu0qqAHlkXd9ZjIOMb2A1p8q12ojO4E8R_dYLWzgUUN75McjHbR-ezPEGBD_M7DQlP2pqTxB11gfqcxlUsOZjlUwrO5XFNgdkuk9ZGSS2Ihr4CuxWILfXuVtr6hgzkMQjheNmt2FFOmoGMFJlcyUh_s9P1PIphMEEmcvORlfV-4lkBHrw8lmdKhnDRaoEb_lC0njwiWymB-u-P9vO168Mo12K1doYVsOKsLFgnxuOQTIjfsWPROAl_vyz3Mu_fjXnQ8lWeJh4yDuyNUrzkwMkDA_2LRxUvWdb3AE2_53gLjHuwTB72-TmSKPYeq_mbH8SAjwRJ89OkLMYyiFVDGrELShp2X5yCQSkw8sj4TDLsxM14e6_51LUPXaidX5KpEOjshPIq3OTEAqV3fivbb79XcXFkfyyPSX86YoCyg-tKJJ0MHOFHmYvD9xih4PS1e09htoamjJ3aYQ5VuS-qwWy_lNm773sR5kNnKMXSLVO3E=w743-h990-no" width="500"/>

At RC, I worked on tasks which were difficult to work on at other times. Writing down what I learned is hard for various reasons. I am jotting down the points I remember.

*   I picked up two new languages, Rust and Erlang, which I am euphoric about.
*   I rekindled my interest in systems, network and distributed programming.
*   RC is the only place where I have seen folks who aren’t afraid to say `` I don't know ``. This attitude is valuable and activates the mind to discover. I picked up the habit of saying `` I don't know, what does that mean? ``.
*   I learned a few things about `` compassion, empathy, kindness, gratitude and generosity `` and dropped a bit of arrogance.
*   How to prototype experimental projects: don’t worry about file structures, modularity, optimization, the right abstraction, etc…
*   A better state of mind to focus on programming and a curiosity to learn and discover.
*   I have previously pair programmed at work, but it was never fun. Now I have a clear expectation of pair programming and know how to handle conflicts when it works and when it fails.
*   <a href="https://twitter.com/r0ml" target="_blank">R0ml’s</a> talk on writing all the code in a single file was intellectually stimulating. I practiced this technique in the project <a href="https://github.com/kracekumar/edht/blob/master/src/edht_sup.erl" target="_blank">edht</a>. It saved a considerable amount of time. I considered refactoring the code into multiple files multiple times but resisted. Question existing practices and assumptions to find the real reason for their existence and validate personal reasons.

### Difficult times

<img height="500" src="https://lh3.googleusercontent.com/q5K3OUy__xckCoEPdNCRz7ITo0n_O69KwU5ODPEsrNwH_3rd-Ol5FKqhHL35kz-nIXCNHYe8ARrNYt28AYTSJAZNstFxaooe8R8Cl5zzXagSh1CyfVclyh-aNgyv8PD8U3Cyk_pSXLsSvFuqiWCOhJ0kCsv-d3Crw54APOeIWIcnzbzGYEz0TGKge01tq73_DH4Moa3vvWchWDVK-qg1dBc6TNBQlnPFATX3kh5pltclbaquKvYzRZd7jSIoxRi6Byy9bgT8TWX7jTaTjP2eYhZHOAI0t8Zt_x-buAQd2zF2QId2MTomK2PQxe1N4vGtT8UfEwfwH2Surc9YNIYNUB61ZJG0xPlcg8y_f3XveMnWbbk7WunfRT9u5TZ28moqMh2E2JQKBTj7BojaM5mz-FAJjiaxA7O_TQoyp0dg1zdqbn31Xu0PiOvijJn4xFIFkDHvHCX7Ll1MzbP0xfXkTAbwCVenJN4onkMh0ZpxUGJSCZ-E9XYjLJyifG6mer0DAoBV95zsc0ZzQEwhLZBjkFat2op0qjoUloOj-vPHBpOimrh8NfyC_VXnVXVbe_3SfUxHLqe-Oq9dSPqP3pzMBcFM_txAu9-WwHSr4lHLBbq8_9Y8b0JiF5nmn327MFS2iev2I44_rMg1XoQOlAHytyHjd-yIGt6x0UVb0lAeOFo=w743-h990-no" width="500"/>

I had my dark days (I am not talking about winter evenings), days when it was difficult to concentrate and I felt lost. Sometimes I spent the whole day staring at the code and debugging. I felt depressed when I couldn’t make progress. Reading the Dynamo paper was hard; the paper is intended for a different audience. As a developer, I read documentation and man pages every day, but the paper expects the reader to be familiar with concepts in distributed computing.

### Next

It’s been four weeks since the batch got over. Every day I try to learn something new in programming and learn from the Zulip discussions. Concentrating on everything going on in different streams is hard. That’s how I think I can grow into a better programmer: push the boundary of what you know and try to grasp different topics.

### What I think about RC

RC is a gift to programmers, and three months is a sound investment in a programming career. The place and its events teach a lot of technical and non-technical skills. The <a href="https://recurse.com/manual" target="_blank">social rules</a> of RC are what make the place welcoming and put everyone at home. RC transforms everyone into a better programmer, or at least puts you on that track, and makes you feel like a better human being. I won’t claim I became a better programmer, but I can sense the difference in attitude and the steps towards improving myself. RC shaped me into a better person! I believe that to become a better programmer you need to be a better person.

I am honored to be part of RC and grateful it exists. Thanks to batchmates, alumni, and staff for providing an opportunity to grow. Special thanks to <a href="https://twitter.com/jmsdnns" target="_blank">James Dennis</a>, who introduced me to the Recurse Center (Hacker School at that time), and to <a href="https://twitter.com/suren" target="_blank">Suren</a> and <a href="https://twitter.com/punchagan" target="_blank">Punchagan</a> for helping with RC planning.

It was hard to close the door on the last day, knowing I would miss the cherished space.

- [Grokking algorithm: Book Review](https://kracekumar.com/post/155490330815/grokking-algorithm-book-review/index.md): Grokking algorithm: Book Review

+++
date = "2017-01-06 18:32:51+00:00"
draft = false
tags = ["algorithm", "bookreview"]
title = "Grokking algorithm: Book Review"
url = "/post/155490330815/grokking-algorithm-book-review"
+++
<img height="700" src="https://lh3.googleusercontent.com/XHxwHUqOHWdcb8jmVqYgsgMrbb07mcA2PhvD1xhik5Zku6qG33gAid4ruOYkW-87tmsh3FGRPbMvtM2SaDBXw53Gz0ku103gI297T08BUbA1LG_p2BQSO3c5BgG1kSdnqYSkGYPxOT0UYQ8fpYD9ulNZN8KjVBF9Rvxi3q_Yq4DWM4ZM_8-wuPcInDmzgY1eK1QFT0xQmVsRnyqOPD5db9bk7gr4rPNPArm86JmoLCMbp6_emhxrohPDevrmAFPx1oOTZJex5giA2wYXQSsJ4g_X_yAXy6PZZFWjY_j6U-u9cnndDUKlQOHdlIWIFfFDsO4F1DiU6J7NiokvO4m_QGXyRDP1dum3xjtxBoy-6oUNnDaNL_6eO-7LImhZaL44rU0e0hL_jN2P1tBwN9zwy7VR7z-uYvT02l6KYW8FbCSSe7d19vfsOnmRdvg-KcbkF2hYWzlyMHB3zZoIEB-d06gmBWB5PXWShdWaRnW4XRg4DgOJ_etuD8GDu8fl8Buib2pfqlyPbdQNHPmTAn07w-88uC6uuZXoElvo8lBCm2-Rf6VmNUGdKcIc_iMfp5q-kQ_XSLFFcN43WWa2kXu47bgB55JhJGpSxTWtN537zJ7TPkJw4ovOsg=w743-h990-no" width="550"/>

<a href="https://www.manning.com/books/grokking-algorithms" target="_blank">Grokking Algorithms - An illustrated guide for programmers and other curious people</a> is a book by Aditya Y. Bhargava about understanding algorithms. The book is different from other algorithm books: it explains the chosen concepts with illustrations, in the spirit of cartoons and xkcd, and doesn’t let the reader get lost in a sea of mathematical formulas and procedures.

The book discusses a handful of foundational topics in algorithms like `` Big O notation, Binary Search, Sorting, Recursion, Hash Tables, graphs, greedy algorithms, dynamic programming ``. Here is the link to the <a href="https://www.manning.com/books/grokking-algorithms" target="_blank">Table Of Contents</a>. The book is straightforward and easy to follow. The pictures with detailed walkthroughs make learning algorithms fun, lively, and joyous. The book covers multiple worked-out (in pictures!) problems for harder concepts like breadth-first search and the knapsack problem.

Here is a pictured solution for a knapsack problem. Suppose you’re a robber (I know you aren’t one in real life) who broke into a store with a knapsack of capacity 4 lbs (no metric system :-(). There are three items, `` stereo, laptop, guitar ``, of different weights and worth. Come up with an algorithm to steal the most valuable set of items that fits in the knapsack.

Here is how the worked-out solution looks:

<img alt="solution" height="400" src="https://raw.githubusercontent.com/egonSchiele/grokking_algorithms/master/images/09_dynamic_programming/092_knapsack_problem_faq/reordered_rows.png" width="500"/>

If you are familiar with dynamic programming, the explanation and pseudocode may fit on a page or two, and a solution in Python may run about 50 lines. The worked-out solution spans nine pages with detailed steps and the progression of the data structures at each step! The explanation with images makes learning algorithms exciting and less frightening.
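
If you want to compare the book’s nine-page walkthrough with code, here is a compact Python sketch of the same tabular idea. The weights and values below are illustrative, not quoted from the book:

    def knapsack(items, capacity):
        # best[w] holds (total value, chosen items) for a bag of size w
        best = [(0, [])] * (capacity + 1)
        for name, weight, value in items:
            # walk capacities downward so each item is picked at most once
            for w in range(capacity, weight - 1, -1):
                candidate = best[w - weight][0] + value
                if candidate > best[w][0]:
                    best[w] = (candidate, best[w - weight][1] + [name])
        return best[capacity]

    # weights (lbs) and values ($) are made up for illustration
    items = [("stereo", 4, 3000), ("laptop", 3, 2000), ("guitar", 1, 1500)]
    print(knapsack(items, 4))  # (3500, ['laptop', 'guitar'])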

<a href="https://manning-content.s3.amazonaws.com/download/2/656dfe9-e8fa-49ac-ac80-46584976ff49/GrokkingAlgorithms_SampleChapter6.pdf" target="_blank">Here is a free sample chapter on the breadth-first search algorithm</a>. All the source code and images used in the book are available on <a href="https://github.com/egonSchiele/grokking_algorithms" target="_blank">GitHub</a>. The cool part is that the images are free for non-commercial use. If you’re teaching algorithms, you can use them in your class!

The book has over 400 pictures. It is not a reference book, but it is a good one to get started on algorithms.

- [2016 Books](https://kracekumar.com/post/155204597600/2016-books/index.md): 2016 Books

+++
date = "2016-12-31 11:43:21+00:00"
draft = false
tags = ["books"]
title = "2016 Books"
url = "/post/155204597600/2016-books"
+++
I was watching a <a href="https://www.youtube.com/watch?v=34e1Cczkcmc" target="_blank">video</a> about books by one of my favorite Tamil writers, <a href="https://en.wikipedia.org/wiki/S._Ramakrishnan" target="_blank">S Ramakrishnan</a>. Somewhere in the middle of the talk, he suggests the audience write down the list of books they read this year and share it with others. Irrespective of how short or tall the list is, undoubtedly someone will pick a book from it someday. I bought the idea and am documenting my book list.

### English

#### Read

*   <a href="https://www.goodreads.com/book/show/14891980-the-5-elements-of-effective-thinking" target="_blank">The 5 Elements of Effective Thinking</a>
*   <a href="https://www.goodreads.com/book/show/8520610-quiet" target="_blank">Quiet: The Power of Introverts in a World That Can’t Stop Talking</a> - The book is about introverts, what constitutes who they are, and their world - <a href="http://kracekumar.com/post/152278984540/quiet-book-review" target="_blank">My Review</a>
*   <a href="https://www.goodreads.com/book/show/464260.The_Fire_Next_Time" target="_blank">The Fire Next Time</a>
*   <a href="https://www.goodreads.com/book/show/71728.Jonathan_Livingston_Seagull" target="_blank">Jonathan Livingston Seagull</a>
*   <a href="https://www.goodreads.com/book/show/7613.Animal_Farm" target="_blank">Animal Farm</a> - Satire on Russian Revolution - <a href="http://kracekumar.com/post/145749737410/animal-farm-review" target="_blank">My Review</a>
*   <a href="https://www.goodreads.com/book/show/485894.The_Metamorphosis" target="_blank">The Metamorphosis</a>
*   <a href="https://www.goodreads.com/book/show/6114128-outliers" target="_blank">Outliers: The Story of Success</a> - the factors that contribute to high levels of success
*   <a href="https://www.goodreads.com/book/show/17204679-man-s-search-for-meaning" target="_blank">Man’s Search for Meaning</a> - Hope during adversity - <a href="http://kracekumar.com/post/152845373255/mans-search-for-meaning-book-review" target="_blank">My Review</a>
*   <a href="https://www.goodreads.com/book/show/14059029-how-to-stay-sane" target="_blank">How to Stay Sane</a>

#### Currently reading

*   <a href="https://www.goodreads.com/book/show/7015403-the-gifts-of-imperfection" target="_blank">The Gifts of Imperfection: Let Go of Who You Think You’re Supposed to Be and Embrace Who You Are</a>
*   <a href="https://www.goodreads.com/book/show/71730.Nonviolent_Communication" target="_blank">Nonviolent Communication: A Language of Life</a>
*   <a href="https://www.goodreads.com/book/show/22847284-grokking-algorithms-an-illustrated-guide-for-programmers-and-other-curio" target="_blank">Grokking Algorithms An illustrated guide for programmers and other curious people</a>
*   <a href="https://www.goodreads.com/book/show/6149162-mirrors" target="_blank">Mirrors: Stories of Almost Everyone</a> - World historic events in few sentences
*   <a href="https://www.goodreads.com/book/show/17690.The_Trial" target="_blank">The Trial</a>
*   <a href="https://www.goodreads.com/book/show/18373.Flowers_for_Algernon" target="_blank">Flowers for Algernon</a>

#### Dropped the ball

*   <a href="https://www.goodreads.com/book/show/5544.Surely_You_re_Joking_Mr_Feynman_" target="_blank">Surely You’re Joking, Mr. Feynman!</a>
*   <a href="https://www.goodreads.com/book/show/17470674-fahrenheit-451" target="_blank">Fahrenheit 451</a>
*   <a href="https://www.goodreads.com/book/show/7507825-at-home" target="_blank">At Home: A Short History of Private Life</a>
*   <a href="https://www.goodreads.com/book/show/25666050-algorithms-to-live-by" target="_blank">Algorithms to Live By: The Computer Science of Human Decisions</a>

### Tamil

#### Read

*   <a href="https://www.goodreads.com/book/show/24244705" target="_blank">உணவு யுத்தம் - Food war - Essays and Facts</a> - <a href="https://www.goodreads.com/review/show/1522038158" target="_blank">My Review</a>
*   <a href="https://www.goodreads.com/book/show/29069846" target="_blank">வாளோர் ஆடும் அமலை - Victory dance(not greatest translation) - History</a>
*   <a href="https://www.goodreads.com/book/show/12084194" target="_blank">அப்போதும் கடல் பார்த்துக்கொண்டிருந்தது - Sea was seeing even that time - Short stories</a>
*   <a href="https://www.goodreads.com/book/show/23571233" target="_blank">உரைகல் - Touchstone - History &amp; Analysis</a>
*   <a href="https://www.goodreads.com/book/show/29539381" target="_blank">சொல்லத் தோணுது - Wanting to say - Political commentary</a> - <a href="https://www.goodreads.com/review/show/1618578119" target="_blank">My Review</a>
*   <a href="https://www.goodreads.com/book/show/23389562-pookuzhi" target="_blank">பூக்குழி - Death pit - Novel</a>
*   <a href="https://www.goodreads.com/book/show/22012022-konguthaer-vaazhkkai" target="_blank">Konguthaer Vaazhkkai - Short stories</a> - <a href="https://www.goodreads.com/review/show/1636281974" target="_blank">My Review</a>
*   <a href="https://www.goodreads.com/book/show/30293551" target="_blank">ஞாயிற்றுக் கிழமைப் பள்ளிக்கூடம் - Sunday School - Modern Poetry</a>
*   <a href="https://www.goodreads.com/book/show/30293553" target="_blank">பறவையின் நிழல் - Bird’s shadow - Modern Poetry</a>
*   <a href="https://www.goodreads.com/book/show/24482410" target="_blank">மிதவை - Drift wood - Novel</a>
*   <a href="https://www.goodreads.com/book/show/29970319-thigambaram" target="_blank">திகம்பரம் - Nude - Essays</a>
*   <a href="https://www.goodreads.com/book/show/30829491" target="_blank">கைம்மண் அளவு - Fistful of sand - Essays</a>
*   <a href="https://www.goodreads.com/book/show/18744456" target="_blank">அந்தரக்கன்னி</a>
*   <a href="https://www.goodreads.com/book/show/31076019" target="_blank">டாங்கிகளில் சரியும் முல்லை நிலா - Jasmine moon deteriorating in tanks - Poetry</a> - <a href="https://www.goodreads.com/review/show/1696058665" target="_blank">My Review</a>
*   <a href="https://www.goodreads.com/book/show/30374989" target="_blank">இடக்கை - Left hand - Novel</a>
*   <a href="https://www.goodreads.com/book/show/17560718-kelvikuri" target="_blank">kelvikuri - Question mark - Philosophy</a>

If you have read this far, you care about books. Write a blog post and share your list with the world. I’d be happy to see your reading list. Happy reading!

- [RC Week 1011](https://kracekumar.com/post/154669792910/rc-week-1011/index.md): RC Week 1011

+++
date = "2016-12-19 08:21:35+00:00"
draft = false
tags = ["recursecenter"]
title = "RC Week 1011"
url = "/post/154669792910/rc-week-1011"
+++
### EDHT

I continued to work on the project and added a few features.

*   Data replication - One of the main features of a DHT is replicating the data across a subset of nodes in the cluster. Remember, not all nodes! Given the number of copies to store, say N, the data is stored in the `` N - 1 `` nodes following the primary node in the anti-clockwise direction (see the sketch after this list).
*   Routing - Every node in the cluster has equal responsibility, and there is no master. For the key `` metamorphosis ``, the data is stored in the nodes `` n1, n2, n3 `` as per consistent hashing. Say node `` n4 `` receives the `` GET `` request for the key `` metamorphosis ``. Node `` n4 `` doesn’t have the data and acts as the coordinator. The coordinator forwards the request to any one of the nodes `` n1, n2, n3 ``. Depending on the configured minimum number of successful responses, the receiving node queries the other nodes, collates the responses, and sends the result back to the coordinator.
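
Here is a rough Python sketch (not the Erlang project code) of picking the primary node and its replicas off a consistent-hash ring; the hash function and node names are placeholders:

    import bisect
    import hashlib

    def position(value):
        # hash a node name or key onto the ring
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes, copies=3):
            self.copies = copies
            self.ring = sorted((position(n), n) for n in nodes)

        def preference_list(self, key):
            # primary = first node at or past the key's position;
            # replicas = the next copies - 1 distinct nodes on the ring
            points = [p for p, _ in self.ring]
            start = bisect.bisect(points, position(key)) % len(self.ring)
            return [self.ring[(start + i) % len(self.ring)][1]
                    for i in range(min(self.copies, len(self.ring)))]

    ring = Ring(["n1", "n2", "n3", "n4"])
    print(ring.preference_list("metamorphosis"))  # e.g. ['n2', 'n3', 'n4']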

I read multiple articles about vector clocks - <a href="http://basho.com/posts/technical/why-vector-clocks-are-easy/" target="_blank">vector clocks are easy</a>, <a href="http://basho.com/posts/technical/why-vector-clocks-are-hard/" target="_blank">vector clocks are hard</a>, <a href="http://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks" target="_blank">why Cassandra doesn’t need vector clocks</a>. The vector clock is the next feature to implement in the project. A vector clock’s primary use is conflict resolution; the Dynamo paper suggests offloading the conflict resolution to the client.
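
Since vector clocks keep coming up, here is a minimal Python sketch of the idea - my own illustration, not code from any of the linked posts:

    def increment(clock, node):
        # a node bumps its own counter on every local write
        clock = dict(clock)
        clock[node] = clock.get(node, 0) + 1
        return clock

    def descends(a, b):
        # True if clock `a` has seen everything clock `b` has
        return all(a.get(node, 0) >= count for node, count in b.items())

    def merge(a, b):
        return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}

    v1 = increment({}, "n1")                   # write handled by n1
    v2 = increment(v1, "n2")                   # update routed to n2
    v3 = increment(v1, "n3")                   # concurrent update on n3
    print(descends(v2, v3), descends(v3, v2))  # False False -> conflict
    print(merge(v2, v3))                       # counters: n1=1, n2=1, n3=1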

### Good luck

Thursday was cold in New York, -4 degrees Celsius, and it was the end-of-batch party. The evening was an emotional roller coaster. In the last twelve weeks, I picked up different perspectives on programming and thinking, had multiple thoughtful conversations, developed interests in particular streams, made personal discoveries, made friends, and more. Thanks to all the batchmates and alumni for making the stay memorable, educational, and caring, and for helping in my development. I learned from every one of you through personal conversations, presentations, and observations. Thanks to all the RC staff, without whom the batch wouldn’t be possible. Of course, I feel grateful to the US Embassy in Chennai for granting the visa on my second attempt - the first officer rejected the visa request stating I may not return to India after the short overseas visit.

Good luck to everyone for the future!

- [RC Week 1010](https://kracekumar.com/post/154363774320/rc-week-1010/index.md): RC Week 1010

+++
date = "2016-12-12 05:18:55+00:00"
draft = false
tags = ["recursecenter"]
title = "RC Week 1010"
url = "/post/154363774320/rc-week-1010"
+++
This week has mostly been calm and cold in New York.

### EDHT

The <a href="http://github.com/kracekumar/edht" target="_blank">Distributed Hash Table</a> implementation in Erlang is slowly coming along. The project now supports multi-node communication.

The project uses <a href="https://github.com/basho/bitcask/" target="_blank">bitcask</a>, which <a href="https://github.com/basho/riak" target="_blank">riak</a> uses. Erlang’s built-in key/value store data is local to a single process, and building a key/value store from the ground up means reinventing the wheel and is time consuming, so leveraging an existing library made sense. Bitcask takes care of persisting the data to disk, and so far data access has been free of race conditions.

<a href="https://en.wikipedia.org/wiki/Consistent_hashing" target="_blank">Consistent hashing</a> is the key part of building a distributed hash table. With partitioning and replication, the data remains available when one or more nodes are down. <a href="http://michaelnielsen.org/blog/consistent-hashing/" target="_blank">Michael Nielsen’s post</a> on consistent hashing, with a Python implementation, was helpful in understanding the key concepts. The post doesn’t talk about replication. The <a href="https://github.com/carlosgaldino/concha/" target="_blank">Concha</a> library in Erlang implements consistent hashing and has a nice API.

Read/write data partitioning and replication are the next steps!

My curiosity about Erlang’s concurrency is growing day by day.

### Htop

<a href="http://hisham.hm/htop/" target="_blank">Htop</a> is my go-to tool for process memory debugging and process information. I use it on `` GNU/Linux `` and `` OSX ``. <a href="https://peteris.rocks/blog/htop/" target="_blank">I read the post by Pieter - htop explained</a>. The post explains how htop works, how Linux handles processes, and where and how running-process information is stored. The post is useful for anyone interested in systems.

### Times Square

On Saturday, 10th Dec 2016, I visited <a href="https://en.wikipedia.org/wiki/Times_Square" target="_blank">Times Square</a> at 2:03 AM. I took a subway train from Nostrand Avenue to 50th Street at 1:15 AM. The train was half full at Nostrand Avenue. A few people got in at every station in Brooklyn. A few were sleeping in the train, stretching their legs across the adjacent seats. Some homeless folks (the subway is their home) were resting in the public station seating and beside the turnstiles. MTA groundsmen were cleaning the station, collecting the garbage, and working on underground construction. The majority of passengers in the car were seniors. There was a crowd at every Manhattan station until 50th Street, mostly youth. At 50th Street, there was little room left to stand in the car. It was -1 degree Celsius at Times Square. Times Square was colorful with gigantic screens, all playing ads, movie trailers, and other commercials. At most fifteen people were around <a href="https://en.wikipedia.org/wiki/George_M._Cohan" target="_blank">George M Cohan’s</a> statue. I walked down a few streets and reached home by 3:30 AM on the subway.

There were snow showers in Brooklyn on Sunday. I hope it snows heavily in the next few days :-). Next week is the last week of the Fall '02 batch.

- [RC week 1001](https://kracekumar.com/post/154068164505/rc-week-1001/index.md): RC week 1001

+++
date = "2016-12-05 07:34:14+00:00"
draft = false
tags = ["recursecenter"]
title = "RC week 1001"
url = "/post/154068164505/rc-week-1001"
+++
This week I spent most of my time working with Zulip, Erlang, and <a href="https://github.com/kracekumar/edht" target="_blank">EDHT</a>.

### Zulip

This week, we (<a href="https://stanzheng.com/" target="_blank">Stan</a>, Jennifer, and I) continued our efforts on implementing message reactions. We made decent progress and have a good create-reaction frontend. Along the way, we discovered an <a href="https://github.com/zulip/zulip/issues/2492" target="_blank">interesting bug</a>. Debugging it in production minified JS with the debugger was fun. We are close to completing the initial version; it’s still a few days away from closure.

### Erlang

Day by day I am getting more comfortable with Erlang. I learned about <a href="http://erlang.org/doc/man/supervisor.html" target="_blank">supervisors</a> and <a href="http://learnyousomeerlang.com/what-is-otp" target="_blank">OTP</a>. Using supervisors along with processes feels like writing an OS. Supervisors can monitor one or more processes and take action on failure. This functionality is critical for production code.

### EDHT

For the past three weeks I have been learning Erlang and reading up on DHTs - the Dynamo paper. That means little or no code. <a href="https://github.com/granders" target="_blank">Geoff Anderson</a> and I paired on Saturday and started implementing the <a href="https://github.com/kracekumar/edht" target="_blank">DHT</a>. The client and server (still a single node; the cluster is coming) communicate over UDP using protobuf. The communication format among nodes and the failure detection are still unclear. But spawning processes in Erlang is child’s play. I recall how hard it is to spawn a new thread in `` rust `` and reason about parameter lifetimes. Each has its own advantages and disadvantages. Cool, no flame wars now.

### Statue de la Liberté

I had avoided tourist spots in NYC for the past two months. A good friend from India was visiting NYC, so I accompanied them to the Statue de la Liberté. The lady was tall, bearing a torch and standing still, performing her business. I couldn’t find a single soul profoundly speaking about liberty or the current status quo; everyone was joyfully photographing with the lady in the background. The museum on <a href="https://en.wikipedia.org/wiki/Ellis_Island" target="_blank">Ellis Island</a>, near the Statue of Liberty, holds excellent information on US immigration over the centuries. One of the sections presents a country-wise visualization of immigrants across all US states. The immigrants from India mostly live in the Bay Area and New York. The immigrants from Mexico, Germany, the Netherlands, and Britain are spread across all states in similar proportions. The museum provides an audio guide device for interested visitors. The device comes with a small display and nine buttons, like a feature phone. Every section has a number, so pressing the number `` 3 `` and hitting the start button plays recorded audio about the immigration process in the past. Yes, I can hear you, geek: what happens when the total number of sections exceeds 9?

I hit a personal human bottom and lost all my credibility. Please don’t ask.

Last two weeks of the RC batch.

- [RC week 1000](https://kracekumar.com/post/153759852775/rc-week-1000/index.md): RC week 1000
+++
date = "2016-11-28 03:49:59+00:00"
draft = false
tags = ["recursecenter"]
title = "RC week 1000"
url = "/post/153759852775/rc-week-1000"
+++
This week at RC has been a quiet, bikeshedding, holiday week, and an unproductive one so far.

### Zulip

<a href="https://github.com/zulip/zulip" target="_blank">Zulip</a> is a Python-based open source group chat application. RC uses zulip for its internal chat. Unlike GitHub, zulip doesn’t support inline emoji reactions on messages. I am collaborating with <a href="https://github.com/arpith" target="_blank">Arpith</a> and <a href="https://github.com/stanzheng" target="_blank">Stan Zheng</a> to add the missing <a href="https://github.com/zulip/zulip/pull/2387" target="_blank">feature</a>. For the first time, I came across a <a href="https://github.com/zulip/zulip/pull/2387/commits/159854cc62f19c70193a56fc868853f157640730#diff-c29a639b9c074638e5b08d59f0be3997R974" target="_blank">limitation</a> of Django’s `` values `` method. If you’re considering setting up a group chat for your community or company, try out zulip!

### Erlang

Now I am getting the hang of the language. The uncomfortable part is writing recursive functions; I still think in loops, and sometimes I write the counterpart in Python and port it to Erlang. I feel Erlang code is readable, like Python. Now I am reading up on socket handling and message passing. Once I cross a few more chapters, I will get my hands dirty implementing the DHT.

### Streisand

For a very long time, I had been thinking of using a VPN. Finally, I asked for advice and set up <a href="https://github.com/jlund/streisand" target="_blank">streisand</a>. So what is Streisand? Quoting directly from the GitHub description:

>
> Streisand sets up a new server running L2TP/IPsec, OpenConnect, OpenSSH, OpenVPN, Shadowsocks, sslh, Stunnel, and a Tor bridge. It also generates custom configuration instructions for all of these services. At the end of the run you are given an HTML file with instructions that can be shared with friends, family members, and fellow activists.
>

The project comes with an Ansible playbook, and orchestration is done from the local computer for a chosen provider. So you can roll out your own VPN server without losing sanity! I decided to deploy on DigitalOcean. The default procedure had issues picking up `` SSH Keys ``, so I provisioned the machine and ran the command `` ansible-playbook playbooks/streisand.yml ``. The setup took ten minutes, and all the services were up and running. There have been no issues with the server so far.

The project auto-generates documentation with instructions for using the various supported protocols and devices. The generated instructions are super helpful and accurate, except for setting up `` L2TP/IPsec `` on Linux. The project supports different protocols; I chose `` L2TP/IPsec ``.

<a href="https://www.wikiwand.com/en/Layer_2_Tunneling_Protocol" target="_blank">L2TP - Layer 2 Tunneling Protocol</a> is a protocol for supporting virtual private networks without encryption. For encryption, <a href="https://www.wikiwand.com/en/IPsec" target="_blank">IPsec</a> is used; IPsec takes care of authentication and encryption. The device connects to a VPN server with a `` username `` and `` password ``. Once the authentication is successful, all the traffic originating from the machine is directed to the VPN server. So you can avoid incidents like <a href="https://medium.com/@karthikb351/airtel-is-sniffing-and-censoring-cloudflares-traffic-in-india-and-they-don-t-even-know-it-90935f7f6d98#.wtz9evom2" target="_blank">this</a> and get rid of geographic firewalls, packet sniffing, and other threats.

Setting up `` L2TP/IPsec `` on `` iOS, Mac OSX, Android `` is a cakewalk. But on `` Linux `` I burnt all my midnight oil. Streisand has no documentation for setting it up on any Linux system. The tutorials from <a href="https://www.elastichosts.com/support/tutorials/linux-l2tpipsec-vpn-client/" target="_blank">elastichosts</a> and the <a href="https://wiki.archlinux.org/index.php/Openswan_L2TP/IPsec_VPN_client_setup" target="_blank">ArchLinux wiki</a> were my saviours. I had never configured any of these before, and every configuration file format is different. I spent 3-4 hours battling with `` ipsec verify `` and `` xl2tpd `` to get up and running. Finally, `` syslog `` helped me debug the issue.

For the first time, I got a chance to use <a href="https://www.wikiwand.com/en/Iptables" target="_blank">iptables</a>. Iptables is a <a href="https://www.wikiwand.com/en/User_space" target="_blank">user-space</a> command line packet filtering tool. The tool provides options to filter incoming and outgoing packets. Also try the command `` ip route show `` in your terminal.

During the setup, I configured wrong values in iptables multiple times, and my machine lost its internet connection. I shot myself in the foot several times :-( In `` L2TP/IPsec ``, iptables is used to forward all packets to the VPN server. All the `` iptables `` entries live only as long as the machine is running; after a restart or reboot, all the values need to be reconfigured. The `` iptables-persistent `` project exists to persist entries across system reboots.

I am still having issues automating the `` init `` bash scripts for Linux. The iptables entries differ depending on the network subnet. My home network subnet is `` 192.168.0.0/24 ``, and at the premises the subnets are `` 10.0.0.0/16 `` and `` 10.0.0.0/24 ``. The values also change for devices like `` wlan0 `` and `` eth0 ``. As of now, I run the script every time I restart the machine.

Here is a sample entry to add:

`` ip route add <vpn server> via 192.168.1.1 dev wlan0 ``

Here `` 192.168.1.1 `` is the gateway and `` wlan0 `` refers to the WiFi device.

### Thanksgiving

Last Thursday was <a href="https://www.wikiwand.com/en/Thanksgiving" target="_blank">Thanksgiving</a>; the premises were quiet, and the usual vibe of the place was missing.

>
> On Thursday, all my favorite veg brunch and dinner places were shut, but all the halal carts were on the streets. A few city blocks were dark with the building lights turned off, while the fine dining places were decorated and colorful. A few avenues had street vendors; the busiest roads were resting contentedly with occasional visitors. The subway car was quiet; folks were relaxed, and more than half the seats were empty. The majority of the humans had departed the city, but not the cars. At 10:30 PM a street person was waiting for the traffic signal to turn red to collect some spare change. So who celebrates Thanksgiving?
>

The countdown starts - the last three weeks of my RC batch.

- [RC Week 0111](https://kracekumar.com/post/153465926785/rc-week-0111/index.md): RC Week 0111
+++
date = "2016-11-21 08:09:16+00:00"
draft = false
tags = ["recursecenter"]
title = "RC Week 0111"
url = "/post/153465926785/rc-week-0111"
+++
This week I spent most of my time learning about <a href="https://www.wikiwand.com/en/Distributed_hash_table" target="_blank">Distributed Hash Tables</a> and <a href="http://learnyousomeerlang.com/" target="_blank">Erlang</a>. That means I didn’t write a significant amount of code for the project.

I am reading the Dynamo paper, which describes a distributed hash table and distributed key/value store. Some implementations like `` etcd `` and `` consul `` provide a distributed key/value store but replicate all data across nodes, so those aren’t distributed hash tables. I am reading one section at a time. The Dynamo paper is my first technical-paper reading session, and I am on the fourth section. There are a lot of new concepts for me like `` vector clocks, internode communication, consistent hashing ``, etc. I will be implementing the DHT in Erlang.

I am picking up Erlang, and the syntax is completely weird. While entering the Erlang world, leave all your assumptions and expectations outside. The language astonishes me with its design decisions. Certain choices put me into deep thinking, like function declarations not starting with `` def `` or `` fn `` or `` func ``. Here are a few interesting examples.

*   Equal to operator `` =:= ``
*   Period `` . `` is the delimiter, like `` Lang = [python, rust, erlang]. ``
*   Variable names start with a capital letter.
*   Pattern matching in functions, for example:

    can_vote(X) when X >= 18 -> true;
    can_vote(_) -> false.



If the function `` can_vote `` is called with an argument whose value is equal to or greater than `` 18 ``, `` true `` is returned. The important point to note is `` _ ``, which matches any other value. Pattern matching can also be applied to the number of arguments, which I haven’t come across before. I don’t think this is the same as polymorphism.

I have only started writing Erlang code and still have a long way to go to write a DHT. Implementing and maintaining a DHT can be a good way to learn distributed systems.

On November 20, the temperature in NY/Brooklyn was around 6 degrees Celsius in the evening. This is the first time I am experiencing such cold weather and wind.

Every day near my residence, I find an old street person at a signal asking for alms. Once in a while, I speak to him and offer coins. Today, while I was returning from the laundry at night (8:30 PM) in the cold weather, he was waiting for a car to stop or a passerby near the signal to share some spare change. I was running home to escape the hard winter weather. He spotted me and wished me good night. I smiled and reciprocated the greeting. Once I reached home, this incident kept oscillating in my mind. Everyone in the alley was moving quickly to get home, but he was doing his routine in adverse weather.

This led me to ask a series of questions. When was the last time I was in an adverse, uncomfortable condition? Why can’t he catch an early sleep or relax? Has he become numb to the surroundings?

- [Side Project Feasibility](https://kracekumar.com/post/153252087240/side-project-feasibility/index.md): Side Project Feasibility
+++
date = "2016-11-16 08:27:53+00:00"
draft = false
tags = ["recursecenter", "sideproject"]
title = "Side Project Feasibility"
url = "/post/153252087240/side-project-feasibility"
+++
It’s common for developers to work on side projects, for a long list of imaginable and unimaginable reasons. My main reason to work on a side project is to learn how things work or to build a utility program for my own use.

One of my recent projects monitors internet traffic and aggregates it by domain name - a bandwidth monitor. Before jumping into the code, I read a little about the OSI layers and TCP, UDP, IP, and Ethernet packets. The project revolves around the “domain name”; the assumption was that at some layer the domain name would be available in the packet. After capturing the packets and decoding them down to the TCP layer, I realized the domain name is present in the HTTP header. I was happy with the approach, and everything worked for HTTP traffic. While discussing the project with <a href="https://github.com/porterjamesj" target="_blank">James J Porter</a>, I mentioned that parsing the HTTP header was the way to retrieve the domain name and aggregate traffic information. He offered an alternative idea: caching DNS requests. But I was stuck with the notion of parsing the HTTP request.

While testing `` HTTPS `` traffic, this solution fell apart. Unless the program has the domain’s SSL certificate and key, the code can’t decode HTTPS packets. I couldn’t find a solution to parse encrypted traffic. Accidentally, I opened a terminal and typed the command `` curl -v <favsite> ``. Suddenly a laser beam appeared from the dark terminal in the form of a cache-miss message, `` * Hostname was NOT found in DNS cache ``. What? `` curl `` uses a DNS cache? If `` curl `` caches DNS requests, there is a profound reason for it. Why not replicate the same behavior, then pull the IP out of the IPv4 packet and derive the domain name from the DNS answer? There must have been a reason why James even suggested the approach. This decision was the core of completing the <a href="https://github.com/kracekumar/imon" target="_blank">project</a>.
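
For illustration, here is a rough Python sketch of the DNS-cache idea (imon itself is written in Rust; the names and the IP below are hypothetical):

    from collections import defaultdict

    ip_to_domain = {}            # filled from sniffed DNS answers
    usage = defaultdict(int)     # bytes observed per domain

    def on_dns_answer(domain, ips):
        for ip in ips:
            ip_to_domain[ip] = domain

    def on_packet(src_ip, dst_ip, size):
        # either endpoint of the flow may be the remote server
        for ip in (src_ip, dst_ip):
            if ip in ip_to_domain:
                usage[ip_to_domain[ip]] += size
                return

    on_dns_answer("kracekumar.com", ["104.18.1.1"])  # hypothetical answer
    on_packet("192.168.0.5", "104.18.1.1", 1400)
    print(dict(usage))  # {'kracekumar.com': 1400}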

While working on the project, this thought kept bouncing around in my mind: “What would have happened to the project if you weren’t at RC?”. I had two answers:

*   I would have dumped the project in the middle of the journey and pivoted.
*   Somehow, I might have found the DNS cache approach while walking in the park.

Takeaways for future projects:

*   While I am at RC, I can explain the project’s intention, the implementation approach, and what I already know and don’t know to someone who has knowledge of the subject or to a willing soul who will hear me out.
*   What can I do when I am not at RC? Write down the known and unknown details and ask for opinions over email/Twitter/IRC.
*   For certain projects, like building an API client or implementing a paper, finding a reference implementation and looking at different parts of the code answers a lot of questions. I followed this approach while implementing the BitTorrent client.

If the intention of your side project is to measure feasibility, then this post is not for you :-)

What are your thoughts? What method do you follow while creating something for the first time?

- [RC Week 0110](https://kracekumar.com/post/153157531640/rc-week-0110/index.md): RC Week 0110

+++
date = "2016-11-14 04:17:29+00:00"
draft = false
tags = ["rust", "recursecenter"]
title = "RC Week 0110"
url = "/post/153157531640/rc-week-0110"
+++
This week, I reached the first milestone of the project <a href="http://github.com/kracekumar/imon" target="_blank">imon</a>, which I have worked on for the past couple of weeks.

`` imon `` is a command-line utility to record all network traffic and classify the data according to the domain name.

Here is a <a href="https://slides.com/kracekumarramaraju/who-is-eating-my-bandwidth/" target="_blank">high-level presentation</a> of the project. I am looking for code reviews on the project and am happy to answer any questions.

I wrote a <a href="http://kracekumar.com/post/153116176705/my-experience-with-rust" target="_blank">blog post about my experience with Rust</a>. I am excited about the project and hope to keep working on it. If you have any comments on Rust or have any experience using it, please share.

Next, I will be reading about <a href="https://www.wikiwand.com/en/Distributed_hash_table" target="_blank">Distributed Hash Tables</a> and implementing one in <a href="https://www.erlang.org/" target="_blank">Erlang</a>. Also, I haven’t developed any project in Erlang other than `` hello world `` :-)

- [My Experience With Rust](https://kracekumar.com/post/153116176705/my-experience-with-rust/index.md): My Experience With Rust
+++
date = "2016-11-13 07:25:37+00:00"
draft = false
tags = ["recursecenter", "rust"]
title = "My Experience With Rust"
url = "/post/153116176705/my-experience-with-rust"
+++
A few weeks before I left for <a href="http://recurse.com" target="_blank">RC</a>, I wrote an e-mail to <a href="https://twitter.com/punchagan" target="_blank">Puneeth</a> asking for `` Do's and Don'ts at RC ``. One of the lines in the reply said,

>
> Since you are a Python guy, don’t write any Python code while you are there. Do something completely different.
>

I contemplated which language to choose. Other than `` Python ``, I knew a decent amount of `` Go-lang `` and `` Javascript ``. I had previously attempted to learn `` rust `` but never dived deep into it. I reconsidered learning it and came up with a project idea.

>
> “Capture all the internet packets and figure out how much bandwidth each website is consuming.”
>

After spending three weeks at RC, I started to work on the project <a href="http://slides.com/kracekumarramaraju/who-is-eating-my-bandwidth" target="_blank">imon</a>. I was excited to work on the project for several reasons:

*   Everyone around me spoke about Rust’s memory model and how rust is a good candidate for writing safe systems programming code.
*   I have never done any low-level networking projects, though I had built a real-time backend for a messaging app.
*   This project involved working with multi-threaded programming, coroutine, and async networking at the same time.

### First bummer - Lifetime

<a href="https://github.com/ebfull/pcap" target="_blank">pcap</a> is a rust crate to capture packets from a device (`` Wifi ``, `` Ethernet port ``). To get any useful information from a packet, you need to decode the layers of data it carries: decode the `` Ethernet packet `` to pull out the payload, and the payload becomes the next layer’s packet. The steps go on till you find the required information. In my case, I needed the `` TCP `` or `` UDP `` packet to pull out the wanted data. Capturing and decoding packets in the same thread makes the program drop a lot of frames, so having separate threads to sniff and decode made the program performant. So how do you communicate between two threads? I chose message passing, since I had experience with `` coroutines `` and `` go channels ``.
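
To illustrate the design (in Python rather than Rust, since Rust’s ownership rules are exactly what went wrong next), a sniffer thread can hand raw frames to a decoder over a channel-like queue; everything here is a stand-in for the real pcap calls:

    import queue
    import threading

    frames = queue.Queue()

    def sniffer():
        # stand-in for the pcap capture loop; hands raw bytes to the decoder
        for _ in range(3):
            frames.put(b"\x00" * 64)
        frames.put(None)  # sentinel: capture finished

    def decoder():
        # peel Ethernet -> IPv4 -> TCP/UDP from each frame here
        while (frame := frames.get()) is not None:
            print("decoding", len(frame), "bytes")

    threading.Thread(target=sniffer).start()
    decoder()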

While passing the message via the channel from the `` sniffer `` thread to the `` decoder `` thread, the packet didn’t live long enough to go through the channel. Here is the <a href="https://github.com/ebfull/pcap/issues/58" target="_blank">GH issue</a>.

The first suggestion that came up when I asked around was to `` clone `` the packet and send it across the channel. After hours of debugging, I didn’t arrive at a solution. After a few days, <a href="https://twitter.com/k4rtik" target="_blank">karthik</a> replied:

>
> The problem is that you are trying to send a reference with a limited lifetime across a thread boundary, which is not allowed. The `` clone `` is for a slice (a reference to a byte array), you probably want to do a `` clone_from_slice `` as shown here into a static byte array, then you should be able to talk across a thread (though this technique requires using unsafe).
>

I had multiple encounters with the lifetimes of `` strings `` and `` string slices ``, and I am still facing lifetime issues. As the name suggests, the `` life time `` problem is for a `` life time ``.

### Traits

To collate the traffic data, the information inferred from packets needs to reside somewhere. SQLite was my preferred choice for ease of installation - here is the code: <a href="https://gist.github.com/kracekumar/d58ec0beab1d2ea4b17dff77aab22a58" target="_blank">https://gist.github.com/kracekumar/d58ec0beab1d2ea4b17dff77aab22a58</a>

Using the above code to insert records failed because `` FromSql trait `` wasn’t satisfied for the field `` date ``.

Adding the following lines didn’t fix the problem either.

    impl FromSql for chrono::Date<UTC> {
        fn column_result(value: u32){
             println!("{:?}", value);
        }
    }

The compiler threw up another error:

`` The compiler only allows declaring a trait implementation for types defined in the current crate. Date<UTC> in Chrono doesn't implement the FromSql trait. ``

`` rusqlite `` included support for `` DateTime `` <a href="https://github.com/jgallagher/rusqlite/blob/master/src/types/chrono.rs" target="_blank">in a separate file</a>. Importing `` use rusqlite::type::chrono::{DateTime<UTC>} `` failed. For a couple of hours, I was puzzled: how come code in the project directory is unimportable? Reading the <a href="https://github.com/jgallagher/rusqlite#optional-features" target="_blank">README</a> file carefully revealed that <a href="http://doc.crates.io/manifest.html#the-features-section" target="_blank">Cargo</a> supports optional features that are included along with the core library. Changing `` Cargo.toml `` to add a `` features `` attribute under `` dependencies.rusqlite `` fixed the issue.

    [dependencies.rusqlite]
    version = "0.7.3"
    features = ["chrono"]

What this means is that the additional features which are part of the codebase are compiled into the binary only when specified. The use case is similar to installing `` SQLAlchemy `` and then installing the `` psycopg2 `` driver to connect to `` Postgres ``.

### Serialization

The program follows a `` daemon `` and `` client `` architecture. When the client queries the `` daemon `` for traffic data, as in `` ./binary site kracekumar.com ``, the daemon sends out all the traffic data associated with the domain. I chose the <a href="https://docs.rs/rmp/0.7.5/rmp/" target="_blank">msgpack</a> format for communication. The `` rmp `` crate can automatically encode/decode a `` struct `` to `` msgpack `` as long as the attribute `` #[derive(RustcDecodable, RustcEncodable)] `` is used.

<div class="gist"><a href="https://gist.github.com/kracekumar/91bbc05d4d8201c35db0547bb8db086b" target="_blank">https://gist.github.com/kracekumar/91bbc05d4d8201c35db0547bb8db086b</a></div>

If a struct field’s type is declared in another crate like `` rusqlite ``, the source crate needs to support `` serialization ``. `` rusqlite `` supports `` serde_json `` and not `` rustc_serialize ``. So I ended up using a custom tuple for serialization. Probably a good candidate for a PR to `` rmp ``!

### Signal Handler

Rust doesn’t natively support signal handlers like `` SIGTERM `` and `` SIGINT (Ctrl - C) ``. This is both a good and a bad decision. When the `` SIGINT `` interrupt is received, the daemon could write all the cached DNS mappings to a file and read the file back during startup. <a href="https://github.com/BurntSushi/chan-signal" target="_blank">chan-signal</a> provides a way to do this, but it only works for threaded code. Even if the program is designed to run on a single thread, `` chan_select `` requires you to spin up a new thread; when the new thread receives the interrupt, the main thread executes `` chan_select! `` and the cleanup is performed inside the macro.

<div class="gist"><a href="https://gist.github.com/kracekumar/c4012062b73612b3699e1de03774d11e" target="_blank">https://gist.github.com/kracekumar/c4012062b73612b3699e1de03774d11e</a></div>

My program uses a `` HashMap `` and `` Arc `` to share data between threads safely. Using `` chan_signal `` with the existing code structure caused lifetime issues for the `` hashmap ``. I tried for a couple of hours to get it working, and it looked more complicated than I thought, so I have left figuring it out for the future. Do you know of any multi-threaded rust program that handles signals?

### Tests

I enjoy writing tests. Testing spots design smells and gifts subtle hints about `` API flexibility ``. Rust follows different conventions for `` unit tests `` and `` integration tests ``. Unit tests reside in the same file as the function under test, marked by the attribute `` #[test] ``. Integration tests sit in the `` tests `` directory at the same level as `` src ``. By default, the rust test harness runs tests in multiple threads. If you’re a web developer, you can imagine what can happen in integration tests :-)

Running tests in multiple threads saves a considerable amount of time for large test suites. But tearing down tables or the database for every test causes race conditions in the DB. The test case for `` create_or_update_record `` requires serial execution. Setting the environment variable `` RUST_TEST_THREADS=1 `` runs all tests serially. The `` RUST_TEST_THREADS `` environment variable affects both `` unit `` and `` integration `` tests.

`` Cargo.toml `` supports a `` [[test]] `` section, which looks like

    [[test]]
    name = "db_integration"
    harness = false
    test = true

`` name `` points to the file `` tests/db_integration.rs ``. `` db_integration.rs `` should have an entry point, i.e. a `` main function ``. Tests inside `` db_integration.rs `` don’t carry the `` #[test] `` attribute. The code looks like

<div class="gist"><a href="https://gist.github.com/kracekumar/dd289aa669e434c302b4620f987eb99e" target="_blank">https://gist.github.com/kracekumar/dd289aa669e434c302b4620f987eb99e</a></div>

Integration tests can only access a `` public module/struct `` in the project crate. To assert on a struct’s field, the field should be declared with the `` pub `` keyword, like `` pub has_changed: bool ``, which makes sense.

### Error Messages

Rust error messages are clear, concise, colorful, and come with an error code most of the time. The error code gives a detailed write-up about the possible scenario with an example. Here is an error message:

<div class="gist"><a href="https://gist.github.com/kracekumar/ad339891e0792229ebce93d5eea90dca" target="_blank">https://gist.github.com/kracekumar/ad339891e0792229ebce93d5eea90dca</a></div>

Cargo’s `` explain `` flag displays verbose information about the error and a link to the RFC.

<div class="gist"><a href="https://gist.github.com/kracekumar/3800fe67395acc5b8794c89e1597d830" target="_blank">https://gist.github.com/kracekumar/3800fe67395acc5b8794c89e1597d830</a></div>

This kind of explanation kindles interest to learn more. Some error messages come with excellent, apt suggestions for fixing the error.

In one place, the rust error message is confusing and annoying: the error returned while `` unwrapping a None `` doesn’t print the line number where the error occurred. Here is an example:

<div class="gist"><a href="https://gist.github.com/kracekumar/df1d975acfae9bc2a40ba3732b8b3ccf" target="_blank">https://gist.github.com/kracekumar/df1d975acfae9bc2a40ba3732b8b3ccf</a></div>

The correct ways to unwrap an `` option `` are `` match ``, `` try! ``, the `` ? `` operator (available in 1.13), or `` unwrap_or_else ``. Once I mistakenly accused a foreign crate of being unstable because of this error. It took many encounters for me to figure out that the problem was with `` unwrap ``.

Here is the output after setting `` RUST_BACKTRACE ``:

<div class="gist"><a href="https://gist.github.com/kracekumar/ac29d0f89ed9dfbcdf03b32b0d8a511b" target="_blank">https://gist.github.com/kracekumar/ac29d0f89ed9dfbcdf03b32b0d8a511b</a></div>

Some way of highlighting the current project’s code in the backtrace would be visually useful for bigger projects and nested code paths.

### `` lib.rs `` and `` main.rs ``.

Here is my `` src `` directory structure.

    user@user-ThinkPad-T400 ~/c/imon> ls src
    cli.rs*  decoder.rs    ipc.rs   main.rs
    db.rs    formatter.rs  lib.rs*  packet.rs

`` main.rs `` is the entry point for the `` daemon `` and `` client ``. The code looks like

    #[macro_use] extern crate log;
    extern crate env_logger;
    extern crate imon;
    fn main(){
        env_logger::init().unwrap();
        imon::cli::parse_arguments();
    }

`` lib.rs `` contains all the external crate imports and the public modules. Only the public modules declared here are importable by a foreign crate using the project.

    #[macro_use] extern crate log;
    extern crate env_logger;
    use std::fmt;
    pub mod cli;
    pub mod decoder;
    pub mod packet;

To use a `` macro `` defined in the crate `` log `` in any rust file inside `` src `` except `` lib.rs ``, no import is needed. All foreign crates are imported in `` lib.rs ``, and other rust files can use them like `` use foo; foo.method() ``, but `` main.rs `` needs its own `` extern crate log; `` statement and doesn’t look in `` lib.rs ``. The import distinction between `` lib.rs `` and `` main.rs `` is a gray area since both files reside in the same directory.

Currently, I feel comfortable with `` rust ``. I haven’t used most of the features in `` rust `` like `` mutex, macros `` or distributing binaries. I have lots to read about rust and writing idiomatic rust, but I am confident about using `` rust `` in production; I am learning which pieces can fall apart. An enormous amount of work has gone into the rust compiler and tooling, especially `` cargo ``. If you’re aware of the Python world, `` Cargo `` is a single tool which does the work of multiple Python packages: `` pip ``, `` virtualenv ``, `` pytest ``, and `` cookiecutter ``.

I owe a big part of this to the RC folks who patiently assisted me during the learning curve. Thanks to <a href="http://github.com/caipre" target="_blank">Nick Platt</a>, <a href="https://github.com/kamalmarhubi" target="_blank">Kamal Marhubi</a>, Mike Nielsen, and others.

- [RC week 0101](https://kracekumar.com/post/152847253135/rc-week-0101/index.md): RC week 0101

+++
date = "2016-11-07 06:53:52+00:00"
draft = false
tags = ["rust", "recursecenter"]
title = "RC week 0101"
url = "/post/152847253135/rc-week-0101"
+++
>  
> Time flies faster than you can perceive.
> 

The first half of the twelve-week experimental journey ends this week.

This week was the last week for the `` Fall 01, 2016 `` batch; my batch ends in the next six weeks. I wish everyone in the `` Fall 01, 2016 `` batch good luck in their future endeavors. I made a few friends, helped a few people with their projects, a few helped me with mine, and there were a lot of unnoticed learnings.

Finally, I have a working prototype: the daemon captures all WiFi traffic, caches DNS requests, and stores per-domain traffic for each day in the SQLite DB. The client connects to the daemon over a TCP socket and queries the data. Below is a screenshot of the daemon and the client side by side.

<img height="500" src="https://lh3.googleusercontent.com/qLdfH1dypbTx3jLvS9Ot6aGJed0Q_6P4VrnccSGHLUcQsbBlpZ6mNPtK1xsxxDThejLzPKVLCcqE=w1133-h708-no" width="600"/>

I am working on this project this week to address cache misses, add test cases, and refactor the existing code. You can follow the <a href="https://github.com/kracekumar/imon" target="_blank">GitHub repo</a> for updates.

`` Winter 01, 2016 `` batch starts on Monday, 7th November. Welcome to RC and enjoy your time here!

- [Man's Search for Meaning - Book Review](https://kracekumar.com/post/152845373255/mans-search-for-meaning-book-review/index.md): Man's Search for Meaning - Book Review
+++
date = "2016-11-07 05:48:12+00:00"
draft = false
tags = ["philosophy", "book-review"]
title = "Man's Search for Meaning - Book Review"
url = "/post/152845373255/mans-search-for-meaning-book-review"
+++
<figure class="tmblr-full" data-orig-height="500" data-orig-src="https://c1.staticflickr.com/5/4015/4319098051_719610b047_z.jpg?zz=1" data-orig-width="375"><img data-orig-height="500" data-orig-src="https://c1.staticflickr.com/5/4015/4319098051_719610b047_z.jpg?zz=1" data-orig-width="375" src="https://66.media.tumblr.com/4c7b551cff084d5f6ece1035136300ea/tumblr_inline_pk2eqrXMPd1qc390z_540.jpg"/></figure>

## What can you do during acute inhuman soul-crushing conditions in life? Hope. This book is about “Hope.” - dopamine for life.

Psychiatrist Viktor Frankl presents his life in Nazi death camps and its lessons for spiritual survival. The narrative is based on his experience and the experiences of others he treated later in his practice. The author aptly reiterates throughout the book: “We cannot avoid suffering, but we can choose how to cope with it, and find a meaning to live.”

The book has three parts: life and lessons from the death camp, <a href="https://www.wikiwand.com/en/Logotherapy" target="_blank">logotherapy</a>, and finally accounts of various incidents of depression among his patients and others.

I was pleasantly surprised when the author described his outlook towards happiness, which I have been thinking about lately and which coincided with my own formulation:

> “Don’t aim at success. The more you aim at it and make it a target, the more you are going to miss it. For success, like happiness, cannot be pursued; it must ensue, and it only does so as the unintended side effect of one’s personal dedication to a cause greater than oneself or as the by-product of one’s surrender to a person other than oneself. Happiness must happen, and the same holds for success: you have to let it happen by not caring about it. I want you to listen to what your conscience commands you to do and go on to carry it out to the best of your knowledge. Then you will live to see that in the long-run—in the long-run, I say!—success will follow you precisely because you had forgotten to think about it.”

- [RC Week 0100](https://kracekumar.com/post/152543733085/rc-week-0100/index.md): RC Week 0100

+++
date = "2016-10-31 05:31:05+00:00"
draft = false
tags = ["rust", "recursecenter"]
title = "RC Week 0100"
url = "/post/152543733085/rc-week-0100"
+++
I made decent progress with my rust HTTP traffic monitor. I read a lot about the different types of packets: Ethernet, IPv4, TCP, UDP, and DNS. I wrote parsers for the physical-layer packet, TCP packet, UDP packet, and DNS packet. The parsers helped me understand a lot about rust functions and data types like `` array, vectors ``, string, and string literal. I am not enjoying my relationship with rust lifetimes; lifetimes are getting harder for me to grok with nested structs’ lifetimes.

Big thanks to <a href="https://github.com/k4rtik" target="_blank">k4rtik</a> for solving last <a href="https://github.com/ebfull/pcap/issues/58" target="_blank">week’s issue</a>. While writing the <a href="https://github.com/kracekumar/imon/commit/1d4ab6a2a300dfa36ca6ba7da6473119e950aeab" target="_blank">parser</a>, I missed <a href="https://docs.python.org/2/library/struct.html" target="_blank">Python’s struct module</a>. The struct module makes writing network code easier by decoding different data types like `` string, char, unsigned and signed integer `` from binary data against a specified format. I think it would be a useful utility to build in rust. My code is an ugly mess of pieces tied together to work; once the utility works, I will convert the working bits into idiomatic rust.
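
For reference, this is the kind of one-line decoding Python’s struct module gives you; the header bytes below are made up:

    import struct

    # Hypothetical 8-byte UDP header: four big-endian unsigned 16-bit fields.
    header = bytes.fromhex("0035d3e100280000")
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", header)
    print(src_port, dst_port, length, checksum)  # 53 54241 40 0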

The next step is to get the program to store the captured HTTP packets in an SQLite table indexed by date.

Thanks to <a href="https://nick-platt.com/" target="_blank">Nick Platt</a> for answering my rust questions, assisting with debugging, and deciphering rust error messages.

- [RC week 0011](https://kracekumar.com/post/152280595435/rc-week-0011/index.md): RC week 0011

+++
date = "2016-10-25 04:40:21+00:00"
draft = false
tags = ["recursecenter"]
title = "RC week 0011"
url = "/post/152280595435/rc-week-0011"
+++
This week commenced with my second project at RC, in rust. I am building a command-line utility to monitor internet bandwidth consumption categorized by website. I had a general idea of how to go about it and drew rough sketches.

As a lot of people warned me, I am fighting battles with the rust compiler over lifetimes and borrowing. I hit my first roadblock with the `` pcap `` library; here is the <a href="https://github.com/ebfull/pcap/issues/58" target="_blank">issue</a>. I am using threads and sockets in the application together (tough battles at the same time!). So far progress is slow because of my unfamiliarity with the language. In a week’s time, I will have a useful piece which does one part.

On Sunday afternoon, I walked all the way from Nostrand Avenue in Brooklyn to the Recurse Center. Thanks, <a href="http://dpb.bitbucket.org/" target="_blank">David Branner</a>, for seeding the thought. I enjoyed the sidewalk amongst joggers, empty parks, couples cycling, graffiti, and everything else. Walking on the Manhattan Bridge’s pedestrian path was scenic, and I could spend hours looking at the water, the boats, and, the cherry on the cake, the Statue of Liberty.

- [Quiet - Book Review](https://kracekumar.com/post/152278984540/quiet-book-review/index.md): Quiet - Book Review

+++
date = "2016-10-25 03:56:09+00:00"
draft = false
tags = ["book review"]
title = "Quiet - Book Review"
url = "/post/152278984540/quiet-book-review"
+++
![](https://lh3.googleusercontent.com/tqj75ZOvxB1aTiC5gRj7dOCjCFN7cwBgc0KrDzGo6UgrWbjv1qJCPTx2Gt5UETL1oPNo9F1wJ9HGW2f4C7OrsevU0GibfgoZ8n79sxRWgKsr8vFVcYTUhSZh8bJdGUOYX5JcXgUAVIkARMVUjkyXorjqs4DYwCH0lFwrqhAEW72PlLj_64okm96ADMr_k7NhKlCtf4GiFU0pNubvrm-Gf0H_4h4OtQZCQ5dy_RmgplcwP025DweZnPjuQZMjYt-73r4M0RCxUr2Irf6nHs-yPbKviE6Vrhmo7VYLgfoHevZdkXAsYu2n9sndJCKm6zS3-nsET72nwtRz2umBReVGESWuNPF6It20eFq3TeriYo-DhM9glsHX53shVTPotbPY9d0fzht_KmcfgfBSaDAoGIOJN2gElWFyAAqzFScfqp0rNbtsKKT_HlUwiVlyKIFw3Ive2Sh5O_T7P6r_HGw-aX36zp6C-REkbq0pW4HtDWEh36IOzHbiH75K5djSuKhMB6vpHYRNENrNH2YndEIrzZmZWxsSq74q_QwpNHuO3jJ5RrLF7K3UQsEmsSx5rOn8uMXlG5qrvwiBvQeppu96lmngjv9WmRzeA1fWPu4NwjxvC9Sp=w399-h708-no)

<a href="https://www.goodreads.com/book/show/8520610-quiet" target="_blank">Quiet: The Power of Introverts in a World That Can’t Stop Talking</a> is an excellent book about introverts and what makes introvert an introvert. The book dives deep how different culture views introvert, analyses what intrigues introverts and in what circumstances they effectively work. The book makes a lot of parallel comparisons with extroverts. What intrigues extroverts and why introverts are good at solving complex problems.

The book accounts for various behavior patterns: how introverts behave at social events, and how to raise introverted kids without resorting to nonsense advice. The author draws on a lot of case studies from researchers about extrovert and introvert behavior: what happens when an introvert and an extrovert talk to each other, what sort of topics come up, and when an introvert acts like an extrovert.

The book touches on my favorite subject: why introverts dislike open office culture, which reduces productivity and impairs memory.

Being an introvert, I thoroughly enjoyed the book; at times it felt like self-revelation and mirrored personal incidents. After reading it, one can see how extrovert-centric the world is. If non-fiction interests you or you’re an introvert, don’t miss this gem.

- [RC Week 0010](https://kracekumar.com/post/151921371645/rc-week-0010/index.md): RC Week 0010

+++
date = "2016-10-17 06:04:45+00:00"
draft = false
tags = ["python", "recursecenter"]
title = "RC Week 0010"
url = "/post/151921371645/rc-week-0010"
+++
This week has been a mixed ride with the torrent client. I completed the two pending features: seeding and the UDP tracker. The client has a major issue with downloading larger torrent files like the `` ubuntu `` ISO. The client starts downloading from a set of peers and slowly halts at <a href="https://github.com/kracekumar/bt/blob/curio/bt/protocol.py#L75" target="_blank">sock.recv</a> after exchanging a handful of packets. At this juncture, the CPU spikes to 100% when `` sock.recv `` blocks. Initially, the code relied on `` asyncio ``-only features; now it uses the `` curio `` library. Next time you write `` async `` code in Python 3, I suggest using curio. Curio’s single feature of tracking the state of all `` tasks `` is a magic wand for debugging. The live debugging facility helped me track down the blocking part of my code. Here is how curio’s debug monitor looks:

<div class="gist"><a href="https://gist.github.com/kracekumar/c0513a589f14ec0fd270d526cec127c3" target="_blank">https://gist.github.com/kracekumar/c0513a589f14ec0fd270d526cec127c3</a></div>

I am sure the bug is either a logical error in the code, or I am doing async completely wrong. I will travel with the bug for a day or two and see where I land. This is slowly emerging as the toughest bug I have faced.

I am happy that people at RC are keen to help and assist in all different ways. In case you’re reading this and have asyncio expertise, I will be glad to hear from you. The easiest way to replicate the bug is to check out the code, switch to the `` curio `` branch, install the requirements in a Python 3.5 venv, and run the command `` python cli.py download ~/Downloads/ubuntu-16.04.1-desktop-amd64.iso.torrent --loglevel=debug ``. After a minute or two, you can see the program using 100% CPU in `` htop `` or `` top ``. Feel free to leave a comment or ping me on twitter.

- [RC week 0001](https://kracekumar.com/post/151547342115/rc-week-0001/index.md): RC week 0001

+++
date = "2016-10-09 05:09:09+00:00"
draft = false
tags = ["recursecenter", "python", "bittorrent"]
title = "RC week 0001"
url = "/post/151547342115/rc-week-0001"
+++
This week, I made considerable progress on the BitTorrent client I started a week back. The client is in a usable state and can download data from the swarm. The source code is available on <a href="https://github.com/kracekumar/bt" target="_blank">GitHub</a>. The project uses Python 3.5 `` async/await `` and <a href="https://docs.python.org/3/library/asyncio.html" target="_blank">asyncio</a>. I presented the torrent client in RC’s Thursday five-minute presentation evening slot. Here is the link to the <a href="http://slides.com/kracekumarramaraju/bittorrent-client" target="_blank">slides</a>.

<a href="http://slides.com/kracekumarramaraju/bittorrent-client" target="_blank">

<img src="https://lh3.googleusercontent.com/3bvl-5OwgFkwHvero3PiqS4NAf0rVLEK86hGa1WEoBqJsatbf1zi6EAEhsPql4OfWVh4GLUkTXDz_ZsjfcpKOdfwK63li8KGp78xRW9xVUW3OwTCOBYoK_XSKsN4iIKVZi9hgr65_5oQtTeSBfbQRtJdHPgv_23LXKHlpO5jTIjCtLvYj-aXm_EkxOcCaud0WRISIOkYWIXFUZhfcTELtglNfwQF99ZV8VSeUB644VRLK2lSt_W0mo6LD8saggHdTS9Nae0a8z_EebfN1YpFYjOOGbLIcqiYHMUEDRWMikCsTQzm5943DAHakLXFGtPcvEYYdUSaWBlpOojg-ym1U1BBIy9T5MUFj6OnDm_Ig-1jBGVVvq0BEPtzFl5EZG-lwnYT643OjuUmBmWbIkylGVwhd6OwhG--VsogDZGSYbzFXNvC_VvJnFaUNn5g-1To4O8uARF-L5ApcVd91p4u233LRPWZYxqCCLf_T6OcCjbqPZmXzGxUFhyc73dJFMAcnV4B18JhiWamjhXH7TiIYmN4NkK6MbKOk7roUw3RLp0GaNMZ3ORrF97dsUxTz1GuRclwjZr9yn2NTLFBw9LXxhDGX32IfTCU53haG24raHpIau55=w1133-h708-no" width="600"/>

</a>

Here is a quick video demo recorded with asciinema.

<a href="https://asciinema.org/a/88321" target="_blank">

<img src="https://asciinema.org/a/88321.png" width="600"/>

</a>

In the demo, the client downloads a 1.3MB file, `` flag.jpg ``. Thanks a lot to <a href="https://twitter.com/ballingt" target="_blank">Thomas Ballinger</a> for hosting the tracker and the seeder; both are boons for developers writing torrent clients.

The downloader has two known major issues:

*   The client has performance issues for files larger than 50 MB.
*   The client doesn’t support UDP trackers - clients can interact with the piratebay tracker only over UDP, whereas other trackers expose HTTP endpoints.

The week had long hours of debugging blocking code in asyncio and tweaking the client to receive data from the deluge client - as soon as the handshake is successful, deluge starts sending `` bitfield, have `` messages before receiving the `` interested `` message.
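
All post-handshake peer messages share one framing: a 4-byte big-endian length prefix, a 1-byte message ID, and an optional payload. Here is a minimal sketch of decoding one such message from raw bytes (not the client’s actual parser):

    import struct

    # A few message IDs from the peer wire protocol.
    MSG_INTERESTED = 2
    MSG_HAVE = 4
    MSG_BITFIELD = 5

    def parse_message(data):
        """Parse one length-prefixed peer message from raw bytes.

        Returns (message_id, payload); a zero-length message is a
        keep-alive, signalled here as (None, b'').
        """
        (length,) = struct.unpack('!I', data[:4])
        if length == 0:
            return None, b''
        message_id = data[4]
        payload = data[5:4 + length]
        return message_id, payload

    # An `interested` message: length 1, ID 2, no payload.
    assert parse_message(b'\x00\x00\x00\x01\x02') == (MSG_INTERESTED, b'')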

The next step is to integrate seeding functionality to the torrent client and enhance UDP tracker functionality.

On one of the weekdays, I witnessed a momentous incident. I joined two other recursers for lunch at a nearby outlet we hadn’t visited before. The shop was spacious, with a bar setup, and we decided to eat in the building. We ordered three Paneer Tikkas and were munching on our meal. The manager/owner/bartender showed up to fill the water and said to another person, “Let me fill the water for you.” She greeted him and asked, “How’s your day going?” He enthusiastically replied, filled the glass, enquired about the food, and moved on. After a few moments, he returned with exuberant joy, handed her a handful of candies, and stated that this was a gift for her good manners. A speechless moment! What a revelation and life lesson! The small act of courtesy made the person feel elated and must have made his day too. All day and night, we spend time thinking of bringing smiles to family, friends, mates, and others. Take a moment to spread joy among strangers too. Later, I recalled a quote,

> “How you treat a totally irrelevant person defines who you are.”

- [RC week 0000](https://kracekumar.com/post/151276229490/rc-week-0000/index.md): RC week 0000

+++
date = "2016-10-03 05:27:30+00:00"
draft = false
tags = ["recursecenter", "bittorrent", "python"]
title = "RC week 0000"
url = "/post/151276229490/rc-week-0000"
+++
<img height="500" src="https://lh3.googleusercontent.com/zGakGJDUqDyLltKki0mYpbTrHQXd8v4SQOdYOjzttNaiTf5eFXUlZPJUhA_c96Bfbx7cm5Kp6p-NVWL0SVD4TVazdIFyJgWaTmxMBwTZdxh4TduWNQF7BJlW8lUzCA8dZYeCzFOo7AXuq6LzgKKDpi1w38JYo8S5Oj1S-d1Z9xjND6POOJJfU4adAnppFPrMBQL5yJICjge_mR5HQvPmSPzweEoIfz45Bts3by5BVcsdJoEVXTqPLWHf3jSTvUGy5TDqKevYljh2EGvJL5p1tXOy2E-3zsvpYemmIWvgfQ4wfJ0O3q5NU-S9IyGniugWXZ49NfQg3igBKWUCLLW_4Uf1OKP2hLAq6AkpUrV_4CucX95aDVEm-7Dub36SeoQs4-v8NnPPx_2hdTRAncNJi7SkxjsdQZHz6-0fpB1SGIYCpL3aRusWMZx3PYK9LAMcbGZU_DGjVGcW5QKBr01EY_FgXiRjB7aXIuEW-LlxCdcnuh20gCXVzA7ZrBghuAyl-nZAmn3v_Fed1JRwEUVNO8fam2FzeKqMT5dEPARk518HvTn-YiAMZv0qKj3eeh4FrF_kiFugZgIMW1DskJvn1xF584pDDCuTCdiHTV_s6M5y-8eh=s800" width="500"/>

The long-awaited <a href="https://recurse.com" target="_blank">Recurse Center</a> debut day, 26th Sep 2016, kicked off with a welcome note by Nicholas Bergson-Shilcock and David Albert; continued with events and activities to get to know the batchmates and the culture of RC; and ended with a closing note by Sonali Sridhar.

<a href="https://commons.wikimedia.org/wiki/File%3ABittorrent_new_logo.svg" target="_blank" title="By BitTorrent, Inc. ([1]) [Public domain], via Wikimedia Commons"></a>

<figure class="tmblr-full" data-orig-height="147" data-orig-src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Bittorrent_new_logo.svg/512px-Bittorrent_new_logo.svg.png" data-orig-width="512"><img alt="Bittorrent new logo" data-orig-height="147" data-orig-src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/70/Bittorrent_new_logo.svg/512px-Bittorrent_new_logo.svg.png" data-orig-width="512" src="https://66.media.tumblr.com/7d9b4ba6cb8fcdfbfada9339208e6201/tumblr_inline_pk0028lKwW1qc390z_540.png" width="512"/></figure>

At the end of the day, I had decided to build a BitTorrent client as my first project. I was at a crossroads choosing between Python, Rust, and Go. After a quick chat with a batchmate, I decided to write the client in Python. I neither knew Rust well nor had written a BitTorrent client in the past, and fighting two battles at the same time is hard.

My experience with network applications is limited: I maintained a WebSocket server and fixed bugs at a previous job. The BitTorrent client is my first major network application.

As a first step towards building the client, I started reading the <a href="https://www.wikiwand.com/en/BitTorrent" target="_blank">Wikipedia article</a> and the <a href="http://www.bittorrent.org/beps/bep_0003.html" target="_blank">official proposal</a>. RC alums shared valuable resources: the <a href="http://wiki.theory.org/BitTorrentSpecification" target="_blank">unofficial proposal</a> and a <a href="http://www.kristenwidman.com/blog/33/how-to-write-a-bittorrent-client-part-1/" target="_blank">blog post</a> from an alum. I continued reading the Wikipedia article and grasped the higher-level working of BitTorrent. One afternoon, a few batchmates got together and discussed the protocol: the recommended strategies to download the data, security, jargon, and authentication. We drew the higher-level steps in the life cycle of a download on a whiteboard. It was an enlightening session and helped crystallize my understanding.

<figure class="tmblr-full" data-orig-height="532" data-orig-src="https://lh3.googleusercontent.com/n2mB_QW2rQkxYvKt_My1RPTNtzAmnzTl5hggvKqnG5iLD8qRDereNyPUFKAGGRLALTSrIy2MXmod=w1265-h532-no" data-orig-width="1265"><img data-orig-height="532" data-orig-src="https://lh3.googleusercontent.com/n2mB_QW2rQkxYvKt_My1RPTNtzAmnzTl5hggvKqnG5iLD8qRDereNyPUFKAGGRLALTSrIy2MXmod=w1265-h532-no" data-orig-width="1265" src="https://66.media.tumblr.com/e6d168a9b8b32efbc6324e5b8d47a269/tumblr_inline_pk0029wiWn1qc390z_540.png" width="500"/></figure>

Resisting coding was hard. I started programming the client by parsing the torrent file, which is in <a href="https://www.wikiwand.com/en/Bencode" target="_blank">bencode</a> format; a significant portion of the data in the file is binary. The next step was to work out the communication with the tracker - the server which stores information about connected clients. The blog post suggested capturing all the packets during a torrent session. At first, I underestimated the value of the advice; later, while reading the spec and scribbling code, I found its value. Both Wireshark and tcpdump were helpful. I like tcpdump for its ease of use, lightness, and command line interface, but viewing captured packets is hard on the command line. Wireshark infers data from the packets and renders it in a useful format with a lot of switches: order by protocol, view the unencrypted packets, etc. Then I implemented the communication layer for trackers over HTTP, not UDP. (The piratebay trackers, for example, use only UDP.)
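
For a taste of bencode, here is a minimal decoder sketch covering its four types (integers, byte strings, lists, and dictionaries); a real torrent parser needs stricter error handling than this:

    def decode(data, i=0):
        """Decode one bencoded value from bytes, returning (value, next_index)."""
        ch = data[i:i + 1]
        if ch == b'i':                      # integer: i<digits>e
            end = data.index(b'e', i)
            return int(data[i + 1:end]), end + 1
        if ch.isdigit():                    # byte string: <length>:<bytes>
            colon = data.index(b':', i)
            length = int(data[i:colon])
            start = colon + 1
            return data[start:start + length], start + length
        if ch == b'l':                      # list: l<items>e
            i, items = i + 1, []
            while data[i:i + 1] != b'e':
                value, i = decode(data, i)
                items.append(value)
            return items, i + 1
        if ch == b'd':                      # dictionary: d<key><value>...e
            i, result = i + 1, {}
            while data[i:i + 1] != b'e':
                key, i = decode(data, i)
                result[key], i = decode(data, i)
            return result, i + 1
        raise ValueError('invalid bencode at index %d' % i)

    print(decode(b'd4:spaml1:a1:bee'))  # ({b'spam': [b'a', b'b']}, 16)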

The next piece is to build the components which communicate with other seeders - clients who have complete or partial data. This part was time-consuming for me for various reasons:

*   All the communication from the client happens in binary. Having used JSON and other human-readable formats a lot, debugging binary data took a lot of my time.
*   Understanding the message exchange format between the client and peers was tricky, with its different message encoding. Each message carries the data, the type of the data, and the length of the data. This approach is entirely different coming from a web application background, where readability and usability are given importance.

Before starting to implement the crux of the client, `` asyncio `` clicked in my mind, and I found a reference <a href="https://github.com/eliasson/pieces" target="_blank">implementation</a>. The project has been a great resource for building my torrent client. As of now, my client can parse the torrent file, contact the tracker for seeder information, contact peers, respond to messages, and request a piece of data. All the above steps constitute 80% of the first milestone of a BitTorrent client. Yes, I can hear you murmuring the 80-20 rule!

I’m on the verge of finishing the client and will release the source code in the next few days. I’d suggest you write a BitTorrent client if you haven’t.

OTOH, RC’s address is 455 Broadway, New York. The name `` Broadway `` appealed to me, and when I visited, the place was a multi-laned road. Out of curiosity, I read more, and it was revealing to learn that <a href="https://www.wikiwand.com/en/Broadway_(Manhattan)" target="_blank">Broadway</a> is a literal translation of the Dutch words <a href="https://en.wiktionary.org/wiki/breed#Dutch" target="_blank">breed</a> and <a href="https://en.wiktionary.org/wiki/weg#Dutch" target="_blank">weg</a>, and is one of the oldest roads, stretching 53 km through the New York City boroughs of Manhattan and the Bronx and the county of Westchester, New York.

- [None](https://kracekumar.com/post/151005099060/im-attending-recurse-center-fall02-2016-batch/index.md): None

+++
date = "2016-09-27 11:38:33+00:00"
draft = false
tags = ["recursecenter", "hackerschool"]
title = "None"
url = "/post/151005099060/im-attending-recurse-center-fall02-2016-batch"
+++
<figure class="tmblr-full" data-orig-height="829" data-orig-src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Brooklyn_Bridge_Postdlf.jpg/1024px-Brooklyn_Bridge_Postdlf.jpg?1474976040402" data-orig-width="1024"><img data-orig-height="829" data-orig-src="https://upload.wikimedia.org/wikipedia/commons/thumb/f/f0/Brooklyn_Bridge_Postdlf.jpg/1024px-Brooklyn_Bridge_Postdlf.jpg?1474976040402" data-orig-width="1024" src="https://66.media.tumblr.com/1e1a8514a49ea684583761da869a166b/tumblr_inline_pk0kw6an4B1qc390z_540.jpg" width="600"/></figure>

I’m attending <a href="http://recurse.com" target="_blank">Recurse Center</a>’s fall'02, 2016 batch. I’m excited (yes, without emoticons) to be at RC and in New York. RC is different from schools, boot camps, and universities for various reasons. The two reasons I like the most are “Never Graduate” and “self-directed learning.”

There is a lot to learn from peers, experiments, and discussions. The next 12 weeks will be intense, and what more can a programmer ask for than time to solve problems and understand how things work? Now, I’m afraid 12 weeks may not be enough to learn what I have in mind, which perfectly justifies _“Never Graduate.”_

- [Language is power](https://kracekumar.com/post/150673262155/language-is-power/index.md): Language is power

+++
date = "2016-09-20 07:50:53+00:00"
draft = false
tags = ["politics", "hindi", "tamilnadu", "tamil"]
title = "Language is power"
url = "/post/150673262155/language-is-power"
+++
Road transport between Coimbatore and Bangalore has been hit badly by fear, tension, security, and protest. As of 18th Sep 2016, the news is that only two-wheelers are allowed to cross the Hosur border. Tamil Nadu vehicles carrying vegetables, bananas, and other commodities are prohibited from entering Karnataka. Ordinary people travel to Hosur, cross the border, and walk one or two kilometers to get public transport to Bangalore. Another option is to take the train. Luck wasn’t on my side: my two tickets were at waitlist numbers 15 and 3. The only alternative left was to fly, so I booked a flight from Coimbatore to Bengaluru departing on 19th Sep at 10.45 AM.

I was early and reached the final handbag clearance by 9:45. The security officer checked my boarding pass with a charming smile and uttered, “Chennai?” I replied, “Bangalore.” He smiled and started speaking to me in Hindi. I couldn’t follow and replied, “I don’t speak Hindi.” Now you can guess the tone, reaction, and style of the rest of the conversation.

_“Hindi is the language of Hindustan, and you don’t speak Hindi?”_ the officer asked me. I was surprised when he said Hindustan; I have always read Hindustan in historical novels and haven’t heard it in daily conversation. I was irritated, and angrily I replied: “Tamil Nadu first and India second.” He was taken aback, asked whether I was sure about that, and I replied, “Yes.” Again he argued that Hindustan means Hindi. I rebutted him by saying, “I can speak Tamil and Kannada. Can you speak Tamil since you’re in Tamil Nadu?” He asked me in Tamil, “Tamil theriyuma? (Do you know Tamil?)” I replied in Tamil, and this conversation continued for a few minutes. He kept speaking in Hindi, and I stayed in Tamil, switching to English once I understood he couldn’t follow Tamil. I left the place with angst.

I picked up my bags, sat in a steel chair, and started to think about the conversation. If you know me personally or follow me on social media, you know I am vociferous about raising concerns about such behavior. You may have concluded that I am anti-Hindi. But I would like to remind you, I am against Hindi imposition, like the reported incident. I am confident the officer’s intent wasn’t to wage a war of words with me, nor do we have any enmity. His duty involves interacting with thousands of people every day over the years. He started a conversation to comfort me, but it turned into a tussle of words. The officer was in his late thirties, more like my elder brother, and was never rude in his argument. He wouldn’t have asked the same kind of questions of an elderly man, out of respect for age. The uniform and his training must have taught him this, or he is part of the hegemony, or his assumptions are aligned with it. I went through a similar incident three years back in Delhi - not the exact same questions, but the same theme.

I was a mere passenger. Had it been a foreign traveler, he would have happily let them speak English and tried to speak English himself. I have met a lot of people on the streets seeking help who speak only Hindi, and I helped them, sometimes taking extra effort to find the right person who could help.

There is an expectation among uniformed public servants of the central government that all civilians have to speak Hindi. I don’t agree with that. India is composed of states; without states, India doesn’t exist. Language is not just a tool of communication; it is a symbol of power and superiority. The uniform and designation want to exert control and teach a lesson rather than serve the people. The primary duty of officers is to serve the public, not to impose their beliefs with a weapon. Politicians, the public, opinion makers, writers, and activists have written, spoken, and protested over the years, and still continue to reach deaf decision makers. No lessons learned. It’s not only the duty of locals to speak up; it’s also the responsibility of Hindi-speaking people to raise their voice in support.

<img height="400" src="https://lh6.googleusercontent.com/Ci31u_iqvn3_AezVkdBbjmFzFaKc-BqTNqhTP4iPTk11dTopcf67uFwqP6MHLG3H5lSwmqYTJNdfJd4=w1270-h688-rw" width="400"/>

In the tension, I forgot to thank him for his service and boarded the flight. As usual, the flight announcement was in Hindi followed by English. While the Hindi announcement was in progress, I turned back and saw the faces of the people: a few were giggling, a few were resting in their seats, and others were peeking out the window. There is a long-standing request to make announcements in local languages. The Jet Airways breakfast box had the slogan “BON APPÉTIT,” and my inner nerves started to laugh. I pay you, and you don’t make an announcement in local languages, but you’re happy to brand the food cover with a French slogan.

- [Flowers of Bangalore](https://kracekumar.com/post/150040563610/flowers-of-bangalore/index.md): Flowers of Bangalore

+++
date = "2016-09-06 19:18:59+00:00"
draft = false
tags = ["bangalore", "flowers"]
title = "Flowers of Bangalore"
url = "/post/150040563610/flowers-of-bangalore"
+++
<img height="500" src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/39/Delonix_regia_ap_003.JPG/1280px-Delonix_regia_ap_003.JPG" width="550"/>

Bangalore is called the Garden City of India. How often do you walk into a garden? I visit often. I envy you if you own a garden at home!

Every day, you can spot a lot of flowers and trees during the commute to work, a casual walk, etc. In the evening, trees welcome all passers-by with chosen fallen flowers on the pathway, especially golden flowers from the Copper Pod. Trees and flowers are predominant even in the buzzing parts of the city.

<a href="http://www.wildwanderer.com/flowering-trees/" target="_blank">Here is a list</a> of commonly found flowers and their details. The document includes 56 flora species description with the photograph, flowering seasons, vernacular name and origin.

Next time you step out of the house or look through a high-rise building’s window, find out the name of the flower you see.

- [Real time mobile app failure](https://kracekumar.com/post/148892131765/real-time-mobile-app-failure/index.md): Real time mobile app failure

+++
date = "2016-08-13 18:16:55+00:00"
draft = false
tags = ["travel", "uber"]
title = "Real time mobile app failure"
url = "/post/148892131765/real-time-mobile-app-failure"
+++
<figure class="tmblr-full" data-orig-height="576" data-orig-src="https://c1.staticflickr.com/3/2678/4279831984_cff70672e6_b.jpg" data-orig-width="1024"><img data-orig-height="576" data-orig-src="https://c1.staticflickr.com/3/2678/4279831984_cff70672e6_b.jpg" data-orig-width="1024" height="300" src="https://66.media.tumblr.com/c34b90066b6030d420818e99153b1ce8/tumblr_inline_pk103iUovB1qc390z_540.jpg"/></figure>

August 11th to 15th was a long weekend, and I decided to leave Bangalore on August 11th. I booked a KSRTC bus to Coimbatore. The boarding point, St John’s office, is a few kilometers away from my office in Teachers Colony, and the pickup time was 9:45 PM. I booked an Uber Pool at 8:35. The first driver strong-armed me into canceling the ride with the notorious usual reason - traffic. It took fifteen minutes to find another car. The driver picked me up at 8:50 and started the vehicle in mushrooming traffic. I sat in the front seat. He asked my destination and told me about the co-rider’s pick-up point. After battling traffic for fifteen minutes, the driver picked up the second rider. As soon as the co-rider stepped in, the driver tapped the unresponsive app on his Android phone. The app wasn’t loading the second rider’s information; the loader was rotating in its galactic path. The driver killed the app and opened it again. He restarted the app multiple times; nothing happened; finally, he restarted the phone. He started the car, moved a few meters, stopped the car, and opened the app. As expected, the co-rider’s information showed up on the screen. We lost 5 to 10 minutes, and now the time was 9:15.

Now the driver was due to start the second rider’s trip. He clicked on the `` Start the trip `` button. Immediately, a pop-up window appeared with the message `` Null == value ``. I giggled inside, and my programmer soul murmured, “an object comparison has gone for a toss”. The driver clicked the button many times, full of hope and anxiety. He drove a few meters and again tried to start the trip. The same error message popped up. I was about to tell him, “Nothing will change. We are helpless.” He made a wise move and logged out of the app. By this time, I had started panicking, but I didn’t show it and didn’t want to pass it on to him. Meanwhile, I got a call that lasted 13 minutes. Now the time was close to 9:30. We stopped in front of a signal. He logged into the app and was still fighting the same issue. He asked me to pull out the diary in the door. The co-rider was looking into the app, and nothing was working our way. He picked the customer care number from the diary and passed it on to the other rider. She started calling customer care. The driver lifted the phone lying on his lap and placed it on the dashboard.

He asked both of us our destinations again. I said mine and added, “I have a bus to catch.” He accelerated the car and picked up speed. I was worried but at the same time confident I would get the bus. Only once in my last ten trips was KSRTC on time; the other times the bus was behind schedule by 10 to 60 minutes. I was impatient and asked what would happen to his payment. He gently replied that the company would take care of it. While the car was closing the gap to my destination, the driver asked my payment mode. I replied Paytm, and the car stopped at the signal. I requested him to open the door, thanked him, and got down. The co-rider was still on the call, waiting to utter the first word in angst. I crossed the road, stood on the footpath, and picked up my phone; the time was 9:50. The car was still in my sight, and I started to gaze at all the outstation buses. The bus arrived after 20 minutes, and I ran into an old friend too.

Though the error may be simple, the impact is mental pain and turmoil. These failures panic passengers, and the scene is entirely different for passengers of different ages and those in ill health.

- [HTTP Exception as control flow](https://kracekumar.com/post/147895372680/http-exception-as-control-flow/index.md): HTTP Exception as control flow

+++
date = "2016-07-24 14:37:20+00:00"
draft = false
tags = ["python", "web", "exception"]
title = "HTTP Exception as control flow"
url = "/post/147895372680/http-exception-as-control-flow"
+++
As per <a href="https://en.wikipedia.org/wiki/Exception_handling" target="_blank">Wikipedia</a>, exception handling is the process of responding to the occurrence, during computation, of exceptions – anomalous or exceptional conditions requiring special processing – often changing the flow of program execution.

In Python, errors like `` SyntaxError `` and `` ZeroDivisionError `` are exceptions. An exception paves the way to alter the normal execution path.

While working with an API, a web request goes through the following process: `` authentication, authorization, input validation, business logic ``, and finally, the response is given out. Depending on complexity, a few more steps can be involved.

Consider a simple class-based view, similar to a `` Django `` or `` Flask `` view.

<div class="gist"><a href="https://gist.github.com/kracekumar/ce25b463992b609a34eff7b92bd9c77a" target="_blank">https://gist.github.com/kracekumar/ce25b463992b609a34eff7b92bd9c77a</a></div>

The important point here is the number of `` if `` conditions inside the `` post `` method. Though another function can hold the code inside the `` if validator.validate() `` block, the `` if `` condition doesn’t go away. `` check_and_create_bucket `` returns a `` tuple ``, and the caller needs to check the tuple to decide the next steps. As the number of layers or functions increases, the conditions also increase. These conditions are unnecessary and have to be handled at every level down to the bottom.

On the other hand, if each `` function/method `` returns the object on success and raises an `` exception `` on failure, the code is easier to read, and the exception can be handled at a higher level.

Consider the modified code where every function returns `` an object `` or `` raises an exception ``.

<div class="gist"><a href="https://gist.github.com/kracekumar/78557dd6e2bf0e5176b3d6783300fdbb" target="_blank">https://gist.github.com/kracekumar/78557dd6e2bf0e5176b3d6783300fdbb</a></div>

In the above code, the validator raises an exception on any failure. Even if the validator doesn’t raise an exception, the view has only one `` if condition ``. At a higher level, the code is structured like a list of commands.

The illustrated example has only one business action after validation. When the number of business conditions increases, this approach is smoother to manage.

Writing `` try/except `` code in every view is repetitive. A little decorator can be handy.

<div class="gist"><a href="https://gist.github.com/kracekumar/eef4a2feb052c69bc2a536215756d9c6" target="_blank">https://gist.github.com/kracekumar/eef4a2feb052c69bc2a536215756d9c6</a></div>

<a href="https://wrapt.readthedocs.io/en/latest/" target="_blank">wrapt</a> is the useful library for writing decorators, wrappers, and monkey patching.

Designing the code base around exceptions is highly useful: all the exceptions are handled at a higher level, reducing `` if/else `` complexity.

<a href="https://docs.djangoproject.com/en/1.9/ref/exceptions/" target="_blank">Django</a> and <a href="http://werkzeug.pocoo.org/docs/0.11/exceptions/" target="_blank">Flask</a> has inbuilt HTTP Exceptions support.

- [State machine in DB model](https://kracekumar.com/post/147507497215/state-machine-in-db-model/index.md): State machine in DB model
+++
date = "2016-07-16 20:25:06+00:00"
draft = false
tags = ["python", "state machine"]
title = "State machine in DB model"
url = "/post/147507497215/state-machine-in-db-model"
+++
A state machine is an abstract machine that can be in one of a finite number of states. The machine is in only one state at a time; the state it is in at any given time is called the current state.

When using a database, individual records should be in allowed states, and the database or application stores the rules for those states. There are many ways to design the database schema to achieve this. The most common method is an `` int `` or `` string `` field, where each value represents a state; that is the direct approach. An indirect, often unintentional method is using multiple boolean values to calculate the state, like `` is_archived `` and `` is_published ``; the combination of the two fields yields four different states.

At work, a similar situation arose. First, it started with a single boolean value; after some time, a second column showed up; then, of course, a third one. That’s when I realized: use a `` state machine ``. I spent time looking for a relevant library and met <a href="https://github.com/tyarkoni/transitions" target="_blank">transitions</a>. The library supports `` transitions `` and `` state change validation ``.

Consider a simple model `` Order `` with five states: `` 'placed', 'dispatched', 'delivered', 'canceled', 'returned' ``. A real-world model will have way more states. The important point about these states is that object transitions should follow rules: an order can be in the `` returned `` state only when the previous state was `` delivered ``. Without help, this requires custom validation code with multiple `` if `` conditions. Here is the state machine diagram.

<img alt="Drawing" src="https://dl.dropboxusercontent.com/u/39367302/state.jpg" style="width: 600px;"/>

When the number of states increases, the code starts to fall apart with `` conditions `` and `` repetition ``.

The `` transitions `` library supports declaring `` states `` and `` allowed triggers ``, and takes care of validation. It provides a set of helper methods to perform transitions. Here is a simple example.

<div class="gist"><a href="https://gist.github.com/kracekumar/dc5628713340acbda373c6f6f54e72bb" target="_blank">https://gist.github.com/kracekumar/dc5628713340acbda373c6f6f54e72bb</a></div>

The `` Machine `` class takes a list of `` transitions ``. Each transition is an iterable with the `` trigger ``, `` source state ``, and `` destination state `` as elements, respectively.

To trigger an event, invoke `` machine.model.<trigger>() ``. The method returns `` True `` when the trigger succeeds and `` False `` when the trigger isn’t allowed in the current state. The machine derives this logic from the list of configured transitions.

<div class="gist"><a href="https://gist.github.com/kracekumar/352daef114d84d13d7805b71727e79e8" target="_blank">https://gist.github.com/kracekumar/352daef114d84d13d7805b71727e79e8</a></div>

Delivery isn’t possible for a canceled order!

`` ignore_invalid_triggers=True `` makes the method return `` True `` or `` False ``. The library raises a `` MachineError `` exception when `` ignore_invalid_triggers `` is set to `` False ``.
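
Since the gists aren’t inlined here, a minimal sketch of the `` Order `` machine described above, assuming the library’s list-style `` [trigger, source, destination] `` declaration:

    from transitions import Machine

    class Order(object):
        pass

    states = ['placed', 'dispatched', 'delivered', 'canceled', 'returned']
    # Each transition is [trigger, source state, destination state].
    # 'return' is a Python keyword, so the trigger is named take_back.
    allowed = [
        ['dispatch', 'placed', 'dispatched'],
        ['deliver', 'dispatched', 'delivered'],
        ['cancel', 'placed', 'canceled'],
        ['take_back', 'delivered', 'returned'],
    ]

    order = Order()
    machine = Machine(model=order, states=states, transitions=allowed,
                      initial='placed', ignore_invalid_triggers=True)

    print(order.cancel())   # True - 'placed' -> 'canceled'
    print(order.deliver())  # False - delivery isn't allowed on a canceled order
    print(order.state)      # 'canceled'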

The declarative style of creating a state machine makes the library easy to use and prevents a lot of boilerplate code.

- [Animal Farm Review](https://kracekumar.com/post/145749737410/animal-farm-review/index.md): Animal Farm Review

+++
date = "2016-06-11 10:35:27+00:00"
draft = false
tags = ["book-review"]
title = "Animal Farm Review"
url = "/post/145749737410/animal-farm-review"
+++
Animal Farm’s satire makes the novella interesting to read. It was written after the Bolshevik revolution, and it’s no surprise that many go gaga over that connection. But confining the book to the Bolshevik revolution is a tunnel view: the core of the novella resembles a lot of today’s politics and ancient kingdoms. Forget about the Bolshevik revolution for a moment. A rebellion breaks out, a leader comes to power and slowly vanquishes those who don’t believe in his ideas or who voice dissent, becomes benevolent dictator for life, turns corrupt, and power moves from the people into a few hands. Change the word rebellion to whatever it is or was called in any time period. Isn’t this the perfect picture of politics all over the world? Think of Indian political parties, of Tamil Nadu parties. You got it, right? It raises a few questions: can someone have too much power? Rather than a human, can an organization have all power at its disposal, controlled by a few (some say this is already in place)?

It boils down to humans’ thirst for power: gain it, leverage it, ride on others’ work, and lead a king-size life!

I published this on Goodreads: <a href="https://www.goodreads.com/review/show?id=1665387679" target="_blank">https://www.goodreads.com/review/show?id=1665387679</a>

- [Asyncio and uvloop](https://kracekumar.com/post/144058400775/asyncio-and-uvloop/index.md): Asyncio and uvloop
+++
date = "2016-05-08 20:18:48+00:00"
draft = false
tags = ["python", "asyncio", "uvloop"]
title = "Asyncio and uvloop"
url = "/post/144058400775/asyncio-and-uvloop"
+++
Today, I read an article about <a href="http://magic.io/blog/uvloop-make-python-networking-great-again/" target="_blank">uvloop</a>. I am aware of libuv; it’s behind nodejs. What caught my eye was “In fact, it is at least 2x faster than any other Python asynchronous framework.” So I decided to give it a try with <a href="https://aiohttp.readthedocs.io/en/stable/web.html#websockets" target="_blank">aiohttp</a>.

The test program was simple websocket code which receives a text message, doubles the content, and echoes it back. Here is the complete <a href="https://gist.github.com/kracekumar/daf10b3be3191a78b037c0c79667c26c" target="_blank">snippet</a> with uvloop.
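
The uvloop part of the change is tiny; a minimal sketch of the swap (the websocket handler itself is elided):

    import asyncio
    import uvloop

    # Replace the default asyncio event loop policy with uvloop's
    # libuv-based one; everything else in the aiohttp app stays the same.
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

    loop = asyncio.get_event_loop()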

I ran a naive benchmark using <a href="https://github.com/observing/thor" target="_blank">thor</a>, and the results favoured uvloop.

* uvloop handled more connections on an 8GB, non-SSD Mac OS X machine: asyncio held 154 connections and uvloop 243 connections without any socket errors.
* For 154 connections with 10 messages each, the response time was 1030ms on the asyncio event loop and 915ms on uvloop. This is not a full-blown, proper benchmark, but it proves the point.

If you haven’t tried <a href="https://docs.python.org/3/library/asyncio.html" target="_blank">asyncio</a>, you should!

- [Permissions in Django Admin](https://kracekumar.com/post/141377389440/permissions-in-django-admin/index.md): Permissions in Django Admin
+++
date = "2016-03-20 17:07:40+00:00"
draft = false
tags = ["django", "python"]
title = "Permissions in Django Admin"
url = "/post/141377389440/permissions-in-django-admin"
+++
<a href="https://docs.djangoproject.com/en/1.9/ref/contrib/admin/" target="_blank">Admin dashboard</a> is one of the Django’s useful feature. Admin dashboard allows super users to `` create, read, update, delete `` database objects. The super users have full control over the data. `` Staff `` user can login into admin dashboard but can’t access data. In few cases, `` staff `` users needs restricted access . `` Super user `` can access all data from various in built and third party apps. Here is a screenshot of `` Super user `` admin interface after login.

<figure class="tmblr-full" data-orig-height="999" data-orig-width="684"><img data-orig-height="999" data-orig-width="684" src="https://66.media.tumblr.com/f4d9901583f8f69e58ef014fe12a23bb/tumblr_inline_o4ckudpdvy1qc390z_540.png"/></figure>

Staff users don’t have access to data.

<figure class="tmblr-full" data-orig-height="943" data-orig-width="1198"><img data-orig-height="943" data-orig-width="1198" src="https://66.media.tumblr.com/4897ce3f1f0b12bc3816b7bc8c26d83d/tumblr_inline_o4cktwItDQ1qc390z_540.png"/></figure>

__Allow staff users to access models__

<a href="https://docs.djangoproject.com/en/1.9/topics/auth/default/#topic-authorization" target="_blank">Django permissions</a> determines access to models and allowed actions in admin interface. Every model has three permissions. They are `` <app_label>.add_<model> ``, `` <app_label>.change_<model> ``, `` <app_label>.delete_<model> `` allows user to `` create, edit `` and `` delete `` objects.

Both the `` API `` and the `` admin interface `` allow assigning permissions to a user.
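
Via the ORM, the assignment is a couple of lines; a minimal sketch, assuming a hypothetical `` conference `` app with a `` Conference `` model:

    from django.contrib.auth.models import Permission, User

    user = User.objects.get(username='moderator')
    # Grant the staff user the change permission on the Conference model.
    permission = Permission.objects.get(codename='change_conference')
    user.user_permissions.add(permission)

    # Permission checks are cached on the user object; re-fetch to see the change.
    user = User.objects.get(username='moderator')
    assert user.has_perm('conference.change_conference')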

<figure class="tmblr-full" data-orig-height="461" data-orig-width="1208"><img data-orig-height="461" data-orig-width="1208" src="https://66.media.tumblr.com/f63557b7fa4ea61c618df62c032ea74a/tumblr_inline_o4ckta9BDv1qc390z_540.png"/></figure>

<figure class="tmblr-full" data-orig-height="415" data-orig-width="1154"><img data-orig-height="415" data-orig-width="1154" src="https://66.media.tumblr.com/52a314dff3cc8c0a483e91fa8ac22176/tumblr_inline_o4ckssZhUe1qc390z_540.png"/></figure>

After the permissions are assigned, staff users can perform the allowed actions on those models.

<figure class="tmblr-full" data-orig-height="347" data-orig-width="1041"><img data-orig-height="347" data-orig-width="1041" src="https://66.media.tumblr.com/27755be91479ed665141e7c982805b1b/tumblr_inline_o4cks6Zfz31qc390z_540.png"/></figure>

__Filtering objects in a model__

A conference management system hosts many conferences in a single instance. Each conference has a different set of moderators, and the system allows only conference-specific moderators to access the data. To achieve this, Django provides an option to override the `` queryset ``: the admin requires a custom implementation of the `` get_queryset `` method. Here is how sample code looks.

    class ConferenceAdmin(AuditAdmin):
        list_display = ('name', 'slug', 'start_date', 'end_date', 'status') + AuditAdmin.list_display
        prepopulated_fields = {'slug': ('name',), }

        def get_queryset(self, request):
            qs = super(ConferenceAdmin, self).get_queryset(request)
            if request.user.is_superuser:
                return qs
            return qs.filter(moderators=request.user)

    class ConferenceProposalReviewerAdmin(AuditAdmin, SimpleHistoryAdmin):
        list_display = ('conference', 'reviewer', 'active') + AuditAdmin.list_display
        list_filter = ('conference',)

        def get_queryset(self, request):
            qs = super(ConferenceProposalReviewerAdmin, self).get_queryset(request)
            if request.user.is_superuser:
                return qs
            moderators = service.list_conference_moderator(user=request.user)
            return qs.filter(conference__in=[m.conference for m in moderators])

Filtered moderator objects for staff user.

<figure class="tmblr-full" data-orig-height="921" data-orig-width="1599"><img data-orig-height="921" data-orig-width="1599" src="https://66.media.tumblr.com/16b1c22f92ce8e2fbdb5de3d40e1510f/tumblr_inline_o4ckrjEwGi1qc390z_540.png"/></figure>

Unfiltered moderator objects for superusers.

<figure class="tmblr-full" data-orig-height="915" data-orig-width="1604"><img data-orig-height="915" data-orig-width="1604" src="https://66.media.tumblr.com/e190d51fc6c2cc0d5da325a9a7b4c7fe/tumblr_inline_o4ckr2rMre1qc390z_540.png"/></figure>

Note the difference in the total number of objects (23 vs 30) between the two views.

- [Testing Django Views](https://kracekumar.com/post/138492827565/testing-django-views/index.md): Testing Django Views
+++
date = "2016-02-01 19:17:49+00:00"
draft = false
tags = ["django", "testing"]
title = "Testing Django Views"
url = "/post/138492827565/testing-django-views"
+++
The majority of web frameworks promote the <a href="https://en.wikipedia.org/wiki/Model%E2%80%93view%E2%80%93controller" target="_blank">MVC/MTV</a> software pattern. The way web applications are designed today isn’t the same as 5-6 years back. Back then, it was server-side templates and HTML; APIs weren’t widespread, and mobile apps were just becoming popular. The rise of mobile and Single Page Applications shifted the majority of web development towards API-centric development. Testing an API is super simple - data in, data out - but testing a Django view in a classic web application is harder since HTML is returned. <a href="https://en.wikipedia.org/wiki/Representational_state_transfer" target="_blank">REST</a> semantics and `` status codes `` help distinguish responses without inspecting the body.

The `` view `` is the coordinator in the web application, so writing integration tests makes sense for views. In integration tests, most of the assertions are around `` permission, status code, returned data ``. Django provides a set of nice helpers to test views, like <a href="https://docs.djangoproject.com/en/1.9/topics/testing/tools/#django.test.SimpleTestCase.assertTemplateUsed" target="_blank">assertTemplateUsed</a> and <a href="https://docs.djangoproject.com/en/1.9/topics/testing/tools/#django.test.SimpleTestCase.assertRedirects" target="_blank">assertRedirects</a>, and the returned response contains the template `` context ``. All three are very handy for testing the majority of view functionality.

#### Sample test cases

<script src="https://gist.github.com/kracekumar/59f8d68bc39947148956.js"></script>

More of these tests can be found in <a href="https://github.com/pythonindia/junction/tree/master/tests/integrations" target="_blank">junction</a>.

- [Simple Json Response basic test between Flask and Django](https://kracekumar.com/post/117948332935/simple-json-response-basic-test-between-flask-and/index.md): Simple Json Response basic test between Flask and Django

+++
date = "2015-05-02 16:42:40+00:00"
draft = false
tags = ["python", "flask", "django"]
title = "Simple Json Response basic test between Flask and Django"
url = "/post/117948332935/simple-json-response-basic-test-between-flask-and"
+++
Django and Flask are two well-known Python web frameworks. A lot of benchmarks claim Flask is 2x faster for a simple JSON response; one such is <a href="https://www.techempower.com/benchmarks/" target="_blank">Techempower</a>. After looking into the <a href="https://github.com/TechEmpower/FrameworkBenchmarks/tree/master/frameworks/Python/django" target="_blank">source</a>, it struck me that Django can do better!

I will compare Flask and Django for a simple JSON response. The machine used is a MacBook Pro, `` Intel Core i5-4258U CPU @ 2.40GHz `` with `` 8 GB `` memory on `` OS X 10.10.3 ``. `` gunicorn==19.3.0 `` will be used for serving the WSGI applications.

    #flask simple app
    from flask import Flask, jsonify
    
    app = Flask(__name__)
    
    @app.route('/index')
    def index():
        return jsonify({'hello': 'world'})
    
    if __name__ == "__main__":
        app.run()

Start the Flask app: `` gunicorn -w 2 -b 127.0.0.1:5000 flask_app:app ``.

`` apachebench `` will be used for benchmarking; you can use any tool for this.

    ab -n 1000 -c 2 http://localhost:5000/index
    ...
    Server Software:        gunicorn/19.3.0
    Server Hostname:        localhost
    Server Port:            5000
    
    Document Path:          /index
    Document Length:        22 bytes
    
    Concurrency Level:      2
    Time taken for tests:   0.622 seconds
    Complete requests:      1000
    Failed requests:        0
    Total transferred:      174000 bytes
    HTML transferred:       22000 bytes
    Requests per second:    1429.55 [#/sec] (mean)
    Time per request:       1.399 [ms] (mean)
    Time per request:       0.700 [ms] (mean, across all concurrent requests)
    Transfer rate:          242.91 [Kbytes/sec] received

Now let’s do the same thing with Django 1.8 and default settings.

    #hello/views.py
    from django.http import JsonResponse
    
    # Create your views here.
    def index(request):
        return JsonResponse({'hello': 'world'})

Add `` url('^index/$', index), `` to `` urls.py `` and the `` hello `` app to `` settings.py ``. Start the Django app: `` gunicorn -w 2 -b 127.0.0.1:8000 django_app.wsgi ``.

    ab -n 1000 -c 2 http://localhost:8000/index/
    ...
    Server Software:        gunicorn/19.3.0
    Server Hostname:        localhost
    Server Port:            8000
    
    Document Path:          /index/
    Document Length:        18 bytes
    
    Concurrency Level:      2
    Time taken for tests:   0.814 seconds
    Complete requests:      1000
    Failed requests:        0
    Total transferred:      179000 bytes
    HTML transferred:       18000 bytes
    Requests per second:    1228.25 [#/sec] (mean)
    Time per request:       1.628 [ms] (mean)
    Time per request:       0.814 [ms] (mean, across all concurrent requests)
    Transfer rate:          214.70 [Kbytes/sec] received

The time taken for 1000 requests by Django is `` 0.814s `` versus Flask’s `` 0.622s ``. Clearly, Flask is faster.

Django is a full-fledged framework, whereas Flask is a micro framework. Django comes with a lot of middleware, contrib models, etc. Remove all those unused settings from `` settings.py ``.

Comment out all the `` middleware classes ``. The `` settings `` snippet looks like:

    INSTALLED_APPS = (
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
    
        'hello',
    )
    
    MIDDLEWARE_CLASSES = (
        # 'django.contrib.sessions.middleware.SessionMiddleware',
        # 'django.middleware.common.CommonMiddleware',
        # 'django.middleware.csrf.CsrfViewMiddleware',
        # 'django.contrib.auth.middleware.AuthenticationMiddleware',
        # 'django.contrib.messages.middleware.MessageMiddleware',
        # 'django.middleware.clickjacking.XFrameOptionsMiddleware',
    )

Restart the django app and run `` ab -n 1000 -c 2 http://localhost:8000/index/ ``.

    ...
    Server Software:        gunicorn/19.3.0
    Server Hostname:        localhost
    Server Port:            8000
    
    Document Path:          /index/
    Document Length:        18 bytes
    
    Concurrency Level:      2
    Time taken for tests:   0.560 seconds
    Complete requests:      1000
    Failed requests:        0
    Total transferred:      150000 bytes
    HTML transferred:       18000 bytes
    Requests per second:    1784.37 [#/sec] (mean)
    Time per request:       1.121 [ms] (mean)
    Time per request:       0.560 [ms] (mean, across all concurrent requests)
    Transfer rate:          261.38 [Kbytes/sec] received

Now Django took only `` 0.560s `` for 1000 requests, compared to Flask’s `` 0.622s ``.

Now remove Django admin from `` urls.py ``, and remove all `` contrib `` apps in `` INSTALLED_APPS ``.

    INSTALLED_APPS = (
    #'django.contrib.admin',
    #'django.contrib.auth',
    #'django.contrib.contenttypes',
    #'django.contrib.sessions',
    #'django.contrib.messages',
    #'django.contrib.staticfiles',
    
    'hello',
    )

Restart the Django app and run the benchmark `` ab -n 1000 -c 2 http://localhost:8000/index/ ``.

    ...
    Server Software:        gunicorn/19.3.0
    Server Hostname:        localhost
    Server Port:            8000
    
    Document Path:          /index/
    Document Length:        18 bytes
    
    Concurrency Level:      2
    Time taken for tests:   0.553 seconds
    Complete requests:      1000
    Failed requests:        0
    Total transferred:      150000 bytes
    HTML transferred:       18000 bytes
    Requests per second:    1806.90 [#/sec] (mean)
    Time per request:       1.107 [ms] (mean)
    Time per request:       0.553 [ms] (mean, across all concurrent requests)
    Transfer rate:          264.68 [Kbytes/sec] received

That is much better. While developing an API-based application in Django, tweak `` settings.py `` to get better performance out of Django. The source code for this benchmark can be found on <a href="https://github.com/kracekumar/flask_vs_django_bench" target="_blank">GitHub</a>.

- [django print exception to console](https://kracekumar.com/post/113057636135/django-print-exception-to-console/index.md): django print exception to console
+++
date = "2015-03-08 11:10:09+00:00"
draft = false
tags = ["python", "django"]
title = "django print exception to console"
url = "/post/113057636135/django-print-exception-to-console"
+++
Django has a very good debug toolbar for debugging `` SQL ``. But while working with a `` Single Page Application `` and an `` API ``, exceptions can’t be displayed in the browser; the exception is sent to the front end. What if the exception could also be printed to the console?

Django middleware gets called for every `` request/response ``. The small helper class looks like:

<script src="https://gist.github.com/kracekumar/46ac62b2cffff12c72e0.js"></script>
Add the file name and class name to `` MIDDLEWARE_CLASSES `` in the settings file like:

<script src="https://gist.github.com/kracekumar/1a0543c48c80a5e01102.js"></script>

This is how the exceptions look:

<script src="https://gist.github.com/kracekumar/20576674de2b620a5adc.js"></script>

- [Check for custom objects in Python set.](https://kracekumar.com/post/107618031945/check-for-custom-objects-in-python-set/index.md): Check for custom objects in Python set.
+++
date = "2015-01-09 19:50:06+00:00"
draft = false
tags = ["python", "set"]
title = "Check for custom objects in Python set."
url = "/post/107618031945/check-for-custom-objects-in-python-set"
+++
The Python set data structure is commonly used for removing duplicate entries and making lookups faster (O(1)). Only hashable objects can be stored in a set; for example, a `` list `` or a `` dict `` can’t be stored.

User-defined objects can be stored. Here is how it looks:

    class Person(object):
        def __init__(self, name, age):
            self.name, self.age = name, age


    In [25]: s = set()
    In [26]: s.add(Person('kracekumar', 25))

    In [27]: s
    Out[27]: set([<__main__.Person at 0x1033c5e10>])

    In [29]: Person('kracekumar', 25) in s
    Out[29]: False

### Implement equality check

Even though a `` Person `` object with the same values is present, the check failed. This is because Python’s default `` __eq__ `` checks for identity (the same reference).

    class Person(object):
        def __init__(self, name, age):
            self.name, self.age = name, age

        def __eq__(self, other):
            return (isinstance(other, self.__class__) and
                getattr(other, 'name', None) == self.name and
                getattr(other, 'age', None) == self.age)

        def __hash__(self):
            return hash(self.name + str(self.age))

    In [38]: s = set()

    In [39]: s.add(Person('kracekumar', 25))

    In [40]: Person('kracekumar', 25) in s
    Out[40]: True

    In [41]: s
    Out[41]: set([<__main__.Person at 0x1033d0590>])

    In [42]: s.add(Person('kracekumar', 25))

    In [43]: s
    Out[43]: set([<__main__.Person at 0x1033d0590>])

`` __hash__ `` is used for calculating the hash of the object, and `` __eq__ `` is used by `` in `` to confirm equality after the hash values match.

- [class as decorator](https://kracekumar.com/post/101540141725/class-as-decorator/index.md): class as decorator
+++
date = "2014-11-02 01:17:42+00:00"
draft = false
tags = ["python", "class", "decorator"]
title = "class as decorator"
url = "/post/101540141725/class-as-decorator"
+++
### Decorator

A decorator is a callable which can modify a `` function, method, class `` at runtime. Most decorators use a `` closure ``, but it is possible to use a `` class ``.

### Closure

    import functools


    def cache(f):
        storage = {}

        @functools.wraps(f)
        def inner(n):
            value = storage.get(n)
            if value:
                print("Returning value from cache")
                return value
            value = f(n)
            storage[n] = value
            return value
        return inner


    @cache
    def factorial(n):
        if n <= 1:
            return 1
        return n * factorial(n - 1)

    >>>factorial(20)
    2432902008176640000
    >>>factorial(20)
    Returning value from cache
    2432902008176640000

`` cache `` is a function which takes a `` function as an argument `` and returns a `` function ``. `` factorial `` is a function which calculates the factorial of a given number, decorated by `` cache ``. If the factorial of a number has already been calculated (or the number is smaller than one already calculated), it is retrieved from `` storage ``.

The `` cache `` function can also be written using a `` class ``.

### Class

    import functools


    class Cache(object):
        def __init__(self, func):
            self.func = func
            self.storage = {}

            functools.update_wrapper(self, func)

        def __call__(self, n):
            value = self.storage.get(n)

            if value:
                print("Returning from cache")
                return value

            value = self.func(n)
            self.storage[n] = value
            return value


    @Cache
    def factorial(n):
        if n <= 1:
            return 1
        return n * factorial(n - 1)


    >>>factorial(20)
    2432902008176640000
    >>>factorial(20)
    Returning from cache
    2432902008176640000

The `` Cache `` class has two dunder methods: `` __init__ `` and `` __call__ ``. `` __init__ `` is called when the interpreter applies `` @Cache `` at function definition time, and `` __call__ `` is called when the `` factorial `` function is invoked.

- [Fluent interface in python](https://kracekumar.com/post/100897281440/fluent-interface-in-python/index.md): Fluent interface in python
+++
date = "2014-10-25 09:32:01+00:00"
draft = false
tags = ["python", "fluent interface", "method chaining"]
title = "Fluent interface in python"
url = "/post/100897281440/fluent-interface-in-python"
+++
<a href="https://en.wikipedia.org/wiki/Fluent_interface" target="_blank">Fluent Interface</a> is an implementation of API which improves readability.

### Example

`` Poem('The Road Not Taken').indent(4).suffix('Robert Frost') ``.

Fluent Interface is similar to method chaining. I was wondering how to implement this in `` Python ``. Returning `` self `` during every method call seemed like a good idea.

    class Poem(object):
        def __init__(self, content):
            self.content = content

        def indent(self, spaces=4):
            self.content = " " * spaces + self.content
            return self

        def suffix(self, content):
            self.content += " - {}".format(content)
            return self

        def __str__(self):
            return self.content

    >>>print Poem('Road Not Taken').indent(4).suffix('Robert Frost').content
        Road Not Taken - Robert Frost

Everything seems to be ok here.

### Side effects

The above approach has a side effect: we are mutating the same object during every method call. Consider the following code

    >>>p = Poem('Road Not Taken')
    >>>q = p.indent(4)
    >>>r = p.indent(2)
    >>>print str(q) == str(r)
    True
    >>>print id(q), id(r)
    4459640464 4459640464

Clearly this isn’t expected. `` q `` and `` r `` point to the same instance. Ideally we should create a new instance during every method call and return that object.

### New object

A `` decorator `` is a function which takes a `` function `` as an argument and returns a `` function `` (in most cases).

    from functools import wraps

    def newobj(method):
        @wraps(method)
        # Note: newobj can decorate a plain function too, but here we cover
        # the case where it decorates a method
        def inner(self, *args, **kwargs):
            obj = self.__class__.__new__(self.__class__)
            obj.__dict__ = self.__dict__.copy()
            method(obj, *args, **kwargs)
            return obj
        return inner


    class NPoem(object):
        def __init__(self, content):
            self.content = content

        @newobj
        def indent(self, spaces=4):
            self.content = " " * spaces + self.content

        @newobj
        def suffix(self, content):
            self.content += " - {}".format(content)

        def __str__(self):
            return self.content

    >>>p = NPoem("foo")
    >>>q = p.indent(4)
    >>>r = p.indent(2)
    >>>print str(q) == str(r)
    False
    >>>print(q)
        foo
    >>>print(r)
      foo
    >>>print id(q), id(r)
     4459642640 4459639248

In the above approach the `` newobj `` decorator creates a new instance, copies all the attributes, calls the method with the arguments, and returns the new `` instance ``. Rather than using a decorator, a private function can do the same, as sketched below.
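A minimal sketch of that private-function variant (my illustration, mirroring the `` _clone `` idea referenced below):

    class CPoem(object):
        def __init__(self, content):
            self.content = content

        def _clone(self):
            # Copy the current state into a fresh instance.
            obj = self.__class__.__new__(self.__class__)
            obj.__dict__ = self.__dict__.copy()
            return obj

        def indent(self, spaces=4):
            obj = self._clone()
            obj.content = " " * spaces + obj.content
            return obj

        def suffix(self, content):
            obj = self._clone()
            obj.content += " - {}".format(content)
            return obj

        def __str__(self):
            return self.content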

Fluent Interface is used heavily in SQLAlchemy and Django. <a href="https://github.com/django/django/blob/master/django/db/models/query.py#L954" target="_blank">Django</a> uses the `` _clone `` method to create a new `` QuerySet `` and <a href="https://github.com/zzzeek/sqlalchemy/blob/1cb94d89d5c4db7a914d9f30ebe0b676ab4e175b/lib/sqlalchemy/sql/base.py#L308" target="_blank">SQLAlchemy</a> uses a decorator-based approach to create a new `` Query `` instance.

Thanks to <a href="https://twitter.com/kracetheking/status/525207441206558722" target="_blank">Mike Bayer</a> and <a href="https://twitter.com/kracetheking/status/525206909612085248" target="_blank">Daniel Roy Greenfeld</a> for helping me understand this.

Do you know any other way to implement this? Feel free to comment.

- [Python global keyword](https://kracekumar.com/post/100399630630/python-global-keyword/index.md): Python global keyword
+++
date = "2014-10-19 11:14:08+00:00"
draft = false
tags = ["python", "global", "closure"]
title = "Python global keyword"
url = "/post/100399630630/python-global-keyword"
+++
Python’s `` global `` keyword allows you to modify a variable that is outside the current scope.

    In [13]: bar = 1

    In [14]: def foo():
    ....:     global bar
    ....:     bar = 2
    ....:

    In [15]: bar
    Out[15]: 1

    In [16]: foo()

    In [17]: bar
    Out[17]: 2

In the above example, `` bar `` was declared before the `` foo `` function. `` global bar `` refers to the `` bar `` variable outside `` foo ``’s scope. After invoking `` foo ``, the value of `` bar `` was modified inside `` foo ``, and the change is visible globally.

What happens when `` bar `` becomes a function inside `` foo ``?

    In [19]: bar = None

    In [20]: def foo():
       ....:     global bar
       ....:     def bar():
       ....:         return 'I am bar'
       ....:

    In [21]: bar

    In [22]: foo()

    In [23]: bar
    Out[23]: <function __main__.bar>

`` bar `` was a variable outside `` foo ``’s scope. Defining a new function `` bar `` inside `` foo `` rebound the global name `` bar `` to that function.

This may not be obvious at first, until you realize a function can be bound to a variable like any other value.
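A related note (my addition): Python 3 adds `` nonlocal ``, which does for enclosing function scopes what `` global `` does for module scope.

    # Python 3 only: `nonlocal` rebinds a name in the enclosing scope.
    def counter():
        count = 0

        def increment():
            nonlocal count
            count += 1
            return count
        return increment

    tick = counter()
    print(tick())  # 1
    print(tick())  # 2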

- [python source fileencoding](https://kracekumar.com/post/86530203730/python-source-fileencoding/index.md): python source fileencoding

+++
date = "2014-05-22 20:35:00+00:00"
draft = false
tags = ["python", "unicode"]
title = "python source fileencoding"
url = "/post/86530203730/python-source-fileencoding"
+++
Some Python source files start with `` -*- coding: utf-8 -*- ``. This particular line tells the Python interpreter that all the content (byte strings) is `` utf-8 `` encoded. Let's see how it affects the code.

`` uni1.py ``:

    # -*- coding: utf-8 -*-
    print("welcome")
    print("animé")

`` output ``:

    ➜  code$ python2 uni1.py
       welcome
       animé

The third line had an accented character and it wasn’t explicitly marked as unicode, yet `` print `` succeeded. Since the first line instructed the interpreter that all byte sequences from there on follow `` utf-8 ``, it worked.

What if first line was missing ?

`` uni2.py ``

    print("welcome")
    print("animé")

`` output ``:

    code$ python2 uni2.py
    File "uni2.py", line 2
    SyntaxError: Non-ASCII character '\xc3' in file uni2.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Now Python complains that a non-ASCII character was found, since the default source encoding is ASCII. More about source encoding can be found in <a href="http://legacy.python.org/dev/peps/pep-0263/" target="_blank">PEP 263</a>.

Always set the `` encoding `` in the first or second line of a Python file.
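For completeness (a PEP 263 detail, not from the original post): the declaration may appear on line one or two, so it can follow a shebang.

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    # Line 1 is the shebang, so the coding declaration moves to line 2.
    print("animé")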

- [How to install externally hosted files using pip](https://kracekumar.com/post/85545169530/how-to-install-externally-hosted-files-using-pip/index.md): How to install externally hosted files using pip

+++
date = "2014-05-12 18:56:00+00:00"
draft = false
tags = ["python", "pip"]
title = "How to install externally hosted files using pip"
url = "/post/85545169530/how-to-install-externally-hosted-files-using-pip"
+++
As of writing (12 May 2014) the latest version of `` pip `` is `` 1.5.1 ``. `` pip `` doesn’t allow installing packages from a non-<a href="https://pypi.python.org" target="_blank">PyPI</a>-based url. It is possible to upload a `` tar ``, `` zip ``, or `` tar.gz `` file to `` PyPI ``, or to specify a `` download url `` which points to another site (Example: `` pyPdf `` points to <code><a href="http://pybrary.net/pyPdf/pyPdf-1.13.tar.gz" target="_blank">http://pybrary.net/pyPdf/pyPdf-1.13.tar.gz</a></code>). `` pip `` considers externally hosted packages as insecure. Agreed.

This is one of the reasons why I kept using `` pip 1.4.1 ``. I finally decided to fix this issue. Below is the sample error which `` pip `` throws.

    (document-converter)➜  document-converter git:(fix_requirements) pip install pyPdf
    Downloading/unpacking pyPdf
    Could not find any downloads that satisfy the requirement pyPdf
    Some externally hosted files were ignored (use --allow-external pyPdf to allow).
    Cleaning up...
    No distributions at all found for pyPdf
    Storing debug log for failure in /Users/kracekumar/.pip/pip.log
    
    (document-converter)➜  document-converter git:(fix_requirements) pip install --allow-external pyPdf
    You must give at least one requirement to install (see "pip help install")
    (document-converter)➜  document-converter git:(fix_requirements) pip install pyPdf --allow-external pyPdf
    Downloading/unpacking pyPdf
    Could not find any downloads that satisfy the requirement pyPdf
    Some insecure and unverifiable files were ignored (use --allow-unverified pyPdf to allow).
    Cleaning up...
    No distributions at all found for pyPdf
    Storing debug log for failure in /Users/kracekumar/.pip/pip.log

The above attempts are super confusing and counter-intuitive. The fix is

    (document-converter)➜  document-converter git:(fix_requirements) pip install pyPdf --allow-external pyPdf --allow-unverified pyPdf
    Downloading/unpacking pyPdf
    pyPdf an externally hosted file and may be unreliable
    pyPdf is potentially insecure and unverifiable.
    Downloading pyPdf-1.13.tar.gz
    Running setup.py (path:/Users/kracekumar/Envs/document-converter/build/pyPdf/setup.py) egg_info for package pyPdf
    
    Installing collected packages: pyPdf
    Running setup.py install for pyPdf
    
    Successfully installed pyPdf
    Cleaning up...

The above method isn’t suited to a production environment, where it is recommended to run `` pip install -r requirements.txt ``.

    # requirements.txt
    --allow-external pyPdf
    --allow-unverified pyPdf
    pyPdf
    --allow-external xhtml2pdf==0.0.5

`` pyPdf `` has two issues, so two flags need to be mentioned in `` requirements.txt ``. Since `` xhtml2pdf `` requires `` pyPdf ``, the `` --allow-external `` flag is passed for it as well. I wish it were possible to pass both switches on the same line; if you do, `` pip `` will ignore them. Now running `` pip install -r requirements.txt `` works like a charm (with warnings).

Since the current approach is super confusing, there is a <a href="https://github.com/pypa/pip/pull/1812" target="_blank">discussion</a> about improving it. Thanks to <a href="https://github.com/Ivoz" target="_blank">Ivoz</a> for helping me <a href="https://github.com/pypa/pip/issues/1816#issuecomment-42848611" target="_blank">resolve</a> this.

- [Bus journey ](https://kracekumar.com/post/84037307108/bus-journey/index.md): Bus journey 

+++
date = "2014-04-27 17:43:51+00:00"
draft = false
tags = ["bus", "wind"]
title = "Bus journey "
url = "/post/84037307108/bus-journey"
+++
I am a big fan of bus travel. It is still my only mode of transportation. The two reasons I love it are the wind and the sightseeing. Whenever the wind kisses me, I forget myself and start thinking about memories.

The best part of the wind (Thendral) is that it kindles happiness, sad moments, memorable ones, wishes, and longing. Thendral has the complete effect of changing my mood and mode.

I don’t think only thendral has this effect. Trees, plants, flowers, and water also produce the same effect. Bus journeys sow a lot of peace in me.

Not to forget, this wind has the full potential to make me shed tears. Nature has answers to questions and triggers thoughts. I love how small things can create a big impact.

*Note*: Thendral is a Tamil word for a type of wind that originates from the south and produces a soothing effect.

- [How to learn Python ?](https://kracekumar.com/post/81918059722/how-to-learn-python/index.md): How to learn Python ?

+++
date = "2014-04-06 20:30:30+00:00"
draft = false
tags = ["python", "learning"]
title = "How to learn Python ?"
url = "/post/81918059722/how-to-learn-python"
+++
Over a period of time, a few people have asked me in meetups and online, `` I want
to learn python. Suggest me few ways to learn ``. Everyone who asked had a different background and different intentions. Before answering the question, I try to collect more information about their interests and their previous approaches. Some learnt the basics from `` codecademy ``, some attended a beginners' session at the <a href="http://www.meetup.com/bangpypers/" target="_blank">Bangpypers</a> meetup. In this post I will cover the general questions asked and my suggested approach.

Q: Suggest some online resources and books to learn python ?

A: I suggest three resources: <a href="http://www.openbookproject.net/thinkcs/python/english2e/" target="_blank">How to think like computer scientist</a>, <a href="http://learnpythonthehardway.org/" target="_blank">Learn Python The Hardway</a>, and <a href="https://www.udacity.com/course/cs101" target="_blank">CS101 from Udacity</a>. This is highly subjective because it depends on previous programming experience. I have a blog <a href="http://kracekumar.com/post/71171551647/introduction-to-python" target="_blank">post</a> with lots of Python snippets without explanation (I know, it is like a sea without waves).

Q: It takes too much time to complete the book. I want to learn it soon.

A: I have been programming in Python for over 3 years now, and I still don’t know Python in depth. You may learn it in six months or in a week. The journey is more interesting than the destination.

Q: How long will it take to learn python ?

A: It depends on what you want to learn in Python. I learnt basic Python in 3 hours while commuting to college. You should be able to grasp the basic concepts in a few hours. Practice will make you feel confident.

Q: I learnt basics of Python, can you give me some problems to solve and I will get back with solutions ?

A: No. I would be glad to help if you are stuck, but learning to solve your own problem is a great way to learn. My first usable Python program was to download <a href="https://github.com/kracekumar/song-fetcher" target="_blank">Tamil songs</a>. I still use the code :-). So find a small problem or project to work on. I would be happy to review the code and give suggestions.

Q: I want to contribute to open source python projects can you suggest ?

A: Don’t contribute to a project just because you want to; rather, find a library or project which interests you and see if things can be made better. Your contribution can be as small as fixing a spelling mistake (I have contributed a single-character change). The Linux kernel accepts patches which fix spelling mistakes. Every contribution has its own effect, so contribute if it adds value.

In case you are reading this blog post and want to learn Python or need help, I would be glad to assist.

- [Stop iteration when condition is meet while iterating](https://kracekumar.com/post/80342472437/stop-iteration-when-condition-is-meet-while/index.md): Stop iteration when condition is meet while iterating

+++
date = "2014-03-22 08:34:00+00:00"
draft = false
tags = ["python", "any", "next"]
title = "Stop iteration when condition is meet while iterating"
url = "/post/80342472437/stop-iteration-when-condition-is-meet-while"
+++
We are writing a small utility function called `` is_valid_mimetype ``. The function takes a `` mime_type `` as an argument and checks if the mime type is one of the allowed types. The code looks like

    ALLOWED_MIME_TYPE = ('application/json', 'text/plain', 'text/html')
    
    def is_valid_mimetype(mime_type):
        """Returns True or False.
    
        :param mime_type string or unicode: HTTP header mime type
        """
        for item in ALLOWED_MIME_TYPE:
            if mime_type.startswith(item):
                return True
        return False

The above code can be refactored into a single line using `` any ``.

    def is_valid_mimetype(mime_type):
        """Returns True or False.
    
        :param mime_type string or unicode: HTTP header mime type
        """
        return any([True for item in ALLOWED_MIME_TYPE
                                  if mime_type.startswith(item)])

A one-liner. It is awesome, but not performant: the list comprehension builds the full list and checks every allowed type even after a match is found. How about using `` next ``?

    def is_valid_mimetype(mime_type):
        """Returns True or False.
    
        :param mime_type string or unicode: HTTP header mime type
        """
        return next((True for item in ALLOWED_MIME_TYPE 
                                  if mime_type.startswith(item)), False)

`` (True for item in ALLOWED_MIME_TYPE if mime_type.startswith(item)) `` is a <a href="https://stackoverflow.com/questions/1995418/python-generator-expression-vs-yield" target="_blank">generator expression</a>. When no item matches (for instance, when `` ALLOWED_MIME_TYPE `` is empty), `` next `` would raise `` StopIteration ``; to avoid that, `` False `` is passed as the default argument to `` next ``.
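A quick demonstration of the default argument (my addition):

    >>> next(x for x in [] if x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
    >>> next((x for x in [] if x), False)
    False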

### Edit:

    def is_valid_mimetype(mime_type):
        """Returns True or False.
    
        :param mime_type string or unicode: HTTP header mime type
        """
        return any(mime_type.startswith(item) for item in ALLOWED_MIME_TYPE)

Cleaner than `` next ``, and `` any `` still short-circuits on the first match since it consumes the generator lazily.

- [Best weekend in recent times](https://kracekumar.com/post/79079050731/best-weekend-in-recent-times/index.md): Best weekend in recent times

+++
date = "2014-03-09 18:49:00+00:00"
draft = false
tags = []
title = "Best weekend in recent times"
url = "/post/79079050731/best-weekend-in-recent-times"
+++
Normally I don’t plan weekends. I code and watch movies. This weekend (8th March) was different though. March 7th, Friday evening, wasn’t good. I was banging my head at work to get an API working. Then I came back home and relaxed for an hour on Facebook and YouTube. Then I opened emacs and started playing Raja sir’s music. I stared at the code, walked along the execution, and figured out the issue. Can’t ask for more: calm and code. Slept at 3.00 AM.

Woke up at 12:00 PM and had brunch. Plugged in my external hard disk. Found bunch of new movies downloaded by my friend. The wolf of wallstreet caught my eyes. Watched and enjoyed it. When movie was about to get over, someone knocked the door. With all the unwillingness, I woke up and opened the door. Apartment security gave a parcel. He lit my face. It was a courier containing books that were ordered three days back. Courier had four books <a href="http://www.nammabooks.com/Buy-S-Ramakrishnan-Books/%E0%AE%86%E0%AE%AF%E0%AE%BF%E0%AE%B0%E0%AE%A4%E0%AF%8D%E0%AE%A4%E0%AF%8A%E0%AE%B0%E0%AF%81-%E0%AE%85%E0%AE%B0%E0%AF%87%E0%AE%AA%E0%AE%BF%E0%AE%AF-%E0%AE%87%E0%AE%B0%E0%AE%B5%E0%AF%81%E0%AE%95%E0%AE%B3%E0%AF%8D-%20Ayirathoru-Arabia-Iravugal-buy-tamil-books-online-international-shipping" target="_blank">ஆயிரத்தொரு அரேபிய இரவுகள்</a> - Ayirathoru Arabia Iravugal, <a href="http://www.nammabooks.com/%E0%AE%95%E0%AE%B5%E0%AE%BF%E0%AE%B0%E0%AE%BE%E0%AE%9C%E0%AE%A9%E0%AF%8D-%E0%AE%95%E0%AE%A4%E0%AF%88-Kavirajan-Kadhai-Surya-Literature-Private-Limited-Tamil-Book-Buy-Shop-International-Shipping?filter_name=vairamuthu&amp;page=2" target="_blank">கவிராஜன் கதை</a> - Kavirajan Kadhai, <a href="http://www.nammabooks.com/%E0%AE%A4%E0%AE%BF%E0%AE%B0%E0%AF%81%E0%AE%A4%E0%AF%8D%E0%AE%A4%E0%AE%BF-%E0%AE%8E%E0%AE%B4%E0%AF%81%E0%AE%A4%E0%AE%BF%E0%AE%AF-%E0%AE%A4%E0%AF%80%E0%AE%B0%E0%AF%8D%E0%AE%AA%E0%AF%8D%E0%AE%AA%E0%AF%81%E0%AE%95%E0%AE%B3%E0%AF%8D-Thiruthi-Ezhudhiya-Theerpukal-Surya-Literature-Private-Limited-Tamil-Book-Buy-Shop-International-Shipping?filter_name=vairamuthu&amp;page=3" target="_blank">திருத்தி எழுதிய தீர்ப்புகள்</a> - Thiruthi Ezhudhiya Theerpukal, <a href="http://www.nammabooks.com/Naney-Enakkoru-Bodhimaram-Visa-Publications-Buy-Shop-Tamil-Books-International-Shipping-Balakumaran?filter_name=Naney%20Enakkoru%20Modhiram" target="_blank">நானே எனக்கொரு போதிமரம்</a> - Naney Enakkoru Bodhiram. Once the movie was over, as usual went to <a href="https://www.facebook.com/pages/Kaikondrahalli-Lake/153192558180201" target="_blank">Kaikondrahalli lake</a>.

After spending a few hours outside, I came back home and started flipping the pages of Ayirathoru Arabia Iravugal. I couldn’t resist much; it was a beautifully written book (85 pages). <a href="https://ta.wikipedia.org/wiki/%E0%AE%8E%E0%AE%B8%E0%AF%8D._%E0%AE%B0%E0%AE%BE%E0%AE%AE%E0%AE%95%E0%AE%BF%E0%AE%B0%E0%AF%81%E0%AE%B7%E0%AF%8D%E0%AE%A3%E0%AE%A9%E0%AF%8D" target="_blank">S Ramakrishnan’s</a> writing and <a href="https://www.youtube.com/watch?v=daNbwdNFkjc" target="_blank">speech</a> always amaze me with their simplicity. Watched half of Captain Phillips and went to sleep.

The next day I started reading Ayirathoru Arabia Iravugal, completed the book in half an hour, and started watching the movie. The movie got over around 12.30. The next book was Kavirajan Kadhai. I read a few pages, went for lunch, came back in 30 minutes, and continued, completing the book in 3 hours. Needless to say, Vairamuthu made me shed tears in many places. I went for an evening walk around the lake and spent an hour there. Then I started Thiruthi Ezhudhiya Theerpukal, also by <a href="https://en.wikipedia.org/wiki/Vairamuthu" target="_blank">Vairamuthu</a>. Enjoyed it a lot; it was written in free verse in 1979. Completed the book in a few hours. I was overjoyed reading back-to-back books.

What can I say now: books, movies, and music (playing while I write this post) gave me absolute pleasure. Something was missing, though. Yes, code ;-(. I missed it. I am figuring out how to get the best of weekends and free time with books, code, movies, writing, and meetups. It seems books and movies on Saturday, and code and writing on Sunday.

    Books cooked happiness.
    Movies made the hour.
    Weekend went without deadend.
    Finally moon is missing.

- [Find n largest and smallest number in an iterable](https://kracekumar.com/post/78863855937/find-n-largest-and-smallest-number-in-an-iterable/index.md): Find n largest and smallest number in an iterable

+++
date = "2014-03-07 18:24:00+00:00"
draft = false
tags = ["python", "heapq"]
title = "Find n largest and smallest number in an iterable"
url = "/post/78863855937/find-n-largest-and-smallest-number-in-an-iterable"
+++
Python has the `` sorted `` function, which sorts an iterable in ascending or descending order.

    # Sort descending
    In [95]: sorted([1, 2, 3, 4], reverse=True)
    Out[95]: [4, 3, 2, 1]
    
    # Sort ascending
    In [96]: sorted([1, 2, 3, 4], reverse=False)
    Out[96]: [1, 2, 3, 4]

`` sorted(iterable, reverse=True)[:n] `` will yield the `` n `` largest numbers, but it sorts the entire iterable first. There is an alternate way.

Python has <a href="http://docs.python.org/2/library/heapq.html" target="_blank">heapq</a>, which implements the heap data structure. `` heapq `` has the functions `` nlargest `` and `` nsmallest ``, which take `` n `` (the number of elements), an `` iterable `` (list, dict, tuple, generator), and an optional `` key `` argument.

    In [85]: heapq.nlargest(10, [1, 2, 3, 4,])
    Out[85]: [4, 3, 2, 1]
    
    In [88]: heapq.nlargest(10, xrange(1000))
    Out[88]: [999, 998, 997, 996, 995, 994, 993, 992, 991, 990]
    
    In [89]: heapq.nlargest(10, [1000]*10)
    Out[89]: [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]
    
    In [99]: heapq.nsmallest(3, [-10, -10.0, 20.34, 0.34, 1])
    Out[99]: [-10, -10.0, 0.34]

Let’s say `` marks `` is a list of dictionaries containing students’ marks. Now with `` heapq `` it is possible to find the highest and lowest mark in a subject.

    In [113]: marks = [{'name': "Ram", 'chemistry': 23},{'name': 'Kumar', 'chemistry': 50}, {'name': 'Franklin', 'chemistry': 89}]
    
    In [114]: heapq.nlargest(1, marks, key=lambda mark: mark['chemistry'])
    Out[114]: [{'chemistry': 89, 'name': 'Franklin'}]
    
    In [115]: heapq.nsmallest(1, marks, key=lambda mark: mark['chemistry'])
    Out[115]: [{'chemistry': 23, 'name': 'Ram'}]

`` heapq `` can also be used for building a priority queue, as sketched below.
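A minimal priority-queue sketch (my illustration, following the usual `` heapq `` pattern): push `` (priority, item) `` tuples and pop them back in priority order.

    import heapq

    queue = []
    # Lower number means higher priority; tuples compare element-wise.
    heapq.heappush(queue, (2, 'write tests'))
    heapq.heappush(queue, (1, 'fix prod bug'))
    heapq.heappush(queue, (3, 'refactor'))

    while queue:
        priority, task = heapq.heappop(queue)
        print(priority, task)
    # (1, 'fix prod bug'), then (2, 'write tests'), then (3, 'refactor')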

Note: <a href="http://ipython.org" target="_blank">IPython</a> is used in examples where `` In [114]: `` means `` Input line number 114 `` and `` Out[114] `` means `` Output line number 114 ``.

- [Counting elements with dictionary](https://kracekumar.com/post/77925927684/counting-elements-with-dictionary/index.md): Counting elements with dictionary

+++
date = "2014-02-26 19:07:00+00:00"
draft = false
tags = ["python", "dictionary"]
title = "Counting elements with dictionary"
url = "/post/77925927684/counting-elements-with-dictionary"
+++
Let’s say you want to find how many times each element is present in the list or tuple.

#### Normal approach

    words = ['a', 'the', 'an', 'a', 'an', 'the']
    d = {}
    for word in words:
        if word in d:
            d[word] += 1
        else:
            d[word] = 1
    print d
    {'a': 2, 'the': 2, 'an': 2} 

#### Better approach

    words = ['a', 'the', 'an', 'a', 'an', 'the']
    d = {}
    for word in words:
        d[word] = d.get(word, 0) + 1
    
    print d
    {'a': 2, 'the': 2, 'an': 2}

Both approaches return the same values. The first one has 6 lines of logic and the second has 3 (less code, less to manage).

The second approach uses the `` d.get `` method. `` d.get(word, 0) `` returns the count of the word if the key is present, else `` 0 ``. If `` 0 `` isn’t passed, `` get `` will return `` None ``.

#### Pythonic approach:

    import collections
    
    words = ['a', 'b', 'a']
    
    res = collections.Counter(words)
    
    print res
    Counter({'a': 2, 'b': 1})

The last approach is just one line and Pythonic.
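There is also a middle ground worth knowing (my addition, not in the original post): `` collections.defaultdict ``, which supplies the missing `` 0 `` automatically.

    import collections

    words = ['a', 'the', 'an', 'a', 'an', 'the']

    d = collections.defaultdict(int)  # int() == 0 for missing keys
    for word in words:
        d[word] += 1

    print dict(d)
    # {'a': 2, 'the': 2, 'an': 2}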

Snippet is extracted from <a href="https://www.youtube.com/watch?v=OSGv2VnC0go" target="_blank">Transforming Code into Beautiful, Idiomatic Python</a>. Do watch and enjoy.

- [Two scoops of django](https://kracekumar.com/post/76145181921/two-scoops-of-django/index.md): Two scoops of django
+++
date = "2014-02-09 20:44:31+00:00"
draft = false
tags = ["book-review", "django"]
title = "Two scoops of django"
url = "/post/76145181921/two-scoops-of-django"
+++
<a href="http://twoscoopspress.org/products/two-scoops-of-django-1-5" target="_blank">Two Scoops of Django 1.5</a> is a book by <a href="https://twitter.com/pydanny" target="_blank">Pydanny</a> and <a href="https://twitter.com/audreyr" target="_blank">Audrey Roy</a> focusing on writing clean and better Django applications.

If you are using Django in production, this is a must-read book.

Q: I have been using django since 0.8; do I need this book?

A: Yes, consider the book as starting point to validate your assumption.

Q: I just started using django, should I read this ?

A: Yes. I started using django in production last month. Sometimes I felt I should finish this book before pushing any more code. Every two or three chapters, I could clearly find mistakes and fix them.

### Enjoyed pieces:

*   100% support for breaking dependencies and settings into multiple files. I always tell my friends: never have a single `` settings.py ``. Having a private repo isn’t a solution for storing secret information.
*   Never use `` Meta.exclude `` in Model forms. I did this mistake when I started.
*   Covering security pointers.
*   Advocacy for Class Based Views. Class based views make the code cleaner by breaking it into relevant methods. It is easy to end up with a fat function stretching 10 to 15 lines while accepting `` GET `` and `` POST ``.
*   Advising not to store `` sessions, files, logs `` in the DB. Django sessions will cause pain while migrating databases (MySQL -> Postgres). It is better to store sessions in `` redis `` or `` riak ``.

### Conclusion:

If you are using django, read this book and see how many changes you make to your code. Worth every penny spent!

- [Updating model instance attribute in django](https://kracekumar.com/post/75983294047/updating-model-instance-attribute-in-django/index.md): Updating model instance attribute in django

+++
date = "2014-02-08 09:01:00+00:00"
draft = false
tags = ["python", "django"]
title = "Updating model instance attribute in django"
url = "/post/75983294047/updating-model-instance-attribute-in-django"
+++
It is very common to update a single attribute of a model instance (say, updating `` first name `` in a user profile) and save it to the db.

    In [18]: u = User.objects.get(id=1)
    
    In [19]: u.first_name = u"kracekumar"
    
    In [20]: u.save()

A very straightforward approach. What SQL query does Django send to the database?

    In [22]: from django.db import connection
    
    In [22]: connection.queries
    Out[22]: 
    [... 
    {u'sql': u'UPDATE "auth_user" SET "password" = \'pbkdf2_sha256$12000$vsHWOlo1ZhZg$DrC46wq+a2jEtEzxmUEw4vQw8oV/rxEK7zVi30QLGF4=\', "last_login" = \'2014-02-01 06:55:44.741284+00:00\', "is_superuser" = true, "username" = \'kracekumar\', "first_name" = \'kracekumar\', "last_name" = \'\', "email" = \'me@kracekumar.com\', "is_staff" = true, "is_active" = true, "date_joined" = \'2014-01-30 18:41:18.174353+00:00\' WHERE "auth_user"."id" = 1 ', u'time': u'0.001'}]

Not happy. Honestly, it should be `` UPDATE auth_user SET first_name = 'kracekumar' WHERE id = 1 ``. Django should ideally update only the modified fields.

The right way to do it is

    In [23]: User.objects.filter(id=u.id).update(first_name="kracekumar")
    Out[23]: 1
    
    In [24]: connection.queries
    Out[24]:
    [...
    {u'sql': u'UPDATE "auth_user" SET "first_name" = \'kracekumar\' WHERE "auth_user"."id" = 1 ', u'time': u'0.001'}]

Yay! Though both queries took the same amount of time here, the latter is better.

### Edit: There is an even cleaner way to do it.

    In [60]: u.save(update_fields=['first_name'])
    
    In [61]: connection.queries
    Out[61]: 
    [...
    {u'sql': u'UPDATE "auth_user" SET "first_name" = \'kracekumar\'  WHERE "auth_user"."id" = 1 ',
    u'time': u'0.001'}]
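One caveat worth noting (my addition): `` QuerySet.update() `` writes straight to the database, so an in-memory instance you already hold keeps its stale attribute values. A sketch:

    u = User.objects.get(id=1)
    User.objects.filter(id=u.id).update(first_name="krace")
    print u.first_name          # still the old value; only the DB row changed
    u = User.objects.get(id=1)  # re-fetch to see the change
    print u.first_name          # now "krace"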

- [how not to insult developer while hiring](https://kracekumar.com/post/75177239638/how-not-to-insult-developer-while-hiring/index.md): how not to insult developer while hiring

+++
date = "2014-01-31 20:28:18+00:00"
draft = false
tags = ["startup", "hiring"]
title = "how not to insult developer while hiring"
url = "/post/75177239638/how-not-to-insult-developer-while-hiring"
+++
In December I was looking for a new job. I came across one and applied. After a couple of rounds of interviews, the co-founder told me "we will get back to you", but it never happened. This sounds normal, but it is not.

### Interview

The first round of interviews started with online pair programming with one of the co-founders, X. After that I went to their office and had an hour-long discussion with the product manager and co-founder X. We decided to meet again. A couple of days later I solved two more problems, then pair programmed with the other co-founder, Y. Then I discussed company style, roles, and expectations with co-founder X, and before leaving I was interviewed by another team member over a cup of coffee for half an hour. Co-founder X replied, "I will get back to you tonight." Three days passed and nothing happened. I dropped a thank-you email; still no response.

### Conclusion:

I attended 5 rounds of interviews (everything technical) over a week, and still haven’t received any response (it’s been more than a month now). I know that as a co-founder you are interviewing hundreds of developers, managing technical issues, replying to dozens of emails, and racing deadlines to ship, but it is also important to drop a line saying `` Krace, we weren't happy with your performance, Blah Blah ``. Given the significant amount of time I spent interviewing, I expect that minimum courtesy in return. You have also given me a reason never to apply to you again. If you are a founder/co-founder/recruiter reading this, don’t make this kind of mistake. Drop an email after an interview with your decision.

- [On leaving HasGeek](https://kracekumar.com/post/71708603301/on-leaving-hasgeek/index.md): On leaving HasGeek

+++
date = "2013-12-31 03:41:00+00:00"
draft = false
tags = ["HasGeek", "Recruiterbox"]
title = "On leaving HasGeek"
url = "/post/71708603301/on-leaving-hasgeek"
+++
Today (31-12-2013) is my last working day at HasGeek. I <a href="http://kracekumar.com/post/26494437210/how-i-got-into-hasgeek-crew" target="_blank">joined</a> <a href="http://hasgeek.com" target="_blank">HasGeek</a> in July 2012. It has been a fabulous journey over the past 18 months: meeting lots of new people, being part of events, writing lots of code. I was part of large, medium, and small conferences like <a href="https://fifthelephant.in/2013/" target="_blank">Fifth elephant</a> (2012, 2013), <a href="https://cartonama.com/2012/" target="_blank">Cartonama</a> (2012), <a href="https://jsfoo.in/2013/" target="_blank">JSFoo</a> (2012, 2013), <a href="http://droidcon.in" target="_blank">Droidcon</a> (2012, 2013), <a href="https://metarefresh.in/2014/" target="_blank">Metarefresh</a> (2013), and various <a href="https://hacknight.in" target="_blank">hacknights</a> and <a href="https://geekup.in" target="_blank">geekups</a>.

I will be joining Aplopio Technology Inc, makers of the flagship product <a href="http://recruiterbox.com/" target="_blank">recruiterbox</a>, on 16 January, 2014.

- [introduction to python](https://kracekumar.com/post/71171551647/introduction-to-python/index.md): introduction to python
+++
date = "2013-12-26 04:38:41+00:00"
draft = false
tags = ["python", "introduction"]
title = "introduction to python"
url = "/post/71171551647/introduction-to-python"
+++
This is the material which I use for teaching python to beginners.

__tl;dr__: Very minimal explanation, more code.

__Python?__

*   Interpreted language
*   Multiparadigm

__Introduction__

    hasgeek@hasgeek-MacBook:~/codes/python/hacknight$ python
    Python 2.7.3 (default, Aug  1 2012, 05:14:39)
    [GCC 4.6.3] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>


    >>> print "Let's learn Python"
    Let's learn Python

__Numbers__

    >>> 23 + 43
    66
    >>> 23 - 45
    -22
    >>> 23 * 45
    1035
    >>> 23 ** 4
    279841
    >>> 23 / 4
    5
    >>> 23 / 4.0
    5.75
    >>> 7 % 2
    1

__Expressions__

    >>> 3 < 2
    False
    >>> 3 > 2
    True
    >>> 3 > 2 < 1
    False
    >>> (3 > 2) and (2 < 1)
    False
    >>> 3 > 2 > 1 > 0
    True
    >>> (3 > 2) and (2 > 1) and (1 > 0)
    True
    >>> 1 or 2
    1
    >>> 2 or 1
    2
    >>> 1 + 2 + 3 * 4 + 5
    20
    1 + 2 + 3 * 4 + 5
    ↓
    3   + 3 * 4 + 5
            ↓
    3   +   12  + 5
          ↓
        15      + 5
              ↓
             20

    >>> "python" > "perl"
    True
    >>> "python" > "java"
    True

__Variables__

    >>> a = 23
    >>> print a
    23
    >>> a = "Python"
    >>> print a
    Python

__Guess the output__

    True = False
    False = True
    print True, False
    print 2 > 3

__Parallel Assignment__

    >>> language, version = "Python", 2.7
    >>> print language, version
    Python 2.7
    >>> x = 23
    >>> x = 23
    >>> y = 20
    >>> x, y = x, x + y
    >>> print x, y
    23 43

__Guess the output__

    z, y = 23, z + 23
    a, b = 23, 12, 20
    a = 1, 2

__Swap Variable__

    >>> x = 12
    >>> y = 21
    >>> x, y = y, x
    >>> print x, y
    21 12
    >>>

__String__

    >>> language = "Python"
    >>> print language
    Python
    >>> language = 'Python'
    >>> print language
    Python
    >>> language = """Python"""
    >>> print language
    Python
    >>> description = """Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability.
    ... It is an expressive language which provides language constructs intended to enable clear programs on both a small and large scale.
    ... Python supports multiple programming paradigms, including object-oriented, imperative and functional programming styles.
    ... """
    >>> print description
    Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability.
    It is an expressive language which provides language constructs intended to enable clear programs on both a small and large scale.
    Python supports multiple programming paradigms, including object-oriented, imperative and functional programming styles.
    >>>

__Guess output__

    name = "krace" + "kumar"
    print name
    print name[0]
    name[0] = "K"

__Guess output__

    print 1 + 2.5
    print "kracekumar" + 23

__Condition__

Write a program to find greatest of two numbers.

    >>> a = 12
    >>> b = 23
    >>> if a > b:
    ...     print "a is greater than b"
    ... else:
    ...     print "b is greater than a"
    ...
    b is greater than a
    >>> if a > 0:
    ...     print "a is positive"
    ... elif a == 0:
    ...     print "a is zero"
    ... elif a < 0:
    ...     print "a is negative"
    ...
    a is positive

__Data Structure__

_List_

A list is a collection of heterogeneous data types like integer, float, and string.

    >>> a = [1, 2, 3]
    >>> b = ["Python", 2.73, 3]
    >>> len(a)
    3
    >>> len(b)
    3
    >>> a[0]
    1
    >>> a[-1]
    3
    >>> b[2]
    3
    >>> [1, 2] + [3, 4]
    [1, 2, 3, 4]
    >>> all = [a, b]
    >>> all[0]
    [1, 2, 3]
    >>> all[-1]
    ['Python', 2.73, 3]
    >>> all[3]
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    IndexError: list index out of range
    >>> all.append("Bangalore")
    >>> all
    [[1, 2, 3], ['Python', 2.73, 3], 'Bangalore']
    >>> del all[-1]
    >>> all
    [[1, 2, 3], ['Python', 2.73, 3]]
    >>> all[1] = "insert"
    >>> all
    [[1, 2, 3], 'insert']
    >>> all
    [[1, 2, 3], 'insert']
    >>> 'insert' in all
    True
    >>> range(10)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> range(10, 2)
    []
    >>> range(10, 0, -1)
    [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
    >>> range(0, 12, 1)
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]



    range() -> `range([start,] stop[, step]) -> list of integers`

_Slicing_

    >>> l = [1, 2, 3, 4, 5, 6, 7]
    [1, 2, 3, 4, 5, 6, 7]
    >>> l[:2] #first two elements
    [1, 2]
    >>> l[2:] #exclude first two elements
    [3, 4, 5, 6, 7]
    >>> l[::2] #every second element
    [1, 3, 5, 7]
    >>> l[::1] #every element
    [1, 2, 3, 4, 5, 6, 7]
    >>> l[::3] #every third element
    [1, 4, 7]
    >>> l[::10] #every tenth element
    [1]
    >>> l[::-1]
    [7, 6, 5, 4, 3, 2, 1]

__Guess the output__

    l[1:7:2]
    [][:2]
    [1][:2]

_Accessing list elements_

    >>> for item in all:
    ...     print item
    ...
    [1, 2, 3]
    insert
    >>> for number in range(10):
    ...     print number
    ...
    0
    1
    2
    3
    4
    5
    6
    7
    8
    9

_Find all odd numbers from 0 to 9_

    >>> for number in range(0, 10):
    ...     if number % 2:
    ...         print number
    ...
    1
    3
    5
    7
    9

__inbuilt functions__

    >>> help([])
    >>> min([1, 2, 3])
    1
    >>> max([1, 2, 3])
    3
    >>> sum([1, 2, 3])
    6
    >>> pow(2, 3)
    8

__Write a program which takes a number as input and prints Fizz, Buzz, or FizzBuzz if the number is divisible by 3, 5, or both, respectively__

    import sys


    if __name__ == "__main__":
        if len(sys.argv) == 2:
            number = int(sys.argv[1])
            if number % 15 == 0:
                print "FizzBuzz"
            elif number % 3 == 0:
                print "Fizz"
            elif number % 5 == 0:
                print "Buzz"
            else:
                print number
        else:
            print "python filename.py 23 is the format"

__Tuples__

A tuple is a sequence type just like a list, but it is immutable. A tuple consists of a number of values separated by commas.

    >>> t = (1, 2)
    >>> t
    (1, 2)
    >>> t[0]
    1
    >>> t[0] = 1.1
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object does not support item assignment
    >>> t = 1, 2
    >>> t
    (1, 2)
    >>> del t[0]
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    TypeError: 'tuple' object doesn't support item deletion
    >>> for item in t:
    ...     print item
    ...
    1
    2

__Sets__

Sets are unordered collections of unique elements.

    >>> x = set([1, 2, 1])
    >>> x
    set([1, 2])
    >>> x.add(3)
    >>> x
    set([1, 2, 3])
    >>> x = {1, 3, 4, 1}
    >>> x
    set([1, 3, 4])
    >>> 1 in x
    True
    >>> -1 in x
    False
    >>>

__Again Lists__

    >>> even_numbers = []
    >>> for number in range(0, 9):
    ...     if number % 2 == 0:
    ...         even_numbers.append(number)
    ...
    >>> even_numbers
    [0, 2, 4, 6, 8]

`` As a programmer your job is to write less code ``

__List Comprehensions__

    >>> [x for x in range(10)]
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
    >>> [x + 1 for x in range(10)]
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    >>> numbers = []
    >>> for x in range(10):
    ...     numbers.append(x + 1)
    ...
    >>> print numbers
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    >>> even_numbers = [x for x in range(10) if x %2 == 0]
    >>> even_numbers
    [0, 2, 4, 6, 8]
    >>> [(x, y) for x in range(5) for y in range(5) if (x+y)%2 == 0]
    [(0, 0), (0, 2), (0, 4), (1, 1), (1, 3), (2, 0), (2, 2), (2, 4), (3, 1), (3, 3    ), (4, 0), (4, 2), (4, 4)]
    >>>

__Dictionaries__

    >>> d = {'a': 1, 'b': 2, 'c': 3}
    >>> d['a']
    1
    >>> d.get('a')
    1
    >>> d['z']
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    KeyError: 'z'
    >>> d.get('z')
    >>>
    >>> d['a'] = 2
    >>> d
    {'a': 2, 'c': 3, 'b': 2}
    >>> d['z'] = 26
    >>> d
    {'a': 2, 'c': 3, 'b': 2, 'z': 26}
    >>> d.keys()
    ['a', 'c', 'b', 'z']
    >>> d.values()
    [2, 3, 2, 26]
    >>> d.items()
    [('a', 2), ('c', 3), ('b', 2), ('z', 26)]
    >>> type(d.items())
    <type 'list'>
    >>> d = {'a': 2, 'b': 2, 'c': 3, 'z': 26}
    >>> for key in d:
    ...     print key
    ...
    a
    c
    b
    z
    >>> for key, value in d.items():
    ...     print key, value
    ...
    a 2
    c 3
    b 2
    z 26
    >>> 'a' in d
    True
    >>> d.has_key('a')
    True

__Function__

Just like a value can be associated with a name, a piece of logic can also be associated with a name by defining a function.

    >>> def square(x):
    ...     return x * x
    ...
    >>> square(2)
    4
    >>> square(2+1)
    9
    >>> square(x=5)
    25
    >>> def dont_return(name):
    ...     print "Master %s ordered not to return value" % name
    ...
    >>> dont_return("Python")
    Master Python ordered not to return value
    >>> def power(base, to_raise=2):
    ...     return base ** to_raise
    ...
    >>> power(3)
    9
    >>> power(3, 3)
    27
    >>> def power(to_raise=2, base):
    ...     return base ** to_raise
    ...
    File "<stdin>", line 1
    SyntaxError: non-default argument follows default argument
    >>> square(3) + square(4)
    25
    >>> power(base=square(2))
    16
    >>> def sum_of_square(x, y):
    ...     return square(x) + square(y)
    ...
    >>> sum_of_square(2, 3)
    13
    >>> s = square
    >>> s(4)
    16
    >>> def fxy(f, x, y):
    ...     return f(x) + f(y)
    ...
    >>> fxy(square, 3, 4)
    25

__Methods__ - Methods are a special kind of function that works on an object.

    >>> lang = "Python"
    >>> type(lang)
    <type 'str'>
    >>> dir(lang)
    ['__add__', '__class__', '__contains__', '__delattr__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
    '__getitem__', '__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__', '__le__', '__len__', '__lt__',
    '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__',
    '__sizeof__', '__str__', '__subclasshook__', '_formatter_field_name_split', '_formatter_parser',
    'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'index', 'isalnum',
    'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition',
    'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',     'split', 'splitlines',
    'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
    >>> lang.upper()
    'PYTHON'
    >>> help(lang.upper)
    >>> lang.startswith('P')
    True
    >>> help(lang.startswith)

    >>> lang.startswith('y', 1)
    True

__Files__

    >>> f = open('foo.txt', 'w')
    >>> help(f)

    >>> f.write("First line")
    >>> f.close()
    >>> f = open('foo.txt', 'r')
    >>> f.readline()
    'First line'
    >>> f.readline()
    ''
    >>> f = open('foo.txt', 'a')
    >>> f.write('Second line')
    >>> f.close()
    >>> f = open('foo.txt', 'r')
    >>> f.readline()
    'First lineSecond line'
    >>> f = open('foo.txt', 'a')
    >>> f.write("New line\n")
    >>> f.write("One more new line")
    >>> f.close()
    >>> f = open('foo.txt', 'r')
    >>> f.readline()
    'First lineSecond lineNew line\n'
    >>> f.readline()
    'One more new line'
    >>> f.readline()
    ''
    >>> f.close()
    >>> f = open('foo.txt')
    >>> f.readlines()
    ['First lineSecond lineNew line\n', 'One more new line']
    >>> f = open('foo.txt', 'w')
    >>> f.writelines(["1\n", "2\n"])
    >>> f.close()
    >>> f = open('foo.txt')
    >>> f.readlines()
    ['1\n', '2\n']
    >>> f.close()

__Exception Handling__

    >>> f = open('a.txt')
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    IOError: [Errno 2] No such file or directory: 'a.txt'
    >>> try:
    ...     f = open('a.txt')
    ... except:
    ...     print "Exception occured"
    ...
    Exception occured
    >>> try:
    ...     f = open('a.txt')
    ... except IOError, e:
    ...     print e.message
    ...

    >>> e
    IOError(2, 'No such file or directory')
    >>> dir(e)
    ['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__getitem__', '__getslice__', '__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '__unicode__', 'args', 'errno',     'filename', 'message', 'strerror']
    >>> e.strerror
    'No such file or directory'
    >>> try:
    ...     print l[4]
    ... except IndexError, e:
    ...     print e
    ...
    list index out of range
    >>> raise Exception("error message")
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    Exception: error message
    >>> try:
    ...     print "a"
    ...     raise Exception("doom")
    ... except:
    ...     print "b"
    ... else:
    ...     print "c"
    ... finally:
    ...     print "d"
    ...
    a
    b
    d

__Object Oriented Programming__

    >>> class BankAccount:
            def __init__(self):
                self.balance = 0

            def withdraw(self, amount):
                self.balance -= amount
                return self.balance

            def deposit(self, amount):
                self.balance += amount
                return self.balance

    >>> a = BankAccount()
    >>> b = BankAccount()
    >>> a.deposit(200)
    200
    >>> b.deposit(500)
    500
    >>> a.withdraw(20)
    180
    >>> b.withdraw(1000)
    -500
    >>> class MinimumBalanceAccount(BankAccount):
    ...    def __init__(self, minimum_balance):
    ...        BankAccount.__init__(self)
    ...        self.minimum_balance = minimum_balance
    ...
    ...    def withdraw(self, amount):
    ...        if self.balance - amount < self.minimum_balance:
    ...            print "Sorry, you need to maintain minimum balance"
    ...        else:
    ...            return BankAccount.withdraw(self, amount)
    >>> a = MinimumBalanceAccount(500)
    >>> a
    <__main__.MinimumBalanceAccount instance at 0x7fa0bf329878>
    >>> a.deposit(2000)
    2000
    >>> a.withdraw(1000)
    1000
    >>> a.withdraw(1000)
    Sorry, you need to maintain minimum balance
    >>> class A:
    ...     def f(self):
    ...         return self.g()
    ...     def g(self):
    ...         return "A"
    ...
    >>> a = A()
    >>> a.f()
    'A'
    >>> a.g()
    'A'
    >>> class A:
    ...     def __init__(self):
    ...         self._protected = 1
    ...         self.__private = 2
    ...
    >>> a = A()
    >>> a._protected
    1
    >>> a.__private
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    AttributeError: A instance has no attribute '__private'

__Sample Python Program__

    #! /usr/bin/env python
    # -*- coding: utf-8 -*-


    class BankAccount:
        def __init__(self):
            self.balance = 0

        def withdraw(self, amount):
            self.balance -= amount
            return self.balance

        def deposit(self, amount):
            self.balance += amount
            return self.balance


    class MinimumBalanceAccount(BankAccount):
        def __init__(self, minimum_balance):
            BankAccount.__init__(self)
            self.minimum_balance = minimum_balance

        def withdraw(self, amount):
            if self.balance - amount < self.minimum_balance:
                print "Sorry, you need to maintain minimum balance"
            else:
                return BankAccount.withdraw(self, amount)

        def __repr__(self):
            return "MinimuBalanceAccount, Balance: %d" %(self.balance)


    if __name__ == "__main__":
        a = MinimumBalanceAccount(500)
        print a.deposit(5000)
        print a.withdraw(4500)
        print a.withdraw(500)

Few examples are taken from <a href="http://anandology.com/python-practice-book/" target="_blank">python practice book</a>.

Github repo: <a href="https://github.com/kracekumar/python-training" target="_blank">https://github.com/kracekumar/python-training</a>

- [Deploying full fledged flask app in production](https://kracekumar.com/post/71120049966/deploying-full-fledged-flask-app-in-production/index.md): Deploying full fledged flask app in production
+++
date = "2013-12-25 18:08:34+00:00"
draft = false
tags = ["python", "flask", "nginx", "uwsgi", "deployment"]
title = "Deploying full fledged flask app in production"
url = "/post/71120049966/deploying-full-fledged-flask-app-in-production"
+++
This article will focus on deploying a flask app from scratch: creating a separate linux user, installing the database and the web server. The web server will be <a href="http://wiki.nginx.org/Main" target="_blank">nginx</a>, the database will be <a href="http://www.postgresql.org/" target="_blank">postgres</a>, the Python 2.7 middleware will be <a href="http://uwsgi-docs.readthedocs.org" target="_blank">uwsgi</a>, and the server will be <a href="http://www.ubuntu.com/download/server" target="_blank">ubuntu 13.10 x64</a>. The Flask app name is `` fido ``. The demo is carried out on <a href="https://www.digitalocean.com/?refcode=6c1c3a08e4ab" target="_blank">Digital ocean</a>.

### Step 1 - Installation


__Python header__

    root@fido:~# apt-get install -y build-essential python-dev

__Install uwsgi dependencies__

    root@fido:~# apt-get install -y libxml2-dev libxslt1-dev

__Nginx, uwsgi__

    root@fido:~#  apt-get install -y nginx uwsgi uwsgi-plugin-python

Start nginx

    root@fido:~# service nginx start
    * Starting nginx nginx                                                          [ OK ]

__Postgres__

    root@fido:~# apt-get install -y postgresql postgresql-contrib libpq-dev

### Step 2 - User

__Create a new linux user fido__

    root@fido:~# adduser fido

Enter all the required details.

    root@fido:~# ls /home
    fido

The new user is created successfully.

Grant `` fido `` root privilege.

    root@fido:~# /usr/sbin/visudo
    # User privilege specification

    root    ALL=(ALL:ALL) ALL
    fido    ALL=(ALL:ALL) ALL

Since `` fido `` is not a normal user, delete fido’s home directory.

    root@fido:~# rm -rf /home/fido
    root@fido:~# ls /home
    root@fido:~#

__Create a new db user fido__

    root@fido:~# su - postgres
    postgres@fido:~$ createuser --pwprompt
    Enter name of role to add: fido
    Enter password for new role:
    Enter it again:
    Shall the new role be a superuser? (y/n) y

`` --pwprompt `` will prompt for a password. `` release `` is the password I typed (we need this to connect to the db from the app).

__Create a new database fido__

    postgres@fido:~$ createdb fido;
    postgres@fido:~$ psql -U fido -h localhost
    Password for user fido:
    psql (9.1.10)
    SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
    Type "help" for help.
    fido=# \d
    No relations found.

Done. The new database role `` fido `` and the database are created, and we are able to log in successfully.

### Step 3 - Python dependencies

__Install pip__

    root@fido:# cd /tmp
    root@fido:/tmp# wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py
    root@fido:/tmp# python ez_setup.py install
    root@fido:/tmp# easy_install pip
    # What is easy_install ? Python package manager.
    # what is pip ? Python package manager.
    # How to install pip ? easy_install pip.
    # Shame on python :-(
    ...
    Installed /usr/local/lib/python2.7/dist-packages/pip-1.4.1-py2.7.egg
    Processing dependencies for pip
    Finished processing dependencies for pip

__Install virtualenv__

    root@fido:/tmp# pip install virtualenv

### Step 4 - Install app dependencies

Here is the sample app code. The app is just for demo purposes and will be placed in `` /var/www/fido ``. Normally in production this step would be a `` git clone <url> `` or `` hg clone <url> `` inside the directory. Make sure you aren’t using sudo while cloning.

    root@fido:/tmp# cd /var
    root@fido:/var# mkdir www
    root@fido:/var# mkdir www/fido

Change the owner of the repo to `` fido ``.

    root@fido:/var# chown fido:fido www/fido
    root@fido:/var# ls -la www/
    total 12
    drwxr-xr-x  3 root root 4096 Dec 25 03:18 .
    drwxr-xr-x 14 root root 4096 Dec 25 03:18 ..
    drwxr-xr-x  2 fido fido 4096 Dec 25 03:18 fido

`` app.py `` - fido application.

    #!/usr/bin/env python

    from flask import Flask, request
    from flask.ext.sqlalchemy import SQLAlchemy

    app = Flask(__name__)
    app.config['SQLALCHEMY_DATABASE_URI'] = "postgres://fido:release@localhost:5432/fido"
    db = SQLAlchemy(app)


    class Todo(db.Model):
        id = db.Column(db.Integer(), nullable=False, primary_key=True)
        name = db.Column(db.UnicodeText(), nullable=False)
        status = db.Column(db.Boolean(), default=False, nullable=True)


    @app.route("/")
    def index():
        return "Index page. Use /new create a new todo"


    @app.route('/new', methods=['POST'])
    def new():
        form = request.form
        name, status = form.get('name'), form.get('status') or False
        todo = Todo(name=name, status=status)
        db.session.add(todo)
        db.session.commit()
        return "Created todo: {}".format(name)


    if __name__ == "__main__":
        db.create_all()
        app.run('0.0.0.0', port=3333, debug=True)

Add a `` wsgi `` file `` website.py ``

    root@fido:/var/www/fido# cat website.py
    import sys
    import os.path
    sys.path.insert(0, os.path.dirname(__file__))
    from app import app as application

Files in `` fido `` directory.

    root@fido:/var/www/fido# tree .
    .
    ├── app.py
    ├── __init__.py
    └── website.py

    0 directories, 3 files

### Step 5 - Virtual env and dependencies

    root@fido:/var/www/fido# virtualenv --no-site-packages env
    root@fido:/var/www/fido# . env/bin/activate
    (env)root@fido:/var/www/fido# pip install flask sqlalchemy flask-sqlalchemy psycopg2

### Step 6 - final setup

__Create uwsgi config file__

    # Add following lines to fido.ini file
    root@fido:/etc# cat uwsgi/apps-enabled/fido.ini
    [uwsgi]
    socket = 127.0.0.1:5000
    threads = 2
    master = true
    uid = fido
    gid = fido
    chdir = /var/www/fido
    home = /var/www/fido/env/
    pp = ..
    module = website

Check whether uwsgi is booting up properly.

    root@fido:/var/www/fido# uwsgi --ini /etc/uwsgi/apps-enabled/fido.ini
    ...
    ...
    Python version: 2.7.5+ (default, Sep 19 2013, 13:52:09)  [GCC 4.8.1]
    Set PythonHome to /var/www/fido/env/
    Python main interpreter initialized at 0xb96a70
    python threads support enabled
    your server socket listen backlog is limited to 100 connections
    your mercy for graceful operations on workers is 60 seconds
    mapped 165920 bytes (162 KB) for 2 cores
    *** Operational MODE: threaded ***
    added ../ to pythonpath.
    WSGI app 0 (mountpoint='') ready in 1 seconds on interpreter 0xb96a70 pid:  17559 (default app)
    *** uWSGI is running in multiple interpreter mode ***
    spawned uWSGI master process (pid: 17559)
    spawned uWSGI worker 1 (pid: 17562, cores: 2)

`` uwsgi `` is able to load the app without any issues. Kill uwsgi with a keyboard interrupt.

Now let's create the table.

    root@fido:/var/www/fido# . env/bin/activate
    (env)root@fido:/var/www/fido# python app.py
    * Running on http://0.0.0.0:3333/
    * Restarting with reloader

Exit the program; `` db.create_all() `` must have created the table. Normally in a production environment it is advised to use `` python manage.py db create `` or a similar approach, as sketched below.
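A minimal sketch of such a `` manage.py `` (my illustration using Flask-Script; the post doesn't prescribe a tool):

    # manage.py
    from flask.ext.script import Manager

    from app import app, db

    manager = Manager(app)


    @manager.command
    def create_db():
        """Create all tables for the defined models."""
        db.create_all()


    if __name__ == "__main__":
        manager.run()

Then `` python manage.py create_db `` creates the tables without booting the development server.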

__Configure nginx__

    root@fido:/var/www/fido# cat /etc/nginx/sites-enabled/fido.in
    upstream flask {
        server 127.0.0.1:5000;
    }

    # configuration of the server
    server {
        # the domain name it will serve for
        listen 127.0.0.1; # This is very important to test the server locally
        server_name fido.in; # substitute your machine's IP address or FQDN
        charset     utf-8;

        location / {
            uwsgi_pass  flask;
            include uwsgi_params;
        }
    }

Now `` nginx `` and `` uwsgi `` should be running in the background. Restart them.

    root@fido:/var/www/fido# service nginx restart
    * Restarting nginx nginx                                                             [ OK ]
    root@fido:/var/www/fido# service uwsgi restart
    * Restarting app server(s) uwsgi                                                    [ OK ]

The machine name is `` fido ``, so let's try `` curl http://fido ``

    root@fido:/var/www/fido# curl http://fido
    Index page. Use /new to create a new todo

Create a new task.

    root@fido:/var/www/fido# curl --data "name=write blog post about flask deployment"  http://fido/new
    Created todo: write blog post about flask deployment

We have successfully deployed `` flask `` + `` uwsgi `` + `` nginx ``.

Since we installed `` uwsgi `` from the Ubuntu repo, it runs as an `` upstart `` process; that is why we issue commands like `` service uwsgi restart ``.

To see all upstart services, try `` service --status-all ``.

If you are running multiple web applications on a single server, create one user per application.

- [ipynb2viewer - Afternoon hack](https://kracekumar.com/post/70778191617/ipynb2viewer-afternoon-hack/index.md): ipynb2viewer - Afternoon hack
+++
date = "2013-12-22 11:50:16+00:00"
draft = false
tags = ["ipython", "python", "nbviewer", "ipynb2viewer"]
title = "ipynb2viewer - Afternoon hack"
url = "/post/70778191617/ipynb2viewer-afternoon-hack"
+++
`` ipython nbconvert `` has a lot of handy options to convert `` ipynb `` to `` markdown ``, `` html ``, etc. But I wanted to upload `` ipynb `` files to `` gist.github.com `` and create a link on `` nbviewer.ipython.org ``. I started with `` curl `` and soon realized it was getting messy, so I wrote a small Python program, `` ipynb2viewer ``, <a href="https://github.com/kracekumar/ipynb2viewer" target="_blank">Source code</a>.

## Install

*   `` pip install ipynb2viewer ``

## Usage

Upload all ipynb files in the given path to gist.github.com and return nbviewer urls.

*   `` ipynb2viewer all <path> ``

Upload the mentioned file to gist.github.com and return the nbviewer url.

*   `` ipynb2viewer file <filename> ``

Upload the mentioned file to gist.github.com and open the nbviewer url in a web browser.

*   `` ipynb2viewer file <filename> --open ``

Upload all ipynb files in the given path to gist.github.com and open the nbviewer urls in a web browser.

*   `` ipynb2viewer all <path> --open ``

## Example

<pre><code>➜  ipynb2viewer git:(master) ipynb2viewer all tests --open

Uploaded test.ipynb file to gist <a href="https://gist.github.com/8081202" target="_blank">https://gist.github.com/8081202</a>
nbviewer url: <a href="http://nbviewer.ipython.org/gist/anonymous/8081202" target="_blank">http://nbviewer.ipython.org/gist/anonymous/8081202</a>
Uploaded try.ipynb file to gist <a href="https://gist.github.com/8081203" target="_blank">https://gist.github.com/8081203</a>
nbviewer url: <a href="http://nbviewer.ipython.org/gist/anonymous/8081203" target="_blank">http://nbviewer.ipython.org/gist/anonymous/8081203</a>
</code></pre>

- [Why you should use IPython](https://kracekumar.com/post/70697151386/why-you-should-use-ipython/index.md): Why you should use IPython

+++
date = "2013-12-21 17:27:06+00:00"
draft = false
tags = ["ipython"]
title = "Why you should use IPython"
url = "/post/70697151386/why-you-should-use-ipython"
+++
At the December BangPypers meetup, I gave a talk on Why you should use IPython. Here is the <a href="http://nbviewer.ipython.org/gist/kracekumar/8059932" target="_blank">nbviewer</a> link.

- [Autogenerate Dockerfile from ubuntu image](https://kracekumar.com/post/70198562577/autogenerate-dockerfile-from-ubuntu-image/index.md): Autogenerate Dockerfile from ubuntu image
+++
date = "2013-12-16 16:38:44+00:00"
draft = false
tags = ["docker", "ubuntu"]
title = "Autogenerate Dockerfile from ubuntu image"
url = "/post/70198562577/autogenerate-dockerfile-from-ubuntu-image"
+++
I was learning docker to use in one of my projects. I kept installing packages and finally created the required Ubuntu image. Then I thought it would be cool if I could generate a `` Dockerfile ``, like `` pip freeze > requirements.txt ``.

    sudo docker run ubuntu dpkg --get-selections | awk '{print $1}' > base.list && sudo docker run pyextra dpkg --get-selections | awk '{print $1}' > pyextra.list && sort base.list pyextra.list | uniq -u > op.list && python -c "f = open('Dockerfile', 'w'); f.write('FROM ubuntu\nRUN apt-get update\n'); f.writelines('RUN apt-get install -y {0}'.format(line) for line in open('op.list')); f.close()"

### Breakdown

Start docker in daemon mode, then execute the following commands.

`` sudo docker run ubuntu dpkg --get-selections | awk '{print $1}' >  base.list ``

This lists all the packages installed in the Ubuntu base image, extracts the package names, and writes them to the `` base.list `` file.

`` sudo docker run pyextra dpkg --get-selections | awk '{print $1}'> pyextra.list ``

`` pyextra `` is the Docker image name. This gets all the installed package names and writes them to `` pyextra.list ``.

`` sort base.list pyextra.list | uniq -u > op.list ``

This finds the packages not in the `` base `` distro and writes them to `` op.list ``. Now the freshly installed package names are available.

    python -c "f = open('Dockerfile', 'w');
    f.write('from ubuntu\n');f.write('RUN apt-get update\n');
    for line in open('op.list').readlines():f.write('run apt-get install -y    {0}'.format(line));

This converts the `` op.list `` file into a `` Dockerfile ``.
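
The same pipeline fits in a small standalone script. Here is a minimal sketch (the file name is mine) that also replaces the `` sort | uniq `` step with a set difference:

    # gen_dockerfile.py - assumes base.list and pyextra.list exist already
    base = set(open('base.list').read().split())
    extra = set(open('pyextra.list').read().split())

    with open('Dockerfile', 'w') as f:
        f.write('FROM ubuntu\n')
        f.write('RUN apt-get update\n')
        # keep only the packages missing from the base image
        for pkg in sorted(extra - base):
            f.write('RUN apt-get install -y {0}\n'.format(pkg))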

### Disadvantages

*   This method lists dependencies as explicit packages. For example, `` apt-get install vim `` depends on `` vim-common ``, so `` vim-common `` also ends up specified as a package.

Is it worth creating a library that does this?

- [easy_install broken in os x mavericks  ](https://kracekumar.com/post/67761708966/easyinstall-broken-in-os-x-mavericks/index.md): easy_install broken in os x mavericks  
+++
date = "2013-11-22 17:02:21+00:00"
draft = false
tags = ["python", "easy_install", "setuptools"]
title = "easy_install broken in os x mavericks  "
url = "/post/67761708966/easyinstall-broken-in-os-x-mavericks"
+++
I hardly use `` easy_install ``; nowadays all my Python requirements are installed via `` pip ``.

IPython is my primary Python console. After installing Mavericks, I installed IPython and fired up the console. The following warning message appeared:

    ➜  ~  ipython
    /Library/Python/2.7/site-packages/IPython/utils/rlineimpl.py:94: RuntimeWarning:
    ******************************************************************************
    libedit detected - readline will not be well behaved, including but not limited to:
       * crashes on tab completion
       * incorrect history navigation
       * corrupting long-lines
       * failure to wrap or indent lines properly
    It is highly recommended that you install readline, which is easy_installable:
         easy_install readline
    Note that `pip install readline` generally DOES NOT WORK, because
    it installs to site-packages, which come *after* lib-dynload in sys.path,
    where readline is located.  It must be `easy_install readline`, or to a custom
    location on your PYTHONPATH (even --user comes after lib-dyload).
    ******************************************************************************
      RuntimeWarning)
    Python 2.7.5 (default, Aug 25 2013, 00:04:04)
    Type "copyright", "credits" or "license" for more information.

IPython complains that `` readline `` is missing and insists on using `` easy_install ``. Then I tried:

    ➜   ~  sudo easy_install-2.7 --upgrade setuptools
    Traceback (most recent call last):
      File "/usr/bin/easy_install-2.7", line 7, in <module>
        from pkg_resources import load_entry_point
      File "/Library/Python/2.7/site-packages/pkg_resources.py", line 2797, in <module>
        parse_requirements(__requires__), Environment()
      File "/Library/Python/2.7/site-packages/pkg_resources.py", line 576, in resolve
        raise DistributionNotFound(req)
    pkg_resources.DistributionNotFound: setuptools==0.6c12dev-r88846

I know I can’t use pip, but `` distribute `` is already installed.

Now the only way to install readline is to use the `` easy_install `` from Python's site-packages. Normally `` easy_install `` is installed in `` /usr/bin ``.

<pre><code>➜   ~  which easy_install
/usr/bin/easy_install
➜  ~  sudo python /Library/Python/2.7/site-packages/easy_install.py readline
Searching for readline
Reading <a href="https://pypi.python.org/simple/readline/" target="_blank">https://pypi.python.org/simple/readline/</a>
Best match: readline 6.2.4.1
Downloading <a href="https://pypi.python.org/packages/2.7/r/readline/readline-6.2.4.1-py2.7-macosx-10.7-intel.egg#md5=6ede61046a61219a6d97c44a75853c23" target="_blank">https://pypi.python.org/packages/2.7/r/readline/readline-6.2.4.1-py2.7-macosx-10.7-intel.egg#md5=6ede61046a61219a6d97c44a75853c23</a>
Processing readline-6.2.4.1-py2.7-macosx-10.7-intel.egg
creating /Library/Python/2.7/site-packages/readline-6.2.4.1-py2.7-macosx-10.7-intel.egg
Extracting readline-6.2.4.1-py2.7-macosx-10.7-intel.egg to /Library/Python/2.7/site-packages
Adding readline 6.2.4.1 to easy-install.pth file

Installed /Library/Python/2.7/site-packages/readline-6.2.4.1-py2.7-macosx-10.7-intel.egg
Processing dependencies for readline
Finished processing dependencies for readline

➜  ~  ipython
Python 2.7.5 (default, Aug 25 2013, 00:04:04)
Type "copyright", "credits" or "license" for more information.

IPython 1.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
</code></pre>

Whenever an issue pops up with `` easy_install `` and `` setuptools ``, remember to use the `` easy_install `` from `` site-packages ``, like `` sudo python /Library/Python/2.7/site-packages/easy_install.py ``.

- [Why programmers should love to read and write ](https://kracekumar.com/post/67127612627/why-programmers-should-love-to-read-and-write/index.md): Why programmers should love to read and write 
+++
date = "2013-11-16 05:50:33+00:00"
draft = false
tags = ["programming", "writing"]
title = "Why programmers should love to read and write "
url = "/post/67127612627/why-programmers-should-love-to-read-and-write"
+++
Every day as programmers we solve problems and introduce new ones. Most of our time is spent reading other people's source code and library documentation, and replying to developers' email. Communication is what programmers do all the time, with computers and with humans.

Programmers around the globe suggest books like <a href="https://mitpress.mit.edu/sicp/" target="_blank">SICP</a>. I have never come across people who suggest books like <a href="http://www.goodreads.com/book/show/53343.On_Writing_Well" target="_blank">On Writing Well</a> to programmers. Though I haven't read the book myself, the point stands: why aren't people recommending books on how to write well?

### Why writing and reading are core to programmers

* As programmers, most of our time is spent reading documentation or code. What if the documentation for an entire library were on a single page (Unix style)? How many would want to read it?


* More people read the documentation than the source code. If you are a programmer and don't like to write, your tools will hardly save others time. If you don't enjoy writing, or your prose is not as good as your code, your library will see lukewarm acceptance.


* If you don't write well, you can't work remotely, though you may still get a chance to work at an awesome startup in your area.


* We spend as much time reading and writing email as we do code. If your email is not well written, you will have a tough time getting work done.


* If you come up with new specs for your product and your writing isn't lucid or doesn't convey the message, you will have a hard time convincing anyone, however brilliant the design may be.


* Most programmers feel happy when they commit a lot of code or when their code is merged to master. You should feel the same kind of happiness when you read or write technical documentation.

- [taking rest](https://kracekumar.com/post/64983193790/taking-rest/index.md): taking rest

+++
date = "2013-10-24 20:59:00+00:00"
draft = false
tags = ["random"]
title = "taking rest"
url = "/post/64983193790/taking-rest"
+++
On my way back home from work I was thinking I should take rest. In my world rest always meant working on some non-work codebase, reading a book, or watching a movie. But I didn't want to do any of these. Then I thought I would just listen to songs, and quickly remembered that wouldn't work out because I would start coding.

Finally I started to ask myself: is it possible for any human being to sit idle and rest? Immediately I gave up. The brain will start thinking about something or other, and you will be pulled into it.

Since I have a computer, a mobile phone, and physical books, I consider switching between them. Well, some people will talk to other so-called human beings, in real life or on the phone. I can't find anyone whom I can ring up and speak to without an agenda.

How would my forefathers have relaxed 200 years back?

Oh yeah, you would catch up for a beer, right? I don't drink. I can't meditate. I don't live near a beach. Well, I have a lake nearby, but it is closed after 7 PM and I reach home at 10 PM.

I can't draw, go to the gym, paint, or sing. Yes, I can write poetry, but I don't want to do that either. And I don't have a pet.

Finally, it seems machines play a key role in my relaxing. Damn it.

- [Check Tamil word or sentence is palindrome](https://kracekumar.com/post/63834696015/check-tamil-word-or-sentence-is-palindrome/index.md): Check Tamil word or sentence is palindrome

+++
date = "2013-10-12 17:01:00+00:00"
draft = false
tags = ["python", "tamil", "palindrome"]
title = "Check Tamil word or sentence is palindrome"
url = "/post/63834696015/check-tamil-word-or-sentence-is-palindrome"
+++
How to check whether a given text is a palindrome:

    def sanitize(text):
        for char in [" ", ".", ",", ";", "\n"]:
            text = text.replace(char, "")
        return text
    
    def palindrome(word):
        # This approach is O(n); the problem can be solved with n/2 comparisons.
        # I am using this approach for brevity.
        return word == word[::-1]
    
    palindrome(sanitize("madam")) # True
    palindrome(sanitize(u"விகடகவி")) # False

Here is the hand-made version for Tamil:

    # dependent vowel signs occupy the range 0xbbe - 0xbcd
    def sanitize(text):
        for char in [" ", ".", ",", ";", "\n"]:
            text = text.replace(char, "")
        return text
    
    dependent_vowel_range = range(0xbbe, 0xbce)
    
    def palindrome_tamil(text):
        front, rear = 0, len(text) - 1
        while True:
            #We will start checking from both ends
            #If code reached centre exit
            if front == rear or abs(front - rear) == 1:
                return True
            else:
                if ord(text[front+1]) in dependent_vowel_range and ord(text[rear]) in dependent_vowel_range:
                    if text[front] == text[rear-1] and text[front+1] == text[rear]:
                        front += 2
                        rear -= 2
                    else:
                        return False
                else:
                    if text[front] == text[rear]:
                        front += 1
                        rear -= 1
                    else:
                        return False

    print palindrome_tamil(sanitize(u"விகடகவி")) == True
    text = u"""
    யாமாமாநீ யாமாமா யாழீகாமா காணாகா
    காணாகாமா காழீயா மாமாயாநீ மாமாயா
    """
    print palindrome_tamil(sanitize(text)) == True
    
    #output
    True
    True

- [How Tamil Unicode works](https://kracekumar.com/post/63832712202/how-tamil-unicode-works/index.md): How Tamil Unicode works
+++
date = "2013-10-12 16:36:37+00:00"
draft = false
tags = ["unicode", "python", "Tamil"]
title = "How Tamil Unicode works"
url = "/post/63832712202/how-tamil-unicode-works"
+++
Tamil has 247 characters. No panic, it is simple: 12 uyir eluthu (அ, ஆ, ..., ஔ), 18 mei eluthu (க், ங், ...), 216 uyirmei eluthu (12 × 18: க, ங, ...), and 1 ayutham (ஃ).
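
Quick arithmetic check in the interpreter:

    In [1]: 12 + 18 + 12 * 18 + 1
    Out[1]: 247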

I assume you know what Unicode is. If not, read <a href="http://www.joelonsoftware.com/articles/Unicode.html" target="_blank">The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets</a> and then the Wikipedia page; you will understand most of it. Back to the post.

Every character or letter in Unicode has a value called a `` code point ``. This is similar to ASCII, where the value of `` a `` is 97. Code point values are written in hexadecimal. The Tamil Unicode character range is `` 0B80 - 0BFF ``. The Unicode consortium has the <a href="http://www.unicode.org/charts/PDF/U0B80.pdf" target="_blank">complete mappings</a>. So the value of அ is `` 0B85 ``.

You don't believe me? I will show you the code. Fire up a `` python2 `` console (I am using IPython).

    In [343]: print(unichr(0x0b85))
    அ

How to find the hex value of a character?

    In [344]: print(hex(ord(u'க')))
    0xb95

What is the code point of கி?

    In [345]: print(hex(ord(u'கி')))
    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-345-9e02bf5958df> in <module>()
    ----> 1 print(hex(ord(u'கி')))

    TypeError: ord() expected a character, but string of length 2 found

Python says the length of the unicode string `` கி `` is 2. Let's check.

    In [346]: print(len(u'கி'))
    2

    In [348]: print(len(unichr(0x0b85)))
    1

    In [349]: print(unichr(0x0b85))
    அ

So clearly `` கி `` is composed of two characters: `` கி `` = `` க் `` + `` இ `` from the language point of view. Let's see what the program says.

    In [355]: for c in u'கி':
        print c, hex(ord(c))
       .....:
    க 0xb95
     ி 0xbbf

When you are typing in a browser or editor, you type `` க `` + `` இ ``; after that, it is the font's work to replace the pair with the combined glyph.

Below is JavaScript code executed in the Chrome console.

    chr = "கி"
    for (var i in chr){console.log(chr[i]);}
    க
    ி
    undefined

So irrespective of the programming language, console, or browser, Unicode is stored in the same way.

`` ி `` is a dependent vowel sign. Let me list all of them.

    In [362]: for c in range(0xbbe, 0xbce):
        print c, unichr(c)
       .....:
    3006 ா
    3007 ி
    3008 ீ
    3009 ு
    3010 ூ
    3011 ௃
    3012 ௄
    3013 ௅
    3014 ெ
    3015 ே
    3016 ை
    3017 ௉
    3018 ொ
    3019 ோ
    3020 ௌ
    3021 ்

`` ௅ `` is a placeholder; that hex value doesn't have any symbol or character assigned. Dependent vowel signs are used in place of uyir eluthu.

Now let's read a word character by character.

    In [366]: for i in u'விகடகவி':
            print i, hex(ord(i))
       .....:
    வ 0xbb5
     ி 0xbbf
    க 0xb95
    ட 0xb9f
    க 0xb95
    வ 0xbb5
     ி 0xbbf

The above logic holds true for English but not for Tamil.

    In [395]: word = u'விகடகவி'

    In [396]: dependent_vowel_range = range(0xbbe, 0xbce)

    In [397]: pos, stop = 0, len(word) - 1

    In [398]: while not pos >= stop:
        # check if the next char is a dependent vowel; if so, print them together
        if ord(word[pos+1]) in dependent_vowel_range:
            print word[pos], word[pos+1]
            pos += 2
        else:
            print word[pos]
            pos += 1
    வ ி
    க
    ட
    க
    வ ி

Now try to write a program that finds whether a Tamil word is a palindrome or not, and see the fun.

Note: I need to figure out a font that will club the dependent vowel with its consonant and print properly in the terminal.

- [code simplicity book review](https://kracekumar.com/post/63657996702/code-simplicity-book-review/index.md): code simplicity book review

+++
date = "2013-10-10 18:00:00+00:00"
draft = false
tags = ["book-review", "opensource", "codesimplicity"]
title = "code simplicity book review"
url = "/post/63657996702/code-simplicity-book-review"
+++
<a href="http://shop.oreilly.com/product/0636920022251.do" target="_blank">Code Simplicity: The Fundamentals of Software</a> by <a href="http://max.kanat-alexander.com/" target="_blank">Max Kanat-Alexander</a> is crisp book about writing better software.

The book is just 88 pages, but the insights are useful for any software engineer. The author is the chief architect of the Bugzilla project.

Whenever I came across a bold statement in the book, I would sit and think for two minutes to correlate it with my programming experience. Most of the time I was able to come up with a valid example.

This book is highly recommended for programmers who have spent significant time reading other programmers' code. If you are an experienced programmer, you can use the points in the book to validate your current beliefs. I tend to agree with all of the points mentioned in the book and can correlate them, positively and negatively, with open source projects.

Few of my favorites:

*   Everybody who writes software is a designer.
*   Design is not democracy. Decisions should be made by individuals.
*   There is something about the future you don't know.
*   Be only as generic as you know you need to be right now.

Rating: \*\*\*\*

- [coverage.py to test web application coverage without writing tests](https://kracekumar.com/post/61125040813/coveragepy-to-test-web-application-coverage/index.md): coverage.py to test web application coverage without writing tests

+++
date = "2013-09-13 17:50:32+00:00"
draft = false
tags = ["python", "coverage", "unit testing"]
title = "coverage.py to test web application coverage without writing tests"
url = "/post/61125040813/coveragepy-to-test-web-application-coverage"
+++
Tests are mandatory for packages, modules, and web apps. If you are too lazy to write unit tests but want to see the coverage for a web app, here is the shortcut.

Let's consider a Python web app that uses Flask, where `` runserver.py `` runs the development web server. Normally the server is started with `` python runserver.py ``; instead, use `` coverage run runserver.py ``.
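
For concreteness, here is a minimal sketch of such a `` runserver.py `` (the app layout is an assumption):

    # runserver.py - a bare Flask development-server entry point (sketch)
    from flask import Flask

    app = Flask(__name__)


    @app.route("/")
    def index():
        return "Hello"


    if __name__ == "__main__":
        app.run('0.0.0.0', port=8100, debug=True)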

The `` coverage `` Python package runs the program until it exits or the user halts it. Once web app testing via the browser is complete, run `` coverage report -m ``; this produces a long list of the lines covered in each module during execution.

One more handy feature is `` coverage html ``. This produces HTML files of the Python code, highlighting executed and unexecuted lines. All the HTML files are produced inside the `` htmlcov `` directory.

Coverage <a href="http://nedbatchelder.com/code/coverage/" target="_blank">documentation</a>.

## Sample Output

<pre><code>➜ hacknight git:(issue-249) ✗ coverage run --source=hacknight runserver.py
/Library/Python/2.7/site-packages/flask_cache/__init__.py:100:   UserWarning: Flask-Cache: CACHE_TYPE is set to null, caching is  effectively disabled.
warnings.warn("Flask-Cache: CACHE_TYPE is set to null, "
* Running on <a href="http://0.0.0.0:8100/" target="_blank">http://0.0.0.0:8100/</a>
* Restarting with reloader
/Library/Python/2.7/site-packages/flask_cache/__init__.py:100: UserWarning: Flask-Cache: CACHE_TYPE is set to null, caching is effectively disabled.
warnings.warn("Flask-Cache: CACHE_TYPE is set to null, "
127.0.0.1 - - [13/Sep/2013 23:08:31] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [13/Sep/2013 23:08:34] "GET /kracekumar HTTP/1.1" 200 -
127.0.0.1 - - [13/Sep/2013 23:08:36] "GET /kracekumar/jsfoo-hackday HTTP/1.1" 200 -
^C%                                                                                       
➜  hacknight git:(issue-249) ✗ coverage report -m
Name                           Stmts   Miss  Cover   Missing
------------------------------------------------------------
hacknight/__init__                28      0   100%
hacknight/_version                 3      0   100%
hacknight/forms/__init__           0      0   100%
hacknight/forms/comment           12      0   100%
hacknight/forms/event             42      5    88%   58-59, 62-72
hacknight/forms/participant       12      0   100%
hacknight/forms/profile            6      0   100%
hacknight/forms/project            9      0   100%
hacknight/forms/sponsor            9      0   100%
hacknight/forms/venue             16      0   100%
hacknight/models/__init__         10      0   100%
hacknight/models/_profile         21     21     0%   3-37
hacknight/models/comment          61     18    70%   34-35, 41, 71-72, 78-91, 95, 98
hacknight/models/event           110     43    61%   45, 48-51, 84, 87-88, 91-93, 96-97, 100-109, 112-133
hacknight/models/participant      31      5    84%   34-37, 41
hacknight/models/project          72     31    57%   42-46, 50, 55, 58, 63, 68-85, 101-105
hacknight/models/sponsor          17      2    88%   25-26
hacknight/models/user             21      5    76%   20, 24, 28, 32, 35
hacknight/models/venue            21      2    90%   26-27
hacknight/models/vote             38     16    58%   17-18, 21-30, 33-36, 39
hacknight/tests/__init__           0      0   100%
hacknight/tests/test_data          3      3     0%   3-7
hacknight/tests/test_models       17     17     0%   4-93
hacknight/views/__init__           8      0   100%
hacknight/views/event            244    192    21%   28-33, 41-56, 70-87, 97-116, 131, 140-145, 155-182, 191-222, 231-260, 269-276, 285-301, 310-313, 325-344, 354-373
hacknight/views/index             36     23    36%   13-15, 20, 25, 30, 35, 40, 45-53, 58-65
hacknight/views/login             28     12    57%   15, 21-22, 28-30, 36-37, 42-45
hacknight/views/profile           36     22    39%   17-23, 30-47
hacknight/views/project          323    256    21%   30-49, 61-73, 85-108, 118, 127-240, 252-259, 270-277, 288-295, 306-315, 327-336, 346-357, 369-378, 387-397, 406-416, 426-440, 451-464, 475-501, 514
hacknight/views/sponsor           45     26    42%   18-30, 42-53, 65-68, 81
hacknight/views/venue             47     29    38%   15-16, 22, 28-38, 45-56, 64-66
hacknight/views/workflow          84     30    64%   42-46, 52, 57, 62, 67, 72, 77, 80, 114-119, 127-128, 136, 144, 152, 160, 168, 176, 182, 188, 194, 200, 206, 209
------------------------------------------------------------
TOTAL                           1410    758    46%
</code></pre>

- [Observations from handling python workshop in engineering colleges](https://kracekumar.com/post/58802831892/observations-from-handling-python-workshop-in/index.md): Observations from handling python workshop in engineering colleges
+++
date = "2013-08-20 18:06:00+00:00"
draft = false
tags = ["python", "tip", "training", "teaching", "college"]
title = "Observations from handling python workshop in engineering colleges"
url = "/post/58802831892/observations-from-handling-python-workshop-in"
+++
_Observations from handling python workshop in engineering colleges_

I handled 5 Python workshops/sessions for novices from 8th March to 20th August, each stretching from 1 hour to 2 days. It was worth the time.

Students who participated in the workshops were from Computer Science and Electronics backgrounds, at both undergraduate and postgraduate levels. The minimum strength was 60 and the maximum was close to 100.

1.   When handling 60 students in a lab, remember everyone has their own pace of picking things up.
2.   Distribute the material in HTML or PDF format so students can look into it. Some students start doing the examples as soon as they receive the material; these students will learn on their own, and now you have segmented out the students who need instructor support. These students also help their friends when they are stuck.
3.   Go slow and repeat every concept twice. Students from engineering colleges use C and C++; Python's ease is difficult to digest at first.
4.   Make sure you don't stand near your laptop for more than 20 minutes. Teach a topic, show the example, and move around so the students can approach you and you get to know their difficulties. Not everyone will ask questions.
5.   Give them problems to solve, and don't give problems which take more than 10 minutes. Spend the time in front of the students' terminals and help the struggling ones. At the end of 10 minutes you will know how the students approached the problem and get an insight into how much they grasped. This signals whether your explanation was understandable or not.
6.   Write the code for the problem in front of the students; please don't show already-written code. Discuss the different approaches to the same problem and how other students solved it.
7.   Having a few volunteers to help during the workshop is great. Students will start approaching them with their hurdles.
8.   Don't flood students with too much material in a single day. Make sure the workshop runs only 6 - 7 hours per day. They need time to digest.
9.   When you are teaching list comprehensions, make sure students write the same example using a for loop, then show them the one-liner. Most students get confused by the syntax here. Then give them more problems to solve using list comprehensions.
10.   Be careful with variable names; students will use the same variable names in their code.
11.   Don't teach `` classes `` to beginners; you will waste a lot of time explaining public and private methods, `` __init__ ``, and `` self ``. Instead, use the time to solve problems.
12.   Spend enough time writing small programs (use a text editor) with `` if ``, `` else ``, `` elif ``, and `` for `` so that they get used to indentation.
13.   Give problems like finding the greatest of three numbers to show them the use case of `` a > b > c `` rather than `` a > b and b > c `` (see the sketch after this list).
14.   Give problems like finding the total number of lines and words in a file. This helps them drop the `` for loop with counter `` pattern and encourages `` len(f.readlines()) ``.
15.   Don't teach `` *args `` and `` **kwargs ``, but spend time making them understand that a function can accept functions as parameters, so it becomes easy for them to digest `` len(f.readlines()) ``.
16.   Make sure to teach `` dir `` and `` help ``; this helps people who are interested in exploring further.
17.   If you want to enforce the Pythonic way of writing code, like list comprehensions or passing functions to functions, show a few examples comparing the Pythonic and non-Pythonic ways. Advocate the advantage.
18.   Leave your email id with the students and collect feedback via Google Forms or a physical form; make sure it is anonymous.
19.   Students will ask for recommendations for books, projects, etc. Be prepared to handle these.
20.   Don't teach `` raw_input `` or `` input ``; teach them how to accept command line parameters.
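
A minimal sketch of the examples behind points 13 and 14 (the file name is hypothetical):

    # Point 13: chained comparison reads like maths
    a, b, c = 9, 5, 2
    if a > b > c:  # same as (a > b) and (b > c)
        print "a is the greatest and b is greater than c"

    # Point 14: count lines and words in a file without a manual counter
    f = open("sample.txt")  # hypothetical file
    lines = f.readlines()
    print len(lines)  # total lines
    print sum(len(line.split()) for line in lines)  # total words
    f.close()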

`` There's no such thing as a bad student, only a bad teacher `` - Unknown

- [avoid accessing wrong column in csv](https://kracekumar.com/post/58766042678/avoid-accessing-wrong-column-in-csv/index.md): avoid accessing wrong column in csv

+++
date = "2013-08-20 05:31:00+00:00"
draft = false
tags = ["csv", "python", "tip"]
title = "avoid accessing wrong column in csv"
url = "/post/58766042678/avoid-accessing-wrong-column-in-csv"
+++
_Avoid accessing wrong column in csv_

I was parsing a few CSV files, all from the same provider. The format was `` Date, Name, Email, Company, City ``, etc., in one file. I assumed all the downloaded files were in the same format. To my surprise, a few files had the same format and others didn't.

    with open(filename, 'rb') as f:
        reader = csv.reader(f)
        reader.next() # first row contains column names
        for row in reader:
            name, email, company = row[1], row[2], row[3]
            #save to db

The fundamental mistake in the above code is fixing the positions of the columns in the CSV file. What happens if the `` email `` and `` company `` positions are interchanged? In a word: `` screwed ``.

_How can I avoid this situation?_

    with open(filename, 'rb') as f:
        reader = csv.reader(f)
        header = reader.next()  # reader is an iterator, not a list
        name_pos, email_pos, company_pos = header.index("Name"), header.index("Email"), header.index("Company")

    with open(filename, 'rb') as f:
        reader = csv.reader(f)
        reader.next()
        for row in reader:
            name, email, company = row[name_pos], row[email_pos], row[company_pos]
            # save to db

The code is a bit longer here, but robust. The snippet doesn't rely on column positions, only on the column names in the first line. Of course `` header.index `` will raise a `` ValueError `` for a missing column; you can wrap it in an exception handler. First all the required column positions are retrieved, then the file is reopened, the first row is skipped, and the details are accessed.
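
A simpler route to the same robustness is `` csv.DictReader ``, which uses the first row as keys so each row is accessed by column name:

    import csv

    with open(filename, 'rb') as f:
        reader = csv.DictReader(f)  # the first row becomes the keys
        for row in reader:
            name, email, company = row["Name"], row["Email"], row["Company"]
            # save to db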

- [Funny experience of using trace module to trace function call](https://kracekumar.com/post/56633639978/funny-experience-of-using-trace-module-to-trace/index.md): Funny experience of using trace module to trace function call
+++
date = "2013-07-27 21:35:32+00:00"
draft = false
tags = ["python", "trace", "httpie"]
title = "Funny experience of using trace module to trace function call"
url = "/post/56633639978/funny-experience-of-using-trace-module-to-trace"
+++
I came across this issue in <a href="https://github.com/jkbr/httpie/issues/128" target="_blank">httpie</a> and started my investigation.

The problem: while pretty-printing JSON, the output is alpha-sorted because the keys are hashed, and the user wanted to preserve the order. I made 3 comments on the issue. The <a href="https://github.com/jkbr/httpie/issues/128#issuecomment-21661339" target="_blank">first comment</a> was half correct and explained why it wasn't possible to get the desired output; I quickly figured out my assumptions were wrong, and the <a href="https://github.com/jkbr/httpie/issues/128#issuecomment-21661648" target="_blank">second comment</a> explained what actually happens; finally I proposed the <a href="https://github.com/jkbr/httpie/issues/128#issuecomment-21661700" target="_blank">solution</a>. Since I had made wrong assumptions, and to make further debugging easy, I wanted to find the easiest way to trace all function/method invocations.

I remember <a href="http://nibrahim.net.in/" target="_blank">Noufal</a> sharing a small snippet on Twitter, but I wanted to use <a href="http://docs.python.org/2/library/trace.html" target="_blank">trace</a>, so I wrote a small snippet (the final one):

     # -*- coding: utf-8 -*-

     from httpie.core import main
     #import pdb
     import sys
     #import os
     #import trace

     #pdb.set_trace()
     main(args=sys.argv[1:])

First I ran the command <code>python -m trace --trace httpie_test.py --pretty=all <a href="http://httpbin.org/get" target="_blank">http://httpbin.org/get</a></code> and it printed some 20 million lines. Then I got the clever idea of using <a href="http://pythonconquerstheuniverse.wordpress.com/2009/09/10/debugging-in-python/" target="_blank">pdb</a>.

I debugged with pdb for half an hour, pressing the `` s `` key, and got fed up. pdb was beautiful like her; I was enjoying each line it printed, it was like watching her speak, and I was mesmerized. After half an hour I gave up and went back to the trace command. Finally I figured out I could use `` --ignore-module `` from the command line.

After spending one hour, the final command looked like this (scroll completely):

<pre><code> ➜  snippets  python -m trace --trace --ignore-module=os,sre_compile,sre_parse,zipfile,text_file,sysconfig,pkg_resources,re,posixpath,genericpath,decoder,hex_codec,socket,httplib,pkgutil,stat,token,style,calendar,spawn,util,collections,abc,decimal,lexer,StringIO,plist,argparse,_abcoll,structures,Queue,threading,_collections,urlparse,cookielib,platform,cookielib,terminal256,formatter,__init__,stringprep,_weakrefset,filter,encoder,codecs,latin,connectionpool,plugin,html,pygmentplugin,htmlentitydefs,__future__,weakref,UserDict,atexit,functools,base64,struct,hashlib,ssl,textwrap,six,exceptions,warnings,mimetools,tempfile,random,tempfile,rfc822,urllib,filepost,uuid,_endian,dyld,dylib,framework,io,poolmanager,pyopenssl,utils,cgi,netrc,shlex,compat,copy,numbers,locale,locale,unicode_escape,urllib2,Cookie,_LWPCookieJar,_MozillaCookieJar,ordered_dict,cookies,certs,ascii,status_codes,gettext,scanner,config,minicompat,domreg,minidom,xmlbuilder,NodeFilter,shutil,copy_reg,string,plistlib,_mapping,bbcode,img,Image,FixTk,ImageMode,ImagePalette,ImageColor,ImageDraw,ImageFont,latex,other,console,rtf,svg,terminal,solarized,getpass,pprint,downloads,idna,agile,web,unistring,functional,jvm,compiled,mimetypes httpie_test.py --pretty=all --verbose <a href="http://headers.jsontest.com/" target="_blank">http://headers.jsontest.com/</a> >> op.txt
 ➜  snippets  wc -l op.txt
     2344 op.txt
</code></pre>

That essay of a command is 1269 characters at the terminal, and the result of it is a <a href="https://github.com/jkbr/httpie/pull/151" target="_blank">pull request</a>.
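
In hindsight, the same filtering can be done programmatically with the `` trace `` module's `` Trace `` class, which keeps the ignore list inside the script instead of a 1269-character command line. A minimal sketch (the ignore list is trimmed here):

    # trace httpie calls with noisy modules ignored (list trimmed)
    import sys
    import trace

    from httpie.core import main

    tracer = trace.Trace(count=False, trace=True,
                         ignoremods=('os', 're', 'socket', 'httplib'))
    tracer.runfunc(main, args=sys.argv[1:])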

- [http request examples for luasocket](https://kracekumar.com/post/55856724724/http-request-examples-for-luasocket/index.md): http request examples for luasocket

+++
date = "2013-07-19 08:01:00+00:00"
draft = false
tags = ["lua", "http", "luasocket"]
title = "http request examples for luasocket"
url = "/post/55856724724/http-request-examples-for-luasocket"
+++
I was looking for an `` http `` library in Lua and landed on the <a href="http://w3.impa.br/~diego/software/luasocket/http.html" target="_blank">luasocket.http page</a>. It isn't well documented; I sent a few `` GET, POST, PUT `` requests and figured out a few bits. This blog post aims to bridge the gap with code examples.

In this example, I will use <a href="https://httpbin.org" target="_blank">httpbin</a> as the target site. The complete code is available as a <a href="https://gist.github.com/kracekumar/6037243" target="_blank">gist</a>.

The following code should be executed as a standalone Lua file (`` lua lua_httpbin.lua ``); when running the code in the interpreter, make the local variables `` http, ltn12, base_url `` global.

<pre><code>local http = require("socket.http")
local ltn12 = require("ltn12")

local base_url = "https://httpbin.org/"

-- Helper for printing a nested table
function deep_print(tbl)
    for i, v in pairs(tbl) do
        if type(v) == "table" then 
            deep_print(v)
        else 
            print(i, v) 
        end
    end
end

function http_request( args )
--http.request(url [, body])
--http.request{
--  url = string,
--  [sink = LTN12 sink,]
--  [method = string,]
--  [headers = header-table,]
--  [source = LTN12 source],
--  [step = LTN12 pump step,]
--  [proxy = string,]
--  [redirect = boolean,]
--  [create = function]
--}
--
--
    local resp, r = {}, {}
    if args.endpoint then
        local params = ""
        if args.method == nil or args.method == "GET" then
            -- prepare query parameters like http://xyz.com?q=23&a=2
            if args.params then
                for i, v in pairs(args.params) do
                    params = params .. i .. "=" .. v .. "&"
                end
            end
        end
        params = string.sub(params, 1, -2)
        local url = ""
        if params then url = base_url .. args.endpoint .. "?" .. params else url = base_url .. args.endpoint end
        client, code, headers, status = http.request{url=url, sink=ltn12.sink.table(resp),
                                                method=args.method or "GET", headers=args.headers, source=args.source,
                                                step=args.step,     proxy=args.proxy, redirect=args.redirect, create=args.create }
        r['code'], r['headers'], r['status'], r['response'] = code, headers, status, resp
    else
        error("endpoint is missing")
    end
    return r
end
</code></pre>

`` http_request `` takes a table as an argument, almost the same as the `` http.request `` function's, with an extra parameter `` params `` that helps pass query parameters to a `` GET `` request.

`` http_request{endpoint=endpoint, params={age=23, name="kracekumar"}} `` rather than `` http_request{endpoint=endpoint .. "?age=23&name=kracekumar"} ``.

The remaining piece of the code:

    function main()
        -- Normal GET request
        endpoint = "/user-agent"
        print(endpoint)
        deep_print(http_request{endpoint=endpoint})
        -- GET request with parameters
        endpoint = "/get"
        print(endpoint)
        deep_print(http_request{endpoint=endpoint, params={age=23, name="kracekumar"}})
        -- POST request with form
        endpoint = "/post"
        print(endpoint)
        local req_body = "a=2"
        local headers = {
        ["Content-Type"] = "application/x-www-form-urlencoded";
        ["Content-Length"] = #req_body;
        }
        deep_print(http_request{endpoint=endpoint, method="POST", source=ltn12.source.string(req_body), headers=headers})
        -- PATCH Method
        endpoint = "/patch"
        print(endpoint)
        deep_print(http_request{endpoint=endpoint, method="PATCH"})
        -- PUT Method
        endpoint = "/put"
        print(endpoint)
        deep_print(http_request{endpoint=endpoint, method="PUT", source    =ltn12.source.string("a=23")})
        -- Delete method
        endpoint = "/delete"
        print(endpoint)
        deep_print(http_request{endpoint=endpoint, method="DELETE",     source=ltn12.source.string("a=23")})
    
    end
    
    main()

_output_

    # the first 'lua' is the directory name
    ➜  lua  lua lua_httpbin.lua
    /user-agent
    status    HTTP/1.1 200 OK
    code    200
    connection  Close
    content-type    application/json
    date    Fri, 19 Jul 2013 07:32:14 GMT
    content-length  37
    access-control-allow-origin *
    server  gunicorn/0.17.4
    1   {
      "user-agent": "LuaSocket 2.0.2"
    }
    /get
    status  HTTP/1.1 200 OK
    code    200
    connection  Close
    content-type    application/json
    date    Fri, 19 Jul 2013 07:32:15 GMT
    content-length  258
    access-control-allow-origin *
    server  gunicorn/0.17.4
    1   {
      "headers": {
        "Host": "httpbin.org",
        "Connection": "close",
        "User-Agent": "LuaSocket 2.0.2"
      },
      "url": "http://httpbin.org/get?age=23&amp;name=kracekumar",
      "args": {
        "age": "23",
        "name": "kracekumar"
      },
      "origin": "106.51.166.47"
    }
    /post
    status    HTTP/1.1 200 OK
    code    200
    connection  Close
    content-type    application/json
    date    Fri, 19 Jul 2013 07:41:54 GMT
    content-length  350
    access-control-allow-origin *
    server  gunicorn/0.17.4
    1   {
        "json": null,
        "origin": "106.51.166.47",
        "form": {
            "a": "2"
        },
        "url": "http://httpbin.org/post",
        "args": {},
        "headers": {
        "Connection": "close",
        "Host": "httpbin.org",
        "User-Agent": "LuaSocket 2.0.2",
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": "3"
        },
    "files": {},
    "data": ""
    }
    /patch
    status  HTTP/1.1 200 OK
    code    200
    connection  Close
    content-type    application/json
    date    Fri, 19 Jul 2013 07:32:17 GMT
    content-length  251
    access-control-allow-origin *
    server  gunicorn/0.17.4
    1   {
      "files": {},
      "headers": {
        "Host": "httpbin.org",
        "Connection": "close",
        "User-Agent": "LuaSocket 2.0.2"
      },
      "args": {},
      "json": null,
      "form": {},
      "origin": "106.51.166.47",
      "url": "http://httpbin.org/patch",
      "data": ""
    }
    /put
    status  HTTP/1.1 200 OK
    code    200
    connection  Close
    content-type    application/json
    date    Fri, 19 Jul 2013 07:32:18 GMT
    content-length  276
    access-control-allow-origin *
    server  gunicorn/0.17.4
    1   {
      "json": null,
      "origin": "106.51.166.47",
      "form": {},
      "url": "http://httpbin.org/put",
      "args": {},
      "headers": {
        "Connection": "close",
        "Host": "httpbin.org",
        "User-Agent": "LuaSocket 2.0.2",
        "Content-Length": "0"
      },
      "files": {},
      "data": ""
    }
    /delete
    status  HTTP/1.1 200 OK
    code    200
    connection  Close
    content-type    application/json
    date    Fri, 19 Jul 2013 07:32:19 GMT
    content-length  223
    access-control-allow-origin *
    server  gunicorn/0.17.4
    1   {
      "args": {},
      "json": null,
      "headers": {
        "Connection": "close",
        "User-Agent": "LuaSocket 2.0.2",
        "Host": "httpbin.org"
      },
      "origin": "106.51.166.47",
      "data": "",
      "url": "http://httpbin.org/delete"
    }

_My 2p_

1.   When sending a `` GET `` request, don't send parameters in the request body.
2.   When sending a `` POST `` request, set the custom headers for the form.

My experience with `` luasocket.http `` gives a solid reason to create a new library with helper functions, like <a href="http://python-requests.org" target="_blank">requests</a>.

- [Why This Kolaveri Di song words language ](https://kracekumar.com/post/55796363713/why-this-kolaveri-di-song-words-language/index.md): Why This Kolaveri Di song words language 

+++
date = "2013-07-18 18:26:20+00:00"
draft = false
tags = ["python", "why-this-kolaveri-di", "tanglish", "tamil"]
title = "Why This Kolaveri Di song words language "
url = "/post/55796363713/why-this-kolaveri-di-song-words-language"
+++
I was wondering how many words in the <a href="https://www.youtube.com/watch?v=JJpCV31n7T8" target="_blank">why this kolaveri di</a> song belong to English, so I wrote this code to find out.

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    lyrics = """
    yo boys i am singing song
    soup song
    flop song
    why this kolaveri kolaveri kolaveri di
    why this kolaveri kolaveri kolaveri di
    rhythm correct
    why this kolaveri kolaveri kolaveri di
    maintain please
    why this kolaveri di
    
    distance la moon-u moon-u
    moon-u color-u white-u
    white background night-u night-u
    night-u color-u black-u
    
    why this kolaveri kolaveri kolaveri di
    why this kolaveri kolaveri kolaveri di
    
    white skin-u girl-u girl-u
    girl-u heart-u black-u
    eyes-u eyes-u meet-u meet-u
    my future dark
    
    why this kolaveri kolaveri kolaveri di
    why this kolaveri kolaveri kolaveri di
    
    maama notes eduthuko
    apdiye kaila snacks eduthuko
    pa pa paan pa pa paan pa pa paa pa pa paan
    sariya vaasi
    super maama ready
    ready 1 2 3 4
    
    whaa wat a change over maama
    
    ok maama now tune change-u
    
    kaila glass
    only english
    
    hand la glass
    glass la scotch
    eyes-u full-a tear-u
    empty life-u
    girl-u come-u
    life reverse gear-u
    love-u love-u
    oh my love-u
    you showed me bouv-u
    cow-u cow-u holy cow-u
    i want you hear now-u
    god i am dying now-u
    she is happy how-u
    
    this song for soup boys-u
    we dont have choice-u
    
    why this kolaveri kolaveri kolaveri di
    why this kolaveri kolaveri kolaveri di
    why this kolaveri kolaveri kolaveri di
    why this kolaveri kolaveri kolaveri di
    
    flop song
    """
    dict_file_path = "/usr/share/dict/words"
    
    
    def sanitize(words):
        for index, word in enumerate(words):
            if word.endswith("-u") or word.endswith("-a"):
                words[index] = word[:-2]
    
    
    if __name__ == "__main__":
        # Get all words
        words = [word for line in lyrics.split("\n") for word in line.split(" ") if word != ""]
        # Load english words
        dictionary_words = open(dict_file_path).readlines()
        # Remove \n in dictionary words
        dictionary_words = [word.split("\n")[0] for word in dictionary_words]
        # Add missing words
        dictionary_words.append("boys")
        dictionary_words.append("snacks")
        dictionary_words.append("eyes")
        dictionary_words.append("english")
        dictionary_words.append("1")
        dictionary_words.append("2")
        dictionary_words.append("3")
        dictionary_words.append("4")
        dictionary_words.append("notes")
        dictionary_words.append("ok")
        dictionary_words.append("showed")
        # Remove -u which sounds like Tamil words
        sanitize(words)
        # Find unique words
        unique_words = set(words)
        # Find english words
        eng_words = [word for word in unique_words if word in dictionary_words]
        non_eng_words = unique_words - set(eng_words)
        # Remove empty element
        non_eng_words = [word for word in non_eng_words if word != ""]
        print("==English Words==")
        print(eng_words)
        print("==Non English Words==")
        print(non_eng_words)
        print("Total unique words: %d,\n English words: %d,\n Non English words: %d,\n percentage of english words: %f" % (len(unique_words), len(eng_words), len(non_eng_words), float(len(eng_words))/len(unique_words) * 100))

_Output_

    ➜  lua  python why_this_kolaveri_di.py
    ==English Words==
    ['over', 'skin', 'la', 'only', 'black', '4', 'rhythm', 'yo', 'di', 'choice', 'dark', 'background', '2', 'now', 'tear', 'notes', 'she', 'night', 'girl', 'for', 'god', 'please', 'moon', '3', 'correct', 'we', 'full', 'how', 'super', 'change', 'ok', 'reverse', 'cow', 'oh', 'love', 'dont', 'color', 'singing', 'come', 'pa', 'white', 'wat', 'empty', 'happy', 'eyes', 'gear', 'holy', 'boys', 'hear', 'me', 'distance', 'showed', 'this', 'soup', 'future', 'meet', 'my', 'heart', 'have', 'snacks', 'is', 'am', 'want', 'ready', 'dying', 'song', '1', 'you', 'hand', 'why', 'tune', 'a', 'glass', 'i', 'scotch', 'flop', 'life', 'maintain', 'english']
    ==Non English Words==
    ['kaila', 'sariya', 'paa', 'apdiye', 'eduthuko', 'vaasi', 'maama', 'whaa',  'bouv', 'paan', 'kolaveri']
    Total unique words: 90,
    English words: 79,
    Non English words: 11,
    Percentage of english words: 87.777778

It turns out the song contains `` 90 `` unique words; `` 79 `` are `` english `` and `` 11 `` are `` non english `` (Tamil). `` 87.8% `` of the words are English. So hereafter, call the `` why this kolaveri di `` song a `` Tanglish (Tamil + English) `` song.

Gist: <a href="https://gist.github.com/kracekumar/6031683" target="_blank">https://gist.github.com/kracekumar/6031683</a>

- [SSL for flask local development ](https://kracekumar.com/post/54437887454/ssl-for-flask-local-development/index.md): SSL for flask local development 
+++
date = "2013-07-02 16:50:39+00:00"
draft = false
tags = ["python", "ssl", "https", "flask"]
title = "SSL for flask local development "
url = "/post/54437887454/ssl-for-flask-local-development"
+++
Recently at <a href="https://blog.hasgeek.com/2013/https-everywhere-at-hasgeek" target="_blank">HasGeek</a> we moved all our web applications to `` https ``, so I wanted all my development environment URLs to use `` https `` too.

_How to have `` https `` in a <a href="http://flask.pocoo.org" target="_blank">flask</a> app_

_Method 1_

    from flask import Flask
    app = Flask(__name__)
    app.run('0.0.0.0', debug=True, port=8100, ssl_context='adhoc')

In the above piece of code, the `` ssl_context `` variable is passed to `` werkzeug.run_simple ``, which creates SSL certificates using `` OpenSSL ``; you may need to install `` pyopenssl ``. I had issues with this method, so I generated a self-signed certificate.

_Method 2_

*   Generate a private key:
    ` openssl genrsa -des3 -out server.key 1024 `

*   Generate a CSR:
    ` openssl req -new -key server.key -out server.csr `

*   Remove the passphrase from the key:
    ` cp server.key server.key.org ` and
    ` openssl rsa -in server.key.org -out server.key `

*   Generate a self-signed certificate:
    ` openssl x509 -req -days 365 -in server.csr -signkey server.key -out server.crt `

_code_

    from flask import Flask
    app = Flask(__name__)
    app.run('0.0.0.0', debug=True, port=8100, ssl_context=('/Users/kracekumarramaraju/certificates/server.crt', '/Users/kracekumarramaraju/certificates/server.key'))

`` ssl_context `` can take the option `` adhoc `` or a `` tuple of the form (cert_file, pkey_file) ``.

In case you are using `` /etc/hosts ``, add an entry like `` 127.0.0.1    hacknight.local `` so you can access the web application locally at <a href="https://hacknight.local" target="_blank">https://hacknight.local</a>.

Here is a much more detailed post about generating self-signed <a href="http://www.akadia.com/services/ssh_test_certificate.html" target="_blank">certificates</a>.

- [cp command implementation and benchmark in python, go, lua](https://kracekumar.com/post/53685731325/cp-command-implementation-and-benchmark-in-python/index.md): cp command implementation and benchmark in python, go, lua

+++
date = "2013-06-23 17:20:35+00:00"
draft = false
tags = ["cp", "go-lang", "python", "gevent", "lua", "rsync"]
title = "cp command implementation and benchmark in python, go, lua"
url = "/post/53685731325/cp-command-implementation-and-benchmark-in-python"
+++
I was wondering how big the speed difference would be between the `` cp `` command, `` rsync ``, and implementations in `` python ``, `` go ``, and `` lua ``, so I wrote this <a href="https://github.com/kracekumar/cp-tests" target="_blank">code</a>.

_Background_

1.   `` python `` has two versions, one with `` gevent `` and one without. Both versions use `` shutil `` for copying files and the directory tree.
2.   `` go `` uses <a href="https://github.com/opesun/copyrecur" target="_blank">https://github.com/opesun/copyrecur</a> for copying recursively.
3.   `` lua `` uses the `` lfs - LuaFileSystem `` module. `` lfs `` supports creating directories but not copying files, so files are copied by low-level file opening and writing.
4.   `` rsync --progress -ah -R `` was also added to the test.

_Code_

The directory chosen for the test has `` 28 `` repos, basically Python projects and Go repos under git version control. The total size of the directory `` /Users/kracekumarramaraju/code `` is `` 300M `` (du -sh /Users/kracekumarramaraju/code), and the destination is an external disk that supports `` USB 3.0 ``.

`` Python without gevent ``

    import sys
    import shutil
    
    
    def cp(source, dest):    
        shutil.copytree(source, dest)
    
    
    if __name__ == "__main__":
        if len(sys.argv) != 3:
            print("Help")
            print("python cp.py source dest is the format")
            sys.exit(1)
        cp(sys.argv[1], sys.argv[2])

`` Python with gevent support ``

    import sys
    import os
    import shutil
    import gevent
    
    
    def cp(source, dest):
        shutil.copytree(source, dest)
    
    
    def cpfile(source, dest):
        shutil.copy2(source, dest)
    
    
    if __name__ == "__main__":
        if len(sys.argv) != 3:
            print("Help")
            print("python cp.py source dest is the format")
            sys.exit(1)
        source, dest = sys.argv[1], sys.argv[2]
        os.mkdir(dest)
        tasks = []
        for name in os.listdir(source):
            source_path, dest_path = os.path.join(source, name), os.path.join(dest, name)
            if os.path.isdir(source_path):
                tasks.append(gevent.spawn(cp, source_path, dest_path))
            else:
                tasks.append(gevent.spawn(cpfile, source_path, dest_path))
        gevent.joinall(tasks)

`` go ``

    package main
    
    import (
        "fmt"
        "github.com/opesun/copyrecur"
        "log"
        "os"
    )
    
    func cp(source, dest string) {
        err := copyrecur.CopyDir(source, dest)
        if err != nil {
            log.Fatal(err)
        } else {
            log.Print("Files copied.")
        }
    }
    
    func main() {
        if len(os.Args) != 3 {
            fmt.Println("Syntax: go run cp.go source destination")
            os.Exit(1)
        }
        cp(os.Args[1], os.Args[2])
        fmt.Println("cp.go completed")
    }

`` lua ``

    require "lfs"
    
    
    function cp(source, dest)
        -- body
        for filename in lfs.dir(source) do
            if filename ~= '.' and filename ~= '..' then
                local source_path = source .. '/' .. filename
                local attr = lfs.attributes(source_path)
                --print(attr.mode, path)
                if type(attr) == "table" and attr.mode == "directory" then 
                    local dest_path = dest .. "/" .. filename
                    lfs.mkdir(dest_path)
                    cp(source_path, dest_path)
                else
                    local f = io.open(source_path, "rb")
                    local content = f:read("*all")
                    f:close()
                    local w = io.open(dest .. "/" .. filename, "wb")
                    w:write(content)
                    w:close() 
                end
            end
        end
    end
    
    if #arg == 2 then
        cp(arg[1], arg[2])
    else
        print("Syntax:")
        print("lua lua.go source dest")
    end

`` rsync --progress -ah -R `` and the plain `` cp `` command were run as-is.

_Tests_

Shell <a href="https://github.com/kracekumar/cp-tests/blob/master/test.sh" target="_blank">script</a> to run the tests.

    echo "source directory size"
    du -sh /Users/kracekumarramaraju/code
    echo "cp.py - Python without gevent"
    time python cp.py /Users/kracekumarramaraju/code /Volumes/My\ Passport/test/1
    du -sh /Volumes/My\ Passport/test/1
    echo "cp-gevent.py - Python with gevent"
    time python cp-gevent.py /Users/kracekumarramaraju/code /Volumes/My\ Passport/test/2
    du -sh /Volumes/My\ Passport/test/2
    echo "alias cp='rsync --progress -ah' - Rsync"
    time cp -R /Users/kracekumarramaraju/code /Volumes/My\ Passport/test/3
    du -sh /Volumes/My\ Passport/test/3
    echo "Plain cp command"
    time /bin/cp -R /Users/kracekumarramaraju/code /Volumes/My\ Passport/test/4
    du -sh /Volumes/My\ Passport/test/4
    echo "cp.go - cp in Go lang"
    time go run cp.go /Users/kracekumarramaraju/code /Volumes/My\ Passport/test/5
    du -sh /Volumes/My\ Passport/test/5
    echo "cp.lua - cp in lua"
    time lua cp.lua /Users/kracekumarramaraju/code /Volumes/My\ Passport/test/6
    du -sh /Volumes/My\ Passport/test/6

_Results_

    ➜  cp-tests  ./test.sh
    source directory size
    300M    /Users/kracekumarramaraju/code
    cp.py - Python without gevent
    
    real    1m23.354s
    user    0m1.818s
    sys 0m5.032s
    302M    /Volumes/My Passport/test/1
    cp-gevent.py - Python with gevent
    
    real    1m24.212s
    user    0m1.772s
    sys 0m4.748s
    302M    /Volumes/My Passport/test/2
    alias cp='rsync --progress -ah' - Rsync
    
    real    1m21.145s
    user    0m0.230s
    sys 0m5.172s
    302M    /Volumes/My Passport/test/3
    Plain cp command
    
    real    1m24.065s
    user    0m0.232s
    sys 0m5.174s
    302M    /Volumes/My Passport/test/4
    cp.go - cp in Go lang
    2013/06/23 21:04:38 Files copied.
    cp.go completed
    
    real    1m27.786s
    user    0m1.106s
    sys 0m3.369s
    302M    /Volumes/My Passport/test/5
    cp.lua - cp in lua
    
    real    1m19.340s
    user    0m1.905s
    sys 0m3.893s
    302M    /Volumes/My Passport/test/6

_Conclusion_

1.   Surprisingly `` lua `` was the fastest with `` 1m19.340s ``, and the next was `` rsync `` with `` 1m21.145s ``.
2.   The slowest was `` go `` with `` 1m27.786s ``; I expected it to be faster than the `` python gevent `` version. The extra time was probably spent compiling, since `` go run `` compiles the code on every invocation.
3.   The `` Python `` non-`` gevent `` version took `` 1m23.354s `` and the `` gevent `` version took `` 1m24.212s ``; the `` gevent `` version spent less time in `` user `` and `` system `` space.
4.   The `` cp `` command took `` 1m24.065s ``, which was the second slowest.
5.   Since the test was basically I/O bound, there isn't much difference in speed across the versions.

_Further work_

1.   Benchmark a 1GB single file transfer using `` lua `` and `` rsync ``.
2.   Add all the features of the `` cp `` command to one of the implementations and benchmark it.

- [Reliance filed 420 case against me in Delhi consumer court](https://kracekumar.com/post/53582319439/reliance-filed-420-case-against-me-in-delhi/index.md): Reliance filed 420 case against me in Delhi consumer court

+++
date = "2013-06-22 08:11:00+00:00"
draft = false
tags = ["reliance", "consumercourt"]
title = "Reliance filed 420 case against me in Delhi consumer court"
url = "/post/53582319439/reliance-filed-420-case-against-me-in-delhi"
+++
_22, June 2013_

I woke up like any other morning thinking of closing a github issue. I went out to pick up food items for the weekend; once I was back my friend said, "You got a call from a Tamil Nadu number".

I returned the call and the person said "I am calling from Delhi consumer court", confirmed my name and the address where I stayed in Feb 2009, added that RC Gowtham has filed a 420 case against me (dc801/2013, couldn't find the status in <a href="http://164.100.72.12/ncdrcusersWeb/login.do?method=caseStatus" target="_blank">http://164.100.72.12/ncdrcusersWeb/login.do?method=caseStatus</a>) and gave his phone number.

I called RC Gowtham and asked for details. The reason is that I haven't paid my Reliance bill, and he gave me the Reliance ID. He hung up saying "The case will be in Delhi consumer court on Monday".

Then I rushed to a nearby Reliance Communications outlet and enquired about the details against the Reliance ID. I heard that a whopping amount of 4435 is pending, but the actual bill must be 500+.

Then I spoke to customer care and found out the bill was generated in February 2009 and the amount was 540. Now with the penalty I need to pay 4435.

I remember paying the bill and properly terminating the connection by returning the equipment in Peelamedu, Coimbatore. In March(update: 06/02/2009) we shifted our house.

_1.00 PM_

I spoke to the person from Delhi who called me up earlier. I am supposed to pay the amount by 3.00 PM and pass on the bill details, else I have to fight the case in Delhi consumer court. It seems the person who called me might be hired by Reliance.

_Question?_

1.   Why hasn't Reliance called me in the last 4 years to collect the bill? I haven't changed my number in the past 6 years.
2.   Is there a way I can fight this?
3.   How can a person from Coimbatore, living in Bangalore, fight a case in a Delhi court?

Apart from paying the money, I don't see any option to exit from this problem. Any thoughts?

_Updates_

1.   I have a complaint filed against the disconnection on 4th February 2009, complaint no: 88025881.
2.   Customer care couldn't find the status or resolution for the complaint since it's 4 years old.
3.   I have reopened the thread. Hope this time I get a reply.
4.   At 3.05 PM I got the call from the person and I said "My lawyer will handle this". He replied "The case is on Monday morning".

- [progress bar for cp command](https://kracekumar.com/post/52958861147/progress-bar-for-cp-command/index.md): progress bar for cp command

+++
date = "2013-06-14 17:48:41+00:00"
draft = false
tags = ["linux", "command line"]
title = "progress bar for cp command"
url = "/post/52958861147/progress-bar-for-cp-command"
+++
I want the `` cp `` command to display the progress of the copy. I found three options:

1.   Use tools like `` gcp ``, `` pv ``. Answer from <a href="http://askubuntu.com/questions/17275/progress-and-speed-with-cp" target="_blank">askubuntu</a>.
2.   Use `` rsync ``. `` alias cp='rsync --progress -ah' ``
3.   Use this <a href="https://chris-lamb.co.uk/posts/can-you-get-cp-to-give-a-progress-bar-like-wget" target="_blank">shell script</a>.

- [Little spoof of Kannukku Mai Azhagu lyrics ](https://kracekumar.com/post/52686134804/little-spoof-of-kannukku-mai-azhagu-lyrics/index.md): Little spoof of Kannukku Mai Azhagu lyrics 

+++
date = "2013-06-11 04:40:52+00:00"
draft = false
tags = ["lyrics"]
title = "Little spoof of Kannukku Mai Azhagu lyrics "
url = "/post/52686134804/little-spoof-of-kannukku-mai-azhagu-lyrics"
+++
Today my sister was cleaning her contact lens in front of the mirror. I suddenly remembered the <a href="http://www.youtube.com/watch?v=a6_ZBbIhXM4" target="_blank">Tamil song Kannukku Mai Azhagu</a> and its <a href="http://www.paadalvarigal.com/940/kannukku-mai-azhagu-male-pudhiya-mugam-song-lyrics.html" target="_blank">lyrics</a>, and the little poet in me whispered `` Kannukku Contact lens Azhagu ``. This induced me to spoof a little bit of the lyrics to match the current trend. I will spoof the lines I could.

    Kannukku **Contact lens** Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal **Kandalae** Azhagu
    Ilamaiku **photo** Azhagu, Muthumaikku Narai Azhagu
    Kalvarkku **phone number** Azhagu, Kaathalarkku **facebook** azhagu

The \*\* marked words are the spoofed ones.

## Original Version

    Kannukku Mai Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal Pen Azhagu..
    Kannukku Mai Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal Pen Azhagu..
    
    Kannukku Mai Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal Pen Azhagu..
    
    Ilamaiku Nadai Azhagu, Muthumaikku Narai Azhagu
    Kalvarkku Iravu Azhagu, Kaathalarkku Nilavazhagu
    Nilavukku Karai Azhagu, Paravaikku Siragazhagu
    Nilavukku Karai Azhagu, Paravaikku Siragazhagu
    Avvaiku Koon Azhagu, Annaikku Saei Azhagu
    
    Kannukku Mai Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal Pen Azhagu..
    Kannukku Mai Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal Pen Azhagu..
    
    Vidikaalai Vin Azhagu, Vidiyum Varai Pen Azhagu
    Nellukku Naatrazhagu, Thennaikku Keetrazhagu
    Orukku Aar Azhagu, Orvalathil Thaer Azhagu
    Orukku Aar Azhagu, Orvalathil Thaer Azhagu
    Thamizhukku Zha Azhagu, Thalaivikku Naan Azhagu..
    
    Kannukku Mai Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal Pen Azhagu..
    Kannukku Mai Azhagu, Kavithaikku Poi Azhagu
    Kannathil Kuzhi Azhagu, Kaar Koonthal Pen Azhagu..

- [Quora : I hate you for this](https://kracekumar.com/post/51741593651/quora-i-hate-you-for-this/index.md): Quora : I hate you for this

+++
date = "2013-05-30 19:24:13+00:00"
draft = false
tags = ["quora", "ux"]
title = "Quora : I hate you for this"
url = "/post/51741593651/quora-i-hate-you-for-this"
+++
I have been using quora for almost _two_ years. I have connected my `` Facebook, twitter, wordpress, tumblr `` accounts.

Unless you are logged into `` quora ``, you cannot read the posts; on the surface this is true. If you are a web geek, you know you can look into the page source, read the content, and bypass the login.

__The Problem__

Now I am logged into facebook in `` tab 1 ``, and `` tab 2 `` is loading a quora url. After a few seconds a dialog box appears and I am automatically logged into my quora account. Now I log out of quora and visit the quora url again: I am logged back into quora. I deleted the cookies and tried again, still the same. Unless I am logged out of facebook, I can't log out of quora.

This is F\*\*\*ING ANNOYING.

- [How much does it cost to spend 10 days in Mcleodganj](https://kracekumar.com/post/49797239467/how-much-does-it-cost-to-spend-10-days-in/index.md): How much does it cost to spend 10 days in Mcleodganj

+++
date = "2013-05-06 20:47:19+00:00"
draft = false
tags = ["mcleodganj"]
title = "How much does it cost to spend 10 days in Mcleodganj"
url = "/post/49797239467/how-much-does-it-cost-to-spend-10-days-in"
+++
At <a href="http://hasgeek.com" target="_blank">HasGeek</a> we decided to spend part of the summer in <a href="http://en.wikipedia.org/wiki/McLeod_Ganj" target="_blank">McleodGanj</a>. The trip's main focus was to code and enjoy the beauteous McleodGanj surroundings.

## Trip

We (<a href="https://twitter.com/jackerhack" target="_blank">Kiran</a>, <a href="https://twitter.com/hallidude" target="_blank">Supreeth</a>, <a href="http://sosaysharis.wordpress.com/" target="_blank">Haris</a>, <a href="https://www.facebook.com/void.imagineer?fref=ts" target="_blank">Praseetha</a> and me) started from Bangalore on 13 April 2013 and returned on 1 May 2013. I have uploaded the photos to <a href="https://www.facebook.com/media/set/?set=a.543823985669920.1073741826.100001269693478&amp;type=1&amp;l=45ec031f02" target="_blank">FaceBook</a>, along with <a href="https://www.evernote.com/shard/s207/sh/47dfff3e-23c0-47be-b49f-d26117bd6162/7310aff830011199402b63494d6b85db" target="_blank">notes</a> of the food items I had on the trip and the list of <a href="https://foursquare.com/kracetheking/list/mcleodganj" target="_blank">places</a> visited in McleodGanj.

Following is the breakup of the trip cost.

## 13, April 10:00PM, Yeshwantpur Railway Station, Bangalore,

*   Dinner : 60
*   Taxi fare: 50
*   Train Ticket cost: 1750

## 14, April - In Train crossing central India

*   Breakfast: 40
*   lunch: 100
*   Dinner: 80
*   Ice Cream: 20

## 15, April: Delhi

![Jama Masjid](https://66.media.tumblr.com/12b9c19b8f118c0cd6ff19b8ced30bb2/tumblr_inline_mme99vUKGc1qz4rgp.jpg)

*   Breakfast: 50
*   Metro Smartcard: 100
*   Rickshaw fare: 10
*   lunch : 150
*   Icecream: 85
*   Dinner: 85
*   Evening we started to Mcleodganj in Ac Volvo.
*   Bus fare: 800

## 16, April: First day in mcleodganj

![](https://66.media.tumblr.com/f71cfa3a28f7beceb06e95b3b79768fa/tumblr_inline_mme98oNSAq1qz4rgp.jpg)

*   Breakfast: 50
*   Tea: 15
*   Dinner: 160
*   Snacks (like biscuit, chocolate): 110

## 17, April: Small trek to air jaldi office in the evening

![](https://66.media.tumblr.com/d9323ce66fe8851e715ea16ee3dd5693/tumblr_inline_mme94uTjVA1qz4rgp.jpg)

*   Breakfast: 50
*   Butter Tea: 15
*   Lunch: 90
*   Ginger Honey lemon tea: 50
*   Dinner: 170

## 18, April: Evening walk to St.John’s Church

*   Breakfast: 65
*   Lunch: 70
*   Dinner: 180
*   Yak Yak Tshirt: 200
*   Coffee: 100

## 19, April: Walk to Bhagsu waterfalls

![Supreeth flying](https://66.media.tumblr.com/af336415ed3715a2317ff76bb7f539ff/tumblr_inline_mme93lLI1w1qz4rgp.jpg)

*   Breakfast: 60
*   Dinner: 200
*   Three pin holder: 80
*   Cheese potato omlette: 50
*   Pani Puri: 30

## 20, April: Work at leisure

*   Breakfast: 60
*   Lunch: 140
*   Dinner: 155

## 21, April: Walking to Dalai Lama Temple

*   Breakfast: 160
*   Lunch : 240
*   Complete Hindi Book: 350

## 22, April - Trek to Triund

![Triund](https://66.media.tumblr.com/848785d0104943cbad3bc1749fb0d280/tumblr_inline_mme8x3xu3o1qz4rgp.jpg)

*   Breakfast: 60
*   Snickers: 80
*   Lunch: 90
*   Tea: 30
*   Snacks: 75
*   Gloves: 250
*   Dinner: 130
*   Coffee: 40
*   Stay: 150 ( We stayed in a hut in Triund)
*   Snacks: 28

## 23, April - Trek to SnowLine, walk back to McleodGanj

![Mountains in Snow Line](https://66.media.tumblr.com/993cb75b97eac511769b6db3447f4735/tumblr_inline_mme8vqxJyS1qz4rgp.jpg)

*   Breakfast - 110
*   Coffee: 40
*   Lunch: 130
*   Snacks: 26
*   Dinner: 370 ( We had Exactly Dinner in Out Of Blue)
*   Taxi: 30

## 24, April - Relax

*   Breakfast: 100
*   Lunch: 285
*   Dinner: 250

## 25, April - Climbing Down to Dharmasala

*   Breakfast: 62
*   Laundry: 60
*   Mcleodganj stay(9 days): 2160

## 26, April - Leaving to Delhi in the evening

*   Dharmasala Stay: 500
*   Taxi: 110
*   Bus Fare(Dharmasala to Delhi): 800

## 27, April - Attending BarCamp

*   Metro card recharge: 100
*   Vad Pav, snacks: 400

## 28, April - Trip to Taj Mahal

*   Filter Coffee: 20
*   Lunch: 240
*   Gulfi ice: 10
*   Travel: 300
*   Taj Mahal car parking: 50
*   Dinner(basically chaats): 120
*   Agra fort + Taj Mahal ticket fare: 40

## 29, April - Last day in Delhi

*   Lunch: 432
*   Iron man 3 Movie: 100
*   chaat: 30
*   Dinner: 300

## 30, April - Marching back to home

*   Train fare: 2150 (Inclusive of food)

## 1, May - Home

*   Tips: 100
*   cab: 100
*   Bus Ticket fare: 13
*   Breakfast: 35

Total Cost: __16736__ (my budget was 15000). It was worth the money.

## Work progress

During the trip I was working on <a href="http://github.com/hasgeek/hacknight" target="_blank">Hacknight</a>: sent 9 pull requests, 6 merged to master, 3 pending approval; added 647 LOC and removed 110 LOC. We worked out of cafes and rooms with data cards and phone internet, without any high speed wifi connection. For the first 4 days, I was unable to work because of my bad internet data card.

<a href="http://github.com/jace" target="_blank">Kiran</a> made better progress, was able to hold his github streak.

# _Note_

*   Amounts expressed are in Indian Rupees.
*   In Delhi we got accommodation and food at a friend's place.
*   We shared two rooms in Mcleodganj, so it was cheap.
*   We tasted Italian, Tibetan, Indian, Japanese, Korean, Israeli, French etc …
*   Butter tea was my favourite of the trip.
*   Dogs were synonymous with cafes.

Hope to visit the place again.

- [hardest feature request ](https://kracekumar.com/post/47806200096/hardest-feature-request/index.md): hardest feature request 

+++
date = "2013-04-12 21:12:08+00:00"
draft = false
tags = ["javascript", "facepalm", "iframe"]
title = "hardest feature request "
url = "/post/47806200096/hardest-feature-request"
+++
I was working on an <a href="http://hasgeek.tv" target="_blank">Hgtv</a> feature: syncing `` slides `` and `` videos ``, so that as the video plays the slide changes automatically. Seems easy, but guess what: someone has to take the pain to watch the entire video and note down the video timings and corresponding slide numbers, then pass the info on to <a href="https://github.com/ffissore/presentz.js" target="_blank">presentz.js</a>, which syncs the video and slides.

## Iframe

If you look into the source code of how the slide show and video are inserted, it is an iframe. All videos are from `` youtube ``; slides are from `` speakerdeck `` or `` slideshare ``. To sync video and slides, I need to fetch the slide number and the current time of the video. `` Youtube `` has a js api, which was easy to figure out, but speakerdeck and slideshare insert images into the iframe, and when the next button is clicked the image is changed. If I could access the DOM I would be done, but unfortunately you cannot access the DOM of a cross-origin iframe. I found this out after one whole day of tinkering and trying all the answers on stackoverflow.

Then I looked into how presentz.js handles slide changing. Speakerdeck receives postMessage and accepts `` nextSlide ``, `` previousSlide ``, `` goToSlide `` messages. Once speakerdeck has processed a message, it sends a message back to the originator, and the received message has to be processed (`` window.addEventListener ``). Before figuring out the above messages I was brute forcing my way to get the current slide; once I figured out it only accepts three messages, it was easy. Below is the code.

    var receiveMessage = function(event){
        var data;
        if (event.origin.indexOf("speakerdeck.com") === -1){
            if (event.origin.indexOf("slideshare.net") === -1){
                return;
            }
        }
        data = $.parseJSON(event.data);
        if (data[1]) {
            slideno = data[1].number;
        }
    };

Then register the event listener:

    window.addEventListener("message", receiveMessage, false);

Then slideshare was a bit off track. Slideshare has a js api which requires swfobject and swfobject\_playerapi. That made it a bit easier, but it still took me some time to figure out that the missing `` swfobject_playerapi `` was required.

It took me five days to finally get this feature done, and the <a href="https://github.com/hasgeek/hasgeek.tv/pull/86" target="_blank">pull request</a> is ready.

## Learning:

*   I learned coffeescript.
*   Parsing the DOM of a cross-origin iframe is not possible.
*   `` postMessage ``

This feature was so far the hardest one.

## Root Cause

The root cause of this problem is my __Ass u mption__. I was trying to use presentz.js to get the current slide and the video player timings.

- [coding from balcony](https://kracekumar.com/post/46695528573/coding-from-balcony/index.md): coding from balcony

+++
date = "2013-03-30 20:04:57+00:00"
draft = false
tags = ["coding", "experiment"]
title = "coding from balcony"
url = "/post/46695528573/coding-from-balcony"
+++
I live on the 5th floor and my table is near the window. I spent Saturday watching 3 documentaries and 1 Tamil movie, <a href="http://en.wikipedia.org/wiki/Sindhu_Bhairavi_(film)" target="_blank">Sindhu Bhairavi</a>; yes, not a single line of code. Once I was done with the movie it was 00:30 AM. I stepped into the balcony and was mesmerized by the breeze. I felt like a poet with casual thoughts.![](https://66.media.tumblr.com/60daab28929ba73e5ba12731fb6b90cb/tumblr_inline_mkhoixP3t01qz4rgp.jpg)

Yes, the breeze indeed brought a new thought seed from a distant place: __"How about coding from the balcony"__. No second thought: cleaned up, set up, all ready. Truly great to sit in a bean bag with a laptop, hearing a Tamil song, a random stray dog barking, car sounds and a cool breeze. ![](https://66.media.tumblr.com/187c2f9fa08a0ae58e70963d1af9c7d4/tumblr_inline_mkhoxg2Xlq1qz4rgp.jpg)

All set to code :-)

- [Avvaiyar now International Icon](https://kracekumar.com/post/46334740169/avvaiyar-now-international-icon/index.md): Avvaiyar now International Icon

+++
date = "2013-03-26 13:33:09+00:00"
draft = false
tags = ["Auvaiyar", "Tamil"]
title = "Avvaiyar now International Icon"
url = "/post/46334740169/avvaiyar-now-international-icon"
+++
கற்றது கைமண் அளவு, கல்லாதது உலகளவு - ஔவையார்.

Yes, you must have read this in the Tamil textbook in standard 1. Now it has been translated into English and referenced by <a href="http://www.nasa.gov/audience/foreducators/informal/features/F_Cosmic_Questions_prt.htm" target="_blank">NASA</a>. From the NASA Cosmic Questions Exhibit:

    What we have learned
    Is like a handful of earth;
    What we have yet to learn
    Is like the whole world
        - Auvaiyar, 4th C poet, India  

<a href="http://en.wikipedia.org/wiki/Avvaiyar" target="_blank">Wikipedia</a> has an article about Avvai paatti.

- [How much will it cost to attend Hacker School ?](https://kracekumar.com/post/46275902528/how-much-will-it-cost-to-attend-hacker-school/index.md): How much will it cost to attend Hacker School ?

+++
date = "2013-03-25 20:41:28+00:00"
draft = false
tags = ["hackerschool", "money"]
title = "How much will it cost to attend Hacker School ?"
url = "/post/46275902528/how-much-will-it-cost-to-attend-hacker-school"
+++
<a href="https://www.hackerschool.com/" target="_blank">Hacker School</a> is a three-month, full-time school in New York for becoming a better programmer for free, but stay, travels is yours.

I had no idea how much it would cost for travel from India, stay, food, internet and transit, so I asked the question on <a href="http://www.quora.com/Hacker-School/How-much-does-it-cost-to-live-in-NY-for-three-months" target="_blank">Quora</a>. I got pretty good answers.

Then I started to do my lame math.

## Monthly Expense(USD)

     Rent = 1000
     Phone = 80
     Transit = 100
     Internet = 40
     Electricity = 40
     Food = 250
     Snacks = 100
     Outing = 120
     Misc = 100

Total = 1830.

So let me round to `` 2000 USD `` for safety.

If I am right I need to pay an advance for the room, so let me add one month's rent to the total expense: `` 7000 USD `` now.

A two way flight will cost `` 60000 INR `` and `` 10000 INR `` for the visa. Assuming `` 1 USD = 55 INR ``, `` 70000 INR = 1272.73 USD ``.

`` 7000 USD is 385000 INR ``. So the total money required for me is `` 455000 INR ``.
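
To double check the arithmetic, here is the same back-of-the-envelope math as a small Python snippet (a sketch; all figures are the estimates from this post):

    # Monthly expense estimate in USD, figures from the post.
    monthly = {'rent': 1000, 'phone': 80, 'transit': 100, 'internet': 40,
               'electricity': 40, 'food': 250, 'snacks': 100, 'outing': 120,
               'misc': 100}
    per_month = sum(monthly.values())       # 1830, rounded up to 2000 for safety

    stay_usd = 2000 * 3 + 1000              # three months + one month advance rent = 7000
    travel_inr = 60000 + 10000              # two way flight + visa = 70000
    total_inr = stay_usd * 55 + travel_inr  # at 55 INR per USD

    print per_month, stay_usd, total_inr    # 1830 7000 455000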

__Q__: How long will it take to save so much money? __A__: 30 to 36 months.

After 3 years the USD to INR rate will likely be higher, and US food prices and rent will have increased by at least 2% to 5%. So it will be 7350 USD.

My guess is 1 USD will be 60 INR in 2015. `` 7350 * 60 = 441000 ``. Wow, to spend 3 months at Hacker School I need to save money for 3 years.

Will my interest hold up ?

`` Time and Tide waits for none ``.

- [Evaluate python code using client side javascript  ](https://kracekumar.com/post/46183195282/evaluate-python-code-using-client-side-javascript/index.md): Evaluate python code using client side javascript  

+++
date = "2013-03-24 19:24:08+00:00"
draft = false
tags = ["python", "emscripten", "empythoned", "javascript"]
title = "Evaluate python code using client side javascript  "
url = "/post/46183195282/evaluate-python-code-using-client-side-javascript"
+++
Now Python code can be evaluated using client side Javascript with the help of the <a href="https://github.com/replit/empythoned/" target="_blank">empythoned</a> project. `` empythoned `` uses <a href="https://github.com/kripken/emscripten/wiki" target="_blank">emscripten</a>, which converts `` LLVM bitcode `` to `` javascript ``.

## What is empythoned?

`` Empythoned `` is a project which has converted `` CPython `` to `` javascript ``. I have created a demo project to test how to use `` empythoned ``; have a <a href="https://github.com/kracekumar/test-empythoned" target="_blank">look</a>.

- [Python parallel assignment](https://kracekumar.com/post/45502206532/python-parallel-assignment/index.md): Python parallel assignment
+++
date = "2013-03-16 15:31:21+00:00"
draft = false
tags = ["python", "variable", "parallel-assignment"]
title = "Python parallel assignment"
url = "/post/45502206532/python-parallel-assignment"
+++
Python supports parallel assignment like

    >>> lang, version = "python", 2.7
    >>> print lang, version
    python 2.7

Values are assigned to each variable without any issues.

    >>> x, y, z = 1, 2, x + y
    Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    NameError: name 'x' is not defined

First Python tries to evaluate the `` x + y `` expression. Since `` x `` and `` y `` are being defined on the same line, Python cannot access them yet, so `` NameError `` is raised.

    >>> x, y = 1, 2
    >>> z, a = x + y, 65
    >>> print x, a
    1 65

In the above code `` x `` and `` y `` were assigned on an earlier line, so `` x + y `` is evaluated and the value is assigned to `` z ``.
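
This also explains why Python's idiomatic swap works: the entire right-hand side is evaluated before any name on the left is bound (a quick sketch):

    >>> x, y = 1, 2
    >>> x, y = y, x + y   # RHS is computed as (2, 3) using the old x and y
    >>> print x, y
    2 3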

__So don't assign a variable and use it in an expression on the same line.__

- [Setting up privoxy proxy server for browsing](https://kracekumar.com/post/41956153813/setting-up-privoxy-proxy-server-for-browsing/index.md): Setting up privoxy proxy server for browsing
+++
date = "2013-01-31 18:21:00+00:00"
draft = false
tags = ["ubuntu", "proxy", "privoxy"]
title = "Setting up privoxy proxy server for browsing"
url = "/post/41956153813/setting-up-privoxy-proxy-server-for-browsing"
+++
I wanted to set up a proxy server for browsing. Tried <a href="http://www.squid-cache.org/" target="_blank">http://www.squid-cache.org/</a>; it felt cumbersome to configure, though it has advanced features.

Finally decided to set up <a href="http://www.privoxy.org/" target="_blank">http://www.privoxy.org/</a>. I assume you have a personal server to which all the requests are forwarded.

__Installation__

    sudo apt-get install privoxy

__Server Config__

    sudo vim /etc/privoxy/config

Look for _listen-address_ and add your ip:port: `` listen-address 78.12.204.2:8118 ``

To enable logging for all requests, uncomment _debug 1_ (you will need to rotate the log file using a cronjob). Done with the server config.

__Client Config__

Firefox users on ubuntu: _Edit -> Preferences -> Network -> Connection -> Settings_, choose manual proxy and enter the ip address and port.

Chromium/Chrome uses the system proxy: _System -> Network_, look for _Configure HTTP Proxy_ and enter the details.

For command line use, add `` HTTP_PROXY=http://ip:port `` to _~/.bashrc_.
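
To quickly verify the proxy is working, a one-off check with the python `` requests `` library (a sketch; substitute your server's ip:port):

    import requests

    # Route an http request through the privoxy server configured above.
    proxies = {'http': 'http://78.12.204.2:8118'}
    response = requests.get('http://example.com', proxies=proxies)
    print(response.status_code)  # 200 means the proxy forwarded the request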

- [2 weeks after installing Ubuntu 12.04 64bit in MacBook](https://kracekumar.com/post/37344856282/2-weeks-after-installing-ubuntu-1204-64bit-in/index.md): 2 weeks after installing Ubuntu 12.04 64bit in MacBook
+++
date = "2012-12-06 19:08:07+00:00"
draft = false
tags = []
title = "2 weeks after installing Ubuntu 12.04 64bit in MacBook"
url = "/post/37344856282/2-weeks-after-installing-ubuntu-1204-64bit-in"
+++
It's been two weeks since I installed <a href="http://www.youtube.com/playlist?list=PL279M8GbNseskfSem7bQD4U0eSOjCEBhK" target="_blank">Ubuntu 64 bit</a> on my MacBook. Let me say how I feel about it.

My setup isn’t complicated, I use `` sublime2, terminator, gnome-terminal, ipython, pypy-2.0beta, postgres, mosh, firefox-nightly, chromium, chrome, clementine, xchat-gnome, hotot(recent addition) ``.

Also installed the `` nvidia `` drivers for ubuntu. I am happy I installed ubuntu, but I had an issue with right click and figured out `` shift + F10 `` is the shortcut.

## Hate Hate Hate

If you are using a data-card you must have noticed the default `` modem-manager `` is buggy. If I suspend my system for 10 hours and try to activate the device: `` no response ``. You kill the process and restart it, it doesn't work, so I need to restart the machine. Really didn't find a solution for it :-(.

Occasionally after resuming from suspend mode the system load is 100%. I don't know whether the reason is `` firefox nightly `` or `` hotot `` (it had a lot of memory leaks in the past); the only way to find out firefox's memory usage is `` about:memory ``, which doesn't display memory usage tab wise, only component wise.

I like the Mac keyboard and its weight compared to my Toshiba, but I'm not sure my next laptop will be a MacBook.

- [Stable Browser vs Fastest Browser](https://kracekumar.com/post/36602101124/stable-browser-vs-fastest-browser/index.md): Stable Browser vs Fastest Browser

+++
date = "2012-11-26 18:33:11+00:00"
draft = false
tags = ["firefox", "chrome", "chromium", "GNU/Linux"]
title = "Stable Browser vs Fastest Browser"
url = "/post/36602101124/stable-browser-vs-fastest-browser"
+++
## Firefox Vs Chrome Vs Chromium

Chrome &amp; Firefox are my current favourite browser. I was previously using chromium but stopped using because of this <a href="https://bugs.launchpad.net/ubuntu/+source/chromium-browser/+bug/1054557" target="_blank">bug</a>.

Since then I switch between chrome and firefox. Chrome is the fastest, with better UI &amp; UX; Firefox is stable.

Today I faced an issue with the flash player in chrome (Ubuntu 64 bit OS). All youtube videos start playing at 2x - 3x speed; after a few minutes the video plays at normal speed. This has been really annoying me a lot, whereas there is no such issue in Firefox.

## Stable vs Fastest

I never faced any weird issues in firefox like those in Chrome or Chromium, but what bothers me about Firefox is speed.

Firefox's memory management is entirely different from chrome's. Every tab in chrome is a separate process, whereas in firefox tabs share the same memory; as a result, switching between tabs and the response time of firefox isn't as fast as chrome.

The other reason I love Chromium/chrome is that writing extensions/addons is dead simple.

Now I am confused whether to use the stable browser or the fastest browser.

- [Can PyPy be used for web application deployment ?](https://kracekumar.com/post/36532666649/can-pypy-be-used-for-web-application-deployment/index.md): Can PyPy be used for web application deployment ?
+++
date = "2012-11-25 20:11:00+00:00"
draft = false
tags = ["pypy", "python", "deployment"]
title = "Can PyPy be used for web application deployment ?"
url = "/post/36532666649/can-pypy-be-used-for-web-application-deployment"
+++
## What is PyPy ?

*   PyPy is an implementation of Python in Python which uses JIT (Just In Time) compilation.

## Why to Use PyPy?

*   According to <a href="http://speed.python.org" target="_blank">benchmarks</a> “It depends greatly on the type of task being performed. The geometric average of all benchmarks is 0.18 or 5.6 times faster than CPython”.

## Experience?

I have used PyPy for sandboxing in my project <a href="http://github.com/kracekumar/pylive" target="_blank">pylive</a>, tested flask with pypy, ported brubeck to work on <a href="https://github.com/kracekumar/brubeck/blob/pypy/pypy.md" target="_blank">pypy</a>, and tested <a href="https://github.com/hasgeek/lastuser" target="_blank">lastuser</a> on PyPy 2.0beta1. In the experiment `` pypy-sandbox, requests, flask, werkzeug, jinja2, SQLAlchemy(postgres + sqlite), cython, greenlet, eventlet, markdown, gunicorn, dictshield, json, zmq `` were tested.

If a library depends on C extensions, it more than likely needs a rewrite using <a href="http://cffi.readthedocs.org/en/latest/" target="_blank">cffi</a>. Here is the bummer: you can't use your `` requirements.txt `` to directly deploy with `` PyPy ``. For example, the `` pyzmq `` library is written in C, so you can't run it on PyPy; one needs to use <a href="https://github.com/felipecruz/zmqpy" target="_blank">zmqpy</a> or <a href="https://github.com/svpcom/pyzmq-ctypes" target="_blank">pyzmq-ctypes</a> instead.
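
To give a flavour of `` cffi `` (a minimal sketch, not from any zmq port; it binds `` strlen `` from the C standard library):

    # Declare the C function's signature, load the C library, call it.
    from cffi import FFI

    ffi = FFI()
    ffi.cdef("size_t strlen(const char *);")
    libc = ffi.dlopen(None)  # None loads the standard C library on POSIX

    print(libc.strlen("pypy"))  # prints 4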

## Benchmarks

<a href="https://gist.github.com/4137006" target="_blank">Benchmark</a> of `` brubeck `` after porting shows `` PyPy `` din’t boost performance. This could be due to following reasons.

1.   `` gevent `` doesn't run on PyPy yet. :-(
2.   The `` zmq `` library blocks, and `` eventlet `` support for zmq is <a href="https://github.com/nanonyme/eventlet-pypy/issues/1" target="_blank">buggy</a>.
3.   `` ujson `` vs PyPy's built-in `` json `` performance.

## Does anyone use PyPy in production?

1. <a href="http://www.quora.com/Alex-Gaynor/Posts/Quora-is-now-running-on-PyPy" target="_blank">Quora runs on pypy</a> but no updates recently.


2. Twisted site run on PyPy.


3. There is a scientfic telescope project that runs on pypy(couldn’t locate the link).



## My take

1.   The `` Postgres driver `` runs without any issues; happy about that.
2.   Waiting for `` gevent `` support; after that there will be significant progress in brubeck's speed and adoption.
3.   All my personal projects will be tested against pypy. Yes, hopes are becoming true.

- [How to run Python Linux commands in PyPy ?](https://kracekumar.com/post/36529827327/how-to-run-python-linux-commands-in-pypy/index.md): How to run Python Linux commands in PyPy ?
+++
date = "2012-11-25 19:36:19+00:00"
draft = false
tags = ["PyPy", "apt-get"]
title = "How to run Python Linux commands in PyPy ?"
url = "/post/36529827327/how-to-run-python-linux-commands-in-pypy"
+++
I have been using PyPy since 1.6. Now PyPy 2.0beta1 is out. Most python libraries which don't have `` c extensions `` as dependencies work (exceptions exist).

E.g: `` requests, fabric, gunicorn ``

## Install PyPy2.0beta1

1.   Grab the 32 or 64 bit <a href="http://pypy.org/download.html" target="_blank">pypy2.0 beta1</a>. If you are using ubuntu, grab the `` libc2.15 `` build.
2.   `` bzip2 -d pypy-2.0-beta1-linux64-libc2.15.tar.bz2 ``
3.   `` tar -xvf pypy-2.0-beta1-linux64-libc2.15.tar ``
4.   <code>curl -O <a href="http://python-distribute.org/distribute_setup.py" target="_blank">http://python-distribute.org/distribute_setup.py</a></code>
5.   <code>curl -O <a href="https://raw.github.com/pypa/pip/master/contrib/get-pip.py" target="_blank">https://raw.github.com/pypa/pip/master/contrib/get-pip.py</a></code>
6.   `` ./pypy-2.0-beta1/bin/pypy distribute_setup.py ``
7.   `` ./pypy-2.0-beta1/bin/pypy get-pip.py ``
8.   `` ./pypy-2.0-beta1/bin/pip install virtualenv ``
9.   Please feel free to create an `` alias ``. Here is a sample:

        alias pypy2b="/home/kracekumar/downloads/pypy-2.0-beta1/bin/pypy"
        alias pypy2b-pip='/home/kracekumar/downloads/pypy-2.0-beta1/bin/pip'
        alias pypy2b-venv='/home/kracekumar/downloads/pypy-2.0-beta1/bin/virtualenv'

##  Next ?

1.   `` pypy2b-pip install fabric ``
2.   Add the `` pypy bin `` directory to `` $PATH ``: open `` ~/.bashrc ``, add `` export PATH=$PATH:/home/kracekumar/downloads/pypy-2.0-beta1/bin `` and run `` source ~/.bashrc ``.
3.   Check which `` fab `` gets picked up:

        kracekumar@aadukalam:~/codes/python-wiktionary$ which fab
        /home/kracekumar/downloads/pypy-2.0-beta1/bin/fab


4.   Contents of the `` fab `` script:

        kracekumar@aadukalam:~/codes/python-wiktionary$ cat `which fab`
        #!/home/kracekumar/downloads/pypy-2.0-beta1/bin/pypy
        # EASY-INSTALL-ENTRY-SCRIPT: 'Fabric==1.5.1','console_scripts','fab'
        __requires__ = 'Fabric==1.5.1'
        import sys
        from pkg_resources import load_entry_point

        if __name__ == '__main__':
            sys.exit(
                load_entry_point('Fabric==1.5.1', 'console_scripts', 'fab')()
            )

So `` fab `` now uses pypy interpreter rather than CPython.

But there are other commands like `` history, gwibber-service `` which are installed via `` apt-get `` and can't use PyPy, since `` apt-get `` doesn't support `` pypy ``, as it is considered unstable.

## Thoughts

*   There should be a way for a developer to specify, in the debian/ubuntu repo, to look for PyPy in the user installation and otherwise use CPython.
*   `` apt-get `` is written in python; if distros ship `` pypy `` and `` apt-get `` works well on it, there is a chance `` PyPy `` will become mainstream in linux packaging.

- [Why did I install Ubuntu in MacBook ?](https://kracekumar.com/post/36222425955/why-did-i-install-ubuntu-in-macbook/index.md): Why did I install Ubuntu in MacBook ?

+++
date = "2012-11-21 18:43:00+00:00"
draft = false
tags = ["linux", "macbook", "osx"]
title = "Why did I install Ubuntu in MacBook ?"
url = "/post/36222425955/why-did-i-install-ubuntu-in-macbook"
+++
## Reasons

*   Mac doesn't have a package manager like `` synaptic ``. But you have the App Store for apps.
*   You need `` xcode `` or `` gcc-installer `` to build any cocoa app.
*   I use a USB data card, and as a result I don't have the privilege of a high speed broadband connection; as I have noticed, OSX apps are heavy downloads.
*   Linux fanboy :-)

- [Post PC OS Hate](https://kracekumar.com/post/35495203505/post-pc-os-hate/index.md): Post PC OS Hate

+++
date = "2012-11-11 17:40:36+00:00"
draft = false
tags = ["rant", "android-hate"]
title = "Post PC OS Hate"
url = "/post/35495203505/post-pc-os-hate"
+++
Smartphones &amp; Tablets are new plastic papers. Android and IOS dominates the market. I own Android Phone because its cheap. When it comes to Gadgets I am conservative. I really don’t like Android much because its written in Java and need a better hardware to run Interpreted Languages, JS. Android runs from low end cheap 4000Rs phone to Nexus Phone. So wide variety of device headaches to developers.

Every Android phone is coated with a layer of the device maker's code, same as with Linux distros. On a Desktop/Laptop I really want to use Linux: give me a MacBook and the first thing I will do is (try to) install a Linux distro.

I don't have such love for android or any of the current set of smart phone OSes, but I am attracted to B2G because it's the web.

Let me wait a year and see.

- [Microframeworks produces micro level progress in project](https://kracekumar.com/post/35493348288/microframeworks-produces-micro-level-progress-in/index.md): Microframeworks produces micro level progress in project

+++
date = "2012-11-11 17:14:24+00:00"
draft = false
tags = ["flask", "microframework", "python"]
title = "Microframeworks produces micro level progress in project"
url = "/post/35493348288/microframeworks-produces-micro-level-progress-in"
+++
I created my first web app in PHP 5.2 with \* no frameworks \*, then learned drupal and tried codeigniter and joomla. Then I learned Rails for the HappySchool project, and learned Django since I am a python lover. I tried Pylons, settled on Flask, and am experimenting with brubeck.

Flask is a microframework built around werkzeug datastructures.

# Advantages Vs Disadvantages

*   Learn in depth working of HTTP vs Time consuming
*   Opportunity to create library vs Time consuming
*   Less batteries available vs More development time

It is highly loosely coupled, which makes it easy to replace parts with the best tools available. Not suited for everyone. Unless you are ready to explore/headdesk/discover/ship/reship/learn/hack, _DON'T use_ a microframework; choose a full blown framework like Django/Rails.

Remember, web development involves models, views, templates, helpers, UI. With a microframework you get only one or two levels above CGI; you need to figure out _session management, database toolkit, NoSQL toolkit, template engine, form handlers, asset manager, blah blah_.

- [Hackathons are to hack and * headdesk * moments](https://kracekumar.com/post/35492377794/hackathons-are-to-hack-and-headdesk-moments/index.md): Hackathons are to hack and * headdesk * moments

+++
date = "2012-11-11 17:00:40+00:00"
draft = false
tags = ["hackathon"]
title = "Hackathons are to hack and * headdesk * moments"
url = "/post/35492377794/hackathons-are-to-hack-and-headdesk-moments"
+++
I feel the whole idea of a hackathon is to build stuff overnight or in a day or two. All the outcomes of a hackathon are for good. As a beginner it could be a useful way to get started with a technology/language. I am biased towards prizes in hackathons.

- [What GNU/Linux Operating System lacks ?](https://kracekumar.com/post/34154528432/what-gnulinux-operating-system-lacks/index.md): What GNU/Linux Operating System lacks ?

+++
date = "2012-10-23 07:10:09+00:00"
draft = false
tags = ["GNU/Linux"]
title = "What GNU/Linux Operating System lacks ?"
url = "/post/34154528432/what-gnulinux-operating-system-lacks"
+++
I have been a GNU/Linux user for almost 4 years, and I don't switch back to windows for any day to day activity. That said, the GNU/Linux operating system lacks a lot of \* applications \*. There are lots of superior command line applications, but those are intended for religious command line humans.

I have been using mac osx 10.6 for 3-4 weeks; it seems to have all the applications, but I don't feel at `` ~ ``.

_Applications that need improvement_

1.   All mail clients like thunderbird and evolution suck big time. When you start connecting 4-5 accounts with more than 10000 emails, you slowly feel the pain. I have used mutt, but reading mailing list replies is painful.
2.   A native Twitter client. There are decent ones like Hotot, but it crashes often and I see segfaults &amp; coredumps. So I switched to Tweet Deck.
3.   A decent calendar application.

- [What I like about Python](https://kracekumar.com/post/33900573776/what-i-like-about-python/index.md): What I like about Python

+++
date = "2012-10-19 17:22:18+00:00"
draft = false
tags = ["python", "python-likes"]
title = "What I like about Python"
url = "/post/33900573776/what-i-like-about-python"
+++
<a href="https://twitter.com/roguelynn" target="_blank">Lynn Root </a> asked in twitter what you like and like to improve in python <a href="https://twitter.com/roguelynn/status/259338864664125440" target="_blank">https://twitter.com/roguelynn/status/259338864664125440</a>. Following are my observation

_Likes_

1.   Importance given to documentation.
2.   Clean syntax.
3.   Easy to get started for people from a non-CS background.
4.   Lots of smart programmers.
5.   Libraries like IPython, requests, flask.
6.   Creating libraries like pygments, sphinx, readthedocs to solve \* REAL \* problems.

_I would like/want to improve_

1.   python.org site
2.   While programmers are reading docs.python.org, code snippets should be executable right from the page (I have plans to start this as a personal project).
3.   Ship a real documentation server like godoc (I will tentatively start this work in the second week of november).
4.   Solve concurrency in the core of CPython.
5.   Make python a first class programming language on Windows.
6.   Better support for tablet, smart phone application development.

Apart from all the above, the python community is awesome (yes, my english is poor, you know that).

- [How Python makes learning simpler](https://kracekumar.com/post/31347683771/how-python-makes-learning-simpler/index.md): How Python makes learning simpler
+++
date = "2012-09-11 19:23:17+00:00"
draft = false
tags = []
title = "How Python makes learning simpler"
url = "/post/31347683771/how-python-makes-learning-simpler"
+++
Python is a simple language, developed with programmer productivity and code readability in mind. Learning a new language can be simple or complicated depending on the language's syntax, weirdness and many other factors.

It is universally accepted that the best way to learn a programming language is to write programs, and then rewrite them.

Python makes the learning curve easier: it has certain features which make it easy to learn right inside the interpreter.

*   `` help(object) ``: `` help `` is a function which takes an `` object `` or `` function `` and prints its documentation. `` help(2) `` will print the `` integer `` class properties and methods in a nicer format.
*   `` dir(object) ``: `` dir `` is a function that lists the methods associated with a particular object; `` dir('python') `` lists all the functions associated with a string. Dunder names like `` __add__ `` are magic methods and `` title `` is a normal method.
*   `` import os; os.__file__ ``: to find the location of a module, `` modulename.__file__ `` returns the path of the source file. Note certain modules are `` .py `` files, others are `` .pyc ``, a few are `` .so ``.



Next time you code in python, open your favorite text editor and a python interpreter side by side, try all the hacks in the interpreter and read the documentation.
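
For instance, a quick session (a sketch; the exact output and paths will differ on your machine):

    >>> help(len)           # prints the documentation for len
    >>> dir('python')[-3:]  # a few of the string methods
    ['translate', 'upper', 'zfill']
    >>> import os
    >>> os.__file__         # location of the module's source file
    '/usr/lib/python2.7/os.pyc'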

__IPython pro tip__: inside IPython, `` %psource modulename `` prints the source code of the module, which is useful in a lot of cases.

Find out what `` modulename.__all__ `` does

`` You're almost as happy as you think you are. ``

- [How I got into HasGeek Crew](https://kracekumar.com/post/26494437210/how-i-got-into-hasgeek-crew/index.md): How I got into HasGeek Crew

+++
date = "2012-07-04 15:32:00+00:00"
draft = false
tags = ["hasgeek", "python", "hiring", "kracekumar"]
title = "How I got into HasGeek Crew"
url = "/post/26494437210/how-i-got-into-hasgeek-crew"
+++
## Background about me

I am kracekumar, graduated from Amrita School of Engineering, Coimbatore with a B.Tech in IT (2007-2011). I am working with IBM India Pvt Ltd, Bangalore as an Associate System Engineer from 14th July, 2011 till 16th July, 2012 (a C\# developer, but I never wrote a single line of C\# code at IBM).

I have been a GNU/Linux user for 3 years and have developed applications in PHP, Rails and Flask (all hobby projects).

## Scene

I was not happy with my job at IBM and had a training bond for one year (14th July, 2011 to 13th July, 2012), so I decided to resign once the bond period was over, whether I had a new job offer or not. I usually look for job postings on the <a href="http://jobs.hasgeek.com" target="_blank">Hasgeek job board</a> from time to time. I was very interested in working with Linux, Python or Rails and wasn't interested in Java, C\# or Windows technology.

## I tweeted on 13th May

>  
> Seems like I couldn’t find out cool python company in india \#sad.
> 

Following are the replies I received in order (Ascending order of replies in twitter)

>  
> Mehul Ved ‏@mehulved
> 
> @kracetheking define cool python company? Maybe @versesane or @Sengupta can help. They have done some research on such companies.
> 

---
>  
> kracekumar ‏@kracetheking
> 
> @mehulved: Cool means geeky ppl who care for every line code of code. @versesane @Sengupta
> 

---
>  
> Mehul Ved ‏@mehulved
> 
> @kracetheking what about @Instamojo where @Sengupta and @hiway work? @versesane
> 

---
>  
> James Dennis ‏@j2labs
> 
> @kracetheking Have you met @sid\_chilling ? He’s even a Brubeck user!
> 

---
>  
> Ankur Gupta ‏@versesane
> 
> @kracetheking @mehulved @Sengupta bangalore has lot of good work in python happening …
> 

---
>  
> Ankur Gupta ‏@versesane
> 
> @kracetheking @mehulved @Sengupta best way to accumulate list of companies is to see archived list of opening on <a href="http://simplyhired.co.in" target="_blank">http://simplyhired.co.in</a>
> 

---
>  
> kracekumar ‏@kracetheking
> 
> @versesane @mehulved @Sengupta :I always search in hasgeek jobs but they want people who know python, but want to code in php.
> 

---
>  
> Ankur Gupta ‏@versesane
> 
> @kracetheking @mehulved @Sengupta well how many years of professional work ex do you have or what softwares have you worked on that ppl use
> 

---
>  
> kracekumar ‏@kracetheking
> 
> @j2labs: I have meet @_sunil_ during pycon india, 2011 who works along with @sid\_chilling,that time I haven’t discovered brubeck. thanks!
> 

---
>  
> sid\_chilling ‏@sid\_chilling
> 
> @kracetheking good to meet you. do contact me or @_sunil_ when you come to our city. would love to meet. thanks @j2labs
> 

---
>  
> Kiran Jonnalagadda ‏@jackerhack
> 
> @kracetheking We don’t advertise, but we could do with Python skills on board to
> 

Kiran is the founder of <a href="http://hasgeek.com" target="_blank">hasgeek</a>

---
>  
> kracekumar ‏@kracetheking
> 
> @versesane : Thanks for those links. /cc @mehulved @Sengupta
> 

---
>  
> Ankur Gupta ‏@versesane
> 
> @kracetheking i can introduce u to some hiring managers let us talk over email versesane at gmail
> 

---

All the above conversation happened on 13th May 2012.

On 14th May 2012

>  
> kracekumar ‏@kracetheking
> 
> @jackerhack : When can we have the talk ?
> 

---
>  
> Kiran Jonnalagadda ‏@jackerhack
> 
> @kracetheking I’m out of town this week. Can talk on IRC though. I’m jace in irc://irc.freenode.net/\#hasgeek.
> 

---

Then

<a href="https://twitter.com/#!/jackerhack" target="_blank">Kiran</a> and I had a conversation on IRC, where Kiran revealed hasgeek's hiring policy: contribute to the hasgeek codebase, and if the contribution is OK, you are hired. I agreed to the deal.

## Contributing to Hasgeek Codebase

From 14th May 2012 to 18th May 2012 I worked on <a href="https://github.com/hasgeek/hasgeek.tv" target="_blank">Hasgeek Tv</a> (<a href="https://github.com/kracekumar/hasgeek.tv" target="_blank">my contribution here</a>). Most of the code was already completed by kiran, but there were a few incomplete pieces which I finished.

On 21st May 2012, kiran and I decided to meet on 22nd May 2012 at CIS (Centre for Internet and Society). On 22nd May I was half an hour late to the place. Then Kiran, Siddarth and I had a discussion (not an interview) about each other. Kiran and siddarth informed me that they were working on a new project, <a href="http://hacknight.in" target="_blank">Hacknight</a>, to which they wanted me to contribute; depending on the outcome over three weeks, I might be hired.

## Journey

It was Kiran, Siddarth, <a href="https://twitter.com/#!/geohacker" target="_blank">Sajjad</a> and I who worked on hacknight; the code is <a href="https://github.com/hasgeek/hacknight" target="_blank">here</a>. I worked on the hacknight code in the evenings after coming back from the office, and on Saturdays at CIS. Most of our discussion was carried out at CIS (the Hasgeek office), on IRC and on Github.

I had an initial learning curve looking into other hasgeek libraries like <a href="https://github.com/hasgeek/coaster" target="_blank">Coaster</a>, <a href="https://github.com/hasgeek/baseframe" target="_blank">Baseframe</a>, <a href="https://github.com/hasgeek/lastuser" target="_blank">Lastuser</a>, <a href="https://github.com/jace/pydocflow" target="_blank">Peopleflow</a>. Thanks to Kiran, Siddarth and Sajjad, who answered all my silly doubts and helped me get started.

## My mistakes

I broke the code twice. Both times were due to improper tests.

*   <a href="https://github.com/hasgeek/baseframe/commit/43f08689017bb1e0dfeb4f9e7e5d4efc47e6af8d" target="_blank">Baseframe broken code</a>
*   <a href="https://github.com/hasgeek/hacknight/issues/42" target="_blank">Hacknight code which I broke</a>

## Climax

On 7th June, 2012 Kiran informed me that hasgeek was happy to hire me after a final talk on 9th June, 2012. All went well, and I said I was happy to join Hasgeek. I will be joining hasgeek on 17th July, 2012.

Finally, if you have read this far, you should probably look at the <a href="http://beta.hacknight.in/" target="_blank">Hacknight Beta Version</a> and feel free to talk to me on <a href="https://twitter.com/#!/kracetheking" target="_blank">Twitter</a>.

## Thanks

Thanks a ton to the following people who set up the entire atmosphere:

- [mehulved](https://twitter.com/#!/mehulved)
- [versesane](https://twitter.com/#!/versesane)
- [Sengupta](https://twitter.com/#!/Sengupta)
- [j2labs](https://twitter.com/#!/j2labs)
- [sid_chilling](https://twitter.com/#!/sid_chilling)

- [Fake Python switch statement](https://kracekumar.com/post/24826905040/fake-python-switch-statement/index.md): Fake Python switch statement

+++
date = "2012-06-10 18:28:39+00:00"
draft = false
tags = []
title = "Fake Python switch statement"
url = "/post/24826905040/fake-python-switch-statement"
+++
Python has no _switch_ statement.

What is a switch statement? A switch statement is an alternative to an `` if - elif - else `` chain.

_Example in C_

<pre class="prettyprint">

  int payment_status=1;
    switch(payment_status){
    case 1:
        process_pending_payment();
        break;
    case 2:
       process_paid();
        break;
    case 3:
        process_trans_failure();
       break;
    default:
       process_default();
}
</pre>

In _python_ we can achieve same behaviour using `` dict ``.

`` Fake switch statement in python ``

<pre class="prettyprint">
payment_functions = {
    1: process_pending_payment,
    2: process_paid,
    3: process_trans_failure
}
try:
    status =2
    payment_functions[status]()
except KeyError:
    process_default()
</pre>

In the above code `` payment_functions `` is a `` dict `` where each `` key `` is one of the possible values of `` status `` and the corresponding `` value `` is the function to be invoked (note that `` () `` is not present, so the function is only referenced, not called).

When we access the dict element we get the `` function `` object, and since `` () `` follows immediately, the function is called. If the `` key `` is absent, a `` KeyError `` exception is raised, so the `` default `` function is placed inside the except block.
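
A `` dict.get `` with a default does the same job without the try/except (a minimal sketch, not from the original post):

    # get returns process_default when status has no matching key,
    # and the trailing () invokes whichever function came back.
    payment_functions.get(status, process_default)()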

- [python `in` operator use cases](https://kracekumar.com/post/22512660850/python-in-operator-use-cases/index.md): python `in` operator use cases

+++
date = "2012-05-06 13:06:00+00:00"
draft = false
tags = ["python", "in operator"]
title = "python `in` operator use cases"
url = "/post/22512660850/python-in-operator-use-cases"
+++
Python's _in_ operator is a membership test operator. _Examples:_

## List

<pre class="prettyprint"><code>
In [1]: python_webframeworks = ['flask', 'django', 'pylons', 'pyramid', 'brubeck']

In [2]: 'flask' in python_webframeworks

Out[2]: True

In [3]: 'web.py' in python_webframeworks

Out[3]: False

</code></pre>

The _in_ operator iterates over the list of elements and returns _True_ or _False_.

_What about nested list?_

    In [4]: webframeworks = [['flask', 'django', 'pyramid'], ['rails', 'sintara'], ['zend', 'symfony']]

    In [5]: 'flask' in webframeworks
    Out[5]: False

_in_ isn't handy for a nested list, unless it is overridden.

## Dict

The _in_ operator against a dictionary checks for the presence of a key.

    In [7]: person = {'name': 'kracekumar', 'country': 'India', 'os': 'Linux',
       ...:           'programming_languages': {'web': 'php', 'multi_paradigm': ['python', 'ruby', 'java', 'c#']}}

    In [8]: 'name' in person
    Out[8]: True

_in_ doesn't check inside a key whose value is a dict.

    In [9]: 'web' in person
    Out[9]: False

In case a dict needs to look into keys whose values are dicts, you need to override `` __contains__ ``. I will explain at the end of the blog post; for now, let's create a class inherited from dict and try the same experiment.

    In [10]: class Person(dict):
       ....:     pass
       ....:

    In [11]: p = Person()

    In [12]: p
    Out[12]: {}

    In [13]: p['name'] = 'krace'

    In [14]: p
    Out[14]: {'name': 'krace'}

To keep things simple, `` __init__ `` and other `` magic methods `` are omitted.

    In [15]: 'name' in p
    Out[15]: True

## Set

    In [16]: conferences_attended = set(['Pycon - India', 'code retreat', 'JSFOO', 'Meta Refresh'])

    In [17]: 'Pycon - India' in conferences_attended
    Out[17]: True

## Generators

    In [18]: 2 in xrange(3)
    Out[18]: True

## Nested list

    In [19]: nested_list = [[1, 3, 5, 7], [2, 4, 6, 8], []]

    In [20]: [] in nested_list
    Out[20]: True

    In [21]: [2, 4, 6, 8] in nested_list
    Out[21]: True

In the case of a nested list, checking whether an inner list is empty is pretty handy.

## Strings

    In [23]: message = " Python is simple and powerful programming language "

    In [24]: "Python" in message
    Out[24]: True

    In [25]: message.find("Python")
    Out[25]: 1

    In [26]: message = "Python is simple and powerful programming language "

    In [27]: message.find("Python")
    Out[27]: 0

_in_ can be used to check for the presence of a sequence or substring in a string. In other languages there would be a function to check for a substring, but in python it is very straightforward. This is one of the reasons why I love Python.

__NOTE:__

Don't use the string `` find `` method to check for the existence of a substring, because it returns the position.

    if message.find(text):
        # lets do what we need to do
    else:
        # fall back

  
The above snippet is wrong, since the text might be present at the beginning, which returns 0 and makes the if condition fail.
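
The correct checks look like this (a minimal sketch):

    message = "Python is simple"
    text = "Python"

    if text in message:             # idiomatic membership test
        print "found"

    if message.find(text) != -1:    # find returns -1 when absent, an index (possibly 0) otherwise
        print "found"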

## Files

    In [29]: with open('test.txt', 'w') as f:
       ....:     f.write(" some text for checking")
       ....:

    In [32]: with open('test.txt', 'r') as f:
       ....:     for text in f:
       ....:         print 'some' in text
       ....:
    True

## overriding `` __contains__ ``

  
Consider a `` class Person `` which inherits from _dict_ and overrides `` __contains__ ``:

    In [153]: class Person(dict):
       .....:     def __contains__(self, item):
       .....:         for key in self:
       .....:             if key == item:
       .....:                 return True
       .....:             else:
       .....:                 if isinstance(self[key], dict):
       .....:                     for k in self[key]:
       .....:                         if k == item:
       .....:                             return True
       .....:         else:
       .....:             return False
       .....:

    In [154]: p = Person()

    In [155]: p['skills'] = {'programming_languages': {'web': ['php'], 'multi_paradigm': ['python', 'ruby', 'c#']}}

    In [156]: 'skills' in p
    Out[156]: True

    In [157]: 'programming_languages' in p
    Out[157]: True

    In [158]: 'web' in p
    Out[158]: False

In the above example, the _key_ is looked up only two dict levels deep.
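
If the lookup should work at any depth, a recursive variant does the trick (a sketch, not from the original post):

    class DeepPerson(dict):
        def __contains__(self, item):
            for key in self:
                if key == item:
                    return True
                value = self[key]
                # Recurse into nested dicts by wrapping them in DeepPerson,
                # so `in` invokes this __contains__ again.
                if isinstance(value, dict) and item in DeepPerson(value):
                    return True
            return False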

# Misc
- [About](https://kracekumar.com/about/index.md): About
---
title: "About"
date: 2020-06-09T22:38:19+05:30
draft: false
menu: "main"
---

I'm Kracekumar, a Software Engineer based out of Dublin. Currently, I work at Stripe building Local Payment Methods in Europe.

I have deep expertise in backend technologies and a proven track record of tech leadership.
In the past, I have spoken at various technical conferences like PyCon India, EuroPython, PyGotham, etc.

I was the organizer of [Bangalore Python User group](https://www.meetup.com/bangpypers/) and
volunteered for [PyCon India](https://in.pycon.org/) from 2012 to 2016.
In 2017, [I founded the RFCs We Love Bangalore Meetup](https://github.com/rfcswelove/rfcs_we_love/commit/aab0b5055132604ac935eae78041890a222ba979),
which is now [run by various other volunteers](https://www.iiesoc.in/rfcswelove).
The meetup discusses the content of published RFCs and the ideas around them.

I'm open to new opportunities. To know more about my technical background,
you can read my [CV](/resume/Krace_Resume.pdf).

Outside of tech, I enjoy literature and organize a little meetup, [Dubliners](https://www.meetup.com/meetup-group-hzxiwxve/)
in Dublin.