23 5 / 2014

Some Python source files start with -*- coding: utf-8 -*-. This line tells the Python interpreter that the source file, and therefore every byte string literal in it, is UTF-8 encoded. Let’s see how it affects the code.

uni1.py:

# -*- coding: utf-8 -*-
print("welcome")
print("animé")

output:

➜  code$ python2 uni1.py
   welcome
   animé

The third line has an accented character that isn’t explicitly marked as unicode, yet print succeeds. It works because the first line told the interpreter that the source is UTF-8 encoded.

What if the first line were missing?

uni2.py:

print("welcome")
print("animé")

output:

code$  python2 uni2.py
File "uni2.py", line 2
SyntaxError: Non-ASCII character '\xc3' in file uni2.py on line 2, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

Now Python complains that it found a non-ASCII character, because the default source encoding in Python 2 is ASCII. More about source encoding can be found in PEP 263.
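To see where the '\xc3' in the error message comes from, here is a minimal sketch. It is my own illustration rather than part of uni2.py, it assumes the file was saved as UTF-8, and it spells the accented character with a \u escape so the snippet itself stays ASCII:

```python
# Sketch: which bytes does the accented word occupy in a UTF-8 file?
# Assumption: the source file was saved as UTF-8.
text = u"anim\u00e9"              # the word from the example, e-acute written as an escape
encoded = text.encode("utf-8")    # the bytes as they would sit on disk

# Collect every byte outside the ASCII range (0-127).
non_ascii = [hex(b) for b in bytearray(encoded) if b > 127]
print(non_ascii)                  # ['0xc3', '0xa9']
```

The first byte outside ASCII is 0xc3, which is the byte the interpreter trips over when no encoding is declared.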

Always declare the encoding on the first or second line of a Python file.

13 5 / 2014

At the time of writing (12 May 2014), the latest version of pip is 1.5.1. By default, pip doesn’t install packages from URLs outside PyPI. A package can either upload a tar, zip or tar.gz file to PyPI, or specify a download URL that points to another site (for example, pyPdf points to http://pybrary.net/pyPdf/pyPdf-1.13.tar.gz). pip considers externally hosted packages insecure. Agreed.

This is one of the reasons I kept using pip 1.4.1. I finally decided to fix the issue. Below is a sample of the error pip throws.

(document-converter)➜  document-converter git:(fix_requirements) pip install pyPdf
Downloading/unpacking pyPdf
Could not find any downloads that satisfy the requirement pyPdf
Some externally hosted files were ignored (use --allow-external pyPdf to allow).
Cleaning up...
No distributions at all found for pyPdf
Storing debug log for failure in /Users/kracekumar/.pip/pip.log

(document-converter)➜  document-converter git:(fix_requirements) pip install --allow-external pyPdf
You must give at least one requirement to install (see "pip help install")
(document-converter)➜  document-converter git:(fix_requirements) pip install pyPdf --allow-external pyPdf
Downloading/unpacking pyPdf
Could not find any downloads that satisfy the requirement pyPdf
Some insecure and unverifiable files were ignored (use --allow-unverified pyPdf to allow).
Cleaning up...
No distributions at all found for pyPdf
Storing debug log for failure in /Users/kracekumar/.pip/pip.log

The above behaviour is super confusing and counterintuitive. The fix is:

(document-converter)➜  document-converter git:(fix_requirements) pip install pyPdf --allow-external pyPdf --allow-unverified pyPdf
Downloading/unpacking pyPdf
pyPdf an externally hosted file and may be unreliable
pyPdf is potentially insecure and unverifiable.
Downloading pyPdf-1.13.tar.gz
Running setup.py (path:/Users/kracekumar/Envs/document-converter/build/pyPdf/setup.py) egg_info for package pyPdf

Installing collected packages: pyPdf
Running setup.py install for pyPdf

Successfully installed pyPdf
Cleaning up...

The above method is not what you would use in production. In a production environment it is recommended to run pip install -r requirements.txt.

# requirements.txt
--allow-external pyPdf
--allow-unverified pyPdf
pyPdf
--allow-external xhtml2pdf==0.0.5

pyPdf has two issues, so two flags need to be mentioned in requirements.txt. Since xhtml2pdf requires pyPdf, the --allow-external flag is passed for it as well. I wish it were possible to pass both switches on the same line; if you do so, pip ignores them. Now pip install -r requirements.txt works like a charm (with warnings).

Since the current approach is super confusing, there is a discussion about it. Thanks to Ivoz for helping me resolve this.

27 4 / 2014

I am a big fan of bus travel. It is still my only mode of transportation. The two reasons I love it are the wind and the sightseeing. Whenever the wind kisses me I forget myself and start thinking about memories.

The best part of the wind (Thendral) is that it kindles happiness, sad moments, memorable ones, wishes and longing. Thendral has the power to completely change my mood and mode.

I don’t think only thendral has this effect. Trees, plants, flowers and water also produce the same effect. A bus journey sows a lot of peace in me.

Not to forget, this wind has the full potential to make me shed tears. Nature has answers to questions and triggers thoughts. I love how small things can create a big impact.

*Note*: Thendral is a Tamil word. It is a type of wind that originates from the south and produces a soothing effect.

07 4 / 2014

Over a period of time, a few people have asked me at meetups and online: “I want to learn Python. Suggest a few ways to learn.” Everyone who asked had a different background and different intentions. Before answering, I try to collect more information about their interests and their previous approaches. Some learnt the basics from Codecademy, some attended a beginners’ session at a Bangpypers meetup. In this post I will cover the general questions asked and my suggested approach.

Q: Suggest some online resources and books to learn Python?

A: I suggest three resources: How to Think Like a Computer Scientist, Learn Python The Hard Way, and CS101 from Udacity. This is highly subjective because it depends on previous programming experience and so on. I also have a blog post with a lot of Python snippets without explanation (I know it is like a sea without waves).

Q: It takes too much time to complete the book. I want to learn it soon.

A: I have been programming in Python for over 3 years now, and I still don’t know Python in depth. You may learn it in six months or in a week. The journey is more interesting than the destination.

Q: How long will it take to learn Python?

A: It depends on what you want to learn in Python. I learnt Python in 3 hours while commuting to college. You should be able to grasp the basic concepts in a few hours. Practice will make you feel confident.

Q: I learnt the basics of Python. Can you give me some problems to solve, and I will get back with solutions?

A: No. I would be glad to help if you are stuck. Learning to solve your own problem is a great way to learn. My first usable Python program downloaded Tamil songs. I still use the code :-). So find a small problem or project to work on. I would be happy to review the code and give suggestions.

Q: I want to contribute to open source Python projects. Can you suggest some?

A: Don’t contribute to a project just because you want to contribute. Rather, find a library or project that interests you and see if things can be made better. Your contribution can be as small as fixing a spelling mistake (I have contributed a single-character change). The Linux kernel accepts patches that fix spelling mistakes. Every contribution has its own effect. So contribute if it adds value.

In case you are reading this blog post and want to learn Python or need help, I would be glad to help.

22 3 / 2014

We are writing a small utility function called is_valid_mimetype. The function takes mime_type as an argument and checks whether the mime type is one of the allowed types. The code looks like this:

ALLOWED_MIME_TYPE = ('application/json', 'text/plain', 'text/html')

def is_valid_mimetype(mime_type):
    """Returns True or False.

    :param mime_type string or unicode: HTTP header mime type
    """
    for item in ALLOWED_MIME_TYPE:
        if mime_type.startswith(item):
            return True
    return False

The above code can be refactored into a single line using any.

def is_valid_mimetype(mime_type):
    """Returns True or False.

    :param mime_type string or unicode: HTTP header mime type
    """
    return any([True for item in ALLOWED_MIME_TYPE
                              if mime_type.startswith(item)])

A one-liner. It is awesome, but not performant: the list comprehension tests every item and builds the whole list before any gets to look at it. How about using next?

def is_valid_mimetype(mime_type):
    """Returns True or False.

    :param mime_type string or unicode: HTTP header mime type
    """
    return next((True for item in ALLOWED_MIME_TYPE 
                              if mime_type.startswith(item)), False)

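(True for item in ALLOWED_MIME_TYPE if mime_type.startswith(item)) is a generator expression. When nothing matches, or when ALLOWED_MIME_TYPE is empty, the generator yields no value and next raises StopIteration. To avoid that, False is passed as the default argument to next. (If ALLOWED_MIME_TYPE were None, iterating over it would raise TypeError, which the default does not prevent.)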
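The role of the default argument can be seen in a small sketch. The first_match helper and its allowed parameter are hypothetical names I made up for this illustration; the behaviour of next is standard Python:

```python
ALLOWED_MIME_TYPE = ('application/json', 'text/plain', 'text/html')

def first_match(mime_type, allowed=ALLOWED_MIME_TYPE):
    """Hypothetical helper: same idea as the next-based version above."""
    matches = (True for item in allowed if mime_type.startswith(item))
    return next(matches, False)   # False is returned when nothing is yielded

# Without a default, an exhausted generator makes next raise StopIteration.
try:
    next(True for item in ALLOWED_MIME_TYPE if 'image/png'.startswith(item))
    raised = False
except StopIteration:
    raised = True

print(raised)                                    # True
print(first_match('image/png'))                  # False: no prefix matches
print(first_match('text/html; charset=utf-8'))   # True
print(first_match('text/html', allowed=()))      # False: empty tuple
```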

Edit:

def is_valid_mimetype(mime_type):
    """Returns True or False.

    :param mime_type string or unicode: HTTP header mime type
    """
    return any(mime_type.startswith(item) for item in ALLOWED_MIME_TYPE)

Cleaner than next.
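Two follow-ups, sketched under my own assumptions. The starts helper and the checked list are hypothetical and exist only to count how many prefixes get tested. The tuple variant relies on the fact that str.startswith accepts a tuple of prefixes; whether it reads better than any is a matter of taste.

```python
ALLOWED_MIME_TYPE = ('application/json', 'text/plain', 'text/html')

def is_valid_mimetype_tuple(mime_type):
    """Variant: str.startswith accepts a tuple of prefixes."""
    return mime_type.startswith(ALLOWED_MIME_TYPE)

checked = []  # records every prefix that was actually tested

def starts(mime_type, item):
    """Hypothetical wrapper around startswith that logs each test."""
    checked.append(item)
    return mime_type.startswith(item)

# Generator expression: any stops at the first True.
any(starts('application/json', item) for item in ALLOWED_MIME_TYPE)
lazy_checks = len(checked)

del checked[:]

# List comprehension: every item is tested before any runs.
any([starts('application/json', item) for item in ALLOWED_MIME_TYPE])
eager_checks = len(checked)

print(lazy_checks)    # 1
print(eager_checks)   # 3
print(is_valid_mimetype_tuple('text/plain; charset=utf-8'))  # True
```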

10 3 / 2014

Normally I don’t plan weekends. I code and watch movies. This weekend (8th March) was different though. Friday evening, March 7th, wasn’t good. I was banging my head at work to get an API working. Then I came back home and relaxed for an hour with Facebook and YouTube. Then I opened Emacs and started to play Raja sir’s music. Stared at the code, walked along the execution, and figured out the issue. Can’t ask for more. Calm and code. Slept at 3.00 AM.

Woke up at 12:00 PM and had brunch. Plugged in my external hard disk and found a bunch of new movies downloaded by my friend. The Wolf of Wall Street caught my eye. Watched and enjoyed it. When the movie was about to get over, someone knocked on the door. With all the unwillingness, I got up and opened the door. The apartment security guard handed me a parcel. It lit up my face. It was a courier containing books that I had ordered three days back. The courier had four books: ஆயிரத்தொரு அரேபிய இரவுகள் - Ayirathoru Arabia Iravugal, கவிராஜன் கதை - Kavirajan Kadhai, திருத்தி எழுதிய தீர்ப்புகள் - Thiruthi Ezhudhiya Theerpukal, நானே எனக்கொரு போதிமரம் - Naney Enakkoru Bodhiram. Once the movie was over, I went to Kaikondrahalli lake as usual.

After spending a few hours outside, I came back home and started flipping the pages of Ayirathoru Arabia Iravugal. Couldn’t resist much. It is a beautifully written book (85 pages). S Ramakrishnan’s writing and speech always amaze me with their simplicity. Watched half of the movie Captain Phillips and went to sleep.

The next day I started reading Ayirathoru Arabia Iravugal. Completed the book in half an hour and went back to the movie. The movie got over around 12.30. The next book was Kavirajan Kadhai. Read a few pages and went for lunch. Came back in 30 minutes and continued. Completed the book in 3 hours. Needless to say, Vairamuthu made me shed tears in many places. Went for an evening walk around the lake and spent an hour there. Then I started Thiruthi Ezhudhiya Theerpukal, also by Vairamuthu. Enjoyed it a lot. It was written in free verse in 1979. Completed the book in a few hours. I was overjoyed reading books back to back.

What can I say now? Books, movies and music (while writing this blog post) gave me absolute pleasure. Something is missing. Yes, code ;-(. I missed it. I am figuring out how I can get the best of the weekend and free time with books, code, movies, writing and meetups. Seems like books and movies on Saturday, and code and writing on Sunday.

Books cooked happiness.
Movies made the hour.
Weekend went without deadend.
Finally moon is missing.

07 3 / 2014

Python has a sorted function which sorts an iterable in ascending or descending order.

# Sort descending
In [95]: sorted([1, 2, 3, 4], reverse=True)
Out[95]: [4, 3, 2, 1]

# Sort ascending
In [96]: sorted([1, 2, 3, 4], reverse=False)
Out[96]: [1, 2, 3, 4]

sorted(iterable, reverse=True)[:n] returns the n largest numbers. There is an alternative way.

Python has the heapq module, which implements the heap data structure. heapq has the functions nlargest and nsmallest, which take n (the number of elements), an iterable such as a list, dict, tuple or generator, and an optional key argument.

In [85]: heapq.nlargest(10, [1, 2, 3, 4,])
Out[85]: [4, 3, 2, 1]

In [88]: heapq.nlargest(10, xrange(1000))
Out[88]: [999, 998, 997, 996, 995, 994, 993, 992, 991, 990]

In [89]: heapq.nlargest(10, [1000]*10)
Out[89]: [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]

In [99]: heapq.nsmallest(3, [-10, -10.0, 20.34, 0.34, 1])
Out[99]: [-10, -10.0, 0.34]
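As a quick sanity check, the slicing approach and heapq should agree. This is a minimal sketch; the numbers list is made up for illustration:

```python
import heapq

numbers = [5, 1, 9, 3, 7, 9, 2]   # made-up sample data
n = 3

largest_by_sort = sorted(numbers, reverse=True)[:n]
largest_by_heap = heapq.nlargest(n, numbers)

smallest_by_sort = sorted(numbers)[:n]
smallest_by_heap = heapq.nsmallest(n, numbers)

print(largest_by_heap)    # [9, 9, 7]
print(smallest_by_heap)   # [1, 2, 3]
print(largest_by_sort == largest_by_heap)     # True
print(smallest_by_sort == smallest_by_heap)   # True
```

Both give the same result. heapq avoids sorting the whole iterable, which tends to matter when n is small compared to the size of the input.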

Let’s say marks is a list of dictionaries containing students’ marks. With heapq it is possible to find the highest and lowest mark in a subject.

In [113]: marks = [{'name': "Ram", 'chemistry': 23},{'name': 'Kumar', 'chemistry': 50}, {'name': 'Franklin', 'chemistry': 89}]

In [114]: heapq.nlargest(1, marks, key=lambda mark: mark['chemistry'])
Out[114]: [{'chemistry': 89, 'name': 'Franklin'}]

In [115]: heapq.nsmallest(1, marks, key=lambda mark: mark['chemistry'])
Out[115]: [{'chemistry': 23, 'name': 'Ram'}]

heapq can also be used to build a priority queue.

Note: IPython is used in the examples, where In [114]: means input line number 114 and Out[114]: means output line number 114.
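A minimal priority queue sketch using heapq.heappush and heapq.heappop. The task names and priorities are made up, and I assume a lower number means higher priority:

```python
import heapq

tasks = []  # the heap; each entry is a (priority, name) tuple

# Assumption: lower number = more urgent.
heapq.heappush(tasks, (2, 'write blog post'))
heapq.heappush(tasks, (1, 'fix api bug'))
heapq.heappush(tasks, (3, 'watch movie'))

order = []
while tasks:
    priority, name = heapq.heappop(tasks)  # always pops the smallest tuple
    order.append(name)

print(order)  # ['fix api bug', 'write blog post', 'watch movie']
```

Tuples compare element by element, so the priority decides the order and the name only breaks ties.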

27 2 / 2014

Let’s say you want to find how many times each element is present in a list or tuple.

Normal approach

words = ['a', 'the', 'an', 'a', 'an', 'the']
d = {}
for word in words:
    if word in d:
        d[word] += 1
    else:
        d[word] = 1
print d
{'a': 2, 'the': 2, 'an': 2} 

Better approach

words = ['a', 'the', 'an', 'a', 'an', 'the']
d = {}
for word in words:
    d[word] = d.get(word, 0) + 1

print d
{'a': 2, 'the': 2, 'an': 2}

Both approaches return the same values. The first has 6 lines of logic and the second has 3 (less code, less to manage).

The second approach uses the d.get method. d.get(word, 0) returns the count of the word if the key is present, else 0. If 0 isn’t passed, get returns None for a missing key, and None + 1 raises TypeError.

Pythonic approach:

import collections

words = ['a', 'b', 'a']

res = collections.Counter(words)

print res
Counter({'a': 2, 'b': 1})

The last approach is just one line and Pythonic.

The snippet is taken from the talk Transforming Code into Beautiful, Idiomatic Python. Do watch and enjoy.
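A small sketch comparing the d.get approach with Counter, plus two Counter conveniences. The word list is made up so that one word clearly wins:

```python
import collections

words = ['a', 'the', 'an', 'a', 'an', 'the', 'a']  # made-up sample

# The d.get approach from above.
counts = {}
for word in words:
    counts[word] = counts.get(word, 0) + 1

counter = collections.Counter(words)

print(dict(counter) == counts)   # True: same counts either way
print(counter.most_common(1))    # [('a', 3)]
print(counter['missing'])        # 0: missing keys count as zero, no KeyError
```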

10 2 / 2014

Two Scoops of Django - 1.5 is a book by Pydanny and Audrey Roy that focuses on writing clean and better Django applications.

If you are using Django in production, this is a must-read book.

Q: I have been using Django since 0.8. Do I need this book?

A: Yes. Consider the book a starting point to validate your assumptions.

Q: I just started using Django. Should I read this?

A: Yes. I started to use Django in production last month. Sometimes I felt I should finish this book before pushing any more code. Every two or three chapters I could clearly find mistakes in my code and fix them.

Enjoyed pieces:

  • 100% support for breaking dependencies and settings into multiple files. I always tell my friends never to have a single settings.py. Having a private repo isn’t a solution for storing secret information.
  • Never use Meta.exclude in model forms. I made this mistake when I started.
  • Coverage of security pointers.
  • Advocacy for class-based views. Class-based views make code cleaner by breaking it into relevant methods. It is easy to end up with a fat function that stretches to 10 to 15 lines while accepting GET and POST.
  • Advising not to store sessions, files and logs in the DB. Django sessions will cause pain while migrating databases (MySQL -> Postgres). It is better to store sessions in Redis or Riak.

Conclusion:

If you are using Django, read this book and see how many changes you end up making to your code. Worth every penny spent!

08 2 / 2014

It is very common to update a single attribute of a model instance (say, the first name in a user profile) and save it to the DB.

In [18]: u = User.objects.get(id=1)

In [19]: u.first_name = u"kracekumar"

In [20]: u.save()

A very straightforward approach. How does Django send the SQL query to the database?

In [21]: from django.db import connection

In [22]: connection.queries
Out[22]: 
[... 
{u'sql': u'UPDATE "auth_user" SET "password" = \'pbkdf2_sha256$12000$vsHWOlo1ZhZg$DrC46wq+a2jEtEzxmUEw4vQw8oV/rxEK7zVi30QLGF4=\', "last_login" = \'2014-02-01 06:55:44.741284+00:00\', "is_superuser" = true, "username" = \'kracekumar\', "first_name" = \'kracekumar\', "last_name" = \'\', "email" = \'me@kracekumar.com\', "is_staff" = true, "is_active" = true, "date_joined" = \'2014-01-30 18:41:18.174353+00:00\' WHERE "auth_user"."id" = 1 ', u'time': u'0.001'}]

Not happy. Honestly, it should be UPDATE auth_user SET first_name = 'kracekumar' WHERE id = 1. Ideally Django should update only the modified fields.

The right way to do it is:

In [23]: User.objects.filter(id=u.id).update(first_name="kracekumar")
Out[23]: 1

In [24]: connection.queries
Out[24]:
[...
{u'sql': u'UPDATE "auth_user" SET "first_name" = \'kracekumar\' WHERE "auth_user"."id" = 1 ', u'time': u'0.001'}]

Yay! Though both queries took the same amount of time, the latter is better because it writes only the changed column.

Edit: There is a cleaner way to do it.

In [60]: u.save(update_fields=['first_name'])

In [61]: connection.queries
Out[61]: 
[...
{u'sql': u'UPDATE "auth_user" SET "first_name" = \'kracekumar\'  WHERE "auth_user"."id" = 1 ',
u'time': u'0.001'}]