Datetime hell. Time zone aware to UNIX timestamp.

Rating +1 for datetime in the category crappiest module in the Python 2.x Standard Library. The fact that datetimes in Python are time zone unaware by default, even if you parse a datetime from a string with a time zone in it, sucks.

What I tried to achieve today: parse an ISO date like 2013-06-05T15:19:10Z and convert it to a UNIX timestamp. You say simple? I thought so, too! FYI my local time zone is CEST (+0200).

First try

I skipped the first try, because I already knew it would fail, but it could be the first try for someone who does not.

from datetime import datetime

if __name__ == '__main__':
    iso_string = '2013-06-05T15:19:10Z'
    datetime_obj = datetime.strptime(
        iso_string, '%Y-%m-%dT%H:%M:%S%Z'
    )
    print datetime_obj.strftime('%s')

Result

ValueError: time data '2013-06-05T15:19:10Z' does not match format '%Y-%m-%dT%H:%M:%S%Z'

The datetime doc says:

%Z Time zone name (empty string if the object is naive).

But Z as in Zulu is not supported. Even if it were, parsing the time zone from a string never ever results in a time zone aware datetime object, because:

datetime.strptime(date_string, format) is equivalent to datetime(*(time.strptime(date_string, format)[0:6])).

Noticed the [0:6]? Yeah …
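
To see the effect of that slice, here is a minimal sketch that calls time.strptime() directly. Using 'GMT' in place of 'Z' is an assumption for illustration (%Z accepts GMT and UTC). The zone name is parsed, but only the fields year through second survive the slice, so the resulting datetime is naive. The print calls are written so they also run on Python 3.

```python
import time
from datetime import datetime

# 'GMT' instead of 'Z' is an assumption for illustration; %Z accepts it.
parsed = time.strptime('2013-06-05T15:19:10GMT', '%Y-%m-%dT%H:%M:%S%Z')

# Only the first six fields (year .. second) survive the slice,
# so nothing about the time zone reaches the datetime constructor.
naive = datetime(*parsed[0:6])

print(naive.isoformat())
print(naive.tzinfo)
```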

Second try

from datetime import datetime

if __name__ == '__main__':
    iso_string = '2013-06-05T15:19:10Z'
    datetime_obj = datetime.strptime(
        iso_string, '%Y-%m-%dT%H:%M:%SZ'
    )
    print datetime_obj.strftime('%s')

Result

1370438350

I thought, since datetime is time zone unaware, maybe it handles everything as UTC? So this should work. But …

>>> datetime.utcfromtimestamp(1370438350).isoformat()
'2013-06-05T13:19:10'

… it does not. 2 hours short! So a datetime object is aware of a time zone, my local system time zone, or at least something behind the scenes is.

Third try

I need to make the datetime object aware of the time zone. If the object already knows it’s in UTC, everything should work out fine. Btw the datetime module comes with a tzinfo base class for time zone classes, but not with a single implemented time zone. If you don’t want to implement them yourself, use pytz.

from datetime import datetime, tzinfo, timedelta


class UTC(tzinfo):
    """UTC"""

    def utcoffset(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return timedelta(0)


if __name__ == '__main__':
    iso_string = '2013-06-05T15:19:10Z'

    datetime_obj = datetime.strptime(
        iso_string, '%Y-%m-%dT%H:%M:%SZ'
    )
    datetime_obj = datetime_obj.replace(tzinfo=UTC())

    timestamp = datetime_obj.strftime('%s')

    print timestamp
    print '-> %s' % datetime\
        .utcfromtimestamp(int(timestamp)).isoformat()

Result

1370441950
-> 2013-06-05T14:19:10

Closer, but still 1 hour missing. Where has this hour gone? I don’t know! It’s gone!

Fourth try

I figure, since the strptime() method is wack, maybe his homey strftime() is too?! There is another way to create a UNIX timestamp from a datetime object.

from datetime import datetime, tzinfo, timedelta
from time import mktime


class UTC(tzinfo):
    """UTC"""

    def utcoffset(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return timedelta(0)


if __name__ == '__main__':
    iso_string = '2013-06-05T15:19:10Z'

    datetime_obj = datetime.strptime(
        iso_string, '%Y-%m-%dT%H:%M:%SZ'
    )
    datetime_obj = datetime_obj.replace(tzinfo=UTC())

    timestamp = mktime(datetime_obj.timetuple())

    print timestamp
    print '-> %s' % datetime\
        .utcfromtimestamp(timestamp).isoformat()

Result

1370441950.0
-> 2013-06-05T14:19:10

Nope. Tried again using utctimetuple() instead of timetuple(). But no …

1370441950.0
-> 2013-06-05T14:19:10
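
A possible explanation for the missing hour, sketched under some assumptions: mktime() looks at the tm_isdst flag of the time tuple. For an aware datetime whose dst() returns timedelta(0), timetuple() sets that flag to 0, and utctimetuple() always does. mktime() then seems to read 15:19:10 as local standard time (CET, +0100) instead of summer time (CEST, +0200). The TZ string and the tzset() call below are stand-ins for a machine configured for CEST and need a Unix-like system.

```python
import os
import time
from datetime import datetime

# Assumption: a Unix-like system where time.tzset() is available.
# The POSIX TZ string describes CET/CEST without needing a tz database.
os.environ['TZ'] = 'CET-1CEST,M3.5.0,M10.5.0/3'
time.tzset()

# Year .. day-of-year of the parsed date, without the tm_isdst flag.
fields = datetime(2013, 6, 5, 15, 19, 10).timetuple()[:8]

guess_dst = time.mktime(fields + (-1,))  # let mktime() decide (CEST applies)
no_dst = time.mktime(fields + (0,))      # flag as set by the UTC tzinfo class

print(int(guess_dst))
print(int(no_dst))
print(int(no_dst - guess_dst))
```

The first value should match the result of the second try, the second value the result of the third and fourth try.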

Fifth try

Alright then, I only have one way left to create a UNIX timestamp from a datetime object.

from datetime import datetime, tzinfo, timedelta
from time import mktime
from calendar import timegm


class UTC(tzinfo):
    """UTC"""

    def utcoffset(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"

    def dst(self, dt):
        return timedelta(0)


if __name__ == '__main__':
    iso_string = '2013-06-05T15:19:10Z'

    datetime_obj = datetime.strptime(
        iso_string, '%Y-%m-%dT%H:%M:%SZ'
    )
    datetime_obj = datetime_obj.replace(tzinfo=UTC())

    timestamp = timegm(datetime_obj.timetuple())

    print timestamp
    print '-> %s' % datetime\
        .utcfromtimestamp(timestamp).isoformat()

Result

1370445550
-> 2013-06-05T15:19:10

Wohoooo! I did it! I successfully converted a datetime ISO string to a fully working, down to the last second correct, UNIX timestamp. So sad … let the flaming begin.

Suggestion by Jeff Epler

import time
from calendar import timegm
from datetime import datetime


if __name__ == '__main__':
    iso_string = '2013-06-05T15:19:10Z'

    timestamp = timegm(
        time.strptime(
            iso_string.replace('Z', 'GMT'),
            '%Y-%m-%dT%H:%M:%S%Z'
        )
    )

    print timestamp
    print '-> %s' % datetime.utcfromtimestamp(timestamp).isoformat()

Result

1370445550
-> 2013-06-05T15:19:10

Thanks Jeff, this one is much better!
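
For comparison, a minimal sketch of one more route that needs neither a tzinfo class nor timegm(): treat the parsed naive datetime as UTC and subtract the epoch. The helper name iso_utc_to_timestamp is made up for this example, and total_seconds() requires Python 2.7 or newer.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


def iso_utc_to_timestamp(iso_string):
    # Hypothetical helper: parse the literal 'Z' and treat the
    # naive result as UTC, then measure the distance to the epoch.
    parsed = datetime.strptime(iso_string, '%Y-%m-%dT%H:%M:%SZ')
    return int((parsed - EPOCH).total_seconds())


timestamp = iso_utc_to_timestamp('2013-06-05T15:19:10Z')
print(timestamp)  # 1370445550
```

No local time zone is involved at any point, so the result does not depend on the system settings.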

Python Lessons – Part 2


What we will cover in this part

  • More detail on data types
  • For Loop Statement
  • Exercise: More features for the Website monitoring tool

Solution to exercise in part 1

Before we get started, here is a possible solution to the exercise from part 1:

#! python

import requests
from time import time

def get_website(url):
    return requests.get(url)

if __name__ == '__main__':
    start_time = time()
    response = get_website('http://aboutsimon.com/')
    end_time = time()
    elapsed_time = end_time - start_time

    if response.status_code == 200:
        print 'HTTP Status Code: OK'
    else:
        print 'Error: Unexpected HTTP Status Code %d' % (response.status_code)

    if elapsed_time < 0.5:
        print 'GET request took: %f seconds' % (elapsed_time)
    else:
        print 'Error: Request took longer than 0.5 seconds.'

    if len(response.content) > 0:
        print 'Page size: %d byte' % (len(response.content))
    else:
        print 'Error: Page size too small.'

More detail on data types

Python has a couple of built-in data types. Each built-in data type can be constructed by dedicated expressions like a = 1, which will construct an integer object, or b = "mystring", which will construct a string object. Built-in data types can also be constructed by a constructor function. Constructor functions are basically used to convert/cast data types.

Numeric Types

There are four distinct numeric types: integer, long integer, floating point number and complex number. They differ in which values they can represent. The maximum and minimum value of a plain integer depends on the architecture you are running on.

All numeric types can be used with the following operators. (There are more; if you’re interested, look into Built-in Types.)

Operators

  • x + y: Add y and x
  • x - y: Subtract y from x
  • x * y: Multiply x and y
  • x / y: Divide x by y
  • x // y: Divide x by y, floor (round down) the quotient
  • x % y: Modulo, find the remainder of the division x / y
  • x ** y: x to the power y
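
A short sketch of the operators in action. The values 7 and 2 are arbitrary. Plain x / y is left out on purpose, because its result for two integers differs between Python 2 and Python 3; dividing a float behaves the same on both.

```python
x = 7
y = 2

print(x + y)         # 9
print(x - y)         # 5
print(x * y)         # 14
print(x // y)        # 3, floor division
print(x % y)         # 1, remainder
print(x ** y)        # 49
print(float(x) / y)  # 3.5, dividing a float avoids integer division
```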

Augmented Assignment (Operator Shortcut)
For each operator mentioned above there is a shortcut that combines the operation with an assignment to a variable. For example, if you want to increase a counter by one for each step you may write counter = counter + 1, but you could also write counter += 1, which does exactly the same. Unlike many other languages, Python implements neither ++ nor -- for increasing or decreasing a variable by one.
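
A small sketch of the shortcuts, applied one after another to the same variable. The starting value is arbitrary.

```python
counter = 10
counter += 1    # same as counter = counter + 1  -> 11
counter -= 3    # -> 8
counter *= 2    # -> 16
counter //= 5   # -> 3
counter **= 2   # -> 9
counter %= 4    # -> 1
print(counter)
```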

Integer

  • Type: int
  • Constructor function: int(x, base=10)
  • Syntax: a = 1
a = 1
a += 2  # Result: 3
a = a / 2 # Result: 1
a = int('1') # Cast/Convert a string object to an integer object, with base 10 (default base)
a = int('0xff', 16)  # Cast/Convert a string object holding a hex number, with base 16 (base hexadecimal)
a = int(0xff)  # Cast/Convert a hex value to int. We don't need a base, because the hex number already is an integer, only in hex notation.

Line 3 is special, because dividing 3 / 2 does not result in 1.5, but in 1. This is because we divide two integers, which will result in an integer and an integer can’t represent a number like 1.5.

Long Integer

  • Type: long
  • Constructor function: long(x, base=10)
  • Syntax: a = 123718923789173891723891723897

The main difference between int and long is that long can represent arbitrarily large numbers (unlimited precision), while int is limited to a max and min value. If an integer exceeds that maximum, it is automatically converted to a long, so don’t worry about integer overflow.
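
A minimal sketch of the automatic conversion. It assumes a machine where 2 ** 64 no longer fits into a plain int, which holds for the usual 32 and 64 bit builds.

```python
small = 2 ** 10
big = 2 ** 64   # too large for a machine-sized int

# On Python 2 this prints 'int' and then 'long'.
# (Python 3 merged both types, so it prints 'int' twice there.)
print(type(small).__name__)
print(type(big).__name__)

print(big + 1)  # 18446744073709551617, no overflow
```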

Floating Point Number

  • Type: float
  • Constructor function: float(x)
  • Syntax: a = 1.0123
a = 1.0
a += 2.0  # Result: 3.0
a = a / 2 # Result: 1.5
a = float('1.0') # Cast/Convert a string object to a float object
a = float(1)  # Cast/Convert an int to float. Result: 1.0

Complex Number

  • Type: complex
  • Constructor function: complex([real[, imag]])
  • Syntax: a = 4j
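
Complex numbers get no example above, so here is a brief sketch. The values are arbitrary.

```python
a = 4j
print(a.real)  # 0.0
print(a.imag)  # 4.0

b = complex(1, 2)          # same as 1 + 2j
print(b * 1j)              # (-2+1j)
print(abs(complex(3, 4)))  # 5.0, the distance from zero
```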

Sequence Types

A sequence type is an ordered list of objects. There are 7 sequence types in Python: string, unicode, list, tuple, bytearray, buffer, and xrange. I will cover string, unicode, list and tuple for now. A sequence type can be iterated, meaning you can access each object inside a sequence via loops. Objects inside a sequence can be accessed directly with an index or a slice, a range of indices. The index of a sequence always starts at 0. So, the first item of a sequence is index 0. The index of a sequence can be negative. Negative indices will get objects from the end of a sequence. For example s[-1] will get the last item of a sequence. Index ranges are always exclusive of the range end. The range 0 to 5 will get the items 0, 1, 2, 3 and 4.

Sequence Operations

  • s[i]: Get an object in a sequence by index i
  • s[x:y]: Get all objects in a sequence by a range index x to y
  • s[x:y:k]: Get all objects in a sequence by a range index x to y, with step k
  • x in s: Check if object x is in sequence s
  • x not in s: Check if object x is not in sequence s
  • len(s): Get the length of a sequence
  • min(s): Get the smallest object of a sequence
  • max(s): Get the largest object of a sequence
  • s.index(n): Get the index of the first occurrence of object n in sequence s
  • s.count(n): Get the count of occurrences of object n in sequence s
  • s1 + s2: Concatenate sequence s1 and sequence s2 to build one sequence out of s1 and s2
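
The operations above, sketched on a small list. Any sequence type would do; the list contents are arbitrary.

```python
s = [10, 20, 30, 20, 40]

print(s[1])         # 20
print(s[1:3])       # [20, 30]
print(s[0:5:2])     # [10, 30, 40]
print(s[-1])        # 40
print(20 in s)      # True
print(99 not in s)  # True
print(len(s))       # 5
print(min(s))       # 10
print(max(s))       # 40
print(s.index(20))  # 1
print(s.count(20))  # 2
print(s + [50])     # [10, 20, 30, 20, 40, 50]
```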

String

  • Type: str
  • Constructor function: str(object='')
  • Syntax: a = 'my string value' or a = "my string value"

A str object is by default an ASCII encoded string. Each byte represents one character. Because a string is a sequence type, it can be accessed like a list. You can iterate over a string object, to get each byte (with ASCII each character too), or get a sub string by accessing the string with an index range. The max() and min() functions will return the largest or smallest character measured by its numeric value. See ASCII Table.

s = "I like movies."
print s[0]  # Result: "I"
print s[-7:-1]  # Result: "movies"
print len(s)  # Result: 14
print max(s)  # Result: "v" (Because the character "v" inside the string has the largest numeric value.) 
print s[0:-1] + " very much."  # Result: "I like movies very much."

# This will print each character of the string object
for character in s:
    print character

print s.count('i')  # Result: 2

print s[::2]  # Print every second character of the string object

The last line is strange. There are no values that specify the start and end of the index range?!
Omitting the values of an index range means: start at the beginning and stop at the end of the sequence. s[:2] is equal to s[0:2], from the beginning of the sequence to index 2. s[5:] is equal to s[5:len(s)], from index 5 to the end of the sequence. s[:] is equal to s[0:len(s)], from the beginning of the sequence to the end of the sequence. By the way, there is no difference between creating a string instance with ‘single quotes’ or “double quotes”. In some languages like Perl or Go there is a difference, in Python there is none.
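
The equalities from the paragraph above can be checked directly. This sketch reuses the example string from before.

```python
s = "I like movies."

print(s[:2] == s[0:2])       # True
print(s[5:] == s[5:len(s)])  # True
print(s[:] == s[0:len(s)])   # True
print(s[::2])                # Ilk ois
```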

Unicode

  • Type: unicode
  • Constructor function: unicode(object[, encoding[, errors]])
  • Syntax: a = u'my 串' or a = u"my 串"

str objects and unicode objects are pretty much the same, except that str is a byte representation of a string. If you construct a str with a multi-byte unicode letter, for example the Chinese character “串”, you will end up with a sequence of length 3, because the UTF-8 encoding of “串” uses 3 bytes. If you print something like print '串'[0] it does not print “串”, but the first byte 0xE4. A unicode object is aware of characters rather than bytes, and printing u'串'[0] will result in “串”.

a = u'串'
print len(a)  # Result: 1
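
A sketch that makes the byte versus character difference visible. It assumes UTF-8 as the encoding and writes the character as the escape \u4e32, so the file needs no encoding declaration.

```python
text = u'\u4e32'                # the character 串 written as an escape
encoded = text.encode('utf-8')  # its UTF-8 byte representation

print(len(text))     # 1 character
print(len(encoded))  # 3 bytes

# bytearray gives integer access to single bytes on Python 2 and 3.
print(hex(bytearray(encoded)[0]))  # 0xe4
```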

List

  • Type: list
  • Constructor function: list([iterable])
  • Syntax: a = [1, 2, "hi"]

A list is a container which can hold an ordered list of all kinds of objects. You can add items to or remove items from a list. A list can be extended by another list or sorted by its items.

l = [1, 2, 3]  # New list
print len(l)  # Result: 3

l.append("hi!")  # add an item to the list
print len(l)  # Result: 4

value = l.pop()  # Remove and return the last item from the list
print value  # Result: "hi!"
print len(l)  # Result: 3

l.extend([4, 5, 6])  # Extend the list by another list
print len(l)  # Result: 6

l.reverse()  # Reverse the list
print l  # Result: [6, 5, 4, 3, 2, 1]

l.sort()  # Sort the list
print l  # Result: [1, 2, 3, 4, 5, 6]

print l[:4]  # Result: [1, 2, 3, 4]

You can find more on lists and their usage beyond plain data storage in the Data Structures Tutorial on python.org.

Tuple

  • Type: tuple
  • Constructor function: tuple([iterable])
  • Syntax: a = (1, 2, "hi")

A tuple object is a kind of lightweight list, except you can’t modify a tuple. You can’t add or remove items, nor change the order of a tuple. See it like a static list. More on tuples another time.
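
A brief sketch of what "can’t modify" means in practice. The variable names are arbitrary.

```python
point = (1, 2, "hi")

print(point[0])    # 1
print(point[1:])   # (2, 'hi')
print(len(point))  # 3

try:
    point[0] = 99  # item assignment is not supported
except TypeError as error:
    print('TypeError: %s' % error)

# "Changing" a tuple means building a new one.
longer = point + (4,)
print(longer)      # (1, 2, 'hi', 4)
```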

Mapping Types

The last new data type for today is the one and only mapping type in Python, the dictionary dict.

dict

  • Type: dict
  • Constructor function: dict(**kwarg) or dict(mapping, **kwarg) or dict(iterable, **kwarg)
  • Syntax: a = {"numbers": [1,2,3,4], "strings": ["hello", "you"]}

The dict is a really powerful data type. Inside a dict you can map all kinds of objects to other objects. It is useful for a wide range of purposes. A dict entry consists of a key:value mapping. The entries of a dict are not ordered, meaning if you initialize a dict in a specific order, it is not guaranteed to stay in this order.

mydict = {}  # New empty dict
mydict['name'] = 'Simon'  # A new dict entry, mapping a string object to a string object
mydict['age'] = 28  # Mapping a string object to an int object
mydict['languages'] = ['Python', 'Perl', 'Go', 'JavaScript']  # Mapping a string object to a list object

print mydict.keys()  # Get all key objects from the dict and print them. Result: ['languages', 'age', 'name']

print mydict['name']  # Get the value of the field mapped to 'name' and print it. Result: 'Simon'

mydict['firstname'] = 'Simon'
del mydict['name']  # Delete the 'name' mapping from the dict.

print len(mydict)  # Return the number of items in the dict. Result: 3

# check if the key 'name' is in mydict. Result: False
print 'name' in mydict

print mydict.values()  # Get all values from mydict and print them. Result: [['Python', 'Perl', 'Go', 'JavaScript'], 28, 'Simon']

mydict.clear()  # Delete all items from mydict
print len(mydict)  # Result: 0

mydict['Simon'] = {'age': 28, 'languages': ['Python', 'Perl', 'Go', 'JavaScript']}
print mydict['Simon']['age']  # get the value mapped to 'Simon', then the value mapped to 'age' inside it, and print it. Result: 28

You may have noticed the del statement. With del you can delete stuff: delete items from a list, delete items from a dict, even delete entire variables. You can find more on dicts here.
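
The three uses of del, sketched with made-up values:

```python
numbers = [1, 2, 3, 4]
del numbers[0]       # delete one list item by index
print(numbers)       # [2, 3, 4]
del numbers[0:2]     # delete a slice
print(numbers)       # [4]

person = {'name': 'Simon', 'age': 28}
del person['age']    # delete a dict entry by key
print(person)        # {'name': 'Simon'}

temp = 'short-lived'
del temp             # delete the variable itself
try:
    print(temp)
except NameError:
    print('temp no longer exists')
```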

For Loop

This was boring, but necessary, sorry. Now the last new thing for today, the for loop. With for you can iterate over sequences. The for keyword is followed by a variable name, to which the current object of the iteration is assigned, followed by the in keyword, followed by the sequence.

# iterate the list, each item will be assigned to item
for item in [1, 2, 3]:
    print item  # print the current value of item

As long as an object provides a sequence to iterate you can use the for loop. In the next example we will iterate a dict.

mydict = {}
mydict['Simon'] = {'age': 28, 'languages': ['Python', 'Perl', 'Go', 'JavaScript']}

# iterate the keys list, each key will be assigned to dict_key
for dict_key in mydict.keys():
    print dict_key  # Result: 'Simon'
    print mydict[dict_key]  # Result: {'age': 28, 'languages': ['Python', 'Perl', 'Go', 'JavaScript']}

You can nest more loops to iterate the whole dict, with all keys/subkeys and values/subvalues.

mydict = {}
mydict['Simon'] = {'age': 28, 'languages': ['Python', 'Perl', 'Go', 'JavaScript']}

# iterate the keys list, each key will be assigned to dict_key
for dict_key in mydict.keys():
    print dict_key

    for child_dict_key in mydict[dict_key].keys():
        print child_dict_key
        print mydict[dict_key][child_dict_key]

Exercise

Now, extend your Website monitoring tool. Instead of just checking one Website, we will check several, and instead of checking a static return code and response time, we will make both configurable for each Website. Create a dict. Add an entry for each Website you would like to check. The key of the dict should be a name for the Website. Mapped to this name is another dict with key:value pairs named ‘url’, ‘status_code’ and ‘response_time’. Iterate over the dict, GET each URL and check if the response.status_code is equal to the configured status_code and the elapsed_time is less than or equal to the configured response_time. Modify your print statements to also print the name of the Website that is currently checked. The dict is a global variable, so it’s defined before the if __name__ == '__main__':.

Little help:

websites = {
    "Simon's Blog": {
        'url': 'http://aboutsimon.com/',
        'status_code': 200,
        'response_time': 0.5,
    }
}

Add as many Websites to the dict as you like.

Python Lessons – Intro & Part 1

Python Lessons

Intro

A friend of mine, who is studying Computer Science, asked me to help him learn a programming language. Students with no background in programming often learn Java as their first language, which I find very hard to learn with nearly no experience and knowledge of the concepts used in Java. My understanding of developing computer programs is that one needs to learn the basic concepts of development, and after mastering one language, nearly every other language can be understood and mastered. I decided to teach him Python, because Python is a widely used and mature scripting language, which offers object orientation, a mostly clean syntax and readability. Readability is good for learning a language by looking at programs from others.

In several parts I will slowly cover the basics of Python. I don’t like guides with no real world examples that implement Fibonacci for the millionth time. In the first parts we will build a small tool for monitoring/checking a Website. Later we will program something more complex, like a Web Crawler.

If anyone from an English speaking country would like to proofread my text, feel free to send me corrections.

Requirements

  • Basic knowledge of programming, I will not describe what a variable is for
  • Linux, preferably based on Debian, like Ubuntu
  • Working Python installation, >= Python 2.7, < Python 3.0

Developer Environment

To write Python programs I use a combination of tools, which make life easier.

Editor

I use Sublime Text 2. Sublime Text is fast and light. There are tons of great plugins, written in Python. It can be evaluated for free, without registration, for as long as you need. The editor is cross-platform, available for Linux, Mac OS X and Windows. I recommend the plugins Package Control, which makes installing plugins easy, and SublimeLinter, installable via Package Control. SublimeLinter checks your Python code (and more) for syntax errors and style guide violations. My basic settings for Sublime Text (Preferences -> Settings User) are the following.

{
    "detect_indentation": false,
    "draw_white_space": "all",
    "find_selected_text": true,
    "fold_buttons": false,
    "font_options":
    [
        "subpixel_antialias"
    ],
    "font_size": 14,
    "highlight_line": true,
    "ignored_packages":
    [
        "Vintage"
    ],
    "rulers":
    [
        72,
        79
    ],
    "tab_size": 4,
    "translate_tabs_to_spaces": true,
    "trim_trailing_white_space_on_save": true,
    "use_tab_stops": true
}

VirtualEnv

VirtualENV is a fantastic Python tool to isolate dedicated environments from the system. It creates an environment that has its own installation directories and doesn’t share libraries with other VirtualENV environments, and by default not even with the system Python installation. Inside a VirtualENV you can do no harm. Install as many packages as you like, screw around, and if something breaks, delete the VirtualENV and create a fresh one.

Installation

cd ~/Downloads/
wget -O pypa-virtualenv.tar.gz https://github.com/pypa/virtualenv/tarball/master
tar xf pypa-virtualenv.tar.gz
cd pypa-virtualenv-*/
sudo python setup.py install

Create your first VirtualENV and activate it

virtualenv myenv
cd myenv
source bin/activate

Now you are inside a VirtualENV. Leave the VirtualENV with the deactivate command. In a VirtualENV one has access to several programs. Of course, the Python interpreter python. Also pip. PIP is a Python Package Manager, which can manage packages from the Python Package Index, PyPI. PIP will be used later.

Hello World

Our first program will be the famous “Hello World” program.

#! python

if __name__ == '__main__':
    print 'Hello World!'

Create a VirtualENV, activate it, open your favorite editor, paste the code from above and execute the created Python file with the Python interpreter, python hello_world.py. The string “Hello World!” will be printed.

  • Line 1: The Shebang, explained here. Must have in each Python script, which will be called with python! Good style and very useful with VirtualENVs.
  • Line 3: I will explain this strange if statement later. For now just accept it, that it is good style to start your main program logic inside this if statement.
  • Line 4: The print statement. print evaluates the given expression (in this case a string) and writes the resulting object to STDOUT (Console).

Very simple, now a little more complex.

#! python

if __name__ == '__main__':
    print '%s %s!' % ('Hello', 'World')

This will also print “Hello World!”, but now we are using string formatting. String formatting is extremely useful. Let me show why.

#! python

if __name__ == '__main__':
    hello = 'Hello'
    world = 'World'
    print '%s %s!' % (hello, world)

In this example the string “Hello” is assigned to the variable hello and “World” is assigned to world. String formatting basically means replacing the placeholders “%s” in the string with the objects inside the parentheses. The placeholder (for example “%s”) is called a conversion type. A conversion type defines to which data type the passed object will be converted.

myvar = 1
'Foo %s' % (myvar)

myvar assigned to an integer will be converted to a string.

myvar = 1
'Foo %d' % (myvar)

myvar assigned to an integer will be converted to an integer.

myvar = 'string'
'Foo %d' % (myvar)

myvar assigned to a string will be converted to an integer? No! This will cause a TypeError, because Python expects a number.

myvar = 1.1
'Foo %d' % (myvar)

myvar assigned to a float will be converted to an integer? Yes! But this will not print “Foo 1.1”, but “Foo 1” because “%d” is an integer conversion type.
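
The four cases above in one runnable sketch. The “%f” and “%.1f” lines are an addition, shown only to contrast with “%d”.

```python
print('Foo %s' % 1)      # Foo 1
print('Foo %d' % 1)      # Foo 1
print('Foo %d' % 1.1)    # Foo 1, the fraction is dropped
print('Foo %f' % 1.1)    # Foo 1.100000
print('Foo %.1f' % 1.1)  # Foo 1.1

try:
    print('Foo %d' % 'string')
except TypeError as error:
    # %d expects a number, a string is rejected
    print('TypeError: %s' % error)
```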

String formatting will not only be used if it comes to printing text. Each time you need to build a string from other objects, or concatenate two string objects, string formatting is useful and good style.

mynumber = 1
myword = 'Hi!'
mystring = str(mynumber) + ' ' + myword
print mystring

Or

mynumber = 1
myword = 'Hi!'
mystring = '%s %s' % (mynumber, myword)
print mystring

Both examples do exactly the same, but the second one is much more readable. And that’s what Python is all about, readability.

mynumber = 1
myword = 'Hi!'
mystring = '%d %s' % (mynumber, myword)
print mystring

Using “%d” instead of “%s” is even better, because mynumber is an integer and anything other than a number is unexpected, which we tell the format string by using “%d” instead of “%s”.

Slightly more complex Hello World coming up.

#! python

def hello():
    return 'Hello'

def world():
    return 'World'

if __name__ == '__main__':
    print '%s %s!' % (hello(), world())

This time we used functions to build the “Hello World!” string. The function hello() returns a ‘Hello’ string object and the function world returns a ‘World’ string object. Again we are using string formatting, to format the returned values by both functions into one string. Now, the last “Hello World!”.

#! python

def words_to_string(first_word, second_word):
    return '%s %s' % (first_word, second_word)

if __name__ == '__main__':
    print words_to_string('Hello', 'World!')

Functions are introduced by the def keyword. Inside the parentheses you can define the function arguments, or leave them empty, if no arguments should be passed to the function. Isn’t it called “Method”? No, outside classes functions are called functions. Inside classes functions are called methods. The words_to_string function needs two arguments, which will be used to build a new string. The new string will be returned by the function.

Indentation

Python does not use curly braces “{}” to define a block of code. Blocks are defined by indentation. Each time you begin a new block of code after a compound statement like if, for, while, def, class, …, you need to indent the block by one level.

def words_to_string(first_word, second_word):
    return '%s %s' % (first_word, second_word)

or

if __name__ == '__main__':
    print 'Hi!'

Common style defined by Python Enhancement Proposal 8 (PEP 8) is to use 4 spaces for each indentation level. Don’t mix tabs and spaces. Unexpected indents will raise an IndentationError.

if __name__ == '__main__':
    print 'Hi!'
        print 'Ho!'
IndentationError: unexpected indent

if statement

if is a control flow statement, meaning you can influence the flow/order of your program. The if statement is introduced by the if keyword and an expression, which evaluates to true or false. If the expression result is true, the if block will be executed, if the result is false, the block will not be executed. To compare objects, the comparison operators can be used.

if 1 + 1 == 2:
    print '1 + 1 = 2'

or

mybool = True
if mybool:
    print 'mybool is True'

or

mybool = False
if not mybool:
    print 'mybool is False'

True and False

There are more objects which evaluate to true or false. Here are some examples.

Data Types True

  • mystring = 'foo': A non-empty string is true
  • mybool = True: A bool with value True is of course true
  • mynumber = 1: A positive or negative number, which is not 0, is true
  • mydict = {'key': 'value'}: A non-empty dict is true
  • mylist = [1,2,3]: A non-empty list is true

Data Types False

  • mystring = '': An empty string is false
  • mybool = False: A bool with value False is of course false
  • mynumber = 0: 0 is false
  • mydict = {}: An empty dict is false
  • mylist = []: An empty list is false
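
The built-in function bool() shows how an object is treated inside an if statement. A small sketch over the values from both lists:

```python
values = ['foo', '', True, False, 1, -1, 0, {'key': 'value'}, {}, [1, 2, 3], []]

for value in values:
    # %r prints the object the way you would type it in source code
    print('%r is %s' % (value, bool(value)))
```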

Comparison Operators

Relational Comparison Operators

  • > : (a > b) a greater than b
  • < : (a < b) a less than b
  • >= : (a >= b) a greater than or equal to b
  • <= : (a <= b) a less than or equal to b
  • == : (a == b) a equal to b
  • != : (a != b) a not equal to b

Identity Comparison Operators

  • is : (a is b) a and b are the same object
  • is not : (a is not b) a and b are not the same object

Boolean Operations

  • and : (a > b) and (a < 1)
  • or : (a > b) or (a == b)
  • not : not a
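
The difference between == and is trips up many beginners, so here is a sketch with two equal but separate lists, followed by the boolean operations. All names and values are arbitrary.

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)      # True, same content
print(a is b)      # False, two separate list objects
print(a is c)      # True, c is another name for the same object
print(a is not b)  # True

x = 5
print(x > 1 and x < 10)  # True, both sides are true
print(x < 1 or x == 5)   # True, one side is enough
print(not x == 5)        # False
```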

The strange if statement

As you may have noticed, the functions are defined outside the if statement.
Quote from a good Stackoverflow answer – What does if __name__ == ‘__main__’ do

When the Python interpreter reads a source file, it executes all of the code found in it. Before executing the code, it will define a few special variables. For example, if the python interpreter is running that module (the source file) as the main program, it sets the special __name__ variable to have a value “__main__”. If this file is being imported from another module, __name__ will be set to a different value.

[…]

One of the reasons for doing this is that sometimes you write a module (a .py file) where it can be executed directly. Alternatively, it can also be imported and used in another module. By doing the main check, you can have that code only execute when you want to run the module as a program and not have it execute when someone just wants to import your module and call your functions themselves.

The if statement is some kind of protection, which keeps Python from executing code, which should only be executed if the .py File is executed directly.
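
A minimal sketch of that protection. The file name greeter.py and the function greet are made up for this example.

```python
# Hypothetical file: greeter.py

def greet(name):
    return 'Hello %s!' % name

if __name__ == '__main__':
    # Runs for "python greeter.py", but not for "import greeter".
    print(greet('World'))
```

Running python greeter.py prints the greeting. Another script that does import greeter only gets the greet function and nothing is printed until it calls greeter.greet() itself.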

Import Python Modules

Python libraries are organized in modules. A module is a Python .py file. The name of the module is defined by the name of the .py file. A module “mymodule” would be a .py file called “mymodule.py”. Module names can also be represented by a folder. In this case the folder name is the module name and inside this folder is a .py file called “__init__.py”, which represents the .py file from which content can be imported. To load content from a module the import statement is used. The contents which can be imported from a module are variables, functions and classes. There are modules already shipped with your Python installation. These modules can be found in the Python Standard Library. User generated modules are shipped via Python Packages. Lots of useful packages can be found on the Python Package Index.

There are two ways to import contents from a module. You can import the namespace of a module, meaning you can access all exported contents of a module under the name of the namespace. The second way is, to import contents from a module directly into the namespace of your Python script.

#! python

import time

if __name__ == '__main__':
    print time.time()

In this case the time module will be imported. All exported contents of time are accessible under the namespace “time”. Here we access the time function from time and print the current time.

#! python

from time import time

if __name__ == '__main__':
    print time()

Here we use the import syntax to directly import a function from the time module into the namespace of the Python script. You can import everything from a module into the script namespace by importing “*”, from time import *, but this is bad style. Don’t do it. If you need to import everything from a module, import the module name like import time and access the contents of the module via the namespace time. Importing “*” can easily cause name conflicts, which are confusing to debug sometimes.
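
A sketch of such a name conflict, using math and cmath from the standard library, which both export a function called sqrt. The example imports the name explicitly twice; a star import would cause the same replacement without sqrt ever being mentioned.

```python
from math import sqrt
print(sqrt(4))    # 2.0

# A later import of the same name silently replaces the first one.
# "from cmath import *" would do this too, without naming sqrt at all.
from cmath import sqrt
print(sqrt(4))    # (2+0j)

# Keeping the module namespaces avoids the clash.
import math
import cmath
print(math.sqrt(4))   # 2.0
print(cmath.sqrt(4))  # (2+0j)
```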

What you learned by now

  • Print an object
  • Format strings
  • Assign variables
  • Define functions
  • How to indent
  • if statement
  • Comparison Operators
  • Boolean Operations
  • if __name__ == ‘__main__’
  • Import and use modules

Exercise

Instead of just listing all language features of Python we will stop introducing new syntax and start using the learned knowledge for a first program. We will implement a script, which monitors a website.

Requirements

For monitoring a website, we need an HTTP client library. Python has two HTTP clients in its standard library, but they are complicated to use and badly designed. Instead we will use Requests. Requests wraps around the HTTP clients in the Python standard library and provides a really user-friendly API. The script should also check the request time, so we need a module which provides an interface to the current time. We use the time module for that.

Install requests

To install Requests, we use the pip command. Activate your VirtualENV and enter the following command.

pip install requests

A clean install looks like this:

Downloading/unpacking requests
  Downloading requests-1.1.0.tar.gz (337kB): 337kB downloaded
  Running setup.py egg_info for package requests
    
Installing collected packages: requests
  Running setup.py install for requests
    
Successfully installed requests
Cleaning up...

Introduction

HTTP GET with Requests

This is how you execute an HTTP GET request with Requests. Requests will return a Response object.

#! python

import requests

if __name__ == '__main__':
    response = requests.get('http://aboutsimon.com/')

Time a GET request

The Python time module provides a time function, which will return the time in seconds since the epoch (UNIX timestamp) as a floating point number. To time something we need a start time and an end time. Between start and end, we will execute the GET request. Subtract start from end and you will get the elapsed time for the GET request.

#! python

import requests
from time import time

if __name__ == '__main__':
    start_time = time()
    response = requests.get('http://aboutsimon.com/')
    end_time = time()
    elapsed_time = end_time - start_time

    print 'GET request took: %f seconds' % (elapsed_time)

The “%f” in the string stands for float, because elapsed_time is of type float and we would like to see the precision of a float.
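As a quick illustration of what the format codes do, here is a small sketch with a made-up elapsed time of 1.5 seconds (the value is an assumption for the example, not a measured result).

```python
elapsed_time = 1.5  # made-up value, standing in for a measured time

as_float = '%f' % elapsed_time     # six decimal places by default
as_short = '%.2f' % elapsed_time   # limit the output to two decimal places
as_int = '%d' % elapsed_time       # integer format drops the fractional part

print(as_float)   # 1.500000
print(as_short)   # 1.50
print(as_int)     # 1
```

So “%f” is the right choice when the fraction of a second matters, which is the case for request times.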

Access attributes from response

The Response object returned by the GET request provides a couple of attributes. With response.status_code we can access the HTTP status code of the server response as an integer. If everything is fine, the status code should be 200 for “HTTP OK”. response.content will return the page content we requested as a str object, which in Python 2 is a byte string. The content itself isn’t that interesting for now, but the size of the content is. To get the size of a string we use the built-in function len. Built-in functions do not need to be imported, they are just there, like keywords (if, def, import etc).

#! python

import requests
from time import time

if __name__ == '__main__':
    start_time = time()
    response = requests.get('http://aboutsimon.com/')
    end_time = time()
    elapsed_time = end_time - start_time

    print 'GET request took: %f seconds' % (elapsed_time)
    print 'HTTP Status Code: %d' % (response.status_code)
    print 'Page size: %d byte' % (len(response.content))

Your turn

Your exercise is to implement a Python script which checks a website.

  • Define a function get_website, with a URL as argument, which executes the GET request and returns the response
  • Check if the response Status Code is 200, if yes print “HTTP Status Code: OK”, if not print an error message
  • Check if the request took less than 0.5 seconds, if yes print the request time, if not print an error message
  • Check if the response content size is greater than 0 bytes, if yes print the size, if not print an error message
  • Read PEP 8 and check if your script is compliant to PEP 8.

In Part 2 of Python Lessons, I will post a solution to this exercise.

Ubuntu Alert Alias – Command Desktop Notification

While I was fiddling with my .bashrc today, I noticed a Bash alias called alert. Ever had a job like tar or rsync running, did other stuff beside it and missed that the job had already finished? With alert, not anymore. alert uses notify-send to send a desktop notification with the last executed command. It can even tell you whether the command was successful or not by looking at the exit code. I found it perfect for notifying me when a long-running job has finished.

curl -o /dev/null http://aboutsimon.com/feed/; alert

(Screenshot: Ubuntu alert alias notification usage)
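The logic behind the alias is small: pick an icon depending on the exit code and hand the last command line to notify-send. Here is a rough Python sketch of that idea. build_alert_command is a hypothetical helper, and the icon names terminal and error are an assumption based on Ubuntu's stock .bashrc; your system may differ. The sketch only builds the argument list and does not call notify-send.

```python
def build_alert_command(last_command, exit_code):
    """Return a notify-send argument list for a finished command.

    Hypothetical helper; icon names are assumed, not guaranteed.
    """
    # Exit code 0 means success, anything else is treated as a failure.
    icon = 'terminal' if exit_code == 0 else 'error'
    return ['notify-send', '--urgency=low', '-i', icon, last_command]


if __name__ == '__main__':
    print(build_alert_command('curl -o /dev/null http://aboutsimon.com/feed/', 0))
```

Passing the resulting list to subprocess.call would send the actual notification on a desktop where notify-send is installed.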

Ubuntu Spotify Keyboard Media Button Support

Ever wondered how to get the Media Buttons on your Keyboard working with the Spotify Linux client? There is an easy solution, thanks to Spotify Gnome.

Installation is easy.

# cd /usr/local/sbin/
# sudo wget https://raw.github.com/jreese/spotify-gnome/master/bin/spotify
# sudo chmod 755 spotify

After restarting Spotify, your Media Buttons should be working. Have fun.

Accessing Docstring from decorated functions

Accessing a Docstring in Python is pretty simple. Just access __doc__ on any Python object and it will return the Docstring of the object.

>>> class Foo(object):
...   '''Test Docstring'''
... 
>>> a = Foo()
>>> a.__doc__
'Test Docstring'
>>> Foo.__doc__
'Test Docstring'
>>> def foo():
...     '''Test Docstring'''
... 
>>> foo.__doc__
'Test Docstring'

But what happens if a defined function/method is decorated?

>>> def decorator(cb):
...     def _decorator(*args, **kwargs):
...         return cb(*args, **kwargs)
...     return _decorator
... 
>>> @decorator
... def foo():
...     '''Test Docstring'''
... 
>>> foo.__doc__
>>> repr(foo.__doc__)
'None'

The Docstring of the inner wrapper function _decorator is returned instead, because that is the function foo now refers to. In this case the wrapper has no Docstring, so it’s None. An easy solution is functools.wraps.

This is a convenience function for invoking partial(update_wrapper, wrapped=wrapped, assigned=assigned, updated=updated) as a function decorator when defining a wrapper function.

wraps preserves some attributes of the original function/method, for example its name and Docstring. Just add wraps as a decorator to the wrapper function inside your own decorator, and accessing __doc__ will return the original Docstring again.

>>> from functools import wraps
>>> 
>>> def decorator(cb):
...     @wraps(cb)
...     def _decorator(*args, **kwargs):
...         return cb(*args, **kwargs)
...     return _decorator
... 
>>> @decorator
... def foo():
...     '''Test Docstring'''
... 
>>> foo.__doc__
'Test Docstring'

For me, using wraps is best practice when it comes to writing decorators.
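The Docstring is not the only thing that gets lost. As a small sketch, the following compares the function name with and without wraps. plain_decorator and bar are names made up for this comparison.

```python
from functools import wraps


def decorator(cb):
    @wraps(cb)  # copies __name__, __doc__ etc. from cb to the wrapper
    def _decorator(*args, **kwargs):
        return cb(*args, **kwargs)
    return _decorator


def plain_decorator(cb):
    # Same wrapper, but without wraps.
    def _decorator(*args, **kwargs):
        return cb(*args, **kwargs)
    return _decorator


@decorator
def foo():
    '''Test Docstring'''


@plain_decorator
def bar():
    '''Test Docstring'''


print(foo.__name__)  # foo
print(bar.__name__)  # _decorator
```

Without wraps, tracebacks and log messages would show _decorator for every decorated function, which makes debugging harder than it needs to be.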

Python TCP socket performance tweak on Linux

Short

Setting the socket option TCP_NODELAY=1 increases performance big time if you’re sending lots of small blocks of data with socket.IPPROTO_TCP.

Long

Over at abusix I started a project using IMAP. For connecting to an IMAP server in Python there is basically only imaplib and a few high-level libs which wrap around imaplib.

In my first tests importing our old email storage, I started with a very small set of 5000 emails, appending them to the IMAP INBOX.

import imaplib
import os

imap = imaplib.IMAP4('192.168.0.1', 143)
(status, msg) = imap.login('mail', 'testpassword')

if status == 'OK':
    imap.create('Archive')

    dir = '/root/mails/'
    for f in os.listdir(dir):
        fd = open('%s%s' % (dir, f), 'rb')
        mail = fd.read(-1)
        fd.close()

        imap.append('INBOX', None, None, mail)

Importing 5000 mails by calling append for every single email resulted in a run time of 210 seconds, which is 23.8 messages/sec. This is slow. I checked the IMAP server configs, I/O and CPU load. All fine. To find out whether the program or the server configuration was the issue, I wrote the exact same script in Perl using Mail::IMAPClient. Running the Perl script with the same amount of data, on the same server, resulted in a run time of 7.9 seconds. Wtf? This is like 632 messages/sec, which is good and the kind of result I was aiming for with Python. So I checked the IMAP protocol calls generated by Perl and Python, to see if Perl was maybe using multi appends or something different, but there wasn’t any difference. Since the email parser of Python is also damn slow compared to the Perl parsers out there, I thought this might be bad protocol parsing or slow regex stuff again. I profiled the Python code to see which calls are slow.

me@dev:~# python -m cProfile migrate_imap.py
         742868 function calls (742690 primitive calls) in 210.908 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        [..]
        1    0.068    0.068  210.908  210.908 migrate_imap.py:3(<module>)
    21040    0.110    0.000  207.605    0.010 imaplib.py:1007(_get_line)
        [..]
     5260    0.030    0.000  209.033    0.040 imaplib.py:1068(_simple_command)
        [..]
    21040    0.046    0.000  207.390    0.010 imaplib.py:238(readline)
        [..]
     5256    0.033    0.000  210.169    0.040 imaplib.py:304(append)
        [..]
     5260    0.028    0.000  206.798    0.039 imaplib.py:892(_command_complete)
    21040    0.182    0.000  208.124    0.010 imaplib.py:909(_get_response)
     5260    0.034    0.000  206.748    0.039 imaplib.py:985(_get_tagged_response)
        [..]
    21040    0.238    0.000  207.343    0.010 socket.py:406(readline)
        [..]
    10517  206.906    0.020  206.906    0.020 {method 'recv' of '_socket.socket' objects}

(I deleted all the noise and only left the important stuff in)

So basically socket.recv() is the problem, which means something is taking ages until data is received. With absolutely no clue, I stumbled upon http://bugs.python.org/issue3766. The guy reporting this issue had basically the same problem as me.

So I decided to try out setting TCP_NODELAY to 1.

import socket
imap.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

I reran the Python script and WOW, the run time decreased to 14 seconds. Not as good as Perl, but totally sufficient. So long story short, if you’re doing networking via Python sockets and sending/receiving a large number of small data blocks, you really should consider using TCP_NODELAY on client and server side! This can really boost your socket performance.
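TCP_NODELAY switches off Nagle's algorithm, which otherwise holds back small writes in order to combine them into fewer packets. If you want to check that the option really is set on a socket, you can read it back with getsockopt. Here is a minimal sketch on a fresh, unconnected socket (no IMAP server involved, variable names are just for illustration):

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # Nagle's algorithm is enabled by default, so the option starts at 0.
    before = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)

    # Disable Nagle's algorithm: small writes are sent out immediately.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    after = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
finally:
    sock.close()

print('TCP_NODELAY before: %d' % before)
print('TCP_NODELAY after: %d' % after)
```

The option is per socket, so it has to be set on every new connection, and on both ends if both sides send small blocks.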

For further reading about TCP_NODELAY: http://www.techrepublic.com/article/tcpip-options-for-high-performance-data-transmission/1050878

Btw, I haven’t tried TCP_CORK yet.

Build DEB packages from PyPI with FPM

You may already know FPM?! If not, it’s a Ruby project for building packages like DEB or RPM. Whoever has built a package manually knows it’s a pain. With FPM it’s not. Here is a quick example of how to build a DEB package out of a Python package from PyPI.

Install dependencies

aptitude install python-setuptools python-dev build-essential dpkg-dev libopenssl-ruby ruby1.8-dev rubygems

Install FPM

gem install fpm
ln -s /var/lib/gems/1.8/bin/fpm /usr/local/bin/

Build a package from PyPI

fpm -s python -t deb -m "Your Name" msgpack-python

You can use your own PyPI mirror by passing --python-pypi http://pypi.yourdomain.local/simple.

Done

# ls -l python-msgpack-python_0.1.12_amd64.deb
-rw-r--r-- 1 root src 74644 Apr 18 16:52 python-msgpack-python_0.1.12_amd64.deb

Awesome!

Python, get the reference

Ever wanted to debug code and get the reference count of an object you created?

>>> import sys
>>> a = object()
>>> sys.getrefcount(a)
2

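For all who don’t know what a reference count is: it is the number of references currently pointing to an object. CPython frees an object as soon as its count drops to zero.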

For all who wonder why creating an object and assigning it to variable “a” creates 2 references:

  • 1. reference: getrefcount argument
  • 2. reference: a
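To see the count move, here is a small sketch that adds and removes a second name for the same object. It compares against a baseline instead of absolute numbers, because the absolute value always includes the temporary getrefcount argument.

```python
import sys

a = object()
base = sys.getrefcount(a)      # includes the temporary argument reference

b = a                          # a second name for the same object
with_b = sys.getrefcount(a)

del b                          # remove the second name again
after_del = sys.getrefcount(a)

print(with_b - base)      # 1
print(after_del - base)   # 0
```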

For all who know what a reference count is and like to look into Python GC or play around with it: http://docs.python.org/library/gc.html

Struggling with a circular reference? Check out the weakref module!
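As a rough sketch of how weakref avoids such a cycle: a parent holds its children strongly, while each child only keeps a weak reference back to the parent. The Node class is made up for this example, and the immediate cleanup after del relies on CPython's reference counting.

```python
import gc
import weakref


class Node(object):
    """Hypothetical tree node, only for illustration."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        self.children.append(child)        # strong reference: parent -> child
        child.parent = weakref.ref(self)   # weak reference: child -> parent


root = Node('root')
leaf = Node('leaf')
root.add_child(leaf)

# Calling the weak reference returns the object while it is still alive.
parent_alive = leaf.parent() is root

del root       # the weak back-reference does not keep root alive
gc.collect()   # not needed on CPython, but harmless

parent_gone = leaf.parent() is None

print(parent_alive)   # True
print(parent_gone)    # True
```

Because the back-reference is weak, deleting the last strong reference to the parent is enough to free it, and the child can detect that by getting None from its weak reference.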