Python TCP socket performance tweak on Linux

Short

Setting the socket option TCP_NODELAY=1 increases performance big time if you’re sending lots of small blocks of data over TCP (socket.IPPROTO_TCP).

Long

Over at abusix I started a project using IMAP. For connecting to an IMAP server in Python there is basically only imaplib and a few high-level libs that wrap around imaplib.

In my first tests importing our old email storage, I started with a very small batch of 5000 emails, appending them to the IMAP INBOX.

import imaplib
import os

imap = imaplib.IMAP4('192.168.0.1', 143)
(status, msg) = imap.login('mail', 'testpassword')

if status == 'OK':
    imap.create('Archive')

    mail_dir = '/root/mails/'
    for name in os.listdir(mail_dir):
        # Read each stored message as raw bytes
        with open(os.path.join(mail_dir, name), 'rb') as fd:
            mail = fd.read()

        # One APPEND command per message
        imap.append('INBOX', None, None, mail)

Importing 5000 mails by calling append for every single email resulted in a run time of 210 seconds, which is 23.8 messages/sec. This is slow. I checked the IMAP server config, I/O and CPU load. All fine. To find out whether the program or the server configuration was the issue, I wrote the exact same script in Perl using Mail::IMAPClient. Running the Perl script with the same amount of data, on the same server, resulted in a run time of 7.9 seconds. Wtf? That is about 632 messages/sec, which is good and the kind of result I was aiming for with Python.

So I checked the IMAP protocol calls generated by Perl and Python to see if Perl was maybe using multi-appends or something different, but there wasn’t any difference. Since the email parser of Python is damn slow compared to the Perl parsers out there, I thought this was maybe bad protocol parsing or slow regex stuff again. I profiled the Python code to see which calls are slow.

me@dev:~# python -m cProfile migrate_imap.py
         742868 function calls (742690 primitive calls) in 210.908 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        [..]
        1    0.068    0.068  210.908  210.908 migrate_imap.py:3(<module>)
    21040    0.110    0.000  207.605    0.010 imaplib.py:1007(_get_line)
        [..]
     5260    0.030    0.000  209.033    0.040 imaplib.py:1068(_simple_command)
        [..]
    21040    0.046    0.000  207.390    0.010 imaplib.py:238(readline)
        [..]
     5256    0.033    0.000  210.169    0.040 imaplib.py:304(append)
        [..]
     5260    0.028    0.000  206.798    0.039 imaplib.py:892(_command_complete)
    21040    0.182    0.000  208.124    0.010 imaplib.py:909(_get_response)
     5260    0.034    0.000  206.748    0.039 imaplib.py:985(_get_tagged_response)
        [..]
    21040    0.238    0.000  207.343    0.010 socket.py:406(readline)
        [..]
    10517  206.906    0.020  206.906    0.020 {method 'recv' of '_socket.socket' objects}

(I deleted all the jitter and only left the important stuff in)
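The listing above is ordered by name, so the hot spot has to be picked out by eye. A minimal sketch of producing the same kind of report sorted by internal time, using the standard cProfile and pstats modules. The functions slow_call and run_import are hypothetical stand-ins for the blocking recv() and the append loop; they do no network I/O.

```python
import cProfile
import io
import pstats
import time


def slow_call():
    # Hypothetical stand-in for a blocking socket.recv() call.
    time.sleep(0.01)


def run_import(n=5):
    # Hypothetical stand-in for the append loop in migrate_imap.py.
    for _ in range(n):
        slow_call()


def profile_top(func, limit=5):
    """Profile func() and return a text report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("tottime").print_stats(limit)
    return out.getvalue()


report = profile_top(run_import)
print(report)
```

On the command line, python -m cProfile -s tottime migrate_imap.py should give a comparable ordering without changing the script.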

So basically socket.recv() is the problem, which means something is taking ages until data is received. Without a clue about the cause, I stumbled upon http://bugs.python.org/issue3766. The guy reporting this issue had basically the same problem as me.

So I decided to try out setting TCP_NODELAY to 1.

import socket
imap.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

I reran the Python script and WOW, the run time decreased to 14 seconds. Not as good as Perl, but totally sufficient. So long story short: if you’re doing networking via Python sockets and sending/receiving a large number of small data blocks, you really should consider using TCP_NODELAY on the client and server side! This can really boost your socket performance.
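A minimal sketch of what setting the option on both sides could look like, using a throwaway echo server on the loopback interface. The helper enable_nodelay and the echo_once server are hypothetical names made up for this example; only the setsockopt call itself comes from the fix above. TCP_NODELAY disables Nagle's algorithm, which otherwise holds back small writes.

```python
import socket
import threading


def enable_nodelay(sock):
    """Set TCP_NODELAY on a TCP socket and report whether it is active."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    # getsockopt returns a non-zero value when the option is set.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


def echo_once(listener, results):
    # Server side: set the option on the accepted connection.
    conn, _addr = listener.accept()
    with conn:
        results["server_nodelay"] = enable_nodelay(conn)
        conn.sendall(conn.recv(1024))


listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
listener.listen(1)
results = {}
server = threading.Thread(target=echo_once, args=(listener, results))
server.start()

# Client side: set the option right after connecting.
client = socket.create_connection(listener.getsockname())
results["client_nodelay"] = enable_nodelay(client)
client.sendall(b"ping")
results["reply"] = client.recv(1024)
client.close()
server.join()
listener.close()
print(results)
```

With imaplib the client half reduces to calling the helper on imap.sock, as in the one-liner above; the server half only applies if you control the server code.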

For further reading about TCP_NODELAY: http://www.techrepublic.com/article/tcpip-options-for-high-performance-data-transmission/1050878
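For reference, the run times quoted in this post work out to the following throughput. This is plain arithmetic on the numbers above (5000 messages; 210, 7.9 and 14 seconds); the dictionary keys are just labels chosen for this sketch.

```python
MESSAGES = 5000

# Run times in seconds, as reported in this post.
run_times = {
    "python_default": 210.0,
    "perl": 7.9,
    "python_nodelay": 14.0,
}

# Throughput in messages per second for each run.
rates = {name: MESSAGES / seconds for name, seconds in run_times.items()}
for name, rate in rates.items():
    print("%-15s %6.1f messages/sec" % (name, rate))

# How much faster the Python script got with TCP_NODELAY.
speedup = run_times["python_default"] / run_times["python_nodelay"]
print("TCP_NODELAY speedup: %.1fx" % speedup)
```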

Btw, I haven’t tried TCP_CORK yet.

4 thoughts on “Python TCP socket performance tweak on Linux”

  1. Maybe it uses smaller socket buffer size? That can have significant effect on send latency.

  2. The Perl IMAP lib I used is using non blocking IO, which can lead to better performance. The buffer size is 4k which is pretty common. Python imaplib is restricted to blocking IO, because it is using socket.makefile() with default buffer size, which is -1 (read full buffer). I think network wise, imaplib could be better.

  3. You can also speak IMAP – either client or server – with Twisted. There’s even a nice short example of using the Twisted IMAP client on the front page of http://twistedmatrix.com/ as of a few days ago. Like the perl library in question, Twisted uses non-blocking I/O (although I wouldn’t make any predictions about its performance – give it a try and see how it does for you).
