The big alternatives to Python datetime all share similar goals: ease of use, simplicity and intelligent, user-friendly API design. Those are awesome goals, and I love those libraries for investing a lot of effort to achieve them. What I wanted to find out was how much impact using libraries like Arrow, Pendulum or Delorean has on the performance of the code you’re writing.
Why bother about the performance of your date-time library? Well, date-times are everywhere. Think of web services, databases, data processing and many more performance-critical applications that use date-times a lot. When you invest a lot of time in metrics like requests per second and your performance drops by half the moment you introduce a new library, you might want to use something different.
Everyone now thinking “Why use Python when you want something fast?” has clearly never heard of projects like PyPy and is not aware of what kind of performance you can achieve with Python nowadays.
So I set up a benchmark, which can be found here, to compare Python datetime, Arrow, Pendulum, Delorean and udatetime on a performance level. I picked six typical performance-critical operations to measure the speed of those libraries:
- Decode a date-time string
- Encode (serialize) a date-time object to a string
- Instantiate object with current time in UTC
- Instantiate object with current time in local timezone
- Instantiate object from timestamp in UTC
- Instantiate object from timestamp in local timezone
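To make the six operations concrete, here is a minimal sketch of what each one could look like with nothing but the standard library datetime. The sample string, the format string, the timestamp and the function names are my own assumptions for illustration; they are not taken from the benchmark code.

```python
from datetime import datetime, timezone

# Hypothetical inputs, chosen only for illustration.
SAMPLE = "2016-07-15T12:33:20.123000+0200"
FMT = "%Y-%m-%dT%H:%M:%S.%f%z"
TIMESTAMP = 1468578800


def decode(text=SAMPLE):
    # Decode a date-time string into a timezone-aware datetime.
    return datetime.strptime(text, FMT)


def encode(dt):
    # Encode (serialize) a datetime object to an ISO 8601 string.
    return dt.isoformat()


def now_utc():
    # Instantiate object with current time in UTC.
    return datetime.now(timezone.utc)


def now_local():
    # Instantiate object with current time in the local timezone.
    return datetime.now().astimezone()


def from_timestamp_utc(ts=TIMESTAMP):
    # Instantiate object from a timestamp in UTC.
    return datetime.fromtimestamp(ts, timezone.utc)


def from_timestamp_local(ts=TIMESTAMP):
    # Instantiate object from a timestamp in the local timezone.
    return datetime.fromtimestamp(ts).astimezone()


if __name__ == "__main__":
    print(encode(decode()))
    print(encode(from_timestamp_utc()))
```

Each of the other libraries offers its own equivalents for these calls; the sketch only shows the shape of the work being measured.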
To be fair, performance is the main goal of udatetime, and I picked it to illustrate what’s easily possible performance-wise.
The benchmark was run on Python 2.7, PyPy and Python 3.5. As usual, I ran 1 million executions per benchmark and picked the minimum of 3 repeats.
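A measurement like this can be sketched with the standard library `timeit` module. This is not the actual benchmark harness, just an assumption of how "1 million executions, min of 3 repeats" might be expressed; `bench` and `now_utc` are hypothetical names.

```python
import timeit
from datetime import datetime, timezone


def bench(func, number=1000000, repeat=3):
    """Return the best (minimum) total time in seconds over `repeat` runs.

    Each run calls `func` `number` times. The defaults mirror the setup
    described in the text: 1 million executions, min of 3 repeats.
    """
    timings = timeit.repeat(func, number=number, repeat=repeat)
    return min(timings)


def now_utc():
    # Hypothetical operation under test: current time in UTC (stdlib only).
    return datetime.now(timezone.utc)


if __name__ == "__main__":
    # A smaller count keeps this demo quick; omit `number` for the full
    # 1 million executions.
    best = bench(now_utc, number=100000)
    print("now_utc: %.4f s for 100000 calls (best of 3)" % best)
```

Taking the minimum rather than the mean is the usual choice for micro-benchmarks, because slower runs mostly reflect interference from other processes rather than the code under test.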
As you can see, regardless of which interpreter you use, the benchmark yields the same results. udatetime is the fastest library, and it is especially fast on PyPy. Overall, Python datetime is 3 times slower, Arrow is 10 times slower, Delorean is 13 times slower and Pendulum is 18 times slower than udatetime.
If we leave out the results of Python 2.7 and Python 3.5 and look only at PyPy, where all libraries are compared on a pure Python level because udatetime has no C-level optimization for PyPy, the results are: Python datetime is 5 times slower, Arrow is 12 times slower, Delorean is 13 times slower and Pendulum is 18 times slower than udatetime.
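The “N times slower” figures are ratios against the fastest library. A minimal sketch of that arithmetic, assuming each library's best timing is simply divided by the fastest one; the helper name is hypothetical and the timings are placeholder values, not measurements from the benchmark.

```python
def relative_slowdown(timings):
    """Map each library name to its slowdown factor versus the fastest one."""
    fastest = min(timings.values())
    return {name: seconds / fastest for name, seconds in timings.items()}


if __name__ == "__main__":
    # Placeholder timings in seconds, NOT results from the benchmark.
    example = {"lib_a": 0.5, "lib_b": 1.5, "lib_c": 5.0}
    factors = relative_slowdown(example)
    for name in sorted(factors, key=factors.get):
        print("%s: %.1fx" % (name, factors[name]))
```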
Some people say the effort you invest in performance optimization must yield at least a 10-fold improvement to be worthwhile. I think the results speak for themselves in that regard.
What was really alarming were the benchmark results for decoding (parsing). I expected libraries like Arrow and Pendulum to be a little slower than Python datetime because they try several different formats, but I didn’t expect these results.
If you parse date-times a lot and you’re using Arrow, Pendulum or Delorean, you should seriously consider using something different for that task if performance is important. Python datetime is 4 times slower, Arrow 16 times slower, Pendulum 19 times slower and Delorean an awesome 20 times slower than udatetime. All of them use dateutil in some way underneath, which explains why they show similarly bad results.
Here are the detailed results for Python 2.7 and Python 3.5.
I hope I could illustrate the impact of your library choice in general. You don’t need to reinvent the wheel, but sometimes you need to think about what kind of wheels you’re using.