Bittorrent bencode decoder in python using 30 lines of code

July 28th, 2007

Bittorrent uses a quite decent algorithm, called bencoding, to serialize the data in its .torrent files. This page has a pretty good description of it.
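For a quick feel of the format: bencoding has four types. Integers are written as i<number>e, byte strings as <length>:<bytes>, lists as l...e and dictionaries as d...e with their keys in sorted order. The sketch below is a small encoder that produces those forms. It is only an illustration and not part of the decoder module; the function name bencode and the choice to encode text as UTF-8 are assumptions made for the example.

```python
def bencode(value):
    """Encode ints, bytes/str, lists and dicts following the bencoding rules."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; bencoding has no boolean type
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        # integer: i<number>e
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        # assumption for this sketch: text is stored as UTF-8 bytes
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # byte string: <length>:<bytes>
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        # list: l<items>e
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # dictionary: d<key><value>...e, keys are byte strings in sorted order
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


print(bencode(42))              # b'i42e'
print(bencode("spam"))          # b'4:spam'
print(bencode(["spam", 7]))     # b'l4:spami7ee'
print(bencode({"name": "SHAVER.mp3", "length": 229374}))
# b'd6:lengthi229374e4:name10:SHAVER.mp3e'
```

Note that every value either starts with a type letter (i, l, d) or with a digit, which is what lets a decoder pick the right branch by looking at a single character.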

I wanted to parse this data to pull information out of .torrent files for a thing I’m working on, so I played around a bit in Python. Since it was late at night and I was rather bored, I wanted to see how short I could get the decoder without writing extremely ugly hacks.
I ended up with a fully functional module that clocks in at 30 lines of code.

class Decoder(object):
    def __init__(self, data): self.data, self.ptr = data, 0
    def _cur(self): return self.data[self.ptr]
    def _get(self, x):
        self.ptr += x
        return self.data[self.ptr-x:self.ptr]
    def _get_int_until(self, c):
        num = int(self._get(self.data.index(c, self.ptr)-self.ptr))
        self._get(1) # kill extra char
        return num
    def _get_str(self): return self._get(self._get_int_until(":"))
    def _get_int(self): return self._get_int_until("e")
    def decode(self):
        i = self._get(1)
        if i == "d":
            r = {}
            while self._cur() != "e":
                key = self._get_str()
                val = self.decode()
                r[key] = val
            self._get(1)
        elif i == "l":
            r = []
            while self._cur() != "e": r.append(self.decode())
            self._get(1)
        elif i == "i": r = self._get_int()
        elif i.isdigit():
            self._get(-1) # reeeeewind
            r = self._get_str()
        return r

Download it here

Runtime example for a random torrent I downloaded to test it on:

>>> from decode import Decoder
>>> f=open("stuff.torrent", "rb")
>>> data=f.read()
>>> d=Decoder(data)
>>> d.decode()
{'comment': '', 'comment.utf-8': '', 'azureus_properties': {'dht_backup_enable': 1}, 'encoding': 'UTF-8',
    'creation date': 1185571081, 'info': {'piece length': 32768, 'name': 'SHAVER.mp3', 'private': 0,
    'pieces': '\x1ee\xb4\xa0X\xe2 ... I removed characters here since the line is too long ... \xc3v\xc2\x00\xc9',
    'length': 229374, 'name.utf-8': 'SHAVER.mp3'}, 'created by': 'Azureus/2.5.0.4',
    'announce': 'http://tpb.tracker.thepiratebay.org/announce'}
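The class above compares single characters of a Python 2 string. On Python 3 a .torrent file opened in binary mode yields bytes, and indexing bytes gives integers, so those comparisons would not match. A minimal sketch of the same recursive idea for Python 3 bytes could look like the following. The function name bdecode, its (value, position) return convention and the sample data are assumptions made for this example, not part of the original module.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from `data` (bytes), starting at `pos`.

    Returns a tuple (value, position just past the decoded value).
    """
    lead = data[pos:pos + 1]  # slicing keeps this as bytes, not an int
    if lead == b"i":
        # integer: i<number>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        # list: decode items until the closing "e"
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        # dictionary: alternating keys and values until the closing "e"
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            val, pos = bdecode(data, pos)
            result[key] = val
        return result, pos + 1
    if lead.isdigit():
        # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        start = colon + 1
        end = start + int(data[pos:colon])
        return data[start:end], end
    # also reached on truncated input, where `lead` is empty
    raise ValueError("unexpected data at offset %d" % pos)


# Hand-written sample data, made up for this example
sample = b"d4:infod6:lengthi229374e4:name10:SHAVER.mp3e4:listl4:spami-3eee"
value, end = bdecode(sample)
print(value)
print(end == len(sample))  # True when the whole input was consumed
```

Strings come back as bytes here, which suits fields like 'pieces' that hold raw hash data. Text fields can be decoded by the caller as needed.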

Hopefully this might be useful to someone.

And yeah, this code is not very “pythonic” but that word is rather stupid anyways so… whatever :)
If I did a C++ (or whatever) implementation it would probably look pretty much the same.

buffi Programming & scripting, Python

  1. Andrew
    August 2nd, 2007 at 21:57 | #1

    I was just looking for a python bencode implementation, but I can’t find any licensing information on your site.
    Is the code on your blog public domain? BSD? GPL? Something else? I’m under the impression that normal copyright law applies unless another license is granted, so I’d like to know before I use the code. :)

  2. August 3rd, 2007 at 09:47 | #2

    This code (bencoding) is public domain.
    There is however a bencode.py in the standard bittorrent source that is a bit easier to read. This was mostly a fun hack, but works just fine :)

  3. Discerer
    November 3rd, 2007 at 11:22 | #3

    actually the implementation in bittorrent is horrible to read, I just rewrote it to be able to read it at all :p
