BitTorrent bencode decoder in Python in 30 lines of code
BitTorrent uses a quite decent serialization format, called bencoding, for the data in its .torrent files. This page has a pretty good description of it.
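The format has only four types. Integers are written as i<number>e. Strings are written as <byte length>:<bytes>. Lists are l…e, and dictionaries are d…e with string keys in sorted order. A tiny encoder makes this concrete. This is a hedged Python 3 sketch, not code from the post. The function name `bencode` is my own, and it assumes text should be stored as UTF-8 bytes:

```python
def bencode(value):
    """Encode an int, str/bytes, list or dict as bencoded bytes (sketch)."""
    if isinstance(value, int):
        return b"i%de" % value                   # integers: i<decimal>e
    if isinstance(value, str):
        value = value.encode("utf-8")            # assumption: text is stored as UTF-8
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)    # strings: <byte length>:<raw bytes>
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys must be strings, emitted in sorted order of their raw bytes.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value).__name__)
```

For example, `bencode({"spam": ["a", 42]})` gives `b"d4:spaml1:ai42eee"`. Reading that string back is exactly the decoder's job.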
I wanted to parse .torrent files to get information out of them for a project I’m working on, so I played around a bit in Python. Since it was late at night and I was rather bored, I wanted to see how short I could make the decoder without resorting to extremely ugly hacks.
I ended up with a fully functional module that clocks in at 30 lines of code.
class Decoder(object):
    def __init__(self, data): self.data, self.ptr = data, 0
    def _cur(self): return self.data[self.ptr]
    def _get(self, x):
        self.ptr += x
        return self.data[self.ptr-x:self.ptr]
    def _get_int_until(self, c):
        num = int(self._get(self.data.index(c, self.ptr)-self.ptr))
        self._get(1)  # skip the terminating character (":" or "e")
        return num
    def _get_str(self): return self._get(self._get_int_until(":"))
    def _get_int(self): return self._get_int_until("e")
    def decode(self):
        i = self._get(1)
        if i == "d":
            r = {}
            while self._cur() != "e":
                key = self._get_str()
                val = self.decode()
                r[key] = val
            self._get(1)  # consume the closing "e"
        elif i == "l":
            r = []
            while self._cur() != "e": r.append(self.decode())
            self._get(1)  # consume the closing "e"
        elif i == "i": r = self._get_int()
        elif i.isdigit():
            self.ptr -= 1  # rewind: the digit is part of the string's length prefix
            r = self._get_str()
        else:
            raise ValueError("invalid bencode at offset %d" % (self.ptr - 1))
        return r
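The decoder above compares against one-character strings such as "d". That works when the data is a Python 2 byte string, as in the runtime example. In Python 3, a .torrent file read in binary mode gives `bytes`. Comparisons like `i == "d"` are then always false, and indexing `data[i]` returns an int. Here is a hedged sketch of a Python 3 port that keeps the same pointer-based approach. The class `BytesDecoder` and its helper names are my own, not from the post:

```python
class BytesDecoder:
    """Pointer-based bencode decoder for Python 3 bytes (sketch)."""

    def __init__(self, data):
        self.data, self.ptr = data, 0

    def _peek(self):
        # Slice rather than index: data[i] is an int in Python 3, data[i:i+1] is bytes.
        return self.data[self.ptr:self.ptr + 1]

    def _take(self, n):
        chunk = self.data[self.ptr:self.ptr + n]
        if len(chunk) != n:
            raise ValueError("unexpected end of data at offset %d" % self.ptr)
        self.ptr += n
        return chunk

    def _int_until(self, terminator):
        end = self.data.index(terminator, self.ptr)  # ValueError if terminator is missing
        num = int(self.data[self.ptr:end])           # int() accepts ASCII bytes like b"-3"
        self.ptr = end + 1                           # skip the terminator
        return num

    def _string(self):
        return self._take(self._int_until(b":"))

    def decode(self):
        token = self._take(1)
        if token == b"d":
            result = {}
            while self._peek() != b"e":
                key = self._string()
                result[key] = self.decode()
            self.ptr += 1  # consume the closing "e"
        elif token == b"l":
            result = []
            while self._peek() != b"e":
                result.append(self.decode())
            self.ptr += 1  # consume the closing "e"
        elif token == b"i":
            result = self._int_until(b"e")
        elif token.isdigit():
            self.ptr -= 1  # rewind: the digit belongs to the length prefix
            result = self._string()
        else:
            raise ValueError("invalid token %r at offset %d" % (token, self.ptr - 1))
        return result
```

Usage would be `BytesDecoder(open("stuff.torrent", "rb").read()).decode()`. Keys then come back as bytes, for example `meta[b"info"][b"name"]`. Unlike the original, `_take` checks bounds, so truncated input raises `ValueError` instead of silently returning a short slice.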
Here is a run on a random torrent I downloaded for testing (a Python 2 session):
>>> from decode import Decoder
>>> f=open("stuff.torrent")
>>> data=f.read()
>>> d=Decoder(data)
>>> d.decode()
{'comment': '', 'comment.utf-8': '', 'azureus_properties': {'dht_backup_enable': 1}, 'encoding': 'UTF-8',
'creation date': 1185571081, 'info': {'piece length': 32768, 'name': 'SHAVER.mp3', 'private': 0,
'pieces': '\x1ee\xb4\xa0X\xe2 ... I removed characters here since the line is too long ... \xc3v\xc2\x00\xc9',
'length': 229374, 'name.utf-8': 'SHAVER.mp3'}, 'created by': 'Azureus/2.5.0.4',
'announce': 'http://tpb.tracker.thepiratebay.org/announce'}
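To get more out of the decoded structure, here is a small helper. It is a hedged sketch, and the name `piece_hashes` is my own. It works on the 'info' dictionary of a single-file torrent like the one above. It relies on the BitTorrent metainfo format, in which 'pieces' is the concatenation of the 20-byte SHA-1 hashes of each piece. Keys are assumed to be str, as `Decoder` returns them under Python 2. A bytes-based decoder would need `b"pieces"` and so on instead:

```python
def piece_hashes(info):
    """Split a single-file 'info' dict's 'pieces' blob into 20-byte SHA-1 digests."""
    pieces = info["pieces"]
    if len(pieces) % 20:
        raise ValueError("'pieces' length is not a multiple of 20")
    hashes = [pieces[i:i + 20] for i in range(0, len(pieces), 20)]
    # Every piece is 'piece length' bytes except possibly the last, shorter one,
    # so the piece count is the ceiling of length / piece length.
    expected = (info["length"] + info["piece length"] - 1) // info["piece length"]
    if len(hashes) != expected:
        raise ValueError("expected %d piece hashes, found %d" % (expected, len(hashes)))
    return hashes
```

For the file above, length 229374 and piece length 32768 give (229374 + 32767) // 32768 = 7 pieces, so 'pieces' should be 140 bytes long. The last piece holds 229374 - 6 * 32768 = 32766 bytes.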
Hopefully this might be useful to someone.
And yeah, this code is not very “pythonic”, but that word is rather stupid anyway, so… whatever.
If I did a C++ (or whatever) implementation, it would probably look pretty much the same.
I was just looking for a Python bencode implementation, but I can’t find any licensing information on your site.
Is the code on your blog public domain? BSD? GPL? Something else? I’m under the impression that normal copyright law applies unless another license is granted, so I’d like to know before I use the code.
This code (bencoding) is public domain.
There is, however, a bencode.py in the standard BitTorrent source that is a bit easier to read. This was mostly a fun hack, but it works just fine.
Actually, the implementation in BitTorrent is horrible to read; I just rewrote it so I could read it at all :p