BitTorrent bencode decoder in Python in 30 lines of code
BitTorrent uses a quite decent serialization format called bencoding for the data in its .torrent files. This page has a pretty good description of it.
I wanted to be able to parse this data to get information out of it for a thing I’m working on, so I played around a bit in Python. Since it was late at night and I was rather bored, I wanted to see how short I could get the decoder without writing extremely ugly hacks.
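For a quick feel of the format: integers are written as i<digits>e, byte strings as <length>:<bytes>, lists as l...e and dictionaries as d...e with sorted string keys. The encoder below is only an illustrative sketch and not part of the original module; the name bencode is made up for this example, and it assumes Python 3 with byte strings as input.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts as bencoded bytes (sketch)."""
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and are written in sorted order.
        items = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


print(bencode(42))                                   # b'i42e'
print(bencode(b"spam"))                              # b'4:spam'
print(bencode([b"spam", 42]))                        # b'l4:spami42ee'
print(bencode({b"spam": b"eggs", b"cow": b"moo"}))   # b'd3:cow3:moo4:spam4:eggse'
```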
I ended up with a fully functional module that clocks in at 30 lines of code.
class Decoder(object):
    def __init__(self, data): self.data, self.ptr = data, 0
    def _cur(self): return self.data[self.ptr]
    def _get(self, x):
        self.ptr += x
        return self.data[self.ptr-x:self.ptr]
    def _get_int_until(self, c):
        num = int(self._get(self.data.index(c, self.ptr)-self.ptr))
        self._get(1) # kill extra char
        return num
    def _get_str(self): return self._get(self._get_int_until(":"))
    def _get_int(self): return self._get_int_until("e")
    def decode(self):
        i = self._get(1)
        if i == "d":
            r = {}
            while self._cur() != "e":
                key = self._get_str()
                val = self.decode()
                r[key] = val
            self._get(1)
        elif i == "l":
            r = []
            while self._cur() != "e": r.append(self.decode())
            self._get(1)
        elif i == "i": r = self._get_int()
        elif i.isdigit():
            self._get(-1) # reeeeewind
            r = self._get_str()
        return r
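The class above compares against plain str values, which fits Python 2, where the contents of a file are an ordinary string. Under Python 3 a .torrent file read in binary mode gives bytes instead, so the comparisons would need to change. A rough sketch of the same recursive idea for bytes input might look like the following; the function name bdecode and its (value, next_position) return convention are assumptions made for this example, not the original module's API.

```python
def bdecode(data, pos=0):
    """Decode one bencoded value from bytes, returning (value, next_pos)."""
    head = data[pos:pos + 1]          # slice so we get bytes, not an int
    if head == b"i":
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if head == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1         # skip the closing "e"
    if head == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            val, pos = bdecode(data, pos)
            result[key] = val
        return result, pos + 1        # skip the closing "e"
    if head.isdigit():
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    # Unknown type marker or truncated input.
    raise ValueError("unexpected data at offset %d" % pos)


value, end = bdecode(b"d3:cow3:moo4:spaml1:ai-3eee")
print(value)  # {b'cow': b'moo', b'spam': [b'a', -3]}
```

Unlike the 30-line version, this sketch raises ValueError on an unknown type marker instead of falling through, which costs a line but makes malformed input easier to spot.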
Example run on a random torrent I downloaded to test it on:
>>> from decode import Decoder
>>> f=open("stuff.torrent", "rb")
>>> data=f.read()
>>> d=Decoder(data)
>>> d.decode()
{'comment': '', 'comment.utf-8': '', 'azureus_properties': {'dht_backup_enable': 1}, 'encoding': 'UTF-8',
'creation date': 1185571081, 'info': {'piece length': 32768, 'name': 'SHAVER.mp3', 'private': 0,
'pieces': '\x1ee\xb4\xa0X\xe2 ... I removed characters here since the line is too long ... \xc3v\xc2\x00\xc9',
'length': 229374, 'name.utf-8': 'SHAVER.mp3'}, 'created by': 'Azureus/2.5.0.4',
'announce': 'http://tpb.tracker.thepiratebay.org/announce'}
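As a small example of getting information out of the decoded dictionary: in the BitTorrent format the 'pieces' value is a concatenation of 20-byte SHA-1 hashes, one per piece, so the number of pieces can be cross-checked against 'length' and 'piece length'. The sketch below uses the numbers from the output above and assumes a single-file torrent (one with a top-level 'length' in 'info'); the helper name expected_piece_count is made up for illustration.

```python
def expected_piece_count(info):
    """Pieces needed for a single-file torrent, using ceiling division."""
    length = info["length"]
    piece_length = info["piece length"]
    return (length + piece_length - 1) // piece_length


# Values copied from the decoded output shown above.
info = {"length": 229374, "piece length": 32768}
count = expected_piece_count(info)
print(count)       # 7
print(count * 20)  # 140, the expected byte length of info["pieces"]
```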
Hopefully this might be useful to someone.
And yeah, this code is not very “pythonic” but that word is rather stupid anyways so… whatever
If I did a C++ (or whatever) implementation it would probably look pretty much the same.