Archive for July, 2007

BitTorrent bencode decoder in Python in 30 lines of code

July 28th, 2007

BitTorrent uses a quite decent serialization format called bencoding for the data in its .torrent files. This page has a pretty good description of it.
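For anyone who has not seen the format before, here is a rough sketch of what the four bencode types look like on the wire. It is written for Python 3 and is only an illustration; the function name `bencode` here is my own stand-in, not the one from BitTorrent's bencode.py.

```python
def bencode(value):
    """Encode ints, byte strings, lists and dicts as bencoded bytes."""
    if isinstance(value, int):
        # Integers: "i", the decimal digits, then "e".
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, bytes):
        # Strings: the length in decimal, a colon, then the raw bytes.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        # Lists: "l", the encoded items back to back, then "e".
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Dictionaries: "d", key/value pairs sorted by key, then "e".
        pairs = sorted(value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


print(bencode(42))                 # b'i42e'
print(bencode(b"spam"))            # b'4:spam'
print(bencode([b"spam", 42]))      # b'l4:spami42ee'
print(bencode({b"cow": b"moo"}))   # b'd3:cow3:mooe'
```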

I wanted to be able to parse this data to get information out of it for a thing I’m working on, so I played around a bit in Python. Since it was late at night and I was rather bored, I wanted to see how short I could get the decoder without writing extremely ugly hacks.
I ended up with a fully functional module that clocks in at 30 lines of code.

class Decoder(object):
    def __init__(self, data): self.data, self.ptr = data, 0
    def _cur(self): return self.data[self.ptr]
    def _get(self, x):
        self.ptr += x
        return self.data[self.ptr-x:self.ptr]
    def _get_int_until(self, c):
        num = int(self._get(self.data.index(c, self.ptr)-self.ptr))
        self._get(1) # kill extra char
        return num
    def _get_str(self): return self._get(self._get_int_until(":"))
    def _get_int(self): return self._get_int_until("e")
    def decode(self):
        i = self._get(1)
        if i == "d":
            r = {}
            while self._cur() != "e":
                key = self._get_str()
                val = self.decode()
                r[key] = val
            self._get(1)
        elif i == "l":
            r = []
            while self._cur() != "e": r.append(self.decode())
            self._get(1)
        elif i == "i": r = self._get_int()
        elif i.isdigit():
            self._get(-1) # reeeeewind
            r = self._get_str()
        return r

Download it here

Runtime example for a random torrent I downloaded to test it on (the file is opened in binary mode so the raw bytes come through untouched):

>>> from decode import Decoder
>>> f=open("stuff.torrent", "rb")
>>> data=f.read()
>>> d=Decoder(data)
>>> d.decode()
{'comment': '', 'comment.utf-8': '', 'azureus_properties': {'dht_backup_enable': 1}, 'encoding': 'UTF-8',
    'creation date': 1185571081, 'info': {'piece length': 32768, 'name': 'SHAVER.mp3', 'private': 0,
    'pieces': '\x1ee\xb4\xa0X\xe2 ... I removed characters here since the line is too long ... \xc3v\xc2\x00\xc9',
    'length': 229374, 'name.utf-8': 'SHAVER.mp3'}, 'created by': 'Azureus/2.5.0.4',
    'announce': 'http://tpb.tracker.thepiratebay.org/announce'}
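The class above is Python 2 code, where the contents of a file are a plain str. On Python 3 a file read in binary mode gives you bytes, so here is a rough sketch of the same idea for that case. It is not a drop-in replacement for the module above; the name `bdecode_bytes` is made up for this example, and keys and strings come back as bytes.

```python
def bdecode_bytes(data, pos=0):
    """Decode one bencoded value from `data` starting at offset `pos`.

    Returns a (value, next_pos) tuple.
    """
    lead = data[pos:pos + 1]
    if lead == b"i":
        # Integer: digits run until the closing "e".
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode_bytes(data, pos)
            items.append(item)
        return items, pos + 1  # skip the closing "e"
    if lead == b"d":
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode_bytes(data, pos)
            value, pos = bdecode_bytes(data, pos)
            result[key] = value
        return result, pos + 1  # skip the closing "e"
    if lead.isdigit():
        # String: "<length>:<bytes>".
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("unexpected byte at offset %d" % pos)


value, _ = bdecode_bytes(b"d3:cow3:mooe")
print(value)  # {b'cow': b'moo'}
```

Unlike the 30-line version, this one raises ValueError when it hits a byte it does not understand, which costs a line but makes bad input easier to spot.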

Hopefully this might be useful to someone.

And yeah, this code is not very “pythonic” but that word is rather stupid anyways so… whatever :)
If I did a C++ (or whatever) implementation it would probably look pretty much the same.

buffi Programming & scripting, Python

Modifying Django to add a BlobField for storing binary data in MySQL

July 25th, 2007

Like I mentioned in my last post, I’ve been playing around a bit with XBT Tracker, which uses a blob field for storing a binary hash.

I want to build a site around this using Django, but for some reason there is no BlobField available in the model API, so I had to make my own. I’m posting how to do it here in case anyone else is interested, and also to keep a note to myself of how I did it for later use (there is no BigInteger in Django either, I believe, so that’s next).

I chose to ignore portability for this, since I only use MySQL, have zero experience with the other databases, and this was really just a fix for my own site. Hopefully Django will get a BigIntegerField and BlobField in a release eventually.

Anyways… to start adding the BlobField, open the file “creation.py” in the django/db/backends/mysql folder. In my case

/usr/lib/python2.4/site-packages/django/db/backends/mysql/creation.py

In the list of DATA_TYPES, add

'BlobField':         'blob',

Then open the file “introspection.py” in the same folder. In my case

/usr/lib/python2.4/site-packages/django/db/backends/mysql/introspection.py

Change

FIELD_TYPE.BLOB: 'TextField',

to

FIELD_TYPE.BLOB: 'BlobField',

Then finally open __init__.py in django/db/models/fields. In my case

/usr/lib/python2.4/site-packages/django/db/models/fields/__init__.py

Then simply copy the TextField class, put the copy below the original and rename it BlobField.
It should look something like this:

class BlobField(Field):
    def get_manipulator_field_objs(self):
        return [oldforms.LargeTextField]

    def formfield(self, **kwargs):
        defaults = {'required': not self.blank, 'widget': forms.Textarea,
                   'label': capfirst(self.verbose_name), 'help_text': self.help_text}
        defaults.update(kwargs)
        return forms.CharField(**defaults)

This should make the admin panel use the same widgets and so on as TextField, which hopefully should be fine. You probably won’t use the admin panel for a BlobField anyways :)
There might be some issues with this, since TextField uses some validators that you don’t want here, but like I said… the chances that you will insert binary data through the admin interface are rather low. Feel free to drop me a comment if this part does not work.

Finally you might want to remove all compiled Python files (.pyc) since they won’t be updated. Unsure if this is needed, but… well I did it and it worked for me :P

buffi@jenet:~$ cd /usr/lib/python2.4/site-packages/django/ #path to your django install
buffi@jenet:/usr/lib/python2.4/site-packages/django$ find ./ -name "*.pyc" -print | xargs rm
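
If you would rather do the cleanup from Python than from the shell, something like the following sketch should do the same job. It assumes Python 3 (for pathlib), and `remove_pyc` is just a name I picked for the example. Point it at your own Django install path.

```python
from pathlib import Path


def remove_pyc(root):
    """Delete every *.pyc file below `root` and return the removed paths."""
    removed = []
    # Collect the matches first so nothing is deleted while the tree is being walked.
    for path in list(Path(root).rglob("*.pyc")):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


# Example call (path taken from the shell session above, adjust to your install):
# remove_pyc("/usr/lib/python2.4/site-packages/django/")
```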

Ok… time to try out making a model with it.

Sample model:

from django.db import models

class MyModel(models.Model):
    my_field = models.BlobField()

And then let’s test it…

buffi@jenet:~/site/testblob$ python manage.py validate
0 errors found.
buffi@jenet:~/site/testblob$ python manage.py sqlall testapp
BEGIN;
CREATE TABLE `testapp_mymodel` (
    `id` integer AUTO_INCREMENT NOT NULL PRIMARY KEY,
    `my_field` blob NOT NULL
);
COMMIT;

Looks good!
Doing a

python manage.py syncdb

works just fine as well. Then all that’s left is actually using it…

$ python manage.py shell
Python 2.4.4 (#2, Apr  5 2007, 20:11:18)
[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from testapp import models
>>> binary_stuff = '\x00\x01\x02'
>>> m = models.MyModel(my_field=binary_stuff)
>>> m.save()
>>>

Whoaa, it works!
Checking the mysql database confirms that it works…

So yeah, it’s not platform independent or anything, but if you want to use blob fields in MySQL then this might be for you ;)

buffi Programming & scripting, Python

Getting a torrent’s info_hash for XBT Tracker using Python

July 25th, 2007

I’ve been playing around with XBT Tracker for a small project I’m considering and needed a script for getting the info_hash from torrent files to use in the xbt_files table.

Since I read through the SourceForge forums and it seems like I wasn’t alone in having a few issues at first, I might as well put the script here in case someone else needs it.
It is based on btshowmetainfo.py from the original BitTorrent source code, with all of the unneeded stuff stripped out.

It simply takes a torrent filename as an argument and then writes the info_hash of that torrent to stdout as the raw 20-byte SHA-1 digest (not hex).

Code:

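#!/usr/bin/env python
# Writes the raw info_hash of a torrent file to stdout (Python 2).

import sys
from sha import sha                    # SHA-1
from bencode import bencode, bdecode   # bencode.py from BitTorrent

if len(sys.argv) != 2:
    # Errors go to stderr so they never end up in a redirected hash file.
    print >>sys.stderr, "ERROR: use ./hash_info.py filename"
    sys.exit(2)

filename = sys.argv[1]

# Read the whole file in binary mode and close it before decoding.
metainfo_file = open(filename, 'rb')
data = metainfo_file.read()
metainfo_file.close()

try:
    metainfo = bdecode(data)
except ValueError:
    print >>sys.stderr, "ERROR: Not a valid torrent file"
    sys.exit(1)

# The info_hash is the SHA-1 of the bencoded 'info' dictionary.
info = metainfo['info']
info_hash = sha(bencode(info))

sys.stdout.write(info_hash.digest())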

Sample usage:

./hash_info.py /home/buffi/wow_a_torrent.torrent > wow_a_torrent.hash
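
The file written this way holds raw bytes, so it will look like garbage in a text editor. As a rough illustration of what is going on, here is a small Python 3 sketch using hashlib. The bencoded dictionary in it is a made-up, incomplete info dictionary (it has no 'pieces' entry), and the function name `info_hash` is my own; it only shows that the hash is the SHA-1 of the bencoded info dictionary and how the raw digest relates to the hex form you often see elsewhere.

```python
import hashlib


def info_hash(bencoded_info):
    """Return the raw 20-byte SHA-1 digest of a bencoded info dictionary."""
    return hashlib.sha1(bencoded_info).digest()


# Made-up sample data, NOT a complete info dictionary from a real torrent.
sample_info = b"d6:lengthi229374e4:name10:SHAVER.mp312:piece lengthi32768ee"

digest = info_hash(sample_info)
print(len(digest))        # 20 raw bytes, the form the script above writes out
print(len(digest.hex()))  # 40 hex characters, the human-readable form
```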

download the script

The only dependency is bencode.py from BitTorrent. Download it and put it in the same folder as this script.
I also have bencode.py mirrored here

buffi Programming & scripting, Python

buffis.com back online

July 25th, 2007

Got myself a new server for buffis.com, since the old one died and I want to keep WordPress away from my “real” server since I don’t really trust its security.
2.80 GHz of pleasure with 1 GB of RAM should be nice enough for playing around a bit, and it was free. Free is nice.

buffi Uninteresting