Archive

Archive for the ‘Python’ Category

Source code for Geektorrent (Django based BitTorrent tracker) released

January 3rd, 2010

A little over two years ago, I wrote the first (as far as I know) BitTorrent tracker implemented in Django. I called it Geektorrent, and it was mostly a proof of concept and I didn’t really care about maintaining it.

The idea was always to release the source code freely, but I never got around to cleaning it up until now. I named it buffis-tracker since I’d like to avoid naming collisions with other Django based trackers (this one is the only other one I could find now though).

I’ve stripped down the code to make a rather minimalistic tracker implementation. It does however support basic stuff such as torrent categories, user ratios and tagging. Registration is handled through Django’s authentication framework, which means that it should fit nicely as an extension to preexisting applications (tried on djangobb).

The project home page can be found at Google Code.

Here’s a demo of it running. I don’t really want to monitor the demo site though, so user registration and uploads are disabled. I just uploaded some random stuff I had lying around on my hard drive.

This is still just a very small pet project of mine and it probably has some bugs, but it should be pretty nice if you want to get started writing BitTorrent based code in Django.

buffi Programming & scripting, Python, Web development

Job GET! (and some short book reviews)

May 17th, 2009

Like I mentioned in a previous post, I got my Master’s degree earlier this year. Since then I’ve been taking some courses at the University while looking for a nice place to work. In other words, I’ve mostly been slacking around :)

About two months ago I decided to browse through a Swedish job database and found an opening as a programmer at Interactive Institute: Sonic Studio which seemed quite cool. I sent in my CV, went on some interviews and finally signed the hiring papers about 1-2 weeks ago. My first day on the job is about a week from now, and I’m really looking forward to it.

Since Sonic Studio is located in Piteå, which is about 50km from Luleå (where I live) by car, I’m currently browsing for a decent used car to purchase. This sucks quite a lot since my knowledge of cars is pretty much limited to knowing where the motors and wheels are positioned, but hopefully I’ll be able to work this out since the other option is taking the bus, which leaves me with about thirty minutes less sleep in the mornings.

Otherwise I’ve done a bunch of small stuff during the last few months. Among other things, I started coding on a game-engine for shoot-em-up development since I’ve been wanting to build one of those for some time, and I felt the need to brush up on my C++ after doing a bunch of Java and Python development. This is pretty much just a pet-project that I’ve been playing around with occasionally when I get bored. An early proof of concept video is available here in case someone would be interested in that.

I’ve also been reading a bunch of nerd-literature. Since I’ve played around a bit with C++ again, I finally decided to read through Thinking in C++ which helped me brush up on my syntax, among other things. In my opinion, it is a pretty great book for learning and improving one’s skills in C++ but I haven’t really read any other books about the language to compare it to.

When I finished up that book, I started with The Pragmatic Programmer which has been on my “to read list” for some time. The book is essentially a list of tips for developing software, where some of them might seem obvious but others actually force you to rethink your development process. One of them which I know that I’ve made myself guilty of occasionally is the tip they call “Leave no broken windows”. The real world example given in the book is a car that is left in the streets. The car can be left alone for quite some time but once the first window is destroyed, the rest of the car is usually plundered and vandalized within a short span of time. The same holds true for software, and I have personally ignored small bugs only to find them come back with friends later on to torture me. Once you start compromising your code quality, it usually snowballs and gets worse quite fast.

It touches a lot on the non-programming aspects of development as well, such as making sure that you gather the correct requirements for your project and use proper tools such as revision control. Of course there are a lot of other great tips in the book as well, and I’d recommend it to pretty much anyone who is involved with development of software.

Finally I’ve read a new book on lighttpd. As I’ve written previously in the blog, I’ve migrated most of my web-stuff from Apache to lighttpd and I couldn’t be more pleased. Setting up lighttpd is a lot easier than setting up Apache, so you won’t really need a book for it, but I’d definitely recommend it to anyone who either wants a good guide for setting it all up or just wants to tune an existing installation. It covers a wide range of topics including setup, encryption, DOS-protection and migration from other web servers. All in all, it isn’t an essential book for anyone who wants to run a lighttpd installation but I’d recommend it since it beats the online resources in readability.

buffi Programming & scripting, Python, Uninteresting, Web development

A nasty bug

April 27th, 2009

About a month ago a pretty nasty bug appeared on my image hosting service pici.se. For some reason, some people started noticing that their thumbnails had been changed to pictures which didn’t belong to them. This was a rather serious issue, and I had a look at the code to figure out what was going on. I realized that I had recently changed my deployment from Django under Apache to Django under lighttpd and this was surely related to that… but how?

It turned out that it was a combination of bugs that together fucked stuff up for me pretty bad. Since I started allowing pictures over 1MB a while ago, but wanted to limit hotlinking to them, I put them into a separate directory from the other pictures being served. That meant that the large pictures were given filenames such as “large/asSDavXZ.jpg” instead of “asSDavXZ.jpg” where the filename is randomly generated. I had however forgotten to update my code to check for the presence of these images when fetching a new random file name, that is my code looked something like this (pseudo-code):

filename = get_random_stuff()
while file_exists(filename):
  filename = get_random_stuff()

when it should have been changed to this:

filename = get_random_stuff()
while file_exists(filename) or file_exists("large/" + filename):
  filename = get_random_stuff()
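To make the difference between the two loops concrete, here is a minimal runnable sketch (Python 3). Everything in it is an assumption for illustration and not the actual pici.se code: the helper names, the 8-letter names, the temporary directory, and the two identically seeded generators standing in for two FastCGI processes that ended up with the same seed.

```python
import os
import random
import shutil
import string
import tempfile

def random_name(rng, length=8):
    """Hypothetical stand-in for get_random_stuff()."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(length)) + ".jpg"

def pick_free_name(rng, root, check_large=True):
    """Draw names until one is unused; check_large=False mimics the old loop."""
    while True:
        name = random_name(rng)
        taken = os.path.exists(os.path.join(root, name))
        if check_large:
            taken = taken or os.path.exists(os.path.join(root, "large", name))
        if not taken:
            return name

root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "large"))

# A large upload is stored first, by a "process" seeded with 1234.
big_name = pick_free_name(random.Random(1234), root)
open(os.path.join(root, "large", big_name), "w").close()

# A second "process" with the same seed draws the same name first.
buggy_name = pick_free_name(random.Random(1234), root, check_large=False)
fixed_name = pick_free_name(random.Random(1234), root, check_large=True)

shutil.rmtree(root)
```

With the old check the second upload gets the name the large picture already holds, while the corrected check skips it and draws again.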

When I saw this, the first thing that came to mind was that this shouldn’t be such a big deal. Generating two filenames which were the same should pretty much never happen since there are so many possible combinations of characters.

Well… it did. And quite often. Running some scripts on the server showed that there were quite a lot of these collisions which seemed REALLY weird, and then I found this bug in Django. It seems like fastCGI deployment using method=prefork gives the same random seeds to each process. In combination with my fuckup, this made these collisions happen quite a lot and people got their thumbnails overwritten since all thumbnails were stored in the same folder without the “large/” prefix for each image.

That is, for a picture’s thumbnail to be overwritten the following had to happen:

1. Someone uploads a picture below 1MB, which is handed to fastCGI process 1 and given a “random” string for its filename.
2. Someone else uploads a picture larger than 1MB, which is handed to fastCGI process 2 and given the same randomized string as a filename due to the random seeds being non-random due to a bug in Django.
3. My code has a nasty bug in it and doesn’t detect this collision.
4. Thumbnails are generated for image2 at the same target as the thumbs for image1.

Fixing the bug in my code was rather trivial but I also patched my Django installation to avoid any other weird issues due to non-random seeds. The current patch which is available for the bug should however not be utilized since it uses time.ctime() as a random seed for each request, and ctime() will only change once a second which means that subsequent requests given to the same fastCGI process during the same second will be given the same seed. Instead time.time would be better, so I patched my installation with pretty much the same thing but using the following instead.

random.seed("%d%f" % (getpid(), time())) 

This seems to generate random values for each request as far as my testing goes.
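The difference between the two seeds can be checked without a web server. The sketch below (Python 3) uses two fixed, made-up timestamps half a second apart in place of real request times; the helper names are hypothetical, and only `time.ctime`, `os.getpid` and `random.seed` are real library calls.

```python
import os
import random
import time

def ctime_seed(now):
    """Seed used by the patch criticised above: one-second resolution."""
    return time.ctime(now)

def pid_time_seed(pid, now):
    """Seed built from the process id plus a sub-second timestamp."""
    return "%d%f" % (pid, now)

# Two requests arriving within the same second (illustrative timestamps).
t1, t2 = 1000000000.25, 1000000000.75

random.seed(ctime_seed(t1))
first = random.random()
random.seed(ctime_seed(t2))
second = random.random()

same_ctime_seed = ctime_seed(t1) == ctime_seed(t2)
pid = os.getpid()
distinct_pid_time_seed = pid_time_seed(pid, t1) != pid_time_seed(pid, t2)
```

Both `ctime` seeds are the same string, so the two "requests" draw the same first random number, while the pid and time based seeds differ.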

buffi Programming & scripting, Python, Web development

Migrating Django from Apache to lighttpd using FastCGI

February 3rd, 2009

I run a medium traffic imagehosting site which serves about 8,000-10,000 pageviews per day using Django. I have been using the recommended deployment method (Apache + mod_python) for just under two years and it has mostly worked well. However, about two months ago I started to notice increasing delays from the server, and Apache would occasionally fail in spectacular ways which brought the CPU load to 100% for long periods of time.

I have always used lighttpd to serve static content, and since I don’t really enjoy the Apache configuration syntax I decided to give the lighttpd + FastCGI deployment method a try. I expected this migration to be complicated, but it took me less than an hour to figure it out. I have some minor documentation of my changes below in case you are interested in the basics of how to handle an Apache -> lighttpd Django migration.

Apache configuration

My old configuration, using Apache looked like this (some stuff omitted).

<VirtualHost *>
        ServerName pici.se
        DocumentRoot /var/www
        ErrorLog /var/log/apache2/pici_error.log
        ServerAlias pici
        <Location "/">
            PythonPath "['/home/buffi/site'] + sys.path"
            SetHandler python-program
            PythonHandler django.core.handlers.modpython
            SetEnv DJANGO_SETTINGS_MODULE pici.settings
            PythonDebug On
        </Location>
        <Location "/css/"> SetHandler None </Location>
        <Location "/js/"> SetHandler None </Location>
        <Location "/im/"> SetHandler None </Location>
        <Location "/picisendfiles/"> SetHandler None </Location>
        <Location "/pictures/"> SetHandler None </Location>
        <Location "/thumbs/"> SetHandler None </Location>

        # Static content served with lighttpd.
        RewriteEngine on
        RewriteRule  ^/pictures(.*) http://static.pici.se:8080/pictures$1
        RewriteRule  ^/thumbs(.*) http://static.pici.se:8080/thumbs$1
</VirtualHost>

lighttpd configuration

The corresponding lighttpd configuration became:

$HTTP["host"] =~ "pici\.se" {
    server.document-root = "/home/buffi/site/pici/"
    fastcgi.server = (
        "/pici.fcgi" => (
            "main" => (
                "socket" => "/home/buffi/site/pici/pici.sock",
                "check-local" => "disable",
            )
        ),
    )

    alias.url = (
        "/css/" => "/home/buffi/site/pici/picipage/css/",
        "/js/" => "/home/buffi/site/pici/picipage/js/",
        "/im/" => "/home/buffi/site/pici/picipage/im/",
        "/thumbs/" => "/var/www/static/thumbs/",
        "/pictures/" => "/var/www/static/pictures/",
        "/picisendfiles/" => "/var/www/picisendfiles/",
       )

    url.rewrite-once = (
        "^(/css.*)$" => "$1",
        "^(/im.*)$" => "$1",
        "^(/js.*)$" => "$1",
        "^(/picisendfiles.*)$" => "$1",
        "^(/thumbs.*)$" => "$1",
        "^(/pictures.*)$" => "$1",
        "^(/.*)$" => "/pici.fcgi$1",
        )
}

You might notice the path of a FastCGI socket.

"socket" => "/home/buffi/site/pici/pici.sock",

To create this socket, simply use the FastCGI script available through manage.py. I create my socket using the following command.

./manage.py runfcgi method=prefork socket=/home/buffi/site/pici/pici.sock pidfile=pici.pid

All Django requests will then be forwarded from lighttpd to the FastCGI daemon.

Conclusion

By migrating from Apache to lighttpd I noticed a nice performance boost and got a more enjoyable syntax for my configuration. I haven’t really bothered to measure the decrease in CPU load, but it is easily noticeable and my server doesn’t become as sluggish as before during heavy load. I’ve used lighttpd + FastCGI for about a month or so now, and everything seems stable. I’d recommend any Django developer using Apache to give it a try.

buffi Programming & scripting, Python, Web development

How to get random rows from mysql using django without hurting your server

January 20th, 2008

Apparently using ORDER BY RAND() in mysql is a really bad idea (for tables containing a fair amount of rows).

I did not know this, and tried getting random rows in django by using order_by("?"), which uses ORDER BY RAND(), to get a set of 32 random rows from one of my tables containing about 200 000 rows. This turned out to be a really bad idea, and it pretty much violated my server, with mysql consuming all of my CPU and most of my memory. Going through the slow queries log showed this:

# Query_time: 70  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
SELECT `picipage_picture`.`id`,`picipage_picture`.`name`,`picipage_picture`.
`header`,`picipage_picture`.`description`,`picipage_picture`.`uploadednick_id`,
`picipage_picture`.`uploadedip`,`picipage_picture`.`views`,`picipage_picture`.
`timestamp`,`picipage_picture`.`gallery_id`,`picipage_picture`.`private`,
`picipage_picture`.`privid`,`picipage_picture`.`camera` FROM `picipage_picture`
WHERE (`picipage_picture`.`private` = 0) ORDER BY RAND() LIMIT 32;

A query time of 70 seconds is pretty much insane (the limit for a query to be considered slow is by default 2).

The blog post mentioned at the top mentions how to avoid this, and I’m gonna go ahead and post the djangoed version of the solution.

Getting a single row is simple.

random_pic = Picture.objects.order_by("?")[0] # Slow!

becomes

from random import randint
num_pics = Picture.objects.count()
random_pic = Picture.objects.all()[randint(0, num_pics-1)] # Fast!

Getting a set of random objects is harder.

random_pics = Picture.objects.order_by("?")[:32] # SUPER slow!

becomes

from random import sample
PICS_TO_GET = 32
num_pics = Picture.objects.count()

# Get a bunch of extra numbers, to avoid missing ID's
# Assumes enough rows (at least PICS_TO_GET*10)
rand_nums = sample(xrange(1, num_pics + 1), PICS_TO_GET*10)

# Match ID's of pictures to the sampled list
random_pics = Picture.objects.filter(id__in = rand_nums)[:PICS_TO_GET] # Fast!

This is messy, but really fast and works beautifully on tables with a bunch of rows (but there are probably nicer ways to do it).
Thanks go out to mattmcc in #django@freenode for pointing me towards the blog entry.
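The oversampling idea can be tried without a database. The sketch below (Python 3) is a simulation under stated assumptions: `existing_ids` is a plain set standing in for the table, and `sample_existing_ids`, `max_id` and `oversample` are hypothetical names, not Django API. Candidate IDs are kept in the order they were drawn, so cutting the list to the wanted size does not favour low IDs. A plain `filter(id__in=...)` without explicit ordering makes no such promise, so results cut from it may need shuffling.

```python
import random

def sample_existing_ids(existing_ids, max_id, wanted, oversample=10, rng=random):
    """Draw extra candidate ids from 1..max_id and keep the ones that exist."""
    n_candidates = min(max_id, wanted * oversample)
    candidates = rng.sample(range(1, max_id + 1), n_candidates)
    # Candidates are already in random order, so the first hits are a fair pick.
    hits = [i for i in candidates if i in existing_ids]
    return hits[:wanted]

# Simulated table: ids 1..1000 with a gap where rows were deleted.
existing = set(range(1, 1001)) - set(range(100, 200))
picks = sample_existing_ids(existing, max_id=1000, wanted=32)
```

With 320 candidates and only 100 missing IDs, at least 220 candidates exist, so the 32 picks are always available in this simulated table.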

buffi Programming & scripting, Python

Easy concurrency in Python using decorators

January 9th, 2008

I ran across this reddit link about daemonizing processes in unix with Python and I thought that a decorator for forking processes might be a bit nicer. Someone has probably already done this but whatever.

I should also right away mention that my code is heavily inspired by these two URLs
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66012
http://www.myelin.co.nz/post/2003/3/13/

Basically, threading in python is a bit gimped by the global interpreter lock. This isn’t necessarily a bad thing, but if you want to be able to use “true” concurrency in python you will have to use multiple processes instead of threads or as the python FAQ puts it:
“This doesn’t mean that you can’t make good use of Python on multi-CPU machines! You just have to be creative with dividing the work up between multiple processes rather than multiple threads.”

Using Python’s decorator syntax it is quite easy to create a decorator that transforms a normal function into a separate process, and returns a pipe from which output can optionally be received. The syntax of use will be something like this (I will explain the implementation of @forked below).

@forked
def my_function_goes_here(arguments):
    stuff()

This simple addition of the forked decorator means that calling your function will instead launch it in a separate process and return a data pipe. Here is a more concrete example of usage:

# This script is just an example of usage
#
# The function write_times is forked into a
# separate process using the @forked decorator
@forked
def write_times(arg, num, delay=0):
    """write_times(arg, num, delay=0)
    print the argument arg num times with an optional
    delay in between"""
    import time

    for i in xrange(num):
        time.sleep(delay)  # honour the delay argument
        print arg,

Then you can simply call the function, receive the pipe, keep on doing stuff in your normal process and when you want the data from the other process, read it from the pipe like this.

# Create the forked process
r_pipe = write_times("foo", 3, 1)

print "This will be outputted directly after the fork"

print "Waiting for input from fork..."
in_data = r_pipe.read()

print "Data received:", in_data
r_pipe.close() # Clean up pipe

Running this will produce the output

$ python test.py
This will be outputted directly after the fork
Waiting for input from fork...
Data received: foo foo foo

Finally, here is the implementation of the @forked decorator. Nothing too crazy goes on in here. It pretty much just performs a standard double fork; the parent returns the read end of the pipe, while the child goes on to run the passed function through a wrapper, sending all its output through the pipe.

def forked(f):
    """Decorator for creating a forked process
    from a function"""
    import os, sys

    # Make a pipe
    r, w = os.pipe()

    # Perform double fork
    if os.fork(): # Parent
        os.close(w) # close write end of pipe
        r = os.fdopen(r)

        # Return a function that returns the read pipe
        return lambda *x, **kw: r 

    # Otherwise, we are the child 

    # Perform second fork
    os.setsid()
    os.umask(077)
    os.chdir('/')
    if os.fork():
        os._exit(0) 

    os.close(r) # Close read part of pipe

    w = os.fdopen(w, 'w') # Get write part for writing

    # Bind stdout to pipe
    sys.stdout.flush()
    sys.stdout = w

    def wrapper(*args, **kwargs):
        """Wrapper function to be returned from decorator.
        Executes the function bound to the decorator and then
        exits the process"""
        f(*args, **kwargs)
        w.close() # Clean up pipe
        os._exit(0)

    return wrapper

That wasn’t very hard now was it? :)

The full source can be downloaded here, including the test-case. It’s only about 40-50 lines of code or so excluding comments.
Please feel free to leave comments and suggestions for improvements. I’m not really a Unix expert or anything.
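One thing to note about the listing above is that the fork happens when the decorator is applied, so the function can effectively be launched only once. A possible variation, sketched here for Python 3, forks on every call instead. It is not the code from this post: `forked_call` is a hypothetical name, it does a single fork and no daemonizing, and it passes the write end of the pipe to the function as an explicit `out` argument instead of rebinding `sys.stdout`.

```python
import functools
import os

def forked_call(func):
    """Hypothetical variant of @forked: fork on each call, not at definition."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid:
            # Parent: keep only the read end and hand it back to the caller.
            os.close(write_fd)
            return pid, os.fdopen(read_fd)
        # Child: keep only the write end, run the function, then exit
        # without returning into the parent's code path.
        status = 0
        try:
            os.close(read_fd)
            with os.fdopen(write_fd, "w") as out:
                func(out, *args, **kwargs)
        except Exception:
            status = 1
        finally:
            os._exit(status)
    return wrapper

@forked_call
def write_times(out, text, count):
    for _ in range(count):
        out.write(text + " ")

pid, pipe = write_times("foo", 3)
data = pipe.read()   # blocks until the child closes its end of the pipe
pipe.close()
os.waitpid(pid, 0)   # reap the child so it does not linger as a zombie
```

Because the pipe is created inside the wrapper, each call gets its own child and its own pipe, and the arguments given in the parent are the ones the child actually uses.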

buffi Programming & scripting, Python

Cross-platform suppressing of output in python

January 5th, 2008

A common way to suppress output under Unix/Linux in python is by doing

import sys
sys.stdout = open("/dev/null","w")
print "Hello world!" # Will not be printed to stdout

The reason for this could for example be to let a forked process run silently, or something similar which could not be achieved by simply redirecting the entire application to /dev/null.

The issue with this is that this won’t work on non-unix systems such as windows, which normally isn’t an issue, but if you want platform independence then it isn’t very hard to achieve. Simply create a “file-like object” which implements write, and let it do nothing. This will work on all systems.

import sys

class Silencer(object):
  def write(self, data):
    pass

sys.stdout = Silencer()
print "Hello world!" # Will not be printed to stdout

If you know that you are using other methods of stdout such as writelines, implement them as well.
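As a sketch of that last point (Python 3), the class below also implements writelines and flush, and a small helper restores the real stdout afterwards. The helper name `run_silently` and the `noisy` function are made up for the example. The standard library also provides `os.devnull`, which holds the platform's null device path and could be opened in place of a hard-coded "/dev/null".

```python
import sys

class Silencer(object):
    """File-like object that discards everything written to it."""
    def write(self, data):
        return len(data)

    def writelines(self, lines):
        pass

    def flush(self):
        pass

def run_silently(func, *args, **kwargs):
    """Call func with stdout silenced, then put the original stdout back."""
    saved = sys.stdout
    sys.stdout = Silencer()
    try:
        return func(*args, **kwargs)
    finally:
        sys.stdout = saved

def noisy():
    print("Hello world!")  # swallowed by the Silencer
    return 42

stdout_before = sys.stdout
result = run_silently(noisy)
```

Restoring stdout in a finally block means output comes back even if the silenced function raises.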

buffi Programming & scripting, Python

I made an FTP-server (easyFTPD v0.1)

December 16th, 2007

… and you can get it here!

Setting up virtual users in most unix/linux-FTP-servers is actually a lot harder than it should be so I decided to create an easy to use FTP-server, targeted at virtual users.

I started off with an implementation using twisted but I soon found pyftpdlib which fitted this project perfectly. Since pretty much everything you need is already implemented, the application became reduced to programming a wrapper around pyftpdlib and setting it up for easy configuration and deployment.

The first version can be found here, and I really like the way it turned out. Creating virtual users is simply done by adding them to a users-file with the syntax

username:password:permissions:share_folder

where password might be a salted hash, or plaintext. More info on user configuration here.
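For illustration only, a line in that format could be read with a few lines of Python 3. This is a sketch and not easyFTPD's own parser: `parse_user_line` and the `User` tuple are hypothetical names, and the sample values (including the permission string) are made up. The hash format is not interpreted at all.

```python
from collections import namedtuple

User = namedtuple("User", "username password permissions share_folder")

def parse_user_line(line):
    """Split one users-file line into its four colon-separated fields."""
    line = line.strip()
    if not line:
        return None  # skip blank lines
    # Split at most three times so any colon in the folder part is kept.
    parts = line.split(":", 3)
    if len(parts) != 4:
        raise ValueError("expected 4 colon-separated fields: %r" % line)
    return User(*parts)

# Made-up example entry; real permission strings may look different.
user = parse_user_line("alice:secret:rw:/srv/ftp/alice\n")
```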

Getting it running is as easy as first installing it by doing the following as superuser

python setup.py install

It can then be started with or without options

easyftpd              # Default settings
easyftpd -p 2100 -s   # Port 2100 and silent (no logging)
easyftpd -p 12345 -d  # Port 12345 and running as daemon

Information about more options and the configuration file can be found at the project page.

Hopefully I’m not the only one who has needed an easy to use FTP-server. proFTPD is simply too scary for me ;)

buffi Programming & scripting, Python

Abusing python: Game of life using ncurses in one line of code

November 4th, 2007

I like to make one-liners. Not because they are useful but because it’s fun, and it teaches you stuff about the workings of a language and how to abuse it. When I say one-liner, I mean a single statement that does not use eval (otherwise one-liners are trivial) or ; to separate multiple statements.

I have never written any implementation of Conway’s Game of Life before, and tried to see if I could get it done using a single line of code.

I started out by coding a function that when given a sequence of (x,y) coordinate tuples returned a new dictionary with the next set of coordinates set to True. The choice of a dict for the return value was actually not perfect, as a set would have done just fine, but I used a dictionary earlier in the algorithm for checking which of the coordinates should be alive, and I didn’t really see any need to rewrite it since a dict here pretty much is a set with a (in this case) redundant value.

The first working algorithm looked like this:

def get_next(old):
  new, new2 = {}, {}
  offset = ((-1, -1),(-1, 0),(-1, 1),(0, -1),
    (0, 1),(1, -1),(1, 0),(1, 1))
  for (x, y) in old:
    for (ox, oy) in offset:
      new[(x+ox, y+oy)] = \
        new.get((x+ox, y+oy), 0) + 1
  for (x, y) in new:
    v = new[(x, y)]
    if v == 3 or v == 2 and (x, y) in old:
      new2[(x, y)] = True
  return new2

A bit of abuse got this down to a single line that looked like this (as you might see, the backslashes are used to make this more readable; it really is just a single line of code).

def get_next(old): return globals().__setitem__("new",{}) \
  or globals().__setitem__("new2",{}) or \
  [[new.__setitem__((x+ox,y+oy),
  new.get((x+ox,y+oy), 0) + 1) \
  for (ox, oy) in ((-1, -1),(-1, 0),(-1, 1),(0, -1),(0, 1),
  (1, -1),(1, 0),(1, 1))] for (x,y) in old if old[(x,y)]] \
  and [new2.__setitem__((x,y),True) for (x,y) in new if \
  (new[(x,y)] == 3 or new[(x,y)] == 2 and (x,y) in old)] and new2

This works fine for getting the next “map” of the game of life, but how fun is that? I wanted ncurses support so that I could actually see this. Eventually I ended up with this:

print globals().__setitem__("f",globals().__setitem__) or \
  f("d",{(1,2):True,(2,3):True,(3,1):True,(3,2):True,(3,3):True,}) \
  or f("curses",__import__("curses")) or f("stdscr", curses.initscr()) \
  or stdscr.nodelay(1) or curses.noecho() or curses.cbreak() \
  or (curses.curs_set(0) or 1) and f("bar",[1]) or [1]+[(bar.append(1) \
  or stdscr.clear() or [1]+[stdscr.addstr(0,0,"Press q to quit") \
  or stdscr.addch(y%20+1,x%20,"x") for (x,y) in d] and stdscr.refresh() \
  or __import__("time").sleep(0.2) or f("d",(lambda old: f("new",{}) \
  or f("new2",{}) or [[new.__setitem__((x+ox,y+oy),
  new.get((x+ox,y+oy), 0) + 1) \
  for (ox, oy) in ((-1, -1),(-1, 0),(-1, 1),(0, -1),(0, 1),
  (1, -1),(1, 0),(1, 1))] \
  for (x,y) in old if old[(x,y)]] and [new2.__setitem__((x,y),True) \
  for (x,y) in new if (new[(x,y)] == 3 or new[(x,y)] == 2 and (x,y) \
  in old)] and new2)(d))) for foo in bar if (stdscr.getch() !=
  ord('q'))] and curses.nocbreak() or stdscr.keypad(0) or \
  curses.echo() or curses.endwin() or "Bye!"

Now THAT is a one-liner! :)

Running this creates an ncurses window and a “glider” that will move on forever (wraps to x=0,y=0 at position x=20,y=20).
Here are some pics of it running:

If you want to run this yourself to have a look you can download the one-liner here. It should work on any unix/linux/mac-system supporting ncurses. I don’t think that ncurses works in more obscure operating systems such as windows so if you are a windows user then you will just have to take my word for it that it works ;)
This version also lacks the line separation backslashes used for the presentation here which perhaps makes it clearer that it is just a single line of (rather horrible) code.

Like I said, I am well aware of the fact that this code isn’t pretty, it isn’t meant to be. It does however contain a nice bunch of clever tricks that some people might not be aware of.
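For comparison with the readable get_next earlier in this post, here is a set-based sketch in Python 3, following the remark that a set would have done just fine. The names `next_generation` and `OFFSETS` are made up here, and `collections.Counter` replaces the hand-rolled neighbour-count dict.

```python
from collections import Counter

# The eight neighbour offsets around a cell.
OFFSETS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
           if (dx, dy) != (0, 0)]

def next_generation(alive):
    """Return the set of live cells after one step, given a set of (x, y)."""
    # Count, for every cell next to a live cell, how many live neighbours it has.
    counts = Counter((x + dx, y + dy) for (x, y) in alive
                     for (dx, dy) in OFFSETS)
    # Birth on exactly 3 neighbours, survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in alive)}

blinker = {(0, 1), (1, 1), (2, 1)}
after_one = next_generation(blinker)
after_two = next_generation(after_one)
```

A horizontal "blinker" flips to vertical after one step and back after two, which makes a handy sanity check for the rules.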

buffi Programming & scripting, Python

Some nice things to know about operators in python

October 11th, 2007

There are a few operator related things that aren’t (as far as I know) common knowledge in python.

The first is the not in and is not operators. Basically, not is normally a unary operator and in and is are binary. However, these two operations are valid

foo not in bar # same as "not foo in bar"
foo is not bar # same as "not foo is bar"

This is not obvious since not in this case doesn’t do unary negation but rather forms a new binary operator together with is or in.

Another cool thing is chaining comparisons. In C this would evaluate to False (or rather 0 since C lacks booleans).

int foobar = 3>2>1; // in C evaluates to (3>2)>1 = 1 > 1 = 0

In python however this will return True! This is because of python evaluating this chaining the same way as it is used in mathematics and so on.

foobar = 3>2>1 # in python evaluates to True

Basically this is translated by python to

foobar = 3>2 and 2>1
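One detail of that translation is that the middle operand is evaluated only once in the chained form. The short Python 3 sketch below checks this with a made-up helper function, `middle`, that records its calls, and also shows the C-style grouping and the not in equivalence from above.

```python
calls = []

def middle():
    """Hypothetical middle operand that records each evaluation."""
    calls.append("evaluated")
    return 2

chained = 3 > middle() > 1   # behaves like 3 > m and m > 1, with m computed once
c_style = (3 > 2) > 1        # explicit C-style grouping: True > 1 is False
membership = (4 not in [1, 2, 3]) == (not 4 in [1, 2, 3])
```

Here `chained` is True, `c_style` is False, and `calls` ends up with a single entry.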

Both of these things might be common knowledge, but I didn’t know about them and someone else might not either.

buffi Programming & scripting, Python