Archive for January, 2008

How to get random rows from MySQL using Django without hurting your server

January 20th, 2008

Apparently using ORDER BY RAND() in MySQL is a really bad idea for tables containing a fair amount of rows, since MySQL has to generate a random value for every matching row and sort them all before it can apply the LIMIT.

I did not know this, and tried getting random rows in Django by using order_by("?"), which uses ORDER BY RAND(), to get a set of 32 random rows from one of my tables containing about 200 000 rows. This turned out to be a really bad idea: it pretty much killed my server, with MySQL consuming all of my CPU and most of my memory. Going through the slow query log showed this:

# Query_time: 70  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
SELECT `picipage_picture`.`id`,`picipage_picture`.`name`,`picipage_picture`.
`header`,`picipage_picture`.`description`,`picipage_picture`.`uploadednick_id`,
`picipage_picture`.`uploadedip`,`picipage_picture`.`views`,`picipage_picture`.
`timestamp`,`picipage_picture`.`gallery_id`,`picipage_picture`.`private`,
`picipage_picture`.`privid`,`picipage_picture`.`camera` FROM `picipage_picture`
WHERE (`picipage_picture`.`private` = 0) ORDER BY RAND() LIMIT 32;

A query time of 70 seconds is pretty much insane (the threshold for a query to be considered slow, long_query_time, is 2 seconds in my setup; stock MySQL defaults to 10).

The blog post linked at the top describes how to avoid this, and I’m gonna go ahead and post the djangoed version of the solution.

Getting a single row is simple.

random_pic = Picture.objects.order_by("?")[0] # Slow!

becomes

from random import randint
num_pics = Picture.objects.count()
random_pic = Picture.objects.all()[randint(0, num_pics-1)] # Fast!
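
To see why the second version is cheap, here is a minimal sketch of roughly what it asks the database to do: count the rows, then fetch a single row at a random offset. It uses Python's built-in sqlite3 module and a made-up picture table instead of Django and MySQL, so the table, the data and the random_picture helper are all assumptions for illustration. I am also assuming that indexing a queryset ends up as a LIMIT 1 OFFSET n query.

```python
import sqlite3
from random import randint

# Stand-in table (hypothetical schema, not the real picipage_picture)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE picture (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO picture (name) VALUES (?)",
                 [("pic%d" % i,) for i in range(1000)])


def random_picture(conn):
    """Return one random (id, name) row without sorting the table."""
    (num_pics,) = conn.execute("SELECT COUNT(*) FROM picture").fetchone()
    offset = randint(0, num_pics - 1)  # randint includes both end points
    return conn.execute(
        "SELECT id, name FROM picture LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()


print(random_picture(conn))
```

The database still has to step over the skipped rows, but it never has to compute and sort a random value for every row in the table.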

Getting a set of random objects is harder.

random_pics = Picture.objects.order_by("?")[:32] # SUPER slow!

becomes

from random import sample
PICS_TO_GET = 32
num_pics = Picture.objects.count()

# Get a bunch of extra numbers, to make up for missing IDs
# Assumes enough rows (at least PICS_TO_GET*10)
rand_nums = sample(xrange(1, num_pics + 1), PICS_TO_GET*10)

# Match IDs of pictures to the sampled list
random_pics = Picture.objects.filter(id__in=rand_nums)[:PICS_TO_GET] # Fast!

This is messy, but really fast, and it works beautifully on tables with a bunch of rows (there are probably nicer ways to do it, though).
Thanks go out to mattmcc in #django@freenode for pointing me towards the blog entry.
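
One caveat with the id-sampling trick: if the database hands back the matching rows in id order (common, but not guaranteed), cutting the result off after 32 rows favours pictures with low ids. Below is a minimal sketch of one way around that, again using sqlite3 and a made-up picture table rather than Django and MySQL. The random_pictures helper, the OVERSAMPLE constant and the use of MAX(id) instead of the row count are my own assumptions, not part of the original solution.

```python
import sqlite3
from random import sample, shuffle

PICS_TO_GET = 32
OVERSAMPLE = 10  # hypothetical knob, mirrors the "*10" used above

# Stand-in table with holes in the id range (hypothetical schema)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE picture (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO picture (id, name) VALUES (?, ?)",
                 [(i, "pic%d" % i) for i in range(1, 1001)])
conn.execute("DELETE FROM picture WHERE id % 3 = 0")


def random_pictures(conn, how_many=PICS_TO_GET):
    """Return up to how_many random (id, name) rows, in random order."""
    (max_id,) = conn.execute("SELECT MAX(id) FROM picture").fetchone()
    if max_id is None:  # empty table
        return []
    # Draw more candidate ids than needed; some point at deleted rows.
    wanted = min(max_id, how_many * OVERSAMPLE)
    candidates = sample(range(1, max_id + 1), wanted)
    marks = ",".join("?" * len(candidates))
    rows = conn.execute(
        "SELECT id, name FROM picture WHERE id IN (%s)" % marks, candidates
    ).fetchall()
    # Shuffle before cutting, so the cut does not favour low ids.
    shuffle(rows)
    return rows[:how_many]


print(random_pictures(conn)[:3])
```

Fetching every matching candidate and shuffling in Python costs a few hundred rows at most, which is still tiny compared to sorting the whole table.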

buffi Programming & scripting, Python

Easy concurrency in Python using decorators

January 9th, 2008

I ran across this reddit link about daemonizing processes in Unix with Python and I thought that a decorator for forking processes might be a bit nicer. Someone has probably already done this but whatever.

I should also mention right away that my code is heavily inspired by these two URLs:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66012
http://www.myelin.co.nz/post/2003/3/13/

Basically, threading in Python is a bit gimped by the global interpreter lock. This isn’t necessarily a bad thing, but if you want “true” concurrency in Python you will have to use multiple processes instead of threads, or as the Python FAQ puts it:
“This doesn’t mean that you can’t make good use of Python on multi-CPU machines! You just have to be creative with dividing the work up between multiple processes rather than multiple threads.”

Using Python’s decorator syntax it is quite easy to create a decorator that transforms a normal function into a separate process and returns a pipe from which its output can optionally be received. Usage will look something like this (I will explain the implementation of @forked below):

@forked
def my_function_goes_here(arguments):
    stuff()
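
In case the @ syntax is unfamiliar: putting @name above a def is shorthand for passing the new function to name and rebinding it to whatever comes back. Here is a tiny sketch with a made-up decorator (shout and the greet functions are hypothetical names, unrelated to the forking code) showing that both spellings behave the same.

```python
def shout(f):
    """Hypothetical decorator: upper-cases whatever f returns."""
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs).upper()
    return wrapper


@shout
def greet(name):
    return "hello " + name


# The same thing spelled out by hand, without the @ line
def greet_plain(name):
    return "hello " + name

greet_manual = shout(greet_plain)

print(greet("world"))
print(greet_manual("world"))
```

Whatever the decorator returns is what gets called afterwards, which is how a decorator can swap a normal call for something else entirely.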

With this simple addition of the forked decorator, calling your function launches it in a separate process and returns a data pipe instead of the function’s result. Here is a more concrete example of usage:

# This script is just an example of usage
#
# The function write_times is forked into a
# separate process using the @forked decorator
@forked
def write_times(arg, num, delay=0):
    """write_times(arg, num, delay=0)
    print the argument arg num times with an optional
    delay (in seconds) before each print"""
    import time

    for i in xrange(num):
        time.sleep(delay)
        print arg,

Then you can simply call the function, receive the pipe, and keep on doing stuff in your normal process. When you want the data from the other process, read it from the pipe like this (read() blocks until the other process has closed its end of the pipe):

# Create the forked process
r_pipe = write_times("foo", 3, 1)

print "This will be outputted directly after the fork"

print "Waiting for input from fork..."
in_data = r_pipe.read()

print "Data received:", in_data
r_pipe.close() # Clean up pipe

Running this will produce the output

$ python test.py
This will be outputted directly after the fork
Waiting for input from fork...
Data received: foo foo foo

Finally, here is the implementation of the @forked decorator. Nothing too crazy goes on in here. It pretty much just performs a standard double fork: the parent returns the read end of the pipe, while the child goes on to run the passed function through a wrapper and sends all its output through the pipe.

def forked(f):
    """Decorator for running a function in a
    forked process"""
    import os, sys

    def wrapper(*args, **kwargs):
        """Wrapper function to be returned from the decorator.
        Forks on every call: the parent gets the read end of
        a pipe back, the child executes the decorated function
        and then exits the process"""

        # Make a pipe
        r, w = os.pipe()

        # Flush first, so the child does not inherit buffered output
        sys.stdout.flush()

        # Perform double fork
        pid = os.fork()
        if pid: # Parent
            os.close(w) # close write end of pipe
            os.waitpid(pid, 0) # reap the short-lived first child

            # Return the read end of the pipe as a file object
            return os.fdopen(r)

        # Otherwise, we are the child

        # Perform second fork
        os.setsid()
        os.umask(077)
        os.chdir('/')
        if os.fork():
            os._exit(0)

        os.close(r) # Close read part of pipe

        w = os.fdopen(w, 'w') # Get write part for writing

        # Bind stdout to pipe
        sys.stdout = w

        try:
            f(*args, **kwargs)
        finally:
            # Always leave the child here, even if f raised
            w.close() # Flush and clean up pipe
            os._exit(0)

    return wrapper

That wasn’t very hard now was it? :)

The full source can be downloaded here, including the test-case. It’s only about 40-50 lines of code or so excluding comments.
Please feel free to leave comments and suggestions for improvements. I’m not really a Unix expert or anything.
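
For anyone on Python 3, here is a rough sketch of the same idea. It forks inside the wrapper, so every call gets its own process and its own pipe. The functools.wraps call, the waitpid to reap the intermediate child and the flush before forking are my own additions, and it only works on Unix-like systems since it relies on os.fork.

```python
import os
import sys
import time
from functools import wraps


def forked(f):
    """Decorator: each call runs f in a detached process, returns a pipe."""
    @wraps(f)
    def wrapper(*args, **kwargs):
        r, w = os.pipe()
        sys.stdout.flush()       # keep buffered output out of the child

        pid = os.fork()
        if pid:                  # parent
            os.close(w)
            os.waitpid(pid, 0)   # reap the short-lived first child
            return os.fdopen(r)  # file object for the read end

        # First child: detach, fork again and exit
        os.setsid()
        os.umask(0o077)
        os.chdir("/")
        if os.fork():
            os._exit(0)

        # Grandchild: everything printed goes into the pipe
        os.close(r)
        sys.stdout = os.fdopen(w, "w")
        try:
            f(*args, **kwargs)
        finally:
            # Always leave the process here, even if f raised
            sys.stdout.close()   # flushes what f printed
            os._exit(0)

    return wrapper


@forked
def write_times(arg, num, delay=0):
    for _ in range(num):
        time.sleep(delay)
        print(arg, end=" ")


if __name__ == "__main__":
    pipe = write_times("foo", 3, 0.1)
    print("Parent keeps running while the child works")
    print("Data received:", pipe.read())
    pipe.close()
```

Note that an exception inside the forked function is swallowed silently in this sketch, because the process exits from the finally block before any traceback is printed.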

Cross-platform suppressing of output in python

January 5th, 2008

A common way to suppress output under Unix/Linux in Python is by doing

import sys
sys.stdout = open("/dev/null","w")
print "Hello world!" # Will not be printed to stdout

The reason for this could, for example, be to let a forked process run silently, or something similar which could not be achieved by simply redirecting the entire application to /dev/null.

The problem is that this won’t work on non-Unix systems such as Windows. Normally that isn’t an issue, but if you want platform independence it isn’t very hard to achieve. Simply create a “file-like object” which implements write, and let it do nothing. This will work on all systems.

import sys

class Silencer(object):
  def write(self, data):
    pass

sys.stdout = Silencer()
print "Hello world!" # Will not be printed to stdout

If you know that your code uses other methods of stdout, such as writelines or flush, implement them as well.
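
A slightly fuller sketch of the same idea follows, with the extra methods filled in and a small helper that puts the real stdout back afterwards. The silenced helper is a hypothetical name of my own. The last part uses os.devnull, which the standard library provides as the name of the null device on the current platform, as an alternative to a hand-written class.

```python
import os
import sys
from contextlib import contextmanager


class Silencer(object):
    """File-like object that discards everything written to it."""
    def write(self, data):
        pass

    def writelines(self, lines):
        pass

    def flush(self):
        pass


@contextmanager
def silenced(target=None):
    """Temporarily replace sys.stdout, restoring it even on errors."""
    saved = sys.stdout
    sys.stdout = target if target is not None else Silencer()
    try:
        yield
    finally:
        sys.stdout = saved


with silenced():
    print("Hello world!")  # discarded

print("Back to normal output")

# Alternative: open the platform's null device instead
with open(os.devnull, "w") as null, silenced(null):
    print("Also discarded")
```

Restoring the original object in a finally block means an exception inside the with block cannot leave the program permanently mute.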
