
Archive for the ‘Programming & scripting’ Category

Using Occlusion, Reverb and Obstruction with OpenAL under Mac OS X (Cocoa)

January 4th, 2010

I haven’t been able to find a single line of online documentation describing how the effects extension for OpenAL on OS X should be used, so I thought I’d write a short summary here, since I had a bunch of issues getting started with it.

First of all, if you are going to develop ANYTHING using OpenAL and Cocoa, download the following example from Apple.

OpenALExample

This is pretty much the only thing to get you started, and it includes most of the OpenAL functionality that you’ll want to use. I made the mistake of trying to find docs for the effects extension instead of just looking through this source, and failed miserably. Lacking docs, I took a look at the header (MacOSX_OALExtensions.h), which contains some information about enabling reverb and other effects. Unfortunately, some of the inline documentation in this header is currently incorrect.

The line stating

 #define ALC_ASA_REVERB_ON 'rvon' // type ALboolean.

should say

 #define ALC_ASA_REVERB_ON 'rvon' // type UInt32.

Trying to use an ALboolean when enabling the reverb won’t work (it expects a UInt32). I filed a bug report for this, so hopefully it’ll be fixed eventually.

The header also references several functions that don’t seem to exist, so do the following instead to enable reverb/occlusion/obstruction.

I should mention that the following code assumes you already have OpenAL working and are just missing the effects.

Step 1
Include the effects header

#import <OpenAL/MacOSX_OALExtensions.h>

Step 2
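Paste these functions from OpenALExample into your source file (or into a header that you #import; if that header is imported from more than one file, declare the functions static to avoid duplicate-symbol link errors).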

OSStatus  alcASASetSourceProc(const ALuint property, ALuint source, ALvoid *data, ALuint dataSize)
{
    OSStatus    err = noErr;
    static  alcASASetSourceProcPtr  proc = NULL;

    if (proc == NULL) {
        proc = (alcASASetSourceProcPtr) alcGetProcAddress(NULL, (const ALCchar*) "alcASASetSource");
    }   

    if (proc)
        err = proc(property, source, data, dataSize);
    return (err);
}

OSStatus  alcASASetListenerProc(const ALuint property, ALvoid *data, ALuint dataSize)
{
    OSStatus    err = noErr;
    static  alcASASetListenerProcPtr    proc = NULL;

    if (proc == NULL) {
        proc = (alcASASetListenerProcPtr) alcGetProcAddress(NULL, "alcASASetListener");
    }   

    if (proc)
        err = proc(property, data, dataSize);
    return (err);
}

Step 3 (for reverb)
The function calls below will enable reverb for a source (with id sourceId).

UInt32 reverbOn = 1;
alcASASetListenerProc(alcGetEnumValue(NULL,
    "ALC_ASA_REVERB_ON"), &reverbOn, sizeof(reverbOn));

ALfloat level = 1.0; // How much reverb you want for your source.
alcASASetSourceProc(alcGetEnumValue(NULL,
    "ALC_ASA_REVERB_SEND_LEVEL"), sourceId, &level, sizeof(level));

// There are a bunch of predefined room types in MacOSX_OALExtensions.h
ALint roomType = ALC_ASA_REVERB_ROOM_TYPE_SmallRoom;
alcASASetListenerProc(alcGetEnumValue(NULL,
    "ALC_ASA_REVERB_ROOM_TYPE"), &roomType, sizeof(roomType));

Occlusion and obstruction work exactly the same way, but with other constants from MacOSX_OALExtensions.h (have a look inside OpenALExample if you’re unsure which ones to use).

I’m not sure anyone will see this and find it useful, but I know that a quick summary like this would have saved me some time, so here it is…

buffi Cocoa, Programming & scripting

Source code for Geektorrent (Django based BitTorrent tracker) released

January 3rd, 2010

A little over two years ago, I wrote the first (as far as I know) BitTorrent tracker implemented in Django. I called it Geektorrent, and it was mostly a proof of concept and I didn’t really care about maintaining it.

The idea was always to release the source code freely, but I never got around to cleaning it up until now. I named it buffis-tracker since I’d like to avoid naming collisions with other Django based trackers (this one is the only other one I could find now, though).

I’ve stripped down the code to make a rather minimalistic tracker implementation. It does, however, support basic stuff such as torrent categories, user ratios and tagging. Registration is handled through Django’s authentication framework, which means that it should fit nicely as an extension to preexisting applications (I’ve tried it with djangobb).

The project home page can be found at Google Code.

Here’s a demo of it running. I don’t really want to monitor the demo site, though, so user registration and uploads are disabled. I just uploaded some random stuff I had lying around on my hard drive.

This is still just a very small pet project of mine and it probably has some bugs, but it should be pretty nice if you want to get started writing BitTorrent based code in Django.

buffi Programming & scripting, Python, Web development

Impressions from my first six months of OS X development

January 3rd, 2010

Since I started my new job about six months ago, I’ve almost exclusively been programming in Objective-C + Cocoa for OS X (and to a minor extent the iPhone). I’ve been meaning to summarize my first impressions somewhere, so I’ll do it here.

The syntax

My first issue with the language was the syntax. Its method calls look similar to the ones in LISP, even though the language is a lot closer to Java or C++. Writing nested calls like

[[foo bar] haveFunWithStuff:stuff andMoreStuff:moreStuff];

instead of

foo.bar().haveFun(stuff, moreStuff);

is actually not that big of a deal. The main issue I have with it is that Objective-C methods become quite verbose, and Cocoa in particular has quite lengthy method names in its library.

The reason for the somewhat weird syntax is pretty obvious, though. Since Objective-C is a strict superset of C, the syntax makes it apparent which code is “pure” C and which code isn’t. I assume this is also the reason for the unusual string syntax @"foo bar" (to avoid parser collisions with C-style strings).

Dynamic typing

Since I’ve coded a bunch of Python stuff, I’m used to dynamic typing, but I don’t really think mixing it with static typing the way Objective-C does is that pretty. For example, if you want to use the standard event bindings in Interface Builder, you might hook up one of your buttons to the message buttonDown on a controller in your application. The declaration of this method is:

- (IBAction)buttonDown:(id)sender;

This specifies that the sender can be an object of any type. Nothing can really be assumed about the sender, and any method can be called on it (however, if the actual instance doesn’t support the call, an error occurs at runtime). The documentation for buttonDown can, however, state that sender should implement an informal protocol. Essentially, this means that instead of enforcing a formal protocol (like an interface in Java), the code assumes that the instance supports one or more methods. If it doesn’t, the application fails at runtime.

Objective-C also supports formal protocols, which work like interfaces in Java. These look like

- doStuff:(id <MyProtocol>)foo;

and tell the compiler that foo must conform to MyProtocol, so it can warn when a non-conforming object is passed. Dynamic typing IS useful, but I don’t really like having protocols declared in the documentation rather than in the code.

Memory Management

Up until OS X 10.5, Objective-C applications for OS X didn’t support garbage collection, so memory had to be managed manually through reference counting.

Essentially, every time an object instance foo becomes dependent on another object bar, foo is responsible for retaining bar, which increases bar’s reference count by one:

// Inside Foo.m (myBar is an instance variable of Foo)
- (void)setBar:(BarClass*)bar {
  myBar = [bar retain];
}

Foo is also responsible for releasing bar (decreasing its reference count) when it’s done using it. This means that a setter for an object property can look like this:

- (void)setBar:(BarClass*)bar {
  // The check avoids releasing (and possibly deallocating) bar
  // when the same object is set twice in a row.
  if (myBar != bar) {
    [myBar release];
    myBar = [bar retain];
  }
}

This looks a bit messy, and it kind of is. I’m not sure if this method decreases the risk of using freed memory compared to just using malloc and free like in C, but if it does then it’s definitely worth it. I’ve had a bunch of memory related issues when playing around with iPhone programming, but this was mostly when I was completely fresh to the language so I can’t really blame its design.

The garbage collection provided in OS X 10.5 and later, however, works quite well, and I use it for all my OS X code. Hopefully the iPhone will get it eventually, but until then you’re stuck with reference counting.

No Private/Public/Protected

Objective-C doesn’t have the concept of private and public methods. All methods declared in a class header are public to callers. It is, however, possible to write implementations, or even new declarations (using categories), inside the implementation files, which leaves the methods unexposed. In practice, it’s still possible to call them; you will just get a warning during compilation.

This also means that you can’t write protected methods like in Java (accessible only to the class and its subclasses). If you want to expose a method to subclasses, you’ll have to expose it to the whole world as well. This makes some classes’ APIs a bit messier, but it’s not a major issue.

No modules

Objective-C handles class dependencies much like C and C++. Headers work like in C/C++, but instead of including them with #include, you use #import, which removes the need for include guards to check whether the header was already included.

This means that you will still run into cyclic imports, as in C or C++, if you have two classes Foo and Bar that depend on each other. To solve this, you use forward declarations of the classes (@class) in the headers and move the #import lines to the classes’ implementation files, like this:

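// Foo.h
#import <Foundation/Foundation.h>
@class Bar; // forward declaration instead of #import "Bar.h"
@interface Foo : NSObject
{
  Bar *bar;
}
@end
// Foo.m
#import "Foo.h"
#import "Bar.h"
@implementation Foo
/* methods that actually use Bar go here */
@end
// Bar.h
#import <Foundation/Foundation.h>
@class Foo; // forward declaration instead of #import "Foo.h"
@interface Bar : NSObject
{
  Foo *foo;
}
@end
// Bar.m
#import "Bar.h"
#import "Foo.h"
@implementation Bar
/* methods that actually use Foo go here */
@end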

I’d really prefer to have a module system like the one in Java.

Categories

Categories are a way to monkey patch code into existing classes. I find them pretty nasty from a design perspective, since they pretty much completely ignore the concept of encapsulation.

// NSStringExtras.h
#import <Foundation/Foundation.h>
@interface NSString (extras)
- (void)sayHello; // Extends the Cocoa string class with a hello world method.
@end

They can be used in a few quite nice ways, though. One of them is declaring a category for a class inside its implementation file. This makes it possible to declare methods that appear private, which “fixes” the lack of privacy in Objective-C.

// Foo.h
#import <Foundation/Foundation.h>
@interface Foo : NSObject {
}
- (void)publicMethod;
@end
// Foo.m
#import "Foo.h"
@interface Foo (hidden)
- (void)privateMethod;
@end

@implementation Foo
- (void)publicMethod {
  [self privateMethod];
}
@end

@implementation Foo (hidden)
- (void)privateMethod {
  NSLog(@"Hello world");
}
@end

No abstract classes

You can’t make abstract classes in Objective-C. You can use hacks that make it impossible to instantiate the “abstract” base class, but the concept of abstract classes doesn’t exist in the language. This is rather ugly.

Network Programming with Cocoa

Cocoa doesn’t have a nice Socket class like in Java. You are instead given a few different options for networking including CFSocket (which is a C API), raw BSD sockets and NSSocketPort which is pretty poorly documented and lacks support for UDP.

The lack of a proper Socket class in the standard library is solved by asyncsocket though, which makes socket handling a lot less painful. Something like this should probably be in the standard library in my opinion.

Xcode and Interface Builder

Xcode is the IDE shipped with the OS X dev tools. It’s pretty decent, but has a few shortcomings. One of them is the revision control integration. Xcode has some support for it (CVS/SVN/Perforce), but it’s pretty lacking and it seems like a lot of people use external tools for their needs. If you just want to commit and fetch updates from a single branch then it’s probably good enough, but if you want to do more advanced stuff you’ll have to do it by some other means.

The unit-test integration is also quite lacking. Getting a test suite up and running is way more work than it should be, and I still have to use some workarounds to be able to use the debugger while running tests.

The debugger is otherwise quite nice. It wraps GDB and I haven’t had any major issues with it.

Interface Builder is the design tool used for laying out OS X/iPhone GUIs, and I actually really like it. The way it lets you bind your GUI items to your application controller is easy to learn and quite powerful. It’s probably my favorite WYSIWYG editor for a compiled language.

KVC/KVO

Key-value coding (KVC) and key-value observing (KVO) essentially give every object built-in support for accessing properties by name and for the Observer pattern. Quite useful, even though, like most parts of Cocoa, it isn’t as strict as implementations of the pattern in many other languages.

Testing

It seems to me that very few Cocoa developers use unit tests. I’ve been trying to keep reasonable test coverage, and the testing tools OCUnit and OCMock are rather decent. The backwards compatibility with C makes OCUnit a bit iffy at times, but other than that I think it’s OK.

Documentation

The Cocoa documentation is for the most part excellent, but surprisingly, some things lack online documentation completely (try searching for docs on the Core Audio matrix mixer, for example). Usually there’s some kind of example to get you started, though.

The community

The online community for OS X development is pretty good. There are a few online resources such as cocoadev, which contains tons of good information. The #macdev and #iphonedev IRC channels on freenode are also excellent (especially Psy|). Finally, the related Apple mailing lists provide a lot of information about topics that aren’t really covered elsewhere.

If you are forced to use Core Audio, for example, be prepared to search through the mailing lists for help a lot.

Summary

I don’t really know what I think about Objective-C. Its design is a lot less strict than C++ and Java, and I don’t know whether I like it or not. It works just fine, though, and I can’t blame Apple for choosing it as the standard language for OS X development. It is, however, quite hard to learn the correct way of doing things in Cocoa. It took me quite some time until I fully understood the KVO mechanism that makes Cocoa bindings work, and so on.

Something I know that I like though is Xcode. It has some shortcomings (mainly revision control), but it’s an awesome IDE which allows for very rapid development thanks to Interface Builder which is excellent.

If I had to summarize my opinion, I’d say that I’m cautiously positive about doing OS X development. I don’t hate it after six months, and that’s actually somewhat impressive.

buffi Cocoa, Programming & scripting, Uninteresting

Job GET! (and some short book reviews)

May 17th, 2009

Like I mentioned in a previous post, I got my Master’s degree earlier this year. Since then I’ve been taking some courses at the University while looking for a nice place to work. In other words, I’ve mostly been slacking around :)

About two months ago I decided to browse through a Swedish job database and found an opening as a programmer at Interactive Institute: Sonic Studio which seemed quite cool. I sent in my CV, went on some interviews and finally signed the hiring papers about 1-2 weeks ago. My first day on the job is about a week from now, and I’m really looking forward to it.

Since Sonic Studio is located in Piteå, which is about 50km by car from Luleå (where I live), I’m currently looking for a decent used car to buy. This sucks quite a lot, since my knowledge of cars is pretty much limited to knowing where the engine and wheels are, but hopefully I’ll be able to work this out, since the other option is taking the bus, which leaves me with about thirty minutes less sleep in the mornings.

Otherwise, I’ve done a bunch of small stuff during the last few months. Among other things, I started coding a game engine for shoot-’em-up development, since I’ve been wanting to build one of those for some time, and I felt the need to brush up on my C++ after doing a bunch of Java and Python development. This is pretty much just a pet project that I play around with occasionally when I get bored. An early proof-of-concept video is available here in case anyone is interested.

I’ve also been reading a bunch of nerd literature. Since I’ve played around a bit with C++ again, I finally decided to read through Thinking in C++, which helped me brush up on my syntax, among other things. In my opinion, it is a pretty great book for learning and improving one’s skills in C++, but I haven’t really read any other books about the language to compare it to.

When I finished that book, I started on The Pragmatic Programmer, which has been on my “to read” list for some time. The book is essentially a list of tips for developing software; some of them might seem obvious, but others actually force you to rethink your development process. One that I know I’ve been guilty of ignoring occasionally is the tip they call “Leave no broken windows”. The real-world example given in the book is a car left on the street. The car can be left alone for quite some time, but once the first window is broken, the rest of the car is usually plundered and vandalized within a short span of time. The same holds true for software: I have personally ignored small bugs, only to have them come back with friends later on to torture me. Once you start compromising your code quality, it usually snowballs and gets worse quite fast.

It also touches a lot on the non-programming aspects of development, such as making sure that you gather the correct requirements for your project and use proper tools such as revision control. There are of course a lot of other great tips in the book as well, and I’d recommend it to pretty much anyone who is involved in developing software.

Finally, I’ve read a new book on lighttpd. As I’ve written previously on the blog, I’ve migrated most of my web stuff from Apache to lighttpd, and I couldn’t be more pleased. Setting up lighttpd is a lot easier than setting up Apache, so you won’t really need a book for it, but I’d definitely recommend it to anyone who wants either a good guide for setting it all up or help tuning an existing installation. It covers a wide range of topics, including setup, encryption, DoS protection and migration from other web servers. All in all, it isn’t an essential book for anyone who wants to run a lighttpd installation, but I’d recommend it since it beats the online resources in readability.

buffi Programming & scripting, Python, Uninteresting, Web development

A nasty bug

April 27th, 2009

About a month ago, a pretty nasty bug appeared on my image hosting service pici.se. For some reason, people started noticing that their thumbnails had been changed to pictures that didn’t belong to them. This was a rather serious issue, so I had a look at the code to figure out what was going on. I realized that I had recently changed my deployment from Django under Apache to Django under lighttpd, and this was surely related to that… but how?

It turned out that a combination of bugs together fucked stuff up for me pretty badly. When I started allowing pictures over 1MB a while ago, I wanted to limit hotlinking to them, so I put them into a separate directory from the other pictures being served. That meant that the large pictures were given filenames such as “large/asSDavXZ.jpg” instead of “asSDavXZ.jpg”, where the filename is randomly generated. However, I had forgotten to update my code to check for these images when generating a new random filename; that is, my code looked something like this (pseudo-code):

filename = get_random_stuff()
while file_exists(filename):
  filename = get_random_stuff()

when it should have been changed to this:

filename = get_random_stuff()
while file_exists(filename) or file_exists("large/" + filename):
  filename = get_random_stuff()

When I saw this, the first thing that came to mind was that this shouldn’t be such a big deal. Generating two filenames which were the same should pretty much never happen since there are so many possible combinations of characters.

Well… it did. And quite often. Running some scripts on the server showed quite a lot of these collisions, which seemed REALLY weird, and then I found this bug in Django. It seems like FastCGI deployment using method=prefork gives each process the same random seed. In combination with my fuckup, this made these collisions happen quite often, and people got their thumbnails overwritten, since all thumbnails were stored in the same folder without the “large/” prefix.

That is, for a picture’s thumbnail to be overwritten, the following had to happen:

1. Someone uploads a picture below 1MB, is handed to FastCGI process 1, and is given a “random” string as its filename.
2. Someone else uploads a picture larger than 1MB, is handed to FastCGI process 2, and is given the same string as a filename, because a bug in Django gave both processes the same random seed.
3. My code has a nasty bug in it and doesn’t detect this collision.
4. Thumbnails for image 2 are generated at the same target as the thumbnails for image 1.
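A runnable sketch of the corrected check might look like the following. It is written for Python 3 and assumes a hypothetical layout where small pictures live directly in a pictures directory and large ones in its large/ subdirectory; random_name and pick_free_name are illustrative names, not the actual pici.se code. The demo uses two identically seeded generators to mimic two prefork processes that drew the same seed.

```python
import os
import random
import string
import tempfile

ALPHABET = string.ascii_letters  # assumed character set for generated names


def random_name(rng, length=8, ext=".jpg"):
    """Return a random filename such as 'asSDavXZ.jpg'."""
    return "".join(rng.choice(ALPHABET) for _ in range(length)) + ext


def pick_free_name(pictures_dir, rng=random):
    """Draw names until one is unused in both pictures_dir and pictures_dir/large."""
    while True:
        name = random_name(rng)
        small = os.path.join(pictures_dir, name)
        large = os.path.join(pictures_dir, "large", name)
        if not os.path.exists(small) and not os.path.exists(large):
            return name


# Demo: two generators with the same seed (like the prefork processes)
# produce the same first name; checking large/ as well catches the collision.
with tempfile.TemporaryDirectory() as pictures:
    os.makedirs(os.path.join(pictures, "large"))
    taken = random_name(random.Random(42))  # "process 2" stores a large picture
    open(os.path.join(pictures, "large", taken), "w").close()
    fresh = pick_free_name(pictures, random.Random(42))  # "process 1", same seed
    print(taken, fresh)
```

The second generator's first draw equals the name already stored under large/, so the loop rejects it and draws again. Without the large/ check, it would have returned the colliding name.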

Fixing the bug in my code was rather trivial, but I also patched my Django installation to avoid any other weird issues due to non-random seeds. The patch currently available for the bug should not be used, though, since it uses time.ctime() as the random seed for each request, and ctime() only changes once a second, which means that subsequent requests handled by the same FastCGI process during the same second get the same seed. time.time() would be better, so I patched my installation with pretty much the same thing, but using the following instead:

import os, random, time
random.seed("%d%f" % (os.getpid(), time.time()))  # mix process id and sub-second time

This seems to generate random values for each request as far as my testing goes.
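To see why a shared seed causes collisions and why mixing in the process id helps, here is a small Python 3 sketch. It simulates two worker processes with separate random.Random instances instead of actually forking, and the pids 101 and 102 are made up. (On Python 3.7 and later, the random module reseeds its global generator in the child after os.fork(), so this particular trap mostly affects older setups.)

```python
import random
import time


def worker_rng(pid, shared_seed=None):
    """Return a generator for one simulated worker process.

    With shared_seed set, every worker starts from the same state, which is
    what happened under prefork. Otherwise the seed mixes the (made-up) pid
    with time.time(), the same idea as the patched seed line above.
    """
    if shared_seed is not None:
        return random.Random(shared_seed)
    return random.Random("%d%f" % (pid, time.time()))


def first_draws(rng, n=5):
    """The first n values a worker would use, e.g. to build filenames."""
    return [rng.randint(0, 10**9) for _ in range(n)]


same_a = first_draws(worker_rng(101, shared_seed=1234))
same_b = first_draws(worker_rng(102, shared_seed=1234))
mixed_a = first_draws(worker_rng(101))
mixed_b = first_draws(worker_rng(102))
print(same_a == same_b, mixed_a == mixed_b)
```

With a shared seed, both workers draw identical sequences, which is exactly what produced identical “random” filenames. Including the pid in the seed string makes the sequences differ even when two workers seed within the same instant.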

buffi Programming & scripting, Python, Web development

Migrating Django from Apache to lighttpd using FastCGI

February 3rd, 2009

I run a medium-traffic image hosting site that serves about 8,000-10,000 pageviews per day using Django, and I had been using the recommended deployment method (Apache + mod_python) for just under two years; it had mostly worked well. However, about two months ago I started noticing increasing delays from the server, and Apache would occasionally fail in spectacular ways that brought the CPU load to 100% for long periods of time.

I have always used lighttpd to serve static content, and since I don’t really enjoy the Apache configuration syntax, I decided to give the lighttpd + FastCGI deployment method a try. I expected this migration to be complicated, but it took me less than an hour to figure it out. I have documented my changes briefly below, in case you are interested in the basics of an Apache -> lighttpd Django migration.

Apache configuration

My old Apache configuration looked like this (some stuff omitted):

<VirtualHost *>
        ServerName pici.se
        DocumentRoot /var/www
        ErrorLog /var/log/apache2/pici_error.log
        ServerAlias pici
        <Location "/">
            PythonPath "['/home/buffi/site'] + sys.path"
            SetHandler python-program
            PythonHandler django.core.handlers.modpython
            SetEnv DJANGO_SETTINGS_MODULE pici.settings
            PythonDebug On
        </Location>
        <Location "/css/"> SetHandler None </Location>
        <Location "/js/"> SetHandler None </Location>
        <Location "/im/"> SetHandler None </Location>
        <Location "/picisendfiles/"> SetHandler None </Location>
        <Location "/pictures/"> SetHandler None </Location>
        <Location "/thumbs/"> SetHandler None </Location>

        # Static content served with lighttpd.
        RewriteEngine on
        RewriteRule  ^/pictures(.*) http://static.pici.se:8080/pictures$1
        RewriteRule  ^/thumbs(.*) http://static.pici.se:8080/thumbs$1
</VirtualHost>

lighttpd configuration

The corresponding lighttpd configuration became:

$HTTP["host"] =~ "pici\.se" {
    server.document-root = "/home/buffi/site/pici/"
    fastcgi.server = (
        "/pici.fcgi" => (
            "main" => (
                "socket" => "/home/buffi/site/pici/pici.sock",
                "check-local" => "disable",
            )
        ),
    )

    alias.url = (
        "/css/" => "/home/buffi/site/pici/picipage/css/",
        "/js/" => "/home/buffi/site/pici/picipage/js/",
        "/im/" => "/home/buffi/site/pici/picipage/im/",
        "/thumbs/" => "/var/www/static/thumbs/",
        "/pictures/" => "/var/www/static/pictures/",
        "/picisendfiles/" => "/var/www/picisendfiles/",
       )

    url.rewrite-once = (
        "^(/css.*)$" => "$1",
        "^(/im.*)$" => "$1",
        "^(/js.*)$" => "$1",
        "^(/picisendfiles.*)$" => "$1",
        "^(/thumbs.*)$" => "$1",
        "^(/pictures.*)$" => "$1",
        "^(/.*)$" => "/pici.fcgi$1",
        )
}
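The order of the url.rewrite-once rules matters: the first matching pattern wins and no further rewriting happens, so the static prefixes must come before the catch-all that sends everything else to /pici.fcgi. The Python 3 sketch below models that first-match behaviour with the same patterns. It is a simplified model, not lighttpd itself: it ignores query strings and other details of lighttpd’s regex matching, and rewrite_once is a hypothetical helper name.

```python
import re

# The url.rewrite-once rules from the configuration above, in the same order.
RULES = [
    (r"^(/css.*)$", "$1"),
    (r"^(/im.*)$", "$1"),
    (r"^(/js.*)$", "$1"),
    (r"^(/picisendfiles.*)$", "$1"),
    (r"^(/thumbs.*)$", "$1"),
    (r"^(/pictures.*)$", "$1"),
    (r"^(/.*)$", "/pici.fcgi$1"),  # catch-all: everything else goes to Django
]


def rewrite_once(path):
    """Apply only the first matching rule, substituting $1 with group 1."""
    for pattern, target in RULES:
        match = re.match(pattern, path)
        if match:
            return target.replace("$1", match.group(1))
    return path


for p in ["/css/style.css", "/thumbs/asSDavXZ.jpg", "/upload/", "/imprint/"]:
    print(p, "->", rewrite_once(p))
```

One thing the model makes visible: the prefixes are not anchored to a trailing slash, so a Django URL that happens to start with /im, /js or /css (such as /imprint/) would be left as a static path instead of reaching pici.fcgi.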

You might notice the path of a FastCGI socket.

"socket" => "/home/buffi/site/pici/pici.sock",

To create this socket, simply use the runfcgi command available through manage.py. I create my socket using the following command:

./manage.py runfcgi method=prefork socket=/home/buffi/site/pici/pici.sock pidfile=pici.pid

All Django requests will then be forwarded from lighttpd to the FastCGI daemon.

Conclusion

By migrating from Apache to lighttpd I noticed a nice performance boost and got a more enjoyable syntax for my configuration. I haven’t really bothered to measure the decrease in CPU load, but it is easily noticeable and my server doesn’t become as sluggish as before during heavy load. I’ve used lighttpd + FastCGI for about a month or so now, and everything seems stable. I’d recommend any Django developer using Apache to give it a try.

buffi Programming & scripting, Python, Web development

One of the main reasons that so many PHP-coders suck at programming?

February 2nd, 2008

I was about to write a very small thing in PHP, wondered what the nicest way to get a random string in PHP would be (lacking a nice sample method like Python’s), and googled “random string php”.

I found this.

Just… what is that? It just might be the ugliest piece of code I’ve ever seen. Not only does it have a nice little switch statement for each letter in the alphabet, for some reason the coder also reseeds the random number generator on every iteration of the loop.
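For comparison, here is roughly what the Python approach looks like: pick characters from an alphabet with the random module, which is seeded once (automatically at import, or by the caller) and never inside the loop. This is a Python 3 sketch; random_string and random_string_no_repeats are hypothetical helper names, and for anything security-sensitive, such as tokens or passwords, the secrets module would be the better choice.

```python
import random
import string

ALPHABET = string.ascii_letters + string.digits


def random_string(length, rng=random):
    """Return `length` characters drawn from ALPHABET (repeats allowed).

    No reseeding happens here; the generator's state simply advances.
    """
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_string_no_repeats(length, rng=random):
    """Like random_string, but without repeated characters, using random.sample."""
    return "".join(rng.sample(ALPHABET, length))


print(random_string(10))
print(random_string_no_repeats(10))
```

No switch statement is needed: the alphabet is just a string, and choice or sample index into it directly.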

This might be the worst I’ve seen, but I’ve seen a lot of PHP example code that is pretty close, and I think that this might be one of the main reasons that a lot of PHP-developers produce unreadable code (have a look at tbsource among other large projects for some horrifying code).

It’s nice that people like to share their code to teach others, but you are more likely to hurt development if you don’t have a clue what you are doing.

I’m not saying that all PHP-developers are horrible programmers (because they aren’t), and I use quite a few PHP-applications myself (such as wordpress) but the ratio of bad code compared to other languages seems to be way above average.

buffi Programming & scripting

How to get random rows from mysql using django without hurting your server

January 20th, 2008

Apparently using ORDER BY RAND() in MySQL is a really bad idea (for tables containing a fair amount of rows).

I did not know this, and tried getting random rows in Django by using order_by("?"), which uses ORDER BY RAND(), to get a set of 32 random rows from one of my tables containing about 200,000 rows. This turned out to be a really bad idea: it pretty much killed my server, with MySQL consuming all of my CPU and most of my memory. Going through the slow query log showed this:

# Query_time: 70  Lock_time: 0  Rows_sent: 0  Rows_examined: 0
SELECT `picipage_picture`.`id`,`picipage_picture`.`name`,`picipage_picture`.
`header`,`picipage_picture`.`description`,`picipage_picture`.`uploadednick_id`,
`picipage_picture`.`uploadedip`,`picipage_picture`.`views`,`picipage_picture`.
`timestamp`,`picipage_picture`.`gallery_id`,`picipage_picture`.`private`,
`picipage_picture`.`privid`,`picipage_picture`.`camera` FROM `picipage_picture`
WHERE (`picipage_picture`.`private` = 0) ORDER BY RAND() LIMIT 32;

A query time of 70 seconds is pretty much insane (the default limit for a query to be considered slow is 2 seconds).

The blog post mentioned at the top mentions how to avoid this, and I’m gonna go ahead and post the djangoed version of the solution.

Getting a single row is simple.

random_pic = Picture.objects.order_by("?")[0] # Slow!

becomes

from random import randint
num_pics = Picture.objects.count()
random_pic = Picture.objects.all()[randint(0, num_pics-1)] # Fast!

Getting a set of random objects is harder.

random_pics = Picture.objects.order_by("?")[:32] # SUPER slow!

becomes

from random import sample
PICS_TO_GET = 32
num_pics = Picture.objects.count()

# Sample ten times as many candidate IDs as needed, since some IDs
# may belong to deleted rows. Assumes IDs run from 1 to roughly
# num_pics and that the table has at least PICS_TO_GET*10 rows.
rand_nums = sample(xrange(1, num_pics + 1), PICS_TO_GET*10)

# Fetch the pictures whose IDs are in the sampled list
random_pics = Picture.objects.filter(id__in=rand_nums)[:PICS_TO_GET] # Fast!

This is messy, but really fast and works beautifully on tables with a bunch of rows (but there are probably nicer ways to do it).
Thanks goes out to mattmcc in #django@freenode for pointing me towards the blog entry.
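To make the trade-offs of the sampling trick easier to see, here is a pure-Python 3 model of it with no database involved. existing_ids stands in for the table’s primary keys, the oversampling factor of 10 mirrors the code above, and pick_random_ids is a hypothetical name. Unlike the Django snippet, the model draws candidates up to the largest existing ID rather than up to the row count, since with deleted rows the highest IDs would otherwise never be picked. It also caps the number of candidates, because sample raises ValueError when asked for more items than the range holds.

```python
import random


def pick_random_ids(existing_ids, k, oversample=10, rng=random):
    """Model of the 'sample candidate IDs, keep the ones that exist' trick.

    existing_ids: set of primary keys currently in the table (may have gaps).
    Returns at most k IDs; fewer if too many candidates hit deleted rows.
    """
    if not existing_ids:
        return []
    max_id = max(existing_ids)
    n_candidates = min(k * oversample, max_id)  # sample() needs n <= population
    candidates = rng.sample(range(1, max_id + 1), n_candidates)
    # Keep candidates that exist, like filter(id__in=...); the database
    # would return them in its own order rather than in candidate order.
    hits = [i for i in candidates if i in existing_ids]
    return hits[:k]


# A table of 200,000 rows where every third ID has been deleted.
table = {i for i in range(1, 200001) if i % 3 != 0}
chosen = pick_random_ids(table, 32)
print(len(chosen), chosen[:5])
```

With a third of the IDs missing, about two thirds of the 320 candidates hit real rows, which leaves plenty of margin for 32 results. If most IDs were gone, the result could come up short, which is the price of avoiding ORDER BY RAND().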

buffi Programming & scripting, Python

Easy concurrency in Python using decorators

January 9th, 2008

I ran across this reddit link about daemonizing processes in Unix with Python, and I thought that a decorator for forking processes might be a bit nicer. Someone has probably already done this, but whatever.

I should also mention right away that my code is heavily inspired by these two URLs:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/66012
http://www.myelin.co.nz/post/2003/3/13/

Basically, threading in Python is a bit gimped by the global interpreter lock (GIL). This isn’t necessarily a bad thing, but if you want to be able to use “true” concurrency in Python, you will have to use multiple processes instead of threads, or as the Python FAQ puts it:
“This doesn’t mean that you can’t make good use of Python on multi-CPU machines! You just have to be creative with dividing the work up between multiple processes rather than multiple threads.”

Using Python’s decorator syntax, it is quite easy to create a decorator that transforms a normal function into a separate process and returns a pipe from which output can optionally be received. Usage will look something like this (I will explain the implementation of @forked below).

@forked
def my_function_goes_here(arguments):
    stuff()

With the @forked decorator added, calling your function will instead return a data pipe and launch the function in a separate process. Here is a more concrete example of usage:

# This function is just an example of usage
#
# The function write_times is forked into a
# separate process using the @forked decorator
@forked
def write_times(arg, num, delay=0):
    """write_times(arg, num, delay=0)
    print the argument arg num times with an optional
    delay in between"""
    import time

    for i in xrange(num):
        time.sleep(delay)
        print arg,

Then you can simply call the function, receive the pipe, keep on doing stuff in your normal process and when you want the data from the other process, read it from the pipe like this.

# Create the forked process
r_pipe = write_times("foo", 3, 1)

print "This will be outputted directly after the fork"

print "Waiting for input from fork..."
in_data = r_pipe.read()

print "Data received:", in_data
r_pipe.close() # Clean up pipe

Running this will produce the output

$ python test.py
This will be outputted directly after the fork
Waiting for input from fork...
Data received: foo foo foo

Finally, here is the implementation of the @forked decorator. Nothing too crazy goes on in here. It pretty much just performs a standard double fork: the parent returns the read end of the pipe, while the child goes on to run the passed function through a wrapper and sends all its output through the pipe.

def forked(f):
    """Decorator for creating a forked process
    from a function"""
    import os, sys

    # Make a pipe
    r, w = os.pipe()

    # Perform double fork
    if os.fork(): # Parent
        os.close(w) # close write end of pipe
        r = os.fdopen(r)

        # Return a function that returns the read pipe
        return lambda *x, **kw: r 

    # Otherwise, we are the child 

    # Perform second fork
    os.setsid()
    os.umask(0o077)  # octal literal that works in Python 2.6+ and 3
    os.chdir('/')
    if os.fork():
        os._exit(0) 

    os.close(r) # Close read part of pipe

    w = os.fdopen(w, 'w') # Get write part for writing

    # Bind stdout to pipe
    sys.stdout.flush()
    sys.stdout = w

    def wrapper(*args, **kwargs):
        """Wrapper function to be returned from the decorator.
        Executes the decorated function and then
        exits the process"""
        f(*args, **kwargs)
        w.close() # Clean up pipe
        os._exit(0)

    return wrapper

That wasn’t very hard now was it? :)

The full source can be downloaded here, including the test-case. It’s only about 40-50 lines of code or so excluding comments.
Please feel free to leave comments and suggestions for improvements. I’m not really a Unix expert or anything.
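One caveat with the decorator above: forked calls os.fork() when it is applied, that is, when the def statement runs, not when the function is called. Calling the function a second time hands back the same pipe, and the child runs any module code that sits between the definition and the first call. Below is a minimal Python 3 sketch of a variant that forks on every call instead. It is POSIX-only (os.fork), skips the setsid/double-fork daemonizing, and returns the child’s pid along with the pipe so the caller can reap it with os.waitpid; these are assumptions of this sketch, not part of the original recipe.

```python
import functools
import os
import sys
import time


def forked(f):
    """Decorator: each call runs f in a new child process.

    The call returns (pipe, pid): pipe is a text file object yielding
    whatever f printed, and pid lets the caller reap the child.
    """
    @functools.wraps(f)
    def launcher(*args, **kwargs):
        r, w = os.pipe()
        pid = os.fork()
        if pid:                      # parent: keep only the read end
            os.close(w)
            return os.fdopen(r), pid
        os.close(r)                  # child: keep only the write end
        status = 0
        try:
            with os.fdopen(w, "w") as out:
                sys.stdout = out     # print() inside f now goes into the pipe
                f(*args, **kwargs)
        except BaseException:
            status = 1
        finally:
            os._exit(status)         # never fall back into the caller's code
    return launcher


@forked
def write_times(arg, num, delay=0):
    """Print arg num times, sleeping delay seconds before each one."""
    for _ in range(num):
        time.sleep(delay)
        print(arg, end=" ")


pipe, pid = write_times("foo", 3, 0.1)
print("This will be outputted directly after the fork")
data = pipe.read()                   # blocks until the child closes its end
pipe.close()
os.waitpid(pid, 0)                   # reap the child so it doesn't linger as a zombie
print("Data received:", data.strip())
```

Because the child leaves through os._exit, it never flushes the parent’s inherited stdout buffer, so output the parent had buffered before the fork is not printed twice.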

buffi Programming & scripting, Python

Cross-platform suppressing of output in python

January 5th, 2008

A common way to suppress output under Unix/Linux in Python is by doing

import sys
sys.stdout = open("/dev/null","w")
print "Hello world!" # Will not be printed to stdout

The reason for this could, for example, be to let a forked process run silently, or something similar that could not be achieved by simply redirecting the entire application’s output to /dev/null.

The issue is that this won’t work on non-Unix systems such as Windows. That normally isn’t a problem, but if you want platform independence, it isn’t very hard to achieve. Simply create a “file-like object” that implements write and let it do nothing. This will work on all systems.

import sys

class Silencer(object):
  def write(self, data):
    pass

sys.stdout = Silencer()
print "Hello world!" # Will not be printed to stdout

If you know that code will use other stdout methods, such as writelines, implement them as well.
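A Python 3 sketch of the same idea follows. It uses contextlib.redirect_stdout (Python 3.4+) so that stdout is restored afterwards, and it shows os.devnull, which names the platform’s null device (/dev/null on Unix, nul on Windows), as a portable alternative to a hand-written class. The Silencer here also gets writelines and flush, since code that calls sys.stdout.flush() would otherwise fail with AttributeError. run_quietly and run_quietly_devnull are hypothetical helper names.

```python
import contextlib
import os


class Silencer:
    """File-like object that discards everything written to it."""

    def write(self, data):
        return len(data)  # report the data as written, then drop it

    def writelines(self, lines):
        for line in lines:
            self.write(line)

    def flush(self):
        pass


def run_quietly(func, *args, **kwargs):
    """Call func with stdout discarded; stdout is restored afterwards."""
    with contextlib.redirect_stdout(Silencer()):
        return func(*args, **kwargs)


def run_quietly_devnull(func, *args, **kwargs):
    """Same idea, but writing to the OS null device named by os.devnull."""
    with open(os.devnull, "w") as devnull, contextlib.redirect_stdout(devnull):
        return func(*args, **kwargs)


def noisy():
    print("Hello world!")  # will not be printed to stdout
    return 42


print(run_quietly(noisy), run_quietly_devnull(noisy))
```

Note that both versions only replace Python’s sys.stdout; output that C extensions or child processes write directly to file descriptor 1 is not silenced.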

buffi Programming & scripting, Python