Archive for August, 2007

Django (SVN) and non-unicode HTTP GET-data

August 28th, 2007

Django merged with the unicode branch a while ago. If you are still using 0.96 then this doesn’t affect you, but if you are running the SVN version then it does.

An HTTP request can have GET arguments, such as http://mydomain.com/mysite/?foo=bar&banana=purple, which has the GET data foo = “bar” and banana = “purple”. Not all characters are allowed in these request URLs, which forces an alternative syntax: the character % followed by the hexadecimal representation of the byte. For example, the byte with value 255 can be represented as %FF.
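To make the %XX notation concrete, here is a minimal sketch using the standard library's urllib.parse. It assumes a modern Python 3 (which this post predates), and the sample bytes are made up for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

# Percent-encode raw bytes: anything that isn't URL-safe becomes %XX.
raw = bytes([255, 238, 65])      # 0xFF, 0xEE and the letter "A"
encoded = quote(raw)             # "A" is safe, so it stays as-is
print(encoded)                   # %FF%EEA

# Decoding back to bytes gives the original data again.
decoded = unquote_to_bytes(encoded)
print(decoded == raw)            # True
```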

Bittorrent is a good example of a protocol that sends data (for example its info_hash, which is a 20 byte sha1 hash) as HTTP GET-data using this encoding. However, Django has not handled this well at all since the merge with the unicode branch.

Here is an example:

Let’s say that you have a page at http://mydomain.com/ to which you want to send the character with ordinal 238 (EE in hex) as a GET argument.
http://mydomain.com/?info_hash=%EE

You might then have a view that does something like this

def handle_stuff(request):
  get_data = request.GET.copy()
  info_hash = get_data["info_hash"]
  assert(False) # for debugging

Assert false will bring up Django’s VERY nice debug screen if you have debug turned on in your settings.py, and show that

get_data = <MultiValueDict: {u'info_hash': [u'\ufffd']}>
info_hash = u'\ufffd'

u'\ufffd' is U+FFFD, the unicode replacement character, which is the symbol for “OH FUCK THIS DIDN’T WORK” (or something like that).
Basically… forcing unicode in django broke receiving non-unicode GET-data through the GET MultiValueDict.
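The same effect can be reproduced outside django. The sketch below assumes modern Python 3 and only uses urllib.parse, so it is not django's actual code path, but it shows how a lone 0xEE byte turns into U+FFFD as soon as it is decoded as text.

```python
from urllib.parse import unquote, unquote_to_bytes

raw = unquote_to_bytes("%EE")    # the byte the client actually sent
as_text = unquote("%EE")         # decoded as UTF-8 with errors="replace"

print(raw)                       # b'\xee'
print(as_text == "\ufffd")       # True: the original byte is gone for good
```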

Ok… so why am I posting it here and not as a bug report in django?
Well, I did also post a bug report about this today, but the reason for posting here is that there is an ok workaround for now, and I hope that this post might help people find it.

request.META['QUERY_STRING'] holds the raw query string of a page request, and you can use the parse_qs (or parse_qsl) function in the cgi module to get the unmangled GET-data. It even properly unescapes the %XX notation into “regular” characters.
Thank you Crast in #django @ freenode for reminding me about the existence of META['QUERY_STRING'].

import cgi

get_data = cgi.parse_qs(request.META['QUERY_STRING'])
info_hash = get_data["info_hash"][0]
info_hash_as_hex = info_hash.encode("hex")
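For comparison, here is a rough sketch of the same idea in modern Python 3, where parse_qs lives in urllib.parse. The helper name raw_get_data and the sample query string are made up, and it takes the raw query string as a plain argument rather than reading it from a request object.

```python
from urllib.parse import parse_qs


def raw_get_data(query_string):
    """Parse a raw query string and return the values as bytes."""
    # Latin-1 maps every byte value 0-255 to exactly one character,
    # so decoding with it and encoding back again loses nothing.
    parsed = parse_qs(query_string, encoding="latin-1")
    return {
        key: [value.encode("latin-1") for value in values]
        for key, values in parsed.items()
    }


get_data = raw_get_data("info_hash=%EE%FF&foo=bar")
info_hash = get_data["info_hash"][0]
print(info_hash)          # b'\xee\xff'
print(info_hash.hex())    # eeff
```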

Ugly? Yeah…
Functional? Yup!

buffi Programming & scripting, Python

Simplified IP-ban middleware

August 23rd, 2007

This is based on the last post where I showed a simple middleware for IP banning in django.
In the comments, SmileyChris mentioned that simply rendering the template in the middleware would eliminate the need for a url configuration and view for the IP ban middleware. This was actually a really great idea, since it also removes the need to check the path for the ban page, and everything becomes a lot cleaner. Thanks :)

I rewrote the middleware to use this method instead, and it now looks like this

from pici.picipage.models import Ban
from django.shortcuts import render_to_response

class IPBanMiddleware(object):
  """
  Simple middleware for taking care of bans from specific IPs
  Renders a ban-page with an explanation for the banned user
  """
  def process_request(self, request):
    ip = request.META['REMOTE_ADDR'] # user's IP

    # see if user is banned
    try:
      # if this doesnt throw an exception, user is banned
      ban = Ban.objects.get(ip=ip)
      reason = ban.reason

      # return the "ban page"
      return render_to_response("subpages/banned/banned.html",
        {"reason": reason})
    except Ban.DoesNotExist: # not banned! goodie
      pass

No additional view or url-patterns needed. Simply add the Ban model that I wrote in the last post to your models, create the template, and you are good to go!

My Ban model looked like this (same as last post), but can of course be improved to fit your needs.

class Ban(models.Model):
    ip = models.IPAddressField()
    reason = models.TextField()

    def __str__(self):
        return self.ip

    class Admin: pass

Thanks for the idea, and I hope that other people will start “bashing my code” as well ;)
Constructive criticism is an awesome thing, and I know that there are a whole lot of more experienced django developers than me out there.

buffi Programming & scripting, Python

Simple middleware for database driven IP ban using django.

August 21st, 2007

The internet is stupid, and your users will misbehave. Therefore every homepage should have some way of banning users from posting content, or perhaps in some cases from viewing your page altogether.
I wanted a simple way to ban users from accessing one of my django made homepages, and then have them redirected to a page with an explanation why. The IPs and explanations should be entered through the django admin interface. For this, django middleware is a decent choice.

First of all, let’s create a Ban model in your models file. Something like this should do (you might want timestamps and other stuff as well, I don’t really need them).

class Ban(models.Model):
    ip = models.IPAddressField()
    reason = models.TextField()

    def __str__(self):
        return self.ip

    class Admin: pass

Do a python manage.py syncdb to insert the model into your database.

Now it’s time to create the middleware. The middleware can be placed anywhere reachable from your python path.
I named my middleware file ipban.py and placed it in a subfolder called “middleware”. If you do the same, also create an empty __init__.py file in this directory to indicate that it holds python files. The content of my ipban.py is

from django import http
from pici.picipage.models import Ban

class IPBanMiddleware(object):
  """
  Simple middleware for taking care of bans from specific IP's
  Redirects the banned user to a ban-page with an explanation
  """
  def process_request(self, request):
    ip = request.META['REMOTE_ADDR'] # user's IP
    path = request.path # requested path

    # see if user is banned
    try:
      # if this doesnt throw an exception, user is banned
      Ban.objects.get(ip=ip) 

      # only redirect when not already at the ban page
      if not path.startswith("/banned/"):
        return http.HttpResponseRedirect("/banned/%s/" % ip)
    except Ban.DoesNotExist: # not banned! goodie
      pass

pici.picipage.models should of course be changed to point to your own models module containing the Ban model.
This middleware checks on every request whether the user’s IP is in the list of banned IPs. If the user is banned, then he is redirected to the ban-page “/banned/ipgoeshere/”.

Now open your settings.py file and add this class to your MIDDLEWARE_CLASSES.
In my case I added

'pici.middleware.ipban.IPBanMiddleware',

Add a urls.py entry with something like this (I’m somewhat lazy at regular expressions). Obviously replace miscviews.banned with whatever you are going to call your view-handling method and its module.

(r"^banned/([^/]+)/$", miscviews.banned),

Add the handler for the banning in your view. Mine looks like this. Replace “subpages/banned/banned.html” with the path to your template.

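from django.shortcuts import render_to_response
from django.template import RequestContext
from pici.picipage.models import Ban

def banned(request, banip):
  try:
    ban = Ban.objects.get(ip=banip)
    reason = ban.reason
  except Ban.DoesNotExist:
    reason = "None given"
  return render_to_response("subpages/banned/banned.html",
    {"reason": reason}, context_instance=RequestContext(request))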

Finally add the template and you are done!
Adding the template should be straightforward if you know anything about django, but it can look something like this

{% extends "index.html" %}

{% block main_content %}
  <div class="normal_content_centered">
    <h1>BANNED!</h1>
    <p>The reason for your banning is:</p>
    <p><b>{{ reason }}</b></p>
    <img src="url_to_crying_baby_goes_here.jpg" />
  </div>
{% endblock %}

The image of a crying baby is essential.

This can of course quite easily be optimized by using sessions to avoid having to do the lookup for each request, however a very simple database hit per request is not a big deal in my case so I’ll save that for later. Premature optimization is evil and all that.
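A rough sketch of what that session-based optimization could look like, in modern Python 3 and without any django imports. The function ban_reason_for, the lookup_ban callable and the stand-in data are all hypothetical. The session is just a plain dict here.

```python
def ban_reason_for(session, ip, lookup_ban):
    """Return the ban reason for ip (or None), caching it in the session.

    session is any dict-like object, lookup_ban is a callable that does
    the real database lookup and returns a reason or None.
    """
    if "ban_reason" not in session:
        session["ban_reason"] = lookup_ban(ip)
    return session["ban_reason"]


# Usage with plain stand-ins instead of a real session and database.
calls = []
bans = {"10.0.0.1": "Spamming"}

def fake_lookup(ip):
    calls.append(ip)
    return bans.get(ip)

session = {}
print(ban_reason_for(session, "10.0.0.1", fake_lookup))  # Spamming
print(ban_reason_for(session, "10.0.0.1", fake_lookup))  # Spamming
print(len(calls))                                        # 1
```

One catch with this approach: a ban that is added while a session is already active won't be noticed until the cached value is cleared, so the cache probably needs some kind of expiry.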

Have fun banning users from your homepage :)

edit: Improved version up

buffi Programming & scripting, Python

No good way using django to interface with external applications using binary data.

August 12th, 2007

I posted a ticket about it being seemingly impossible to use django’s db api (even with custom SQL) to handle binary data.
http://code.djangoproject.com/ticket/5135
and got this reply.

Storing binary data was unsafe before (what if your binary data contained a zero byte?), so it was kind of lucky, and unsupported, that it worked at all. It just works even less well now.

The real fix here is something like #2417 (adding a proper binary field type). The current workaround is to use base64 encoding (or base96 or some other binary->ascii encoding) on the data before storing it. There’s nothing we can do at the text field level, since we are assuming Unicode strings for text and databases obviously use an encoding when they store stuff, hence we have to convert between the encoding and Python Unicode objects.

Although I do agree with him about a new BinaryField being the best way to solve this, I also realize that this means that django does not have any nice way to handle binary data at all right now, and that it probably won’t have it any time soon :/
I will have to resort to ugly “external” MySQLdb hacks for now.

Using base64 or similar is of course a good choice if you write your tables from scratch, but interfacing with a legacy application makes it impossible without digging through a whole lot of source code and modifying that application (which also makes it a non-generic solution that pretty much requires an ugly fork of that app to work).
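For tables you do control, the base64 workaround is only a couple of lines. This is a minimal sketch in modern Python 3, and the hashed input is a made-up placeholder rather than real torrent data.

```python
import base64
import hashlib

info_hash = hashlib.sha1(b"example torrent info").digest()  # 20 raw bytes

# Encode to plain ASCII before storing it in a text column...
stored = base64.b64encode(info_hash).decode("ascii")

# ...and decode it again after reading it back.
restored = base64.b64decode(stored)

print(len(info_hash))         # 20
print(len(stored))            # 28
print(restored == info_hash)  # True
```

The price is about a third more storage, plus the fact that nothing else reading the table will understand the column unless it decodes it too.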

One would really think that interfacing with external applications should be a big enough use case to at least be able to handle “unsupported” field types using custom SQL. I mean… there’s even a chapter in djangobook about using legacy databases / external apps.

I hope that a BinaryField will eventually be incorporated in django, but I’m not the right person to do it since I only have experience using mysql, and it should of course be generic :)
Due to django using unicode all the way down to custom SQL, I bet there will be quite a few issues with creating one though.

Edit: Also, this is important.
This only deals with the SVN-release.
0.96 was released before the merge with the unicode-branch, and custom SQL to blob fields works there.

buffi Programming & scripting, Python

Django SVN-release can’t handle binary data insertion?

August 11th, 2007

Django does not have a BinaryField, or BlobField or whatever you want to call it, which is a bit sad. I made a fix for this a few posts back that worked fine in 0.96. However when trying the SVN-release it seems like everything that goes to the database is first turned into unicode, which of course includes the binary data.
Even doing custom SQL won’t work.

An example from my code.
The info_hash field is a blob

def create_xbt_file(info_hash, timestamp):
  query = "INSERT INTO xbt_files (info_hash, mtime, ctime) VALUES (%s, %s, %s)"
  from django.db import connection
  cursor = connection.cursor()
  cursor.execute(query, [info_hash, timestamp, timestamp])

This throws a nasty UnicodeDecodeError whenever info_hash contains a byte that can’t be decoded into unicode.
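The underlying problem is easy to trigger without a database. The sketch below assumes modern Python 3 and made-up sample bytes, and it is not django's actual code path. It only shows that arbitrary binary data is usually not valid text in any given encoding.

```python
info_hash = bytes([0xEE, 0xFF, 0x00, 0x41])  # made-up sample, not a real hash

try:
    info_hash.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError as exc:
    decode_failed = True
    print("not valid text:", exc.reason)

print(decode_failed)  # True
```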

If I bypass django completely using MySQLdb, then it works fine

def create_xbt_file(info_hash, timestamp):
  import MySQLdb
  db = MySQLdb.connect("localhost", DATABASE_USER, DATABASE_PASSWORD, DATABASE_NAME)
  cursor = db.cursor()
  query = "INSERT INTO xbt_files (info_hash, mtime, ctime) VALUES (%s, %s, %s)"
  cursor.execute(query, [info_hash, timestamp, timestamp])
  db.close()

This obviously is an ugly hack, but it seems to be the only way to fix it right now.
If I’m wrong then please correct me, since I would obviously rather not have this in my code.

buffi Programming & scripting, Python

Quick and ugly fix for Data truncated error using FloatField in django 0.96

August 5th, 2007

I have a FloatField in one of my models that looks like this

score = models.FloatField(default=0.0, max_digits=2, decimal_places=1)

This would raise this error for me when inserting certain floats (since there are too many decimal places I guess…)

Data truncated for column 'score' at row 1

A very easy fix for this is to make a string representation of the float and then insert that into the FloatField. This works just fine due to “duck typing”. The string quacks like a float ;)

fixed_float = str(my_float)[:3] # only works for floats < 10 obviously
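Slicing the string truncates instead of rounding, and it breaks on floats that print in exponent notation. A slightly more robust variant, sketched here for modern Python 3 with a hypothetical helper name, is to format the float with one decimal place.

```python
def to_score_string(value):
    """Format a float with exactly one decimal place."""
    return format(value, ".1f")


print(str(0.00001)[:3])          # 1e-  (slicing breaks on exponent notation)
print(to_score_string(0.00001))  # 0.0
print(to_score_string(3.14159))  # 3.1
print(to_score_string(9.96))     # 10.0
```

As the last line shows, values just below 10 can round up to three digits, so a field with max_digits=2 would still need the value clamped first.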

buffi Programming & scripting, Python

A nice example of when to use reduce in python

August 3rd, 2007

I was just building a query in django that forced me to OR together a bunch of Q objects, and I immediately realized that it was a great example of when reduce is nice.

Basically, I have a list of names as strings.

names = ["foo", "bar", "cake", "banana"] # example

Each of these represents the name of an object in a database. For each name I should build a Q object with its name argument set to that name. Finally, these queries should be OR’ed together to form a filter for later use.

Without using reduce it’s probably best done like this

def get_filter(names):
  queries = [Q(name = name) for name in names] # build Q objects
  name_filter = queries[0] # set up first Q to enable OR - loop
  for query in queries[1:]: # OR loop through the rest
    name_filter |= query
  return name_filter

With reduce it is reduced to this

def get_filter(names):
  queries = [Q(name = name) for name in names] # build Q objects
  return reduce(Q.__or__, queries) # reduce the Q objects using unbound OR

Much nicer, although the function for reduce can be debated. Importing operator.or_ or using a lambda x,y: x|y works as well.
Setting the filter to the first Q object and then iterating over the rest, OR’ing as we loop, seems a lot messier than just reducing it using OR :) It is needed though, since OR’ing a Q object with anything other than a Q object (or a QOr or QAnd, which is the result of doing OR or AND on Q objects…) will throw an exception.

Also, both of these functions will throw exceptions when names is empty. Just returning an empty filter might be nicer, which of course is solved by simply checking if names is empty at the beginning of the function and in that case returning an empty filter.
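A sketch of the empty-list check, written for modern Python 3 where reduce has moved to functools. The Q class below is only a tiny stand-in that builds a string, so the example runs without django, and returning None for “no filter” is an assumption about what the caller expects.

```python
from functools import reduce
import operator


class Q:
    """Tiny stand-in for django's Q, only here to make the sketch runnable."""

    def __init__(self, expr):
        self.expr = expr

    def __or__(self, other):
        return Q("(%s OR %s)" % (self.expr, other.expr))


def get_filter(names):
    if not names:
        return None  # assumption: the caller treats None as "no filter"
    queries = [Q("name=%r" % name) for name in names]
    return reduce(operator.or_, queries)


print(get_filter(["foo", "bar", "cake"]).expr)
print(get_filter([]))  # None
```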

Edit: Please read comments for more info.

buffi Programming & scripting, Python