
Django (SVN) and non-unicode HTTP GET-data

August 28th, 2007

Django merged with the unicode branch a while ago. If you are still using 0.96 then this doesn’t affect you, but if you are running the SVN version then it does.

An HTTP request can carry GET arguments, such as http://mydomain.com/mysite/?foo=bar&banana=purple, which has the GET data foo = “bar” and banana = “purple”. Not all characters are allowed in request URLs, so there is an alternative syntax: the character % followed by the two-digit hexadecimal value of the byte. For example, the byte with value 255 can be written as %FF.
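As a minimal sketch of that %XX notation, the snippet below round-trips raw bytes through percent-encoding. It assumes Python 3 and its standard urllib.parse module (this post predates it); the helper names percent_encode and percent_decode are made up for illustration.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def percent_encode(raw: bytes) -> str:
    # safe="" means every byte outside the unreserved set becomes %XX
    return quote_from_bytes(raw, safe="")


def percent_decode(text: str) -> bytes:
    # Returns the raw bytes, without trying to interpret them as text
    return unquote_to_bytes(text)


encoded = percent_encode(b"\xff")    # the byte with value 255
decoded = percent_decode("%EE")      # the byte with value 238
```

Plain ASCII letters such as "bar" pass through unchanged; only bytes that are not allowed in a URL get the % treatment.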

BitTorrent is a good example of a protocol that sends binary data (for example its info_hash, which is a 20-byte SHA-1 hash) as HTTP GET data using this encoding. Django, however, has not handled this well at all since the merge with the unicode branch.
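To make that concrete, here is a rough sketch of how a client might put a 20-byte SHA-1 digest into a GET request. It assumes Python 3; the /announce path and the placeholder payload are invented for illustration and are not taken from any real tracker.

```python
import hashlib
from urllib.parse import urlencode

# Hypothetical tracker endpoint, for illustration only
TRACKER_URL = "http://mydomain.com/announce"


def build_announce_url(info_bytes: bytes) -> str:
    # SHA-1 always yields 20 raw bytes, most of them not URL-safe
    info_hash = hashlib.sha1(info_bytes).digest()
    # urlencode percent-encodes bytes values byte by byte
    return TRACKER_URL + "?" + urlencode({"info_hash": info_hash})


# Placeholder payload standing in for the real hashed data
announce_url = build_announce_url(b"placeholder info data")
```

The resulting URL is pure ASCII, but the value it carries is arbitrary binary data, not text in any particular character encoding.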

Here is an example:

Let’s say that you have a page at http://mydomain.com/ to which you want to send the byte with value 238 (EE in hex) as a GET argument.
http://mydomain.com/?info_hash=%EE

You might then have a view that does something like this:

def handle_stuff(request):
    get_data = request.GET.copy()
    info_hash = get_data["info_hash"]
    assert False  # fail on purpose so the debug page shows the local variables

The assert False will bring up Django’s VERY nice debug screen, if you have debug turned on in your settings.py, and show that

get_data = <MultiValueDict: {u'info_hash': [u'\ufffd']}>
info_hash = u'\ufffd'

u'\ufffd' is U+FFFD, the Unicode replacement character, a.k.a. the unicode symbol for “OH FUCK THIS DIDN’T WORK” (or something like that).
Basically… forcing unicode in Django broke receiving non-unicode GET data through the GET MultiValueDict.
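The same symptom can be reproduced outside Django. The sketch below assumes Python 3 and assumes that the value is decoded as UTF-8 with replacement of invalid bytes, which is consistent with the debug output above but is not taken from Django’s source.

```python
raw = b"\xee"  # the single byte sent as %EE

# 0xEE alone is not valid UTF-8, so it is replaced by U+FFFD
decoded = raw.decode("utf-8", errors="replace")

# A different invalid byte ends up as exactly the same character
other = b"\xff".decode("utf-8", errors="replace")

# Encoding back gives the UTF-8 form of U+FFFD, not the original byte
round_trip = decoded.encode("utf-8")
```

Since different bytes collapse into the same replacement character, the original value cannot be recovered once the decode has happened.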

OK… so why am I posting this here and not as a bug report for Django?
Well, I did also post a bug report about this today, but the reason for posting here is that there is an OK workaround for now, and I hope that this post might help people find it.

request.META['QUERY_STRING'] holds the raw query string of the request, and you can use the parse_qs (or parse_qsl) function in the cgi module to get the unmangled GET data. It even properly unescapes the %XX notation into “regular” characters.
Thank you Crast in #django @ freenode for reminding me about the existence of META['QUERY_STRING'].

import cgi

# Parse the raw query string ourselves instead of going through request.GET
get_data = cgi.parse_qs(request.META['QUERY_STRING'])
info_hash = get_data["info_hash"][0]
info_hash_as_hex = info_hash.encode("hex")

Ugly? Yeah…
Functional? Yup!
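For anyone reading this on Python 3, where query-string parsing lives in urllib.parse, the same idea can be sketched as below. This is an adaptation under stated assumptions, not the code from this post: the helper name raw_get_value is made up, and the latin-1 trick relies on latin-1 mapping every byte value 0 to 255 onto exactly one character, so the bytes can be recovered afterwards.

```python
from urllib.parse import parse_qs


def raw_get_value(query_string: str, key: str) -> bytes:
    # Decode %XX escapes as latin-1 so no byte is replaced or lost
    values = parse_qs(query_string, encoding="latin-1")
    # Turn the first value back into the bytes that were sent
    return values[key][0].encode("latin-1")


# In a view this would be request.META["QUERY_STRING"]
info_hash = raw_get_value("info_hash=%EE", "info_hash")
info_hash_as_hex = info_hash.hex()
```

Like the original workaround, this bypasses request.GET entirely and works from the raw query string.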

buffi Programming & scripting, Python

  1. me
    March 15th, 2008 at 22:14 | #1

    not working, info_hash is still unicode and therefore info_hash_as_hex is not valid.

    tested on 7243 svn revision

  2. John
    May 9th, 2008 at 19:57 | #2

    Man thanks so much for putting this up, it’s been days for me to figure it out!

    Thanks again, and hope to see that source for geektorrent very soon!

    JOHN
