Django (SVN) and non-unicode HTTP GET-data
Django merged with the unicode branch a while ago. If you are still using 0.96 then this doesn’t affect you, but if you are running the SVN version then it does.
An HTTP request can carry GET arguments, such as http://mydomain.com/mysite/?foo=bar&banana=purple, which has the GET data foo = “bar” and banana = “purple”. Not all characters are allowed in request URLs, so there is an alternative syntax: the character % followed by the two-digit hexadecimal value of the byte. For example, the byte with value 255 can be represented as %FF.
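To make the %XX notation concrete, here is a minimal sketch using Python 3's standard urllib.parse module. That module is an assumption on my part (the rest of this post is Python 2-era Django), and the helper name percent_encode is made up for illustration.

```python
from urllib.parse import quote, unquote_to_bytes

def percent_encode(data: bytes) -> str:
    """Percent-encode raw bytes; safe="" means even "/" gets escaped."""
    return quote(data, safe="")

# The byte with value 255 becomes %FF ...
encoded = percent_encode(b"\xff")

# ... and decoding %FF gives the original byte back.
decoded = unquote_to_bytes("%FF")

print(encoded, decoded)
```

Plain letters and digits pass through unchanged; only the bytes that are not allowed in a URL get escaped.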
BitTorrent is a good example of a protocol that sends binary data (for example its info_hash, which is a 20-byte SHA-1 hash) as HTTP GET data using this encoding. Django, however, has not handled this well at all since the merge with the unicode branch.
Here is an example:
Let’s say that you have a page at http://mydomain.com/ to which you want to send the byte with ordinal 238 (EE in hex) as a GET argument.
http://mydomain.com/?info_hash=%EE
You might then have a view that does something like this:
def handle_stuff(request):
    get_data = request.GET.copy()
    info_hash = get_data["info_hash"]
    assert False  # for debugging
The assert False will bring up Django’s VERY nice debug screen, if you have debug turned on in your settings.py, and show that
get_data = <MultiValueDict: {u'info_hash': [u'\ufffd']}>
info_hash = u'\ufffd'
u'\ufffd' is U+FFFD, the Unicode replacement character, which is the unicode symbol for “OH FUCK THIS DIDN’T WORK” (or something like that).
Basically… forcing unicode in Django broke receiving non-unicode GET data through the GET MultiValueDict.
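The effect is easy to reproduce outside Django. The sketch below uses Python 3's urllib.parse, not Django's own code, and assumes the query value gets decoded as UTF-8 with undecodable bytes replaced, which is what the debug output above suggests.

```python
from urllib.parse import unquote, unquote_to_bytes

# What was actually sent: a single raw byte, 0xEE.
raw = unquote_to_bytes("%EE")

# What you get when the value is forced into text: unquote() decodes as
# UTF-8 with errors="replace" by default. 0xEE on its own is an incomplete
# UTF-8 sequence, so it is swapped for U+FFFD.
as_text = unquote("%EE")

# A different undecodable byte collapses to the very same character,
# so the original byte cannot be recovered from the text.
other_text = unquote("%FF")

print(repr(raw), repr(as_text), as_text == other_text)
```

That last comparison is the real problem for something like an info_hash: once two different bytes map to the same replacement character, the data is gone.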
Ok… so why am I posting it here and not as a bug report to Django?
Well, I did also post a bug report about this today, but the reason for posting here is that there is an OK workaround for now, and I hope that this post might help people find it.
request.META['QUERY_STRING'] holds the raw query string of the page request, and you can use the parse_qs (or parse_qsl) function in the cgi module to get the non-violated GET data. It even properly unescapes the %XX notation into “regular” characters.
Thank you Crast in #django @ freenode for reminding me about the existence of META['QUERY_STRING'].
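import cgi

# Parse the raw query string ourselves instead of going through request.GET
get_data = cgi.parse_qs(request.META['QUERY_STRING'])
info_hash = get_data["info_hash"][0]  # raw byte string, not unicode
info_hash_as_hex = info_hash.encode("hex")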
Ugly? Yeah…
Functional? Yup!
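For anyone reading this on Python 3, where cgi.parse_qs and .encode("hex") are gone, here is a rough sketch of the same idea with urllib.parse.parse_qs. It is not what I ran above: the helper name raw_query_values is made up, and it assumes the query string is plain ASCII with %XX escapes (which is what a well-formed request sends).

```python
from urllib.parse import parse_qs

def raw_query_values(query_string: str) -> dict:
    """Map each GET parameter name to a list of raw byte values."""
    # Latin-1 maps bytes 0-255 one-to-one onto code points 0-255, so
    # decoding with it and encoding back again loses nothing.
    parsed = parse_qs(query_string, keep_blank_values=True, encoding="latin-1")
    return {
        name: [value.encode("latin-1") for value in values]
        for name, values in parsed.items()
    }

# In a view this would be raw_query_values(request.META["QUERY_STRING"]).
get_data = raw_query_values("info_hash=%EE&foo=bar")
info_hash = get_data["info_hash"][0]
info_hash_as_hex = info_hash.hex()

print(info_hash, info_hash_as_hex)
```

Still ugly, but the bytes survive the trip.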