Using Google App Engine blobstore with web2py
File uploads on GAE-hosted web2py sites have to be stored in the database since the file system is read-only. Uploaded binary data can be held in a blob datatype, but this type is limited to a maximum size of 1MB. In addition, any one HTTP request to GAE has a data transfer limit of 10MB.
Google now supports a blobstore API (http://code.google.com/appengine/docs/python/blobstore/overview.htm) to upload, store and serve up larger files. Blobs up to 50MB each can be stored in the blobstore, and a key is then provided as a text string for accessing each blob. The blobstore API is currently listed by Google as experimental.
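As a rough illustration of the limits quoted above, here is a minimal sketch that decides where an upload of a given size could live. The function name choose_storage and the constants are my own, and I am assuming the limits are binary megabytes; the 10MB per-request limit is not modelled here.

```python
# Size limits quoted in the text above (assumed to be binary megabytes).
MB = 1024 * 1024
DATASTORE_BLOB_LIMIT = 1 * MB   # blob datatype stored in the database
BLOBSTORE_LIMIT = 50 * MB       # a single blob in the blobstore


def choose_storage(size_in_bytes):
    """Return which backend could hold an upload of the given size."""
    if size_in_bytes < 0:
        raise ValueError("size must be non-negative")
    if size_in_bytes <= DATASTORE_BLOB_LIMIT:
        return "datastore"
    if size_in_bytes <= BLOBSTORE_LIMIT:
        return "blobstore"
    return "too large"
```

For example, choose_storage(5 * MB) falls in the blobstore range, which is the case the rest of these notes deal with.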
File uploads take place on a specific URL that Google generates for you at the time of upload, so uploads are not subject to the usual GAE 10MB limit.
To subsequently provide the file as a download, the blob key string can be injected as an HTTP header of a blank page response. GAE then detects this header, hooks into the page output and outputs the file contents (up to 50MB downloads).
This can all be tested in development with dev_appserver.py. For production you need to enable billing for the GAE-hosted app; however, you get the first 1GB of file storage for free.
Web2py file uploads currently only work with blob datatypes. I've adapted a project I'm working on using web2py 1.75.4 to successfully use the blobstore API for file uploads. When running in a non-gae environment it falls back to standard web2py file upload behaviour.
This is by no means a generic plug-in yet but I will list the steps I took in case anyone else would like to try this:
- modify db definition of file upload table
To my existing table I added a blob_key string field:
Field('blob_key', readable=False, writable=False) )
- Create blank upload method
Once the blobstore completes an upload, it needs to redirect back to a regular web2py url. I set up a new controller file called gae.py with a blank method called 'upload' for this.
- Modify existing form upload controller code
I had existing controller code to output a form upload in a page as follows:
media_form=SQLFORM(db.my_uploads,fields=['file'])
if media_form.accepts(request.vars,session):
    response.flash='media file uploaded'
...
I modified this code so when running on gae hosting, the form submit will jump to a generated blobstore url:
media_form=SQLFORM(db.my_uploads,fields=['file'])
#Get blobstore file upload url if on gae
upload_url = ""
if request.env.web2py_runtime_gae:
    from google.appengine.ext import blobstore
    upload_url = blobstore.create_upload_url(URL(r=request,c='gae',f='upload',args=id))
media_form['_action']=upload_url
if media_form.accepts(request.vars,session):
    response.flash='media file uploaded'
...
blobstore.create_upload_url generates a url for a google-supplied handler to carry out the file upload.
This method takes in as one parameter the web2py url to jump back to after the upload has completed, which I have pointed to my blank upload method.
In my case each file upload record also holds a reference to a related table. For my purposes I pass that reference into the url here as args=id. The upload method will be responsible for actually inserting the file upload record.
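The GAE/non-GAE switch described above can be sketched as a small helper. This is only an illustration: upload_action and fake_create_upload_url are hypothetical names, and the upload-url factory is passed in as a parameter so the sketch runs without the GAE SDK. In the real controller the factory would be blobstore.create_upload_url.

```python
def upload_action(on_gae, success_path, create_upload_url=None):
    """Return the URL the upload form should post to.

    on_gae            -- True when running on App Engine
    success_path      -- path the blobstore redirects to after the upload
    create_upload_url -- callable such as blobstore.create_upload_url
                         (injected so the sketch runs without the GAE SDK)
    """
    if on_gae and create_upload_url is not None:
        return create_upload_url(success_path)
    # Empty action: the form posts back to the current page as usual.
    return ""


# Stand-in for blobstore.create_upload_url, for illustration only.
# The returned URL format is made up.
def fake_create_upload_url(success_path):
    return "/_ah/upload/FAKE-SESSION?next=" + success_path
```

With on_gae set to False the helper returns an empty string, which matches the fallback to standard web2py form handling.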
- fix web2py unicode issue (Warning: This is a hack)
The web2py framework was crashing for me when the blobstore upload handler tried to invoke my upload url. What I found was that request.env.PATH_INFO was coming in as a unicode string, which was not usually the case, and this was causing an error later on in gluon/main.py, in session.connect.
To fix this I modified my gaehandler.py and added one line to the wsgiapp method, just before the return statement:
def wsgiapp(env, res):
    ...
    env['PATH_INFO'] = str(env['PATH_INFO'])
    return ...
- implement upload url
This is called by the blobstore API once it has processed and stored the raw upload data. Information about the upload file(s) is available in the page request as mime-encoded data. The blobstore API provides individual methods to access this data and turn it back into blobinfo objects, and it also provides a WSGI request handler base class to do the same thing.
I could not figure out how to get the individual method calls working, so I wrapped a call to the request handler instead. Here is my gae.py controller with upload method:
from google.appengine.api import users
from google.appengine.ext import blobstore
from google.appengine.ext import webapp
from google.appengine.ext.webapp import blobstore_handlers
from google.appengine.ext.webapp.util import run_wsgi_app

#web2py controller, handle gae upload
def upload():
    #define WSGI request handler for upload
    class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
        def post(self):
            #get_uploads returns a list of blobinfo objects; keep the first
            upload_files = self.get_uploads('file')
            blob_info = upload_files[0]
            globals()['blob_info'] = blob_info

    #create wsgi application
    application = webapp.WSGIApplication([(request.env.path_info, UploadHandler)],debug=True)
    application(request.wsgi.environ,request.wsgi.start_response)
    blob_info = globals()['blob_info']

    #Create new file upload record
    db.my_file_uploads.insert(related_table=request.args[0],
                              filename=blob_info.filename,
                              blob_key=blob_info.key())
    session.flash='file uploaded'
    redirect(URL(r=request,c='admin',f='edit',args=request.args[0]))
So UploadHandler is a class derived from the google-supplied blobstore_handlers.BlobstoreUploadHandler. This responds to a web request. The line
application = webapp.WSGIApplication([(request.env.path_info, UploadHandler)],debug=True)
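sets up a WSGI application that maps the current request url to this upload handler. Calling that application with the current WSGI environment then acts as a fake web request for the same url and invokes the handler's post method.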
Within the handler I retrieve the blobinfo object with details on the file upload. I did not know how to pass this back to the calling method, so as a hack I store this as the global variable blob_info.
Once the WSGI handler completes, back in regular web2py code I save details in my file upload table, then finally redirect to an appropriate url specific to my application.
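One possible alternative to the global variable hack is to let the nested handler class write into a mutable container owned by the enclosing function. The sketch below shows only that pattern. FakeUploadHandler is a hypothetical stand-in for BlobstoreUploadHandler and returns canned data, so none of this touches the real blobstore API.

```python
class FakeUploadHandler(object):
    """Stand-in for BlobstoreUploadHandler; returns canned upload records."""
    def get_uploads(self, field_name):
        return [{"filename": "report.pdf", "key": "FAKE-KEY"}]


def run_upload():
    captured = []  # mutable container shared with the nested class

    class UploadHandler(FakeUploadHandler):
        def post(self):
            uploads = self.get_uploads("file")
            captured.append(uploads[0])  # hand the result to the outer scope

    UploadHandler().post()  # in the real code the WSGI application calls this
    return captured[0] if captured else None
```

Because captured is local to run_upload, two requests handled in the same process would not overwrite each other's result the way a shared global could.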
Just a mention about the database field 'filename' - this is my own text field, separate from the built-in web2py file database type. In a gae installation this field gets set to blob_info.filename as above. For completeness here is how this gets set in the non-gae version:
In non-gae mode, because the form url is not modified, the media_form.accepts line gets the chance to run on the HTTP form post when the file upload is submitted. In this situation I have the following logic to retrieve the filename after successful upload handling:
if media_form.accepts(request.vars,session):
    #select returns a set of rows; take the newly inserted record
    new_record = db(db.my_uploads.id==media_form.vars.id).select()[0]
    new_record.update_record(filename = request.vars.file.filename)
- Implement download url.
When the file upload table is displayed by sqlform and crud methods, the hyperlink for the download field is generated from the definition of the table in db.py. These links will fail with GAE blobstore uploaded files.
In my project I put the following hack in for crud generated urls to work with gae-uploaded files:
- set the hyperlink generated for this field in crud/sqlform methods to a custom download url
In db.py, to my my_uploads table I added a represent property to my file column definition as follows:
Field('file', 'upload', ... represent=lambda file : A('download', _href=URL(r=request, c='gae', f='download', args=file)))
- when a blobstore file upload is processed, set the file database column to something
Previously the file column would be blank when the website was GAE-hosted, since the blobstore is handling everything. Now I set it to a unique id so it can be used as a lookup key.
In gae.py, in the upload handler, I modified my existing:
#Create new file upload record
db.my_file_uploads.insert(related_table=request.args[0],
                          filename=blob_info.filename,
                          blob_key=blob_info.key())
to:
#uuid needs to be imported at the top of gae.py
import uuid
db.my_uploads.upload_field=False
db.my_uploads.insert(related_table=request.args[0],
                     filename=blob_info.filename,
                     blob_key=blob_info.key(),
                     file=str(uuid.uuid4()).replace('-',''))
When running in non-GAE mode, the file column holds the standard unique string used by the built-in web2py file upload logic. In gae mode the same field now holds a unique id that I set with the call to uuid.uuid4().
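For reference, the id produced by that expression is a 32-character hexadecimal string, and the hex attribute of a UUID object gives the same value without the string replace. A small sketch (new_file_id is a hypothetical helper name):

```python
import uuid


def new_file_id():
    """Return a 32-character hexadecimal id for the file column."""
    return uuid.uuid4().hex


# uuid4().hex is the same string as str(uuid4()) with the dashes removed.
sample = uuid.uuid4()
same = sample.hex == str(sample).replace("-", "")
```

Either spelling works; the hex attribute is just a little shorter.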
- provide a custom download handler
Thanks to the represent property above, I can funnel all download requests to the following controller method:
(in controller gae.py)
def download():
    #handle non-gae download
    if not request.env.web2py_runtime_gae:
        return response.download(request,db)

    #handle gae download: find the record by the unique id in the file column
    my_uploads=db(db.my_uploads.file==request.args[0]).select()[0]
    blob_info = blobstore.get(my_uploads.blob_key)
    response.headers['X-AppEngine-BlobKey'] = my_uploads.blob_key
    response.headers['Content-Type'] = blob_info.content_type
    response.headers['Content-Disposition'] = "attachment; filename=%s" % blob_info.filename
    return response.body.getvalue()
Examining the non-gae portion first: the request args are just forwarded to the regular response.download method to handle.
In gae mode the file column of this record holds only the unique id set during upload, so I use it to find the record and then read the blob_key string, which is the key for looking up the blob through the blobstore API. The X-AppEngine-BlobKey response header is set to this key to tell GAE that we want the blobstore download helper to kick in for this page response. We also have to set the Content-Type and Content-Disposition response headers manually so that web browsers show a File Save As dialog instead of dumping the binary data into the browser window.
(the blobstore name is available with the 'from google.appengine.ext import blobstore' that already exists at the top of my gae.py file)
I could have implemented this by invoking a BlobstoreDownloadHandler class like I did for BlobstoreUploadHandler, but in this instance the logic was simple enough to avoid this and use native web2py code instead.
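The three response headers used by the download method can be pulled out into a small pure-Python helper, which also makes them easy to check outside GAE. This is a sketch only: blob_download_headers is a hypothetical name, and quoting the filename is my own addition (the controller above uses the unquoted form) so that names containing spaces survive.

```python
BLOB_KEY_HEADER = "X-AppEngine-BlobKey"  # header name used in the text above


def blob_download_headers(blob_key, content_type, filename):
    """Build the response headers that hand a download over to the blobstore."""
    # Escape backslashes and double quotes so the filename can be quoted safely.
    safe_name = filename.replace("\\", "\\\\").replace('"', '\\"')
    return {
        BLOB_KEY_HEADER: str(blob_key),
        "Content-Type": content_type,
        "Content-Disposition": 'attachment; filename="%s"' % safe_name,
    }
```

In the controller, the returned dictionary could be merged into response.headers with response.headers.update(...) before returning the empty body.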