a suggestion for efficient and scalable counters in Datastore

code samples, urlborg — Tags: , , — Panayotis @ 13:04

As I’ve mentioned before, I’m trying to migrate urlBorg to Google AppEngine. urlBorg needs to count many things, like clicks on a short URL, etc, so I really need a scalable and efficient way to implement counters. This is not as trivial as it sounds in the Google AppEngine environment.

This post is actually the result of a good discussion that took place here

Here is the code I’ve come up with.
An example usage would be as simple as adding a line like this (where page_id is a unique string identifying each page)

Acc(page_id).inc()

in each one of your pages. Getting the total count is as simple as

Acc(page_id).val()

(Because the total is summed over several counter instances, it may not be accurate if you are in the middle of a traffic spike, but it’s good enough for web analytics usage)

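import datetime
import random

from google.appengine.ext import db


class AccVals(db.Model):
    """One counter instance; a counter's total is the sum of its instances."""
    cluster = db.StringProperty(required=True)
    count = db.IntegerProperty(required=True)
    updated = db.DateTimeProperty(auto_now=True)
    rand = db.FloatProperty()


class Acc(object):
    def __init__(self, name, init=0):
        self.__sec = 0.1  # instances updated more recently than this are not reused
        self.__name = name
        self.__init = init

    def inc(self):
        def trans(key):
            # Runs inside a transaction: read, increment and write one instance.
            obj = AccVals.get(key)
            obj.count += 1
            obj.put()
            return obj.count

        # Pick an existing instance at random.
        q = db.Query(AccVals).filter('cluster =', self.__name).filter('rand >', random.random()).get()
        if q:
            if datetime.datetime.now() - q.updated < datetime.timedelta(0, self.__sec):
                # Updated too recently: create a new instance instead of contending for it.
                obj = AccVals(cluster=self.__name, count=self.__init, rand=random.random())
                key = obj.put()
            else:
                key = q.key()
        else:
            # First instance for this counter; rand=1.0 is above any random.random() value.
            obj = AccVals(cluster=self.__name, count=self.__init, rand=1.0)
            key = obj.put()

        # Returns the new count of the instance that was incremented.
        return db.run_in_transaction(trans, key)

    def val(self):
        # The total is the sum over all instances of this counter.
        total = 0
        q = AccVals.all()
        q.filter('cluster =', self.__name)
        for r in q:
            total += r.count
        return total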

It behaves relatively well and, in my tests, looks like it can scale no
matter how much traffic or how many traffic spikes you have.

If you look into it, you will see that a “counter instance” is chosen
at random. You may be tempted to use the “instance” that was updated
least recently ( order('updated').get() ), but it turns out that
when you have a traffic spike (or a spike in whatever your counters
count) the indexes are not updated soon enough and this will return
records that were just updated :-) It looks like selecting a random
instance is no big deal in low traffic and works much better in high
traffic. I’ve also seen that after a while, you end up with roughly the
number of counter instances required to handle the traffic of the
specific counter with few transaction collisions.
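
The selection logic described above can be tried out without the Datastore. What follows is only an in-memory sketch, not the code that runs on AppEngine: ShardedCounter, its clock and rng parameters and the dictionary layout are names made up for the illustration, and it assumes that a query with a 'rand >' filter hands back the matching instance with the smallest rand first.

```python
import random
import time


class ShardedCounter(object):
    """In-memory stand-in for the Acc/AccVals pair (no Datastore involved)."""

    def __init__(self, sec=0.1, rng=random.random, clock=time.time):
        self.sec = sec        # instances updated more recently than this are not reused
        self.rng = rng        # injectable random source
        self.clock = clock    # injectable clock, returns seconds
        self.shards = []      # each instance: {'rand', 'count', 'updated'}

    def _pick(self):
        # Assumption: mimics filter('rand >', r).get() by returning the
        # matching instance with the smallest rand, or None if none match.
        r = self.rng()
        candidates = [s for s in self.shards if s['rand'] > r]
        return min(candidates, key=lambda s: s['rand']) if candidates else None

    def inc(self):
        now = self.clock()
        shard = self._pick()
        if shard is None:
            # Nothing matched: create the first instance, pinned at rand=1.0.
            shard = {'rand': 1.0, 'count': 0, 'updated': now}
            self.shards.append(shard)
        elif now - shard['updated'] < self.sec:
            # The chosen instance was touched very recently: add a new one.
            shard = {'rand': self.rng(), 'count': 0, 'updated': now}
            self.shards.append(shard)
        shard['count'] += 1
        shard['updated'] = now
        return shard['count']

    def val(self):
        return sum(s['count'] for s in self.shards)


# Three hits within 20 ms, then one hit five seconds later.
ticks = iter([0.0, 0.01, 0.02, 5.0])
counter = ShardedCounter(sec=0.1, clock=lambda: next(ticks))
for _ in range(4):
    counter.inc()

print(counter.val())         # 4
print(len(counter.shards))   # 3
```

The burst creates new instances because every instance it finds was updated less than sec ago; the late hit reuses an existing one, which is why the number of instances settles once traffic calms down.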

There is one interesting point: the value of self.__sec. I set it to
0.1 seconds, but this is just a value that looked good after some
tests. I have the impression that this value is *related* to some kind
of “global AppEngine constant”, measuring the time it takes for a
transaction to complete and safely propagate to the rest of the
infrastructure. I guess this varies, depending on the resource
allocation done for a specific app. Could someone from the AppEngine
development team give us some insight on this?

As I’ve mentioned before, I’m a Python newbie, so use the code above
at your own risk :-)

Please post your comments here, so that they are all in one place.

unique integer IDs in Google datastore

code samples — Tags: , , — Panayotis @ 09:04

update: A good discussion on the topics mentioned in this article can be found here, please read it before using the code :-)

newbie code ahead! Use at your own risk :-)

One of the first problems I faced when trying to build an application in Google AppEngine was the lack of something like a “unique, auto_increment” column type in the datastore. How do I maintain a unique numeric id in a way that is guaranteed to work even under heavy use and concurrent requests?

Here is some code I came up with, that seems to work. I’m a Python newbie, so please don’t hesitate to point out any mistakes!

What’s more, I’m just going through the Google AppEngine quirks, so I’m not aware of how to optimize the code or of any performance considerations implied by it. Once again, any comments are more than welcome!

from google.appengine.ext import db


class Idx(db.Model):
    name = db.StringProperty(required=True)
    count = db.IntegerProperty(required=True)


class Counter(object):
    """Unique counters for Google Datastore.
    Usage: c = Counter('hits').inc() will increase the counter 'hits' by 1 and return the new value.
    When your application is run for the first time, you should call the create(start_value) method."""

    def __init__(self, name):
        self.__name = name
        res = db.GqlQuery("SELECT * FROM Idx WHERE name = :1 LIMIT 1", self.__name).fetch(1)
        if len(res) == 0:
            self.__status = 0
        else:
            self.__status = 1
            self.__key = res[0].key()

    def create(self, start_value=0):
        """This method is NOT "thread safe". Even though some testing is done,
        the developer is responsible for making sure it is only called once for each counter.
        This should not be a problem, since it should only be used during application installation.
        """
        res = db.GqlQuery("SELECT * FROM Idx WHERE name = :1 LIMIT 1", self.__name).fetch(1)
        if len(res) == 0:
            C = Idx(name=self.__name, count=start_value)
            self.__key = C.put()
            self.__status = 1
        else:
            raise ValueError('Counter: ' + self.__name + ' already exists')

    def get(self):
        self.__check_sanity__()
        return db.get(self.__key).count

    def inc(self):
        self.__check_sanity__()
        # Return the value read inside the transaction; a separate get()
        # afterwards could already include another request's increment.
        return db.run_in_transaction(self.__inc1__)

    def __check_sanity__(self):
        if self.__status == 0:
            raise ValueError('Counter: ' + self.__name + ' does not exist in Idx')

    def __inc1__(self):
        obj = db.get(self.__key)
        obj.count += 1
        obj.put()
        return obj.count
Suppose you have a Product class that looks like this

class Product(db.Model):
        Serial_ID = db.IntegerProperty(required=True)
        Name = db.TextProperty(required=True)

You should have an “installation page” that is only called once during your application installation and does something like this to create the counter Product_Serial_ID with initial value 0.

Counter('Product_Serial_ID').create(0)

Calling the above code for a second time will raise an exception, but concurrent calls may have unexpected results.

Inserting a new product in the datastore:

P = Product(Serial_ID=Counter('Product_Serial_ID').inc(), Name='Product Name')
P.put()

Please note that if put() fails, the next time you try to insert the product you will get a new Product_Serial_ID. But at least you can be sure it’s unique and incremental :-)
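
One detail matters for uniqueness: the value handed back should be read in the same atomic step that increments it, because reading it afterwards leaves a window in which two callers can see the same number. Here is a minimal in-memory sketch of that idea, with a threading.Lock standing in for the Datastore transaction. LocalCounter and worker are hypothetical names used only for this illustration, not AppEngine API.

```python
import threading


class LocalCounter(object):
    """In-memory analogue of Counter; a lock plays the role of the transaction."""

    def __init__(self, start_value=0):
        self._count = start_value
        self._lock = threading.Lock()

    def inc(self):
        # Increment and read inside the same critical section, so no other
        # thread can slip in between the update and the read.
        with self._lock:
            self._count += 1
            return self._count


def worker(counter, n, out):
    for _ in range(n):
        out.append(counter.inc())


counter = LocalCounter()
ids = []
threads = [threading.Thread(target=worker, args=(counter, 1000, ids))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(ids) == len(set(ids)))   # True: no id was handed out twice
```

If inc() released the lock before reading the count, two threads could return the same value, which is exactly what a serial number must never do.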

AppEngine Datastore limitations

urlborg — Tags: , , — Panayotis @ 00:04

I’ve been trying to decide if moving urlBorg to Google App Engine is a good idea. The pros are obvious: scalability. There are many features I’ve wanted to implement for urlBorg but never did because I’m afraid that if it turns into a hit, my server will go down.

I mean, creating short URLs is a trivial thing. If you want to make a service that stands out, it has to be that it takes care of the little details in a much better way than the rest. And you have to be sure that your service will be able to scale.

So, moving urlBorg to Google App Engine should be a no brainer, right? Wrong.

My main issue is AppEngine Datastore.

The App Engine datastore is not a relational database. While the datastore interface has many of the same features of traditional databases, the datastore’s unique characteristics imply a different way of designing and managing data to take advantage of the ability to scale automatically.

So, forget about queries involving group functions like count(*), min(), max()… :-(

I wish they had some good examples on how to use the AppEngine Datastore to do data mining. How should/would a “web analytics” application be implemented using AppEngine for example?
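
One common way around the missing group functions is to keep the aggregates up to date at write time instead of computing them at query time. The sketch below shows the idea in plain Python, with no Datastore involved. RunningStats and the sample values are made up for the illustration, and in a real app the update would have to happen inside a transaction (or be spread over several counter instances).

```python
class RunningStats(object):
    """Keeps count/min/max current as values arrive (hypothetical helper)."""

    def __init__(self):
        self.count = 0
        self.minimum = None
        self.maximum = None

    def add(self, value):
        # Update every aggregate on each write, so reading them is a cheap lookup.
        self.count += 1
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value


stats = RunningStats()
for clicks_per_day in [120, 85, 310, 97]:
    stats.add(clicks_per_day)

print('%d %d %d' % (stats.count, stats.minimum, stats.maximum))   # 4 85 310
```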

urlborg.xml (video)

urlborg — Tags: — Panayotis @ 14:02

On the urlborg.xml front, a lot has been going on over the last few days: a new interface, documentation, etc. Check out the urlBorg developers blog for more.

the next version of eyetv3 may support XMLTV

misc — Tags: , , — Panayotis @ 12:02

I recently had the chance to get my hands on a beta version of elgato eyetv (v3.0.1b31). The cool thing about it? It will read TV listings from XMLTV files!

XMLTV is very important for users who live in areas for which titanTV or tvtv has little or no information, like Greece, where I live.

I created an XMLTV file with the Greek TV stations and their listings for the next couple of days. Then I drag’n’dropped it on eyetv, and… wow! For the first time in the 3 years that I have been using eyetv, I was able to have a TV guide on it!

eyetv3 with XMLTV support

Once eyetv reads the XML file, you can go to the “Channels” listing and map your existing channels to the ones in the XML file. The “EPG” column will have one more option now, “xmltv”. Just click on it and eyetv will try to match the TV station name you have set manually with the ones in the XML file. If it can’t, you are presented with a list of all the channels in the XML file, and you pick the one you want.

eyetv3 with XMLTV support

I wish they would add some more features in the final release, like the ability to subscribe to a remote XMLTV URL, in the same way you can manually subscribe to a podcast in iTunes by entering the feed URL. This would allow advanced users (it may require some hackery) to make their XMLTV files public for the less tech savvy.

The bottom line: I love it!

facebook photos not private?

misc — Tags: , — Panayotis @ 23:02

I just noticed: If you know the URL of a photo in facebook, you don’t have to have permissions to see it! You don’t even have to log in to facebook!

Check out one of mine.

Is this a privacy hole? Is it a feature? Should users be concerned?

urlBorg developers blog, prologue theme and twitter

urlborg — Tags: , , , — Panayotis @ 21:02

I set up a separate blog, blog.urlborg.com where I will post news about urlBorg.

I chose the Prologue WordPress theme, which resembles twitter a lot in look and feel. I also installed Twitter Tools and set it up to update twitter.com/urlborg. I like the way those two integrate.

checking your web server logs for urlB.org links

urlborg — Tags: — Panayotis @ 15:02

If you are a site owner, you probably like knowing how your content is used, even if this is just an incoming link. One of the nice features of urlB.org is that you can easily tell which urls of your site have been “shortened” by it, just by looking at your server logs!

Here are a couple of entries from my logs:

grep "urlBorg/1.0" < access_log | grep -v HEAD 

69.73.152.127 - - [14/Feb/2008:16:48:03 +0200] "GET /weblog/ HTTP/1.1" 200 26385 "-" "urlBorg/1.0 (+http://www.urlb.org/) [lgfy]"
69.73.152.127 - - [14/Feb/2008:16:48:20 +0200] "GET /weblog/2007/12/08/3214/ HTTP/1.1" 200 19345 "-" "urlBorg/1.0 (+http://www.urlb.org/) [lggw]"

Not only am I able to see when a shortcut was created to one of my pages, but I can also see the “URL key” (e.g. “lgfy”). Using the API, I could get extra info about the short URL, like stats, etc.
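
If you would rather do this in a script than with grep, here is a rough Python sketch that pulls the requested path and the URL key out of log lines like the two above. The regular expression only assumes the format shown in those entries, and LOG_PATTERN and shortened_pages are names made up for this example.

```python
import re

# Captures the request path and the bracketed key at the end of the User-Agent.
LOG_PATTERN = re.compile(
    r'"GET (?P<path>\S+) HTTP/[\d.]+".*"urlBorg/1\.0 [^"]*\[(?P<key>\w+)\]"'
)


def shortened_pages(lines):
    """Yield (path, url_key) for every urlBorg GET request found in the lines."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            yield match.group('path'), match.group('key')


sample = [
    '69.73.152.127 - - [14/Feb/2008:16:48:03 +0200] "GET /weblog/ HTTP/1.1" '
    '200 26385 "-" "urlBorg/1.0 (+http://www.urlb.org/) [lgfy]"',
    '69.73.152.127 - - [14/Feb/2008:16:48:20 +0200] "GET /weblog/2007/12/08/3214/ HTTP/1.1" '
    '200 19345 "-" "urlBorg/1.0 (+http://www.urlb.org/) [lggw]"',
]

results = list(shortened_pages(sample))
for path, key in results:
    print('%s -> %s' % (path, key))
```

To run it over a real log, pass an open file instead of the sample list, e.g. shortened_pages(open('access_log')).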

eating my own dogfood

misc — Tags: — Panayotis @ 04:02

I’ve started rewriting the urlB.org front end using the public API. It works, and soon the API will give developers even more functionality.

I want developers to use urlB.org as a backend service for their apps.

urlBorg as OS X system-wide service

urlborg — Tags: — Panayotis @ 14:02

Download

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 Unported License.
(c) 2008 vrypan|net|log | powered by WordPress with Barecity