Ttracker - How it Works

2010-02-12 Moving docs to http://n2.talis.com/wiki/Ttracker

Source code is under http://hyperdata.org/svn/ttracker/ and can be browsed with syntax highlighting on the wiki at http://hyperdata.org/wiki/browser/ttracker

Client-side

HTML pages you wish to track should contain a block of JavaScript that posts the page details to a server-side PHP script. The JavaScript looks like this:

<!-- Ttracker -->
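<script type="text/javascript">
// Load the helper script. An external script added with document.write
// only runs after the current script block has finished.
document.write(unescape("%3Cscript src='http://hyperdata.org/ttracker/js/tt.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
// tt.js has run by this point, so Tracker is defined.
var tracker = new Tracker();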
var url = tracker.getEscapedUrl();
var datetime = tracker.getEscapedDateTime();
var form = "%3Cform name='ttracker' action='http://hyperdata.org/ttracker/php/tracker.php' method='post'%3E"
         + "%3Cinput type='text' name='url' value='"+url+"' /%3E"
         + "%3Cinput type='text' name='datetime' value='"+datetime+"' /%3E"
         + "%3C/form%3E";
document.write(unescape(form));
document.ttracker.submit();
</script>
<!-- End Ttracker block --> 

The auxiliary script tt.js pulls the page URI etc. out of the HTML DOM and generates a timestamp. These values are passed to the server in a POST request from a dynamically created HTML form (above).
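As a rough illustration of what a helper like tt.js could provide, here is a minimal sketch of a Tracker object with the two methods used in the block above. It is not the actual tt.js source: the constructor parameters (loc, now), the use of encodeURIComponent and the ISO 8601 timestamp format are all assumptions made for this example.

```javascript
// Hypothetical sketch of a Tracker helper; the real tt.js may differ.
function Tracker(loc, now) {
  // loc: object with an href property (defaults to the browser's window.location).
  // now: Date used for the timestamp (defaults to the current time).
  this.loc = loc || (typeof window !== "undefined" ? window.location : { href: "" });
  this.now = now || new Date();
}

// Page URI, percent-encoded so it survives being placed in a form value.
Tracker.prototype.getEscapedUrl = function () {
  return encodeURIComponent(this.loc.href);
};

// ISO 8601 UTC timestamp (an assumed format), percent-encoded in the same way.
Tracker.prototype.getEscapedDateTime = function () {
  return encodeURIComponent(this.now.toISOString());
};

// Example with fixed inputs, so it also runs outside a browser.
var demo = new Tracker(
  { href: "http://example.org/page?a=1" },
  new Date(Date.UTC(2010, 1, 12, 9, 30, 0))
);
console.log(demo.getEscapedUrl());      // http%3A%2F%2Fexample.org%2Fpage%3Fa%3D1
console.log(demo.getEscapedDateTime()); // 2010-02-12T09%3A30%3A00.000Z
```

In a page, new Tracker() with no arguments would fall back to window.location and the current time. Passing the inputs explicitly just makes the sketch easy to try from the command line.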

See: index.html

Server-side

Server-side handling is done by a set of PHP scripts, with tracker.php as the entry point that controls the processing flow.

Although the scripts do take a second or two to run, this is unlikely to be an issue as the server-side calls are asynchronous and so shouldn't affect normal browser activity.

The basic processing sequence is as follows:

  • Request data is obtained (including values passed through POST call)
  • Data is RDFized (with helper requestrdf.php)
  • RDF is POSTed to the store
  • A SPARQL ASK query is run against the store to see if the target page itself has been analysed yet (the triple <pageURI> rdf:type foaf:Document is used as a flag)
  • ... If the page hasn't been analysed, disembed.php is used to parse out any embedded data
  • ... Flag triple is added (still in disembed.php)
  • ... Resulting RDF (plus flag triple) is POSTed to store
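The sequence above can be sketched as a small control-flow function. The real implementation is in PHP (tracker.php, requestrdf.php, disembed.php); the JavaScript below only illustrates the logic, and every name in it (handleRequest, buildFlagQuery, flagTriple, makeMemoryStore, and the rdfize / extractEmbedded callbacks) is hypothetical. The in-memory store is a stand-in for demonstration and does not evaluate real SPARQL.

```javascript
// Sketch of the server-side flow; the store object and helper names are hypothetical.
var RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
var FOAF_DOCUMENT = "http://xmlns.com/foaf/0.1/Document";

// ASK query that checks for the flag triple <pageURI> rdf:type foaf:Document.
// Note: pageUri is assumed to be a valid URI (no ">" or whitespace).
function buildFlagQuery(pageUri) {
  return "ASK { <" + pageUri + "> <" + RDF_TYPE + "> <" + FOAF_DOCUMENT + "> }";
}

// The flag triple itself, as one N-Triples line.
function flagTriple(pageUri) {
  return "<" + pageUri + "> <" + RDF_TYPE + "> <" + FOAF_DOCUMENT + "> .";
}

// store: { post(rdf), ask(query) -> boolean }
// rdfize(request) -> RDF string; extractEmbedded(pageUri) -> RDF string
function handleRequest(request, store, rdfize, extractEmbedded) {
  // RDFize the request data and POST it to the store.
  store.post(rdfize(request));
  // Has the target page been analysed yet?
  if (!store.ask(buildFlagQuery(request.url))) {
    // Parse out embedded data, add the flag triple, POST the result.
    var rdf = extractEmbedded(request.url) + "\n" + flagTriple(request.url);
    store.post(rdf);
    return true;  // page was analysed on this request
  }
  return false;   // page had already been analysed
}

// In-memory stand-in for the store, for demonstration only.
function makeMemoryStore() {
  var posted = [];
  return {
    posted: posted,
    post: function (rdf) { posted.push(rdf); },
    ask: function (query) {
      // Crude check: does any posted data contain the pattern inside ASK { ... }?
      var pattern = query.slice("ASK { ".length, -" }".length);
      return posted.some(function (rdf) { return rdf.indexOf(pattern) !== -1; });
    }
  };
}

// Two hits on the same page: only the first triggers the embedded-data step.
var store = makeMemoryStore();
var rdfize = function (req) { return "# request for " + req.url + " at " + req.datetime; };
var extract = function (uri) { return "# embedded data from " + uri; };
var req = { url: "http://example.org/page", datetime: "2010-02-12T09:30:00Z" };
var first = handleRequest(req, store, rdfize, extract);
var second = handleRequest(req, store, rdfize, extract);
console.log(first, second, store.posted.length); // true false 3
```

Using the flag triple as a guard means the more expensive parsing step runs at most once per page, while every request is still recorded in the store.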

The current version includes some custom logging (via Apache) for debugging purposes.