Ttracker - How it Works
2010-02-12 Moving docs to http://n2.talis.com/wiki/Ttracker
Source code is under http://hyperdata.org/svn/ttracker/, with a highlighted view on the wiki at http://hyperdata.org/wiki/browser/ttracker
Client-side
HTML pages you wish to track should contain a block of JavaScript that makes a POST call to a server-side PHP script. The block looks like this:
<!-- Ttracker -->
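<script type="text/javascript">
// Load the helper script. A script written with document.write() only runs after this inline block has finished.
document.write(unescape("%3Cscript src='http://hyperdata.org/ttracker/js/tt.js' type='text/javascript'%3E%3C/script%3E"));
</script>
<script type="text/javascript">
// tt.js has been loaded by this point, so Tracker is defined.
var tracker = new Tracker();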
var url = tracker.getEscapedUrl();
var datetime = tracker.getEscapedDateTime();
var form = "%3Cform name='ttracker' action='http://hyperdata.org/ttracker/php/tracker.php' method='post'%3E"
+ "%3Cinput type='text' name='url' value='"+url+"' /%3E"
+ "%3Cinput type='text' name='datetime' value='"+datetime+"' /%3E"
+ "%3C/form%3E";
document.write(unescape(form));
document.ttracker.submit();
</script>
<!-- End Ttracker block -->
The auxiliary script tt.js reads the page URI etc. from the HTML DOM and generates a timestamp. Both values are sent to the server in a POST from the dynamically created HTML form shown above.
see: index.html
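As an illustration of the two helper methods called in the block above, here is a minimal sketch of what a Tracker object could look like. It is not the real tt.js: the constructor arguments (loc, now) are assumptions added so the sketch can run outside a browser, and the use of escape() is assumed only because the page block decodes the form markup with unescape().

```javascript
// Hypothetical sketch of a tt.js-style helper; not the actual Ttracker source.
// loc stands in for window.location and now for the current time (both are
// assumptions), so the code can also run outside a browser.
function Tracker(loc, now) {
  this.loc = loc || (typeof window !== "undefined" ? window.location : { href: "" });
  this.now = now || new Date();
}

// Page URL, encoded with escape() so that the unescape() applied to the
// form markup restores the original value.
Tracker.prototype.getEscapedUrl = function () {
  return escape(this.loc.href);
};

// ISO 8601 UTC timestamp, encoded the same way.
Tracker.prototype.getEscapedDateTime = function () {
  return escape(this.now.toISOString());
};

// Usage with explicit stand-in values:
var demoTracker = new Tracker(
  { href: "http://example.org/page?a=1&b=2" },
  new Date(Date.UTC(2010, 1, 12, 10, 30, 0))
);
var demoUrl = demoTracker.getEscapedUrl();
var demoDateTime = demoTracker.getEscapedDateTime();
```

In a browser, new Tracker() with no arguments falls back to window.location and the current time, which matches how the page block calls it.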
Server-side
The server side is handled by a set of PHP scripts. tracker.php is the entry point and drives the processing flow.
Although the scripts take a second or two to run, this is unlikely to be an issue: the server-side calls are asynchronous, so they shouldn't affect normal browser activity.
The basic processing sequence is as follows:
- Request data is obtained (including values passed through POST call)
- Data is RDFized (with helper requestrdf.php)
- RDF is POSTed to the store
- A SPARQL ASK query is run against the store to see if the target page itself has been analysed yet (the triple <pageURI> rdf:type foaf:Document is used as a flag)
- ... If the page hasn't been analysed, disembed.php is used to parse out any embedded data
- ... Flag triple is added (still in disembed.php)
- ... Resulting RDF (plus flag triple) is POSTed to store
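The sequence above can be sketched roughly as follows. This is an illustration only, written in JavaScript to match the client-side block rather than the PHP actually used. MemoryStore is a hypothetical in-memory stand-in for the real store, the http://example.org/ttracker# vocabulary is invented, and extractEmbedded is an assumed callback playing the role of disembed.php. Only the flag triple (<pageURI> rdf:type foaf:Document) comes from the description above.

```javascript
// Hypothetical in-memory stand-in for the RDF store (the real one is remote).
function MemoryStore() {
  this.triples = [];
}
MemoryStore.prototype.post = function (triples) {
  this.triples = this.triples.concat(triples);
};
// Stand-in for a SPARQL ASK: true if the exact triple is already present.
MemoryStore.prototype.ask = function (triple) {
  return this.triples.indexOf(triple) !== -1;
};

var RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>";
var FOAF_DOCUMENT = "<http://xmlns.com/foaf/0.1/Document>";
var EX = "http://example.org/ttracker#"; // invented vocabulary, for illustration

// The flag triple that marks a page as already analysed.
function flagTriple(pageUri) {
  return "<" + pageUri + "> " + RDF_TYPE + " " + FOAF_DOCUMENT + " .";
}

// The SPARQL text an equivalent check could send to a real store.
function askQuery(pageUri) {
  return "ASK { <" + pageUri + "> " + RDF_TYPE + " " + FOAF_DOCUMENT + " }";
}

// Describe one page view as N-Triples-style strings (the RDFizing step).
function rdfizeRequest(request) {
  var visit = "_:visit";
  return [
    visit + " <" + EX + "page> <" + request.url + "> .",
    visit + " <" + EX + "datetime> \"" + request.datetime + "\" ."
  ];
}

// Returns true if the page was analysed on this call, false if already flagged.
function handleRequest(request, store, extractEmbedded) {
  store.post(rdfizeRequest(request));            // RDFize and POST request data
  if (store.ask(flagTriple(request.url))) {      // has the page been analysed?
    return false;
  }
  var embedded = extractEmbedded(request.url);   // role played by disembed.php
  store.post(embedded.concat([flagTriple(request.url)]));
  return true;
}

// Usage: the second visit to the same page skips the analysis step.
var demoStore = new MemoryStore();
var demoRequest = { url: "http://example.org/page", datetime: "2010-02-12T10:30:00Z" };
var demoExtract = function (url) {
  return ["<" + url + "> <http://purl.org/dc/elements/1.1/title> \"Example\" ."];
};
var firstVisit = handleRequest(demoRequest, demoStore, demoExtract);
var secondVisit = handleRequest(demoRequest, demoStore, demoExtract);
```

Posting the flag triple together with the extracted data means a page is only marked as analysed once its embedded data has been handed to the store.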
The current version includes some custom logging (via Apache) for debugging purposes.