Digital Collections Now!

A version of this was published on the ACRL TechConnect blog on 2014-06-17. Scroll down to the bottom for some additional details on this project that were not included in that post.

When Google Analytics first turned on real-time reporting, it was mesmerizing. I could see which resources on the NCSU Libraries’ Rare and Unique Digital Collections site were being viewed at the exact moment they were being viewed. Or rather, I could see the URL of each resource being viewed. I happened to notice that sometimes multiple people were viewing the same resource at the same time. That hinted that someone’s social share or forum post was getting a lot of click-throughs right then, or that something had happened in the news and we had an image of one of the people involved. I could then follow up and see examples of where we were being effective with search engine optimization.

The Rare & Unique site has a lot of visual resources like photographs and architectural drawings. I wanted to see the actual images that were being viewed. The problem, though, was that Google Analytics does not have an easy way to click through from a URL to the resource on your site. I would have to retype the URL, copy and paste part of the URL path, or do a search for the resource identifier. I just wanted to see the images now. (OK, this first use case was admittedly driven by the great virtue of laziness.)

My first attempt at this was to create pages showing the resources that had been viewed most frequently in the past day and the past week. To enable this, I added some custom logging that saves to a database. Every view of every resource just gets a little tick mark, and the marks are tallied up occasionally. The pages showing the popular resources of the moment are then regenerated every hour.
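A minimal sketch of that kind of tally logging, assuming an SQLite database with one row per view. This is not the code behind the site: the table name, column names, function names, and resource identifiers are all hypothetical and only meant to show the shape of the idea.

```python
import sqlite3
from datetime import datetime, timedelta


def create_log(conn):
    # One row per view: the "tick mark" for a resource (hypothetical schema).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS views ("
        "resource_id TEXT NOT NULL, viewed_at TEXT NOT NULL)"
    )


def log_view(conn, resource_id, when):
    # Timestamps are stored as ISO 8601 strings so they sort chronologically.
    conn.execute(
        "INSERT INTO views (resource_id, viewed_at) VALUES (?, ?)",
        (resource_id, when.isoformat(timespec="seconds")),
    )


def most_viewed(conn, since, limit=10):
    # Tally the tick marks recorded at or after `since`, most viewed first.
    rows = conn.execute(
        "SELECT resource_id, COUNT(*) AS n FROM views "
        "WHERE viewed_at >= ? "
        "GROUP BY resource_id ORDER BY n DESC, resource_id LIMIT ?",
        (since.isoformat(timespec="seconds"), limit),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
create_log(conn)
now = datetime(2014, 6, 17, 12, 0, 0)
log_view(conn, "pool-photo", now - timedelta(hours=1))
log_view(conn, "pool-photo", now - timedelta(hours=3))
log_view(conn, "roster-1983", now - timedelta(hours=2))
log_view(conn, "old-drawing", now - timedelta(days=3))

past_day = most_viewed(conn, now - timedelta(days=1))
past_week = most_viewed(conn, now - timedelta(days=7))
```

An hourly job could call something like `most_viewed` with the two time windows and write the results out as static pages, so the tallying never happens while a visitor is waiting.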

It was not a real-time view of activity, but it was easy to implement and it did answer a lot of questions for me about what was most popular. Some images are regularly in the group of the most-viewed images. I learned that people often visit the image of the roster of the 1983 NC State men’s basketball team, which went on to win the NCAA tournament. People also seem to really like the indoor pool at the Biltmore Estate.

Really Real-Time

Now that I had this logging in place, I set about making it really real-time. I wanted to see the actual images being viewed at that moment by a real user. I wanted to serve up a single page and have it update in real time with what is being viewed. This is where the persistent communication channel of WebSockets came in. WebSockets allow the server to push these updates to the page immediately, as they happen.

People have told me they find this real-time view to be addictive. I found it to be useful. I have discovered images I never would have seen or even known how to search for before. At least for me this has been an effective form of serendipitous discovery. I also have a better sense of what different traffic volumes actually feel like on a good day. You too can see what folks are viewing in real time now.

Details

The pieces required to make this all work are a web application, a WebSocket server, and two browsers. Here is the flow through the application. A user requests a resource page. On the server, the web application sends a short message to the WebSocket server with the identifier of that resource. The WebSocket server broadcasts the message on a channel where all of the browser clients on the real-time page are listening for updates. When a browser client gets the message with the identifier, it makes a request back to the web application for the HTML snippet to display on the page.
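The flow can be traced with a small in-process simulation. This is only a sketch under loud assumptions: there are no real sockets or HTTP requests here, and the class names (`RelayServer`, `WebApp`, `RealTimePage`), the `/resources/` URL path, and the identifier are hypothetical stand-ins for the three pieces described above.

```python
import html


class RelayServer:
    """Stand-in for the WebSocket server: forwards each message to every listener."""

    def __init__(self):
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def broadcast(self, message):
        for callback in self.listeners:
            callback(message)


class WebApp:
    """Stand-in for the web application that serves resource pages and snippets."""

    def __init__(self, relay, resources):
        self.relay = relay
        self.resources = resources  # identifier -> title

    def resource_page(self, resource_id):
        title = self.resources[resource_id]
        # Only the short identifier is sent to the relay, not any HTML.
        self.relay.broadcast(resource_id)
        return f"<h1>{html.escape(title)}</h1>"

    def snippet(self, resource_id):
        title = self.resources[resource_id]
        return f'<a href="/resources/{resource_id}">{html.escape(title)}</a>'


class RealTimePage:
    """Stand-in for a browser that has the real-time page open."""

    def __init__(self, app, relay):
        self.app = app
        self.displayed = []
        relay.subscribe(self.on_message)

    def on_message(self, resource_id):
        # On each broadcast, ask the web application for the snippet to show.
        self.displayed.insert(0, self.app.snippet(resource_id))


relay = RelayServer()
app = WebApp(relay, {"0001234": "Indoor pool"})
page = RealTimePage(app, relay)  # the browser watching the real-time page
app.resource_page("0001234")     # the browser of a visitor viewing a resource
```

After the last line, `page.displayed` holds one snippet for resource `0001234`, which mirrors the two browsers in the description: one requests a resource, the other sees it appear.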

This could be simplified some by sending the HTML snippet through the WebSocket server or by not requiring everything to be routed through the web application. By doing things the way that I have, I am sending the shortest messages possible and allowing the WebSocket server to be rather dumb. The WebSocket server just needs to accept messages from particular applications or IP addresses and forward them along to every listening client. The security model becomes a lot simpler with the web application, and not the WebSocket server, being responsible for validating that a message ought to be broadcast. Keeping the WebSocket server this generic has also allowed me to reuse it for other projects.
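The "rather dumb" relay can be sketched as a sender check followed by a forward-to-everyone loop. This is an illustration only, with no real networking: the `AllowlistRelay` class, its method names, and the addresses (taken from the ranges reserved for documentation) are hypothetical, and a real server would read the sender address from the incoming connection.

```python
import ipaddress


class AllowlistRelay:
    """Forwards messages from allowed senders to every listener, unchanged."""

    def __init__(self, allowed_networks):
        # Each entry may be a single address or a CIDR range.
        self.allowed = [ipaddress.ip_network(n) for n in allowed_networks]
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def publish(self, sender_ip, message):
        sender = ipaddress.ip_address(sender_ip)
        if not any(sender in network for network in self.allowed):
            return False  # unknown sender: drop the message
        # The relay never inspects the message; deciding whether it ought
        # to be broadcast is left to the application that sent it.
        for callback in self.listeners:
            callback(message)
        return True


received = []
relay = AllowlistRelay(["192.0.2.10/32"])
relay.subscribe(received.append)
accepted = relay.publish("192.0.2.10", "0001234")
rejected = relay.publish("198.51.100.7", "spam")
```

Because the relay knows nothing about resources or HTML, the same process could carry messages for an unrelated project simply by adding that project's application to the allowed list.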