HCI'98 Short Paper. September 1-4, 1998. Sheffield, UK.

SiteSeer: An Interactive Treeviewer for Visualizing Web Activity

Eric Sigman, Robert Farrell, and Mark Rosenstein
Bellcore
445 South St., Morristown NJ
07960, U.S.A.

ABSTRACT

SiteSeer is an interactive visualization tool designed to support web site analysts in tasks such as understanding web site traffic patterns and placing advertisements effectively. SiteSeer integrates visualization of the content, structure, and utilization of a web space.

KEYWORDS

Visualization, World-Wide Web, Data Filtering, Fisheye View, Traffic Analysis

INTRODUCTION

The growth of commercial web sites has created a need for tools that help analysts monitor a site, restructure it, and introduce new content into it. These tasks depend on the user's knowledge of the site's content and organization and of its visitors' current usage patterns. SiteSeer is a prototype visualization tool designed for web analysis. In essence, the tool is an interactive tree viewer that uses a variety of techniques for visual emphasis, focusing, and information filtering. This paper describes the tool and its application to two tasks: understanding site traffic patterns and placing advertisements effectively.

EARLIER WORK

SiteSeer is an outgrowth of our work with AMIT (Animated Multiscale Interactive Treeviewer) (Wittenburg & Sigman, 1997a, 1997b), and retains many of its basic features. AMIT is a tool aimed at integrating search and browsing on the World-Wide Web. It presents a web space as a tree structure. Font scaling and tree pruning are used to provide multifoveal fisheye views (Furnas, 1986), and animation provides transitions between the user's customized views. AMIT has been deployed for a web space of over 12,000 documents (http://www.apparent-wind.com/sailing-page.html).

Initially, an off-line "web walker" collects documents by following the outgoing link structure from a designated root node. The walker generates a directed graph of that space, and the system then represents this graph as a tree structure. In AMIT, the titles of these documents are presented as nodes in the tree. The text collected by the web walker is indexed by the Latent Semantic Indexing (LSI) (Deerwester et al., 1990) module.

At runtime, a user's query to an LSI-based search engine returns a list of document hits along with relevancy scores. AMIT generates a view of the tree pruned to show the hits exceeding a threshold; the relevancy scores are reflected in the font size used to render each node. Users customize the tree view through direct manipulation. For example, users can select a set of nodes as foci for a succeeding view. The new view is reduced to the selected nodes and their paths to the root.
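The pruning and font scaling described above can be illustrated with a minimal sketch. It is not AMIT's implementation: the breadth-first derivation of a tree from the link graph, the linear score-to-font mapping, the 8 to 24 point range, and all function names are assumptions made for illustration only.

```python
from collections import deque


def spanning_tree(graph, root):
    """Derive a tree from a directed link graph by breadth-first search.

    Returns a child -> parent map; the root maps to None. Breadth-first
    order is an assumption; the paper does not say how the tree is chosen.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for target in graph.get(node, []):
            if target not in parent:  # first link to reach a page wins
                parent[target] = node
                queue.append(target)
    return parent


def prune_to_foci(parent, foci):
    """Keep only the focus nodes and the nodes on their paths to the root."""
    kept = set()
    for node in foci:
        while node is not None and node not in kept:
            kept.add(node)
            node = parent[node]
    return kept


def font_size(score, min_pt=8, max_pt=24):
    """Map a relevancy score in [0, 1] linearly to a font size in points."""
    score = max(0.0, min(1.0, score))
    return round(min_pt + score * (max_pt - min_pt))


def pruned_view(graph, root, scores, threshold):
    """Return {node: font size} for hits above the threshold plus ancestors."""
    parent = spanning_tree(graph, root)
    hits = [n for n, s in scores.items() if s > threshold and n in parent]
    kept = prune_to_foci(parent, hits)
    # Ancestors that are not hits themselves get the minimum font size.
    return {n: font_size(scores.get(n, 0.0)) for n in kept}
```

Calling prune_to_foci directly with a user-selected set of nodes corresponds to the direct-manipulation case, where the succeeding view is reduced to the chosen foci and their paths to the root.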

SITESEER FOR WEB SITE ANALYSIS

SiteSeer extends AMIT to encompass a repository of traffic and ad presentation data. This data is posted against the tree of hyperlinked documents. For example, heavily trafficked documents are represented by larger nodes. In this way, regions or pathways with high utilization become readily apparent. Often, users are concerned with characteristics of the traffic, such as the originating site, type of site, or day of the week. SiteSeer provides easy-to-use filters to extract this data through point-and-click dialog boxes.

SiteSeer was applied to a site where advertisements were dynamically served to visitors. In this case, analysts want to know both how frequently ads are viewed throughout the site and how effective they are, as measured by the number of visitors clicking on the ad banner. Typically, these analysts are seeking optimal advertisement placement. A visualization that combines structure and traffic supports this task. The user can formulate queries to filter ad data by various parameters, including those available to filter the traffic data.

An important feature is the sequential visualization of a query chain. Here, a view of the tree that results from a query can serve as input to a subsequent query. For example, a user could first query for the most frequently accessed documents and then, holding that structure fixed, query for ad data to be overlaid on the current view of the tree. An even more interesting example utilizes the LSI search engine. Since LSI rates the similarities among the documents, it is possible to create a view based on documents that have similar content. A subsequent chained query can then be overlaid on this view. Thus, for example, a user could request pages with sports-related content, and then overlay traffic data to discover promising regions of the site for placing sneaker ads.

From the experience with SiteSeer, perhaps the most interesting future direction is to consider the issues of "path analysis." SiteSeer is limited to showing access to documents in a structural path, but does not show the actual traversal behavior of visitors. A visualization that shows these traversals would likely answer questions about how people are actually navigating the site and help improve the site for visitors.
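The traffic filters and chained queries described in this section can be sketched as follows. This is an illustration under stated assumptions, not SiteSeer's implementation: the Hit record, its field names, and the functions traffic_counts, top_pages, and overlay are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One page access from a traffic log (field names are assumptions)."""
    page: str
    referrer_type: str  # e.g. "search" or "portal"
    weekday: str        # e.g. "Mon"


def traffic_counts(log, **criteria):
    """Count hits per page, keeping only records matching every criterion."""
    counts = {}
    for hit in log:
        if all(getattr(hit, field) == value for field, value in criteria.items()):
            counts[hit.page] = counts.get(hit.page, 0) + 1
    return counts


def top_pages(counts, n):
    """First query in a chain: the n most frequently accessed pages."""
    ranked = sorted(counts, key=lambda page: (-counts[page], page))
    return set(ranked[:n])


def overlay(view, data):
    """Second query in a chain: post data against a view held fixed.

    Pages outside the view are dropped; pages in the view with no data get 0.
    """
    return {page: data.get(page, 0) for page in sorted(view)}
```

In this sketch a view is just a set of pages, so a chain is ordinary function composition: overlay(top_pages(traffic_counts(log), 2), ad_clicks) holds the two busiest pages fixed and posts ad click counts against them, where ad_clicks is a hypothetical page-to-clicks mapping.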

REFERENCES

Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W. and Harshman, R.A. (1990) Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6).

Furnas, G.W. (1986) Generalized fisheye views. In the Proceedings of Human Factors in Computing Systems, CHI '86. (Boston, MA, April).

Wittenburg, K. and Sigman, E. (1997a) Integration of Browsing, Searching, and Filtering in an Applet for Web Information Access. In the Proceedings of Human Factors in Computing Systems, CHI '97. (Atlanta, GA, March).

Wittenburg, K. and Sigman, E. (1997b) Visual Focusing Techniques in a Treeviewer for Web Information Access. In the Proceedings of the IEEE Symposium on Visual Languages, VL '97. (Capri, Italy, September).