BlogScope

About

BlogScope is an analysis and visualization tool for the blogosphere, developed as part of a research project at the Department of Computer Science, University of Toronto.

Motivation: The explosive growth of the internet and the massive adoption of social media have created new ways for individuals to express their opinions online. Millions of bloggers across the globe write daily, producing one of the richest pools of information: the blogosphere. Bloggers blog about diverse topics including their personal lives, product reviews, political opinions, technology trends, tourism experiences, sports events, and the entertainment industry. Without a doubt, blogging is a social phenomenon. This trend will persist and grow as our lives become more heavily dependent on internet technologies. Given such trends, there is a pressing need to monitor such online forums continuously and to extract useful and actionable information regarding "public opinion" on a variety of topics. BlogScope tries to help users discover knowledge from blogs by providing hints in the form of bursts and correlations.

We currently have over 28.61 million blogs with 405.12 million posts in our database. After removing non-English content and spam posts, we index 228.75 million documents. This data can be analyzed using BlogScope, as demonstrated by this Flash video. For regular use of BlogScope, a Firefox search plugin can be installed by clicking here.

To learn more about the system, take a tour.

We love feedback. Please write to us online or send mail to admin@blogscope.net.

Publications

Nilesh Bansal, Sudipto Guha, Nick Koudas, Ad-Hoc Aggregations of Ranked Lists in the Presence of Hierarchies, to appear in Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2008, Vancouver, Canada, June 9-12 2008. (slides)

Nilesh Bansal, Fei Chiang, Nick Koudas, Frank Wm. Tompa, Seeking Stable Clusters in the Blogosphere, In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, Vienna, Austria, Sept 23-28 2007. (slides)

Nilesh Bansal, Nick Koudas, BlogScope: A System for Online Analysis of High Volume Text Streams, In Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB 2007, Vienna, Austria, Sept 23-28 2007, Demonstration Proposal.

Nilesh Bansal, Nick Koudas, Searching the Blogosphere, In Proceedings of the 10th International Workshop on Web and Databases, WebDB 2007, (co-located with SIGMOD) Beijing, China, June 15 2007. (slides)

Nilesh Bansal, Nick Koudas, BlogScope: Spatio-temporal Analysis of the Blogosphere, In Proceedings of the 16th International Conference on World Wide Web, WWW 2007, Banff, Canada, May 8-12, 2007, Poster.

System

BlogScope is written in Java and runs on four Sun V40z server machines with RedHat Linux AS4. Its main components include a multi-threaded crawler with a spam analyzer, an indexing and searching module, a statistics collection and access framework, a popularity curve generator, a correlation discovery module, a natural language processor, and the web interface. The figure below summarizes the high-level system architecture.

Figure: High-level architecture of BlogScope
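
To make the idea of a popularity curve and a burst more concrete, here is a minimal Java sketch. It is not BlogScope's actual implementation: the class name BurstSketch, the method names, and the burst rule (a day counts as a burst when its post count exceeds the mean by k standard deviations) are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not taken from the BlogScope code base. */
final class BurstSketch {

    /** Popularity curve: for each day, the number of posts that mention the keyword. */
    static int[] popularityCurve(List<List<String>> postsByDay, String keyword) {
        String needle = keyword.toLowerCase();
        int[] curve = new int[postsByDay.size()];
        for (int day = 0; day < postsByDay.size(); day++) {
            for (String post : postsByDay.get(day)) {
                // Case-insensitive substring match; each post counts at most once.
                if (post.toLowerCase().contains(needle)) {
                    curve[day]++;
                }
            }
        }
        return curve;
    }

    /**
     * Assumed burst rule: day d is a burst when
     * curve[d] > mean + k * (population standard deviation).
     * Returns the zero-based indices of the burst days.
     */
    static List<Integer> findBursts(int[] curve, double k) {
        List<Integer> bursts = new ArrayList<>();
        if (curve.length == 0) {
            return bursts;
        }
        double mean = 0;
        for (int c : curve) {
            mean += c;
        }
        mean /= curve.length;

        double variance = 0;
        for (int c : curve) {
            variance += (c - mean) * (c - mean);
        }
        double std = Math.sqrt(variance / curve.length);

        double threshold = mean + k * std;
        for (int day = 0; day < curve.length; day++) {
            if (curve[day] > threshold) {
                bursts.add(day);
            }
        }
        return bursts;
    }

    public static void main(String[] args) {
        // Made-up daily counts with one obvious spike on day 5.
        int[] curve = {10, 12, 11, 9, 10, 50, 11};
        System.out.println(findBursts(curve, 2.0)); // prints [5]
    }
}
```

A production system over hundreds of millions of posts would compute these counts from an index rather than by scanning post text, and would likely use a more robust burst model; the sketch only shows the shape of the computation.
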
Powered By

Linux
Java
Apache
MySQL
Lucene
Tomcat

All this is built using many great open source libraries and utilities, which must be acknowledged.

Libraries

  • DBPool - Java database connection pooling
  • Informa - RSS library for Java
  • HTML Parser - A parser for real-world HTML
  • Lucene - Java-based indexing and search technology
  • Snowball - Stemming and language processing
  • JDOM - Java solution for XML
  • SAX - Simple API for XML
  • WhirlyCache - Object caching library
  • Apache Commons - Reusable Java components from Apache
  • Trove - High performance collections
  • DWR - Ajax for Java
  • Dojo - Javascript toolkit

Platform

  • MySQL - Database
  • Tomcat - Java servlet container
  • Java - The technology
  • Linux - The operating system

Development

Firefox


© 2006-2008 University of Toronto, all rights reserved. Patent Pending.
This public online version displays a subset of BlogScope technology.
To learn more about the technology and the complete set of features, contact us.