Search engines have become the most important means of information retrieval on the Web. Traditional search engines use Web pages as retrieval units. By contrast, object-level search engines use different kinds of objects, such as products, people, and papers, as retrieval units. Compared with traditional search engines, object-level search engines organize search results with richer semantic information and thus improve users’ search experience. The objective of this project is to develop a graph-mining-based method for object expansion that can enhance object-level search. We aim to solve a series of critical problems in object-level search, including object ranking and hierarchical text categorization.
The first step of object-level search is to extract objects and the relationships among them, which we represent as an object relationship graph. We present a multi-plane model for representing this object relationship graph in object-level search. On several different topics in the ACM Portal data set, our model shows a 10-20% improvement over PaperRank.
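The report does not spell out how the multi-plane model scores objects. Purely as an illustration of ranking over a graph with several relationship types, the sketch below keeps one edge list per "plane" (for example, citations and shared authors), merges the planes with per-plane weights, and runs PageRank-style power iteration on the merged graph. The function name `multi_plane_rank`, the plane names, the weights, and the linear combination rule are hypothetical assumptions, not the model that was evaluated against PaperRank.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each "plane" is one relationship type,
# stored as a list of directed edges (source, target) between object ids.
Plane = List[Tuple[str, str]]


def multi_plane_rank(
    planes: Dict[str, Plane],
    weights: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """PageRank-style scores over a weighted combination of relation planes."""
    nodes = sorted({n for edges in planes.values() for e in edges for n in e})
    # Merged out-weights: w(u -> v) = sum over planes of weight(plane) per edge.
    out: Dict[str, Dict[str, float]] = {n: {} for n in nodes}
    for name, edges in planes.items():
        w = weights.get(name, 0.0)
        for u, v in edges:
            out[u][v] = out[u].get(v, 0.0) + w
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1.0 - damping) / n for node in nodes}
        for u in nodes:
            total = sum(out[u].values())
            if total == 0.0:
                # Dangling node: spread its score uniformly over all nodes.
                for v in nodes:
                    new[v] += damping * score[u] / n
            else:
                for v, w in out[u].items():
                    new[v] += damping * score[u] * w / total
        score = new
    return score


# Toy data (hypothetical): three papers linked by citations and shared authors.
planes = {
    "cites": [("p1", "p3"), ("p2", "p3"), ("p3", "p1")],
    "same_author": [("p1", "p2"), ("p2", "p1")],
}
scores = multi_plane_rank(planes, {"cites": 1.0, "same_author": 0.5})
print(sorted(scores, key=scores.get, reverse=True))  # ['p1', 'p3', 'p2']
```

Keeping the planes separate until ranking time means the per-plane weights can be tuned per topic without rebuilding the graph.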
We propose a novel approach that enhances Latent Semantic Indexing (LSI) by utilizing category labels. In text categorization, terms in a document that also appear in category labels are more effective for categorizing the document than other terms. Motivated by this intuition, we scale the vectors of terms that appear in category labels in the term-document matrix before performing Singular Value Decomposition (SVD). We compared the performance of two variants of our approach with two baselines. In most cases, both variants significantly improve classification performance, and in some cases SLSI even outperforms SVM.
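A minimal sketch of label-driven scaling before SVD, assuming NumPy, a terms-by-documents matrix `A`, and a single fixed boost factor `scale` for the rows of terms that occur in category labels. The function names `label_scaled_lsi` and `fold_in` and the fixed factor are hypothetical; the adaptive choice of scaling factors and the two evaluated variants are not reproduced here.

```python
import numpy as np


def label_scaled_lsi(A, label_rows, scale=2.0, k=2):
    """Boost label-term rows of a terms x documents matrix, then truncated SVD.

    A          -- term weights (e.g., tf-idf), shape (m terms, n documents)
    label_rows -- row indices of terms that also occur in category labels
    scale      -- hypothetical fixed boost factor (> 1) for those rows
    k          -- number of latent dimensions kept
    Returns (U_k, docs_k): the m x k term basis and the k x n document vectors.
    """
    A_scaled = np.asarray(A, dtype=float).copy()
    A_scaled[label_rows, :] *= scale  # emphasize terms that appear in labels
    U, _, _ = np.linalg.svd(A_scaled, full_matrices=False)
    U_k = U[:, :k]
    # Projecting onto U_k yields Sigma_k V_k^T, i.e. documents in latent space.
    docs_k = U_k.T @ A_scaled
    return U_k, docs_k


def fold_in(U_k, doc, label_rows, scale=2.0):
    """Map a new document's term vector into the same latent space."""
    q = np.asarray(doc, dtype=float).copy()
    q[label_rows] *= scale  # apply the same label scaling as in training
    return U_k.T @ q


# Toy example: 4 terms x 3 documents; term 0 is assumed to appear in a label.
A = np.array([[1, 0, 1],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])
U_k, docs_k = label_scaled_lsi(A, label_rows=[0], scale=2.0, k=2)
```

Because the scaling is applied before the decomposition, label terms pull the left singular vectors toward themselves; new documents must be folded in with the same scaling so they land in the same latent space as the training documents.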
We also consider the representation of abstract objects. We propose representing such an object by a group of user queries, and we call these objects “query concepts”. We propose a novel algorithm for clustering queries on a large-scale click-through bipartite graph. Experiments on the log of a large commercial search engine show that this algorithm is both effective and efficient.
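The report does not detail the clustering algorithm. The sketch below is a simplified, hypothetical single-pass variant: each query is an L2-normalized vector of click counts over URLs, an inverted index from URL to clusters restricts comparisons to clusters that share a clicked URL, and a query joins its nearest cluster only if the cluster diameter stays within a threshold `d_max`. The function names, the diameter definition, the threshold, and the toy URLs are assumptions for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def normalize(clicks: Dict[str, float]) -> Dict[str, float]:
    """L2-normalize a query's click-count vector over URLs."""
    norm = math.sqrt(sum(c * c for c in clicks.values()))
    return {url: c / norm for url, c in clicks.items()}


def diameter_after_adding(total: Dict[str, float], n: int,
                          vec: Dict[str, float]) -> float:
    """Root mean squared pairwise distance of a cluster if vec joins it.

    For unit vectors, sum_{i,j} ||q_i - q_j||^2 = 2N^2 - 2||S||^2, where S is
    the sum of member vectors, so only N and S need to be stored per cluster.
    """
    new_sum = dict(total)
    for url, x in vec.items():
        new_sum[url] = new_sum.get(url, 0.0) + x
    m = n + 1
    sq = sum(x * x for x in new_sum.values())
    return math.sqrt(max(0.0, (2 * m * m - 2 * sq) / (m * (m - 1))))


def cluster_queries(query_clicks: Dict[str, Dict[str, float]],
                    d_max: float) -> List[List[str]]:
    """Single-pass clustering of queries on a click-through bipartite graph."""
    members: List[List[str]] = []      # queries in each cluster
    sums: List[Dict[str, float]] = []  # sum of member vectors per cluster
    url_index: Dict[str, Set[int]] = defaultdict(set)  # URL -> cluster ids

    for query, clicks in query_clicks.items():
        vec = normalize(clicks)
        # Only clusters sharing a clicked URL can be near this query.
        candidates = set().union(*(url_index[u] for u in vec))
        best, best_dist = None, math.inf
        for c in candidates:
            n = len(members[c])
            urls = set(sums[c]) | set(vec)
            dist = math.sqrt(sum(
                (sums[c].get(u, 0.0) / n - vec.get(u, 0.0)) ** 2 for u in urls))
            if dist < best_dist:
                best, best_dist = c, dist
        if (best is not None and
                diameter_after_adding(sums[best], len(members[best]), vec) <= d_max):
            target = best
        else:  # start a new concept
            target = len(members)
            members.append([])
            sums.append({})
        members[target].append(query)
        for u, x in vec.items():
            sums[target][u] = sums[target].get(u, 0.0) + x
            url_index[u].add(target)
    return members


# Toy click data (hypothetical URLs and counts).
clicks = {
    "msra": {"lab-site": 3},
    "microsoft research asia": {"lab-site": 5, "lab-news": 1},
    "mrsa infection": {"health-site": 4},
    "mrsa": {"health-site": 2, "clinic-site": 1},
}
print(cluster_queries(clicks, d_max=0.5))
# [['msra', 'microsoft research asia'], ['mrsa infection', 'mrsa']]
```

Since a query vector is nonzero on only a few URLs, the inverted index keeps each step cheap, and storing only the member count and vector sum per cluster avoids recomputing pairwise distances.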
Query suggestion plays an important role in improving the usability of search engines. Although some recently proposed methods suggest queries by mining query patterns from search logs, none of them is context-aware: they do not take the immediately preceding queries into account as context when making suggestions. The following example illustrates why context is useful.
Suppose a user issues the query "MSRA". The search intent may be information about Microsoft Research Asia or about methicillin-resistant Staphylococcus aureus (a bacterium that causes hard-to-treat, communicable infections). If we know that the preceding query was "famous labs", we can infer that the user is more likely interested in Microsoft Research Asia.
Figure 1: The framework of our approach for query suggestion
We propose a novel context-aware query suggestion approach that mines click-through data and session data, and we make the following contributions. First, instead of mining patterns of individual queries, which may be sparse, we summarize queries into concepts, where a concept is a group of similar queries. Although mining query concepts can be reduced to a clustering problem on a bipartite graph, the very large data size and the “curse of dimensionality” pose great challenges: we may have millions of queries involving millions of URLs, and we want to form hundreds of thousands of concepts. To tackle these challenges, we develop a novel, highly scalable yet effective clustering algorithm. Second, there is often a huge number of patterns that can be used for query suggestion, and mining these patterns and organizing them properly for fast query suggestion is far from trivial. We develop a novel concept sequence suffix tree structure to address this challenge.
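To make the suffix-tree idea concrete, here is a hypothetical sketch. It assumes sessions have already been mapped from queries to concept identifiers; each tree node corresponds to a suffix of a concept sequence read backwards from the most recent concept, and each node counts the concepts that followed that context. Suggestion follows the longest matching suffix of the user's context and returns the most frequent follow-up concepts. The class names, `max_context`, and frequency-based ranking are assumptions, not the published details of the structure.

```python
from collections import Counter
from typing import Dict, List, Sequence


class SuffixTreeNode:
    def __init__(self) -> None:
        self.children: Dict[str, "SuffixTreeNode"] = {}
        self.next_counts: Counter = Counter()  # concepts seen right after this context


class ConceptSuffixTree:
    """Trie over reversed concept contexts; each node stores follow-up counts."""

    def __init__(self, max_context: int = 3) -> None:
        self.root = SuffixTreeNode()
        self.max_context = max_context

    def add_session(self, concepts: Sequence[str]) -> None:
        # For each position i, concepts[i] follows the context concepts[:i].
        for i in range(1, len(concepts)):
            nxt = concepts[i]
            node = self.root
            # Walk backwards from the most recent concept, up to max_context steps.
            for j in range(i - 1, max(-1, i - 1 - self.max_context), -1):
                node = node.children.setdefault(concepts[j], SuffixTreeNode())
                node.next_counts[nxt] += 1

    def suggest(self, context: Sequence[str], k: int = 3) -> List[str]:
        node, matched = self.root, False
        # Follow the longest suffix of the context that exists in the tree.
        for concept in reversed(context[-self.max_context:]):
            if concept not in node.children:
                break
            node = node.children[concept]
            matched = True
        if not matched:
            return []
        return [c for c, _ in node.next_counts.most_common(k)]


# Toy sessions (hypothetical), already mapped to concepts.
tree = ConceptSuffixTree(max_context=2)
tree.add_session(["famous labs", "msra", "microsoft research asia"])
tree.add_session(["famous labs", "msra", "microsoft research asia"])
tree.add_session(["hospital infections", "msra", "mrsa treatment"])
print(tree.suggest(["famous labs", "msra"]))          # ['microsoft research asia']
print(tree.suggest(["hospital infections", "msra"]))  # ['mrsa treatment']
```

With only the last concept as context, both follow-ups are returned in frequency order; adding the preceding concept selects the branch that matches the user's intent, which is the effect the "MSRA" example describes.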
Figure 2: The demo of query suggestion
We have built a demo of context-aware, concept-based query suggestion. In this demo, we implement a proxy search engine that, for each query it receives, returns the search results of the Microsoft Live Search engine together with the query suggestions generated by our approach. So far, the demo has been deployed and tested only on our internal network; we plan to make it publicly available on the Internet so that external users can access it. This technique could potentially be transferred to a commercial search engine such as Live Search, in which case tens of millions of Web users would experience object-level query suggestion.
We have published or submitted the following papers based on our research:
H. Cao, D. Jiang, J. Pei, Q. He, Z. Liao, E. Chen and H. Li, Context-Aware Query Suggestion by Mining Click-Through and Session Data, in ACM KDD’08, 2008. This paper won the Best Application Paper Award of KDD’08.
X. Quan, E. Chen, Q. Luo and H. Xiong, Adaptive Label-Driven Scaling for Latent Semantic Indexing, in ACM SIGIR’08, 2008.
H. Cao, D. Jiang, J. Pei, E. Chen and H. Li, Towards Context-Aware Search by Learning a Very Large Variable Length Hidden Markov Model from Search Logs, in WWW’09, 2009.
H. Cao, D. Hu, D. Shen, D. Jiang, J. Sun, E. Chen and Q. Yang, Context-Aware Query Classification, in ACM SIGIR’09, 2009 (accepted).
E. Chen, L. Shi and D. Hu, Probabilistic Model for Syntactic and Semantic Dependency Parsing, in the 12th Conference on Computational Natural Language Learning (CoNLL 2008), 2008.
E. Chen, X. Quan, Q. Luo and H. Xiong, Label-Relevant Latent Semantic Indexing for Hierarchical Text Categorization, submitted.