Aggregated Search: What Is It?


Please just read it at

More complete and structured:

Notes from the Yahoo team:

The Yahoo summary

Aggregated Search

Aggregated search addresses the task of searching and assembling information from a variety of information sources on the Web (i.e., the verticals) and placing it in a single interface. There are differences between “standard” web search and aggregated search. With the former, documents of the same nature are compared (e.g., web pages or images) and ranked according to their estimated relevance to the query. With the latter, documents of a different nature are compared (e.g., web pages against images) and their relevance is estimated with respect to each other. These heterogeneous information items have different features, and therefore cannot be ranked using the same algorithms. Also, for some queries (e.g., “red car”), it may make more sense to return, in addition to standard web results, documents from one vertical (e.g., the image vertical) than from another (e.g., the news vertical). In other words, the relevance of a vertical differs from query to query. The main challenge in aggregated search is how to identify and integrate relevant heterogeneous results for each given query into a single result page.

Components of Aggregated Search

Aggregated search has three main components: (1) vertical representation, concerned with how to represent verticals so that the documents they contain and their type are identifiable; (2) vertical selection, concerned with how to select the verticals from which relevant documents can be retrieved; and (3) result presentation, concerned with how to assemble results from the selected verticals so as to best lay out the result page with the most relevant information.

Vertical Representation

To select the relevant vertical(s) for each submitted query (described in Sect. 5.4), an aggregated search engine needs to know about the content of each vertical (e.g., term statistics, size, etc.). This is to ensure that, for example, the query “tennis” is passed to a sport vertical, whereas the query “Madonna” is sent to a music vertical and possibly a celebrity vertical. For this purpose, the aggregated search system keeps a representation of each of its verticals.

NB: NLP could possibly be treated as a vertical as well.

In aggregated search, various statistics about the documents contained in the vertical (e.g., term frequency, vertical size) are available, at least for those verticals operated by the same body, and are used to generate the vertical representation. Therefore, a vertical representation can be built using techniques from federated search working with cooperative resources. A technique reported in the literature is the generation of vertical representations from a subset of documents, so-called sampled documents. Two main sampling approaches have been reported (Arguello et al. 2009): one where the documents are sampled directly from the vertical, and another using external resources.

So, the first kind of sampling for aggregated search takes samples from the vertical itself, and the other uses external sources such as Wikipedia; these samples are then aggregated to serve as a benchmark for how relevant a piece of content is.
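The two sampling sources described above can be sketched as follows. This is only a minimal illustration, assuming that a vertical representation is nothing more than a bag of term counts; the function names (`tokenize`, `build_representation`) and the toy documents are hypothetical and are not taken from Arguello et al.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters; a deliberately crude tokenizer."""
    return re.findall(r"[a-z]+", text.lower())


def build_representation(vertical_samples, external_samples):
    """Merge term counts from documents sampled from the vertical itself
    and from an external resource (for example, Wikipedia articles)."""
    counts = Counter()
    for doc in vertical_samples + external_samples:
        counts.update(tokenize(doc))
    return counts


# Hypothetical sampled documents for a sport vertical.
sport_vertical_docs = ["Tennis final tonight", "Football league results"]
sport_external_docs = ["Tennis is a racket sport played on a court"]

sport_repr = build_representation(sport_vertical_docs, sport_external_docs)
print(sport_repr["tennis"])
```

A real system would likely weight the two sources differently and store more than raw counts (vertical size, for instance), but the merged term profile is enough to show how external text can fill in for a vertical whose own documents carry little text.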

In addition, issuing high-frequency vertical queries on a regular basis (how regularly depends on the vertical) is a way to ensure that vertical representations are up to date, in particular for verticals with highly dynamic content, such as news. For this type of vertical, users are mostly interested in the most recent content.

Overall, using high-frequency vertical queries to sample the vertical directly, together with sampling from external sources such as Wikipedia, can ensure that the most relevant and recent vertical content can be retrieved (for popular and/or peak queries), while still providing good coverage of the vertical content (useful for tail queries), and that text-impoverished verticals (image and video verticals) can be properly represented.
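Refreshing a sample with high-frequency queries could look roughly like this. It is a sketch under stated assumptions: `search_fn` stands in for whatever search interface the vertical exposes, and `NEWS_INDEX`, `search_news`, and the parameters `k` and `per_query` are invented for illustration.

```python
from collections import Counter


def top_queries(query_log, k):
    """Return the k most frequent queries in a vertical's query log."""
    return [query for query, _ in Counter(query_log).most_common(k)]


def resample(search_fn, query_log, k=2, per_query=2):
    """Issue the top-k logged queries against the vertical and collect
    the returned documents as a fresh sample."""
    sample = []
    for query in top_queries(query_log, k):
        sample.extend(search_fn(query)[:per_query])
    return sample


# Toy news vertical: query -> documents, assumed to be ordered newest first.
NEWS_INDEX = {
    "election": ["Election results announced", "Election polls open"],
    "weather": ["Storm warning issued"],
    "sports": ["Cup final recap"],
}


def search_news(query):
    """Hypothetical stand-in for the news vertical's search API."""
    return NEWS_INDEX.get(query, [])


log = ["election", "weather", "election", "sports", "weather", "election"]
fresh_sample = resample(search_news, log)
print(fresh_sample)
```

Running `resample` on a schedule (more often for news than for, say, a reference vertical) would keep the sampled documents aligned with what users are currently asking for.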

Vertical Selection

Vertical selection is the task of selecting the relevant verticals (none, one or several) in response to a user query. Vertical selection is related to resource selection in federated search, where the most common approach is to rank resources based on the similarity between the query and the resource representation.
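To make the resource-selection idea concrete, here is a small sketch that ranks verticals by cosine similarity between the query terms and each vertical's term counts. Cosine similarity is chosen here only as one simple similarity measure; the text does not say which measure is used, and the representations below are made-up numbers.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_verticals(query, representations):
    """Score every vertical against the query and sort by descending score."""
    q = Counter(query.lower().split())
    scored = [(name, cosine(q, rep)) for name, rep in representations.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical vertical representations (term -> count).
representations = {
    "sport": Counter({"tennis": 5, "football": 4, "match": 3}),
    "music": Counter({"madonna": 6, "album": 4, "concert": 2}),
    "news": Counter({"election": 5, "tennis": 1, "madonna": 1}),
}

ranking = rank_verticals("tennis", representations)
print(ranking)
```

Because vertical selection may return none, one, or several verticals, a ranking like this would still need a cut-off (for example, a minimum score) to decide which verticals are actually queried.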

Vertical selection also makes use of the vertical representations, in a way similar to federated search, but has access to additional sources of evidence. First, verticals focus on specific types of documents (in terms of domain, media, or genre). Users searching for a particular type of document (e.g., “images of red cars”) may include domain-, genre-, or media-specific words in the query (e.g., “picture”). The query string therefore constitutes a source of evidence for vertical selection. Second, vertical search engines are used explicitly by users, so associated query logs may be available, which can be exploited for vertical selection. The vertical search engine can use these two sources of evidence to supply the aggregated search engine directly with a so-called “vertical relevance value”, reflecting how relevant the vertical content might be to the query.
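One way to picture a “vertical relevance value” is as a mix of the two evidence sources just described. The sketch below is purely illustrative: the cue-word lists, the exact-match use of the query log, and the linear weighting are all assumptions, not something the source specifies.

```python
from collections import Counter

# Hypothetical cue words hinting at a vertical's media type or genre.
CUE_WORDS = {
    "image": {"picture", "pictures", "image", "images", "photo"},
    "news": {"news", "latest", "breaking"},
}


def query_string_evidence(query, vertical):
    """Return 1.0 if the query contains a cue word for the vertical, else 0.0."""
    terms = set(query.lower().split())
    return 1.0 if terms & CUE_WORDS.get(vertical, set()) else 0.0


def query_log_evidence(query, vertical_log):
    """Fraction of the vertical's logged queries that match this query exactly."""
    if not vertical_log:
        return 0.0
    return Counter(vertical_log)[query.lower()] / len(vertical_log)


def vertical_relevance(query, vertical, vertical_log, weight=0.5):
    """Linear mix of the two evidence sources; the weight is an arbitrary assumption."""
    return (weight * query_string_evidence(query, vertical)
            + (1 - weight) * query_log_evidence(query, vertical_log))


# Toy query log for an image vertical.
image_log = ["images of red cars", "sunset", "images of red cars", "cat"]
score = vertical_relevance("images of red cars", "image", image_log)
print(score)
```

In practice such a value would more plausibly come from a trained classifier over many features; the point here is only that the query string and the vertical's own query log can each contribute a signal.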
