A Theory of Indexing by Gerard Salton

By Gerard Salton

Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.



Best probability books

Level crossing methods in stochastic models

Since its inception in 1974, the level crossing approach for analyzing a large class of stochastic models has become increasingly popular among researchers. This volume traces the evolution of level crossing theory for obtaining probability distributions of state variables and demonstrates solution methods in a variety of stochastic models, including queues, inventories, dams, renewal models, counter models, pharmacokinetics, and the natural sciences.

Structural aspects in the theory of probability

The book is conceived as a text accompanying the traditional graduate courses on probability theory. An important feature of this enlarged version is the emphasis on algebraic-topological aspects, leading to a wider and deeper understanding of basic theorems such as those on the structure of continuous convolution semigroups and the corresponding processes with independent increments.

Steps Towards a Unified Basis for Scientific Models and Methods

Culture, in fact, also plays a major role in science, which is, per se, a multitude of different cultures. The book attempts to build a bridge across three cultures: mathematical statistics, quantum theory, and chemometrical methods. Of course, these three domains should not be taken as equals in any sense.

Extra resources for A Theory of Indexing

Example text

TABLE 20. Average precision values at indicated recall points for three collections, comparing standard term frequency weights, phrases formed from high-frequency nondiscriminators, and phrases formed from medium-frequency discriminators.

Key to the runs:
Standard: standard term frequency weighting (word stem run)
SPT: single terms, pairs and triples used in queries and documents
PT: pairs and triples used; corresponding single terms deleted
ST: single terms retained; triples added
P: pairs added; corresponding single terms deleted
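The phrase runs in Table 20 combine high-frequency terms into pairs, either alongside or instead of their component single terms. The excerpt does not state the exact phrase-formation rule, so the following is a minimal Python sketch under a hypothetical rule: two adjacent word stems that both belong to a given high-frequency set form a pair phrase. The functions `pair_phrases` and `index_with_pairs` are illustrative names, not Salton's.

```python
def pair_phrases(stems, high_freq):
    """Hypothetical rule: adjacent stems that are both high-frequency form a pair."""
    return [(a, b) for a, b in zip(stems, stems[1:])
            if a in high_freq and b in high_freq]


def index_with_pairs(stems, high_freq, delete_components=False):
    """Return (single terms, pair phrases) used to index one text.

    With delete_components=True, single terms that appear in some pair are
    dropped, mirroring the "pairs added; corresponding single terms deleted"
    configuration. With False, single terms are retained next to the pairs.
    """
    pairs = pair_phrases(stems, high_freq)
    singles = list(stems)
    if delete_components:
        used = {term for pair in pairs for term in pair}
        singles = [s for s in stems if s not in used]
    return singles, pairs
```

For the stems `["inform", "retriev", "system", "evalu"]` with the first three marked high-frequency, the sketch yields the pairs `("inform", "retriev")` and `("retriev", "system")`; deleting components leaves only `"evalu"` as a single term. Triples could be formed the same way from three adjacent stems.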

To summarize, several methods that multiply standard term frequency weights by inverse document frequency factors or by discrimination values appear to offer high performance. Among the methods that provide statistically significant improvements over the standard term weighting procedures in all processing environments, the most promising are: (a) standard weights $f_k^i$ with elimination of poor discriminators; (b) $f_k^i \cdot (\mathrm{IDF})_k$ without elimination, or with elimination of poor discriminators or of terms with high document frequency; (c) $f_k^i \cdot \mathrm{DV}_k$ with elimination of poor discriminators or of high-frequency terms.
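Item (c) weights each term by its discrimination value. The following is a minimal Python sketch, assuming the space-density formulation of the discrimination-value model: the density $Q$ is the average cosine similarity between each document and the collection centroid, and $\mathrm{DV}_k = Q_k - Q$, where $Q_k$ is the density computed with term $k$ removed. A positive value means that removing the term makes the space denser, so the term is a good discriminator; terms with $\mathrm{DV}_k \le 0$ are treated here as poor discriminators. The helper names are hypothetical, and documents are assumed to be plain lists of term frequencies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length weight vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def space_density(docs):
    """Average similarity of each document to the collection centroid."""
    n = len(docs)
    centroid = [sum(col) / n for col in zip(*docs)]
    return sum(cosine(d, centroid) for d in docs) / n


def discrimination_values(docs):
    """DV_k = Q_k - Q, where Q_k is the density with term k removed."""
    q = space_density(docs)
    dvs = []
    for k in range(len(docs[0])):
        reduced = [d[:k] + d[k + 1:] for d in docs]
        dvs.append(space_density(reduced) - q)
    return dvs


def tf_dv_weights(docs, eliminate_poor=True):
    """Weights f_k^i * DV_k; optionally zero out terms with DV_k <= 0."""
    dvs = discrimination_values(docs)
    weighted = []
    for d in docs:
        row = []
        for f, dv in zip(d, dvs):
            row.append(0.0 if eliminate_poor and dv <= 0 else f * dv)
        weighted.append(row)
    return weighted, dvs
```

In the toy collection `[[2, 1, 0], [2, 0, 1], [2, 1, 1]]`, term 0 occurs with the same frequency in every document, so removing it spreads the documents apart and its discrimination value comes out negative; terms 1 and 2 receive positive values. Recomputing the density once per term is quadratic in practice; the sketch favors clarity over speed.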

... 1, averaged over the 24 user queries that are used with each collection.

TABLE 9. Comparison of binary and term frequency weighting with and without inverse document frequency normalization for the CRAN, MED, and Time collections. Columns: binary weights; term frequency weights; binary weights with IDF; term frequency weights with IDF.

Four weighting procedures are used to produce the output of Table 9: binary term weights $b_k^i$, term frequency weights $f_k^i$, and binary as well as term frequency weights multiplied by an inverse document frequency factor, designated $(\mathrm{IDF})_k$ in Table 9.
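The four weighting procedures compared in Table 9 can be sketched in a few lines of Python. This is a minimal sketch, assuming documents are lists of term frequencies and assuming the common form $(\mathrm{IDF})_k = \log_2(N / d_k)$, where $N$ is the number of documents and $d_k$ the number of documents containing term $k$; the monograph may use a different IDF variant. The function names are hypothetical.

```python
import math


def document_frequency(docs):
    """d_k: number of documents in which term k has nonzero frequency."""
    return [sum(1 for d in docs if d[k] > 0) for k in range(len(docs[0]))]


def idf(docs):
    """Assumed form IDF_k = log2(N / d_k); 0.0 for terms that never occur."""
    n = len(docs)
    return [math.log2(n / dk) if dk else 0.0 for dk in document_frequency(docs)]


def weight_matrices(docs):
    """Return the four weightings: binary, tf, binary x IDF, tf x IDF."""
    w_idf = idf(docs)
    binary = [[1 if f > 0 else 0 for f in d] for d in docs]
    tf = [list(d) for d in docs]
    binary_idf = [[b * w for b, w in zip(row, w_idf)] for row in binary]
    tf_idf = [[f * w for f, w in zip(row, w_idf)] for row in tf]
    return {"binary": binary, "tf": tf, "binary_idf": binary_idf, "tf_idf": tf_idf}
```

For four documents `[[3, 0, 1], [1, 2, 0], [0, 1, 0], [2, 0, 0]]`, the document frequencies are `[3, 2, 1]`, so the rarest term receives $\log_2 4 = 2$ and dominates the IDF-weighted vectors, while the binary and term frequency variants treat it no differently from common terms.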

