The abundance of information available on the Web and in Digital Libraries, in
combination with their dynamic and heterogeneous nature, has determined a rapidly
increasing difﬁculty in ﬁnding what we want when we need it and in a manner which
best meets our requirements.
A High Level Architecture of Content-based Systems
CONTENT ANALYZER – When information has no structure (e.g. text), some
kind of pre-processing step is needed to extract structured relevant information.
The main responsibility of the component is to represent the content of items
CONTENT ANALYZER – When information has no structure (e.g. text), some kind of pre-processing step is needed to extract structured relevant information.The main responsibility of the component is to represent the content of items(e.g. documents, Web pages, news, product descriptions, etc.) coming from information sources in a form suitable for the next processing steps. Data items are analyzed by feature extraction techniques in order to shift item representation from the original information space to the target one (e.g.Web pages represented as keyword vectors). This representation is the input to the PROFILE LEARNER and FILTERING COMPONENT;
Explicit feedback has the advantage of simplicity, albeit the adoption of numeric/symbolic scales increases the cognitive load on the user, and may not be adequate for catching user’s feeling about items. Implicit feedback methods arebased on assigning a relevance score to specific user actions on an item, such as saving, discarding, printing, bookmarking, etc. The main advantage is that they do
not require a direct user involvement, even though biasing is likely to occur, e.g.interruption of phone calls while reading.
Learning short-term and long-term profiles is quite typical of news filtering systems. NewsDude learns a short-term user model based on TF-IDF (cosine similarity), and a long-term model based on a na¨ıve Bayesian classifier by relying on an initial training set of interesting news articles provided by the user. The news source is Yahoo!News. In the same way Daily Learner, a learning agent for wireless information access, adopts an approach for learning two separate user-models. The former, based on a Nearest Neighbor text classification algorithm, maintains the short-term interests of users, while the latter, based on a na¨ıve Bayesian classifier, represents the long-term interests of users and relies on data collected over a longer period of time.