The future of the Web remains a mystery; nevertheless, it is
possible to speculate on the directions that web search engines
might take. There is little doubt that the Web will continue to
change and to grow and that web search engines must respond to this
growth and change. New initiatives are expected in the following
areas:
- Web mining.
- Use of fuzzy logic.
- User interfaces.
- Web semantics.
The above areas are discussed in turn in what follows. Along the
way, two articles are briefly summarized. Both report on work being
undertaken to address the problem of search results that do not
answer the precise informational need of the user. In the first, web
mining is used to respond to the difficulty of locating current
information; in the second, fuzzy logic is employed to query the
user when the search engine encounters a word that is qualitative
and imprecise.
Since with most search engines the user is actually searching an
index of the Web rather than the Web as it exists on the day the
search is performed, there may be problems in locating information
that is current.
To respond to this, researchers are investigating the use of
online web mining agents to complement search engines. A recent
article by Menczer outlines the power of infospiders, which promise
considerable improvements in searching. As Menczer states:
The key idea of all search engines is that the index built
by the crawl of the Web can be used many times, so that the cost
of crawling and indexing is amortized over the many queries that
are answered by a search engine by looking up the same static
database.
[2]
In Menczer's study, the infospider begins searching from a set of
seed URLs obtained from the results of traditional search engines or
from a set of personal bookmarks compiled by the searcher. Using
these sites, the infospider can take the search an additional step
to locate more current information. These infospiders utilize
artificial intelligence and other sophisticated computer heuristics
to complete the search. These searching techniques are being
investigated to overcome the limitations of search engines "looking
up the same static database." The approach addresses the problem of
finding current information on a very dynamic Web when search
engines, for the sake of retrieval speed, give access only to a
static portion of it.
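The idea of starting from seed URLs and following links toward more
current pages can be illustrated with a small best-first crawler.
This is only a minimal sketch and not Menczer's infospider algorithm:
the in-memory TOY_WEB, the term-counting score function, and the
crawl function are all hypothetical stand-ins chosen for illustration.

```python
import heapq

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "seed/a": ("national parks news", ["a/1", "a/2"]),
    "seed/b": ("car reviews", ["b/1"]),
    "a/1": ("national parks opening hours updated today", []),
    "a/2": ("recipes", ["a/3"]),
    "a/3": ("parks trail closures", []),
    "b/1": ("used car prices", []),
}


def score(text, query_terms):
    """Count how many query terms occur as words in the page text."""
    words = set(text.lower().split())
    return sum(1 for term in query_terms if term in words)


def crawl(seeds, query_terms, web, max_pages=10):
    """Best-first crawl: expand the most promising known link next."""
    # heapq is a min-heap, so priorities are stored negated.
    frontier = [(0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    hits = []
    visited = 0
    while frontier and visited < max_pages:
        _, url = heapq.heappop(frontier)
        if url not in web:
            continue  # dead link in the toy web
        text, links = web[url]
        visited += 1
        page_score = score(text, query_terms)
        if page_score > 0:
            hits.append((page_score, url))
        for link in links:
            if link not in seen:
                seen.add(link)
                # A link inherits the score of the page pointing to it.
                heapq.heappush(frontier, (-page_score, link))
    # Best-scoring pages first; ties broken alphabetically.
    return [url for _, url in sorted(hits, key=lambda h: (-h[0], h[1]))]


print(crawl(["seed/a", "seed/b"], ["parks", "today"], TOY_WEB))
# → ['a/1', 'a/3', 'seed/a']
```

In this toy run the page reached through a seed ("a/1") outranks the
seed itself, which mirrors the claim that the agent can take the
search one step beyond what the static index returned.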
Although this is interesting research, a simpler solution already
exists: the searcher can use personal bookmarks to go directly to
the relevant sites and obtain today's update. Nevertheless, this
study attempts to develop tools that can go out to the Web and
retrieve current information automatically. The reference to
personal bookmarks is also a reminder that there are other ways to
locate information besides web search engines.
Although web searches tend to consist of two or three keywords or
phrases, many users would be more comfortable if their searches
could be written using the language of a normal conversation. Ask
Jeeves was developed in response to this kind of search request.
Even in keyword searching, users often combine a concrete concept
with a qualitative or descriptive term. Qualitative or descriptive
language is used to refine the type of information the user is
seeking. These qualitative or descriptive words may present
difficulties in web searching if the search engine simply maps
search terms to an existing index.
Choi [3] describes the use of a "fuzzy query." Fuzzy language is
commonly used to express an information request. In Choi's example,
a user wants information on popular national parks in the United
States. Search engines are designed to eliminate words like "in" and
"the" in order to locate the keywords to be used in the search.
"National Parks" and "United States" are concrete or "crisp" terms
that can be mapped directly to an index of terms to retrieve
pertinent information. However, the term "popular" is a qualitative
term which might lead to confusion for a computer if the search
algorithm is applied by mapping the words searched to terms in an
index. The person has a clear concept in mind when performing the
search and knows what is meant by the term "popular" in this
example. Searchers will select the web sites that best reflect their
own idea of the meaning of the word "popular."
In his paper, Choi describes an enhanced search engine where the
words used in the search statement are analyzed into two categories
that he calls "crisp" and "fuzzy" terms. When the search engine
encounters fuzzy terms, the user is presented with a new search
interface in which the user is asked what was meant by the fuzzy
term. To ensure that the search results are pertinent to the
information needed, the search engines need to be programmed to
search an index that has assigned various meanings to words; it
would be somewhat like looking up a word in a thesaurus in word
processing. From a list of possible words, the user must choose the
meaning that best describes what he/she had in mind.
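The split into "crisp" and "fuzzy" terms, followed by a question to
the user, might be sketched as follows. This is an illustration under
stated assumptions, not Choi's implementation: the STOP_WORDS set,
the FUZZY_TERMS table with its candidate meanings, and the functions
analyze_query and refine are hypothetical.

```python
# Hypothetical word lists; a real engine would need far larger ones.
STOP_WORDS = {"in", "the", "of", "a", "an", "on"}

# Hypothetical fuzzy terms, each with meanings the user can pick from.
FUZZY_TERMS = {
    "popular": ["most visited", "highly rated", "frequently discussed"],
    "attractive": ["stylish design", "good value", "award winning"],
}


def analyze_query(query):
    """Drop stop words and split the rest into crisp and fuzzy terms."""
    crisp, fuzzy = [], []
    for word in query.lower().split():
        if word in STOP_WORDS:
            continue
        (fuzzy if word in FUZZY_TERMS else crisp).append(word)
    return crisp, fuzzy


def refine(query, choices):
    """Replace each fuzzy term with the meaning the user chose for it."""
    crisp, fuzzy = analyze_query(query)
    chosen = [choices[term] for term in fuzzy if term in choices]
    return crisp + chosen


crisp, fuzzy = analyze_query("popular national parks in the United States")
print(crisp)  # → ['national', 'parks', 'united', 'states']
print(fuzzy)  # → ['popular']

# The interface would show FUZZY_TERMS["popular"] and record the pick.
print(refine("popular national parks in the United States",
             {"popular": "most visited"}))
# → ['national', 'parks', 'united', 'states', 'most visited']
```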
Suppose someone considering the purchase of a car wanted to see what
was attractive in new car designs. It would be difficult for a
search engine to locate websites that match what the searcher means
by the words "attractive car." The individual looking for the
attractive car has certain concepts in mind, but those ideas are not
easy to define in a few keywords. Search engines have to provide a
way for the user to indicate what is meant by the word "attractive."
Choi proposes the use of fuzzy logic to create an index called the
Perception Index to supplement the index all search engines
currently use in processing a search.
Currently, it is common to search the Web by typing a set of
keywords into a web search engine; subsequently, a long list of web
sites is returned for the user to browse and to select the sites of
interest for further investigation. This traditional interface and
interaction is about to change as the power and functionality of
computers increase.
With improved computer power comes the ability to offer user
interfaces that take advantage of images and sounds, in addition to
the usual text-based interaction.
Real estate sites provide the user with the ability to click
on a map to select a geographic region where one might be
interested in purchasing real estate. From there, the user
is presented with a menu which includes information on such
options as a range of prices, type of accommodation, number
of bedrooms and other pertinent data. It is easy to envision
a system that carries the map analogy further, to the point
where the user could select prices from a bar graph, or where
a set of boxes could contain data; for example, blue boxes
might contain all houses having three bedrooms and two baths
with an attached garage. When home computers were constrained
by poor imaging
capabilities, this type of interface could not be
considered. This scenario is changing rapidly since most
home computers are now powerful enough to handle large image
files.
Ed Baylin [4]
offers another suggestion for improving search interfaces; it is
based on a combination of the best features of various search
engines and meta search crawlers now available. For instance:
Imagine a search engine that would combine the following:
- The post-processing functions of, say, Copernic (including the
  topic clustering abilities of, say, Vivisimo).
- Hotbot's ability to pass the same kinds of filters to multiple
  search engine catalogs based on using the same interface format
  to the extent that the same filters are used. Moreover, unlike
  Hotbot, these filters would be passed in parallel to the
  different search engines, as do meta search engines, instead of
  being switched only at the user's request from one catalog to
  another as a series of separate searches.
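Passing one filter to several catalogs in parallel and merging the
answers could look roughly like the sketch below. The CATALOGS data,
the language filter, and the functions search_catalog and meta_search
are hypothetical; no real search engine API is called.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for search engine catalogs: (url, language).
CATALOGS = {
    "engine_a": [("a.example/parks", "en"), ("a.example/parcs", "fr")],
    "engine_b": [("b.example/parks", "en"), ("a.example/parks", "en")],
}


def search_catalog(name, query, language):
    """Pretend to query one engine, applying the shared language filter."""
    return [url for url, lang in CATALOGS[name]
            if lang == language and query in url]


def meta_search(query, language):
    """Send the same query and filter to every catalog concurrently."""
    names = sorted(CATALOGS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        # map() runs the calls concurrently and yields results in
        # the same order as the input names.
        result_lists = pool.map(
            lambda name: search_catalog(name, query, language), names)
        merged = []
        for urls in result_lists:
            for url in urls:
                if url not in merged:  # drop duplicates across engines
                    merged.append(url)
    return merged


print(meta_search("parks", "en"))
# → ['a.example/parks', 'b.example/parks']
```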
The average search on the Web consists of two or three words.
Studies have shown that these two or three words may be used to
represent a very complex information need. Furthermore, two people
may employ the same two or three keywords to search for different
concepts. This demonstrates the difficulties that might be
encountered in finding the right words to retrieve pertinent
information; the process of selecting the right keywords is a more
complicated task than most people assume.
It has already been stated that search engines no longer rely
exclusively on keyword matching and occurrences to retrieve
information. They employ sophisticated statistical probabilities
including page linking and other pertinent data for information
retrieval on the Web. Nevertheless, if users have difficulty
articulating the exact information they are seeking, it would seem
an impossible task to expect a search engine to overcome this
handicap. Scirus (www.scirus.com)
and others are attempting to solve this problem by offering
suggestions for further searching. A list of keywords is displayed
along the right part of the screen, and may be used to search (these
new terms are generated by an extraction process from the original
search results). These words may prove useful but once again, the
original search itself may not have included the best choice of
keywords or phrases. Scirus does not offer the searcher a chance to
examine word relationships; the list of new terms is simply offered
as words related to the results from the original search.
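One simple way to generate such suggestions is to count the words
that recur across the result snippets. The sketch below assumes this
frequency-based approach purely for illustration; how Scirus actually
extracts its terms is not described here, and the STOP_WORDS set and
the suggest_terms function are hypothetical.

```python
from collections import Counter

# Hypothetical stop-word list.
STOP_WORDS = {"the", "of", "in", "a", "and"}


def suggest_terms(query, result_snippets, k=3):
    """Suggest the k words that appear in the most result snippets."""
    query_words = set(query.lower().split())
    counts = Counter()
    for snippet in result_snippets:
        # Count each word once per snippet (document frequency).
        words = set(snippet.lower().split())
        counts.update(words - query_words - STOP_WORDS)
    # Most frequent first; ties broken alphabetically for stability.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


snippets = [
    "dolly the sheep cloning experiment",
    "cloning of dolly sheep in scotland",
    "dolly parton tour dates",
]
print(suggest_terms("dolly", snippets))
# → ['cloning', 'sheep', 'dates']
```

As the text notes, the suggestions are only as good as the original
results: the third suggestion here comes from an unrelated page.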
Research has been undertaken into this type of search problem;
one such example is outlined in the paper by Lee & Tsai
[5]. This article
outlines an approach to a new type of search engine in which the
user will be queried to acquire feedback. This approach targets
those searches where few or none of the web sites returned seem to
be exactly what we were looking for when we began our search. In
this research by Lee & Tsai, the user is actually
searching across many search engines; the interface is processing
the data from four computer agents: an interface agent, a filtering
agent, a discovery agent and an information agent. The main issue
for the user is the speed with which satisfactory results are
achieved and the number of times the system queries the user. It is
the feedback from the user that is used to further the search
process. When the user indicates that particular web sites are
useful, these sites are analyzed to provide the search process with
data to further refine the search. This is an iterative process and
it simulates the way in which most individuals locate information in
the first place. Finding information by any means is often not
straightforward. Lee & Tsai's work simulates the usual way in which
we find what we need.
If the user begins a search with only a vague notion of what is
needed, this system may help to refine the search. As a user finds
relevant information, it helps to filter out the superfluous and to
focus on the specific information.
In their example, Lee & Tsai search on the term "ai." It
must be remembered that this study is reported in an
academic journal; hence, the example was pertinent to the
authors, who are doing research in artificial intelligence.
As further clarification of the system described in this
study, you might search using the word
Dolly. The
variety of sites retrieved would provide a good example of
the usefulness of this type of searching. If the user
actually wanted information on
cloning
but the only term used in the search were
Dolly,
then a long list of sites covering a variety of topics would
be retrieved. These sites might range from pages pointing to
Dolly Parton
or the Australian magazine called
Dolly,
plus sites on cloning. In the search engine described in
this study, the user marks the sites that are pertinent to
the search. If all of the selected sites pertain to cloning,
the search engine would use both the discovery agent and the
filtering agent to retrieve other sites related to cloning.
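The feedback step in the Dolly example can be sketched as a simple
re-ranking by word overlap with the sites the user marked. This is
not Lee & Tsai's agent architecture; the result descriptions, the
site names, and the rerank function are hypothetical, and word
overlap is a stand-in for whatever analysis their agents perform.

```python
from collections import Counter


def tokens(text):
    """Split a description into lowercase words."""
    return text.lower().split()


def rerank(results, marked_urls):
    """Order the unmarked results by similarity to the marked ones.

    results maps each URL to a short description of the site.
    """
    # Build a profile: how many marked sites contain each word.
    profile = Counter()
    for url in marked_urls:
        profile.update(set(tokens(results[url])))

    def relevance(url):
        return sum(profile[word] for word in set(tokens(results[url])))

    unmarked = [url for url in results if url not in marked_urls]
    return sorted(unmarked, key=lambda url: (-relevance(url), url))


results = {
    "site/parton": "dolly parton country music",
    "site/magazine": "dolly magazine australia fashion",
    "site/roslin": "dolly sheep cloning roslin institute",
    "site/ethics": "ethics of cloning sheep and other animals",
    "site/clone-faq": "cloning questions about dolly the sheep",
}
print(rerank(results, ["site/roslin", "site/clone-faq"]))
# → ['site/ethics', 'site/magazine', 'site/parton']
```

After the user marks two cloning pages, the remaining cloning page
rises above the pages about the singer and the magazine.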
This type of searching is only experimental, but the initial
results show promise for the future. It is easy to think of many
occasions when this would be a very useful way to search.
A theory of thesaurus relationships [6]
The notion of generating new terminology for a new search involves
the notion of relationships between words (or terms) and the theory
of thesaurus relationships. Not only can such relationships be used
to find new terms to suggest to the searcher as possible
alternatives, but they can also be used to develop the semantic web
(Part 4: Vision of the Semantic Web).
According to Ed Baylin, the theory of thesaurus relationships can
be seen and better understood as a special case of a more general
systems theory of roles that objects in a system play in relation to
one another. Ed Baylin has developed such a systems theory (
www.SearchHelpCenter.com/search-discussion-article-1.html).
However, be forewarned that this theory is a very specialized and
advanced academic topic. Rather than ploughing through that kind of
heavy material, readers can turn to the following, which is Ed's
more user-friendly introduction to the ideas as they apply to
thesaurus relationships.
|
Ed's Theory of Thesaurus
Relationships
Future improvements to search engines
may, for example, involve use of sophisticated
artificial intelligence methods for concept searching.
Thus, the user will be able to concentrate more on
presenting the idea of what is sought, as opposed to
finding the exact words or phrases.
Search engines often use similarity
relationships to assist the user in generating further
searches, in iterative fashion, possibly involving
backtracking now and again.
Suppose the user searched for more information on
the planet Org.
When the findings page appeared, some related-search
links might be shown next to each finding,
such as lifestyles on
the planet org
or layout of the
planet org. If the user selected one of these
links, a new search would be initiated, using the
term selected.
Some search engines try to place your search into
one of the subject categories covered in that search
engine's subject directory facilities. Thus, any
findings for the planet
Org might be presented along with a link to
Astronomy,
presented as a subject category.
The idea of relationships made on the
basis of similarity of word meaning can be extended to
translation of terms to other languages.
In a similarity search, if the term
school is used
as a search string, the search engine would not only
find documents containing that search term in the
requested places, but also documents containing
terms such as
educational institution, and, if language
translation is included in this idea, words such as
école, the
French word for school.
Exactly equivalent, synonymous terms are
generally hard to find. In general, it is easier to find
roughly similar terms, that is, ones that have a fuzzy
equivalence relationship (called a similarity
relationship). Using similarity relationships, one finds
terms that overlap in meaning. Different search engines
have different kinds of abilities when it comes to
finding terms that are not exactly equivalent.
One variation on the idea of rough
similarity between terms involves broader/narrower term
kinds of hierarchical relationships between terms. (This
involves a type of hierarchical relationship based on
sub/super- classification, often called
"generalization/specialization," "categorization,"
"super/sub-typing," "super/sub-categorization," and the
like).
A search that finds the term
teacher might
also find the narrower term
professor, which
generally means a university teacher, where
teacher is a
super-class of
professor.
Still wider schemes for types of relationships between
terms are envisaged in theories on relationships
between words. As can be well imagined, the subject of
kinds of thesaurus relationships is overwhelmingly
complex, especially when it comes to all the different
ways of sub-classifying these relationships and their
facets to many levels of hierarchy.
For those who are interested in a little academic
theory on these matters at this point, please note that,
in addition to fuzzy similarity or equivalence
relationships, a first cut at classifying thesaurus
relationships might also identify the following
sub-types:
- Fuzzy "Dissimilarity" relationships: Antonyms or
  "distinguished from" relationships fall into this category.
- Fuzzy Causal relationships: Whenever an event occurs, a
  number of objects are involved in the varying roles of:
  - Agents/instruments: to enact the process involved;
  - Inputs: patients of the operation, or operands;
  - Products: outputs of the event;
  - Methods: related by the occurrence of events, i.e., the
    procedure, or series of steps, that the agent/instrument
    carries out to complete the process involved in the event;
  - Contexts: usually identified by space, or time, or
    population, giving the backdrops against which the event
    happened.
With respect to the theory of thesaurus
relationships, a search that finds
lecture,
which is a kind of event, might also find the:
- Agent "teacher";
- Input "student": patient of the lecturing process carried
  out by the teacher agent;
- Product "student": used to describe the student who has now
  learned from the lecture;
- Method "lecture notes": the method used by the teacher; and
- Contexts "school" or "classroom", among others.
Organs of the body are component parts of the
systems of the body. Thus, a search that finds
digestive system
might also find stomach.
|
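The relationship types in Ed's theory above lend themselves to a
small lookup structure that a search engine could use to expand a
query. The sketch below is only one possible representation: the
THESAURUS table, its relationship labels (including "part" for
component relationships), and the expand function are assumptions
made for illustration, populated with the examples from the text.

```python
# Hypothetical thesaurus: term -> {relationship type: related terms}.
THESAURUS = {
    "school": {"similar": ["educational institution", "école"]},
    "teacher": {"narrower": ["professor"]},
    "lecture": {
        "agent": ["teacher"],
        "input": ["student"],
        "product": ["student"],
        "method": ["lecture notes"],
        "context": ["school", "classroom"],
    },
    "digestive system": {"part": ["stomach"]},
}


def expand(term, relations=None):
    """Return the term plus its related terms, without duplicates.

    If relations is given, only those relationship types are followed.
    """
    expanded = [term]
    for relation, related_terms in THESAURUS.get(term, {}).items():
        if relations is not None and relation not in relations:
            continue
        for related in related_terms:
            if related not in expanded:
                expanded.append(related)
    return expanded


print(expand("teacher"))
# → ['teacher', 'professor']
print(expand("lecture", relations={"agent", "context"}))
# → ['lecture', 'teacher', 'school', 'classroom']
```

Restricting the relationship types lets the searcher decide how far
the expansion should wander from the original term.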