Search Discussion Article

FREE SEARCH HELP

On Site Resources

Search Tool Guide

BUY THE BOOK
ABOUT THE BOOK
FAQ's
Audiences
User Benefits
Overview & Contents
Book Excerpts
Awards-Reviews
Updates
OTHER
Contact Us
Authors
Discussion Topics
Sales Affiliates
by Darlene Canning
 

Part 3: Future Developments: What's Next?

Introduction

The future of the Web remains a mystery; nevertheless, it is possible to speculate on the directions that web search engines might take. There is little doubt that the Web will continue to change and to grow and that web search engines must respond to this growth and change. New initiatives are expected in the following areas:

  • Web mining.

  • Use of fuzzy logic.

  • User interfaces.

  • Web semantics.

These areas are discussed in turn below. Along the way, two articles are briefly summarized; both report on work that addresses search results that do not answer the user's precise informational need. In the first, web mining is used to respond to the difficulty of locating current information; in the second, fuzzy logic is employed to query the user when the search engine encounters a word that is qualitative and imprecise.

Web mining agents enhance search results

With most search engines, the user is actually searching an index of the Web rather than the Web as it exists on the day the search is performed, so there may be problems in locating information that is current.

To respond to this, researchers are investigating the use of online web mining agents to complement search engines. A recent article by Menczer outlines the power of infospiders, which promise considerable improvements in searching. As Menczer states:

The key idea of all search engines is that the index built by the crawl of the Web can be used many times, so that the cost of crawling and indexing is amortized over the many queries that are answered by a search engine by looking up the same static database. [2]

In Menczer's study, the infospider begins by searching a set of seed URLs obtained from the results of traditional search engines or from a set of personal bookmarks compiled by the searcher. Starting from these sites, the infospider can take the search an additional step to locate more current information. These infospiders utilize artificial intelligence and other sophisticated computer heuristics to complete the search. These searching techniques are being investigated to overcome the limitations of search engines "looking up the same static database." The approach responds to the problem of finding Web intelligence when the Web is a very dynamic resource but search engines, to keep retrieval fast, give access only to a static portion of it.

This is interesting research, although a simpler solution already exists: using personal bookmarks to go directly to sites and obtain today's update. Nevertheless, this study is attempting to develop tools that can go out to the Web and automatically retrieve current information. The reference to personal bookmarks is a reminder that there are other ways to locate information besides web search engines.
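The seed-and-follow idea can be illustrated with a minimal sketch. This is not Menczer's InfoSpiders algorithm, which relies on adaptive agents; it is a plain best-first crawl over a toy in-memory "web", and every name in it (TOY_WEB, score, crawl_from_seeds, the example URLs) is hypothetical.

```python
import heapq

# Hypothetical toy "web": URL -> (page text, outgoing links).
TOY_WEB = {
    "seed.example/a": ("search engines index the web", ["news.example/1", "old.example/x"]),
    "news.example/1": ("today update on web search engines", ["news.example/2"]),
    "news.example/2": ("latest search engines news today", []),
    "old.example/x": ("recipes for apple pie", []),
}

def score(text, query_terms):
    """Count how many query terms occur in the page text."""
    words = set(text.lower().split())
    return sum(1 for term in query_terms if term in words)

def crawl_from_seeds(seeds, query_terms, max_pages=10):
    """Best-first crawl: always visit the most promising known page next."""
    # heapq is a min-heap, so scores are negated to pop the best page first.
    frontier = [(-score(TOY_WEB[u][0], query_terms), u) for u in seeds if u in TOY_WEB]
    heapq.heapify(frontier)
    seen = set(u for _, u in frontier)
    visited = []
    while frontier and len(visited) < max_pages:
        neg_score, url = heapq.heappop(frontier)
        visited.append((url, -neg_score))
        # Follow links from the page just visited (the "additional step").
        for link in TOY_WEB[url][1]:
            if link in TOY_WEB and link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(TOY_WEB[link][0], query_terms), link))
    return visited

results = crawl_from_seeds(["seed.example/a"], ["search", "engines", "today"])
```

In this toy run the crawl starts at the seed page and reaches the two "news" pages before the unrelated one, because their text matches more of the query terms. A real agent would fetch live pages rather than read a dictionary.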

Fuzzy logic

Although web searches tend to consist of two or three keywords or phrases, many users would be more comfortable if their searches could be written in the language of normal conversation. Ask Jeeves was developed in response to this kind of search request. Even in keyword searching, users often combine a concrete concept with a qualitative or descriptive term. Qualitative or descriptive language is used to refine the type of information the user is seeking. These qualitative or descriptive words may present difficulties in web searching if the search engine simply maps search terms to an existing index.

Choi [3] describes the use of a "fuzzy query." Fuzzy language is commonly used to express an information request. In Choi's example, a user wants information on popular national parks in the United States. Search engines are designed to eliminate words such as "in" and "the" in order to isolate the keywords to be used in the search. National Parks and United States are concrete or "crisp" terms that can be mapped directly to an index of terms to retrieve pertinent information. However, "popular" is a qualitative term that might confuse a computer if the search algorithm simply maps the words searched to terms in an index. The person has a clear concept in mind when performing the search. He/she knows what is meant by "popular" in this example. Searchers will select the web sites that best reflect their own idea of the meaning of the word.

In his paper, Choi describes an enhanced search engine in which the words used in the search statement are sorted into two categories that he calls "crisp" and "fuzzy" terms. When the search engine encounters fuzzy terms, the user is presented with a new search interface that asks what was meant by the fuzzy term. To ensure that the search results are pertinent to the information needed, the search engine needs to be programmed to search an index that has assigned various meanings to words; it would be somewhat like looking up a word in a word processor's thesaurus. The user must choose, from a list of possible words, the meaning that best matches what he/she had in mind.

Suppose someone considering the purchase of a car wants to see what is attractive in new car designs. It would be difficult for a search engine to locate websites that match what the searcher means by the words attractive car. The individual looking for the attractive car has certain concepts in mind, but those ideas are not easy to define in a few keywords. Search engines have to provide a way for the user to indicate what is meant by the word attractive. Choi proposes the use of fuzzy logic to create an index called the Perception Index to supplement the index all search engines currently use in processing a search.
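The crisp/fuzzy split and the follow-up question to the user can be sketched roughly as follows. This is only an illustration under stated assumptions: the word lists, the meanings attached to each fuzzy term, and the names STOP_WORDS, PERCEPTION_INDEX, analyze_query and refine_query are all invented here, and no fuzzy-logic membership scoring of the kind Choi proposes is attempted.

```python
# Hypothetical word lists; a real system would need far larger ones.
STOP_WORDS = {"in", "the", "of", "a", "an", "on"}
# Stand-in for a "Perception Index": fuzzy term -> candidate meanings.
# The meanings listed are invented for illustration.
PERCEPTION_INDEX = {
    "popular": ["most visited", "highest rated", "most written about"],
    "attractive": ["sporty styling", "classic styling", "luxury styling"],
}

def analyze_query(query):
    """Split a query into crisp terms and fuzzy terms, dropping stop words."""
    crisp, fuzzy = [], []
    for word in query.lower().split():
        if word in STOP_WORDS:
            continue
        (fuzzy if word in PERCEPTION_INDEX else crisp).append(word)
    return crisp, fuzzy

def refine_query(query, choose):
    """Replace each fuzzy term with the meaning picked by `choose`.

    `choose(term, options)` stands in for the follow-up interface that
    asks the user what was meant; it must return one of `options`.
    """
    crisp, fuzzy = analyze_query(query)
    chosen = [choose(term, PERCEPTION_INDEX[term]) for term in fuzzy]
    return crisp + chosen

# Simulate a user who always picks the first meaning offered.
refined = refine_query("popular national parks in the United States",
                       lambda term, options: options[0])
```

Passing the user's choice in as a function keeps the sketch testable; in an interactive setting, `choose` would display the options and wait for a selection.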

New interfaces for searching

Currently, it is common to search the Web by typing a set of keywords into a web search engine; subsequently, a long list of web sites is returned for the user to browse and to select the sites of interest for further investigation. This traditional interface and interaction is about to change as the power and functionality of computers increase.

With improved computer power comes the ability to offer user interfaces that take advantage of images and sounds, in addition to the usual text-based interaction.

Real estate sites provide the user with the ability to click on a map to select a geographic region where one might be interested in purchasing real estate. From there, the user is presented with a menu which includes information on such options as a range of prices, type of accommodation, number of bedrooms and other pertinent data. It is easy to envision a system that carries the map analogy to the point where you could select prices from a bar graph or a set of boxes could contain data; for example, blue boxes might contain all houses having three bedrooms, two baths with an attached garage. When home computers were constrained by poor imaging capabilities, this type of interface could not be considered. This scenario is changing rapidly since most home computers are now powerful enough to handle large image files.

Ed Baylin [4] offers another suggestion for improving search interfaces; it is based on a combination of the best features of various search engines and meta search crawlers now available. For instance:

Imagine a search engine that would combine the following:

  • The post-processing functions of, say, Copernic (including the topic clustering abilities of, say, Vivisimo).

  • Hotbot's ability to pass the same kinds of filters to multiple search engine catalogs based on using the same interface format to the extent that the same filters are used. Moreover, unlike Hotbot, these filters would be passed in parallel to the different search engines, as do meta search engines, instead of being switched only at the user's request from one catalog to another as a series of separate searches.
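A rough sketch of the combination Baylin imagines, with the same filters passed in parallel to several catalogs and a post-processing step afterwards, might look like this. The catalogs, the "lang" filter, and the function names are hypothetical stand-ins; no real search engine API is called.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical catalogs standing in for separate search engines.
CATALOGS = {
    "engine_a": [{"url": "a.example/1", "lang": "en", "title": "fuzzy logic intro"},
                 {"url": "a.example/2", "lang": "fr", "title": "logique floue"}],
    "engine_b": [{"url": "b.example/9", "lang": "en", "title": "fuzzy search engines"},
                 {"url": "a.example/1", "lang": "en", "title": "fuzzy logic intro"}],
}

def search_engine(name, keyword, filters):
    """Apply the keyword and the shared filters to one engine's catalog."""
    hits = []
    for page in CATALOGS[name]:
        if keyword in page["title"] and all(page.get(k) == v for k, v in filters.items()):
            hits.append(page["url"])
    return hits

def meta_search(keyword, filters):
    """Send the same query and filters to every engine in parallel, then merge."""
    names = sorted(CATALOGS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_engine = list(pool.map(lambda n: search_engine(n, keyword, filters), names))
    merged = []
    for hits in per_engine:          # pool.map preserves the order of `names`
        for url in hits:
            if url not in merged:    # post-processing step: drop duplicates
                merged.append(url)
    return merged

merged = meta_search("fuzzy", {"lang": "en"})
```

Here duplicate removal stands in for the richer post-processing (such as topic clustering) mentioned in the list; the point is only that one filter specification reaches every catalog at once.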

Toward the use of semantics on the Web

The average search on the Web consists of two or three words. Studies have shown that these two or three words may be used to represent a very complex information need. Furthermore, two people may employ the same two or three keywords to search for different concepts. This demonstrates the difficulties that might be encountered in finding the right words to retrieve pertinent information; the process of selecting the right keywords is a more complicated task than most people assume.

It has already been stated that search engines no longer rely exclusively on keyword matching and occurrences to retrieve information. They employ sophisticated statistical probabilities, including page linking and other pertinent data, for information retrieval on the Web. Nevertheless, if users have difficulty articulating the exact information they are seeking, it would seem an impossible task to expect a search engine to overcome this handicap. Scirus (www.scirus.com) and others are attempting to solve this problem by offering suggestions for further searching. A list of keywords is displayed along the right part of the screen and may be used to search (these new terms are generated by an extraction process from the original search results). These words may prove useful but, once again, the original search itself may not have included the best choice of keywords or phrases. Scirus does not offer the searcher a chance to examine word relationships; the list of new terms is simply offered as words related to the results from the original search.
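The idea of extracting suggested keywords from the original results can be sketched with simple word counting. The source does not describe how Scirus actually performs its extraction, so frequency counting is an assumption made purely for illustration, as are the names suggest_terms and STOP_WORDS and the sample snippets.

```python
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "in", "for", "to", "on"}  # illustrative only

def suggest_terms(query, result_snippets, limit=3):
    """Suggest follow-up keywords drawn from the text of the first results."""
    query_words = set(query.lower().split())
    counts = Counter()
    for snippet in result_snippets:
        for word in snippet.lower().split():
            if word not in STOP_WORDS and word not in query_words:
                counts[word] += 1
    # Sort by descending frequency, then alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]

snippets = [
    "cloning of the sheep dolly in scotland",
    "dolly and the ethics of cloning",
    "sheep cloning research",
]
suggestions = suggest_terms("dolly", snippets)
```

As the text notes, such a list reflects only what the original results happened to contain; it says nothing about how the suggested words relate to one another.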

Research has been undertaken into this type of search problem; one such example is outlined in the paper by Lee & Tsai [5]. This article outlines an approach to a new type of search engine in which the user is queried for feedback. The approach addresses those searches where few or none of the web sites retrieved seem to be exactly what the user was looking for at the outset. In this research by Lee & Tsai, the user is actually searching across many search engines; the interface is processing the data from four computer agents: an interface agent, a filtering agent, a discovery agent and an information agent. The main issue for the user is the speed with which satisfactory results are achieved and the number of times the system queries the user. It is the feedback from the user that is used to further the search process. When the user indicates that particular web sites are useful, these sites are analyzed to provide the search process with data to further refine the search. This is an iterative process and it simulates the way in which most individuals locate information in the first place. Finding information by any means is often not straightforward. Lee & Tsai's work simulates the usual way in which we find what we need.

If the user begins a search with only a vague notion of what is needed, this system may help to refine the search. As a user finds relevant information, it helps to filter out the superfluous and to focus on the specific information.

Lee & Tsai use the term ai as their example search. It must be remembered that this study was reported in an academic journal; hence, the example was pertinent to the authors, who are doing research in artificial intelligence.

As further clarification of the system described in this study, you might search using the word Dolly. The variety of sites retrieved would provide a good example of the usefulness of this type of searching. If the user actually wanted information on cloning but the only term used in the search were Dolly, then a long list of sites covering a variety of topics would be retrieved. These sites might range from pages pointing to Dolly Parton or the Australian magazine called Dolly, plus sites on cloning. In the search engine described in this study, the user marks the sites that are pertinent to the search. If all of the selected sites pertain to cloning, the search engine would use both the discovery agent and the filtering agent to retrieve other sites related to cloning.
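The Dolly example can be turned into a small feedback sketch. It does not reproduce Lee & Tsai's four-agent architecture; it only illustrates, under invented data, how pages the user marks as useful can be analyzed to re-rank the rest. PAGES, feedback_profile and rerank are hypothetical names.

```python
from collections import Counter

# Hypothetical result set for the one-word query "dolly".
PAGES = {
    "parton.example": "dolly parton country music tour",
    "magazine.example": "dolly magazine australia fashion",
    "roslin.example": "dolly sheep cloning roslin institute",
    "genetics.example": "cloning sheep genetics research",
    "biology.example": "animal cloning research ethics",
}

def feedback_profile(marked_urls):
    """Build a term profile from the pages the user marked as useful."""
    profile = Counter()
    for url in marked_urls:
        profile.update(PAGES[url].split())
    return profile

def rerank(profile, exclude=()):
    """Order the remaining pages by overlap with the feedback profile."""
    scored = []
    for url, text in PAGES.items():
        if url in exclude:
            continue
        overlap = sum(profile[word] for word in set(text.split()))
        scored.append((overlap, url))
    # Highest overlap first; ties broken alphabetically by URL.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [url for _, url in scored]

# The user wanted cloning, so only the cloning pages are marked.
marked = ["roslin.example", "genetics.example"]
ranking = rerank(feedback_profile(marked), exclude=marked)
```

After this round of feedback the remaining cloning page rises above the Dolly Parton and magazine pages, even though it never mentions the word Dolly. Repeating the mark-and-rerank step gives the iterative process described above.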

This type of searching is only experimental, but the initial results show promise for the future. It is easy to think of many occasions when this would be a very useful way to search.

A theory of thesaurus relationships [6]

Generating new terms for a follow-up search depends on the relationships between words (or terms), and hence on the theory of thesaurus relationships. Such relationships can be used not only to find new terms to suggest to the searcher as possible alternatives, but also to develop the semantic web (Part 4: Vision of the Semantic Web).

According to Ed Baylin, the theory of thesaurus relationships can be seen and better understood as a special case of a more general systems theory of the roles that objects in a system play in relation to one another. Ed Baylin has developed such a systems theory (www.SearchHelpCenter.com/search-discussion-article-1.html). However, be forewarned that this theory is a very specialized and advanced academic topic. For readers who would rather not plough through that heavy material, the following is Ed's user-friendlier introduction to the ideas as they apply to thesaurus relationships.

Ed's Theory of Thesaurus Relationships

Future improvements to search engines may, for example, involve use of sophisticated artificial intelligence methods for concept searching. Thus, the user will be able to concentrate more on presenting the idea of what is sought, as opposed to finding the exact words or phrases.

Search engines often use similarity relationships to assist the user in generating further searches, in iterative fashion, possibly involving backtracking now and again.

Suppose the user searched for more information on the planet Org. When the findings page appeared, some related-search links might be shown next to each finding, such as lifestyles on the planet Org or layout of the planet Org. If the user selected one of these links, a new search would be initiated using the selected term.

Some search engines try to place your search into one of the subject categories covered in that search engine's subject directory facilities. Thus, any findings for the planet Org might be presented along with a link to Astronomy, presented as a subject category.

The idea of relationships made on the basis of similarity of word meaning can be extended to translation of terms to other languages.

In a similarity search, if the term school is used as a search string, the search engine would not only find documents containing that search term in the requested places, but also documents containing terms such as educational institution, and, if language translation is included in this idea, words such as école, the French word for school.
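A similarity search of this kind amounts to expanding the search term before matching. The following is a minimal sketch: the similarity table and the documents are invented, and plain substring matching is used only to keep the example short.

```python
# Hypothetical similarity table: term -> roughly equivalent terms,
# including translations. Entries are illustrative only.
SIMILAR = {
    "school": ["educational institution", "école"],
}

DOCUMENTS = {
    1: "the village school opens in september",
    2: "a new educational institution for adults",
    3: "l'école est fermée le dimanche",
    4: "train timetable for september",
}

def similarity_search(term):
    """Return ids of documents containing the term or any similar term."""
    variants = [term] + SIMILAR.get(term, [])
    return sorted(doc_id for doc_id, text in DOCUMENTS.items()
                  if any(variant in text for variant in variants))

matches = similarity_search("school")
```

A search for school here also retrieves the documents that mention only educational institution or école, while the unrelated document is left out.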

Exactly equivalent, synonymous terms are generally hard to find. In general, it is easier to find roughly similar terms, that is, ones that have a fuzzy equivalence relationship (called a similarity relationship). Using similarity relationships, one finds terms that overlap in meaning. Different search engines have different kinds of abilities when it comes to finding terms that are not exactly equivalent.

One variation on the idea of rough similarity between terms involves broader/narrower hierarchical relationships between terms. (This is a type of hierarchical relationship based on sub/super-classification, often called "generalization/specialization," "categorization," "super/sub-typing," "super/sub-categorization," and the like.)

A search that finds the term teacher might also find the narrower term professor, which generally means a university teacher, where teacher is a super-class of professor.
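The broader/narrower relationship can be sketched as a small table that is walked downward from the search term. The teacher/professor pair comes from the example above; the other entries and the names NARROWER and expand_narrower are assumptions added for illustration.

```python
# Hypothetical broader -> narrower term table. Only the teacher/professor
# pair comes from the text; the other entries are illustrative.
NARROWER = {
    "teacher": ["professor", "tutor"],
    "professor": ["associate professor"],
}

def expand_narrower(term):
    """Return the term plus all narrower terms, at every level below it."""
    expanded = [term]
    for child in NARROWER.get(term, []):
        expanded.extend(expand_narrower(child))
    return expanded

terms = expand_narrower("teacher")
```

A search engine could then match documents against every term in the expanded list, so that a search on teacher also reaches pages that mention only professor.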

Still wider schemes for types of relationships between terms are envisaged in theories on relationships between words. As can be well imagined, the subject of kinds of thesaurus relationships is overwhelmingly complex, especially when it comes to all the different ways of sub-classifying these relationships and their facets to many levels of hierarchy.

For those who are interested in a little academic theory on these matters at this point, please note that, in addition to fuzzy similarity or equivalence relationships, a first cut at classifying thesaurus relationships might also identify the following sub-types:

  • Fuzzy "Dissimilarity" relationships: Antonyms or "distinguished from" relationships fall into this category.

  • Fuzzy Causal relationships: Whenever an event occurs, a number of objects are involved in the varying roles of:

      • Agents/instruments: to enact the process involved;

      • Inputs: patients of the operation, or operands;

      • Products: outputs of the event;

      • Methods: related by the occurrence of events, i.e., the procedure, or series of steps, that the agent/instrument carries out to complete the process involved in the event;

      • Contexts: usually identified by space, or time, or population, giving the backdrops against which the event happened.

    With respect to the theory of thesaurus relationships, a search that finds lecture, which is a kind of event, might also find the:

      • Agent teacher;

      • Input student: patient of the lecturing process carried out by the teacher agent;

      • Product student: used to describe the student who has now learned from the lecture;

      • Method lecture notes: the method used by the teacher; and

      • Contexts school or classroom, among others.

  • Fuzzy Composition relationships: Another kind of hierarchical relationship is based on something being a part of a whole, but not the very "essence" of that whole.

Organs of the body are component parts of the systems of the body. Thus, a search that finds digestive system might also find stomach.
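One possible way to hold typed thesaurus relationships is as a list of triples, indexed by term and relationship type. The sketch below uses the lecture and digestive system examples from the text; the triple layout, the type labels, and the function names are assumptions rather than part of Ed's theory.

```python
from collections import defaultdict

# (term, relationship type, related term), taken from the examples above.
RELATIONS = [
    ("lecture", "agent", "teacher"),
    ("lecture", "input", "student"),
    ("lecture", "product", "student"),
    ("lecture", "method", "lecture notes"),
    ("lecture", "context", "school"),
    ("lecture", "context", "classroom"),
    ("digestive system", "part", "stomach"),
]

def build_thesaurus(relations):
    """Index the triples as term -> relationship type -> related terms."""
    thesaurus = defaultdict(lambda: defaultdict(list))
    for term, kind, related in relations:
        thesaurus[term][kind].append(related)
    return thesaurus

def related_terms(thesaurus, term, kinds=None):
    """List related terms, optionally restricted to some relationship types."""
    found = []
    for kind, terms in thesaurus.get(term, {}).items():
        if kinds is None or kind in kinds:
            for related in terms:
                if related not in found:   # student appears in two roles
                    found.append(related)
    return found

thesaurus = build_thesaurus(RELATIONS)
all_related = related_terms(thesaurus, "lecture")
contexts = related_terms(thesaurus, "lecture", kinds={"context"})
```

Keeping the relationship type alongside each pair lets a search widen in one direction only, for example to the contexts of a lecture, instead of pulling in every related term at once.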

 



Effective Internet Search: E-Searching Made Easy!   © Baylin Systems, Inc., 2006