by Darlene Canning
It is impossible to predict the exact evolution of the Web;
however, exciting research is underway to develop the semantic web.
The semantic web is a vision for the future of the Web as it grows
and adapts to the changing demands of the information world. There
are many groups around the world working on the semantic web,
particularly in the areas of e-commerce, digital libraries and
knowledge management; however, it has only been implemented on a
small scale in each of these areas. Examples of these experiments
are described in the case studies at the end of this section
(Part 4: Case studies: applications of the semantic web).
Proponents of the semantic web, including Tim Berners-Lee (www.w3.org/People/Berners-Lee/)
and others at the World Wide Web Consortium (W3C) project (www.W3C.org),
are convinced it will solve many of the existing problems
encountered in retrieving information on the Web. W3C, which comprises
a full-time staff of more than sixty experts together with over 500
member organizations, is developing the tools needed for the
semantic web. Before the semantic web can become reality, these
tools must be readily available and easy to apply; this does not
appear to be the case at the present time. Although the semantic web
is a brilliant idea, many issues remain and only time will tell if
the semantic web will lead to another revolution in the information
world.
What is the semantic web?
The semantic web is a work in progress, and it is therefore
difficult to provide a precise definition. The semantic web is not
intended to replace the existing World Wide Web, but it is meant to
add another layer on top of what currently exists. According to
Berners-Lee:
The semantic web is not a separate web but an extension of
the current one, in which information is given well-defined
meaning, better enabling computers and people to work in
cooperation. [7]
Confusion arises from the fact that much of the semantic web is
still under development and it is largely experimental in nature.
Basically, to have a semantic web, documents must be coded in
languages other than HTML (the standard upon which the Web
developed). An infrastructure will have to exist in which terms are
more clearly defined to provide meaning and outline relationships in
the use of language. This will permit information on the Web to be
located based on its meaning and also its relationship to other
information. These relationships will be far more structured and
include more data than the simple inclusion of metadata descriptors
such as the Dublin Core. These relationships are to be made
available in such a way that they may be processed automatically;
that is, computer applications, or agents, as they are often called,
will be able to explore the Web to retrieve pertinent information.
These agents are the future search engines. Once a search is made,
the information will be retrieved based on the semantics embedded in
the coding of the documents; this, together with other computer
tools, will link this information to related Web information
sources.
The word semantic pertains to the use of language, and language is of
even greater importance in both the creation and the retrieval of
information on the semantic web. Applying language in this way
involves both human-created and machine-created components. In fact,
the semantic web may require the user to define a context before a
search begins, and the author of a document may have to assign a
context to the document when it is created for the Web. This will
eliminate the confusion that arises with a term such as banking, as in
the example found earlier (Part 2: Problems not resolved by relevancy
ranking). To avoid this kind of confusion, the searcher would specify
a context such as financial or navigation before a web search is
initiated; the author would assign the document to a context before
placing it on the semantic web. From there, the computer could search
the Web for banking based on the context of the search itself as well
as the context assigned to the Web resource.
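The context matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Resource record, the search function and the example.org addresses are all hypothetical, and a real semantic web application would draw its contexts from shared ontologies rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    url: str          # hypothetical address of the Web resource
    terms: frozenset  # words under which the document is indexed
    context: str      # context assigned by the author at publication


def search(resources, term, context):
    """Return the URLs that mention `term` and share the searcher's context."""
    return [r.url for r in resources if term in r.terms and r.context == context]


# Sample data invented for illustration only.
RESOURCES = [
    Resource("http://example.org/loans", frozenset({"banking", "interest"}), "financial"),
    Resource("http://example.org/turns", frozenset({"banking", "aircraft"}), "navigation"),
]

financial_hits = search(RESOURCES, "banking", "financial")
navigation_hits = search(RESOURCES, "banking", "navigation")
print(financial_hits)
print(navigation_hits)
```

Both documents contain the word banking, but each search returns only the document whose assigned context matches the one chosen by the searcher.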
Problems to be addressed before the semantic web can exist
Many obstacles remain to be overcome before the semantic web
becomes a reality. Protocols and standards must be accepted and made
readily available to enable the semantic web to move from the
experimental stage to a fully developed information resource. The
resources needed to create the infrastructure of the semantic web
include mark-up languages, the Resource Description Framework (RDF,
www.w3.org/RDF/) and ontologies, including classification schemes and
thesauri. Applications of the semantic web are still in their infancy
and, in general, each is confined to a single discipline or to an area
with a narrow focus. At this stage, the semantic web is experimental
in nature and it is not known how long it will take to develop to
its full potential.
One might expect the semantic web to be used in a corporate
setting where there is a general understanding of the nature of the
information to be shared. The semantic web lends itself to use in
the areas of e-commerce and in knowledge management. Thus, one of
the case studies (Part 4: Case studies: applications of the
semantic web) provides an outline of the work that is
being undertaken in several European corporate settings.
Any taxonomy or classification scheme that is sufficiently broad
to cover all disciplines suffers from a lack of currency when new
terms appear or usage changes; this is especially true in areas of
research that are considered cutting edge. At the same time,
classification schemes that cover all disciplines may lack the
specificity needed for very precise applications; for example, they
may not have been updated frequently and they may not contain the
specific terms. The creation and maintenance of ontologies must
overcome similar problems to ensure that they remain current;
furthermore, they must be updated automatically for the semantic web
to become a reality.
Despite the problems, exciting research is being conducted on the
semantic web. In China, where there is widespread acceptance of a
national classification scheme, the development of a semantic web
has already begun. Nevertheless, the application that is described
in this chapter (Part 4: Case studies: applications of the
semantic web) is limited in scope and was used only for
the discipline of computer science.
The strength of the Web arises from its grassroots development
(publishing on the Web is almost as accessible to the individual as
it is for large companies). Posting documents on the Web does not
require a huge investment in time and money. HTML code can be
learned or adapted quickly to allow for a speedy way to publish on
the Web. Consequently, the Web is a vast information resource known
for its multiplicity of sources and for its lack of structure. To
create the semantic web, the various elements of the infrastructure
must be easily accessible to all. Can an XML (Extensible Markup
Language) document be created as readily as an HTML document? In
XML, the elements of a document, once coded, can be used as searchable
data. XML provides the common syntax for machine-understandable
statements, and any part of a document might be tagged to become
data. HTML provided a simple page formatting language with very few
tags, such as <title>, <b> (bold), <p> (paragraph) and similar
tags to control the layout on screen. It was easy to learn since the
codes had a fairly simple application. A large body of information
on the Web is currently not in XML format; it is highly
unlikely that all of it will be converted to XML.
Proponents of the semantic web expect that it will create another
revolution in information retrieval should it become fully
functional. It is expected to change the way in which information is
created and retrieved.
The semantic web proposes to restructure the Web to permit better
information retrieval. To implement the semantic web, the supporting
infrastructure must be built. Documents will need to be
written in a standard mark-up language other than HTML; ontologies
must be readily available to define meaning and provide
relationships among terms and Web documents; and then computer
agents will be needed to search the Web to locate this knowledge.
The semantic web comprises data elements together with inference rules
for making decisions about the relationships
among those data elements. Mark-up languages will be used to create
documents in which the data is coded; ontologies will be developed
to provide the standards for decision-making rules about the use of
terms.
Mark-up languages
When mark-up languages such as XML are employed, metadata (data
about the content of the document) can be embedded and coded in the
document. HTML was basically a language to provide layout
information; XML, on the other hand, is a simplified descendant of SGML
(Standard Generalized Markup Language) whose tagged elements can be
queried much like the contents of a database. A
database implies a rigid structure where there are records and
fields used as descriptive entities to describe objects. The process
of creating an XML document for the semantic web would be analogous
to using the entries in a personal address book to create a computer
database. For each entry in the book, there would be an individual
record in the database. Each record pertains to the address for one
individual or one company. For each record, you might have a field
for first name, last name, street address, city address,
state/province, and so forth. A document written in XML would
contain many embedded codes to tag the data in the document. If a
document described an author such as Tim Berners-Lee, the person
creating the document might use a tag called author or software
programmer, or any number of other tags, depending on the data that
should be tagged in the published document. These tags are hidden
codes used to identify
or code information embedded in the webpages or text of the
document. XML provides the structure; nevertheless, there is a need
for consensus and agreement on the meaning of these tags if they are
to be used in an automatic way by computers.
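The address book analogy can be made concrete with a small sketch. The tag names (addressBook, record, firstName and so on), the sample entries and the names_in_city helper are all assumptions made for illustration; they do not come from any agreed standard, which is exactly the consensus problem noted above. The parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical address book; every name and tag here is invented for illustration.
ADDRESS_BOOK_XML = """
<addressBook>
  <record>
    <firstName>Ada</firstName>
    <lastName>Example</lastName>
    <city>Ottawa</city>
    <stateProvince>Ontario</stateProvince>
  </record>
  <record>
    <firstName>Ben</firstName>
    <lastName>Sample</lastName>
    <city>Halifax</city>
    <stateProvince>Nova Scotia</stateProvince>
  </record>
</addressBook>
"""


def names_in_city(xml_text, city):
    """Return (first name, last name) pairs for records whose city field matches."""
    root = ET.fromstring(xml_text.strip())
    return [
        (record.findtext("firstName"), record.findtext("lastName"))
        for record in root.findall("record")
        if record.findtext("city") == city
    ]


ottawa = names_in_city(ADDRESS_BOOK_XML, "Ottawa")
print(ottawa)
```

Each record element plays the role of a database record and each child element the role of a field, so a program can select entries by field value without any knowledge of how the page is laid out.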
Steven Cherry provides an excellent explanation of the way in
which the XML documents will be used to retrieve information on the
semantic web.
Right now, HTML coding serves mostly to control appearance and
arrangement of text and images on a web page, so that only a few
elements are tagged such as <title> and <bold>. With XML tags
<price>, for instance a software agent might be able to comparison
shop across different web sites, or update an account ledger after
an e-purchase. [8]
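The comparison shopping that Cherry describes might look something like the following sketch. It is not taken from his article: the shop names, the product markup and the cheapest function are hypothetical, and a real agent would fetch the pages over the network and would need the shops to agree on what a price tag means.

```python
import xml.etree.ElementTree as ET

# Hypothetical product descriptions from two shops, invented for illustration.
SHOP_PAGES = {
    "shop-a.example": "<product><name>Widget</name><price currency='USD'>12.50</price></product>",
    "shop-b.example": "<product><name>Widget</name><price currency='USD'>11.75</price></product>",
}


def cheapest(pages, product_name):
    """Return (price, site) for the lowest-priced matching product, or None."""
    offers = []
    for site, xml_text in pages.items():
        product = ET.fromstring(xml_text)
        if product.findtext("name") == product_name:
            offers.append((float(product.findtext("price")), site))
    return min(offers) if offers else None


best = cheapest(SHOP_PAGES, "Widget")
print(best)
```

Because the price is tagged as data rather than merely displayed, the agent can read it directly instead of guessing which number on the page is the price.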
Resource Description Framework (RDF) is a term that is often
found in descriptions of the semantic web. RDF repositories of
metadata and ontologies must be available to be searched to assign
meaning to the content of Web information. In RDF, a document makes
assertions that particular things are related. Because RDF uses URIs
(Uniform Resource Identifiers) to encode this information in a
document, it allows Web resources to be
described in relation to other ones on the Web. Collections of such
definitions, in which thesaurus-like relationships are defined between
words and phrases, are called ontologies.
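An RDF assertion is commonly pictured as a three-part statement: a subject, a predicate and an object, each identified where possible by a URI. The sketch below models a handful of such statements as plain Python tuples. It does not use an RDF library or any RDF file syntax; the example.org URIs and the match helper are hypothetical, while the Dublin Core namespace is the one published at purl.org.

```python
# The Dublin Core element namespace is real; the example.org URIs are hypothetical.
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
TRIPLES = {
    (EX + "doc/semantic-web-intro", DC + "creator", EX + "people/tbl"),
    (EX + "doc/semantic-web-intro", DC + "language", "en"),
    (EX + "doc/rdf-primer", DC + "creator", EX + "people/tbl"),
    (EX + "doc/rdf-primer", DC + "relation", EX + "doc/semantic-web-intro"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every part that is not None, sorted."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Which resources were created by the person identified as people/tbl?
by_author = [s for s, _, _ in match(TRIPLES, predicate=DC + "creator", obj=EX + "people/tbl")]
print(by_author)
```

Because both documents point to the same URI for their creator, a program can tell that they share an author without comparing name strings.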
Ontologies
Ontologies are also needed to facilitate this use of language on
the semantic web. Ontologies define relationships between terms and
they include classification schemes, thesauri and similar language
tools.
The best definition of an ontology comes from the group working
on the semantic web (www.w3.org/TR/2002/WD-webont-req-20020307/#onto-def).
In their document [9], they provide the following
description of an ontology.
The word ontology has been used to describe artifacts with
different degrees of structure. These range from simple
taxonomies (such as the Yahoo hierarchy), to metadata schemes
(such as the Dublin Core), to logical theories. The Semantic Web
needs ontologies with a significant degree of structure. These
need to specify descriptions for the following kinds of
concepts:
- Classes (general things) in the many domains of
interest;
- The relationships that can exist among things;
- The properties (or attributes) those things may have.
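The three kinds of concepts listed above can be illustrated with a toy ontology. Everything in this sketch is hypothetical: the class names, the single "wrote" relationship and the helper functions are invented here and are not drawn from any W3C ontology language. The sketch also shows one simple inference, namely that a subclass inherits the properties of its parent classes.

```python
# Classes: each class maps to its parent class (None marks a top-level class).
SUBCLASS_OF = {
    "Person": None,
    "Author": "Person",
    "Programmer": "Person",
    "Document": None,
    "WebPage": "Document",
}

# Relationships that can exist among things: name -> (subject class, object class).
RELATIONSHIPS = {"wrote": ("Author", "Document")}

# Properties (attributes) that things of each class may have.
PROPERTIES = {"Person": ["name"], "Document": ["title", "language"]}


def ancestors(cls):
    """Return the chain of parent classes, nearest first."""
    chain = []
    parent = SUBCLASS_OF[cls]
    while parent is not None:
        chain.append(parent)
        parent = SUBCLASS_OF[parent]
    return chain


def is_a(cls, other):
    """True if cls is the same as other or a descendant of it."""
    return cls == other or other in ancestors(cls)


def properties_of(cls):
    """Return the properties declared on cls plus those inherited from its parents."""
    names = list(PROPERTIES.get(cls, []))
    for parent in ancestors(cls):
        names.extend(PROPERTIES.get(parent, []))
    return names


def relationship_allowed(name, subject_cls, object_cls):
    """Check that a relationship links things of the classes it was defined for."""
    subject_required, object_required = RELATIONSHIPS[name]
    return is_a(subject_cls, subject_required) and is_a(object_cls, object_required)


print(properties_of("WebPage"))
print(relationship_allowed("wrote", "Author", "WebPage"))
```

A structure of this kind is what lets a program conclude that a web page has a title, or that an author can have written one, even though neither fact was stated for web pages directly.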
Standardized ontologies are being created.
www.W3C.org provides examples of ontologies under development. Work on
an ontology for Mathematics has already started. With the tools created
for the semantic web, computer agents will be able to process the
language of the user's search query.
Mapping of featured search engines to Dublin Core
Although much of the semantic web is still futuristic, one type
of ontology has been implemented to a certain extent in today's Web.
The Dublin Core is an example of a metadata ontology. Ed Baylin
explains how it has been implemented in relation to the search
engines featured in this book.
Ed's Explanation:
Featured Search Engines versus the Dublin Core [10]
To facilitate building search engine indices, various
standardized ways of setting up metadata for
unstructured (and also structured) data have been
proposed, but only one has been implemented to any
extent so far. This is the Dublin Core (http://dublincore.org/documents/dces/)
ontology of 15 fields, including, among others, fields
for author, document language, creation or last change
date, and period of applicability. Each field can be
refined for particular uses, and, with the approval of
the organization responsible for the Dublin Core,
further fields can be added.
The Dublin Core metadata scheme is of particular
interest to reference librarians because it describes data
from a bibliographic perspective: its
fields are applicable to just about any Web document
(called a "resource" in Web terminology).
While the Dublin Core's scheme can be implemented in
varying degrees of complexity, as the need and
opportunity arise, even the Dublin Core metadata
structure at its most complex is too limited to fully
suit all disciplinary and application domains.
The RDF (Resource Description Framework) method, based
on XML (Extensible Markup Language), provides another,
more flexible and complex scheme for bibliographic and
other kinds of metadata. RDF requires more
expertise than even the more complex kinds of Dublin
Core refinements. It allows the users in each domain
(discipline or application area) to define their own
schemes for indexing and classifying data. The Dublin
Core can be included as a part of any particular RDF.
The cell entries in the leftmost column of the table refer to
appropriate rows or sets of contiguous rows in the
Features Control Center of our book, "Effective Internet
Search."
Ed's table on the following pages maps the
tenuous relationships between the Dublin Core's fifteen standard
fields and the metadata-based filters of the featured search
engines in this book. From it one can conclude that the
relationship between the metadata parameters in the Dublin Core
and those in the general-purpose search engines varies
considerably by search engine.
In general, some of the 15 Dublin Core fields
are better supported than others. The better-mapped fields,
shown first in the table below, are those for Title, Resource
Identifier, Date, Relation, and Language.
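To make the fifteen fields concrete, the sketch below lists the element names of the Dublin Core Metadata Element Set and picks out, from a sample record, the five fields named above as better supported. The sample record and the searchable_fields helper are hypothetical; they are not taken from Ed's table, and they say nothing about how any particular search engine handles the fields.

```python
# The fifteen element names of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)

# The fields this section names as better supported by the search engines.
BETTER_SUPPORTED = ("title", "identifier", "date", "relation", "language")


def searchable_fields(record):
    """Keep only the better-supported fields; reject names outside the Dublin Core."""
    unknown = set(record) - set(DUBLIN_CORE_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return {name: record[name] for name in BETTER_SUPPORTED if name in record}


# Sample record invented for illustration only.
SAMPLE_RECORD = {
    "title": "An example page",
    "creator": "A. Author",
    "date": "2004-01-15",
    "language": "en",
    "identifier": "http://example.org/page",
}

print(searchable_fields(SAMPLE_RECORD))
```

In this sample the creator field is dropped, which mirrors the point that a field may be present in a document's metadata and still not be usable as a search filter.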
Computer agents
We already use language to query the Web, but
the semantic web will involve the use of computer agents to go
out to the Web to seek more pertinent information. Much of what
is accessible on the Web right now was designed for human
interpretation with a search engine providing the access. The
semantic web, on the other hand, will automate the query process
by allowing data and programs to be processed automatically by
computers. This requires a level of structure and
standardization previously unseen on the Web.
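A very small sketch can suggest what such an agent does. It assumes, purely for illustration, that every resource publishes a machine-readable description giving its topic and the resources it is related to; the WEB dictionary, the ex: identifiers and the agent_collect function are all hypothetical, and the in-memory dictionary stands in for pages that a real agent would retrieve over the network.

```python
from collections import deque

# Hypothetical machine-readable descriptions: a topic plus related resources.
WEB = {
    "ex:a": {"topic": "ontologies", "related": ["ex:b", "ex:c"]},
    "ex:b": {"topic": "cooking", "related": ["ex:d"]},
    "ex:c": {"topic": "ontologies", "related": ["ex:a"]},
    "ex:d": {"topic": "ontologies", "related": []},
}


def agent_collect(web, start, topic):
    """Follow 'related' links breadth-first and return resources on the given topic."""
    seen = {start}
    queue = deque([start])
    hits = []
    while queue:
        uri = queue.popleft()
        description = web[uri]
        if description["topic"] == topic:
            hits.append(uri)
        for neighbour in description["related"]:
            # Skip resources already visited or not described at all.
            if neighbour in web and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return hits


found = agent_collect(WEB, "ex:a", "ontologies")
print(found)
```

The agent never reads the text of a page; it relies entirely on the structured descriptions, which is why the level of standardization mentioned above matters.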