World of Librarians as Infomanager

search engines are programs or an information retrieval system which provide list of document available on www even without knowing the exact URL of the web recourse. With the help of Boolean operator we can get more exhaustive and specific documents and improve the recall and precision value of the results.

What is search engine?
Search engine is a program, database or an information retrieval system designed to help find stored information available on internet or on a computer it searches documents for specified keywords and returns a list of the documents where the keywords were found. Search engine are the key of finding specific information on the vast expanse of the World Wide Web. Without sophisticated search engines, it would be virtually impossible to locate anything on the Web without knowing a specific URL.

How these search engines work?
When a user enters a query into a search engine (typically by using key words), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text figure see Figure-0.1
This is the simplest method of searching process of search engine:-

Figure-0.1

Ø Users put there query in interface (search box) of search engine.
Ø These queries come to the database from search program these giant database is the collection of information (web pages) which is indexed word by word.
Ø These database collect information from crawler/spider/robots (program which all the time fetch as many web documents as possible form internet) that visit a Web site, collect that web page (read the information), read the site's meta tages and also follow the links that the site connects to performing indexing on all linked Web sites as well. The crawler returns all that information back to a central depository, where the data is indexed. These crawlers will also periodically check changes happen in web sites (web pages) and send the information of latest change to its database
Ø Then these databases provides a listing of best-matching web pages according to its criteria, to result design(program which design result for display) which is concerned with which word should be capitalized, bold, hyperlinked etc. and provide a list of result usually with a short summary containing the document's title.
Ø After properly design the result, result design send the list of result again to interface.

There are some examples of web search engines – Google, Yahoo, MSN, etc. these search engines searches according to the given keyword by user. Most search engines support the use of the Boolean operators AND, OR and NOT to further specify the search query. Search engines search for all the keywords in a search phrase or one single keyword of a phrase it depends on the value of the Boolean operator used in the search term.

........................(for more contact me)

Pallavi

Introduction

Digital library is concerned with that body of knowledge relating to the collection, organization, storage, distribution, retrieval, and utilization of digital information. Digital libraries basically store materials in electronic format and manipulate large collections of those materials effectively. In these days digital world is characterized by access to information rather than holding the information. Organized collections of scientific materials are traditionally called "libraries," and the searchable online versions ofthese are called "digital libraries”. when this dl is networked user want result from different libraries and sources which is distributed on internet for this there is need of that kind of searching facility which allow users to search multiple data sources with a single query from a single user interface. So here we will discuss searching services like federated and harvesting for accessing information. Library uses different protocols like z39.50,sru/srw,OAI-PMH.These protocols are often tied to services in that they are specific ways of implementing a service (i.e. one may create a harvesting service by using the OAI-PMH protocol), but they are not services unto themselves they are protocols.

Federated searching

Federated searching is a hot topic that seems to be gaining traction in libraries everywhere. Federated searching, also known as metasearching, broadcast searching, cross searching, parallel searching and a variety of other names, is the ability to search multiple information resources from a single interface and return an integrated set of results and provide on a single user interface so federated search will help user to reach information of their interest. Although aspects of this kind of shared searching has existed for some time (especially with Z39.50 catalogue searching), federated searching, that buzzword that has become so popular in the library world today is a technology that allows users to search many networked information resources from one interface. It queries a bunch of resources at once and then presents the results from all of them to the user.

Features of federated searching:

Support for multiple protocols (Z39.50, SRU/SRW, OAI).
Simple and advanced search (search by specific field).
Post processing of results (combined results).
Integration with other software (courseware, bib management tools).
Advanced result display (clustering, visualization).
Context-sensitive linking (OpenURL).

Open URL is a standard for persistently identifying content. Linking from a citation to the full-text. Finds which databases we have the full-text in and shows the user where it is (or takes them directly to it).

Steps of federated searching:

A typical user types in a search query in the portal interface’s search box (user interface) and clicks on “search”;
Query is sent to every individual database in the portal or federated search list that the user entered, along with the resources that the user wishes to search to the server
The server looks at each resource that it is asked to search and calls the appropriate search plug-in;
Portal returns the search results to the web server
Then the federating searching tool collects the search results and presents it to the user.

To follow these steps Federated search software uses standardized protocols to access databases. The most common protocol used is Z39.50. Some target databases that do not comply with the Z39.50 standard can still be searched using "translator" programs that convert the query format of the federated system into the format of the native system.

Z39.50

Z39.50 is an information retrieval protocol based on client server model for searching and retrieving information from remote computer databases.

The distributed query approach in general benefits from the real time nature of the queries and produces fresher results from different resources, vendor, standards. It can also search flat text file available on internet. With the help of this protocol user can get full text and exact result it also helps in combined search like Boolean and proximity. A dl with this protocol can translates the user query, at the time it is presented, into acceptable queries for each DL and merges the resulting returned hits into one page that is presented to the user. Z39.50 helps in crosswalk between libraries, standards etc. if the server data provider is working. So it is good for interoperability and provides recent and exact result. Searching for and downloading bibliographic records using a Z39.50 tool is simple and very efficient since multiple sources can be searched simultaneously and records easily compared. It allow users to establish relationships with a variety of sources.

Metadata harvesting (OAI-PMH)

In the harvesting approach, a service provider periodically harvests metadata from data providers using a predefined protocol like OAI-PMH developed by open archive. This search service is based on the harvested metadata. The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content however; the work of OAI has expanded to promote broad access to digital resources for user.

Steps of metadata harvesting:

1. There is a service provider which periodically harvests metadata from data providers (repositories) using a predefined protocol.

2. Then harvest metadata from that repositories which using different protocols standards

3. Service provider (software) Come back with that harvested metadata and Indexes (process) them to provide a federation search service.

4. And provide result to user in on standard format.

For metadata harvesting OAI-PMH is best information retrieval protocol it is also based on client server model .this protocol uses harvester software to (service provider) harvest metadata from different libraries or repositories (data provider) service provider also done the work of indexing and abstracting and store that result in the database of the digital library and provide result to user on their query. So it provides fast result because it gives result from its database.

In federated search and metadata harvesting data provider provide data from heterogeneous data source and that have been indexes by a single search index and the result provided and the result. Here protocol Z39.50 provide latest result produce by the vendor (data provider) and OAI-PMH provide the result from harvested metadata reside on the internal database of a digital library.

So both of the protocols have different feature but both can use together for more latest and exact result to the user.

Feasibility of harvesting records using OAI-PMH and Z39.50:

The Z39.50 OAI Server Profile was developed to support harvesting of records using a simple OAI-PMH gateway. This profile describes how a Z39.50 server along with its associated bibliographic database could be turned into an OAI-PMH data provider by putting a gateway on top of the Z39.50 server that implements OAI-PMH (see Figure).

The gateway was designed to simultaneously act as a Z39.50 client and an OAI Repository translating OAI requests into Z39.50 requests and packaging the Z39.50 responses into OAI responses. This would require certain characteristics to be present in the underlying data structures and search mechanisms of the Z39.50 server implementations. In particular, it would require a unique identifier for each record, a way to provide a date stamp, and the means to retrieve records according to criteria specified in terms of these data. They include the ability to export all of the records in the database, the ability to sort by record identifiers and system transaction dates, and the ability to filter results by a variety of date criteria. Although attribute values are defined in the Z39.50 standard for these processes, it does not follow that any given system will support them, and in particular not the part of the system designed for library patron use. Therefore, it would require a major development effort for the vendors While the technique of harvesting directly from Z39.50 servers using OAI-PMH to obtain MARC records seemed to be an elegant solution in principle, developing relationships with the caretakers of these records and arranging for static harvests of records via FTP tools proved a more practical approach to procuring the records.

Limitation:

There are so many limitations with federated search or meta search like the problem of different standards using by different digital libraries problem of duplication in result but these problems are not that much big one but the problem of simplify the user’s experience, there are significant challenges in ensuring that precision and relevance of retrieval remain strong, and diverse opinions on how this should be accomplished. Is federated search the only solution to meeting those needs and expectations? What other approaches may be possible in a world of syndicated content via Really Simple Syndication (RSS) and Open Search as implemented in Amazon’s A9?

Really Simple Syndication (RSS):

RSS is a method that uses XML to distribute web content on one web site, to many other web sites.

It is a web site that wants to allow other sites to publish some of its content creates an RSS document and registers the document with an RSS publisher. A user that can read RSS-distributed content can use the content on a different site. Syndicated content can include data such as news feeds, events listings, news stories, headlines, project updates, and excerpts from discussion forums or even corporate information. RSS makes it possible for people to keep up with their favorite web sites in an automated manner that's easier than checking them manually Each RSS text file contains both static information about your site, plus dynamic information about your new stories, all surrounded by matching start and end tags. So if there will be some change happen in any site it send information about it to user. With RSS, information on the internet becomes easier to find, and web developers can spread their information more easily to special interest groups. In this way RSS will help user to reach information of their interest. Since RSS data is small and fast-loading, it can easily be used with services like cell phones or PDA's. So it can play the role of Z39.50 and OAI-PMH protocols.

Conclusion:

So in digital library the search requests to be federated or should the system be based on harvesting so that a library can provide result from libraries who are even using different standards (Dublin core, MARC etc.) where Z39.50 protocols doesn’t service provider and doesn’t process the result but able to provide latest and recent result but OAI-PMH protocol provide result faster than Z39.50.Z39.50 is not even to provide result if server is slow or dump or its name become change. In general, the harvesting approach has much better response time because only a local database has to be searched. The distributed query approach in general benefits from the real time nature of the queries and produces fresher results. So both protocols have some limitation to cope up with these problem one new technique came in face as RSS so with this user can get more information, recent information, in small easier, fast loading way. so library should select one from all of these or all of them according to need of there user.

with regards

Pallavi

World of Librarians as Infomanager

Wednesday, December 31, 2008

Question of UGC-NET, Dec. 2008

Saturday, December 6, 2008

Sunday, November 30, 2008

The main idea behind a normal search engine

Monday, June 30, 2008

Federated Search & Metadata Harvesting

web 2.0 as a tool of library

COUNT

About Me

Library mismanagement

Is this paper less or less paper society

IT and India

Terror of the library

Best shelf in the library

Blog Archive

My Blog List

My Blog List

Is this true

My Blog List

World of Librarians as Infomanager

Wednesday, December 31, 2008

Question of UGC-NET, Dec. 2008

Saturday, December 6, 2008

Sunday, November 30, 2008

The main idea behind a normal search engine

Monday, June 30, 2008

Federated Search & Metadata Harvesting

web 2.0 as a tool of library

COUNT

About Me

Library mismanagement

Is this paper less or less paper society

IT and India

Terror of the library

Best shelf in the library

Blog Archive

My Blog List

My Blog List

Is this true

My Blog List

Subscribe To