Friday, November 04, 2005

The Deep Web – Vast but Not Unsearchable

With so much information on the web now easily searchable, it may be hard to believe that current search technologies do not cover all of it. Wikipedia provides a useful definition of the Deep Web, which includes information stored in databases as well as multimedia files.

In fact, search engines cover only a fraction of the available information. Bright Planet's estimate, although somewhat dated, pegs the available information at 550 times what is found through search engines. This “deep” information is stored in databases and is generated dynamically. Marcus Zillman places the size of the Deep Web at over 600B pages, with the searchable web at around 6B.
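As a quick check on how these figures relate, here is a small Python sketch that computes the multiple implied by the page counts quoted above. It uses only the numbers in this post, rounded as given ("over 600B" and "around 6B"); the variable names are my own. The two sources imply different multiples, so both are best read as rough estimates.

```python
# Page counts as quoted in the post (rounded; "over 600B" and "around 6B").
deep_web_pages = 600_000_000_000
searchable_pages = 6_000_000_000

# Multiple implied by the Zillman page counts.
zillman_ratio = deep_web_pages // searchable_pages

# Multiple quoted from Bright Planet.
bright_planet_ratio = 550

print(f"Zillman figures imply about {zillman_ratio}x the searchable web")
print(f"Bright Planet quotes about {bright_planet_ratio}x")
```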

Search engines crawl static pages and cannot dive into databases, so they miss this content. Numerous search engines focus on reaching it. One of them is Turbo10, which searches over 800 other search engines to comb through the Deep Web. According to their site, they handle over 90M searches per month. This seems to be a common approach to the Deep Web: query on the domain area to find the right domain-specific search tool, and then use that tool to dive into the Deep Web.
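The two-step approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Turbo10 or any other engine actually works: the registry, the tool names (`instrument-db`, `education-db`), and the in-memory "databases" are all hypothetical stand-ins for real domain-specific search tools.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List

# A search tool takes a query string and returns matching record titles.
SearchTool = Callable[[str], List[str]]


@dataclass
class ToolEntry:
    name: str                  # hypothetical tool name
    domains: FrozenSet[str]    # domain areas the tool covers
    search: SearchTool


def make_fake_tool(records: List[str]) -> SearchTool:
    """Build an in-memory substring search; a real tool would query a remote database."""
    def search(query: str) -> List[str]:
        q = query.lower()
        return [r for r in records if q in r.lower()]
    return search


# Hypothetical registry of domain-specific tools (assumption, for illustration only).
REGISTRY = [
    ToolEntry("instrument-db", frozenset({"instrumentation", "sensors"}),
              make_fake_tool(["Pressure sensor datasheet", "Thermocouple sensor guide"])),
    ToolEntry("education-db", frozenset({"teaching", "education"}),
              make_fake_tool(["Lesson plan: sensor basics", "Grading rubric"])),
]


def find_tools(domain: str) -> List[ToolEntry]:
    """Step 1: query on the domain area to find the right search tools."""
    d = domain.lower()
    return [tool for tool in REGISTRY if d in tool.domains]


def deep_search(domain: str, query: str) -> Dict[str, List[str]]:
    """Step 2: run the query through each matching tool and collect results by tool name."""
    return {tool.name: tool.search(query) for tool in find_tools(domain)}


if __name__ == "__main__":
    print(deep_search("sensors", "sensor"))
```

Keeping the domain lookup separate from the query step mirrors the approach in the text: the first step only picks tools, and each tool is then free to search its own database however it likes.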

The Deep Web is particularly important for teachers. Here’s a list of the top 20 skills an educator should have.

For a tutorial on how to navigate some of these databases, check out split-level searching techniques and more at this site.

For virtual instrumentation, GlobalSpec offers a number of deep web search tools.

For additional resources, click here.

If you are working with Deep Web technology, I would like to hear from you. You can reach me at hall.martin@ni.com.

Best regards,
Hall T. Martin