Overview
Flow control
When indexing Web sites, Verity Spider distributes requests to Web servers in a
round-robin manner. This means one URL is fetched from each Web server in turn.
With flow control, it is possible that a faster Web site will finish before a slower one.
Regardless, Verity Spider optimizes indexing for every Web server.
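The round-robin distribution described above can be illustrated with a minimal Python sketch. It is not Verity Spider's implementation; the function name round_robin_order and the per-host queue layout are assumptions made only for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def round_robin_order(urls):
    """Yield URLs so that each Web server is visited in turn.

    Hypothetical helper: groups URLs by host, then takes one URL
    from each host per pass until every host's queue is empty.
    """
    queues = {}
    for url in urls:
        host = urlparse(url).netloc
        queues.setdefault(host, deque()).append(url)

    hosts = deque(queues)  # hosts in first-seen order
    while hosts:
        host = hosts.popleft()
        yield queues[host].popleft()
        if queues[host]:
            # This host still has work, so it rejoins the rotation.
            hosts.append(host)
```

In this sketch a host with fewer URLs drops out of the rotation early, which mirrors the observation that one Web site may finish before another.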
Verity Spider V3.7 adjusts the number of connections per server depending on the
download bandwidth. When the download bandwidth from a Web server falls below
a certain value, Verity Spider will automatically scale back the number of
connections to that Web server. There will always be at least one connection to a Web
server. When the download bandwidth increases to an acceptable level, Verity Spider
reallocates connections (per the value of the -connections option, which is 4 by
default). You can turn off flow control with the -noflowctrl option.
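The scaling rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the product's algorithm: the manual does not give the bandwidth threshold or the step size, so low_threshold and the one-connection step are hypothetical. Only the floor of one connection and the default ceiling of 4 (the -connections default) come from the text above.

```python
def adjust_connections(current, bandwidth, low_threshold, max_connections=4):
    """Return the new connection count for one Web server.

    Assumptions (not from the manual): the threshold value and the
    choice to change the count by one connection at a time.
    """
    if bandwidth < low_threshold:
        # Scale back, but always keep at least one connection.
        return max(1, current - 1)
    # Bandwidth is acceptable: reallocate up to the configured limit.
    return min(max_connections, current + 1)
```

For example, with a hypothetical threshold of 50, a server at 4 connections reporting a bandwidth of 10 would drop to 3, and a server already at 1 connection would stay at 1.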
Multithreading
Since version 3.1, Verity Spider has separated the gathering and indexing jobs
into multiple threads for concurrency. Verity Spider V3.7 can open concurrent
connections to Web servers for fetching documents and run concurrent indexing
threads for maximum utilization. This translates to an overall improvement in
throughput. In previous releases, jobs were run in a round-robin manner, so that at
any given time, only one job was running. Spider still attends to the Web sites
within an indexing job in a round-robin manner.
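The separation of gathering and indexing into concurrent threads can be sketched with Python's standard queue and threading modules. This is a generic producer-consumer illustration, not Verity Spider's code; run_pipeline, fetch, index, and the thread counts are all hypothetical names and values.

```python
import queue
import threading


def run_pipeline(urls, fetch, index, fetchers=4, indexers=2):
    """Fetch and index concurrently; returns indexed results in arbitrary order.

    fetch(url) and index(doc) are caller-supplied callables (assumptions).
    """
    to_fetch = queue.Queue()
    to_index = queue.Queue()
    indexed = []
    lock = threading.Lock()
    stop = object()  # sentinel telling a worker thread to exit

    def gather():
        while True:
            url = to_fetch.get()
            if url is stop:
                break
            to_index.put(fetch(url))

    def build():
        while True:
            doc = to_index.get()
            if doc is stop:
                break
            result = index(doc)
            with lock:
                indexed.append(result)

    gather_threads = [threading.Thread(target=gather) for _ in range(fetchers)]
    index_threads = [threading.Thread(target=build) for _ in range(indexers)]
    for t in gather_threads + index_threads:
        t.start()

    for url in urls:
        to_fetch.put(url)
    for _ in gather_threads:
        to_fetch.put(stop)
    for t in gather_threads:
        t.join()  # all documents are now queued for indexing

    for _ in index_threads:
        to_index.put(stop)
    for t in index_threads:
        t.join()
    return indexed
```

Because indexing threads start consuming as soon as the first document arrives, fetching and indexing overlap instead of waiting on each other.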
Efficient DNS lookups
Verity Spider V3.7 significantly reduces DNS lookups, which greatly improves
spidering throughput. If spidering is limited by domain or host,
then no DNS lookups are made on hosts that fall outside of that range. Previously,
DNS lookups were made on all candidate URLs.
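The idea of checking scope before resolving can be sketched in Python. This is a hedged illustration only: in_scope, resolve_candidates, and the resolver callable are hypothetical, and resolving each host only once is an added assumption that the manual does not describe.

```python
from urllib.parse import urlparse


def in_scope(url, allowed_domains):
    """True if the URL's host equals or falls under an allowed domain."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def resolve_candidates(urls, allowed_domains, resolver):
    """Resolve only in-scope hosts; out-of-range hosts are never looked up.

    resolver(host) is a caller-supplied lookup function (assumption),
    for example socket.gethostbyname.
    """
    addresses = {}
    for url in urls:
        if not in_scope(url, allowed_domains):
            continue  # skipped before any DNS lookup is made
        host = urlparse(url).hostname
        if host not in addresses:
            addresses[host] = resolver(host)
    return addresses
```

The scope test is a pure string comparison, so candidate URLs outside the configured range cost no network round trip.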
Proxy handling efficiency
The -noproxy option, which reduces proxy checking for certain hosts, and the
-proxyauth option, which authenticates on proxy servers, allow much greater
flexibility when dealing with indexing jobs that involve proxy servers and firewalls.
NOTE: Information Server V3.7 does not support retrieving documents for viewing
through secure proxy servers. Do not use -proxyauth for indexing documents that
are to be viewed through Information Server V3.7.