Normally, when Verity Spider resolves host names, it uses DNS lookups to convert the names to
canonical names, of which there can be only one per machine. This allows for the detection of
duplicate documents, so that results are not diluted. In the case of multiple aliased hosts,
however, documents that look like duplicates may not be: documents can be referred to by more
than one alias and yet remain distinct because of the different alias names.
Example
You can have both marketing.verity.com and sales.verity.com running on the same host. Each
alias has a different document root, although document names such as index.htm can occur for
both. With the -virtualhost option, both server aliases can be indexed as distinct sites. Without
the -virtualhost option, they would both be resolved to the same host name, and only the first
document encountered from any duplicate pair would be indexed.
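The following Python sketch models the effect described in this example. It is not Verity Spider's implementation: it simply keys each document on a (host, path) pair, and the alias table, including the canonical name web1.verity.com, is a hypothetical stand-in for a DNS lookup.

```python
from typing import Dict, List, Tuple

# Hypothetical alias table standing in for DNS: both aliases resolve to
# the same machine, whose canonical name is assumed to be web1.verity.com.
CANONICAL: Dict[str, str] = {
    "marketing.verity.com": "web1.verity.com",
    "sales.verity.com": "web1.verity.com",
}


def index_documents(urls: List[Tuple[str, str]], virtualhost: bool) -> List[Tuple[str, str]]:
    """Return the (host, path) pairs that would be indexed; the first of any duplicates wins."""
    seen = set()
    kept = []
    for host, path in urls:
        # With virtual-host handling the alias itself is part of the key;
        # otherwise the alias is collapsed to the machine's canonical name.
        key_host = host if virtualhost else CANONICAL.get(host, host)
        key = (key_host, path)
        if key not in seen:
            seen.add(key)
            kept.append((host, path))
    return kept


docs = [("marketing.verity.com", "/index.htm"), ("sales.verity.com", "/index.htm")]
print(index_documents(docs, virtualhost=False))  # only the marketing copy is kept
print(index_documents(docs, virtualhost=True))   # both copies are kept
```

With virtualhost=False, both aliases collapse to the same canonical key, so only the first index.htm is kept, mirroring the behavior described for indexing without -virtualhost.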
Note: If you are using Netscape Enterprise Server, and you have specified only the host name as a
virtual host, Verity Spider will not be able to index the virtual host site. This is because Verity Spider
always adds the domain name to the document key.
Content options
The following sections describe the Verity Spider content options.
-casesen
Makes processing case-sensitive by specifying that the spider separately process keys that differ
only in case. Use only for indexing UNIX servers.
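The following Python sketch illustrates what it means to process keys that differ only in case separately. It assumes, for illustration only, that case-insensitive processing amounts to comparing lowercased keys; the spider's actual key handling is not described here. The restriction to UNIX servers presumably reflects the fact that UNIX file systems treat names that differ only in case as different files. The function name distinct_keys is hypothetical.

```python
from typing import List


def distinct_keys(keys: List[str], casesen: bool) -> List[str]:
    """Return the keys that would be processed, keeping the first of any duplicates."""
    seen = set()
    result = []
    for key in keys:
        # Assumption: without -casesen, keys are compared in lowercase.
        compare_as = key if casesen else key.lower()
        if compare_as not in seen:
            seen.add(compare_as)
            result.append(key)
    return result


keys = ["/docs/Readme.htm", "/docs/readme.htm"]
print(distinct_keys(keys, casesen=False))  # ['/docs/Readme.htm']
print(distinct_keys(keys, casesen=True))   # ['/docs/Readme.htm', '/docs/readme.htm']
```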
-exclude
Syntax:
-exclude exp_1 [exp_n] ...
Specifies that files, paths, and URLs matching the specified expression(s) will not be followed. If
you use backslashes, you must double them so that they are properly escaped; for example:
C:\\test\\docs\\path
You can use wildcard expressions, where the asterisk (*) is for text strings and the question mark
(?) is for single characters; for example:
'/my_doc*/year199?'
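The following Python sketch shows the two rules above: doubling backslashes, and matching with the asterisk and question mark. It approximates the wildcard semantics with the standard fnmatch module, which is an assumption: fnmatch also accepts [...] character sets, matches case-sensitively here, and lets * match across /, and Verity Spider's matcher may differ on any of these points. The helper names escape_backslashes and is_excluded are hypothetical.

```python
import fnmatch
from typing import List


def escape_backslashes(path: str) -> str:
    """Double every backslash so that it is properly escaped in an -exclude argument."""
    return path.replace("\\", "\\\\")


def is_excluded(target: str, patterns: List[str]) -> bool:
    """Approximate the wildcards: * matches any text, ? matches one character.

    fnmatchcase is case-sensitive and lets * match across '/'; both are
    assumptions about how the spider matches.
    """
    return any(fnmatch.fnmatchcase(target, pattern) for pattern in patterns)


print(escape_backslashes(r"C:\test\docs\path"))  # C:\\test\\docs\\path

patterns = ["/my_doc*/year199?"]
print(is_excluded("/my_docs/year1998", patterns))   # True
print(is_excluded("/my_docs/year2001", patterns))   # False
print(is_excluded("/my_docs/year19985", patterns))  # False: ? matches exactly one character
```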
In Windows, include double quotation marks around the argument to protect special characters,
such as the asterisk (*). On UNIX, use single quotation marks. This is only required when you
run the indexing job from a command line. Quotation marks are not necessary within a
command file (the -cmdfile option).
To use regular expressions, also specify the -regexp option.
To specify a file, path, or URL that you want followed but not indexed, use the -indexclude
option. For document types, use the -mimeexclude option instead; for example, specify
-mimeexclude application/pdf rather than -exclude *.pdf.
Note: When specifying a URL, you must use a full, absolute path in the same format that appears
in the HTML hyperlink. If the link is relative, you must change it to an absolute URL to use it
with the -exclude option.
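A relative link can be converted to an absolute URL by resolving it against the URL of the page that contains it. The following Python sketch uses the standard urllib.parse.urljoin function; the page URL, the links, and the helper name absolute_link are hypothetical.

```python
from urllib.parse import urljoin


def absolute_link(page_url: str, href: str) -> str:
    """Resolve an href against the URL of the page that contains it."""
    return urljoin(page_url, href)


page = "http://www.example.com/docs/index.htm"  # hypothetical page containing the links
print(absolute_link(page, "archive/old.htm"))   # http://www.example.com/docs/archive/old.htm
print(absolute_link(page, "../img/logo.gif"))   # http://www.example.com/img/logo.gif
```

urljoin does not change the case or spelling of the scheme and host, so the result keeps the format used on the linking page.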
See also -regexp.