-indskip
Syntax: -indskip HTML_tag "exp"
Type: Web crawling only.
Specifies that Verity Spider should follow and parse links in, but not index, any HTML
document that contains the text of exp within the given HTML_tag. For multiple HTML_tag
and exp combinations, use multiple instances of the -indskip option.
You can use wildcard expressions, where the asterisk ( * ) is for text strings and the
question mark ( ? ) is for single characters. For example:
'/my_doc*/year199?'
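The wildcard rules above can be approximated with Python's standard fnmatch module. This is only an illustrative sketch, not Verity Spider's own matching code: the helper name wildcard_match is hypothetical, the matching here is case-sensitive by assumption, and fnmatch also gives square brackets a special meaning that the text above does not describe.

```python
from fnmatch import fnmatchcase


def wildcard_match(text: str, pattern: str) -> bool:
    """Return True if text matches pattern.

    '*' stands for any text string (including an empty one) and
    '?' stands for exactly one character.
    """
    return fnmatchcase(text, pattern)


pattern = "/my_doc*/year199?"

# '*' absorbs the "s" and '?' absorbs the final "8".
print(wildcard_match("/my_docs/year1998", pattern))

# No character is left for '?' to match, so this does not match.
print(wildcard_match("/my_doc/year199", pattern))
```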
On Windows NT, you should enclose the argument in double quotes to protect
special characters such as the asterisk (*). On UNIX, you should use single quotes. Note that
this is only required when you run the indexing job from a command line. Quotes are
not necessary within a command file (-cmdfile).
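For the UNIX case, the quoting described above can be produced programmatically. The following is a minimal sketch that uses Python's standard shlex.quote function to wrap an argument in single quotes when it contains shell-special characters; it does not cover the Windows NT double-quote convention, and the variable names are illustrative only.

```python
import shlex

# An argument containing the wildcard characters * and ?
argument = "/my_doc*/year199?"

# shlex.quote wraps the argument in single quotes for a POSIX shell,
# so the shell does not expand the wildcards itself.
quoted = shlex.quote(argument)
print(quoted)  # prints '/my_doc*/year199?'
```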
If you use backslashes, you must double them so they are properly escaped. For
example:
C:\\test\\docs\\path
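When paths are generated by a script, the doubling can be done mechanically. The sketch below is one way to do it in Python; the function name escape_backslashes is hypothetical and is not part of Verity Spider.

```python
def escape_backslashes(path: str) -> str:
    """Double every backslash in path so that each one is properly escaped."""
    return path.replace("\\", "\\\\")


# The raw string holds single backslashes: C:\test\docs\path
print(escape_backslashes(r"C:\test\docs\path"))  # prints C:\\test\\docs\\path
```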
To use regular expressions, also specify the -regexp option.
Example
To skip all HTML documents which contain the word "personnel" in the Title
element, while still parsing those documents for links to other documents, use the
following:
-indskip title "personnel"
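The decision that -indskip describes can be sketched in Python as follows. This is an illustration of the documented behavior under stated assumptions, not Verity Spider's implementation: the names TagTextCollector and apply_indskip are hypothetical, exp is treated as a case-sensitive "contains" test, and only href attributes of a elements are treated as links.

```python
from fnmatch import fnmatchcase
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text inside every occurrence of one tag, plus all hrefs."""

    def __init__(self, tag: str):
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0      # how many watched tags are currently open
        self.texts = []     # one text entry per watched tag occurrence
        self.links = []     # every href found in an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            self.texts.append("")
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def apply_indskip(html: str, tag: str, exp: str):
    """Return (index_document, links_to_follow) for one HTML_tag/exp pair."""
    collector = TagTextCollector(tag)
    collector.feed(html)
    collector.close()
    # Assumption: "contains the text of exp" means exp may appear anywhere
    # inside the tag text, so the pattern is wrapped in '*' on both sides.
    skip = any(fnmatchcase(text, "*" + exp + "*") for text in collector.texts)
    # Links are returned either way: the page is always parsed for links.
    return (not skip, collector.links)


page = (
    "<html><head><title>personnel records</title></head>"
    '<body><a href="/staff.html">Staff</a></body></html>'
)
index_it, links = apply_indskip(page, "title", "personnel")
print(index_it, links)  # prints False ['/staff.html']
```

The page is excluded from the index, but its link is still returned for crawling, which is the distinction between -indskip and an option that skips the page entirely.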
Example
To avoid indexing directory listing pages, while still following the document and path
links on those pages, except for the link up to the parent directory, use one of the
following, depending on the Web server being indexed:
For Netscape Web servers, use the following:
-indskip title "*Index of*"
-nofollow "*parent directory*"
For Microsoft Internet Information Server, use the following:
-indskip a "*to parent directory*"
-nofollow "*parent directory*"
-maxdocsize
Syntax: -maxdocsize integer
Specifies the maximum size, in kilobytes, of documents to be indexed. Any
document larger than the value specified by -maxdocsize is ignored.
By default, documents of any size are indexed.
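The size check can be pictured with a short Python sketch. The function name within_maxdocsize is hypothetical, and the conversion of one kilobyte to 1024 bytes is an assumption; the text above does not say whether Verity Spider uses 1000 or 1024.

```python
from typing import Optional


def within_maxdocsize(size_bytes: int, maxdocsize_kb: Optional[int] = None) -> bool:
    """Return True if a document of size_bytes should be indexed.

    maxdocsize_kb=None models the default: documents of any size are indexed.
    """
    if maxdocsize_kb is None:
        return True
    # Assumption: 1 kilobyte = 1024 bytes.
    return size_bytes <= maxdocsize_kb * 1024


print(within_maxdocsize(5_000_000))        # no limit set
print(within_maxdocsize(5_000_000, 1024))  # limit of 1024 KB
```

A document exactly at the limit is kept here, because the text says only documents larger than the limit are ignored.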
Chapter 8 Verity Spider