ixparse man page on OPENSTEP


IXPARSE(1)							    IXPARSE(1)

NAME
       ixparse - generate and convert text processing information files

SYNOPSIS
       /usr/bin/ixparse	    [ -aAbChHfgnNprUvwWx ]    [ -ttype ]    [ -Dfile ]
       [ -Ffile ]  [ -Sfile ]  [ -Llanguage ]  [ -M# ]	[ -P# ]	  [ -ystring ]
       [ file ... ]

DESCRIPTION
       Given a list of files, or a stream on standard input, ixparse generates
       one of four types of profiling information on  standard	output.	  With
       the -v option, ixparse can also generate profiling information for each
       input file; the output is put into separate files named	by  adding  an
       extension to the input file's name (see below for the extensions).

       The  four  types	 of  profile  are:  weighting  domain (a binary format
       defined by the  Indexing	 Kit's	IXWeightingDomain  class),  histogram,
       description,  and Attribute Reader Format.  The binary weighting domain
       format is undocumented.	A description is a short summary that  can  be
       derived	from  some file formats, such as UNIX manual pages.  Attribute
       Reader Format is described in the Indexing  Kit	documentation  in  the
       NEXTSTEP General Reference.  Histogram format is described below.

       Weighting  domain  files	 can  be  used	with  ixbuild(1) or again with
       ixparse to alter the weighting of tokens in the index or profile.   For
       example, a weighting domain could be generated for all the source files
       in a development project:

	      ixparse -W *.[cm] >project.weight

       and that file could be used again with ixparse:

	      ixparse -Hp -Dproject.weight MyObject.m

       The result would be a histogram where the weights of words  are	skewed
       such  that  if  two words occur the same number of times in MyObject.m,
       those occurring less frequently in the entire set of source files (that
       is, in the domain file project.weight) have higher weights.
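
       The skew described above can be pictured as dividing a token's
       frequency in the file by its frequency in the domain.  The following
       Python sketch is illustrative only: the formula ixparse actually uses
       is not documented in this page, and peculiarity_weights, its
       parameters, and the sample values are hypothetical.

```python
from typing import Dict


def peculiarity_weights(
    local_freq: Dict[str, float],
    domain_freq: Dict[str, float],
    floor: float = 1e-9,
) -> Dict[str, float]:
    """Weight each token by how much more common it is locally than in the domain.

    ASSUMPTION: a plain ratio stands in for ixparse's undocumented formula.
    `floor` avoids division by zero for tokens missing from the domain.
    """
    return {
        token: freq / max(domain_freq.get(token, 0.0), floor)
        for token, freq in local_freq.items()
    }


# Two tokens with the same frequency in one file (hypothetical values).
local = {"self": 0.1, "parseRTF": 0.1}
# Their frequencies across the whole project (hypothetical values).
domain = {"self": 0.05, "parseRTF": 0.001}

weights = peculiarity_weights(local, domain)
# The token that is rarer in the domain ends up with the higher weight.
print(weights["parseRTF"] > weights["self"])
```

       Under this assumption, two tokens that are equally frequent in the
       file are ranked by how rare they are in the domain, which matches the
       behavior described above.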

       In addition to generating profiling information for text files, ixparse
       can read	 existing  profiles  in	 weighting  domain  format,  histogram
       format,	and  NEXTSTEP Release 2 Word Frequency Table (WFTable) format,
       converting that information to one of the other formats.

HISTOGRAM FORMAT
       Each line of a file in histogram format has the form:

	      token weight rank

       token is the token or word in the index, weight is its weight
       (frequency) in the domain, and rank is its rank in the domain
       (1 = most common, 2 = second most common, and so on).  rank is
       present only in histograms produced by converting from weighting
       domains.  The fields of the line are separated by single spaces.
       Because a token can contain embedded spaces or tabs, parse each line
       backward from its end: split off the numeric fields first and treat
       whatever remains as the token.
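
       A reader for this format might look like the following Python sketch.
       It splits from the right so that spaces or tabs inside the token are
       preserved.  The names HistogramEntry and parse_histogram_line are
       hypothetical, and the sketch assumes that the caller knows whether the
       rank field is present and that weights are written as ordinary decimal
       numbers.

```python
from typing import NamedTuple, Optional


class HistogramEntry(NamedTuple):
    token: str
    weight: float
    rank: Optional[int]


def parse_histogram_line(line: str, has_rank: bool) -> HistogramEntry:
    """Parse one 'token weight [rank]' line, working backward from its end."""
    text = line.rstrip("\n")
    # rsplit on a single space peels the numeric fields off the right-hand
    # side; everything to the left, spaces and tabs included, is the token.
    expected = 3 if has_rank else 2
    fields = text.rsplit(" ", expected - 1)
    if len(fields) != expected:
        raise ValueError(f"malformed histogram line: {line!r}")
    if has_rank:
        token, weight, rank = fields
        return HistogramEntry(token, float(weight), int(rank))
    token, weight = fields
    return HistogramEntry(token, float(weight), None)


# A token with an embedded space, from a histogram that carries ranks.
print(parse_histogram_line("New York 12 3\n", has_rank=True))
```

       Passing has_rank explicitly is a deliberate choice: when a token ends
       in digits, a line with no rank field cannot be told apart from one
       with a rank by inspection alone.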

OPTIONS
       --	  List these options.

       The following options select input and output formats.  Only one of the
       input  options -t, -h, -w, and -x and one of the output options -H, -g,
       -W, and -b can be specified.

       -ttype	  Interpret input as file type type (for example, -trtf for
		  Rich	 Text	Format).   By  default,	 ixparse  attempts  to
		  determine the file type for each file automatically.

       -w	  Interpret input as weighting domain format.

       -h	  Interpret input as histogram format.

       -x	  Interpret input as NEXTSTEP Release 2 WFTable format.

       -H	  Generate output in histogram format.	This is the default.

       -g	  Generate output as descriptions of file contents.

       -W	  Generate output in weighting domain format.

       -b	  Generate output in Attribute Reader Format.

       -v	  Vector mode.  Generate an output file for each input file.
		  Histogram and Attribute Reader Format files have an
		  extension of .histogram (this is a bug; Attribute Reader
		  Format files should use .arf).  Weighting domain format
		  files have an extension of .weight.  Description files have
		  an extension of .description.

       The  remaining  options	control	 other	parsing switches and weighting
       calculations.

       -a	  Use absolute weighting.  The weight of a token (word) is its
		  number of occurrences in the input.

       -A	  Don't	 fold  plural word forms.  The default is to do plural
		  folding.

       -C	  Don't fold case to lower case.  The default is to fold case.

       -Dfile	  Use	the   supplied	 weighting   domain   file    (default
		  .index.domain).   This  is  used  for generating peculiarity
		  weighting.

       -f	  Use frequency	 weighting  (number  of	 occurrences  /	 total
		  tokens).

       -Ffile	  Use	the   supplied	 file	type   table   file   (default
		  .index.ftype).  See the  ixbuild(1)  manual  page  for  more
		  information on file type tables.

       -Llanguage Parse	 files	as  though  they  contain text in the language
		  language.  If no language is specified, the  system  default
		  language is used.

       -M#	  Use the supplied minimum weight; words below this weight are
		  dropped from the index.  The default is no  minimum  weight.
		  This option excludes use of the -P option.

       -n	  Sort histogram output by name rather than weight.

       -N	  Do not sort histogram output.

       -p	  Use  peculiarity  weighting  in conjunction with a weighting
		  domain (see -D).

       -P#	  Use  the  supplied  percentage  passed;  words  below	  this
		  percentage  are dropped from the index.  The default is 100%
		  passed.  This option excludes use of the -M option.

       -r	  Reduce words to stems (for example, writer -> write).  The
		  default is not to do this.

       -Sfile	  Use  the  supplied  stop words file (default .index.swords).
		  See the ixbuild(1) manual page for more information on  stop
		  words files.

       -U	  Disable  uniquing  in	 Attribute  Reader  Format.   See  the
		  Attribute Reader Format documentation for more information.

       -ystring	  Use the supplied punctuation string to  delimit  words;  for
		  example, -y".,; ".
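
       The arithmetic behind the -a, -f, and -M options can be sketched in
       Python as follows.  This follows only the definitions given above
       (occurrence count, count divided by total tokens, and a minimum-weight
       cutoff); the function names are hypothetical, and the sketch does not
       model folding, stemming, stop words, or ixparse's own tokenizer.

```python
from collections import Counter
from typing import Dict, Iterable


def absolute_weights(tokens: Iterable[str]) -> Dict[str, float]:
    """-a style: a token's weight is its number of occurrences."""
    return {token: float(n) for token, n in Counter(tokens).items()}


def frequency_weights(tokens: Iterable[str]) -> Dict[str, float]:
    """-f style: number of occurrences divided by total tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {token: n / total for token, n in counts.items()}


def drop_below(weights: Dict[str, float], minimum: float) -> Dict[str, float]:
    """-M style: discard tokens whose weight is below the minimum."""
    return {token: w for token, w in weights.items() if w >= minimum}


tokens = ["index", "file", "index", "token"]
print(absolute_weights(tokens))
print(frequency_weights(tokens))
print(drop_below(frequency_weights(tokens), 0.3))
```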

SEE ALSO
       ixbuild(1), ixsearch(1), Indexing Kit Documentation in NEXTSTEP General
       Reference

BUGS
       ixparse doesn't read data in Attribute Reader Format.

       ixparse filters files from various formats during parsing.   It	should
       make the intermediate filtered formats available as output options.

       Sorting options don't apply when converting from weighting domain
       format to histogram format.

       Output files generated by vector mode in Attribute Reader Format should
       use .arf as their extension, not .histogram.

NeXT Computer, Inc.		August 24, 1993			    IXPARSE(1)
