'\" t
.TH groff_sanitize @MAN7EXT@ "@MDATE@" "groff-pdfmark @VERSION@"
.
.SH Name
groff_sanitize \- filter unwanted constructs out of groff data streams
.
.\" ====================================================================
.\" Legal Terms
.\" ====================================================================
.\"
.\" Copyright (C) 2024, Free Software Foundation, Inc.
.\"
.\" This file is part of groff-pdfmark, an independently maintained
.\" add-on for the GNU roff type-setting system.
.\"
.\" Permission is granted to copy, distribute and/or modify this
.\" document under the terms of the GNU Free Documentation License,
.\" Version 1.3 or any later version published by the Free Software
.\" Foundation; with no Invariant Sections, no Front-Cover Texts,
.\" and no Back-Cover Texts.
.\"
.\" A copy of the Free Documentation License is included as a file
.\" called fdl-v1.3.txt, in the fdl directory of the groff-pdfmark
.\" source package, whence it is programmatically marked up, to be
.\" processed by groff -m pdfmark, for inclusion as an appendix to
.\" the pdfmark.pdf document.
.\"
.\" ====================================================================
.
.\" Save and disable compatibility mode (e.g., for Solaris 10/11).
.nr _? \n(.C
.do rnn _? *groff_sanitize_7_man_C
.cp 0
.
.\" ====================================================================
.\" Local Macro Definitions
.\" ====================================================================
.
.\" @IMPORT_LOCAL_FALLBACK_MACROS@
.
.\" ====================================================================
.SH Description
.\" ====================================================================
.
The
.B \%groff_sanitize
auxiliary macro package provides a collection of filters,
all of which are accessed through a common
.B \%.\^sanitize
macro call,
for substitution,
or removal,
of particular
.MR groff @MAN7EXT@
constructs,
from copies of selected fragments of
the document source input data stream.
Such filtering of the input data may be useful,
for example,
when creating a PDF document outline,
or similar,
derived from some content within the document body,
whence any specified formatting controls may not be appropriate
for inclusion within the outline specification.
.
.
.\" ====================================================================
.SH Usage
.\" ====================================================================
.
The
.B \%groff_sanitize
macros
.I may
be loaded using a command line option,
as prescribed by the conventional syntax for
.MR groff @MAN1EXT@
and
.MR pdfroff @MAN1EXT@
commands:
.sp -0.25v
.P
.RS 3n
.ll -3n
.SY groff
.RI \|[\- option \ .\|.\|.\&]\|
.B \-m sanitize
.RI \|[\- option \ .\|.\|.\&]
.RI \^[ file \ .\|.\|.\&]
.
.SY pdfroff
.RI \|[\- option \ .\|.\|.\&]\|
.B \-m sanitize
.RI \|[\- option \ .\|.\|.\&]
.RI \^[ file \ .\|.\|.\&]
.YS
.ll +3n
.RE
.P
However,
these macros may be more commonly loaded from within document source,
or,
perhaps even more commonly still,
from within another dependent macro package,
using a request of the form:
.sp -0.25v
.P
.RS 3n
.SY .\^mso\0sanitize.tmac
.YS
.RE
.P
After the
.B \%groff_sanitize
macros have been loaded,
the entire gamut of their associated filters may be applied,
to some specific text,
by a macro call of the form:
.sp -0.25v
.P
.RS 3n
.SY .\^sanitize
.RI < varname >
.RI < text \ .\|.\|.\&>
.YS
.RE
.sp -0.25v
.P
in which the
.RI < varname >
argument is required;
it
.I must
represent a valid
.MR groff @MAN7EXT@
name for a string,
in which the filtered value of the
.RI < text \ .\|.\|.\&>
argument is to be stored.
.
.P
In practice,
there is little to be gained by calling
.B \%.\^sanitize
.I directly
from the top level of any document source;
more practical usage encapsulates a
.B \%.\^sanitize
macro call within another macro,
(which may be either user-defined,
or provided by another macro package),
such that the original text is used,
in two or more distinct contexts,
with at least one context using the filtered text,
while another uses the original unfiltered form;
such usage is illustrated in the
.SR 1 Examples
section of this manual page.
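.
.P
As a minimal sketch of such a direct call,
which is offered for illustration only,
and in which the string name
.B \%demo:title
is an arbitrary choice,
the following document source fragment loads the macros,
sanitizes a short text fragment,
and then writes the content of the resultant string
to the standard error stream:
.
.RS 3n
.sp 0.25v
.EX
\&.mso sanitize.tmac
\&.sanitize demo:title Using \(rsfIgroff\(rsfP\(rs\(timacros
\&.tm \(rs*[demo:title]
.EE
.RE
.
.P
Assuming that the default filters,
as described in the
.SR 1 \%Filter\ Actions
section,
remain in effect,
the font selection escape sequences should be discarded,
and the \%\(lq\fC\(rs\(ti\fP\(rq escape sequence
should be replaced by an ASCII space.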
.
.
.\" ====================================================================
.SH Principle of Operation
.\" ====================================================================
.
On entry to the
.B \%sanitize
macro,
its
.I \%first
argument,
which is designated as
.RI \%\(lq< varname >\(rq,
and which
.I \%must
represent a valid
.MR groff @MAN7EXT@
identifier,
is interpreted as the name of a string
in which the resultant sanitized text is to be returned
to the calling macro;
this string is defined,
and initialized as an
.I \%empty
(i.e.\& zero-length) string,
and is also assigned an internal alias of
.BR \%sanitize:result .
.
A further internally named string,
.BR \%sanitize:residual ,
is also defined,
and initialized with a value which comprises
the aggregate content of the second,
and any additional arguments,
(designated as
.RI \%\(lq< text \ .\|.\|.\&>\(rq);
when more than one argument is incorporated into
the aggregate which forms
.RI \%\(lq< text \ .\|.\|.\&>\(rq,
then, before the content of each additional argument,
a token for one \%word-space is introduced into
.BR \%sanitize:residual ,
to separate the content of that argument
from that of its predecessor.
.
.P
Following this initialization,
.B \%sanitize
enters a cyclic processing phase,
wherein one token is removed from the beginning of
.B \%sanitize:residual
in each cycle,
with additional cycles continuing until no further tokens remain.
.
The token which is removed,
within each cycle,
is examined to determine how it should be processed
in that particular cycle;
the \%token-specific processing actions are:
.
.RS 2n
.ll -3n
.IP \(bu 2n
Any regular character token is simply appended to
.BR \%sanitize:result ,
and processing continues with the next cycle.
.
.IP \(bu 2n
A special token,
which corresponds to one of
.MR groff @MAN7EXT@ 's
escape sequences with a \%single-token representation,
and for which an associated
.B \%groff_sanitize
filter,
(see the
.SR 1 \%Filter\ Actions
section),
has been specified,
will either be discarded,
or it will be replaced by some designated
.IR "substitute text" ,
which is appended to
.BR \%sanitize:result ,
before commencing the next cycle.
.
.IP \(bu 2n
Any instance of the
.MR groff @MAN7EXT@
.I escape character
token will initiate an intermediate
.I \%look-ahead
processing cycle;
this will examine subsequent tokens,
within
.BR \%sanitize:residual ,
to identify a particular
.MR groff @MAN7EXT@
escape sequence,
and any arguments which may be associated with it.
When any such escape sequence has been identified,
and it is associated with a
.B \%groff_sanitize
filter,
.I all
tokens which have been consumed by the
.I \%look-ahead
cycles will be removed from
.BR \%sanitize:residual ,
and will either simply be discarded,
or some designated
.I substitute text
will be appended to
.BR \%sanitize:result ,
depending on which of the particular
.SR 1 \%Filter\ Actions
have been specified,
before proceeding to the next principal
token analysis cycle.
.
.ll +3n
.RE
.
.P
When the preceding cyclic processing has exhausted
.I all
of the tokens from
.BR \%sanitize:residual ,
.I all
of the
.B \%sanitize
macro's internally defined local identifiers,
.I including
.BR \%sanitize:residual ,
and
.BR \%sanitize:result ,
will be deleted,
leaving the sanitized text,
which had been accumulated in
.BR \%sanitize:result ,
in the string named by the
.RI \%< varname >
argument,
for return to the caller.
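.
.P
As an illustrative sketch of this processing,
which is inferred solely from the preceding description,
and which assumes that the default filters,
as described in the
.SR 1 \%Filter\ Actions
section,
remain in effect,
consider a hypothetical call,
in which the string name
.B \%demo
is an arbitrary choice:
.
.RS 3n
.sp 0.25v
.EX
\&.sanitize demo A\(rs-B\(rsfIc\(rsfP
.EE
.RE
.
.P
The successive processing cycles would then be expected
to proceed as follows:
.
.RS 3n
.sp 0.25v
.EX
residual       action                          result
A\(rs-B\(rsfIc\(rsfP    append regular token            A
\(rs-B\(rsfIc\(rsfP     substitute \(dq-\(dq (scan.subst)     A-
B\(rsfIc\(rsfP       append regular token            A-B
\(rsfIc\(rsfP        discard \(rsfI (esc\-f)             A-B
c\(rsfP           append regular token            A-Bc
\(rsfP            discard \(rsfP (esc\-f)             A-Bc
.EE
.RE
.
.P
At this point,
no further tokens remain to be processed,
and the accumulated text should be left in the string named
.BR \%demo .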
.
.
.\" ====================================================================
.SH Reserved Identifiers
.\" ====================================================================
.
The
.B \%groff_sanitize
macro package reserves regions of both the
.MR groff @MAN7EXT@
string and the numeric register namespaces,
in which all internal identifiers begin with the label
.RB \%\(lq sanitize: \(rq;
with the exception of those macros, strings,
or numeric registers which are
.I explicitly
documented as
.RI \(lq user-definable \(rq,
or
.RI \(lq user-modifiable \(rq,
in the
.SR 1 \%Filter\ Actions
section,
users are
.I strongly
advised to avoid defining,
or modifying,
any macro, string, or numeric register
with a name which begins with this label.
.
.
.\" ====================================================================
.SH Filter Actions
.\" ====================================================================
.
The collection of filters,
which is predefined by the
.B \%groff_sanitize
package,
comprises:
.
.RS 2n
.ll -3n
.IP \(bu 2n
.B \%sanitize:scan.reject\0\c
.RI < token-list >\|.\|.\|.
.sp 0.2v
This is a token elimination filter.
Implemented as a
.MR groff @MAN7EXT@
string,
its value represents a sequence of optionally quoted
.RI \%< token-list >
specifications,
each of which takes the form:
.sp -0.25v
.IP \& 5n
.RI \%["]< opening-delimiter >< reject-token\c
.RI \~.\|.\|.\^>< closing-delimiter >["]
.
.IP \& 2n
Within each such
.RI \%< token-list >
specification,
the
.RI \%< reject-token >
part comprises a list of one or more
.MR groff @MAN7EXT@
entities,
each of which yields a
.I single input token
when read in copy mode.
Each
.RI \%< reject-token >
list
.I must
be enclosed within a pair of arbitrary delimiter tokens,
the
.RI \%< opening-delimiter >
and the
.RI \%< closing-delimiter >,
which
.I must
be represented by
.I identically
the same input token;
this
.I must not
appear
.I anywhere
within the enclosed
.RI \%< reject-token >
list.
.
.IP \& 2n
During
.B \%sanitize
macro processing,
each token which is extracted from
.B \%sanitize:residual
is compared,
in turn,
with each token which appears in the aggregate collection of
.RI \%< token-list >
constituents,
which comprise the value of
.BR \%sanitize:scan.reject ,
until either a matching token is found,
or all tokens in the aggregate
.RI \%< token-list >
have been compared,
and no matching token has been found.
If a matching token
.I is
found,
nothing is added to
.BR \%sanitize:result ,
and the
.B \%sanitize
macro moves on,
to process the next available token,
if any,
in
.BR \%sanitize:residual .
.
.IP \& 2n
By default,
.B \%groff_sanitize
defines
.B \%sanitize:scan.reject
with an aggregate
.RI \%< token-list >
which comprises the trio of tokens,
\(lq\fC\(rs&\fP\(rq,
\(lq\fC\(rs%\fP\(rq,
and \(lq\fC\(rs:\fP\(rq,
as established by the specification:
.sp -0.25v
.IP \& 5n
.BR \%.\^ds\ \|sanitize:scan.reject\0\c
\(dq\|\(aq\fC\(rs&\(rs%\(rs:\fP\(aq\^\fC\(rs\fP\(dq
.IP \& 2n
This specification is
.IR \%user-modifiable ,
either by use of an alternative
.RB \%\(lq .\^ds \(rq
request,
to redefine the collection of
.RI \%< token-list >
specifications,
(of which there is just the one in the default case),
in its entirety,
or by use of an
.RB \%\(lq .\^as \(rq
request,
to append additional
.RI \%< token-list >
specifications to it;
in the latter case,
each additional
.RI \%< token-list >
specification,
which is individually defined,
.I must
conform
to the
.sp -0.25v
.IP \& 5n
.RI \%< opening-delimiter >< reject-token \~.\|.\|.\^>< closing-delimiter >
.sp -0.25v
.IP \& 2n
pattern,
and
.I must
be separated from its predecessor by one or more white space tokens,
within the definition of the
.B \%sanitize:scan.reject
string.
.
.IP \& 2n
When
.B \%sanitize:scan.reject
is defined to incorporate more than one
.RI \%< token-list >
specification,
while the
.RI \%< opening-delimiter >
and
.RI \%< closing-delimiter >
within each
.I must
be represented by the
.I same
token,
it is
.I not
necessary to use the same delimiter token for
.I every
individual
.RI \%< token-list >
specification;
it is permitted,
and may be convenient,
to employ a token which is included within the
.RI \%< reject-token >
list of one
.RI \%< token-list >
specification,
as the delimiter for another,
or analogously,
to include the delimiter token of one
.RI \%< token-list >
specification in the
.RI \%< reject-token >
list of another.
.
.IP \& 2n
Subject to the restrictions that each
.I must
be represented by a single
.MR groff @MAN7EXT@
input token,
and that the chosen token
.I must not
appear within any
.RI \%< reject-token >
list for which it is specified as a delimiter,
the choice of delimiter tokens is entirely arbitrary.
The ASCII apostrophe,
\%\(lq\|\(aq\|\(rq,
is usually a suitable choice.
As an alternative,
the ASCII double quote character,
\%\(oq\|\(dq\|\(cq,
may be considered,
'ne 2v
but this is a less convenient choice,
for reasons which will be explained later,
in the
.SR 1 "Caveats and Bugs"
section.
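.
.IP \& 2n
By way of a hedged example,
which assumes that each of the
\%\(lq\fC\(rs|\fP\(rq and \%\(lq\fC\(rs\(ha\fP\(rq
escape sequences yields a single input token,
when read in copy mode,
the following request appends a second
.RI \%< token-list >
specification to the default definition,
with the intent that these narrow space escape sequences
should also be discarded from sanitized text:
.sp -0.25v
.IP \& 5n
.BR \%.\^as\ \|sanitize:scan.reject\0\c
\(dq\0\(aq\fC\(rs|\(rs\(ha\fP\(aq\^\fC\(rs\fP\(dq
.IP \& 2n
Here,
the leading ASCII double quote serves only to preserve
the white space which separates the appended
.RI \%< token-list >
specification from its predecessor;
it does not become part of the string value.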
.
.
.IP \(bu 2n
.B \%sanitize:scan.subst
.RI \%< substitution-group >\ .\|.\|.\&
.sp 0.2v
This is a token substitution filter.
Like the
.B \%sanitize:scan.reject
filter,
this is also implemented as a
.MR groff @MAN7EXT@
string;
in this case, its value represents a sequence of space-separated,
optionally quoted token
.RI \%< substitution-group >
specifications,
each of which takes the form:
.IP \& 5n
.RI \%["]< start-delimiter >< token \~.\|.\|.\^>\c
.RI < mid-delimiter >< substitute-text >< end-delimiter >["]
.
.IP \& 2n
Processing of this filter occurs
.I after
the
.B \%sanitize:scan.reject
filter,
and then
.I only
if that filter
.I did not
match the current input token.
.
.IP \& 2n
The effect of this filter is fundamentally analogous to that of
.BR \%sanitize:scan.reject ,
insofar as it processes each of its
.RI \%< substitution-group >
specifications in turn,
comparing the current input token from
.B \%sanitize:residual
to each individual
.RI \%< token >
which has been specified between the
.RI \%< start-delimiter >
and the
.RI \%< mid-delimiter >
tokens;
if a token match is found,
whereas the
.B \%sanitize:scan.reject
filter terminates its processing
.I without
adding
.I any
further content to
.BR \%sanitize:result ,
the
.B \%sanitize:scan.subst
filter appends the content which is specified within the
.RI \%< substitute-text >
field of the matching
.RI \%< substitution-group >,
to
.BR \%sanitize:result ,
discards the matching input token from
.BR \%sanitize:residual ,
and terminates its action,
with
.I no
consideration of any further
.RI \%< substitution-group >
comparisons for the input token.
.
.IP \& 2n
Analogously to the choice of
.RI \%< opening-delimiter >
and
.RI \%< closing-delimiter >
tokens,
within the definition of the
.B \%sanitize:scan.reject
filter,
the
.RI \%< start-delimiter >,
.RI \%< mid-delimiter >,
and
.RI \%< end-delimiter >
tokens,
within each individual
.RI \%< substitution-group >
within the definition of the
.B \%sanitize:scan.subst
filter,
.I must all
be represented by
.I the same
token,
but it is permitted to choose different delimiter tokens,
in distinct
.RI \%< substitution-group >
specifications.
.
.IP \& 2n
By default,
.B \%groff_sanitize
provides a definition of the
.B \%sanitize:scan.subst
filter,
comprising
.I two
initial
.RI \%< substitution-group >
specifications,
thus:
.sp -0.25v
.IP \& 5n
.RB \% .\^ds \0 sanitize:scan.subst \c
.RB \0\(dq\^\(aq\fC\(rs-\fP\^\(aq\^\fC-\fP\^\(aq\c
.RB \0\(dq\^\(aq\fC\(rs\fP\~\fC\(rs\(ti\^\fP\(aq\c
.RB \fC\0\fP\(aq\^\(dq\fC\(rs\fP\(dq
.sp -0.25v
.IP \& 2n
the effect of which is to request replacement of
any instance of a \(lq\fC\(rs-\fP\(rq input token
with an \%ASCII \%hyphen-minus token,
and any instance of either a \(lq\fC\(rs\fP<\fCSP\fP>\(rq
input token,
or a \(lq\fC\(rs\(ti\fP\(rq input token,
with an \%ASCII space \%(i.e.\~\(lq<\fCSP\fP>\(rq)
token,
in the resultant sanitized text.
.
.IP \& 2n
Notice that,
in
.I both
of these default
.RI \%< substitution-group >
specifications,
the \%ASCII apostrophe is employed as the delimiter token.
Also note that the second of these
.RI \%< substitution-group >
specifications,
in its entirety,
is enclosed within a pair of \%ASCII double quote tokens.
This is necessary,
because the specification for this
.RI \%< substitution-group >
contains white space tokens;
the rationale for this requirement is explained later,
in the
.SR 1 "Caveats and Bugs"
section.
.
.IP \& 2n
As is the case for the
.B \%sanitize:scan.reject
filter,
the
.B \%sanitize:scan.subst
filter may be modified by the user,
either by redefining the specification string
in its entirety,
or by appending desired additional
.RI \%< substitution-group >
specifications to the end of the existing definition.
Note that,
when modifying the specification string,
individual
.RI \%< substitution-group >
specifications
.I must
be separated by white space;
within each
.RI \%< substitution-group >,
all three delimiters
.I must
be represented by
.I identically
the
.I same
token,
and any
.RI \%< substitution-group >,
in which white space is included,
.I must
be enclosed,
within the specification string,
between a pair of \%ASCII double quote character tokens.
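.
.IP \& 2n
As a hedged illustration of the latter technique,
and assuming that the \%\(lq\fC\(rs_\fP\(rq escape sequence
is represented by a single input token,
a request of the form:
.sp -0.25v
.IP \& 5n
.RB \% .\^as \0 sanitize:scan.subst \c
.RB \0\(dq\0\(aq\fC\(rs_\fP\(aq\fC_\fP\(aq\fC\(rs\fP\(dq
.sp -0.25v
.IP \& 2n
should append a
.RI \%< substitution-group >
specification which requests replacement of
any instance of that escape sequence
with an \%ASCII underscore.
The leading \%ASCII double quote,
which follows the string name,
merely preserves the separating white space;
since the
.RI \%< substitution-group >
itself contains no white space,
it need not be enclosed within \%ASCII double quotes.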
.
.
.IP \(bu 2n
.B \%sanitize:esc-char.subst
.RI \%< substitution-group >\ .\|.\|.\&
.sp 0.2v
This is an analogue of the
.B \%sanitize:scan.subst
filter,
except that,
whereas the latter specifies substitutions for escape sequences
such as \%\(lq\fC\(rs\(ti\fP\(rq,
and \%\(lq\fC\(rs\fP<\fCSP\fP>\(rq,
each of which is represented by a
.IR "single input token" ,
the
.B \%sanitize:esc-char.subst
filter supports defined substitutions
for any syntactically similar,
but
.I semantically different
escape sequence,
such as \%\(lq\fC\(rs0\fP\(rq,
for which the
.MR groff @MAN7EXT@
representation comprises
.IR "two separate input tokens" .
.
.IP \& 2n
The form of
.RI \%< substitution-group >
specifications for the
.B \%sanitize:esc-char.subst
filter:
.sp -0.25v
.IP \& 5n
.RI \%["]< start-delimiter >< escape \~.\|.\|.\^>\c
.RI < mid-delimiter >< substitute-text >< end-delimiter >["]
.sp -0.25v
.IP \& 2n
may
.I appear
to be
.I syntactically
similar to that for the
.B \%sanitize:scan.subst
filter:
.sp -0.25v
.IP \& 5n
.RI \%["]< start-delimiter >< token \~.\|.\|.\^>\c
.RI < mid-delimiter >< substitute-text >< end-delimiter >["]
.
.IP \& 2n
However,
the two differ
.I semantically
insofar as,
whereas the
.RI \%< token >
list specification for the
.B \%sanitize:scan.subst
filter is expected to comprise only
entities which are each represented by a single
.MR groff @MAN7EXT@
input token,
the corresponding
.RI \%< escape >
list specification,
which is expected by the
.B \%sanitize:esc-char.subst
filter,
should comprise one or more
.MR groff @MAN7EXT@
escape sequences,
each of which is represented by
.I exactly two
input tokens,
the first of which should be the
.MR groff @MAN7EXT@
escape character,
while the second is any other input token
which is representative of any valid
.MR groff @MAN7EXT@
escape sequence,
which does
.I not
take an argument.
.
.IP \& 2n
Apart from this semantic difference,
in the expression of their respective
.RI \%< substitution-group >
specifications,
the
.B \%sanitize:esc-char.subst
filter exhibits,
fundamentally,
the same behaviour as the
.B \%sanitize:scan.subst
filter:
when any escape sequence which is represented in an
.RI \%< escape >
list specification is encountered,
within the input text,
it is discarded,
and its corresponding
.RI \%< substitute-text >
is appended to the resultant sanitized text,
in its place.
.
.IP \& 2n
The
.B \%groff_sanitize
default definition of
.B \%sanitize:esc-char.subst
is:
.sp -0.25v
.IP \& 5n
.RB \% .\^ds \0 sanitize:esc-char.subst \c
.RB \0\(dq\^\(dq\^\(aq\fC\(rs0\fP\(aq\0\(aq\^\(dq\c
.RB \0\(aq\fC\(rs,\fP\h'-0.5n'\fC\(rs/\fP\(aq\|\(aq\fC\(rs\fP\(dq
.sp -0.25v
.IP \& 2n
which results in replacement,
within sanitized text,
of the \%\(lq\fC\(rs0\fP\(rq
.I input
escape sequence by a simple ASCII space,
and
.I removal
of both the \%\(lq\fC\(rs,\fP\(rq and \%\(lq\fC\(rs/\fP\(rq
escape sequences,
(effectively achieved by replacing each by
.IR nothing ).
.
.IP \& 2n
Just as the
.B \%sanitize:scan.reject
and
.B \%sanitize:scan.subst
filters may be modified,
or redefined,
to handle additional single-token escape sequences,
the
.B \%sanitize:esc-char.subst
filter may be modified,
or redefined,
to accommodate additional dual-token sequences;
as before,
when any
.RI \%< substitution-group >
specification includes white space,
that specification
.I must
be enclosed in ASCII double quotes,
within the string definition,
while all individual
.RI \%< substitution-group >
specifications
.I must
be separated by
.I unquoted
white space.
.
.IP \& 2n
Notice that there is no
.I direct
analogue of the
.B \%sanitize:scan.reject
filter,
for the removal of dual-token escape sequences;
however,
an equivalent effect may be achieved by use of the
.B \%sanitize:esc-char.subst
filter,
as is illustrated within its default definition for the removal of
the \%\(lq\fC\(rs,\fP\(rq and \%\(lq\fC\(rs/\fP\(rq escape sequences,
by definition of a
.RI \%< substitution-group >
specification with
.I nothing
in the
.RI \%< substitute-text >
field.
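.
.IP \& 2n
For example,
as a sketch only,
and on the assumption that neither the
\%\(lq\fC\(rsd\fP\(rq nor the \%\(lq\fC\(rsu\fP\(rq
vertical motion escape sequence is wanted in sanitized text,
both might be removed by appending a further
.RI \%< substitution-group >
specification,
with an empty
.RI \%< substitute-text >
field,
modelled on the second group of the default definition:
.sp -0.25v
.IP \& 5n
.RB \% .\^as \0 sanitize:esc-char.subst \c
.RB \0\(dq\0\(aq\fC\(rsd\(rsu\fP\(aq\|\(aq\fC\(rs\fP\(dq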
.
.
.IP \(bu 2n
.BI \%sanitize:esc\-(\^ ??
.RI \%< substitute-string >
.sp 0.2v
This is a generic template for a special character substitution filter;
it may be instantiated,
as required,
by defining strings of the form:
.sp -0.25v
.IP \& 5n
.BI .\^ds\0sanitize:esc\-(\^ ??
.RI \%\(dq substitute-string \(rs\(dq
.sp -0.25v
.IP \& 2n
in which the
.RI \%\(lq ?? \(rq
place-holder is replaced by any two-character special character name,
as documented in
.MR groff_char @MAN7EXT@ ,
to implement a substitution filter for
the corresponding named special character escape sequence,
such that,
when this escape sequence is encountered by the
.B \%sanitize
macro,
while processing its local
.B \%sanitize:residual
string,
the value which is specified as
.RI \%< substitute-string >
will be appended to
.BR \%sanitize:result ,
in place of the escape sequence.
.
.IP \& 2n
.B \%groff_sanitize
defines
.I four
default instances of the
.BI sanitize:esc\-(\^ ??
filter,
namely:
.sp -0.25v
.IP \& 5n
'nf
.BR \%.\^ds\0sanitize:esc\-(\^hy\0 \(dq\-\fC\(rs\fP\(dq
.BR \%.\^als\0sanitize:esc\-(\^mi\0sanitize:esc\-(\^hy
.BR \%.\^als\0sanitize:esc\-(\^en\0sanitize:esc\-(\^hy
.BR \%.\^ds\0sanitize:esc\-(\^em\0 \(dq\-\-\fC\(rs\fP\(dq
.fi
.sp -0.25v
.IP \& 2n
which result in the substitution of
a single \%ASCII \%hyphen-minus character,
in sanitized text,
in place of each \%\(lq\fC\(rs\h'-0.4n'(hy\fP\(rq,
\%\(lq\fC\(rs\h'-0.4n'(mi\fP\(rq,
or \%\(lq\fC\(rs\h'-0.4n'(en\fP\(rq input token,
and substitution of a conjoined pair of
\%ASCII \%hyphen-minus characters,
in place of each \%\(lq\fC\(rs\h'-0.4n'(em\fP\(rq input token.
.
.IP \& 2n
It may be observed that,
whereas any instance of a filter,
which is derived from this template,
is
.I \%always
defined in a format which
may seem to be indicative of
.BR \%troff 's
traditional
\%\(lq\fC\(rs\h'-0.4n'(??\fP\(rq
representation of two-character special character escape sequences,
the
.B \%sanitize
macro will recognize
.MR groff @MAN7EXT@ 's
alternative
.RI \%\(lq\fC\(rs\h'-0.3n'[??]\fP\(rq
representation as being equivalent,
and will process it accordingly,
and so,
also append the assigned value,
corresponding to
.RI \%< substitute-string >
to
.BR \%sanitize:result ,
in place of this alternative representation
of the escape sequence.
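.
.IP \& 2n
As an illustrative sketch,
in which the substitute text is merely an assumed preference,
further instances might be defined,
to substitute plain ASCII approximations for the
\%\(lq\fC\(rs(co\fP\(rq copyright,
and \%\(lq\fC\(rs(rg\fP\(rq registered trade mark,
special characters:
.sp -0.25v
.IP \& 5n
'nf
.BR \%.\^ds\0sanitize:esc\-(\^co\0 \(dq(C)\fC\(rs\fP\(dq
.BR \%.\^ds\0sanitize:esc\-(\^rg\0 \(dq(R)\fC\(rs\fP\(dq
.fi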
.
.
.IP \(bu 2n
.BI \%sanitize:esc\-generic
.sp 0.2v
This is a generic template for an escape sequence elimination filter;
it may be instantiated to recognize
.MR groff @MAN7EXT@
escape sequences in any of the three forms,
.RI \%\(lq\fC\(rs\fP ?c \(rq,
.RI \%\(lq\fC\(rs\fP ? \fC(\fP cc \(rq,
or
.RI \%\(lq\fC\(rs\fP ? \fC[\fP.\|.\|.\fC]\fP\(rq,
in which the
.RI \%\(lq\^ ? \^\(rq
placeholder represents any single-character
.I function identifier
for an escape sequence which takes a single-character argument,
.RI \%\(lq\^ c \^\(rq,
a two-character argument,
.RI \%\(lq cc \(rq,
or an arbitrary length argument,
.RI \%\(lq\fC[\fP.\|.\|.\fC]\fP\&\(rq.
.
It may be instantiated for any specific escape sequence,
with
.I function identifier
.RI \%\(lq\^ ? \^\(rq,
by defining an alias of the form:
.sp -0.25v
.IP \& 5n
.BI \%.\^als\0sanitize:esc\- ? \0sanitize:esc\-generic
.sp -0.25v
.IP \& 2n
which then has the effect of removing any instance of
.RI \%\(lq\fC\(rs\fP ? \(rq,
together with its argument,
in any of the three supported formats,
from
.BR \%sanitize:residual ,
while adding
.I \%nothing
to the sanitized text
which is to be returned through
.BR \%sanitize:result .
.
.IP \& 2n
.B \%groff_sanitize
defines
.I two
default instances of the
.B \%sanitize:esc\-generic
filter,
namely:
.sp -0.25v
.IP \& 5n
'nf
.B \%.\^als\0sanitize:esc\-f\0sanitize:esc\-generic
.B \%.\^als\0sanitize:esc\-F\0sanitize:esc\-generic
.fi
.sp -0.25v
.IP \& 2n
the combined effect of which prevents the propagation of any
.RI \%\(lq\fC\(rs\fPf c \(rq,
.RI \%\(lq\fC\(rs\fPf\fC(\fP cc \(rq,
.RI \%\(lq\fC\(rs\fPf\fC[\fP.\|.\|.\fC]\fP\(rq,
.RI \%\(lq\fC\(rs\fPF c \(rq,
.RI \%\(lq\fC\(rs\fPF\fC(\fP cc \(rq,
or
.RI \%\(lq\fC\(rs\fPF\fC[\fP.\|.\|.\fC]\fP\(rq,
escape sequence into sanitized text;
users may wish to extend the effect of this filter,
by defining additional aliases,
modelled on this default pair,
to suppress propagation of other
syntactically similar escape sequences.
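.
.IP \& 2n
For instance,
as a sketch which assumes that colour selection
is not wanted in sanitized text,
the \%\(lq\fC\(rsm\fP\(rq and \%\(lq\fC\(rsM\fP\(rq
escape sequences,
which accept their arguments in the same three forms,
might be suppressed by defining:
.sp -0.25v
.IP \& 5n
'nf
.B \%.\^als\0sanitize:esc\-m\0sanitize:esc\-generic
.B \%.\^als\0sanitize:esc\-M\0sanitize:esc\-generic
.fi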
.
.
.IP \(bu 2n
.BI \%sanitize:esc\-delimited
.sp 0.2v
This is a complement for the
.RI \%\(lq\fC\(rs\^\fP ? \fC\h'-0.2n'[\fP.\|.\|.\&\fC]\fP\(rq
form of the
.B \%sanitize:esc\-generic
filter;
like the latter,
it eliminates a functional escape sequence,
in which the
.I function identifier
token is followed by an arbitrary length argument,
from the resultant sanitized text;
it differs from the latter in the form in which
that argument is expressed.
.
.IP \& 2n
Whereas the
.B \%sanitize:esc\-generic
filter expects an arbitrary length escape sequence argument
to be expressed in the form \%\(lq\fC[\fP.\|.\|.\&\fC]\fP\(rq,
the
.B \%sanitize:esc\-delimited
filter expects the argument to the escape sequence to have the form:
.sp -0.25v
.IP \& 5n
.RI \%< opening-delimiter >< argument-text >< closing-delimiter >
.sp -0.25v
.IP \& 2n
with the
.RI \%< opening-delimiter >
and the
.RI \%< closing-delimiter >
being represented by
.I identically
the same arbitrarily chosen input token;
it is assumed that this arbitrarily chosen delimiter token does
.I not
appear
.I anywhere
within
.RI \%< argument-text >.
.
.IP \& 2n
By default,
.B \%groff_sanitize
defines
.I two
instances of the
.B \%sanitize:esc\-delimited
filter,
namely:
.sp -0.25v
.IP \& 5n
'nf
.B \%.\^als\0sanitize:esc\-s\0sanitize:esc\-delimited
.B \%.\^als\0sanitize:esc\-v\0sanitize:esc\-delimited
.fi
.
.IP \& 2n
Users may define additional filters,
similar to these,
to support any other escape sequences which exhibit
similar semantics to this default pair.
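.
.IP \& 2n
As a hedged example,
the \%\(lq\fC\(rsh\fP\(rq horizontal motion escape sequence,
which takes a delimited argument of this form,
as in \%\(lq\fC\(rsh\(aq1m\(aq\fP\(rq,
might be suppressed by defining:
.sp -0.25v
.IP \& 5n
.B \%.\^als\0sanitize:esc\-h\0sanitize:esc\-delimited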
.ll +3n
.RE
.
.
'ne 4v
.\" ====================================================================
.SH Files
.\" ====================================================================
.
.TP 4n
.I \%@SITE_TMACDIR@/sanitize.tmac
Implements the
.B \%sanitize
macro,
and its supporting predefined
.B \%groff_sanitize
filters.
.
.
.\" ====================================================================
.SH Caveats and Bugs
.\" ====================================================================
.
Inclusion of any escape sequence,
which lacks an associated
.B \%groff_sanitize
filter action assignment,
within text which is to be sanitized,
may have unpredictable,
and undesirable,
effects.
.
.P
Passing input text,
which includes any use of the \%\(lq\fC\(rss\fP\(rq escape sequence,
in any of its supported forms other than the
.RI \%\(lq\fC\(rss< delimiter >< expression >< delimiter >\(rq
form,
to the
.B \%sanitize
macro,
will confuse the
.B \%sanitize:esc\-s
filter,
producing unpredictable,
and probably undesirable,
results in the sanitized text.
.
.P
If redefining either the
.BR \%sanitize:scan.reject ,
or the
.B \%sanitize:scan.subst
filter,
their associated
.RI \%< token >
list specifications are interpreted
.I strictly
as sequences of single-token entities,
each of which nominally represents a special escape sequence,
with no associated argument;
inclusion of any token sequence,
which does
.I not
represent such an entity,
will have unpredictable,
and most likely undesirable,
results.
.
.P
Conversely,
the
.RI \%< escape >
list specifications for the
.B \%sanitize:esc-char.subst
filter
.I must
comprise
.I only
dual-token escape sequences,
.I none
of which accept any argument;
inclusion of any tokens which are not paired,
with the escape character token as the first of the pair,
will not be interpreted as required,
to deliver the intended behaviour
of this filter.
.
.P
When redefining,
or otherwise modifying,
any of the
.BR \%sanitize:scan.reject ,
.BR \%sanitize:scan.subst ,
or
.BR \%sanitize:esc-char.subst
filter specifications,
it is important to understand how any ASCII double quote tokens
will be interpreted,
within these specification strings.
Effectively,
each of these strings may be passed,
.IR unquoted ,
within an argument list to an internal
.B \%groff_sanitize
macro,
and thus,
the argument grouping effect of the ASCII double quote token
will override any other intended effect.
Consequently,
while it is certainly possible to work around the limitation,
the choice of the ASCII double quote as a
.RI \%< substitution-group >
delimiter token may be less convenient
than some alternative choice.
.
.
.\" ====================================================================
.SH Examples
.\" ====================================================================
.
The
.B \%sanitize
macro is not,
typically,
called
.I directly
from any user's
.MR groff @MAN7EXT@
document source;
it
.IR is ,
however,
often incorporated into higher level macros,
such as the following example,
which inserts text,
the specification of which may incorporate
some arbitrary format controlling escape sequences,
as a heading,
into a PDF document body,
while placing a sanitized copy of the same text
into an associated PDF document outline:
.
.RS 3n
.sp 0.25v
.EX
\&.de H
\&.\(rs" Usage: .H \(lalevel\(ra \(latext\(ra ...
\&.\(rs"
\&.\(rs" Save the heading level argument.
\&.\(rs"
\&.   nr \(rs\(rs$0.level \(rs\(rs$1
\&.
\&.\(rs" Set each new heading one paragraph space below any
\&.\(rs" text which precedes it.
\&.\(rs"
\&.   sp \(rs\(rsn(PDu
\&.
\&.\(rs" Reduce arguments to heading text, and copy this to the
\&.\(rs" PDF document outline, at the specified nesting level.
\&.\(rs"
\&.   shift
\&.   sanitize \(rs\(rs$0.text \(rs\(rs$@
\&.   pdfhref O \(rs\(rsn[\(rs\(rs$0.level] -- \(rs\(rs*[\(rs\(rs$0.text]
\&.
\&.\(rs" Write a formatted copy of the heading text to the body
\&.\(rs" of the PDF document.
\&.\(rs"
\&.   ft B
\&.   nop \(rs&\(rs\(rs$*
\&.   ft 1
\&.
\&.\(rs" Clean up temporary local storage.
\&.\(rs"
\&.   rr \(rs\(rs$0.level
\&.   rm \(rs\(rs$0.text
\&..
.EE
.RE
.
.P
This heading macro might be invoked by a call such as:
.
.RS 3n
.sp 0.15v
.EX
\&.H 1 An Example PDF Heading with \(rsF[C]sanitize\(rsF[] Requirement
.EE
.sp 0.15v
.RE
which,
in the absence of the
.B \%sanitize
macro call within the
.B H
macro definition,
would result in artefacts of
the embedded change of font family escape sequences
infiltrating the corresponding PDF document outline reference.
.
.
.\" ====================================================================
.SH Authors
.\" ====================================================================
.
The
.B \%groff_sanitize
macros are provided by the
.I \%groff-pdfmark
auxiliary package,
which was written by
.MT @AUTHOR_MT_ADDRESS@
Keith\ Marshall
.ME ;
this is maintained independently of
.IR \%GNU\ roff ,
at
.UR @PROJECT_HOSTING_SITE@
Keith's
.I \%groff-pdfmark
project hosting \%web-site
.UE ,
whence the latest version may
.I always
be obtained.
.
.
.\" ====================================================================
.SH See Also
.\" ====================================================================
.
.\" @ENUMERATE_MR_REFERENCES@
.
.P
More comprehensive documentation,
on the use of the
.I \%groff-pdfmark
macro suite may be found,
in PDF format,
in the reference guide
.RI \(lq "Portable Document Format Publishing with GNU Troff" \(rq,
which has also been written by Keith Marshall;
the most recently published version of this guide may be read online,
by following the appropriate document reference link on
.UR @PROJECT_HOSTING_SITE@
the
.I \%groff-pdfmark
project hosting \%web-site
.UE ,
whence a copy may also be downloaded.
.
.\" ====================================================================
.
.\" Restore compatibility mode (for, e.g., Solaris 10/11).
.cp \n[*groff_sanitize_7_man_C]
.do rr *groff_sanitize_7_man_C
.
.\" ====================================================================
.\"
.\" Local Variables:
.\" fill-column: 72
.\" mode: nroff
.\" End:
.\"
.\" vim: set filetype=groff textwidth=72:
