P18 Internationalizing Preprocessor




The P18 preprocessor recognizes three different types of escape sequences: preprocessor directives, variable/macro escapes, and internationalization (I18N) escapes.

The preprocessor maintains a stack of conditions, which may be manipulated using the ## if, ## else, and ## endif directives. The preprocessor sends input characters to the output if this stack is empty or all conditions on the stack are TRUE (we'll say the preprocessor output is active if this is the case).

All input characters that are not escape sequences are sent to the output if the output is active and discarded if the output is not active.

Preprocessor Directives

Most preprocessor directives are used to control the preprocessor state. The directives recognized by P18 are described in section List of Directives. Directives take arguments of three different types (depending on the directive):
  1. Expression arguments. See section Preprocessor Expressions.
  2. Name arguments. A name argument is a valid identifier. See section Preprocessor Identifiers.
  3. Value arguments. A preprocessor value is always a string constant. See Preprocessor Values.

Every line starting with two successive hash marks is interpreted as a directive. Whitespace characters preceding the hash marks are ignored. Directive lines can be continued by putting a backslash character at the end of the line.
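
The recognition rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not P18's actual implementation; the function names is_directive and logical_lines are made up for the example, and the sketch ignores backslash quoting inside the line.

```python
def is_directive(line):
    """A line is a directive if its first non-whitespace characters are '##'."""
    return line.lstrip().startswith("##")


def logical_lines(lines):
    """Join directive lines that end in a backslash with the following line.

    Assumption: only directive lines are continued; other lines pass through.
    """
    result = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if is_directive(line):
            # Drop the trailing backslash and append the next physical line.
            while line.endswith("\\") and i + 1 < len(lines):
                i += 1
                line = line[:-1] + lines[i]
        result.append(line)
        i += 1
    return result
```

With this sketch, the two physical lines "## define A foo \" and "bar" are treated as the single directive "## define A foo bar".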

Before the directive line is parsed, variable references are resolved. However, @-escape sequences not forming a variable reference are not resolved at this stage. @-characters that are shadowed by a backslash character are not interpreted as the beginning or end of an @-escape. It is possible to quote a variable reference by preceding the identifier with a colon.

For ## eval and ## endmacro directives, the directive keyword (eval or endmacro) must be visible on the input line (i.e. before the variable references are resolved).

The following example will define the macro FOO:
## define MyMacro macro FOO(X)
## @MyMacro@
## endmacro

However, the following will not work, since the ## endmacro can't be recognized while the macro is being defined:
## define MyMacro macro FOO(X)
## define MyEndMacro endmacro
## @MyMacro@
## ; This will NOT work!
## @MyEndMacro@

Preprocessor Identifiers

Variables and macros are bound to identifiers. An identifier is a sequence of alphanumeric characters, underscores, and dollar signs ($). An identifier may not start with a digit.

The dollar signs are used for mangling macro names and macro default parameters, and should be used for that purpose only. However, P18 does not enforce this.

Even though it is possible to use the names true and false (case is insignificant here), you probably don't want to do this, since these identifiers could not be used within an expression (see below).
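
As a rough illustration of these rules, the following Python sketch checks whether a string is a valid identifier and whether it can be referenced from within an expression. It assumes that "alphanumeric" means ASCII letters and digits, which the text does not state; the function names are hypothetical.

```python
import re

# Letters, digits, underscores and dollar signs; no leading digit.
# Assumption: "alphanumeric" is taken to mean ASCII letters and digits.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def is_identifier(name):
    """Return True if name is a syntactically valid identifier."""
    return IDENTIFIER.fullmatch(name) is not None


def usable_in_expression(name):
    """true/false (in any case) are read as boolean constants in expressions."""
    return is_identifier(name) and name.lower() not in ("true", "false")
```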

Preprocessor Values

The preprocessor can handle only two data types: string values and booleans. Only string values may be bound to variables/macros.

A string value is represented by a string constant, which is a sequence of characters enclosed in a pair of double quotes ("). The string constant syntax is similar to the ANSI C string constant syntax: all backslash-escaped special characters valid in ANSI C are also valid in P18 string constants.

When defining a variable using the ## define directive, an alternate shorthand syntax is available: if the value does not start with a quoting character, the character string up to (but not including) the next newline character or EOF is taken as the value, literally.

Preprocessor Expressions

Preprocessor expressions follow the usual syntax for partially parenthesized infix expressions with some unary prefix operators. In plain English, this means that binary operators appear between their operands, unary operators appear in front of their operands, operators are evaluated according to a defined operator precedence, and parentheses can be used to modify the evaluation order.

When an expression is evaluated in a boolean context (either as a condition or as a part of a larger expression), then every value is read as TRUE, except for an undefined value and the boolean value FALSE. This means that an identifier in a boolean context evaluates to TRUE if and only if the variable is defined, since boolean values can't be bound to variables.
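
The boolean-context rule is easy to get wrong, because an empty string counts as TRUE. The Python sketch below models it under the assumption that an undefined value is represented by None and the boolean FALSE by Python's False; these representations and the function names are chosen for the example only.

```python
UNDEFINED = None  # stand-in for "no value bound to this identifier"


def truthy(value):
    """Everything is TRUE except an undefined value and the boolean FALSE."""
    return value is not UNDEFINED and value is not False


def identifier_truth(name, variables):
    """An identifier is TRUE in a boolean context exactly when it is defined."""
    return truthy(variables.get(name, UNDEFINED))
```

Note that truthy("") and truthy("0") are both True in this model: any defined string, even an empty one, reads as TRUE.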

Here's the expression grammar in BNF:

  PrimaryExpression = PrimaryExpression
PrimaryExpression != PrimaryExpression
OrExpression | AndExpression
OrExpression ^ AndExpression
AndExpression & UnaryExpression
! UnaryExpression
  ( Expression )

The following binary operators are recognized:

The only unary operator recognized is the logical NOT operator (!). The NOT operator evaluates to TRUE if (and only if) its operand evaluates to FALSE.

The boolean constant values TRUE and FALSE are represented by the case-insensitive character sequences true and false. This means that variables bound to identifiers true or false (or FaLSe...) can't be referenced from within an expression. Avoid using such identifiers.

List of Directives

The ## ; Directive (Comment)
## ; comment

Insert a comment into the source file. This directive is ignored. It is possible to combine the comment directive with most other directives; the only exceptions are ## eval, ## include, and ## define. A comment in a ## define is possible only if the value defined is written as a string constant; otherwise the comment would be considered part of the defined value (the same is true for ## include). An example of a combined comment:
## if FOO == "foo" ; This is the magic "foo" case.
## endif ; End of the magic "foo" case.

The ## \ Quote
## \ text

This is not really a directive; it prevents a line of input from being interpreted as a directive. The ## \ and all leading whitespace are ignored, and the rest of the line is processed as if the ## \ were not there.

The ##condition Directive
## condition expression

This directive must be used before the first non-whitespace character is sent to the output. If the specified expression evaluates to FALSE, then no output file is generated (i.e. the output is sent to /dev/null). However, the input file is processed anyway.

The ##include Directive
## include filename

Include the specified file. The contents of the file are read and inserted into the input stream. The filename may be specified in C syntax or without any delimiters. If the filename is specified without delimiters, everything starting from the first non-whitespace character to the end of the line is interpreted as the filename. Note that the ## include can't be combined with a comment if the filename is specified without delimiters.

It is also possible to specify the filename using quoting characters as delimiters. In addition to the standard quoting characters, the single quote (') and the double quote ("), the filename may also be quoted in angle brackets (i.e. the less-than and greater-than signs, < and >). Note that in contrast to the standard C preprocessor, all methods of quoting the filename are equivalent.
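
The filename rules can be summarized in a small Python sketch. It is a simplification made for illustration: the function name is hypothetical, and backslash escapes inside a quoted filename are not handled.

```python
def parse_include_filename(argument):
    """Extract the filename from the text following '## include'.

    Quoted forms ('...', "...", <...>) are treated alike; an undelimited
    filename runs from the first non-whitespace character to the end of line.
    Assumption: escape sequences inside the delimiters are ignored here.
    """
    text = argument.lstrip()
    closers = {'"': '"', "'": "'", "<": ">"}
    if text and text[0] in closers:
        end = text.find(closers[text[0]], 1)
        if end != -1:
            return text[1:end]
    return text
```

In this sketch, a trailing comment is only separated from the filename in the delimited forms, which mirrors the restriction on combining ## include with a comment.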

The ##if Directive
## if expression

Mark the beginning of a conditional block. If the specified expression evaluates to TRUE, then the initial condition of the block is TRUE (i.e. the output is active if the conditions of all surrounding conditional blocks are TRUE). If the expression evaluates to FALSE, then the initial condition of the conditional block is set to FALSE (i.e. the output is deactivated).

The ##else Directive
(a) ## else
(b) ## else if expression

This directive may appear only inside of a conditional block. For variant (a), the condition of the inner-most conditional block is inverted. For variant (b), if the condition of the inner-most conditional block is TRUE, it is set to FALSE (unconditionally); if the condition is FALSE, it is set to the evaluation result of the specified expression. (In plain English, ## else and ## else if do exactly what you expect.)

Technically, there's no reason why one should not have an arbitrary number of ## else directives within a single conditional block. However, allowing this would be error prone without being useful, so P18 imposes the usual restrictions on the use of ## else and ## else if (i.e. within a single conditional block, ## else or ## else if may appear only following an ## if or ## else if).

The ##endif Directive
## endif

Mark the end of a conditional block.
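
The interplay of ## if, ## else, ## else if, and ## endif with the condition stack can be modelled as follows. This Python sketch is an assumption-laden illustration, not P18's code: in particular, it remembers whether an earlier branch of the block was already TRUE, so that a chain of ## else if directives behaves "as you expect"; the text above only gives the simpler per-directive rule. The class and method names are invented for the example.

```python
class ConditionStack:
    """Tracks nested conditional blocks; output is active if all are TRUE."""

    def __init__(self):
        # One dict per open block: the current condition, whether any branch
        # has been TRUE so far, and whether a plain ## else was already seen.
        self._blocks = []

    def active(self):
        return all(block["condition"] for block in self._blocks)

    def begin_if(self, value):
        self._blocks.append({"condition": value, "taken": value, "else_seen": False})

    def else_(self, value=None):
        """Handle '## else' (value is None) or '## else if expr' (value is a bool)."""
        if not self._blocks:
            raise SyntaxError("## else outside of a conditional block")
        block = self._blocks[-1]
        if block["else_seen"]:
            raise SyntaxError("## else may only follow ## if or ## else if")
        if value is None:
            block["condition"] = not block["taken"]
            block["else_seen"] = True
        else:
            block["condition"] = (not block["taken"]) and value
        block["taken"] = block["taken"] or block["condition"]

    def endif(self):
        if not self._blocks:
            raise SyntaxError("## endif outside of a conditional block")
        self._blocks.pop()
```

For a simple ## if / ## else pair, the "taken" flag is equal to the original condition, so the plain ## else reduces to the inversion described above.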

The ##define Directive
(a) ## define name
(b) ## define name value

Define a variable or macro named name. If the value parameter is omitted, the variable is defined to an empty string (which evaluates to TRUE in a boolean context, see section Expressions). The ## define directive takes effect only if the output is active.

If a value parameter is specified, the parameter is expected to be a string constant value (see section Values). If the value does not start with a double quote, the value is the entire string from the first non-whitespace character following the name parameter up to (but not including) the next newline character. If the value is not written as a string constant, the ## define can't be combined with a comment.

Note that lines continued with a trailing backslash character are interpreted as a single line.

The ##undef Directive
(a) ## undef name
(b) ## undef ! name

Un-define a variable or macro. The ## undef directive takes effect only if the output is active.

Variant (a) un-defines the named variable and all variables associated with the specified name. A variable is associated with the specified name if the name of that variable starts with the specified name followed by a dollar sign. Use variant (a) to remove a defined macro and all its default parameters. Variant (b) only un-defines the variable with the exact name specified. It is not an error if the specified variable or macro is not defined; the ## undef directive will have no effect in that case.
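
The difference between the two variants can be shown with a small Python sketch that treats the variable table as a dictionary. The function name and the sample key FEATURE$feature are hypothetical; the exact names P18 generates when mangling macro default parameters are not specified here.

```python
def undef(variables, name, exact=False):
    """Remove name from the variable table.

    exact=False models '## undef name': name and every 'name$...' entry go.
    exact=True models '## undef ! name': only the exact name is removed.
    Undefined names are silently ignored in both cases.
    """
    if exact:
        variables.pop(name, None)
        return
    prefix = name + "$"
    for key in [k for k in variables if k == name or k.startswith(prefix)]:
        del variables[key]
```

Starting from a table holding FEATURE, FEATURE$feature, and FEATURES, the call undef(table, "FEATURE") leaves only FEATURES, since FEATURES does not continue with a dollar sign.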

The ##macro Directive
(a) ## macro name
(b) ## macro name(parameter-list)

Define a variable or macro. Variant (a) is just a multi-line version of ## define. If you want to define a macro with parameters (and possibly default values for these parameters), use variant (b). For variant (b), the specified name may not contain a dollar sign. The ## macro directive takes effect only if the output is active.

The parameter list is a non-empty list of comma-separated simple identifiers (identifiers not containing a dollar sign). The last of these identifiers may be preceded by an ellipsis (three successive dots, "..."), indicating a variable argument list. The identifiers in the parameter list define the names of the parameters; arguments passed to the macro are available as variables bound to the corresponding parameter names in the context of the macro expansion.

For every parameter, a default value may be specified. The default value is written behind the parameter name, separated by an equal sign (=). When the macro is called and an argument for a parameter with a default value is omitted, the specified default value is used in place of the omitted argument.

An example for a macro definition:
## macro FEATURE(feature = "NONE")
## if ENABLE_@feature@ || feature == "NONE"
## endmacro

## macro FEATURE_END
## endif
## endmacro

The macro body is everything up to the next ## endmacro directive, not including the newline character preceding the ## endmacro.

When referencing a parameter passed to a macro, care should be taken if the argument text passed through the parameter should be parsed again. E.g. if an argument text might contain an I18N escape, the parameter should be referenced as a macro, not as a variable. Here is an example:
## macro ITEM(title, text, link = "NONE")
## if link == "NONE"
## else
<a name="@link@"><b><u>@title()@</u></b></a>
## endif
## endmacro
Note that the parameter link is referenced as a variable, not as a macro, i.e. when the macro ITEM is expanded, the value of link will be copied straight to the output without being reparsed.

The ##endmacro Directive
(a) ## endmacro [name]
(b) ## endmacro counter [name]

Mark the end of a macro definition. People with a healthy brain will probably only need variant (a) of this directive. The counter parameter in variant (b) determines for how many expansions the directive should be ignored, i.e. when the directive ## endmacro n is found in a macro body and n is an integer greater than zero, then the directive ## endmacro n - 1 is inserted into the macro body. The directive ## endmacro 0 is equivalent to variant (a).

Variant (b) is useful if you want to define a macro which in turn defines a macro when expanded. Counter values larger than 1 are probably hardly useful, but I am sure that someone, somewhere will define a macro, which defines a macro, which defines a macro...

For both variants, the name of the macro may be specified through the optional parameter name. If the name parameter is specified, it must match the name of the macro.
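
The counter mechanism of variant (b) amounts to decrementing the counter each time the directive is copied into a macro body. The following Python sketch illustrates just that rewriting step; the function name is invented, and the sketch does not handle the ## /macro shorthand.

```python
import re

# Matches '## endmacro <counter> [name]'; the counter is captured separately.
ENDMACRO = re.compile(r"^(\s*##\s*endmacro\s+)(\d+)\b(.*)$")


def decrement_endmacro_counters(body_lines):
    """Replace '## endmacro n' (n > 0) by '## endmacro n-1' in a macro body."""
    result = []
    for line in body_lines:
        match = ENDMACRO.match(line)
        if match and int(match.group(2)) > 0:
            line = f"{match.group(1)}{int(match.group(2)) - 1}{match.group(3)}"
        result.append(line)
    return result
```

A body line ## endmacro 1 INNER therefore becomes ## endmacro 0 INNER, which ends the definition of INNER once the outer macro is expanded.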

The ##dnl Directive
## dnl

Delete the newline character preceding the ## dnl from the output. This directive can be used to join the surrounding lines to a single long line in the output or to remove the newline character at the end of a conditional block.

## if some_condition
foo
## dnl
## endif
bar
This example will generate the output string "foobar" if the condition evaluates to TRUE and "bar" otherwise.

The ##eval Directive
## eval counter text

Enforce a variable substitution on the specified text argument text. If the counter argument counter is 0, then the directive is replaced by the specified text. If the counter argument is positive, the directive is replaced by an equivalent ## eval directive with the counter reduced by one.

The ## eval directive may appear only inside of a macro definition. The directive is processed when the macro is defined, not when the macro is expanded.

Note: The parameter text is an arbitrary sequence of characters, possibly containing semicolons. As a consequence, the ## eval directive can't be combined with a comment.

The ##mute Directive
## mute [comment]

Mark the beginning of a mute block. Within a mute block, all variable and macro definitions take effect, but the preprocessor output is discarded. Mute blocks are useful for include files defining a set of variables and macros.

Mute blocks can be nested. The ## mute directive only takes effect when the output is active.

The ##endmute Directive
## endmute [comment]

Mark the end of a mute block. The ## endmute directive only takes effect when the output is active.

Directive Shorthands

P18 provides a short notation for ## dnl directives and block end directives. Preceding an arbitrary directive with a plus character (+) is equivalent to preceding the entire directive with a ## dnl. Consider the following example:
## macro LINK(link, text, frame = "NONE")
<a href="@link()@"
## dnl
## if frame != "NONE"
## dnl
## endif
## endmacro
The macro LINK from this example could be written shorter using a plus-prefix for ## if and ## endif:
## macro LINK(link, text, frame = "NONE")
<a href="@link()@"
##+if frame != "NONE"
## endmacro
(Note that there is no point in preceding an ## endmacro with a plus character, since the newline preceding the ## endmacro is ignored anyway.) A single ## dnl may be abbreviated to ## +.

In all block end directives (i.e. ## endif, ## endmacro, ## endmute) the keyword prefix end may be replaced by a forward slash (i.e. ## /if, ## /macro, and ## /mute respectively). When both a plus and a slash prefix are used, the plus prefix has to be put first.
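
One way to think about the shorthands is as a rewriting step that turns them back into their long forms. The Python sketch below does this for a single directive line; it is an illustration under the assumption that the shorthands are purely syntactic, and the function name is hypothetical.

```python
import re

# Optional '+' prefix, then optional '/' prefix, then the rest of the directive.
SHORTHAND = re.compile(r"^\s*##\s*(\+?)\s*(/?)\s*(.*)$")


def expand_shorthand(line):
    """Return the long-form directive line(s) equivalent to line."""
    match = SHORTHAND.match(line)
    if not match:
        return [line]  # not a directive line, leave untouched
    plus, slash, rest = match.groups()
    expanded = []
    if plus:
        expanded.append("## dnl")
    if slash:
        rest = "end" + rest
    if rest:
        expanded.append("## " + rest)
    return expanded
```

For example, ##+/if expands to a ## dnl followed by ## endif, and a lone ## + expands to a single ## dnl.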

Variable/Macro Escapes

Variables and macros are referenced by @-escapes (the at character). The syntax is inspired by the way configuration variables are referenced in files generated by GNU autoconf. A simple variable reference consists of the identifier enclosed in a pair of @ characters, as in @FOO@.

The grammar for variable and macro references in BNF:

  @ Identifier @
  @ SimpleIdentifier ( MacroArgumentList ) @
@ Identifier () @
MacroArgumentList , MacroArgument

An Identifier is a sequence of alphanumeric characters, underscores, and dollar signs. A SimpleIdentifier is a sequence of alphanumeric characters and underscores. Neither an Identifier nor a SimpleIdentifier may start with a digit. A MacroArgument is an arbitrary sequence of characters not containing an unquoted comma or closing parenthesis. Note that for MacroArgument all whitespace is significant, i.e. the escape "@FOO( )@" calls the macro FOO with one argument (which consists of a single space character), while the escape "@FOO()@" calls the macro FOO with no arguments.

The expanded text of a variable reference is sent to the output, while the expanded text of a macro reference is pushed back to the input stream and is parsed again.

The arguments passed to a macro are separated by unquoted commas and terminated by an unquoted closing parenthesis. If an argument contains a comma or a closing parenthesis, the comma or parenthesis has to be quoted by a preceding backslash character. Before the arguments are assigned to the parameter variables, one level of quoting is removed (i.e. you may use all standard quoting sequences to quote characters).

If the called macro is defined with a variable argument list, then the last parameter receives the unassigned rest of the argument list. Example:
## macro varargs(arg1 = "ARG1", ...arg2 = "ARG2")
arg1="@arg1@"; arg2="@arg2@"
## endmacro
1: @varargs()@
2: @varargs(X)@
3: @varargs(X,)@
4: @varargs(X, Y, Z)@
This example will produce the following output:
1: arg1="ARG1"; arg2="ARG2"
2: arg1="X"; arg2="ARG2"
3: arg1="X"; arg2=""
4: arg1="X"; arg2=" Y, Z"
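
The argument handling shown in this example can be reproduced with a short Python sketch. It is a simplified model, not P18's parser: the function names are made up, the input is the text between the parentheses of the macro call, and quoting is reduced to "a backslash protects the next character".

```python
def split_arguments(text, limit=None):
    """Split macro call text on unquoted commas, removing one quoting level.

    If limit is given, at most limit arguments are produced and the last one
    receives the unsplit rest (models a variable argument list).
    """
    if text == "":
        return []  # '@FOO()@' passes no arguments at all
    args, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            current.append(text[i + 1])  # keep the quoted character literally
            i += 2
            continue
        if ch == "," and (limit is None or len(args) < limit - 1):
            args.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    args.append("".join(current))
    return args


def bind(params, defaults, text, variadic=False):
    """Bind call arguments to parameter names, falling back to defaults."""
    args = split_arguments(text, len(params) if variadic else None)
    if len(args) > len(params):
        raise TypeError("too many macro arguments")
    bound = dict(defaults)
    bound.update(zip(params, args))
    return bound
```

Called with the parameters of varargs above, bind(["arg1", "arg2"], {"arg1": "ARG1", "arg2": "ARG2"}, "X, Y, Z", variadic=True) binds arg1 to "X" and arg2 to " Y, Z", matching line 4 of the output.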

Note: If a macro is expanded, the beginning of the macro body will be treated as if it were at the beginning of a line. This makes it possible to start a macro with an ## if, for example. On the other hand, care must be taken not to start a macro argument with a double hash if the macro references its parameters as macros. You can use the quoting directive in such a case. Example:
## macro A(b)
## endmacro
@A(##\\## something)@
@A(## something)@
In this example, the first macro call will produce a line reading ## something (two backslashes are needed, since one level of quoting is removed from the macro arguments). The second macro call will interpret the string ## something as a directive.

Internationalization (I18N) Escapes

An internationalization escape (I18N escape) marks a translatable message. When an I18N escape is processed, P18 will try to translate the message to the specified output language. The translation is performed even if the output is not active, so you will get an error message if the input file contains a message that can't be translated, even if that message does not appear in the output file.

The syntax for I18N escapes is described by the following grammar:

  _( Message )_
_( Message) MessageArgumentList _
_( Message) MessageOptionsExtension _
_( Message) MessageOptionsExtension MessageArgumentList _
  [ MessageOptions ]
LanguageID / MessageType
/ MessageType
  { MessageArgument }
MessageArgumentList {MessageArgument}

A Message is a sequence of characters not containing an unquoted closing parenthesis. A LanguageID is a language identifier, as described in section Language Identifiers. A MessageType is a message type identifier as described in section Message Types. A MessageArgument is a sequence of characters not containing an unquoted closing brace (}) and not starting with an unquoted dollar sign (the dollar sign at the beginning of a message argument is reserved for future extensions).

For the parameter MessageOptions all variable references are resolved, before the parameter is split into LanguageID and MessageType. The Message and all MessageArguments are passed through a normalization step. This normalization step is described in section Message Normalization. A Message may contain a variant specifier, as described in section Message Variants.

What happens when a message is processed depends on the operation mode of P18 (see section Invocation).

In the translated message, references to message parameters are resolved by substituting the parameter references with the specified message arguments. A parameter reference is an unquoted dollar sign ($) followed by a sequence of digits forming the parameter number. The parameter number may be enclosed in curly braces, which is useful in case the parameter reference is followed by a digit. While resolving the parameter references, one level of quoting is removed from the message text.

## define OPT "en/HTML"
  <title>_(The Book of Yendor)[@OPT@]_
- _(Chapter $1, Section $2)[@OPT@]{<?=$chapter?>
Note that the whitespace preceding the closing curly braces will not appear in the output file, since it is stripped from the message argument before it is copied to the translated message (see section Message Normalization). In the translated text, the entire <title> element will be on the same line (assuming the translated message does not contain quoted newline characters). Note that ##+ is a short notation for ## dnl.
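
The substitution of parameter references in a translated message can be sketched in Python as follows. The sketch assumes that parameters are numbered starting at 1 (as the $1 and $2 in the example suggest) and that quoting means a backslash in front of a character; the function name is hypothetical.

```python
import re

# Either a backslash-quoted character, or a reference of the form $N or ${N}.
PARAM_REF = re.compile(r"\\(.)|\$(?:\{(\d+)\}|(\d+))", re.DOTALL)


def resolve_parameters(message, arguments):
    """Substitute $N / ${N} by arguments[N-1] and drop one level of quoting."""

    def replace(match):
        if match.group(1) is not None:
            return match.group(1)  # quoted character: keep it, drop the backslash
        number = int(match.group(2) or match.group(3))
        return arguments[number - 1]

    return PARAM_REF.sub(replace, message)
```

The braces matter when a digit follows the reference: "${1}0 items" with the argument "2" yields "20 items", whereas "$10 items" would refer to parameter number 10.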

By default, the expanded I18N escape is prepended with a quoting directive and pushed back to the input for reparsing (you can change this by using a preput extension, see below). The quoting directive is prepended to prevent expanded messages starting with a double hash from being interpreted as a directive.

Internationalization Preput Extensions

I18N escapes are not substituted in lines forming a preprocessor directive. As a consequence, translated message texts can't be used as part of a directive. To overcome this limitation, an I18N escape may be combined with a preput extension. Preput extensions are placed after the initial underscore of an I18N escape, enclosed in double angle brackets << and >>. The text enclosed within these delimiters is prepended to the translated message instead of a quoting directive. You may also specify an empty preput extension, causing the translated message to be interpreted as a directive if it starts with a double-hash.

_<<## define FOO >>(BAR)_
In this example, the translation of BAR is bound to the variable FOO. Note that translated messages are normalized (see Message Normalization), so if the translated message does not contain quoted newline characters, the entire message text is bound to the variable FOO. However, you will run into problems if the translated message starts with a quoting character or contains quoted newline characters.
