This is source-highlight.info, produced by makeinfo version 6.5 from
source-highlight.texi.

This manual is for GNU Source-highlight (version 3.1.9, 2 June 2019),
which, given a source file, produces a document with syntax
highlighting.

   Copyright (C) 2005-2008 Lorenzo Bettini,
<http://www.lorenzobettini.it>.

     Permission is granted to copy, distribute and/or modify this
     document under the terms of the GNU Free Documentation License,
     Version 1.1 or any later version published by the Free Software
     Foundation; with no Invariant Sections, with no Front-Cover Texts,
     and no Back-Cover Texts.  A copy of the license is included in the
     section entitled "GNU Free Documentation License."
INFO-DIR-SECTION Utilities
START-INFO-DIR-ENTRY
* Source-highlight: (source-highlight).  Highlights contents
END-INFO-DIR-ENTRY


File: source-highlight.info,  Node: Top,  Next: Introduction,  Prev: (dir),  Up: (dir)

GNU Source-highlight
********************

GNU Source-highlight, given a source file, produces a document with
syntax highlighting.

   This is Edition 3.1.9 of the Source-highlight manual.

   This file documents GNU Source-highlight version 3.1.9.

* Menu:

* Introduction::                What's it for?
* Installation::                Download and installation
* Copying::                     Licence issues
* Simple Usage::                Very basic usage
* Configuration files::         Files needed for execution
* Invoking source-highlight::   How to run 'source-highlight'.
* Language Definitions::        How to define an input language
* Output Language Definitions::  How to define an output format
* Generating References::       Anchors and cross references
* Examples::                    Some output examples
* Problems::                    Reporting bugs.
* Mailing Lists::
* Concept Index::               Index of concepts.


File: source-highlight.info,  Node: Introduction,  Next: Installation,  Prev: Top,  Up: Top

1 Introduction
**************

GNU Source-highlight, given a source file, produces a document with
syntax highlighting.  The colors and the styles can be specified (bold,
italics, underline) by means of a configuration file, and some other
options can be specified at the command line.

   The program already recognizes many programming languages (e.g., C++,
Java, Perl, etc.)  and file formats (e.g., log files, ChangeLog, etc.),
and some output formats (e.g., HTML, ANSI color escape sequences, LaTeX,
etc.).  Since version 2.0, it allows you to specify your own input
source language via a simple syntax described later in this manual
(*note Language Definitions::).  Since version 2.1, it allows you to
specify your own output format language via a simple syntax described
later in this manual (*note Output Language Definitions::).  Since
version 2.2, it is able to generate cross references (e.g., to variable
names, field names, etc.)  by relying on the program _ctags_,
<http://ctags.sourceforge.net> (*note Generating References::).

   Since version 3.0, GNU Source-highlight also provides a C++ library
(which is used by the main program itself), that can be used by C++
programmers to add highlighting functionalities to their programs.
*note (source-highlight-info)Introduction::.

* Menu:

* Supported languages::
* The program source-highlight-settings::
* Notes on some languages::
* Using source-highlight as a simple formatter::
* Related Software and Links::


File: source-highlight.info,  Node: Supported languages,  Next: The program source-highlight-settings,  Prev: Introduction,  Up: Introduction

1.1 Supported languages
=======================

The complete list of languages (indeed, file extensions) natively
supported by this version of Source-highlight (3.1.9), as reported by
'--lang-list', is the following:

     C = cpp.lang
     F77 = fortran.lang
     F90 = fortran.lang
     H = cpp.lang
     ac = m4.lang
     ada = ada.lang
     adb = ada.lang
     am = makefile.lang
     applescript = applescript.lang
     asm = asm.lang
     autoconf = m4.lang
     awk = awk.lang
     bash = sh.lang
     bat = bat.lang
     batch = bat.lang
     bib = bib.lang
     bison = bison.lang
     c = c.lang
     caml = caml.lang
     cbl = cobol.lang
     cc = cpp.lang
     changelog = changelog.lang
     clipper = clipper.lang
     cls = latex.lang
     cobol = cobol.lang
     coffee = coffeescript.lang
     coffeescript = coffeescript.lang
     conf = conf.lang
     cpp = cpp.lang
     cs = csharp.lang
     csh = sh.lang
     csharp = csharp.lang
     css = css.lang
     ctp = php.lang
     cxx = cpp.lang
     d = d.lang
     desktop = desktop.lang
     diff = diff.lang
     dmd = d.lang
     docbook = xml.lang
     dtx = latex.lang
     el = lisp.lang
     eps = postscript.lang
     erl = erlang.lang
     erlang = erlang.lang
     errors = errors.lang
     f = fortran.lang
     f77 = fortran.lang
     f90 = fortran.lang
     feature = feature.lang
     fixed-fortran = fixed-fortran.lang
     flex = flex.lang
     fortran = fortran.lang
     free-fortran = fortran.lang
     glsl = glsl.lang
     go = go.lang
     groovy = groovy.lang
     h = cpp.lang
     haskell = haskell.lang
     haxe = haxe.lang
     hh = cpp.lang
     hpp = cpp.lang
     hs = haskell.lang
     htm = html.lang
     html = html.lang
     hx = haxe.lang
     hxx = cpp.lang
     in = makefile.lang
     ini = desktop.lang
     ipxe = ipxe.lang
     islisp = islisp.lang
     java = java.lang
     javalog = javalog.lang
     javascript = javascript.lang
     js = javascript.lang
     json = json.lang
     kcfg = xml.lang
     kdevelop = xml.lang
     kidl = xml.lang
     ksh = sh.lang
     l = flex.lang
     lang = langdef.lang
     langdef = langdef.lang
     latex = latex.lang
     ldap = ldap.lang
     ldif = ldap.lang
     lex = flex.lang
     lgt = logtalk.lang
     lhs = haskell_literate.lang
     lilypond = lilypond.lang
     lisp = lisp.lang
     ll = flex.lang
     log = log.lang
     logtalk = logtalk.lang
     lsm = lsm.lang
     lua = lua.lang
     ly = lilypond.lang
     m4 = m4.lang
     makefile = makefile.lang
     manifest = manifest.lang
     mf = manifest.lang
     ml = caml.lang
     mli = caml.lang
     moc = cpp.lang
     opa = opa.lang
     outlang = outlang.lang
     oz = oz.lang
     pas = pascal.lang
     pascal = pascal.lang
     patch = diff.lang
     pc = pc.lang
     perl = perl.lang
     php = php.lang
     php3 = php.lang
     php4 = php.lang
     php5 = php.lang
     pkgconfig = pc.lang
     pl = prolog.lang
     pm = perl.lang
     po = po.lang
     postscript = postscript.lang
     pot = po.lang
     prg = clipper.lang
     prolog = prolog.lang
     properties = properties.lang
     proto = proto.lang
     protobuf = proto.lang
     ps = postscript.lang
     py = python.lang
     python = python.lang
     r = r.lang
     rb = ruby.lang
     rc = xml.lang
     rs = rust.lang
     ruby = ruby.lang
     s = s.lang
     scala = scala.lang
     scheme = scheme.lang
     scm = scheme.lang
     scpt = applescript.lang
     sh = sh.lang
     shell = sh.lang
     sig = sml.lang
     sl = slang.lang
     slang = slang.lang
     slsh = slang.lang
     sml = sml.lang
     spec = spec.lang
     sql = sql.lang
     sty = latex.lang
     style = style.lang
     syslog = log.lang
     tcl = tcl.lang
     tcsh = sh.lang
     tex = latex.lang
     texi = texinfo.lang
     texinfo = texinfo.lang
     tk = tcl.lang
     tml = tml.lang
     txt = nohilite.lang
     ui = xml.lang
     upc = upc.lang
     vala = vala.lang
     vbs = vbscript.lang
     vbscript = vbscript.lang
     vim = vim.lang
     xhtml = xml.lang
     xml = xml.lang
     xorg = xorg.lang
     y = bison.lang
     yacc = bison.lang
     yy = bison.lang
     zsh = zsh.lang
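
   A quick way to find out which definition file handles a given
extension is to filter this list.  The following is only a sketch: it
assumes a POSIX shell with 'awk', and the helper name 'lang_for_ext' is
invented here for illustration (it is not part of source-highlight).  It
reads the real '--lang-list' output if source-highlight is installed,
and a short excerpt of the list above otherwise.

```shell
# lang_for_ext EXT: read "extension = file.lang" lines on standard input
# and print the definition file associated with EXT.
# (Hypothetical helper, not shipped with source-highlight.)
lang_for_ext() {
    awk -v ext="$1" '$2 == "=" && $1 == ext { print $3; exit }'
}

# Excerpt of the list above, used when source-highlight is not installed.
sample_list='pl = prolog.lang
pm = perl.lang
perl = perl.lang
f = fortran.lang
fixed-fortran = fixed-fortran.lang
txt = nohilite.lang'

if command -v source-highlight >/dev/null 2>&1; then
    lang_list=$(source-highlight --lang-list)
else
    lang_list=$sample_list
fi

# Which definition file is used for files ending in .pl?
printf '%s\n' "$lang_list" | lang_for_ext pl
```

Looking up 'pl' this way makes the Prolog/Perl association discussed in
*note Perl:: easy to verify on your own installation.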

   The complete list of output formats natively supported by this
version of Source-highlight (3.1.9), as reported by '--outlang-list', is
the following:

     docbook = docbook.outlang
     esc = esc.outlang
     esc256 = esc256.outlang
     groff_man = groff_man.outlang
     groff_mm = groff_mm.outlang
     groff_mm_color = groff_mm_color.outlang
     html = html.outlang
     html-css = htmlcss.outlang
     html5 = html5.outlang
     htmltable = htmltable.outlang
     javadoc = javadoc.outlang
     latex = latex.outlang
     latexcolor = latexcolor.outlang
     mediawiki = mediawiki.outlang
     odf = odf.outlang
     sexp = sexp.outlang
     texinfo = texinfo.outlang
     xhtml = xhtml.outlang
     xhtml-css = xhtmlcss.outlang
     xhtmltable = xhtmltable.outlang

The meaning of the suffix '-css' is explained in *note Output Language
map::(1).

   Please keep in mind that I have not personally tested all of these
language definitions.  I checked that the definition files are
syntactically correct (with the command line options '--check-lang' and
'--check-outlang', *note Invoking source-highlight::), but I cannot
guarantee that each definition faithfully follows the syntax of its
language (e.g., I put together some language definitions by searching
for information on the Internet, without ever having programmed in
those languages).  So, if you find that a language definition is not
precise, please let me know.  Moreover, if you have a program example in
a language that's not included in the 'tests' directory, please send it
to me so that I can include it in the test suite.

   ---------- Footnotes ----------

   (1) Up to version 2.9, there were also the suffixes '-doc' and
'-css-doc', but this mechanism was quite confusing and complex;
hopefully, this new one should be better.


File: source-highlight.info,  Node: The program source-highlight-settings,  Next: Notes on some languages,  Prev: Supported languages,  Up: Introduction

1.2 The program 'source-highlight-settings'
===========================================

Since version 3.0, GNU Source-highlight also includes the program
'source-highlight-settings', which can be used to check whether
source-highlight will be able to find its language definition files and
other configuration files and, if necessary, to store the correct
settings in a configuration file in the user's home directory.

   In particular, the stored configuration file will be called
'source-highlight.conf' and stored in '$HOME/.source-highlight/'.

   For the moment, this file only stores the default value for the
'--data-dir' option.

   The user can always override the contents of this configuration file,
and the default hardcoded value, by using the environment variable
'SOURCE_HIGHLIGHT_DATADIR'.
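
   As a minimal sketch of the override just described, the environment
variable can be set for the current shell session as follows.  The
directory used here is only an example (adjust it to wherever your
'.lang' and '.outlang' files actually live), and the call to
source-highlight is skipped if the program or the directory is missing.

```shell
# Keep a value that is already set; otherwise fall back to an example path.
datadir=${SOURCE_HIGHLIGHT_DATADIR:-/usr/local/share/source-highlight}
export SOURCE_HIGHLIGHT_DATADIR="$datadir"

echo "definition files are looked up in: $SOURCE_HIGHLIGHT_DATADIR"

# From now on, runs started from this shell use that directory, whatever
# $HOME/.source-highlight/source-highlight.conf says.
if command -v source-highlight >/dev/null 2>&1 &&
   [ -d "$SOURCE_HIGHLIGHT_DATADIR" ]; then
    source-highlight --lang-list | head -n 3
fi
```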


File: source-highlight.info,  Node: Notes on some languages,  Next: Using source-highlight as a simple formatter,  Prev: The program source-highlight-settings,  Up: Introduction

1.3 Notes on some languages
===========================

In this section I'd like to go into details on the highlighting of some
specific programming languages.  These notes might be useful when the
highlighted language has some "dialects" that might require some further
specification at the command line (e.g., to select a specific dialect).

* Menu:

* Fortran::
* Perl::


File: source-highlight.info,  Node: Fortran,  Next: Perl,  Prev: Notes on some languages,  Up: Notes on some languages

1.3.1 Fortran
-------------

As Toby White explained to me, Fortran comes in different "flavors": a
fixed-format, where some characters have different semantics depending
on their column position in the source file, and a free-format, where
this is not true.  For instance, in the former, '*' and 'c' start a
comment line, but only if they appear in the first column (while this
is not true in the free-format).

   By default, the free-format is assumed for Fortran files; if you want
to use the fixed-format, you need to specify 'fixed-fortran' (the name
listed in *note Supported languages::) with the '--src-lang' command
line option.
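
   The following sketch shows such an invocation.  It uses the name
'fixed-fortran' as it appears in the '--lang-list' output (*note
Supported languages::); the file names and the temporary directory are
made up for the example, and source-highlight is run only if it is
installed.

```shell
workdir=$(mktemp -d)

# A tiny fixed-format file: 'c' and '*' in column 1 start comment lines,
# while statements begin in column 7.
cat > "$workdir/hello.f" <<'EOF'
c     comment introduced by 'c' in the first column
*     comment introduced by '*' in the first column
      program hello
      print *, 'hello'
      end
EOF

if command -v source-highlight >/dev/null 2>&1; then
    # Select the fixed-format definition explicitly.
    source-highlight --src-lang=fixed-fortran --out-format=html \
        --input="$workdir/hello.f" --output="$workdir/hello.html"
fi
```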


File: source-highlight.info,  Node: Perl,  Prev: Fortran,  Up: Notes on some languages

1.3.2 Perl
----------

Perl syntax forms, especially its regular expression specifications, are
quite a nightmare ;-) I tried to specify as much as possible in the
'perl.lang' but some particular regular expressions might not be
highlighted correctly.  Actually, I never programmed in Perl, so, if you
see that some parts of your Perl programs are not highlighted correctly,
please do not hesitate to contact me, so that I can improve Perl
highlighting.

   Moreover, although the standard extension for Perl files is '.pl',
since the Prolog language definition was implemented in source-highlight
before Perl, this extension is assigned, by default, to Prolog files.
However, you can use the '--infer-lang' command line option, so that
source-highlight tries to detect the language by inspecting the first
lines of the input file (*note How the input language is discovered::);
you can also use the '--src-lang=perl' command line option to explicitly
request Perl highlighting.
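
   Both approaches are sketched below on a small '.pl' file.  The file
names are invented for the example, and the two commands are run only if
source-highlight is installed.

```shell
workdir=$(mktemp -d)

# A Perl script with the usual interpreter line at the top.
cat > "$workdir/hello.pl" <<'EOF'
#!/usr/bin/perl
print "hello\n";
EOF

if command -v source-highlight >/dev/null 2>&1; then
    # Let source-highlight inspect the first lines of the file...
    source-highlight --infer-lang --out-format=html \
        --input="$workdir/hello.pl" --output="$workdir/inferred.html"
    # ...or name the language explicitly.
    source-highlight --src-lang=perl --out-format=html \
        --input="$workdir/hello.pl" --output="$workdir/explicit.html"
fi
```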


File: source-highlight.info,  Node: Using source-highlight as a simple formatter,  Next: Related Software and Links,  Prev: Notes on some languages,  Up: Introduction

1.4 Using source-highlight as a simple formatter
================================================

You can also use source-highlight as a simple formatter of an input
file, i.e., without performing any highlighting(1).

   You can achieve this by using the file 'nohilite.lang' as the
language definition file for input sources, through the command line
option '--lang-def' (*note Invoking source-highlight::).  Since that
language definition is empty, no highlighting will be performed;
however, source-highlight will transform the input file into the output
format.  Note, in the input language associations in *note Supported
languages::, that 'nohilite.lang' is also associated with txt files.

   This, for instance, makes source-highlight useful in cases where you
want to transform a text file into HTML or LaTeX.  During the output, in
fact, source-highlight will correctly generate characters that have a
specific meaning in the output format.
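
   As a small sketch of this use, the commands below format a one-line
text file as HTML without any highlighting.  The file names are invented
for the example, and source-highlight is run only if it is installed.

```shell
workdir=$(mktemp -d)

# A line containing characters that are special in HTML.
printf '%s\n' 'if (a < b && b > c) return;' > "$workdir/notes.txt"

if command -v source-highlight >/dev/null 2>&1; then
    # nohilite.lang is empty, so only the output formatting is applied.
    source-highlight --lang-def=nohilite.lang --out-format=html \
        --input="$workdir/notes.txt" --output="$workdir/notes.html"
    # Inspect the result: '<', '>' and '&' should have been escaped.
    [ -f "$workdir/notes.html" ] && cat "$workdir/notes.html"
fi
```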

   For instance, in this Texinfo manual, if I want to insert a @ or a {
I have to "escape" them to make them appear literally since they have a
special meaning in Texinfo.  The same holds, e.g., for '<', '>' or '&'
in HTML. If you use source-highlight, it will take care of this,
automatically for you.

   This is the Texinfo source of the above sentence:

     For instance, in this Texinfo manual,
     if I want to insert a @@ or a @{
     I have to ``escape'' them to make them appear literally
     since they have a special meaning in Texinfo.
     The same holds, e.g.,
     for @code{<}, @code{>} or @code{&} in HTML.
     If you use source-highlight,
     it will take care of this, automatically for you.

This was processed by source-highlight as a simple text file, without
any highlighting; however, since it was formatted in Texinfo, all the
necessary escaping was automatically performed.  This way, it is very
easy to insert, in the same document, a piece of code and its result (as
in this example).

   This is actually the formatting performed by source-highlight; except
for the comment, this is basically what you should have written yourself
to do all the escaping stuff manually:

     @c Generator: GNU source-highlight, by Lorenzo Bettini, http://www.gnu.org/software/src-highlite
     @example
     For instance, in this Texinfo manual,
     if I want to insert a @@@@ or a @@@{
     I have to ``escape'' them to make them appear literally
     since they have a special meaning in Texinfo.
     The same holds, e.g.,
     for @@code@{<@}, @@code@{>@} or @@code@{&@} in HTML.
     If you use source-highlight,
     it will take care of this, automatically for you.
     @end example

   In case source-highlight does not handle a specific input language,
you can still use the option '--failsafe' (*note Invoking
source-highlight::) and also in that case no highlighting will be
performed, but source-highlight will transform the input file into the
output format.

   Note, however, that if the input language cannot be established, the
'default.lang' will be used: an empty language definition file which you
might want to customize.

   ---------- Footnotes ----------

   (1) Although this might have been achieved with previous versions, it
is an officially supported feature since version 2.5.


File: source-highlight.info,  Node: Related Software and Links,  Prev: Using source-highlight as a simple formatter,  Up: Introduction

1.5 Related Software and Links
==============================

Here we list some software related to source-highlight in the sense that
it uses it as a backend (i.e., provides an interface to
source-highlight) or it uses some of its features (e.g., definition
files):

   * Source-highlight-qt is a library for performing syntax highlighting
     in Qt documents by relying on the GNU Source-highlight library.
     This library provides an implementation of the Qt abstract class
     QSyntaxHighlighter, and it deals both with Qt3 and Qt4.

     <http://srchiliteqt.sourceforge.net>.

   * QSource-Highlight is a Qt4 front-end for GNU Source-Highlight (it
     relies on the library Source-Highlight-Qt).  You can highlight your
     code on the fly, and have the highlighted output in all the formats
     supported by source-highlight (e.g., HTML, LaTeX, Texinfo, etc.).
     You can then copy the formatted output and paste it (e.g., in your
     blog), or save it to a file.  A preview of the highlighted output
     is available for some output formats (e.g., HTML, XHTML, etc.).

     <http://qsrchilite.sourceforge.net>.

   * SourceHighlightIDE is a small IDE (based on Qt4 and
     Source-highlight-qt) I wrote for developing and debugging new
     language definitions for source-highlight:

     <http://srchighliteide.sourceforge.net>.

   * Martin Gebert implemented a KDE interface to source-highlight
     programs (and he did a wonderful job!), and it is called
     _Ksrc2highlight_; if you want to test it:

     <http://www.mgebert.de/Ksrc2highlight>.

   * There's also a Java version of java2html, you can find it at

     <http://www.generationjava.com/projects/Java2Html.shtml>.

   * This web site provides a web interface to source-highlight so that
     you can highlight your code on-line:

     <http://www.alaide.com/outils_colorsyntaxe.php>

   * SHJS is a JavaScript program that highlights source code passages
     in HTML documents.  Documents using SHJS are highlighted on the
     client side by the web browser.  SHJS uses language definitions
     from Source-highlight.

     <http://shjs.sourceforge.net>

   * Code2blog is a pyGTK front-end to source-highlight for easy
     conversion from source code to HTML.

     <http://code.google.com/p/code2blog>

   * Andy Buckley wrote a wrapper around source-highlight, which can be
     used as an Apache filter to highlight source code in Web pages on
     the fly.

     <http://www.insectnation.org/projects/filter-src-highlight>

   * Roger Nilsson wrote a frontend for source-highlight that is used in
     a popular webdesign app for OSX called RapidWeaver.  The frontend
     is called High-Light and allows users to easily add syntax-colored
     code inside RapidWeaver.

     <http://nilrogsplace.se/webdesign/rapidweaver/plugins/high-light/index_en.html>

   * Mauricio Zepeda published in his blog an article with a script to
     automatically highlight a file and show it in Firefox:

     <http://chillorb.com/?p=122>

   * Jason Blevins made a plugin for Ikiwiki that enables syntax
     highlighting of source code fragments and whole files via
     source-highlight.

     <http://jblevins.org/projects/ikiwiki/code>

   * Pascal Bleser created a PHP extension that uses the GNU
     source-highlight library directly from PHP, instead of relying on
     spawning a process or using the source-highlight CGI.

     <http://code.google.com/p/php-source-highlight/>

   * Roberto Alsina made a partial python binding using SIP so that you
     can use Source-Highlight-Qt in PyQt programs.

     <http://marave.googlecode.com/svn/trunk/marave/highlight/>

   * A perl binding for source-highlight is available at CPAN:

     <http://search.cpan.org/perldoc?Syntax::SourceHighlight>

   * Danijel Tasov wrote a pastebin service based on perl
     source-highlight binding:

     <http://pb.rbfh.de>


File: source-highlight.info,  Node: Installation,  Next: Copying,  Prev: Introduction,  Up: Top

2 Installation
**************

See the file 'INSTALL' for detailed building and installation
instructions; anyway if you're used to compiling Linux software that
comes with sources you may simply follow the usual procedure, i.e.,
untar the file you downloaded in a directory and then:

     cd <source code main directory>
     ./configure
     make
     make install

   We strongly suggest using shadow builds: create a build directory,
say 'build', and run configuration and make in that directory:

     cd <source code main directory>
     mkdir build
     cd build
     ../configure
     make
     make install

   However, before you do this, please check that you have everything
that is needed to build source-highlight, *note What you need to build
source-highlight::.

   Note: unless you specify a different install directory with the
'--prefix' option of configure (e.g., './configure --prefix=<your home>'),
you must be root to run 'make install'.

   You may want to run './configure --help' to see all the possible
options that can be passed to the configuration script.

   Files will be installed in the following directories:

'Executables'
     'prefix/bin'
'docs and output examples'
     'prefix/share/doc/source-highlight'
'library examples'
     'prefix/share/doc/source-highlight/examples'
'library API documentation'
     'prefix/share/doc/source-highlight/api'
'conf files'
     'prefix/share/source-highlight'

   Default value for prefix is '/usr/local' but you may change it with
'--prefix' option to configure.  For further 'configure' options, you
can run 'configure --help'.
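
   To see what the table above means for a concrete prefix, the
following sketch prints the resulting directories.  The function name
'install_layout' is invented for this example, and '$HOME/.local' is
just one possible prefix for an installation that does not require root
(i.e., after './configure --prefix=$HOME/.local').

```shell
# install_layout [PREFIX]: print where 'make install' places each group
# of files, following the table above.  PREFIX defaults to /usr/local.
# (Hypothetical helper, not part of the source-highlight distribution.)
install_layout() {
    prefix=${1:-/usr/local}
    # printf reuses the format for each label/path pair.
    printf '%-28s%s\n' \
        "executables" "$prefix/bin" \
        "docs and output examples" "$prefix/share/doc/source-highlight" \
        "library examples" "$prefix/share/doc/source-highlight/examples" \
        "library API documentation" "$prefix/share/doc/source-highlight/api" \
        "conf files" "$prefix/share/source-highlight"
}

install_layout                  # default prefix
install_layout "$HOME/.local"   # example of a per-user prefix
```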

   Tiziano Muller wrote a bash completion configuration file for
source-highlight; this will be installed by default in the directory
'sysconfdir/bash_completion.d', where 'sysconfdir' defaults to
'prefix/etc'; however, the directory where the bash completion script
typically searches for configuration files is '/etc/bash_completion.d'.
Thus, we suggest you explicitly specify this directory with the
configuration script command line option '--with-bash-completion'.

   If you want to build and install the API documentation of
Source-highlight library, you need to run 'configure' with the option
'--with-doxygen', but you need the program _Doxygen_,
<http://www.doxygen.org>, to build the documentation.  The documentation
will be installed in the following directory:

'Library API documentation'
     'prefix/share/doc/source-highlight/api'
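
   The two options just described might be combined as in the sketch
below.  That '--with-bash-completion' takes the directory as an '=DIR'
argument is an assumption on my part: check './configure --help' for the
exact form.  The script is run only from a build directory inside the
unpacked source-highlight sources; elsewhere the command is only
printed.

```shell
# Options discussed above; the "=DIR" form of --with-bash-completion is
# an assumption, to be checked against './configure --help'.
configure_opts="--with-bash-completion=/etc/bash_completion.d --with-doxygen"

echo "../configure $configure_opts"

# Run it only if the parent directory really is the source-highlight tree.
if [ -x ../configure ] &&
   grep -q source-highlight ../configure.ac 2>/dev/null; then
    # Word splitting of $configure_opts is intended here.
    ../configure $configure_opts
fi
```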

   NOTE: Originally, instead of Source-highlight, there were two
separate programs, namely _GNU java2html_ and _GNU cpp2html_.  Two shell
scripts with the same names are installed together with Source-highlight
in order to facilitate the migration (however, their use is deprecated
and not advised).

* Menu:

* Building with qmake::
* Download::
* Anonymous Git Checkout::
* What you need to build source-highlight::
* Tips on installing Boost Regex library::
* Patching from a previous version::
* Using source-highlight with less::
* Using source-highlight as a CGI::
* Building .rpm::


File: source-highlight.info,  Node: Building with qmake,  Next: Download,  Prev: Installation,  Up: Installation

2.1 Building with qmake
=======================

Since version 3.1.2, Source-highlight can be built also using 'qmake',
the build tool from Qt libraries (<http://qt.nokia.com>).  This was made
available to build Source-highlight on Windows based systems without
using a Unix shell, and in particular to build Source-highlight with
Microsoft MSVC compiler.  You should use this method only if you don't
have a Unix shell or if you really need to use the MSVC compiler (e.g.,
if you want to build Source-highlight library to be used in MSVC based
programs).  You still need the boost regex library, and if you use MSVC,
you can find installation packages for this library at
<http://www.boostpro.com>.

   This build mechanism is still experimental, and, when using MSVC,
only a static version of Source-highlight library can be built (not a
.dll).  You can also use this method if you have the MinGW compiler,
<http://www.mingw.org>, (e.g., the one that comes with Qt Windows
distribution) and you don't have Msys
(<http://www.mingw.org/wiki/MSYS>).  Otherwise, you should still use the
'configure' based mechanism.

   Using 'qmake', only a few options can be specified during the
building (besides the ones you usually use with qmake), and these
options can be specified only using environment variables:

'BOOST_REGEX'
     By default, 'boost_regex' will be used to link the boost library
     (i.e., '-lboost_regex'); if your boost regex library has a
     different name you must specify this name using this environment
     variable; e.g., if the library file is called
     'libboost_regex-mt.lib' or 'boost_regex-mt.dll' you must set this
     variable to 'boost_regex-mt'.
'INCPATH'
     Specify the path of the boost header files.
'LIBS'
     Specify the path of the boost lib files.

   Please, take into consideration that specifying the boost library
include and library paths is completely up to you, using 'INCPATH' and
'LIBS', if they're not in the system path directories.

   Also remember to always use the option '-recursive' when running
qmake.

   If you then want to run 'make install', you can use the variable
'INSTALL_ROOT' to prefix the installation path, which, otherwise, is the
root directory.


File: source-highlight.info,  Node: Download,  Next: Anonymous Git Checkout,  Prev: Building with qmake,  Up: Installation

2.2 Download
============

You can download it from GNU's ftp site:
<ftp://ftp.gnu.org/gnu/src-highlite> or from one of its mirrors (see
<http://www.gnu.org/prep/ftp.html>).

   I do not distribute Windows binaries anymore, since they can be
built by using the Cygnus C/C++ compiler, available at
<http://www.cygwin.com>.  However, if you don't feel like downloading
such a compiler, or you experience problems with the Boost Regex library
(see also *note Tips on installing Boost Regex library::; please also
keep in mind that if you don't have these libraries installed, and your
C/C++ compiler distribution does not provide a prebuilt package, it
might take some time, even hours, to build the Boost libraries from
sources), you can request such binaries directly from me by e-mail (find
my e-mail at my home page) and I'll be happy to send them to you.  An
MS-Windows port of Source-highlight is available from
<http://gnuwin32.sourceforge.net>; however, I don't maintain those
binaries personally, and they might be out of date.

   Archives are digitally signed by me (Lorenzo Bettini) with GNU gpg
(<http://www.gnupg.org>).  My GPG public key can be found at my home
page (<http://www.lorenzobettini.it>).

   You can also get the patches, if they are available for a particular
release (see below for patching from a previous version).


File: source-highlight.info,  Node: Anonymous Git Checkout,  Next: What you need to build source-highlight,  Prev: Download,  Up: Installation

2.3 Anonymous Git Checkout
==========================

This project's git repository can be checked out through the following
clone instruction(1):

     git clone git://git.savannah.gnu.org/src-highlite.git

   Further instructions can be found at the address:

   <http://savannah.gnu.org/projects/src-highlite>.

   And the git repository can also be browsed on-line at

   <http://git.savannah.gnu.org/cgit/src-highlite.git>.

   Please note that this way you will get the latest development sources
of Source-highlight, which may also be unstable.  This solution is the
best if you intend to correct/extend this program: you should send me
patches against the latest git repository sources.

   If, on the contrary, you want to get the sources of a given release,
through git, say, e.g., version X.Y.Z, you must specify the tag
'rel_X_Y_Z'.
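
   The mapping from a version number to its tag name can be sketched as
below.  The function name 'release_tag' is invented for this example;
the script only prints the git commands (so that it does not need
network access), and whether a tag exists for a given version should be
checked with 'git tag' in the cloned repository.

```shell
# release_tag VERSION: map X.Y.Z to the tag name rel_X_Y_Z.
# (Hypothetical helper, not part of the source-highlight sources.)
release_tag() {
    printf 'rel_%s\n' "$(printf '%s' "$1" | tr . _)"
}

tag=$(release_tag 3.1.9)

# Commands to run by hand to get the sources of that release.
echo "git clone git://git.savannah.gnu.org/src-highlite.git"
echo "cd src-highlite && git checkout $tag"
```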

   When you compile the sources that you get from the git repository,
before running the 'configure' and 'make' commands, for the first time,
you must run the command:

     autoreconf -i

This will run the autotools commands in the correct order, and also copy
possibly missing files.  You should have installed recent versions of
'automake', 'autoconf' and 'libtool' in order for this to succeed.

   We strongly suggest using shadow builds: create a build directory,
say 'build', and run configuration and make in that directory:

     cd <source code main directory>
     mkdir build
     cd build
     ../configure
     make
     make install

   To summarize, the steps to get the sources from git and make the
first build are:

     git clone git://git.savannah.gnu.org/src-highlite.git
     cd src-highlite
     autoreconf -i
     mkdir build
     cd build
     ../configure
     make

   ---------- Footnotes ----------

   (1) Since version 3.1.2 of Source-highlight the CVS repository was
abandoned in favor of Git (<http://git-scm.com/>).


File: source-highlight.info,  Node: What you need to build source-highlight,  Next: Tips on installing Boost Regex library,  Prev: Anonymous Git Checkout,  Up: Installation

2.4 What you need to build source-highlight
===========================================

Since version 2.0 Source-highlight relies on regular expressions as
provided by boost (<http://www.boost.org>), so you need to install at
least the regex library from boost.

   Most GNU/Linux distributions provide this library already in a
compiled form.  If you use your distribution packages, please be sure to
install also the development package of the boost libraries.

   If you experience problems in installing Boost Regex library, or in
compiling source-highlight because of this library, please take a look
at *note Tips on installing Boost Regex library::.

   If you want to use a specific version of the Boost regex library
(because you have many versions of it), you can use the configure option
'--with-boost-regex' to specify a particular suffix.  For instance,

     ./configure --with-boost-regex=boost_regex-gcc-1_31

   Source-highlight has been developed under GNU/Linux, using gcc (C++),
and bison (yacc) and flex (lex), and ported under Win32 with the Cygwin
C/C++ compiler, available at <http://www.cygwin.com>.

   I use the excellent GNU Autoconf(1), GNU Automake(2) and GNU
Libtool(3).  Since version 2.6 I also started to use Gnulib - The GNU
Portability Library(4), "a central location for common GNU code,
intended to be shared among GNU packages" (for instance, I rely on
Gnulib for checking for the presence and correctness of 'getopt_long'
function).

   Finally I used _GNU gengetopt_
(<http://www.gnu.org/software/gengetopt>), for command line parsing.

   I started to use also _doublecpp_
(<http://doublecpp.sourceforge.net>) that permits achieving dynamic
overloading.

   Actually, apart from the boost regex library, you don't need the
other tools above to build source-highlight (indeed I provide the output
sources generated by the above mentioned tools), unless you want to
develop source-highlight.

   However, if you obtained sources through Git, you need some other
tools, see *note Anonymous Git Checkout::.

   ---------- Footnotes ----------

   (1) <http://www.gnu.org/software/autoconf>

   (2) <http://www.gnu.org/software/automake>

   (3) <http://www.gnu.org/software/libtool>

   (4) <http://www.gnu.org/software/gnulib>


File: source-highlight.info,  Node: Tips on installing Boost Regex library,  Next: Patching from a previous version,  Prev: What you need to build source-highlight,  Up: Installation

2.5 Tips on installing Boost Regex library
==========================================

If you experience no problem in compiling source-highlight, you can
happily skip this section(1) :-)

   I created this section because many users reported problems after
installing the Boost Regex library from sources; other users had
problems compiling source-highlight even though this library was already
correctly installed (especially Windows users, using Cygwin).  I hope
this section sheds some light on installing and using the Boost Regex
library.  Please note that this section does not explain how to compile
the Boost libraries (the documentation you'll find on
<http://www.boost.org> is well done); it explains how to tweak things if
you have problems compiling source-highlight even after a successful
installation of the Boost libraries.

   First of all, if your distribution provides packages for the Boost
regex library, please be sure to install also the development package of
the boost libraries, i.e., those providing also the header files needed
to compile a program using these libraries.  For instance, on my Debian
system I had to install the package 'libboost-regex-dev', besides the
package 'libboost-regex'.

   If your distribution does not provide these packages then you have to
download the sources of Boost libraries from <http://www.boost.org> and
follow the instructions for compilation and installation.  However, I
suggest you specify '/usr' as prefix for installation, instead of
relying on the default prefix '/usr/local' (unless '/usr/local/include'
is already in the inclusion path of your C++ compiler), since this will
make things easier when compiling source-highlight.  I suggest this
because '/usr/include' is usually where the C++ compiler searches for
header files during compilation.

   If you successfully compiled and installed the Boost Regex library,
or you installed the package from your distribution, but you STILL
experience problems in compiling source-highlight, then you simply have
to adjust some things as described in the following.

   If the './configure' command of source-highlight reports this error:

     ERROR! Boost::regex library not installed.

then, the compiler cannot find the header files for this library.  In
this case, check that the directory '/usr/include/boost' actually
exists; if it does not, then probably you'll find a similar directory,
e.g., '/usr/include/boost-1_33/boost', depending on the version of the
library you have installed.  Then, all you have to do is to create a
symbolic link as follows:

     ln -s /usr/include/boost-1_33/boost /usr/include/boost

Alternatively, you might run source-highlight's configure as follows:

     ./configure CXXFLAGS=-I/usr/include/boost-1_33/

   If you install (or build) the Boost Regex library in a non standard
path, e.g., somewhere in your home directory, say
'/home/myhome/boost-1_33', you'll have to update the 'CXXFLAGS' variable
accordingly on the 'configure' command line; in this particular case,
you might also have to specify the path of actual library files
('CXXFLAGS' will only specify the path of header files).  In particular,
you'll have to know where the lib files are within the boost
installation (or build directory); for instance, if they are in
'/home/myhome/boost-1_33/stage/lib', while the header files (i.e., the
'boost' header files directory) are in '/home/myhome/boost-1_33', the
complete 'configure' command should be

     ./configure CXXFLAGS=-I/home/myhome/boost-1_33 \
                 LDFLAGS=-L/home/myhome/boost-1_33/stage/lib

   If the './configure' command of source-highlight then reports this
other error:

     ERROR! Boost::regex library is installed, but you
     must specify the suffix with --with-boost-regex at configure
     for instance, --with-boost-regex=boost_regex-gcc-1_31

then, there's still another thing to fix: you must find out the exact
names of the files of your installed Boost Regex libraries; you can do
this by using the command:

     $ ls -l /usr/lib/libboost_regex*

that, for instance, on one of my Cygwin installations reports:

     -rwxr-x---+ Nov  9 23:29 /usr/lib/libboost_regex-gcc-mt-s-1_33.a
     -rwxr-x---+ Nov 22 09:22 /usr/lib/libboost_regex-gcc-mt-s.a
     -rwxr-x---+ Nov  9 23:29 /usr/lib/libboost_regex-gcc-mt-s-1_33.so
     -rwxr-x---+ Nov 22 09:22 /usr/lib/libboost_regex-gcc-mt-s.so

Now you have all the information needed to correctly run
source-highlight's configure command:

     ./configure --with-boost-regex=boost_regex-gcc-mt-s-1_33

or, if you solved the first problem in the second way(2),

     ./configure CXXFLAGS=-I/usr/include/boost-1_33/ \
                 --with-boost-regex=boost_regex-gcc-mt-s-1_33

   Of course, you have to modify this command according to the names of
your Boost Regex library installed files.
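
   As a rough sketch of the step above, the value for
'--with-boost-regex' can be derived from a library file name by dropping
the directory, the leading 'lib' and the '.a' or '.so' extension.  The
helper name 'boost_regex_suffix' is hypothetical (it is not part of
source-highlight), and other naming schemes (e.g., '.dll.a' import
libraries) are not handled:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with source-highlight): derive the value
# for --with-boost-regex from the file name of an installed library.
boost_regex_suffix() {
    name=$(basename "$1")   # drop the directory part
    name=${name#lib}        # drop the leading "lib"
    name=${name%%.so*}      # drop ".so" and any version after it
    name=${name%.a}         # drop ".a" for static libraries
    printf '%s\n' "$name"
}

# Print the configure command suggested for one of the files listed above.
suffix=$(boost_regex_suffix /usr/lib/libboost_regex-gcc-mt-s-1_33.a)
echo "./configure --with-boost-regex=$suffix"
```

   For the file name used here, this prints the same configure command
shown above.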

   These instructions have allowed many users who were experiencing
problems to compile source-highlight.  If you still have problems, please
send me an e-mail.

   ---------- Footnotes ----------

   (1) Since version 2.11, the 'configure' script should be able to
correctly find the boost regex library if it is in the compiler default
path.

   (2) Command lines that are too long are split into multiple indented
lines separated by a '\'.  Of course these commands are to be given in
one line only, anyway.


File: source-highlight.info,  Node: Patching from a previous version,  Next: Using source-highlight with less,  Prev: Tips on installing Boost Regex library,  Up: Installation

2.6 Patching from a previous version
====================================

If you downloaded a patch, say 'source-highlight-1.3-1.3.1.patch.gz'
(i.e., the patch to go from version 1.3 to version 1.3.1), cd to the
directory with sources from the previous version (source-highlight-1.3)
and type:

     gunzip -cd ../source-highlight-1.3-1.3.1.patch.gz | patch -p1

   and restart the compilation process (if you had already run
'configure', a simple 'make' should do).


File: source-highlight.info,  Node: Using source-highlight with less,  Next: Using source-highlight as a CGI,  Prev: Patching from a previous version,  Up: Installation

2.7 Using source-highlight with less
====================================

This was suggested by Konstantine Serebriany.  The script
'src-hilite-lesspipe.sh' will be installed together with
source-highlight.  You can use the following environment variables:

     export LESSOPEN="| /path/to/src-hilite-lesspipe.sh %s"
     export LESS=' -R '

   This way, when you use less to browse a file, if it is a source file
handled by source-highlight, it will be automatically highlighted.
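
   A minimal sketch for a shell startup file, assuming that
'src-hilite-lesspipe.sh' can be found through 'PATH' (the variable name
'lesspipe' is only illustrative).  It leaves the environment untouched
when the script is not installed:

```shell
# Locate the script through PATH; the result is empty if it is not installed.
lesspipe=$(command -v src-hilite-lesspipe.sh || true)

if [ -n "$lesspipe" ]; then
    # Same settings as above, with the actual path filled in.
    export LESSOPEN="| $lesspipe %s"
    export LESS=' -R '
fi
```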

   Xavier-Emmanuel Vincent provided an alternative version of the ANSI
color scheme, 'esc256.style', for terminals that can handle 256 colors.
Xavier also provided a script that checks how many colors your terminal
can handle and, if it supports them, uses the 256-color variant.  The
script is called 'source-highlight-esc.sh' and it will be installed
together with the other binaries.
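
   The idea behind such a check can be sketched as follows.  This is
not the code of 'source-highlight-esc.sh'; it assumes that 'tput colors'
reports the number of colors of the terminal, and the function name
'pick_esc_style' is hypothetical:

```shell
# Choose the style file from the number of colors the terminal supports.
pick_esc_style() {
    if [ "$1" -ge 256 ]; then
        echo esc256.style
    else
        echo esc.style
    fi
}

# Ask terminfo for the color count; fall back to 0 when tput is
# unavailable or the terminal type is unknown.
colors=$(tput colors 2>/dev/null || echo 0)
style=$(pick_esc_style "${colors:-0}")
echo "$style"
```

   The resulting file name could then be passed to '--style-file'.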


File: source-highlight.info,  Node: Using source-highlight as a CGI,  Next: Building .rpm,  Prev: Using source-highlight with less,  Up: Installation

2.8 Using source-highlight as a CGI
===================================

CGI support was enabled thanks to Robert Wetzel; I haven't tested it
personally.  If you want to use source-highlight as a CGI program, you
have to use the executable 'source-highlight-cgi'.  You can build this
executable by issuing

     make source-highlight-cgi

in the 'src' directory.


File: source-highlight.info,  Node: Building .rpm,  Prev: Using source-highlight as a CGI,  Up: Installation

2.9 Building .rpm
=================

Christian W. Zuckschwerdt added support for building an .rpm and a
.src.rpm.  You can issue the following command

     rpmbuild -tb source-highlight-3.1.9.tar.gz

   for building an .rpm with binaries and

     rpmbuild -ts source-highlight-3.1.9.tar.gz

   for building a .src.rpm with sources.


File: source-highlight.info,  Node: Copying,  Next: Simple Usage,  Prev: Installation,  Up: Top

3 Copying Conditions
********************

GNU Source-highlight is free software; you are free to use, share and
modify it under the terms of the GNU General Public License that
accompanies this software (see 'COPYING').

   GNU 'source-highlight' was written and is maintained by Lorenzo
Bettini <http://www.lorenzobettini.it>.


File: source-highlight.info,  Node: Simple Usage,  Next: Configuration files,  Prev: Copying,  Up: Top

4 Simple Usage
**************

Here are some realistic examples of running 'source-highlight'(1).

   Source-highlight only does a lexical analysis of the source code, so
the program source is assumed to be correct!

   Here's how to run source-highlight (for this example we will use
C/C++ input files, but this is valid also for other source-highlight
input languages):

     source-highlight --src-lang cpp --out-format html \
         --input <C++ FILE> \
         --output <HTML FILE> \
         --style-file <STYLE FILE> \
         OPTIONS

   For input files, apart from the '-i (--input)' option and the
standard input redirection, you can simply specify some files at the
command line and also use shell wildcards (for instance '*.java').
In this case the name for the output files will be formed using the name
of the source file with a .<ext> appended, where <ext> is the extension
chosen according to the output format specified (in this example it
would be .html).  The style file (*note Output format style::) contains
information on how to format specific language parts (e.g., keywords in
blue and boldface, etc.).

   IMPORTANT: you must choose one of the above two invocation modes:
either you use '-i (--input)', '-o (--output)' (possibly replacing them
with standard input/output redirection), or you specify one or many
files without '-i (--input)'; if you try to mix them you'll get an
error:

     source-highlight -o main.html main.cpp
     Please, use one of the two syntaxes for invocation:
     source-highlight [OPTIONS]... -i input_file -o output_file
     source-highlight [OPTIONS]... [FILES]...

   If the string 'STDOUT' is passed to the '-o (--output)' option, then
the output is forced to the standard output anyway.

   If '-s (--src-lang)' is not specified, the source language is
inferred from the extension of the input file or from the file name
itself (possibly also using lower case versions); this, of course, does
not work with standard input redirection.  For further details, see
*note How the input language is discovered::.

   If '-f (--out-format)' is not specified, the output will be produced
in HTML.

   If '--style-file' is not specified, the 'default.style', which is
included in the distribution, will be used (see *note Output format
style:: for further information).
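
   The two invocation syntaxes can be tried out with a sketch like the
following.  The scratch directory and the file names 'main.cpp' and
'explicit.html' are only illustrative, and the commands are skipped if
'source-highlight' is not found in 'PATH':

```shell
# Work in a scratch directory with a tiny C++ input file.
tmp=$(mktemp -d)
printf 'int main() { return 0; }\n' > "$tmp/main.cpp"

if command -v source-highlight >/dev/null 2>&1; then
    # First syntax: explicit input and output files.
    source-highlight --src-lang cpp --out-format html \
        --input "$tmp/main.cpp" --output "$tmp/explicit.html"

    # Second syntax: only file names; the language is inferred from the
    # extension and the output name is the input name plus ".html".
    (cd "$tmp" && source-highlight --out-format html main.cpp)
else
    echo "source-highlight not found; skipping" >&2
fi
```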

* Menu:

* HTML and XHTML output::
* LaTeX output::
* Texinfo output::
* DocBook output::
* ANSI color escape sequences::
* Odf output::
* Groff output::

   ---------- Footnotes ----------

   (1) Command lines that are too long are split into multiple indented
lines separated by a '\'.  Of course these commands are to be given in
one line only, anyway.


File: source-highlight.info,  Node: HTML and XHTML output,  Next: LaTeX output,  Prev: Simple Usage,  Up: Simple Usage

4.1 HTML and XHTML output
=========================

The default output format for HTML and XHTML uses fixed width fonts by
inserting all the formatted output between '<tt>' and '</tt>'.  Thus,
for instance, specifications for fixed width and not fixed width (see
*note Output format style::) will have no effect: every character will
have fixed width.  If you don't like this default behavior and would
like to have non-fixed fonts by default (as happens, e.g., with LaTeX
output) you can use the file 'html_notfixed.outlang' with the command
line argument '--outlang-def'.  For XHTML output, the corresponding file
is 'xhtml_notfixed.outlang'.

   Furthermore, the file 'htmltable.outlang' can be used to generate
HTML output enclosed in an HTML table (which will use also a background
color if specified in the style file).  The file 'xhtmltable.outlang'
does the same but for XHTML output.


File: source-highlight.info,  Node: LaTeX output,  Next: Texinfo output,  Prev: HTML and XHTML output,  Up: Simple Usage

4.2 LaTeX output
================

When using LaTeX output format you can choose between monochromatic
output (by using '-f latex') or colored output (by using '-f
latexcolor').  When using colored output, you need the 'color' package
(this should already be present on your system).  Of course, you are
free to define your own LaTeX output format, see *note Output Language
Definitions::.


File: source-highlight.info,  Node: Texinfo output,  Next: DocBook output,  Prev: LaTeX output,  Up: Simple Usage

4.3 Texinfo output
==================

When using the Texinfo output format, you may want to use a dedicated
style file, 'texinfo.style', which comes with the source-highlight
distribution, with the option '--style-file'.  For instance, the example
in *note Examples:: is formatted with this style file.


File: source-highlight.info,  Node: DocBook output,  Next: ANSI color escape sequences,  Prev: Texinfo output,  Up: Simple Usage

4.4 DocBook output
==================

DocBook output is generated using the '<programlisting>' tag.  If the
'--doc' command line option is given, an '<article>' document is
generated.


File: source-highlight.info,  Node: ANSI color escape sequences,  Next: Odf output,  Prev: DocBook output,  Up: Simple Usage

4.5 ANSI color escape sequences
===============================

If you're using this output format, for instance together with 'less'
(see *note Using source-highlight with less::), you may want to use the
'esc.style' (or 'esc256.style' if your terminal can handle 256 colors),
which comes with the source-highlight distribution, with the option
'--style-file'.  This should result in a more pleasant coloring output.


File: source-highlight.info,  Node: Odf output,  Next: Groff output,  Prev: ANSI color escape sequences,  Up: Simple Usage

4.6 Odf output
==============

The ODF language output for GNU source-highlight enables the user to
generate color-highlighted ODF output of source code files, or to
generate ODF color-highlighted snippets to be used by ODF back-ends
(like asciidoc-odf).  We create an '.fodt' file, which is a Text
document that newer versions of LibreOffice can open.


File: source-highlight.info,  Node: Groff output,  Prev: Odf output,  Up: Simple Usage

4.7 Groff output
================

The Groff language output for GNU source-highlight enables the user to
generate black and white or color-highlighted Groff output using
groff's Memorandum Macros (the output formats to specify on the command
line are 'groff_mm' and 'groff_mm_color', respectively) or for Man pages
(output format to specify on the command line: 'groff_man').  These
formats have been contributed by this project:
<https://github.com/papoanaya/emacs_utils>.


File: source-highlight.info,  Node: Configuration files,  Next: Invoking source-highlight,  Prev: Simple Usage,  Up: Top

5 Configuration files
*********************

During execution, source-highlight needs some files where it finds
directives on how to recognize the source language (if not specified
explicitly with '--src-lang' or '--lang-def'), on which output format to
use (if not specified explicitly with '--out-format' or
'--outlang-def'), on how to format specific source elements (e.g.,
keywords, comments, etc.), and source and output language definitions.
These files will be explained in the next sections.

   If the directory for such files is not explicitly specified with the
command line option '--data-dir', these files are searched for in the
following order:

   * the current directory;
   * the installation directory for conf files, see *note Installation::
     (please keep in mind that this directory is hard-coded into
     source-highlight during compilation).
   * if the source-highlight command is specified with an explicit path
     name, the installation directory name is still used, but relative
     to the explicit path name.

   The user can also set this directory with the environment variable
'SOURCE_HIGHLIGHT_DATADIR' (see also *note The program
source-highlight-settings::).

   If you want to be sure about which file is used during the execution,
you can use the command line option '--verbose'.

* Menu:

* Output format style::
* Output format style using CSS::
* Default Styles::
* Language map::
* Language definition files::
* Output Language map::
* Output Language definition files::
* Developing your own definition files::


File: source-highlight.info,  Node: Output format style,  Next: Output format style using CSS,  Prev: Configuration files,  Up: Configuration files

5.1 Output format style
=======================

You must specify your options for syntax highlighting in the file
'default.style'(1).  You can specify formatting options for each element
defined by a language definition file (you can get the list of such
elements, by using '--show-lang-elements', see *note Listing Language
Elements::).

   Since version 2.6, you can also specify the background color for the
output document, using the keyword 'bgcolor' (this might be visible only
when the '--doc' command line option is used).

   If many elements share the same formatting options, you can specify
these elements in the same line, separated by a comma(2).

   Here's the 'default.style' that comes with this distribution (this is
formatted by using the 'style.lang' that is shown in *note Tutorials on
Language Definitions::):

     bgcolor "white"; // the background color for documents
     context gray; // the color for context lines (when specified with line ranges)

     keyword blue b ; // for language keywords
     type darkgreen ; // for basic types
     usertype teal ; // for user defined types
     string red f ; // for strings and chars
     regexp orange f ; // for strings and chars
     specialchar pink f ; // for special chars, e.g., \n, \t, \\
     comment brown i, noref; // for comments
     number purple ;       // for literal numbers
     preproc darkblue b ; // for preproc directives (e.g. #include, import)
     symbol darkred ; // for simbols (e.g. <, >, +)
     function black b; // for function calls and declarations
     cbracket red; // for block brackets (e.g. {, })
     todo bg:cyan b;       // for TODO and FIXME
     code bg:brightgreen b; // for code snippets

     //Predefined variables and functions (for instance glsl)
     predef_var darkblue ;
     predef_func darkblue b ;

     // for OOP
     classname teal ; // for class names, e.g., in Java and C++

     // line numbers
     linenum black f;

     // Internet related
     url blue u, f;

     // other elements for ChangeLog and Log files
     date blue b ;
     time, file darkblue b ;
     ip, name darkgreen ;

     // for Prolog, Perl...
     variable darkgreen ;

     // explicit for Latex
     italics darkgreen i;
     bold darkgreen b;
     underline darkgreen u;
     fixed green f;
     argument darkgreen;
     optionalargument purple;
     math orange;
     bibtex blue;

     // for diffs
     oldfile orange;
     newfile darkgreen;
     difflines blue;

     // for css
     selector purple;
     property blue;
     value darkgreen i;

     // for oz
     atom orange;
     meta i;

     // for file system
     path orange;

     // for C (or other language) labels
     label teal b;

     // for errors
     error purple;
     warning darkgreen;

     // for feature (Cucumber) files
     cuketag green ;
     gherken blue ;
     given red ;
     when cyan ;
     then yellow ;
     and_but pink ;
     table gray ;

   This file tries to define a style for most elements defined in the
language definition files that come with the Source-highlight
distribution.

   You can specify your own file (it doesn't have to be named
'default.style') with the command line option '--style-file'(3), see
*note Invoking source-highlight::.

   You can also specify the color of normal text by adding this line

     normal darkblue ;

   As you can see, the syntax of this file is quite straightforward:
after the element (or elements, separated by commas) you can specify the
color, and the background color(4) by using the prefix 'bg:' (for
instance, in the 'default.style' above the background color is specified
for the 'todo' element).

   Note that the background color might not be available for all output
formats: it is available for XHTML and LaTeX but not for HTML(5).

   Then, you can specify further formatting options such as bold,
italics, etc.; these are the keywords that can be used:

     b = bold
     i = italics
     u = underline
     f = fixed
     nf = not fixed
     noref = no reference information is generated for these elements

   Since version 2.2, the color specification is not required.  For
instance, the 'texinfo.style' is as follows (we avoid colors for Texinfo
outputs):

     keyword, type b ;
     variable f, i ;
     string f ;
     regexp f ;
     comment nf, i, noref ;
     preproc b ;

     // line numbers
     linenum f;

     // Internet related
     url f;

     // for diffs
     oldfile, newfile i;
     difflines b;

     // for css
     selector, property b;
     value i;

   You may also specify more than one of these options separated by
commas, e.g.

     keyword blue u, b ;

Please keep in mind that in this case the order of the specified
options is kept during the generation of the output; for instance,
depending on the specific output format, the sequences 'u, b' and 'b, u'
may lead to different results.  In particular, the option that comes
first is applied last, so that it encloses the ones that follow.  For
instance, in the case of HTML, the sequence 'u, b' will lead to the
following formatting: '<u><b>...</b></u>'.

   The 'noref' option specifies that reference information is not
generated for this element (see *note Generating References::).  For
instance, this is used for the 'comment' element, since we do not want
elements in a comment to be searched for cross-references.

   These are all the logical color names handled by
source-highlight(6):

     black
     red
     darkred
     brown
     yellow
     cyan
     blue
     pink
     purple
     orange
     brightorange
     green
     brightgreen
     darkgreen
     teal
     gray
     darkblue
     white

   You can also use the direct color scheme for the specific output
format, by using double quotes, such as, e.g., '"#00FF00"' in HTML(7) or
even string colors in double quotes(8), such as '"lightblue"'.  Of
course, the double quotes will be discarded during the generation.

   For instance, this is the 'syslog.style' used in the 'tests'
directory.  This uses direct color schemes.

     date, keyword yellow b ;
     time "#9999FF" ;
     ip "lightblue" b ;

     type cyan b ;
     string "brown" b ;
     comment teal ;
     number red ;
     preproc cyan ;
     symbol green ;
     function "#CC66CC" b ;
     cbracket green b ;
     twonumbers green b ;
     port green b ;
     webmethod teal ;

     // foo option
     foo red b ; // foo entry



   Note that, if you use direct color schemes, source-highlight will
perform no transformation, and will output exactly the color scheme you
specified.  For instance, the specification '"brown"' is different from
'brown': the former will be output as it is, while the latter will be
translated into the corresponding color of the output format (for HTML
the visible result is likely to be the same).

   It is up to you to specify a color scheme string that is handled by
the specific output format.  Thus, direct color schemes might not be
portable in different output formats; for instance, '"#00FF00"' is valid
in HTML but not in LaTeX.

   ---------- Footnotes ----------

   (1) Before version 2.1, this file was called 'tags.j2h' which used to
be a very obscure name.  I hope this name convention is a better one
:-).

   (2) Since version 2.6.

   (3) Before version 2.1, this command line option was called
'--tags-file' which used to be a very obscure name.  I hope this name
convention is a better one :-).

   (4) Since version 2.6.

   (5) Of course, if you use HTML and an external CSS file you will
achieve the same result.

   (6) You can see these colors in HTML in the file 'colors.html'.

   (7) Note that, since version 2.2, you must use double quotes.

   (8) Since version 2.6.


File: source-highlight.info,  Node: Output format style using CSS,  Next: Default Styles,  Prev: Output format style,  Up: Configuration files

5.2 Output format style using CSS
=================================

Since version 2.6 you can specify the output format style also using a
limited CSS syntax.  Please, note that this has nothing to do with
output produced by source-highlight using the '--css' option.

   By using a CSS file as the style file (i.e., passing it to the
'--style-css-file' command line option) you will only specify the output
format style using the same syntax as CSS.  This means that you can use
CSS syntax for specifying the output format style independently of the
actual output (this is what the output format style is for).  Thus, you
can use a CSS file as the output format style also for LaTeX output
(just like you would do with a source-highlight output format style,
*note Output format style::).

   This feature is provided basically for code re-use: you can specify
the output format style using a css file, and then re-use the same css
file as the actual style sheet of other HTML pages (or even output files
produced by source-highlight using the '--css' option).

   Note that this feature is quite rudimentary, so only a limited subset
of CSS syntax is recognized.  In particular, apart from 'body' (used for
the background color of the document), selectors are always intended as
CSS class selectors, so they must start with a dot.  '/* */' comments
are handled.  Properties (and their values) not handled by
source-highlight are simply (and silently) discarded.

   This is an example of CSS specification handled correctly by
source-highlight as a style format specification:

     body {
       background-color: <color specification>;
      }

     .selector {
       color: <color specification>;
       background-color: <color specification>;
       font-weight: bold; /* this is a comment */
       font-family: monospace;
       font-style: italic;
       text-decoration: underline;
      }

   Finally, this is the 'default.css' that corresponds to
'default.style' presented in *note Output format style:::

     body {  background-color: white;  }

     /* the color for context lines (when specified with line ranges) */
     .context {  color: gray; }

     .keyword { color: blue; font-weight: bold; }
     .type { color: darkgreen; }
     .usertype, .classname { color: teal; }
     .string { color: red; font-family: monospace; }
     .regexp { color: orange; }
     .specialchar { color: pink; font-family: monospace; }
     .comment { color: brown; font-style: italic; }
     .number { color: purple; }
     .preproc { color: darkblue; font-weight: bold; }
     .symbol { color: darkred; }
     .function { color: black; font-weight: bold; }
     .cbracket { color: red; }
     .todo { font-weight: bold; background-color: cyan; }

     /* line numbers */
     .linenum { color: black; font-family: monospace; }

     /* Internet related */
     .url { color: blue; text-decoration: underline; font-family: monospace; }

     /* other elements for ChangeLog and Log files */
     .date { color: blue; font-weight: bold; }
     .time, .file { color: darkblue; font-weight: bold; }
     .ip, .name { color: darkgreen; }

     /* for Prolog, Perl */
     .variable { color: darkgreen; }
     .italics { color: darkgreen; font-style: italic; }
     .bold { color: darkgreen; font-weight: bold; }

     /* for LaTeX */
     .underline { color: darkgreen; text-decoration: underline; }
     .fixed { color: green; font-family: monospace; }
     .argument, .optionalargument { color: darkgreen; }
     .math { color: orange; }
     .bibtex { color: blue; }

     /* for diffs */
     .oldfile { color: orange; }
     .newfile { color: darkgreen; }
     .difflines { color: blue; }

     /* for css */
     .selector { color: purple; }
     .property { color: blue; }
     .value { color: darkgreen; font-style: italic; }

     /* for Oz */
     .atom { color: orange; }
     .meta { font-style: italic; }

     /* for feature/cucumber files */
     .cuketag { color: green; }
     .gherken { color: blue; }
     .given { color: red; }
     .when { color: cyan; }
     .then { color: yellow; }
     .and_but { color: pink; }
     .table { color: gray; }

   If you pass this file to the '--style-css-file' command line option
and you produce an output file, you will get the same result as using
'default.style'.
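
   As a sketch of this re-use, the following writes a small CSS style
file and uses it for colored LaTeX output.  The file name 'my.css' and
the scratch directory are only illustrative, and the highlighting step
is skipped if 'source-highlight' is not in 'PATH':

```shell
tmp=$(mktemp -d)

# A CSS style file restricted to the subset described above.
cat > "$tmp/my.css" <<'EOF'
body { background-color: white; }

/* element styles are written as class selectors */
.keyword { color: blue; font-weight: bold; }
.comment { color: brown; font-style: italic; }
.string  { color: red; font-family: monospace; }
EOF

if command -v source-highlight >/dev/null 2>&1; then
    # The same CSS drives a non-HTML output format, here LaTeX with colors.
    printf 'int main() { return 0; }\n' |
        source-highlight --src-lang cpp --out-format latexcolor \
            --style-css-file "$tmp/my.css"
fi
```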

   Source-highlight comes with a lot of CSS files that can be used
either as standard CSS files for HTML documents, or as style files to
pass to '--style-css-file'.  In the documentation installation directory
(see *note Installation::) you will find the file 'style_examples.html'
which shows many output examples, each one with a different CSS style.


File: source-highlight.info,  Node: Default Styles,  Next: Language map,  Prev: Output format style using CSS,  Up: Configuration files

5.3 Default Styles
==================

This file(1) (the default file is 'style.defaults') lists the default
style for a language element whose output style is not specified in the
style file; in particular the following line (comment lines start with
'#'):

     elem1 = elem2

states that, if the style for an element, say elem1, is not specified in
the style file, then elem1 will have the same style as elem2.

   For instance, this is the 'style.defaults' that comes with
Source-highlight:

     # defaults for styles
     # the format is:
     # elem1 = elem2
     # meaning that if the style for elem1 is not specified,
     # then it will have the same style as elem2

     classname = normal
     usertype = normal
     preproc = keyword
     section = function
     paren = cbracket
     attribute = type
     value = string
     predef_var = type
     predef_func = function
     atom = regexp
     meta = function
     path = regexp
     label = preproc
     error = string
     warning = type
     code = preproc

In this case the style for the element 'preproc' will default to the
style of the element 'keyword'.

   This file is useful when you want to create your own style file and
you don't want to specify styles for all the elements that share the
same output style in your style.  For example, the default style formats
'preproc' elements differently from keywords, but if your style does not
specify a style for 'preproc', such elements will still be formatted
like 'keyword' elements.
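
   To illustrate the file format only, the lookup of a default can be
sketched with a small shell function.  This is not how source-highlight
itself is implemented; the function name 'default_style_for' is
hypothetical and only a single, direct lookup is performed:

```shell
# Hypothetical helper: print the element that "$1" falls back to,
# according to the "elem1 = elem2" lines of the file "$2".
default_style_for() {
    awk -F'=' -v elem="$1" '
        /^[ \t]*#/ { next }              # skip comment lines
        NF == 2 {
            key = $1; val = $2
            gsub(/[ \t]/, "", key)       # remove blanks around the names
            gsub(/[ \t]/, "", val)
            if (key == elem) { print val; exit }
        }' "$2"
}

# A shortened copy of the file shown above, in a scratch directory.
tmp=$(mktemp -d)
cat > "$tmp/style.defaults" <<'EOF'
# defaults for styles
preproc = keyword
path = regexp
EOF

default_style_for preproc "$tmp/style.defaults"
```

   Elements without an entry, such as 'keyword' here, produce no output.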

   ---------- Footnotes ----------

   (1) Since version 2.9.


File: source-highlight.info,  Node: Language map,  Next: Language definition files,  Prev: Default Styles,  Up: Configuration files

5.4 Language map
================

This configuration file associates a file extension to a specific
language definition file.  You can also use such file extension to
specify the '--src-lang' option (see *note Simple Usage::).
Source-highlight comes with such a file, called 'lang.map'.

   Of course, you can override the settings of this file by writing your
own language map file and specifying it with the command line option
'--lang-map'.  Moreover, as explained above, if a file 'lang.map' is
present in the current directory, that version will be used.  The format
of such a file is quite simple (comment lines start with '#'):

     extension = language definition file

   The default language map file is shown in *note
Introduction::.


File: source-highlight.info,  Node: Language definition files,  Next: Output Language map,  Prev: Language map,  Up: Configuration files

5.5 Language definition files
=============================

These files are crucial for source-highlight since they specify the
source elements that have to be highlighted.  These files also allow you
to specify your own language definitions in order to deal with a
language that is not handled by source-highlight(1).  The syntax for
these files is explained in *note Language Definitions::.

   ---------- Footnotes ----------

   (1) This is the main difference introduced in version 2.0 with
respect to the previous version.


File: source-highlight.info,  Node: Output Language map,  Next: Output Language definition files,  Prev: Language definition files,  Up: Configuration files

5.6 Output Language map
=======================

This configuration file associates an output format to a specific output
language definition file.  You can use the name of that output format to
specify the '--out-format' option (see *note Simple Usage::).
Source-highlight comes with such a file, called 'outlang.map'.

   Of course, you can override the settings of this file by writing your
own output language map file and specifying it with the command line
option '--outlang-map'.  Moreover, as explained above, if a file
'outlang.map' is present in the current directory, that version will be
used.  The format of such a file is quite simple:

     output format name = language definition file

   The default output language map file is shown in *note
Introduction::.

   In particular, there is a convention for the output format name in
the output language map: the one with the '-css' suffix is the one used
when the '--css' command line option is given.
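
   The '-css' naming convention can be pictured with the following
Python sketch.  The function name 'pick_output_definition' is
hypothetical, the map entries are placeholders, and falling back to the
plain entry when no '-css' entry exists is an assumption of the sketch,
not documented behaviour.

```python
def pick_output_definition(out_format, outlang_map, css_given=False):
    """Choose the output language definition file for an output format.

    Assumption: if --css is given but no '<format>-css' entry exists,
    the plain '<format>' entry is used.
    """
    css_name = out_format + "-css"
    if css_given and css_name in outlang_map:
        return outlang_map[css_name]
    return outlang_map[out_format]


# Illustrative map contents; the file names are placeholders.
OUTLANG_MAP = {"html": "html.outlang", "html-css": "htmlcss.outlang"}

print(pick_output_definition("html", OUTLANG_MAP))
print(pick_output_definition("html", OUTLANG_MAP, css_given=True))
```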


File: source-highlight.info,  Node: Output Language definition files,  Next: Developing your own definition files,  Prev: Output Language map,  Up: Configuration files

5.7 Output Language definition files
====================================

These files are crucial for source-highlight since they specify how the
source elements are highlighted.  These files also allow you to specify
your own output format definitions in order to deal with an output
format that is not handled by source-highlight(1).  The syntax for these
files is explained in *note Output Language Definitions::.

   These files are part of source-highlight distribution, but they can
also be downloaded, independently, from here:

   <http://www.gnu.org/software/src-highlite/outlang_files/>

   ---------- Footnotes ----------

   (1) This is the main difference introduced in version 2.1 with
respect to the previous version.


File: source-highlight.info,  Node: Developing your own definition files,  Prev: Output Language definition files,  Up: Configuration files

5.8 Developing your own definition files
========================================

I encourage those who write new language definitions or correct/modify
existing language definitions to send them to me so that they can be
added to the source-highlight distribution!

   Since these files require more explanations (that, however, are not
necessary to the standard usage of source-highlight), they are carefully
explained in separate parts: *note Language Definitions:: and *note
Output Language Definitions::.

   These files are part of source-highlight distribution, but they can
also be downloaded, independently, from here:

   <http://www.gnu.org/software/src-highlite/lang_files/>


File: source-highlight.info,  Node: Invoking source-highlight,  Next: Language Definitions,  Prev: Configuration files,  Up: Top

6 Invoking 'source-highlight'
*****************************

The format for running the 'source-highlight' program is:

     source-highlight OPTION ...

   'source-highlight' supports the following options, shown by the
output of 'source-highlight --detailed-help':

     source-highlight

     Highlight the syntax of a source file (e.g. Java) into a specific format (e.g.
     HTML)

     Usage:  [OPTIONS]... < INPUT_FILE > OUTPUT_FILE
            source-highlight [OPTIONS]... -i INPUT_FILE -o OUTPUT_FILE
            source-highlight [OPTIONS]... [FILES]...

       -h, --help                    Print help and exit
           --detailed-help           Print help, including all details and hidden
                                       options, and exit
       -V, --version                 Print version and exit
       -i, --input=FILENAME          Input file (default=stdin)
       -o, --output=FILENAME         Output file (default=stdout, when the third
                                       invocation form is used). If 'STDOUT' is
                                       specified, the output is directed to standard
                                       output

     You can simply specify some files at the command line and also use regular
     expressions (for instance *.java).  In this case the name for the output files
     will be formed using the name of the source file with a .EXT appended, where
     EXT is the extension chosen according to the output format specified (for
     instance .html).

       -s, --src-lang=STRING         source language (use --lang-list to get the
                                       complete list).  If not specified, the source
                                       language will be guessed from the file
                                       extension.
           --lang-list               list all the supported language and associated
                                       language definition file
           --outlang-list            list all the supported output language and
                                       associated language definition file
       -f, --out-format=STRING       output format (use --outlang-list to get the
                                       complete list)  (default=`html')
       -d, --doc                     create an output file that can be used as a
                                       stand alone document (e.g., not to be
                                       included in another one)
           --no-doc                  cancel the --doc option even if it is implied
                                       (e.g., when css is given)
       -c, --css=FILENAME            the external style sheet filename.  Implies
                                       --doc
       -T, --title=STRING            give a title to the output document.  Implies
                                       --doc
       -t, --tab=INT                 specify tab length.  (default=`8')
       -H, --header=FILENAME         file to insert as header
       -F, --footer=FILENAME         file to insert as footer
           --style-file=FILENAME     specify the file containing format options
                                       (default=`default.style')
           --style-css-file=FILENAME specify the file containing format options (in
                                       css syntax)
           --style-defaults=FILENAME specify the file containing defaults for format
                                       options  (default=`style.defaults')
           --outlang-def=FILENAME    output language definition file
           --outlang-map=FILENAME    output language map file
                                       (default=`outlang.map')
           --data-dir=PATH           directory where language definition files and
                                       language maps are searched for.  If not
                                       specified these files are searched for in the
                                       current directory and in the data dir
                                       installation directory
           --output-dir=PATH         output directory
           --lang-def=FILENAME       language definition file
           --lang-map=FILENAME       language map file  (default=`lang.map')
           --show-lang-elements=FILENAME
                                     prints the language elements that are defined
                                       in the language definition file
           --infer-lang              force to infer source script language
                                       (overriding given language specification)

     Lines:
       -n, --line-number[=PADDING]   number all output lines, using the specified
                                       padding character  (default=`0')
           --line-number-ref[=PREFIX]
                                     number all output lines and generate an anchor,
                                       made of the specified prefix + the line
                                       number  (default=`line')

     Filtering output:

      Mode: linerange
       specifying line ranges
           --line-range=STRING       generate only the lines in the specified
                                       range(s)
       each range can be of the shape:
       	single line (e.g., --line-range=50)
       	full range (e.g., --line-range=2-10)
       	partial range (e.g., --line-range=-30, first 30 lines,
       	--line-range=40- from line 40 to the end

           --range-separator=STRING  the optional separator to be printed among
                                       ranges (e.g., "...")
           --range-context=INT       number of (context) lines generated even if not
                                       in range
       The optional --range-context specifies the number of lines that are not in
       	range that will be printed anyway (before and after the lines in range);
       	These lines will be formatted according to the "context" style.


      Mode: regexrange
       specifying regular expression delimited ranges
           --regex-range=STRING      generate only the lines within the specified
                                       regular expressions
       when a line containing the specified regular expression is found, then
       the lines after this one are actually generated, until another line,
       containing the same regular expression is found (this last line is not
       generated).
       More than one regular expression can be specified.

     reference generation:
           --gen-references=STRING   generate references  (possible
                                       values="inline", "postline", "postdoc"
                                       default=`inline')
           --ctags-file=FILENAME     specify the file generated by ctags that will
                                       be used to generate references
                                       (default=`tags')
           --ctags=CMD               how to run the ctags command.  If this option
                                       is not specified, ctags will be executed with
                                       the default value.  If it is specified with
                                       an empty string, ctags will not be executed
                                       at all  (default=`ctags --excmd=n
                                       --tag-relative=yes')

     testing:
       -v, --verbose                 verbose mode on
       -q, --quiet                   print no progress information
           --binary-output           write output files in binary mode
       This is useful for testing purposes, since you may want to make
       sure that output files are always generated with a final newline character
       only
           --statistics              print some statistics (i.e., elapsed time)
           --gen-version             put source-highlight version in the generated
                                       file  (default=on)
           --check-lang=FILENAME     only check the correctness of a language
                                       definition file
           --check-outlang=FILENAME  only check the correctness of an output
                                       language definition file
           --failsafe                if no language definition is found for the
                                       input, it is simply copied to the output
       -g, --debug-langdef[=TYPE]    debug a language definition.  In dump mode just
                                       dumps all the steps; in interactive, at each
                                       step, waits for some input (press ENTER to
                                       step)  (possible values="interactive",
                                       "dump" default=`dump')
           --show-regex=FILENAME     show the regular expression automaton
                                       corresponding to a language definition file

   Let us explain some options in detail (apart from those that should
be clear from the '--help' output itself, and those already explained in
*note Simple Usage::).

'--data-dir'
     Source-highlight, during the execution, will need some files, such
     as, e.g., language definition files, output format definition
     files, etc.  These files are installed in
     'prefix/share/source-highlight' where 'prefix' is chosen at
     compilation time (see *Note Installation::).  Thus,
     source-highlight should be able to find all the files it needs
     independently.  However, if you want to override this setting,
     e.g., because you have your own language definition files, or
     simply because you installed a possible source-highlight binary in
     a different directory from the one used during the compilation, you
     can use the command line option '--data-dir'.

'--doc'
'-d'
     If you want a stand alone output document (i.e., an output file
     that is not meant to be included in another document), specify
     this option (otherwise you just get some text that you can paste
     into another document).  If you choose this option and do not
     provide a '--title', your source file name will be used as the
     title.

'--no-doc'
     The '--doc' option above is actually implied by other command line
     options (e.g., '--css').  If you do not want this (e.g., you want
     to include the output in an existing document containing the global
     style sheet), you can disable this by using '--no-doc'.

'--css'
'-c'
     Specify the style sheet file (e.g., a '.css' for HTML(1)) for the
     output document.  Note that source-highlight will not read this
     file: it will simply write this file name into the generated
     output, so that the output file refers to it as its style sheet
     (e.g., the generated HTML relies on this file as the CSS file).

'--tab'
'-t'
     With this option, tab characters will be converted into the
     specified number of space characters (tabulation points will be
     preserved).  This option is automatically selected when generating
     line numbers.

'--style-file'
'--style-css-file'
     Specify the file that source-highlight will use to produce (i.e.,
     format) the output (e.g., colors and other styles for each language
     element).  The formats of these files are detailed in *note Output
     format style:: and in *note Output format style using CSS::,
     respectively.

'--style-defaults'
     Specify the file that contains the default styles for elements
     whose styles are not found in the style file (see *note Default
     Styles:: for further details).

'--output-dir'
     You can pass to source-highlight more than one input file (see
     *note Simple Usage::).  In this case you cannot specify the output
     file name.  In such cases the output files will be automatically
     generated into the directory where you invoked the command from; if
     you want the output files to be generated into a different
     directory you can use this option.

'--infer-lang'
     Force the inference mechanism for detecting the input language.
     This is detailed in *note How the input language is discovered::.

'--line-number'
     Line numbers will be generated in the output, using the (optional)
     specified padding character(2) (the default padding character is
     '0').

'--line-number-ref'
     Like '--line-number', this option numbers all the output lines, and,
     additionally, generates an anchor for each line.  The anchor
     consists of the specified prefix (default is 'line') and the line
     number (e.g., 'line25').  For instance, as prefix, if you deal with
     many files, you can use the file name.  Note that some output
     languages might not support this feature (e.g., 'esc', since it
     makes no sense in such case).  See *note Anchors and References::
     for defining how to generate an anchor in a specific output
     language.

'--line-range'
'--range-context'
'--range-separator'
     Since version 2.11, you can specify multiple line ranges: only the
     lines in the source that are in these ranges will be output.  For
     instance, by specifying

          --line-range="-5","10","20-25","50-"

     Only the following lines will be output: the first 5 lines, line
     10, lines 20 to 25 and from line 50 to the end of input.  (See also
     the examples in *note Line ranges::).

     Together with '--line-range', you can also specify
     '--range-context': this is the number of lines that will be printed
     before and after the lines of a range (i.e., the surrounding
     "context").  These lines will not be highlighted: they will be
     printed according to the style 'context'.  For instance, extending
     the previous example,

          --line-range="-5","10","20-25","50-" --range-context=1

     The following lines will also be output: 6, 9, 11, 19, 26, 49.
     (See also the examples in *note Line ranges (with context)::).

     Finally, you can specify a range separator line string with
     '--range-separator' that will be printed between ranges (See also
     the examples in *note Line ranges (with context)::).  The separator
     string is preformatted automatically, so, e.g., you don't have to
     escape special output characters, such as the { } in texinfo
     output.

'--regex-range'

     Ranges can be expressed also using regular expressions, with the
     command line option '--regex-range'.  In this case the beginning of
     the range will be detected by a line containing (in any point) a
     string matching the specified regular expression; the end will be
     detected by a line containing a string matching the same regular
     expression that started the range.  This feature is very useful
     when we want to document some code (e.g., in this very manual) by
     showing only specific parts that are delimited in an ad hoc way in
     the source code (e.g., with specific comment patterns).  You can
     see some usage examples in *Note Regex ranges::.

     The specified strings (this option accepts multiple occurrences)
     must be valid regular expressions (thus you must escape special
     characters accordingly), otherwise you will get an error.

     Furthermore, '--line-range' and '--regex-range' cannot coexist in
     the same command line.

'--failsafe'
     If no language specification is found, an error will be printed and
     the program exits.  With this option, instead, in such situations,
     the input is simply formatted in the output format.  This is useful
     when 'source-highlight' is used with many input files, and it is
     also used in the 'src-hilite-lesspipe.sh' script.  Actually I
     failed to find a good reason why one should not always use this
     option.  So my suggestion is to always use it when you run
     source-highlight (and indeed, in the future, this option might
     become the default one).  See also *note Using source-highlight
     with less::, *note Using source-highlight as a simple formatter::.

     When using '--failsafe', if no input language can be established,
     source-highlight will use the input language definition file
     'default.lang', which is an empty file.  You might want to
     customize such file, though.

'--debug-langdef'
'--show-regex'
     Allow you to debug a language definition file; see *note
     Debugging::.
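
   To make the '--line-range' and '--range-context' semantics described
above concrete, here is a small Python sketch that reproduces the
arithmetic of the example.  It is not the implementation used by
source-highlight; the function names are hypothetical and the input is
assumed to have 60 lines.

```python
def parse_range(spec):
    """Turn '50', '2-10', '-30' or '40-' into (first, last); last may be None."""
    if "-" not in spec:
        return int(spec), int(spec)
    first, last = spec.split("-", 1)
    return (int(first) if first else 1), (int(last) if last else None)


def select_lines(specs, total, context=0):
    """Return (lines in range, context lines) for an input of `total` lines."""
    in_range = set()
    around = set()
    for spec in specs:
        first, last = parse_range(spec)
        last = total if last is None else min(last, total)
        in_range.update(range(first, last + 1))
        # `context` lines before and after each range
        around.update(range(first - context, first))
        around.update(range(last + 1, last + context + 1))
    around = {n for n in around if 1 <= n <= total} - in_range
    return sorted(in_range), sorted(around)


# Assumed input length: 60 lines.
selected, context_lines = select_lines(
    ["-5", "10", "20-25", "50-"], total=60, context=1)
print(context_lines)  # [6, 9, 11, 19, 26, 49]
```

   The context lines computed here are exactly those listed in the
'--range-context=1' example.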

   The other command line options dealing with references are explained
in more detail in *note Generating References::.
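
   The behaviour of '--regex-range' described earlier (a range opens at
a line matching one of the expressions and closes at the next line
matching that same expression, and neither delimiter line is output)
can be sketched as follows.  Python's 're' module stands in for the
Boost regex library here, and the function name and the '/// [init]'
delimiter comment are only hypothetical examples.

```python
import re


def regex_ranges(lines, patterns):
    """Yield the lines enclosed between two lines matching the same pattern.

    The delimiter lines themselves are not yielded.
    """
    compiled = [re.compile(p) for p in patterns]
    active = None  # the pattern that opened the current range, if any
    for line in lines:
        if active is None:
            # outside a range: look for an opening delimiter
            active = next((rx for rx in compiled if rx.search(line)), None)
        elif active.search(line):
            # the same pattern again: the range is closed
            active = None
        else:
            yield line


SOURCE = [
    "int x;",
    "/// [init]",
    "x = 1;",
    "x += 2;",
    "/// [init]",
    "return x;",
]

# The brackets are escaped because the string must be a valid regex.
print(list(regex_ranges(SOURCE, [r"/// \[init\]"])))
```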

* Menu:

* How the input language is discovered::

   ---------- Footnotes ----------

   (1) As explained before, originally Source-highlight was intended
mainly for generating HTML output; this is why the term _css_ is used
for style sheets.

   (2) Padding character can be specified since version 2.8.


File: source-highlight.info,  Node: How the input language is discovered,  Prev: Invoking source-highlight,  Up: Invoking source-highlight

6.1 How the input language is discovered
========================================

As already explained, *note Simple Usage::, source-highlight uses a
language definition file according to the language specified with the
option '--src-lang', or '--lang-def', or by using the input file
extension.

   Since version 2.5, source-highlight can use an inference mechanism to
deduce the input language.  For the moment, it can detect script
languages based on the "sha-bang" mechanism, i.e., when the first line
of a script contains a line such as, e.g.,

     #!/bin/sh

   It detects script languages specified by using the 'env' program(1):

     #!/usr/bin/env perl

   It also recognizes the Emacs convention of declaring the Emacs major
mode using the format '-*- lang -*-'.

   For instance, a script starting as the following one:

     #!/bin/bash
     # -*- Tcl -*-

will be interpreted as a Tcl script, and not as a bash script.

   Finally, it recognizes '<?' specifications (e.g., '<?php' and
'<?xml') and '<!doctype' (in that case, it infers it is an xml file)(2).
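
   The checks listed so far can be summarized with the following Python
sketch.  It is an approximation written for this manual, not the
inference code of source-highlight: the regular expressions, the
function name, looking for the Emacs mode only in the first two lines,
and returning the name in lowercase are all assumptions.

```python
import re

SHEBANG = re.compile(r"^#!\s*(\S+)(?:\s+(\S+))?")
EMACS_MODE = re.compile(r"-\*-\s*([^\s*]+)\s*-\*-")
OPEN_TAG = re.compile(r"^\s*<\?(\w+)")
DOCTYPE = re.compile(r"^\s*<!doctype", re.IGNORECASE)


def infer_language(lines):
    """Guess a language name from the first lines of a file, or return None."""
    head = lines[:2]
    if not head:
        return None
    # An Emacs mode declaration wins over the sha-bang line.
    for line in head:
        mode = EMACS_MODE.search(line)
        if mode:
            return mode.group(1).lower()
    first = head[0]
    shebang = SHEBANG.match(first)
    if shebang:
        interpreter = shebang.group(1).rsplit("/", 1)[-1]
        if interpreter == "env" and shebang.group(2):
            interpreter = shebang.group(2)  # '#!/usr/bin/env perl'
        return interpreter.lower()
    tag = OPEN_TAG.match(first)
    if tag:
        return tag.group(1).lower()  # '<?php', '<?xml'
    if DOCTYPE.match(first):
        return "xml"
    return None


print(infer_language(["#!/bin/bash", "# -*- Tcl -*-"]))  # tcl
print(infer_language(["#!/usr/bin/env perl"]))           # perl
```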

   This inference mechanism is performed, by default, in case the input
language is neither explicitly specified nor found in the language map
file by using the input file extension or the filename itself, possibly
also the lowercase version (the input file may also have no extension at
all, but, for instance, a 'ChangeLog' input file will be highlighted
using 'changelog.lang').

   Furthermore, this mechanism can be given priority with the command
line option '--infer-lang'.  For instance, this is used in the script
'src-hilite-lesspipe.sh' (*note Using source-highlight with less::) when
running source-highlight, in order to avoid the problem of formatting a
Perl script as a Prolog program (since the extension '.pl' is associated
to Prolog programs in the language map file, *note Perl::).
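
   The order in which the language is chosen can be pictured as below.
This Python sketch only mirrors the description given in this section:
the function name is hypothetical, the 'perl' map entry is illustrative,
the 'inferred' argument stands for the result of the inference
mechanism, and trying the extension before the file name is an
assumption.

```python
import os


def choose_definition(filename, lang_map, src_lang=None, inferred=None,
                      infer_lang=False):
    """Return the language definition file for filename, or None."""
    # --infer-lang gives the inference mechanism priority
    if infer_lang and inferred in lang_map:
        return lang_map[inferred]
    # an explicit --src-lang
    if src_lang is not None:
        return lang_map.get(src_lang)
    # the extension, the file name, or its lowercase version
    base = os.path.basename(filename)
    extension = os.path.splitext(base)[1].lstrip(".")
    for key in (extension, base, base.lower()):
        if key in lang_map:
            return lang_map[key]
    # by default, inference is only the last resort
    if inferred in lang_map:
        return lang_map[inferred]
    return None


# Illustrative map entries.
LANG_MAP = {
    "pl": "prolog.lang",
    "perl": "perl.lang",
    "changelog": "changelog.lang",
}

print(choose_definition("script.pl", LANG_MAP, inferred="perl"))
print(choose_definition("script.pl", LANG_MAP, inferred="perl",
                        infer_lang=True))
```

   With these entries the first call selects the Prolog definition
through the '.pl' extension, while the second one, with the inference
given priority, selects the Perl definition.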

   ---------- Footnotes ----------

   (1) Since version 2.7.

   (2) Since version 3.1.2.


File: source-highlight.info,  Node: Language Definitions,  Next: Output Language Definitions,  Prev: Invoking source-highlight,  Up: Top

7 Language Definitions
**********************

Since version 2.0 source-highlight uses a specific syntax to specify
source language elements (e.g., keywords, strings, comments, etc.).
Before version 2.0, language elements were scanned through Flex.  This
had the drawback of writing a new flex file to deal with a new language;
even worse, a new language could not be added "dynamically": you had to
recompile the whole source-highlight program.

   Instead, now, language elements are specified in a file, loaded
dynamically, through a (hopefully) simple syntax.  Then, these
definitions are used internally to create, on-the-fly, regular
expressions that are used to highlight the elements (see also *note How
source-highlight works::).  In particular, we use the regular
expressions provided by the Boost library (see *note Installation::).
Thus, when writing a language definition file you will surely have to
deal with regular expressions.  Don't be scared: for most of the
languages you may never have to deal with difficult regular expressions,
and you can also specify language keywords (such as, e.g., "if",
"while", etc., see *note Simple definitions::); moreover, for defining
delimited language elements you will not have to write a regular
expression, but just the delimiters (see *note Delimited definitions::).
However, there might be some language definitions that may require heavy
use of more involved regular expressions (e.g., Perl, just to mention
one).

   Of course, we use the Boost regex library regular expression syntax.
We refer to Boost documentation for such syntax,
<http://www.boost.org/libs/regex/doc/syntax.html>, however, in *note
Notes on regular expressions::, we provide some notes on regular
expressions that might be helpful for those who never dealt with them.
By default, Boost regex library uses Perl regular expression syntax,
and, at the moment, this is the only syntax supported by
source-highlight.

   Here, we see such syntax in detail, by relying on many examples.
This allows a user to easily modify an existing language definition and
create a new one.  These files have, typically, extension '.lang'.

   Each definition basically associates a regular expression to a
language element and defines a name for the language element.  Such name
will be used to associate a particular style (e.g., bold face, color,
etc.)  when highlighting such elements.  You cannot use names that are
the same as keywords used in the language definition syntax (e.g.,
'start', as shown later, is a reserved word).

   Comments can be given by using '#'; the rest of the line is
considered as a comment.

   Source-highlight will scan each line of the input file separately.
So a regular expression that tries to match new line characters is
destined to fail.  However, the language definition syntax provides
means to deal with multiple lines (see *note Delimited definitions:: and
*note State/Environment Definitions::).
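
   The effect of scanning one line at a time can be seen in the
following Python sketch, where Python's 're' module stands in for the
Boost regex library and the sample text and expression are made up for
the illustration.  An expression that needs a newline character matches
the text as a whole, but never matches when each line is scanned on its
own.

```python
import re

source = "/* a comment\n   spanning two lines */\nint x;\n"

# This expression requires a newline between '/*' and '*/'.
multi_line = re.compile(r"/\*.*\n.*\*/")

# Searching the whole text finds the comment.
print(bool(multi_line.search(source)))

# Scanning line by line, no single line contains a newline to match.
per_line = [bool(multi_line.search(line)) for line in source.splitlines()]
print(per_line)
```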

* Menu:

* Ways of specifying regular expressions::
* Simple definitions::
* Line wide definitions::
* Order of definitions::
* Delimited definitions::
* Variable definitions::
* Dynamic Backreferences::
* File inclusion::              Include the contents of another file
* State/Environment Definitions::
* Explicit subexpressions with names::
* Redefinitions and Substitutions::
* How source-highlight works::
* Notes on regular expressions::
* The program check-regexp::
* Listing Language Elements::
* Concluding Remarks::
* Debugging::                   Debug a language definition file
* Tutorials on Language Definitions::


File: source-highlight.info,  Node: Ways of specifying regular expressions,  Next: Simple definitions,  Prev: Language Definitions,  Up: Language Definitions

7.1 Ways of specifying regular expressions
==========================================

Before getting into the details of the language definition syntax, it
is crucial to describe the 3 possible ways of specifying a regular
expression string.  These 3 ways basically differ in how they handle
regular expression special characters, such as, e.g., parentheses.  For
this reason, one mechanism can be more powerful than another one, but it
could also require more attention; furthermore, there can be situations
where you're forced to use only one mechanism, since the other ones
cannot accomplish the required goal.

'"expression"'

     If you use double quotes (note, '"' and not '``' or '''') to
     specify a regular expression, then basically all the characters,
     but the alternation symbol, i.e., the pipe symbol '|', are
     considered literally, and thus will be automatically escaped (e.g.,
     a dot '.' is interpreted as the character '.' not as the regular
     expression wild card).  Thus, for instance, if you specify

          "my(regular)ex.pre$$ion{*}"

     source-highlight will automatically transform it into

          my\(regular\)ex\.pre\$\$ion\{\*\}

     The special character '|', unless it is meant to separate two
     alternatives (*note Simple definitions::), must be escaped with the
     character '\', e.g., '\|'.  Also the character '\', if it is
     intended literally, must be escaped, e.g., '\\'.

''expression''

     If you want to enjoy (almost) the full power of regular
     expressions, you need to use single quoted strings ('''), instead
     of double quoted strings.  This way, you can specify special
     characters with their intended meaning.

     However, marked subexpressions are automatically transformed into
     non-marked subexpressions, i.e., the parts in the expression of the
     shape '(...)' will be transformed into '(?:...)' (as explained in
     *note Notes on regular expressions::, '(?:...)' lexically groups
     part of a regular expression, without generating a marked
     sub-expression).

     Thus, for instance, if you specify

          'my(regular)ex.pre$ion*'

     source-highlight will automatically transform it into

          my(?:regular)ex.pre$ion*

     Since marked subexpressions cannot be specified with this syntax,
     then _backreferences_ (see *note Notes on regular expressions::)
     are not allowed.

'`expression`'

     This syntax(1) (note the difference, this one uses the _backtick_
     '`' while the previous one uses ''') for specifying a regular
     expression was introduced to overcome the limitations of the other
     two syntaxes.  With this syntax, the marked subexpressions are not
     transformed, and so you can use regular expressions mechanisms that
     rely on marked subexpressions, such as _backreferences_ and
     _conditionals_ (see *note Notes on regular expressions::).

     This syntax is also crucial for highlighting specific program parts
     of some programming languages, such as, e.g., Perl regular
     expressions (e.g., in substitution expressions) that can be
     expressed in many forms; in particular, the separator between the
     part to be replaced and the replacement can be any non
     alphanumerical character(2), for instance,

          s/foo/bar/g
          s|foo|bar|g
          s#foo#bar#g
          s@foo@bar@g

     Using this syntax, and backreferences, we can easily define a
     single language element to deal with these expressions (without
     specifying all the cases for each possible non alphanumerical
     character):

          regexp = `s([^[:alnum:][:blank:]]).*\1.*\1[ixsmogce]*`
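
   The role of the backreference '\1' in the definition above can be
tried out with Python's 're' module.  Since 're' does not support POSIX
classes such as '[:alnum:]' and '[:blank:]', this sketch replaces them
with approximately equivalent explicit character sets; it is a
translation for illustration and not the expression that
source-highlight passes to Boost.

```python
import re

# Python rendition of  s([^[:alnum:][:blank:]]).*\1.*\1[ixsmogce]*
# The group captures the separator; \1 requires the very same character.
perl_subst = re.compile(r"s([^A-Za-z0-9 \t]).*\1.*\1[ixsmogce]*")

for text in ["s/foo/bar/g", "s|foo|bar|g", "s#foo#bar#g", "s@foo@bar@g"]:
    match = perl_subst.fullmatch(text)
    print(text, "separator:", match.group(1))

# Mixed separators are rejected, since \1 must repeat the first one.
print(perl_subst.fullmatch("s/foo|bar/"))
```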

   Since version 2.11, in all kinds of regular expression specification,
you can insert newline characters, which will simply be ignored.  Thus,
e.g., the file:

     # test_newlines.lang
     # test that newlines in expressions are simply discarded

     keyword = "foo
     |
     lang"

     (keyword,normal,classname) =
       `(\<struct)
     ([[:blank:]]+)
     ([[:alnum:]_]+)`

     preproc = '^[[:blank:]]*
     #([[:blank:]]*
     [[:word:]]*)'

and the file:

     # test_nonewlines.lang
     # test that newlines in expressions are simply discarded
     # see the corresponding test_newlines.lang

     keyword = "foo|lang"

     (keyword,normal,classname) = `(\<struct)([[:blank:]]+)([[:alnum:]_]+)`

     preproc = '^[[:blank:]]*#([[:blank:]]*[[:word:]]*)'

are equivalent.  However, the former is surely more readable.

   Note however, that space characters are NOT ignored in regular
expression definitions.
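
   The three quoting forms and the handling of newlines described in
this section can be summarized by the following Python sketch.  It only
reproduces the transformations shown in the examples above: the function
name 'to_regex' is hypothetical, the set of characters that get escaped
is an assumption, and parentheses inside bracket expressions are not
treated specially.

```python
# Assumed set of regular expression special characters ('|' is excluded).
SPECIAL = set(".^$*+?()[]{}")


def to_regex(definition):
    """Turn a quoted definition string into the resulting regex string."""
    quote, body = definition[0], definition[1:-1]
    body = body.replace("\n", "")  # newlines are ignored, spaces are kept
    if quote == "`":
        return body  # backticks: the expression is used as it is
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            out.append(body[i:i + 2])  # keep escapes such as \| and \\
            i += 2
            continue
        if quote == '"' and ch in SPECIAL:
            out.append("\\" + ch)  # double quotes: taken literally
        elif quote == "'" and ch == "(" and body[i + 1:i + 2] != "?":
            out.append("(?:")  # single quotes: no marked subexpressions
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(to_regex('"my(regular)ex.pre$$ion{*}"'))
print(to_regex("'my(regular)ex.pre$ion*'"))
print(to_regex('"foo\n|\nlang"'))
```

   The three calls print the transformed strings shown earlier in this
section, and the last one shows that the definition split over several
lines yields the same expression as '"foo|lang"'.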

   ---------- Footnotes ----------

   (1) Since version 2.7.

   (2) This issue concerning Perl regular expression syntax was raised
by Elias Pipping, and this also pushed me to deal with this more
powerful syntax that permits using backreferences, for instance.
Although we're still far from highlighting Perl syntax completely (*note
Perl::), I definitely must thank Elias for his precious information
about this matter :-)


File: source-highlight.info,  Node: Simple definitions,  Next: Line wide definitions,  Prev: Ways of specifying regular expressions,  Up: Language Definitions

7.2 Simple definitions
======================

The simplest way to specify language elements is to list the possible
alternatives.  This is the case for keywords; for instance, in
'java.lang' you have:

     keyword = "abstract|assert|break|case|catch|class|const",
               "continue|default|do|else|extends|false|final",
               "finally|for|goto|if|implements|instanceof|interface"
     keyword = "native|new|null|private|protected|public|return",
               "static|strictfp|super|switch|synchronized|throw",
               "throws|true|this|transient|try|volatile|while"

   You can separate quoted definitions with commas.  Alternatively,
within a quoted definition, alternatives can be separated with the pipe
symbol '|'.  The above definition defines the language element
'keyword'.  Each time such an element is found in the source file, it
is highlighted with the style of the element with the same name in the
output format style file.  (All the elements shown in these examples
are taken from the language definition files that come with
source-highlight, and there is a style for each of them; see *note
Configuration files::.)  If an element is not specified in the output
format style file, it is highlighted with the style 'normal', i.e., it
is effectively not highlighted (*note Configuration files::), so pay
attention to typos :-).

   From the above example you may have noticed that language element
definitions are cumulative, so the second 'keyword' definition does not
replace the first one.  (In some cases you may actually want to
redefine a language element; this is possible, as explained in *note
Redefinitions and Substitutions::).

   Note that words specified in double quotes have to match exactly in a
source file, and they must be isolated (not surrounded by anything but
spaces).  Thus for instance 'class' is matched as a keyword, but in
'my_class' the substring 'class' is not matched as a keyword.  From the
point of view of regular expressions, a string such as 'class' in a
double-quoted simple definition is interpreted as '\<(class)\>'.
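
   The effect of the word-boundary anchors can be tried out with the
following Python sketch.  It is an approximation: Python's 're' module
has no '\<' and '\>', so '\b' is used as a stand-in, and the variable
names are hypothetical.

```python
import re

# Models the double-quoted definition "class", i.e. \<(class)\>.
isolated = re.compile(r"\b(class)\b")
# Models a plain regular expression without word-boundary anchors.
anywhere = re.compile(r"class")

for text in ("public class Foo", "my_class = 1"):
    print(text, bool(isolated.search(text)), bool(anywhere.search(text)))
```

Since '_' counts as a word character, there is no word boundary
between 'my_' and 'class', so the anchored form does not match inside
'my_class'.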

   Special characters have to be escaped with the character '\'.  So for
instance if you want to specify the character '|', which is normally
used to separate alternatives in double quoted strings, you have to
specify '\|'.

   As explained in *note Ways of specifying regular expressions::,
definitions in double quotes are interpreted literally (thus, e.g., a
dot '.' is interpreted as the character '.' not as the regular
expression wild card).  If you want to enjoy the full power of regular
expressions to specify a language alternative, you have to use single
quoted strings ('''), instead of double quoted strings, or strings
quoted with backticks ('`').

   For instance, the following is the definition for a preprocessor
directive in C/C++:

     preproc = '^[[:blank:]]*#([[:blank:]]*[[:word:]]*)'

   Note that the definition ''class'' is different from '"class"', as
explained above.  Thus, for instance ''class'' matches also the
sub-expression 'class' inside 'my_class'.

   Furthermore, you are not allowed to specify, in the same list, double
quoted strings and single quoted strings: you need to split such list
definitions.  Thus, for instance, the following definition is wrong:

     preproc = "#define",'^[[:blank:]]*#([[:blank:]]*[[:word:]]*)'

   while the following one is correct:

     preproc = "#define"
     preproc = '^[[:blank:]]*#([[:blank:]]*[[:word:]]*)'

   Finally, at the end of a list of definitions, one can specify the
keyword 'nonsensitive'; in that case, the specified strings will be
matched in a case-insensitive way.  For instance, we use this feature
in the Pascal language definition, 'pascal.lang', where keywords are
case insensitive:

     keyword = "alfa|and|array|begin|case|const|div",
           "do|downto|else|end|false|file|for|function|get|goto|if|in",
           "label|mod|new|not|of|or|pack|packed|page|program",
           "put|procedure|read|readln|record|repeat|reset|rewrite|set",
           "text|then|to|true|type|unpack|until|var|while|with|writeln|write"
       nonsensitive
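
   As a rough analogy only (source-highlight builds Boost regular
expressions, not Python ones), case-insensitive matching of a few of
the Pascal keywords above can be sketched as follows; the name
'KEYWORD' is hypothetical.

```python
import re

# A small subset of the Pascal keywords, matched regardless of case.
KEYWORD = re.compile(r"\b(?:begin|end|program)\b", re.IGNORECASE)

for text in ("BEGIN", "Program demo;", "beginning"):
    match = KEYWORD.search(text)
    print(text, match.group(0) if match else None)
```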


File: source-highlight.info,  Node: Line wide definitions,  Next: Order of definitions,  Prev: Simple definitions,  Up: Language Definitions

7.3 Line wide definitions
=========================

It is often useful to define a language element that affects all the
remaining characters up to the end of the line.  For such definitions,
instead of the '=' you must use the keyword 'start'.  For instance, the
following is the definition of a single line comment in C++:

     comment start "//"

   This means that when the two characters '//' are encountered in the
source file, everything from these characters on, up to the end of the
line, will be highlighted according to the style 'comment'.


File: source-highlight.info,  Node: Order of definitions,  Next: Delimited definitions,  Prev: Line wide definitions,  Up: Language Definitions

7.4 Order of definitions
========================

The order of language definitions is important, since it is used during
regular expression matching (this is detailed in *note How
source-highlight works::).  You have to make sure that, if there are
definitions that start with the same characters, the longest expression
is specified first in the file.  For instance, if you write

     symbol = "/"
     comment start "//"

   The first expression will always be matched first, and the second
expression will never be matched.  The right order is

     comment start "//"
     symbol = "/"


File: source-highlight.info,  Node: Delimited definitions,  Next: Variable definitions,  Prev: Order of definitions,  Up: Language Definitions

7.5 Delimited definitions
=========================

Many elements are delimited by specific character sequences.  For
instance, strings and multiline comments.  The syntax for such an
element definition is

     <name> delim <left delimiter> <right delimiter> \
             {escape <escape character>} \
             {multiline} {nested}

   The 'escape' statement specifies the escape character that may
precede one of the delimiters inside the element.  This is optional.

   For instance, this is the definition of C-like strings:

     string delim "\"" "\"" escape "\\"

   Note that '\' is a special character in definitions, so it has to be
escaped.  If the 'escape' specification were omitted, the C string
'"write \"hello\" string"' would be highlighted incorrectly (it would be
highlighted as the string '"write \"', the normal character sequence
'hello\' and the string '" string"').
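
   The role of the escape character can be illustrated with a small
Python scanner.  This is a sketch written for this explanation, not the
code used by source-highlight; the function 'string_end' and its
parameters are hypothetical.

```python
def string_end(text, start, delim='"', escape=None):
    """Return the index just past the closing delimiter, or -1 if none."""
    i = start + len(delim)
    while i < len(text):
        if escape and text.startswith(escape, i):
            i += len(escape) + 1   # skip the escape and the escaped character
        elif text.startswith(delim, i):
            return i + len(delim)
        else:
            i += 1
    return -1


source = r'"write \"hello\" string"'

# Without an escape character the string ends at the first inner quote.
print(source[:string_end(source, 0)])
# With the escape character the whole literal is one string.
print(source[:string_end(source, 0, escape="\\")])
```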

   The option 'multiline' specifies that the element can span multiple
lines.  For instance, PHP strings are defined as follows:

     string delim "\"" "\"" escape "\\" multiline

   The option 'nested' tells source-highlight to count multiple
occurrences of the delimiters and to pair each closing delimiter with
the corresponding opening one (using a stack).  For instance, if we
wanted to highlight C-like multiline comments in a nested way(1), we
could use the following definition:

     comment delim "/*" "*/" multiline nested

   If 'nested' were not used, then the first '*/' of the following
nested comment would conclude the comment (and the second '*/' would not
be highlighted as a comment):

     /*
        This is a /* nested comment */
     */
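
   The difference can be reproduced with the following Python sketch.
It uses a depth counter instead of an explicit stack, which is enough
when there is a single pair of delimiters; the function 'comment_end'
is hypothetical and only models the behavior described above.

```python
def comment_end(text, start, left="/*", right="*/", nested=False):
    """Return the index just past the delimiter closing the comment at start."""
    depth = 0
    i = start
    while i < len(text):
        # The opening delimiter at `start` always counts; inner ones only
        # count when nesting is enabled.
        if text.startswith(left, i) and (nested or depth == 0):
            depth += 1
            i += len(left)
        elif text.startswith(right, i):
            depth -= 1
            i += len(right)
            if depth == 0:
                return i
        else:
            i += 1
    return -1


text = "/*\n   This is a /* nested comment */\n*/"

print(repr(text[:comment_end(text, 0)]))               # stops at the first */
print(repr(text[:comment_end(text, 0, nested=True)]))  # covers the whole text
```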

   Note that, in order for a delimited language element to be nested,
its starting and ending elements must be different; thus, for instance,
the following definition is not correct:

     string delim "\"" "\"" nested # WRONG!

   As said above, definitions are cumulative, even when they use
different syntactic forms.  Thus, for instance, the complete definition
of C++-style comments is the following (actually, the definition of
C-style comments is more involved, see the file 'c_comment.lang'):

     comment start "//"
     comment delim "/*" "*/" multiline

   ---------- Footnotes ----------

   (1) As Ed Kelly correctly pointed out, C-style comments are NOT
nested; it's a big shame I've been using C++ and Java for years and have
always thought they were nested :-)...  Thus, in previous versions of
source-highlight distributions, C-style comments were (incorrectly)
defined as nested.  Thank you Ed, for your feedback!


File: source-highlight.info,  Node: Variable definitions,  Next: Dynamic Backreferences,  Prev: Delimited definitions,  Up: Language Definitions

7.6 Variable definitions
========================

It is possible to define variables to be re-used in many parts in a
language definition file.  A variable is defined by using

   'vardef' <name of the variable> '=' <list of definitions>

   Once defined, a variable can be used by prepending the symbol '$' to
its name.  For instance,

     vardef FUNCTION = '(?:[[:alpha:]]|_)[[:word:]]*(?=[[:blank:]]*\()'
     function = $FUNCTION

   The capital letters are used only for readability.

   It is also possible to concatenate variables and expressions, and
reuse variables inside further variable definitions:

     vardef basic_time = '[[:digit:]]{2}:[[:digit:]]{2}:[[:digit:]]{2}'
     vardef time = '\<' + $basic_time + '\>'
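
   Concatenation works like ordinary string concatenation of the
underlying expressions.  The Python sketch below mirrors the two
'vardef' lines under two assumptions: '[0-9]' stands in for
'[[:digit:]]' and '\b' stands in for '\<' and '\>', since Python's 're'
module supports neither.

```python
import re

basic_time = r"[0-9]{2}:[0-9]{2}:[0-9]{2}"
time = r"\b" + basic_time + r"\b"   # same shape as '\<' + $basic_time + '\>'

for text in ("at 12:34:56 sharp", "x12:34:56"):
    match = re.search(time, text)
    print(text, match.group(0) if match else None)
```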


File: source-highlight.info,  Node: Dynamic Backreferences,  Next: File inclusion,  Prev: Variable definitions,  Up: Language Definitions

7.7 Dynamic Backreferences
==========================

With _dynamic backreferences_ you can refer to a string matched by the
regular expression of the first element of a 'delim' specification(1).
I called these backreferences dynamic in order to distinguish them from
the backreferences of regular expression syntax, *note Ways of
specifying regular expressions::.  This is crucial in cases when the
right delimiter depends on a subexpression matched by the left
delimiter; for instance, Lua comments can be of the shape '--[[ comment
]]' or '--[=[ comment ]=]', but neither '--[=[ comment ]]' nor '--[[
comment ]=]' (furthermore, they can be nested)(2).  Thus, the regular
expression of the right element depends on the one of the left element.

   A dynamic backreference is similar to a variable (*note Variable
definitions::), but it needs no declaration, and it has the form

     @{number}

where 'number' is the number of the marked subexpression in the left
delimiter (source-highlight will actually check that such a marked
subexpression exists in the left delimiter).

   For instance, this is the definition of Lua comments (see also
'lua.lang'):

     environment comment delim `--\[(=*)\[` "]" + @{1} + "]"
                 multiline nested begin
       include "url.lang"
       ...
     end

Note how the left delimiter can match any number of '=' characters
(possibly none) as a marked subexpression, and the right delimiter
refers to what was matched with @{1}.

   Source-highlight will take care of escaping possible special
characters during dynamic backreference substitutions.  For instance,
suppose that you must substitute '|' for @{1}, because we matched '|'
with the subexpression '[^[:alnum:]]' in a delim element like the
following one:

     comment delim `([^[:alnum:]])` @{1}

Since '|' is a special character in regular expression syntax,
source-highlight will actually replace '@{1}' with '\|'.
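
   The idea of building the right delimiter from what the left
delimiter matched, with escaping, can be sketched in Python as follows.
This only models the substitution step for a non-nested Lua comment; it
is not how source-highlight is implemented, and the names 'LEFT' and
'right_delimiter' are hypothetical.

```python
import re

# Left delimiter of a Lua long comment, with one marked subexpression.
LEFT = re.compile(r"--\[(=*)\[")


def right_delimiter(match):
    """Build "]" + @{1} + "]" and escape it so it is matched literally."""
    return re.compile(re.escape("]" + match.group(1) + "]"))


text = "--[=[ a ]] b ]=] c"
opening = LEFT.match(text)
closing = right_delimiter(opening).search(text, opening.end())
print(text[:closing.end()])

# Escaping matters when the substituted text is special in regex syntax.
print(re.escape("|"))
```

Here the inner ']]' is skipped because the left delimiter matched one
'=', so only ']=]' closes the comment.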

   IMPORTANT: the right delimiter can only refer to subexpressions of
its left delimiter; thus, in case of nested delim element definitions
(e.g., in states or environment, *note State/Environment Definitions::),
the left delimiter acts as a binder and hides possible subexpressions
defined in outer delim elements.

   This is crucial to correctly match nested delimited elements with
backreferences: source-highlight will correctly recognize this nested
(and syntactically correct) Lua comment:

     --[[
       first level comment
       --[=[
         second level
          --[[
            third level
         ]]
       ]=]
     ]]

   ---------- Footnotes ----------

   (1) Since version 2.8

   (2) I'm grateful to Jurgen Hotzel for raising this issue about Lua
comments; this led me to introduce dynamic backreferences.


File: source-highlight.info,  Node: File inclusion,  Next: State/Environment Definitions,  Prev: Dynamic Backreferences,  Up: Language Definitions

7.8 File inclusion
==================

It is possible to include other language definition files into another
file.  This inclusion physically inserts the contents of the included
file into the current file during parsing, at the exact point of
inclusion (just like '#include' in C/C++).  This is useful for re-using
definitions in many files.  For instance, C++ comment definitions are
given in a file 'c_comment.lang', and this file is included in the Java
and C++ definition files.  The same happens for numbers and functions.
For instance, the file 'java.lang' contains the following include
instructions:

     include "c_comment.lang"

     include "number.lang"

     keywords ...

     include "function.lang"

   Note that the order of inclusion is crucial since the order of
definition is crucial.  If the function definition were included before
the keyword definitions, then the sentence 'if (exp)' would be
highlighted as a function invocation (see *note Order of definitions::
and *note How source-highlight works::).


File: source-highlight.info,  Node: State/Environment Definitions,  Next: Explicit subexpressions with names,  Prev: File inclusion,  Up: Language Definitions

7.9 State/Environment Definitions
=================================

Sometimes you want some source elements to be highlighted only if they
are surrounded by other elements.  Source-highlight language definitions
also provide this feature:

     state|environment <standard definition> begin
       <other definitions>
     end

   This structure is recursive (so other state/environment definitions
can be given within a state/environment).  The meaning of a
state/environment is that the definitions within the 'begin ... end' are
matched only if the definitions that define the state/environment have
been matched.  When entering a state/environment, however, the
definitions given outside the state/environment are not matched.  The
difference between 'state' and 'environment' is that in the latter,
normal parts of the source language (i.e., those that do not match any
definition) are highlighted according to the style of the definition
that defines the environment.

   As an example, the following defines the multiline nested C comment,
and highlights URL and e-mail addresses only when they appear inside a
comment (note that this uses file inclusion):

     environment comment delim "/*" "*/" multiline nested begin
           include "url.lang"
     end

   Note that we used 'environment' because everything else inside a
comment has to be formatted according to the comment style.

   While for programming language definitions states/environments can be
avoided (although they allow you to highlight some parts only if inside
a specific environment, e.g., URLs inside comments, or documentation
tags in Javadoc comments), they are pretty important for highlighting
files such as logs and ChangeLog files, since elements have to be
highlighted when they appear in a specific position.  For instance, for
ChangeLog (see 'changelog.lang'), we use a state for highlighting the
date, name, e-mail or URL (taken from 'url.lang'):

     state date start '[[:digit:]]{2,4}-?[[:digit:]]{2}-?[[:digit:]]{2}' begin
       include "url.lang"
       name = '([[:word:]]|[[:punct:]])+'
     end

   Note that definitions that appear inside a state/environment have the
same scope as the expressions that define the environment.  While this
makes sense for 'start' and 'delim' definitions, it may make less sense
for simple definitions (i.e., those that simply list all possible
expressions): in fact, in this case, such expressions do not define a
scope.  For such definitions, the semantics of state/environment is that
the state/environment starts after matching one of the alternatives.
And where will it end?  In this case you must explicitly exit the
environment.  For instance, you can say that a specific language
definition inside a state/environment, when matched, also exits the
environment, with the keyword 'exit' (you can also specify the number of
states to exit).  You can even exit all the environments with 'exitall'.
For instance, the following definition highlights a non-empty string
following a web method:

     vardef non_empty = '[^[:blank:]]+'

     state webmethod = "OPTIONS|GET|HEAD|POST|PUT|DELETE",
               "TRACE|CONNECT|PROPFIND|MKCOL|COPY|MOVE|LOCK|UNLOCK" begin
       string = $non_empty exit
     end

   If you ever need such advanced features, you may want to take a look
at the 'log.lang' definition file that defines highlighting for several
log files (access logs, Apache logs, etc.).  Moreover, there might be
cases (and the one above is one of them) where explicit subexpressions
with names will be enough (see *note Explicit subexpressions with
names::).

   We conclude this section with an interesting example: comments in M4
files can start with the 'dnl' keyword (up to the end of line), e.g.,

     dnl @synopsis AC_CTAGS_FLAGS

   Now if we want to highlight the 'dnl' as a keyword, and the rest of
the line as a comment, we cannot simply rely on an environment, since
this would highlight the whole line with the same style.  Moreover, we
want to highlight elements starting with '@' differently, so we actually
need a state (this would also allow us to highlight URLs inside a
comment just like in C++ comments in the example above).  Thus, we need
to simulate an environment with a state, and we do this for M4 as
follows (see the file 'm4.lang'):

     state keyword start "dnl" begin
       # avoid spaces in front of urls or @[[:alpha:]]+ be captured as prefixes
       comment = '[[:blank:]]+'
       include "url.lang"
       include "html_simple.lang"
       type = '@[[:alpha:]]+'
       # avoid non-word characters not include in urls etc in front of urls etc
       # be captured prefixes
       comment = '[^[:word:]]'
       # everything else is a comment
       comment = '[[:word:]]+'
     end

   Once the state is entered, every sequence of space characters is
highlighted as a comment; then we have rules for URLs and @ elements;
then any non-word character ('[^[:word:]]') not included in URLs and @
elements is highlighted as a comment; then everything else
('[[:word:]]+') is highlighted as a comment.

   One might think that a smarter way would be to have simply the
following definition (after all, why bother highlighting spaces as
comments?):

     state keyword start "dnl" begin
       include "url.lang"
       include "html.lang"
       type = '@[[:alpha:]]+'
       # avoid non-word characters not include in urls etc in front of urls etc
       # be captured prefixes
       comment = '[^[:word:]]'
       # everything else is a comment
       comment = '[[:word:]]+'
     end

   Well, with this definition spaces in front of matched URLs or @
elements would be highlighted as normal, being considered as prefixes.
This is due to how source-highlight searches for matching rules; we
refer to *note How source-highlight works:: for further details.


File: source-highlight.info,  Node: Explicit subexpressions with names,  Next: Redefinitions and Substitutions,  Prev: State/Environment Definitions,  Up: Language Definitions

7.10 Explicit subexpressions with names
=======================================

Often, you need to specify two program elements in the same regular
expression, because they are tightly related, but you also need to
highlight them differently.

   For instance, you might want to highlight the name of a class (or
interface) in a class (or interface) definition (e.g., in Java).  Thus,
you can rely on the preceding 'class' keyword which will then be
followed by an identifier.

   A definition such as

     keyword = '(\<(?:class|interface))([[:blank:]]+)([$[:alnum:]]+)'

will not produce a good final result, since the name of the class will
be highlighted as a keyword, which is not what you might have wanted:
for instance, the class name should be highlighted as a 'type'.

   Up to version 2.6, the only way to do this was to use state or
environments (*note State/Environment Definitions::) but this tended to
be quite difficult to write.

   Since version 2.7, you can specify a regular expression with marked
subexpressions and bind each of them to a specific language element (the
regular expression must be enclosed in '`', see *note Ways of specifying
regular expressions::):

     (elem1,...,elemn) = `(subexp1)(...)(subexpn)`

   Now, with this syntax, we can accomplish our previous goal:

     (keyword,normal,type) =
       `(\<(?:class|interface))([[:blank:]]+)([$[:alnum:]]+)`

This way, the 'class' (or 'interface') will be highlighted as a keyword,
the separating blank characters are formatted as 'normal', and the name
of the class as a 'type'.
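
   The pairing of element names with marked subexpressions can be
sketched in Python as follows.  This is only a model of the idea: '\b',
'[ \t]' and '[$A-Za-z0-9]' are stand-ins for '\<', '[[:blank:]]' and
'[$[:alnum:]]', and the names 'ELEMENTS', 'PATTERN' and 'highlight' are
hypothetical.

```python
import re

ELEMENTS = ("keyword", "normal", "type")
PATTERN = re.compile(r"(\b(?:class|interface))([ \t]+)([$A-Za-z0-9]+)")


def highlight(line):
    """Return (element, text) pairs, one per marked subexpression."""
    match = PATTERN.search(line)
    if match is None:
        return []
    return list(zip(ELEMENTS, match.groups()))


print(highlight("class Foo"))
print(highlight("interface Bar"))
```

The inner '(?:...)' group is non-marking, so the expression has
exactly three marked subexpressions, one for each element name.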

   Note that the number of element names must be equal to the number of
subexpressions in the expression; furthermore, at least in the current
version, the expression can contain only marked subexpressions (no
character outside is allowed) and no nested subexpressions are allowed.

   Thus, the following specifications are NOT correct:

     (keyword,symbol) = `(...)(...)(...)` # number of elements doesn't match
     (keyword,symbol) = `(...(...)...)(...)` # contains nested subexpressions
     (keyword,symbol) = `...(...)...(...)` # outside characters

   This mechanism permits expressing regular expressions for some
situations in a much more compact and probably more readable way.  For
instance, for highlighting ChangeLog parts (the optional '*' as a
symbol, the optional file name and the element specified in parentheses
as a 'file' element, and the rest as 'normal') such as

       * src/Makefile.am (source_highlight_SOURCES): correctly include
       changelog_scanner.ll

       * this is a comment without a file name

up to version 2.6, we used these two language definitions:

     state symbol start '^(?:[[:blank:]]+)\*[[:blank:]]+' begin
       state file start '[^:]+\:' begin
         normal start '.'
       end
     end

     state normal start '^(?:[[:blank:]]+)' begin
       state file start '[^:]+\:' begin
         normal start '.'
       end
     end

which can be hard to read after having written them.  Now, we can write
them more easily (see 'changelog.lang'):

     (normal,symbol,normal,file)=
       `(^[[:blank:]]+)(\*)([[:blank:]]+)((?:[^:]+\:)?)`
     (normal,file)= `(^[[:blank:]]+)((?:[^:]+\:)?)`

   Since a language element definition using explicit subexpressions
with names consists of more than one element, and thus of more than one
formatting style, it cannot be used to start an environment (what would
the default element be?); while, as seen above, they can be used to
start a state.


File: source-highlight.info,  Node: Redefinitions and Substitutions,  Next: How source-highlight works,  Prev: Explicit subexpressions with names,  Up: Language Definitions

7.11 Redefinitions and Substitutions
====================================

These two features are useful when you want to define a language by
re-using an existing language definition with some changes.  Typically
you 'include' another language definition file and you
redefine/substitute some elements.

   When you use 'redef' you replace all the previous definitions of that
language element with the new one.  The new language element definition
will be placed exactly at the point of the new definition.  We use this
feature, for instance, when we define the 'sml' language by re-using the
'caml' one: they differ only in the keywords(1).  In fact, the contents
of 'sml.lang' can be summarized as follows:

     include "caml.lang"

     redef keyword = "abstraction|abstype|and|andalso..."

     redef type = "int|byte|boolean|char|long|float|double|short|void"

   Since the new language element definition appears at the exact point
of the redefinition, such a regular expression will be matched only if
all the previous ones (the ones of the included file) cannot be matched.
This may lead to unwanted results in some cases (not in the 'sml' case
though).  In other words the following code

     keyword = "foo"
     keyword = "bar"
     type = "int"
     redef keyword = "myfoo"

is equivalent to the following one

     type = "int"
     keyword = "myfoo"

   If this is not what you want, you can use 'subst', which is similar
to 'redef' except that the new definition replaces the first previous
definition of that language element, at the exact point of that first
definition (all other previous definitions are simply erased).  That is
to say that the following code

     keyword = "foo"
     keyword = "bar"
     type = "int"
     subst keyword = "myfoo"

is equivalent to the following one

     keyword = "myfoo"
     type = "int"
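
   The two behaviors can be modelled on a plain list of rules.  The
Python sketch below is an illustration of the semantics described
above, not the actual implementation; the functions 'redef' and 'subst'
and the rule representation are hypothetical, and what 'subst' does
when there is no previous definition is left out because the text does
not say.

```python
def redef(rules, name, expr):
    """Erase every earlier definition of name; add the new one at the end."""
    return [rule for rule in rules if rule[0] != name] + [(name, expr)]


def subst(rules, name, expr):
    """Put the new definition where the first old one was; erase the others."""
    result, placed = [], False
    for rule in rules:
        if rule[0] != name:
            result.append(rule)
        elif not placed:
            result.append((name, expr))
            placed = True
    return result


rules = [("keyword", "foo"), ("keyword", "bar"), ("type", "int")]
print(redef(rules, "keyword", "myfoo"))
print(subst(rules, "keyword", "myfoo"))
```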

   It is up to you to decide which one best fits your needs.  We could
use this feature to define 'javascript' in terms of 'java', e.g.:

     include "java.lang"

     subst keyword = "abstract|break|case|catch|class|if..."

Here using 'redef' would have led to the unwanted behavior that 'if
(exp)' would have been highlighted as a function call, since the
function element definition would have come (and thus matched) before
the redefinition of 'if' as a keyword.  Another example is the
language definition for C#, which reuses the one for C/C++, *note
Highlighting C/C++ and C#::.

   ---------- Footnotes ----------

   (1) At least, to the best of my knowledge :-)


File: source-highlight.info,  Node: How source-highlight works,  Next: Notes on regular expressions,  Prev: Redefinitions and Substitutions,  Up: Language Definitions

7.12 How source-highlight works
===============================

As hinted at the beginning of *note Language Definitions::,
source-highlight uses the definitions in the language definition file to
internally create, on-the-fly, regular expressions that are used to
highlight the tokens of an input file.  Here we provide some internal
details that are crucial to understand how to write language definition
files correctly(1).

   First of all, for each element definition a highlighting rule is
created by source-highlight (even if they correspond to the same
language element); thus, each language definition file will correspond
to a list of highlighting rules.  For each line of the input file,
source-highlight will try to match all these rules against the whole
line (more formally, against the part of the line that has not been
highlighted yet).  It will not stop as soon as a highlighting rule
matches, since there might be another rule that matches "better".  Now,
everything basically reduces to the semantics of that _better match_.

   The strategy used by source-highlight is to select the first matching
rule

   * with empty prefix (or prefix containing only space characters,
     i.e., spaces or tabs) or

   * with the smallest prefix,

   where the _prefix_ of a matched rule is the part of the examined
string that precedes the matched part(2).  Thus, for instance, if we try
to match the simple regular expression '=' against the string

     i = 10;

   then the prefix is 'i ', including the space.  Following the
terminology of regular expressions, the remaining part that did not
match, i.e., ' 10;', is the _suffix_.  When source-highlight finds a
matching rule, according to the above strategy, it formats the matched
part (and the prefix as 'normal'), and then it starts searching again
for a matching rule on the suffix, until it has processed the whole
line.

   Let us explain this strategy a little bit further with an example.
Consider the following language definition file:

     # an example for explaining the strategy of source-highlight
     type = "int"
     keyword = "null"
     symbol = "="

and the following line to be highlighted:

     int i = null

   Then source-highlight performs these steps:

  1. The first matching rule is the one for 'type'; since it has an
     empty prefix, there's no need to look any further: it highlights
     'int' as 'type'; the remaining part to be processed is now ' i =
     null';

  2. the first matching rule is the one for 'keyword', with the prefix '
     i = '; since the prefix is not empty (nor does it contain only
     spaces), we inspect other rules;

  3. the next matching rule is the one for 'symbol', with prefix ' i ',
     which is smaller than the one for 'keyword', and since there are no
     other matching rules, the one for 'symbol' is better, and we
     highlight '=' as symbol; the remaining part to be processed is now
     ' null';

  4. the first matching rule is the one for 'keyword', and, since it has
     a prefix with only spaces, we look no further, and we highlight
     'null' as 'keyword'.
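
   The steps above can be reproduced with a short Python model of the
strategy.  This is a sketch under stated assumptions, not the
source-highlight implementation: the function 'highlight_line' is
hypothetical, '\b' stands in for '\<' and '\>', ties between equal
prefixes go to the rule defined first, and empty matches are skipped so
that the loop always makes progress.

```python
import re


def highlight_line(rules, line):
    """rules: ordered (element, compiled regex) pairs; returns (element, text)."""
    pieces, pos = [], 0
    while pos < len(line):
        best = None
        for name, regex in rules:
            match = regex.search(line, pos)
            if match is None or match.end() == match.start():
                continue
            prefix = line[pos:match.start()]
            if prefix.strip(" \t") == "":
                best = (name, match)   # empty or blank-only prefix: stop here
                break
            if best is None or match.start() < best[1].start():
                best = (name, match)   # strictly smaller prefix is better
        if best is None:
            pieces.append(("normal", line[pos:]))
            break
        name, match = best
        if match.start() > pos:
            pieces.append(("normal", line[pos:match.start()]))
        pieces.append((name, match.group(0)))
        pos = match.end()
    return pieces


rules = [
    ("type", re.compile(r"\bint\b")),
    ("keyword", re.compile(r"\bnull\b")),
    ("symbol", re.compile(r"=")),
]
print(highlight_line(rules, "int i = null"))
```

In the second iteration both 'keyword' and 'symbol' match with a
non-blank prefix, and 'symbol' wins because its prefix is shorter, as
in step 3 above.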

   We conclude this section by showing the following language
definition, which summarizes what we said about the highlighting
strategy:

     keyword = "if|class"

     type = 'int'

     comment delim "/*" "*/"

     # thus this won't catch "/* */ /" as a regexp,
     # since comment elem definition comes first
     regexp = '/.*/.*/'

     # this won't match if ( ) as a function,
     # since keyword elem definition comes first
     function = '([[:alpha:]]|_)[[:word:]]*[[:blank:]]*\(*[[:blank:]]*\)'

     # the following order is conceptually wrong,
     # since "//" won't be highlighted as a comment, but as two symbols
     symbol = "/"
     comment start "//"

   ---------- Footnotes ----------

   (1) The strategy used by source-highlight for matching regular
expressions changed since version 2.11 (and in version 2.10 the strategy
used was not completely conceptually correct and it had a lot of
overhead).

   (2) According to the terminology of regular expressions.


File: source-highlight.info,  Node: Notes on regular expressions,  Next: The program check-regexp,  Prev: How source-highlight works,  Up: Language Definitions

7.13 Notes on regular expressions
=================================

Although we refer to Boost documentation for such syntax(1), we want to
provide here some explanations of some forms of regular expressions that
might be unknown but that are pretty useful in language definitions.

   Typically, when you need to group sub-expressions with parentheses,
but you don't want the parentheses to produce another marked
sub-expression, you can use a _non-marking parenthesis_
'(?:expression)'.  This is not necessary in the language definition
syntax: even if you use standard parentheses, source-highlight will
transform them into non-marking parentheses.

   Source-highlight translates possible _marked subexpressions_, i.e.,
those enclosed in '(' and ')', into non-marked subexpressions (i.e.,
those explained above).  Since version 2.7, if you specify the
expression inside '`' the marked subexpressions are left as such (see
also *note Ways of specifying regular expressions::).  This is useful
for _backreferences_ and _conditionals_.

   An escape character followed by a digit n, where n is in the range
1-9, is a _backreference_ that matches the same string that was matched
by sub-expression n.  For example the expression '^(a*).*\1$' will match
the string 'aaabbaaa' but not the string 'aaabba'.  Backreferences are
useful to write compact language elements, such as in the case of Perl's
substitution expressions; thus

     regexp = `s([^[:alnum:][:blank:]]).*\1.*\1[ixsmogce]*`

   will match all these forms

     s/foo/bar/g
     s|foo|bar|g
     s#foo#bar#g
     s@foo@bar@g
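
   The effect of the backreference in the expression above can be tried
outside source-highlight.  The following is only a sketch that uses
Python's 're' module rather than Boost regex; since 're' has no POSIX
classes such as '[:alnum:]', the delimiter class is spelled out with
ASCII ranges, which is an assumption of this sketch.

```python
import re

# Approximation of `s([^[:alnum:][:blank:]]).*\1.*\1[ixsmogce]*`:
# the POSIX classes are written as ASCII ranges (assumption of this sketch).
SUBST = re.compile(r"s([^A-Za-z0-9 \t]).*\1.*\1[ixsmogce]*")

samples = ["s/foo/bar/g", "s|foo|bar|g", "s#foo#bar#g", "s@foo@bar@g"]

# Group 1 captures the delimiter; \1 requires the same character again.
delimiters = [SUBST.fullmatch(s).group(1) for s in samples]
print(delimiters)

# Mixed delimiters are rejected because \1 must repeat group 1 exactly.
mixed = SUBST.fullmatch("s/foo|bar|g")
print(mixed)
```

   Whatever character follows the 's' becomes the delimiter that the
rest of the expression has to repeat.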

   A useful regular expression form is the _forward lookahead assert_,
which comes in two forms, one positive and one negative:

'(?=abc)'
     matches zero characters only if they are followed by the expression
     "abc".
'(?!abc)'
     matches zero characters only if they are not followed by the
     expression "abc".

   For instance, in the definition of a function ('function.lang') we
use the following regular expression:

     ([[:alpha:]]|_)[[:word:]]*(?=[[:blank:]]*\()

Thus after the name of a function we test, with the lookahead
'(?=[[:blank:]]*\()', whether an open parenthesis '(' (possibly preceded
by blanks) can be matched.  If it can be matched, however, we leave that
part in the input, so that the parenthesis will not be formatted the
same way as a function name (see also *note How source-highlight works::
to understand better this language element definition).
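
   That the lookahead leaves the parenthesis in the input can be seen
with a small experiment.  This is a sketch in Python's 're' module, not
the Boost engine used by source-highlight, and the POSIX classes are
replaced by ASCII approximations; the sample line is made up.

```python
import re

# Approximation of ([[:alpha:]]|_)[[:word:]]*(?=[[:blank:]]*\() with
# ASCII classes instead of POSIX ones (assumption of this sketch).
FUNCTION = re.compile(r"(?:[A-Za-z]|_)\w*(?=[ \t]*\()")

line = "result = compute (x) + offset"
match = FUNCTION.search(line)

name = match.group()        # only the identifier is consumed
rest = line[match.end():]   # the blank and "(" are still in the input
print(repr(name), repr(rest))
```

   The remaining input still starts with the blank and the '(', so
other rules (e.g., 'symbol') can format them.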

   Please, be careful when using such regular expression forms: since
part of the input is not actually removed you may end up always scanning
the same input part (thus looping) if you do not write the regular
expressions well.  For instance, consider this language definition

     state foo = '(?=foo)' begin
       foo = '(?=foo)'
     end

and the following input file:

     hello
     foo
     bar

As soon as we match the word 'foo' we leave it in the input and we enter
a state where we try to match the word 'foo', still leaving it in the
input.  As you might have guessed, this will make source-highlight loop
forever.  Probably one might have wanted to write this language
definition:

     state foo = '(?=foo)' begin
       foo = 'foo'
     end

but a cut-and-paste error had its way ;-)
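
   The reason for the loop is that a pure lookahead match has zero
length, so the scanning position never advances.  The sketch below
shows this with Python's 're' module; the 'scan' helper and its
'max_steps' guard are inventions of this example and do not reflect how
source-highlight is implemented.

```python
import re

LOOKAHEAD_ONLY = re.compile(r"(?=foo)")   # consumes nothing
CONSUMING = re.compile(r"foo")            # consumes the word


def scan(rule, text, max_steps=5):
    """Apply `rule` repeatedly and return the positions visited.

    `max_steps` is a safety guard added for this sketch only.
    """
    pos, visited = 0, []
    for _ in range(max_steps):
        match = rule.search(text, pos)
        if match is None:
            break
        visited.append(match.start())
        pos = match.end()   # equals match.start() for a pure lookahead
    return visited


stuck = scan(LOOKAHEAD_ONLY, "hello foo bar")
progress = scan(CONSUMING, "hello foo bar")
print(stuck, progress)
```

   With the lookahead-only rule every step finds the same position
again; only the guard stops the scan.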

   You can also use _Lookbehind Asserts_:

'(?<=pattern)'
     consumes zero characters, only if pattern could be matched against
     the characters preceding the current position (pattern must be of
     fixed length).
'(?<!pattern)'
     consumes zero characters, only if pattern could not be matched
     against the characters preceding the current position (pattern must
     be of fixed length).
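
   As an illustration of both lookbehind forms, here is a sketch in
Python's 're' module (which, like the forms described above, requires
fixed-length lookbehind patterns).  The sample text and the two
patterns are made up for this example.

```python
import re

# Match a name only when it is preceded by "$"; the "$" itself is not
# part of the match.
AFTER_DOLLAR = re.compile(r"(?<=\$)\w+")
# Match a whole word only when it is NOT preceded by "$" (or by another
# word character, so that we do not match the tail of a word).
NOT_AFTER_DOLLAR = re.compile(r"(?<![\w$])\w+")

text = "setTags $value other"
with_dollar = AFTER_DOLLAR.findall(text)
without_dollar = NOT_AFTER_DOLLAR.findall(text)
print(with_dollar, without_dollar)
```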

   Another advanced regular expression mechanism is that of
_conditional expressions_:

'(?(condition)yes-pattern|no-pattern)'
     attempts to match yes-pattern if the condition is true, otherwise
     attempts to match no-pattern.
'(?(condition)yes-pattern)'
     attempts to match yes-pattern if the condition is true, otherwise
     fails.

   The condition may be either a forward lookahead assert, or the
index(2) of a marked sub-expression (the condition becomes true if the
sub-expression has been matched).  For instance, the following
expression(3), which we wrote on several lines to make it more
readable,

     (?:
       (\()
       |(\[)
       |(\{)
     )
     [[:alpha:]]*
     (?:
       (?(1)
         \)
         |(?:(?(2)
           \]
           |(?:\}
     )))))

   will match '(foo)', '[foo]' and '{foo}' but not '(foo]', '{foo]' or
'{foo)'.
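
   Python's 're' module happens to support the same
'(?(index)yes-pattern|no-pattern)' form, so the claim can be checked
with the sketch below.  It is a simplified transcription, not the
expression itself: '[[:alpha:]]' is approximated by '[A-Za-z]' and the
redundant non-marking groups are dropped.

```python
import re

# Groups 1, 2 and 3 record which opening bracket was seen; the
# conditionals then demand the corresponding closing bracket.
BALANCED = re.compile(r"(?:(\()|(\[)|(\{))[A-Za-z]*(?(1)\)|(?(2)\]|\}))")

samples = ["(foo)", "[foo]", "{foo}", "(foo]", "{foo]", "{foo)"]
results = {s: BALANCED.fullmatch(s) is not None for s in samples}
print(results)
```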

   ---------- Footnotes ----------

   (1) <http://www.boost.org/libs/regex/doc/syntax.html>

   (2) the index only, without the escape character.

   (3) This expression was provided by John Maddock, the author of the
Boost regex library, as a solution of a problem I posted on the boost
list,

   <http://thread.gmane.org/gmane.comp.lib.boost.devel/158237/focus=158276>


File: source-highlight.info,  Node: The program check-regexp,  Next: Listing Language Elements,  Prev: Notes on regular expressions,  Up: Language Definitions

7.14 The program 'check-regexp'
===============================

Since version 2.7, the source-highlight package comes with a small
additional program, 'check-regexp', that permits testing regular
expressions on the command line.

   You simply pass as the first command line argument the regular
expression and then the strings you want to try to match (actually, the
program searches the string for the given regular expression, so it is
not required to match the whole string).  It is crucial, in order to
avoid shell substitutions, to enclose both the expression and the
strings in single quotes.

   The program then prints some information about the (possibly
successful) matching.  The 'what[0]' part represents the whole match,
and the 'what[i]' part represents the i-th marked subexpression that
matched.  The program also prints the prefix and suffix, if any.

   Here's an example of output of the program:

     check-regexp '(a+)(.*)\1' 'aabcdaa' 'babbbacc'

     searching      : aabcdaa
     for the regexp : (a+)(.*)\1
     what[0]: aabcdaa
       what[1]: aa
       length: 2
       what[2]: bcd
       length: 3
     total number of matches: 1

     searching      : babbbacc
     for the regexp : (a+)(.*)\1
     prefix: b
     what[0]: abbba
       what[1]: a
       length: 1
       what[2]: bbb
       length: 3
     suffix: cc
     total number of matches: 1

   And here's the example of matching parentheses we saw in *note Notes
on regular expressions:::

     check-regexp \
        '(?:(\()|(\[)|(\{))[[:alnum:]]*(?:(?(1)\)|(?:(?(2)\]|(?:\})))))' \
        '{ciao}' '(foo]' '[hithere]'

     searching      : {ciao}
     for the regexp : (?:(\()|(\[)|(\{))[[:alnum:]]*(?:(?(1)\)|(?:(?(2)\]|(?:\})))))
     what[0]: {ciao}
       what[3]: {
       length: 1
     total number of matches: 1

     searching      : (foo]
     for the regexp : (?:(\()|(\[)|(\{))[[:alnum:]]*(?:(?(1)\)|(?:(?(2)\]|(?:\})))))
     total number of matches: 0

     searching      : [hithere]
     for the regexp : (?:(\()|(\[)|(\{))[[:alnum:]]*(?:(?(1)\)|(?:(?(2)\]|(?:\})))))
     what[0]: [hithere]
       what[2]: [
       length: 1
     total number of matches: 1
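
   To make the meaning of prefix, 'what[i]' and suffix concrete, here is
a rough analogue written with Python's 're' module.  It is only a
sketch: the function 'check' and its dictionary keys are names chosen
for this example, and it is not the implementation of 'check-regexp'.

```python
import re


def check(pattern, text):
    """Search `text` for `pattern` and describe the first match.

    Rough analogue of the report printed above; the dictionary keys
    are names chosen for this sketch.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    return {
        "prefix": text[:match.start()],               # text before the match
        "what": [match.group(0), *match.groups()],    # whole match, then groups
        "suffix": text[match.end():],                 # text after the match
    }


report = check(r"(a+)(.*)\1", "babbbacc")
print(report)
```

   For the second string of the first example this gives the same
prefix, sub-matches and suffix as the output shown above.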


File: source-highlight.info,  Node: Listing Language Elements,  Next: Concluding Remarks,  Prev: The program check-regexp,  Up: Language Definitions

7.15 Listing Language Elements
==============================

In order for language definitions to be really useful they must be used
in proper combination with formatting styles (see *note Output format
style::).  However, these different files might not be developed by the
same person, or someone may simply want to customize one of them.  In
order to define good output formatting style files you should be aware
of each language element defined by a language definition file.  Instead
of having to look inside the language definition file itself (and
recursively in each included file) you can use the command line option
'--show-lang-elements'(1), which simply prints to the standard output
all the language elements that can be highlighted with a specific
language definition file.

   For instance, for 'cpp.lang' you get:

     cbracket
     classname
     comment
     function
     keyword
     label
     normal
     number
     preproc
     specialchar
     string
     symbol
     todo
     type
     url
     usertype

   while for 'log.lang' you get:

     cbracket
     comment
     date
     function
     ip
     normal
     number
     port
     string
     symbol
     time
     twonumbers
     webmethod
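
   Such a list can be compared with the elements that a style file
already formats.  The sketch below does this in Python for the
'log.lang' list above; the 'STYLED' set is hypothetical and only stands
for the contents of some style file.

```python
# Elements printed above for log.lang.
LOG_ELEMENTS = {
    "cbracket", "comment", "date", "function", "ip", "normal", "number",
    "port", "string", "symbol", "time", "twonumbers", "webmethod",
}

# Hypothetical set of elements that some style file already formats.
STYLED = {"comment", "keyword", "normal", "number", "string"}

# Elements that the language definition can produce but that the style
# does not mention.
missing = sorted(LOG_ELEMENTS - STYLED)
print(missing)
```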

   ---------- Footnotes ----------

   (1) Since version 2.4.


File: source-highlight.info,  Node: Concluding Remarks,  Next: Debugging,  Prev: Listing Language Elements,  Up: Language Definitions

7.16 Concluding Remarks
=======================

By mixing all these features you can unleash your imagination and define
highlighting for complex source languages such as Flex and Bison by
writing a few lines of code and reusing existing ones.  For instance,
Flex and Bison have their own syntax and let you write C/C++ code in
specific parts of the source language, e.g., the code between the
outermost brackets, in the following example, is C++ code, and should be
highlighted following C++ language definitions (apart from variables
that are prefixed with '$'):

     globaltags : options { if (...) { setTags( $1 ); } }

   This is easy to do (taken from 'flex.lang'):

     state cbracket delim "{" "}" multiline nested begin
       variable = '\$.'
       include "cpp.lang"
     end

   Note that, since we used 'nested', we can be sure that the C++
language definitions are no longer considered once we have matched the
last closing '}'.
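
   The bookkeeping implied by 'nested' amounts to counting how many
delimiters are still open.  The following Python sketch illustrates the
idea on the example line above; the function 'closing_brace' is
invented for this illustration, and it ignores strings and comments,
which the real language definition handles.

```python
def closing_brace(text, start):
    """Return the index of the "}" that closes the "{" at `start`.

    Sketch of the depth counting implied by `nested`; it ignores
    strings and comments, which a real definition would handle.
    """
    depth = 0
    for index in range(start, len(text)):
        if text[index] == "{":
            depth += 1          # a nested block is opened
        elif text[index] == "}":
            depth -= 1          # the innermost open block is closed
            if depth == 0:
                return index
    return -1  # unbalanced input


line = "globaltags : options { if (...) { setTags( $1 ); } }"
start = line.index("{")
end = closing_brace(line, start)
print(line[start:end + 1])
```

   Without the depth counter the first '}' would already end the state,
and the rest of the C++ code would be formatted with the outer rules.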


File: source-highlight.info,  Node: Debugging,  Next: Tutorials on Language Definitions,  Prev: Concluding Remarks,  Up: Language Definitions

7.17 Debugging
==============

When writing a language definition file, it is quite useful to be able
to debug it (by using complex regular expressions one may experience
unwanted behaviors).  Since version 2.1 the command line option
'--debug-lang' is available.  When using this option, some additional
information is printed to the standard output.

   Since version 2.5 this option also accepts a sub specification (see
*note Invoking source-highlight::).  When using 'dump' (the default) all
the additional information explained below will be dumped without
interaction with the user.  When using 'interactive', for each formatted
string the program will stop, waiting for a command from the user.  In
this very basic version of interactive debugging, the user only has to
press 'ENTER' to make the program continue until the next formatted
string.  This way, the programmer has the chance to step through the
highlighting of each part of the input file.  Moreover, when debugging
is enabled, no buffering is performed by the program, thus each
formatted element is immediately available in the output.  For instance,
you can use the command 'tail -f' to see the modifications to the output
file on-the-fly.

   When using this command line option the additional information
produced has the following format:

     <.lang filename>:<line number>
     expression: <matched subexpression>
     formatting: <source file string to be formatted>
     entering: <next state's id>
     exiting state, level: <number of states>

   The lines starting with 'entering', 'exiting' and 'exitingall' are
related to entering a new state/environment and exiting one and all
states/environments ('current state', if shown, comes after 'entering'
and prints the same state's regular expression but after the
substitution of dynamic backreferences, *note Dynamic Backreferences::).
The first line shows a link to the '.lang' definition file and the line
number; the line starting with 'expression' shows the sub-expression
that matched, and the line starting with 'formatting' shows the source
file string that matched that expression.  If a line starting with
'formatting' is not preceded by a line with the link to the
sub-expression, it means that no particular regular expression has
matched, and thus the style 'normal' will be used to format that string.

   Consider the following (simplified) Java source file:

     /*
       This is to demonstrate --debug-lang
       http://www.lorenzobettini.it
     */

     package hello;

     public class Hello {
             // just some greetings ;-)  /*
         int i = 10;
         System.out.println("Hello World!");
     }

   Now you can debug the 'java.lang' file by using the '--debug-lang'
command line option.  And the output is as follows:

     c_comment.lang:24
     expression: "/\*"
     formatting "/*" as comment
     entering state: 23
     formatting "  This is to demonstrate --debug-lang" as default
     formatting "  " as default
     url.lang:3
     expression: "(?:(?:<?)[[:word:]]+://[[:word:]\./\-_]+(?:>?))"
     formatting "http://www.lorenzobettini.it" as url
     c_comment.lang:24
     expression: "\*/"
     formatting "*/" as comment
     exiting state, level: 1
     java.lang:1
     expression: "\<(?:import|package)\>"
     formatting "package" as preproc
     formatting " hello" as default
     symbols.lang:1
     expression: "(?:~|!|%|\^|\*|\(|\)|-|\+|=|\[|\]|\\|:|;|,|\.|/|\?|&|<|>|\|)"
     formatting ";" as symbol
     ... omissis ...
     c_comment.lang:13
     expression: "//"
     formatting "//" as comment
     entering state: 12
     formatting " just some greetings ;-)  /*" as default
     c_comment.lang:13
     expression: "\z"
     formatting "" as comment
     exiting state, level: 1
     ... omissis ...

   This should provide enough information to understand how the regular
expressions are used and how the states/environments are entered and
exited.  Please note that the sub-expressions that are shown may differ
from the original ones specified in the '.lang' file.  This is due to
the preprocessing that is performed by Source-highlight.  Moreover, some
sub-expressions are not defined at all in the '.lang' file: for
instance, this is the case for line wide definitions, i.e., those that
are defined with the keyword 'start', *note Line wide definitions::.
The last lines above, showing 'expression: "\z"', mean that we matched
the end of a line.

   Another useful feature in debugging is the option '--show-regex' that
shows, on the standard output, the regular expression automaton that
source-highlight creates.

   For instance, consider this language definition
('comment-show.lang'):

     vardef TODO = '(TODO|FIXME)([:]?)'

     environment comment delim "/**" "*/" multiline begin
       type = '@[[:alpha:]]+'
       todo = $TODO
     end

     state cbracket delim "{" "}" escape "\\" multiline nested begin
       keyword = "if|then|else|endif"
     end

     string delim "<" ">"

     string2 delim "<<" ">>" multiline

If you now execute the following command:

     source-highlight --show-regex=comment-show.lang

you will get, on the standard output, the following output(1):

     STATE 1 default: normal
       rule (comment) "/\*\*" (exit level: 0, next: 2)
         STATE 2 default: comment
           rule (comment) "\*/" (exit level: 1, next: 0)
           rule (type) "(?:\@[[:alpha:]]+)" (exit level: 0, next: 0)
           rule (todo) "(?:(?:TODO|FIXME)(?:[:]?))" (exit level: 0, next: 0)
       rule (cbracket) "\{" (exit level: 0, next: 3)
         STATE 3 default: normal
           rule (cbracket) "\}" (exit level: 1, next: 0)
           rule (cbracket) "\\." (exit level: 0, next: 0)
           rule (cbracket) "\{" (exit level: 0, next: 0, nested)
           rule (keyword) "\<(?:if|then|else|endif)\>" (exit level: 0, next: 0)
       rule (string) "<(?:[^<>])*>" (exit level: 0, next: 0)
       rule (string2) "<<" (exit level: 0, next: 4)
         STATE 4 default: string2
           rule (string2) ">>" (exit level: 1, next: 0)


This shows the states and highlight rules of the regular expression
automaton that source-highlight creates and will use to format an input
source.

   Each state is associated with a unique number in order to identify
it; moreover, the default element of the state is shown (i.e., if none
of the state's rules match, then that part is highlighted with the
default element style).  For instance, in the initial state the default
style is normal.  Then, for each state, the output shows the rules for
that state.  For each rule you can see the corresponding element of the
rule, the regular expression for the rule and some other information,
which we explain in the following.

   We can see that if we match a '/**' (it is shown as a string with
escaped special characters, '/\*\*') we enter a new state, in this case
state 2 ('next: 2').  This corresponds to the delimited element
defining a new environment (*note State/Environment Definitions::).  The
fact that it is actually an environment and not a state(2) can be seen
from the fact that the default element is the same as the element of the
environment itself.  If we match a '*/', i.e., the end of the delimited
element, we exit one level ('exit level: 1'), meaning that we go back to
state 1.  Then we have the state for 'cbracket', which is not an
environment; in fact its default element is normal.  The second rule of
this state, '\\.', represents the 'escape' string of the state
definition.  Since the delimited element is defined as nested, we have a
third rule, '\{', which has the 'nested' information; thus, if we match
it, we simply enter a new instance of state 3 itself.

   The 'string' and 'string2' rules show the difference implied by the
'multiline' option: since source-highlight handles each line of input
separately, the first delimited definition can be handled with a single
regular expression while the multiline version cannot.

   Note that the states/environments are indented so that it's easier to
understand the outer and the inner states.
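
   How 'next', 'exit level' and 'nested' interact can be illustrated
with a toy interpreter for states 1, 2 and 3 of the listing above.
This is only a sketch in Python and not the algorithm implemented by
source-highlight (*note How source-highlight works::): the 'string'
and 'string2' rules are left out, '\<' and '\>' are replaced by '\b',
'[[:alpha:]]' by '[A-Za-z]', and the way a rule is selected (leftmost
match, earlier rule on a tie) is an assumption of this sketch.

```python
import re

# state id -> (default element, [(element, regex, exit_level, next, nested)])
# Transcribed from the first --show-regex listing, with the adaptations
# to Python's re syntax described in the text.
STATES = {
    1: ("normal", [
        ("comment", r"/\*\*", 0, 2, False),
        ("cbracket", r"\{", 0, 3, False),
    ]),
    2: ("comment", [
        ("comment", r"\*/", 1, 0, False),
        ("type", r"@[A-Za-z]+", 0, 0, False),
        ("todo", r"(?:TODO|FIXME):?", 0, 0, False),
    ]),
    3: ("normal", [
        ("cbracket", r"\}", 1, 0, False),
        ("cbracket", r"\\.", 0, 0, False),
        ("cbracket", r"\{", 0, 0, True),
        ("keyword", r"\b(?:if|then|else|endif)\b", 0, 0, False),
    ]),
}


def highlight(text):
    """Return (element, substring) pairs for one line of input."""
    stack, pos, out = [1], 0, []
    while pos < len(text):
        default, rules = STATES[stack[-1]]
        best = None
        for rule in rules:
            match = re.compile(rule[1]).search(text, pos)
            # Keep the leftmost match; on a tie the earlier rule wins.
            if match and (best is None or match.start() < best[1].start()):
                best = (rule, match)
        if best is None:
            out.append((default, text[pos:]))    # nothing matched: default
            break
        (element, _, exit_level, next_state, nested), match = best
        if match.start() > pos:
            out.append((default, text[pos:match.start()]))
        out.append((element, match.group()))
        pos = match.end()
        if nested:
            stack.append(stack[-1])      # new instance of the same state
        elif next_state:
            stack.append(next_state)     # enter the state named by "next"
        for _ in range(exit_level):      # "exit level: n" leaves n states
            if len(stack) > 1:
                stack.pop()
    return out


tokens = highlight("x /** @see TODO */ { if { a } }")
for element, piece in tokens:
    print(f"{element:9} {piece!r}")
```

   The stack plays the role of the indentation in the listing: entering
a state pushes it, and 'exit level: 1' pops back to the enclosing one.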

   Let us now consider a variation of the previous example:

     vardef TODO = '(TODO|FIXME)([:]?)'

     environment comment delim "/**" "*/" multiline nested begin
       type = '@[[:alpha:]]+'
       todo = $TODO
     end

     regexp = `([^[:alnum:]]).*(\1)`

     string delim "<" ">"

     string2 delim "<<" ">>" multiline

     (paren,normal,paren) = `(\[)(.*)(\])`

   and let us see the output of '--show-regex'

     STATE 1 default: normal
       rule (comment) "/\*\*" (exit level: 0, next: 2)
         STATE 2 default: comment
           rule (comment) "\*/" (exit level: 1, next: 0)
           rule (comment) "/\*\*" (exit level: 0, next: 0, nested)
           rule (type) "(?:\@[[:alpha:]]+)" (exit level: 0, next: 0)
           rule (todo) "(?:(?:TODO|FIXME)(?:[:]?))" (exit level: 0, next: 0)
       rule (regexp) "(?:([^[:alnum:]]).*(\1))" (exit level: 0, next: 0)
       rule (string) "<(?:[^<>])*>" (exit level: 0, next: 0)
       rule (string2) "<<" (exit level: 0, next: 3)
         STATE 3 default: string2
           rule (string2) ">>" (exit level: 1, next: 0)
       rule (paren normal paren) "(\[)(.*)(\])" (exit level: 0, next: 0)


   Since in the rule 'regexp' we specified the regular expression
inside '`' (see *note Ways of specifying regular expressions::), the
marked subexpressions are not translated, so that backreferences work
correctly.

   The last rule uses explicit subexpressions with names (see *note
Explicit subexpressions with names::); although that expression is made
up of different elements, the expression is matched as a whole.

   ---------- Footnotes ----------

   (1) Up to version 2.9 the output of '--show-regex' was a little bit
more complex to read; hopefully this output is better.

   (2) Please note that this concept of state is different from the
concept of "state" of an automaton.


File: source-highlight.info,  Node: Tutorials on Language Definitions,  Prev: Debugging,  Up: Language Definitions

7.18 Tutorials on Language Definitions
======================================

Now we provide some examples of language definitions.  In the previous
sections we have already provided some code snippets, while here we
provide complete examples of language definitions that are included in
the source-highlight distribution itself.

   In particular we will first show the language definition for the
language definition syntax itself (file 'langdef.lang').  This will be
used to highlight the examples of language definitions that we will show
in this section (the highlighting will not be visible if you are viewing
this manual with the 'info' command).  Of course, this example is
highlighted itself.

     # this is the language definition for the
     # language definition syntax itself
     comment start "#"

     preproc = "include"

     string delim "\"" "\"" escape "\\" multiline
     regexp delim "'" "'" escape "\\" multiline
     regexp delim "`" "`" escape "\\" multiline

     keyword = "state|environment|begin|end|delim|escape|start",
               "multiline|nested|vardef|exitall|exit",
               "redef|subst|nonsensitive"

     symbol = "=|+|,|(|)"

     vardef ID = '[[:word:]]+'

     variable = '\$' + $ID

     variable = $ID


   The style that is used to highlight these examples in Texinfo is
'texinfo.style' that is shown in *note Output format style::.  The
language definition for the style syntax (file 'style.lang') is even
simpler:

     # this is the language definition for the
     # style definition syntax
     comment start "//"

     string delim "\"" "\"" escape "\\"

     keyword = "bgcolor|purple|orange|brightorange|brightgreen|darkgreen",
               "green|darkred|red|brown|pink|yellow|cyan",
               "black|teal|gray|darkblue|blue",
               "normal|linenum",
               "noref|nf|f|u|i|b"
     keyword = 'bg\:'

     symbol = ",|;"

     variable = '[[:word:]]+'


   Note that this definition is pretty simple since the language
definition syntax is simple.  In the next examples we will see how to
use more complex features to highlight more complex language syntaxes.

* Menu:

* Highlighting C/C++ and C#::
* Highlighting Diff files::
* Pseudo semantic analysis::


File: source-highlight.info,  Node: Highlighting C/C++ and C#,  Next: Highlighting Diff files,  Prev: Tutorials on Language Definitions,  Up: Tutorials on Language Definitions

7.18.1 Highlighting C/C++ and C#
--------------------------------

This is the language definition for C, included in the file 'c.lang':

     # definitions for C

     include "c_comment.lang"

     label = '^[[:blank:]]*[[:alnum:]]+:[[:blank:]]*\z'

     (keyword,normal,classname) =
       `(\<(?:enum|struct|union))([[:blank:]]+)([[:alnum:]_]+)`

     include "c_preprocessor.lang"

     include "number.lang"

     include "c_string.lang"

     keyword = "__asm|__cdecl|__declspec|__export|__far16",
       "__fastcall|__fortran|__import",
       "__pascal|__rtti|__stdcall|_asm|_cdecl",
       "__except|_export|_far16|_fastcall",
       "__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto",
       "_Alignas|_Alignof|_Atomic|_Generic|_Noreturn|_Static_assert|_Thread_local",
       "break|case|catch|cdecl|const|continue|default",
       "do|else|enum|extern|for|goto",
       "if|pascal",
       "register|restrict|return|sizeof|static",
       "struct|switch",
       "typedef|union",
       "volatile|while"

     type = "bool|char|double|float|int|long",
       "short|signed|unsigned|void|wchar_t",
       "_Bool|_Complex|_Imaginary"

     include "symbols.lang"

     cbracket = "{|}"

     include "function.lang"

     include "clike_vardeclaration.lang"


Note that this makes use of lots of 'include's since these parts are
reused in other language definitions (e.g., Java has lots of parts that
are in common with C/C++ so we wrote these parts in separate files).  In
particular the comments definitions:

     # c_comment.lang

     # comments with documentation tags
     environment comment start "///" begin
       include "url.lang"
       include "html_simple.lang"
       type = '@[[:alpha:]]+'
       include "todo.lang"
     end

     comment start "//"

     # comments with documentation tags
     environment comment delim "/**" "*/" multiline begin
       include "url.lang"
       include "html_simple.lang"
       type = '@[[:alpha:]]+'
       include "todo.lang"
     end

     # standard comments
     environment comment delim "/*" "*/" multiline begin
       include "url.lang"
       include "todo.lang"
     end

Here we have the definitions for line-wide comments ('//') and for
multiline comments, where we also highlight URLs and e-mail addresses
(defined in the file 'url.lang', not shown here).  Moreover, for comments
that are used by automatic documentation generation tools such as
Doxygen or Javadoc (i.e., those that start with '/**' or '///') we also
highlight HTML syntax (defined in the file 'html_simple.lang', not shown
here).

   Going back to 'c.lang' we see that we use subexpressions with names
(see *note Explicit subexpressions with names::) for highlighting the
struct name (when preceded by 'struct', highlighted as a keyword).

   For preprocessor directives '#include' we use a state definition
since in this case the file included with the '<file>' syntax must be
formatted as strings (and only in this context the '<>' must be
considered as strings, anywhere else they are operators).  Since a state
erases definitions defined outside the state we must include
'c_comment.lang' again in order to highlight comments also in this
context(1).  Then we have a definition of 'preproc' that catches all the
other preprocessor directives.

   The included file 'number.lang' defines the regular expression that
catches number constants (not shown here), then we include the file
'c_string.lang' that defines strings (again shared by Java):

     vardef SPECIALCHAR = '\\.'

     environment string delim "\"" "\"" begin
       specialchar = $SPECIALCHAR
     end

     environment string delim "'" "'" begin
       specialchar = $SPECIALCHAR
     end


Inside a string we want to highlight in a different way the special
characters (such as, e.g., '\n', '\t', etc.)  and in general escaped
characters, matched by the regular expression '\\.'.
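
   The same expression can be tried in Python's 're' module, where
'\\.' also means a backslash followed by any character.  This is only
a sketch, and the sample line of C code is made up.

```python
import re

SPECIALCHAR = re.compile(r"\\.")   # a backslash followed by any character

# Raw string: the two characters "\" and "t" (or "n") are kept as
# written in the C source, they are not Python escapes.
source = r'printf("tab\there\n");'
escapes = SPECIALCHAR.findall(source)
print(escapes)
```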

   The included file 'symbols.lang' defines all the symbols (shared also
by other languages):

     symbol = "~","!","%","^","*","(",")","-","+","=","[",
             "]","\\",":",";",",",".","/","?","&","<",">","\|"

There is nothing interesting here except that it shows that the
characters '\' and '|' have to be escaped.

   The included file 'function.lang' defines the regular expression to
match a function definition or invocation:

     vardef FUNCTION = '([[:alpha:]]|_)[[:word:]]*(?=[[:blank:]]*\()'
     function = $FUNCTION

that shows an example of forward lookahead assert for the opening
parenthesis (see *note Notes on regular expressions::).  As noted in
*note File inclusion::, it is crucial that this file is included after
the keyword definition.

   Finally, 'c.lang' includes the file 'clike_vardeclaration.lang':

     (usertype,usertype,normal) =
     `([[:alpha:]_](?:[^[:punct:][:space:]]|[_])*)
     ((?:<.*>)?)
     (\s+(?=[*&]*[[:alpha:]_][^[:punct:][:space:]]*\s*[[:punct:]\[\]]+))`


   This definition, using subexpressions with names (see *note Explicit
subexpressions with names::), tries(2) to match user types (e.g., struct
names) in function parameter and variable declarations.  It basically
tries to match a type identifier, then a possible template
specification(3) and then we have a complete lookahead assert (*note
Notes on regular expressions::) that tries to match the variable
identifier, possibly with '&' and '*' reference and pointer
specification, followed by an assignment '=' or a ';', more generally a
'[:punct:]' or '[]' (for array specifications).  This should catch the
user types in the correct contexts, as in the following (where we
intentionally highlighted 'usertype' in italics):

     Integer i = 10;
     Boolean b;
     String args[];
     const MyType args[];
     const My_Type args[];
     List<Integer> mylist;
     List<List<Integer> > mylist;
     myspace::InputStream iStream ;
     MyType *t;
     MyType **t;
     const MyType &t;
     if (argc > 0) { }
     __mytype _i;
     typedef _mytype __i;


   Note that since for the third group we use a lookahead assert, what
is matched is not actually formatted but it is put back in the input
stream so that it can be formatted using other rules (e.g., 'symbol' for
'*' and '=').
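
   The behavior of this definition can be approximated in Python's 're'
module.  The sketch below is a transcription under assumptions, not the
definition itself: 'string.punctuation' stands in for '[:punct:]', '\s'
for '[:space:]', ASCII letters for '[:alpha:]', and the three parts of
the expression are simply concatenated.  The samples deliberately
contain no keywords such as 'const', since in 'c.lang' those are
matched by rules that come earlier.

```python
import re
import string

# string.punctuation stands in for [:punct:] and \s for [:space:];
# both are ASCII-oriented approximations made for this sketch.
PUNCT = re.escape(string.punctuation)

USERTYPE = re.compile(
    rf"([A-Za-z_](?:[^{PUNCT}\s]|_)*)"   # type identifier
    rf"((?:<.*>)?)"                      # optional template specification
    # blanks, then a lookahead for the variable name and what follows it
    rf"(\s+(?=[*&]*[A-Za-z_][^{PUNCT}\s]*\s*[{PUNCT}]+))"
)


def usertype(declaration):
    """Return the text matched as user type, or None."""
    match = USERTYPE.search(declaration)
    return match.group(1) if match else None


samples = ["Integer i = 10;", "List<Integer> mylist;", "MyType **t;",
           "if (argc > 0) { }"]
found = [usertype(s) for s in samples]
print(found)
```

   The variable name, '*' and '=' are only inspected by the lookahead,
so they remain available for the other rules.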

   Since, at least syntactically, C++ is an extension of C, the language
definition for C++, included in the file 'cpp.lang', relies on
'c.lang'(4):

     # definitions for C++

     include "c_comment.lang"

     (keyword,normal,classname) =
       `(\<(?:enum|class|struct|typename|union))([[:blank:]]+)([[:alnum:]_]+)`

     include "c_preprocessor.lang"

     include "number.lang"

     include "c_string.lang"

     keyword = "__asm|__cdecl|__declspec|__export|__far16",
       "__fastcall|__fortran|__import",
       "__pascal|__rtti|__stdcall|_asm|_cdecl",
       "__except|_export|_far16|_fastcall",
       "__finally|_fortran|_import|_pascal|_stdcall|__thread|__try|asm|auto",
       "break|case|cdecl|const|continue|default",
       "do|else|enum|extern|for|goto",
       "if|pascal",
       "register|restrict|return|sizeof|static",
       "struct|switch",
       "typedef|union",
       "volatile|while",
       "catch|class|const_cast|constexpr|decltype|delete",
       "dynamic_cast|explicit|export|false|final|friend",
       "inline|mutable|namespace|new|noexcept|operator|override",
       "private|protected|public|reinterpret_cast|static_cast",
       "static_assert|template|this|throw|true",
       "try|typeid|typename",
       "using|virtual"

     label = '^[[:blank:]]*[[:alnum:]]+:[[:blank:]]*\z'

     type = "bool|char|double|float|int|long",
       "short|signed|unsigned|void|wchar_t",
       "char16_t|char32_t"

     include "symbols.lang"

     cbracket = "{|}"

     include "function.lang"

     include "clike_vardeclaration.lang"


   In particular, it extends the set of keywords.  Moreover, note that
we use subexpressions with names (see *note Explicit subexpressions with
names::) for highlighting the class (or struct) name (when preceded by
'class', 'struct' or 'typename', highlighted as a keyword).  A similar
rule was also present in 'c.lang', but it concerned only 'struct'.

   Now that we wrote the language definition for C/C++, writing the one
for C# is straightforward, since we only need to add the keyword 'using'
as a preprocessor element, and redefine (or better, "substitute", *note
Redefinitions and Substitutions::) the keywords and types:

     # definitions for C-sharp
     # by S. HEMMI, updated by L. Bettini.
     preproc = "using"

     number =
     '\<[+-]?((0x[[:xdigit:]]+)|(([[:digit:]]*\.)?
     [[:digit:]]+([eE][+-]?[[:digit:]]+)?))([FfDdMmUulL]+)?\>'

     include "cpp.lang"

     subst keyword = "abstract|event|new|struct",
      "as|explicit|null|switch",
      "base|extern|this",
      "false|operator|throw",
      "break|finally|out|true",
      "fixed|override|try",
      "case|params|typeof",
      "catch|for|private",
      "foreach|protected",
      "checked|goto|public|unchecked",
      "class|if|readonly|unsafe",
      "const|implicit|ref",
      "continue|in|return",
      "virtual",
      "default|interface|sealed|volatile",
      "delegate|internal",
      "do|is|sizeof|while",
      "lock|stackalloc",
      "else|static",
      "enum|namespace",
      "get|partial|set",
      "value|where|yield"

     subst type = "bool|byte|sbyte|char|decimal|double",
      "float|int|uint|long|ulong|object",
      "short|ushort|string|void"


   ---------- Footnotes ----------

   (1) As a future extension we might think of providing a way, in the
language definition syntax, to define a state/environment that extends
the outer contexts instead of overriding them.

   (2) This was not tested extensively and might not catch all the
correct situations.

   (3) OK, there are no templates in C, and they are only in C++, but we
think it should do no harm when highlighting C files.

   (4) Before version 2.9, there was only 'cpp.lang' which was used both
for C and C++; however, this way, if you had a C program where you were
using a C++ keyword as a variable name--which of course is correct in
C--that variable was actually highlighted as a keyword and this was not
correct.


File: source-highlight.info,  Node: Highlighting Diff files,  Next: Pseudo semantic analysis,  Prev: Highlighting C/C++ and C#,  Up: Tutorials on Language Definitions

7.18.2 Highlighting Diff files
------------------------------

Now we want to highlight files that are generated by 'diff' (typically
used to create patches).  This program can generate outputs in three
different formats (at least to the best of my knowledge).

   With the option '-u|--unified' the differences among files are shown
in the same context, for instance (the examples of the diff files shown
here are manually modified so that they can fit in the page width):

     diff -ruP source-highlight-2.1.1/source-highlight.spec ...
     --- source-highlight-2.1.1/source-highlight.spec ...
     +++ source-highlight-2.1.2/source-highlight.spec ...
     @@ -6,8 +6,8 @@

      Summary:   syntax highlighting for source documents
      Name:      source-highlight
     -Version:   2.1.1
     -Release:   2.1.1
     +Version:   2.1.2
     +Release:   2.1.2
      License:   GPL
      Group:     Utilities/Console
      Source:    ftp://ftp.gnu.org/gnu/source-highlight/%{name}-%{version}.tar.gz


   With the option '-c|--context' the differences are shown in two
different parts:

     diff -rc2P source-highlight-2.1.1/source-highlight.spec ...
     *** source-highlight-2.1.1/source-highlight.spec ...
     --- source-highlight-2.1.2/source-highlight.spec ...
     ***************
     *** 7,12 ****
       Summary:   syntax highlighting for source documents
       Name:      source-highlight
     ! Version:   2.1.1
     ! Release:   2.1.1
       License:   GPL
       Group:     Utilities/Console
     --- 7,12 ----
       Summary:   syntax highlighting for source documents
       Name:      source-highlight
     ! Version:   2.1.2
     ! Release:   2.1.2
       License:   GPL
       Group:     Utilities/Console
     diff -rc2P source-highlight-2.1.1/src/latex.outlang ...
     *** source-highlight-2.1.1/src/latex.outlang ...
     --- source-highlight-2.1.2/src/latex.outlang ...
     ***************
     *** 35,37 ****
     --- 35,38 ----
       "--" "-\\/-"
       "---" "-\\/-\\/-"
     + "\"" "\"{}" # avoids problems with some inputenc
       end


   Without options it generates only the essential difference
information without any additional context lines:

     diff -rP source-highlight-2.1.1/source-highlight.spec ...
     9,10c9,10
     < Version:   2.1.1
     < Release:   2.1.1
     ---
     > Version:   2.1.2
     > Release:   2.1.2

   Summarizing, we would like to be able to handle all three of these
syntaxes; note that the first format and the second format conflict on
one point: the first one uses '---' to indicate the old version of a
file while the second format uses it to indicate the new version of a
file.  Since we want to highlight the old parts and the new parts
differently (this is not visible in the Texinfo highlighting due to the
lack of enhanced formatting features, but it is visible for instance in
HTML output where we use two different colors), this behavior adds some
difficulties.  Of course, we could define three different language
definitions, one for each diff output format.  However, we prefer to
handle them all in the same file!

   This is the language definition for diff files:

     # language definition for files created with 'diff'

     # diff created with -u option
     state oldfile = '(?=^[-]{3})' begin
       oldfile start '^[-]{3}'
       oldfile start '^[-]'
       newfile start '^[+]'
       difflines start '^@@'
     end

     # diff created with -c option
     state oldfile = '(?=^[*]{3})' begin
       environment oldfile = '^[*]{3}[[:blank:]]+[[:digit:]]' begin
         normal start '^[[:space:]]'
         newfile = '(?=^[-]{3})' exit
       end
       oldfile start '^[*]{3}'

       environment newfile = '^[-]{3}[[:blank:]]+[[:digit:]]' begin
         normal start '^[[:space:]]'
         newfile = '(?=^[*]{3})' exit
         normal start '^diff' exit
       end
       newfile start '^[-]{3}'
     end

     # otherwise, created without options
     state difflines = '(?=^[[:digit:]])' begin
       difflines start '^[[:digit:]]'
       oldfile start '^[<]'
       newfile start '^[>]'
     end


   Since we can safely assume that when we process a diff file it
contains only information created with the same diff command line
switch, we define three different states that correspond to the three
diff output formats.  Note that these states are entered with a simple
definition; as noted in *note State/Environment Definitions::, this
means that no automatic exit is provided, and since no explicit exit
condition is specified, once one of these states is entered it will
never be exited.  This is consistent with our goal.  Of course, the
expression that makes us enter a state must be defined correctly; in
particular, we first search for an initial '---' sequence, since this is
used as the first difference specification by the '-u|--unified' option,
so it is a distinguishing feature that lets us infer which diff format
we are processing.

   Another interesting thing is that we use a forward lookahead
assertion in the expression that enters each state (see *note Notes on
regular expressions::), since we only want to see which file format we
are processing, without consuming the matched text.  Once we have
entered the right state we can define the regular expressions for the
elements of the specific diff file format.

   For the files created with the option '-c|--context' we define two
inner environments, one for the old file part and one for the new file
part (these are delimited by a '***' or '---' and line number
information).  Note that these are environments, so anything that is not
matched by any expression is formatted according to the style of the
element that defines the environment.  Thus, we provide an expression
for text that must be formatted as normal.  For diff files this
corresponds to a line that starts with a space or with 'diff' (take a
look at the examples above).  In particular the latter case can take
place only during the new file part.  In both environments we must
define the exit conditions.  In both cases these correspond to the
beginning of the complementary part; also in this case we use forward
lookahead assertions, since we use them only to exit the environment.
The outer definitions for 'oldfile' and 'newfile' are used to match the
lines with source file information.

   The third state, corresponding to the normal diff output format,
should be straightforward by now.
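
   The way one of the three states is selected can also be sketched
outside of Source-highlight.  The following Python fragment is only an
illustration under stated assumptions, not how Source-highlight is
implemented: the names 'FORMAT_MARKERS' and 'detect_diff_format' are
hypothetical, and the three patterns simply mirror the state-entering
expressions of the definition above.  The first line that matches any
of them decides the format.

```python
import re

# Hypothetical table mirroring the three state-entering expressions,
# in the same order as in the language definition above.
FORMAT_MARKERS = [
    ("unified", re.compile(r"^-{3}")),    # '---' comes first with -u
    ("context", re.compile(r"^\*{3}")),   # '***' comes first with -c
    ("normal", re.compile(r"^[0-9]")),    # a line range such as 9,10c9,10
]


def detect_diff_format(text):
    """Return the format chosen by the first line that matches a marker."""
    for line in text.splitlines():
        for name, pattern in FORMAT_MARKERS:
            if pattern.match(line):
                return name
    return None  # no marker found, e.g. an empty diff


sample = "diff -rP a/spec b/spec\n9,10c9,10\n< old\n---\n> new\n"
print(detect_diff_format(sample))  # normal
```

   Note that in the sample the '---' separator appears only after the
line range, so the 'normal' marker is found first; this is the same
reasoning that makes the order of appearance a safe way to tell the
formats apart.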


File: source-highlight.info,  Node: Pseudo semantic analysis,  Prev: Highlighting Diff files,  Up: Tutorials on Language Definitions

7.18.3 Pseudo semantic analysis
-------------------------------

Source-highlight, by means of regular expressions, can only perform
lexical analysis of the input source.  In particular, it is based on the
assumption that the input source is syntactically correct with respect
to the input language.  However, by using the language definition syntax
and by writing the right regular expressions it is possible to simulate
some sort of semantic analysis of the input source.

   For instance, consider the following C (or C++) source file:

     // test special #if 0 treatment

     int main() {
     #if 0 // equivalent to a comment
       int i = 10;
       printf("this should never be executed\n");
       return 1;
     #else
       printf("Hello world!\n");
       return 0;
     #endif

       printf("never reach here!\n");
     }


It is easy to verify that the code between '#if 0' and '#else' will
never be executed (indeed it will not even be compiled).  Thus, we might
want to format it as a comment.

   We then write another language definition file, based on the file
'cpp.lang':

     environment comment start '^[[:blank:]]*#if[[:blank:]]+0' begin
       comment start '^[[:blank:]]*#(else|endif)' exit
     end

     include "cpp.lang"


We intentionally included an error in this first version: we used the
'start' element to start the environment, but such an element has the
scope of a single line; thus, it does not have the desired behavior:

     // test special #if 0 treatment

     int main() {
     #if 0 // equivalent to a comment
       int i = 10;
       printf("this should never be executed\n");
       return 1;
     #else
       printf("Hello world!\n");
       return 0;
     #endif

       printf("never reach here!\n");
     }


   A better solution is the following one:

     environment comment = '^[[:blank:]]*#[[:blank:]]*if[[:blank:]]+0' begin
       comment start '^[[:blank:]]*#[[:blank:]]*(else|endif)' exit
     end

     include "cpp.lang"


Here we enter the 'comment' environment not by using a delimited
element, but simply with the regular expression that matches '#if 0'.
Then we exit the environment either when we match an '#else' or an
'#endif'.  This seems to work:

     // test special #if 0 treatment

     int main() {
     #if 0 // equivalent to a comment
       int i = 10;
       printf("this should never be executed\n");
       return 1;
     #else
       printf("Hello world!\n");
       return 0;
     #endif

       printf("never reach here!\n");
     }


   However, it does not work if we consider nested '#if...#else'; for
instance consider the following code, formatted with the previous
language definition:

     // test special #if 0 treatment

     int main() {
     #if 0 // equivalent to a comment
       int i = 10;
       printf("this should never be executed\n");
     #  ifdef FOO
       printf("foo\n");
     #     ifndef BAR
       printf("no bar\n");
     #     else
     #     endif
     #  else
       printf("no foo\n");
     #  endif // FOO
       return 1;
     #else
       printf("Hello world!\n");
       return 0;
     #endif

       printf("never reach here!\n");
     }


The problem is that the previous language definition does not consider
nested '#if' and thus, the first time it matches a '#else' or an
'#endif' it exits the 'comment' environment.

   We must then take into account possible nested occurrences.  This can
be done by using a delimited element with the 'nested' option (*note
Delimited definitions::):

     # treat the preprocess statement
     #  #if 0
     #    ...
     #  #else
     # as a comment

     environment comment = '^[[:blank:]]*#[[:blank:]]*if[[:blank:]]+0' begin
       comment start '^[[:blank:]]*#[[:blank:]]*else' exit
       comment delim '^[[:blank:]]*#[[:blank:]]*if'
                     '^[[:blank:]]*#[[:blank:]]*endif' multiline nested

     end

     include "cpp.lang"



This time the right block of code is correctly formatted as a comment:

     // test special #if 0 treatment

     int main() {
     #if 0 // equivalent to a comment
       int i = 10;
       printf("this should never be executed\n");
     #  ifdef FOO
       printf("foo\n");
     #     ifndef BAR
       printf("no bar\n");
     #     else
     #     endif
     #  else
       printf("no foo\n");
     #  endif // FOO
       return 1;
     #else
       printf("Hello world!\n");
       return 0;
     #endif

       printf("never reach here!\n");
     }


   Note that it is crucial to exit the environment even when we match an
'#else' (not only an '#endif'), since, this way, we can match again
another '#if 0'; consider, for instance, the following code:

     // test special #if 0 treatment

     int main() {
     #if 0 // equivalent to a comment
       int i = 10;
       printf("this should never be executed\n");
       return 1;
     #else
       printf("Hello world!\n");
     #   if 0 // another one
       return 1;
     #   else
       return 0;
     #   endif
     #endif

       printf("never reach here!\n");
     }
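
   To make the nesting rule explicit, here is a small Python sketch of
the same bookkeeping.  It is not part of Source-highlight and does not
use its engine: the function name 'dead_lines' and the depth counter
are assumptions made for illustration.  Unlike the definition above, the
sketch also leaves the block on an '#endif' at depth zero, so that an
'#if 0' without an '#else' is closed as well.

```python
import re

IF0 = re.compile(r"^[ \t]*#[ \t]*if[ \t]+0\b")
IF_ANY = re.compile(r"^[ \t]*#[ \t]*if")      # also #ifdef and #ifndef
ELSE = re.compile(r"^[ \t]*#[ \t]*else\b")
ENDIF = re.compile(r"^[ \t]*#[ \t]*endif\b")


def dead_lines(lines):
    """Return one bool per line: True if it would be shown as a comment."""
    flags = []
    depth = None  # None: outside an '#if 0' block; else the nesting depth
    for line in lines:
        if depth is None:
            entering = bool(IF0.match(line))
            if entering:
                depth = 0
            flags.append(entering)
            continue
        flags.append(True)  # the exiting line itself is still a comment
        if IF_ANY.match(line):
            depth += 1                      # a nested conditional opens
        elif ENDIF.match(line):
            depth = None if depth == 0 else depth - 1
        elif ELSE.match(line) and depth == 0:
            depth = None                    # the '#else' of the '#if 0'
    return flags


code = [
    "#if 0 // equivalent to a comment",
    "#  ifdef FOO",
    "#  else",
    "#  endif // FOO",
    "#else",
    "  return 0;",
    "#endif",
]
print(dead_lines(code))
# [True, True, True, True, True, False, False]
```

   The nested '#  else' does not end the block because the depth is 1
when it is seen; only the final '#else', met at depth 0, does.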



File: source-highlight.info,  Node: Output Language Definitions,  Next: Generating References,  Prev: Language Definitions,  Up: Top

8 Output Language Definitions
*****************************

Since version 2.1 source-highlight uses a specific syntax to specify
output formats (e.g., how to format in HTML, LaTeX, etc.).  Before
version 2.1, in order to add a new output format, many C++ classes had
to be written.  This had the drawback that a new output format could not
be added "dynamically": you had to recompile the whole source-highlight
program.

   Instead, now, an output format is specified in a file, loaded
dynamically, through a (hopefully) simple syntax.  Then, these
definitions are used internally to create, on-the-fly, text formatters.

   Here, we see such syntax in detail, by relying on many examples.
This allows a user to easily modify an existing output format definition
or create a new one.  These files typically have the extension
'.outlang'.

   Each definition basically associates a text style (such as, e.g.,
bold, italics, colors, etc.)  to the representation of that style into
the output format (such as, e.g., '<b>$text</b>' in HTML). The
representation is given in '"' and you can use the classic escape
character '\' to use the '"' inside the definition.  If you want to
specify the ASCII code for a character you can do so by specifying the
numeric code in hexadecimal notation preceded by '\x', for an example,
see *note Style template::.

   If no definition is given for a specific style, e.g., bold, then when
that style is requested during formatting, the text will be formatted as
it is, i.e., the style without the definition is simply ignored.

   Comments can be given by using '#'; the rest of the line is
considered as a comment.

   Files can be included in the same way as for language definitions,
*note File inclusion::.

   In any case, if a definition for a style is given more than once, the
last definition replaces all the others.

* Menu:

* File extension::              Specify the output file extension
* Text styles::                 Bold, Italics, Underline, etc.
* Colors::                      Style and definitions for colors
* Anchors and References::
* One style::
* Style template::
* Line prefix::
* String translation::
* Document template::
* Generating HTML output::


File: source-highlight.info,  Node: File extension,  Next: Text styles,  Prev: Output Language Definitions,  Up: Output Language Definitions

8.1 File extension
==================

With the line:

     extension "<file extension>"

you define the default file extension (without the '.') used to generate
files formatted according to this output format.  This is used when no
output file name is specified; if the file extension is not defined in
the '.outlang' file, and no output file name is specified, an error will
occur.

   For instance, this is used in 'html_common.outlang':

     extension "html"


File: source-highlight.info,  Node: Text styles,  Next: Colors,  Prev: File extension,  Up: Output Language Definitions

8.2 Text styles
===============

These are the text styles that one can define:

     bold
     italics
     underline
     notfixed
     fixed

These, of course, correspond to the ones used to specify the output
format style, *note Output format style::.

   These definitions, for instance, are from the HTML format definition:

     bold "<b>$text</b>"
     italics "<i>$text</i>"
     underline "<u>$text</u>"

Inside a definition you use the special variable '$text' to specify
where the actual text to be formatted has to be inserted.  For instance,
the definition of 'bold' above says that if you need to format the
keyword 'class' in bold in HTML, the following text will be generated:
'<b>class</b>'.  This variable is also used when mixing more than one
style recursively; in particular, if you want to format in bold and
italics (i.e., first bold and then italics, or, in other words, the
sequence 'i, b' is used in the output format style file, see *note
Output format style::), then first the text 'class' is substituted for
'$text' into '<b>$text</b>' and then the text '<b>class</b>' will be
substituted for '$text' into '<i>$text</i>', thus obtaining
'<i><b>class</b></i>'.
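
   The recursive substitution can be reproduced in a few lines of
Python.  This is a minimal sketch, not Source-highlight's own code: the
dictionary 'DEFINITIONS' and the function 'apply_styles' are
hypothetical names, and the keys 'b', 'i' and 'u' stand for the style
abbreviations used in a style file.

```python
# Hypothetical subset of an HTML output language definition.
DEFINITIONS = {
    "b": "<b>$text</b>",
    "i": "<i>$text</i>",
    "u": "<u>$text</u>",
}


def apply_styles(text, sequence):
    """Wrap text in every style of sequence; the last one is applied first."""
    for style in reversed(sequence):
        template = DEFINITIONS.get(style)
        if template is None:
            continue  # a style without a definition is simply ignored
        text = template.replace("$text", text)
    return text


print(apply_styles("class", ["i", "b"]))  # <i><b>class</b></i>
```

   Walking the sequence from its end is what makes 'i, b' produce bold
as the innermost element, as described above.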


File: source-highlight.info,  Node: Colors,  Next: Anchors and References,  Prev: Text styles,  Up: Output Language Definitions

8.3 Colors
==========

The definition for using colors during formatting requires the
definition for the 'color' style

     color "..."

and for the 'bgcolor' style(1):

     bgcolor "..."

   This definition concerns only the background color for a specific
highlighted element, i.e., the color specified in the style file with
the prefix 'bg:' (see *note Output format style::) or the property
'background-color' specified in a CSS file passed to '--style-css-file'
(see *note Output format style using CSS::).  Thus it should not be
confused with the background color of the entire output (i.e., the one
specified using 'bgcolor' in a style file or the property
'background-color' of the 'body' selector in a CSS). The background
color for the entire document is explained in *note Document template::.

   Note that the background color might not be available for all output
formats.  For instance, for HTML we only have:

     color "<font color=\"$style\">$text</font>"

while for XHTML we have:

     color "<span style=\"color: $style\">$text</span>"
     bgcolor "<span style=\"background-color: $style\">$text</span>"

   Apart from the variable '$text' that we already saw, we have also the
variable '$style', that will be replaced with the actual color.

   Source-highlight recognizes a number of color constants, see *note
Output format style::.

   You then must associate a color constant to the color definition in
the output format, through the 'colormap' definition:

     colormap
     "color constant" "color representation"
     "color constant" "color representation"
     ...
     default "default color representation"
     end

   The 'default' row (note the absence of '"') defines the color to be
used in case a color constant is used during formatting, but it is not
defined in the output format.

   For instance, for HTML we have:

     colormap
     "green" "#33CC00"
     "red" "#FF0000"
     "darkred" "#990000"
     "blue" "#0000FF"
     "brown" "#9A1900"
     "pink" "#CC33CC"
     "yellow" "#FFCC00"
     "cyan" "#66FFFF"
     "purple" "#993399"
     "orange" "#FF6600"
     "brightorange" "#FF9900"
     "brightgreen" "#33FF33"
     "darkgreen" "#009900"
     "black" "#000000"
     "teal" "#008080"
     "gray" "#808080"
     "darkblue" "#000080"
     default "#000000"
     end

   If your output format does not handle colors you can simply omit the
definitions of 'color' and 'colormap' and Source-highlight will ignore
colors.

   The color is applied after applying the other styles, e.g., bold,
italics, etc.

   Thus, by continuing the example of the previous section, suppose you
defined the following output style for keywords:

     keyword blue i, b;

then the 'class' text will be substituted for the '$text' variable and
the value '#0000FF' for '$style' inside the color definition '<font
color="$style">$text</font>', obtaining '<font
color="#0000FF">class</font>', which will then be substituted for '$text'
in '<b>$text</b>' and so on for italics, finally obtaining

   '<i><b><font color="#0000FF">class</font></b></i>'.
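
   The lookup in the color map, including the 'default' row, and the
order in which color and text styles are applied can be sketched as
follows.  The Python names ('COLORMAP', 'colorize', 'format_keyword')
are assumptions made for this illustration; only the template strings
and color values come from the definitions shown above.

```python
COLOR = '<font color="$style">$text</font>'
COLORMAP = {"blue": "#0000FF", "red": "#FF0000"}  # excerpt of the HTML map
DEFAULT_COLOR = "#000000"                          # the 'default' row


def colorize(text, color_constant):
    """Fill the color template; unknown constants get the default color."""
    value = COLORMAP.get(color_constant, DEFAULT_COLOR)
    return COLOR.replace("$style", value).replace("$text", text)


def format_keyword(text):
    """Model of 'keyword blue i, b;': color first, then bold, then italics."""
    text = colorize(text, "blue")
    text = "<b>$text</b>".replace("$text", text)
    return "<i>$text</i>".replace("$text", text)


print(format_keyword("class"))
# <i><b><font color="#0000FF">class</font></b></i>
```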

   ---------- Footnotes ----------

   (1) Since version 2.6.


File: source-highlight.info,  Node: Anchors and References,  Next: One style,  Prev: Colors,  Up: Output Language Definitions

8.4 Anchors and References
==========================

When using the command line option '--line-number-ref' (*note Invoking
source-highlight::) an anchor is generated in the output file for each
line number.  The style of the anchor is defined by the definition
'anchor'.  If this is not defined, the option '--line-number-ref' has no
effect.  The '$linenum' variable will be replaced with the line number,
and the '$text' variable with the actual text.

   For instance, for HTML we have

     anchor "<a name=\"$linenum\">$text</a>"

   Since version 2.2 source-highlight can also generate references to
several elements (e.g., variables, class definitions, etc.), *note
Generating References::.  Also in this case the definition 'anchor' is
used; furthermore, the definition of 'reference' is required.  In the
definition of 'anchor' and 'reference', apart from the variable
'$linenum', we also have the variables '$infile' (the name of the
original input file) and '$infilename' (the name of the original input
file without the path) and in the definition of 'reference' we also have
the variable '$outfile' (the name of the file where the anchor is).  One
can decide how to define an anchor and a reference by using these
variables.  For instance, for HTML we have

     reference "<a href=\"$outfile#$linenum\">$text</a>"

Note, that in this case we use the '$outfile' since we actually generate
a link to another (or possibly the same) output file.

   On the contrary, for LaTeX, since we do not generate a "clickable"
reference, we refer to the original input file (we use both
'$infilename' and '$linenum' in both definitions of 'anchor' and
'reference'):

     anchor "\label{$infilename:$linenum}$text"
     reference "{\hfill $text $\rightarrow$ $infile:$linenum, \
                page~\pageref{$infilename:$linenum}}"

In particular, we use '$infilename' for generating the '\label' and not
'$infile' because the path symbol would "disturb" LaTeX (while we use
the complete file path in the textual information of the reference).

   This will generate a right aligned reference.  Note that it is
assumed that when generating references in LaTeX one uses
'--gen-references=postline' or '--gen-references=postdoc' and not
'--gen-references=inline' (*note Generating References::), since it
makes no sense to generate an inline reference (or at least I would not
know how to generate a nice looking one :-).

   Furthermore, for Texinfo:

     anchor "@anchor{$infilename:$linenum}$text"
     reference "@flushright
     @xref{$infilename:$linenum,$text,$text $infile:$linenum}.
     @end flushright"

Note that using both '$infilename' (and not '$infile' for the same
reasons) and '$linenum' also in the definition of 'anchor' somehow
ensures that there are no duplicate anchors; this is done for LaTeX and
Texinfo but not for HTML because it is assumed that the generated '.tex'
and '.texinfo' files are included directly in a master file, as it is done
in this manual (while, for instance, it is assumed that a separate HTML
file is generated for each source and kept separate).  If this is not
your case you can change the definitions of 'anchor' and 'reference' as
you see fit.  Some examples of outputs with references in Texinfo are
shown in *note Examples::.

   Indeed, one can use three more definitions for 'reference' that
correspond to the three arguments that can be passed to the
'--gen-references' command line option (*note Generating References::):
'inline_reference', 'postline_reference' and 'postdoc_reference'.  If
one of these is not defined, then the definition of 'reference' is used
instead.  Having the possibility of specifying different definitions is
useful for instance in the case of HTML: the same style for an inline
reference is pretty ugly when used also for a postline or postdoc
reference:

     postline_reference "<a href=\"$outfile#$linenum\">$text -> $infile:$linenum</a>"
     postdoc_reference "<a href=\"$outfile#$linenum\">$text -> $infile:$linenum</a>"
     reference "<a href=\"$outfile#$linenum\">$text</a>"
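
   The fallback to 'reference' and the expansion of the variables can be
modelled with a short Python sketch.  It is an illustration only: the
names 'DEFINITIONS', 'VARIABLE' and 'expand' are hypothetical, and the
file names passed in the example are made up.

```python
import re

# Hypothetical excerpt: only 'reference' and 'postline_reference' exist.
DEFINITIONS = {
    "reference": '<a href="$outfile#$linenum">$text</a>',
    "postline_reference":
        '<a href="$outfile#$linenum">$text -> $infile:$linenum</a>',
}

# '$infilename' is listed before '$infile' so the longer name wins.
VARIABLE = re.compile(r"\$(infilename|infile|outfile|linenum|text)")


def expand(kind, **values):
    """Expand the definition for kind, falling back to 'reference'."""
    template = DEFINITIONS.get(kind, DEFINITIONS["reference"])
    return VARIABLE.sub(lambda m: str(values[m.group(1)]), template)


print(expand("inline_reference", outfile="a.html", linenum=12, text="foo"))
# <a href="a.html#12">foo</a>
```

   Here 'inline_reference' is not defined, so the plain 'reference'
definition is used, which is the behavior described above.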


File: source-highlight.info,  Node: One style,  Next: Style template,  Prev: Anchors and References,  Up: Output Language Definitions

8.5 One style
=============

If the output format you are defining does not have a specific style for
bold, italics, ...  and for colors you can simply use the definition
'onestyle', where you can use both '$style' and '$text'.  This will be
used for any style (indeed any other definition such as bold, italics,
color will be ignored).  Indeed, in this case, it is assumed that the
style of each source element is defined in a file with its own syntax,
i.e., not with a syntax defined by Source-highlight.  (This is the case,
for instance, of HTML using CSS style sheets.)  Moreover, since the
output format style is not used, during formatting the variable '$style'
will be replaced with the name of the element to highlight (e.g.,
'keyword', 'comment', etc.).

   For instance, for HTML CSS, we simply have:

     onestyle "<span class=\"$style\">$text</span>"

In fact, HTML CSS relies on style definitions provided in a separate
file (the '.css' file indeed).  Thus, when formatting a 'keyword', e.g.,
'abstract', we will obtain:

     <span class="keyword">abstract</span>

Of course, the style for 'keyword' must be defined in the '.css' file.


File: source-highlight.info,  Node: Style template,  Next: Line prefix,  Prev: One style,  Up: Output Language Definitions

8.6 Style template
==================

Some output formats are based on a single template into which the other
styles are composed; during composition the styles can be separated with
a specific separator:

     styletemplate "..."
     styleseparator "..."

   This is used, for instance, for the ANSI color escape sequence output
format ('esc.outlang'):

     styletemplate "\x1b[$stylem$text\x1b[m"
     styleseparator ";"

     bold "01$style"
     underline "04$style"
     italics "$style"
     color "$style"

Note that, since more than one style can be mixed into the style
template, 'bold', 'underline', ...  explicitly use the variable
'$style'.


File: source-highlight.info,  Node: Line prefix,  Next: String translation,  Prev: Style template,  Up: Output Language Definitions

8.7 Line prefix
===============

This feature allows you to generate a string as the prefix of each
generated line that corresponds to an input line (i.e., this prefix is
not generated for other generated output elements, e.g., the lines in
the header, footer, etc.).

   We use this feature in the LaTeX output (*note LaTeX output::):

     lineprefix "\mbox{}"

This way each line in the LaTeX output is prefixed with '\mbox{}'(1).

   Another interesting example that uses 'lineprefix' is the javadoc
output, see *note Generating HTML output::.

   ---------- Footnotes ----------

   (1) This is a sort of trick to insert spaces at the beginning of a
line without using a tabular environment; without the leading '\mbox{}'
these spaces would be ignored.  This is the only way I found to achieve
this, if you have suggestions, please let me know!


File: source-highlight.info,  Node: String translation,  Next: Document template,  Prev: Line prefix,  Up: Output Language Definitions

8.8 String translation
======================

Some character sequences that are in the source file may have a special
meaning in an output format, so they need some preprocessing (e.g.,
escaping them).  You can specify the translation table with:

     translations
     "original sequence" "transformed sequence"
     'regex' "transformed sequence"
     ...
     end

The difference between '"original sequence"' and ''regex''(1) is that
with the former you specify a character sequence that will be matched
literally, apart from special characters such as '\' (which, if needed
to be inserted, must be escaped), '\n' (new line) and '\t' (tab
character).  Instead, with the latter, you can specify a regular
expression (this is basically the same difference between '"' and ''' in
language definitions, see *note Simple definitions::).

   For instance, for HTML, we have the following translation table:

     translations
     "&" "&amp;"
     "<" "&lt;"
     ">" "&gt;"
     end
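
   The effect of such a table on the input text can be approximated in
Python.  This is a sketch under assumptions, not the actual
implementation: 'make_translator' is a hypothetical helper, and it
handles only entries given in '"' (literal sequences), not the ones
given as ''regex''.

```python
import re

HTML_TRANSLATIONS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def make_translator(table):
    """Build a function applying all literal translations in one pass."""
    # Longer sequences are tried first; every input character is examined
    # once, so an inserted '&amp;' is never translated a second time.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return lambda text: pattern.sub(lambda m: table[m.group(0)], text)


translate = make_translator(HTML_TRANSLATIONS)
print(translate("if (a < b && c > d)"))
# if (a &lt; b &amp;&amp; c &gt; d)
```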

   For LaTeX, the translation table is a little bit bigger; here we show
only a little part, that shows how to escape special characters (such as
'\'), to translate a new line character and tab character:

     translations
     "<" "$<$"
     ">" "$>$"
     "&" "\\&"
     "\\" "\\textbackslash{}"
     "\n" " \\\\\n"
     " " "\\ "
     "\t" "\\ \\ \\ \\ \\ \\ \\ \\ "
     end

Note that, since a new line character must be translated in LaTeX with
'\\', we have to escape two '\' (i.e., '\\\\') and then we want to
actually insert a new line in the output file with '\n'.

   For HTML with not fixed font by default, 'html_notfixed.outlang' (see
*note HTML and XHTML output::), we need to translate a sequence of two
spaces (i.e., two adjacent spaces, since in HTML several adjacent spaces
are rendered as only one space(2), while we want them as they are), and
we also need to translate a space starting a new line in the source
(thus we use the regular expression '^ ', enclosed in '''); thus we
have:

     translations
     "\n" "<br>\n"
     "  " "&nbsp; "
     '^ ' "&nbsp;" # a space at the beginning of a line
     "\t" "&nbsp; &nbsp; &nbsp; &nbsp; "
     end

   ---------- Footnotes ----------

   (1) Since version 2.4.

   (2) Unless they are inside a '<tt>...</tt>'.


File: source-highlight.info,  Node: Document template,  Next: Generating HTML output,  Prev: String translation,  Up: Output Language Definitions

8.9 Document template
=====================

You can define the beginning and the end of an output file, with

     doctemplate
     "...beginning..."
     "...end..."
     end

     nodoctemplate
     "...beginning..."
     "...end..."
     end

   The first one is used when the '--doc' command line option is
specified, while the second one is used in the other case(1).

   For instance, for HTML we have

     nodoctemplate
     "<!-- Generator: $additional -->
     $header<pre><tt>"
     "</tt></pre>$footer
     "
     end

Note that in the end part there is an explicit new line.

   In the definition of the 'doctemplate' and 'nodoctemplate' the
following variables can be used and will be replaced during the output
generation:

'$title'
     the value of the title for the output file (e.g., the one passed
     with the '--title' command line option);
'$header'
     the contents of the file specified with the command line option
     '--header';
'$footer'
     the contents of the file specified with the command line option
     '--footer';
'$css'
     the value passed with the command line option '--css';
'$additional'
     other additional information.  Source-highlight replaces this with
     its name and its version.
'$docbgcolor(2)'
     the background color for the output document.  Source-highlight
     replaces this with the value specified in the 'bgcolor' of the
     '.style' file (see *note Output format style::) or in the 'body'
     selector of the CSS file passed with '--style-css-file' (see *note
     Output format style using CSS::).

   For instance, for an HTML document with css, (file 'htmlcss.outlang')
we have:

     doctemplate
     "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\"
         \"http://www.w3.org/TR/REC-html40/strict.dtd\">
     <html>
     <head>
     <meta http-equiv=\"Content-Type\"
     content=\"text/html; charset=iso-8859-1\">
     <meta name=\"GENERATOR\" content=\"$additional\">
     <title>$title</title>
     <link rel=\"stylesheet\" href=\"$css\" type=\"text/css\">
     </head>
     <body>
     $header<pre><tt>"
     "</tt></pre>
     $footer</body>
     </html>
     "
     end

   For an HTML document with header and footer, (file 'html.outlang') we
have (note the use of '$docbgcolor'):

     doctemplate
     "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML//EN\">
     <html>
     <head>
     <meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">
     <meta name=\"GENERATOR\" content=\"$additional\">
     <title>$title</title>
     </head>
     <body bgcolor=\"$docbgcolor\">
     $header<pre><tt>"
     "</tt></pre>
     $footer</body>
     </html>
     "
     end

   And for an HTML table output (file 'htmltable.outlang'):

     doctemplate
     "<table  BGCOLOR=\"$docbgcolor\" NOSAVE >
     <tr NOSAVE>
     <td NOSAVE>
     <pre><tt>"
     "</tt></pre>
     </td>
     </tr>
     </table>
     "
     end

   ---------- Footnotes ----------

   (1) Up to version 2.9, there was only 'doctemplate' and for '--doc'
there was a separate '.outlang' file; I think the present solution is
better and reduces the number of files.

   (2) Since version 2.6.


File: source-highlight.info,  Node: Generating HTML output,  Prev: Document template,  Up: Output Language Definitions

8.10 Generating HTML output
===========================

As a complete example we show the file 'html_common.outlang' which
contains the common definitions for the various HTML output formats
('html.outlang', 'htmltable.outlang', etc.):

     include "html_ref.outlang"

     extension "html"

     bold "<b>$text</b>"
     italics "<i>$text</i>"
     underline "<u>$text</u>"
     color "<font color=\"$style\">$text</font>"

     colormap
     "green" "#33CC00"
     "red" "#FF0000"
     "darkred" "#990000"
     "blue" "#0000FF"
     "brown" "#9A1900"
     "pink" "#CC33CC"
     "yellow" "#FFCC00"
     "cyan" "#66FFFF"
     "purple" "#993399"
     "orange" "#FF6600"
     "brightorange" "#FF9900"
     "brightgreen" "#33FF33"
     "darkgreen" "#009900"
     "black" "#000000"
     "teal" "#008080"
     "gray" "#808080"
     "darkblue" "#000080"
     "white" "#FFFFFF"
     default "#000000"
     end

     translations
     "&" "&amp;"
     "<" "&lt;"
     ">" "&gt;"
     end
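
   The effect of the 'color', 'colormap' and 'translations' definitions
can be sketched in Python.  This is a minimal illustration, not the
actual implementation: the names 'COLORMAP', 'TRANSLATIONS', 'translate'
and 'color' are hypothetical, and only a few colormap entries are
copied from the file above.

```python
# Hypothetical sketch of how the definitions in html_common.outlang
# could be applied; not the actual Source-highlight implementation.

# A subset of the colormap above; the "default" entry is used for
# color names that are not listed.
COLORMAP = {"green": "#33CC00", "red": "#FF0000", "darkred": "#990000"}
DEFAULT_COLOR = "#000000"

# The translations section: source characters and their output text.
TRANSLATIONS = {"&": "&amp;", "<": "&lt;", ">": "&gt;"}


def translate(text):
    """Replace every character that has a translation, in one pass."""
    return "".join(TRANSLATIONS.get(ch, ch) for ch in text)


def color(style, text):
    """Instantiate the 'color' template: $style is the mapped color."""
    value = COLORMAP.get(style, DEFAULT_COLOR)
    return '<font color="{}">{}</font>'.format(value, translate(text))


print(color("darkred", "a<p->b && c"))
```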


   Moreover, this file is also used for generating javadoc output:

     include "html_common.outlang"

     doctemplate
     " * <!-- Generated by Source-highlight -->
      * <pre><tt>
     "
     " * </tt></pre>
     "
     end

     nodoctemplate
     " * <!-- Generated by Source-highlight -->
      * <pre><tt>
     "
     " * </tt></pre>
     "
     end

     lineprefix " * "

     translations
     "*/" "&#42;/" # this avoids the */ to be interpreted as
     # the end of a comment inside a javadoc comment
     end


   The javadoc output format is useful for formatting code snippets that
have to be included inside a javadoc comment of another Java file(1).
Apart from being formatted nicely in the generated HTML documentation,
this also relieves the programmer of escaping specific characters in the
code snippet (namely '&', '<' and '>').  Note that it also prevents the
sequence '*/' from being interpreted as the closing of the (javadoc)
comment.  For instance, if you write this code:

     /**
      * This is an example of usage
      *
      * <pre><tt>
      * System.out.println("*/");
      * </tt></pre>
      */

The resulting Java code contains a syntax error.  If you use
source-highlight to format the code to insert in a javadoc comment you
will avoid these problems.
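
   The escaping performed by the javadoc output format can be sketched
as follows.  This is a simplified Python illustration, assuming a
single-pass replacement; the names 'escape' and 'javadoc_snippet' are
hypothetical, and the highlighting markup and the "Generated by"
comment of the real output are omitted.

```python
import re

# Translations from html_common.outlang plus the javadoc-specific one.
TRANSLATIONS = {"&": "&amp;", "<": "&lt;", ">": "&gt;", "*/": "&#42;/"}
LINE_PREFIX = " * "

# Try longer keys first so that "*/" is matched as a unit; a single
# pass avoids escaping the "&" of an already inserted "&#42;".
PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(TRANSLATIONS, key=len, reverse=True))
)


def escape(text):
    """Apply all translations in one pass over the text."""
    return PATTERN.sub(lambda m: TRANSLATIONS[m.group(0)], text)


def javadoc_snippet(code):
    """Hypothetical helper: escape each line and add the line prefix."""
    body = [LINE_PREFIX + escape(line) for line in code.splitlines()]
    return "\n".join([" * <pre><tt>"] + body + [" * </tt></pre>"])


print(javadoc_snippet('System.out.println("*/");'))
```

   In this sketch the problematic line becomes
' * System.out.println("&#42;/");', which no longer closes the
enclosing comment.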

   An example of a javadoc generated HTML page containing a code snippet
formatted with source-highlight can be found in the file
'SimpleClass-doc.html' in the documentation directory.

   ---------- Footnotes ----------

   (1) Although I haven't tested it, I think this will work also for
Doxygen comments.


File: source-highlight.info,  Node: Generating References,  Next: Examples,  Prev: Output Language Definitions,  Up: Top

9 Generating References
***********************

Since version 2.2 Source-highlight also produces references to fields,
variables, etc.  In order to do this it relies on the program _Exuberant
Ctags_, by Darren Hiebert, available at <http://ctags.sourceforge.net>.
Thus, you must install this program if you want Source-highlight to
provide this feature.

   The 'ctags' program generates an index (or "tag") file for a variety
of language objects found in file(s).  This allows these items to be
quickly and easily located by a text editor or other utility (as in this
case for Source-highlight).  A "tag" signifies a language object for
which an index entry is available (or, alternatively, the index entry
created for that object)(1).

   This means that Source-highlight is able to generate references for a
specific source language if and only if 'ctags' handles that language.
See the 'ctags' command line options '--list-maps' and
'--list-languages' to find out the associations between file extensions
and supported languages.

   Reference generation is enabled by using the command line option
'--gen-references' (*note Invoking source-highlight::).  This option
takes an argument that determines how references will be generated:

'inline'
     a reference pointer will be generated exactly in the same place as
     the specific element.  This is useful in output formats that
     naturally support links, such as HTML, while it is useless for
     output formats that do not support inline links, such as LaTeX.
'postline'
     if a line of the input source contains elements for which we found
     references, the list of references will be generated right after
     the line (see the examples, *note Examples::).
'postdoc'
     All the references will be generated after the whole input file has
     been generated.

   There is an exception: when an element has more than one reference
(because a variable is defined in many sources or because a method is
overloaded) then if 'inline' is specified, the generation switches to
'postline' for that occurrence.
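
   The exception just described can be stated as a small decision
function.  The following Python sketch is only an illustration; the
name 'effective_mode' is hypothetical and does not correspond to any
function in the Source-highlight sources.

```python
def effective_mode(requested, reference_count):
    """Return the mode used for one occurrence of an element.

    requested       -- the argument given to --gen-references
    reference_count -- how many references were found for the element
    """
    if reference_count == 0:
        return None  # nothing to generate for this occurrence
    if requested == "inline" and reference_count > 1:
        # An inline pointer can only lead to one place, so the
        # occurrence falls back to a list after the line.
        return "postline"
    return requested


print(effective_mode("inline", 1))
print(effective_mode("inline", 2))
```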

   When '--gen-references' is specified, Source-highlight first invokes
'ctags'.  The user can customize this call by using the command line
option '--ctags' (*note Invoking source-highlight::).  In particular, if
one does not want 'ctags' to be invoked by Source-highlight (e.g.,
because the tags file has already been generated) then '--ctags' must be
passed an empty string, '""'.  In this case or when the specified
'ctags' command line generates an alternative output tag file (the
default generated file is 'tags'), one must specify the exact tag file
with the command line option '--ctags-file'.

   Once the tag file is generated, Source-highlight relies on the
library 'readtags' provided by the 'ctags' distribution, and included in
the Source-highlight sources.
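
   To give an idea of what is read from a tag file, here is a minimal
Python sketch of its line layout (tag name, file name and address
separated by tabs, with optional extension fields after ';"').  It is
not the 'readtags' library, which is written in C; the function names
are hypothetical and the sample lines are made up for illustration,
assuming line numbers are used as addresses.

```python
def parse_tag_line(line):
    """Return (name, file, address), or None for a pseudo-tag line."""
    if line.startswith("!_TAG_"):
        return None  # metadata written by ctags, not a real tag
    name, path, rest = line.rstrip("\n").split("\t", 2)
    address = rest.split(';"', 1)[0]  # drop the extension fields
    if address.isdigit():
        address = int(address)  # a line number rather than a pattern
    return name, path, address


def index_tags(lines):
    """Group the (file, address) pairs of all tags by tag name."""
    index = {}
    for line in lines:
        entry = parse_tag_line(line)
        if entry is not None:
            name, path, address = entry
            index.setdefault(name, []).append((path, address))
    return index


# Made-up sample lines, loosely modelled on a header file.
SAMPLE_TAGS = [
    "!_TAG_FILE_FORMAT\t2\t/extended format/",
    'TextGenerator\ttest.h\t26;"\tc',
    'generate\ttest.h\t28;"\tf',
    'generate\ttest.h\t29;"\tf',
]

print(index_tags(SAMPLE_TAGS))
```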

   Note that if a program element is formatted according to a style that
has the option 'noref' (see *note Output format style::) then this
element is not considered a tag, and no reference is generated.  This is
the case, for instance, for a 'comment' element: since the 'comment'
style is declared with the option 'noref', no string formatted with
that style is considered a tag (see *note Examples::).
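
   The effect of 'noref' can be sketched as a simple filter applied
before the tag lookup.  This Python fragment is only an illustration:
'NOREF_STYLES', 'TAGS' and 'references' are hypothetical names, and the
single tag entry is sample data.

```python
# Styles assumed to be declared with the option 'noref'.
NOREF_STYLES = {"comment"}

# Sample tag index: tag name -> list of (file, line) definitions.
TAGS = {"mysum": [("test.h", 20)]}


def references(word, style):
    """Return the references for a word formatted with a given style."""
    if style in NOREF_STYLES:
        return []  # the element is not considered a tag at all
    return TAGS.get(word, [])


print(references("mysum", "comment"))
print(references("mysum", "normal"))
```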

   ---------- Footnotes ----------

   (1) This description is taken from the ctags man page.


File: source-highlight.info,  Node: Examples,  Next: Problems,  Prev: Generating References,  Up: Top

10 Examples
***********

Here we provide some examples of sources formatted with Source-highlight
using the '-f texinfo' command line option.  Please keep in mind that
the highlighting will not be visible in the Info file, but only in the
printed manual and in the HTML output (well, at least line numbers are
visible everywhere :-).

* Menu:

* Simple example::
* References::
* Line ranges::
* Line ranges (with context)::
* Regex ranges::


File: source-highlight.info,  Node: Simple example,  Next: References,  Prev: Examples,  Up: Examples

10.1 Simple example
===================

The first example is produced by using the command:

     source-highlight -f texinfo -i test.java -o test.java.texinfo -n

   and here's the result

     01: /*
     02:   This is a classical Hello program
     03:   to test source-highlight with Java programs.
     04:
     05:   to have an html translation type
     06:
     07:         source-highlight -s java -f html --input Hello.java --output Hello.html
     08:         source-highlight -s java -f html < Hello.java > Hello.html
     09:
     10:   or type source-highlight --help for the list of options
     11:
     12:   written by
     13:   Lorenzo Bettini
     14:   http://www.lorenzobettini.it
     15:   http://www.gnu.org/software/src-highlite
     16: */
     17:
     18: package hello;
     19:
     20: import java.io.* ;
     21:
     22: /**
     23:  * <p>
     24:  * A simple Hello World class, used to demonstrate some
     25:  * features of Java source highlighting.
     26:  * </p>
     27:  * TODO: nothing, just to show an highlighted TODO or FIXME
     28:  *
     29:  * @author Lorenzo Bettini
     30:  * @version 2.0
     31:  */ /// class
     32: public class Hello {
     33:     int foo = 1998 ;
     34:     int hex_foo = 0xCAFEBABE;
     35:     boolean b = false;
     36:     Integer i = null ;
     37:     char c = '\'', d = 'n', e = '\\' ;
     38:     String xml = "<tag attr=\"value\">&auml;</tag>", foo2 = "\\" ;
     39:
     40:     /* mymethod */
     41:     public void mymethod(int i) {
     42:         // just a foo method
     43:     }
     44:     /* mymethod */
     45:
     46:     /* main */
     47:     public static void main( String args[] ) {
     48:         // just some greetings ;-)  /*
     49:         System.out.println( "Hello from java2html :-)" ) ;
     50:         System.out.println( "\tby Lorenzo Bettini" ) ;
     51:         System.out.println( "\thttp://www.lorenzobettini.it" ) ;
     52:         if (argc > 0)
     53:             String param = argc[0];
     54:         //System.out.println( "bye bye... :-D" ) ; // see you soon
     55:     }
     56:     /* main */
     57: }
     58: /// class
     59:
     60: // end of file test.java


File: source-highlight.info,  Node: References,  Next: Line ranges,  Prev: Simple example,  Up: Examples

10.2 References
===============

This example shows the use of '--gen-references' functionality.  In
particular, the following output is generated with the command:

     source-highlight -f texinfo -i test.h -o test_ref.h.texinfo -n \
          --gen-references=postline

   and here's the result.  Note how the comment line containing the
string 'mysum' does not contain references, since it is a 'comment'
element, and this element has the option 'noref' in 'texinfo.style'
(*note Output format style::).  The same holds for '_TEXTGEN_H' in the
last comment line.

     01: /**
     02: ** Copyright (C) 1999-2007 Lorenzo Bettini
     03: **
     04:   http://www.lorenzobettini.it
     05:
     06:   r2 = r2 XOR (1<<10);
     07:   cout << "hello world" << endl;
     08: **
     09: */
     10:
     11: // this file also contains the definition of mysum as a #define
     12:
     13: // textgenerator.h : Text Generator class &&
     14:
     15: #ifndef _TEXTGEN_H
                                           *Note _TEXTGEN_H: test.h:16.
     16: #define _TEXTGEN_H
     17:
     18: #define foo(x) (x + 1)
     19:
     20: #define mysum myfunbody
     21:
     22: #include <iostream.h> // for cerr
     23:
     24: #include "genfun.h" /* for generating functions */
     25:
     26: class TextGenerator {
     27:   public :
     28:     virtual void generate( const char *s ) const { (*sout) << s ; }
     29:     virtual void generate( const char *s, int start, int end ) const
     30:       {
     31:         for ( int i = start ; i <= end ; ++i )
     32:           (*sout) << s[i] ;
     33:         return a<p->b ? a : 3;
     34:       }
     35:     virtual void generateln( const char *s ) const
     36:         {
     37:             generate( s ) ;
                                             *Note generate: test.h:28.
                                             *Note generate: test.h:29.
     38:             (*sout) << endl ;
     39:         }
     40:     virtual void generateEntire( const char *s ) const
     41:         {
     42:             startTextGeneration() ;
                                  *Note startTextGeneration: test.h:46.
                                  *Note startTextGeneration: test.h:70.
     43:             generate(s) ;
                                             *Note generate: test.h:28.
                                             *Note generate: test.h:29.
     44:             endTextGeneration() ;
                                    *Note endTextGeneration: test.h:47.
                                    *Note endTextGeneration: test.h:76.
     45:         }
     46:     virtual void startTextGeneration() const {}
     47:     virtual void endTextGeneration() const {}
     48:     virtual void beginText( const char *s ) const
     49:         {
     50:             startTextGeneration() ;
                                  *Note startTextGeneration: test.h:46.
                                  *Note startTextGeneration: test.h:70.
     51:             if ( s )
     52:                 generate( s ) ;
                                             *Note generate: test.h:28.
                                             *Note generate: test.h:29.
     53:         }
     54:     virtual void endText( const char *s ) const
     55:         {
     56:             if ( s )
     57:                 generate( s ) ;
                                             *Note generate: test.h:28.
                                             *Note generate: test.h:29.
     58:             endTextGeneration() ;
                                    *Note endTextGeneration: test.h:47.
                                    *Note endTextGeneration: test.h:76.
     59:         }
     60: } ;
     61:
     62: // Decorator
     63: class TextDecorator : public TextGenerator {
                                        *Note TextGenerator: test.h:26.
     64:   protected :
     65:     TextGenerator *decorated ;
                                        *Note TextGenerator: test.h:26.
     66:
     67:   public :
     68:     TextDecorator( TextGenerator *t ) : decorated( t ) {}
                                        *Note TextGenerator: test.h:26.
                                            *Note decorated: test.h:65.
     69:
     70:     virtual void startTextGeneration() const
     71:     {
     72:         startDecorate() ;
     73:         if ( decorated )
                                            *Note decorated: test.h:65.
     74:             decorated->startTextGeneration() ;
                                  *Note startTextGeneration: test.h:46.
                                            *Note decorated: test.h:65.
                                  *Note startTextGeneration: test.h:70.
     75:     }
     76:     virtual void endTextGeneration() const
     77:     {
     78:         if ( decorated )
                                            *Note decorated: test.h:65.
     79:             decorated->endTextGeneration() ;
                                    *Note endTextGeneration: test.h:47.
                                            *Note decorated: test.h:65.
                                    *Note endTextGeneration: test.h:76.
     80:         endDecorate() ;
     81:         mysum;
                                                *Note mysum: test.h:20.
     82:     }
     83:
     84:     // pure virtual functions
     85:     virtual void startDecorate() const = 0 ;
     86:     virtual void endDecorate() const = 0 ;
     87: } ;
     88:
     89: #endif // _TEXTGEN_H
     90:


File: source-highlight.info,  Node: Line ranges,  Next: Line ranges (with context),  Prev: References,  Up: Examples

10.3 Line ranges
================

This is an example that uses the '--line-range' command line option on
the input file shown in *Note Simple example:::

     source-highlight -f texinfo -i test.java -n \
          --line-range="12-18","29-34"

   This generates the following output:

     12:   written by
     13:   Lorenzo Bettini
     14:   http://www.lorenzobettini.it
     15:   http://www.gnu.org/software/src-highlite
     16: */
     17:
     18: package hello;
     29:  * @author Lorenzo Bettini
     30:  * @version 2.0
     31:  */ /// class
     32: public class Hello {
     33:     int foo = 1998 ;
     34:     int hex_foo = 0xCAFEBABE;


   Note that, although the specified line ranges span comment
environments, the highlighting is respected: the start of the comment
is not printed, but the remaining parts of the comment are correctly
highlighted as comment.
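
   The selection of lines performed by '--line-range' can be sketched
in Python as follows.  This is a simplified illustration that handles
only ranges of the form 'M-N' as used above; the function names are
hypothetical.  It performs the selection only and does not reproduce
the highlighting.

```python
def parse_range(spec):
    """Turn a string such as "12-18" into the pair (12, 18)."""
    first, last = spec.split("-")
    return int(first), int(last)


def select_lines(lines, specs):
    """Keep the (number, text) pairs whose number lies in some range.

    Line numbers start at 1 and refer to the original input, so the
    numbering of the selected lines is unchanged.
    """
    ranges = [parse_range(spec) for spec in specs]
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if any(first <= number <= last for first, last in ranges)
    ]


# Stand-in for the 60 lines of the input file.
source = ["line %d" % n for n in range(1, 61)]
for number, text in select_lines(source, ["12-18", "29-34"]):
    print("%02d: %s" % (number, text))
```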


File: source-highlight.info,  Node: Line ranges (with context),  Next: Regex ranges,  Prev: Line ranges,  Up: Examples

10.4 Line ranges (with context)
===============================

This is an example that uses the command line option '--line-range'
together with the '--range-context' and '--range-separator':

     source-highlight -f texinfo -i test.java -n \
          --line-range="12-18","29-34" \
          --range-context=2 \
          --range-separator="{... not in range ...}"

   This generates the following output:

     {... not in range ...}
     10:   or type source-highlight --help for the list of options
     11:
     12:   written by
     13:   Lorenzo Bettini
     14:   http://www.lorenzobettini.it
     15:   http://www.gnu.org/software/src-highlite
     16: */
     17:
     18: package hello;
     19:
     20: import java.io.* ;
     {... not in range ...}
     27:  * TODO: nothing, just to show an highlighted TODO or FIXME
     28:  *
     29:  * @author Lorenzo Bettini
     30:  * @version 2.0
     31:  */ /// class
     32: public class Hello {
     33:     int foo = 1998 ;
     34:     int hex_foo = 0xCAFEBABE;
     35:     boolean b = false;
     36:     Integer i = null ;
     {... not in range ...}


   Note the 2 additional context lines before and after the ranges (compare
it with the output in *note Line ranges::).  Note that the (elements of
the) context lines are not highlighted.  Moreover, the range separator
line '"{... not in range ...}"' is printed between ranges (the separator
string is preformatted automatically, so, e.g., you don't have to escape
special output characters, such as the { } in texinfo output).
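
   The interplay of ranges, context and separator can be sketched in
Python.  The sketch assumes that the separator marks every run of
omitted lines, which is consistent with the output above but is an
assumption about the general behavior; 'select_with_context' is a
hypothetical name.

```python
def select_with_context(total, ranges, context, separator):
    """Return the output as a list of entries.

    Each entry is either the separator string or a pair
    (line_number, highlighted), where highlighted is False for
    context lines, which are printed without highlighting.
    """
    in_range = set()
    shown = set()
    for first, last in ranges:
        in_range.update(range(first, last + 1))
        low = max(1, first - context)
        high = min(total, last + context)
        shown.update(range(low, high + 1))

    output = []
    omitted = False
    for number in range(1, total + 1):
        if number not in shown:
            omitted = True
            continue
        if omitted:
            output.append(separator)  # some lines were skipped
            omitted = False
        output.append((number, number in in_range))
    if omitted:
        output.append(separator)  # lines skipped at the end
    return output


SEPARATOR = "{... not in range ...}"
for entry in select_with_context(60, [(12, 18), (29, 34)], 2, SEPARATOR):
    print(entry)
```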


File: source-highlight.info,  Node: Regex ranges,  Prev: Line ranges (with context),  Up: Examples

10.5 Regex ranges
=================

Ranges can also be expressed using regular expressions, with the command
line option '--regex-range'.  In this case the beginning of the range
is detected by a line containing (at any point) a string matching the
specified regular expression; the end is detected by a line containing
a string matching the same regular expression that started the range.
This feature is very useful when we want to document some code (e.g.,
in this very manual) by showing only specific parts that are delimited
in an ad hoc way in the source code (e.g., with specific comment
patterns).

   For instance, the following output was produced, starting from the
source file shown in *Note Simple example::, by specifying:

     --regex-range="/// [[:alpha:]]+"

Note that the lines containing '/// class', which determine the range,
are not shown in the output:

     32: public class Hello {
     33:     int foo = 1998 ;
     34:     int hex_foo = 0xCAFEBABE;
     35:     boolean b = false;
     36:     Integer i = null ;
     37:     char c = '\'', d = 'n', e = '\\' ;
     38:     String xml = "<tag attr=\"value\">&auml;</tag>", foo2 = "\\" ;
     39:
     40:     /* mymethod */
     41:     public void mymethod(int i) {
     42:         // just a foo method
     43:     }
     44:     /* mymethod */
     45:
     46:     /* main */
     47:     public static void main( String args[] ) {
     48:         // just some greetings ;-)  /*
     49:         System.out.println( "Hello from java2html :-)" ) ;
     50:         System.out.println( "\tby Lorenzo Bettini" ) ;
     51:         System.out.println( "\thttp://www.lorenzobettini.it" ) ;
     52:         if (argc > 0)
     53:             String param = argc[0];
     54:         //System.out.println( "bye bye... :-D" ) ; // see you soon
     55:     }
     56:     /* main */
     57: }


   Furthermore, the line numbers are consistent with the lines of the
original file.

   If we want to output only what is included between '/* main */', we
specify (note that we must escape the special regular expression
character '*'):

     --regex-range="/\* main \*/"

and we get:

     47:     public static void main( String args[] ) {
     48:         // just some greetings ;-)  /*
     49:         System.out.println( "Hello from java2html :-)" ) ;
     50:         System.out.println( "\tby Lorenzo Bettini" ) ;
     51:         System.out.println( "\thttp://www.lorenzobettini.it" ) ;
     52:         if (argc > 0)
     53:             String param = argc[0];
     54:         //System.out.println( "bye bye... :-D" ) ; // see you soon
     55:     }


   If we want to show only the methods, which in the source file are
delimited by comment lines containing the method's name, we can specify:

     --regex-range="/\* [[:alpha:]]+ \*/"

     41:     public void mymethod(int i) {
     42:         // just a foo method
     43:     }
     47:     public static void main( String args[] ) {
     48:         // just some greetings ;-)  /*
     49:         System.out.println( "Hello from java2html :-)" ) ;
     50:         System.out.println( "\tby Lorenzo Bettini" ) ;
     51:         System.out.println( "\thttp://www.lorenzobettini.it" ) ;
     52:         if (argc > 0)
     53:             String param = argc[0];
     54:         //System.out.println( "bye bye... :-D" ) ; // see you soon
     55:     }


   In this case, we could also have specified:

     --regex-range="/\* main \*/","/\* mymethod \*/"

since '--regex-range' accepts multiple regular expressions.

   IMPORTANT: the order of regular expression specification is crucial,
since they are tested in the same order they are specified at the
command line.
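
   The behavior described in this section can be sketched in Python.
This is only an illustration under stated assumptions: it uses Python's
're' module, which does not support POSIX classes such as
'[[:alpha:]]', so '[A-Za-z]' is used instead, and 'regex_ranges' is a
hypothetical name.

```python
import re


def regex_ranges(lines, patterns):
    """Return the (number, text) pairs that lie inside a regex range.

    A range starts at a line matching one of the patterns, tested in
    the order given, and ends at the next line matching that same
    pattern.  The delimiting lines themselves are not returned.
    """
    compiled = [re.compile(p) for p in patterns]
    active = None  # the expression that opened the current range
    output = []
    for number, text in enumerate(lines, start=1):
        if active is None:
            for regex in compiled:
                if regex.search(text):
                    active = regex  # this line opens a range
                    break
        elif active.search(text):
            active = None  # this line closes the range
        else:
            output.append((number, text))
    return output


source = [
    "/* mymethod */",
    "void mymethod() {}",
    "/* mymethod */",
    "",
    "/* main */",
    "int main() {}",
    "/* main */",
]
print(regex_ranges(source, [r"/\* [A-Za-z]+ \*/"]))
```

   As in the examples above, the line numbers of the selected lines are
those of the original input.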


File: source-highlight.info,  Node: Problems,  Next: Mailing Lists,  Prev: Examples,  Up: Top

11 Reporting Bugs
*****************

If you find a bug in 'source-highlight', please send electronic mail to

   'bug-source-highlight at gnu dot org'

   Include the version number, which you can find by running
'source-highlight --version'.  Also include in your message the output
that the program produced and the output you expected.

   Even better, please file a bug report at the Savannah site:

   <https://savannah.gnu.org/bugs/?group=src-highlite>

   If you have other questions, comments or suggestions about
'source-highlight', contact the author via electronic mail (find the
address at <http://www.lorenzobettini.it>).  The author will try to help
you out, although he may not have time to fix your problems.


File: source-highlight.info,  Node: Mailing Lists,  Next: Concept Index,  Prev: Problems,  Up: Top

12 Mailing Lists
****************

The following mailing lists are available:

   'help-source-highlight at gnu dot org'

   for generic discussions about the program and for asking for help
about it (open mailing list),
<http://mail.gnu.org/mailman/listinfo/help-source-highlight>

   'info-source-highlight at gnu dot org'

   for receiving information about new releases and features (read-only
mailing list),
<http://mail.gnu.org/mailman/listinfo/info-source-highlight>.

   If you want to subscribe to a mailing list just go to the URL and
follow the instructions, or send me an e-mail and I'll subscribe you.

   I'll also describe the new features of new releases in my blog, at
this URL:

   <http://tronprog.blogspot.com/search/label/source-highlight>


File: source-highlight.info,  Node: Concept Index,  Prev: Mailing Lists,  Up: Top

Concept Index
*************

[index]
* Menu:

* "expression":                          Ways of specifying regular expressions.
                                                              (line  15)
* $infile:                               Anchors and References.
                                                              (line  17)
* $infilename:                           Anchors and References.
                                                              (line  17)
* $linenum:                              Anchors and References.
                                                              (line   6)
* $outfile:                              Anchors and References.
                                                              (line  17)
* $style:                                Colors.              (line   6)
* $text:                                 Text styles.         (line  23)
* 'expression':                          Ways of specifying regular expressions.
                                                              (line  35)
* --data-dir:                            The program source-highlight-settings.
                                                              (line  15)
* --data-dir <1>:                        Configuration files. (line  14)
* --data-dir <2>:                        Invoking source-highlight.
                                                              (line 165)
* --infer-lang:                          Perl.                (line  14)
* --infer-lang <1>:                      Invoking source-highlight.
                                                              (line 229)
* --infer-lang <2>:                      How the input language is discovered.
                                                              (line  42)
* --show-lang-elements:                  Output format style. (line   9)
* --show-lang-elements <1>:              Listing Language Elements.
                                                              (line  13)
* --style-css-file:                      Output format style using CSS.
                                                              (line  10)
* --style-file:                          Output format style. (line 108)
* --with-doxygen:                        Installation.        (line  63)
* `expression`:                          Ways of specifying regular expressions.
                                                              (line  61)
* anchor:                                Generating References.
                                                              (line   6)
* ANSI color:                            ANSI color escape sequences.
                                                              (line   6)
* Apache:                                Related Software and Links.
                                                              (line  61)
* autoconf:                              Anonymous Git Checkout.
                                                              (line  35)
* autoconf <1>:                          What you need to build source-highlight.
                                                              (line  28)
* automake:                              Anonymous Git Checkout.
                                                              (line  35)
* automake <1>:                          What you need to build source-highlight.
                                                              (line  28)
* background color:                      Output format style. (line  12)
* background color <1>:                  Output format style. (line 116)
* background color <2>:                  Colors.              (line  11)
* backreference:                         Ways of specifying regular expressions.
                                                              (line  63)
* backreference <1>:                     Notes on regular expressions.
                                                              (line  24)
* backtick:                              Ways of specifying regular expressions.
                                                              (line  63)
* bash completion:                       Installation.        (line  54)
* bgcolor:                               Output format style. (line  12)
* bold:                                  Output format style. (line 125)
* bold <1>:                              Text styles.         (line   6)
* boost:                                 Building with qmake. (line  13)
* boost <1>:                             What you need to build source-highlight.
                                                              (line   6)
* Boost regex:                           Tips on installing Boost Regex library.
                                                              (line   6)
* bugs:                                  Problems.            (line   6)
* building requirements:                 What you need to build source-highlight.
                                                              (line   6)
* CGI:                                   Using source-highlight as a CGI.
                                                              (line   6)
* check-regexp:                          The program check-regexp.
                                                              (line   6)
* code2blog:                             Related Software and Links.
                                                              (line  56)
* color:                                 Output format style. (line 116)
* color <1>:                             Output format style. (line 135)
* colors:                                Colors.              (line   6)
* compilation:                           Installation.        (line   6)
* compilation requirements:              What you need to build source-highlight.
                                                              (line   6)
* conditional expressions:               Notes on regular expressions.
                                                              (line 101)
* configuration files:                   Configuration files. (line   6)
* Copying conditions:                    Copying.             (line   6)
* cpp2html:                              Installation.        (line  72)
* CSS:                                   Output format style using CSS.
                                                              (line   6)
* ctags:                                 Generating References.
                                                              (line   6)
* CXXFLAGS:                              Tips on installing Boost Regex library.
                                                              (line  55)
* debug:                                 Debugging.           (line   6)
* default.lang:                          Using source-highlight as a simple formatter.
                                                              (line  67)
* default.lang <1>:                      Invoking source-highlight.
                                                              (line 312)
* default.style:                         Output format style. (line   6)
* definition order:                      Order of definitions.
                                                              (line   6)
* delimited definitions:                 Delimited definitions.
                                                              (line   6)
* direct color scheme:                   Output format style. (line 200)
* directories:                           Installation.        (line  37)
* DocBook:                               DocBook output.      (line   6)
* doctemplate:                           Document template.   (line   8)
* download:                              Download.            (line   6)
* doxygen:                               Installation.        (line  63)
* dynamic backreference:                 Dynamic Backreferences.
                                                              (line   6)
* environments:                          State/Environment Definitions.
                                                              (line   6)
* failsafe:                              Using source-highlight as a simple formatter.
                                                              (line  61)
* failsafe <1>:                          Invoking source-highlight.
                                                              (line 301)
* features:                              Introduction.        (line   6)
* file inclusion:                        File inclusion.      (line   6)
* Firefox:                               Related Software and Links.
                                                              (line  74)
* fixed:                                 Output format style. (line 125)
* fixed <1>:                             Text styles.         (line   6)
* Fortran:                               Fortran.             (line   6)
* getting help:                          Invoking source-highlight.
                                                              (line   6)
* Git:                                   Anonymous Git Checkout.
                                                              (line   6)
* gnulib:                                What you need to build source-highlight.
                                                              (line  29)
* help:                                  Invoking source-highlight.
                                                              (line   6)
* HTML:                                  HTML and XHTML output.
                                                              (line   6)
* Ikiwiki:                               Related Software and Links.
                                                              (line  79)
* inline_reference:                      Anchors and References.
                                                              (line  72)
* installation:                          Installation.        (line   6)
* introduction:                          Introduction.        (line   6)
* invoking:                              Invoking source-highlight.
                                                              (line   6)
* italics:                               Output format style. (line 125)
* italics <1>:                           Text styles.         (line   6)
* java2html:                             Related Software and Links.
                                                              (line  40)
* java2html <1>:                         Installation.        (line  72)
* KDE:                                   Related Software and Links.
                                                              (line  28)
* KDE <1>:                               Related Software and Links.
                                                              (line  34)
* Ksrc2highlight:                        Related Software and Links.
                                                              (line  34)
* language definition:                   Language Definitions.
                                                              (line   6)
* language inference:                    Invoking source-highlight.
                                                              (line 229)
* language map:                          Language map.        (line   6)
* LaTeX:                                 LaTeX output.        (line   6)
* LDFLAGS:                               Tips on installing Boost Regex library.
                                                              (line  71)
* library:                               Introduction.        (line  23)
* library <1>:                           Installation.        (line  63)
* libtool:                               Anonymous Git Checkout.
                                                              (line  35)
* libtool <1>:                           What you need to build source-highlight.
                                                              (line  28)
* line ranges:                           Invoking source-highlight.
                                                              (line 248)
* line ranges <1>:                       Line ranges.         (line   6)
* line ranges <2>:                       Line ranges (with context).
                                                              (line   6)
* lines:                                 Line wide definitions.
                                                              (line   6)
* lookahead asserts:                     Notes on regular expressions.
                                                              (line  40)
* lookbehind asserts:                    Notes on regular expressions.
                                                              (line  90)
* mailing list:                          Mailing Lists.       (line   6)
* marked subexpressions:                 Ways of specifying regular expressions.
                                                              (line  63)
* marked subexpressions <1>:             Notes on regular expressions.
                                                              (line  17)
* matching strategy:                     How source-highlight works.
                                                              (line  23)
* MinGW:                                 Building with qmake. (line  19)
* MSVC:                                  Building with qmake. (line   9)
* msys:                                  Building with qmake. (line  21)
* nodoctemplate:                         Document template.   (line  13)
* nohilite.lang:                         Using source-highlight as a simple formatter.
                                                              (line   9)
* non-marking parenthesis:               Notes on regular expressions.
                                                              (line  10)
* nonsensitive:                          Simple definitions.  (line  76)
* noref:                                 Output format style. (line 173)
* notfixed:                              Output format style. (line 125)
* notfixed <1>:                          Text styles.         (line   6)
* one style:                             One style.           (line   6)
* options:                               Invoking source-highlight.
                                                              (line   6)
* output language definition:            Output Language Definitions.
                                                              (line   6)
* output language map:                   Output Language map. (line   6)
* output style:                          Output format style. (line   6)
* Pastebin:                              Related Software and Links.
                                                              (line 100)
* patching:                              Patching from a previous version.
                                                              (line   6)
* Perl:                                  Perl.                (line   6)
* Perl <1>:                              Related Software and Links.
                                                              (line  96)
* Php:                                   Related Software and Links.
                                                              (line  85)
* postdoc_reference:                     Anchors and References.
                                                              (line  72)
* postline_reference:                    Anchors and References.
                                                              (line  72)
* prefix:                                How source-highlight works.
                                                              (line  31)
* problems:                              Problems.            (line   6)
* PyQt:                                  Related Software and Links.
                                                              (line  91)
* Python:                                Related Software and Links.
                                                              (line  91)
* qmake:                                 Building with qmake. (line   6)
* QSource-Highlight:                     Related Software and Links.
                                                              (line  18)
* Qt:                                    Related Software and Links.
                                                              (line  11)
* Qt <1>:                                Related Software and Links.
                                                              (line  18)
* range context:                         Invoking source-highlight.
                                                              (line 248)
* range context <1>:                     Line ranges (with context).
                                                              (line   6)
* range separator:                       Invoking source-highlight.
                                                              (line 248)
* range separator <1>:                   Line ranges (with context).
                                                              (line   6)
* RapidWeaver:                           Related Software and Links.
                                                              (line  67)
* redef:                                 Redefinitions and Substitutions.
                                                              (line  11)
* reference:                             Generating References.
                                                              (line   6)
* regex ranges:                          Invoking source-highlight.
                                                              (line 280)
* regex ranges <1>:                      Regex ranges.        (line   6)
* regular expressions:                   Notes on regular expressions.
                                                              (line   6)
* rpm:                                   Building .rpm.       (line   6)
* sample:                                Simple Usage.        (line   6)
* shadow build:                          Installation.        (line  16)
* shadow build <1>:                      Anonymous Git Checkout.
                                                              (line  38)
* SHJS:                                  Related Software and Links.
                                                              (line  49)
* simple language definition:            Simple definitions.  (line   6)
* SIP:                                   Related Software and Links.
                                                              (line  91)
* source-highlight-esc.sh:               Using source-highlight with less.
                                                              (line  16)
* Source-Highlight-Qt:                   Related Software and Links.
                                                              (line  11)
* source-highlight-settings:             The program source-highlight-settings.
                                                              (line   6)
* source-highlight.conf:                 The program source-highlight-settings.
                                                              (line  12)
* SourceHighlightIDE:                    Related Software and Links.
                                                              (line  28)
* SOURCE_HIGHLIGHT_DATADIR:              The program source-highlight-settings.
                                                              (line  19)
* SOURCE_HIGHLIGHT_DATADIR <1>:          Configuration files. (line  27)
* src-hilite-lesspipe.sh:                Using source-highlight with less.
                                                              (line   6)
* states:                                State/Environment Definitions.
                                                              (line   6)
* style separator:                       Style template.      (line   6)
* style template:                        Style template.      (line   6)
* style.defaults:                        Default Styles.      (line  16)
* subst:                                 Redefinitions and Substitutions.
                                                              (line  40)
* suffix:                                How source-highlight works.
                                                              (line  37)
* tail recursion:                        Concept Index.       (line   6)
* Texinfo:                               Texinfo output.      (line   6)
* underline:                             Output format style. (line 125)
* underline <1>:                         Text styles.         (line   6)
* usage:                                 Invoking source-highlight.
                                                              (line   6)
* variables:                             Variable definitions.
                                                              (line   6)
* version:                               Invoking source-highlight.
                                                              (line   6)
* Wiki:                                  Related Software and Links.
                                                              (line  79)
* Wiki <1>:                              Related Software and Links.
                                                              (line  85)
* XHTML:                                 HTML and XHTML output.
                                                              (line   6)



Tag Table:
Node: Top847
Node: Introduction2509
Node: Supported languages4084
Ref: Supported languages-Footnote-110045
Node: The program source-highlight-settings10218
Node: Notes on some languages11175
Node: Fortran11732
Node: Perl12453
Node: Using source-highlight as a simple formatter13524
Ref: Using source-highlight as a simple formatter-Footnote-116822
Node: Related Software and Links16947
Node: Installation20992
Node: Building with qmake24164
Node: Download26505
Node: Anonymous Git Checkout27969
Ref: Anonymous Git Checkout-Footnote-129895
Node: What you need to build source-highlight30017
Ref: What you need to build source-highlight-Footnote-132271
Ref: What you need to build source-highlight-Footnote-232318
Ref: What you need to build source-highlight-Footnote-332365
Ref: What you need to build source-highlight-Footnote-432411
Node: Tips on installing Boost Regex library32456
Ref: Tips on installing Boost Regex library-Footnote-137680
Ref: Tips on installing Boost Regex library-Footnote-237827
Node: Patching from a previous version37994
Node: Using source-highlight with less38637
Node: Using source-highlight as a CGI39665
Node: Building .rpm40180
Node: Copying40629
Node: Simple Usage41055
Ref: Simple Usage-Footnote-143673
Node: HTML and XHTML output43840
Node: LaTeX output44858
Node: Texinfo output45371
Node: DocBook output45793
Node: ANSI color escape sequences46111
Node: Odf output46659
Node: Groff output47140
Node: Configuration files47710
Node: Output format style49405
Ref: Output format style-Footnote-156644
Ref: Output format style-Footnote-256792
Ref: Output format style-Footnote-356819
Ref: Output format style-Footnote-456985
Ref: Output format style-Footnote-557012
Ref: Output format style-Footnote-657106
Ref: Output format style-Footnote-757174
Ref: Output format style-Footnote-857240
Node: Output format style using CSS57267
Node: Default Styles62030
Ref: Default Styles-Footnote-163705
Node: Language map63732
Node: Language definition files64625
Ref: Language definition files-Footnote-165192
Node: Output Language map65289
Node: Output Language definition files66411
Ref: Output Language definition files-Footnote-167213
Node: Developing your own definition files67314
Node: Invoking source-highlight68146
Ref: Invoking source-highlight-Footnote-185003
Ref: Invoking source-highlight-Footnote-285160
Node: How the input language is discovered85222
Ref: How the input language is discovered-Footnote-187268
Ref: How the input language is discovered-Footnote-287295
Node: Language Definitions87324
Node: Ways of specifying regular expressions91054
Ref: Ways of specifying regular expressions-Footnote-195855
Ref: Ways of specifying regular expressions-Footnote-295882
Node: Simple definitions96247
Node: Line wide definitions100586
Node: Order of definitions101285
Node: Delimited definitions102067
Ref: Delimited definitions-Footnote-1104491
Node: Variable definitions104823
Node: Dynamic Backreferences105704
Ref: Dynamic Backreferences-Footnote-1108410
Ref: Dynamic Backreferences-Footnote-2108436
Node: File inclusion108565
Node: State/Environment Definitions109756
Node: Explicit subexpressions with names115734
Node: Redefinitions and Substitutions119433
Ref: Redefinitions and Substitutions-Footnote-1122092
Node: How source-highlight works122142
Ref: How source-highlight works-Footnote-1126118
Ref: How source-highlight works-Footnote-2126334
Node: Notes on regular expressions126395
Ref: Notes on regular expressions-Footnote-1131163
Ref: Notes on regular expressions-Footnote-2131221
Ref: Notes on regular expressions-Footnote-3131275
Node: The program check-regexp131500
Node: Listing Language Elements133828
Ref: Listing Language Elements-Footnote-1135245
Node: Concluding Remarks135272
Node: Debugging136349
Ref: Debugging-Footnote-1146232
Ref: Debugging-Footnote-2146359
Node: Tutorials on Language Definitions146464
Node: Highlighting C/C++ and C#148830
Ref: Highlighting C/C++ and C#-Footnote-1158518
Ref: Highlighting C/C++ and C#-Footnote-2158708
Ref: Highlighting C/C++ and C#-Footnote-3158796
Ref: Highlighting C/C++ and C#-Footnote-4158921
Node: Highlighting Diff files159218
Node: Pseudo semantic analysis165819
Node: Output Language Definitions171003
Node: File extension173351
Node: Text styles173972
Node: Colors175289
Ref: Colors-Footnote-1178528
Node: Anchors and References178555
Node: One style182752
Node: Style template184041
Node: Line prefix184824
Ref: Line prefix-Footnote-1185544
Node: String translation185810
Ref: String translation-Footnote-1188124
Ref: String translation-Footnote-2188151
Node: Document template188200
Ref: Document template-Footnote-1191305
Ref: Document template-Footnote-2191487
Node: Generating HTML output191514
Ref: Generating HTML output-Footnote-1194137
Node: Generating References194225
Ref: Generating References-Footnote-1197657
Node: Examples197715
Node: Simple example198263
Node: References200595
Ref: test.h:16201784
Ref: test.h:18201821
Ref: test.h:20201862
Ref: test.h:26202023
Ref: test.h:28202085
Ref: test.h:29202162
Ref: test.h:35202407
Ref: test.h:40202725
Ref: test.h:46203382
Ref: test.h:47203439
Ref: test.h:48203494
Ref: test.h:54203995
Ref: test.h:63204525
Ref: test.h:65204688
Ref: test.h:68204809
Ref: test.h:70205046
Ref: test.h:76205543
Node: Line ranges206321
Node: Line ranges (with context)207323
Node: Regex ranges208994
Node: Problems212825
Node: Mailing Lists213644
Node: Concept Index214504

End Tag Table
