BSM Audit Parser README
Parser for Solaris Basic Security Module audit log files.
Copyright 1997-2000 by the Purdue Research Foundation for CERIAS (the
Center for Education and Research in Information Assurance and
Security). All rights reserved. This work may be used for
non-profit educational and research purposes only. Any copies made
of this file or portions of its contents must include this copyright
statement. For information on reuse, licensing, or copying, contact
.
This software is experimental in nature and is provided without any
express or implied warranties, including, without limitation, the
implied warranties of merchantability and fitness for any particular
purpose.
$Id: README,v 1.7 2000/02/16 23:42:01 flack Exp $
BUILD SUMMARY FOR THE IMPATIENT:
1. Edit Makefile variables as described below.
2. make extensions (first time only, builds some extensions to ANTLR)
3. make (builds and tests the parser)
If "make extensions" fails or you get "File to patch" prompts, arrange your
PATH so a working version of "patch" is found ahead of the broken one.
Note: The scanner does not yet work under Solaris on Intel, because BSM
logs use SPARC byte order and the scanner does not yet do byte swapping.
Byte swapping will be added in a later release; if you can't wait, feel
free to tweak BSMTokenStream.c yourself.
END BUILD SUMMARY FOR THE IMPATIENT.
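On the byte-order note above: one common way to make a reader independent of
the host's byte order is to assemble each multi-byte field from individual
bytes with explicit shifts, instead of copying raw bytes into a native
integer. The following is only a sketch of that idea, written in Java for
illustration. The real scanner lives in BSMTokenStream.c and is not shown
here; the class and method names below are hypothetical, and which BSM fields
would need this treatment is not covered. The same shift expressions carry
over to C almost unchanged.

```java
// Hypothetical illustration only: decodes integers stored most significant
// byte first (the order used on SPARC), regardless of the host's byte order.
class BigEndian {

    // Decode a 32-bit value whose most significant byte is at buf[off].
    static int readInt32(byte[] buf, int off) {
        return ((buf[off]     & 0xff) << 24)
             | ((buf[off + 1] & 0xff) << 16)
             | ((buf[off + 2] & 0xff) << 8)
             |  (buf[off + 3] & 0xff);
    }

    // Decode an unsigned 16-bit value whose most significant byte is at buf[off].
    static int readUInt16(byte[] buf, int off) {
        return ((buf[off] & 0xff) << 8) | (buf[off + 1] & 0xff);
    }

    public static void main(String[] args) {
        byte[] sample = { 0x00, 0x00, 0x01, 0x02 };
        System.out.println(readInt32(sample, 0));   // 0x00000102
        System.out.println(readUInt16(sample, 2));  // 0x0102
    }
}
```

Masking each byte with 0xff matters because Java bytes are signed; without
the mask, a byte with its high bit set would sign-extend and corrupt the
upper bits of the result.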
A misuse detection or audit reduction project for Solaris BSM
probably requires a module to read and parse the binary BSM audit
format into some convenient internal form. The audit format is
described in a Solaris AnswerBook.
Restating that description in a formal way has some benefits. The
process can help to reveal ambiguities in the descriptions, and a
formal specification in the form of a grammar can be compiled directly
to produce a tool that reads the audit format. Such a tool can be used
to confirm that the documentation accurately describes the data
format or, if parse errors are reported, to identify discrepancies.
The tool also provides a reliable starting point for development of
any program to do something useful with the audit data.
The file BSM.g in this package is such a formal description of the BSM
format. You can read it for your own understanding of what is contained
in a BSM log, and you can compile it to create tools that use BSM data.
To read it, you don't need anything special, but if the grammar notation
is unfamiliar, you can get up to speed by reading about the
ANTLR metalanguage.
To compile the grammar and make BSM-reading tools, you need somewhat more:
- A Solaris system with the BSM header files and libraries.
- A C compiler
- A Java development kit (free)
- The ANTLR parser generator (also free)
This package has been tested with Java 2 (a/k/a Java 1.2) and ANTLR 2.7.0.
To build it, edit the Makefile to set CC to the proper command to invoke
your C compiler. Set JAVA_HOME to the directory where your Java developer kit
is installed (e.g. /somewhere/jdk1.2). That directory should have "bin" and
"include" subdirectories (among other things). Set ANTLR_HOME to the directory
where the ANTLR software was installed (e.g. /somewhere/antlr-2.7.0). That
directory should have "antlr" and "doc" subdirectories.
If there are individual class files in the "antlr" subdirectory, leave
ANTLR_CLASSES equal to ANTLR_HOME. If instead you have an antlrclasses.jar
in ANTLR_HOME, then set ANTLR_CLASSES = $(ANTLR_HOME)/antlrclasses.jar
The command "make extensions" will make some extensions to the ANTLR parser
generator (this should only need to be done once). Plain "make" will then
build a rudimentary BSM parser.
As provided, the BSM grammar includes no action routines to do anything with
the audit data. When run on an audit log, the parser prints any syntax error
messages (if the file contents do not conform to the grammar) and then a
message when it finishes. Each error message includes the number of the record
where the error was detected, to help track down any discrepancies
between the grammar examples in the Sun documentation and the actual format of
the log files.
As with yacc, add semantic processing to the parser simply by adding
action routines to the various productions in the grammar. For convenience,
ANTLR has built-in support for building and traversing Abstract Syntax Trees
of the parsed data. It may also be of interest that by changing the language
option at the top of BSM.g, a C++ parser can be produced; that would require
translating the few snippets of Java code that appear in the grammar, and
porting the BSMToken and BSMTokenStream classes.
A test driver routine is included as BSMParser.main. To run the parser on
an audit log file, use the command
    java -jar BSMParser.jar logfilename...
making sure the environment variable LD_LIBRARY_PATH includes this directory
(which contains libBSMTokenStream.so). The ... indicates that several log
files can be parsed in a single command. Because the Java runtime has some
start-up overhead, doing so might be noticeably faster than parsing the files
in separate commands.
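If the parser fails to start with an UnsatisfiedLinkError, the native library
is probably not being found. The sketch below is a small stand-alone check,
not part of this package: it assumes the library is loaded by the name
"BSMTokenStream" (inferred from the file name libBSMTokenStream.so) and that
the Java runtime searches the directories in the java.library.path property,
which on Solaris and similar systems is typically derived from
LD_LIBRARY_PATH. The class name NativeLibCheck is hypothetical.

```java
import java.io.File;

// Hypothetical diagnostic: reports whether the native library file can be
// found in any directory on the JVM's native library search path.
class NativeLibCheck {

    // Return the first regular file called fileName found in one of the
    // directories of searchPath (separated by File.pathSeparator), or null.
    static File find(String fileName, String searchPath) {
        String[] dirs = searchPath.split(File.pathSeparator);
        for (int i = 0; i < dirs.length; i++) {
            if (dirs[i].length() == 0) {
                continue;  // skip empty path entries
            }
            File candidate = new File(dirs[i], fileName);
            if (candidate.isFile()) {
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // mapLibraryName turns "BSMTokenStream" into the platform's file
        // name, e.g. libBSMTokenStream.so on Solaris.
        String fileName = System.mapLibraryName("BSMTokenStream");
        String searchPath = System.getProperty("java.library.path", "");
        File hit = find(fileName, searchPath);
        if (hit == null) {
            System.out.println(fileName + " not found in: " + searchPath);
        } else {
            System.out.println("found " + hit.getPath());
        }
    }
}
```

Run it from the same shell, with the same LD_LIBRARY_PATH setting, that you
use to run the parser, so that both see the same search path.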
Three more options can be given between "java" and "-jar BSMParser.jar":
  -DbufferInput=true  will probably improve run time in most cases.
  -DtraceParse=true   prints parsing details, not just the success message.
                      This functionality depends on (1) BSM.g being
                      compiled with -trace (the Makefile does this), and
                      (2) verbose_trace being on the classpath (or,
                      equivalently, the replacement LLkParser.class being
                      included in the jar file, as it is by default). If
                      verbose_trace is not on the classpath, you will get
                      less informative trace messages, and you will always
                      get them (even if traceParse=false) when BSM.g is
                      compiled with -trace. If BSM.g is compiled without
                      -trace, then even if verbose_trace is on the classpath
                      and traceParse=true, only a flat list of tokens will
                      be produced.
  -DeventFile=        specifies an audit event file other than
                      /etc/security/audit_event. For example, if you are
                      running on Solaris 2.5.1 but need to parse audit
                      data from a 2.6 system, get a copy of audit_event
                      from a 2.6 system, and refer to it with this option.
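The -D options above are ordinary Java system properties, so a driver can
read them with the standard System.getProperty and Boolean.getBoolean calls.
The sketch below shows one way that could look. It is an illustration under
assumptions, not the code in BSMParser.main: the class DriverOptions and its
wrap method are hypothetical, and treating bufferInput as "wrap the input in
a BufferedInputStream" is a guess at what the option does.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;

// Hypothetical holder for the three -D options described in this README.
class DriverOptions {

    // Default audit event file named in this README.
    static final String DEFAULT_EVENT_FILE = "/etc/security/audit_event";

    final boolean bufferInput;
    final boolean traceParse;
    final String eventFile;

    DriverOptions() {
        // Boolean.getBoolean is true only when the property is set to
        // "true" (ignoring case); unset or anything else yields false.
        bufferInput = Boolean.getBoolean("bufferInput");
        traceParse = Boolean.getBoolean("traceParse");
        // Fall back to the system audit_event file when none is given.
        eventFile = System.getProperty("eventFile", DEFAULT_EVENT_FILE);
    }

    // Assumption: buffering means wrapping the raw stream in a
    // BufferedInputStream; otherwise the stream is returned unchanged.
    InputStream wrap(InputStream raw) {
        return bufferInput ? new BufferedInputStream(raw) : raw;
    }

    public static void main(String[] args) {
        DriverOptions o = new DriverOptions();
        System.out.println("bufferInput = " + o.bufferInput);
        System.out.println("traceParse  = " + o.traceParse);
        System.out.println("eventFile   = " + o.eventFile);
    }
}
```

Because the properties must be defined before the class or jar name, the
options go between "java" and "-jar BSMParser.jar" as described above;
anything placed after the jar name would be passed to the program as an
ordinary argument instead.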
Another test driver is provided, BSMTokenStream.main, to exercise the
lexical analyzer without the parser. The command
    java edu.purdue.cerias.projects.BSMParser.BSMTokenStream logfilename...
will simply print all the tokens read from the file(s) in sequence, and can
be used to find the source of a syntax error reported by the parser. The
-DbufferInput=true and -DeventFile= options are valid here also.
Complete grammar rules and lexical actions are not yet included for BSM
events and tokens too uncommon to occur in our test data sets. Because
skeletal rules and actions are provided for all events and tokens, fully
supporting another event or token is just a matter of filling in the details.
We ask that all such additions be forwarded to us for inclusion in the
distributed code.
flack@cs.purdue.edu