
1. Introduction

Brecis has a new family of chips.  The internal engineering codename
for this family has been "Polo."  This chip family has a new
implementation of the security engine, which required a new API,
rather than an extension of the previous API, to take full advantage
of the hardware.

This document refers to the previous security engine as the Original
Brecis Security Engine, or OBSE, and the new security engine as Polo
Security Engine, or PSE.

This document explains the new API, how it differs from the old API,
and how it should be used.

1.1 Operation of the New API

Note: Most of this discussion is done with the kernel level API in mind.
The user level API has many restrictions that are covered later.

The PSE adds AES encryption/decryption capabilities, and can process
encryption or decryption in parallel with the HMAC operation necessary
for IPSEC ESP processing.  

1.1.1 Work and completion queues

The PSE has two sets of queues (two queues in each set, for a total
of 4) that are used to initiate operations and communicate results.
The two queues that are used to initiate requests are called work
queues, and the queues that contain the results of the requests are
called completion queues.  The work queues are not directly paired
with the completion queues; each work queue request indicates which of
the two completion queues the results should be placed in.

Which of the two work queues to use is selected by a parameter to the
work queue entry creation functions, and which completion queue the
results will end up in is specified as part of the parameters to the
msp_sec2_new_request() call.  The completion queue selection is
independent of the work queue; some or all of the requests in work
queue 0 can have their completion queue entries placed in completion
queue 1, for instance.

The way these queues are used is left up to the application software.
I'll give a quick example here, but in the end it is completely up to
the system designer as to the best way to take advantage of the queues.

Here's the example:  Consider that an IPSEC tunnel most often travels
over a WAN connection of some sort, and that the systems that send and
receive data for the IPSEC tunnel are often connected by local
interfaces that are faster than the given WAN connection.  It is quite
possible that data from local systems could monopolize the security
engine with data that exceeds the bandwidth available on the tunnel
connection, i.e. data that will later be discarded by the network
interface.  An easy way to be "fair" about this would be to put
all encryptions on work queue 0, and all decryptions on work queue 1.
Because the two work queues are serviced round-robin, the security
engine will work on each tunnel direction in a fair manner.
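
The effect of round-robin servicing can be illustrated with a toy
model.  This is only a simulation of the scheduling idea, not driver
code: serve_round_robin() and its arguments are hypothetical names,
and the model assumes the engine alternates strictly between the two
work queues whenever both have requests waiting.

```c
#include <assert.h>

/*
 * Toy model: serve up to 'slots' requests, alternating between two
 * work queues.  pending[i] is the backlog of queue i; served[i]
 * receives the number of requests processed from queue i.
 */
static void serve_round_robin(int pending[2], int slots, int served[2])
{
    int q = 0;

    assert(slots >= 0);
    served[0] = served[1] = 0;
    while (slots > 0 && (pending[0] > 0 || pending[1] > 0)) {
        if (pending[q] > 0) {
            pending[q]--;
            served[q]++;
            slots--;
        }
        q ^= 1;     /* move on to the other queue */
    }
}
```

With a backlog of 100 requests on queue 0 and 3 on queue 1, six
service slots are split three and three in this model, so the short
queue is not starved by the long one.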

In a similar manner, the completion queues can be used differently.
It is up to the application software to decide when to examine the
completion queues for results.  You can, of course, use interrupts to
indicate when an operation has completed.  But maybe you don't want to
incur the interrupt overhead.  You can poll continuously for
completion after submitting an operation to the engine.  You can poll
off of a timer routine.  Or you may queue up many requests to the
driver and then wait for them all to complete.  You can choose to poll
one or both queues when you poll manually.

1.1.2 Creating work queue entries

Entries in the work queue are created with a sequence of function
calls to the driver.  First, msp_sec2_new_request() is called to begin
an operation.  It sets most of the parameters for performing this
request.  Then, msp_sec2_add_sg() is called up to sixteen times to
create the hardware scatter/gather list.  This is simply the list of
buffers the request will operate on; scatter entries are destination
buffers, and gather entries are source buffers.  Once the
scatter/gather list is built, msp_sec2_end_request() is called to
indicate that building of the request is complete, and the operation
may commence.
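
As a rough sketch of this sequence, the fragment below builds one
request with a single gather (source) buffer and a single scatter
(destination) buffer.  The three msp_sec2_* prototypes follow section
2.1.2; everything else is an assumption made so the fragment compiles
stand-alone.  The stub bodies stand in for the real driver, the
numeric values of SG_GATHER, SG_SCATTER, and BLK_NONE are placeholders
for the ones in <brecis/msp_secv2.h>, and submit_one() and done() are
hypothetical names.  Adding the gather entry before the scatter entry
is an arbitrary choice here.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for <brecis/msp_secv2.h>; the numeric values are assumptions. */
typedef struct { unsigned int flags; } MSP_SEC2_SA;     /* abbreviated */
#define SG_GATHER   0
#define SG_SCATTER  1
#define BLK_NONE    0

/* Stub driver: counts calls instead of touching hardware. */
static int stub_sg_limit, stub_sg_calls, stub_submitted;

static int msp_sec2_new_request(int work_q, int n_sg_entries, MSP_SEC2_SA *sa,
                                int control,
                                void (*callback_fn)(void *, unsigned int),
                                void *cbk_prm, int block)
{
    (void)work_q; (void)sa; (void)control;
    (void)callback_fn; (void)cbk_prm; (void)block;
    stub_sg_limit = n_sg_entries;
    stub_sg_calls = 0;
    return 0;                               /* 0 means success */
}

static void msp_sec2_add_sg(int work_q, int scatter, void *address, int size)
{
    (void)work_q; (void)scatter; (void)address; (void)size;
    assert(stub_sg_calls < stub_sg_limit);  /* never exceed n_sg_entries */
    stub_sg_calls++;
}

static void msp_sec2_end_request(int work_q)
{
    (void)work_q;
    stub_submitted++;
}

/* Completion callback: store the status where the caller can see it. */
static void done(void *prm, unsigned int status)
{
    *(unsigned int *)prm = status;
}

/* Build and submit one request: read 'len' bytes from src, write to dst. */
static int submit_one(MSP_SEC2_SA *sa, void *src, void *dst, int len,
                      unsigned int *status)
{
    const int wq = 0;

    if (msp_sec2_new_request(wq, 2, sa, 0, done, status, BLK_NONE) != 0)
        return -1;                          /* no room in the work queue */
    msp_sec2_add_sg(wq, SG_GATHER, src, len);   /* source buffer */
    msp_sec2_add_sg(wq, SG_SCATTER, dst, len);  /* destination buffer */
    msp_sec2_end_request(wq);               /* hand the request to the engine */
    return 0;
}
```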

These calls actually build the work queue entry for the hardware right
in the circular buffer that the hardware uses as the queue.  It is
required that the building of one request finish before trying to
build another request in the same queue.  In many cases this may be
ensured by the fact that there's only one task-level thread in the
kernel at a time.  But, for instance, if you're ever going to build a
security request from interrupt level, you must ensure that you have
not interrupted a task level building of a request.

This is most often done by masking interrupts around building of the
request; and, in fact, by default the API will do this for you --
blocking interrupts when msp_sec2_new_request() is called, and
returning the interrupt mask to the previous state when
msp_sec2_end_request() is called.  If you don't want the API to mask
interrupts, you must tell it not to with a call to
msp_sec2_set_q_options().

The building of requests is independent for each work queue; you can
interrupt building a request in queue 0 to build a request in queue 1,
or vice-versa, without problems.  So another available option is to
use one queue for requests from task level, and the other queue for
requests from interrupt level.

1.1.3 Completion callback function

The call to msp_sec2_new_request() includes the address of a callback
function and a 32 bit opaque parameter to pass to that callback
function.  When the operation is complete, the callback function is
called with the given user-supplied parameter, and another parameter
that indicates the status of the operation.  A status of zero
indicates success; other values indicate an error.

There are some cautions about the callback function.  It MUST assume
that it may be called from interrupt level, even if your normal
operation would have it called at task level via your own call to the
polling routine.  Certain error and housekeeping conditions can cause
the driver to call your callback function from interrupt level.

Finally, the structure of your code must allow for the possibility
that the callback function may be called even before
msp_sec2_end_request() returns.  This CAN happen.
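
One way to structure code around these two cautions is sketched below.
The callback only records the result (so it is safe at interrupt
level), and the shared state is initialised before the request is
started (so an early callback is not lost).  The stub driver is an
assumption that models the worst case by firing the callback from
inside msp_sec2_end_request(); struct req_state, on_complete(), and
start_request() are hypothetical names, and the value of BLK_NONE is a
placeholder.  Scatter/gather entries are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned int flags; } MSP_SEC2_SA;     /* abbreviated */
#define BLK_NONE 0                                      /* assumed value */

/* Per-request state shared between the submitter and the callback. */
struct req_state {
    volatile int done;
    unsigned int status;
};

/* Interrupt-safe callback: it only records the result and never sleeps. */
static void on_complete(void *prm, unsigned int status)
{
    struct req_state *st = prm;

    st->status = status;
    st->done = 1;
}

/* Stub driver modelling the worst case: completion inside end_request. */
static void (*stub_cbk)(void *, unsigned int);
static void *stub_prm;

static int msp_sec2_new_request(int work_q, int n_sg_entries, MSP_SEC2_SA *sa,
                                int control,
                                void (*callback_fn)(void *, unsigned int),
                                void *cbk_prm, int block)
{
    (void)work_q; (void)n_sg_entries; (void)sa; (void)control; (void)block;
    stub_cbk = callback_fn;
    stub_prm = cbk_prm;
    return 0;
}

static void msp_sec2_end_request(int work_q)
{
    (void)work_q;
    stub_cbk(stub_prm, 0);      /* callback runs before we return */
}

static int start_request(MSP_SEC2_SA *sa, struct req_state *st)
{
    assert(st != NULL);
    st->done = 0;               /* initialise BEFORE the request can complete */
    st->status = 0;
    if (msp_sec2_new_request(0, 0, sa, 0, on_complete, st, BLK_NONE) != 0)
        return -1;
    msp_sec2_end_request(0);
    /* Do not reset st->done here: the callback may already have run. */
    return 0;
}
```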

There are three special values for the callback function: CBK_NONE,
CBK_POLL, and CBK_SLEEP.

CBK_NONE means the driver does nothing when the function completes.
This would be used when you have a series of operations to queue to
the driver, and only need to know when the last one completes.

CBK_POLL tells the driver to continuously poll the completion queues
for the results of this operation before returning from
msp_sec2_end_request().

CBK_SLEEP is similar to CBK_POLL, but instead of busy waiting, puts
the task to sleep and waits for an interrupt to indicate that the
operation is complete.  It must not be used from interrupt level,
because interrupt-level code cannot sleep.

If any of these special values is used (CBK_NONE, CBK_POLL, or
CBK_SLEEP), the cbk_prm parameter should be a pointer to a 32-bit
integer; the status of the operation will be stored here when the
operation completes.  
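
A minimal sketch of a synchronous request using CBK_POLL follows.  The
encodings of CBK_NONE, CBK_POLL, and CBK_SLEEP shown here (small
integers cast to a function pointer, in the style of SIG_IGN) are an
assumption; the real definitions come from the driver header.  The
sec2_cbk_t typedef, run_polled(), and the stub driver functions are
likewise hypothetical and exist only so the fragment compiles on its
own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { unsigned int flags; } MSP_SEC2_SA;     /* abbreviated */
typedef void (*sec2_cbk_t)(void *, unsigned int);       /* hypothetical */

/* Assumed encodings; the real ones are defined by the driver header. */
#define CBK_NONE  ((sec2_cbk_t)1)
#define CBK_POLL  ((sec2_cbk_t)2)
#define CBK_SLEEP ((sec2_cbk_t)3)
#define BLK_NONE  0

/* Stub driver: completes every request immediately with status 0. */
static sec2_cbk_t stub_cbk;
static void *stub_prm;

static int msp_sec2_new_request(int work_q, int n_sg_entries, MSP_SEC2_SA *sa,
                                int control, sec2_cbk_t callback_fn,
                                void *cbk_prm, int block)
{
    (void)work_q; (void)n_sg_entries; (void)sa; (void)control; (void)block;
    stub_cbk = callback_fn;
    stub_prm = cbk_prm;
    return 0;
}

static void msp_sec2_end_request(int work_q)
{
    (void)work_q;
    if (stub_cbk == CBK_NONE || stub_cbk == CBK_POLL || stub_cbk == CBK_SLEEP)
        *(uint32_t *)stub_prm = 0;      /* status stored through cbk_prm */
    else
        stub_cbk(stub_prm, 0);          /* ordinary callback */
}

/* Run one operation synchronously; returns the completion status. */
static uint32_t run_polled(MSP_SEC2_SA *sa)
{
    uint32_t status = 0xffffffffu;      /* overwritten on completion */

    assert(sa != NULL);
    if (msp_sec2_new_request(0, 0, sa, 0, CBK_POLL, &status, BLK_NONE) != 0)
        return 0xffffffffu;
    msp_sec2_end_request(0);            /* with CBK_POLL, returns when done */
    return status;                      /* zero indicates success */
}
```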

1.1.4 Polling the completion queues

The completion queues are polled by the function
msp_sec2_poll_completion_queues().

This function is called by the driver on certain interrupt conditions,
when it needs more queue space for an operation, and at other times.

The user can also call this function from any convenient place.  It
takes an integer mask that indicates which completion queue(s) to
poll, and an integer that indicates the maximum number of completion
entries to process when polling.

1.2 Security Association Structure

Each entry in the work queue takes a pointer to a Security Association
(SA) structure.  This structure is designed to contain the parameters
of a particular secure communication stream that would not vary from
one packet to the next.  The parameters include things like the type
of operation, option flags, hash and/or crypt algorithms, and keys for
the algorithms (sometimes keys are pre-processed).

Creation of the SA structure is mostly a matter of declaring it and
filling in the required fields.  Certain operations require
pre-processing of some SA structure fields (in particular, HMAC ipad
and opad chaining variables, and AES decryption keys for certain
modes), and helper functions are provided for these, but other than
that, the user must fill the structure in properly.

1.3 Parameter Validation

The hardware has some very strict requirements about its incoming
parameters.  For instance, the (total) output buffer size must EXACTLY
match that required by the operation, the buffer pointers must point
to valid memory, the values in the first 32 bits of the SA structure
must specify legal values, and the length of any encrypted/decrypted
portion of the data must be a multiple of the block size of the
encryption algorithm.
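
The block-size rule is cheap to check in calling code.  The helper
below is a small sketch; crypt_len_ok() and the two macro names are
hypothetical, and the block sizes are the usual 8 bytes for DES/3DES
and 16 bytes for AES.

```c
#include <assert.h>

/* Block sizes in bytes. */
#define DES_BLOCK_SIZE  8       /* DES and 3DES */
#define AES_BLOCK_SIZE 16

/* Non-zero if 'len' bytes can be processed without a partial block. */
static int crypt_len_ok(unsigned int len, unsigned int block_size)
{
    assert(block_size == DES_BLOCK_SIZE || block_size == AES_BLOCK_SIZE);
    return len % block_size == 0;
}
```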

It is our experience that these parameters are either set directly by
the calling software, or already checked by the calling code in the
process of, e.g., making sure there's enough buffer space allocated
for the results.  Any invalid parameters would be the result of a
programmer error.  Therefore, it would usually be a waste of processor
cycles to check the values again in a shipping system.

So, parameter validation is provided in the driver, but it is enabled
and disabled by the CONFIG_BRECIS_SEC_V2_CKPARM configuration switch.
The parameter validation code is NOT designed to be enabled in a
shipping system, as it panics when a problem is found.  (Consider that
this is the same as what happens if you pass a bad buffer pointer to a
software routine.)

1.4 Calling the API from user space

User space calling of the API is in many ways very similar to calling
it from within the kernel.  The same structures are used, and pretty
much the same sort of function calls are used to build requests.

However, there is no provision for "callback" functions in user
space.  The user space interface is strictly a blocking interface.
Once the request is built in user space, an ioctl call is made.  This
ioctl call passes the request to the hardware, requesting an interrupt
on completion of the request.  The process is then put to sleep until
the interrupt routine discovers that the request is complete and wakes
the process back up.  Then the ioctl() call returns with the status of
the operation.


2. API definitions

2.1 Kernel API

2.1.1 Kernel Structures

The kernel structures are defined in #include <brecis/msp_secv2.h>

2.1.1.1 Security Association structure (MSP_SEC2_SA)

The following fields are present in the SA structure:

	unsigned int flags ;
	unsigned int esp_spi ;
	unsigned int esp_sequence ;
	unsigned int hash_chain_a[5] ;
	unsigned int crypt_keys[8] ;
	unsigned int hash_chain_b[5] ;
	unsigned int hash_init_len[2] ;
	unsigned int crypt_iv[4] ;

The 'flags' field is split up into many sub-fields.

    Engine Mode -- defines what mode the engine is to operate in:
       SAFLG_MODE_ESP_IN   --  ESP incoming mode
       SAFLG_MODE_ESP_OUT  --  ESP outgoing mode
       SAFLG_MODE_HMAC     --  Single pass HMAC processing. 
       SAFLG_MODE_HASH_PAD --  Hash with padding
       SAFLG_MODE_HASH     --  Hash without final padding
       SAFLG_MODE_CRYPT    --  Encryption or decryption only.

       SAFLG_MODE_MASK     --  Mask to filter out engine mode bits.

       The various engine modes are discussed later in this document.

       The remaining sub-fields in the 'flags' variable may or may not
       have any meaning, depending upon the engine mode.  However, the
       software parameter validation code (see
       CONFIG_BRECIS_SEC_V2_CKPARM) checks for valid values in all
       fields whether they have meaning or not, so you should make
       sure that all sub-fields have valid values -- zero values are
       valid in all fields, so one way to ensure this is to start with
       a zeroed field.

    Individual Bits:
       SAFLG_SI           --  Sequence Increment.  Increment the
                              sequence number automatically when
                              creating an ESP packet.  This bit
                              has meaning only in ESP outgoing mode.

       SAFLG_CRI          --  CReate IV.  Fill in the IV field of the
                              packet with a random number obtained
                              from the random number generator.  This
                              bit has meaning only in ESP outgoing
                              mode.

       SAFLG_CPI          --  Compare ICV.  Compare computed check
                              value with the received value in
                              hardware.  This bit has meaning only in
                              ESP incoming mode.

       SAFLG_EM           --  ESP Manual mode.  Discussed more fully
                              in engine mode documentation later in
                              this document.  This bit has meaning
                              only in ESP outgoing mode.

       SAFLG_CV           --  Use Chaining Variables.  Has meaning for
                              all modes that do hashing -- all but
                              Crypt mode, and ESP modes with a NULL
                              hash algorithm.

       SAFLG_DES_K1_DECRYPT
                          --  If this bit is set, the engine will
                              decrypt in single DES, or decrypt
                              for the first key in triple DES.
                              Otherwise, the engine encrypts.
                              This bit has meaning only for modes that
                              do crypt operations (Crypt mode, and ESP
                              in and out modes with a non-null
                              encryption algorithm.)

       SAFLG_DES_K2_DECRYPT
                          --  If this bit is set, the engine will
                              decrypt for the second key in triple DES.
                              Otherwise, the engine encrypts.
                              This bit has meaning only for modes that
                              do crypt operations (Crypt mode, and ESP
                              in and out modes with a non-null
                              encryption algorithm.)

       SAFLG_DES_K3_DECRYPT
                          --  If this bit is set, the engine will
                              decrypt for the third key in triple DES.
                              Otherwise, the engine encrypts.
                              This bit has meaning only for modes that
                              do crypt operations (Crypt mode, and ESP
                              in and out modes with a non-null
                              encryption algorithm.)

       SAFLG_AES_DECRYPT  --  If this bit is set, the engine will
                              decrypt in AES mode; otherwise, the
                              engine encrypts.  This is the same bit
                              as SAFLG_DES_K1_DECRYPT, with a
                              different name for the convenience of
                              the programmer only.
                              This bit has meaning only for modes that
                              do crypt operations (Crypt mode, and ESP
                              in and out modes with a non-null
                              encryption algorithm.)

    Hash Algorithm -- which hashing algorithm is used.
      SAFLG_MD5           --  MD5 algorithm
      SAFLG_MD5_96        --  MD5 algorithm, keeping only 96 bits of
                              the results
      SAFLG_SHA1          --  SHA1 algorithm
      SAFLG_SHA1_96       --  SHA1 algorithm, keeping only 96 bits of
                              the results
      SAFLG_HASHNULL      --  No hash algorithm used.

      The Hash Algorithm subfield has meaning only for modes that do
      hashing; that is, all but Crypt mode.  SAFLG_HASHNULL is
      only known to be valid for the ESP modes.

    Crypt Algorithm -- which cryptography algorithm is to be used.
      SAFLG_DES           --  Single DES encryption/decryption
      SAFLG_3DES          --  Triple DES encryption/decryption
      SAFLG_AES_128       --  AES with 128 bit keys
      SAFLG_AES_192       --  AES with 192 bit keys
      SAFLG_AES_256       --  AES with 256 bit keys
      SAFLG_CRYPTNULL     --  No crypt algorithm used

      The Crypt Algorithm subfield has meaning only for modes that do
      encryption/decryption; that is, Crypt and ESP in/out modes.
      SAFLG_CRYPTNULL is only known to be valid for the ESP modes.

    Crypt Block Modes -- Which block mode to use for encryption.  
      SAFLG_ECB           --  ECB (Electronic Code Book) operation.
      SAFLG_CBC_ENCRYPT   --  CBC (Cipher Block Chaining) encrypt operation
      SAFLG_CBC_DECRYPT   --  CBC (Cipher Block Chaining) decrypt operation
      SAFLG_CFB_ENCRYPT   --  CFB (Cipher FeedBack) encrypt operation
      SAFLG_CFB_DECRYPT   --  CFB (Cipher FeedBack) decrypt operation
      SAFLG_CTR           --  CTR (Counter mode) operation
      SAFLG_OFB           --  OFB (Output FeedBack) operation

      The Crypt Block Mode subfield has meaning only for modes that do
      encryption/decryption; that is, Crypt and ESP in/out modes.
      CTR and OFB modes are only valid for AES operations.
      (???) CBC encrypt/decrypt are the only supported operations in
      ESP modes.

The 'esp_spi' field contains the SPI that will be copied into the
created ESP packet.  Used only in ESP_OUT mode.

The 'esp_sequence' field contains the sequence number copied into the
created ESP packet.  Used only in ESP_OUT mode.  If the SAFLG_SI bit
is set in the 'flags' field, this value will be incremented each time
it is used.  The sequence number is incremented after it is put into
the packet, so, e.g., if you want the sequence number of the first
packet to be 1, initialize the sequence number field to be 1.
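
The post-increment behaviour can be modelled in a few lines.  This is
a software model of what the text above describes, not driver or
hardware code, and esp_next_sequence() is a hypothetical name.

```c
#include <assert.h>

/*
 * Model of the SAFLG_SI behaviour: the current value goes into the
 * packet, then the stored value is incremented.
 */
static unsigned int esp_next_sequence(unsigned int *esp_sequence, int si_set)
{
    unsigned int in_packet;

    assert(esp_sequence != 0);
    in_packet = *esp_sequence;
    if (si_set)
        (*esp_sequence)++;
    return in_packet;
}
```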

The 'hash_chain_a' field is used in all hashing operations when
SAFLG_CV is set.  It contains the partial results of previous hashes
for the first (HMAC, ESP_IN, and ESP_OUT) or only (HASH_PAD, HASH)
hash operation.

The 'hash_chain_b' field is used in HMAC operations (including HMAC,
ESP_IN, and ESP_OUT operations) when SAFLG_CV is set.  It contains the
partial results of previous hashes for the second hash operation.

Note: in normal use of HMAC mode, and HMAC as used in ESP_IN and
ESP_OUT mode, SAFLG_CV is always set, and the 'hash_chain_a' and
'hash_chain_b' fields represent the pre-calculated partial hashes of
the inner and outer hashes, respectively.  See HMAC discussion below.

(At the time of writing, the author of this document knows of no
useful situation where HMAC (or ESP w/hmac) processing would be done
without the SAFLG_CV bit set.)

The 'hash_init_len' field is used in HMAC, HASH_PAD, and
ESP_IN/ESP_OUT modes where hmac is used.  It represents the length, in
bits, of the data that was run through the hash algorithm to get the
partial hash results in hash_chain_a (and hash_chain_b, if
applicable).  The whole field should be 0's if SAFLG_CV is not set.
In HMAC modes, this should almost always be set to 0x200; that is:

         sa.hash_init_len[0] = 0 ;
         sa.hash_init_len[1] = 0x200 ;

[ For those of you who happen to know way too much about the internals
of SHA1 and MD5, the hash_init_len field is always expressed as a big
endian count of bits, regardless of the hash function; the hardware
does any required byte reordering for the algorithm. ]
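
The value 0x200 is 512 bits, that is, one 64-byte hash block, which is
the amount of data (the padded key block) already hashed when the HMAC
partial hashes are pre-calculated.  A sketch of a helper that converts
a byte count into the two-word field follows.  set_hash_init_len() is
a hypothetical name, and placing the upper 32 bits in word 0 is an
assumption based on the example above.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 64-bit bit count as two 32-bit words, upper word first. */
static void set_hash_init_len(unsigned int hash_init_len[2],
                              uint64_t bytes_hashed)
{
    uint64_t bits;

    assert(bytes_hashed <= UINT64_MAX / 8);     /* bit count must fit */
    bits = bytes_hashed * 8;
    hash_init_len[0] = (unsigned int)(bits >> 32);          /* upper word */
    hash_init_len[1] = (unsigned int)(bits & 0xffffffffu);  /* lower word */
}
```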

The 'crypt_keys' field contains the keys for the encryption algorithm;
8 bytes (2 ints) for single DES, 24 bytes for 3DES, 16, 24, or 32
bytes for AES.  (The field is 32 bytes long; unused bytes are
ignored.)  Except in the case of AES Decryption, these bytes are the
plaintext keys, without any "pre-processing" or "key scheduling."  In
AES with the engine in decrypt mode, the keys must be pre-processed,
as discussed below.

The 'crypt_iv' field contains the IV, if any, for the crypt operation.
This field is used in CRYPT, ESP_IN, and ESP_OUT modes when the crypt
algorithm is not SAFLG_CRYPTNULL, and the crypt blocking mode is
something other than SAFLG_ECB.
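
Putting the fields together, here is a hedged sketch of initialising
an SA for ESP_OUT with AES-128 in CBC mode and HMAC-SHA1-96.  The
field list matches the one given above, but the SAFLG_* bit values are
PLACEHOLDERS chosen only so the fragment compiles; the real values
come from <brecis/msp_secv2.h>.  init_esp_out_sa() is a hypothetical
name.  Key setup and the HMAC pre-processing are deliberately left
out, and the sketch assumes that crypt_iv can stay zero when SAFLG_CRI
asks the hardware to create the IV.

```c
#include <assert.h>
#include <string.h>

/* Field layout as listed in this section. */
typedef struct {
    unsigned int flags;
    unsigned int esp_spi;
    unsigned int esp_sequence;
    unsigned int hash_chain_a[5];
    unsigned int crypt_keys[8];
    unsigned int hash_chain_b[5];
    unsigned int hash_init_len[2];
    unsigned int crypt_iv[4];
} MSP_SEC2_SA;

/* PLACEHOLDER bit values; use the definitions from the real header. */
#define SAFLG_MODE_ESP_OUT  0x0001
#define SAFLG_SI            0x0010
#define SAFLG_CRI           0x0020
#define SAFLG_CV            0x0040
#define SAFLG_SHA1_96       0x0100
#define SAFLG_AES_128       0x1000
#define SAFLG_CBC_ENCRYPT   0x2000

static void init_esp_out_sa(MSP_SEC2_SA *sa, unsigned int spi)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));         /* zero is valid in every sub-field */
    sa->flags = SAFLG_MODE_ESP_OUT | SAFLG_SI | SAFLG_CRI | SAFLG_CV |
                SAFLG_SHA1_96 | SAFLG_AES_128 | SAFLG_CBC_ENCRYPT;
    sa->esp_spi = spi;
    sa->esp_sequence = 1;               /* first packet carries sequence 1 */
    sa->hash_init_len[0] = 0;
    sa->hash_init_len[1] = 0x200;       /* one 64-byte block already hashed */
    /* crypt_keys and hash_chain_a/b still have to be filled in. */
}
```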

2.1.1.2 Work Queue Entry

A structure called a Work-queue Entry (WE) is used internally by the
driver, and passed to the hardware to do the work of one request.
This structure is not explicitly described in the msp_secv2.h file,
because the structure is built on the fly by the driver within the
calls to msp_sec2_new_request(), msp_sec2_add_sg(), and
msp_sec2_end_request().  Nevertheless, it may be useful to understand
this structure, so it is described here.

The following fields are present in the WE structure:

	struct SA	*sa ;
	unsigned int	control ;
	unsigned int	sw0 ;
	unsigned int	sw1 ;
	struct sg_entry	sge[] ;

The 'sa' field is a pointer to the SA structure to be used for this
operation.

The 'control' field is split up into a few subfields.  All but the
"Work Element Size" field must be given by the caller in the 'control'
parameter of the msp_sec2_new_request() function.

    Work Element Size -- indicates how long this WE structure is
       (which is variable, depending on the number of scatter/gather
       entries in the descriptor).  This is calculated and filled in
       by the driver.

    SEC2_WE_CTRL_CQ -- A single bit flag, which indicates which
       completion queue to place the results of this operation in,
       zero or one (if not set or set, respectively).

    SEC2_WE_CTRL_GI -- Generate interrupt.  If you want an interrupt
       generated when this request is finished, set this bit.  The
       driver will poll the completion queues when the interrupt is
       received, and call the callback function you specified
       (assuming you specified one -- setting this GI bit without a
       callback function is of questionable use).

    SEC2_WE_CTRL_AKO -- AES Key Out. This is used in preprocessing AES
       decryption keys.  If you are using the driver's pre-processing
       functions, you will never need to set this bit.

    ESP Trailer Next Header -- used in ESP_OUT mode only, this field
       gets copied to the Next Header field of the created ESP
       Trailer.  The field is 8 bits wide; set the field in this
       manner:

		control_val |= nxt_hdr << SEC2_WE_CTRL_NXTHDR_SHF ;

    ESP Trailer Pad Length -- used in ESP_OUT mode only, this field
       gets copied to the Pad Length field of the created ESP Trailer,
       and the hardware generates padding of this length before the
       ESP trailer.  The field is 8 bits wide; set the field in
       this manner:

		control_val |= pad_len << SEC2_WE_CTRL_PADLEN_SHF ;

       NOTE that it is vital that you set this length correctly, so
       that the number of bytes that get encrypted is a multiple of
       the encryption algorithm's block size (8 for DES/3DES, 16 for
       AES).
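
A sketch of computing the pad length and assembling the ESP_OUT
control value follows.  It assumes, as in standard ESP, that the
encrypted region is the payload plus the padding plus the two trailer
bytes (Pad Length and Next Header); check this against the engine mode
documentation before relying on it.  The two shift values are
PLACEHOLDERS for the definitions in the real header, and esp_pad_len()
and esp_out_control() are hypothetical names.

```c
#include <assert.h>

/* PLACEHOLDER shift values; the real ones come from the header. */
#define SEC2_WE_CTRL_NXTHDR_SHF  8
#define SEC2_WE_CTRL_PADLEN_SHF 16

/* Padding needed so payload + pad + 2 trailer bytes fills whole blocks. */
static unsigned int esp_pad_len(unsigned int payload_len,
                                unsigned int block_size)
{
    unsigned int tail;

    assert(block_size == 8 || block_size == 16);    /* DES/3DES or AES */
    tail = (payload_len + 2) % block_size;
    return tail ? block_size - tail : 0;
}

/* Control value for an ESP_OUT request (CQ/GI bits not set here). */
static unsigned int esp_out_control(unsigned int payload_len,
                                    unsigned int block_size,
                                    unsigned int nxt_hdr)
{
    unsigned int pad_len = esp_pad_len(payload_len, block_size);
    unsigned int control_val = 0;

    control_val |= (nxt_hdr & 0xff) << SEC2_WE_CTRL_NXTHDR_SHF;
    control_val |= (pad_len & 0xff) << SEC2_WE_CTRL_PADLEN_SHF;
    return control_val;
}
```

For example, a 20-byte payload with an 8-byte block cipher needs 2
bytes of padding (20 + 2 + 2 = 24), and a 30-byte payload with AES
needs none (30 + 0 + 2 = 32).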

The 'sw0' and 'sw1' fields are used by the driver in implementing the
callback function.

The 'sge[]' array is a list of scatter/gather elements (in other
words, buffer descriptors), listing for each element the address of
the buffer, its length (up to 8191 bytes per buffer), and whether it
is a scatter or gather element (scatter elements are destination
buffers, gather elements are source buffers).

2.1.2 Kernel entry points

2.1.2.1 msp_sec_enquire

The prototype for this function is:

void
msp_sec_enquire(int *info) ;

The parameters are as follows:

    info: pointer to an array of two ints.  The first int of this array
	(info[0]) will hold information about the hardware:

        One of the following bits will be set to indicate which type of
        security hardware is present:

            SECENQ_HW1    -- Original, "DUET" security hardware present.
            SECENQ_HW2    -- Newer, "POLO" security hardware present.

        If zero is returned for this value, there is no security hardware
        present.

      The second int of the info array (info[1]) will hold information
        about the driver capabilities:

        The following bits will be set to indicate what is supported by the
        driver.

            SECENQ_API1     -- Original, "DUET" api is available.
            SECENQ_API2     -- Newer, "POLO" api (described in this
                               document) is available.
            SECENQ_API2_AES -- AES is available through the new API.

Discussion:

    The driver has the ability to support the old API on either kind
    of hardware, and you do not need to restrict yourself to one API
    or the other on a given platform.  (In other words, you can run
    some programs using the old API, and others using the new API,
    without interference.)

    This takes some code space, though, so there are compile time
    driver configuration options that select which hardware is
    supported by the driver, and which APIs are supported.

    And, even if both APIs are available, it is likely the native API
    for the given hardware will be the most efficient to use ("old"
    API on "old" hardware, "new" API on "new" hardware).

    In addition, the "old" hardware does not support AES.  Someday,
    there may be an option to do AES in software on the old hardware.
    So, a separate bit to indicate AES availability is supplied here.

    This function gives software the ability to check for API
    availability at runtime, and act appropriately.

    Due to memory space concerns, we anticipate API selection will
    usually be done at compile time, with #ifdefs, or a single API
    will be chosen at design time.  In this case a call to this
    function will be used simply to make sure that the API required by
    the code is available.
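
A startup check along these lines might look like the sketch below.
The SECENQ_* bit values are PLACEHOLDERS, the msp_sec_enquire() body
is a stub standing in for the driver, and have_api2_aes() is a
hypothetical name.

```c
#include <assert.h>

/* PLACEHOLDER bit values; the real ones come from the driver headers. */
#define SECENQ_HW1      0x1
#define SECENQ_HW2      0x2
#define SECENQ_API1     0x1
#define SECENQ_API2     0x2
#define SECENQ_API2_AES 0x4

/* Stub standing in for the driver's msp_sec_enquire(). */
static int stub_hw = SECENQ_HW2;
static int stub_api = SECENQ_API1 | SECENQ_API2 | SECENQ_API2_AES;

static void msp_sec_enquire(int *info)
{
    assert(info != 0);
    info[0] = stub_hw;          /* hardware present */
    info[1] = stub_api;         /* driver capabilities */
}

/* Returns non-zero if the new API with AES can be used on this system. */
static int have_api2_aes(void)
{
    int info[2];

    msp_sec_enquire(info);
    if (info[0] == 0)
        return 0;               /* no security hardware at all */
    if ((info[1] & SECENQ_API2) == 0)
        return 0;               /* new API not available */
    return (info[1] & SECENQ_API2_AES) != 0;
}
```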

2.1.2.2 msp_sec2_new_request

The prototype for this function is:
		
int
msp_sec2_new_request(	int work_q, 
			int n_sg_entries ,
			MSP_SEC2_SA *sa, 
			int control,
			void (*callback_fn)(void *,unsigned int),
			void *cbk_prm,
			int block) ;

The parameters are as follows:

    work_q  -- Which queue (0 or 1) to put this request on.

    n_sg_entries -- The maximum total number of scatter/gather entries
        this request will put in the queue (maximum number of times
        you will call msp_sec2_add_sg()).  This is used to check that
        space for the Work-queue Entry structure is available in the
        work queue.  This is the *ONLY* time the space is checked for;
        you must NOT call msp_sec2_add_sg() more times than you
        indicate here.  You may, however, call msp_sec2_add_sg()
        fewer times than you specify here -- the final descriptor's
        size will be adjusted accordingly.  (If you use a value larger
        than you really need for n_sg_entries, the risk, of course, is
        that the call will return bad status (or wait), thinking
        there's insufficient space in the queue, when there really is
        enough to do the operation you want.)  

        You should use a total of 16 or fewer scatter/gather entries.

    sa -- the address of the Security Association structure to use
        with this request.

    control -- the user-specified values for the control field for the
        Work-queue Entry structure.  Start with the value 0, set or
        don't set SEC2_WE_CTRL_CQ, SEC2_WE_CTRL_GI, and
        SEC2_WE_CTRL_AKO as desired, and set the ESP Trailer Next
        Header and Pad Length fields appropriately if requesting an
        ESP_OUT operation.

    callback_fn -- a pointer to the function you want called when the
        hardware completes the request.  See "Completion callback
        function," section 1.1.3, for more info.

    cbk_prm -- a 32 bit parameter to be passed to the callback
        function.  Again, see section 1.1.3 for more information.

    block -- if this parameter is BLK_NONE, the function will return with
        a bad return code if there is not enough space available.  If this
        parameter is BLK_POLL, the function will poll the hardware to
        empty the completion queues until space is available.  If this
        parameter is BLK_SLEEP, the task will be put to sleep until
        space is available.  You must be running at task level, not
        interrupt level, to use BLK_SLEEP.

The return value is non-zero on failure, such as not enough space, and
0 for success.

This function checks for available space, and begins the building of a
work-queue entry in the specified work queue.

If the queue option WQ_OPT_MASKINT is set (the default, see
msp_sec2_set_q_options()) for the given work queue, interrupts will be
disabled when msp_sec2_new_request() returns successfully.  The
interrupt status will be returned to its previous state when
msp_sec2_end_request() or msp_sec2_abort_request() is called.

For this and other reasons, you MUST be sure to pair each successful
call to msp_sec2_new_request() with a call to either
msp_sec2_end_request() or msp_sec2_abort_request().
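
The pairing rule can be sketched as follows.  The stub driver is an
assumption that models WQ_OPT_MASKINT with a simple flag, so that an
unbalanced call sequence trips an assertion; the SG_* and BLK_NONE
values are placeholders, and queue_request() and ignore_result() are
hypothetical names.  The late NULL check is contrived; in real code
such a check belongs before msp_sec2_new_request().

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned int flags; } MSP_SEC2_SA;     /* abbreviated */
#define SG_GATHER   0           /* assumed values */
#define SG_SCATTER  1
#define BLK_NONE    0

static int stub_irq_masked;     /* models WQ_OPT_MASKINT */

static int msp_sec2_new_request(int work_q, int n_sg_entries, MSP_SEC2_SA *sa,
                                int control,
                                void (*callback_fn)(void *, unsigned int),
                                void *cbk_prm, int block)
{
    (void)work_q; (void)n_sg_entries; (void)sa; (void)control;
    (void)callback_fn; (void)cbk_prm; (void)block;
    assert(!stub_irq_masked);   /* previous request must have been closed */
    stub_irq_masked = 1;
    return 0;
}

static void msp_sec2_add_sg(int work_q, int scatter, void *address, int size)
{
    (void)work_q; (void)scatter; (void)address; (void)size;
    assert(stub_irq_masked);    /* only legal while a request is open */
}

static void msp_sec2_end_request(int work_q)   { (void)work_q; stub_irq_masked = 0; }
static void msp_sec2_abort_request(int work_q) { (void)work_q; stub_irq_masked = 0; }

static void ignore_result(void *prm, unsigned int status)
{
    (void)prm; (void)status;
}

/* Every successful new_request is closed by exactly one end or abort. */
static int queue_request(MSP_SEC2_SA *sa, void *src, void *dst, int len)
{
    if (msp_sec2_new_request(0, 2, sa, 0, ignore_result, NULL, BLK_NONE) != 0)
        return -1;                      /* failed: nothing to undo */
    msp_sec2_add_sg(0, SG_GATHER, src, len);
    if (dst == NULL) {
        msp_sec2_abort_request(0);      /* abandon the entry, restore state */
        return -2;
    }
    msp_sec2_add_sg(0, SG_SCATTER, dst, len);
    msp_sec2_end_request(0);
    return 0;
}
```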

2.1.2.3 msp_sec2_add_sg

The prototype for this function is:

void
msp_sec2_add_sg(	int work_q, 
			int scatter,
			void * address,
			int size ) ;

The parameters are as follows:

    work_q -- Which queue (0 or 1) to add this scatter/gather entry
        to.  It is possible to be simultaneously building requests on
        separate queues, so the driver uses this parameter to
        determine which queue to take action on.

    scatter -- If set to SG_SCATTER, this is a description of a
        scatter (i.e.  "destination") buffer; if set to SG_GATHER,
        it's a gather (i.e. source) buffer.

    address -- the address of the buffer.

    size -- the size of the buffer.  Maximum is 8191 (13 bits).

This function adds a buffer descriptor to the work queue entry.  You
should limit yourself to 16 or fewer total scatter/gather buffers.
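
A buffer longer than 8191 bytes has to be described by several
entries.  The sketch below splits one contiguous buffer into chunks of
at most 8191 bytes.  The msp_sec2_add_sg() body is a stub, SG_GATHER
has a placeholder value, and sg_entries_needed() and add_buffer() are
hypothetical helpers.  Whether the hardware places any alignment
requirement on where the chunks are split is not covered here.

```c
#include <assert.h>

#define SG_GATHER       0       /* assumed value */
#define SG_MAX_LEN      8191    /* 13-bit length field */
#define SG_MAX_ENTRIES  16

/* Stub driver call: remembers the size of each entry. */
static int stub_sizes[SG_MAX_ENTRIES], stub_count;

static void msp_sec2_add_sg(int work_q, int scatter, void *address, int size)
{
    (void)work_q; (void)scatter; (void)address;
    assert(size > 0 && size <= SG_MAX_LEN);
    assert(stub_count < SG_MAX_ENTRIES);
    stub_sizes[stub_count++] = size;
}

/* Number of entries needed to describe 'len' bytes. */
static int sg_entries_needed(unsigned int len)
{
    return (int)((len + SG_MAX_LEN - 1) / SG_MAX_LEN);
}

/* Describe one contiguous buffer as a run of entries; returns the count. */
static int add_buffer(int work_q, int scatter, unsigned char *buf,
                      unsigned int len)
{
    int n = 0;

    while (len > 0) {
        unsigned int chunk = len > SG_MAX_LEN ? SG_MAX_LEN : len;

        msp_sec2_add_sg(work_q, scatter, buf, (int)chunk);
        buf += chunk;
        len -= chunk;
        n++;
    }
    return n;
}
```

The result of sg_entries_needed(), summed over all source and
destination buffers, is the kind of value that would be passed as
n_sg_entries to msp_sec2_new_request().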

2.1.2.4  msp_sec2_end_request

The prototype for this function is:

void
msp_sec2_end_request( int work_q ) ;

The parameter is:

    work_q -- Which queue (0 or 1) to complete the request on.  It is
        possible to be simultaneously building requests on separate
        queues, so the driver uses this parameter to determine which
        queue to take action on.

This function completes the building of the work queue entry, and
gives the request to the hardware, then (if necessary) returns the
interrupt status to its previous state.  Your callback routine, if
any, should be written to assume that it may be called before this
routine returns.

2.1.2.5  msp_sec2_abort_request

The prototype for this function is:

void
msp_sec2_abort_request( int work_q ) ;

The parameter is:

    work_q -- Which queue (0 or 1) to abort the request on.  It is
        possible to be simultaneously building requests on separate
        queues, so the driver uses this parameter to determine which
        queue to take action on.

This function abandons the work request we are in the process of
building, and, if necessary, returns the interrupt status to its
previous state.

NOTE: We expect that there may not be a need for this routine, as it
would usually be much more efficient to figure out that you're not
going to finish building the descriptor before calling
msp_sec2_new_request() in the first place.  But this function is
included for the odd case that may need it.

2.1.2.6  msp_sec2_poll_completion_queues

The prototype for this function is:

int
msp_sec2_poll_completion_queues(int queue_mask, int max_req) ;

The parameters are:

    queue_mask -- specifies whether you want to poll one or both of
        the completion queues, and if only one, which one.  Values:
            1 - Poll completion queue 0
            2 - Poll completion queue 1
            3 - Poll both completion queues.

    max_req -- maximum number of completion queue entries that will be
        processed in this polling operation.  If this value is zero,
        there is no limit -- the queues will be processed until empty.

The function returns one of these values:

    -1  - no work to be done on entering (the specified queue(s) were
         empty when the function was called).
     0  - some work was done, the queues are now empty (no more work
         to be done).
     1  - some work was done, but the limit was reached and the
         function returned before the specified queue(s) were empty.

This function checks for entries in the completion queues, which are
indications that the hardware has completed an operation.  For each
completed operation, the driver does any necessary housekeeping and
calls the request's callback function.  Your callback functions will
be called from whatever context you call this function from --
be it interrupt level or task level.
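
As a rough illustration of how the three return values might be used,
here is a minimal sketch of a bounded polling loop.  The function
stub_poll() is NOT part of the driver; it is a stand-in that mimics the
documented return values so the sketch compiles on its own, and
stub_pending and drain_completions() are likewise made up for this
example.  In real code you would call
msp_sec2_poll_completion_queues() where stub_poll() appears.

```c
#include <assert.h>

/* Hypothetical stand-in for msp_sec2_poll_completion_queues().  It
 * imitates the documented return values using a fake count of pending
 * completions; it does not talk to any hardware. */
static int stub_pending = 0;

static int stub_poll(int queue_mask, int max_req)
{
    assert(queue_mask >= 1 && queue_mask <= 3);
    if (stub_pending == 0)
        return -1;                  /* nothing to do on entry */
    if (max_req == 0 || stub_pending <= max_req) {
        stub_pending = 0;
        return 0;                   /* work done, queues now empty */
    }
    stub_pending -= max_req;
    return 1;                       /* work done, limit reached */
}

/* Drain both completion queues in slices of at most 'budget' entries.
 * Returns the number of polling calls that actually did work. */
static int drain_completions(int budget)
{
    int slices = 0;

    for (;;) {
        int rc = stub_poll(3, budget);  /* 3 = poll both queues */
        if (rc < 0)
            break;                  /* queues were already empty */
        slices++;
        if (rc == 0)
            break;                  /* emptied during this call */
        /* rc == 1: limit hit; a real task might yield here */
    }
    return slices;
}
```

Using a non-zero max_req like this keeps any single call from running
for an unbounded time, which matters most when polling from interrupt
level.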

2.1.2.7  msp_sec2_set_q_options

The prototype for this function is:

int
msp_sec2_set_q_options(		int work_q,
				int opt,
				int val ) ;

The parameters are:

    work_q -- Which queue (0 or 1) to set the option on.

    opt -- Which option to change.

    val -- What value to set the option to.

The function returns one of these values:

    0 -- success
    1 -- Unknown option number 'opt'
    2 -- Value 'val' is out of range for option 'opt'

Currently available options are:

    WQ_OPT_MASKINT:  If value is non-zero, a successful return from
        msp_sec2_new_request() will leave interrupts disabled, and
        calling either msp_sec2_end_request() or
        msp_sec2_abort_request() will return interrupts to their
        previous state -- for operations on the given queue.  If value
        is zero, interrupt status is not touched by these operations.

2.1.2.8  msp_sec2_set_hmac_key

The prototype for this function is:

int
msp_sec2_set_hmac_key(		MSP_SEC2_SA *sa,
				unsigned char *key,
				int keylen,
				int workq,
				int compq,
				int sleep) ;

The parameters are:

    sa -- the Security Association structure that will hold the
        resulting pre-processed partial hashes.  The flags field of
        the SA structure must be filled in properly before calling
        this function, as it is used to determine the proper
        algorithms to use for the partial hashes.

    key -- a pointer to the key data.

    keylen -- length in bytes of the key data

    workq -- which work queue to use for the hashing operations.

    compq -- which completion queue to use for the hash results.

    sleep -- if zero, this call will busy-wait, polling for the
        security engine to finish its work.  If it is non-zero, this
        call will put the task to sleep and use interrupts to wait for
        completion.

The possible return values are:

    0 -- Success.
    -1 -- Failure for any reason (most likely cause is a bad key
        pointer or bad keylen value)

This function takes a HMAC key and creates the partial hashes that
HMAC and ESP require in the SA (fills in the hash_chain_a and
hash_chain_b fields of the SA).  It also fills in the hash_init_len
field of the SA to properly reflect these partial hashes.

This function does not return until the SA hash chain fields are
filled in properly.

This function is a "convenience function".  Use of it is not strictly
required.  For instance, if you don't want to block on this call, you
can do the equivalent operations yourself using callback functions.
(This function allocates the temporary SA structures needed to perform
the partial hashes on the stack; that is why it needs to block before
returning.)

The operations performed by this function are:

    1. If key is longer than the hash block size, it is run through
       the hash algorithm, and the result is used as the key.

    2. The key (or its hash) is copied into a hash-block sized
       buffer, and padded with nulls up to the end of the buffer if
       necessary.

    3. The result of step 2 gets each byte xored with 0x36, and a
       partial hash is run from this buffer; the results are placed in
       hash_chain_a[].

    4. The result of step 2 gets each byte xored with 0x5c, and a
       partial hash is run from this buffer; the results are placed in
       hash_chain_b[].

    5. hash_init_len[0] is set to 0, and hash_init_len[1] is set to
       the length of the hash block in BITS (in this case, 0x200
       (decimal 512) bits, meaning 64 bytes).
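
If you decide to do the equivalent operations yourself, the software
side of steps 2 through 4 could look something like the sketch below.
It only builds the two 64-byte buffers that are handed to the partial
hash; it does not run the hash or touch the SA.  The names
hmac_prepare_blocks(), inner and outer are made up for this example,
and the sketch assumes step 1 has already been applied (the key is no
longer than one hash block).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HASH_BLOCK_BYTES 64     /* block size of both MD5 and SHA-1 */

/* Pad the key with nulls to one hash block, then derive the two XORed
 * blocks.  'inner' is the buffer whose partial hash goes into
 * hash_chain_a[]; 'outer' is the one for hash_chain_b[]. */
static void hmac_prepare_blocks(const unsigned char *key, size_t keylen,
                                unsigned char inner[HASH_BLOCK_BYTES],
                                unsigned char outer[HASH_BLOCK_BYTES])
{
    unsigned char padded[HASH_BLOCK_BYTES];
    size_t i;

    assert(key != NULL && keylen <= HASH_BLOCK_BYTES);

    memset(padded, 0, sizeof padded);       /* step 2: null padding */
    memcpy(padded, key, keylen);

    for (i = 0; i < HASH_BLOCK_BYTES; i++) {
        inner[i] = (unsigned char)(padded[i] ^ 0x36);   /* step 3 */
        outer[i] = (unsigned char)(padded[i] ^ 0x5c);   /* step 4 */
    }
}
```

Each of the two buffers would then be run through a HASH mode request
(no padding), with the results landing in the corresponding hash_chain
field.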

The length of the key must be less than 8192 bytes (because a single
gather descriptor is used in the implementation of this function). 

2.1.2.9  msp_sec2_set_aes_decrypt_key

The prototype for this function is:

int
msp_sec2_set_aes_decrypt_key(	MSP_SEC2_SA *sa,
				int workq,
				int compq,
				int sleep) ;

The parameters are:

    sa -- the Security Association structure that will hold the
        resulting pre-processed key.  The flags field of the SA
        structure must be filled in properly before calling this
        function, as it is used to determine the correct AES
        encryption mode to use.  The crypt_keys field should be filled
        in with the original key to be pre-processed.

    workq -- which work queue to use for the operation.

    compq -- which completion queue to use for the results.

    sleep -- if zero, this call will busy-wait, polling for the
        security engine to finish its work.  If it is non-zero, this
        call will put the task to sleep and use interrupts to wait for
        completion.

The possible return values are:

    0 -- Success.
    -1 -- Failure for any reason (most likely cause is a bad crypt
        type in the keys field).

For AES decrypt operations (where the engine is used in decrypt mode;
i.e. only ECB and CBC modes), the hardware engine requires the keys to
go through some pre-processing before being used.  This function can
be used to do this pre-processing.

The function does not return until the AES key is ready for use.

This function is a "convenience function".  Use of it is not strictly
required.  For instance, if you don't want to block on this call, you
can do the equivalent operations yourself using callback functions.
(This function allocates the temporary SA structures needed to perform
the preprocessing on the stack; that is why it needs to block before
returning.)

The operation performed by this function is:

    1. Create a temporary SA structure similar to the one provided,
       but indicating only a CRYPT operation, with SAFLG_AES_DECRYPT *not*
       set, and block mode of ECB, using the original keys.

    2. Create a request that uses this temporary SA to encrypt 16
       bytes (the encryption results will not be used); set the
       SEC2_WE_CTRL_AKO (AES Key OUT) bit in the control word of the
       request, which causes the output of the operation to be 32
       bytes of key data to place in the SA for decryption (rather
       than the encryption results).  So the destination buffer for
       the operation is the crypt_keys field in the original SA.

    3. Wait for the request to complete, and return.

2.2 User level API

The user level API is very similar to the kernel API.

If you thought you were just going to use the user level API, and
therefore skipped the previous sections on the kernel API: Stop; go
back and read it, perhaps after reading the next few paragraphs.  What is
written in the following sections only documents the differences
between the kernel and the user level APIs.

One major difference between the kernel level and user level APIs
is a major reduction in flexibility.  The user level calls ALWAYS
block waiting for the command to complete, and they ALWAYS do so using
interrupts.  There is therefore no concept of a completion callback
function in the user level API.  

Another major difference between the user level API and kernel level
is in the building of the work queue entry.  Calls to driver functions
within the kernel are quite efficient, like normal function calls
within the program.  However, calls to the driver from a user level
program have much more overhead, that of a system call.  

Building a work queue entry with the kernel API is done with multiple
function calls to the driver.  This would be expensive from user
level, though.  So instead, the work queue entry is built in user
memory, and a single function call is made to the driver.  The
difference is fairly minor because building the work queue entry in
memory and handing it to the driver is done with C macros that match
up pretty well with the kernel API function calls.

Yet another difference between kernel level and user level APIs:
Traditionally, user programs are not allowed to perform operations
that would be detrimental to any other task in the system.  Since
"hanging" or "wedging" the security engine would definitely be a bad
thing to the system as a whole, parameter validation is ALWAYS
performed for user level API calls.  This is somewhat slow; the driver
design assumes that operations that must be fast would be done within
the kernel.

2.2.1  User level structures

The user level structures are defined in the include file
"brecis/msp_secv2.h", which should be #included in your program.

2.2.1.1  Security Association structure (MSP_SEC2_SA)

The user level API uses the same Security Association structure as the
kernel API.  See section 2.1.1.1 for documentation on this structure.

2.2.1.2  Work Queue Entry structure

In the user level API, the work queue entry is built in an actual
structure -- much different from the kernel API.

The structure typedef name is MSP_SEC2_WQE.  

The following fields are present in the WQE structure:

	int 		wqn ;
	struct SA	*sa ;
        unsigned int	control ;
	unsigned int    sw0 ;
	unsigned int	sw1 ;
	struct sg_entry	sge[MSP_SEC2_SGE_MAX] ;

(There are other fields that are used internally by the macros that
you shouldn't need to deal with.)

The 'wqn' field indicates which work queue to use for this operation,
and is set by the MSP_SEC2_NEW_REQUEST() macro.

The 'sa' field is a pointer to the SA structure to be used for this
operation, exactly like in the kernel API.  The MSP_SEC2_NEW_REQUEST
macro sets this field.

The 'control' field is the same as in the kernel API.  

    The Work Element Size subfield is set by the MSP_SEC2_END_REQUEST
    macro to indicate the number of scatter/gather entries you have
    given.

    The SEC2_WE_CTRL_CQ subfield is a single bit that controls which
    completion queue is used for this operation.  Set by the caller in
    the control parameter of MSP_SEC2_NEW_REQUEST() macro.

    SEC2_WE_CTRL_GI, which controls generating of interrupts, is
    always treated as set by the driver, because interrupts are used
    to complete user level API requests.  The driver may not modify
    the structure you create, but whether or not you set this bit has
    no effect.

    SEC2_WE_CTRL_AKO (AES Key out), ESP Trailer Next Header, and ESP
    Trailer Pad Length work exactly like the same fields in the kernel
    API, and are also taken from the control parameter of the
    MSP_SEC2_NEW_REQUEST() macro.

The 'sw0' and 'sw1' fields are ignored by the driver.

The sge[] array holds scatter gather entries for the operation.  This
array holds up to MSP_SEC2_SGE_MAX (16) entries.  Scatter gather
entries are created by the MSP_SEC2_ADD_SG() macro.

2.2.2  User level API calls

Many of the user level API calls are macros or inline functions.  The
main difference between these macros and the corresponding kernel API
calls is that they take a pointer to a Work Queue Entry structure in
place of, or in addition to, the work queue number.  The caller must
supply the Work Queue Entry structure for the operation.

2.2.2.1  MSP_SEC2_ENQUIRE

MSP_SEC_ENQUIRE(int *info)

The parameters are as follows:

    info: pointer to an array of two ints.  The first int of this array
	(info[0]) will hold information about the hardware:

        One of the following bits will be set to indicate which type of
        security hardware is present:

            SECENQ_HW1    -- Original, "DUET" security hardware present.
            SECENQ_HW2    -- Newer, "POLO" security hardware present.

        If zero is returned for this value, there is no security hardware
        present.

      The second int of the info array (info[1]) will hold information
        about the driver capabilities:

        The following bits will be set to indicate what is supported by the
        driver.

	    SECENQ_API1     -- Original, "DUET" api is available.
            SECENQ_API2     -- Newer, "POLO" api (described in this document)
            SECENQ_API2_AES -- can do AES with new API

The return value is normally zero.  Non-zero is failure, and probably means
the driver couldn't be found on this system.

2.2.2.2  MSP_SEC2_NEW_REQUEST

MSP_SEC2_NEW_REQUEST(	MSP_SEC2_WQE	*wqep,
			int		work_q,
			MSP_SEC2_SA	*sa,
			int		control ) ;

The parameters are as follows:

    wqep -- a pointer to a MSP_SEC2_WQE structure, which must be
        allocated by the caller.
			
    work_q -- Which queue (0 or 1) to put this request on.

    sa -- the address of the Security Association structure to use
        with this request.

    control -- the user-specified values for the control field for the
        Work-queue Entry structure.  Start with the value 0, set or
        don't set SEC2_WE_CTRL_CQ and SEC2_WE_CTRL_AKO as desired, and
        set the ESP Trailer Next Header and Pad Length fields
        appropriately if requesting an ESP_OUT operation.

Differences from the kernel API:  

    'wqep' (and the structure it points to) is needed, because that's
         where the work queue entry is built.  

    'n_sg_entries' is not needed, because the amount of space needed
        is calculated at the end of creating the entry, before handing
        it to the kernel.

    'control' field has the SEC2_WE_CTRL_GI bit regarded as 1 at all
        times by the driver.

    'callback_fn' and 'cbk_parm' not used, no callback routines.

    'block' parameter not present, as this mechanism always uses
        block=1 within the kernel (wait until queue space is available).

    Has no return code (is a void function).  Can never fail because
        the user allocates the memory.

2.2.2.3  MSP_SEC2_ADD_SG

int
MSP_SEC2_ADD_SG(	MSP_SEC2_WQE	*wqep,
			int		scatter,
			void		*address,
			int		size) ;

The parameters are identical to the kernel msp_sec2_add_sg() call,
with the exception that the first parameter is a pointer to the work
queue entry, instead of the work queue number.

This function adds a buffer descriptor to the work queue entry.

It will return a non-zero value if you have gone over the maximum
number of scatter/gather entries (16 total).  It will return 0 on
success.

2.2.2.4  MSP_SEC2_END_REQUEST

int
MSP_SEC2_END_REQUEST(	MSP_SEC2_WQE	*wqep) ;

Finishes up the request, and sends it to the kernel; then returns when
the operation finishes.  The results of the operation are written into
the status variable of the wqe structure.

The return value will be 0 if the driver was called successfully, and
the status value of the wqe structure is valid.  If it returns non-zero,
the driver could not be called for some reason.

2.2.2.5  MSP_SEC2_ABORT_REQUEST

MSP_SEC2_ABORT_REQUEST(	MSP_SEC2_WQE	*wqep ) ;

This macro aborts the building of the request in MSP_SEC2_WQE, freeing
any resources that may have been allocated; to be used any time
MSP_SEC2_NEW_REQUEST is used but MSP_SEC2_END_REQUEST will not be
called.

2.2.2.6  MSP_SEC2_SET_HMAC_KEY

The prototype for this function is:

int
MSP_SEC2_SET_HMAC_KEY(		MSP_SEC2_SA *sa,
				unsigned char *key,
				int keylen,
				int workq,
				int compq) ;

The parameters are:

    sa -- the Security Association structure that will hold the
        resulting pre-processed partial hashes.  The flags field of
        the SA structure must be filled in properly before calling
        this function, as it is used to determine the proper
        algorithms to use for the partial hashes.

    key -- a pointer to the key data.

    keylen -- length in bytes of the key data

    workq -- which work queue to use for the hashing operations.

    compq -- which completion queue to use for the hash results.

The possible return values are:

    0 -- Success.
    -1 -- Failure for any reason (most likely cause is a bad key
        pointer or bad keylen value)

This function takes a HMAC key and creates the partial hashes that
HMAC and ESP require in the SA (fills in the hash_chain_a and
hash_chain_b fields of the SA).  It also fills in the hash_init_len
field of the SA to properly reflect these partial hashes.

This function is a "convenience function".  Use of it is not strictly
required.  If you want to, you can do the equivalent operations
yourself.

The operations performed by this function are:

    1. If key is longer than the hash block size, it is run through
       the hash algorithm, and the result is used as the key.

    2. The key (or its hash) is copied into a hash-block sized
       buffer, and padded with nulls up to the end of the buffer if
       necessary.

    3. The result of step 2 gets each byte xored with 0x36, and a
       partial hash is run from this buffer; the results are placed in
       hash_chain_a[].

    4. The result of step 2 gets each byte xored with 0x5c, and a
       partial hash is run from this buffer; the results are placed in
       hash_chain_b[].

    5. hash_init_len[0] is set to 0, and hash_init_len[1] is set to
       the length of the hash block in BITS (in this case, 0x200
       (decimal 512) bits, meaning 64 bytes).

The length of the key must be less than 8192 bytes (because a single
gather descriptor is used in the implementation of this function).

2.2.2.7  MSP_SEC2_SET_AES_DECRYPT_KEY

The prototype for this function is:

int
MSP_SEC2_SET_AES_DECRYPT_KEY(	MSP_SEC2_SA *sa,
				int workq,
				int compq) ;

The parameters are:

    sa -- the Security Association structure that will hold the
        resulting pre-processed key.  The flags field of the SA
        structure must be filled in properly before calling this
        function, as it is used to determine the correct AES
        encryption mode to use.  The crypt_keys field should be filled
        in with the original key to be pre-processed.

    workq -- which work queue to use for the operation.

    compq -- which completion queue to use for the results.

The possible return values are:

    0 -- Success.
    -1 -- Failure for any reason (most likely cause is a bad crypt
        type in the keys field).

For AES decrypt operations (where the engine is used in decrypt mode;
i.e. only ECB and CBC modes), the hardware engine requires the keys to
go through some pre-processing before being used.  This function can
be used to do this pre-processing.

The function does not return until the AES key is ready for use.

This function is a "convenience function".  Use of it is not strictly
required.  If you want to, you can do the equivalent operations
yourself.

The operation performed by this function is:

    1. Create a temporary SA structure similar to the one provided,
       but indicating only a CRYPT operation, with SAFLG_AES_DECRYPT *not*
       set, and block mode of ECB, using the original keys.

    2. Create a request that uses this temporary SA to encrypt 16
       bytes (the encryption results will not be used); set the
       SEC2_WE_CTRL_AKO (AES Key OUT) bit in the control word of the
       request, which causes the output of the operation to be 32
       bytes of key data to place in the SA for decryption (rather
       than the encryption results).  So the destination buffer for
       the operation is the crypt_keys field in the original SA.

    3. Wait for the request to complete, and return.

3.  Discussion of Specific Engine Modes

While some of the engine modes are relatively simple, the ESP engine
modes are more complex.  Each mode is discussed in depth here.

3.1  CRYPT mode

Crypt mode does encryptions and decryptions only.

The total length of source buffers MUST be equal to the total length
of the destination buffers.  The total length MUST also be a multiple
of the block size for the encryption algorithm (that is, a multiple of
8 for DES, and a multiple of 16 for AES).

[ Exception: when pre-computing AES keys (i.e. the SEC2_WE_CTRL_AKO
bit is set in the control field of the work queue entry), the scatter
(destination) buffer length MUST be 32 bytes long.  AES decrypt key
pre-processing is described below. ]
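
Since a request that breaks these rules is best caught before it is
handed to the engine, you may want a small check of your own.  The
following is a minimal sketch of such a check; crypt_lengths_ok() and
the three constants are names invented for this example, not part of
the driver.

```c
#include <assert.h>
#include <stddef.h>

#define DES_BLOCK_BYTES    8    /* DES and Triple DES */
#define AES_BLOCK_BYTES   16
#define AES_KEY_OUT_BYTES 32    /* scatter size when AKO is set */

/* Returns 1 if the buffer totals satisfy the CRYPT mode length rules,
 * 0 otherwise.  'key_out' is non-zero when SEC2_WE_CTRL_AKO is set in
 * the control field; in that case only the scatter total is checked,
 * because that is the only rule stated for the exception. */
static int crypt_lengths_ok(size_t gather_total, size_t scatter_total,
                            size_t block_bytes, int key_out)
{
    assert(block_bytes == DES_BLOCK_BYTES ||
           block_bytes == AES_BLOCK_BYTES);

    if (key_out)
        return scatter_total == AES_KEY_OUT_BYTES;
    if (gather_total != scatter_total)
        return 0;               /* source and destination must match */
    return gather_total % block_bytes == 0;
}
```

For example, 64 bytes in and 64 bytes out passes for either algorithm,
while 60 bytes in and out fails for AES because 60 is not a multiple
of 16.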

To perform an encrypt or decrypt operation, the Engine Mode sub field
of 'flags' in the SA would be set to SAFLG_MODE_CRYPT.  The Crypt
Algorithm subfield, the encrypt / decrypt flags (SAFLG_DES_K1_DECRYPT,
SAFLG_DES_K2_DECRYPT, and SAFLG_DES_K3_DECRYPT, or SAFLG_AES_DECRYPT),
and the Crypt Block Mode subfield would all be set to match the
desired operation.  The necessary keys are placed in the crypt_keys[]
array directly, with the exception of AES keys when SAFLG_AES_DECRYPT
is set (see below).  If an IV is used by the selected blocking mode,
it is placed in the crypt_iv[] array -- 8 bytes for DES, 16 bytes for
AES.

3.1.1  Single DES Operation Notes

Single DES is affected by SAFLG_DES_K1_DECRYPT only.  Single DES keys
are placed in the first 8 bytes (2 ints) of crypt_keys[].

3.1.2  Triple DES Operation Notes

Triple DES is affected by all three Encrypt / Decrypt flags
(SAFLG_DES_K1_DECRYPT, SAFLG_DES_K2_DECRYPT, and
SAFLG_DES_K3_DECRYPT).  The keys for Triple DES are placed in the
first 24 bytes (6 ints) of crypt_keys[], and are treated as 3 keys of
8 bytes each.

The engine applies the three Keys and Encrypt / Decrypt flags in the
order you give them.  Because of this, it is often necessary to
reverse the order the three keys are placed in crypt_keys[] on, e.g.,
the receiving end of a secure tunnel.  Here's an explanation that may
help you determine when this is necessary:

    Triple DES is commonly done as an Encrypt / Decrypt / Encrypt
    (EDE) operation, where the plaintext block is encrypted with key
    A, then decrypted with key B, then encrypted with key C, to
    produce the ciphertext.  The correct steps to obtain the original
    plaintext from the ciphertext are to decrypt with key C, then
    encrypt with key B, then decrypt with key A.

    You would get incorrect results if you tried to obtain the
    plaintext by decrypting with key A, then encrypting with key B,
    then decrypting with key C.

Unfortunately, we can't just tell you to always give the keys in the
reverse order for decrypting; it depends upon the blocking mode in
use, and how the blocking mode uses the DES engine.  The short answer
is: in ECB mode and CBC mode, the receiving end must reverse the order
of the keys.  Leave them in the original order for other blocking modes.
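
For the ECB and CBC case, reversing the key order amounts to swapping
the first and third 8-byte keys and leaving the middle one alone.  A
minimal sketch follows; tdes_reverse_keys() is a made-up helper, and
it assumes the key area is an array of unsigned ints laid out like the
first 6 ints of crypt_keys[].

```c
#include <assert.h>
#include <stddef.h>

/* Swap key 1 and key 3 (each 8 bytes, i.e. 2 ints) in a 6-int Triple
 * DES key area.  Key 2, in the middle, stays where it is. */
static void tdes_reverse_keys(unsigned int keys[6])
{
    int i;

    assert(keys != NULL);
    for (i = 0; i < 2; i++) {
        unsigned int tmp = keys[i];
        keys[i] = keys[4 + i];
        keys[4 + i] = tmp;
    }
}
```

On the receiving end of an EDE tunnel this would be combined with
setting SAFLG_DES_K1_DECRYPT and SAFLG_DES_K3_DECRYPT and leaving
SAFLG_DES_K2_DECRYPT clear, so that the engine decrypts with key C,
encrypts with key B, then decrypts with key A.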

3.1.3  AES Operation Notes

AES encrypt/decrypt operation is affected by the SAFLG_AES_DECRYPT flag.
In encrypt mode (SAFLG_AES_DECRYPT not set), the keys for AES-128
occupy the first 16 bytes (4 ints) of crypt_keys[]; the keys for
AES-192 occupy the first 24 bytes (6 ints) of crypt_keys[]; and the
keys for AES-256 occupy all 32 bytes (8 ints) of crypt_keys[].

To use the AES engine in decrypt mode, the crypt_keys[] array must be
set with a pre-processed version of the keys, also known as a "key
expansion".  This pre-processing of the keys is done by setting up
a small, 16 byte, dummy AES "encryption" using the desired keys, but
with the SEC2_WE_CTRL_AKO bit set in the control field of the work
queue entry, and with the scatter list pointing to a 32 byte buffer to
contain the expanded key.  (The expanded key, and thus the size of the
scatter buffer, is always 32 bytes, regardless of the original key
size; and the scatter buffer would usually be set to point directly at
the crypt_keys[] field of SA structure you're setting up).

The driver provides a convenience routine,
msp_sec2_set_aes_decrypt_key(), to handle the details of this
pre-processing for you.

3.2  HASH mode

Hash mode performs a (partial) hash function (SHA-1 or MD5) on the
contents of the source (gather) buffer(s), and places the results in the
destination (scatter) buffer(s).

This function performs only the internal "transform" function of the
given hash algorithm.  It operates only on multiples of the hash
blocking size (which is 64 bytes for both algorithms), so the total
length of the gather buffers MUST be a multiple of 64 bytes.  

The hash results are placed in the scatter buffer: 16 bytes for MD5,
20 bytes for SHA-1, or 12 bytes for the 96 bit truncated versions of
the algorithms (MD5-96 and SHA1-96).  The total length of the scatter
buffers MUST be the correct size for the requested result.
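
The size rules for HASH mode can be summed up in a few lines of C.
This is only a sketch: the enum and both function names are invented
for the example (the driver itself selects the algorithm through the
SAFLG_* bits in the SA flags field).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local enum, used only by this sketch. */
enum hash_alg { ALG_MD5, ALG_MD5_96, ALG_SHA1, ALG_SHA1_96 };

/* Size in bytes of the result written to the scatter buffer(s). */
static size_t hash_result_bytes(enum hash_alg alg)
{
    switch (alg) {
    case ALG_MD5:     return 16;
    case ALG_SHA1:    return 20;
    case ALG_MD5_96:
    case ALG_SHA1_96: return 12;    /* 96 bit truncated results */
    }
    assert(!"unknown hash algorithm");
    return 0;
}

/* Returns 1 if the totals are acceptable for HASH mode (no padding):
 * gather total a multiple of 64, scatter total exactly the result
 * size for the chosen algorithm. */
static int hash_lengths_ok(enum hash_alg alg,
                           size_t gather_total, size_t scatter_total)
{
    return gather_total % 64 == 0 &&
           scatter_total == hash_result_bytes(alg);
}
```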

[ This author does not believe there would be much use for truncating
the results (MD5-96 or SHA1-96) without padding being added first. ]

If you have the results of a previous partial hash, and want to
continue, put the previous hash results in the hash_chain_a field of
the SA, and set the SAFLG_CV bit in the flags field of the SA;
otherwise, leave the SAFLG_CV bit clear.

3.3  HASH WITH PAD mode

Hash with pad mode takes the source (gather) buffer(s), adds
internally generated padding bytes and the length in bits to the end
(as specified by the hash algorithms), and runs the hash function over
this data, placing the results in the destination (scatter)
buffer(s).  

Because the internally generated padding automatically fills out the
data to a 64-byte boundary, there is no modulo-64 restriction placed
on the total length of the gather buffers.

The hash results are placed in the scatter buffer: 16 bytes for MD5,
20 bytes for SHA-1, or 12 bytes for the 96 bit truncated versions of
the algorithms (MD5-96 and SHA1-96).  The total length of the scatter
buffers MUST be the correct size for the requested result.

If you have the results of a previous partial hash, and want to
complete it: put the previous hash results in the hash_chain_a field
of the SA, set the SAFLG_CV bit in the flags field of the SA, and set
the hash_init_len field to the length of the data that was hashed to
come up with these partial results, in BITS.

3.3.1  Partial Hashes, Software vs. Hardware

The information in this section is essentially a "sidebar".  You do
not need to read it, but it may aid in your understanding of the hash
functions.

Software hashing libraries, like RSA Data Security, Inc.'s MD5
Message-Digest Algorithm library, have you create a structure to save
the context of the operation in (e.g. MD5_CTX), and provide 3 external
entry points: an Init function (e.g. MD5Init), which initializes the
context structure; an Update function (e.g. MD5Update), which runs the
algorithm over an arbitrary byte-length buffer; and a finalizing
function (e.g. MD5Final), which does final processing on the context
structure to get the resulting hash.

Of note is the fact that while the internals of the hash algorithms
work exclusively on 64 byte (512 bit) blocks of data, the update
function works on an arbitrary length buffer, and there is no
requirement that the total amount of data be a multiple of 64 bytes.

How does this work?

First, to handle arbitrarily sized blocks of data, the designers of
these hash algorithms declared that you should take the data to be
hashed, and append a single 1 bit followed by zeroes ("padding")
until you reach a boundary that is a multiple of 512 bits minus 64
bits (that is, the boundary is at (n * 64) - 8 bytes).  A 64 bit (8
byte) number, equal to the number of bits in the data before padding
was added, is added past the end of the padding.  The total length of
these three pieces (data + padding + size) will always be a multiple
of 512 bits, or 64 bytes.  The hash algorithm is then run over these
three pieces put together.
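
The length arithmetic can be sketched as follows.  Note that both MD5
and SHA-1 start the padding with a single 0x80 byte (a 1 bit followed
by zero bits), so at least one byte of padding is always added, even
when the data already ends at the (n * 64) - 8 boundary.  The function
names here are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Number of padding bytes (the 0x80 marker plus the zeroes) needed to
 * bring data_len up to the next (n * 64) - 8 byte boundary. */
static size_t hash_pad_bytes(size_t data_len)
{
    size_t used = data_len % 64;    /* bytes in the last partial block */

    return (used < 56) ? (56 - used) : (120 - used);
}

/* Total bytes actually hashed: data + padding + 8-byte bit count. */
static size_t hash_padded_total(size_t data_len)
{
    size_t total = data_len + hash_pad_bytes(data_len) + 8;

    assert(total % 64 == 0);
    return total;
}
```

For instance, 55 bytes of data needs 1 byte of padding and fits in a
single 64-byte block, while 56 bytes of data needs 64 bytes of padding
and spills into a second block.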

But what if the data you want to run the hashing algorithm on isn't in
one contiguous piece of memory, and isn't divided into even, 64 byte
chunks to run the hashing algorithm on?  How does the update function
allow you to use arbitrarily sized partial message buffers?

If you look inside the provided context structure, it has 3 parts.
One part contains the output of the hashing algorithm so far.  Another
contains a count of the number of bits processed so far.  And the
third contains a partial buffer storage area.

The update function first looks at the partial buffer area in the
context structure; if there are leftover bytes there from previous
operations, he adds bytes to it from the buffer you gave him until he
gets to 64, then runs the hash algorithm on those 64 bytes, to obtain
new partial hash results.  Then he continues to run the hash algorithm
on as many 64 byte chunks of what's left in the buffer you pass in as
he can.  When the amount left in the buffer is less than 64 bytes, he
copies any remaining bytes into the partial buffer area of the
context structure for use next time.  (The count of the number of bits
processed by the hash algorithm is also kept current for each call to
the update function).

The finalizing function takes any remaining bytes in the partial
buffer area, adds the padding and length to it, and runs the hash
algorithm one final time.
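
To make the bookkeeping concrete, here is a minimal sketch of the
buffering an update function does.  It is not the RSA library code:
struct hash_ctx, ctx_init(), ctx_update() and transform() are names
invented for this example, and transform() merely counts the 64-byte
blocks it is given instead of running a real hash.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCK 64

struct hash_ctx {
    unsigned long blocks_run;       /* stand-in for the hash state */
    unsigned long long bit_count;   /* bits passed to update so far */
    unsigned char partial[BLOCK];   /* leftover bytes, fewer than 64 */
    size_t partial_len;
};

/* Stand-in for the algorithm's internal transform function. */
static void transform(struct hash_ctx *c, const unsigned char *block)
{
    (void)block;
    c->blocks_run++;
}

static void ctx_init(struct hash_ctx *c)
{
    memset(c, 0, sizeof *c);
}

static void ctx_update(struct hash_ctx *c,
                       const unsigned char *data, size_t len)
{
    assert(c->partial_len < BLOCK);
    c->bit_count += (unsigned long long)len * 8;

    if (c->partial_len > 0) {       /* top up the leftover bytes first */
        size_t want = BLOCK - c->partial_len;
        size_t take = (len < want) ? len : want;

        memcpy(c->partial + c->partial_len, data, take);
        c->partial_len += take;
        data += take;
        len -= take;
        if (c->partial_len < BLOCK)
            return;                 /* still short of a full block */
        transform(c, c->partial);
        c->partial_len = 0;
    }
    while (len >= BLOCK) {          /* whole blocks straight from input */
        transform(c, data);
        data += BLOCK;
        len -= BLOCK;
    }
    memcpy(c->partial, data, len);  /* fewer than 64 bytes remain */
    c->partial_len = len;
}
```

The three members of the context that matter (hash state, bit count,
and partial buffer) correspond to the three parts described above.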

So, how does this relate to the Brecis Hardware implementation?

You may have noted that the hardware has two ways to run the hash
functions: with or without the final padding.  A partial hash without
the final padding is required to have a total length that is a
multiple of 64.  

In essence, the hardware does not support maintaining the partial
buffer and the length counter that the software routines keep in the
context structure.

On the other hand, the software routines only support running the hash
over a single buffer each time the routine is called, while the
hardware allows more than one buffer to be processed for each call,
via the scatter / gather list.

There are really two ways to handle a set of arbitrary length buffers
while using the hardware: Create a list of buffers for the whole
message, and send it once through the HASH WITH PAD operation; or run
partial buffers through the HASH without padding operation, handling
non-modulo 64 parts of the message and keeping a count of the data in
software.

Consider, for the moment, simply creating a library that substitutes
directly in place of the RSA library, so that application software
calls the same routines with the same parameters, getting the same
results, but the library uses the hardware instead of computing the
hash in software.

There are two ways to implement this library.  First, you could add to
a list of buffers (addresses and lengths) for each call to the update
function, and then give the list to the hardware when the finalization
function is called.  Or, second, you could run partial hashes within
the update function, then a final hash for the final function.

Making a list of buffers and handing it to the hardware would be a
much more efficient choice.  There are two problems with this.  First
and foremost, software could change a buffer after a call to the
update function but before calling the finalization function.  Second,
software sometimes makes a copy of the context structure after doing a
partial hash, expecting it can continue to call the update function
with this copied context and get results that would be the same as
repeating the original partial hash.  At least when using the 'C'
language, this means your list of buffers has to be a fixed size and
contained in the context structure, not a dynamically allocated linked
list; this means an arbitrarily limited number of calls to the update
function, which would be different operation from the original
library.

(Making a copy of the context structure is often used for partial
hashes in computing the HMAC function; see the next section.)

Running partial hashes, and keeping the context structure current just
like the original library, can be made to work.  But it's just not as
efficient.  

Hopefully after reading this, you now have a better understanding of
why the hardware API is not a simple drop-in replacement library, and
how to use partial hashes if the need arises.

3.4  HMAC mode

HMAC mode computes the HMAC (Hash Message Authentication Code) for a
given message.  This is essentially a keyed hash, where only someone
who knows the secret key can calculate the correct hash value.

Here's a quick explanation of how HMAC works:  First, the key is made
to be 64 bytes long (the size of a hash block) -- if it is shorter
than 64 bytes, zeroes are added to the end; if it is longer than 64
bytes, the hashing function is run on the key, and the result of the
hash (16 or 20 bytes) is zero extended to 64 bytes.  Call this
processed key 'K'. 

HMAC(key, message) is defined as:

     H( (K xor OPAD) , H( (K xor IPAD), message ) )

  where
    'H' is the hash function (e.g. MD5 or SHA-1)
    'K' is the processed key discussed above
    'OPAD' is 64 bytes of 0x5c
    'IPAD' is 64 bytes of 0x36
    ',' (comma) means concatenation

If the key stays the same, and only the message changes from one
execution of the HMAC function to the next,  you can pre-compute
partial hashes H( K xor IPAD ) and H( K xor OPAD), saving some
time when computing HMAC for each message.  (Note: the partial hashes
must be done without the final padding.)
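
The key processing and the two pad blocks can be sketched in software
as follows.  This is only an illustration of the definition above, not
the code inside msp_sec2_set_hmac_key(); the names hmac_process_key,
hmac_xor_pad, and the hash_fn callback type are assumptions made for
this example.  The hash itself is left to a caller-supplied callback.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HMAC_BLOCK_LEN 64   /* hash block size for MD5 and SHA-1 */

/* Hypothetical hash callback: writes the digest of data[0..len) to
 * 'digest' and returns its length (16 for MD5, 20 for SHA-1). */
typedef size_t (*hash_fn)(const uint8_t *data, size_t len, uint8_t *digest);

/* Build the processed key 'K': short keys are zero extended; long
 * keys are hashed first, then the digest is zero extended.  Returns 0
 * on success, or -1 if a long key is given without a hash callback. */
int hmac_process_key(const uint8_t *key, size_t key_len, hash_fn hash,
                     uint8_t k[HMAC_BLOCK_LEN])
{
    memset(k, 0, HMAC_BLOCK_LEN);
    if (key_len <= HMAC_BLOCK_LEN) {
        memcpy(k, key, key_len);
        return 0;
    }
    if (hash == NULL)
        return -1;
    (void)hash(key, key_len, k);   /* 16 or 20 bytes fit in the block */
    return 0;
}

/* XOR the processed key with a repeated pad byte (0x36 for IPAD,
 * 0x5c for OPAD).  The result is the single 64-byte block that the
 * partial hash is run over. */
void hmac_xor_pad(const uint8_t k[HMAC_BLOCK_LEN], uint8_t pad,
                  uint8_t out[HMAC_BLOCK_LEN])
{
    size_t i;

    for (i = 0; i < HMAC_BLOCK_LEN; i++)
        out[i] = k[i] ^ pad;
}
```

Running the hash over each 64-byte block, and stopping before the
final padding is applied, gives the two partial hashes.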

This pre-computation of partial hashes is used for HMAC processing in
the Brecis hardware.  The partial hashes are placed in the
hash_chain_a[] and hash_chain_b[] fields of the SA structure, and the
remainder of the HMAC function is computed from there.

The function msp_sec2_set_hmac_key() is provided to calculate these
partial hashes for you.

To perform an HMAC computation, the Engine Mode subfield of 'flags' in
the SA would be set to SAFLG_MODE_HMAC.  The Hash Algorithm subfield
is set appropriately (SAFLG_MD5, SAFLG_MD5_96, SAFLG_SHA1, or
SAFLG_SHA1_96), and the SAFLG_CV bit must be set.  The hash_chain_a
and hash_chain_b fields are set to the pre-computed partial hashes,
probably with msp_sec2_set_hmac_key().  The hash_init_len value is set
to 0x200, to represent the length of these partial hashes
(msp_sec2_set_hmac_key will do this for you, also).  

The length of the destination buffer MUST be the exact size of the
results (16 for SAFLG_MD5, 20 for SAFLG_SHA1, and 12 for both
SAFLG_MD5_96 and SAFLG_SHA1_96).
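
Since the destination length has to be exact, a small lookup helper can
save some mistakes.  The sketch below uses a stand-in enum (hash_alg,
HASH_ALG_*) invented for this example, because the numeric values of
the real SAFLG_* constants are not reproduced here; real code would
switch on the driver's own constants.

```c
#include <stddef.h>

/* Stand-in identifiers for illustration only; they mirror SAFLG_MD5,
 * SAFLG_SHA1, SAFLG_MD5_96, SAFLG_SHA1_96 and SAFLG_HASHNULL. */
enum hash_alg {
    HASH_ALG_NULL,
    HASH_ALG_MD5,
    HASH_ALG_SHA1,
    HASH_ALG_MD5_96,
    HASH_ALG_SHA1_96
};

/* Number of result bytes the hardware writes for each algorithm. */
size_t hash_result_len(enum hash_alg alg)
{
    switch (alg) {
    case HASH_ALG_MD5:     return 16;
    case HASH_ALG_SHA1:    return 20;
    case HASH_ALG_MD5_96:
    case HASH_ALG_SHA1_96: return 12;   /* truncated to 96 bits */
    case HASH_ALG_NULL:    return 0;    /* no authentication data */
    }
    return 0;
}
```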

3.5  ESP OUTGOING mode

The ESP modes combine an (optional) HMAC operation with an (optional)
encryption operation, and in addition do a little bit more to help
create an IPSEC ESP packet from a regular packet.

IPSEC specifies two different modes for ESP operation: Transport Mode
and Tunnel Mode.  The hardware operations are the same for either of
these modes; software just needs to adjust which portion of the
original packet is affected by the operation.

Looking at outgoing processing of ESP packets,  the original
packet consists of:
          * (A) An IP header, which includes a Protocol field that
                  indicates the type of contents in the payload (e.g.
                  the value 6 indicates TCP)
          * (B) The data carried by the packet (e.g. this would
                  include both the TCP header and the TCP payload
                  data)

In Transport mode, the resulting packet contains:
          * (A) The original IP header, but modified so the protocol
                  field indicates the payload is an ESP packet
                  (protocol 0x32, decimal 50)

          * (B) An ESP Header, with:
                 - 32-bit Security Parameters Index (SPI), which is
                        used to find the correct keys for decoding on
                        the other end of the communication.
                 - 32-bit Sequence number, used to prevent replay
                        attacks.
                 - (if encryption is used) an Initialization Vector
                        (IV) used by the encryption algorithm

          * (C) The payload data, section (B) from the original
                  packet.

          * (D) Padding necessary to make (C) + (D) + (E) the correct
                  block size for encryption (must be a multiple of 8
                  bytes for DES, or 3DES, or a multiple of 16 bytes
                  for AES).

          * (E) A two byte ESP trailer; first byte indicates the
                  length of padding in (D); second byte is the
                  protocol field value that was in the original IP
                  header before we modified it to 0x32 / ESP (e.g. 6
                  for TCP).

         Sections (C) through (E) are encrypted (if requested).

          * (F) If an authentication HMAC is to be used, the HMAC is
                  calculated over (B), (C), (D), and (E) -- AFTER
                  encryption -- and the HMAC value is appended here.

In Tunnel mode, the resulting packet contains:
          * (A) A new IP header, with the protocol field indicating
                  the payload is an ESP packet (protocol 0x32,
                  decimal 50)

          * (B) An ESP Header, with:
                 - 32-bit Security Parameters Index (SPI), which is
                        used to find the correct keys for decoding on
                        the other end of the communication.
                 - 32-bit Sequence number, used to prevent replay
                        attacks.
                 - (if encryption is used) an Initialization Vector
                        (IV) used by the encryption algorithm

          * (C) The payload data, this time the entire original
                  packet: the original IP header (section (A)) plus
                  the original payload data (section (B)).

          * (D) Padding necessary to make (C) + (D) + (E) the correct
                  block size for encryption (must be a multiple of 8
                  bytes for DES, or 3DES, or a multiple of 16 bytes
                  for AES).

          * (E) A two byte ESP trailer; first byte indicates the
                  length of padding in (D); second byte is 0x4, which
                  means the protocol of the enclosed payload is an IP
                  packet.

         Sections (C) through (E) are encrypted (if requested).

          * (F) If an authentication HMAC is to be used, the HMAC is
                  calculated over (B), (C), (D), and (E) -- AFTER
                  encryption -- and the HMAC value is appended here.

Like I said, the hardware operation is the same for both modes.  The
difference is how much data the software passes as payload.  In
Transport mode, the driver is given the payload data from the, e.g.,
TCP header on, and the calling software modifies the original IP
header.  In Tunnel mode, the driver is given the whole original
packet, starting with the IP header, and software comes up with the
new IP header that goes on the beginning.
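
The difference between the two modes can be sketched as a small helper
that decides which part of an IPv4 packet is handed to the driver as
payload, and which value goes into the second byte of the ESP trailer.
The names esp_encap, esp_payload, and esp_select_payload are invented
for this example; the header layout used (version and header length in
byte 0, protocol in byte 9) is standard IPv4.

```c
#include <stddef.h>
#include <stdint.h>

enum esp_encap { ESP_TRANSPORT, ESP_TUNNEL };   /* hypothetical names */

struct esp_payload {
    size_t  offset;     /* where the payload starts in the original packet */
    size_t  len;        /* number of payload bytes handed to the driver */
    uint8_t next_hdr;   /* value for the second byte of the ESP trailer */
};

/* 'pkt' points at an IPv4 header; 'pkt_len' is the whole packet
 * length.  Returns 0 on success, or -1 if the header is malformed. */
int esp_select_payload(const uint8_t *pkt, size_t pkt_len,
                       enum esp_encap mode, struct esp_payload *out)
{
    size_t ihl;

    if (pkt_len < 20 || (pkt[0] >> 4) != 4)
        return -1;
    ihl = (size_t)(pkt[0] & 0x0f) * 4;   /* IPv4 header length in bytes */
    if (ihl < 20 || ihl > pkt_len)
        return -1;

    if (mode == ESP_TUNNEL) {
        /* Whole original packet is the payload; trailer says "IP". */
        out->offset   = 0;
        out->len      = pkt_len;
        out->next_hdr = 4;
    } else {
        /* Payload starts after the IP header; the trailer keeps the
         * original protocol value (e.g. 6 for TCP). */
        out->offset   = ihl;
        out->len      = pkt_len - ihl;
        out->next_hdr = pkt[9];
    }
    return 0;
}
```

In Transport mode the caller would still have to rewrite the protocol
field of the original header to 0x32; in Tunnel mode the caller builds
the new outer header.  Neither step is shown here.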

3.5.1  ESP OUTGOING -- non-manual mode.

To perform an ESP outgoing operation, the SA is set up in this fashion:

    Engine Mode subfield of flags is set to SAFLG_MODE_ESP_OUT

    The flag 'SAFLG_SI' should be set, unless you don't want the
    replay-preventing sequence number to be incremented automatically.

        * NOTE: If you don't use automatic sequence number
          incrementing, you must take steps to make sure that only one
          outstanding request to the driver exists per SA structure at
          any given time. Otherwise you risk having duplicate sequence
          numbers assigned.  (Some sort of "locking" mechanism would
          be required for each SA structure).

    If some sort of encryption is used, the flag 'SAFLG_CRI' should be
    set, which will cause the hardware to create a random IV field for
    the packet, using the random number generator.  If this flag is
    not set, the IV field will be copied from the SA (useful for
    testing).

        * NOTE: If you don't use the hardware IV creation feature, you
          must take steps to make sure that only one outstanding
          request to the driver exists per SA structure at any given
          time.  Otherwise you risk having a duplicate IV field in the
          resulting packet.  (Some sort of "locking" mechanism would
          be required for each SA structure).

    The flag 'SAFLG_EM' should not be set, unless you want manual
    mode, which is discussed in the next section.

    One of SAFLG_MD5, SAFLG_MD5_96, SAFLG_SHA1, SAFLG_SHA1_96, or
    SAFLG_HASHNULL should be set to indicate which hashing algorithm
    you will use for the HMAC authentication on the packet.  The
    author of this document has only seen the 96-bit versions used
    (which calculate the same value as the full versions, then
    truncate it to 12 bytes).

    As with HMAC operations, SAFLG_CV should always be set.

    To select the desired encryption algorithm, set one of SAFLG_DES,
    SAFLG_3DES, SAFLG_AES_128, SAFLG_AES_192, SAFLG_AES_256, or, for
    no encryption, set SAFLG_CRYPTNULL.

    If encryption is selected, the encryption blocking mode should be
    selected.  For normal ESP operation, this is almost always
    SAFLG_CBC_ENCRYPT.

    If encryption is selected, the encrypt / decrypt bits should be
    set appropriately.  For 3DES EDE, this will mean setting
    SAFLG_DES_K2_DECRYPT.  For single DES, AES, or no encryption, this
    will usually mean setting none of these bits.

    The esp_spi field of the SA should be set to the SPI to be copied
    into the ESP header.

    The ESP_SEQUENCE field should be set to the (initial) sequence
    number to be used for this SA.

    The crypt_keys should be set up in the same manner as specified
    for the crypt operation.

    If SAFLG_CRI is not set, the crypt_iv field should be set to the
    desired IV. 

    The hash_chain_a, hash_chain_b, and hash_init_len fields should be
    set up exactly the same as for HMAC processing; you can use
    msp_sec2_set_hmac_key() for this.

The work queue entry you will create with msp_sec2_new_request(),
msp_sec2_add_sg(), and msp_sec2_end_request() must meet the following
criteria:

    1) The total count of scatter bytes must be equal to:

        a) the number of incoming bytes (payload data); plus

        b) 4 bytes for the ESP SPI field ; plus

        c) 4 bytes for the esp sequence number ; plus

        d) the number of bytes in the IV: 0 for no encryption; 8 for
        DES and 3DES; or 16 for AES ; plus

        e) the number of bytes of padding generated ; plus

        f) two bytes for the ESP trailer (pad length and next
        protocol) ; plus

        g) the number of bytes in the authentication data field (HMAC
        value) -- that is, 0 bytes for SAFLG_HASHNULL, 12 bytes for
        SAFLG_MD5_96 or SAFLG_SHA1_96, 16 bytes for SAFLG_MD5, or 20
        bytes for SAFLG_SHA1.

    2) If encrypting, the padding length (which is calculated by
       software and placed in the control field) must be a value that
       makes the encrypted portion of the packet a multiple of the
       block length.  That is,

          pad_len = (b_len - ( (payload_len + 2) % b_len )) + n * b_len

       where n is an integer >= 0, and b_len is 8 for DES or 3DES,
       or 16 for AES.  (n is almost always 0 to conserve bandwidth.)
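
The two rules above reduce to a little arithmetic.  Here's a sketch;
the function names esp_pad_len and esp_out_scatter_len are made up for
this example and are not part of the driver.

```c
#include <stddef.h>

/* Rule 2, exactly as written above.  b_len must be non-zero (8 for
 * DES or 3DES, 16 for AES); the rule does not apply when there is no
 * encryption. */
size_t esp_pad_len(size_t payload_len, size_t b_len, size_t n)
{
    return (b_len - ((payload_len + 2) % b_len)) + n * b_len;
}

/* Rule 1: total count of scatter bytes for a non-manual ESP outgoing
 * request.
 *   iv_len  : 0 for no encryption, 8 for DES/3DES, 16 for AES
 *   icv_len : 0, 12, 16 or 20, depending on the hash algorithm */
size_t esp_out_scatter_len(size_t payload_len, size_t iv_len,
                           size_t pad_len, size_t icv_len)
{
    return payload_len      /* a) payload data          */
         + 4                /* b) ESP SPI               */
         + 4                /* c) ESP sequence number   */
         + iv_len           /* d) IV                    */
         + pad_len          /* e) padding               */
         + 2                /* f) ESP trailer           */
         + icv_len;         /* g) authentication data   */
}
```

For example, a 100 byte payload with AES and SAFLG_SHA1_96 gives
esp_pad_len(100, 16, 0) = 10 and a scatter count of
100 + 4 + 4 + 16 + 10 + 2 + 12 = 148 bytes.  Note that when
payload_len + 2 is already a multiple of the block length, the formula
as written yields a full block of padding (b_len), not zero.  Also keep
in mind that the pad length has to fit in the one-byte field of the ESP
trailer.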

For ESP outgoing operations, the software needs to pass in some
special values for the 'control' parameter to the
msp_sec2_new_request() call.  The SEC2_WE_CTRL_CQ (completion queue)
and SEC2_WE_CTRL_GI (generate interrupt) bits should be set as
desired; the value for the 'next header' field of the trailer should
be or'ed in after shifting left by SEC2_WE_CTRL_NXTHDR_SHF, and the
padding length -- which software needs to calculate, and must meet
rule 2 above -- should be or'ed in after shifting left by
SEC2_WE_CTRL_PADLEN_SHF.
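
Putting the 'control' value together might look like the sketch below.
The helper name esp_out_control and the 32-bit width of the control
value are assumptions.  The two shift values defined here are
PLACEHOLDERS so the example compiles on its own; they are NOT the real
values, which come from the driver's header file.

```c
#include <stdint.h>

/* PLACEHOLDER shift values, for illustration only.  When the real
 * driver header is included first, its definitions are used instead. */
#ifndef SEC2_WE_CTRL_NXTHDR_SHF
#define SEC2_WE_CTRL_NXTHDR_SHF 8
#endif
#ifndef SEC2_WE_CTRL_PADLEN_SHF
#define SEC2_WE_CTRL_PADLEN_SHF 16
#endif

/* 'cq_gi_bits' is whatever combination of SEC2_WE_CTRL_CQ and
 * SEC2_WE_CTRL_GI the caller wants; the next header value and the pad
 * length are shifted into place and or'ed in. */
uint32_t esp_out_control(uint32_t cq_gi_bits, uint8_t next_hdr,
                         uint8_t pad_len)
{
    return cq_gi_bits
         | ((uint32_t)next_hdr << SEC2_WE_CTRL_NXTHDR_SHF)
         | ((uint32_t)pad_len  << SEC2_WE_CTRL_PADLEN_SHF);
}
```

The result would be passed as the 'control' parameter to
msp_sec2_new_request().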

Also of note for outgoing operations:  at least for a single,
contiguous buffer, it is safe to have the scatter buffers placed a
small distance ( < 64 bytes) "ahead" of the gather buffers, to make
room for the ESP header and the new IP header (if necessary, for
tunnel mode).

See the examples for more details.

3.5.2  ESP OUTGOING -- manual mode.

Manual mode differs from regular ESP outgoing in that the engine does
not do any copying of ESP headers, trailers, or creation of IV values.

To perform an ESP outgoing operation in MANUAL MODE, the SA is set up
in this fashion:

    Engine Mode subfield of flags is set to SAFLG_MODE_ESP_OUT

    The flag 'SAFLG_EM' is set.

    One of SAFLG_MD5, SAFLG_MD5_96, SAFLG_SHA1, SAFLG_SHA1_96, or
    SAFLG_HASHNULL should be set to indicate which hashing algorithm
    you will use for the HMAC authentication on the packet.  The
    author of this document has only seen the 96-bit versions used
    (which calculate the same value as the full versions, then
    truncate it to 12 bytes).

    As with HMAC operations, SAFLG_CV should always be set.

    To select the desired encryption algorithm, set one of SAFLG_DES,
    SAFLG_3DES, SAFLG_AES_128, SAFLG_AES_192, SAFLG_AES_256, or, for
    no encryption, set SAFLG_CRYPTNULL.

    If encryption is selected, the encryption blocking mode should be
    selected.  For normal ESP operation, this is almost always
    SAFLG_CBC_ENCRYPT.

    If encryption is selected, the encrypt / decrypt bits should be
    set appropriately.  For 3DES EDE, this will mean setting
    SAFLG_DES_K2_DECRYPT.  For single DES, AES, or no encryption, this
    will usually mean setting none of these bits.

    The crypt_keys should be set up in the same manner as specified
    for the crypt operation.

    The hash_chain_a, hash_chain_b, and hash_init_len fields should be
    set up exactly the same as for HMAC processing; you can use
    msp_sec2_set_hmac_key() for this.

The work queue entry you will create with msp_sec2_new_request(),
msp_sec2_add_sg(), and msp_sec2_end_request() must meet the following
criteria:

    1) Data in gather buffers consists of:  
         a) 4 byte SPI,
         b) 4 byte sequence number, 
         c) Proper length IV (0 for NULL, 8 for DES/3DES, 16 for AES)
         d) Payload data (e.g, includes IP header for tunnel mode)
         e) padding data (software generated)
         f) 2 byte trailer (pad length and next header)

    2) The total count of scatter bytes must be equal to:

        a) the number of incoming bytes (spi + sequence + IV +
        payload data + padding + trailer); plus

        b) the number of bytes in the authentication data field (HMAC
        value) -- that is, 0 bytes for SAFLG_HASHNULL, 12 bytes for
        SAFLG_MD5_96 or SAFLG_SHA1_96, 16 bytes for SAFLG_MD5, or 20
        bytes for SAFLG_SHA1.

    3) If encrypting, the padding length must be a value that makes
       the encrypted portion of the packet a multiple of the block
       length.  That is, 

          pad_len = (b_len - ( (payload_len + 2) % b_len )) + n * b_len

       where n is an integer >= 0, and b_len is 8 for DES or 3DES,
       or 16 for AES.  (n is almost always 0 to conserve bandwidth.)

For ESP outgoing operations in manual mode, the next header and pad
length fields in the 'control' parameter to the msp_sec2_new_request()
call are not used.  The SEC2_WE_CTRL_CQ (completion queue) and
SEC2_WE_CTRL_GI (generate interrupt) bits should be set as desired.
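
Since software builds everything in manual mode, here is a sketch of
laying out the gather data in a single contiguous buffer.  The function
names are invented for this example.  Two things are assumptions taken
from the IPSEC ESP specification and not from the hardware: the SPI
and sequence number are written in network byte order, and the padding
bytes are filled with the default pattern 1, 2, 3, ...

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value in network (big-endian) byte order. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Build SPI + sequence + IV + payload + padding + trailer in 'out'.
 * Returns the number of bytes written, or 0 if 'out' is too small or
 * pad_len does not fit in the one-byte trailer field. */
size_t esp_manual_build(uint8_t *out, size_t out_cap,
                        uint32_t spi, uint32_t seq,
                        const uint8_t *iv, size_t iv_len,
                        const uint8_t *payload, size_t payload_len,
                        size_t pad_len, uint8_t next_hdr)
{
    size_t total = 4 + 4 + iv_len + payload_len + pad_len + 2;
    size_t i;
    uint8_t *p = out;

    if (pad_len > 255 || total > out_cap)
        return 0;

    put_be32(p, spi);  p += 4;                 /* a) SPI              */
    put_be32(p, seq);  p += 4;                 /* b) sequence number  */
    if (iv_len != 0) {
        memcpy(p, iv, iv_len);                 /* c) IV               */
        p += iv_len;
    }
    memcpy(p, payload, payload_len);           /* d) payload data     */
    p += payload_len;
    for (i = 0; i < pad_len; i++)              /* e) padding 1,2,3... */
        *p++ = (uint8_t)(i + 1);
    *p++ = (uint8_t)pad_len;                   /* f) trailer          */
    *p++ = next_hdr;
    return total;
}
```

The scatter count for the request is then the returned length plus the
size of the authentication data field.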

3.6  ESP INCOMING mode

Keeping in mind the ESP packet described in the previous ESP OUTGOING
section, ESP INCOMING mode will take the incoming ESP packet and (A)
calculate the ICV (HMAC) value, and optionally compare it with that in
the packet; and (B) decrypt the data within the packet.

Note that checking the sequence number and the padding fields is left
up to software.
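
As an illustration of the padding check that software is left with,
here is a sketch that parses the ESP trailer from the decrypted data.
The name esp_in_check_trailer is made up for this example, and the
check of the padding contents assumes the sender used the default
1, 2, 3, ... pattern from the IPSEC ESP specification.  Sequence number
(anti-replay) checking is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* 'buf' holds the decrypted payload + padding + 2-byte trailer,
 * 'len' bytes in all (any ICV is excluded).  On success returns 0 and
 * fills in the payload length and the next header value; returns -1
 * if the trailer or the padding is inconsistent. */
int esp_in_check_trailer(const uint8_t *buf, size_t len,
                         size_t *payload_len, uint8_t *next_hdr)
{
    size_t pad_len, i;
    const uint8_t *pad;

    if (len < 2)
        return -1;
    pad_len = buf[len - 2];          /* first trailer byte */
    if (pad_len > len - 2)
        return -1;                   /* padding longer than the data */

    pad = buf + (len - 2 - pad_len);
    for (i = 0; i < pad_len; i++)
        if (pad[i] != (uint8_t)(i + 1))
            return -1;               /* not the default pad pattern */

    *payload_len = len - 2 - pad_len;
    *next_hdr    = buf[len - 1];     /* second trailer byte */
    return 0;
}
```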

To prepare an SA for ESP INCOMING MODE:

    Engine Mode subfield of flags is set to SAFLG_MODE_ESP_IN

    The flag 'SAFLG_CPI' should be set if you want the hardware to
    compare the calculated HMAC value; left unset if you want to do
    the comparison yourself.

    One of SAFLG_MD5, SAFLG_MD5_96, SAFLG_SHA1, SAFLG_SHA1_96, or
    SAFLG_HASHNULL should be set to indicate which hashing algorithm
    you will use for the HMAC authentication on the packet.  The
    author of this document has only seen the 96-bit versions used
    (which calculate the same value as the full versions, then
    truncate it to 12 bytes).

    As with HMAC operations, SAFLG_CV should always be set.

    To select the desired encryption algorithm, set one of SAFLG_DES,
    SAFLG_3DES, SAFLG_AES_128, SAFLG_AES_192, SAFLG_AES_256, or, for
    no encryption, set SAFLG_CRYPTNULL.

    If encryption is selected, the encryption blocking mode should be
    selected.  For normal ESP operation, this is almost always
    SAFLG_CBC_DECRYPT.

    If encryption is selected, the encrypt / decrypt bits should be
    set appropriately.  For 3DES EDE, this will mean setting
    SAFLG_DES_K1_DECRYPT and SAFLG_DES_K3_DECRYPT.  For single DES,
    this will mean setting SAFLG_DES_K1_DECRYPT.  For AES, this will
    mean setting SAFLG_AES_DECRYPT.  For no encryption, this will mean
    setting none of these bits.  (If the blocking mode is not CBC, you
    may have to modify these instructions appropriately).

    The esp_spi field of the SA is not used.  It is assumed the
    calling software used the SPI to find the SA structure.

    The ESP_SEQUENCE field of the SA is not used.  The calling
    software is left to deal with the sequence number.

    The crypt_keys should be set up in the same manner as specified
    for the crypt operation -- keeping in mind the possible reverse
    ordering of the keys for 3DES.

    The hash_chain_a, hash_chain_b, and hash_init_len fields should be
    set up exactly the same as for HMAC processing; you can use
    msp_sec2_set_hmac_key() for this.

The work queue entry you will create with msp_sec2_new_request(),
msp_sec2_add_sg(), and msp_sec2_end_request() must meet the following
criteria:

    1) Data in gather buffers consists of:
         a) 4 byte SPI
         b) 4 byte sequence number
         c) Proper length IV (0 for NULL, 8 for DES/3DES, 16 for AES)
         d) Payload data (e.g, includes IP header for tunnel mode)
         e) padding data
         f) 2 byte trailer
         g) ICV of correct length

       The ICV data must be given, even if SAFLG_CPI (compare ICV) is
       not set.

    2) The total count of scatter bytes must be equal to:

         a) Payload data size (without SPI, sequence no., or IV), plus
         b) padding data, plus
         c) 2 byte trailer, plus
         d) space for the ICV, if and only if both of these conditions apply:
              i)  SAFLG_CPI is not set (hardware does not compare ICV)
              ii) hash algorithm is not SAFLG_HASHNULL

     3) If decrypting, the payload, padding, and trailer portions must
        add up to a length that is evenly divisible by the crypto
        block length.
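
The scatter count for an incoming request can be worked out from the
gather length.  Here's a sketch; the function name esp_in_scatter_len
and its parameters are made up for this example.

```c
#include <stddef.h>

/* gather_len      : SPI + sequence + IV + payload + padding +
 *                   trailer + ICV, as given in the gather buffers
 * iv_len          : 0 for no encryption, 8 for DES/3DES, 16 for AES
 * b_len           : crypto block length, or 0 for no encryption
 * icv_len         : 0 for SAFLG_HASHNULL, else 12, 16 or 20
 * hw_compares_icv : non-zero if SAFLG_CPI is set
 *
 * Returns the total count of scatter bytes, or 0 if the lengths are
 * inconsistent (rule 3 violated, or packet too short). */
size_t esp_in_scatter_len(size_t gather_len, size_t iv_len, size_t b_len,
                          size_t icv_len, int hw_compares_icv)
{
    size_t body;

    if (gather_len < 8 + iv_len + 2 + icv_len)
        return 0;

    /* payload + padding + trailer */
    body = gather_len - 8 - iv_len - icv_len;

    /* rule 3: must be a whole number of crypto blocks */
    if (b_len != 0 && body % b_len != 0)
        return 0;

    /* rule 2d: room for the ICV only when the hardware does not do
     * the compare (icv_len is already 0 for SAFLG_HASHNULL) */
    if (!hw_compares_icv)
        body += icv_len;

    return body;
}
```

For example, a 148 byte ESP packet body using AES and a 12 byte ICV
needs 112 scatter bytes when SAFLG_CPI is set, and 124 when it is not.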

The hardware has one final stipulation for ESP incoming mode: The
ICV, a.k.a. HASH value, must be in a single, contiguous buffer, and
you must supply this buffer -- which must be the exact length of the
ICV, and the last buffer passed in.  E.g., if the message is in a
single buffer pointed to by 'buf' with a message of length 'len' and a
HMAC length of 'hlen', you would split the buffer into two parts,
using two gather elements, like this:

	msp_sec2_add_sg(queue, 0, buf, len - hlen) ;
	msp_sec2_add_sg(queue, 0, buf + len - hlen, hlen) ;








4.  Examples

Please see the file sec2_examples.c for demonstrations on how to use
each type of security operation.

