This chapter discusses the all the aspects related to the main JFFS3 entity~--
the tree. Please, refer to section \ref{ref_SectionIndexing} for basic
information about the JFFS3 tree.

%
% OBJECTS
%
\subsection{Objects} \label{ref_SectionObjects}

JFFS3 keeps file system objects in the leaf level of the tree (in leaf nodes)
and the following is the list of supported objects.

\begin{enumerate}

\item \emph{Data} objects contain files' data and are kept in \emph{data
nodes}. Each data node holds one \mbox{RAM page} bytes of data (i.e.,
\texttt{PAGE\_SIZE} which is 4K on most \mbox{32-bit} architectures). But of
course, in case of small files (less then one RAM page bytes) and files'
tails~-- less data data may be put to the data node.

%
% Figure with an data objects illustration
%
\begin{figure}[h]
\begin{center}
\begin{htmlonly}
\includegraphics{pics/dataobj-01.png}
\end{htmlonly}
%begin{latexonly}
\includegraphics[width=130mm,height=85mm]{pics/dataobj-01.pdf}
%end{latexonly}
\end{center}
\caption{An illustration of files' data representation.}
\label{ref_FigureDataObj-01}
\end{figure}

Figure~\ref{ref_FigureDataObj-01} illustrates the correspondence between files'
contents and data objects. Each \mbox{RAM~page-size} piece of a 13K file
corresponds to a data node in the JFFS3 tree. The 1K tail of the file also
corresponds to a data node. But because of compression actual sizes of data
nodes are less then the corresponding file fragments.

The division on \mbox{RAM~page-sized} fragments relates to the Linux Virtual
Memory Management architecture. Namely, the Linux \emph{Page Cache} works in
terms of RAM pages which means, that JFFS3 is always asked to read and write
files' in units of RAM page size.

It is worth noting that in order to optimize flash utilization, JFFS3 may store
multiple of \mbox{RAM page} bytes in one data node for static files. This
admits of better compression and leads to several other benefits.

\item \emph{Direntry} objects contain the correspondence between directory
entry names and inode numbers. Direntry objects are stored in \emph{direntry
nodes}. Every directory entry in the file system has a corresponding direntry
object.

\item \emph{Attr-data} objects contain attributes of inodes~-- both standard
Unix attributes like user~ID, last modification time, inode
length, etc and \mbox{JFFS3-specific} attributes like the type of compression,
etc. Each inode has only one corresponding \mbox{attr-data} object.

\item \emph{Xentry} objects contain the correspondence between names of
extended attributes and \emph{xattr~IDs}. Every extended attribute in the file
system has a corresponding xattr entry object. This is analogous to direntry
objects, but direntries contain
\{\texttt{direntry~name}$\Rightarrow$\texttt{inode~number}\} mapping, instead
of \{\texttt{xattr~name}$\Rightarrow$\texttt{xattr~ID}\} mapping in xentries.

Each extended attribute in JFFS3 has its own unique number~-- xattr~ID, just
like every inode has its own unique inode number. And in fact, JFFS3 utilizes
the same space of numbers to enumerate inodes and extended attributes. 

Xentry objects are stored in \emph{xentry nodes}.

\item \emph{Xattr-data} objects contain the data of extended attributes. The
way how \mbox{xattr-data} objects are kept in the tree is equivalent to the way
how data objects a kept there. \emph{Xattr-data} objects are stored in
\emph{xattr-data} nodes.

\item \emph{Acl} objects contain Access Control Lists (ACL) of inodes
(information about ACLs may be found out at~[\ref{ref_ACL}]). Acl objects are
stored in \emph{acl nodes}.

In \mbox{real-world} systems a vast number of files have equivalent ACL while
only few files have unique ACL. For the former group of files (or more
strictly~-- inodes) JFFS3 makes use of \emph{shared~acl} objects. This means,
that there is only one acl object instance for all of these inodes. Shared acls
are referred to from \mbox{attr-data} objects of these inodes. If a shared acl
is written to, a new acl object is created (\mbox{copy-on-write} mechanism).
Conversely, for the latter group there is a distinct acl object per each inode.

\end{enumerate}

%
% KEYS
%
\subsection{Keys} \label{ref_SectionKeys}

Each object has its own key and may be quickly looked~up in the tree by its
key. As there are 6 object types in JFFS3, there are also 6 key types:

\begin{enumerate}
\item \emph{data keys}~-- index data objects;
\item \emph{direntry keys}~-- index direntry objects;
\item \emph{attr-data keys}~-- index attr-data objects;
\item \emph{xentry keys}~-- index xentry objects;
\item \emph{xattr-data keys}~-- index xattr-data objects;
\item \emph{acl keys}~-- index acl objects.
\end{enumerate}

%
% Trivial key scheme
%
\subsubsection{Trivial key scheme}

Lets start discussing JFFS3 keys with an example of a simple key layout which
is further referred to as the \emph{trivial key scheme}. All keys in this
scheme have the same \mbox{47-bit} length (see figure~\ref{ref_FigureTrivKey}). 

\begin{itemize}

\item Data keys consist of the \mbox{32-bit} inode number the data belongs to,
the unique \mbox{3-bit} key type identifier, and the \mbox{20-bit} data offset.

\item Direntry keys consist of the \mbox{32-bit} parent directory inode number,
the unique \mbox{3-bit} key type identifier, and the \mbox{20-bit} direntry
name hash value.

\item Attr-data keys consist of the \mbox{32-bit} inode number the attributes
belong to, and the unique \mbox{3-bit} key type identifier.

\item Xentry keys consist of the \mbox{32-bit} inode number the extended
attribute belongs to, the unique \mbox{3-bit} key type identifier, and the
\mbox{20-bit} extended attribute name hash value.

\item Xattr-data keys consist of the \mbox{32-bit} xattr ID, the unique
\mbox{3-bit} key type identifier, and the \mbox{20-bit} extended attribute data
offset.

\item Acl keys consist of the \mbox{32-bit} inode number the acl object belongs
to, and the unique \mbox{3-bit} key type identifier.

\end{itemize}

The following is the list of key type identifiers.

\begin{enumerate}

\item Data keys~-- \texttt{0} (\texttt{000} bin);
\item Direntry keys~-- \texttt{1} (\texttt{001} bin);
\item Attr-data keys~-- \texttt{2} (\texttt{010} bin);
\item Xentry keys~-- \texttt{3} (\texttt{011} bin);
\item Xattr-data keys~-- \texttt{4} (\texttt{100} bin);
\item Acl keys~-- \texttt{4} (\texttt{101} bin);

\end{enumerate}

%
% The trivial key scheme
%
\begin{figure}[h]
\begin{center}
\begin{htmlonly}
\includegraphics{pics/trivkey-01.png}
\end{htmlonly}
%begin{latexonly}
\includegraphics[width=110mm,height=70mm]{pics/trivkey-01.pdf}
%end{latexonly}
\end{center}
\caption{The trivial key scheme.}
\label{ref_FigureTrivKey}
\end{figure}

Since data objects may only contain multiple RAM~pages of data (excluding small
files and files' tails) offsets in keys are always RAM~page
\mbox{size-aligned}. Assuming RAM~page is 4K (12~bits), 20~bits is enough to
refer up to 4GB of data. To put it differently, the trivial key scheme limits
files' length to 4GB providing system's RAM~page size is 4K.

It is also worth noting that several objects of the same type may have the same
key. For example, in case of hash collision, two direntries may have equivalent
keys. In this case objects may be distinguished by means of reading the
corresponding leaf node headers.

%
% Keys comparison
%
\subsubsection{Keys comparison}

The important topic is how keys are compared as it defines the relative order
of objects in the tree and is crucial for searching. Note, it only makes sense
to compare keys of the same type.

JFFS3 keys are usually comprised of one or more fields, i.e., keys $K_1$ and
$K_2$ may be represented as

$$
K_1 = \{k^1_1, k^2_1, ..., k^p_1\},
K_2 = \{k^1_2, k^2_2, ..., k^p_2\},
$$
where $p$ is the number of components in keys of this type.

Keys $K_1$ and $K_2$ are considered to be \emph{equivalent} if and only if all
their fields are equivalent, i.e. $k^i_1 = k^i_2, i = 1, 2, ..., p$.

Keys are compared \mbox{field-by-field} starting from the first field. If on
$i$'th step $k^i_1 > k^i_2$, then $K_1$ is considered to be \emph{greater} then
$K_2$.  Similarly, if on $i$'th step $k^i_1 < k^i_2$, then $K_1$ is considered
to be \emph{less} then $K_2$.

%
% Key schemes
%
\subsubsection{Key schemes}

\emph{Key schemes} define layout of keys for all the 6 object types.
Apparently, it would be too inflexible to hardcode JFFS3 to support only one
fixed key scheme. Indeed, there may be a great deal of reasons why users may
want to use different key schemes in different situations~-- some examples go
bellow.

\begin{itemize}

\item The inode number is encoded by a \mbox{32-bit} integer in the trivial key
scheme which means that about 4~million inodes and extended attributes may
exist on the file system simultaneously. But in some cases this may be
insufficient or, conversely, too much and one may want to use less bits to
encode inode numbers (say, only 24 bits), just for optimization.

\item Similarly, offsets are encoded as \mbox{20-bit} integers in the trivial
key scheme which may be insufficient when huge files (larger then 4G) should be
supported. So one may want to use more bits in certain cases.

\item Depending on the concrete JFFS3 usage, different hash functions may be
used in direntry keys. The length of hash values may also vary depending on how
many directory entries are kept in directories. If there are huge directories
with millions of files there, long hash values should be used to avoid massive
hash collisions (say, \mbox{64-bit} hash values). But if it is known in advance
that there will be no too large directory entries, \footnote{Note, when talking
about directories, words "\emph{large}" and "\emph{small}" describe how many
direntries are kept in these directories. The more direntries a directory
contains, the larger is it.} the length of hash values may be shorter.

\item It also possible that one may want to use some tricky key layouts to
achieve different kinds of optimization. For example, direntry keys may
include the first 8~bytes (64~bits) of the direntry name (see figure
\ref{ref_FigureDirentKeyEx_01}). In this case the \texttt{getdents}
\footnote{See \texttt{getdents (2)} Linux manual pages} Linux system call will
return direntries in "mostly" alphabetically sorted order and \mbox{user-space}
programs will not spend much time to sort them. In fact this technique is used
in the Reiser4 file system and it is claimed that slow sorting is a bottleneck
in certain \mbox{real-life} workloads. And the like.

\item Different keys compression methods may be used in different key schemes
(see section \ref{ref_SectionKeysCompr} below).

\end{itemize}

%
% Direntry key layout example
%
\begin{figure}[h]
\begin{center}
\begin{htmlonly}
\includegraphics{pics/keyex-01.png}
\end{htmlonly}
%begin{latexonly}
\includegraphics[width=110mm,height=7mm]{pics/keyex-01.pdf}
%end{latexonly}
\end{center}
\caption{Direntry key layout example.}
\label{ref_FigureDirentKeyEx_01}
\end{figure}

So it is obvious why JFFS3 is not attached to a fixed key scheme but instead,
admits of many different key schemes (one at a time of course) with a
possibility to choose the best suited key scheme.

%
% Keys compression
%
\subsubsection{Keys compression} \label{ref_SectionKeysCompr}

The index is the most frequently \mbox{re-written} part of JFFS3. Indeed, every
single change at the leaf level of the tree requires \mbox{re-writing} $L-1$
indexing nodes. The number of index updates is reduced by the
\mbox{write-behind} cache and by the \mbox{journal}, but it is still changed
very often.  So, it is extremely important for JFFS3 to keep the tree as
shallow as it is possible.

This means, that it makes sense to apply a sort of compression to keys in
indexing nodes. There are several ways to compress keys and the following are
examples of possible compression techniques.

\begin{description}

\item[Offsets coding.] Offsets compression may be based on the observation that
the overwhelming majority of files in many file systems are small files. This
means, that it might makes sense to code smaller offsets by fewer bits.

Table \ref{ref_TableOffsCodes} contains an example of how offsets may be
encoded. For offsets in range \texttt{0KB-8KB} only 3~bits are enough, so the
bit sequence "\texttt{000}" will encode offset \texttt{0}, and the bit sequence
"\texttt{001}" will encode offset \texttt{4K} \footnote{Recall, offsets in keys
are \mbox{RAM page-aligned} and by default, the RAM~page size is assumed to be
4K in this document}. Offsets in range \texttt{8KB-64KB} are encoded by 6~bits
and so on.

\begin{table}[h]
\begin{center}
\begin{tabular}{llll}
\textbf{Offset range} & \textbf{Bits in range} & \textbf{Code prefix} & \textbf{Code length}\\
\hline
\texttt{0KB-8KB}   & 13 bits & \texttt{00}  & 3 bits\\
\texttt{8KB-64KB}  & 16 bits & \texttt{01}  & 6 bits\\
\texttt{64KB-1MB}  & 20 bits & \texttt{10}  & 10 bits\\
\texttt{1MB-128MB} & 27 bits & \texttt{110} & 18 bits\\
\texttt{128MB-4GB} & 32 bits & \texttt{111} & 23 bits\\
\end{tabular}
\caption{An example of offset coding.}
\label{ref_TableOffsCodes}
\end{center}
\end{table}

\item[Inode number coding.] If the approximate number of inodes on the
file system is known in advance, similar coding scheme may be exploited for
inode numbers providing JFFS3 may reuse deleted files' inode numbers.

\item[Common prefix compression.] In case of the trivial key scheme the first
field of any key is the inode number. Any other key scheme will likely also
contain the inode number in keys. If the inode number is the first component of
the key, all keys belonging to the same inode will go sequentially in indexing
nodes. To put it differently, there will be sequences of keys prefixed by the
same inode number in the tree.

%
% Common prefix compression example
%
\begin{figure}[h]
\begin{center}
\begin{htmlonly}
\includegraphics{pics/comprex-01.png}
\end{htmlonly}
%begin{latexonly}
\includegraphics[width=160mm,height=40mm]{pics/comprex-01.pdf}
%end{latexonly}
\end{center}
\caption{The common prefix compression idea illustration.}
\label{ref_FigureKeyComprEx_01}
\end{figure}

The evident compression method for these key sequences is to store the inode
number only once as the common prefix for the entire key sequence, instead of
duplicating it in every key. Figure \ref{ref_FigureKeyComprEx_01} illustrates
how does the prefix compression work.

\item[Offsets sequence compression.] In the trivial key scheme the last field
of data keys is the offset. The offset is multiple of RAM~page size. Obviously,
indexing nodes will contain sequences of keys each of which describes data
objects belonging to the same file but with different offsets. Moreover, the
keys will be ordered in increasing key order. 

For sequences like this it is possible to only specify the starting offset, the
ending offset and the number of keys in the sequence, instead of wasting space
storing the offset in each key of the sequence.

%
% Offsets sequence compression example
%
\begin{figure}[h]
\begin{center}
\begin{htmlonly}
\includegraphics{pics/comprex-02.png}
\end{htmlonly}
%begin{latexonly}
\includegraphics[width=160mm,height=42mm]{pics/comprex-02.pdf}
%end{latexonly}
\end{center}
\caption{The Offsets sequence compression idea illustration.}
\label{ref_FigureKeyComprEx_02}
\end{figure}

Figure \ref{ref_FigureKeyComprEx_02} presents an example of the offsets sequence
compression method. Four consecutive keys which describe four data objects
belonging to the same inode may be represented as a sequence of four keys
without the offset field, but prefixed by the starting offset, the ending
offset and the number of keys in the sequence.

\end{description}

Note, the above compression methods may be combined to achieve better
compression.

Because of compression, JFFS3 keys have variable size which means, that it is
impossible to directly apply the binary search algorithm to the contents of
indexing nodes. In JFFS3, indexing nodes are decompressed when read and are
cached in decompressed form. And after the indexing node has been decompressed,
the binary search algorithm is applicable.

We believe that keys compression will considerably reduce the amount of
\mbox{on-flash} indexing information and increase the overall performance just
because the amount of Input/Output will lessen. But only actual
\mbox{fixed-size} keys~vs.~\mbox{variable-size} keys tests will show if there
is some real performance gain present.

%
% LINKS
%
\subsection{Links} \label{ref_SectionKeys}

Links in JFFS3 have fixed length and are not compressed. The link width depends
on the size of JFFS3 partition~-- the larger is JFFS3 partition, the wider are
links. Instead of choosing a huge link width to suit the largest possible file
systems (e.g. 64~bits), JFFS3 admits of flexible links width, depending on
JFFS3 partition size.

As indexind nodes have fixed size equivalent to one sector, the width of links
stored in branch nodes and in the root nodes is

$$
w = log_2{S} - s.
$$
Twig nodes refer variable-size leaf nodes so the width of links stored twih
nodes is
$$
w = log_2{S},
$$
where $S$ is the size of the JFFS3 partition and $s$ is the size of sector.

