<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>General access method configuration</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="am_conf.html" title="Chapter 2. Access Method Configuration" />
<link rel="prev" href="am_conf_logrec.html" title="Logical record numbers" />
<link rel="next" href="bt_conf.html" title="Btree access method specific configuration" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 18.1.40</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">General access method configuration</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
<th width="60%" align="center">Chapter 2. Access Method Configuration </th>
<td width="20%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="general_am_conf"></a>General access method configuration</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_pagesize">Selecting a page size</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_cachesize">Selecting a cache size</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_byteorder">Selecting a byte order</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_dup">Duplicate data items</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="general_am_conf.html#am_conf_malloc">Non-local memory
allocation</a>
</span>
</dt>
</dl>
</div>
<p>
There are a series of configuration tasks which are common
to all access methods. They are described in the following
sections.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_pagesize"></a>Selecting a page size</h3>
</div>
</div>
</div>
<p>
The size of the pages used in the underlying database can
be specified by calling the <a href="../api_reference/C/dbset_pagesize.html" class="olink">DB->set_pagesize()</a> method. The
page size must be a power of two; the minimum is 512 bytes and the
maximum is 64K bytes. If no page size is
specified by the application, a page size is selected based on
the underlying filesystem I/O block size. (A page size
selected in this way has a lower limit of 512 bytes and an
upper limit of 16K bytes.)
</p>
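<p>
        For example, the following sketch (the helper function and
        file name are hypothetical) creates a Btree database with an
        8KB page size. Because the page size is recorded in the
        database when it is first created, DB->set_pagesize() must be
        called before DB->open():
      </p>
<pre class="programlisting">#include &lt;db.h&gt;

/*
 * Hypothetical helper: create or open a Btree database using an
 * 8KB page size.
 */
int
open_with_pagesize(DB **dbpp, const char *path)
{
    DB *dbp;
    int ret;

    if ((ret = db_create(&amp;dbp, NULL, 0)) != 0)
        return (ret);

    /*
     * DB->set_pagesize() must be called before DB->open(); the page
     * size takes effect only when the database file is first
     * created.
     */
    if ((ret = dbp->set_pagesize(dbp, 8 * 1024)) != 0 ||
        (ret = dbp->open(dbp,
        NULL, path, NULL, DB_BTREE, DB_CREATE, 0644)) != 0) {
        (void)dbp->close(dbp, 0);
        return (ret);
    }

    *dbpp = dbp;
    return (0);
}</pre>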
<p>
There are several issues to consider when selecting a
page size: overflow record sizes, locking, I/O efficiency, and
recoverability.
</p>
<p>
First, the page size implicitly sets the size of an
overflow record. Overflow records are key or data items that
are too large to fit on a normal database page because of
their size, and are therefore stored in overflow pages.
Overflow pages are pages that exist outside of the normal
database structure. For this reason, there is often a
significant performance penalty associated with retrieving or
modifying overflow records. Selecting a page size that is too
small, and which forces the creation of large numbers of
overflow pages, can seriously impact the performance of an
application.
</p>
<p>
Second, in the Btree, Hash and Recno access methods, the
finest-grained lock that Berkeley DB acquires is for a page.
(The Queue access method generally acquires record-level locks
rather than page-level locks.) Selecting a page size that is
too large, and which causes threads or processes to wait
because other threads of control are accessing or modifying
records on the same page, can impact the performance of your
application.
</p>
<p>
Third, the page size specifies the granularity of I/O from
the database to the operating system. Berkeley DB will give a
page-sized unit of bytes to the operating system to be
scheduled for reading/writing from/to the disk. For many
operating systems, there is an internal <span class="bold"><strong>
block size</strong></span> which is used as the granularity of
I/O from the operating system to the disk. Generally, it will
be more efficient for Berkeley DB to write filesystem-sized
blocks to the operating system and for the operating system to
write those same blocks to the disk.
</p>
<p>
Selecting a database page size smaller than the filesystem
block size may cause the operating system to coalesce or
otherwise manipulate Berkeley DB pages and can impact the
performance of your application. When the page size is smaller
than the filesystem block size and a page written by Berkeley
DB is not found in the operating system's cache, the operating
system may be forced to read a block from the disk, copy the
page into the block it read, and then write out the block to
disk, rather than simply writing the page to disk.
Additionally, as the operating system is reading more data
into its buffer cache than is strictly necessary to satisfy
each Berkeley DB request for a page, the operating system
buffer cache may be wasting memory.
</p>
<p>
Alternatively, selecting a page size larger than the
filesystem block size may cause the operating system to read
more data than necessary. On some systems, reading filesystem
blocks sequentially may cause the operating system to begin
performing read-ahead. If requesting a single database page
implies reading enough filesystem blocks to satisfy the
operating system's criteria for read-ahead, the operating
system may do more I/O than is required.
</p>
<p>
Fourth, when using the Berkeley DB Transactional Data Store
product, the page size may affect the errors from which your
database can recover. See <a class="xref" href="transapp_reclimit.html" title="Berkeley DB recoverability">Berkeley DB recoverability</a> for more information.
</p>
<div class="note" style="margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Note</h3>
<p>
The <a href="../api_reference/C/db_tuner.html" class="olink">db_tuner</a> utility suggests a page size for btree databases
that optimizes cache efficiency and storage space
requirements. This utility works only when given a
pre-populated database. So, it is useful when tuning an
existing application and not when first implementing an
application.
</p>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_cachesize"></a>Selecting a cache size</h3>
</div>
</div>
</div>
<p>
The size of the cache used for the underlying database can
be specified by calling the <a href="../api_reference/C/dbset_cachesize.html" class="olink">DB->set_cachesize()</a> method. Choosing
a cache size is, unfortunately, an art. Your cache must be at
least large enough for your working set plus some overlap for
unexpected situations.
</p>
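<p>
        As with the page size, the cache size must be configured
        before the database is opened. The following sketch (the
        helper function is hypothetical and the 64MB figure is purely
        illustrative) asks for the cache to be kept in a single cache
        region:
      </p>
<pre class="programlisting">#include &lt;db.h&gt;

/*
 * Hypothetical helper: create a database handle with a 64MB cache
 * in a single cache region.
 */
int
create_db_with_cache(DB **dbpp)
{
    DB *dbp;
    int ret;

    if ((ret = db_create(&amp;dbp, NULL, 0)) != 0)
        return (ret);

    /* The cache size must be configured before DB->open(). */
    if ((ret = dbp->set_cachesize(dbp, 0, 64 * 1024 * 1024, 1)) != 0) {
        dbp->err(dbp, ret, "DB->set_cachesize");
        (void)dbp->close(dbp, 0);
        return (ret);
    }

    *dbpp = dbp;
    return (0);
}</pre>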
<p>
When using the Btree access method, you must have a cache
big enough for the minimum working set for a single access.
This will include a root page, one or more internal pages
(depending on the depth of your tree), and a leaf page. If
your cache is any smaller than that, each new page will force
out the least-recently-used page, and Berkeley DB will re-read
the root page of the tree anew on each database
request.
</p>
<p>
If your keys are of moderate size (a few tens of bytes) and
your pages are on the order of 4KB to 8KB, most Btree
applications will need only three levels. For example, using
20-byte keys with 20 bytes of data associated with each key, an
8KB page can hold roughly 400 keys (or 200 key/data pairs).
With roughly 400 entries on each internal page, a fully
populated three-level Btree will hold about 400 × 400 × 200, or
32 million, key/data pairs, and a tree with only a 50%
page-fill factor will still hold 16 million key/data pairs. We
rarely expect
trees to exceed five levels, although Berkeley DB will support
trees up to 255 levels.
</p>
<p>
The rule-of-thumb is that cache is good, and more cache is
better. Generally, applications benefit from increasing the
cache size up to a point, at which the performance will stop
improving as the cache size increases. When this point is
reached, one of two things has happened: either the cache is
large enough that the application almost never has to
retrieve information from disk, or your application is doing
truly random accesses, and therefore increasing the size of the
cache doesn't significantly increase the odds of finding the
next requested information in the cache. The latter is fairly
rare -- almost all applications show some form of locality of
reference.
</p>
<p>
That said, it is important not to increase your cache size
beyond the capabilities of your system, as that will result in
reduced performance. Under many operating systems, tying down
enough virtual memory will cause your cache, and potentially
your program itself, to be swapped out. This is especially likely on
systems without unified OS buffer caches and virtual memory
spaces, as the buffer cache was allocated at boot time and so
cannot be adjusted based on application requests for large
amounts of virtual memory.
</p>
<p>
For example, even if accesses are truly random within a
Btree, your access pattern will favor internal pages over leaf
pages, so your cache should be large enough to hold all
internal pages. In the steady state, this requires at most one
I/O per operation to retrieve the appropriate leaf
page.
</p>
<p>
You can use the <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility to monitor the effectiveness of
your cache. The following output is excerpted from the output
of that utility's <span class="bold"><strong>-m</strong></span>
option:
</p>
<pre class="programlisting">prompt: db_stat -m
131072 Cache size (128K).
4273 Requested pages found in the cache (97%).
134 Requested pages not found in the cache.
18 Pages created in the cache.
116 Pages read into the cache.
93 Pages written from the cache to the backing file.
5 Clean pages forced from the cache.
13 Dirty pages forced from the cache.
0 Dirty buffers written by trickle-sync thread.
130 Current clean buffer count.
4 Current dirty buffer count.</pre>
<p>
The statistics for this cache say that 4,273 of the roughly
4,400 page requests were satisfied from the cache, and only 116
requests required a read from disk. This means that the cache is working well,
yielding a 97% cache hit rate. The <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility will present
these statistics both for the cache as a whole and for each
file within the cache separately.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_byteorder"></a>Selecting a byte order</h3>
</div>
</div>
</div>
<p>
Database files created by Berkeley DB can be created in
either little- or big-endian formats. The byte order used for
the underlying database is specified by calling the
<a href="../api_reference/C/dbset_lorder.html" class="olink">DB->set_lorder()</a> method. If no order is selected, the native
format of the machine on which the database is created will be
used.
</p>
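<p>
        For example, assuming dbp is a database handle that has not
        yet been opened, the following fragment forces the database
        to be created in big-endian byte order; passing 1234 instead
        selects little-endian order:
      </p>
<pre class="programlisting">/*
 * Force big-endian byte order for the database, regardless of the
 * native order of the machine creating it.  Use 1234 for
 * little-endian order; by default the machine's native order is
 * used.
 */
if ((ret = dbp->set_lorder(dbp, 4321)) != 0) {
    dbp->err(dbp, ret, "DB->set_lorder");
    return (ret);
}</pre>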
<p>
Berkeley DB databases are architecture independent, and any
format database can be used on a machine with a different
native format. In this case, each page that is read into or
written from the cache must be converted to or from the host
format, so databases with non-native formats will incur a
performance penalty for the run-time conversion.
</p>
<p>
<span class="bold"><strong>It is important to note that the
Berkeley DB access methods do no data conversion for
application specified data. Key/data pairs written on a
little-endian format architecture will be returned to the
application exactly as they were written when retrieved on
a big-endian format architecture.</strong></span>
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_dup"></a>Duplicate data items</h3>
</div>
</div>
</div>
<p>
The Btree and Hash access methods support the creation of
multiple data items for a single key item. By default,
multiple data items are not permitted, and each database store
operation will overwrite any previous data item for that key.
To configure Berkeley DB for duplicate data items, call the
<a href="../api_reference/C/dbset_flags.html" class="olink">DB->set_flags()</a> method with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag. Only one copy of
the key will be stored for each set of duplicate data items.
If the Btree access method comparison routine returns that two
keys compare equally, it is undefined which of the two keys
will be stored and returned from future database operations.
</p>
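<p>
        The following sketch (the helper function, file name, and
        key/data values are purely illustrative) configures a Btree
        database for duplicates and stores two data items under a
        single key:
      </p>
<pre class="programlisting">#include &lt;string.h&gt;
#include &lt;db.h&gt;

/* Hypothetical example: store two data items under one key. */
int
store_duplicates(void)
{
    DB *dbp;
    DBT key, data;
    int ret;

    if ((ret = db_create(&amp;dbp, NULL, 0)) != 0)
        return (ret);

    /*
     * Permit duplicate data items; DB_DUP must be configured before
     * the database is created.
     */
    if ((ret = dbp->set_flags(dbp, DB_DUP)) != 0 ||
        (ret = dbp->open(dbp,
        NULL, "dup.db", NULL, DB_BTREE, DB_CREATE, 0644)) != 0)
        goto err;

    memset(&amp;key, 0, sizeof(key));
    memset(&amp;data, 0, sizeof(data));
    key.data = "fruit";
    key.size = sizeof("fruit");

    /*
     * Two DB->put() calls with the same key now store two data
     * items instead of the second overwriting the first.
     */
    data.data = "apple";
    data.size = sizeof("apple");
    if ((ret = dbp->put(dbp, NULL, &amp;key, &amp;data, 0)) != 0)
        goto err;
    data.data = "pear";
    data.size = sizeof("pear");
    ret = dbp->put(dbp, NULL, &amp;key, &amp;data, 0);

err:    (void)dbp->close(dbp, 0);
    return (ret);
}</pre>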
<p>
By default, Berkeley DB stores duplicates in the order in
which they were added, that is, each new duplicate data item
will be stored after any already existing data items. This
default behavior can be overridden by using the <a href="../api_reference/C/dbcput.html" class="olink">DBC->put()</a>
method and one of the <a href="../api_reference/C/dbcput.html#dbcput_DB_AFTER" class="olink">DB_AFTER</a>, <a href="../api_reference/C/dbcput.html#dbcput_DB_BEFORE" class="olink">DB_BEFORE</a>, <a href="../api_reference/C/dbcput.html#dbcput_DB_KEYFIRST" class="olink">DB_KEYFIRST</a>
or <a href="../api_reference/C/dbcput.html#dbcput_DB_KEYLAST" class="olink">DB_KEYLAST</a> flags. Alternatively, Berkeley DB may be
configured to sort duplicate data items.
</p>
<p>
When stepping through the database sequentially, duplicate
data items will be returned individually, as a key/data pair,
where the key item only changes after the last duplicate data
item has been returned. For this reason, duplicate data items
cannot be accessed using the <a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> method, as it always
returns the first of the duplicate data items. Duplicate data
items should be retrieved using a Berkeley DB cursor interface
such as the <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> method.
</p>
<p>
There is a flag that permits applications to request the
following data item only if it <span class="bold"><strong>is</strong></span>
a duplicate data item of the current entry;
see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_DUP" class="olink">DB_NEXT_DUP</a> for more information. There is a flag that
permits applications to request the following data item only
if it <span class="bold"><strong>is not</strong></span> a duplicate data
item of the current entry; see <a href="../api_reference/C/dbcget.html#dbcget_DB_NEXT_NODUP" class="olink">DB_NEXT_NODUP</a> and
<a href="../api_reference/C/dbcget.html#dbcget_DB_PREV_NODUP" class="olink">DB_PREV_NODUP</a> for more information.
</p>
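<p>
        For example, the following sketch (the helper function and
        key value are illustrative) walks every data item stored for
        a single key, assuming dbp is an open database handle
        configured for duplicates:
      </p>
<pre class="programlisting">#include &lt;stdio.h&gt;
#include &lt;string.h&gt;
#include &lt;db.h&gt;

/* Hypothetical helper: print every data item stored for one key. */
int
print_duplicates(DB *dbp)
{
    DBC *dbcp;
    DBT key, data;
    int ret, t_ret;

    if ((ret = dbp->cursor(dbp, NULL, &amp;dbcp, 0)) != 0)
        return (ret);

    memset(&amp;key, 0, sizeof(key));
    memset(&amp;data, 0, sizeof(data));
    key.data = "fruit";
    key.size = sizeof("fruit");

    /*
     * DB_SET positions the cursor on the first data item for the
     * key; DB_NEXT_DUP then returns each following item only while
     * it is a duplicate of that key, returning DB_NOTFOUND once the
     * duplicates are exhausted.
     */
    for (ret = dbcp->get(dbcp, &amp;key, &amp;data, DB_SET);
        ret == 0;
        ret = dbcp->get(dbcp, &amp;key, &amp;data, DB_NEXT_DUP))
        printf("%.*s\n", (int)data.size, (char *)data.data);
    if (ret == DB_NOTFOUND)
        ret = 0;

    if ((t_ret = dbcp->close(dbcp)) != 0 &amp;&amp; ret == 0)
        ret = t_ret;
    return (ret);
}</pre>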
<p>
It is also possible to maintain duplicate records in sorted
order. Sorting duplicates will significantly increase
performance when searching them and performing equality joins
— both of which are common operations when using
secondary indices. To configure Berkeley DB to sort duplicate
data items, the application must call the <a href="../api_reference/C/dbset_flags.html" class="olink">DB->set_flags()</a> method
with the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag. Note that <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a>
automatically turns on the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> flag for you, so you do
not have to also set that flag; however, it is not an error to
also set <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUP" class="olink">DB_DUP</a> when configuring for sorted duplicate
records.
</p>
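<p>
        For example, assuming dbp is a database handle that has not
        yet been opened, configuring sorted duplicates is a single
        call:
      </p>
<pre class="programlisting">/*
 * Configure sorted duplicates; this must be done before the
 * database is created.  DB_DUPSORT implies DB_DUP, so DB_DUP does
 * not need to be set separately.
 */
if ((ret = dbp->set_flags(dbp, DB_DUPSORT)) != 0) {
    dbp->err(dbp, ret, "DB->set_flags: DB_DUPSORT");
    return (ret);
}</pre>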
<p>
When configuring sorted duplicate records, you can also
specify a custom comparison function using the
<a href="../api_reference/C/dbset_dup_compare.html" class="olink">DB->set_dup_compare()</a> method. If the <a href="../api_reference/C/dbset_flags.html#dbset_flags_DB_DUPSORT" class="olink">DB_DUPSORT</a> flag is given,
but no comparison routine is specified, then Berkeley DB
defaults to the same lexicographical sorting used for Btree
keys, with shorter items collating before longer items.
</p>
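<p>
        The following sketch shows one way such a routine might be
        installed. The data layout is an assumption made for this
        example, and the exact comparison-callback prototype
        (including the unused locp parameter shown here) has varied
        between releases, so consult the DB->set_dup_compare()
        documentation for your release:
      </p>
<pre class="programlisting">#include &lt;string.h&gt;
#include &lt;db.h&gt;

/*
 * Order duplicate data items by a 32-bit unsigned integer stored at
 * the start of each data item (an assumption for this example).
 */
static int
compare_dup_u32(DB *dbp, const DBT *a, const DBT *b, size_t *locp)
{
    u_int32_t ai, bi;

    (void)dbp;
    (void)locp;             /* Currently unused; see the API reference. */
    memcpy(&amp;ai, a->data, sizeof(ai));
    memcpy(&amp;bi, b->data, sizeof(bi));
    return (ai &lt; bi ? -1 : (ai > bi ? 1 : 0));
}

/* Hypothetical helper: configure a handle for sorted duplicates. */
static int
configure_sorted_dups(DB *dbp)
{
    int ret;

    /* Both calls must precede DB->open(). */
    if ((ret = dbp->set_flags(dbp, DB_DUPSORT)) != 0)
        return (ret);
    return (dbp->set_dup_compare(dbp, compare_dup_u32));
}</pre>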
<p>
If the duplicate data items are unsorted, applications may
store identical duplicate data items, or, for those that just
like the way it sounds, <span class="emphasis"><em>duplicate
duplicates</em></span>.
</p>
<p>
<span class="bold"><strong>It is an error to attempt to store
identical duplicate data items when duplicates are being
stored in a sorted order.</strong></span> Any such attempt
results in the error message "Duplicate data items are not
supported with sorted data" with a
<code class="literal">DB_KEYEXIST</code> return code.
</p>
<p>
Note that you can suppress the error message "Duplicate
data items are not supported with sorted data" by using the
<a href="../api_reference/C/dbput.html#put_DB_NODUPDATA" class="olink">DB_NODUPDATA</a> flag. Use of this flag does not change the
database's basic behavior; storing duplicate data items in a
database configured for sorted duplicates is still an error
and so you will continue to receive the
<code class="literal">DB_KEYEXIST</code> return code if you try to
do that.
</p>
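<p>
        For example, assuming dbp is an open database handle
        configured with DB_DUPSORT, and key and data are initialized
        DBTs, an application can use DB_NODUPDATA to suppress the
        error message while still treating an already-present item as
        a non-fatal condition:
      </p>
<pre class="programlisting">switch (ret = dbp->put(dbp, NULL, &amp;key, &amp;data, DB_NODUPDATA)) {
case 0:                     /* The key/data pair was newly stored. */
    break;
case DB_KEYEXIST:           /* The pair was already present. */
    break;
default:                    /* All other returns are real errors. */
    dbp->err(dbp, ret, "DB->put");
    break;
}</pre>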
<p>
For further information on how searching and insertion
behaves in the presence of duplicates (sorted or not), see the
<a href="../api_reference/C/dbget.html" class="olink">DB->get()</a> <a href="../api_reference/C/dbput.html" class="olink">DB->put()</a>, <a href="../api_reference/C/dbcget.html" class="olink">DBC->get()</a> and <a href="../api_reference/C/dbcput.html" class="olink">DBC->put()</a> documentation.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="am_conf_malloc"></a>Non-local memory
allocation</h3>
</div>
</div>
</div>
<p>
Berkeley DB allocates memory for returning key/data pairs
and statistical information, which becomes the responsibility
of the application. There are also interfaces where an
application will allocate memory, which becomes the
responsibility of Berkeley DB.
</p>
<p>
On systems in which there may be multiple library versions
of the standard allocation routines (notably Windows NT),
transferring memory between the library and the application
will fail because the Berkeley DB library allocates memory
from a different heap than the application uses to free it, or
vice versa. To avoid this problem, the <a href="../api_reference/C/envset_alloc.html" class="olink">DB_ENV->set_alloc()</a> and
<a href="../api_reference/C/dbset_alloc.html" class="olink">DB->set_alloc()</a> methods can be used to give Berkeley DB
references to the application's allocation routines.
</p>
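<p>
        For example, the following sketch (the helper function is
        hypothetical) points the library at the C runtime allocator
        the application itself links against, so that memory
        allocated by Berkeley DB can safely be freed by the
        application, and vice versa:
      </p>
<pre class="programlisting">#include &lt;stdlib.h&gt;
#include &lt;db.h&gt;

/*
 * Hypothetical helper: create an environment handle that shares the
 * application's allocation routines.
 */
int
create_env_with_alloc(DB_ENV **dbenvp)
{
    DB_ENV *dbenv;
    int ret;

    if ((ret = db_env_create(&amp;dbenv, 0)) != 0)
        return (ret);

    /*
     * Hand the application's allocation routines to the library;
     * this must be done before the environment is opened.
     */
    if ((ret = dbenv->set_alloc(dbenv, malloc, realloc, free)) != 0) {
        dbenv->err(dbenv, ret, "DB_ENV->set_alloc");
        (void)dbenv->close(dbenv, 0);
        return (ret);
    }

    *dbenvp = dbenv;
    return (0);
}</pre>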
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="am_conf_logrec.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="am_conf.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="bt_conf.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Logical record numbers </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Btree access method specific configuration</td>
</tr>
</table>
</div>
</body>
</html>