Sindbad~EG File Manager
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Selecting an access method</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="am_conf.html" title="Chapter 2. Access Method Configuration" />
<link rel="prev" href="am_conf.html" title="Chapter 2. Access Method Configuration" />
<link rel="next" href="am_conf_logrec.html" title="Logical record numbers" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 18.1.40</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Selecting an access method</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="am_conf.html">Prev</a> </td>
<th width="60%" align="center">Chapter 2. Access Method Configuration </th>
<td width="20%" align="right"> <a accesskey="n" href="am_conf_logrec.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="am_conf_select"></a>Selecting an access method</h2>
</div>
</div>
</div>
<div class="toc">
<dl>
<dt>
<span class="sect2">
<a href="am_conf_select.html#idm140654542607664">Btree or Heap?</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="am_conf_select.html#idm140654543637344">Hash or Btree?</a>
</span>
</dt>
<dt>
<span class="sect2">
<a href="am_conf_select.html#idm140654544240336">Queue or Recno?</a>
</span>
</dt>
</dl>
</div>
<p>
The Berkeley DB access method implementation unavoidably
interacts with each application's data set, locking
requirements and data access patterns. For this reason, one
access method may result in dramatically better performance
for an application than another one. Applications whose data
could be stored using more than one access method may want to
benchmark their performance using the different candidates.
</p>
<p>
One of the strengths of Berkeley DB is that it provides
multiple access methods with nearly identical interfaces to
the different access methods. This means that it is simple to
modify an application to use a different access method.
Applications can easily benchmark the different Berkeley DB
access methods against each other for their particular data
set and access pattern.
</p>
<p>
Most applications choose between using the Btree or Heap
access methods, between Btree or Hash, or between Queue and
Recno, because each of these pairs offer similar
functionality.
</p>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm140654542607664"></a>Btree or Heap?</h3>
</div>
</div>
</div>
<p>
Most applications use Btree because it performs well
for most general-purpose database workloads. But there are
circumstances where Heap is the better choice. This
section describes the differences between the two access
methods so that you can better understand when Heap might
be the superior choice for your application.
</p>
<p>
Before continuing, it is helpful to have a high level
understanding of the operating differences between Btree
and Heap.
</p>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="idm140654541946176"></a>Disk Space Usage</h4>
</div>
</div>
</div>
<p>
The Heap access method was developed for use in
systems with constrained disk space (such as an
embedded system). Because of the way it reuses page
space, for some workloads it can be much better than
Btree on disk space usage because it will not grow the
on-disk database file as fast as Btree. Of course,
this assumes that your application is characterized by
a roughly equal number of record creations and
deletions.
</p>
<p>
Also, Heap can actively control the space used by
the database with the use of the <a href="../api_reference/C/dbset_heapsize.html" class="olink">DB->set_heapsize()</a>
method. When the limit specified by that method is
reached, no additional pages will be allocated and
existing pages will be aggressively searched for free
space. Also records in the heap can be split to fill
space on two or more pages.
</p>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="idm140654543271280"></a>Record Access</h4>
</div>
</div>
</div>
<p>
Btree and Heap are fundamentally different because
of the way that you access records in them. In Btree,
you access a record by using the record's key. This
lookup occurs fairly quickly because Btree places
records in the database according to a pre-defined
sorting order. Your application is responsible for
constructing the key, which means that it is
relatively easy for your application to know what key
is in use by any given record.
</p>
<p>
Conversely, Heap accesses records based on their
offset location within the database. You retrieve a
record in a Heap database using the record's Record ID
(RID), which is created when the record is added to
the database. The RID is created for you; you cannot
specify this yourself. Because the RID is created for
you, your application does not have control over the
key value. For this reason, retrieval operations for a
Heap database are usually performed using secondary
databases. You can then use this secondary index to
retrieve records stored in your Heap database.
</p>
<p>
Note that an application's data access requirements
grow complex, Btree databases also frequently require
secondary databases. So at a certain level of
complexity you will be using secondary databases
regardless of the access method that you choose.
</p>
<p>
Secondary databases are described in <a class="xref" href="am_second.html" title="Secondary indexes">Secondary indexes</a>.
</p>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="idm140654543535792"></a>Record Creation/Deletion</h4>
</div>
</div>
</div>
<p>
When Btree creates a new record, it places the
record on the database page that is appropriate for
the sorting order in use by the database. If Btree can
not find a page to put the new record on, it locates a
page that is in the proper location for the new
record, splits it so that the existing records are
divided between the two pages, and then adds the new
record to the appropriate page.
</p>
<p>
On deletion, Btree simply removes the deleted
record from whatever page it is stored on. This leaves
some amount of unused space ("holes") on the page.
Only new records that sort to this page can fill that
space. However, once a page is completely empty, it
can be reused to hold records with a different sort
value.
</p>
<p>
In order to reclaim unused disk space, you must run
the <a href="../api_reference/C/dbcompact.html" class="olink">DB->compact()</a> method, which attempts to fill holes
in existing pages by moving records from other pages.
If it is successful in moving enough records, it might
be left with entire pages that have no data on them.
In this event, the unused pages might be removed from
the database (depending on the flags that you provide
to <a href="../api_reference/C/dbcompact.html" class="olink">DB->compact()</a>), which causes the database file to be
reduced in size.
</p>
<p>
Both tree searching and page compaction are
relatively expensive operations. Heap avoids these
operations, and so is able to perform better under
some circumstances.
</p>
<p>
Heap does not care about record order. When a
record is created in a Heap database, it is placed on
the first page that has space to store the record. No
sorting is involved, and so the overhead from record
sorting is removed.
</p>
<p>
On deletion, both Btree and Heap free space within
a page when a record is deleted. However, unlike
Btree, Heap has no compaction operation, nor does it
have to wait for a record with the proper sort order
to fill a hole on a page. Instead, Heap simply reuses
empty page space whenever any record is added that
will fit into the space.
</p>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="idm140654543497664"></a>Cursor Operations</h4>
</div>
</div>
</div>
<p>
When considering Heap, be aware that this access
method does not support the full range of cursor
operations that Btree does.
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
On sequential cursor scans of the database,
the retrieval order of the records is not
predictable for Heap because the records are
not sorted. Btree, of course, sorts its
records so the retrieval order is predictable.
</p>
</li>
<li>
<p>
When using a Heap database, you cannot
create new records using a cursor. Also, this
means that Heap does not support the <a href="../api_reference/C/dbcput.html" class="olink">DBC->put()</a>
<code class="literal">DB_AFTER</code> and
<code class="literal">DB_BEFORE</code> flags. You
can, however, update existing records using a
cursor.
</p>
</li>
<li>
<p>
For concurrent applications, iterating
through the records in a Heap database is not
recommended due to performance considerations.
This is because there is a good chance that
there are a lot of empty pages in the database
if you have a concurrent application.
</p>
<p>
For a Heap database, entire regions are
locked when a lock is acquired for a database
page. If there is then contention for that
region, and a new database page needs to be
added, then Berkeley DB simply creates a whole
new region. The locked region is then padded
with empty pages in order to reach the new
region.
</p>
<p>
The result is that if the last used page in
a region is 10, and a new region is created at
page 100, then there are empty pages from 11
to 99. If you are iterating with a cursor,
then all those empty pages must be examined by
the cursor before it can reach the data at
page 100.
</p>
</li>
</ul>
</div>
</div>
<div class="sect3" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h4 class="title"><a id="idm140654543671840"></a>Which Access Method Should You Use?</h4>
</div>
</div>
</div>
<p>
Ultimately, you can only determine which access
method is superior for your application through
performance testing using both access methods. To be
effective, this performance testing must use a
production-equivalent workload.
</p>
<p>
That said, there are a few times when you
absolutely must use Btree:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
If you want to use bulk put and get
operations.
</p>
</li>
<li>
<p>
If having your database clustered on sort
order is important to you.
</p>
</li>
<li>
<p>
If you want to be able to create records
using cursors.
</p>
</li>
<li>
<p>
If you have multiple threads/processes
simultaneously creating new records, and you
want to be able to efficiently iterate over
those records using a cursor.
</p>
</li>
</ul>
</div>
<p>
But beyond those limitations, there are some
application characteristics that should cause you to
suspect that Heap will work better for your
application than Btree. They are:
</p>
<div class="itemizedlist">
<ul type="disc">
<li>
<p>
Your application will run in an environment
with constrained resources and you want to set
a hard limit on the size of the database file.
</p>
</li>
<li>
<p>
You want to limit the disk space growth of
your database file, and your application
performs a roughly equivalent number of record
creations and deletions.
</p>
</li>
<li>
<p>
Inserts into a Btree require sorting the
new record onto its proper page. This
operation can require multiple page reads. A
Heap database can simply reuse whatever empty
page space it can find in the cache.
Insert-intensive applications will typically
find that Heap is much more efficient than
Btree, especially as the size of the database
increases.
</p>
</li>
</ul>
</div>
</div>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm140654543637344"></a>Hash or Btree?</h3>
</div>
</div>
</div>
<p>
The Hash and Btree access methods should be used when
logical record numbers are not the primary key used for
data access. (If logical record numbers are a secondary
key used for data access, the Btree access method is a
possible choice, as it supports simultaneous access by a
key and a record number.)
</p>
<p>
Keys in Btrees are stored in sorted order and the
relationship between them is defined by that sort order.
For this reason, the Btree access method should be used
when there is any locality of reference among keys.
Locality of reference means that accessing one particular
key in the Btree implies that the application is more
likely to access keys near to the key being accessed,
where "near" is defined by the sort order. For example, if
keys are timestamps, and it is likely that a request for
an 8AM timestamp will be followed by a request for a 9AM
timestamp, the Btree access method is generally the right
choice. Or, for example, if the keys are names, and the
application will want to review all entries with the same
last name, the Btree access method is again a good choice.
</p>
<p>
There is little difference in performance between the
Hash and Btree access methods on small data sets, where
all, or most of, the data set fits into the cache.
However, when a data set is large enough that significant
numbers of data pages no longer fit into the cache, then
the Btree locality of reference described previously
becomes important for performance reasons. For example,
there is no locality of reference for the Hash access
method, and so key "AAAAA" is as likely to be stored on
the same database page with key "ZZZZZ" as with key
"AAAAB". In the Btree access method, because items are
sorted, key "AAAAA" is far more likely to be near key
"AAAAB" than key "ZZZZZ". So, if the application exhibits
locality of reference in its data requests, then the Btree
page read into the cache to satisfy a request for key
"AAAAA" is much more likely to be useful to satisfy
subsequent requests from the application than the Hash
page read into the cache to satisfy the same request. This
means that for applications with locality of reference,
the cache is generally much more effective for the Btree
access method than the Hash access method, and the Btree
access method will make many fewer I/O calls.
</p>
<p>
However, when a data set becomes even larger, the Hash
access method can outperform the Btree access method. The
reason for this is that Btrees contain more metadata pages
than Hash databases. The data set can grow so large that
metadata pages begin to dominate the cache for the Btree
access method. If this happens, the Btree can be forced to
do an I/O for each data request because the probability
that any particular data page is already in the cache
becomes quite small. Because the Hash access method has
fewer metadata pages, its cache stays "hotter" longer in
the presence of large data sets. In addition, once the
data set is so large that both the Btree and Hash access
methods are almost certainly doing an I/O for each random
data request, the fact that Hash does not have to walk
several internal pages as part of a key search becomes a
performance advantage for the Hash access method as well.
</p>
<p>
Application data access patterns strongly affect all of
these behaviors, for example, accessing the data by
walking a cursor through the database will greatly
mitigate the large data set behavior describe above
because each I/O into the cache will satisfy a fairly
large number of subsequent data requests.
</p>
<p>
In the absence of information on application data and
data access patterns, for small data sets either the Btree
or Hash access methods will suffice. For data sets larger
than the cache, we normally recommend using the Btree
access method. If you have truly large data, then the Hash
access method may be a better choice. The <a href="../api_reference/C/db_stat.html" class="olink">db_stat</a> utility is a
useful tool for monitoring how well your cache is
performing.
</p>
</div>
<div class="sect2" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h3 class="title"><a id="idm140654544240336"></a>Queue or Recno?</h3>
</div>
</div>
</div>
<p>
The Queue or Recno access methods should be used when
logical record numbers are the primary key used for data
access. The advantage of the Queue access method is that
it performs record level locking and for this reason
supports significantly higher levels of concurrency than
the Recno access method. The advantage of the Recno access
method is that it supports a number of additional features
beyond those supported by the Queue access method, such as
variable-length records and support for backing flat-text
files.
</p>
<p>
Logical record numbers can be mutable or fixed:
mutable, where logical record numbers can change as
records are deleted or inserted, and fixed, where record
numbers never change regardless of the database operation.
It is possible to store and retrieve records based on
logical record numbers in the Btree access method.
However, those record numbers are always mutable, and as
records are deleted or inserted, the logical record number
for other records in the database will change. The Queue
access method always runs in fixed mode, and logical
record numbers never change regardless of the database
operation. The Recno access method can be configured to
run in either mutable or fixed mode.
</p>
<p>
In addition, the Recno access method provides support
for databases whose permanent storage is a flat text file
and the database is used as a fast, temporary storage area
while the data is being read or modified.
</p>
</div>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="am_conf.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="am_conf.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="am_conf_logrec.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Chapter 2. Access Method Configuration </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Logical record numbers</td>
</tr>
</table>
</div>
</body>
</html>
Sindbad File Manager Version 1.0, Coded By Sindbad EG ~ The Terrorists