Sindbad~EG File Manager
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Architecting Data Store and Concurrent Data Store applications</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="cam.html" title="Chapter 10. Berkeley DB Data Store and Concurrent Data Store Applications" />
<link rel="prev" href="cam_fail.html" title="Handling failure in Data Store and Concurrent Data Store applications" />
<link rel="next" href="gsg_bdb.html" title="Quick Start Guide" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 18.1.40</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Architecting Data Store and Concurrent Data Store applications</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="cam_fail.html">Prev</a> </td>
<th width="60%" align="center">Chapter 10. Berkeley DB Data Store and Concurrent Data Store Applications</th>
<td width="20%" align="right"> <a accesskey="n" href="gsg_bdb.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="cam_app"></a>Architecting Data Store and Concurrent Data Store applications</h2>
</div>
</div>
</div>
<p>
When building Data Store and Concurrent Data Store
applications, the architecture decisions involve application
startup (cleaning up any existing databases, the removal of
any existing database environment and creation of a new
environment), and handling system or application failure.
"Cleaning up" databases involves removal and re-creation of
the database, restoration from an archival copy and/or
verification and optional salvage, as described in <a class="xref" href="cam_fail.html" title="Handling failure in Data Store and Concurrent Data Store applications">Handling failure in Data Store and Concurrent Data Store applications</a>.
</p>
<p>
Data Store or Concurrent Data Store applications without
database environments are single process, by definition. These
applications should start up, re-create, restore, or verify
and optionally salvage their databases and run until eventual
exit or application or system failure. After system or
application failure, that process can simply repeat this
procedure. This document will not discuss the case of these
applications further.
</p>
<p>
Otherwise, the first question of Data Store and Concurrent
Data Store architecture is the cleaning up existing databases
and the removal of existing database environments, and the
subsequent creation of a new environment. For obvious reasons,
the application must serialize the re-creation, restoration,
or verification and optional salvage of its databases.
Further, environment removal and creation must be
single-threaded, that is, one thread of control (where a
thread of control is either a true thread or a process) must
remove and re-create the environment before any other thread
of control can use the new environment. It may simplify
matters that Berkeley DB serializes creation of the
environment, so multiple threads of control attempting to
create a environment will serialize behind a single creating
thread.
</p>
<p>
Removing a database environment will first mark the
environment as "failed", causing any threads of control still
running in the environment to fail and return to the
application. This feature allows applications to remove
environments without concern for threads of control that might
still be running in the removed environment.
</p>
<p>
One consideration in removing a database environment which
may be in use by another thread, is the type of mutex being
used by the Berkeley DB library. In the case of database
environment failure when using test-and-set mutexes, threads
of control waiting on a mutex when the environment is marked
"failed" will quickly notice the failure and will return an
error from the Berkeley DB API. In the case of environment
failure when using blocking mutexes, where the underlying
system mutex implementation does not unblock mutex waiters
after the thread of control holding the mutex dies, threads
waiting on a mutex when an environment is recovered might hang
forever. Applications blocked on events (for example, an
application blocked on a network socket or a GUI event) may
also fail to notice environment recovery within a reasonable
amount of time. Systems with such mutex implementations are
rare, but do exist; applications on such systems should use an
application architecture where the thread recovering the
database environment can explicitly terminate any process
using the failed environment, or configure Berkeley DB for
test-and-set mutexes, or incorporate some form of long-running
timer or watchdog process to wake or kill blocked processes
should they block for too long.
</p>
<p>
Regardless, it makes little sense for multiple threads of
control to simultaneously attempt to remove and re-create a
environment, since the last one to run will remove all
environments created by the threads of control that ran before
it. However, for some few applications, it may make sense for
applications to have a single thread of control that checks
the existing databases and removes the environment, after
which the application launches a number of processes, any of
which are able to create the environment.
</p>
<p>
With respect to cleaning up existing databases, the database
environment must be removed before the databases are cleaned
up. Removing the environment causes any Berkeley DB library
calls made by threads of control running in the failed
environment to return failure to the application. Removing the
database environment first ensures the threads of control in
the old environment do not race with the threads of control
cleaning up the databases, possibly overwriting them after the
cleanup has finished. Where the application architecture and
system permit, many applications kill all threads of control
running in the failed database environment before removing the
failed database environment, on general principles as well as
to minimize overall system resource usage. It does not matter
if the new environment is created before or after the
databases are cleaned up.
</p>
<p>
After having dealt with database and database environment
recovery after failure, the next issue to manage is
application failure. As described in <a class="xref" href="cam_fail.html" title="Handling failure in Data Store and Concurrent Data Store applications">Handling failure in Data Store and Concurrent Data Store applications</a>, when a thread of control in a
Data Store or Concurrent Data Store application fails, it may
exit holding data structure mutexes or logical database locks.
These mutexes and locks must be released to avoid the
remaining threads of control hanging behind the failed thread
of control's mutexes or locks.
</p>
<p>
There are three common ways to architect Berkeley DB Data
Store and Concurrent Data Store applications. The one chosen
is usually based on whether or not the application is
comprised of a single process or group of processes descended
from a single process (for example, a server started when the
system first boots), or if the application is comprised of
unrelated processes (for example, processes started by web
connections or users logging into the system).
</p>
<div class="orderedlist">
<ol type="1">
<li>
The first way to architect Data Store and Concurrent
Data Store applications is as a single process (the
process may or may not be multithreaded.)
<p>
When this
process starts, it removes any existing database
environment and creates a new environment. It then
cleans up the databases and opens those databases in
the environment. The application can subsequently
create new threads of control as it chooses. Those
threads of control can either share already open
Berkeley DB <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> and <a href="../api_reference/C/db.html" class="olink">DB</a> handles, or create their
own. In this architecture, databases are rarely opened
or closed when more than a single thread of control is
running; that is, they are opened when only a single
thread is running, and closed after all threads but
one have exited. The last thread of control to exit
closes the databases and the database
environment.
</p><p>
This architecture is simplest to implement because
thread serialization is easy and failure detection
does not require monitoring multiple processes.
</p><p>
If the application's thread model allows the process
to continue after thread failure, the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>
method can be used to determine if the database
environment is usable after the failure. If the
application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or
<a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>,
the application must behave as if there has been a system
failure, removing the environment and creating a new
environment, and cleaning up any databases it wants to
continue to use. Once these actions have been taken, other
threads of control can continue (as long as all existing
Berkeley DB handles are first discarded), or restarted.
</p><p>
Note that by default <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> will only notify the
calling thread that the database environment is unusable.
However, you can optionally cause <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> to broadcast
this to other threads of control by using the
<code class="literal">--enable-failchk_broadcast</code> flag when you
compile your Berkeley DB library. If this option is turned
on, then all threads of control using the database
environment will return
<a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>
when they attempt to obtain a mutex lock. In this
situation, a <code class="literal">DB_EVENT_FAILCHK_PANIC</code> or
<code class="literal">DB_EVENT_MUTEX_DIED</code> event will also be
raised. (You use <a href="../api_reference/C/envevent_notify.html" class="olink">DB_ENV->set_event_notify()</a> to examine events).
</p></li>
<li>
The second way to architect Data Store and
Concurrent Data Store applications is as a group of
related processes (the processes may or may not be
multithreaded).
<p>
This architecture requires the order
in which threads of control are created be controlled
to serialize database environment removal and
creation, and database cleanup.
</p><p>
In addition, this architecture requires that threads
of control be monitored. If any thread of control
exits with open Berkeley DB handles, the application
may call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method to determine if the
database environment is usable after the exit. If the
application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>, or
<a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>,
the application must
behave as if there has been a system failure, removing
the environment and creating a new environment, and
cleaning up any databases it wants to continue to use.
Once these actions have been taken, other threads of
control can continue (as long as all existing Berkeley
DB handles are first discarded), or restarted.
</p><p>
The easiest way to structure groups of related
processes is to first create a single "watcher"
process (often a script) that starts when the system
first boots, removes and creates the database
environment, cleans up the databases and then creates
the processes or threads that will actually perform
work. The initial thread has no further
responsibilities other than to wait on the threads of
control it has started, to ensure none of them
unexpectedly exit. If a thread of control exits, the
watcher process optionally calls the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>
method. If the application does not call <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a>,
or if <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>, the environment can no
longer be used, the watcher kills all of the threads
of control using the failed environment, cleans up,
and starts new threads of control to perform
work.
</p></li>
<li>
The third way to architect Data Store and Concurrent
Data Store applications is as a group of unrelated
processes (the processes may or may not be multithreaded).
This is the most difficult architecture to implement
because of the level of difficulty in some systems of
finding and monitoring unrelated processes.
<p>
One
solution is to log a thread of control ID when a new
Berkeley DB handle is opened. For example, an initial
"watcher" process could open/create the database
environment, clean up the databases and then create a
sentinel file. Any "worker" process wanting to use the
environment would check for the sentinel file. If the
sentinel file does not exist, the worker would fail or
wait for the sentinel file to be created. Once the
sentinel file exists, the worker would register its
process ID with the watcher (via shared memory, IPC or
some other registry mechanism), and then the worker
would open its <a href="../api_reference/C/env.html" class="olink">DB_ENV</a> handles and proceed. When the
worker finishes using the environment, it would
unregister its process ID with the watcher. The
watcher periodically checks to ensure that no worker
has failed while using the environment. If a worker
fails while using the environment, the watcher removes
the sentinel file, kills all of the workers currently
using the environment, cleans up the environment and
databases, and finally creates a new sentinel
file.
</p><p>
The weakness of this approach is that, on some
systems, it is difficult to determine if an unrelated
process is still running. For example, POSIX systems
generally disallow sending signals to unrelated
processes. The trick to monitoring unrelated processes
is to find a system resource held by the process that
will be modified if the process dies. On POSIX
systems, flock- or fcntl-style locking will work, as
will LockFile on Windows systems. Other systems may
have to use other process-related information such as
file reference counts or modification times. In the
worst case, threads of control can be required to
periodically re-register with the watcher process: if
the watcher has not heard from a thread of control in
a specified period of time, the watcher will take
action, cleaning up the environment.
</p><p>
If it is not practical to monitor the processes
sharing a database environment, it may be possible to
monitor the environment to detect if a thread of
control has failed holding open Berkeley DB handles.
This would be done by having a "watcher" process
periodically call the <a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> method. If
<a href="../api_reference/C/envfailchk.html" class="olink">DB_ENV->failchk()</a> returns <a class="link" href="program_errorret.html#program_errorret.DB_RUNRECOVERY">DB_RUNRECOVERY</a>,
the watcher would then
take action, cleaning up the environment.
</p><p>
The weakness of this approach is that all threads of
control using the environment must specify an "ID"
function and an "is-alive" function using the
<a href="../api_reference/C/envset_thread_id.html" class="olink">DB_ENV->set_thread_id()</a> method. (In other words, the
Berkeley DB library must be able to assign a unique ID
to each thread of control, and additionally determine
if the thread of control is still running. It can be
difficult to portably provide that information in
applications using a variety of different programming
languages and running on a variety of different
platforms.)
</p></li>
</ol>
</div>
<p>
Obviously, when implementing a process to monitor other
threads of control, it is important the watcher process' code
be as simple and well-tested as possible, because the
application may hang if it fails.
</p>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="cam_fail.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="cam.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="gsg_bdb.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Handling failure in Data Store and Concurrent Data Store applications </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Quick Start Guide</td>
</tr>
</table>
</div>
</body>
</html>
Sindbad File Manager Version 1.0, Coded By Sindbad EG ~ The Terrorists