<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Network partitions</title>
<link rel="stylesheet" href="gettingStarted.css" type="text/css" />
<meta name="generator" content="DocBook XSL Stylesheets V1.73.2" />
<link rel="start" href="index.html" title="Berkeley DB Programmer's Reference Guide" />
<link rel="up" href="rep.html" title="Chapter 12. Berkeley DB Replication" />
<link rel="prev" href="rep_twosite.html" title="Special considerations for two-site replication groups" />
<link rel="next" href="rep_faq.html" title="Replication FAQ" />
</head>
<body>
<div xmlns="" class="navheader">
<div class="libver">
<p>Library Version 18.1.40</p>
</div>
<table width="100%" summary="Navigation header">
<tr>
<th colspan="3" align="center">Network partitions</th>
</tr>
<tr>
<td width="20%" align="left"><a accesskey="p" href="rep_twosite.html">Prev</a> </td>
<th width="60%" align="center">Chapter 12. Berkeley DB Replication </th>
<td width="20%" align="right"> <a accesskey="n" href="rep_faq.html">Next</a></td>
</tr>
</table>
<hr />
</div>
<div class="sect1" lang="en" xml:lang="en">
<div class="titlepage">
<div>
<div>
<h2 class="title" style="clear: both"><a id="rep_partition"></a>Network partitions</h2>
</div>
</div>
</div>
<p>
The Berkeley DB replication implementation can be affected
by network partitioning problems.
</p>
<p>
For example, consider a replication group with N members.
The network partitions with the master on one side and more
than N/2 of the sites on the other side. The sites on the side
with the master will continue forward, and the master will
continue to accept write queries for the databases.
Unfortunately, the sites on the other side of the partition,
realizing they no longer have a master, will hold an election.
The election will succeed as there are more than N/2 of the
total sites participating, and there will then be two masters
for the replication group. Since both masters are potentially
accepting write queries, the databases could diverge in
incompatible ways.
</p>
<p>
If multiple masters are ever found to exist in a replication
group, a master detecting the problem will return
<a href="../api_reference/C/repmessage.html#repmsg_DB_REP_DUPMASTER" class="olink">DB_REP_DUPMASTER</a>. Replication Manager applications
automatically handle duplicate master situations. If a Base
API application sees this return, it should reconfigure itself
as a client (by calling <a href="../api_reference/C/repstart.html" class="olink">DB_ENV->rep_start()</a>), and then call for an
election (by calling <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a>). The site that wins the
election may be one of the two previous masters, or it may be
another site entirely. Regardless, the winning system will
bring all of the other systems into conformance.
</p>
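<p>
For a Base API application, a minimal sketch of this recovery
sequence might look like the following. The surrounding message
loop and the group size of 5 are assumptions made for
illustration, and error handling is abbreviated.
</p>
<pre class="programlisting">#include &lt;db.h&gt;

/*
 * Sketch of duplicate-master handling in a Base API message loop.
 * The group size (5) is an assumption; error handling is omitted.
 */
void
handle_rep_message(DB_ENV *dbenv, DBT *control, DBT *rec, int envid)
{
    DB_LSN ret_lsn;
    int ret;

    ret = dbenv-&gt;rep_process_message(dbenv, control, rec, envid, &amp;ret_lsn);
    if (ret == DB_REP_DUPMASTER) {
        /* Step down: reconfigure this environment as a client. */
        (void)dbenv-&gt;rep_start(dbenv, NULL, DB_REP_CLIENT);

        /*
         * Call for an election in a 5-site group. An nvotes
         * argument of 0 requests the default simple majority.
         */
        (void)dbenv-&gt;rep_elect(dbenv, 5, 0, 0);
    }
}</pre>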
<p>
As another example, consider a replication group with a
master environment and two clients A and B, where client A may
upgrade to master status and client B cannot. Assume
client A is partitioned from the other two database
environments, and it becomes out-of-date with respect to the
master. Now assume the master crashes and does not come back
on-line. Subsequently, the network partition is restored, and
clients A and B hold an election. As client B cannot win the
election, client A will win by default, and in order to bring
client B back into sync with the new master, transactions
committed on client B will be unrolled until the two sites can once
again move forward together.
</p>
<p>
In both of these examples, there is a phase where a newly
elected master brings the members of a replication group into
conformance with itself so that it can start sending new
information to them. This can result in the loss of
information as previously committed transactions are
unrolled.
</p>
<p>
In architectures where network partitions are an issue,
applications may want to implement a heartbeat protocol to
minimize the consequences of a bad network partition. As long
as a master is able to contact at least half of the sites in
the replication group, it is impossible for there to be two
masters. If the master can no longer contact a sufficient
number of systems, it should reconfigure itself as a client,
and hold an election. Replication Manager does not currently
implement such a feature, so this technique is only available
to Base API applications.
</p>
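<p>
The heartbeat protocol itself is application-defined. Assuming
a hypothetical helper count_reachable_sites() that reports how
many other sites the master can currently contact, the reaction
described above might be sketched as follows.
</p>
<pre class="programlisting">#include &lt;db.h&gt;

/* Application-defined; not part of Berkeley DB. */
extern int count_reachable_sites(void);

/*
 * Called periodically on the master. If it can no longer contact
 * at least half of the group, step down and call for an election
 * rather than risk a second master forming across the partition.
 */
void
check_master_connectivity(DB_ENV *dbenv, int total_sites)
{
    int reachable = count_reachable_sites() + 1;  /* include self */

    /* Step down if fewer than half of the sites are reachable. */
    if (2 * reachable &lt; total_sites) {
        (void)dbenv-&gt;rep_start(dbenv, NULL, DB_REP_CLIENT);
        (void)dbenv-&gt;rep_elect(dbenv, (u_int32_t)total_sites, 0, 0);
    }
}</pre>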
<p>
There is another tool applications can use to minimize the
damage in the case of a network partition. By specifying an
<span class="bold"><strong>nsites</strong></span> argument to
<a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> that is larger than the actual number of database
environments in the replication group, Base API applications
can keep systems from declaring themselves the master unless
they can talk to a large percentage of the sites in the
system. For example, if there are 20 database environments in
the replication group, and an argument of 30 is specified to
the <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> method, then a system will have to be able to
talk to at least 16 of the sites to declare itself the master.
Replication Manager automatically maintains the number of
sites in the replication group, so this technique is only
available to Base API applications.
</p>
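<p>
A sketch of such a call for the 20-site example above:
</p>
<pre class="programlisting">#include &lt;db.h&gt;

/*
 * Twenty environments actually exist, but claiming 30 means a
 * winner must collect (30 / 2) + 1 = 16 votes, so it must be able
 * to talk to at least 16 real sites. An nvotes argument of 0
 * requests the default simple majority of nsites.
 */
void
call_guarded_election(DB_ENV *dbenv)
{
    (void)dbenv-&gt;rep_elect(dbenv, 30, 0, 0);
}</pre>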
<p>
Specifying an <span class="bold"><strong>nsites</strong></span>
argument to <a href="../api_reference/C/repelect.html" class="olink">DB_ENV->rep_elect()</a> that is smaller than the actual number
of database environments in the replication group has its uses
as well. For example, consider a replication group with 2
environments. If they are partitioned from each other, neither
of the sites could ever get enough votes to become the master.
A reasonable alternative would be to specify an <span class="bold"><strong>nsites</strong></span> argument of 2 to one of the
systems and an <span class="bold"><strong>nsites</strong></span> argument
of 1 to the other. That way, one of the systems could win
elections even when partitioned, while the other one could
not. This would allow one of the systems to continue accepting
write queries after the partition.
</p>
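<p>
A sketch of this asymmetric configuration for the two-site
example:
</p>
<pre class="programlisting">#include &lt;db.h&gt;

/*
 * On the site allowed to win while partitioned: with nsites of 1,
 * its own vote is a majority, so it can elect itself alone.
 */
void
elect_favored_site(DB_ENV *dbenv)
{
    (void)dbenv-&gt;rep_elect(dbenv, 1, 0, 0);
}

/*
 * On the other site: with nsites of 2, winning requires
 * (2 / 2) + 1 = 2 votes, so it can never win alone.
 */
void
elect_other_site(DB_ENV *dbenv)
{
    (void)dbenv-&gt;rep_elect(dbenv, 2, 0, 0);
}</pre>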
<p>
In a two-site group, Replication Manager by default reacts to
the loss of communication with the master by observing a
strict majority rule that prevents the survivor from taking
over. Thus it avoids multiple masters and the need to unroll
some transactions if both sites are running but cannot
communicate. But it does leave the group in a read-only state
until both sites are available. If application availability
while one site is down is a priority and it is acceptable to
risk unrolling some transactions, there is a configuration
option to turn off the strict majority rule and allow the
surviving client to declare itself to be master. See the
<a href="../api_reference/C/repconfig.html" class="olink">DB_ENV->rep_set_config()</a> method <a href="../api_reference/C/repconfig.html#config_DB_REPMGR_CONF_2SITE_STRICT" class="olink">DB_REPMGR_CONF_2SITE_STRICT</a> flag for more
information.
</p>
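<p>
For example, a two-site Replication Manager application willing
to accept that risk might turn the rule off as follows.
</p>
<pre class="programlisting">#include &lt;db.h&gt;

/*
 * Allow the surviving site of a two-site Replication Manager group
 * to elect itself master when its peer is unreachable, at the risk
 * of later unrolling transactions.
 */
void
relax_two_site_rule(DB_ENV *dbenv)
{
    (void)dbenv-&gt;rep_set_config(dbenv, DB_REPMGR_CONF_2SITE_STRICT, 0);
}</pre>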
<p>
Preferred master mode is another alternative for two-site
Replication Manager replication groups. It allows the survivor
to take over after the loss of communication with the master.
When communications are restored, it always preserves the
transactions from the preferred master site. See
<a class="xref" href="rep_twosite.html#twosite_prefmas" title="Preferred master mode">Preferred master mode</a>
for more information.
</p>
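<p>
As a sketch, a preferred master pair might be configured with
the DB_REPMGR_CONF_PREFMAS_MASTER and
DB_REPMGR_CONF_PREFMAS_CLIENT flags to <a href="../api_reference/C/repconfig.html" class="olink">DB_ENV-&gt;rep_set_config()</a>,
one on each site:
</p>
<pre class="programlisting">#include &lt;db.h&gt;

/* On the site whose transactions should always be preserved. */
void
configure_preferred_master(DB_ENV *dbenv)
{
    (void)dbenv-&gt;rep_set_config(dbenv, DB_REPMGR_CONF_PREFMAS_MASTER, 1);
}

/* On the other site. */
void
configure_preferred_client(DB_ENV *dbenv)
{
    (void)dbenv-&gt;rep_set_config(dbenv, DB_REPMGR_CONF_PREFMAS_CLIENT, 1);
}</pre>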
<p>
These scenarios stress the importance of good network
infrastructure in Berkeley DB replicated environments. When
replicating database environments over sufficiently lossy
networking, the best solution may well be to pick a single
master, and only hold elections when human intervention has
determined the selected master is unable to recover at
all.
</p>
</div>
<div class="navfooter">
<hr />
<table width="100%" summary="Navigation footer">
<tr>
<td width="40%" align="left"><a accesskey="p" href="rep_twosite.html">Prev</a> </td>
<td width="20%" align="center">
<a accesskey="u" href="rep.html">Up</a>
</td>
<td width="40%" align="right"> <a accesskey="n" href="rep_faq.html">Next</a></td>
</tr>
<tr>
<td width="40%" align="left" valign="top">Special considerations for two-site replication groups </td>
<td width="20%" align="center">
<a accesskey="h" href="index.html">Home</a>
</td>
<td width="40%" align="right" valign="top"> Replication FAQ</td>
</tr>
</table>
</div>
</body>
</html>