
Networking and Database Checking

The HEPDB database package comes with a distribution system for transmitting updates to and from a central repository. It is based on a master-slave protocol in which one node keeps the primary copy of the database, implements updates to that copy, and distributes the updates to secondary copies at the slave nodes. Since it is critical to maintain a certain degree of control over the contents of the database, individual members of the collaboration will not in general be permitted to implement updates to the master database. Other large collaborations typically give control to one person (hereinafter referred to as the database ``czar'') who receives update submissions from individuals responsible for various segments of the database. After a reasonable number of these requests have accumulated and been tested, they are incorporated into the master database and verified, then sent out to the slave nodes and verified again. Ideally, this will happen roughly monthly, although more frequent updates will certainly occur during detector commissioning.

The files and programs HEPDB uses to implement this master-slave protocol are described below.

The master database server for SNO will probably be placed at a location with good access to the internet, with all other institutions acting as slave servers. The configuration of each server or application is determined by the hepdb.names file in the directory pointed to by the particular CDSERV variable the job is given. When accessing information, a database application reads directly from the database file named in the hepdb.names file. The cdserv program on the master server will run only when the database czar deems it appropriate. When an application requests a change to the database, a journal file describing the change is automatically written to the queue directory through the application's interaction with HEPDB and its cdserv program. Because a slave server only processes files that appear in its todo directory, and the queue and todo directories are distinct, the slave server does not process the journal file and hence does not insert the changes into its local copy of the database. Instead, the database czar, upon determining that a sufficient number of database updates are ready, starts the cdmove program, which in all likelihood will run on the master server. This program moves the journal files from the slaves' local queue directories to the todo directory of the master server.
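The flow of journal files through these directories can be illustrated with a short sketch. The Python below is not the real cdmove (which also handles the network transfer between nodes, locking, and error recovery); the directory arguments are placeholders, and the sketch simply shows the queue-to-todo move that drives the protocol.

    import shutil
    from pathlib import Path

    def move_journal_files(src_queue: Path, dst_todo: Path) -> None:
        """Move pending journal files from a queue directory to a todo
        directory, where a cdserv process will later pick them up.
        Illustrative sketch only, not the actual cdmove program."""
        dst_todo.mkdir(parents=True, exist_ok=True)
        for journal in sorted(src_queue.iterdir()):   # oldest-first by name
            if journal.is_file():
                shutil.move(str(journal), str(dst_todo / journal.name))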

Before the master server's cdserv program is allowed to process the journal files, the database czar must check the integrity of the submitted database changes. (It is for this reason that the master server's cdserv program does not run continuously; otherwise the journal files copied over by the cdmove program would be automatically inserted into the master server's official copy of the database by cdserv, and then propagated out to the entire collaboration by cdmove.) The check is done as follows. A full ``check'' copy of the database is created, together with copies of all the requested changes: the journal files copied from remote slave nodes by cdmove, and the system interfaces' database write requests (see Section 7.2). A cdserv process is started, along with all appropriate system interface routines that write to the database (e.g., DAQ, CMA, etc.). These process the copied journal files and interface write requests, respectively, thereby applying the submitted changes to the database copy. Then a comprehensive SNOMAN test job is run to check that the submitted updates are okay. Note that the SNOMAN test job has not yet been created, and probably will be only after we have a better idea what to look for.
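Schematically, the czar's checking procedure might look like the sketch below. This is a sketch only: the file layout is assumed, the startup of cdserv and the interface routines is indicated as a comment, and run_test_job is a placeholder for the future SNOMAN test job, which does not yet exist.

    import shutil
    import subprocess
    from pathlib import Path

    def check_updates(official_db: Path, check_dir: Path,
                      journals: list) -> bool:
        """Build a full 'check' copy of the database, stage all submitted
        journal files in its todo directory, and validate the result.
        'run_test_job' is a placeholder for the future SNOMAN test job."""
        check_dir.mkdir(parents=True, exist_ok=True)
        check_db = check_dir / official_db.name
        shutil.copy(official_db, check_db)        # full check copy
        todo = check_dir / "todo"
        todo.mkdir(exist_ok=True)
        for jf in journals:                       # stage requested changes
            shutil.copy(jf, todo / jf.name)
        # ... start cdserv and the system interface routines (DAQ, CMA,
        # etc.) against the check copy so the changes are applied ...
        result = subprocess.run(["run_test_job", str(check_db)])
        return result.returncode == 0             # updates okay?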

Once the checking procedure is satisfactorily completed, the official cdserv process is started. The master server then processes the journal files, updating the master copy of the relevant database files. The master server also puts copies of the journal files in the appropriate directories for distribution to those slave sites identified in the master's hepdb.names file. The cdmove program looks in those directories and moves the copies of the journal files to the todo directory at each slave site. An email message is then sent to the local database contacts informing them that an update set has been propagated and that they can process the updates; alternatively, a slave node can run its cdserv process continuously. Either way, processing these files updates the official local copies of the database.
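On the slave side, processing an update set amounts to applying, in order, whatever journal files cdmove has deposited in todo. A minimal sketch, in which apply_journal stands in for the update machinery inside cdserv:

    from pathlib import Path

    def apply_journal(journal: Path) -> None:
        """Stand-in for cdserv's actual work: parse the journal file and
        apply the recorded changes to the local official database copy."""
        ...

    def process_todo(todo_dir: Path) -> None:
        """Apply each pending journal file in order, then remove it."""
        for journal in sorted(todo_dir.iterdir()):
            if journal.is_file():
                apply_journal(journal)
                journal.unlink()      # this update has been applied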

The basic master-slave interaction is shown in Fig. 3 for the illustrative case in which the slave server ``slave1'' submits an update. The update is processed by slave1, which puts the resulting journal file into its queue directory (step ``1''). Some time later, the master node starts cdmove, which moves the file from slave1 to its todo/queue directory (step ``2''). Assuming the ``check'' copy checks out okay, the master node's server applies the update to the official database and places copies of the update, in the form of journal files, in special local directories, one for each slave node in the system (steps ``3''). Some time later, cdmove places these files in the todo directories on the corresponding slave nodes (steps ``4''). The cdserv processes running on the slave nodes then apply the update to the local copies of the official database (steps ``5'').
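Using the move_journal_files and process_todo sketches defined above, the five steps of Fig. 3 can be traced end to end. All paths below are hypothetical, and the per-slave distribution directory name is an assumption.

    from pathlib import Path

    slave1 = Path("/hepdb/slave1")    # hypothetical layouts
    master = Path("/hepdb/master")
    slave2 = Path("/hepdb/slave2")

    # Step 1: slave1's application has written a journal file to
    #         slave1/queue (done by HEPDB, not shown here).
    # Step 2: cdmove carries it to the master's todo directory.
    move_journal_files(slave1 / "queue", master / "todo")
    # Step 3: after the check passes, the master's cdserv applies the
    #         update and places copies in per-slave distribution
    #         directories (assumed here to be master/dist/<node>).
    # Step 4: cdmove delivers those copies to each slave's todo directory.
    move_journal_files(master / "dist" / "slave2", slave2 / "todo")
    # Step 5: each slave's cdserv applies the update locally.
    process_todo(slave2 / "todo")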

A process dedicated to serving the DAQ read requests will run continuously on the ``site node'' (a slave server running on site); as such it will be monitored, but it will not be regularly stopped and started by the database czar. This is because the DAQ will be making read requests whenever it wants to load in new constants for the electronics (say), and it should not have to wait for the next time the database czar decides to update the database. Database reads use the program sdb_output_titles, which neither generates journal files nor requires the cdserv process to be running, so reads have no impact on the regular operations of the other parts of the database.
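The key point is that the read path never touches the queue or todo directories. Purely as an illustration (the function below is not the sdb_output_titles interface, and the bank lookup is left abstract), a read reduces to opening the database file read-only:

    from pathlib import Path

    def read_titles(db_file: Path, bank_name: str) -> bytes:
        """Schematic read path: open the database file read-only and
        extract the requested titles bank.  No journal file is written
        and no cdserv process is needed, so reads cannot disturb the
        update machinery."""
        with db_file.open("rb") as f:
            data = f.read()
        return extract_bank(data, bank_name)

    def extract_bank(data: bytes, bank_name: str) -> bytes:
        ...   # stand-in for HEPDB's record lookup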

Until the site gets a good connection to the internet, the master node will reside elsewhere. Under this scenario, updates made on site (of which there will be many, especially during the commissioning phase) will not actually appear in the site's official copy of the database until they have propagated from the site to the master and back again. For large update sets, this time lag will be too long.

To overcome this problem, a ``mirror'' database will be implemented on site. The SNODB mirror consists of a separate server on site, running as a master, along with a process which manages the placement of local updates in the relevant todo and queue directories. The mirror takes local updates and applies them to the site's mirror database immediately, while also submitting the same updates for propagation to the master node. These updates will eventually be processed on the master node and propagated back to the site node, where they will be applied to the local official copy of the database. Other nodes will also be submitting updates, which we want applied to both the site mirror and the official databases; however, updates that originated from the site node must not be re-applied to the site mirror database. A special filter has therefore been implemented to exclude them, as sketched below. The mirror database is depicted diagrammatically in Fig. 3. See Sec. 4.4 for details on how to set up the mirror on the site node. (Users at other nodes should avail themselves of a test database if they wish to see locally-generated updates quasi-instantly.)
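A sketch of the filtering idea follows, assuming (purely for illustration) that each journal file's name carries a tag identifying the node on which it originated; the real filter may identify origins differently.

    from pathlib import Path

    LOCAL_NODE = "site"    # hypothetical origin tag for the site node

    def filter_incoming(todo_dir: Path) -> None:
        """Discard incoming journal files that originated on this node,
        since those updates were already applied to the mirror database
        when they were first made.  The name-based origin tag is an
        assumption for illustration."""
        for journal in todo_dir.iterdir():
            if journal.is_file() and journal.name.startswith(LOCAL_NODE + "_"):
                journal.unlink()    # already applied via the mirror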


