Copyright © 2003, 2004, 2005 Iustin Pop, <[email protected]>
This is the user manual for the cfvers project; the homepage is at http://www.nongnu.org/cfvers/. You can also get new versions of this document there.
Revision: $Id: manual.html,v 1.4 2005/10/30 13:25:48 iusty Exp $
Making backups is an important aspect of system administration. Backup techniques are explained in any good document about system administration, so they are not repeated here.
However, text configuration files are better suited to versioning systems than to full/incremental backups, which are targeted at binary files and miscellaneous data. Unfortunately, versioning systems are not very good at working directly on a live system: the main reasons are the creation of extra files, the inability to cope with special files, and the failure to keep permissions intact.
The working model of the classic versioning systems consists of one (or more) central repositories (very precious) and a multitude of developers' workspaces, which hold semi-important data; by this I mean it's OK to delete or otherwise break a developer's workspace when no changes have been made to it, since all the information can be restored from the central repository.
In contrast, a versioning system designed for system configuration has its priorities almost reversed: the critical part is the filesystem, and the repository is secondary to that. This means that such software must obey the following rules:
keep the system's integrity: the software must not do anything to the filesystem it hasn't been asked to do
treat the meta-data of versioned items as being as important as the data
when in doubt about the success of an operation, abort rather than damage the workspace
cfvers has been designed with these objectives in mind[1].
There are three components which need installing:
the python library
the command line utilities, cfv and cfvadmin
the cfversd server and its configuration files
If you don't run the server, you can run the cfv/cfvadmin scripts from the install directory, since it contains the python library and the library will be picked up from there. However, the recommended way is to install the python library in its proper place and the scripts in /usr/local/bin or /usr/bin.
The default ./configure invocation will install all of these in their proper locations: scripts in bin, the server in sbin and the library in lib/python2.3.
The configuration files needed by the server (in /etc/cfvers, if not overridden by command line arguments) are:
the logger configuration file, logging.cfg
the server configuration file, cfversd.conf
PYRO_STORAGE: this variable should point to a writable directory used for temporary files. If the directory does not exist, Pyro will use the current directory (which could even be / for a daemon started from the init scripts). The other variables are not needed, but if you want to customize some parameters of the client-server communication, please see the Pyro documentation. Available settings include, for example, whether to use compression and how many connections to accept.
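A start-up check along the following lines can catch a missing storage directory before the daemon silently falls back to the current directory. This is only a sketch: `check_storage_dir` is a hypothetical helper, not part of cfvers or Pyro, and the demo assumes that PYRO_STORAGE is supplied through the environment.

```python
import os


def check_storage_dir(path):
    """Return True if `path` is an existing directory we can write into."""
    # Hypothetical helper: neither cfvers nor Pyro ships this function.
    return bool(path) and os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # Assumption: PYRO_STORAGE is passed to the daemon via the environment.
    storage = os.environ.get("PYRO_STORAGE", "")
    if check_storage_dir(storage):
        print("PYRO_STORAGE is usable:", storage)
    else:
        print("PYRO_STORAGE is missing or not writable:", repr(storage))
```

Running such a check from the init script, before cfversd starts, keeps temporary files out of / when the directory was forgotten.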
If you use the sqlite backend, no customization is necessary. Just choose a writable file in a writable directory; both must be writable by the user who will be accessing the database (this is the server in remote configurations and the tools in local configurations).
If you are using the postgresql backend, you need to create a database and (preferably) a separate user for the database. Remember the username and password as you will need to fill them in the configuration files.
Also, for the postgresql backend, the --name argument to cfv find works only if you install the plpythonu server-side language and create the following function in the database:
CREATE OR REPLACE FUNCTION fnmatch (text, text) RETURNS boolean
LANGUAGE plpythonu AS '
import fnmatch
return fnmatch.fnmatch(args[0], args[1])
';
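Since the database function simply delegates to Python's standard fnmatch module, --name patterns on postgresql follow shell-style globbing. The sketch below illustrates those semantics outside the database. `name_matches` is a hypothetical helper, and comparing the pattern against the last path component only (as find(1) does for -name) is an assumption based on the search examples in this manual, not something confirmed from the cfvers sources.

```python
import fnmatch
import posixpath


def name_matches(path, pattern):
    """Shell-style match of `pattern` against the last component of `path`."""
    # Assumption: like find(1) -name, only the basename is compared.
    return fnmatch.fnmatch(posixpath.basename(path), pattern)


if __name__ == "__main__":
    for path in ("/etc/passwd", "/etc/pam.d/passwd", "/etc/group"):
        print(path, name_matches(path, "pass*"))
```

Note that a pattern without wildcards, such as passwd, would not match the full path /etc/passwd under fnmatch; it only matches once the directory part is stripped.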
How to create your first repository
decide whether to use a client-server setup or direct access to the repository (direct access can also be remote, in the case of postgresql)
decide on which back-end to use (either sqlite or postgresql for now)
Based on the above answers, create the configuration files.
local repository, sqlite; just create the configuration file ~/.cfvers:
[server]
server_type=local
repo_meth=sqlite
repo_data=/path/to/file.db
area=default
local repository, postgresql (first create a postgresql database).
[server]
server_type=local
repo_meth=postgresql
repo_data=dbname=mydb user=myuser password=mypass
area=default
remote repository; create the server configuration file (e.g. /etc/cfvers/cfversd.conf):
for sqlite:
[server]
port = 9999
pidfile = /var/run/cfvers/cfversd.pid

[repository]
method=sqlite
connect=/var/lib/cfvers/database

[auth]
users=user1

[user_user1]
client_password=cpw
server_password=spw
valid_from=127.0.0.1,192.168.0.2
areas=default
admin=true
for postgresql:
[server]
port = 9999
pidfile = /var/run/cfvers/cfversd.pid

[repository]
method=postgresql
connect=dbname=mydb user=myuser password=mypass

[auth]
users=user1

[user_user1]
client_password=cpw
server_password=spw
valid_from=127.0.0.1,192.168.0.2
areas=default
admin=true
then create the client configuration file (~/.cfvers):
[server]
server_type=remote
host=192.168.0.1
port=9999
username=user1
client_password=cpw
server_password=spw
area=default
run cfvadmin --init in order to create the initial repository.
run cfv add ITEMS... in order to register the items you want versioned.
run cfv store in order to store the first version.
after every change to the system's configuration, rerun the cfv store command in order to update the versioned items. New items you want stored must be given in a separate call (cfv add).
schedule a cron job to watch for differences or do automatic commits.
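The last step can be sketched as a small script for cron to call. Everything here is illustrative: `build_commands` and `check_and_store` are hypothetical names, and the script assumes that cfv diff -l prints one out-of-sync item per line and nothing when the filesystem matches the repository. Only the cfv diff -l and cfv store -m invocations are taken from this manual.

```python
import subprocess


def build_commands(message):
    """Return the two cfv invocations used by a periodic check-and-commit run."""
    return [
        ["cfv", "diff", "-l"],            # list items that differ from the repository
        ["cfv", "store", "-m", message],  # commit the current state with a log message
    ]


def check_and_store(message, run=subprocess.run):
    """Run `cfv diff -l` and, if it lists any item, run `cfv store`.

    Assumption: `cfv diff -l` prints one changed item per line and prints
    nothing when everything is in sync.
    """
    diff_cmd, store_cmd = build_commands(message)
    result = run(diff_cmd, capture_output=True, text=True)
    changed = [line for line in result.stdout.splitlines() if line.strip()]
    if changed:
        run(store_cmd, check=True)
    return changed
```

A cron entry would call check_and_store("automatic nightly commit") once a day; the `run` parameter exists only so the logic can be exercised without cfv installed.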
I tried to keep cfvers as simple as possible. But I don't think I succeeded.
The repository is where the files are stored. The repository is manipulated using the cfvadmin command.
Right now, there are two backends implemented for the repository: postgresql-based and sqlite-based. The sqlite backend is very useful for small or standalone installations.
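To make the backend choice concrete, the sketch below reads a client configuration in the ~/.cfvers format shown in the quick-start steps and reports which backend it selects. It uses Python 3's configparser purely for illustration; how cfvers itself parses the file is not shown here, and `load_server_settings` and `describe_backend` are hypothetical helpers.

```python
import configparser

# Sample taken from the local postgresql configuration in this manual.
SAMPLE = """\
[server]
server_type=local
repo_meth=postgresql
repo_data=dbname=mydb user=myuser password=mypass
area=default
"""


def load_server_settings(text):
    """Parse the [server] section of a client configuration into a dict."""
    # Only "=" is a delimiter, so the value of repo_data keeps its own "=" signs.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return dict(parser["server"])


def describe_backend(settings):
    """Return a human-readable description of the selected backend."""
    method = settings["repo_meth"]
    if method == "sqlite":
        return "sqlite database file " + settings["repo_data"]
    if method == "postgresql":
        return "postgresql connection string " + settings["repo_data"]
    raise ValueError("unknown repo_meth: " + method)


if __name__ == "__main__":
    print(describe_backend(load_server_settings(SAMPLE)))
```

The point to notice is that repo_data means different things per backend: a file path for sqlite and a connection string for postgresql.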
The repository contains areas in which files are stored; this allows files from different servers to be stored in the same repository. A repository must contain at least one area in order to be able to contain files. The areas are created with the cfvadmin create command and displayed with cfvadmin info.
An area has the following attributes:
The name of the area; you use this when referring to the area from the client, either in configuration files or with the -a option to the cfv command
The root path on the filesystem for the files contained in this area; this allows you to define for example areas for chroot jails and refer to the files in the area using the path in the chroot.
Default value: /
A text describing the area, anything you like
The creation time of the area
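The effect of the root path can be sketched as a path translation: a file is addressed inside the area by its path relative to the area root. The helper `area_path` and the chroot location /var/chroot/named are hypothetical, chosen only to illustrate the idea; the actual translation code in cfvers may differ.

```python
import posixpath


def area_path(root, fs_path):
    """Translate an absolute filesystem path to the path as seen inside the area."""
    root = posixpath.normpath(root)
    fs_path = posixpath.normpath(fs_path)
    if root == "/":
        # Default root: area paths and filesystem paths coincide.
        return fs_path
    if fs_path == root:
        return "/"
    if not fs_path.startswith(root + "/"):
        raise ValueError("%s is outside area root %s" % (fs_path, root))
    return fs_path[len(root):]


if __name__ == "__main__":
    # Hypothetical chroot jail location.
    print(area_path("/var/chroot/named", "/var/chroot/named/etc/named.conf"))
```

With the default root of /, the function returns the path unchanged, which matches the common case of one area per server.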
The files to be versioned are represented by items. Note that an item doesn't contain actual file information, it represents the intent to track a file.
The attributes of an item:
The filename which this item represents; this is what will be tracked by cfvers;
The entries of an item are affected by the item's flag attribute. Currently, the flags can affect the following:
Amount of information to store. An entry can store for a file:
metadata (name, type, size, access/creation/modification times, owner/group, etc.)
checksum of the contents (for regular files, symbolic links and directories)
file contents (for regular files, symbolic links and directories)
The amount of information to store is selected with the --store=level option of the add command, where level is one of metadata, checksum, full.
The kind of the item:
Regular file: if the flag is one of metadata, checksum or contents, the file will be stored as a regular file.
Virtual file: if the flag is virtual, the file will be stored as a virtual file.
Creation time (=registration time) for this item.
The area to which this item belongs.
If the item is a virtual one, this is the command line used to generate the contents.
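The three storage levels can be pictured as progressively richer snapshots of a file. The sketch below handles regular files only and assumes the levels are cumulative (checksum includes metadata, full includes both); the function name `snapshot` and the dictionary keys are hypothetical, and SHA-1 is used because the diff output in this manual shows a sha1sum attribute.

```python
import hashlib
import os

LEVELS = ("metadata", "checksum", "full")


def snapshot(path, level):
    """Collect what would be stored for a regular file at the given level."""
    if level not in LEVELS:
        raise ValueError("level must be one of %s" % ", ".join(LEVELS))
    st = os.lstat(path)
    entry = {
        "name": path,
        "mode": st.st_mode,
        "size": st.st_size,
        "mtime": st.st_mtime,
        "uid": st.st_uid,
        "gid": st.st_gid,
    }
    if level in ("checksum", "full"):
        with open(path, "rb") as handle:
            data = handle.read()
        # Assumption: levels are cumulative, so "full" also carries the checksum.
        entry["sha1sum"] = hashlib.sha1(data).hexdigest()
        if level == "full":
            entry["contents"] = data
    return entry
```

Choosing metadata or checksum for large files keeps the repository small while still letting a later comparison detect that something changed.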
Usually you will want to track regular files. This is accomplished by defining an item with a certain name; that name will be used as the name of the file to store in the repository.
However, there is another possibility: a virtual file. A virtual file is one whose contents are taken from the output of a command, not from a file in the filesystem. This can be useful for versioning system state, for example: partition tables (either as dd if=/dev/hda bs=512 count=1 or as sfdisk -d /dev/hda), system hardware configuration (as lspci -v), etc.
The command attribute of the item is used to generate the contents of the file. For the moment, both the standard output and the standard error are saved together. The exit code of the command is saved in the entry's exitcode attribute.
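The behaviour described above, with standard output and standard error captured together and the exit code kept separately, can be sketched with the standard subprocess module. `run_virtual` is a hypothetical helper, and running the command through the shell is an assumption about how the command attribute is interpreted.

```python
import subprocess


def run_virtual(command):
    """Run `command` through the shell; return (combined output, exit code)."""
    proc = subprocess.run(
        command,
        shell=True,                # assumption: the command line is handed to the shell
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # stderr is merged into the same stream as stdout
    )
    return proc.stdout, proc.returncode


if __name__ == "__main__":
    output, exitcode = run_virtual("echo listing; echo warning 1>&2; exit 3")
    print(output, exitcode)
```

Because both streams end up in one blob, an error message from the command becomes part of the versioned contents, so a failing command shows up as a difference as well as a non-zero exit code.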
An entry represents the information about an item at a certain point in time.
The properties of an entry can be split into two groups: its own attributes and the attributes of the file it represents. Its own attributes are:
The item to which this entry belongs
The revision number of the revision this entry belongs to
The status of this entry, meaning what kind of change to the file it represents. Currently, it can take one of the following values:
A - the entry represents the addition of an item to the area; it does not have any other contents (i.e. the file properties haven't been stored yet)
M - modified; this is a regular entry about a file being updated
D - deleted; this is an entry about a file which can no longer be found in the filesystem; see Section 6.4 for more details about deletions
metadata properties of the file
the checksum of the file contents; applicable to regular files, symbolic links and directories;
the file contents; applicable to regular files, symbolic links and directories; for directories, the contents are the list of filenames separated by newlines
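How "contents" could be defined per file type is sketched below. The directory case follows the description above (filenames separated by newlines); sorting the names and using the link target as the contents of a symbolic link are assumptions, and `entry_contents` and `entry_checksum` are hypothetical names.

```python
import hashlib
import os
import stat


def entry_contents(path):
    """Return the bytes standing for the contents of `path`, or None."""
    mode = os.lstat(path).st_mode
    if stat.S_ISREG(mode):
        with open(path, "rb") as handle:
            return handle.read()
    if stat.S_ISLNK(mode):
        # Assumption: the contents of a symlink are its target path.
        return os.fsencode(os.readlink(path))
    if stat.S_ISDIR(mode):
        # Assumption: names are sorted so the listing is reproducible.
        return os.fsencode("\n".join(sorted(os.listdir(path))))
    return None  # devices, sockets, etc. have no contents to store


def entry_checksum(path):
    """SHA-1 hex digest of the contents, or None when there are none."""
    data = entry_contents(path)
    return None if data is None else hashlib.sha1(data).hexdigest()
```

Treating a directory listing as contents means that adding or removing a file inside a tracked directory changes the directory's checksum.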
A revision groups together entries which represent the state of the tracked items at a certain moment in time.
The area to which this revision belongs.
The revision number of this revision.
The server on which this revision was made.
The log message.
The creation time of this revision.
The numeric and textual representation of the credentials of the process which created this revision.
A textual description of the person or process that made this revision; useful when revisions are made as root but you need a more detailed description.
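The "numeric and textual representation of the credentials" can be pictured as the process's user and group IDs together with the names they resolve to. The sketch below is an assumption about what gets recorded, not the cfvers implementation; in particular, using the effective rather than the real IDs is a guess, and `process_credentials` is a hypothetical name.

```python
import grp
import os
import pwd


def process_credentials():
    """Return numeric and textual credentials of the current process."""
    # Assumption: the effective IDs are the ones of interest.
    uid, gid = os.geteuid(), os.getegid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = str(uid)  # no passwd entry for this uid; fall back to the number
    try:
        group = grp.getgrgid(gid).gr_name
    except KeyError:
        group = str(gid)  # no group entry for this gid
    return {"uid": uid, "gid": gid, "user": user, "group": group}


if __name__ == "__main__":
    print(process_credentials())
```

Storing both forms is useful because the numeric IDs stay meaningful even if the account is later renamed or removed.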
This should be done only once, otherwise it destroys your data.
Generally, you only work with areas at the initial setup of your repositories, or when adding new servers to the setup. There are only two operations possible on an area: creating a new area and displaying area information.
The item/entry operations can be split roughly into three groups:
storing files
searching for files
retrieving files
The first step in order to track a file is to register it with the system:
Example 5. Registering files
$ cfv add -m "Log message" /etc/passwd /etc/group /etc/hostname
Status: Added, revision 1
Time begin: 2004-09-26 15:35:02 EEST
Time end: 2004-09-26 15:35:03 EEST
Total skipped (error): 0
Total registered: 3
Total skipped (item already registered): 0
Total skipped (invalid name): 0
$
Then you need to actually order the system to store the contents of those files:
Example 6. Storing files
$ ./cfv store -m "Stored files"
Status: Stored revision 2
Time begin: 2004-09-26 15:37:01 EEST
Time end: 2004-09-26 15:37:02 EEST
Total stored: 3
Total skipped (not changed): 0
Total skipped (error): 0
Total skipped (not registered): 0
Total marked deleted: 0
$
This is all there is to storing files.
You can make two kinds of searches: for files with certain attributes, or for files for which the filesystem is not in sync with the repository.
Example 7. Search files by attribute
$ cfv find --name passwd -l
-rw-r--r-- 2 root root 92 2004-04-30 00:32:04 /etc/pam.d/passwd
-rw-r--r-- 2 root root 1594 2004-07-20 23:01:57 /etc/passwd
$ cfv find --regex '.*[a-k]nes[^/]'
/etc/X11/xkb/geometry/kinesis
/etc/gconf/schemas/glines.schemas
/etc/snmp/mib2c.column_defines.conf
/etc/xpdf/xpdfrc-japanese
$ cfv find --size '>' 950000 -d
-------------------------
Entry for /etc/gconf/schemas/gnome-terminal.schemas
File registerd at: 2004-09-26T15:45:18+0
Available revisions: 2
-------------------------
Entry for /etc/gconf/schemas/metacity.schemas
File registerd at: 2004-09-26T15:45:18+0
Available revisions: 2
$
Example 8. Searching for modified files
$ ./cfv diff -l
/tmp/a
$ ./cfv diff
===== Item /tmp/a (rev 2 -> current)
File contents:
--- /tmp/a Sun Sep 26 15:59:05 2004 (rev 2)
+++ /tmp/a Sun Sep 26 15:59:17 2004 (current)
@@ -1,1 +1,1 @@
-Sun Sep 26 15:59:05 EEST 2004
+Test
Attribute mtime:
- 2004-09-26 15:59:05 EEST
+ 2004-09-26 15:59:17 EEST
Attribute ctime:
- 2004-09-26 15:59:05 EEST
+ 2004-09-26 15:59:17 EEST
Attribute size:
- 30
+ 5
Attribute sha1sum:
- dc926ccb39a0c823680bdfeefe59057a6af727fc
+ 1c68ea370b40c06fcaf7f26c8b1dba9d9caf5dea
$ ./cfv diff -l -c mtime
/tmp/a
$
Once you have found the files you want to retrieve, there are several things you can do with them:
restore them to the filesystem
display their contents
display information about their metadata (like stat)
export them in a tar archive
create a checksum file (SHA1SUM) for external tools to check
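As an illustration of the tar export, the sketch below builds an archive from stored entries while carrying over their metadata, using Python's standard tarfile module. The entry dictionaries and their keys (name, contents, mode, mtime, uid, gid) are hypothetical stand-ins for whatever cfvers keeps in the repository, and stripping the leading slash from names is a choice made here, not documented cfvers behaviour.

```python
import io
import tarfile


def export_tar(entries):
    """Build an uncompressed tar archive (returned as bytes) from entries.

    Each entry is a dict with the hypothetical keys
    name, contents, mode, mtime, uid and gid.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for entry in entries:
            # Store relative names so the archive can be unpacked anywhere.
            info = tarfile.TarInfo(name=entry["name"].lstrip("/"))
            info.size = len(entry["contents"])
            info.mode = entry["mode"]
            info.mtime = entry["mtime"]
            info.uid = entry["uid"]
            info.gid = entry["gid"]
            archive.addfile(info, io.BytesIO(entry["contents"]))
    return buffer.getvalue()
```

Because the metadata travels in the tar headers, extracting the archive as root restores owner, permissions and modification time along with the data.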
When a tracked file has been removed from the filesystem, cfvers will notice this at the next store command and will register the deletion. The item in question will be displayed (by default) in the output of the command. Then, as long as the file hasn't been recreated, cfvers will ignore it. As soon as the file exists again, it will be tracked normally.
The deletion of a file is registered as an entry with status "D" in the repository. When it appears again, it will have a new status "M" entry.
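The deletion rules can be summarized as a small decision function: given the status of the item's latest entry and whether the file currently exists, decide which entry (if any) the next store records. This is a sketch of the rules as described in this section, not the cfvers code; `next_status` is a hypothetical name, and the handling of an item that was added but disappeared before its first store is an assumption.

```python
def next_status(last_status, exists, changed=True):
    """Return the status of the entry to record at the next store, or None.

    last_status is the status of the item's latest entry: "A", "M" or "D".
    """
    if not exists:
        # Record the deletion once, then ignore the item until it reappears.
        return None if last_status == "D" else "D"
    if last_status in ("A", "D"):
        # First store after registration, or the file has come back.
        return "M"
    # Regular case: store a new entry only if something changed.
    return "M" if changed else None


if __name__ == "__main__":
    history = "M"
    for exists in (False, False, True):
        status = next_status(history, exists)
        print(exists, status)
        history = status or history
```

The second missing-file step returning None is what keeps a long-gone file from producing a new "D" entry at every store.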
This section should be very big. It's small because I didn't have time to fill it, not because cfvers is complete :-)
These are limitations or design decisions inherent to the POSIX specification or the GNU/Linux implementation. While developing cfvers, I found:
You can't change the ctime of an inode. This is by design in the POSIX filesystem layer: the ctime records metadata modifications, while the mtime/atime pair records data write/read accesses. Since the ctime is itself part of the metadata, setting it would count as a metadata modification and would immediately update the ctime again, rendering the change useless :). A separate attribute recording metadata reads would be inappropriate, I think, because such reads happen in great numbers.
utimes(2) and chmod(2) act on the destination of a symlink (when given an argument which is a symlink). I can't think why anyone would want this: you can always expand the symlink yourself using readlink, but right now you can't act on the symlink itself!
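Both limitations can be observed from Python on a GNU/Linux system. The sketch below is only a demonstration in a temporary directory, with hypothetical file names: it sets the mtime of a file back to the epoch and shows that the ctime is stamped by the kernel instead of being settable, then changes permissions through a symlink and shows that the target is what changes.

```python
import os
import tempfile


def demo(workdir):
    """Return (mtime, ctime, target mode) after the two experiments."""
    target = os.path.join(workdir, "target")
    link = os.path.join(workdir, "link")
    with open(target, "w") as handle:
        handle.write("data\n")
    os.symlink(target, link)

    # atime/mtime can be set to anything, but the call itself is a metadata
    # change, so the kernel stamps ctime with the current time.
    os.utime(target, (0, 0))
    st = os.stat(target)

    # chmod on the symlink path follows the link and changes the target.
    os.chmod(link, 0o751)
    target_mode = os.stat(target).st_mode & 0o777
    return st.st_mtime, st.st_ctime, target_mode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        mtime, ctime, mode = demo(workdir)
        print("mtime:", mtime, "ctime:", ctime, "target mode:", oct(mode))
```

For a restore operation this means the original ctime of a file cannot be reproduced, only recorded.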
[1] However, nobody said it attained these goals - after all, it's software!