				    inotify
	     a powerful yet simple file change notification system



Document started 15 Mar 2005 by Robert Love <rml@novell.com>

(i) User Interface

Inotify is controlled by a device node, /dev/inotify.  If you do not use udev,
this device may need to be created manually.  The first step is to open it:

	int dev_fd = open ("/dev/inotify", O_RDONLY);

Change events are managed by "watches".  A watch is an (object,mask) pair where
the object is a file or directory and the mask is a bitmask of one or more
inotify events that the application wishes to receive.  See <linux/inotify.h>
for valid events.  A watch is referenced by a watch descriptor, or wd.

Watches are added via a file descriptor.

Watches on a directory will return events on any files inside of the directory.

Adding a watch is simple:

	/* 'wd' represents the watch on fd with mask */
	struct inotify_request req = { fd, mask };
	int wd = ioctl (dev_fd, INOTIFY_WATCH, &req);

You can add a large number of files via something like:

	for each file to watch {
		struct inotify_request req;
		int file_fd, wd;

		file_fd = open (file, O_RDONLY);
		if (file_fd < 0) {
			perror ("open");
			break;
		}

		req.fd = file_fd;
		req.mask = mask;

		wd = ioctl (dev_fd, INOTIFY_WATCH, &req);
		if (wd < 0)
			perror ("ioctl");

		/* the watch does not depend on file_fd staying open */
		close (file_fd);
	}

You can update an existing watch in the same manner, by passing in a new mask.

An existing watch is removed via the INOTIFY_IGNORE ioctl, for example:

	ioctl (dev_fd, INOTIFY_IGNORE, wd);

Events are provided in the form of an inotify_event structure that is read(2)
from /dev/inotify.  The filename is of dynamic length and follows the struct.
It is of size len.  The filename is padded with null bytes to ensure proper
alignment.  This padding is reflected in len.

You can slurp multiple events by passing a large buffer, for example:

	ssize_t len = read (dev_fd, buf, BUF_LEN);

This returns as many events as are available and fit in BUF_LEN bytes.
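
A buffer filled this way holds events back to back, each one a fixed header
followed by len bytes of padded filename.  The following is only a sketch of
how such a buffer might be walked.  The struct event_hdr layout and the
next_event() helper are assumptions made for illustration, not part of the
inotify interface; the definition in <linux/inotify.h> is authoritative.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header layout for this sketch; see <linux/inotify.h>. */
struct event_hdr {
	int32_t  wd;      /* watch descriptor the event belongs to */
	uint32_t mask;    /* bitmask describing what happened */
	uint32_t cookie;  /* assumed field tying related events together */
	uint32_t len;     /* size of the padded filename that follows */
};

/*
 * Hypothetical helper: fetch the event at offset *off in a buffer of
 * 'len' bytes.  On success, fills in *ev and *name, advances *off past
 * the event and returns 1.  Returns 0 when no complete event remains.
 */
int next_event (const char *buf, size_t len, size_t *off,
		struct event_hdr *ev, const char **name)
{
	if (*off > len || len - *off < sizeof (*ev))
		return 0;

	/* copy the header out so we never read through a misaligned pointer */
	memcpy (ev, buf + *off, sizeof (*ev));
	if (ev->len > len - *off - sizeof (*ev))
		return 0;	/* truncated event */

	/* the filename, if any, starts right after the header */
	*name = ev->len ? buf + *off + sizeof (*ev) : "";
	*off += sizeof (*ev) + ev->len;
	return 1;
}
```

A caller would start with an offset of zero and call next_event() in a loop
until it returns 0, handling one (ev, name) pair per iteration.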

/dev/inotify also supports select() and poll().
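
Waiting for events with a timeout might look something like the following
sketch.  wait_readable() is a hypothetical helper name; it uses only the
standard poll(2) call and works on any pollable descriptor, so nothing in it
is specific to inotify.

```c
#include <poll.h>

/*
 * Wait up to 'timeout_ms' milliseconds for 'fd' to become readable.
 * Returns 1 if data is ready, 0 on timeout, and -1 if poll() failed or
 * reported only an error or hangup condition on 'fd'.
 */
int wait_readable (int fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	int ret = poll (&pfd, 1, timeout_ms);

	if (ret <= 0)
		return ret;
	return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Calling wait_readable (dev_fd, 1000) before read() would let an application
wake up once a second even when no events arrive.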

You can find the size of the current event queue via the FIONREAD ioctl.
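
A minimal sketch of such a query follows.  queued_bytes() is a hypothetical
helper name, and the sketch assumes the ioctl reports the number of bytes
available to read, as FIONREAD does on other descriptors.

```c
#include <sys/ioctl.h>

/* Return the number of bytes queued for reading on 'fd', or -1 on error. */
int queued_bytes (int fd)
{
	int n;

	if (ioctl (fd, FIONREAD, &n) < 0)
		return -1;
	return n;
}
```

One possible use is to size the read buffer so that a single read() drains
everything that is currently queued.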

All watches are destroyed and cleaned up on close.


(ii) Internal Kernel Implementation

Each open inotify device is associated with an inotify_device structure.

Each watch is associated with an inotify_watch structure.  Watches are chained
off of each associated device and each associated inode.

See fs/inotify.c for the locking and lifetime rules.


(iii) Rationale

Q: What is the design decision behind not tying the watch to the
open fd of the watched object?

A: Watches are associated with an open inotify device, not an
open file.  This solves the primary problem with dnotify:
keeping the file open pins the file and thus, worse, pins the
mount.  Dnotify is therefore infeasible for use on a desktop
system with removable media as the media cannot be unmounted.

Q: What is the design decision behind using an fd-per-device as
opposed to an fd-per-watch?

A: An fd-per-watch quickly consumes more file descriptors than
are allowed, more fd's than are feasible to manage, and more
fd's than are ideally select()-able.  Yes, root can bump the
per-process fd limit and yes, users can use epoll, but requiring
both is silly and an extraneous requirement.  A watch consumes
less memory than an open file, so separating the number spaces
is sensible.  The current design is what user-space developers
want: Users open the device, once, and add n watches, requiring
but one fd and no twiddling with fd limits.
Opening /dev/inotify two thousand times is silly.  If we can
implement user-space's preferences cleanly--and we can, the idr
layer makes stuff like this trivial--then we should.

Q: Why a device node?

A: The second biggest problem with dnotify is that the user
interface sucks ass.  Signals are a terrible, terrible interface
for file notification.  Or for anything, for that matter.  The
ideal solution, from all perspectives, is a file descriptor based
one that allows basic file I/O and poll/select.  Obtaining the
fd and managing the watches could have been done either via a
device file or a family of new system calls.  We decided to
implement a device file because adding three or four new system
calls that mirrored open, close, and ioctl seemed silly.  A
character device makes sense from user-space and was easy to
implement inside of the kernel.
