CFR: UFS directory hashing (fwd)

Cejka Rudolf cejkar at dcse.fee.vutbr.cz
Wed Jun 20 08:05:45 CEST 2001


Hi,
  this might interest some of you. It looks like quite a nice toy,
very much in the "a lot of bang for very little buck" spirit:

----- Forwarded message from Ian Dowse <iedowse at maths.tcd.ie> -----

Subject: CFR: UFS directory hashing
Date: Tue, 19 Jun 2001 20:16:44 +0100
From: Ian Dowse <iedowse at maths.tcd.ie>

If there are no objections, I would like to commit the UFS_DIRHASH
patch in the next few days. The current patch is at

	http://www.maths.tcd.ie/~iedowse/FreeBSD/dirhash.diff4

and if anyone is interested, there is a RELENG_4 version at:

	http://www.maths.tcd.ie/~iedowse/FreeBSD/dirhash_releng_4.diff

This patch was already discussed on -hackers a few weeks ago,
and I have incorporated many of the suggested improvements. For
anyone who missed that discussion, a brief description is below.

Ian


The idea of this code is to maintain a throw-away in-core data
structure for large directories, allowing all operations to be
performed quickly without the need for linear searches. This new
behaviour is completely optional, and it is only enabled when the
kernel is compiled with "options UFS_DIRHASH".
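
For anyone who has not done this before, enabling it is just a
one-line addition to the kernel configuration file (the file name
below is only an example) followed by a kernel rebuild:

	# e.g. in a kernel configuration file such as MYKERNEL
	options		UFS_DIRHASH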

The implementation uses a hash array that maps filenames to the
directory offset at which the corresponding directory entry exists.
A simple spillover mechanism is used to deal with hash collisions,
and some extra summary information permits the quick location of
free space within the directory itself for create operations.
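
As a rough illustration of that scheme (a sketch only; the structure
and function names below are made up, not the patch's actual code),
a lookup probes linearly from the name's hash bucket until it finds
the entry or hits an empty slot:

	/*
	 * Illustrative sketch only.  Each slot holds the directory
	 * offset of one entry, or DH_EMPTY; collisions spill over
	 * into the following slots (linear probing).
	 */
	#define DH_EMPTY	(-1L)

	struct dirhash_sketch {
		int	dh_nslots;	/* number of hash slots */
		long	*dh_slot;	/* name hash -> dir offset, or DH_EMPTY */
		int	*dh_free;	/* per-block free-space summary */
	};

	/*
	 * Return the directory offset of 'name', or -1 if absent.
	 * 'match' stands in for reading the on-disk entry at 'off'
	 * and comparing names against 'name'.
	 */
	static long
	dh_lookup(struct dirhash_sketch *dh, const char *name,
	    unsigned int namehash, int (*match)(long off, const char *name))
	{
		int i, h;
		long off;

		h = namehash % dh->dh_nslots;
		for (i = 0; i < dh->dh_nslots; i++) {
			off = dh->dh_slot[(h + i) % dh->dh_nslots];
			if (off == DH_EMPTY)
				return (-1);	/* empty slot: not present */
			if (match(off, name))
				return (off);	/* matching entry found */
		}
		return (-1);			/* table full, no match */
	}

Creations would use the free-space summary to jump straight to a
directory block with enough room, instead of scanning for one.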

The in-core data structures have a memory requirement approximately
equal to half of the on-disk directory size. Currently there are
two sysctls that determine which directories get hashed:

  vfs.ufs.dirhashminsize	Minimum on-disk directory size for which
				hashing is used (default 2.5 kB).
  vfs.ufs.dirhashmaxmem		Maximum system-wide amount of memory to
				use for directory hashes (default 2 MB).
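
In rough terms, and purely as a sketch (the identifiers below are
mine, with defaults mirroring the values above; 'memneeded' would be
roughly half the on-disk directory size, per the description above),
the check these two sysctls imply might look like:

	static long ufs_dirhash_minsize = 2560;			/* 2.5 kB */
	static long ufs_dirhash_maxmem = 2 * 1024 * 1024;	/* 2 MB */
	static long ufs_dirhash_mem;	/* memory currently in use */

	/* Hash a directory only if it is big enough and within budget. */
	static int
	dh_should_hash(long dirsize, long memneeded)
	{
		if (dirsize < ufs_dirhash_minsize)
			return (0);	/* too small to be worth hashing */
		if (ufs_dirhash_mem + memneeded > ufs_dirhash_maxmem)
			return (0);	/* over budget; recycle first */
		return (1);
	}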

When the memory limit is reached, the implementation will try to
recycle dirhash memory from other directories as necessary. If it
fails to build up the hash structures for any reason, the code
reverts to the normal linear searches.
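
That recycle-then-fall-back flow might be sketched like this
(dh_recycle_one() and dh_alloc() are hypothetical helpers, and the
counters are reused from the previous sketch; none of this is the
patch's actual API):

	static int dh_recycle_one(void);  /* hypothetical: free one dirhash */
	static void *dh_alloc(long size); /* hypothetical: build hash table */

	/*
	 * Returns NULL on any failure, in which case the caller simply
	 * performs the ordinary linear directory search instead.
	 */
	static void *
	dh_build(long memneeded)
	{
		while (ufs_dirhash_mem + memneeded > ufs_dirhash_maxmem) {
			if (!dh_recycle_one())
				return (NULL);	/* nothing left to reclaim */
		}
		return (dh_alloc(memneeded));	/* NULL if allocation fails */
	}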

Even on a relatively slow machine (a 200 MHz P5), I'm seeing a file
creation speed that stays at around 1000 creations/second for
directories with more than 100,000 entries. Without this patch, I
get fewer than 20 creations per second on the same directory (in
both cases with soft updates enabled).

-- 
Rudolf Cejka   (cejkar at dcse.fee.vutbr.cz;  http://www.fee.vutbr.cz/~cejkar)
Brno University of Technology, Faculty of El. Engineering and Comp. Science
Bozetechova 2, 612 66  Brno, Czech Republic


