True, but it 'feels' as if that captures more form than content. And that's what ultimately made CDDB1 fail -- I can easily imagine that, say, two successive releases of some software development kit have the same number of files and dirs and the same total size, and differ only in individual file sizes and file contents.
Any differences between these two will be found in the volume descriptors, i.e. at the start of the data area of the CD-ROM (block 16 onwards, right after the system area).
What I've experimented with so far is to create an MD5 hash from all the
ISO9660 file descriptor blocks followed by at most 20 further evenly spaced
blocks counted from block 16 (that is, with LBA = 16 + n*32 or something like that).
It doesn't capture file structure, which I would like to include.
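For illustration, the scheme above could be sketched roughly like this in Python. This is not the code I experimented with, just a minimal sketch under assumptions: "descriptor blocks" is taken to mean the volume descriptor set starting at block 16 and ending with the set terminator (type 255), the block size is assumed to be 2048 bytes, and the stride of 32 is only the "something like that" value from above. The name disc_fingerprint and its parameters are made up for the example.

```python
import hashlib

SECTOR = 2048          # assumed logical block size of a plain ISO 9660 image
FIRST_DESCRIPTOR = 16  # the volume descriptor set starts at block 16
TERMINATOR = 255       # type code of the volume descriptor set terminator


def disc_fingerprint(image, stride=32, max_samples=20):
    """Return an MD5 hex digest over descriptor blocks plus spaced samples.

    `image` is any seekable binary file object (hypothetical helper,
    not part of any existing tool).
    """
    md5 = hashlib.md5()

    def read_block(lba):
        image.seek(lba * SECTOR)
        return image.read(SECTOR)

    # Part 1: descriptor blocks, up to and including the set terminator.
    lba = FIRST_DESCRIPTOR
    while True:
        block = read_block(lba)
        if len(block) < SECTOR or block[1:6] != b"CD001":
            break  # ran off the image, or this is not a descriptor
        md5.update(block)
        if block[0] == TERMINATOR:
            break
        lba += 1

    # Part 2: at most max_samples blocks at LBA = 16 + n*stride, n = 1, 2, ...
    for n in range(1, max_samples + 1):
        block = read_block(FIRST_DESCRIPTOR + n * stride)
        if len(block) < SECTOR:
            break  # past the end of the image
        md5.update(block)

    return md5.hexdigest()
```

It would be called on an image file opened in binary mode, e.g. disc_fingerprint(open("disc.iso", "rb")), where disc.iso is a placeholder name. Two images that differ only in blocks the sampling never touches would still get the same digest, which is part of why I'm not happy with it yet.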
I would also like to include media size and recording method (is it a CD-ROM, or a CD-R etc.) to ensure simple copies could be recognized.
But I've realised that it's going to be well-nigh impossible to test it properly, so it's rather on the back burner for now.