Why the Nabster watermarking system doesn't work



Date
September 15th 2003


Author
Michael Spath



Introduction


A month ago, a cdfreaks news posting caught my attention: a quite amazing article entitled Nabster -- The Holy Grail Of Online Piracy Detection had been published on mi2n.com, introducing a new anti-piracy technology called Nabster which claimed to handle all types of digital content. Since I became interested in DRM and copy protection many years ago, I have lost count of the announcements of ultimate solutions which eventually failed, and I usually don't bother looking at them. But this article was so dithyrambic that I decided to give it a try; I was not disappointed.

Nabster overview

Nabster is a software package for the distribution and piracy detection of various types of digital content. It uses the so-called Digital Interactive Fingerprinting (DIF) technology to watermark files on the fly, i.e. between the time a user requests a file and the time this user receives it. The goal of Nabster is to embed in the delivered file some unique information related to this transaction, so that if any copy of this file is later found on the Internet or on a P2P network, the leak can be traced back to the person who first downloaded it. The current DIF technology can handle 28 file formats, including the most popular music, video, graphics and document formats. Here's a graphical description of the Nabster system, taken from pan.com:


The PAN Network offers a 30-day free trial version of Nabster 2.0b, which currently runs only on Apache/FreeBSD/x86. The package comes as a single zip file and contains 22 files, the most important of which are:

  • nabster : main binary
  • dif-scan.html : html interface to the Nabster scanning system
  • htaccess.nabster : an .htaccess template to run Nabster with Apache
  • config.dat : main configuration file, which contains the admin password
  • license.dat : license file
  • access.dat : a log file of all transactions
  • block.dat : ip addresses which are banned from the Nabster system
  • errors.dat : error log file
  • scindex.dat : information about the already scanned files
  • today.dat : log file of today's transactions
  • user.dat : log file of all transactions
  • titles.dat : details about all files available for download
  • data.txt : the watermark file

People concerned with privacy should know that Nabster uses a "calling home" licensing scheme: once a day, the nabster binary sends a request to http://dif.pan.com/cgi-bin/remote/syscheck.cgi with the local host name, the administrator email address, the nabster version, and the magic data.txt number (which, as I'll explain later, allows pan.com to know how many files have been downloaded from your site).


According to the PAN Network, a patent is pending for the DIF technology, but I could not find any such application in the USPTO archive; also according to PAN itself, the DIF technology was invented in 1986. Whether the patent has been pending for 17 years or the PAN Network realized very late that its technology was worth a patent, I did not dare to ask.

A quick look at the nabster binary also shows that it is actually made of two Perl scripts which have been turned into an ELF binary via IndigoSTAR's perl2exe (try for instance 'nabster -p2x_debug'). Although Perl is very handy for Internet communications (thanks to its URI, HTTP, MIME, etc. packages), it does not sound like a natural choice for a watermarking application. One can also wonder why the Nabster package is only available for FreeBSD/x86, since perl2exe can generate binaries for Linux, Solaris, HP-UX, etc. in a few seconds.


Nabster in action

Now let's try the Nabster system with an example: I create an empty directory, copy Clean_Song.mp3 into it, copy htnabster.access there as .htaccess, and start Apache. Next I browse this directory and download the file as any external client would do:



Thanks to the .htaccess file there, Apache knows that download requests of .mp3 files in this directory have to be handled by the cgi-bin/nabster binary. Therefore, the file my browser receives actually comes from the standard output of /cgi-bin/nabster and differs from the original one in /tmp: the file I receive is now watermarked and therefore I will now refer to it as Watermarked_Song.mp3. Next, to simulate a pirate trying to distribute this file, I upload it to an external site, then load dif-scan.html. The standard Nabster verification interface looks like this:


When I submit the file for verification, the target URL and the administrator password are passed to the nabster binary through a cgi-bin POST query. The nabster binary then checks the administrator password against the one stored in config.dat, downloads the target file into the /tmpfiles directory, searches it for a fingerprint and finally deletes it. As expected, the file is identified and the details of the transaction are displayed:



Inside DIF fingerprinting

The DIF fingerprint is actually an 8-byte value which is the double-precision floating-point representation of the integer stored in data.txt. When Nabster receives a download request, it reads the 8 bytes currently in data.txt, increments the value and uses this new value as the fingerprint for this transaction. This fingerprint is then both inserted in the file being sent and stored in the local user.dat, access.dat and today.dat log files and in data.txt. This means that data.txt always contains the fingerprint which was used for the last transaction.

To be properly detected by Nabster, this fingerprint has to be located in the target file between two identical arbitrary bytes or just after a 0x0A byte (because nabster reads binary files as multiple strings). Therefore, the DIF fingerprinting method requires changing at least 8 consecutive bytes in the target file, and the problem for Nabster is of course to find the best location at which to insert them. Nabster handles this problem in a very crude way, by simply searching for 10 consecutive identical bytes in the original file, then overwriting bytes 2 to 9 with its fingerprint. Nabster only makes a distinction for .mp3 and .wma files to avoid corrupting their header, but apart from this the same basic search algorithm is used for all types of files. When a file does not contain 10 consecutive identical bytes, Nabster cannot fingerprint it.
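The insertion step just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Nabster's actual Perl code: the function name embed_fingerprint is hypothetical, the fingerprint is assumed to be packed as a little-endian IEEE 754 double (the natural layout on x86), and the special header handling for .mp3 and .wma files is left out.

```python
import re
import struct

# Ten consecutive identical bytes (any byte value, newlines included).
RUN_OF_TEN = re.compile(rb"(.)\1{9}", re.DOTALL)

def embed_fingerprint(data: bytes, counter: int) -> bytes:
    """Overwrite bytes 2 to 9 of the first 10-byte run with the packed counter."""
    match = RUN_OF_TEN.search(data)
    if match is None:
        raise ValueError("no run of 10 identical bytes, file cannot be fingerprinted")
    fingerprint = struct.pack("<d", float(counter))  # assumption: little-endian double
    start = match.start() + 1                        # keep the first byte as a border
    return data[:start] + fingerprint + data[start + 8:]

# Hypothetical sample: a 3-byte header, a run of 12 null bytes, then some payload.
clean = b"HDR" + b"\x00" * 12 + b"payload"
marked = embed_fingerprint(clean, 100017903057843)
print(marked[3:13].hex())  # 00c0ec4ee5cebdd64200
```

The file length is unchanged and the two border bytes survive, which is what lets Nabster find the mark again, and what lets anyone else find it too.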

Let's come back to the previous example and see in detail what happened:

1) When I requested Clean_Song.mp3, data.txt contained the byte sequence 80 EC 4E E5 CE BD D6 42, which corresponds to the integer 100017903057842. In Clean_Song.mp3, the following bytes could be seen:

2) The file that the nabster binary sent me (Watermarked_Song.mp3) instead contained the following bytes at the same location:

Bordered by two null bytes, we see 8 bytes which, once unpacked, correspond to the integer 100017903057843, which is the old value in data.txt plus one.
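This arithmetic is easy to check. The short Python sketch below assumes that the 8 bytes are a little-endian IEEE 754 double (the natural layout on x86); it decodes the sequence quoted in step 1 and packs the incremented value again. The variable names are illustrative only.

```python
import struct

stored = bytes.fromhex("80 EC 4E E5 CE BD D6 42")  # data.txt before the download
(old_value,) = struct.unpack("<d", stored)         # assumption: little-endian double
new_bytes = struct.pack("<d", old_value + 1)       # fingerprint of the new transaction

print(int(old_value))   # 100017903057842
print(new_bytes.hex())  # c0ec4ee5cebdd642
```

Under this packing, two consecutive transactions produce fingerprints that differ only in their lowest bytes.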

3) When I later used the DIF scanning interface to check Watermarked_Song.mp3, Nabster found these 8 bytes in the file, and then searched for this value in user.dat.

The corresponding entry was found and the details of the transaction revealed.
Now let's modify any of these 8 bytes, upload the file again and check it:


 The fingerprint is no longer detected.

DIF performance

With a good watermarking scheme, the information added to a file should be imperceptible to normal users, and to achieve this goal many watermarking methods use perceptual masking algorithms. The DIF system does not use any such algorithm to lower the impact of the fingerprint insertion on the data; instead it relies entirely on the file formats: it keeps the changes very local in the file, and hopes that the modifications will also be very local in its content, and thus imperceptible. I ran a few tests on mp3, avi, jpg and pdf files, and it does seem that most of the time the content does not appear corrupted by the DIF fingerprint.

From the security point of view, however, the DIF fingerprint is very weak. First, because the insertion algorithm is static, the fingerprint of a given file will always be located at the same place. This means that a simple comparison between two versions of the same file downloaded by two different people will reveal the watermark. The fingerprint is also very easy to detect even without knowing its value, because it has to fulfil several requirements:

  • on its position: the 8 bytes have to be located in the file between two identical bytes or just after a 0x0A byte, and before any 10 identical consecutive bytes.
  • on its value: these 8 bytes represent a double-precision floating-point number, but to be a valid fingerprint this number has to be an integer. Besides, if the attacker has downloaded two files from the same site, the fingerprints will actually represent incrementing integers, and can therefore be trivially identified.
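Both observations translate directly into code. The Python sketch below is a rough illustration, not a complete detector: the function names are hypothetical, the bytes are assumed to be little-endian doubles, the "just after a 0x0A byte" case is ignored, and on real files the scan may return a few false positives that still have to be ruled out by hand.

```python
import struct

def differing_offsets(copy_a: bytes, copy_b: bytes) -> list:
    """Offsets at which two downloads of the same file differ."""
    return [i for i, (a, b) in enumerate(zip(copy_a, copy_b)) if a != b]

def candidate_fingerprints(data: bytes) -> list:
    """Return (offset, value) for 8-byte windows that look like a DIF fingerprint."""
    hits = []
    for i in range(len(data) - 9):
        if data[i] != data[i + 9]:                        # border bytes must match
            continue
        (value,) = struct.unpack_from("<d", data, i + 1)  # assumption: little-endian
        # Keep positive, exactly representable integers; NaN and infinity
        # fail the range test and are skipped.
        if 1.0 <= value < 2.0 ** 53 and value.is_integer():
            hits.append((i + 1, int(value)))
    return hits

# Hypothetical sample: a fingerprint sitting between null bytes.
sample = b"\x11\x22\x00" + struct.pack("<d", 100017903057843.0) + b"\x00\x00\x00"
print(candidate_fingerprints(sample))  # [(3, 100017903057843)]
```

With two downloads of the same file, differing_offsets alone is enough: every offset it returns lies inside the 8 fingerprint bytes.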

Finally, not only is the DIF fingerprint easy to locate, but once found it is trivial to destroy with a single bit change (for instance in one of the border bytes). Furthermore, we can even recover the original master file, since we know that the 8 replaced bytes had to be the same as the previous and next ones before being replaced. We could even imagine that a malicious pirate could replace the value of the fingerprint in the file he downloaded with another (smaller) value, thus possibly getting another person accused for the leak.
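As a minimal sketch of these two attacks in Python, assuming the offset of the fingerprint has already been found (for instance by comparing two downloads), the functions below destroy the mark with a single bit flip or rebuild the original bytes from the border byte. Both function names are hypothetical.

```python
def break_fingerprint(data: bytes, offset: int) -> bytes:
    """Flip a single bit in the first fingerprint byte so the lookup fails."""
    corrupted = bytearray(data)
    corrupted[offset] ^= 0x01
    return bytes(corrupted)

def restore_original(data: bytes, offset: int) -> bytes:
    """Replace the 8 fingerprint bytes at `offset` with the surrounding border byte."""
    border = data[offset - 1]
    if data[offset + 8] != border:
        raise ValueError("border bytes differ, this does not look like a DIF fingerprint")
    return data[:offset] + bytes([border]) * 8 + data[offset + 8:]

# Hypothetical watermarked sample: the fingerprint starts at offset 4.
marked = b"HDR\x00" + bytes.fromhex("c0ec4ee5cebdd642") + b"\x00\x00\x00payload"
print(restore_original(marked, 4) == b"HDR" + b"\x00" * 12 + b"payload")  # True
```

The restoration works because the insertion always overwrites the inside of a run of identical bytes, so the border byte tells us exactly what was there before.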

Conclusion

The DIF watermarking algorithm offers average imperceptibility but very weak security: given any fingerprinted file, it is very easy to locate and destroy the watermark, and to recover the exact original file. The PAN Network seems mainly committed to helping independent artists distribute their work, which I can only respect and encourage. However, watermarking and copy protection are real jobs which should be handled by professionals, and giving artists a false sense of security with a broken product does not help them.

Secure distribution of digital content is an incredibly complicated task, not only from the technical point of view, but also because it seems nearly impossible to satisfy distributors, artists and customers all at once. And while alphabet-soup lobbies (RIAA, MPAA) desperately try to fight P2P networks with legal or sabotage actions, like panicked dinosaurs which have just seen a comet, others try to find smarter solutions. Instead of trying to close this Pandora's box, these organisations would do better to focus on offering more attractive products and services to their clients because, as history has proven several times, no corporation can compete with the distributed intelligence of many Internet users.

These last months have been very active in the DRM/P2P world, and several new products are expected soon from Macromedia and SunnComm. Let's keep our eyes open and get ready for new surprises.

This article was written by Michael Spath (spath@cdfreaks.com), a long-time contributor to our website and moderator of our advanced optical storage discussion forum.
