Xerox copiers found to rewrite documents without OCR

We’ve just posted the following news: Xerox copiers found to rewrite documents without OCR

A German computer scientist has discovered that several Xerox copiers randomly alter figures in his scanned documents, even though he explicitly chose not to use OCR. The substituted digits are clearly readable, yet clearly different from those in the original document.


I hope my bank is somehow using these machines and it increases the amount of money I have in it. Like adding 6 zeros to the end.

I saw this story and thought it was amazing.

The implications of this are incredible, particularly for businesses copying invoices or banks photocopying account details.


JBIG2 was never meant for text scanning, only for pictures. It analyzes the scan and, wherever it finds regions that look alike, stores one of them and reuses it for the rest. That’s the lossy part of the compression.

Say you have a nice picture with 20 near-identical brown items in it: it will remove 19 and replace each with a copy of the first one, greatly reducing the file size. Only if you examine the picture carefully will you notice the strange artefacts.

Sadly, it’s not for text, since it will sometimes not see the difference between a 6 and an 8 (amongst other things), giving lots of errors.

All it’s doing is recognizing “similar” patches of the image and coalescing them, which is what it’s supposed to do, according to the standard. Yes, it’s too aggressive for text.
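To make the “coalescing similar patches” idea concrete, here is a toy sketch in Python. It is not the real JBIG2 codec: the 5x7 glyph bitmaps, the pixel-difference (Hamming) metric and the threshold values are all assumptions picked for illustration. It only shows how a loose similarity threshold can make a 6 and an 8 share one stored symbol.

```python
# Toy illustration of lossy "pattern matching and substitution".
# NOT the real JBIG2 codec: the glyphs, the Hamming-distance metric
# and the thresholds below are assumptions made for this example.

GLYPH_6 = (
    ".###.",
    "#....",
    "#....",
    "####.",
    "#...#",
    "#...#",
    ".###.",
)
GLYPH_8 = (
    ".###.",
    "#...#",
    "#...#",
    ".###.",
    "#...#",
    "#...#",
    ".###.",
)


def hamming(a, b):
    """Count the pixels that differ between two same-sized glyph bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def coalesce(glyphs, max_diff):
    """Map each glyph to the first stored symbol within max_diff pixels.

    Returns (dictionary, indices): the stored symbols, and for every input
    glyph the index of the symbol that would be drawn in its place.
    """
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, symbol in enumerate(dictionary):
            if hamming(glyph, symbol) <= max_diff:
                indices.append(i)       # reuse an existing symbol (lossy)
                break
        else:
            dictionary.append(glyph)    # no close match: store a new symbol
            indices.append(len(dictionary) - 1)
    return dictionary, indices


page = [GLYPH_6, GLYPH_8, GLYPH_6, GLYPH_8]

_, strict = coalesce(page, max_diff=0)
_, loose = coalesce(page, max_diff=3)
print(hamming(GLYPH_6, GLYPH_8))  # 3 of 35 pixels differ
print(strict)                     # [0, 1, 0, 1]  digits kept apart
print(loose)                      # [0, 0, 0, 0]  every 8 is drawn as a 6
```

With these made-up glyphs only 3 pixels separate the two digits, so the smaller the characters are on the scan, the easier it is for a threshold like this to merge them.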

Xerox already responded:

The problem stems from a combination of compression level and resolution setting. The devices mentioned are shipped from the factory with a compression level and resolution that produces scanned files which are optimized for viewing or printing while maintaining a reasonable file size. We do not normally see a character substitution issue with the factory default settings however, the defect may be seen at lower quality and resolution settings.

The Xerox design utilizes the recognized industry standard JBIG2 compressor which creates extremely small file sizes with good image quality, but with inherent tradeoffs under low resolution and quality settings.

For data integrity purposes, we recommend the use of the factory defaults with a quality level set to “higher.” In cases where lower quality/higher compression is desired for smaller file sizes, we provide the following message to our customers next to the quality settings within the device web user interface: “The normal quality option produces small file sizes by using advanced compression techniques. Image quality is generally acceptable, however, text quality degradation and character substitution errors may occur with some originals.”

I remember running across even ‘fine’ settings that had small 4- and 6-point-font numbers altered by a mere fleck of errant toner-dust. A Zero would appear as an 8.

Alas, none of those zeroes-replaced-as-8’s were padding my accounts by kajillions. Too bad. I’d have never mentioned it, you can bet. “14,000,000 instead of 14,888,888” wouldn’t have been THAT big of a deal, after all, right?!! And with a wave of a hand (er, smudge by a finger), the copies were re-run and someone blew across the pages to retain the Zero Values. Drat. Failed again!

It sounds like all those rubber “CERTIFIED TRUE COPY” stamps now need to also proclaim “CERTIFIED TRUE COPY NOT USING JBIG2”.

Why a copier needs to scan a document into a lossy format is beyond me. Does it store the data to be stolen, er, sent to the manufacturer to spy on users, er, “improve” the product?

TSJ, the only scenario I can think of is… you’re scanning a picture and your scanner only has X amount of memory in it. The full image has to be scanned - into some RAM or volatile memory - first, and then the Print occurs. Once the print is successful, then that memory is flushed.

If the picture scanned requires X+1 RAM capacity, what are you gonna do? “Compress the image.” Using JBIG2 could mean scanners get by with very little memory, but the algorithm will produce “11 blocks of Red #2” when there may actually be eleven blocks of slightly different Red shades.

But it’s obviously a shabby Text Scanner!
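To put rough numbers on the memory argument, here is a small Python sketch of how much RAM one uncompressed page would need. The page size (US Letter), the 300 dpi resolution and the bit depths are my assumptions for illustration, not figures from Xerox or from any particular scanner.

```python
def raw_page_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Bytes needed to hold one uncompressed scanned page in memory."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed example: an 8.5 x 11 inch page scanned at 300 dpi.
black_and_white = raw_page_bytes(8.5, 11, dpi=300, bits_per_pixel=1)
full_colour = raw_page_bytes(8.5, 11, dpi=300, bits_per_pixel=24)

print(black_and_white)  # 1051875 bytes, about 1 MiB
print(full_colour)      # 25245000 bytes, about 24 MiB
```

So under these assumptions a single colour page is roughly 24 times larger than a black-and-white one, which is the kind of gap that would push a device with little RAM toward compressing.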

It also changed this picture. Sorry, I just could not help myself.

[QUOTE=Mr. Belvedere;2695348]Xerox already responded:[/QUOTE]It doesn’t seem like you’d want the default settings to alter things. I would think the defaults should leave it alone. The user should have to initiate the potentially lossy settings.

Xerox is patching this bug: