Researchers Develop Computer Techniques To Bring Blacked-Out Words To Light
Gemplus information security lab director David Naccache and Dublin City University computer science graduate student Claire Whelan developed software that identified blacked-out words in several confidential documents.
Their software analyzed an April Defense Department memorandum to the White House and determined that the blacked-out word in the sentence “An Egyptian Islamic Jihad (EIJ) operative told an xxxxxxxx service at the same time that Bin Ladin was planning to exploit the operative’s access to the US to mount a terrorist strike” was most likely “Egyptian.”
One program corrected a slight misalignment in the scanned document, caused by its placement on a copying machine, and a second program determined that the text was set in the Arial font. Naccache and Whelan then estimated the width of the blacked-out area in pixels and used a computer to calculate the pixel width of each dictionary word when rendered in Arial.
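The two measurements in this step can be sketched in a few lines: find the width of the redaction bar in a scan row, and compute how wide a word would be in the document's font. This is a minimal sketch under stated assumptions, not the researchers' code. The page is assumed to be already deskewed and binarized (1 = black pixel), and the per-character advance widths in `ADVANCE_PX` are hypothetical placeholders, not real Arial metrics; a real tool would read glyph advances from the font file at the document's point size and scan resolution.

```python
# Minimal sketch. ADVANCE_PX holds HYPOTHETICAL per-character widths in pixels,
# not real Arial metrics; a real tool would read glyph advances from the font file.
ADVANCE_PX: dict[str, int] = {
    "E": 9, "g": 7, "y": 6, "p": 7, "t": 4, "i": 3, "a": 7, "n": 7,
}
DEFAULT_ADVANCE_PX = 7  # hypothetical fallback for characters missing from the table


def rendered_width(word: str) -> int:
    """Approximate a word's width in pixels by summing per-character advances.

    Kerning is ignored here; a real text renderer would apply it.
    """
    return sum(ADVANCE_PX.get(ch, DEFAULT_ADVANCE_PX) for ch in word)


def longest_black_run(row: list[int]) -> int:
    """Return the length of the longest run of black (1) pixels in one scan row.

    Assumes the page was already deskewed and binarized, so the redaction
    bar lies horizontally along the row.
    """
    best = current = 0
    for px in row:
        current = current + 1 if px == 1 else 0
        best = max(best, current)
    return best


if __name__ == "__main__":
    # A synthetic scan row: white margin, a 51-pixel black bar, white margin.
    row = [0] * 12 + [1] * 51 + [0] * 12
    bar_px = longest_black_run(row)
    word_px = rendered_width("Egyptian")
    print(bar_px, word_px, abs(word_px - bar_px) <= 3)  # prints "51 50 True"
```

Summing advance widths is the simplest width model; it is enough to show why a tolerance of a few pixels is needed, since kerning and scan noise make exact matches unlikely.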
The program discarded every word whose rendered width was not within three pixels of the estimated width of the blacked-out word, leaving 1,530 candidates, and then applied semantic rules to narrow those down to seven. “Egyptian” was selected as the most likely candidate based on the context of the document.
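The narrowing process can be sketched as two passes: a width filter with a three-pixel tolerance, then a semantic filter and a context-based ranking. This is a sketch under stated assumptions, not the researchers' method. The abstract does not describe their semantic rules, so `NATIONALITIES` stands in as a hypothetical rule ("the word must be a nationality adjective that can modify *service*"), and ranking by how often a candidate appears in the surrounding text is a simple substitute for whatever contextual judgment they used. All pixel widths in `WIDTHS` are hypothetical.

```python
from collections.abc import Callable

# HYPOTHETICAL rendered widths in pixels for a tiny illustrative dictionary.
WIDTHS: dict[str, int] = {
    "Egyptian": 50, "Ugandan": 49, "Iranian": 44, "Israeli": 38,
    "airplane": 51, "umbrella": 53, "Mexican": 60,
}
# HYPOTHETICAL semantic rule: only nationality adjectives fit "an ___ service".
NATIONALITIES = {"Egyptian", "Ugandan", "Iranian", "Israeli", "Mexican"}


def filter_by_width(widths: dict[str, int], target_px: int,
                    tolerance_px: int = 3) -> list[str]:
    """Keep words whose rendered width is within tolerance_px of the bar width."""
    return [w for w, px in widths.items() if abs(px - target_px) <= tolerance_px]


def rank_candidates(words: list[str], is_plausible: Callable[[str], bool],
                    context: str) -> list[str]:
    """Drop semantically implausible words, then rank the rest by how often
    each appears in the surrounding text (most frequent first)."""
    survivors = [w for w in words if is_plausible(w)]
    return sorted(survivors, key=context.count, reverse=True)


if __name__ == "__main__":
    context = ("An Egyptian Islamic Jihad (EIJ) operative told an xxxxxxxx "
               "service at the same time that Bin Ladin was planning to exploit "
               "the operative's access to the US to mount a terrorist strike")
    by_width = filter_by_width(WIDTHS, target_px=51)
    print(by_width)  # ['Egyptian', 'Ugandan', 'airplane', 'umbrella']
    print(rank_candidates(by_width, NATIONALITIES.__contains__, context))
    # ['Egyptian', 'Ugandan']
```

The width filter is cheap and removes most of a dictionary, which is why the costlier semantic step only has to consider the survivors.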
After presenting the technique at the Eurocrypt security conference in Switzerland earlier this year, Naccache said that recovering blacked-out words could be made more difficult by using optical character recognition (OCR) technology to rescan documents and change their fonts.
Freedom of Information Act experts expressed concern that the government might use Naccache and Whelan’s technique as an excuse to censor documents even more heavily.
Abstracted by the National Law Enforcement and Corrections Technology Center (NLECTC) from the New York Times (05/10/04) P. C4; Markoff, John.