Page Image Compression for Mass Digitization
In late 2006, Harvard University Library, the California Digital Library, the Internet Archive, and the Bibliothèque nationale de France jointly investigated the use of lossy JPEG 2000 (JP2) compression for mass digitization of texts. We documented our findings in the IS&T Archiving 2007 Conference Proceedings. Please consult the published paper, or this preprint, when using any of the following Harvard test suite images or reviewing the evaluation reports.
Harvard University Library Test Suite
| Page type | Page image IDs |
|---|---|
| Text pages | 003176581_0007, 003176581_0008, 003298279_0001, 006393844_0001 |
| b/w image | 006051784_0002 |
| Text + b/w illustration | 003298279_0004 |
| Color images | 002010967_0026, 002024214_0033, 006393844_0008 |
Production notes for the Harvard test suite

The digitized pages in this suite were selected to represent a segment (but not the full range) of page characteristics for volumes published in the 19th and 20th centuries. The suite contains nine book pages.
The IS&T paper details the peak signal-to-noise ratio (PSNR) and mean square error (MSE) functions we used in Aware and Kakadu, respectively, to optimize the human-perceived quality of text and illustrations in lossy-compressed images. The LuraWave codec provides quality settings ranging from a low of Q1 to a high of Q100. Consult the full paper for an explanation of our research questions, methodology, and findings.

May 2007
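For readers unfamiliar with the two measures named above, here is a minimal sketch of their textbook definitions: MSE is the average squared per-pixel difference between the original and the compressed image, and PSNR = 10 · log10(peak² / MSE) in decibels. It assumes 8-bit grayscale pixels (peak value 255) supplied as equal-sized lists of rows. The function names `mse` and `psnr` are illustrative only; this is not the Aware or Kakadu implementation, and it makes no claim about how those codecs apply the measures internally.

```python
import math
from typing import Sequence

# Assumption: an image is a sequence of rows of 8-bit grayscale values.
Image = Sequence[Sequence[int]]


def mse(original: Image, compressed: Image) -> float:
    """Mean square error: average of squared per-pixel differences."""
    if len(original) != len(compressed):
        raise ValueError("images must have the same number of rows")
    total = 0
    count = 0
    for row_a, row_b in zip(original, compressed):
        if len(row_a) != len(row_b):
            raise ValueError("images must have the same number of columns")
        for a, b in zip(row_a, row_b):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr(original: Image, compressed: Image, peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB; infinite when the images are identical."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)


if __name__ == "__main__":
    ref = [[100, 120], [140, 160]]
    lossy = [[102, 118], [140, 161]]
    print(mse(ref, lossy))             # 2.25
    print(round(psnr(ref, lossy), 2))  # 44.61
```

For a fixed peak value, PSNR decreases monotonically as MSE increases, so a quality target stated in one measure can be restated in the other.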