Collectible Stocks and Bonds from North American Railroads, by Terry Cox

A guidebook and catalog of prices
(I neither buy nor sell stocks and bonds)
File formats  

View of the "Pros"

Every imaging book I've read advises saving images in TIF format. Writers say that because they know professionals tweak and adjust every image they touch, and they may touch them many times. TIF is a widely used lossless format, and professionals are interested in preserving every tiny bit of detail and color. Writers of imaging books tend to be uninterested in file sizes. I presume pros are equally uninterested, probably because they may not need to retain files after their images reach their final destinations. Moreover, if pros do retain images for future use, they may not face the same economic constraints as collectors when they need long-term storage.

I respect their views, but I am less convinced TIFs are always the best choice for average collectors. I suspect average collectors intend to keep their images forever and will probably never adjust them in the future. Storing 500 typical stock certificates as TIFs consumes about 8.5 Gigabytes of space. That is not a lot of space by today's standards. However, storing them as JPGs at the highest possible quality requires only 1.5 Gigabytes. This is small enough for average collectors to carry their entire collection on flash drives or their smartphones. It is also sufficiently small to allow efficient online storage using services such as Carbonite, CrashPlan, iDrive and others. (I sincerely recommend that home computer users use online storage to protect their data.)

File size is important when saving images over long periods, but file size is an even larger concern during manipulation, especially on average home computers. Large images often require more RAM (random access memory) than many computers are capable of providing. The RAM in many computers may be too limited to even open very large images. If performance drags, it may be necessary to replace the motherboard, the CPU (the main processor) and add new memory chips. Generally it is wiser to simply buy a new, more powerful computer. This can be an expensive solution for handling unnecessarily large scanned images.

Terminology

No format is perfect for all uses. Each file format supports features that make it good or bad for specific uses. I will compare formats in a table below, but first I need to explain those features so you will know what I am talking about.

Compression

Images usually contain large amounts of repetitive information. Typical certificates, for instance, contain large areas of plain paper not covered by either color or printing. Compression is a way of saving file space by representing repetitive information with special codes. You can imagine image compression being like saying, "samples 887 through 921 in row 74 are all light purple."
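The "samples 887 through 921" idea is essentially run-length encoding, the simplest kind of lossless compression. The sketch below is only an illustration of that idea; the function names and the sample row are made up for this example, and real formats use more elaborate schemes.

```python
from itertools import groupby

def run_length_encode(row):
    """Collapse each run of identical samples into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(row)]

def run_length_decode(pairs):
    """Rebuild the original row from (value, count) pairs."""
    row = []
    for value, count in pairs:
        row.extend([value] * count)
    return row

# Hypothetical scan row: plain paper, a 35-sample purple stripe, plain paper.
row = ["white"] * 6 + ["purple"] * 35 + ["white"] * 4
encoded = run_length_encode(row)
print(encoded)  # [('white', 6), ('purple', 35), ('white', 4)]

# Lossless: decoding gives back exactly what went in.
assert run_length_decode(encoded) == row
```

Forty-five samples shrink to three pairs, and nothing is lost. A row where every sample differs from its neighbor would not shrink at all, which is why busy photographs compress less well than plain paper.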

Different file formats use different compression routines, although they fall into two categories. Lossless file formats faithfully keep all information during compression; lossy formats judiciously discard some information in order to control file size. Every computer program that uses those formats must contain matching "decompression" routines in order to decode files for display and use.

Browser display

A browser is a program that decodes information sent over the internet. Currently, the most popular browsers are (in order) Microsoft Internet Explorer, Mozilla Firefox, Google Chrome, Apple Safari, and Opera. Browsers designed for mobile devices now constitute about 5% of the browser market. Browser programs are constantly being updated, with later versions always exceeding previous versions in their capabilities. Huge numbers of old versions remain in use a decade after initial release.

Bit depth

Bit depth is a measurement of how accurately color or shades of gray can be stored. (See my discussion about Understanding Bits.) It represents how much digital information is used to describe color or shades of gray for every sample measured by a scanner. 1-bit accuracy can record only black or white. 8-bit accuracy can record 256 shades of gray or color. 24-bit accuracy (three color channels using 8 bits each) can record 16.8 million different colors. 48-bit accuracy (three color channels using 16 bits each) can store about 281 trillion different colors.
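The number of values a given bit depth can record is simply 2 raised to the number of bits. A short sketch of that arithmetic (the function name is just for illustration):

```python
def distinct_values(bit_depth):
    """Number of different values that can be stored in bit_depth bits."""
    return 2 ** bit_depth

for bits in (1, 8, 24, 48):
    print(f"{bits:>2}-bit: {distinct_values(bits):,} possible values")

assert distinct_values(8) == 256
assert distinct_values(24) == 16_777_216           # about 16.8 million
assert distinct_values(48) == 281_474_976_710_656  # about 281 trillion
```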

Color model

Entry-level scanners commonly save color information in two ways. 'RGB' (red, green, blue) is the most common color model. Each color (or channel) is saved using 8 bits of memory. Therefore, each color can be described by 256 shades. By combining 256 possible shades of red, green and blue, a total of 24 bits of memory is used for every spot sampled and can record up to 16.8 million colors. Conveniently, color monitors can represent an equal number of colors using red, green and blue light.

A simpler, but less accurate method of recording color is to use a color table. Essentially, a computer program samples images and calculates an ideal selection of 256 colors out of the 16.8 million possible. Each of these 256 colors is given an index number. Next, every sampled spot on the image is assigned one of those 256 numbers based on the closest indexed color. Using a mere 256 colors, each sampled spot requires only 8 bits of memory and so file sizes are much reduced. Of course, color accuracy is greatly decreased.
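The "closest indexed color" step can be sketched as follows. This is a minimal illustration, not how any particular program does it: the four-color palette and the pixel values are invented, and the harder job of choosing a good palette is not shown.

```python
def nearest_index(pixel, palette):
    """Return the index of the palette color closest to pixel.

    Closeness here is the squared distance between the (R, G, B) values,
    which is one simple choice among several.
    """
    return min(
        range(len(palette)),
        key=lambda i: sum((p - q) ** 2 for p, q in zip(pixel, palette[i])),
    )

# Hypothetical color table (a real one would hold up to 256 entries).
palette = [(255, 255, 255), (0, 0, 0), (150, 110, 200), (30, 90, 40)]

# Hypothetical sampled spots: off-white paper, purple ink, near-black ink.
pixels = [(250, 248, 240), (140, 100, 190), (10, 12, 8)]

indexed = [nearest_index(p, palette) for p in pixels]
print(indexed)  # [0, 2, 1]
```

Each spot is now stored as one small index number instead of three 8-bit channel values, which is where the space saving comes from. The off-white paper has become pure white, which is where the loss of accuracy comes from.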

While monitors use red, green and blue light to show images, printers use pigmented dyes and inks. Printing with red, green and blue ink would create a sludge-colored mess. In order to create colored images on paper, printers use cyan, magenta and yellow (CMY) pigments. Printed together, CMY pigments have the ability to recreate the same colors as RGB light. This clever phenomenon works because cyan is the spectral opposite of red, magenta the opposite of green, and yellow the opposite of blue. Professional offset printers and many ink jet printers add black to the mix because cyan, magenta and yellow inks rarely create nice, bold blacks. Black is abbreviated as 'K' and the full color model becomes CMYK. Many better scanners allow images to be directly saved using CMY or CMYK color models.
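The "spectral opposite" relationship can be shown with simple subtraction. The sketch below is a naive textbook conversion on 8-bit values, offered only to illustrate the idea; real scanners and printers rely on color profiles, and the function names are made up for this example.

```python
def rgb_to_cmy(r, g, b):
    """Each ink amount is the complement of one light channel (0-255 scale)."""
    return (255 - r, 255 - g, 255 - b)

def cmy_to_cmyk(c, m, y):
    """Move the part shared by all three inks into a separate black channel."""
    k = min(c, m, y)
    return (c - k, m - k, y - k, k)

# Pure red light needs no cyan ink, only magenta and yellow.
print(rgb_to_cmy(255, 0, 0))                # (0, 255, 255)

# Black (no light at all) would need full cyan, magenta and yellow...
print(rgb_to_cmy(0, 0, 0))                  # (255, 255, 255)

# ...so the shared amount is replaced by black ink alone.
print(cmy_to_cmyk(*rgb_to_cmy(0, 0, 0)))    # (0, 0, 0, 255)
```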

LAB is a color model used in higher-priced image editing programs, but is preserved in its native form in only one major file format. 'L' stands for luminance and 'A' and 'B' are two other color channels. (Channel A ranges from green to red and B ranges from blue to yellow.) Editing in the LAB color space takes much experience. I've never encountered a scanner that saves images in the LAB color model.

Transparency

When displaying images on color monitors, it is often desirable to make parts of images transparent. Some formats allow one color to be turned transparent and others use a special transparency channel.

Interlacing

As the internet gained popularity through the mid- and late-1990s, web site developers realized that users really hated waiting for images to download. If images took too long to appear when connecting to the web over ordinary phone lines, users would simply give up and move to different sites. In response, image formats were modified in order for web sites to display partial images very quickly and then gradually improve those images as more information flowed across the wires. Gradual image appearance was achieved by interlacing digital information as it was stored in image files. Interlacing was an absolute necessity with old 56K modems. Curiously, today's various browser programs do NOT always recognize interlacing. Today's broadband cable and DSL connections have pretty much eliminated the need for interlacing.
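As one concrete illustration, the GIF format interlaces by storing rows in four passes: every eighth row starting at the top, then every eighth row starting at row 4, then every fourth row starting at row 2, and finally all the remaining odd rows. The sketch below only computes that row order; the function name is made up, and other formats use different schemes.

```python
def gif_interlace_order(height):
    """Return the order in which an interlaced GIF stores its rows."""
    passes = [(0, 8), (4, 8), (2, 4), (1, 2)]  # (first row, step) per pass
    order = []
    for first_row, step in passes:
        order.extend(range(first_row, height, step))
    return order

order = gif_interlace_order(16)
print(order)
# [0, 8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]

# Every row is stored exactly once; only the order changes.
assert sorted(order) == list(range(16))
```

After the first pass a browser has only one row in eight, but those rows are spread over the whole image, so it can show a coarse preview and fill in the gaps as the later passes arrive.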

Metadata

Modern digital devices can record large amounts of information in addition to images. Therefore, most file formats have been modified to record extraneous information such as creator's name, date, time of day, device settings, originating device type, even image location. This data is called metadata.

Format specifics

Most home scanners allow saving images in several formats. Image manipulation programs (Photoshop, etc.) normally allow conversion to additional formats, some of which are specialty formats or formats that are no longer used. Be aware that not all programs use all those formats nor all the features of those formats. You need to use formats appropriate to your target needs and you must realize that popularity and acceptance of formats changes with time.

Format          | TIF                                | JPG             | JPG 2000          | GIF                    | PNG
Compression     | Lossless                           | Lossy           | Lossless or lossy | Lossless               | Lossless
Browser display | No                                 | Yes             | Varies            | Yes                    | Varies
Bit depths      | 1, 2, 3, 4, 5, 6, 7, 8, 24, 32, 48 | 8, 24, 32       | 8, 24, 32         | 1, 2, 3, 4, 5, 6, 7, 8 | 1, 2, 3, 4, 5, 6, 7, 8, 16, 24, 32, 48
Color model     | Gray, indexed, RGB, CMYK           | Gray, RGB, CMYK | Gray, CMYK, LAB   | Indexed                | Gray, indexed, RGB
Transparency    | Alpha channel                      | No              | Alpha channel     | One color              | One color or alpha channel
Interlacing     | No                                 | Yes             | Yes               | Yes                    | Yes
Metadata        | Yes                                | Yes             | Yes               | Yes                    | Yes

Format features

TIF (TIFF) = Tagged Image File Format

The TIF format is the most widely used image format among professionals. The format has either minimal or no limits to pixel size, resolution, or bit depth. TIFs can be saved either in non-compressed form or compressed using LZW, ZIP or JPEG compression. Some programs have extended the TIF standard to accept transparency and multiple layers, although many programs ignore those features during use. The format is widely touted as being 'lossless.' I prefer to say that the save, re-open process is lossless. All images deteriorate (or at least change) when they are rotated, re-sized, re-sampled, and distorted. TIF images are not used on the web.

JPG (JPEG) = Joint Photographic Experts Group

JPGs (pronounced 'j-peg') are the most prominent file format used on the web and among non-professionals. The format is superb at saving space. The amount of compression is easily configurable. Professionals advise against saving in JPG format until after all image manipulation and color adjustment is finished. They say this because the format is 'lossy' and images lose some detail every time they are saved. Quality loss depends on the quality of imaging programs, the number of saves, resolution, and the degree of changes made during each open-save cycle.

JPG 2000

The JPG 2000 format is an improvement upon the JPG standard with added lossless compression and alpha channel transparency capabilities. Unfortunately, even eleven years after development, support for the format is still spotty, both among browsers and among image manipulation programs.

GIF = Graphics Interchange Format

When developed in the late 1980s, GIF images were pretty much restricted to the Compuserve system. Since then, GIF has become one of the de facto standard formats for use on the World Wide Web. Because the format relies on indexed colors, it is not recommended for storing high-resolution color. GIFs work very well for graphics that display areas of solid color; GIFs work less well or poorly on photographs displaying wide ranges of shades. The developer of the format claimed that GIF was meant to be pronounced 'Jif', like the peanut butter. Practically everyone I encounter pronounces 'GIF' like 'gift' without the final 't'.

PNG = Portable Network Graphics

The PNG format was developed for web graphics. It was specifically designed to emulate or improve upon features of the GIF format while avoiding patent issues with the LZW compression routine used in GIFs. It is a well-designed, full-featured format decoded by all major web browsers. Unfortunately, support for all PNG features is spotty. When using PNGs for web display, be sure to test all browsers your audience is likely to use. Developers indicated that PNG should be pronounced 'ping.' I hate to irritate anyone, but I almost always hear it pronounced 'puh-nung'.

BMP = Bitmap Image File

This format is a patent-free file standard developed by Microsoft and used by many Windows and OS/2-based operating systems. While the format can be effectively compressed by typical 'Zip' programs, native BMP files tend to be much too large for easy transmittal over slow web connections. While the format can be opened, manipulated and saved by a large number of Windows-based image manipulation programs, it does not seem superior to the widely-accepted TIF format. The BMP format never caught on among professionals and seems destined to remain that way.

PCX = Personal Computer eXchange

This format was developed by the ZSoft Corporation (now defunct) and was the native format for the PC Paintbrush program. A decade or more ago, it was a heavily-used DOS format but has since been largely superseded by TIF, JPG and GIF standards. The format is very 'fluffy', meaning file sizes are overly large.

Relative sizes

Each format stores digital information differently, so file sizes can vary dramatically from format to format. As mentioned above, file sizes affect the amount of hard drive space they consume for longtime storage, but they also affect how well programs display and manipulate images. Programs running on older computers may be slow or unable to open very large images.

Here are the file sizes I recorded by scanning this typical stock certificate. Because every scanner and every program creates files of different sizes, absolute measurements are not the point. Instead, I stress it is important to recognize relative differences in file sizes. You WILL achieve different file sizes with your equipment, but the relative differences in sizes should be similar.

Format            | File size (megabytes) | Compression
PCX               | 17.293                | none
TIF               | 11.223                | none
BMP               | 11.218                | none
TIF               | 10.270                | LZW
GIF               | 7.982                 | GIF
lossless JPG 2000 | 7.125                 | 0 compression (100% quality)
lossy JPG 2000    | 6.171                 | 0 compression (100% quality)
JPG               | 3.611                 | 0 compression (100% quality)
JPG               | 0.987                 | 20 compression (80% quality)
JPG               | 0.626                 | 40 compression (60% quality)
JPG               | 0.460                 | 60 compression (40% quality)
JPG               | 0.285                 | 80 compression (20% quality)
JPG               | 0.081                 | 100 compression (0% quality)

Considerations

Modern scanners give you options to save images in several formats. Which format you choose depends on:

  • What you want to use your image for, both immediately and in the future.
  • Your file storage constraints.

See What is your purpose for scanning? for more discussion.

(Last updated July 11, 2011)
 

 