Search this website for information about collecting stocks and bonds.
Coxrail database explained
The purpose of this page. I have been approached several times to advise on the creation of a guide to certificates similar to mine. Since my guide depends completely on the compilation of information, I always explain how experience demanded that I keep information in a large database. Since the subject of database construction would make no sense without a purpose, I am going to assume that readers only come to this page when wanting to learn more about my experiences.
What is a "database?" In the broadest sense, a database is nothing more than a collection of electronic information that is organized in such a fashion that its data can be retrieved, managed and updated. A database can be likened to an electronic filing cabinet.
Does Microsoft Excel qualify a database? According to the broad definition above, it certainly can be classified as a database, especially now that the latest version of Excel can accommodate 1,048,576 rows! Excel holds information in columns and rows and its spreadsheet structure can be searched for information. However, it is crucial to understand that Excel was designed as a robust type of calculator that can display results in the form of spreadsheets. It is not designed for information retrieval. While Excel has grown greatly more powerful and flexible with time, it remains seriously limited by comparison to true database programs.
Flat databases share a strong resemblance to Excel's spreadsheet appearance. A flat database has only two dimensions. Whereas spreadsheets hold data in "rows" and "columns", databases hold data in "tables" composed of "fields" and "records". While this might sound like a distinction without a difference, the resemblance of database tables to spreadsheets is quite deceiving. While spreadsheets are highly organized, database tables can be imagined as large sacks of unorganized data. It is only when users retrieve information that form is given to that information. If a user asks for only ten records and five fields, the database program gives him that and nothing more. Conversely, spreadsheet programs must hold all information in memory and find specific records only through sorting. Spreadsheets make calculations and sorting easy, but information retrieval becomes increasingly difficult with larger and larger numbers of contained rows and columns.
Relational databases, however, are dramatically more powerful than either spreadsheet programs or flat databases. Relational databases store data in multiple tables that relate to each other through one or more common fields. While spreadsheet programs can access data across multiple spreadsheets, database programs are highly optimized for super-fast information retrieval. For instance, it is a trivial matter to discover where an image might be found for a specific serial number of a specific certificate variety, regardless of whether images of that variety have appeared in one reference or a thousand.
Microsoft Access. After some experimentation and research, I chose Microsoft Access as my database of choice. It is both highly flexible, powerful and well-suited for my needs of storing information about collectible certificates. Prior to my decision to learn Access, I had stored data in the Lotus 1-2-3 program, the #1 pioneer spreadsheet program released in 1983. Although I had used Lotus from its inception and was highly proficient, I realized its limitations. I ultimately moved my information to Nutshell, a flat file database sometime around 1989. A couple years later, I migrated to a relational version of Nutshell called Nutshell-Plus. Desiring even more flexibility and programming power, I moved within a few months to Microsoft Access, probably around 1992.
A taxonomic approach. I figure I have spent between 30,000 and 35,000 hours compiling information about certificates plus even more on a couple other specialties. Early in that period, it became obvious that information needed to be organized with a taxonomic approach similar to methods used for identifying and classifying animals and plants.
The basic idea of those systems is to start with one, plainly obvious feature and ask the user to choose between a few (ideally, only two or three) less obvious features. For instance, tree classification systems usually depend upon examining leaves. Those systems first ask users to decide whether leaves are needle-like, scale-like, broad or clustered.
From that point, classification systems offer choices between several progressively finer sub-divisions, each time offering users narrow ranges of choices. In well-defined systems, users should be able to identify exact species after making only eight or ten decisions. Biological classification systems are rarely perfect because of the natural variability of individuals. With living things, there is always the possibility of cross-breeding and the discovery of new species.
That is the exact circumstance we experience with collectible certificates.
Classifying certificates is a very organic activity. We can usually make the first distinction very quickly by using company names. After that, we can decide whether certificates can be defined as stocks, bonds or other security documents. From that point, many certificates offer obvious distinctions based on vignettes. After that, we might need to make a few more finer distinctions. And of course, it is apparent that certificates, like living things, often display individual variations such as serial numbers, types of issuances, cancellations and autographs.
Balancing usability and data organization. Like other taxonomic classification systems, I wanted users to be able to identify varieties as quickly as possible. Unfortunately, companies are living, breathing, evolving and dying entities, just like plants and animals. That means that as much as we might desire, companies rarely chose the same approach when they created security documents. Consequently, features of certificates never fit cleanly into inflexible categories. My organization, therefore, had to embrace flexibility.
The basic structure. As mentioned above, I use a relational database system. Information with similar purpose is grouped into several separate tables and the tables are linked, one to another, by single fields. In the Coxrail database, I use these eight tables to describe and organize certificates:
These eight tables constitute the backbone of all information offered to readers of the catalog and users of the online database. However, behind the scenes are numerous additional tables that record tremendous amounts of ancillary information. Those tables are used to record a host of other subjects surrounding railroads and certificate collecting in general.
Website users will encounter some of that additional information in the form of special note pages that accompany the majority of company listings. Other tables are used for private purposes and research in conjunction with the main purpose of listing and evaluating certificates. I advise anyone using my project as a model for their own cataloging efforts to consider using at least some of these concepts in their databases. Note that all these tables are linked to the crucial certificate tables in some manner. Additional tables of supporting information include:
I suggest anyone considering undertaking a catalog project spend quite a bit of time in the planning process. Understand that as soon as you start your project, you will encounter people willing to contribute information. Or more accurately, those people will find you. That's a good thing. Just make sure that you set up methods of recording all the information they will send. The more information you record up front, the less you will need to go back and re-visit your original sources.
Unless absolutely necessary, avoid abbreviating information. If you do abbreviate, make sure you are consistent and record your acceptable abbreviations somewhere.
Make sure you plan on expansion. I almost guarantee that many of you will consider a spreadsheet an acceptable method of recording information in the beginning. The problem is that regardless of your initial planning, your project will grow in ways you never imagined. The more information you record, and the more directions you are pulled, the sooner you will realize that spreadsheets are woefully inadequate for the task.
You will notice from my introduction that I started with a Lotus spreadsheet. In my consulting business, I have encountered the same thinking (that spreadsheets are adequate for data storage) time and again. Please, use my experience. Start with a database in the beginning and you will be happier overall. Yes, databases are somewhat hard to understand and there is a steep learning curve. No doubt about it. But, nonetheless, the effort is worth it in the long term. Remember, you are going to be doing this for years to come and, ideally, your project will outlast you.
What if you choose a specialty that is too small? In a very small cataloging project, whether involving certificates or something else, a spreadsheet system might very well prove to be the appropriate method of data storage. However, the smaller the size of the project, the more important it is to consider whether the effort is worth the trouble. Why not use your drive and precious time for a larger specialty? For something that will help more people?
I faced that exact problem with coal company certificates. There may have been thousands upon thousands of coal companies, but relatively few certificates exist. And the number of dedicated collectors in that specialty is very limited. However, because I was once a coal geologist, I was drawn to the collecting specialty. Being a pragmatist, I accept that the specialty is too small for a separate effort.
However, with the large railroad certificate database already in progress for twenty-plus years, I realized a simple solution. All I needed to do was add one new field to the CompanyNames table mentioned above. It merely needed to hold eight characters. In that field, I put a code for coal companies ("COA") or rail companies ("RRY".) In only a few cases (50 so far), I must use the code "RRY, COA". Those few overlap companies use the same company numbers for both specialties (for example, DAY-190, Dayton Coal Iron & Railway Co.) Because I am using the "specialty" code field to discriminate between the two company types, I don't need to worry about potential duplication of company numbers. In fact, the "specialty " code freely allows duplicate certificate variety numbers with no problems whatsoever.
Will I ever add another specialty? I am afraid I have too few years left. At the moment, I see no reason to increase my effort further. However, it begs the question, "Could a database like mine handle multiple specialties in the detail I currently use for railroad and coal certificates?"
With the warning that more thought would need to go into initial database design, I don't see any valid reason that a database could not handle tens or even hundreds of specialties. Of course, at some point, even the Access database would probably need to be replaced with a more robust database program such as SQLServer or Oracle. I have already built two Access databases holding over a couple million records each and have experienced no performance problems at those sizes. Nonetheless, I suspect the Oracle/SQLServer options would be a better choice if considering enlarging my project multi-fold.
What about the online database? The online version of the Coxrail database is a subset of my main database. While I upload a new version of my Access database to my web host every two weeks, web users actually use SQLServer to retrieve information. SQLServer is a program that works well in a multi-user environment, whereas Access stumbles under the load of only a few concurrent users. Because I am coding my web pages with Classic ASP, I don't need to worry about what is really happening between web users and my Coxrail database.
Will you need to learn programming if you want to offer your database online? Short answer: "Yes."
Still, I wouldn't worry about that right now. By the time you build your own database, create custom data entry forms and structure your own database queries, you will know enough to build a database-driven website. When you reach that point, it won't matter whether you choose to construct your website with Classic ASP, ASP.NET, PHP or something else; you will be able to tackle online databases with minimal new effort.
Send an email message with corrections or comments about this page.
(Last updated Jan 16, 2016)
I strongly recommend buying the Cox Catalog from your favorite
If they do not yet carry, or are out of stock, you may buy directly from the author.
Help support this free site! Please visit my eBay store called Papermental by Terry Cox. My inventory includes railroad passes, railroad ephemera, newspapers, magazines, engravings, and all sorts of paper collectibles.
I suggest using WeTransfer or similar file transfer sites when sending large files or large numbers of files.
PLEASE contact the many fine dealers listed on my dealers page to buy certificates.