Content Addressable Memory
Introduction
When a requirement exists for searching large memory stores, conventional random access memory (RAM) becomes a major speed limitation. To avoid this limitation, machines that use multiple processors have been developed. While this development is considered to be a product of the late 1980s, the need for multiple or parallel architectures for this purpose was clearly foreseen early in the development of computers.
Content-addressable memory (CAM) is a storage device that stores data in a number of cells and can locate any piece of data based on cell content. CAMs are also referred to by other nomenclatures:
o Associative store (6)
o Data-addressed memory (7)
o Catalog memory (8)
o Multiple instantaneous response file (9)
o Parallel search file (10)
o Parallel search memory (11)
Designers and users of computer systems have long been aware of the fact that inclusion of content-addressable or “associative” functions in the storage and retrieval mechanism would allow a more effective and straightforward organization of data than with the usual addressed memories. The requirement is so obvious and the potential benefits so great that numerous memory circuits have been developed to implement parts of the CAM concept. Furthermore, the superiority of CAM for a large number of applications is so overwhelming that, despite the lack of such machines, a large amount of work has already been done to detail how such a machine would be used.
The method discussed here makes possible content based location of patterns or strings of arbitrary length in a block of memory of arbitrary size. This is accomplished by adding two parallel processing capabilities to standard RAM circuitry. The first sets a flag in memory at each instance of the search pattern. The second rapidly locates
the matches. Both of these stages are implemented without affecting the use of memory as conventional RAM.
The hardware required to implement the CAM function is a linear enlargement of standard RAM circuitry. This is significant because it will be cost effective for large memory size. All of the circuit concepts known to me, which have been implemented earlier, while functionally workable, required additional hardware proportional to an exponent of the memory size, thereby becoming cost prohibitive for large memory sizes. Current commercial implementations do not exceed 2048 search elements. The bottom line is that all proposed methods resulted in either unacceptably long settling times, caused by signal delays at many gates, or exponential growth of address generating circuitry. Quoting Kohonen (1978) (5):
“For this reason, hardware CAM’s with more than a few thousand memory locations are not practicable, unless, some radically new principles are invented.”
In addition to the linear relationship of additional circuitry in my method, CAM implementation is stackable. This means that memory may be increased indefinitely by the simple addition of memory modules.
CAM Applications
The potential of content addressing is explained by envisioning its use in a few applications.
Databases: Consider a portable combat computer being used to process realtime information. The data is obtained from a satellite communications link and may be any one of hundreds of data types each with a different record format. Using a CAM, the data is simply loaded into memory serially as received without sorting or structuring. To search for a particular piece of information (code word “charlie” for example), the search key (“charlie” in this case), is entered on a “search buss”. All memory locations (there could easily be 8 billion) simultaneously check their contents for the search key. With a CAM, this operation is completed in only a few clock cycles. Then, assume a second search is needed. A new type of non-alphabetic code word, “ch##%ly” never seen before has been discovered. It is expected to be embedded in a video stream, also something never seen before. Using a CAM, these changes do not create any new difficulties. No special sorting program is needed. The search runs just as fast as the previous search.
Pattern searching: A radar set loads a CAM with signal data obtained during the previous scan cycle. No data sorting is required. Signal patterns of interest are placed on the search buss. Each pattern match responds with an associated data address, thereby establishing its spatial origin. Testing for a range of patterns also runs very quickly because a special feature of CAM’s is that they resolve the “no match” condition very fast. This allows testing many patterns very quickly.
Learning: An autonomous vehicle is sent on a reconnaissance mission. As its sensors receive new input, the CAM is searched to see if the vehicle has ever encountered the patterns before. If so, the memory dumps the associated attributes. If not, the new pattern, with its associated attributes are simply dumped into the next empty memory location. The searches are fast since the entire memory is searched simultaneously. The storage of new data is fast, since no sorting is required and no memory reorganization is needed to maintain structure. The architecture of the CAM function essentially gives the vehicle a “real time mental recall” property.
A key factor in the three examples above is the ability of a CAM to search unstructured data. That is, no data presort is required. This allows a CAM to search for arbitrary structures without a requirement to know what those structures are before the data is placed into memory.
BACKGROUND
In 1945, Vannevar Bush suggested that the “filing of documents and data by ASSOCIATIONS, rather than by indexing, should be used.(1) A concept for a memory system that was accessed by content rather than by memory address was described by Slade in 1956.(2) The topic generated great interest at the time because it was already observed to be a better model for human intelligence than an address structured data store. By 1961, experiments had been done to combine associated memory principles with self-organizing networks in an attempt to achieve artificial intelligence. The machine that was built was called the Perceptron.(3) While the Perceptron architecture was too weak to implement artificial intelligence of practical value, it was seminal for other branches of information science, namely, pattern recognition and neural networks.(4)
The current view of associative processing may still be summarized by a paragraph from the 1989 DARPA study of neural networks(31):
“Widespread interest in such models seems to have waned by 1975. Although many application areas were explored, these early memory models had little technological impact. The available technology favored the use of digital random access memories….”
After 50 years, hardware content-addressable memories (CAMs) have only found their way into special roles such as small buffer memories and network control units because of the appreciable amount of electronic circuitry needed for every bit stored.(4)
STATE-OF-THE-ART CONCEPTS
There are currently two basic implementations of CAM: data dependent memory mapping (software methods) and hard wired networked search elements (hardware).
“It is remarkable that both of these principles were invented almost simultaneously, around 1955; this shows that with the advent of the first commercial computer systems there already existed a strong need for content addressing. While over twenty years have now passed (1987 report) since the introduction of these methods, no essentially new issues have been presented.”(4)
Software
The predominant software method used to access data by content is called HASH CODING. In principle, the data, in its binary numeric form, is interpreted as a memory address. Since each distinct data item also has a distinct binary code, each data item would also have an individual memory location. To locate the data, the search key would simply be converted to binary and that value used as a data address in a conventional memory.
The major problem with hash coding is that, in its basic form, for most types of data, it requires an unacceptable amount of memory storage. To hash a simple dictionary of alphabetical words with only 8 letters, AAAAAAAA through ZZZZZZZZ requires 26 to the 8th power (200 Gigabytes) of address space. Furthermore, most of the memory will be empty since there are no words like aaaabbbb or jjjjkk, etc. In fact, even the largest English dictionaries which include words up to 33 letters only claim around 650,000 words(12). The technology for implementing Hash Coding is actually focused on solving this problem with numerous techniques to compress the memory requirement. These include: forcing structure on the allowed data; mathematical transforms on the data values to reduce their magnitude; arbitration schemes for handling multiple data items that hash to the same address; and pointer tables to allow the hash algorithms to be handled iteratively.
In summary, as a data set becomes large and less structured and memory efficiency becomes more critical, hashing software becomes complex and takes a lot of time to execute. This is true, not only for searching but even more so for the sort operation needed to structure the data to begin with.
Hardware
Again quoting Kohonen(4), to construct a hardware CAM in principle, three elements are needed:
“1) There must exist a rapid method for broadcasting a search argument to all memory locations. 2) An individual comparison must be performed. 3) All matching data and other associated information must be collected and read out in succession.”(4)
The first element is already implemented in current RAM technology. A set of wires (the data bus) is connected in parallel to all memory locations. Using this bus, the memory devices are capable of writing data to any location in memory in a single write cycle. If all memory locations were simultaneously enabled for writing, they could all be written to, theoretically, as fast as any single location could be. The only consideration which must be addressed is the practicality of the power dissipation required to simultaneously alter the state of so many logic elements. This is not a trivial practicality. It appears to be manageable, however, especially with CMOS and low voltage technology now in use.
The second element, data comparison, can also be reasonably implemented in numerous forms depending on the complexity of the data comparison functions desired. Evaluation for equality, for example, can be implemented with about the same number of gates as is conventionally required to store the data.
What has presented the major stumbling block for all previous hardware implementations of cam has been the ability to locate, collect and readout the data associated with a search key in configurations that allow MULTIPLE MATCHES. This is the problem that I have solved.
Particular advantages of the proposed CAM concept
o The memory can be interacted with either as CAM or conventional memory. The CAM function is essentially switched on and off.
o The hardware required to implement the CAM is linearly related to memory size.
o The condition where no match exists is always determined with a single search cycle.
o The field size, for grouping and handling data is irrelevant. That is, field lengths can be different throughout memory for every piece of associated data. The length of each data item can be changed as well, even dynamically.
o The memory may be partitioned (masked) dynamically into blocks of any size (on whole word boundaries). This allows simple adaptation to multi-tasking software. It also allows selected segments to be declared CAM while the remainder are conventional, and for the declaration to be changed dynamically.
o The concept is “stackable,” so that memory can be expanded by simple addition of memory modules. This allows total memory size to be user configured.
o Software used to drive CAM searches is consistent with and easily adapted to conventional languages. A CAM search, even though it truly runs as a massively parallel system, looks to software like a Von Neumann process.
o The greatest advantages of CAM will occur in instances where data sets are large and complex and there are few matches in memory.
Discussion of some expected performance trade-offs
Hardware
o Extra gates in the memory will reduce RAM mode memory access speed. Preliminary circuit layouts suggest this will be about 20%. This may not result in a system speed reduction since most systems are CPU rate limited.
o Additional circuitry will reduce conventional memory density. This is estimated to be by a factor of 2.4. In systems that are hardware space limited (spacecraft for example), this is a significant issue and requires a trade-off between operating code, data space and computational efficiency. For non-space critical systems, i.e., PCs and fixed location earth-based installations, the trade-off is primarily costs associated with fabricating larger circuits (e.g., yield, reliability reduction, etc.)
o Power consumption is expected to increase substantially during CAM searches. Present memory technology (MOS) requires current to switch logic states. During conventional RAM operation, only one memory address changes at a time. During CAM operation, however, the entire memory is simultaneously active and the power required will be proportional to the number of matches found, i.e., a 1-Meg memory, during the CAM compare cycle, could use as much as one million times the power of a single address access if all addresses matched. Since a CAM search is expected to be performed at low duty cycles (less than 1%), and the quantity of matches relatively small compared to memory size, the overall power increase is not expected to be significant and preliminary simulations suggest power surges are also manageable .
Software
o New software will be needed to operate the CAM. However, because the CAM can be hardware switched, existing software using RAM conventionally will not require modification.
o Because CAM works well with unstructured data, new software written to implement it is expected to be much simpler than existing software to accomplish similar functions.
BIBLIOGRAPHY
1. Bush, V., Alt. Mon. 176, pp. 101, 1945.
2. Slade, A.E. and H.O. McMahon, A Cryptron Catalogue Memory System, Proceeding of the Eastern Joint Computer Conf., Dec. 1956.
3.Rosenblatt, F., Principles of Neurodynamics: Perceptron and the Theory of Brain Mechanisms, Spartan Books, Washington, DC, 1961.
4.Kohonen, T., Content-Addressable Memories, Springer-Verlag 1987, Second Edition.
5.Kohonen, T., Associative Memory – A System Theoretical Approach, Communication and Cybernetics, Vol. 17, Springer 1978.
6.Lewin, M.H., RCA Rev. 23, pp. 215-229, 1962.
7.Newhouse, V.L., R.E. Fruin, Electronics 35, pp. 31-36, 1962.
8.Slade, A.E., H.O. McMahon, Proc. EJCC, pp. 115-120, 1956.
9.Noe, J.D., Curr. Res. Devel. Sci. Doc. 9, 1956.
10.Frei, E.H., J. Goldberg, IEEE Trans. EC-10, pp. 718-722, 1961.
11.Falkoff, A.D., J. ACM 9, pp. 488-511, 1962.
12.Webster’s New World Dictionary.
13.Foster, C.C., IEEE Trans. C-17, pp. 788, 1968.
14.Anderson, G.A., IEEE Trans. C-23, pp. 1317, 1974.
15.Aspinall, D., D.J. Kinniment, D.B.G. Edwards, Information Processing, North-Holland, Amsterdam 1969, pp. 800.
16.Motorola Semiconductor Products, Inc., In Developments in Large-Scale Integration, Engineering Edition, Phoenix, AZ, 1968.
17.Prangishvili, I.V., V.A. Dementujev, M.S. Sonin, Mikroelektron. 4, pp. 497, 1975.
18.Chang, H., Magnetic-Bubble Memory Technology, Dekker, New York, Basel, 1978.
19.Asratyan, R.E., Lysikov, V.T., Autom. Remote Control (USSR) 39, pp. 755, 1978.
20.Pronina, V.A., Chudin, A.A., Avtom. Telemekh. 8, pp. 106, 1975.
21.Yau, S.S. C.C. Yang, IEEE Trans. EC-15, pp. 944, 1966.
22.Singhania, A.K., Proc. IMACS (AICA)-GI-Symp. Parallel Comput. March 14-16, 1977, 5, pp. 1071, North Holland, Amsterdam, 1977.
23.Estrin, G., C.R. Viswanathan, J. ACM, Jan 1962, pp. 41.
24.Gunderson, D.C., WESCON Tech. Papers, Session 9, 1966.
25.Hayes, J.P., Univ. of Illinois, Comput. Lab. Rept. 227, June 1967.
26.Previte, J., E. Tippie, EMI-TM-67-1, Feb. 1967.
27.Orlando, V.A., P.B. Berra, Proc. AFIPS 1972 FJCC, pp. 859, 1972.
28.Popova, G.M., I.V. Prangishvili, Avtom. Telemekh. 1, pp. 171, 1972.
29.Wald, L.D., T.R. Armstrong, C.C. Huang, T.L. Saxton, RADC-TR-73-19 Final Tech. Rept., Feb. 1973.
30.Cheng, W.T., T.Y. Feng, Proc. 1974 Sagamore Comput. Conf. Parallel Processing, Aug. 20-23, 1974, IEEE, New York, pp. 53.
31.DARPA Neural Network Study, Final Report, TR-840, 1989.
The concept for the CAM was originally developed in 1983. Based on legal advice, no patent was filed. The CAM concept has been kept as a “trade secret”. An intense effort to find support for the idea was conducted from ’83 to ’87. I approached numerous U.S. government agencies. I contacted the top people at “all” the top players in the computer industry: IBM, Intel, Apple, HP, Dell, Oracle, Microsoft, Google etc. After that, every few years, I tried to find support world wide. I contacted the chambers of commerce in Japan and Taiwan. I contacted firms in Israel, UK, France and Germany. In each case I received a reply, but it was always, “no interest”.
I am currently open to discussions with anyone interested in this breakthrough concept. Please contact me: Nanook {{at}} A3society {{dot}} org