Content Addressable Memory
Introduction to CAM
[Nanook is talking to George.]
“OK. Talk to me some more about the comparison between your brain ideas and computers. So you don’t think we should try to draw too many parallels between human memory and computer memory?”
“Right. We can learn some things by doing so, but it is more likely to distract us from the real fundamentals. Computers are binary. Brain chemistry can easily process with an equivalent of base 20 or higher. Computers use a single processor architecture with a numerically ordered and numbered memory. Brain chemistry is more likely to run as a massively parallel architecture with only short runs of sequential ordering. Computer scientists have understood this for a long time. In fact, Vannevar Bush discussed this during the development of the Whirlwind computer around 1945. They called the parallel memory CONTENT ADDRESSABLE MEMORY or associative memory. That is, if you wanted to find something in memory, you didn’t look up an address. All you would do is somehow signal to the entire memory what you were looking for, and those memory locations that had that information in them would magically wave and say, ‘here I am’. They tried to use this model but were never able to make it work. So they implemented the addressed method that we still have today, 20 years later. It’s become a major LEGACY limitation on the whole field of computers.”
“And, what would be the chief value of content addressable memory?”
“SPEED. Let’s say you could build a huge memory like the brain, with billions of words or patterns stored in it. With content addressing, you could search the whole memory in one step. The way we do it now, unless you go to a lot of effort to structure the data ahead of time, if the data is just random, then to find items, you have to look at every single memory location one at a time. So with the example I just gave you, content addressing is a billion times faster.”
“Wow! And you aren’t saying a BILLION just as an exaggeration.”
“Exactly.”
“You know how incredible this is, don’t you? Computer scientists kill themselves every day to get improvements in our technology of say a few percent. I’ve got to talk to Bill about this. Maybe he and I can figure out how to do it.”
“Bill’s a clever guy, that’s for sure. And he will know about this. You should do it.”
Another day
[Nanook is talking to Bill.]
“Right. Which brings up one of the subjects he and I spent a lot of time talking about. Did he ever tell you about his 3-brain theory?”
“Sure. Isn’t that amazing.”
“Yeah. I think so too. But while we were talking about that, I asked him if he thought we could use his human brain model for computers. Essentially, he said no. Well, actually, he didn’t say no. He said we currently don’t know HOW to do it. If we knew more about how the brain worked, then maybe we could duplicate it. But in addition to not knowing how brain neurons interact, we also don’t have computer structures that have come even close to achieving the same result. Am I making sense?”
“Yeah, but big deal. Everyone knows that.”
“Sure. I know that. But I’ve been thinking!”
“I’ve got to hear this!”
“George said that the brain appears to process information like a massively parallel computer. So far, so good?”
“Sure. Parallel processing can be done in a lot of ways. Neural nets, for example, attempt to mimic the brain. I’m sure you’ve heard of those. But I wouldn’t call them ‘massively’ parallel. Parallel I’ll buy. They haven’t reached the MASSIVE stage by a long shot. Also, neural nets mostly process data like a funnel. That is, they take a lot of data inputs and reduce them to a single result. This is good as a sensor where the single result tells some process to do something. But it’s a far cry from explaining the human brain.”
“OK. But then he mentioned something called CONTENT ADDRESSABLE MEMORY. Have you ever heard of that?”
“Sure. It’s one of the Holy Grails of computing. But no one has figured out how to make it work.”
“But, the way he presented it to me, I don’t think it would be so hard to figure out.”
“A lot of people, including Vannevar Bush, Johnny von Neumann, and other computer geniuses, tried to solve it. It’s a very subtle problem once you look into the details.”
“So, you have heard about it, then?”
“Sure. As I said, it is one of the greatest goals in computer architecture. But it’s been one of the greatest challenges as well. Especially because it seems so easy on the surface and then catches you in the details. A lot of people have been sucked into trying to figure it out.
Content-addressable memory, or CAM for short, is a computer memory that stores data in its memory cells and can then find any piece of data using only the CONTENT of the cells. It doesn’t use memory addresses.
Computer system designers have long known that using CAM would be a lot more effective for some tasks than addressed memories. The applications are so obvious and the benefits so great that many circuits have been built to implement parts of the CAM concept. Furthermore, the superiority of CAM for a large number of applications is so overwhelming that, despite the lack of any workable machines, many studies have been done to detail how such a machine would be used.”
“You said many circuits have already been built.”
“That’s correct. But they have all been for very small memory sizes. The reason is, all of the circuit concepts which have been implemented, while functionally workable, require a lot of additional hardware. The amount grows exponentially with the memory size.”
“You mean, the additional circuitry might be the same size as the memory itself for a memory with 1,000 memory cells but twice as big for a memory with 2,000 cells, and 4 times as big for 4,000 cells etc.”
“No! Come on Nanook. What you described is just a linear relationship. Let’s say it takes 10 transistors to make a 2 cell memory. It would then take 20 transistors for a 3 cell memory; 40 for 4 cells; 80 for 5 cells etc. You’re already up to 1280 transistors for just a 9 cell memory.”
“Wow! So, the special circuitry needed for these concepts would be millions of times larger than the whole memory itself.”
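A quick sketch of the arithmetic behind Bill’s warning, using his own figures (10 transistors for a 2-cell memory, doubling for every cell added). The formula below is only the obvious fit to his numbers, not a model of any real circuit:

```python
# Bill's doubling rule: 10 transistors for a 2-cell memory, twice as many
# for each additional cell. This is only a fit to the figures he quotes.
def transistors(cells: int) -> int:
    return 10 * 2 ** (cells - 2)

for n in (2, 3, 4, 5, 9, 30):
    print(f"{n:2d} cells -> {transistors(n):,} transistors")
# 9 cells already needs 1,280 transistors; by 30 cells the count is in the
# billions, dwarfing the memory it is supposed to support.
```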
“Correct. So, no known technique is practical for very large memories because the cost and space requirements are too large. I think the largest commercial device that has been built to date is only 4,096 memory elements long.* So, to quote Professor Kohonen, who studied this field in depth:
- “For this reason, hardware CAMs with more than a few thousand memory locations are not practicable, unless SOME RADICALLY NEW PRINCIPLES ARE INVENTED.”
“You seem to know a lot about this.”
“Correct! I was one of the engineers that got suckered into it.”
“Well, if no one could figure out how to implement CAM, what did they do?”
“They developed the whole science of using software for sorting and searching. Most of the theory related to sorting and searching was already worked out by 1955. This included things like RADIX sorting and HASH CODING.”
Uses of CAM
Data search
“So, what kind of things would CAM memories be useful for?”
“One thing is general data searching. Let’s say data is obtained from a communications link and put into memory. A CAM can then search memory for anything in there.”
“OK. So far so good. But any data base program can do that.”
“Right. But now let me explain what makes CAM special.
Let’s say you have a huge collection of data, and it hasn’t been sorted. The only way to search it is one item at a time. So, for example, let’s say you have a collection of one million names, each with a phone number. But the entries are not in order – as if you had entered many phone books a piece at a time. To find every occurrence of ‘Mr. Jones’, for example, the computer will have to look at every entry in memory, one at a time. That means one million search operations. If you have a CAM memory, the approach is very different. The computer would place the word it was looking for, like ‘Jones’, in a register called the ‘search key’. It would then trigger the CAM memory. In just ONE search operation, every occurrence of ‘Jones’ in memory would be identified.”
“Hold on. Did you say ‘in just ONE search operation’?”
“Correct! So, with the previous example, the search is completed in just one search operation, rather than one million!”
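To make the operation counts concrete, here is a toy software model of the two approaches Bill is contrasting. The “CAM” is only a simulation with made-up records; real CAM hardware would perform all of the comparisons simultaneously, which software can only mimic.

```python
# Toy contrast between a linear search and an (idealized) CAM search.
# Records and names are invented for illustration.
records = ["Jones, Ralph", "Smith, Ann", "Jones, Steven", "Lee, Kim"]

def linear_search(data, key):
    """One comparison per record: N operations for N records."""
    hits, ops = [], 0
    for i, rec in enumerate(data):
        ops += 1
        if key in rec:
            hits.append(i)
    return hits, ops

def cam_search(data, key):
    """Models the CAM ideal: every cell compares against the key at once."""
    flags = [key in rec for rec in data]   # in hardware this is one broadcast
    return [i for i, hit in enumerate(flags) if hit], 1

print(linear_search(records, "Jones"))   # ([0, 2], 4 operations)
print(cam_search(records, "Jones"))      # ([0, 2], 1 operation)
```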
“Wow! Now I get it. But don’t data base programs work much faster than one-by-one searching? I mean, I’ve read some about them. They do things like binary searches and stuff right?”
“Sure. But searches like a binary search rely on having STRUCTURED DATA to work with. To do a binary search, you first have to SORT the data into a uniform structure. For the example I just gave you, phone book entries are all pretty similar. So a data base program would first sort the data alphabetically, by last name, and group all the other information related to the last name by categories or data FIELDS. So, when sorted into memory, the data might look like ‘Jones, Ralph, 904.222.3334’. The next entry might be ‘Jones, Steven, 904.277.4321’. In memory, the data is organized pretty much the same as in a phone book. The data might also be organized so that each record had 100 characters in it. That way, the last names would all start at addresses that were multiples of 100. So you would be sure to find a person’s last name at memory address 540,200, 540,300, etc.
To binary search for Ralph Jones, the program would jump to the middle of the data file. If the first try for the computer was Ralph Jones, the search is over. If not, it would compare what it found to Ralph Jones and determine if Ralph was farther ahead or behind. Let’s say it was farther ahead. It would then go forward half the distance to the end of the data and check again. It would repeat this process always cutting the distance in half until the name was found or the jump was only a single record, which means the name isn’t there. That’s why it’s called a binary search.”
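A minimal version of the binary search Bill just walked through, assuming the records are already sorted; the names and the target value are invented for illustration.

```python
# Binary search over sorted data: repeatedly cut the remaining range in half.
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid, steps             # found it
        if sorted_items[mid] < target:
            lo = mid + 1                  # target is farther ahead
        else:
            hi = mid - 1                  # target is farther behind
    return None, steps                    # range shrank to nothing: not there

names = ["Adams", "Baker", "Jones", "Smith", "Young"]
print(binary_search(names, "Jones"))      # (2, 1)

million = list(range(1_000_000))
print(binary_search(million, 765_432))    # found in at most 20 steps
```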
“And repeatedly cutting the data in half quickly finds the result, right?”
“Correct. A million records will never take more than 20 comparisons.”
“Wow! That’s a lot faster than a million.”
“Correct. And that’s why data base programs are so valuable. BUT! To effectively use a data base, a number of factors are needed: the data must already have substantial structure; the program must know the structure ahead of time; and time must be available before searching to sort the data. Phone book entries are a perfect example of very structured data. They all have just people’s names, phone numbers and addresses. People searching this data will only be looking for those variables. And we are taking advantage of the fact that the structures we call numeric and alphabetical order have already been figured out and are pretty simple. But what if the data is something like a bunch of newspapers which include graphics? How would you ever sort that?”
“I see. You wouldn’t know, ahead of time, even what categories to assign. I mean, if you grouped the entries by paragraph, they could be editorials about anything, jokes, advertisements, classified items, picture captions, etcetera.”
CAM Benefits
“Correct. And that’s where a CAM memory shines. It doesn’t care. You just load it with data in the order that the data comes in. You can then search for anything – John Wayne, tomato soup, strange character strings like ##%%@@, or anything. It still only takes ONE search operation to find every occurrence of that search key in memory.
And there is another irony related to CAM compared to a one-by-one search process, which is called a LINEAR search, by the way. If the item you are looking for DOES NOT exist in the data, a CAM finds this out in one search operation, which is the MINIMUM possible search time. A linear search takes the MAXIMUM search time if the data is not present. So for a one million record memory, a NO FIND result takes CAM one operation, while a linear search takes one million operations.”
“Wow. I can see the significance of this. If you are looking for rare information, the chances are it isn’t going to be there. So, with a linear search, you’d take a lot of search time to repeatedly come up dry.”
“Correct. And that will discourage you from searching for misspellings and things like that which are rare. In addition, there are other significant benefits for CAM. Sorting data can take a lot of time. The less structure the data has, that is, the fewer repeat patterns it has, the harder it is to sort and the longer it takes. With CAM, no sorting is required. Just dump it into memory.
Another factor is memory use efficiency. Let’s say you are storing data grouped a paragraph at a time. Some paragraphs are long; others are short. So data blocks in memory are typically sized for the largest paragraphs. That is, a large paragraph might have 200 words in it. So, you divide up memory into blocks that are 200 words long.”
“Right. But there will always be many short paragraphs. That means a lot of the memory will be filled with blanks.”
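The waste Bill and Nanook are describing is easy to put numbers on. A small illustration with made-up paragraph lengths:

```python
# Fixed 200-word blocks versus the paragraphs actually stored
# (paragraph lengths are hypothetical).
block_size = 200
paragraph_lengths = [200, 40, 75, 12, 180]

reserved = block_size * len(paragraph_lengths)
used = sum(paragraph_lengths)
print(f"{reserved - used} of {reserved} words wasted "
      f"({100 * (reserved - used) / reserved:.0f}%)")
# 493 of 1000 words wasted (49%)
```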
“Correct. And that’s the problem that kills methods like Hash Coding, which requires an unacceptable amount of memory storage.
In summary, as the data set becomes larger and less structured and memory efficiency becomes more critical, software searching and sorting methods become complex and take a lot of time to execute. But CAM doesn’t work that way. Data can be squeezed together and the storage doesn’t have to be uniform. So you can get the maximum amount of data into memory with little problem and find it in the fastest possible time.”
“And you’re telling me, no one yet knows how to implement CAM in a practical way?”
“Correct. That’s what I’m telling you.”
“OK . . . . . So, data searching is an ideal application for CAM. What else can it do?”
“Well, I have really only been talking about text searching. There are a lot of other forms of data.”
“Sure. Radar signals are data. Photographs are data. Speech is data. Music is data.”
“Correct. But CAM doesn’t care what kind of data it is. You just load the data into memory and search for patterns.”
“Wow. I already see the benefits of this. When you’re trying to match patterns in this kind of data, it’s very hard to get exact matches. There are always small variations. But with CAM, you can test a whole bunch of small variations very quickly because if one of the variations doesn’t match anywhere in the data, you learn that in one search operation. For the one million item case, you would take one million search operations before the linear search process reported its first ‘no-find’.”
“Correct. As I said, the value of CAM is well known by computer scientists. In fact, a big program was recently finished at Cornell Aeronautical Lab where they built a machine called the PERCEPTRON that they hoped would lead to artificial intelligence. It was a huge parallel processing system based on neural networks. It used a simple, modified CAM approach for its memory. But it wasn’t configured for general use. It was designed to mimic the human ability to perceive sensory data. They did achieve a very important result. They were able to get the Perceptron to LEARN. Unfortunately, the capability of the machine is still too crude to do more than the most primitive things.”
CAM requirements
“I can see how easy it is to get pulled into this. Can you explain just a few more details for me? You said people have tried to figure this out. So where do we stand on all of this?”
“It’s actually quite simple in concept. To construct a hardware CAM, in principle, only three elements are needed:
1) There must exist a rapid method to broadcast a search pattern to all memory locations.
2) An individual comparison must be performed between the search pattern and what is at each location in memory.
3) All matching data and information associated with it must be collected and read out in succession.
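The three elements can be modeled in a few lines of software. This is only a sketch of the concept as Bill states it, not any particular circuit; note that element 3, `collect()`, simply scans the flags one by one here, which is exactly the part that turns out to be hard in hardware.

```python
# A software model of the three CAM elements: broadcast, compare, collect.
class ToyCAM:
    def __init__(self, size: int):
        self.cells = [None] * size     # stored data words
        self.flags = [False] * size    # per-cell match flags

    def write(self, addr: int, word):
        self.cells[addr] = word

    def search(self, key):
        # Element 1: the key is "broadcast" to every cell.
        # Element 2: each cell compares its contents against the key.
        for i, word in enumerate(self.cells):
            self.flags[i] = (word == key)

    def collect(self):
        # Element 3: gather every matching location. In this software model
        # it is just a scan of the flags -- the step real hardware struggles with.
        return [i for i, flag in enumerate(self.flags) if flag]

cam = ToyCAM(4)
for i, word in enumerate(["tomato soup", "John Wayne", "tomato soup", "##%%@@"]):
    cam.write(i, word)
cam.search("tomato soup")
print(cam.collect())   # [0, 2]
```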
“Huh???? That’s it?”
“Poor Nanook. I think you’re going to get sucked in to the CAM quicksand.”
“No. Come on. I can already see how to do this.”
“Poor Nanook. Order the casket. We’ll be burying you in about a week after you stop eating because you can’t stop thinking about this.”
“Not me Bill. I know how to eat over my text books. A-student and all that, right?”
“OK. Here goes. But don’t say I didn’t warn you.
So, how would you do element 1: broadcasting the search pattern rapidly to all memory locations?”
“That’s pretty straightforward, right? That capability already exists. All current memories use a set of wires called the Data Bus that is connected in parallel to all memory locations. Using the Data Bus, the central processor is able to send data to any location in memory in a single write cycle. If all memory locations were simultaneously enabled to accept this data, the whole memory could access the data, theoretically, as fast as any single location could.”
“Correct! That is a simple way to do this. And very little extra circuitry is needed. Do you see any problems with it?”
“The only problem I can think of is power dissipation. Even if each memory address draws just a small amount of power to read the data, with so many addresses active at the same time, the power load could be pretty high.”
“Do you think that’s a killer?”
“No. I don’t think so. If it was too high for a single write cycle, the search could be split up into two or three or four cycles. The speed-up is still in the millions range.”
“Good. Now how about element 2: performing a comparison at every memory location?”
“That also seems pretty straightforward. All you need is an XNOR gate for each binary bit. Of course you need some logic to determine that all of the characters in the whole word match. But that still seems easy to do. I’ve seen a lot of circuits that can do comparisons.”
“Correct on that point as well. This part of the problem has also been worked out quite a bit. Designs have shown that it can be done with the addition of about the same amount of circuitry as the memory already has. So, if this were the end of the story, we’d know how to build a CAM and it would only have twice as much circuitry as a conventional memory no matter how large the memory was.”
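A bit-level sketch of what each comparison cell does, assuming the per-bit check is an XNOR (equality) gate followed by an AND across the word; the word width here is arbitrary.

```python
# Per-cell word comparison: XNOR every bit, then AND the results together.
WORD_BITS = 8  # arbitrary width for the illustration

def cell_matches(stored: int, key: int, bits: int = WORD_BITS) -> bool:
    mask = (1 << bits) - 1
    xnor = ~(stored ^ key) & mask   # a 1 wherever the two bits agree
    return xnor == mask             # all bits agree -> the cell raises its flag

print(cell_matches(0b1011_0010, 0b1011_0010))  # True
print(cell_matches(0b1011_0010, 0b1011_0011))  # False
```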
“And that means it would only cost, maybe, 20% more, since most of the cost is in the external packaging.”
“Correct. So, we’re left with element 3: collecting the matching data.”
“That also seems pretty simple. Let’s say that each comparison circuit that has a match, sets a flag to indicate that a match has been found.”
“OK?”
“Then you just read out the addresses of the places where the flags are set.”
“OK? And how do you do that?”
“Well . . . . . . Hmmmm . . . . . .”
“The quicksand is now up to your knees. Here’s a rope.”
“Come on. Give me a second. . . . .”
. . . . . . . .
. . . . . . . .
“OK. Your second’s UP! Last chance. Here’s the rope.”
“I’ve got to think about this a little more. It does seem simple at first. But every idea that flashes in my head quickly runs into a snag. Like, if the processor went address by address and checked to see if a flag was set, that would be just like a linear search.”
“Correct. And this is the problem which has plagued CAM development since day one. Go ahead, think about it awhile. But don’t be too tough on yourself if you can’t come up with a solution. The best minds in the business have worked on this for years and have just about given up. I’ll give you some history, though, to keep you from thinking you’ve stumbled on to something when you’ve just stumbled on to some simple cases.
There are some reasonable circuits that could produce the address of a match IF there was only ONE match in the whole memory. At least reasonable in design; in practice they would still be too big. But single match searches are pretty limited. So don’t make the mistake of solving the problem for only ONE match. You need to figure out how to locate any number of simultaneous matches. There could be two, or twenty, or twenty thousand. That’s the killer.
If someone reports you wandering aimlessly across the tundra, I’ll send George out to get you with a white coat.”
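For context, one conventional idea from the CAM literature, often called a multiple-response resolver, reads the matches out one at a time with a priority encoder: pick the lowest-numbered flag, report it, clear it, and repeat. The sketch below is only a software model of that idea; it is not the scheme Nanook arrives at later, which the story keeps to itself.

```python
# Priority-encoder style readout of multiple match flags (software model).
def read_out_matches(flags):
    flags = list(flags)                  # work on a copy
    addresses = []
    while any(flags):
        first = flags.index(True)        # hardware: priority encoder, one cycle
        addresses.append(first)
        flags[first] = False             # clear that flag and resolve again
    return addresses

print(read_out_matches([False, True, False, True, True]))   # [1, 3, 4]
```

The readout cost here grows with the number of matches rather than the size of the memory, but a priority encoder spanning millions of cells is itself the kind of hardware burden Bill has been complaining about.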
“I just can’t believe there isn’t a simple way to do this.”
“Nanook’s in the quicksand; Nanook’s in the quicksand.”
“OK! OK! Give me a break. I know how my brain works! I won’t be able to stop it from thinking about this until it finds a solution.”
“I know. That’s the way a lot of us think. That’s why I said you’re in quicksand.”
Nanook meets with Bill another day
“I know you aren’t going to believe it, but I think I found a way to make my Content Addressable Memory idea work.”
“Why am I NOT surprised? OK. I’m not going to say you haven’t. You’ve already surprised me too many times. So, let’s celebrate before we get to the truth. I’ve got some fresh éclairs that George gave me. Get us both some coffee and you can tell me how you’d do it.”
I got some cups and poured out some coffee which was already boiling. I told him my idea.
“NANOOK! This is unbelievable! YOU’RE A FRIGGIN GENIUS! Now I’m the one who has to think about it a little. This is an amazing discovery. For that, you are going to get 2 éclairs! In fact, I’m going to ask George to make you some of his famous ‘Northern Lights Cream Puffs!’”
“I’ve heard about those. I’m gunna die!”
“So, this is actually a pretty simple modification of existing circuitry. I still can’t believe it. And it would only take about as much new circuitry as the comparison circuits we talked about. Unbelievable. And I can envision a whole bunch of other important features it has.
The hardware required to implement the CAM is linearly related to memory size. That is, a CAM memory will always take about 2.5 times as much circuitry as a regular memory, no matter how big that memory is. Your concept is ‘stackable.’ That means the memory can be expanded by simply adding more memory modules.
The new CAM memory can be used either as CAM or conventional memory. That is, the CAM function can essentially be switched on and off.
The field size for grouping and handling data is irrelevant. That is, field lengths can be different throughout memory for every group of associated data. The length of each data item can be changed as well. This will make data base memory use very efficient. That cost savings alone will probably make up for all the extra circuitry needed.”
“These are all important issues, I guess?”
“VERY, especially from a product standpoint. And there is another very important item. The software needed to drive CAM searches is similar to and easily adapted to conventional languages. A CAM search, even though it truly runs as a massively parallel system, looks to software like a conventional single processor computer.”
“I guess I don’t understand the significance of this either.”
“I mentioned that a number of attempts have been made to build parallel computers. In fact, people have even considered just taking a bunch of small computers and hooking them together. There was one study, for example, called the ‘Thinking Machine’, where they investigated hooking 64,000 computers together. One of the biggest problems they ran into was figuring out how to write software to run all the computers at the same time. It seemed that every different application required completely new software to efficiently use all the computers and to get them to talk to each other and not just create new bottlenecks. So simply hooking 64,000 computers together became a software nightmare. Your approach solves this. It could easily be run using existing software. Of course, it would actually simplify existing software a lot because it wouldn’t require all the sorting software and searching algorithms.”
“You really believe this?”
“No doubt about it. You’ve really stumbled on to something.”
“So, should we patent the idea?”
“Absolutely NOT! This idea is too powerful. It is TOO DISRUPTIVE. The industry won’t know what to do with it. They will stall and stall. Do you remember what I told you about Chester Carlson?”
“You mean the guy who invented the Xerox machine?”
“Right! So, the best way to handle this is to keep your discovery a SECRET, until you find someone willing to help you develop and sell it.”
“Hmmm . . . I guess this approach follows the discouraging things both you and Ben told me before. It does make me think of a very ironic thing.”
“Yeah? What’s that?”
“You know the old saying, ‘you can’t take it with you’?”
“Sure.”
“Well, what happens if I die before I tell anyone how to do it?”
“I see your point. Tough break for the human race!”
“Yeah. So much for a government ‘by the people, for the people’.”
The CAM is discussed further from a technical standpoint in the invention section.