Breaking point

The quest for faster and smaller silicon chips has reached a stage where critical breakdown is not far away, says Christine Evans-Pughe

Software bugs may crash our PCs and mobiles, viruses may corrupt our data, but at least we can trust the basic hardware behind all this technology: the silicon chip. Well, that's been the assumption so far. But it may be time for a re-think.

About two years ago Lloyd Condra, a scientist at Boeing, and Joe Chapman, from the United States Department of Defence, started warning people about the risk of chips wearing out in essential electronic systems such as air traffic control, national security, telecommunications, banking, avionics, military and medical equipment. "This challenge is as grave as any since the beginning of the solid-state revolution 50 years ago," Dr Condra said at one military and aerospace conference.

His concern was about chips being built using nanometre-scale semiconductor processes. One nanometre (nm) is a billionth of a metre; present cutting-edge chips are built on 90nm processes. The heat produced by the electrical current passing through such circuitry could, he suggested, erode its essential metal connections or oxide insulation layers, and cause transistors to break down.

The plausible result: chips that don't calculate correctly and, at the larger level, computers that don't behave, or malfunction silently - and perhaps catastrophically. And because every major form of transport, from cars to airliners, relies on chips, such questions are increasingly important.

Today, nano-scale semiconductor processes are becoming mainstream. "The challenges are greater in avionics because we tend to use devices in more harsh environments. And we have longer lifetime requirements because our systems operate for longer times," explains Dr Condra. "It's hard to say how big the problem is or will become. We may start to notice some problems in some of these newer designs. Until now, there hasn't been an obvious issue," he adds.

But frequent flyers will be glad to hear that Boeing says it is being very careful about using these new chips for its production systems.

Silicon chip circuit elements are now one-fiftieth the size they were in the Seventies, when lifetime estimates were hundreds of years, and computers occupied entire rooms rather than your lap. Since then the semiconductor industry has followed Moore's Law - doubling the amount of circuitry it fits into an area every 18 months by scaling down the transistors and the wires that link them. Scaling cuts costs and means chips can run faster, but also makes them more fragile.

Until now, chip lifetimes have been in the decades but the latest scaling step down to 130nm and more recently 90nm has been tricky. The silicon dioxide layers that make up vital transistor structures are now just a few atoms across. And at the same time, all the linking wires have been changed from aluminium to copper, a material with very different failure characteristics.

Semiconductor firms really don't want to discuss chip wear-out. Only by oblique questioning do you get close to the truth: 130nm chips designed for mobile phones might have minimum expected lifetimes of three years, and those manufactured for PCs five years.

Chip lifetimes have in the past been calculated by putting the chips under extremes of temperature and voltage to accelerate the ageing process - in which heat generated by the currents can move atoms around, thinning or thickening the vital transistor "gates". But there are doubts as to whether this method still reflects what is likely to occur in real operating conditions.

"You are no longer looking at bulk materials, you are looking at a collection of 'individuals' [atoms or molecules]," suggests Tim Saxe, the vice-president of engineering at QuickLogic, an American firm that makes general-purpose chips called programmable logic devices. "If one of those individuals is a little strange, the properties [of the device] are going to be determined by him and not the average. If his statistics are different from the average, then the rules for accelerating are no longer correct.

"If you increase the temperature by a factor of two, is that the equivalent of running it four times as long or 10 times as long or what?" he asks.
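To make Saxe's question concrete, here is a minimal sketch of one textbook way of converting test-oven hours into service hours: the Arrhenius model, widely used in accelerated life testing. The article does not say which model any manufacturer uses, so this is an illustration only. The function name, the temperatures (55C in service, 125C in the oven) and the activation energies are all hypothetical values chosen for the example.

```python
import math

# Boltzmann constant in electronvolts per kelvin.
BOLTZMANN_EV_PER_K = 8.617333262e-5


def arrhenius_acceleration(use_temp_c, stress_temp_c, activation_energy_ev):
    """Return how many times faster ageing runs at the stress temperature.

    Assumes a single failure mechanism that follows the Arrhenius model,
    which is exactly the assumption Saxe is questioning.
    """
    use_k = use_temp_c + 273.15        # convert Celsius to kelvin
    stress_k = stress_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / use_k - 1.0 / stress_k
    )
    return math.exp(exponent)


# Hypothetical figures: 55C in service, 125C in the test oven,
# and three illustrative activation energies (not measured values).
for ea in (0.5, 0.7, 1.0):
    factor = arrhenius_acceleration(55.0, 125.0, ea)
    print(f"Ea = {ea:.1f} eV: one oven hour ~ {factor:.0f} service hours")
```

The conversion factor swings widely with the assumed activation energy. If the one "strange" atom or molecule that actually fails has a different activation energy from the average, the lifetime extrapolated from the oven test could be badly off.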

So far, no one has the answer. QuickLogic has held off from using 130nm technology, and is unusual in its readiness to talk openly about wear-out. Saxe has been working in the industry for more than 20 years and points out that the history of semiconductors has always been about overcoming problems, and that new ones always appear. "Typically, the commercial side of the business has its head in the sand on these things, and so it really takes a big disaster before someone will say, 'we need to change the way we do things'," he says. "We do care about reliability, but Condra is making an excellent point in that we only care about relatively short-term reliability. How much electronic equipment have you kept for more than five years?"

High-reliability systems such as those in aeroplanes represent only 1 per cent of chip sales, so the only incentive for semiconductor manufacturers to extend chip lifetimes is fear of lawyers should things break down and people get hurt. Semiconductor firms are now working with Boeing and others to devise strategies that guarantee the reliability of certain commercial chips.

"I guess they believe it is in their interest to keep us informed as to what's going on," comments Condra. "If there is a problem with some of our systems, they can be highly visible problems."

The companies that make chip-design software are also working on the problem. Guillaume d'Eyssautier, the European general manager of the firm Cadence, says: "If you're designing a water reservoir, you can design very thick walls everywhere to be sure it is safe, but it's really expensive to do this. You know where you need thicker walls and where you can put thinner walls. Similarly, we're looking at areas of the chip where you have to reinforce the transistors or re-route some of the power lines, according to what the chip's function is."

The good news is that chip wear-out is probably a gradual process, according to Guido Groeseneken, an expert in chip reliability at the IMEC research institute in Belgium. "People have done work on this and proved that some circuits can survive the odd breakdown. If a transistor stops working, it's not always catastrophic to the circuit operation. It depends on where it is located and how sensitive the circuit is to that transistor."

The odd bit of metal erosion isn't instantly catastrophic. The signal will usually be able to take another path. The bad news is that things may get worse before they get better. Production of chips built on the new 90nm process is under way, and requires a range of new materials to improve on and replace structures such as the now perilously thin silicon dioxide layers. "The long-term performance and the wear-out [of these new materials] is unexplored," says Groeseneken. "With silicon and silicon dioxide we had a learning curve of 40 years. But for copper and all these new materials, it's in its infancy."

He suggests that in the meantime, for chips in cars, telecoms infrastructure and definitely aeroplanes we should stick to less advanced chips. "As industry builds up confidence in the new materials from failures in the field, they will learn from those what are the weak points. Then the materials will go into high reliability applications."