Sunday, May 20, 2007

Nuke Plant Data Storm

Instapundit links to a story about a nuke plant that went offline because of a data storm.

The device responsible for flooding the network with data appears to be a programmable logic controller (PLC) connected to the plant's Ethernet network, according to an NRC information notice on the incident. The PLC controlled Unit 3's condensate demineralizer -- essentially a water softener for nuclear plants. The flood of data spewed out by the malfunctioning controller caused the variable frequency drive (VFD) controllers for the recirculation pumps to hang.

This is what you get when you use a non-deterministic collision ("crash") protocol like Ethernet instead of a time-division protocol like MIL-STD-1553 or an arbitrated protocol like CAN bus. The fundamental problem is Einsteinian: what does "simultaneous" mean when signals can travel no faster than light? Unless you give each unit on the bus its own time slot (1553) or arbitrate message identifiers bit by bit as they go down the bus (CAN), you will have problems when two transmitters try to start at the same time. (That assumes absolute time at a certain level, a problem that need only concern engineers.)
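CAN's arbitration can be sketched in a few lines. This minimal model assumes an idealized wired-AND bus on which a dominant bit (0) overrides a recessive bit (1), and it ignores bit timing, bit stuffing, and error frames; `arbitrate` is a hypothetical helper name. Every node listens while it sends its identifier, and a node that sends recessive but reads back dominant drops out, so the lowest identifier always wins and no bus time is lost to a collision.

```python
ID_BITS = 11  # standard (non-extended) CAN identifiers


def arbitrate(identifiers):
    """Return the identifier that wins when all nodes start transmitting together."""
    contenders = set(identifiers)
    if not contenders:
        raise ValueError("no transmitters")
    if any(not 0 <= ident < (1 << ID_BITS) for ident in contenders):
        raise ValueError("identifier out of 11-bit range")
    # Identifier bits go out most-significant bit first.
    for bit in range(ID_BITS - 1, -1, -1):
        # Wired-AND bus: any node driving dominant (0) pulls the line to 0.
        bus_level = min((ident >> bit) & 1 for ident in contenders)
        # A node that sent recessive (1) but reads back dominant (0) stops sending.
        contenders = {ident for ident in contenders if (ident >> bit) & 1 == bus_level}
    # Identifiers are unique on a real CAN bus, so exactly one node is left.
    (winner,) = contenders
    return winner


if __name__ == "__main__":
    print(hex(arbitrate([0x65, 0x123, 0x064])))  # lowest identifier wins: 0x64
```

Because the outcome depends only on the identifiers, every message's priority is fixed at design time, which is the determinism the post contrasts with Ethernet.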

Crash buses are not allowed in critical systems in aircraft design.
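Why a crash bus gives no hard latency bound can be sketched with the truncated binary exponential backoff of classic half-duplex (CSMA/CD) Ethernet: after the n-th collision a station waits a random number of slot times drawn from [0, 2^min(n,10) - 1], and it gives up after 16 attempts. This is a simplified model (no propagation delay; a contention round counts as resolved once one station's delay is uniquely smallest), and `backoff_window` and `collisions_until_clear` are hypothetical names. Modern switched, full-duplex Ethernet does not collide this way, but it still gives no device a guaranteed slot.

```python
import random

BACKOFF_LIMIT = 10   # the exponent stops growing after 10 collisions
ATTEMPT_LIMIT = 16   # the frame is discarded after 16 failed attempts


def backoff_window(collisions: int) -> int:
    """Number of slot-time delays to choose from after `collisions` collisions."""
    if not 1 <= collisions < ATTEMPT_LIMIT:
        raise ValueError("collision count must be between 1 and 15")
    return 2 ** min(collisions, BACKOFF_LIMIT)


def collisions_until_clear(rng: random.Random, stations: int = 2):
    """Simulate contending stations; return collisions before one wins, or None."""
    for collisions in range(1, ATTEMPT_LIMIT):
        window = backoff_window(collisions)
        delays = [rng.randrange(window) for _ in range(stations)]
        if delays.count(min(delays)) == 1:
            return collisions  # one station's delay is uniquely smallest: it sends alone
    return None  # the 16th attempt collided too: the frame is dropped


if __name__ == "__main__":
    rng = random.Random(2007)
    results = [collisions_until_clear(rng) for _ in range(100_000)]
    delivered = [r for r in results if r is not None]
    print("worst case seen:", max(delivered), "collisions")
    print("dropped frames:", results.count(None))
```

The printed worst case is only the largest value in this sample; another seed or more trials can produce a larger one, which is the missing guarantee that keeps such buses out of critical aircraft systems.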

In a nuke plant, all systems are critical. Three Mile Island started with a valve malfunction and a burnt-out light bulb in a relatively non-critical part of the plant.

So why don't people stick with the more deterministic buses? Such an approach carries a lot of design and documentation overhead. Every time a new element is added to the bus, the bus control software must at minimum be inspected and at most be totally reconfigured. In addition, the peak data handling capacity of such buses is not as good as Ethernet's, especially over longer distances. The alternative, of course, is to continue with the plug-and-pray approach. I might note that all wireless buses are essentially crash buses, so they will not help much.
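The reconfiguration burden of a time-slot bus can be sketched as a fixed schedule table. This is a hypothetical model in the spirit of the time-division scheme the post attributes to 1553, not the standard's actual message format: the frame length, slot lengths, and device names (`pump_vfd`, `demineralizer_plc`, and so on) are all illustrative assumptions. Each terminal owns the bus only during its slot, so one misbehaving device cannot crowd out the others, but adding a device means rebuilding and re-verifying the whole table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    terminal: str   # hypothetical device name
    start_us: int   # offset from the start of the frame, microseconds
    length_us: int  # how long this terminal owns the bus


def build_schedule(demands_us, frame_us):
    """Lay out one slot per terminal back to back; reject a table that overflows."""
    slots, t = [], 0
    for terminal, length in demands_us.items():
        slots.append(Slot(terminal, t, length))
        t += length
    if t > frame_us:
        raise ValueError(f"schedule needs {t} us but the frame is only {frame_us} us")
    return slots


def may_transmit(slots, terminal, t_us, frame_us):
    """In this model a terminal may drive the bus only inside its own slot."""
    t = t_us % frame_us
    return any(s.terminal == terminal and s.start_us <= t < s.start_us + s.length_us
               for s in slots)


if __name__ == "__main__":
    frame = 1000  # hypothetical 1 ms frame
    demands = {"pump_vfd": 300, "demineralizer_plc": 200, "valve_ctrl": 250}
    table = build_schedule(demands, frame)
    for s in table:
        print(s)
    print(may_transmit(table, "demineralizer_plc", 100, frame))  # False: pump_vfd's slot
    demands["new_sensor"] = 300  # adding a node forces a rebuild and re-check
    try:
        build_schedule(demands, frame)
    except ValueError as err:
        print("reconfiguration required:", err)
```

Here the new sensor pushes the table to 1050 us, past the 1000 us frame, so the existing slots would have to be shortened or the frame lengthened and every timing assumption rechecked.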

BTW I have nuke plant operational experience (US Navy) and aircraft electrical systems design experience (Sundstrand Aerospace).

2 comments:

linearthinker said...

What's the fuss? The reactor went offline when the Ethernet went tits up, didn't it? Without Congress ordering it? Shit happens.

Newsflash: politicians, especially those test-driving new buzzwords that they don't know the definitions of, are buffoons.

Wish I'd said it first. Stolen from a comment by Uncle Pinky at JOM.

Anonymous said...

I've got more than a little experience here, having programmed low-level CAN (a deterministic bus) and high-level Ethernet (non-deterministic, natch) drivers for motion-control and traction drives. I've never really liked Ethernet for industrial applications, for precisely the reason highlighted here. To guarantee message delivery, the bus capacity has to be an order of magnitude higher than the intended usage. This works great, is cheap, and is effective, until one of the devices fails in "motor-mouth" mode and drives usage to near bus capacity.
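The order-of-magnitude headroom rule, and how one "motor-mouth" device erases it, is simple arithmetic. In this sketch the 100 Mbit/s link speed, the device names, and every message size and rate are hypothetical, and framing overhead (preamble, headers, interframe gap) is ignored; `utilization` is an illustrative helper.

```python
LINK_BPS = 100_000_000  # assumption: 100 Mbit/s link


def utilization(devices, link_bps=LINK_BPS):
    """Fraction of link capacity used; devices maps name -> (bytes per message, messages per second)."""
    bits_per_second = sum(size * 8 * rate for size, rate in devices.values())
    return bits_per_second / link_bps


if __name__ == "__main__":
    devices = {"vfd_a": (100, 1000), "vfd_b": (100, 1000), "plc": (200, 500)}
    normal = utilization(devices)
    # prints: normal load: 2.4%, headroom factor: 42x
    print(f"normal load: {normal:.1%}, headroom factor: {1 / normal:.0f}x")
    devices["plc"] = (1500, 8000)  # the PLC fails in "motor-mouth" mode
    # prints: babbling load: 97.6%
    print(f"babbling load: {utilization(devices):.1%}")
```

With these made-up numbers the normal load leaves well over an order of magnitude of headroom, and a single babbling device consumes nearly all of it.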

With good design, this happens seldom enough that in many situations industrial Ethernet works just fine: the cost of the relatively infrequent and inexpensive failures does not outweigh the low first cost of adding Ethernet-enabled industrial components.

However, for some applications failures are VERY expensive (say, taking a major power plant off-line), and you really don't want to see one happen even once. I have been suspicious that the true costs have not been considered in the switch to Ethernet. I'm not sure whether this incident proves my point, but it certainly illustrates it.
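The commenter's trade-off is an expected-value argument, which a few lines can make concrete. Every number below (first costs, failure rates, losses per failure, a ten-year life) is a hypothetical placeholder, not data from the incident or the comment, and `lifetime_cost` is an illustrative helper. The same failure rates that make Ethernet the cheaper choice on a factory line make it the expensive one when a single failure costs as much as taking a plant off-line.

```python
def lifetime_cost(first_cost, failures_per_year, cost_per_failure, years):
    """First cost plus expected failure losses over the service life."""
    return first_cost + failures_per_year * cost_per_failure * years


if __name__ == "__main__":
    years = 10
    # Hypothetical options: name -> (first cost, failures per year)
    options = {"ethernet": (10_000, 0.1), "deterministic bus": (50_000, 0.01)}
    # Hypothetical loss per failure in two settings
    settings = {"factory line": 20_000, "power plant": 5_000_000}
    for setting, cost_per_failure in settings.items():
        totals = {name: lifetime_cost(first, rate, cost_per_failure, years)
                  for name, (first, rate) in options.items()}
        best = min(totals, key=totals.get)
        summary = ", ".join(f"{name} ${cost:,.0f}" for name, cost in totals.items())
        print(f"{setting}: {summary} -> cheaper: {best}")
```

The point is not the particular figures but the structure: a rare failure multiplied by a very large loss can dominate any savings in first cost.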