Early Look at Notable Overseas Articles, No. 9: Famous Catastrophic Software Defects Explained, the Patriot Missile Blunder

Patriot Missile Failure

The Patriot Missile Setback

On February 25, 1991, during the Gulf War, an American Patriot Missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army barracks and killed 28 soldiers. A report of the General Accounting Office, GAO/IMTEC-92-26, entitled Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia, reported on the cause of the failure. It turns out that the cause was an inaccurate calculation of the time since boot due to computer arithmetic errors. Specifically, the time in tenths of a second as measured by the system’s internal clock was multiplied by 1/10 to produce the time in seconds. This calculation was performed using a 24 bit fixed point register. In particular, the value 1/10, which has a non-terminating binary expansion, was chopped at 24 bits after the radix point. The small chopping error, when multiplied by the large number giving the time in tenths of a second, led to a significant error. Indeed, the Patriot battery had been up around 100 hours, and an easy calculation shows that the resulting time error due to the magnified chopping error was about 0.34 seconds. (The number 1/10 equals $1/2^4+1/2^5+1/2^8+1/2^9+1/2^{12}+1/2^{13}+\cdots$. In other words, the binary expansion of 1/10 is 0.0001100110011001100110011001100…. Now the 24 bit register in the Patriot stored instead 0.00011001100110011001100, introducing an error of 0.0000000000000000000000011001100… binary, or about 0.000000095 decimal. Multiplying by the number of tenths of a second in 100 hours gives 0.000000095×100×60×60×10=0.34.) A Scud travels at about 1,676 meters per second, and so travels more than half a kilometer in this time. This was far enough that the incoming Scud was outside the “range gate” that the Patriot tracked. Ironically, the fact that the bad time calculation had been improved in some parts of the code, but not all, contributed to the problem, since it meant that the inaccuracies did not cancel.

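On February 25, 1991, during the Gulf War, an American Patriot missile battery in Dhahran, Saudi Arabia, failed to intercept an incoming Iraqi Scud missile because of a flaw in a time calculation based on its internal clock. The Scud hit a U.S. Army barracks there and killed 28 soldiers. The cause of the failed intercept is described in General Accounting Office report GAO/IMTEC-92-26, titled "Patriot Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia". The cause turned out to be an imprecise computation of the time elapsed since the system was booted. Specifically, the Patriot system obtained the time in seconds by multiplying the system clock's count of tenths of a second by 1/10. (Note from 天地会珠海分舵: as I read it, a call to a time API such as getTimeInSeconds() would take the clock's count of tenths of a second and multiply it by 1/10 to return seconds.) This multiplication by 1/10 was carried out in a 24-bit fixed-point register. Computers represent numbers in binary, and anyone who has studied base conversion will know that 1/10 written in binary is an infinitely repeating fraction. (Note from 天地会珠海分舵: a binary fraction is a sum of terms of the form $1/2^n$, and 0.1 cannot be written as a sum of finitely many such terms.) A 24-bit register can hold only 24 significant bits, so the rest of the fraction is chopped off. That seemingly negligible chopped-off error, once multiplied by a large number (here, the number of tenths of a second since boot), produces a large deviation. In fact, the Patriot battery had been running for about 100 hours, and a simple calculation shows that the chopped-off precision caused a time error of about 0.34 seconds. (In binary, 1/10 is the infinite series $1/2^4+1/2^5+1/2^8+1/2^9+1/2^{12}+1/2^{13}+\cdots$, that is, the repeating binary fraction 0.0001100110011001100110011001100…. The 24-bit register instead held 0.00011001100110011001100, which introduces an error of 0.0000000000000000000000011001100… in binary, or about 0.000000095 in decimal. Multiplying this by the number of tenths of a second in 100 hours gives the error in the computed time in seconds: 0.000000095×100×60×60×10=0.34.) A Scud travels at about 1,676 meters per second, so it covers more than half a kilometer in 0.34 seconds. That was far enough to put the incoming Scud outside the "range gate" the Patriot was tracking. (Note from 天地会珠海分舵: I find it easier to think of the range gate as the "expected target position".) Ironically, the faulty time calculation had already been fixed in some parts of the code, which shows that someone had noticed the problem, but not in all of them. Because the fix was only partial, the inaccuracies no longer cancelled each other out, and that contributed to the disaster.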
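The arithmetic in the parenthetical above can be checked with exact fractions. The following is a minimal sketch, not the Patriot's actual code: the helper `chop` and all variable names are hypothetical. It assumes 23 digits after the radix point, because that is how many digits the quoted stored value 0.00011001100110011001100 has, and that choice is what reproduces the 0.000000095 figure.

```python
from fractions import Fraction

# Assumption: 23 fractional bits, matching the stored value quoted in the text.
FRACTION_BITS = 23


def chop(value: Fraction, bits: int) -> Fraction:
    """Truncate (not round) value to `bits` binary digits after the radix point."""
    scale = 2 ** bits
    return Fraction(int(value * scale), scale)  # int() drops the fractional part


tenth = Fraction(1, 10)
stored = chop(tenth, FRACTION_BITS)

# Binary digits of the stored value after the radix point, zero padded.
stored_bits = format(int(stored * 2 ** FRACTION_BITS), f"0{FRACTION_BITS}b")

chop_error = tenth - stored             # error per clock tick, in seconds
ticks = 100 * 60 * 60 * 10              # tenths of a second in 100 hours
time_error_s = chop_error * ticks       # accumulated clock error
distance_m = time_error_s * 1676        # Scud speed in m/s, from the text

print(f"stored value : 0.{stored_bits} (binary)")
print(f"chop error   : {float(chop_error):.3e}")
print(f"time error   : {float(time_error_s):.4f} s after 100 hours")
print(f"Scud travel  : {float(distance_m):.1f} m in that time")
```

Under these assumptions the sketch gives a clock error of roughly 0.34 seconds and a travel distance of about 575 meters, in line with the "more than half a kilometer" stated above.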


The following paragraph is excerpted from the GAO report.

The following is an excerpt from the GAO report:

The range gate’s prediction of where the Scud will next appear is a function of the Scud’s known velocity and the time of the last radar detection. Velocity is a real number that can be expressed as a whole number and a decimal (e.g., 3750.2563…miles per hour). Time is kept continuously by the system’s internal clock in tenths of seconds but is expressed as an integer or whole number (e.g., 32, 33, 34…). The longer the system has been running, the larger the number representing time. To predict where the Scud will next appear, both time and velocity must be expressed as real numbers. Because of the way the Patriot computer performs its calculations and the fact that its registers are only 24 bits long, the conversion of time from an integer to a real number cannot be any more precise than 24 bits. This conversion results in a loss of precision causing a less accurate time calculation. The effect of this inaccuracy on the range gate’s calculation is directly proportional to the target’s velocity and the length of time the system has been running. Consequently, performing the conversion after the Patriot has been running continuously for extended periods causes the range gate to shift away from the center of the target, making it less likely that the target, in this case a Scud, will be successfully intercepted.

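The "range gate" (Note from 天地会珠海分舵: I find it easier to think of this as the "expected target position") predicts where a Scud will next appear by means of a function of two inputs: the Scud's velocity and the time of the last radar detection. Velocity can be expressed as a whole number plus a decimal (e.g., 3750.2563… miles per hour). Time is kept continuously by the Patriot system's internal clock in tenths of a second and is stored as an integer (e.g., 33, 34…). The longer the system has been running, the larger the number representing time. To predict where the Scud will next appear, both velocity and time must be expressed as real numbers. Because of the way the Patriot computer performs its calculations, and because the registers used in the time calculation are only 24 bits long, the time value converted to a real number cannot be more precise than 24 bits allow. The conversion therefore loses precision. The effect of that lost precision on the range gate's prediction of the Scud's next position grows in direct proportion to the target's velocity and to the length of time the system has been running. As a result, converting the time after the Patriot system has been running continuously for a long period shifts the range gate away from the true center of the target, which makes a successful intercept of the target (here, the Scud) unlikely.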
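The proportionality described in the excerpt can be illustrated with a simple linear model. This is only a sketch under stated assumptions, not the Patriot's range gate algorithm: the function names are hypothetical, the per-tick error of 0.000000095 seconds and the Scud speed of 1,676 meters per second are the figures quoted earlier in this article, and the uptimes in the loop are arbitrary sample values.

```python
# Figures taken from the article; everything else here is illustrative.
CHOP_ERROR_PER_TICK_S = 0.000000095   # clock error added per tenth-of-a-second tick
TICKS_PER_HOUR = 60 * 60 * 10         # tenths of a second in one hour
SCUD_SPEED_M_PER_S = 1676.0


def clock_error_s(hours_up: float) -> float:
    """Accumulated clock error after `hours_up` hours of continuous running."""
    return CHOP_ERROR_PER_TICK_S * TICKS_PER_HOUR * hours_up


def range_gate_shift_m(hours_up: float, speed_m_per_s: float = SCUD_SPEED_M_PER_S) -> float:
    """Distance the target covers during the clock error: error time times speed."""
    return clock_error_s(hours_up) * speed_m_per_s


# Sample uptimes chosen only to show the linear growth.
for hours in (1, 8, 24, 48, 100):
    print(f"{hours:>4} h up: clock error {clock_error_s(hours):.4f} s, "
          f"shift {range_gate_shift_m(hours):7.1f} m")
```

Doubling either the uptime or the target speed doubles the shift in this model, which is the relationship the report describes.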