I have been following this incident since October 2018, reading and commenting on articles in the New York Times and the Wall Street Journal and using a Twitter thread to catalog my findings.
FYI @BoeingAirplanes @Boeing @FAASafetyBrief @FAANews @NTSB @NTSB_Newsroom the anti stall autonomous algorithm needs to be turned while investigation is in progress and protocol changes are being put in place. stall avoidance system created bigger problem – unintended steep dive.
— Shaker Cherukuri (@ProcessISInc) November 9, 2018
The Flight Data Recorder was found back in November, which enabled the investigators to piece together what may have happened to this aircraft.
#LionAir Must have added this after #AF447 unfortunately an example of fix being worse than the problem it was trying to fix – stalling
Boeing held back information on 737 that crashed in Indonesia, according to safety experts and others. https://t.co/mbFTRvHFol via @WSJ
— Shaker Cherukuri (@ProcessISInc) November 13, 2018
It appears that a safety feature designed to prevent the kind of stall seen in the crash of Air France flight 447 (an Airbus A330) had an unintended consequence: a steep dive, triggered by the perceived failure of a sensor similar to the one involved in the AF447 incident (pitot tubes), albeit with a different failure mode. In short, a functional safety failure.
Boeing may have finally realized that the algorithm design might be flawed and appears to be in the process of making design changes.
Finally, a software fix to prevent false AOA to trigger MACS. Will be tricky to calibrate.
— Shaker Cherukuri (@ProcessISInc) November 28, 2018
The cockpit voice recorder has now been found, according to news reports (January 14, 2019). The WSJ article on this reports that there might be an issue with a calibration update made in the field.
— Shaker Cherukuri (@ProcessISInc) January 14, 2019
My thoughts on this calibration issue (a further elaboration of the comment I posted on the WSJ article):
Field calibrations (or Service Trims, as they are sometimes called) can be changed in the field by trained technicians using service tools. However, this requires a protocol, training, certification for the technicians and so on. Usually all of these are developed by engineers.
There was something wrong with one of the sensor inputs here. Replacing the sensor did not fix the issue. This happened several times. So most likely it wasn’t the sensor at fault. It appears to be a 20-degree offset.
It could be a mounting, wiring, or signal conditioning problem rather than sensor-to-sensor variation. A calibration fix is possible, but a field technician would not be capable of doing it on their own. Unless, of course, this was a known issue, a protocol had been developed for it and was in place, and technicians had been trained on it.
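To illustrate the reasoning above, here is a rough sketch in Python of how a steady offset between two AOA channels might be told apart from ordinary sensor-to-sensor scatter. The readings are made up, and the function names and thresholds are my own assumptions for illustration; none of this comes from the investigation data or from any Boeing tool.

```python
from statistics import median, pstdev

def characterize_disagreement(left_deg, right_deg):
    """Return (bias, spread) of left-minus-right over paired AOA samples."""
    diffs = [l - r for l, r in zip(left_deg, right_deg)]
    # Median is a robust estimate of the constant offset;
    # population standard deviation measures how much the offset wanders.
    return median(diffs), pstdev(diffs)

def looks_systematic(bias, spread, min_bias_deg=5.0, max_spread_deg=1.0):
    # Hypothetical thresholds, chosen only for this example.
    # A large, steady bias hints at mounting, wiring, or signal
    # conditioning rather than a noisy or failing sensor element.
    return abs(bias) >= min_bias_deg and spread <= max_spread_deg

# Made-up readings: the left channel reads roughly 20 degrees high.
left = [22.1, 21.8, 22.4, 21.9, 22.0]
right = [2.0, 1.9, 2.3, 2.1, 1.8]
bias, spread = characterize_disagreement(left, right)
systematic = looks_systematic(bias, spread)
```

If the same large, steady bias shows up again after the sensor has been swapped, the sensor is an unlikely culprit, which is the argument made above.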
All this does not preclude the fact that this was just a poorly designed safety feature, for the following reasons:
1) Why would the MCAS algorithm decide to act on flawed sensor data? Especially when there is discrepancy between the two sensor inputs.
2) Why would the MCAS algorithm automatically put the aircraft into a steep dive and then ignore pilot attempts to pull the nose up?
3) Why does it require the pilot to turn off the feature by manually disabling the control system?
The algorithm doesn’t need the AOA sensors don’t agree light though. It knows the two values. It should have known that sensor inputs are flawed and needed more data before acting on it. Will add this tweet to my blog post on this. Good scoop! https://t.co/dkostNKpY3
— Shaker Cherukuri (@ProcessISInc) January 15, 2019
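The point about acting on disagreeing inputs can be sketched in code. This is a minimal illustration in Python of the kind of cross-check I have in mind, not a description of how MCAS is actually implemented. Every name, threshold, and range below is a hypothetical assumption.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AoaDecision:
    valid: bool
    aoa_deg: Optional[float]
    reason: str

def cross_check_aoa(left_deg, right_deg,
                    max_disagreement_deg=5.0,
                    plausible_range_deg=(-20.0, 40.0)):
    """Accept AOA data only if both readings are plausible and agree.

    All limits here are hypothetical values for illustration.
    """
    lo, hi = plausible_range_deg
    for name, value in (("left", left_deg), ("right", right_deg)):
        if not lo <= value <= hi:
            return AoaDecision(False, None, f"{name} reading out of range")
    if abs(left_deg - right_deg) > max_disagreement_deg:
        return AoaDecision(False, None, "sensors disagree")
    return AoaDecision(True, (left_deg + right_deg) / 2.0, "ok")

def nose_down_allowed(decision, pilot_pulling_back, stall_threshold_deg=14.0):
    """Allow an automatic nose-down command only on trusted data,
    and never against a pilot who is pulling the nose up."""
    if not decision.valid:
        return False
    if pilot_pulling_back:
        return False
    return decision.aoa_deg >= stall_threshold_deg

# A 20-degree disagreement is rejected instead of being acted on.
bad = cross_check_aoa(22.0, 2.0)
good = cross_check_aoa(15.0, 16.0)
```

With these assumed limits, the 20-degree disagreement case never produces a nose-down command, and a valid high-AOA reading still yields to the pilot's input. That covers questions 1 and 2; question 3 is a matter of cockpit procedure rather than algorithm logic.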
Also, questions have been raised about pilot training and training manuals.
How do you write a training manual for a flawed design which may not be an intended feature?
You can’t. Which explains why there wasn’t a training protocol for this.
I hear Boeing is making design changes to the MCAS algorithm which, once released, would require a software update to the ECU in question.
Needless to say, before any such changes are made, a comprehensive functional safety review of the entire system is warranted, using methodologies like Failure Modes, Effects and Diagnostic Analysis (FMEDA).
Concerns about software changes are well warranted. The MCAS algorithm needs to be turned off while the broader ramifications of changes are being discoursed. Details on my blog about this. https://t.co/dkostNKpY3
— Shaker Cherukuri (@ProcessISInc) February 12, 2019