Failures have increasingly serious consequences as today’s electronic systems become smaller, more complex and more deeply embedded in our daily lives. System failures must be found and fixed before they can cause costly downtime, product recalls and reputational damage. This requires a comprehensive, multidisciplinary approach to electronic system failure analysis that draws on specialized tools and expertise.

System complexity is increasing not only at the board level but also at the IC, package and die levels. Advances in process technology have brought us devices containing billions of transistors, and many previously discrete components and independent subsystems are now being integrated. Electronic components also continue to shrink rapidly with FinFETs, metal gates, low-k dielectrics and other advanced process technologies. In addition, we are using more complex packages, including system-in-package (SiP), multichip modules (MCM), SiSub, stacked die, through-silicon vias (TSV) and Cu wire bonding, along with more complex package and board materials, coatings and molding compounds. Further complicating the challenge, many failures are intermittent, which makes them extremely difficult to diagnose regardless of their root cause.

Today’s networking equipment is a good example of this growing complexity. A networking system might contain thousands of components on each of several boards, including many complex ICs and SoCs and a wide mix of RF, power-supply, high-speed digital and storage-media circuitry, all residing in a single system and each requiring specialized domain knowledge.

Automotive systems are similarly complex. Some vehicles contain 100 or more electronic control units (ECUs), and a vehicle’s electronics can include 50 to 100 microprocessors and more than 100 sensors. Backup cameras and lane-change warning systems are already in widespread use, and automakers are also developing electronic assisted-driving and sensor-guided autopilot systems for tasks such as navigating bumper-to-bumper traffic, driving through toll booths, recognizing speed limits and road signs, finding a space in a crowded garage or squeezing into a tight parking spot. These systems can include a dozen ultrasonic sensors plus multiple cameras and radar sensors.

With complexity comes a higher risk of failure, and failures increasingly carry expensive consequences. Hardware failures have been estimated to account for 72 percent of network downtime (source: “Understanding Network Failures in Data Centers: Measurement, Analysis and Implications,” Microsoft and University of Toronto, 2011). For organizations that depend on service delivery, such as telecom providers and e-commerce companies, an unplanned data center outage can cost as much as $11,000 per minute (source: “Understanding the Cost of Data Center Downtime: An Analysis of the Financial Impact of Infrastructure Vulnerability,” Ponemon Institute and Emerson Network Power, 2011).

Solving the problem
Traditionally, the options for comprehensive root-cause failure analysis and resolution have been in-house test teams that lack adequate experience and toolsets, or third-party services that address only part of the problem and have no defined methodology for system-level failure analysis and debug.

In contrast, today’s specialized electronic system failure analysis service providers take a comprehensive, multidisciplinary approach that combines electrical and physical analysis to identify the root cause and the associated failure mechanism, and to determine how to prevent future failures. The focus must be on the entire system, from electronics to materials, all the way down to failure mechanisms occurring at the transistor level within the IC. Fig. 1 shows what is required to find, analyze and resolve electronic system failure mechanisms and their root causes.


Specialized expertise and equipment are also required. Expertise must extend from the component level to the system level, backed by a highly trained staff with a proven track record of conducting the full range of failure analysis investigations, from design through production to field returns (see Fig. 2).

Equipment is another key piece of the puzzle. Choosing a provider with a large, comprehensive set of equipment is critical to ensure that the right tools are applied to each problem, and to allow large projects to be processed in parallel and scaled as scope and demand fluctuate. System redundancy is also required, as is highly specialized equipment such as advanced high-resolution microscopy imaging systems (SEM, TEM and dual-beam FIB) that support analysis down to the component level. It is also critical to be able to characterize failures with tools such as laser timing probes, which acquire signal waveforms in real time without contact and without loading the circuit. Localizing failures to a single device further requires nanoprobing capabilities for advanced process nodes below 28 nm, along with specialized software that can measure any feature of interest in TEM images.

Once the right expertise and tools are in place, optimal analysis requires a comprehensive methodology and plan that spans the full range of electrical and physical failure analysis steps. The process starts by defining the electrical failure signature and ends by identifying the failure mechanism and resolving the problem.

Customization is also important. Every situation, customer, product and failure mechanism has its own characteristics and issues, so there is no “one size fits all” approach. Failure identification, analysis and resolution require a methodical approach that starts by asking the right questions up front and then designing a customized workflow. Once the workflow is defined, the solution can be executed quickly.

Electronic system failures are becoming increasingly expensive. At the same time, finding and fixing them has grown more difficult as systems become smaller and more complex and are built with exotic materials and advanced process technologies. Failures have also become more intermittent, yet the stakes for finding and fixing them quickly, before they cause costly downtime, recalls and reputational damage, have never been higher. Meeting this challenge requires a comprehensive, multidisciplinary failure analysis methodology and workflow that considers all possible root causes from the component level to the system level, while leveraging extensive, specialized expertise and a variety of advanced equipment and toolsets.