3D/TSV is the next evolution beyond SiP. There has been significant work over the last two years from both academia (research) and industry (standards and working models/test chips) to identify and resolve the challenges of testing 3D/TSV devices. In the medium to long term, as TSV-based die stacking becomes more prevalent and more complex/exotic die stacks appear, test challenges will also become more difficult. It is certain that new and additional Design-for-Test features will be needed to mitigate the increased tester resource and time requirements, as well as the increased test complexity, that come with large numbers of different die in the same package. This section addresses six key test challenges based on the evolution of 3D from SiP through complex die stacks: test flows, cost and resources; test access; heterogeneous die in a single stack/package; debug and diagnosis of failing stacks/die; DfX (Design for Test, Yield and Cost); and power.

It is important to note that 3D/TSV is not yet a mainstream technology, which makes it difficult at this time to make predictions regarding 3D/TSV test flows. Currently there are two "adjacent" technologies: 2.5D and memory die stacks (Wide I/O, High Bandwidth Memory and Hybrid Memory Cube). Both of these technologies will yield insights into the requirements and challenges associated with 3D/TSV. The best that can be gleaned from them at this time is that reliance on BIST and boundary-scan based techniques, together with fault tolerance in simple configurations, tends to produce relatively high yields at the stack level. As these adjacent technologies mature and more 3D/TSV applications emerge, more and better data will enable better predictions and decision making with respect to 3D/TSV test processes.

== 1.1     Test Flows, Cost, Resources ==

At a high level, the 3D test flow comprises four test process steps: (1) pre-bond test – testing the individual die prior to integration into the stack; (2) mid-bond test – testing the partially constructed die stack; (3) post-bond test – testing the complete die stack assembly; and (4) final test – testing the packaged assembly. Mid-bond and post-bond test are new relative to the "traditional" test flow, and adding them must weigh cost, process complexity and the potential for damage against the quality of the 3D stack. New defect/fault models will be required to account for process steps specific to 3D, including wafer thinning and die stack assembly. Defect/fault models for TSVs will also be critical to help define TSV test requirements and processes. Mid-bond testing may be skipped altogether where its cost and complexity outweigh its benefit.
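To make the "known good die" premise concrete, the toy sketch below (not from the roadmap; all yields and coverage figures are invented placeholders) computes the compound yield of a stack when pre-bond test escapes let some defective die through. Because one escaped die is enough to scrap the whole stack, stack yield degrades multiplicatively with the number of die.

```python
# Illustrative sketch (placeholder numbers): compound yield of a die stack,
# assuming independent per-die yields and a pre-bond test that catches only
# a fraction of the defective die.

def stack_yield(die_yields, pre_bond_coverage, assembly_yield):
    """Probability that a fully assembled stack is good.

    die_yields        -- per-die fabrication yields, e.g. [0.95, 0.90]
    pre_bond_coverage -- fraction of defective die screened out pre-bond (0..1)
    assembly_yield    -- yield of the stacking/bonding process itself
    """
    y = assembly_yield
    for yd in die_yields:
        # Die that pass pre-bond test: all good die plus the defective die
        # the test failed to catch (test escapes).
        passing = yd + (1 - yd) * (1 - pre_bond_coverage)
        # Of the passing die, only the truly good ones yield a good stack.
        y *= yd / passing
    return y

# Example: a 4-die stack with 95% die yield, 98% pre-bond fault coverage,
# and 99% assembly yield -- one bad die scraps the whole stack.
print(stack_yield([0.95] * 4, 0.98, 0.99))  # ≈ 0.986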

Modeling of test flows for stacked die is already being developed by both academia and industry. Much of this modeling is aimed at optimizing overall test time; however, additional modeling to optimize resource utilization, cost and yield will also need to be considered. A "cost-weighted yield" model (i.e., identifying which process/test steps have the highest impact on product cost) can incorporate both assembly and test flows to determine the optimal test flow with respect to the stacking process, as well as die-level test and yield requirements. More work is needed in "Design for Stack Yield". Redundancy (for die, logic, memory and TSVs) is certainly more attainable in a 3D configuration. Such redundancy may be able to increase pre-, mid- and post-bond yields by "tolerating" a certain level of defects in either the die or the stack. Several proposals for TSV and die-level redundancy and repair have already been published since the last update of the roadmap.
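The sketch below illustrates the kind of cost-weighted comparison such a model enables: expected cost per *good* stack for a flow with and without mid-bond test. All cost figures and yields are made-up placeholders, not roadmap data.

```python
# Hypothetical cost-weighted flow comparison. With mid-bond test, a bad
# partial stack is scrapped as soon as it is detected; without it, every
# stack is fully assembled before post-bond test catches the failures.

def cost_per_good_stack(n_die, die_cost, bond_cost, die_yield,
                        mid_test_cost=None, post_test_cost=1.0):
    """Expected cost per good stack; die_yield is the post-pre-bond yield."""
    in_flow = 1.0   # fraction of started stacks still being assembled
    good = 1.0      # fraction of started stacks with no bad die so far
    cost = 0.0
    for i in range(n_die):
        cost += in_flow * (die_cost + bond_cost)
        good *= die_yield
        if mid_test_cost is not None and i > 0:
            cost += in_flow * mid_test_cost
            in_flow = good            # mid-bond test scraps bad partial stacks
    cost += in_flow * post_test_cost  # post-bond test on whatever was assembled
    return cost / good                # only `good` stacks survive post-bond

# Placeholder numbers: 4 die at $5 each, $1 per bond, 97% known-good-die yield.
print(cost_per_good_stack(4, 5.0, 1.0, 0.97))                     # ≈ $28.24
print(cost_per_good_stack(4, 5.0, 1.0, 0.97, mid_test_cost=0.5))  # ≈ $28.73
```

With these placeholder numbers mid-bond test does not pay off; rerun with a lower die yield (say 0.8) and it does. Exposing exactly this kind of trade-off is the point of a cost-weighted model.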

Test costs are influenced by several factors: test resource requirements, overall test time, and cost-weighted yield. Mid- and post-bond testing can add costs beyond the obvious increase in test time. Most notably: failure of a single die will most likely compromise the entire stack; inter-die testing may add significant time and complexity; stacked-die test resource requirements will be driven by the aggregation of all of the individual die requirements, increasing tester resource requirements and cost; huge test data volumes will result from testing multiple die in parallel; and data transfer/storage/security/integrity issues will be a consequence of information sharing between stack integrators and die providers.

For mid- and post-bond testing, the proposed IEEE P1687 (IJTAG) standard may simplify die-to-die functional test generation on ATE. In addition, boundary-scan testing of die-to-die interconnects should be relatively straightforward; a sketch of the classic pattern-generation approach follows below. However, tools will need to be developed to simplify test program generation for partial and/or full die stacks.
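The sketch below shows the well-known true/complement "modified counting sequence" for interconnect test pattern generation; it is illustrative of the technique in general and is not tied to any particular standard or tool.

```python
# Sketch of interconnect test pattern generation: each of N die-to-die nets
# is assigned a unique non-zero, non-all-ones code word so that opens,
# stuck-ats, and shorts between any pair of nets produce distinguishable
# responses at the receiving die's boundary-scan cells.
import math

def interconnect_patterns(n_nets):
    width = math.ceil(math.log2(n_nets + 2))  # room to skip all-0/all-1 codes
    codes = list(range(1, n_nets + 1))        # skip 0 so no net is all zeros
    patterns = []
    for bit in range(width):
        patterns.append([(c >> bit) & 1 for c in codes])
    # Complement patterns guard against stuck-at faults aliasing as shorts.
    patterns += [[1 - v for v in p] for p in patterns]
    return patterns  # each pattern: one bit to drive on every net

for p in interconnect_patterns(6):   # 6 nets need only 2*3 = 6 patterns
    print(p)
```

The pattern count grows only logarithmically with the number of nets, which is why boundary-scan interconnect test scales well even for wide die-to-die interfaces.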

Responsibility for some test process steps may also change in a die-stack relationship. Reliability testing is one example where responsibility (die provider or stack integrator) is not yet clear. Pre-bond, die-level active burn-in (as opposed to passive burn-in, i.e., simple baking) will be difficult given very limited access to the die. However, if burn-in is done post-bond, there is a risk of significant yield loss. At this point there is no clear direction on where and how burn-in is to be done (if done at all). A significant technology advancement will be required to make burn-in practical at either the die or the stack level, or to eliminate the requirement for burn-in altogether.

== 1.2     DfX ==

Design-for-Test/Debug/Yield (DfX) has been a prominent part of each of the previous sections. These embedded resources may enhance controllability, observability and defect/fault tolerance. DfX at the die level may include: a standardized access protocol to the die; built-in test features to enable die-level (ATE) test capability for die in the stack (comprehensive logic BIST, memory BIST, or "compressed and stored" ATPG vectors); interconnect test capability for all I/Os (boundary-scan and/or at-speed loopback testing); built-in debug and monitoring features to isolate defects in the stack (including potential capability to measure/monitor TSV continuity and performance); and some level of fault tolerance/repair to enable higher yields in the stack (this may include the ability to "partition" a given die in the stack). The built-in test and debug features mentioned above will become more prevalent for die in a stack. Partitioning logic, or even the die itself, can facilitate parallel testing at the die level and may be used to "decommission" logic on the die or a die in the stack (given some level of fault tolerance designed into the die).
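As an illustration of the memory BIST item above, the sketch below simulates in software the March C- sequence, one common MBIST algorithm. Real MBIST is a hardware state machine on the die; this behavioral model merely shows the algorithm such hardware would implement.

```python
# Behavioral sketch of a March C- memory BIST sequence, run here against a
# simulated memory array (a mutable list of bits).

def march_c_minus(mem):
    """Return True if the memory passes March C-."""
    n = len(mem)
    up, down = range(n), range(n - 1, -1, -1)

    def element(order, read_expect, write_val):
        for a in order:
            if read_expect is not None and mem[a] != read_expect:
                return False              # fault detected at address a
            if write_val is not None:
                mem[a] = write_val
        return True

    steps = [(up, None, 0),    # ⇕ (w0)
             (up, 0, 1),       # ⇑ (r0, w1)
             (up, 1, 0),       # ⇑ (r1, w0)
             (down, 0, 1),     # ⇓ (r0, w1)
             (down, 1, 0),     # ⇓ (r1, w0)
             (down, 0, None)]  # ⇕ (r0)
    return all(element(*s) for s in steps)

print(march_c_minus([0] * 64))  # a fault-free memory passes: True
```

Because the algorithm is self-contained and needs only an address counter, a data mux and a comparator, it fits naturally into the limited-access, die-in-stack environment described above.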

DfX at the stack level will evolve over time as knowledge is acquired about defect opportunities during the die stacking process (e.g., back-grinding, wafer thinning, and laser drilling) and fault models are developed to accommodate those defects. Stack-level integrators may be able to utilize interposers for stack-level DfX features. Interposer-based DfX features may assist with die-to-die testing and parallel testing of multiple die. Built-in interposer-based test features may also help to reduce the tester resource requirements driven by heterogeneous die in the stack. Note that early on, fault tolerance may be more important to die yields than DfX.

During the test/debug/yield-analysis process, it is imperative that tests can be conducted on one die alone, on multiple die simultaneously, and even on multiple die interacting with each other. Therefore, the access mechanism must provide two connectivity schemes on a per-die basis: 1) probe pads for bare-die test; and 2) interconnected and scalable TSV connections after stacking. These schemes must accommodate the access requirements described in the Test Access section below.

== 1.3     Test Access ==

This section addresses two significant test challenges for 3D/TSV configurations: pre-bond access (probe access to the die) and mid/post-bond access (access to a single die in the stack). "Access" can be defined as access to the test logic on the die and/or access to the die I/O (especially the TSVs). The quality of the die stack rests on the premise of "known good die" (KGD), that is, that all die in the stack have been "fully" tested. Without full access to the logic on the die, test coverage is reduced and the possibility of defects escaping to the stack increases, risking quality and yield. In that case we would refer to the die as "not known bad" (which also implies that the die is "not known good"). The following paragraphs address test access challenges with respect to pre-bond, mid-bond and post-bond testing, including making conscious decisions to move away from KGD.

The bottom die in a stack (where the "external I/Os" reside) will typically have probe-able pads for wire-bonding or flip-chip bumps. All other die (middle and top die) will typically be connected (power/ground, clocks, control, data) through TSVs. A "typical" TSV configuration may be 5 µm diameter at 10 µm minimum pitch. However, in many cases, TSVs are not directly bonded but are equipped with micro-bumps. Micro-bumps may be 25 µm diameter at 40 µm pitch.

Current probing technology has not been demonstrated to reliably contact either TSVs or micro-bumps in a "production" environment. Three issues impact the ability to reliably contact the die: the diameter and pitch of the vias/micro-bumps are too small for current probing technologies; current probing techniques may damage the via/micro-bump (and any damage may be too much); and there is potential for ESD damage to the logic on the die. The challenge will be to develop probe technology that can reliably probe the micro-bumps (not only individually but in "arrays" of probe/contact points). This would require advances in probe contacting, configuration of "contact arrays", metallurgies, tip-cleaning recipes, and minimizing damage from the probes. Contactless probing, currently being examined primarily by academics, might play a role in this domain. Its main benefit is that it does not inflict probe damage; however, it still cannot probe the required sizes/pitches, and power/ground still needs to be delivered through traditional needles. Promising techniques have been demonstrated, but it may take 1 to 2 years for them to be capable of supporting production-level volumes.

Probing on 40 µm pitch micro-bumps may be feasible in the short term, for limited array sizes. As long as probe technology is not on par with micro-bump/TSV sizes and pitches, we will need additional dedicated probe pads to enable sufficient probing and/or better Design-for-Test features that can access die logic without access to the TSVs. "Sacrificial pads" may help to mitigate some access issues but will not solve all of them. These pads also come at a price, so ROI needs to be considered. Unfortunately, as micro-bump technology scales down quickly, it will be very challenging for probe technology to keep up with access requirements.

A standardized test access protocol (both physical and logical) is critical for die-in-stack testing. Test signals will need to be routed vertically (from die to die) and horizontally (within a single die). Without a standardized protocol, routing test signals vertically through the stack may be difficult due to test and functional signal density, and programming test features may be confusing due to potentially different test protocols for individual die in the stack. Currently, test data routing through the stack may be defined by one of two scenarios: 1) one development organization drives the design of all die and all TSVs are physically lined up (test, verification, debug and other requirements are designed and implemented at the stack level as one design effort); or 2) off-the-shelf die will require either standardized "test access areas" or interposers to support re-routing of test signals to neighboring die in the stack.

Four basic access functions are required for stack-level test access: 1) the ability to provide access to on-die DfX features; 2) the ability to provide a bypass function for skipping over a die; 3) the ability to provide a turn-around function for terminating the access path at a die; and 4) the ability to provide access to the next die above the current die (a behavioral sketch of these four functions follows below). JEDEC has already defined some test/access capability in the Wide I/O specification for stackable mobile memories and in the High Bandwidth Memory initiative. The Hybrid Memory Cube consortium is also working on a standard that includes some test access capabilities. The IEEE P1838 Working Group is likewise defining a standardized electrical (and potentially physical) access and test protocol for 3D die stacks. All of the test access mechanisms described above are based, to some extent, on the IEEE 1149.1 standard. The IEEE P1838 standard is primarily focused on test access, while the other initiatives consider test access only as a small part of the overall standard. For each of these standards, it is incumbent on the Working Group to try to release the standard before stack complexity increases to the point that test access to the die becomes a significant challenge.
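The sketch below is a behavioral model of the four access functions, loosely inspired by the P1838 direction but emphatically not the actual standard (which was still in development); all class and field names are invented for illustration.

```python
# Behavioral sketch: to reach a target die, the controller bypasses every
# die below it, includes the target's DfX on the serial path (ACCESS), and
# turns the path around at the target (TURN vs ELEVATE).
from dataclasses import dataclass

@dataclass
class DiePort:
    dfx_bits: int            # length of this die's DfX register segment
    include: bool = False    # ACCESS (True) vs BYPASS (False)
    elevate: bool = False    # pass the path to the die above (vs TURN around)

def configure_for_target(stack, target):
    """Route the serial path up to `target` and back down again."""
    for i, die in enumerate(stack):
        die.include = (i == target)   # ACCESS only the die under test
        die.elevate = (i < target)    # ELEVATE below it, TURN at the target

def path_length(stack):
    bits = 0
    for die in stack:
        bits += die.dfx_bits if die.include else 1  # DfX segment or bypass flop
        if not die.elevate:
            break                     # TURN: the path goes back down from here
    return bits

stack = [DiePort(dfx_bits=n) for n in (128, 64, 256)]  # bottom, middle, top
configure_for_target(stack, target=2)
print(path_length(stack))  # 1 + 1 + 256 = 258 bits through the stack
```

The appeal of this style of mechanism is visible in the model: the path "assembles" itself from identical per-die ports, regardless of how many die are stacked or who supplied them.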

There will obviously be tester implications based on both the number and location of the test signals and the protocol used to "address" individual die in the stack and to access specific test features on a die. Ideally, the access mechanism should "assemble" itself as the die are stacked, even if the die come from different fabs and are made in different processes. An access mechanism should include a port on the base die, physical TSV definitions, and a communication protocol or control structure to talk to all of the per-die DfX in a stack. The access mechanism must allow test of the die before they are stacked (pre-bond test) and must allow test after the die are in a complete or partial stack (mid/post-bond test).

== 1.4     Heterogeneous Die ==

"Heterogeneous die" can have several meanings, ranging from different functions (including memory, logic, analog and high-speed optics/photonics) to different die providers. In any case, the evolution to complex, heterogeneous die stacks will have significant ramifications for test, test access, test application and test accountability. Some of these ramifications have already been discussed in the previous sections; however, the implications of testing die-to-die interactions go well beyond what was described earlier. From a test perspective, a stacked die is analogous to a Printed Circuit Assembly (PCA). Testing of the 3D stack must account for potential die-level, pre-bond test escapes based on: untested functional die-to-die interactions; power and signal integrity in the stack (compared to die-level testing); and yet-to-be-discovered defects/faults arising from assembly and interconnect processes (wafer thinning may be a good example), which could expose and/or exacerbate die-level defects. One area where the stack improves on the PCA environment is die-to-die latency; chip-to-chip latencies on boards were a significant contributor to test escapes. Still, test escapes to the stack will be prevalent. Irrespective of test time and cost, generating a comprehensive, full-stack functional test can be anywhere from impractical to infeasible. Depending on when testing occurs (mid-bond "stack and test" versus post-bond "assemble and test"), several versions of the functional tests may be needed to account for the different variations of die on the stack.

Similar to SiP, comprehensive testing can be accomplished by a combination of Built-In Self-Test (BIST); judicious use of existing and upcoming test standards such as IEEE 1149.1-2013, IEEE 1500, IEEE P1687 and IEEE P1838; and limited functional testing. However, this will require a significant amount of coordination between die providers, stack integrators and the architect/designer. It is imperative that Built-In Self-Test be used extensively, both to test logic on the die and to test die-to-die interactions.

Traceability at the die level will become increasingly important as the number of heterogeneous die in the stack increases. Data sharing between the die providers and the stack integrator will be critical to maintaining quality levels of the stack as well as of the individual die. Access to die IDs needs to be standardized in some way, either through a standard access protocol or through a standardized description language. In addition, data sharing and analysis tools will need to evolve to accommodate data-driven process control that extends beyond the die provider to the stack integrator. It should be noted that test data storage requirements will increase dramatically in such a situation.
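As a thought experiment, the sketch below shows one possible shape for the traceability data exchanged between die provider and stack integrator. The field names and JSON encoding are entirely hypothetical; no standard for this exchange is implied.

```python
# Hypothetical die-level traceability record: one per die in the stack, so
# a stack-level failure can be traced back to the originating die, lot,
# wafer position, and die-level test outcome.
import json
from dataclasses import dataclass, asdict

@dataclass
class DieTraceRecord:
    die_id: str              # electronic chip ID read via the access port
    provider: str
    lot: str
    wafer: int
    x: int                   # die coordinates on the wafer
    y: int
    pre_bond_result: str     # e.g. "pass", "fail", "pass-with-repair"
    repairs_used: int        # redundancy consumed at die-level test

def export_stack_manifest(records):
    """Serialize the per-die records that accompany a stack to the integrator."""
    return json.dumps([asdict(r) for r in records], indent=2)

print(export_stack_manifest([
    DieTraceRecord("ECID-0001", "fab-A", "LOT42", 7, 12, 33, "pass", 0),
    DieTraceRecord("ECID-0002", "fab-B", "LOT17", 3, 5, 21, "pass-with-repair", 2),
]))
```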

Ultimately, the testing of heterogeneous die will be a mixture of "component test", where a die is tested (in the stack) against fault models, parametric requirements and yield criteria to ensure the die is "known good", and "board test", where a die is tested as part of an integrated stack, covering interconnect and interaction properties.

== 1.5     Debug-Diagnosis ==

Similar to a PCA test environment, the major challenge for debug and diagnosis will be correlating failures at the stack level to defects/faults on the die. Functional tests at the stack level (mid-bond, post-bond and final test) will be very difficult to debug, especially when the failing die is in the middle of the stack (allowing little to no "debug" access). The problem is compounded by the fact that it will be virtually impossible to remove a die from the stack without significant damage. Failure analysis of "systemic" defects will be costly, time consuming and ineffective unless adequate test/debug/FA resources are available at stack-level test. Diagnosis may also be impaired by "environmental factors" (thermal and power integrity) and an inability to identify potential TSV defects pre- and post-assembly. Note that it is imperative that stack-level testing be able to discriminate die-level defects/failures from failures due to the assembly process.

Significant integration of built-in test and debug features will be required at both the die level and the stack level. These features could include built-in logic analyzers/state capture, oscilloscopes, temperature and power monitors, and droop detectors. Data from these debug features should be logged by the stack integrator and provided to the die provider along with the failing stack-level test (a sketch of such a failure record follows below). Moreover, significant advancements are also needed in built-in test and debug technologies themselves. Areas such as analog BIST and functional BIST, as well as current logic and memory BIST, require significant breakthroughs in order to facilitate test and debug in a limited-access environment. Use of built-in test and debug features may allow the die provider to recreate both the test environment and the failure seen at stack-level test. Socketing/fixturing technologies will need to be significantly enhanced to allow for partial/full stack configurations on ATE. "Test interposers" may also be used to help identify/debug systemic, assembly-induced defects. Data-driven debug/diagnosis techniques may be another alternative over time.
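The sketch below illustrates the kind of failure record a stack integrator might log for the die provider: built-in monitor readings captured alongside the failing test so the failure environment can be recreated. All names, monitors and readings are invented.

```python
# Illustrative sketch: bundle on-die monitor snapshots (temperature, droop)
# with a stack-level failure record destined for the die provider.
import time

def log_failing_test(test_name, failing_die, monitors):
    """monitors: dict of monitor name -> callable returning a reading,
    standing in for reads of on-die sensors via the stack's access port."""
    return {
        "test": test_name,
        "failing_die": failing_die,
        "timestamp": time.time(),
        "environment": {name: read() for name, read in monitors.items()},
    }

record = log_failing_test(
    "post_bond_interconnect", failing_die="die2",
    monitors={
        "die2_temp_C": lambda: 87.5,       # placeholder sensor reads
        "die2_vdd_droop_mV": lambda: 43.0,
    },
)
print(record)
```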

== 1.6     Power ==

Given that test power requirements can be greater than operational power requirements, power could be a significant test challenge. Power issues may occur at the power-domain level, the die level or the stack level. Die- and stack-level power distribution designs must account for test power consumption. Judicious power monitoring and droop detection throughout the stack will be imperative to guarantee test integrity. Power-aware testing at the die level and the stack level will need to be considered and refined as test power requirements for the stack grow; the sketch below illustrates the kind of budgeting involved.
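The toy check below estimates scan shift power per die from a weighted-switching-activity style energy-per-cycle figure and throttles the shift clock so that neither any individual die budget nor the stack budget is exceeded. All numbers are placeholders.

```python
# Toy power-aware test check: since shift power scales with clock frequency
# (P = E_cycle * f), the maximum safe shift frequency is set by the tightest
# of the per-die and stack-level power budgets.

def max_shift_freq(die_params, stack_budget_w):
    """die_params: list of (toggle_energy_j_per_cycle, die_budget_w)."""
    f = float("inf")
    total_energy = 0.0
    for energy_per_cycle, die_budget in die_params:
        f = min(f, die_budget / energy_per_cycle)  # per-die power cap
        total_energy += energy_per_cycle           # die tested in parallel
    f = min(f, stack_budget_w / total_energy)      # stack-level power cap
    return f

# Three die tested in parallel: (joules toggled per shift cycle, watts allowed)
dies = [(4e-9, 0.8), (2e-9, 0.5), (6e-9, 1.0)]
print(f"max shift clock: {max_shift_freq(dies, stack_budget_w=2.0)/1e6:.0f} MHz")
```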

High power levels may also contribute to thermal issues. Thermal issues could impact fixture design and performance. In addition, thermal variations may impact not only performance but also the integrity of the die stack itself. There will be a need to develop a "thermally induced, inter-die fault model" to describe the impact of thermal variations across the die and the stack. Thermal modeling and on-chip thermal monitoring may help to identify potential thermal issues in the stack. Validation of individual die and of die in the stack, as well as pre-production testing, should also consider guard-banding for potential thermal variation, as sketched below.
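The toy calculation below shows one simple form such a guard-band could take: derating the at-speed test frequency by an assumed delay-versus-temperature slope. The coefficient and temperatures are placeholders, not characterized silicon data.

```python
# Toy thermal guard-band: if path delay grows roughly linearly with
# temperature, derate the at-speed test frequency for the hottest die in
# the stack relative to the characterization temperature.

def thermally_guarded_freq(nominal_freq_hz, t_char_c, t_worst_c,
                           delay_pct_per_c=0.1):
    """Derate frequency assuming delay grows delay_pct_per_c percent per °C."""
    extra_delay = (t_worst_c - t_char_c) * delay_pct_per_c / 100.0
    return nominal_freq_hz / (1.0 + extra_delay)

# Characterized at 85 °C, but a mid-stack die may see 110 °C under test power.
print(thermally_guarded_freq(1.0e9, 85, 110))  # ≈ 0.976 GHz
```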

== 1.7     Conclusion ==

Significant evolution is required in test equipment, tools, EDA and methodology to address the challenges posed by 3D die integration. As is typical for test, the pace and scope of 3D die integration itself will drive test requirements and test technology development. Challenges imposed by the manufacturing, assembly and packaging processes will also drive test technology and processes.