Monday, May 21, 2012

ST19XL18P - K5F0A Teardown

4 Metal, 350 nanometer fabrication process, EAL4+ smart card.  The device was fabricated in 2002, and yet the only main differences in today's latest ST19W/N series are the width of the ROM data bus output into the decrypt block and the fabrication process (180nm and 150nm shrink).

Figure 1:  Logo of the ST19XL18 die coded K5F0A.  Notice the presence of active shielding.

The device was dipped into a hydrofluoric acid (HF) bath until the active shielding fell off.  This saved about 10 minutes of polishing to remove the surface oxide and Metal 4 (M4).  It also helps the polishing of the lower layers begin fairly evenly.

Figure 2:  Active shield is removed.  Device needs polishing now.

Once the passivation oxide is removed, the oxide over each layer takes less than 2 minutes per layer to remove.  We purposely stop just before the Metal 3 (M3) surface is exposed, leaving the vias clearly visible (there are several gates tied to the ground of the mesh on Metal 4 (M4), as well as the active shield's begin and end vias).

Figure 3:  Metal 3 (M3) polish until only a thin layer of oxide remains.

The device was placed and routed in a very modular fashion.  The picture below is not 100% to scale but more or less highlights the various blocks present.  The MAP consists of asymmetric and symmetric crypto functions (DES, RSA, etc).

Figure 4:  M3 with comments drawn into place.

The EEPROM control logic is actually in the lower left corner of the EEPROM block.  When drawing on the picture, highlighting that area was forgotten ;).

Figure 5:  M2 layer

With Metal 3 (M3) removed and the M2 layer exposed, the device begins to look less complicated.

Figure 6:  M1 layer

Metal 1 (M1) shows us all the transistors.  We did not polish down to the poly.  Most of the gates are understandable without it for the purposes of finding the clear data bus.

Figure 7:  Small memory area located behind EEPROM block.

Figure 8:  Second small memory area located behind the EEPROM block.

Most likely, the NVM areas in Figures 7 and 8 are related to trimming or security violations.  No further investigation is planned on these areas (it isn't necessary).

Figure 9:  Clear ROM drivers feeding the 'clear' data bus highlighted on each of the 3 layers.

Strangely enough, it is now understandable why ST cannot achieve high performance on the ST19 platform.  Each logic area with access to the clear data bus drives it through a high-output driver that is tri-stated (hi-z) when not driving.  This means that all drivers are OR-tied and only one set of 8 drivers is ever active at a time.  This is a very large and cumbersome way of creating a MUX.
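To make the shared-bus behaviour concrete, here is a minimal Python sketch of a tri-stated 8-bit bus.  It is only a behavioural model under our own assumptions: the function name resolve_bus, the (enable, value) pair layout and the sample byte values are all hypothetical and were not recovered from the die.

```python
from typing import Optional, Sequence, Tuple


def resolve_bus(drivers: Sequence[Tuple[bool, int]]) -> Optional[int]:
    """Resolve an 8-bit shared bus from (enable, value) pairs.

    Each pair models one set of 8 tri-state drivers (hypothetical layout).
    Returns None when no set is enabled, i.e. the bus floats (hi-z).
    Raises ValueError when more than one set is enabled (bus contention).
    """
    active = [value & 0xFF for enable, value in drivers if enable]
    if len(active) > 1:
        raise ValueError("bus contention: more than one driver set enabled")
    return active[0] if active else None


# Three made-up sources; only the first one is enabled.
sources = [(True, 0xA5), (False, 0x3C), (False, 0xFF)]
bus_value = resolve_bus(sources)
```

The model shows why the enables must be one-hot: the bus itself performs the selection, so every source needs its own full set of drivers.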

As time permits, the ST19W and ST19N series will be looked at.  We expect to find this kind of pattern again.  Overall, finding the clear data bus took 1.5 hours once the images were created.  Most of the 1.5 hours was spent aligning the layers.


  1. How could the MUX be created differently than by tri-stating and ORing the outputs? How does that limit the performance?

  2. You can make an 8:1 mux by cascading three levels of 2:1 muxes (seven muxes in total) and then (if necessary) putting a high-output driver on the last stage. This means you only need one high-drive buffer instead of eight.

    A 2:1 mux is pretty easy to create out of basic logic cells. You can describe it naively as (in0 & !select) | (in1 & select). This is one inverter (two transistors), two AND gates (six transistors each) and one OR gate (six transistors), or 20 in total.

    Further optimization reduces this to (in0 NAND !select) NAND (in1 NAND select). This is one inverter (two transistors) and three NANDs (four each), or 14 in total. We can reduce this to 12 if we assume both select and !select are available.

    The first stage thus needs an inverter for select[0] (2T) and four 2:1 muxes (48T) or 50T. The second needs an inverter for select[1] and two muxes, or 26T. The third stage needs an inverter for select[2] and one mux, or 14T. This comes out to 90 transistors and one high-drive output stage (per bit of output).

    By comparison, ST's method requires eight high-drive tristate buffers. A typical tristate buffer uses a NOR (4T), a NAND (4T), an inverter (2T) and a 2T output stage (more than 2 if high drive is required). This comes out to 10T in core logic and 2T of output or 12T total. Multiplying by the eight units this gives 96T and *eight* high-drive output stages. When you consider that a high-drive output can be 8T or more (I've seen 16 on data bus outputs) the serious area disadvantage of ST's method becomes obvious.

  3. The bus runs to many more places than 8, I later found. This is such a slow and poor implementation. It is literally a "glue logic" design.

  4. Thanks for your explanation. Now it all makes sense.
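
The arithmetic in the second comment can be checked with a short Python sketch.  It builds the 2:1 mux in the standard NAND-NAND form, cascades three stages into an 8:1 mux, and recomputes the transistor totals.  The function names and the constants INV and GATE2 are our own illustrative choices, and the per-gate counts (2T inverter, 4T two-input gate) are the ones quoted in the comment.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on single bits (0 or 1)."""
    return 1 - (a & b)


def mux2_naive(in0: int, in1: int, sel: int) -> int:
    """(in0 & !select) | (in1 & select)."""
    return (in0 & (1 - sel)) | (in1 & sel)


def mux2_nand(in0: int, in1: int, sel: int) -> int:
    """Same function built from three NANDs plus one inverter."""
    return nand(nand(in0, 1 - sel), nand(in1, sel))


def mux8(inputs, sel: int) -> int:
    """8:1 mux as three cascaded stages of 2:1 muxes (4 + 2 + 1 = 7)."""
    s0, s1, s2 = sel & 1, (sel >> 1) & 1, (sel >> 2) & 1
    stage1 = [mux2_nand(inputs[i], inputs[i + 1], s0) for i in range(0, 8, 2)]
    stage2 = [mux2_nand(stage1[i], stage1[i + 1], s1) for i in range(0, 4, 2)]
    return mux2_nand(stage2[0], stage2[1], s2)


# Transistor counts per gate, as quoted in the comment.
INV, GATE2 = 2, 4
MUX2_SHARED_INV = 3 * GATE2  # 12T when select and !select are shared

# One inverter per stage plus 4, 2 and 1 muxes: 50T + 26T + 14T.
mux_tree_total = sum(INV + n * MUX2_SHARED_INV for n in (4, 2, 1))

# Tristate buffer: NOR + NAND + inverter core, plus a 2T output stage.
tristate_each = (2 * GATE2 + INV) + 2
tristate_total = 8 * tristate_each

print(mux_tree_total, tristate_total)
```

The two totals are close, so the area argument rests mainly on the number of high-drive output stages: one for the mux tree versus eight for the tri-stated bus.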