Structurally field-configurable semiconductor array for in-memory processing of stateful, transaction-oriented systems
A semiconductor memory device is provided. The semiconductor memory device includes a plurality of memory cells arranged in multiple column groups, each column group having a plurality of columns and a plurality of external bit-lines for independent multi-way configurable access. Each column group has a first, second, and third level of hierarchy in the external bit-lines. The first level of the hierarchy provides connectivity to the plurality of memory cells. The second level of the hierarchy provides a first splicer for multiplexing data to and from each of the columns in the column group to an intermediate bit-line. The third level of the hierarchy includes a second splicer for multiplexing data to and from multiple external access paths to the intermediate bit-line. A structurally reconfigurable circuit device and methods for designing a circuit are also provided.
This application is a divisional application claiming priority from U.S. patent application Ser. No. 11/426,880, filed on Jun. 27, 2006, which claims priority under 35 U.S.C. §119(e) from U.S. Provisional Patent Application No. 60/694,538, filed Jun. 27, 2005, U.S. Provisional Patent Application No. 60/694,546, filed Jun. 27, 2005, and U.S. Provisional Patent Application No. 60/694,537, filed Jun. 27, 2005, all of which are incorporated by reference in their entirety for all purposes. The present application is related to U.S. application Ser. No. 11/426,887, filed Jun. 27, 2006 entitled APPARATUS FOR PERFORMING COMPUTATIONAL TRANSFORMATIONS AS APPLIED TO IN-MEMORY PROCESSING OF STATEFUL, TRANSACTION ORIENTED SYSTEMS, and U.S. application Ser. No. 11/426,882, filed Jun. 27, 2006 entitled METHOD FOR SPECIFYING STATEFUL, TRANSACTION-ORIENTED SYSTEMS FOR FLEXIBLE MAPPING TO STRUCTURALLY CONFIGURABLE, IN-MEMORY PROCESSING SEMICONDUCTOR DEVICE, each of which are incorporated by reference in their entirety for all purposes.
BACKGROUND
System on a chip (SOC) implementation is predominantly based on design capture at the register-transfer level using design languages such as Verilog and VHDL, followed by logic synthesis of the captured design and placement and routing of the synthesized netlist in physical design. Current efforts to improve design productivity have aimed at design capture at a higher level of abstraction, via more algorithmic/system approaches such as C++, C, SystemC and System Verilog.
As process technology advances, physical design issues such as timing closure and power consumption management have come to dominate the design cycle time as much as design capture and verification. Methodology advances currently in development and under consideration for adoption, which use higher levels of abstraction in design capture, address neither these physical design issues nor manufacturability issues. It is recognized in the semiconductor industry that with process technologies at 90 nm and below, physical design issues will have even more significant cost impacts on design cycle time and product quality.
CAD tools for placement and route of synthesized logic netlists have delivered limited success in addressing the physical design requirements of deep submicron process technologies. To take full advantage of deep submicron process technology, the semiconductor industry needs a design methodology and a supporting tool suite that can improve productivity through the entire design cycle, from design capture and verification through physical design, while guaranteeing product manufacturability at the same time. It is also well-known in the semiconductor industry that SOC implementations of stateful, transaction-oriented applications depend heavily on on-chip memory bandwidth and capacity for performance and power savings. Placement and routing of a large number of memory modules becomes another major bottleneck in SOC physical design.
Another important requirement for an advanced SOC design methodology for deep submicron process technology is to allow integration of on-chip memory with significant bandwidth and capacity without impacting product development schedule or product manufacturability. High level design capture, product manufacturability, and support for significant memory resources are also motivating factors in the development of processor-in-memory. Processor-in-memory architectures are driven by requirements to support advanced software programming concepts such as virtual memory, global memory, dynamic resource allocation, and dynamic load balancing. The hardware and software complexity and costs of these architectures are justified by the requirement to deliver good performance for a wide range of software applications. Due to these overheads, multiple processor-in-memory chips are required in any practical system to meet realistic performance and capacity requirements, as witnessed by the absence of any system product development incorporating a single processor-in-memory chip package.
There is thus an added requirement for cost-effective SOC applications that resource management in processor-in-memory architectures be completely controllable by the designer through program structuring and annotations, and compile-time analysis. It is also important to eliminate all cost and performance overheads in software and hardware complexity attributed to the support of hierarchical memory systems. Based on these observations, there is a need in the semiconductor industry for a cost-effective methodology for implementing SOCs for stateful, transaction-oriented applications.
SUMMARY
Broadly speaking, the present invention fills these needs by providing a method and apparatus for in-memory processing of stateful, transaction-oriented applications. It should be appreciated that the present invention can be implemented in numerous ways, including as a process, an apparatus, a system, or a device. Several inventive embodiments of the present invention are described below.
In one embodiment, a structurally reconfigurable semiconductor circuit device for in-memory processing of stateful, transaction-oriented applications is provided. The circuit device includes a multiple level array of memory storage cells and logic circuits. The storage cells have multiple configurable access paths and are capable of being simultaneously accessed for being read from and written into. The circuit device also includes a plurality of configurable, packetized interface ports capable of receiving data packets. The packetized interface ports have access to the multiple level array. A plurality of configurable commute elements distributed within the multiple level array are included. Each of the plurality of configurable commute elements is configured to move data within the multiple level array of storage cells through one of the multiple configurable access paths. The circuit device also includes a plurality of configurable Compute elements within the multiple level array. Each of the plurality of configurable Compute elements is configured to transform data within a portion of the multiple level array of storage cells via the multiple configurable access paths.
In another embodiment, a semiconductor memory device is provided. The semiconductor memory device includes a plurality of memory cells arranged in multiple column groups, each column group having a plurality of columns and a plurality of external bit-lines for independent multi-way configurable access. Each column group has a first, second, and third level of hierarchy in the external bit-lines. The first level of the hierarchy provides connectivity to the plurality of memory cells. The second level of the hierarchy provides a first splicer for multiplexing data to and from each of the columns in the column group to an intermediate bit-line. The third level of the hierarchy includes a second splicer for multiplexing data to and from multiple external access paths to the intermediate bit-line.
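By way of illustration only, the three-level bit-line hierarchy of a single column group can be modeled in software as shown below. This is a behavioral sketch under stated assumptions, not a description of the circuit: the class name ColumnGroup, its methods, and the use of one intermediate bit-line per group with a single selected column and a single selected access path are all hypothetical simplifications.

```python
class ColumnGroup:
    """Behavioral toy model of one column group (all names are hypothetical)."""

    def __init__(self, num_columns, num_rows, num_paths):
        # Level 1: each column's bit-line connects directly to its memory cells.
        self.cells = [[0] * num_rows for _ in range(num_columns)]
        self.num_paths = num_paths
        # Level 2: the first splicer selects which column drives the intermediate bit-line.
        self.selected_column = 0
        # Level 3: the second splicer selects which external access path reaches it.
        self.selected_path = 0

    def configure(self, column, path):
        """Set both splicers; models configuration, not run-time arbitration."""
        if not 0 <= column < len(self.cells):
            raise ValueError("no such column")
        if not 0 <= path < self.num_paths:
            raise ValueError("no such access path")
        self.selected_column = column
        self.selected_path = path

    def _intermediate_bit_line(self, path):
        # Only the access path chosen by the second splicer is connected.
        if path != self.selected_path:
            raise ValueError("access path is not spliced to the intermediate bit-line")
        return self.cells[self.selected_column]

    def write(self, path, row, bit):
        self._intermediate_bit_line(path)[row] = bit & 1

    def read(self, path, row):
        return self._intermediate_bit_line(path)[row]
```

In this sketch, an access through a path that the second splicer has not selected is rejected, which mirrors the idea that each external access path reaches the cells only through the configured splicers.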
In yet another embodiment, a method for designing a circuit device and a layout in a manner to enhance yield of the circuit device during manufacturing is provided. The method initiates with partitioning a physical design of the circuit device into different hierarchical levels of integration. A pool of redundant features for the different hierarchical levels of integration is provided, wherein the pool of redundant features is apportioned to the different hierarchical levels of integration according to a defect density of each of the levels of integration.
In still another embodiment, a method to enhance soft error robustness of a semiconductor circuit device having a multiple level array of memory storage cells is provided. The method initiates with isolating a read access path coupled to a memory storage cell of the multiple level array of memory storage cells. A charge of the memory storage cell is increased beyond that held by a gate capacitance provided by a gate of the memory storage cell. A diffusion area of a gate region of the memory storage cell is then reduced, thereby reducing the soft error rate (SER) cross section.
In another embodiment, a method for configuring and programming a semiconductor circuit device having a multiple level array of memory storage cells is provided. The method initiates with expressing a stateful, transaction-oriented application as a network of flow virtual machines (FVMs), wherein each of the FVMs is associated with a portion of a configurable memory region. The method includes aggregating multiple FVMs into an aggregate flow virtual machine (AFVM) and mapping the AFVM into a portion of the multiple level array of memory storage cells. Multi-way access paths of the multiple level array are configured according to the multiple FVMs, and the portion of the multiple level array is programmed to function according to the multiple FVMs.
Other aspects and advantages of the invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the invention.
The present invention will be readily understood by the following detailed description in conjunction with the accompanying drawings, and like reference numerals designate like structural elements.
An invention is described for a structurally reconfigurable intelligent memory device for efficient implementation of stateful, transaction-oriented systems in silicon. It will be obvious, however, to one skilled in the art, that the present invention may be practiced without some or all of these specific details. In other instances, well known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The embodiments of the present invention described below provide a method and apparatus enabling flexible design capture methodology which allows a designer to select the granularity at which a stateful, transaction-oriented application is captured. An efficient methodology to implement a stateful, transaction-oriented application on a platform economically superior with respect to design effort, implementation costs and manufacturability is further described below. The embodiments utilize an execution model that allows for efficient compiler optimization and resource allocation, efficient hardware implementation, and accurate performance analysis and prediction when a design is captured and analyzed. It should be appreciated that no significant uncertainty is introduced by design compilation, mapping into the physical platform, or resource conflicts during system operation. The resource requirements are specified explicitly when the design is captured, using annotations or compiler analysis. Allocation of hardware resources can be determined statically at compile time.
In another aspect of the invention, a simple and effective chip architecture is provided. The architecture uses a single-level real memory organization to eliminate the costs of managing a caching hierarchy associated with virtual memory systems in applications development, compiler optimization, run-time system support, and hardware complexity. As will be explained in more detail below, the embodiments described herein meet the tremendous demands for memory capacity and bandwidth in future generation SOCs with solutions that are economical in die area, product development cycle and power consumption. At the same time, the embodiments reap the cost, performance and power consumption benefits of advanced deep submicron fabrication processes with exceedingly high manufacturability and reliability.
Still referring to
One skilled in the art will appreciate from
The FlowLogic architecture allows flexible design space exploration of performance and quantitative behavior, followed by flexible mapping of the results into the structurally field-configurable semiconductor device. The parameters related to Arcs 108, among others, are determined interactively during system simulations using FlowLogic. It may be noted that the performance behavior of such systems will only be as good as the traffic pattern assumptions made in the simulation. In one embodiment, FlowGates referred to as DynamicFlowGates can be dynamically loaded and linked at run-time. In one embodiment, DynamicFlowGates are limited to serving the purposes of run-time system diagnostics and debug. Thus, an overview of the FlowLogic system and language has been provided above and further details are provided with reference to the Figures referenced below.
It should be noted that the sizes of the logical memory partitions in an FVM are arbitrary and the partitions have physically independent access paths. The code related to FlowGates and FlowMethods is compiled into relocatable machine code which in-turn determines the logical size of the corresponding FVM CodeMemory. The FlowGateIndex contains a jump table indexed on unique FlowGate identifier along with the pointer to the FlowGate code, among other context data for proper FlowGate execution. The StackMemory is used for storing intermediate states as required during the FlowGate execution. There are no register files in the FVM. The working of the FVM is analogous to that of a stack machine. The Stack is always empty before a FlowGate starts since the FlowGate by itself does not have a persistent state, and the FlowGate is not allowed to suspend.
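For illustration, the dispatch behavior described above can be approximated in software as follows. This is a minimal sketch under stated assumptions: the class FlowVirtualMachine, the dictionary used as the FlowGateIndex jump table, the state key "total", and the example FlowGate accumulate are hypothetical names chosen for the example and do not appear in the FlowLogic language itself.

```python
class FlowVirtualMachine:
    """Toy stack-machine model of an FVM (hypothetical names throughout)."""

    def __init__(self, stack_depth):
        self.flowgate_index = {}   # jump table: FlowGate identifier -> entry point
        self.state_memory = {}     # persistent state lives here, not on the stack
        self.stack = []            # StackMemory: intermediate values only
        self.stack_depth = stack_depth  # bound assumed to be fixed at compile time

    def register(self, gate_id, entry_point):
        self.flowgate_index[gate_id] = entry_point

    def push(self, value):
        if len(self.stack) >= self.stack_depth:
            raise OverflowError("stack deeper than the compile-time bound")
        self.stack.append(value)

    def pop(self):
        return self.stack.pop()

    def dispatch(self, gate_id, signal):
        # A FlowGate has no persistent state of its own and cannot suspend,
        # so the stack is required to be empty when it starts.
        if self.stack:
            raise RuntimeError("stack must be empty before a FlowGate starts")
        self.flowgate_index[gate_id](self, signal)  # run to completion
        self.stack.clear()  # temporaries do not outlive the FlowGate call


def accumulate(fvm, signal):
    """Example FlowGate: add the incoming signal value to a running total."""
    fvm.push(fvm.state_memory.get("total", 0))
    fvm.push(signal)
    fvm.state_memory["total"] = fvm.pop() + fvm.pop()
```

Registering accumulate under an identifier and dispatching two Signals to it leaves the sum in the state memory and the stack empty, which reflects the run-to-completion behavior described for FlowGates.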
The size or the depth of the Stack is determined at compile-time by the FlowLogic compiler. As may be evident, the FlowLogic programming style does not support nested calls and recursive function calls whose depths are not predictable at compile-time. Furthermore, there is no dynamic allocation or garbage collection in FlowLogic because memory resource allocations are fixed at compile-time. Other than temporary variables whose lifetimes span the FlowGate call, State variables are all pre-allocated at compile-time. The size of the StateMemory 126 for an FVM is known at compile-time. OutputBuffer 128 and ChannelMemory 130 are managed by the run-time system and are visible to the system designer only via annotation in one embodiment. OutputBuffer 128 is a small memory area for temporarily staging outgoing Signals. ChannelMemory 130, on the other hand, hosts the Channels and is as large as is required by the corresponding FVM. It is useful to point out at this time that although these memories have different access data paths, the memories all use the same resource types in the structurally configurable in-memory processing array. In fact, memories are the only resources directly allocated in the array, with other necessary logic, including processing elements, being fixed to such memory resources.
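One way a compiler could bound the Stack depth is to walk the call graph of a FlowGate and reject any recursion. The sketch below illustrates that idea only; it is not the FlowLogic compiler's algorithm. The function max_stack_depth, the per-function frame sizes, and the call graph are hypothetical inputs invented for the example.

```python
def max_stack_depth(frames, calls, entry):
    """Return the worst-case stack depth reachable from `entry`.

    frames: mapping of function name -> stack words used by that function
    calls:  mapping of function name -> names of the functions it calls
    Raises ValueError if a recursive call chain is found, because its depth
    could not be predicted at compile time.
    """
    on_path = set()   # functions on the current call chain
    memo = {}         # already-computed worst-case depths

    def depth(fn):
        if fn in on_path:
            raise ValueError(f"recursive call through {fn!r}")
        if fn in memo:
            return memo[fn]
        on_path.add(fn)
        deepest_callee = max((depth(c) for c in calls.get(fn, ())), default=0)
        on_path.remove(fn)
        memo[fn] = frames[fn] + deepest_callee
        return memo[fn]

    return depth(entry)
```

With frame sizes of 4, 6, 10 and 2 words for functions gate, parse, hash and log, where gate calls parse and log and parse calls hash, the deepest chain is gate, parse, hash, so the partition would need 20 words under these assumed numbers.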
The FlowLogicMachine can itself be thought of as an array of structurally configurable memory units that implements a plurality of FlowTiles, where the computational logic is fixed and distributed. As a further analogy, the FlowLogic language described herein may be thought of as the JAVA language, while the FlowLogicMachine may be analogized to the JAVA Virtual machine, since the FlowLogic Language has some attributes of object oriented programming languages. One skilled in the art should appreciate that most of the resources in question are memory units in one form or another, i.e., code, state, stack, channels, and buffer. Motivated by this observation, the FlowLogicMachine is designed to provide the ability to configure these memory units, also referred to as memory resources, as required by a particular application, and the FlowLogic representation allows a system description to be re-cast in flexible ways to achieve the targeted capacity, performance, and functionality.
In one embodiment of the present invention, Compute element 150 of
Program Read 152: The control code is read from the stored program control memory. The instruction size is fixed to be 32 bits wide in one embodiment. There is an instance of a program counter for each one of the virtual processors and some portion of the instruction is used to identify the operands and operand selection mode. The instruction is aligned and operand addresses are generated. Compute element 150 depends extensively on context pointers for generating relative addresses. The address offsets in the instruction itself have a dynamic range in accordance with the size of the bit field each occupies.
Decode Read 160: This is the microprogram read that decodes the program code into control states for controlling the computation operation. In a sense, the architecture of Compute element 150 defies the principles of Reduced Instruction Set Computer (RISC) design by resorting back to microprogram-based control. The microprogram decode is programmable, in that certain instances of Compute element 150 may have application-dependent optimizations of the microprogram control store.
Extension Read 162: This field is used to customize a control instruction in the second step or pipeline stage. In particular, extension read 162 generates operand masks as required for bit manipulations, in the context of transaction-oriented processing. Typically, the extension reads are templates that are generated on an application specific basis and are referenced by the primary instructions.
Operand A&B Read 166 and 168, respectively: The two operands are read from the addresses generated by the address generator 158.
Look Up[0-3] 172a-d: There are four optional lookup memory partitions that enable a special class of instructions called the “Memory Extensible Instructions.” These instructions are application dependent and hence the look up memories can be optionally configured for each Compute element 150. These instructions accelerate algorithms such as encryption, authentication, hashing, cyclic redundancy checks and multiplication among others, used in transaction-oriented applications. The operands are used to generate the addresses for the four lookup partitions and the resulting four lookup outputs, up to 128 bits each, are combined together in ALU 170 to generate the output.
Result Write 182: The resulting output from ALU 170 is then written into the corresponding memory partition via the access path of result write 182.
ALU 170: Often, the result of ALU 170 is used to update an internal register or the next control state of the program. It should be noted that there is no branch prediction or any other form of instruction-level-parallelism enhancement technique. The architecture of Compute element 150 once again defies the premise of RISC. Compute element 150 does have several complex instructions operating on anywhere from 32 to 128 bit data paths, which are optimized for stateful, transaction-oriented applications. ALU 170 is a three stage pipelined unit in one embodiment. As shown in
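As an illustration of the Look Up stage described above, a cyclic redundancy check can be computed by indexing four lookup tables with the four bytes of an operand and combining the four outputs. The sketch below uses the well-known table-driven "slicing-by-4" form of the reflected CRC-32; the choice of this particular CRC, the 32-bit table width, the XOR combining step, and the names POLY, LOOKUP and crc32_four_lookups are assumptions made for the example, since the description does not specify how the Memory Extensible Instructions are encoded.

```python
POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed; no specific CRC is named)


def build_lookup_partitions():
    """Build four 256-entry tables, standing in for Look Up[0-3]."""
    base = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ (POLY if crc & 1 else 0)
        base.append(crc)
    tables = [base]
    for _ in range(3):
        prev = tables[-1]
        # Each further table advances the previous one by one more byte.
        tables.append([(v >> 8) ^ base[v & 0xFF] for v in prev])
    return tables


LOOKUP = build_lookup_partitions()


def crc32_four_lookups(data: bytes) -> int:
    """CRC-32 of `data`, consuming four bytes per step via four lookups."""
    crc = 0xFFFFFFFF
    full = len(data) - len(data) % 4
    for i in range(0, full, 4):
        word = int.from_bytes(data[i:i + 4], "little")
        x = crc ^ word  # the operand bytes form the four table addresses
        # The four lookup outputs are combined by XOR, as an ALU step would do.
        crc = (LOOKUP[3][x & 0xFF]
               ^ LOOKUP[2][(x >> 8) & 0xFF]
               ^ LOOKUP[1][(x >> 16) & 0xFF]
               ^ LOOKUP[0][(x >> 24) & 0xFF])
    for byte in data[full:]:
        # Remaining bytes fall back to one lookup per byte.
        crc = LOOKUP[0][(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

For the standard check string b"123456789" this routine yields 0xCBF43926, the published check value for CRC-32. The point of the sketch is that the per-step work reduces to four independent memory reads and one combining operation, which is the pattern the four lookup partitions are described as supporting.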
The embodiments of the present invention are particularly designed to address noise-induced errors and soft-errors plaguing deep submicron semiconductor memory technologies. Noise sources include crosstalk and coupling. In a reasonably designed system, soft-errors are rare, but inevitable. Compute element 150 detects single-bit errors on all the read access paths 152 and 178, but does not expend combinational logic in correcting the error. Instead, Compute element 150 is designed to go into an exception mode, where the error is corrected programmatically at the cost of compute cycles. In a sense, this is similar to a hardware interrupt. In fact, Compute element 150 does not have any other forms or use of interrupts in the embodiments described herein. In the embodiment described herein, data is organized in bytes, each with its own parity bit, enabling error detection at the byte level. Furthermore, a block of 16 bytes, including the parity bits, is protected by a 9-bit syndrome, enabling single-bit error correction at the block level.
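The detect-in-hardware, correct-in-software split described above can be sketched as follows. The description does not give the construction of the 9-bit syndrome, so the code below substitutes a simple assumed scheme: 8 bits hold the XOR of the 1-based positions of all set bits in the 144-bit block (16 data bytes plus 16 parity bits), and the ninth bit is a parity bit over those 8 bits. All function names are hypothetical.

```python
def byte_parity(value):
    """Even-parity bit of an integer (1 if the number of set bits is odd)."""
    return bin(value).count("1") & 1


def block_bits(data, parities):
    """Flatten 16 bytes and their parity bits into a list of 144 bits."""
    bits = []
    for byte, parity in zip(data, parities):
        bits.extend((byte >> i) & 1 for i in range(8))
        bits.append(parity)
    return bits


def position_check(bits):
    """XOR of the 1-based positions of all set bits (fits in 8 bits for 144 bits)."""
    check = 0
    for pos, bit in enumerate(bits, start=1):
        if bit:
            check ^= pos
    return check


def encode(data):
    """Return (per-byte parity bits, 9-bit syndrome) for a 16-byte block."""
    parities = [byte_parity(b) for b in data]
    check = position_check(block_bits(data, parities))
    return parities, (byte_parity(check) << 8) | check


def read_block(data, parities, syndrome):
    """Return (data, corrected_flag); corrects one flipped bit in the block."""
    # Fast path: byte parity only detects an error, it does not correct it.
    if all(byte_parity(b) == p for b, p in zip(data, parities)):
        return bytes(data), False
    # Exception path: locate the flipped bit programmatically.
    stored = syndrome & 0xFF
    if byte_parity(stored) != syndrome >> 8:
        raise ValueError("stored syndrome is itself corrupted")
    pos = position_check(block_bits(data, parities)) ^ stored
    if not 1 <= pos <= 144:
        raise ValueError("error is not a single-bit flip")
    byte_index, bit_index = divmod(pos - 1, 9)
    fixed = bytearray(data)
    if bit_index < 8:  # bit_index 8 means the parity bit flipped, data is intact
        fixed[byte_index] ^= 1 << bit_index
    return bytes(fixed), True
```

The fast path costs only a parity comparison per byte, while the slower position search runs only after a mismatch, which corresponds to the exception mode in which correction is paid for in compute cycles rather than in combinational logic.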
Built-In Silicon Test & Repair: A substantial part of the semiconductor device includes the configurable memory pools. Since other elements do not occupy a significant percentage of the device die, they are designed robustly with respect to potential manufacturing defects. Memory pool 190 is tested in-silicon and configured to eliminate defective portions of the memory units in an application independent fashion. In one embodiment, much of this operation is performed programmatically on power up. The memory pools are then configured appropriately to suit a given application. The memory pool also provides independent read accesses to a resident Commute element and independent write accesses to neighboring Commute elements. As mentioned above, Commute elements 136 of
LookupMemory 212: This partition of memory is optional and can use 1, 2 or 4 of the access paths shown earlier. The contents of LookupMemory 212 are programmed typically at configuration time as required by the application. Portions of lookup memory 212 can also be metal-programmed during manufacturing.
StackMemory (Copy 0 and 1) 214a and 214b, respectively: The execution model of the Compute element can be analogized to a stack machine. The Compute element does not have the notion of register files or virtual memories. All the required memory is pre-allocated at the compile or personalization time in one embodiment. StackMemory 214a and 214b serve as temporary storage of run-time state of the FlowGate. FlowGates in FlowLogic are so specified that the maximum Stack size required for an application can be determined at compile time. The partition is made sufficiently large to house the deepest stack as determined by the compiler. FlowLogic does not support the notion of recursive function calls to ensure that the Stack does not overflow in one embodiment.
There is an optional second copy of the StackMemory which is a mirror image of the original copy. This arrangement is used in one embodiment to make the contents of the Stack available as either operand to the ALU. The two copies of the StackMemory, however, are written into simultaneously. The compiler in some cases may choose not to use the second copy. Often, however, StackMemories are substantially smaller than other partitions, while the variables stored in the Stack tend to be used frequently. It should be appreciated that the StackMemory is the replacement for register files in traditional RISC processors.
CodeMemory 218: Much of the program code relates to FlowGates, which are relocatable and contextual. The CodeMemory partition can be configured to any arbitrary size like other partitions. Multiple virtual processors can share some of the code as required. Portions of CodeMemory 218, especially those relating to power-on repair, can be metal-programmed at the time of device manufacture. The rest of the application dependent code is typically programmed at configuration time in one embodiment. In some special cases, such as exception handling, this partition can also be programmed at run-time in an exception specific way.
ExtensionMemory 220: This is a much smaller optional partition that is used to customize instances of instruction, typically providing masks and alignments and other control/data parameters to the ALU.
StateMemory 222: This is a memory partition where the FlowModule states are stored and operated upon. All the allocations into state memory 222 are made at the compile time. As mentioned previously, there is no dynamic heap storage allocation or garbage collection in FlowLogic.
Output Buffer 224: This is a relatively small partition to which the Compute element writes but from which it does not read. The Commute element typically reads out from this partition.
Channel Memory 226: This is typically a flexible partition which may be very small in some cases and large in others. Signals are deposited into this partition by the Commute element. The Compute element only has read access to this partition.
The concept of memory extensible instruction is disclosed by the embodiments described herein.
In one embodiment of the invention, the in-memory processing die is constructed in a scalable fashion by tiling FlowTiles along two dimensions as shown in
The in-memory processing device is realized using a system-in-package (SIP) device wherein one or more instances of an in-memory processing array die are interfaced with one or more smaller companion dies 255 on substrate 251 as illustrated in
Still referring to
The memory bit-cells, pages and FlowTiles are designed specifically to enhance semiconductor design yield against defects. The area dominant circuitry is the bit-cell which is designed using aggressive design pitches. To improve yield at this level, each page comprises redundant rows, which can be used to replace rows with defective bit-cells. At the next level of integration relaxed geometries are used to minimize the likelihood of point defects. There are also redundant pages within a FlowTile to compensate for pages that may be defective in spite of bit-cell repairability. The external bit-line and per-page logic is over-designed to be robust against point defects. It should be noted that the embodiments described herein partition a physical design of a circuit device into different hierarchical levels of integration. The different levels include lower levels where a defect density is relatively high as compared to the higher levels of integration. For example, at the bit cell level, the defect density is relatively high and thus, the embodiments described herein would provide for higher redundancy since it is preferable to keep feature sizes at a minimum in this level of integration. However, at the level of page integration, the redundancy may be relaxed and defect resilient techniques may be incorporated. In one embodiment, the defect resilient techniques may include using coarse features and spacing features farther apart to reduce the redundancy requirements. One skilled in the art will appreciate that tools currently available do not possess knowledge of circuit levels and solely focus on a minimum spacing criterion between features. The embodiments herein define a set of design criteria that is hierarchical and at each level it is determined which rules apply.
That is, at some levels redundancy may be desirable, while at other levels resiliency may be preferable, where resiliency refers to using coarser feature sizes and spacing features farther apart to reduce the need for redundant features at that level. Thus, the embodiments described herein base the decision of redundancy versus resiliency on the level of integration rather than solely on the decision to use minimum spacing at all levels.
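The apportioning of a pool of redundant features according to defect density can be illustrated with a simple proportional split. The sketch below is one possible reading, not the method of the embodiments: the function name apportion_spares, the use of integer relative defect weights, and the largest-remainder rounding rule are assumptions made for the example, and the level names and numbers are invented.

```python
def apportion_spares(pool, defect_weight):
    """Split `pool` redundant features across levels in proportion to defect weight.

    defect_weight: mapping of level name -> non-negative integer weight that
    stands in for the relative defect density at that level of integration.
    """
    total = sum(defect_weight.values())
    if total <= 0:
        raise ValueError("at least one level needs a positive defect weight")
    # Integer arithmetic keeps the split exact.
    shares = {lvl: pool * w // total for lvl, w in defect_weight.items()}
    remainders = {lvl: pool * w % total for lvl, w in defect_weight.items()}
    # Hand out the features lost to rounding, largest remainder first.
    leftover = pool - sum(shares.values())
    for lvl in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        shares[lvl] += 1
    return shares
```

With assumed weights of 14, 5 and 1 for the bit-cell, page and FlowTile levels, a pool of 100 spare features splits 70, 25 and 5. Applying defect resilient techniques at a higher level would correspond to lowering that level's weight, which shifts spares toward the denser bit-cell level.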
The memory refresh is performed separately for different portions of the in-memory processing array by temporarily stalling the usage of the corresponding access ports. Some memories may not be refreshed if they are either not used or if they belong to a partition which is known to have a short mean time between re-writes. In such instances, however, the application monitors the time between re-writes to ensure that the bits do not decay.
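The refresh policy described above can be expressed as a small decision rule. The following is a sketch under stated assumptions: the Partition record, its field names, the single retention-time parameter, and the numeric values in the usage note are hypothetical, and the comparison of a mean re-write interval against the retention time is a simplification of whatever criterion an implementation would actually use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Partition:
    name: str
    in_use: bool
    # Mean time between re-writes in microseconds, or None if unknown.
    mean_rewrite_interval_us: Optional[float] = None


def needs_refresh(partition, retention_us):
    """Decide whether the refresh logic should service this partition."""
    if not partition.in_use:
        return False  # unused memory holds nothing worth preserving
    interval = partition.mean_rewrite_interval_us
    if interval is not None and interval < retention_us:
        return False  # re-writes are expected to arrive before the bits decay
    return True


def rewrite_deadline_missed(last_write_us, now_us, retention_us):
    """Monitor for partitions that skip refresh: has a re-write come too late?"""
    return now_us - last_write_us >= retention_us
```

For example, with an assumed retention time of 1000 microseconds, a channel partition re-written every 50 microseconds on average would skip refresh, but the monitor would flag it if 1000 microseconds or more passed without a re-write.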
Still referring to
The embodiments described above provide a memory centric approach for a processing system design and architecture, as well as the FlowLogic language for designing, synthesizing, and placing and routing techniques for this unique processing system design. Terms of the FlowLogic language have been analogized to some object oriented terms for ease of understanding. For example, a FlowGate may be thought of as a Function, Procedure or Task, while a FlowModule may be analogized to an object in object oriented programming. A Signal may be referred to as a message or a packet. It should be appreciated that while these analogies are used for explanatory purposes, there are significant differences between the embodiments described herein and the corresponding analogies.
Traditional processors incorporate the notion of virtual memories to push physical memory away from the processing core. To do so, they introduce accumulators, registers and caching hierarchies. The embodiments described above embrace the incorporation of processing core(s) directly within the physical memory. Furthermore, the data paths in the above-described embodiments are significantly different than the data paths within the traditional processor architecture.
The invention has been described herein in terms of several exemplary embodiments. Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention. The embodiments and preferred features described above should be considered exemplary, with the invention being defined by the appended claims.
With the above embodiments in mind, it should be understood that the invention may employ various computer-implemented operations involving data stored in computer systems. These operations are those requiring physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms, such as producing, identifying, determining, or comparing.
Any of the operations described herein that form part of the invention are useful machine operations. The invention also relates to a device or an apparatus for performing these operations. The apparatus may be specially constructed for the required purposes, or it may be a general purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general purpose machines may be used with computer programs written in accordance with the teachings herein, or it may be more convenient to construct a more specialized apparatus to perform the required operations.
The invention can also be embodied as computer readable code on a computer readable medium. The computer readable medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the computer readable medium include hard drives, network attached storage (NAS), read-only memory, random-access memory, CD-ROMs, CD-Rs, CD-RWs, magnetic tapes, electromagnetic wave carriers, and other optical and non-optical data storage devices. The computer readable medium can also be distributed over network-coupled computer systems so that the computer readable code is stored and executed in a distributed fashion.
Although the foregoing invention has been described in some detail for purposes of clarity of understanding, it will be apparent that certain changes and modifications may be practiced within the scope of the appended claims. Accordingly, the present embodiments are to be considered as illustrative and not restrictive, and the invention is not to be limited to the details given herein, but may be modified within the scope and equivalents of the appended claims. In the claims, elements and/or steps do not imply any particular order of operation, unless explicitly stated in the claims.
Claims
1. A semiconductor memory device comprising:
- a plurality of memory cells arranged in multiple column groups, each column group having, a plurality of columns, a plurality of external bit-lines for independent multi-way configurable access, a first, second and third level of hierarchy in the external bit-lines, wherein the first level of the hierarchy provides connectivity to the plurality of memory cells, the second level of the hierarchy provides a first splicer for multiplexing data to and from each of the columns in the column group to an intermediate bit-line, and the third level of the hierarchy including a second splicer for multiplexing data to and from multiple external access paths to the intermediate bit-line.
2. The device of claim 1 wherein each of the memory cells includes,
- a metal-insulator-metal capacitor connected to a gate of an isolation transistor;
- a transistor for writing connected to an input bit line; and
- a transistor for reading connected to an output bit line.
3. The memory cell in claim 1, wherein the plurality of memory cells are configured for metal programming to one of a hard logic zero or a one at time of manufacture.
4. The memory cell of claim 1, wherein the memory cell is included in a multi-chip module package.
5. A method for designing a circuit device and a layout in a manner to enhance yield of the circuit device during manufacturing, comprising:
- partitioning a physical design of the circuit device into different hierarchical levels of integration; and
- providing a pool of redundant features for the different hierarchical levels of integration, wherein the pool of redundant features is apportioned to the different hierarchical levels of integration according to a defect density of each of the levels of integration.
6. The method of claim 5, wherein providing a pool of redundant features includes,
- associating a greater amount of redundant features to lower hierarchical levels of integration; and
- applying defect resilient techniques to higher hierarchical levels of integration in order to reduce an amount of redundant features associated with the higher hierarchical levels of integration.
7. The method of claim 6 wherein the defect resilient techniques include spacing features further apart.
8. The method of claim 5, wherein the circuit device includes a multiple level array of memory storage cells.
9. The method of claim 6 wherein the lower hierarchical levels of integration include a transistor level of integration and the higher levels of integration include a page level of integration.
10. A method to enhance soft error robustness of a semiconductor circuit device having a multiple level array of memory storage cells, comprising:
- isolating a read access path coupled to a memory storage cell of the multiple level array of memory storage cells;
- increasing a charge of the memory storage cell that is in addition to a gate capacitance provided by a gate of the memory storage cell; and
- reducing a diffusion area of a gate region of the memory storage cell, thereby reducing a soft error rate (SER) cross section.
11. The method of claim 10, further comprising:
- performing single bit soft error detection and correction.
12. A method for configuring and programming a semiconductor circuit device having a multiple level array of memory storage cells, comprising:
- expressing a stateful transaction oriented application as a network of FlowVirtualMachines (FVMs), each of the FVMs associated with a portion of a configurable memory region;
- aggregating multiple FVMs into an AggregateFlowVirtualMachine (AFVM);
- mapping the AFVM into a portion of the multiple level array of memory storage cells;
- configuring multi-way access paths of the multiple level array according to the multiple FVMs; and
- programming the portion of the multiple level array to function according to the multiple FVMs.
13. The method of claim 12, wherein multiple AFVMs are mapped into a FlowTile.
14. The method of claim 13, wherein a FlowTile is a physical entity that has a number of memory resource units and wherein a sum of resources required by the multiple AFVMs is less than the number of memory resources.
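The aggregation and mapping recited in claims 12 to 14 can be illustrated with a minimal sketch, which is not part of the claims or the specification. It assumes that each FVM requires a whole number of memory resource units and that an AFVM requires the sum of the units of its FVMs. The class names FVM, AFVM, and FlowTile, the capacity field, and the try_map method are hypothetical and chosen only for illustration. The strict "less than" comparison follows the wording of claim 14.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FVM:
    """Hypothetical FlowVirtualMachine: a name plus the memory units it needs."""
    name: str
    memory_units: int


@dataclass
class AFVM:
    """Hypothetical AggregateFlowVirtualMachine: a group of FVMs (claim 12)."""
    name: str
    fvms: List[FVM]

    @property
    def memory_units(self) -> int:
        # Assumption: an AFVM needs the sum of the units of its FVMs.
        return sum(f.memory_units for f in self.fvms)


@dataclass
class FlowTile:
    """Hypothetical FlowTile: a physical entity with a fixed number of units."""
    capacity: int
    afvms: List[AFVM] = field(default_factory=list)

    @property
    def used(self) -> int:
        # Total units required by the AFVMs already mapped to this tile.
        return sum(a.memory_units for a in self.afvms)

    def try_map(self, afvm: AFVM) -> bool:
        # Accept the AFVM only if the total stays strictly below capacity,
        # mirroring the "less than" condition of claim 14.
        if self.used + afvm.memory_units < self.capacity:
            self.afvms.append(afvm)
            return True
        return False


if __name__ == "__main__":
    tile = FlowTile(capacity=10)
    first = AFVM("first", [FVM("parse", 3), FVM("classify", 2)])
    second = AFVM("second", [FVM("forward", 4)])
    third = AFVM("third", [FVM("log", 1)])
    for afvm in (first, second, third):
        print(afvm.name, "mapped" if tile.try_map(afvm) else "rejected")
```

In this sketch an AFVM that would bring the total exactly to the capacity of the FlowTile is rejected. A design that allows full occupancy would use a "less than or equal" comparison instead, which is a choice the claims do not settle here.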
Type: Application
Filed: Sep 17, 2009
Publication Date: Jan 14, 2010
Inventors: Shridhar Mukund (San Jose, CA), Anjan Mitra (Santa Clara, CA)
Application Number: 12/561,460
International Classification: G11C 7/10 (20060101); G06F 17/50 (20060101); H03K 19/003 (20060101); H03K 19/173 (20060101)