Integrated Circuit Device with Separate Die for Programmable Fabric and Programmable Fabric Support Circuitry
An integrated circuit device having separate dies for programmable logic fabric and circuitry to operate the programmable logic fabric is provided. A first integrated circuit die may include field programmable gate array fabric. A second integrated circuit die may be coupled to the first integrated circuit die. The second integrated circuit die may include fabric support circuitry that operates the field programmable gate array fabric of the first integrated circuit die.
This application is a Divisional of U.S. Non-Provisional application Ser. No. 15/855,419, filed Dec. 27, 2017, entitled “Integrated Circuit Device with Separate Die for Programmable Fabric and Programmable Fabric Support Circuitry”, which is hereby incorporated by reference in its entirety for all purposes.
BACKGROUND
This disclosure relates to an integrated circuit that includes a first die containing programmable logic fabric and a second die containing support circuitry for operating the programmable logic fabric.
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it may be understood that these statements are to be read in this light, and not as admissions of prior art.
Programmable logic devices are a class of integrated circuits that can be programmed to perform a wide variety of operations. A programmable logic device may include programmable logic elements programmed by a form of memory known as configuration random access memory (CRAM). Thus, to program a circuit design into a programmable logic device, the circuit design may be compiled into a bitstream and programmed into CRAM cells. The values programmed into the CRAM cells define the operation of programmable logic elements of the programmable logic device.
The highly flexible nature of programmable logic devices makes them an excellent fit for accelerating many computing tasks. Thus, programmable logic devices are increasingly used as accelerators for machine learning, video processing, voice recognition, image recognition, and many other highly specialized tasks, particularly those that would be too slow or inefficient in software running on a processor. Moreover, bitstreams that define a particular accelerator function may be programmed into a programmable logic device as requested, in a process known as partial reconfiguration. Even this, however, takes some amount of time to perform. Although partial reconfiguration may take place very quickly, on the order of milliseconds, some tasks may call for even quicker calculations, on the order of microseconds or faster.
Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It may be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it may be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
When introducing elements of various embodiments of the present disclosure, the articles “a,” “an,” and “the” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. Additionally, it should be understood that references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Furthermore, the phrase A “based on” B is intended to mean that A is at least partially based on B. Moreover, unless expressly stated otherwise, the term “or” is intended to be inclusive (e.g., logical OR) and not exclusive (e.g., logical XOR). In other words, the phrase A “or” B is intended to mean A, B, or both A and B.
The highly flexible nature of programmable logic devices makes them an excellent fit for accelerating many computing tasks. Thus, programmable logic devices are increasingly used as accelerators for machine learning, video processing, voice recognition, image recognition, and many other highly specialized tasks, particularly those that would be too slow or inefficient in software running on a processor. Moreover, bitstreams that define a particular accelerator function may be programmed into a programmable logic device as requested, in a process known as partial reconfiguration.
To increase the speed at which configuration, including partial reconfiguration, can occur on a programmable logic device, as well as to better control power consumption and reduce manufacturing costs, among other things, this disclosure describes systems and methods that employ a programmable logic device composed of at least two separate die. The programmable logic device may include a first die that contains primarily programmable logic fabric, and a second die that contains fabric support circuitry to support the operation of the programmable logic fabric. Indeed, the second die may contain at least some fabric support circuitry that may operate the programmable logic fabric (e.g., the fabric support circuitry of the second die may be essential to the operation of the programmable logic fabric of the first die). Thus, the fabric support circuitry may include, among other things, a device controller (sometimes referred to as a secure device manager (SDM)), a sector controller (sometimes referred to as a local sector manager (LSM)), a network-on-chip (NOC), a configuration network on chip (CNOC), data routing circuitry, local (e.g., sectorized or sector-aligned) memory used to store and/or cache configuration programs (bitstreams) or data, memory controllers used to program the programmable logic fabric, input/output (I/O) interfaces or modules for the programmable logic fabric, external memory interfaces (e.g., for a high bandwidth memory (HBM) device), an embedded processor (e.g., an embedded Intel® Xeon® processor by Intel Corporation of Santa Clara, Calif.) or an interface to connect to a processor (e.g., an interface to an Intel® Xeon® processor by Intel Corporation of Santa Clara, Calif.), voltage control circuitry, thermal monitoring circuitry, decoupling capacitors, power clamps, or electrostatic discharge circuitry, to name just a few circuit elements that may be present on the second die.
Indeed, in some embodiments, the first die may entirely or almost entirely contain programmable logic fabric, and the second die may contain all or almost all of the fabric support circuitry that controls the programmable logic fabric.
By separating at least some of the programmable logic fabric and at least some of the fabric support circuitry, the programmable logic fabric may be programmed or operated more quickly or efficiently. Indeed, the first die that contains the programmable logic fabric may not contain as much fabric support circuitry as a single die that would contain both the programmable logic fabric and the fabric support circuitry. This may allow the first die to be more dense with programmable logic fabric. Moreover, in some cases, the first die and the second die may be vertically stacked and connected to one another via an efficient connection, such as via microbumps, which may allow a parallel connection between the programmable logic fabric and the fabric support circuitry, further increasing a speed of configuring and/or operating the programmable logic fabric. In addition, in some cases, a configuration program (e.g., bitstream) may be cached into sector-aligned memory in the fabric support circuitry of the second die. This may allow for rapid partial reconfiguration by configuring the programmable logic fabric using a cached configuration. Data may also be cached or stored in greater amounts for use by the programmable logic fabric.
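By way of illustration only, the caching behavior described above may be sketched in software. The following Python model is a minimal sketch under stated assumptions: the names SectorAlignedMemory, FabricSector, reconfigure, and fetch_remote are hypothetical and do not appear in this disclosure, and actual fabric support circuitry would be implemented in hardware rather than in software.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SectorAlignedMemory:
    """Hypothetical model of base-die memory aligned with one fabric sector."""
    cached: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class FabricSector:
    """Hypothetical model of one fabric sector and its configuration memory."""
    cram: bytes = b""


def reconfigure(sector: FabricSector,
                memory: SectorAlignedMemory,
                name: str,
                fetch_remote: Callable[[str], bytes]) -> str:
    """Program the sector, preferring a cached bitstream over a remote fetch.

    Returns "cache" or "remote" to report where the bitstream came from.
    """
    if name in memory.cached:
        # Fast path: the bitstream already sits in sector-aligned memory.
        sector.cram = memory.cached[name]
        return "cache"
    # Slow path: fetch from outside the device, then keep a copy for next time.
    bitstream = fetch_remote(name)
    memory.cached[name] = bitstream
    sector.cram = bitstream
    return "remote"
```

In this sketch, the first request for a given configuration takes the slower remote path, while later requests for the same configuration may be served from the cached copy.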
With this in mind,
The programmable logic device 12 may represent any integrated circuit device that includes a programmable logic device with two separate integrated circuit die where at least some of the programmable logic fabric is separated from at least some of the fabric support circuitry that operates the programmable logic fabric. One example of the programmable logic device 12 is shown in
In combination, the fabric die 22 and base die 24 may operate as a programmable logic device such as a field programmable gate array (FPGA). For example, the fabric die 22 and the base die 24 may operate in combination as an FPGA 40, shown in
In the example of
There may be any suitable number of programmable logic sectors 48 on the FPGA 40. Indeed, while 29 programmable logic sectors 48 are shown here, it should be appreciated that more or fewer may appear in an actual implementation (e.g., in some cases, on the order of 50, 100, or 1000 sectors or more). Each programmable logic sector 48 may include a sector controller (SC) 58 that controls the operation of the programmable logic sector 48. Each sector controller 58 may be in communication with a device controller (DC) 60. Each sector controller 58 may accept commands and data from the device controller 60, and may read data from and write data into its configuration memory 52 based on control signals from the device controller 60. In addition to these operations, the sector controller 58 and/or device controller 60 may be augmented with numerous additional capabilities. Such capabilities may include coordinating memory transactions between local fabric memory (e.g., local fabric memory or CRAM being used for data storage) and sector-aligned memory associated with that particular programmable logic sector 48, decrypting configuration programs (bitstreams) 18, and locally sequencing reads and writes to implement error detection and correction on the configuration memory 52 and sequencing test control signals to effect various test modes.
The sector controllers 58 and the device controller 60 may be implemented as state machines and/or processors. For example, each operation of the sector controllers 58 or the device controller 60 may be implemented as a separate routine in a memory containing a control program. This control program memory may be fixed in a read-only memory (ROM) or stored in a writable memory, such as random-access memory (RAM). The ROM may have a size larger than would be used to store only one copy of each routine. This may allow each routine to have multiple variants depending on “modes” the local controller may be placed into. When the control program memory is implemented as random access memory (RAM), the RAM may be written with new routines to implement new operations and functionality into the programmable logic sectors 48. This may provide usable extensibility in an efficient and easily understood way. This may be useful because new commands could bring about large amounts of local activity within the sector at the expense of only a small amount of communication between the device controller 60 and the sector controllers 58.
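As one illustrative software analogy, a control program memory holding one routine per operation, with optional per-mode variants, may be modeled as a lookup table. The Python sketch below is not the disclosed implementation; the class SectorController, the routines fill and fill_checkerboard, and the mode names are hypothetical, chosen only to show how a short command may trigger a large amount of local activity.

```python
class SectorController:
    """Hypothetical model of a sector controller with writable program memory."""

    def __init__(self, cram_cells):
        self.config_memory = [0] * cram_cells  # stands in for configuration memory
        self.mode = "default"
        self.program = {}  # control program memory: (command, mode) -> routine

    def load_routine(self, command, routine, mode="default"):
        """Model writing a new routine into RAM-based control program memory."""
        self.program[(command, mode)] = routine

    def execute(self, command, *args):
        """Run the variant for the current mode, else the default variant."""
        routine = self.program.get((command, self.mode))
        if routine is None:
            routine = self.program.get((command, "default"))
        if routine is None:
            raise KeyError(f"no routine for command {command!r}")
        return routine(self, *args)


def fill(controller, value):
    """Default routine: write one value to every cell; return cells written."""
    for i in range(len(controller.config_memory)):
        controller.config_memory[i] = value
    return len(controller.config_memory)


def fill_checkerboard(controller, value):
    """Test-mode variant: alternate the value and its complement."""
    for i in range(len(controller.config_memory)):
        controller.config_memory[i] = value if i % 2 == 0 else 1 - value
    return len(controller.config_memory)
```

Here a single short command such as "fill" causes every cell of the modeled sector to be written locally, and loading a new routine extends the controller without changing its dispatch logic.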
Each sector controller 58 thus may communicate with the device controller 60, which may coordinate the operations of the sector controllers 58 and convey commands initiated from outside the FPGA device 40. To support this communication, the interconnection resources 46 may act as a network between the device controller 60 and each sector controller 58. The interconnection resources may support a wide variety of signals between the device controller 60 and each sector controller 58. In one example, these signals may be transmitted as communication packets.
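For illustration, a communication packet carrying a command from the device controller to one sector controller may be modeled as a small header followed by a payload. This disclosure does not specify a packet format; the field layout below (a 16-bit sector identifier, an 8-bit opcode, and a 16-bit payload length) and the names Packet, encode, and decode are assumptions made solely for this Python sketch.

```python
import struct
from typing import NamedTuple

# Assumed header layout (big-endian, no padding):
# 16-bit sector id, 8-bit opcode, 16-bit payload length.
HEADER = struct.Struct(">HBH")


class Packet(NamedTuple):
    sector_id: int
    opcode: int
    payload: bytes


def encode(packet: Packet) -> bytes:
    """Serialize a packet as header bytes followed by the payload."""
    header = HEADER.pack(packet.sector_id, packet.opcode, len(packet.payload))
    return header + packet.payload


def decode(data: bytes) -> Packet:
    """Parse bytes produced by encode(); reject a truncated payload."""
    sector_id, opcode, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated packet")
    return Packet(sector_id, opcode, payload)
```

A sector controller in such a model would compare the decoded sector identifier against its own before acting on the opcode.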
The FPGA 40 may be electrically programmed. With electrical programming arrangements, the programmable elements 50 may include one or more logic elements (wires, gates, registers, etc.). For example, during programming, configuration data is loaded into the configuration memory 52 using pins 44 and input/output circuitry 42. In one example, the configuration memory 52 may be implemented as configuration random-access-memory (CRAM) cells. The use of configuration memory 52 based on RAM technology as described herein is intended to be only one example. Moreover, configuration memory 52 may be distributed (e.g., as RAM cells) throughout the various programmable logic sectors 48 of the FPGA 40. The configuration memory 52 may provide a corresponding static control output signal that controls the state of an associated programmable logic element 50 or programmable component of the interconnection resources 46. The output signals of the configuration memory 52 may be applied to the gates of metal-oxide-semiconductor (MOS) transistors that control the states of the programmable logic elements 50 or programmable components of the interconnection resources 46.
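To illustrate how static configuration memory outputs may define the behavior of a programmable logic element, one common arrangement (assumed here; this disclosure does not limit the logic elements to this form) is a lookup table in which the stored bits hold a truth table and the element's inputs select one of those bits. The function name lut_output and the example tables AND2 and XOR2 are hypothetical.

```python
def lut_output(cram_bits, inputs):
    """Evaluate a k-input lookup table whose truth table is held in CRAM bits.

    cram_bits: sequence of 2**k stored values (0 or 1).
    inputs:    sequence of k input values, most significant input first.
    """
    if len(cram_bits) != 2 ** len(inputs):
        raise ValueError("need 2**k configuration bits for k inputs")
    index = 0
    for bit in inputs:
        # Build the table index from the inputs, one bit at a time.
        index = (index << 1) | int(bool(bit))
    return cram_bits[index]


# The same element behaves differently depending only on its stored bits.
AND2 = [0, 0, 0, 1]
XOR2 = [0, 1, 1, 0]
```

In this sketch, reprogramming the four stored bits changes the element from an AND function to an XOR function without any change to the element itself.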
As stated above, the logical arrangement of the FPGA 40 shown in
The fabric die 22 and the base die 24 may collectively hold any suitable circuitry that may encompass the programmable logic device 12. Thus, in one example, the fabric die 22 may include primarily programmable logic fabric resources, such as the programmable logic elements 50 and configuration memory 52, and the base die 24 may include circuitry other than the programmable logic elements 50 and configuration memory 52. These circuit elements may include, among other things, a device controller (DC) 60, a sector controller (SC) 58, a network-on-chip (NOC), a configuration network on chip (CNOC), data routing circuitry, sector-aligned memory used to store and/or cache configuration programs (bitstreams) or data, memory controllers used to program the programmable logic fabric, input/output (I/O) interfaces or modules for the programmable logic fabric, external memory interfaces (e.g., for a high bandwidth memory (HBM) device), an embedded processor (e.g., an embedded Intel® Xeon® processor by Intel Corporation of Santa Clara, Calif.) or an interface to connect to a processor (e.g., an interface to an Intel® Xeon® processor by Intel Corporation of Santa Clara, Calif.), voltage control circuitry, thermal monitoring circuitry, decoupling capacitors, power clamps, and/or electrostatic discharge (ESD) circuitry, to name just a few elements that may be present on the base die 24. It should be understood that some of these elements that may be part of the fabric support circuitry of the base die 24 may additionally or alternatively be a part of the fabric die 22. For example, the device controller (DC) 60 and/or the sector controllers (SC) 58 may be part of the fabric die 22.
One physical arrangement of the fabric die 22 is shown in
While the physical arrangement shown in
In another example, shown in
In another example, shown in
In another example, shown in
In another example, shown in
In another example, shown in
In another example, shown in
In another example, shown in
As shown in
In another example, shown in
To facilitate efficient communication, the fabric die 22 and the base die 24 may be vertically sector-aligned. In one example, shown in
In some cases, the fabric sectors 80 and the sectors 90 may not occupy the same amount of area. Indeed, as shown in
By vertically aligning the fabric die 22 and the base die 24, memory located in the base die 24 may be accessible in parallel to fabric sectors 80 of the fabric die 22.
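The potential benefit of parallel access may be illustrated with a simple timing model. The Python sketch below is an idealized assumption, not a measured result: it compares loading every sector over one shared link against loading each sector from its own aligned memory at the same time, and it ignores protocol overhead and contention. The function names and parameters are hypothetical.

```python
def serial_load_time(sector_bits, link_bits_per_s):
    """Time to load all sectors one after another over a single shared link."""
    return sum(sector_bits) / link_bits_per_s


def parallel_load_time(sector_bits, per_sector_bits_per_s):
    """Time to load all sectors at once, each over its own aligned link.

    The slowest (largest) sector determines the total time.
    """
    return max(sector_bits) / per_sector_bits_per_s
```

Under this model, with equally sized sectors and equal link rates, the serial time grows with the number of sectors while the parallel time stays constant.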
As shown in
The programmable logic device 12 may be packaged in a variety of configurations. In addition to the configuration shown in
The programmable logic device 12 may also take a form in which the fabric die 22 and the base die 24 are not vertically stacked, but rather take a 2.5D packaging configuration. An example is shown in
There may also be more than one base die 24 for a respective fabric die 22. In an example shown in
The packaging may also include a liquid cooling system, such as a microchannel integrated heat spreader (MC-IHS). Shown by way of example in
The programmable logic device 12 may be, or may be a component of, a data processing system. For example, the programmable logic device 12 may be a component of a data processing system 260, shown in
In one example, the data processing system 260 may be part of a data center that processes a variety of different requests. For instance, the data processing system 260 may receive a data processing request via the network interface 266 to perform machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or some other specialized task. The host processor 262 may cause the programmable logic fabric of the programmable logic device 12 to be programmed with a particular accelerator related to the requested task. For instance, the host processor 262 may instruct that a program (bitstream) stored on the memory/storage 264 or cached in sector-aligned memory of the programmable logic device 12 be programmed into the programmable logic fabric of the programmable logic device 12. The program (bitstream) may represent a circuit design for a particular accelerator function relevant to the requested task. Due to the high density of the programmable logic fabric, the proximity of the substantial amount of sector-aligned memory to the programmable logic fabric, or other features of the programmable logic device 12 that are described here, the programmable logic device 12 may rapidly assist the data processing system 260 in performing the requested task. Indeed, in one example, programming an accelerator to assist with a voice recognition task may take place faster than a few milliseconds (e.g., on the order of microseconds).
The methods and devices of this disclosure may be incorporated into any suitable circuit. For example, the methods and devices may be incorporated into numerous types of devices such as microprocessors or other integrated circuits. Exemplary integrated circuits include programmable array logic (PAL), programmable logic arrays (PLAs), field programmable logic arrays (FPLAs), electrically programmable logic devices (EPLDs), electrically erasable programmable logic devices (EEPLDs), logic cell arrays (LCAs), field programmable gate arrays (FPGAs), application specific standard products (ASSPs), application specific integrated circuits (ASICs), and microprocessors, just to name a few.
Moreover, while the method operations have been described in a specific order, it should be understood that other operations may be performed in between described operations, described operations may be adjusted so that they occur at slightly different times, or described operations may be distributed in a system which allows the occurrence of the processing operations at various intervals associated with the processing, as long as the processing of overlying operations is performed as desired.
While the embodiments set forth in the present disclosure may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. However, it may be understood that the disclosure is not intended to be limited to the particular forms disclosed. The disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the following appended claims. In addition, the techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ”, it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). For any claims containing elements designated in any other manner, however, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).
Claims
1. A data processing system comprising:
- a processor configured to manage a data processing request; and
- a programmable logic device configured to be programmed with a configuration program relating to the data processing request in response to an instruction by the processor, wherein programmable logic fabric of a first integrated circuit die of the programmable logic device is programmed at least in part by fabric support circuitry of a second integrated circuit die of the programmable logic device.
2. The data processing system of claim 1, wherein the fabric support circuitry of the second integrated circuit die of the programmable logic device comprises a device controller configured to control circuitry of the first integrated circuit die and the second integrated circuit die, a sector controller configured to control a sector of circuitry of the first integrated circuit die and the second integrated circuit die, a network on chip, a configuration network on chip, data routing circuitry, sector-aligned memory, a memory controller configured to program the programmable logic fabric, an input/output (I/O) interface for the programmable logic fabric, an external memory interface, a first processor embedded in the second integrated circuit die, an interface to connect the programmable logic fabric to a second processor external to the first integrated circuit die and the second integrated circuit die, voltage control circuitry configured to control a voltage supplied to the programmable logic fabric, thermal monitoring circuitry configured to monitor heat of the first integrated circuit die, a decoupling capacitor, a power clamp, electrostatic discharge circuitry, or any combination thereof.
3. The data processing system of claim 1, wherein the processor and the programmable logic device are disposed within the same package.
4. The data processing system of claim 1, wherein the data processing request comprises machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, or a combination thereof.
5. The data processing system of claim 1, wherein the programmable logic device comprises a field programmable gate array (FPGA).
6. A method for manufacturing an integrated circuit device, the method comprising:
- obtaining a first integrated circuit die comprising field programmable gate array fabric;
- obtaining a second integrated circuit die comprising fabric support circuitry configured to operate the field programmable gate array fabric of the first integrated circuit die;
- vertically aligning the first integrated circuit die and the second integrated circuit die; and
- connecting a first surface of the first integrated circuit die to a second surface of the second integrated circuit die.
7. The method of claim 6, wherein obtaining the first integrated circuit die comprises manufacturing the first integrated circuit die according to a higher-resolution process, and wherein obtaining the second integrated circuit die comprises manufacturing the second integrated circuit die according to a lower-resolution process.
8. The method of claim 6, comprising disposing the first integrated circuit die and the second integrated circuit die in a microchannel integrated heat spreader.
9. The method of claim 6, wherein vertically aligning the first integrated circuit die and the second integrated circuit die comprises vertically aligning first sectors of the field programmable gate array fabric with second sectors of the fabric support circuitry.
10. The method of claim 6, wherein connecting the first surface and the second surface comprises forming an electrical connection between a connector of the first integrated circuit die and a corresponding connector of the second integrated circuit die.
11. The method of claim 10, wherein the electrical connection comprises a microbump.
12. The method of claim 6, wherein the integrated circuit device comprises a field programmable gate array (FPGA) comprising the first integrated circuit die and the second integrated circuit die.
13. The method of claim 6, comprising electrically coupling a processor to the second integrated circuit die, the processor configured to manage a data processing request for the first integrated circuit die and the second integrated circuit die.
14. The method of claim 6, wherein the second integrated circuit die comprises sector-aligned memory corresponding to one or more fabric sectors of the field programmable gate array fabric, wherein the sector-aligned memory is configured to store configuration data for programming the one or more fabric sectors.
15. A method of manufacturing a field programmable gate array (FPGA) comprising:
- obtaining a first integrated circuit die comprising field programmable gate array fabric;
- obtaining a second integrated circuit die comprising fabric support circuitry configured to operate the field programmable gate array fabric of the first integrated circuit die; and
- electrically coupling the first integrated circuit die and the second integrated circuit die such that the fabric support circuitry is communicatively coupled to the field programmable gate array fabric.
16. The method of claim 15, comprising aligning one or more sectors of the field programmable gate array fabric with one or more corresponding sectors of the fabric support circuitry.
17. The method of claim 16, wherein the fabric support circuitry is configured to:
- receive configuration data indicative of programming for the field programmable gate array fabric;
- store the configuration data in the one or more corresponding sectors of the fabric support circuitry; and
- program the one or more sectors of the field programmable gate array fabric according to the programming.
18. The method of claim 17, comprising physically coupling the first integrated circuit die and the second integrated circuit die within a single package.
19. The method of claim 18, wherein physically coupling the first integrated circuit die and the second integrated circuit die within the single package comprises physically coupling the first integrated circuit die and the second integrated circuit die to a microchannel heat spreader within the single package.
20. The method of claim 18, wherein electrically coupling the first integrated circuit die and the second integrated circuit die comprises communicatively coupling the fabric support circuitry and the field programmable gate array fabric via one or more silicon bridges, one or more interposers, through-silicon vias (TSVs), or any combination thereof.
Type: Application
Filed: Apr 13, 2023
Publication Date: Aug 10, 2023
Inventors: Ravi Prakash Gutala (San Jose, CA), Aravind Raghavendra Dasu (Milpitas, CA), Sean R. Atsatt (Santa Cruz, CA), Scott J. Weber (Piedmont, CA)
Application Number: 18/300,330