Patents by Inventor Tim Tuan
Tim Tuan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12656970
Abstract: A device includes a data processing engine (DPE) array having a plurality of data processing engines (DPEs) and a subsystem coupled to the DPE array. Each DPE of the plurality of DPEs is configurable to share data with one or more other DPEs of the plurality of DPEs using one or more of a plurality of data sharing techniques. The data sharing techniques include a core of a selected DPE accessing a memory module of an adjacent DPE via a memory interface of the selected DPE connected to a memory module of the adjacent DPE and the selected DPE accessing the memory module of a non-adjacent DPE using a DMA circuit and a stream switch of the selected DPE. The subsystem may be in a different die than the DPE array.
Type: Grant
Filed: April 15, 2024
Date of Patent: June 16, 2026
Assignee: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Baris Ozgul, Jan Langer, Tim Tuan, Ralph D. Wittig, David Clarke, Goran H. K. Bilski, Kornelis A. Vissers, Richard L. Walke, Christopher H. Dick, Zachary Dickman, Philip B. James-Roxby, Peter McColgan
-
Patent number: 12554310
Abstract: Embodiments herein describe a hardware accelerator that includes multiple power or clock domains. For example, the hardware accelerator can include an array of data processing engines (DPEs) where different subsets of the DPEs (e.g., different columns, rows, or blocks) are disposed in different power or clock domains within the hardware accelerator. When one or more subsets of the DPEs are idle (e.g., the hardware accelerator has not assigned any tasks to those DPEs), the accelerator can deactivate the corresponding power or clock domain (or domains), which deactivates the DPEs in those domains while the DPEs in the other power or clock domains remain operational. As such, idle DPEs can be deactivated to conserve energy while DPEs with work can remain operational.
Type: Grant
Filed: December 22, 2023
Date of Patent: February 17, 2026
Assignees: XILINX, INC., Advanced Micro Devices, Inc.
Inventors: Juan J. Noguera Serra, Akila Subramaniam, David Kramer, Madhusudan Chilakam, Tim Tuan
-
Publication number: 20250370949
Abstract: Embodiments herein describe a hardware accelerator that includes multiple clock domains. For example, the hardware accelerator can include data processing engines (DPEs) which include circuitry for performing acceleration tasks (e.g., artificial intelligence (AI) tasks, data encryption tasks, data compression tasks, and the like). The DPEs are interconnected to permit them to share data when performing the acceleration tasks. In addition to the DPEs, the hardware accelerator can include interface circuitry such as an interconnect, a controller, address translation circuitry, etc. The DPEs may be in a first clock domain while the other circuitry is in a second clock domain. The two clock domains can use different frequency clock circuits, for example, to generate more bandwidth for moving data into and out of the hardware accelerator while reducing power consumption.
Type: Application
Filed: May 30, 2024
Publication date: December 4, 2025
Inventors: Juan J. NOGUERA SERRA, Sneha Bhalchandra DATE, Tim TUAN
-
Publication number: 20250343549
Abstract: An apparatus includes a data processing array having a plurality of array tiles. Each array tile can include a random-access memory (RAM) having a local memory interface accessible by circuitry within the array tile and an adjacent memory interface accessible by circuitry disposed within an adjacent array tile. Each adjacent memory interface of each array tile can include isolation logic that is programmable to allow the circuitry disposed within the adjacent array tile to access the RAM or prevent the circuitry disposed within the adjacent array tile from accessing the RAM. The data processing array can be subdivided into a plurality of partitions wherein the isolation logic of the adjacent memory interfaces is programmed to prevent array tiles from accessing RAMs across a boundary between the plurality of partitions.
Type: Application
Filed: July 18, 2025
Publication date: November 6, 2025
Applicant: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Javier Cabezas Rodriguez, David Clarke, Peter McColgan, Zachary Blaise Dickman, Saurabh Mathur, Amarnath Kasibhatla, Francisco Barat Quesada
-
Patent number: 12401364
Abstract: An apparatus includes a data processing array having a plurality of array tiles. The plurality of array tiles include a plurality of compute tiles. The compute tiles include a core coupled to a random-access memory (RAM) in a same compute tile and to a RAM of at least one other compute tile. The data processing array is subdivided into a plurality of partitions. Each partition includes a plurality of array tiles including at least one of the plurality of compute tiles. The apparatus includes a plurality of clock gate circuits being programmable to selectively gate a clock signal provided to a respective one of the plurality of partitions.
Type: Grant
Filed: November 14, 2023
Date of Patent: August 26, 2025
Assignee: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Javier Cabezas Rodriguez, David Clarke, Peter McColgan, Zachary Blaise Dickman, Saurabh Mathur, Amarnath Kasibhatla, Francisco Barat Quesada
-
Publication number: 20250208682
Abstract: Embodiments herein describe a hardware accelerator that includes multiple power or clock domains. For example, the hardware accelerator can include data processing engines (DPEs) which include circuitry for performing acceleration tasks (e.g., artificial intelligence (AI) tasks, data encryption tasks, data compression tasks, and the like). The DPEs are interconnected to permit them to share data when performing the acceleration tasks. In addition to the DPEs, the hardware accelerator can include other circuitry such as an interconnect, a controller, address translation circuitry, etc. The DPEs may be in a first power or clock domain while the other circuitry is in a second power or clock domain. That way, when the DPEs are idle (e.g., the hardware accelerator currently has no tasks assigned to it), the first power or clock domain can be powered down while the second power or clock domain can remain powered.
Type: Application
Filed: December 22, 2023
Publication date: June 26, 2025
Inventors: Juan J. NOGUERA SERRA, Akila SUBRAMANIAM, David KRAMER, Madhusudan CHILAKAM, Tim TUAN
-
Publication number: 20250208907
Abstract: Embodiments herein describe integrating an accelerator into a same SoC (or same chip or IC) as a CPU. The SoC also includes a controller (e.g., a microcontroller) that orchestrates data processing engines (DPEs) in the accelerator. The controller (or orchestrator) receives a task from the CPU and then configures the DPEs to perform the task. For example, the controller may divide the task into a sequence of operations that are performed by one or more of the DPEs. The controller can then report back to the CPU when the task is complete.
Type: Application
Filed: December 22, 2023
Publication date: June 26, 2025
Inventors: Juan J. NOGUERA SERRA, Akila SUBRAMANIAM, David KRAMER, Madhusudan CHILAKAM, Patrick KORAN, Tim TUAN
-
Publication number: 20250208687
Abstract: Embodiments herein describe a hardware accelerator that includes multiple power or clock domains. For example, the hardware accelerator can include an array of data processing engines (DPEs) where different subsets of the DPEs (e.g., different columns, rows, or blocks) are disposed in different power or clock domains within the hardware accelerator. When one or more subsets of the DPEs are idle (e.g., the hardware accelerator has not assigned any tasks to those DPEs), the accelerator can deactivate the corresponding power or clock domain (or domains), which deactivates the DPEs in those domains while the DPEs in the other power or clock domains remain operational. As such, idle DPEs can be deactivated to conserve energy while DPEs with work can remain operational.
Type: Application
Filed: December 22, 2023
Publication date: June 26, 2025
Inventors: Juan J. NOGUERA SERRA, Akila SUBRAMANIAM, David KRAMER, Madhusudan CHILAKAM, Tim TUAN
-
Publication number: 20250209036
Abstract: Embodiments herein describe integrating an AI accelerator into a same SoC (or same chip or IC) as a CPU. Thus, instead of relying on off-chip communication techniques, on-chip communication techniques such as an interconnect (e.g., a NoC) can be used to facilitate communication. This can result in faster communication between the AI accelerator and the CPU. Moreover, a tighter integration between the CPU and AI accelerator can make it easier for the CPU to offload AI tasks to the AI accelerator. In one embodiment, the AI accelerator includes address translation circuitry for translating virtual addresses used in the AI accelerator to physical addresses used to store the data.
Type: Application
Filed: December 22, 2023
Publication date: June 26, 2025
Inventors: Juan J. NOGUERA SERRA, Akila SUBRAMANIAM, David KRAMER, Madhusudan CHILAKAM, Patrick KORAN, Tim TUAN
-
Patent number: 12164451
Abstract: An integrated circuit (IC) can include a data processing array including a plurality of compute tiles arranged in a grid. The IC can include an array interface coupled to the data processing array. The array interface includes a plurality of interface tiles. Each interface tile includes a plurality of direct memory access circuits. The IC can include a network-on-chip (NoC) coupled to the array interface. Each direct memory access circuit is communicatively linked to the NoC via an independent communication channel.
Type: Grant
Filed: May 17, 2022
Date of Patent: December 10, 2024
Assignee: Xilinx, Inc.
Inventors: David Patrick Clarke, Peter McColgan, Juan J. Noguera Serra, Tim Tuan, Saurabh Mathur, Amarnath Kasibhatla, Javier Cabezas Rodriguez, Pedro Miguel Parola Duarte, Zachary Blaise Dickman
-
Patent number: 12001367
Abstract: An integrated circuit includes an interposer and a die coupled to the interposer. The die includes a first data processing engine (DPE) array and a second DPE array. The first DPE array includes a first plurality of DPEs and a first DPE interface coupled to the first plurality of DPEs. The second DPE array includes a second plurality of DPEs and a second DPE interface coupled to the second plurality of DPEs. The integrated circuit includes one or more other dies having a first die interface coupled to, and configured to communicate with, the first DPE interface via the interposer and a second die interface coupled to, and configured to communicate with, the second DPE interface via the interposer.
Type: Grant
Filed: May 18, 2023
Date of Patent: June 4, 2024
Assignee: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Sridhar Subramanian
-
Patent number: 11972132
Abstract: A device includes a data processing engine array having a plurality of data processing engines organized in a grid having a plurality of rows and a plurality of columns. Each data processing engine includes a core, a memory module including a memory and a direct memory access engine. Each data processing engine includes a stream switch connected to the core, the direct memory access engine, and the stream switch of one or more adjacent data processing engines. Each memory module includes a first memory interface directly coupled to the core in the same data processing engine and one or more second memory interfaces directly coupled to the core of each of the one or more adjacent data processing engines.
Type: Grant
Filed: December 22, 2022
Date of Patent: April 30, 2024
Assignee: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Goran H K Bilski, Jan Langer, Baris Ozgul, Richard L. Walke, Ralph D. Wittig, Kornelis A. Vissers, Tim Tuan, David Clarke
-
Publication number: 20240088900
Abstract: An apparatus includes a data processing array having a plurality of array tiles. The plurality of array tiles include a plurality of compute tiles. The compute tiles include a core coupled to a random-access memory (RAM) in a same compute tile and to a RAM of at least one other compute tile. The data processing array is subdivided into a plurality of partitions. Each partition includes a plurality of array tiles including at least one of the plurality of compute tiles. The apparatus includes a plurality of clock gate circuits being programmable to selectively gate a clock signal provided to a respective one of the plurality of partitions.
Type: Application
Filed: November 14, 2023
Publication date: March 14, 2024
Applicant: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Javier Cabezas Rodriguez, David Clarke, Peter McColgan, Zachary Blaise Dickman, Saurabh Mathur, Amarnath Kasibhatla, Francisco Barat Quesada
-
Patent number: 11848670
Abstract: An apparatus includes a data processing array having a plurality of array tiles. Each array tile can include a random-access memory (RAM) having a local memory interface accessible by circuitry within the array tile and an adjacent memory interface accessible by circuitry disposed within an adjacent array tile. Each adjacent memory interface of each array tile can include isolation logic that is programmable to allow the circuitry disposed within the adjacent array tile to access the RAM or prevent the circuitry disposed within the adjacent array tile from accessing the RAM. The data processing array can be subdivided into a plurality of partitions wherein the isolation logic of the adjacent memory interfaces is programmed to prevent array tiles from accessing RAMs across a boundary between the plurality of partitions.
Type: Grant
Filed: April 15, 2022
Date of Patent: December 19, 2023
Assignee: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Javier Cabezas Rodriguez, David Clarke, Peter McColgan, Zachary Blaise Dickman, Saurabh Mathur, Amarnath Kasibhatla, Francisco Barat Quesada
-
Publication number: 20230376437
Abstract: An integrated circuit (IC) can include a data processing array including a plurality of compute tiles arranged in a grid. The IC can include an array interface coupled to the data processing array. The array interface includes a plurality of interface tiles. Each interface tile includes a plurality of direct memory access circuits. The IC can include a network-on-chip (NoC) coupled to the array interface. Each direct memory access circuit is communicatively linked to the NoC via an independent communication channel.
Type: Application
Filed: May 17, 2022
Publication date: November 23, 2023
Applicant: Xilinx, Inc.
Inventors: David Patrick Clarke, Peter McColgan, Juan J. Noguera Serra, Tim Tuan, Saurabh Mathur, Amarnath Kasibhatla, Javier Cabezas Rodriguez, Pedro Miguel Parola Duarte, Zachary Blaise Dickman
-
Publication number: 20230336179
Abstract: An apparatus includes a data processing array having a plurality of array tiles. Each array tile can include a random-access memory (RAM) having a local memory interface accessible by circuitry within the array tile and an adjacent memory interface accessible by circuitry disposed within an adjacent array tile. Each adjacent memory interface of each array tile can include isolation logic that is programmable to allow the circuitry disposed within the adjacent array tile to access the RAM or prevent the circuitry disposed within the adjacent array tile from accessing the RAM. The data processing array can be subdivided into a plurality of partitions wherein the isolation logic of the adjacent memory interfaces is programmed to prevent array tiles from accessing RAMs across a boundary between the plurality of partitions.
Type: Application
Filed: April 15, 2022
Publication date: October 19, 2023
Applicant: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Javier Cabezas Rodriguez, David Clarke, Peter McColgan, Zachary Blaise Dickman, Saurabh Mathur, Amarnath Kasibhatla, Francisco Barat Quesada
-
Publication number: 20230289311
Abstract: An integrated circuit includes an interposer and a die coupled to the interposer. The die includes a first data processing engine (DPE) array and a second DPE array. The first DPE array includes a first plurality of DPEs and a first DPE interface coupled to the first plurality of DPEs. The second DPE array includes a second plurality of DPEs and a second DPE interface coupled to the second plurality of DPEs. The integrated circuit includes one or more other dies having a first die interface coupled to, and configured to communicate with, the first DPE interface via the interposer and a second die interface coupled to, and configured to communicate with, the second DPE interface via the interposer.
Type: Application
Filed: May 18, 2023
Publication date: September 14, 2023
Applicant: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Sridhar Subramanian
-
Patent number: 11693808
Abstract: An integrated circuit includes an interposer, a first die coupled to the interposer, a second die coupled to the interposer, and a third die coupled to the interposer and having a plurality of die interfaces. The first die includes a first data processing engine (DPE) array having a first plurality of DPEs and a first DPE interface coupled to the first plurality of DPEs therein. The second die includes a second DPE array having a second plurality of DPEs and a second DPE interface coupled to the second plurality of DPEs therein. The first DPE interface of the first die is configured to communicate with a first die interface of the plurality of die interfaces via the interposer. The second DPE interface of the second die is configured to communicate with a second die interface of the plurality of die interfaces via the interposer.
Type: Grant
Filed: March 11, 2022
Date of Patent: July 4, 2023
Assignee: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Tim Tuan, Sridhar Subramanian
-
Patent number: 11669464
Abstract: Examples herein describe performing non-sequential DMA read and writes. Rather than storing data sequentially, a DMA engine can write data into memory using non-sequential memory addresses. A data processing engine (DPE) controller can submit a first job using first parameters that instruct the DMA engine to store data using a first non-sequential write pattern. The DPE controller can also submit a second job using second parameters that instruct the DMA engine to store data using a second, different non-sequential write pattern. In this manner, the DMA engine can switch to performing DMA writes using different non-sequential patterns. Similarly, the DMA engine can use non-sequential reads to retrieve data from memory. When performing a first DMA read, the DMA engine can retrieve data from memory using a first sequential pattern and then perform a second DMA read where data is retrieved from memory using a second non-sequential read pattern.
Type: Grant
Filed: April 24, 2020
Date of Patent: June 6, 2023
Assignee: XILINX, INC.
Inventors: Goran Hk Bilski, Baris Ozgul, David Clarke, Juan J. Noguera Serra, Jan Langer, Zachary Dickman, Sneha Bhalchandra Date, Tim Tuan
-
Publication number: 20230131698
Abstract: A device includes a data processing engine array having a plurality of data processing engines organized in a grid having a plurality of rows and a plurality of columns. Each data processing engine includes a core, a memory module including a memory and a direct memory access engine. Each data processing engine includes a stream switch connected to the core, the direct memory access engine, and the stream switch of one or more adjacent data processing engines. Each memory module includes a first memory interface directly coupled to the core in the same data processing engine and one or more second memory interfaces directly coupled to the core of each of the one or more adjacent data processing engines.
Type: Application
Filed: December 22, 2022
Publication date: April 27, 2023
Applicant: Xilinx, Inc.
Inventors: Juan J. Noguera Serra, Goran HK Bilski, Jan Langer, Baris Ozgul, Richard L. Walke, Ralph D. Wittig, Kornelis A. Vissers, Tim Tuan, David Clarke