Patents by Inventor Ehsan Ghasemi

Ehsan Ghasemi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Instruction set architecture for data processing array control

Patent number: 12248786

Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design for a DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.

Type: Grant

Filed: August 8, 2022

Date of Patent: March 11, 2025

Assignee: Xilinx, Inc.

Inventors: Xiao Teng, Tejus Siddagangaiah, Bryan Lozano, Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Aaron Ng, Sanket Pandit, Pramod Peethambaran, Satyaprakash Pareek
SYSTEMS, METHODS, AND COMPUTER-READABLE MEDIA FOR STRUCTURED BLOCKCHAIN STREAMS

Publication number: 20250037114

Abstract: Provided are systems, methods, and computer-readable media for structured blockchain streams. The method includes receiving, at a server from a first user device corresponding to a first user, a listing request for an item on the server, the listing request comprising: an item media identifier, a stream structure option, the stream structure option comprising a plurality of stream slots; a first wallet address identifier of the first user; and a smart contract identifier; generating, at the server, a voucher comprising the item attributes, the first wallet address identifier of the first user, account information of the first user, the stream structure option, item history and ownership history; storing, at the server, the generated voucher; and outputting, at the server, an item listing corresponding to the item voucher.

Type: Application

Filed: July 24, 2023

Publication date: January 30, 2025

Inventors: Peter William Humphries, Christian William Humphries, Mackenzie Parnham, Joel Young, Ehsan Ghasemi, Catalina Orrego, Vinay Malhotra, Nancy Pavici
Reconfigurable neural engine with extensible instruction set architecture

Patent number: 12079158

Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.

Type: Grant

Filed: July 25, 2022

Date of Patent: September 3, 2024

Assignee: Xilinx, Inc.

Inventors: Sanket Pandit, Jorn Tuyls, Xiao Teng, Rajeev Patwari, Ehsan Ghasemi, Elliott Delaye, Aaron Ng
INSTRUCTION GENERATION AND PROGRAMMING MODEL FOR A DATA PROCESSING ARRAY AND MICROCONTROLLER

Publication number: 20240069511

Abstract: Instruction generation for a data processing array and microcontroller includes generating a tensor-level intermediate representation from a machine learning model using kernel expressions. Statements of the tensor-level intermediate representation are partitioned into a first set of statements and a second set of statements. From the first set of statements, kernel instructions are generated based on a reconfigurable neural engine model. The kernel instructions are executable by a compute tile of a data processing array to implement compute functions of the machine learning model. From the set of second statements, microcontroller instructions are generated based on a super-graph model. The microcontroller instructions are executable by a microcontroller of the data processing array to move data into and out from the data processing array.

Type: Application

Filed: August 31, 2022

Publication date: February 29, 2024

Applicant: Xilinx, Inc.

Inventors: Jorn Tuyls, Xiao Teng, Sanket Pandit, Rajeev Patwari, Qian Zhou, Ehsan Ghasemi, Ephrem C. Wu, Elliott Delaye, Aaron Ng
INSTRUCTION SET ARCHITECTURE FOR DATA PROCESSING ARRAY CONTROL

Publication number: 20240045692

Abstract: Controlling a data processing (DP) array includes creating a replica of a register address space of the DP array based on the design and the DP array. A sequence of instructions, including write instructions and read instructions, is received. The write instructions correspond to buffer descriptors specifying runtime data movements for a design for a DP array. The write instructions are converted into transaction instructions and the read instructions are converted into wait instructions based on the replica of the register address space. The transaction instructions and the wait instructions are included in an instruction buffer. The instruction buffer is provided to a microcontroller configured to execute the transaction instructions and the wait instructions to implement the runtime data movements for the design as implemented in the DP array. In another aspect, the instruction buffer is stored in a file for subsequent execution by the microcontroller.

Type: Application

Filed: August 8, 2022

Publication date: February 8, 2024

Applicant: Xilinx, Inc.

Inventors: Xiao Teng, Tejus Siddagangaiah, Bryan Lozano, Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Aaron Ng, Sanket Pandit, Pramod Peethambaran, Satyaprakash Pareek
RECONFIGURABLE NEURAL ENGINE WITH EXTENSIBLE INSTRUCTION SET ARCHITECTURE

Publication number: 20240028556

Abstract: An integrated circuit includes a plurality of kernels and a virtual machine coupled to the plurality of kernels. The virtual machine is configured to interpret instructions directed to different ones of the plurality of kernels. The virtual machine is configured to control operation of the different ones of the plurality of kernels responsive to the instructions.

Type: Application

Filed: July 25, 2022

Publication date: January 25, 2024

Applicant: Xilinx, Inc.

Inventors: Sanket Pandit, Jorn Tuyls, Xiao Teng, Rajeev Patwari, Ehsan Ghasemi, Elliott Delaye, Aaron Ng
HARDWARE ACCELERATION OF MACHINE LEARNING DESIGNS

Publication number: 20230401480

Abstract: Hardware acceleration of machine learning (ML) designs includes translating an ML primitive into an intermediate representation. The intermediate representation is subdivided to specify a functional compute block. The functional compute block is sized according to a compute node primitive adapted for implementing the ML primitive on target hardware. An overlay is generated for the ML primitive, at least in part, by mapping the functional compute block to the compute node primitive. The overlay is synthesizable to implement the ML primitive on the target hardware. The overlay can be scheduled for operation within the target hardware as part of an ML design including the ML primitive.

Type: Application

Filed: June 14, 2022

Publication date: December 14, 2023

Applicant: Xilinx, Inc.

Inventors: Ehsan Ghasemi, Rajeev Patwari, Elliott Delaye, Jorn Tuyls, Ephrem C. Wu, Xiao Teng, Sanket Pandit
Machine learning runtime library for neural network acceleration

Patent number: 11694066

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., a FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.

Type: Grant

Filed: October 17, 2017

Date of Patent: July 4, 2023

Assignee: XILINX, INC.

Inventors: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
Multi-layer neural network processing by a neural network accelerator using host communicated merged weights and a package of per-layer instructions

Patent number: 11620490

Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.

Type: Grant

Filed: October 17, 2017

Date of Patent: April 4, 2023

Assignee: XILINX, INC.

Inventors: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao
Circuit arrangements and methods for traversing input feature maps

Patent number: 11106968

Abstract: A circuit arrangement includes a buffer, a height traversal circuit configured to generate a sequence of IFM height values in response to first control signals, a width traversal circuit configured to generate a sequence of IFM width values in response to second control signals, a control circuit, and an address generation circuit. The control circuit is configured to input an OFM height, an OFM width, a kernel height, and a kernel width; generate the first control signals at times based on the OFM height and the kernel height; and generate the second control signals at times based on the OFM width and the kernel width. The address generation circuit is configured to generate a sequence of addresses based on the sequences of IFM height values and IFM width values, provide the sequence of addresses to the buffer, and enable reading from the buffer.

Type: Grant

Filed: May 24, 2018

Date of Patent: August 31, 2021

Assignee: XILINX, INC.

Inventors: Ehsan Ghasemi, Elliott Delaye, Ashish Sirasao
Inline image preprocessing for convolution operations using a matrix multiplier on an integrated circuit

Patent number: 10984500

Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; bank address and control circuitry coupled to control inputs of the plurality of memory banks, the multiplexer circuitry, and the first plurality of registers; output control circuitry coupled to control inputs of the second plurality of registers; and a control state machine coupled to the bank address and control circuitry and the output control circuitry.

Type: Grant

Filed: September 19, 2019

Date of Patent: April 20, 2021

Assignee: XILINX, INC.

Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
Software-driven design optimization for fixed-point multiply-accumulate circuitry

Patent number: 10943039

Abstract: An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register.

Type: Grant

Filed: October 17, 2017

Date of Patent: March 9, 2021

Assignee: XILINX, INC.

Inventors: Ashish Sirasao, Elliott Delaye, Sean Settle, Zhao Ma, Ehsan Ghasemi, Xiao Teng, Aaron Ng, Jindrich Zejda
Dynamically structured single instruction, multiple data (SIMD) instructions

Patent number: 10824434

Abstract: Examples described herein relate to dynamically structured single instruction, multiple data (SIMD) instructions, and systems and circuits implementing such dynamically structured SIMD instructions. An example is a method for processing data. A first SIMD structure is determined by a processor. A characteristic of the first SIMD structure is altered by the processor to obtain a second SIMD structure. An indication of the second SIMD structure is communicated from the processor to a numerical engine. Data is packed by the numerical engine into an SIMD instruction according to the second SIMD structure. The SIMD instruction is transmitted from the numerical engine.

Type: Grant

Filed: November 29, 2018

Date of Patent: November 3, 2020

Assignee: XILINX, INC.

Inventors: Sean Settle, Ehsan Ghasemi, Ashish Sirasao, Ralph D. Wittig
Software-driven design optimization for mapping between floating-point and fixed-point multiply accumulators

Patent number: 10678509

Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler, coupled to the multiply accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.

Type: Grant

Filed: August 21, 2018

Date of Patent: June 9, 2020

Assignee: XILINX, INC.

Inventors: Sean Settle, Elliott Delaye, Aaron Ng, Ehsan Ghasemi, Ashish Sirasao, Xiao Teng, Jindrich Zejda
Circuit arrangements and methods for performing multiply-and-accumulate operations

Patent number: 10572225

Abstract: A and a request generator circuit is configured to read data elements of a three-dimensional (3-D) input feature map (IFM) from a memory and store a subset of the data elements in one of a plurality of N line buffers. Each line buffer is configured for storage of M data elements. A pixel iterator circuit is coupled to the line buffers and is configured to generate a sequence of addresses for reading the stored data elements from the line buffers based on a sequence of IFM height values and a sequence of IFM width values.

Type: Grant

Filed: September 26, 2018

Date of Patent: February 25, 2020

Assignee: XILINX, INC.

Inventors: Ehsan Ghasemi, Elliott Delaye, Ashish Sirasao, Sean Settle
Inline image preprocessing for convolution operations using a matrix multiplier on an integrated circuit

Patent number: 10460416

Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; and control circuitry configured to generate addresses for the plurality of memory banks, control the multiplexer circuitry to select among outputs of the plurality of memory banks, control the first plurality of registers to store outputs of the second plurality of multiplexers, and control the second plurality of registers to store outputs of the first plurality of registers.

Type: Grant

Filed: October 17, 2017

Date of Patent: October 29, 2019

Assignee: XILINX, INC.

Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
Circuit arrangements and methods for dividing a three-dimensional input feature map

Patent number: 10411709

Abstract: Disclosed circuits and methods include N line buffers. Each line buffer is configured for storage of M data elements of a three-dimensional (3-D) input feature map (IFM). A request generator circuit is coupled to the N line buffers and to a memory configured for storage of the 3-D IFM. The request generator circuit is divides the 3-D IFM into a plurality of IFM sub-volumes based on values of N, M, and dimensions of the 3-D IFM. The request generator circuit reads from the memory, data elements at addresses of an unprocessed one of the IFM sub-volumes and stores the data elements of the unprocessed one of the IFM sub-volumes in the N line buffers. In response to a completion signal, the request generator circuit repeats the reading of an unprocessed one of the IFM sub-volumes and storing the data elements in the N line buffers.

Type: Grant

Filed: July 25, 2018

Date of Patent: September 10, 2019

Assignee: XILINX, INC.

Inventors: Ehsan Ghasemi, Elliott Delaye, Ashish Sirasao
MACHINE LEARNING RUNTIME LIBRARY FOR NEURAL NETWORK ACCELERATION

Publication number: 20190114533

Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., a FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.

Type: Application

Filed: October 17, 2017

Publication date: April 18, 2019

Applicant: Xilinx, Inc.

Inventors: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
MULTI-LAYER NEURAL NETWORK PROCESSING BY A NEURAL NETWORK ACCELERATOR USING HOST COMMUNICATED MERGED WEIGHTS AND A PACKAGE OF PER-LAYER INSTRUCTIONS

Publication number: 20190114529

Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.

Type: Application

Filed: October 17, 2017

Publication date: April 18, 2019

Applicant: Xilinx, Inc.

Inventors: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao