Patents by Inventor Ehsan Ghasemi

Ehsan Ghasemi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10984500
    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; bank address and control circuitry coupled to control inputs of the plurality of memory banks, the multiplexer circuitry, and the first plurality of registers; output control circuitry coupled to control inputs of the second plurality of registers; and a control state machine coupled to the bank address and control circuitry and the output control circuitry.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: April 20, 2021
    Assignee: XILINX, INC.
    Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
  • Patent number: 10943039
    Abstract: An example multiply accumulate (MACC) circuit includes: a multiply-accumulator having an accumulator output register; a quantizer, coupled to the multiply accumulator; and a control circuit coupled to the multiply-accumulator and the quantizer, the control circuit configured to provide control data to the quantizer, the control data indicative of a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: March 9, 2021
    Assignee: XILINX, INC.
    Inventors: Ashish Sirasao, Elliott Delaye, Sean Settle, Zhao Ma, Ehsan Ghasemi, Xiao Teng, Aaron Ng, Jindrich Zejda
  • Patent number: 10824434
    Abstract: Examples described herein relate to dynamically structured single instruction, multiple data (SIMD) instructions, and systems and circuits implementing such dynamically structured SIMD instructions. An example is a method for processing data. A first SIMD structure is determined by a processor. A characteristic of the first SIMD structure is altered by the processor to obtain a second SIMD structure. An indication of the second SIMD structure is communicated from the processor to a numerical engine. Data is packed by the numerical engine into an SIMD instruction according to the second SIMD structure. The SIMD instruction is transmitted from the numerical engine.
    Type: Grant
    Filed: November 29, 2018
    Date of Patent: November 3, 2020
    Assignee: XILINX, INC.
    Inventors: Sean Settle, Ehsan Ghasemi, Ashish Sirasao, Ralph D. Wittig
  • Patent number: 10678509
    Abstract: An example multiply accumulate (MACC) circuit includes a multiply-accumulator having an accumulator output register, a scaler, coupled to the multiply accumulator, and a control circuit coupled to the multiply-accumulator and the scaler. The control circuit is configured to provide control data to the scaler, the control data indicative of: a most-significant bit (MSB) to least significant bit (LSB) range for selecting bit indices from the accumulator output register for implementing a first right shift; a multiplier; and a second right shift.
    Type: Grant
    Filed: August 21, 2018
    Date of Patent: June 9, 2020
    Assignee: XILINX, INC.
    Inventors: Sean Settle, Elliott Delaye, Aaron Ng, Ehsan Ghasemi, Ashish Sirasao, Xiao Teng, Jindrich Zejda
  • Patent number: 10572225
    Abstract: A request generator circuit is configured to read data elements of a three-dimensional (3-D) input feature map (IFM) from a memory and store a subset of the data elements in one of a plurality of N line buffers. Each line buffer is configured for storage of M data elements. A pixel iterator circuit is coupled to the line buffers and is configured to generate a sequence of addresses for reading the stored data elements from the line buffers based on a sequence of IFM height values and a sequence of IFM width values.
    Type: Grant
    Filed: September 26, 2018
    Date of Patent: February 25, 2020
    Assignee: XILINX, INC.
    Inventors: Ehsan Ghasemi, Elliott Delaye, Ashish Sirasao, Sean Settle
  • Patent number: 10460416
    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a plurality of memory banks configured to store the image data; multiplexer circuitry coupled to the memory banks; a first plurality of registers coupled to the multiplexer circuitry; a second plurality of registers coupled to the first plurality of registers, outputs of the second plurality of registers configured to provide the plurality of streams of image samples; and control circuitry configured to generate addresses for the plurality of memory banks, control the multiplexer circuitry to select among outputs of the plurality of memory banks, control the first plurality of registers to store outputs of the second plurality of multiplexers, and control the second plurality of registers to store outputs of the first plurality of registers.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: October 29, 2019
    Assignee: XILINX, INC.
    Inventors: Ashish Sirasao, Elliott Delaye, Aaron Ng, Ehsan Ghasemi
  • Patent number: 10411709
    Abstract: Disclosed circuits and methods include N line buffers. Each line buffer is configured for storage of M data elements of a three-dimensional (3-D) input feature map (IFM). A request generator circuit is coupled to the N line buffers and to a memory configured for storage of the 3-D IFM. The request generator circuit divides the 3-D IFM into a plurality of IFM sub-volumes based on values of N, M, and dimensions of the 3-D IFM. The request generator circuit reads from the memory data elements at addresses of an unprocessed one of the IFM sub-volumes and stores the data elements of the unprocessed one of the IFM sub-volumes in the N line buffers. In response to a completion signal, the request generator circuit repeats the reading of an unprocessed one of the IFM sub-volumes and storing the data elements in the N line buffers.
    Type: Grant
    Filed: July 25, 2018
    Date of Patent: September 10, 2019
    Assignee: XILINX, INC.
    Inventors: Ehsan Ghasemi, Elliott Delaye, Ashish Sirasao
  • Publication number: 20190114533
    Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel which can increase the utilization of the neural network accelerator on the hardware system.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
  • Publication number: 20190114529
    Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao
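
The abstract of publication 20190114529 above describes a host writing weight matrices to shared memory and assembling per-layer instructions, each carrying the offset of its weights, into one instruction package. The Python sketch below is only an illustration of that general idea, not the method claimed in the application: the `LayerInstruction` fields, the `pack_weights`, `assemble_package`, and `read_package` helpers, and the little-endian binary layout are all hypothetical choices made for this example.

```python
import struct
from dataclasses import dataclass


@dataclass
class LayerInstruction:
    """Hypothetical per-layer instruction; the field names are assumptions."""
    layer_type: int     # illustrative opcode, e.g. 0 = convolution
    weight_offset: int  # byte offset of this layer's weights in shared memory
    weight_size: int    # size of this layer's weights in bytes


def pack_weights(weight_blobs):
    """Concatenate per-layer weight blobs and record where each one starts."""
    shared = bytearray()
    offsets = []
    for blob in weight_blobs:
        offsets.append(len(shared))
        shared += blob
    return bytes(shared), offsets


def assemble_package(instructions):
    """Serialize instructions as a uint32 count followed by three uint32 fields each."""
    package = struct.pack("<I", len(instructions))
    for ins in instructions:
        package += struct.pack(
            "<III", ins.layer_type, ins.weight_offset, ins.weight_size
        )
    return package


def read_package(package):
    """Accelerator-side view: decode the package back into instructions."""
    (count,) = struct.unpack_from("<I", package, 0)
    return [
        LayerInstruction(*struct.unpack_from("<III", package, 4 + 12 * k))
        for k in range(count)
    ]


# Three layers with made-up weight contents.
weights = [b"\x01" * 8, b"\x02" * 4, b"\x03" * 16]
shared_memory, offsets = pack_weights(weights)
instructions = [
    LayerInstruction(layer_type=k, weight_offset=off, weight_size=len(w))
    for k, (off, w) in enumerate(zip(offsets, weights))
]
package = assemble_package(instructions)
decoded = read_package(package)
```

In this sketch the reader side can locate each layer's weights with `shared_memory[ins.weight_offset : ins.weight_offset + ins.weight_size]`, so a single package covers every layer without a separate host round trip per layer.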