Patents by Inventor Elliott Delaye

Elliott Delaye has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10354733
    Abstract: Methods and apparatus are described for partitioning and reordering block-based matrix multiplications for high-speed data streaming in general matrix multiplication (GEMM), which may be implemented by a programmable integrated circuit (IC). By preloading and hierarchically caching the blocks, examples of the present disclosure reduce the double data rate (DDR) memory intake bandwidth for software-defined GEMM accelerators.
    Type: Grant
    Filed: October 17, 2017
    Date of Patent: July 16, 2019
    Assignee: XILINX, INC.
    Inventors: Jindrich Zejda, Elliott Delaye, Ashish Sirasao, Yongjun Wu, Aaron Ng
  • Patent number: 10303833
    Abstract: Parallelizing operations for implementing a circuit design can include: dividing, using a processor, the circuit design into a plurality of partitions, wherein each partition is stored as a separate file; for each partition, generating, using the processor, a timing arc file specifying boundary delays for the partition; and generating, using the processor, a partition design file specifying interfaces of the partitions. Using the processor, a plurality of processes executing in parallel can be initiated. Each process is adapted to operate on a selected partition using the partition design file and the timing arc files for the other partitions to generate an updated file for the selected partition.
    Type: Grant
    Filed: February 9, 2017
    Date of Patent: May 28, 2019
    Assignee: XILINX, INC.
    Inventors: Aman Gayasen, Surya Pratik Saha, Elliott Delaye, Shangzhi Sun, Ashish Sirasao
  • Publication number: 20190114538
    Abstract: In disclosed approaches of neural network processing, a host computer system copies an input data matrix from host memory to a shared memory for performing neural network operations of a first layer of a neural network by a neural network accelerator. The host instructs the neural network accelerator to perform neural network operations of each layer of the neural network beginning with the input data matrix. The neural network accelerator performs neural network operations of each layer in response to the instruction from the host. The host waits until the neural network accelerator signals completion of performing neural network operations of layer i before instructing the neural network accelerator to commence performing neural network operations of layer i+1, for i ≥ 1. The host instructs the neural network accelerator to use a results data matrix in the shared memory from layer i as an input data matrix for layer i+1, for i ≥ 1.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Aaron Ng, Elliott Delaye, Jindrich Zejda, Ashish Sirasao
  • Publication number: 20190114548
    Abstract: Embodiments herein describe techniques for static scheduling a neural network implemented in a massively parallel hardware system. The neural network may be scheduled using three different scheduling levels referred to herein as an upper level, an intermediate level, and a lower level. In one embodiment, the upper level includes a hardware or software model of the layers in the neural network that establishes a sequential order of functions that operate concurrently in the hardware system. In the intermediate level, identical processes in the functions defined in the upper level are connected to form a systolic array or mesh and balanced data flow channels are used to minimize latency. In the lower level, a compiler can assign the operations performed by the processing elements in the systolic array to different portions of the hardware system to provide a static schedule for the neural network.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Yongjun Wu, Jindrich Zejda, Elliott Delaye, Ashish Sirasao
  • Publication number: 20190114535
    Abstract: A disclosed neural network processing system includes a host computer system, RAMs coupled to the host computer system, and neural network accelerators coupled to the RAMs, respectively. The host computer system is configured with software that when executed causes the host computer system to write input data and work requests to the RAMs. Each work request specifies a subset of neural network operations to perform and memory locations in a RAM of the input data and parameters. A graph of dependencies among neural network operations is built and additional dependencies added. The operations are partitioned into coarse grain tasks and fine grain subtasks for optimal scheduling for parallel execution. The subtasks are scheduled to accelerator kernels of matching capabilities. Each neural network accelerator is configured to read a work request from the respective RAM and perform the subset of neural network operations on the input data using the parameters.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Ashish Sirasao
  • Publication number: 20190114529
    Abstract: In the disclosed methods and systems for processing in a neural network system, a host computer system writes a plurality of weight matrices associated with a plurality of layers of a neural network to a memory shared with a neural network accelerator. The host computer system further assembles a plurality of per-layer instructions into an instruction package. Each per-layer instruction specifies processing of a respective layer of the plurality of layers of the neural network, and respective offsets of weight matrices in a shared memory. The host computer system writes input data and the instruction package to the shared memory. The neural network accelerator reads the instruction package from the shared memory and processes the plurality of per-layer instructions of the instruction package.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Aaron Ng, Elliott Delaye, Ehsan Ghasemi, Xiao Teng, Jindrich Zejda, Yongjun Wu, Sean Settle, Ashish Sirasao
  • Publication number: 20190114533
    Abstract: Embodiments herein describe techniques for interfacing a neural network application with a neural network accelerator using a library. The neural network application may execute on a host computing system while the neural network accelerator executes on a massively parallel hardware system, e.g., an FPGA. The library operates a pipeline for submitting the tasks received from the neural network application to the neural network accelerator. In one embodiment, the pipeline includes a pre-processing stage, an FPGA execution stage, and a post-processing stage which each correspond to different threads. When receiving a task from the neural network application, the library generates a packet that includes the information required for the different stages in the pipeline to perform the tasks. Because the stages correspond to different threads, the library can process multiple packets in parallel, which can increase the utilization of the neural network accelerator on the hardware system.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Aaron Ng, Jindrich Zejda, Elliott Delaye, Xiao Teng, Sonal Santan, Soren T. Soe, Ashish Sirasao, Ehsan Ghasemi, Sean Settle
  • Publication number: 20190114499
    Abstract: An example preprocessor circuit for formatting image data into a plurality of streams of image samples includes: a first buffer configured to store a plurality of rows of the image data and output a row of the plurality of rows; a second buffer, coupled to the first buffer, including a plurality of storage locations to store a respective plurality of image samples of the row output by the first buffer; a plurality of shift registers; an interconnect network including a plurality of connections, each connection coupling a respective one of the plurality of shift registers to more than one of the plurality of storage locations, one or more of the plurality of storage locations being coupled to more than one of the plurality of connections; and a control circuit configured to load the plurality of shift registers with the plurality of image samples based on the plurality of connections and shift the plurality of shift registers to output the plurality of streams of image samples.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Elliott Delaye, Ashish Sirasao, Aaron Ng, Yongjun Wu, Jindrich Zejda
  • Publication number: 20190114534
    Abstract: At least one neural network accelerator performs operations of a first subset of layers of a neural network on an input data set, generates an intermediate data set, and stores the intermediate data set in a shared memory queue in a shared memory. A first processor element of a host computer system provides input data to the neural network accelerator and signals the neural network accelerator to perform the operations of the first subset of layers of the neural network on the input data set. A second processor element of the host computer system reads the intermediate data set from the shared memory queue, performs operations of a second subset of layers of the neural network on the intermediate data set, and generates an output data set while the neural network accelerator is performing the operations of the first subset of layers of the neural network on another input data set.
    Type: Application
    Filed: October 17, 2017
    Publication date: April 18, 2019
    Applicant: Xilinx, Inc.
    Inventors: Xiao Teng, Aaron Ng, Ashish Sirasao, Elliott Delaye
  • Patent number: 9460253
    Abstract: In an example, a method of processing a circuit design includes: determining a first partition in a description of the circuit design having a hierarchy of design objects, the first partition including at least one design object in the hierarchy of design objects; generating a signature for the first partition; querying a database with the signature of the first partition to identify a plurality of predefined implementations of the first partition; and generating an implementation of the circuit design for a target integrated circuit (IC) based on a selected predefined implementation of the plurality of predefined implementations for the first partition.
    Type: Grant
    Filed: September 10, 2014
    Date of Patent: October 4, 2016
    Assignee: XILINX, INC.
    Inventors: Elliott Delaye, Ashish Sirasao, Krishna Garlapati, Bing Tian
  • Patent number: 9268891
    Abstract: Compiling a circuit design includes receiving the circuit design specified in a hardware description language, detecting, using a processor, a slice of a vector within the circuit design, and determining that the slice is defined by a left slice boundary variable and a right slice boundary variable. A hardware description is generated from the circuit design using the processor by including a first shifter circuit receiving the left slice boundary variable as an input signal, a second shifter circuit receiving the right slice boundary variable as an input signal, a control signal generator coupled to the first and second shifter circuits, and an output stage. The output stage, responsive to a control signal dependent upon an output from the first shifter circuit and an output from the second shifter circuit, generates an output signal including newly received values from a data signal only for bit locations of the output signal corresponding to the slice.
    Type: Grant
    Filed: November 6, 2014
    Date of Patent: February 23, 2016
    Assignee: XILINX, INC.
    Inventors: Krishna Garlapati, Elliott Delaye, Ashish Sirasao, Bing Tian
  • Patent number: 9235498
    Abstract: A circuit for enabling a modification of an input data stream is described. The circuit comprises a first plurality of registers coupled in series; an input register of the first plurality of registers coupled to receive the input data stream; an output register of the first plurality of registers positioned at an end of the first plurality of registers; and a control circuit enabling a data value which is independent of the input data stream to be generated as an output of the circuit at a predetermined time.
    Type: Grant
    Filed: June 3, 2013
    Date of Patent: January 12, 2016
    Assignee: XILINX, INC.
    Inventors: Jay Southard, Krishna Garlapati, Elliott Delaye, Ashish Sirasao, Bing Tian
  • Patent number: 8938700
    Abstract: Data-driven processing of a circuit design includes converting each pattern of one or more input patterns from a first format into a second format. Each pattern identifies one or more inputs and one or more outputs and specifies each function that generates each of the one or more outputs from the one or more inputs. Each pattern of the second format is stored in a database. An input circuit design is searched for circuit design elements that match patterns in the database. Data indicative of each pattern in the database that matches a circuit design element is output.
    Type: Grant
    Filed: February 7, 2013
    Date of Patent: January 20, 2015
    Assignee: Xilinx, Inc.
    Inventors: Elliott Delaye, Alireza S. Kaviani, Ashish Sirasao, Yinyi Wang
  • Patent number: 7836113
    Abstract: A dedicated logic cell in a programmable logic structure is described that comprises the following primary components: a configurable logic function or look-up table (LL), a dedicated logic function (DL), a sequential logic function (LS), and a control logic function (LC). In this illustration, the dedicated logic cell comprises two configurable logic functions, two sequential logic functions, a dedicated logic function, and a control logic function. In a first embodiment, the dedicated logic cell is constructed with a combination of configurable logic functions that are coupled to a dedicated logic function in order to perform a four 2-input function, an AND function, an OR function, or an XOR function. In a second embodiment, the dedicated logic cell is constructed with a combination of configurable logic functions that are coupled to a dedicated logic function in order to perform a four 2-to-1 multiplexer function.
    Type: Grant
    Filed: October 9, 2006
    Date of Patent: November 16, 2010
    Assignee: Agate Logic, Inc.
    Inventors: Ravi Sunkavalli, Hare K. Verma, Manoj Gunwani, Elliott Delaye
  • Patent number: 7439768
    Abstract: A dedicated logic cell in a programmable logic structure is described that comprises the following primary components: a configurable logic function or look-up table (LL), a dedicated logic function (DL), a sequential logic function (LS), and a control logic function (LC). In this illustration, the dedicated logic cell comprises two configurable logic functions, two sequential logic functions, a dedicated logic function, and a control logic function. In some embodiments, the configurable logic function comprises a plurality of look-up tables coupled to a multiplexer with configurable bits that is capable of performing a four 4-input look-up table, one 6-input look-up table, or a 4-to-1 multiplexer. In the first function that operates as the four 4-input look-up table, the dedicated logic cell has four look-up tables for receiving four inputs respectively.
    Type: Grant
    Filed: October 9, 2006
    Date of Patent: October 21, 2008
    Assignee: CSwitch Corporation
    Inventors: Ravi Sunkavalli, Hare K. Verma, Manoj Gunwani, Elliott Delaye
  • Patent number: 7428722
    Abstract: A logic design apparatus and method provide serial multiplexer chains in a programmable logic fabric, in which each element in the chain either selects the output of a block or passes the output from an earlier element of the chain. The select line is a decoder structure or an output from a configurable function generator that is configured at power-on to create the correct selection. Using such a structure, larger multiplexers, including priority multiplexers, tristate buses, or larger look-up tables (LUTs), can be created. These novel structures can implement priority, non-priority, or tristate multiplexers.
    Type: Grant
    Filed: January 15, 2008
    Date of Patent: September 23, 2008
    Assignee: CSwitch Corporation
    Inventors: Ravi Sunkavalli, Hare Krishna Verma, Sudip Nag, Elliott Delaye
  • Patent number: 7414431
    Abstract: A dedicated logic cell in a programmable logic structure is described that comprises the following primary components: a configurable logic function or look-up table (LL), a dedicated logic function (DL), a sequential logic function (LS), and a control logic function (LC). In this illustration, the dedicated logic cell comprises two configurable logic functions, two sequential logic functions, a dedicated logic function, and a control logic function. In a first embodiment, the dedicated logic cell is constructed with a combination of configurable logic functions that are coupled to a dedicated logic function in order to perform a four 2-input function, an AND function, an OR function, or an XOR function. In a second embodiment, the dedicated logic cell is constructed with a combination of configurable logic functions that are coupled to a dedicated logic function in order to perform a four 2-to-1 multiplexer function.
    Type: Grant
    Filed: October 9, 2006
    Date of Patent: August 19, 2008
    Assignee: CSwitch Corporation
    Inventors: Hare K. Verma, Ravi Sunkavalli, Manoj Gunwani, Elliott Delaye
  • Publication number: 20080129334
    Abstract: A logic design apparatus and method provide serial multiplexer chains in a programmable logic fabric, in which each element in the chain either selects the output of a block or passes the output from an earlier element of the chain. The select line is a decoder structure or an output from a configurable function generator that is configured at power-on to create the correct selection. Using such a structure, larger multiplexers, including priority multiplexers, tristate buses, or larger look-up tables (LUTs), can be created. These novel structures can implement priority, non-priority, or tristate multiplexers.
    Type: Application
    Filed: January 15, 2008
    Publication date: June 5, 2008
    Applicant: CSwitch Corporation
    Inventors: Ravi Sunkavalli, Hare Krishna Verma, Sudip Nag, Elliott Delaye
  • Patent number: 7358761
    Abstract: A logic design apparatus and method provide serial multiplexer chains in a programmable logic fabric, in which each element in the chain either selects the output of a block or passes the output from an earlier element of the chain. The select line is a decoder structure or an output from a configurable function generator that is configured at power-on to create the correct selection. Using such a structure, larger multiplexers, including priority multiplexers, tristate buses, or larger look-up tables (LUTs), can be created. These novel structures can implement priority, non-priority, or tristate multiplexers.
    Type: Grant
    Filed: January 21, 2005
    Date of Patent: April 15, 2008
    Assignee: CSwitch Corporation
    Inventors: Ravi Sunkavalli, Hare Krishna Verma, Sudip Nag, Elliott Delaye
  • Patent number: 7358765
    Abstract: A dedicated logic cell in a programmable logic structure is described that comprises the following primary components: a configurable logic function or look-up table (LL), a dedicated logic function (DL), a sequential logic function (LS), and a control logic function (LC). In this illustration, the dedicated logic cell comprises two configurable logic functions, two sequential logic functions, a dedicated logic function, and a control logic function. In a first embodiment, the dedicated logic cell is constructed with a combination of configurable logic functions that are coupled to a dedicated logic function in order to perform a four 2-input function, an AND function, an OR function, or an XOR function. In a second embodiment, the dedicated logic cell is constructed with a combination of configurable logic functions that are coupled to a dedicated logic function in order to perform a four 2-to-1 multiplexer function.
    Type: Grant
    Filed: February 23, 2005
    Date of Patent: April 15, 2008
    Assignee: CSwitch Corporation
    Inventors: Hare K. Verma, Ravi Sunkavalli, Manoj Gunwani, Elliott Delaye
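
As an illustration of the serial multiplexer chain described in the abstracts for patents 7358761 and 7428722, the following Python sketch models a chain in which each element either selects the output of its own block or passes along the value from the earlier element. The function name `mux_chain`, its parameters, and the `default` value fed into the first element are assumptions made for this example and do not come from the patents; it is a behavioral model only, not the patented circuit.

```python
from typing import Sequence


def mux_chain(block_outputs: Sequence[int],
              selects: Sequence[bool],
              default: int = 0) -> int:
    """Behavioral model of a serial chain of 2-to-1 multiplexer elements.

    Element k forwards block_outputs[k] when selects[k] is asserted;
    otherwise it passes on the value received from element k-1.
    `default` (hypothetical) is the value entering the first element.
    """
    if len(block_outputs) != len(selects):
        raise ValueError("each chain element needs exactly one select line")

    carried = default
    for block_out, sel in zip(block_outputs, selects):
        # Select this block's output, or pass the earlier element's value.
        carried = block_out if sel else carried
    return carried


if __name__ == "__main__":
    # One-hot selects: the chain acts as a plain 3-to-1 multiplexer.
    print(mux_chain([10, 20, 30], [False, True, False]))
    # Several selects asserted: the element latest in the chain wins.
    print(mux_chain([10, 20, 30], [True, True, False]))
```

With one-hot select lines the chain behaves as an ordinary multiplexer. When several selects are asserted, the element latest in the chain determines the result, which gives priority behavior in this model.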