Patents by Inventor Hadi Parandeh-Afshar

Hadi Parandeh-Afshar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11048509
    Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.
    Type: Grant
    Filed: June 5, 2018
    Date of Patent: June 29, 2021
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Patent number: 10846260
    Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.
    Type: Grant
    Filed: July 5, 2018
    Date of Patent: November 24, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Patent number: 10628162
    Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.
    Type: Grant
    Filed: June 19, 2018
    Date of Patent: April 21, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
  • Publication number: 20200065098
    Abstract: Providing efficient handling of branch divergence in vectorizable loops by vector-processor-based devices is disclosed. In some aspects, a vector-processor-based device provides a plurality of processing elements (PEs) coupled to a scheduler circuit comprising a clock cycle threshold and a mask register comprising a plurality of bits corresponding to a plurality of loop iterations of a vectorizable loop to be executed. The scheduler circuit initiates a first execution interval, during which loop iterations of the vectorizable loop are assigned to PEs for parallel execution. If a loop iteration's execution time exceeds the clock cycle threshold, the scheduler circuit sets a mask register bit corresponding to the loop iteration indicating that the loop iteration is incomplete, and defers its execution.
    Type: Application
    Filed: August 21, 2018
    Publication date: February 27, 2020
    Inventors: Hadi Parandeh Afshar, Eric Rotenberg, Gregory Michael Wright
  • Publication number: 20200012618
    Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.
    Type: Application
    Filed: July 5, 2018
    Publication date: January 9, 2020
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Publication number: 20190384606
    Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.
    Type: Application
    Filed: June 19, 2018
    Publication date: December 19, 2019
    Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
  • Publication number: 20190369994
    Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.
    Type: Application
    Filed: June 5, 2018
    Publication date: December 5, 2019
    Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
  • Patent number: 9231594
    Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is a tree structure comprised of a number of levels of cells with each cell consisting of a logic gate or the functional equivalent of a logic gate, one or more selectable inverters, and wherein the inputs of the logic block consist of the inputs to the logic gate or functional equivalent of the logic gate and inputs to the selectable inverters. The new logic blocks can map circuits more efficiently than LUTs, because they include multi-output blocks and can cover more logic depth due to the higher input and output bandwidth.
    Type: Grant
    Filed: August 13, 2014
    Date of Patent: January 5, 2016
    Assignee: ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE (EPFL)
    Inventors: Hadi Parandeh Afshar, David Novo Bruna, Paolo Ienne Lopez, Grace Zgheib
  • Publication number: 20140347096
    Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is a tree structure comprised of a number of levels of cells with each cell consisting of a logic gate or the functional equivalent of a logic gate, one or more selectable inverters, and wherein the inputs of the logic block consist of the inputs to the logic gate or functional equivalent of the logic gate and inputs to the selectable inverters. The new logic blocks can map circuits more efficiently than LUTs, because they include multi-output blocks and can cover more logic depth due to the higher input and output bandwidth.
    Type: Application
    Filed: August 13, 2014
    Publication date: November 27, 2014
    Inventors: Hadi Parandeh Afshar, David Novo Bruna, Paolo Ienne Lopez, Grace Zgheib
  • Patent number: 8836368
    Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is an AND-Inverter Cone (AIC), which is a binary tree including one or more AND gates with a programmable conditional inversion and a number of intermediary outputs. Compared to LUTs, AICs are richer in terms of input and output bandwidth, because the area of the AICs grows only linearly with the number of inputs. Also, the delay grows only logarithmically with the input count. The new logic blocks can map circuits more efficiently than LUTs, because the AICs are multi-output blocks and can cover more logic depth due to the higher input bandwidth.
    Type: Grant
    Filed: December 21, 2011
    Date of Patent: September 16, 2014
    Assignee: Ecole Polytechnique Federale de Lausanne (EPFL)
    Inventors: Hadi Parandeh Afshar, David Novo Bruña, Paolo Ienne Lopez
  • Patent number: 8667046
    Abstract: A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).
    Type: Grant
    Filed: February 20, 2009
    Date of Patent: March 4, 2014
    Assignee: Ecole Polytechnique Federale de Lausanne/Service des Relations Industrielles
    Inventors: Philip Brisk, Alessandro Cevrero, Frank K. Gurkaynak, Paolo Ienne Lopez, Hadi Parandeh-Afshar
  • Publication number: 20130162292
    Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is an AND-Inverter Cone (AIC), which is a binary tree including one or more AND gates with a programmable conditional inversion and a number of intermediary outputs. Compared to LUTs, AICs are richer in terms of input and output bandwidth, because the area of the AICs grows only linearly with the number of inputs. Also, the delay grows only logarithmically with the input count. The new logic blocks can map circuits more efficiently than LUTs, because the AICs are multi-output blocks and can cover more logic depth due to the higher input bandwidth.
    Type: Application
    Filed: December 21, 2011
    Publication date: June 27, 2013
    Applicant: ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE (EPFL)
    Inventors: Hadi Parandeh Afshar, David Novo Bruña, Paolo Ienne Lopez
  • Publication number: 20090216826
    Abstract: A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).
    Type: Application
    Filed: February 20, 2009
    Publication date: August 27, 2009
    Applicant: Ecole Polytechnique Federale de Lausanne/ Service des Relations Industrielles(SRI)
    Inventors: Philip Brisk, Alessandro Cevrero, Frank K. Gurkaynak, Paolo Ienne Lopez, Hadi Parandeh-Afshar