Patents by Inventor Hadi Parandeh-Afshar

Hadi Parandeh-Afshar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices

Patent number: 11048509

Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.

Type: Grant

Filed: June 5, 2018

Date of Patent: June 29, 2021

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices

Patent number: 10846260

Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.

Type: Grant

Filed: July 5, 2018

Date of Patent: November 24, 2020

Assignee: Qualcomm Incorporated

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices

Patent number: 10628162

Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.

Type: Grant

Filed: June 19, 2018

Date of Patent: April 21, 2020

Assignee: Qualcomm Incorporated

Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
PROVIDING EFFICIENT HANDLING OF BRANCH DIVERGENCE IN VECTORIZABLE LOOPS BY VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20200065098

Abstract: Providing efficient handling of branch divergence in vectorizable loops by vector-processor-based devices is disclosed. In some aspects, a vector-processor-based device provides a plurality of processing elements (PEs) coupled to a scheduler circuit comprising a clock cycle threshold and a mask register comprising a plurality of bits corresponding to a plurality of loop iterations of a vectorizable loop to be executed. The scheduler circuit initiates a first execution interval, during which loop iterations of the vectorizable loop are assigned to PEs for parallel execution. If a loop iteration's execution time exceeds the clock cycle threshold, the scheduler circuit sets a mask register bit corresponding to the loop iteration indicating that the loop iteration is incomplete, and defers its execution.

Type: Application

Filed: August 21, 2018

Publication date: February 27, 2020

Inventors: Hadi Parandeh Afshar, Eric Rotenberg, Gregory Michael Wright
PROVIDING RECONFIGURABLE FUSION OF PROCESSING ELEMENTS (PEs) IN VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20200012618

Abstract: Providing reconfigurable fusion of processing elements (PEs) in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device provides a vector processor including a plurality of PEs and a decode/control circuit. The decode/control circuit receives an instruction block containing a vectorizable loop comprising a loop body. The decode/control circuit determines how many PEs of the plurality of PEs are required to execute the loop body, and reconfigures the plurality of PEs into one or more fused PEs, each including the determined number of PEs required to execute the loop body. The plurality of PEs, reconfigured into one or more fused PEs, then executes one or more loop iterations of the loop body. Some aspects further include a PE communications link interconnecting the plurality of PEs, to enable communications between PEs of a fused PE and communications of inter-iteration data dependencies between PEs without requiring vector register file access operations.

Type: Application

Filed: July 5, 2018

Publication date: January 9, 2020

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
ENABLING PARALLEL MEMORY ACCESSES BY PROVIDING EXPLICIT AFFINE INSTRUCTIONS IN VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20190384606

Abstract: Enabling parallel memory accesses by providing explicit affine instructions in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device implementing a block-based dataflow instruction set architecture (ISA) includes a decoder circuit configured to provide an affine instruction that specifies a base parameter indicating a base value B, a stride parameter indicating a stride interval value S, and a count parameter indicating a count value C. The decoder circuit of the vector-processor-based device decodes the affine instruction, and generates an output stream comprising one or more output values, wherein a count of the output values of the output stream equals the count value C. Using an index X where 0?X<C, each Xth output value in the output stream is generated as a sum of the base value B and a product of the stride interval value S and the index X.

Type: Application

Filed: June 19, 2018

Publication date: December 19, 2019

Inventors: Amrit Panda, Eric Rotenberg, Hadi Parandeh Afshar, Gregory Michael Wright
PROVIDING MULTI-ELEMENT MULTI-VECTOR (MEMV) REGISTER FILE ACCESS IN VECTOR-PROCESSOR-BASED DEVICES

Publication number: 20190369994

Abstract: Providing multi-element multi-vector (MEMV) register file access in vector-processor-based devices is disclosed. In this regard, a vector-processor-based device includes a vector processor comprising multiple processing elements (PEs) communicatively coupled via a corresponding plurality of channels to a vector register file comprising a plurality of memory banks. The vector processor provides a direct memory access (DMA) controller that is configured to receive a plurality of vectors that each comprise a plurality of vector elements representing operands for processing a loop iteration. The DMA controller arranges the vectors in the vector register file such that, for each group of vectors to be accessed in parallel, vector elements for each vector are stored consecutively, but corresponding vector elements of consecutive vectors are stored in different memory banks of the vector register file.

Type: Application

Filed: June 5, 2018

Publication date: December 5, 2019

Inventors: Hadi Parandeh Afshar, Amrit Panda, Eric Rotenberg, Gregory Michael Wright
Non-LUT field-programmable gate arrays

Patent number: 9231594

Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is a tree structure comprised of a number of levels of cells with each cell consisting of a logic gate or the functional equivalent of a logic gate, one or more selectable inverters, and wherein the inputs of the logic block consist of the inputs to the logic gate or functional equivalent of the logic gate and inputs to the selectable inverters. The new logic blocks can map circuits more efficiently than LUTs, because they include multi-output blocks and can cover more logic depth due to the higher input and output bandwidth.

Type: Grant

Filed: August 13, 2014

Date of Patent: January 5, 2016

Assignee: ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE (EPFL)

Inventors: Hadi Parandeh Afshar, David Novo Bruna, Paolo Ienne Lopez, Grace Zgheib
NON-LUT FIELD-PROGRAMMABLE GATE ARRAYS

Publication number: 20140347096

Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is a tree structure comprised of a number of levels of cells with each cell consisting of a logic gate or the functional equivalent of a logic gate, one or more selectable inverters, and wherein the inputs of the logic block consist of the inputs to the logic gate or functional equivalent of the logic gate and inputs to the selectable inverters. The new logic blocks can map circuits more efficiently than LUTs, because they include multi-output blocks and can cover more logic depth due to the higher input and output bandwidth.

Type: Application

Filed: August 13, 2014

Publication date: November 27, 2014

Inventors: Hadi Parandeh Afshar, David Novo Bruna, Paolo Ienne Lopez, Grace Zgheib
Non-LUT field-programmable gate arrays

Patent number: 8836368

Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is an AND-Inverter Cone (AIC), which is a binary tree including one or more AND gates with a programmable conditional inversion and a number of intermediary outputs. Compared to LUTs, AICs are richer in terms of input and output bandwidth, because the area of the AICs grows only linearly with the number of inputs. Also, the delay grows only logarithmically with the input count. The new logic blocks can map circuits more efficiently than LUTs, because the AICs are multi-output blocks and can cover more logic depth due to the higher input bandwidth.

Type: Grant

Filed: December 21, 2011

Date of Patent: September 16, 2014

Assignee: Ecole Polytechnique Federale de Lausanne (EPFL)

Inventors: Hadi Parandeh Afshar, David Novo Bruña, Paolo Ienne Lopez
Generalized programmable counter arrays

Patent number: 8667046

Abstract: A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).

Type: Grant

Filed: February 20, 2009

Date of Patent: March 4, 2014

Assignee: Ecole Polytechnique Federale de Lausanne/Service des Relations Industrielles

Inventors: Philip Brisk, Alessandro Cevrero, Frank K. Gurkaynak, Paolo Ienne Lopez, Hadi Parandeh-Afshar
NON-LUT FIELD-PROGRAMMABLE GATE ARRAYS

Publication number: 20130162292

Abstract: New logic blocks capable of replacing the use of Look-Up Tables (LUTs) in integrated circuits, such as Field-Programmable Gate Arrays (FPGAs), are disclosed herein. In one embodiment, the new logic block is an AND-Inverter Cone (AIC), which is a binary tree including one or more AND gates with a programmable conditional inversion and a number of intermediary outputs. Compared to LUTs, AICs are richer in terms of input and output bandwidth, because the area of the AICs grows only linearly with the number of inputs. Also, the delay grows only logarithmically with the input count. The new logic blocks can map circuits more efficiently than LUTs, because the AICs are multi-output blocks and can cover more logic depth due to the higher input bandwidth.

Type: Application

Filed: December 21, 2011

Publication date: June 27, 2013

Applicant: ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE (EPFL)

Inventors: Hadi Parandeh Afshar, David Novo Bruña, Paolo Ienne Lopez
GENERALIZED PROGRAMMABLE COUNTER ARRAYS

Publication number: 20090216826

Abstract: A Generalized Programmable Counter Array (GPCA) is a reconfigurable multi-operand adder, which can be reprogrammed to sum a plurality of operands of arbitrary size. The GPCA is configured to compress the input words down to two operands using parallel counters. Resulting operands are then summed using a standard Ripple Carry Adder to produce the final result. The GPCA consists of a linear arrangement of identical compressor slices (CSlice).

Type: Application

Filed: February 20, 2009

Publication date: August 27, 2009

Applicant: Ecole Polytechnique Federale de Lausanne/ Service des Relations Industrielles(SRI)

Inventors: Philip Brisk, Alessandro Cevrero, Frank K. Gurkaynak, Paolo Ienne Lopez, Hadi Parandeh-Afshar