Patents by Inventor Asaf Hargil

Asaf Hargil has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Two-dimensional multi-layer convolution for deep learning

Patent number: 11341210

Abstract: This application relates to a multi-layer convolution operation. The multi-layer convolution operation is optimized for a vector processing unit having a number of data paths configured to operate on vector operands containing a number of elements processed in parallel by the data paths. The convolution operation specifies a convolution kernel utilized to filter a multi-channel input and generate a multi-channel output of the convolution operation. A number of threads are generated to process blocks of the multi-channel output, each block comprising a set of windows of a number of channels of the multi-channel output. Each window is a portion of the array of elements in a single layer of the multi-channel output. Each thread processes a block in accordance with an arbitrary width of the block, processing a set of instructions for each sub-block of the block having a well-defined width, the instructions optimized for the vector processing unit.

Type: Grant

Filed: May 29, 2019

Date of Patent: May 24, 2022

Inventors: Asaf Hargil, Ali Sazegari
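The sub-block idea in the abstract above can be illustrated with a minimal sketch. It assumes a vector width of four lanes, a "valid" (unpadded) convolution without kernel flipping, and plain Python lists in place of vector registers; the names `VECTOR_WIDTH` and `conv2d_multichannel` are hypothetical, and the per-block threading described in the abstract is not modeled.

```python
VECTOR_WIDTH = 4  # assumed lane count; the abstract does not specify one


def conv2d_multichannel(inp, kernels, vector_width=VECTOR_WIDTH):
    """Multi-channel 2D convolution (no padding, no kernel flip).

    inp[c][y][x] is the input, kernels[o][c][ky][kx] the filters,
    and the result is out[o][y][x].
    """
    in_ch = len(inp)
    height, width = len(inp[0]), len(inp[0][0])
    k_h, k_w = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = height - k_h + 1, width - k_w + 1
    out = [[[0.0] * out_w for _ in range(out_h)] for _ in kernels]

    for o, kernel in enumerate(kernels):
        for y in range(out_h):
            # Walk an output row of arbitrary width in sub-blocks of at
            # most vector_width elements; the last one may be narrower.
            for x0 in range(0, out_w, vector_width):
                lanes = min(vector_width, out_w - x0)
                acc = [0.0] * lanes  # stands in for one vector register
                for c in range(in_ch):
                    for ky in range(k_h):
                        row = inp[c][y + ky]
                        for kx in range(k_w):
                            w = kernel[c][ky][kx]
                            for lane in range(lanes):
                                acc[lane] += w * row[x0 + lane + kx]
                out[o][y][x0:x0 + lanes] = acc
    return out
```

Changing `vector_width` only changes how each output row is split into sub-blocks, so the result should be identical for any width.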

FAST DEEP LEARNING FULLY-CONNECTED INFERENCE

Publication number: 20200356837

Abstract: This application relates to performing fully-connected inferences using a convolutional neural network. A method includes receiving a two-dimensional input matrix that includes a plurality of elements. The method further includes identifying a two-dimensional weight matrix corresponding to the two-dimensional input matrix, where the two-dimensional weight matrix includes a plurality of weight values. The method further includes transposing a first column of the two-dimensional weight matrix and storing the transposed first column of the two-dimensional weight matrix in a first register having a first length corresponding to the transposed first column. The method further includes generating a first output element by performing a first dot product operation using a first row of the two-dimensional input matrix and the transposed first column. Finally, the method includes storing the first output element in a first row of a two-dimensional output matrix.

Type: Application

Filed: September 11, 2019

Publication date: November 12, 2020

Inventors: Asaf HARGIL, Ali SAZEGARI
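The steps listed in the abstract above (transpose a weight column, take a dot product with an input row, store the result) can be sketched as follows. This is an illustration under stated assumptions, not the claimed method: a Python list stands in for the register, and the function name `fully_connected` is hypothetical.

```python
def fully_connected(inputs, weights):
    """Compute inputs x weights, one weight column at a time.

    inputs is a list of rows; weights is a list of rows with one
    column per output element.
    """
    rows, inner = len(inputs), len(inputs[0])
    cols = len(weights[0])
    out = [[0.0] * cols for _ in range(rows)]
    for j in range(cols):
        # Gather column j into a contiguous list. This plays the role
        # of the transposed column held in a register.
        column = [weights[k][j] for k in range(inner)]
        for i in range(rows):
            # Dot product of input row i with the transposed column.
            out[i][j] = sum(a * b for a, b in zip(inputs[i], column))
    return out
```

For example, `fully_connected([[1, 2], [3, 4]], [[5, 6], [7, 8]])` multiplies a 2x2 input by a 2x2 weight matrix.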

FAST DEEP LEARNING FULLY-CONNECTED COLUMN-MAJOR IMPLEMENTATION

Publication number: 20200356836

Abstract: This application relates to classifying information using a fully-connected layer of a convolutional neural network. A method for classifying information using a fully-connected layer of a convolutional neural network includes calculating a first partial output for a first block of elements by performing a dot product operation using a first row of elements of the first block of elements and a first weight block, where the first row of elements of the first block of elements corresponds to a first batch of elements. The method further includes generating a first output element using the first partial output for the first block of elements and at least one other partial output corresponding to the first batch of elements.

Type: Application

Filed: September 11, 2019

Publication date: November 12, 2020

Inventors: Asaf HARGIL, Ali SAZEGARI
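The partial-output scheme in the abstract above can be sketched by splitting the inner dimension into blocks and accumulating one partial dot product per block. The block size, the row-per-batch layout, and the name `fully_connected_blocked` are assumptions made for illustration; the column-major storage named in the title is not modeled.

```python
def fully_connected_blocked(batches, weights, block_size=2):
    """Fully-connected layer computed from per-block partial outputs.

    batches[b] is one batch's input row; weights[k][j] maps input
    element k to output element j.
    """
    inner = len(weights)
    n_out = len(weights[0])
    out = [[0.0] * n_out for _ in batches]
    for start in range(0, inner, block_size):
        stop = min(start + block_size, inner)
        for b, row in enumerate(batches):
            for j in range(n_out):
                # Partial output: dot product restricted to this block.
                partial = sum(row[k] * weights[k][j] for k in range(start, stop))
                # The final output element is the sum of all partials
                # that belong to the same batch.
                out[b][j] += partial
    return out
```

Because addition is associative for these small integer examples, any `block_size` yields the same output; with floating-point data the summation order can change rounding slightly.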

TWO-DIMENSIONAL MULTI-LAYER CONVOLUTION FOR DEEP LEARNING

Publication number: 20200265106

Abstract: This application relates to a multi-layer convolution operation. The multi-layer convolution operation is optimized for a vector processing unit having a number of data paths configured to operate on vector operands containing a number of elements processed in parallel by the data paths. The convolution operation specifies a convolution kernel utilized to filter a multi-channel input and generate a multi-channel output of the convolution operation. A number of threads are generated to process blocks of the multi-channel output, each block comprising a set of windows of a number of channels of the multi-channel output. Each window is a portion of the array of elements in a single layer of the multi-channel output. Each thread processes a block in accordance with an arbitrary width of the block, processing a set of instructions for each sub-block of the block having a well-defined width, the instructions optimized for the vector processing unit.

Type: Application

Filed: May 29, 2019

Publication date: August 20, 2020

Inventors: Asaf HARGIL, Ali SAZEGARI

Unpacking packed data in multiple lanes

Patent number: 9086872

Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.

Type: Grant

Filed: June 30, 2009

Date of Patent: July 21, 2015

Assignee: Intel Corporation

Inventors: Asaf Hargil, Doron Orenstein
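The result layout described in the abstract above can be modeled in a few lines. This sketch assumes two lanes of equal, even size, treats index 0 as the lowest-order element, and uses Python lists for the packed operands; `unpack_lanes` is a hypothetical name and does not correspond to any particular instruction mnemonic.

```python
def unpack_lanes(a, b, lane_size):
    """Model a two-lane unpack of packed operands a and b.

    Lane 0 of the result interleaves the low halves of lane 0 of a and b.
    Lane 1 of the result interleaves the high halves of lane 1 of a and b.
    """
    if len(a) != 2 * lane_size or len(b) != 2 * lane_size:
        raise ValueError("each operand must hold exactly two lanes")
    if lane_size % 2:
        raise ValueError("lane_size must be even")
    half = lane_size // 2
    result = []
    # First lane: only the lowest-order elements, interleaved.
    for i in range(half):
        result += [a[i], b[i]]
    # Second lane: only the highest-order elements, interleaved.
    for i in range(lane_size + half, 2 * lane_size):
        result += [a[i], b[i]]
    return result
```

With `a = [0, 1, ..., 7]`, `b = [10, 11, ..., 17]`, and `lane_size=4`, the first result lane draws on elements 0 and 1 of each operand and the second on elements 6 and 7.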

Unpacking packed data in multiple lanes

Patent number: 9081562

Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.

Type: Grant

Filed: March 15, 2013

Date of Patent: July 14, 2015

Assignee: Intel Corporation

Inventors: Asaf Hargil, Doron Orenstein

SYSTEM, APPARATUS AND METHOD FOR LOOP REMAINDER MASK INSTRUCTION

Publication number: 20140189296

Abstract: A loop remainder mask instruction indicates a current iteration count of a loop as a first operand, an iteration limit of a loop as a second operand, and a destination. The loop contains iterations and each iteration includes a data element of the array. A processor receives the loop remainder mask instruction, decodes the instruction for execution, and stores a result of the execution in the destination. The result indicates a number of data elements of the array past an end of a preceding portion of the array that are to be handled separately from the preceding portion, the end of the preceding portion being where the current iteration count is recorded.

Type: Application

Filed: December 14, 2011

Publication date: July 3, 2014

Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Andrey Naraikin, Suleyman Sair, Asaf Hargil, Miland B. Girkar, Bret T. Toll, Mark J. Charney
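One way to picture the loop remainder mask described above is as a bitmask with one bit per remaining element, capped at the vector width. The sketch below is a software analogy only: the 8-lane width, the bitmask encoding, and the names `loop_remainder_mask` and `masked_sum` are assumptions, not details taken from the application.

```python
def loop_remainder_mask(current, limit, vector_width=8):
    """Return a bitmask with one set bit per element still to process.

    current is the iteration count reached so far and limit is the
    total iteration count; at most vector_width bits are set.
    """
    remaining = max(0, min(limit - current, vector_width))
    return (1 << remaining) - 1


def masked_sum(data, vector_width=8):
    """Sum data in vector-sized steps, using the mask for the tail."""
    total = 0
    i = 0
    while i < len(data):
        mask = loop_remainder_mask(i, len(data), vector_width)
        for lane in range(vector_width):
            if (mask >> lane) & 1:  # lane is active for this step
                total += data[i + lane]
        i += vector_width
    return total
```

The point of the mask is that the final, partial step of the loop can reuse the same vector-shaped body as the full steps instead of falling back to a separate scalar loop.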

Unpacking Packed Data In Multiple Lanes

Publication number: 20130232321

Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.

Type: Application

Filed: March 15, 2013

Publication date: September 5, 2013

Inventors: Asaf Hargil, Doron Orenstein

Instruction and logic for performing range detection

Patent number: 8386547

Abstract: A technique to accelerate range detection in a spline calculation. In one embodiment, an instruction and corresponding logic are provided to perform range detection within a computer or processor.

Type: Grant

Filed: October 31, 2008

Date of Patent: February 26, 2013

Assignee: Intel Corporation

Inventors: Asaf Hargil, Evgeny Fiksman, Artiom Myaskouvskey, Doron Orenstien
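For context, range detection in a spline evaluation means finding which interval between knots contains the input value, so that the matching segment can be evaluated. The abstract does not describe how the instruction works, so the sketch below only shows the software task it would accelerate, using a binary search and a piecewise-linear spline; `detect_range` and `eval_linear_spline` are hypothetical names.

```python
from bisect import bisect_right


def detect_range(knots, x):
    """Return i such that knots[i] <= x <= knots[i + 1].

    knots must be sorted in increasing order and hold at least two values.
    """
    if not knots[0] <= x <= knots[-1]:
        raise ValueError("x lies outside the spline's domain")
    # bisect_right finds the first knot greater than x; clamp so that
    # x == knots[-1] maps to the last interval.
    return min(bisect_right(knots, x) - 1, len(knots) - 2)


def eval_linear_spline(knots, values, x):
    """Evaluate a piecewise-linear spline at x."""
    i = detect_range(knots, x)
    t = (x - knots[i]) / (knots[i + 1] - knots[i])
    return values[i] + t * (values[i + 1] - values[i])
```

The lookup in `detect_range` runs once per evaluated point, which is why it is a natural candidate for dedicated hardware support.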

UNPACKING PACKED DATA IN MULTIPLE LANES

Publication number: 20100332794

Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.

Type: Application

Filed: June 30, 2009

Publication date: December 30, 2010

Inventors: Asaf Hargil, Doron Orenstein

PROCESSING OF VIDEO DATA IN RESOURCE CONSTRAINED DEVICES

Publication number: 20100135417

Abstract: A video processing device may comprise a video processing logic to control the enhancement operations performed on the video processing device. The video processing logic may determine a short term frame rate average value in response to receiving a plurality of video frames. Further, the video processing logic may generate a derivative of the short term frame rate using the short term frame rate value. The video processing logic may then activate monitoring of a processor usage if the derivative of the short term frame rate is below a first threshold value. The video processing logic may then reduce the performance of rendering of the plurality of video frames if a processor usage average value is above a second threshold. While restoring the performance, the video processing logic may restore the enhancement operations in steps after determining that processor resources are available.

Type: Application

Filed: December 2, 2008

Publication date: June 3, 2010

Inventor: Asaf Hargil
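The control flow in the abstract above can be sketched as a small state machine. Everything numeric here is an assumption for illustration: the window length, both thresholds, the number of enhancement levels, and the use of a single processor-usage sample in place of the averaged value the abstract mentions. The class name `EnhancementController` is hypothetical.

```python
from collections import deque


class EnhancementController:
    """Lower the enhancement level under load, then restore it in steps."""

    def __init__(self, max_level=3, window=4,
                 rate_drop_threshold=-1.0, cpu_threshold=80.0):
        self.level = max_level
        self.max_level = max_level
        self.rates = deque(maxlen=window)  # recent frame rates
        self.prev_avg = None
        self.rate_drop_threshold = rate_drop_threshold
        self.cpu_threshold = cpu_threshold
        self.monitoring = False

    def update(self, frame_rate, cpu_usage):
        """Feed one frame-rate and processor-usage sample; return the level."""
        self.rates.append(frame_rate)
        avg = sum(self.rates) / len(self.rates)  # short-term average
        derivative = 0.0 if self.prev_avg is None else avg - self.prev_avg
        self.prev_avg = avg

        # A falling short-term frame rate activates usage monitoring.
        if derivative < self.rate_drop_threshold:
            self.monitoring = True

        if self.monitoring and cpu_usage > self.cpu_threshold:
            # Processor is overloaded: drop one enhancement step.
            self.level = max(0, self.level - 1)
        elif cpu_usage <= self.cpu_threshold and self.level < self.max_level:
            # Resources are available: restore one step at a time.
            self.level += 1
            if self.level == self.max_level:
                self.monitoring = False
        return self.level
```

Restoring one level per update, rather than jumping straight back to the maximum, mirrors the abstract's statement that enhancement operations are restored in steps.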

Instruction and logic for performing range detection

Publication number: 20100115014

Abstract: A technique to accelerate range detection in a spline calculation. In one embodiment, an instruction and corresponding logic are provided to perform range detection within a computer or processor.

Type: Application

Filed: October 31, 2008

Publication date: May 6, 2010

Inventors: Asaf Hargil, Evgeny Fiksman, Artiom Myaskouvskey, Doron Orenstien