Patents by Inventor Asaf Hargil
Asaf Hargil has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11341210Abstract: This application relates to a multi-layer convolution operation. The multi-layer convolution operation is optimized for a vector processing unit having a number of data paths configured to operate on vector operands containing a number of elements processed in parallel by the data paths. The convolution operation specifies a convolution kernel utilized to filter a multi-channel input and generate a multi-channel output of the convolution operation. A number of threads are generated to process blocks of the multi-channel output, each block comprising a set of windows of a number of channels of the multi-channel output. Each window is a portion of the array of elements in a single layer of the multi-channel output. Each thread processes a block in accordance with an arbitrary width of the block, processing a set of instructions for each sub-block of the block having a well-defined width, the instructions optimized for the vector processing unit.Type: GrantFiled: May 29, 2019Date of Patent: May 24, 2022Inventors: Asaf Hargil, Ali Sazegari
-
Publication number: 20200356836Abstract: This application relates to classifying information using a fully-connected layer of a convolutional neural network. A method for classifying information using a fully-connected layer of a convolutional neural network includes calculating a first partial output for a first block of elements by performing a dot product operation using a first row of elements of the first block of elements and a first weight block, where the first row of elements of the first block of elements corresponds to a first batch of elements. The method further includes generating a first output element using the first partial output for the first block of elements and at least one other partial output corresponding to the first batch of elements.Type: ApplicationFiled: September 11, 2019Publication date: November 12, 2020Inventors: Asaf HARGIL, Ali SAZEGARI
-
Publication number: 20200356837Abstract: This application relates to performing fully-connected inferences using a convolutional neural network. A method includes receiving a two-dimensional input matrix that includes a plurality of elements. The method further includes identifying a two-dimensional weight matrix corresponding to the two-dimensional input matrix, where the two-dimensional weight matrix includes a plurality of weight values. The method further includes transposing a first column of the two-dimensional weight matrix and storing the transposed first column of the two-dimensional weight matrix in a first register having a first length corresponding to the transposed first column. The method further includes generating a first output element by performing a first dot product operation using a first row of the two-dimensional input matrix and the transposed first column. Finally, the method includes storing the first output element in a first row of a two-dimensional output matrix.Type: ApplicationFiled: September 11, 2019Publication date: November 12, 2020Inventors: Asaf HARGIL, Ali SAZEGARI
-
Publication number: 20200265106Abstract: This application relates to a multi-layer convolution operation. The multi-layer convolution operation is optimized for a vector processing unit having a number of data paths configured to operate on vector operands containing a number of elements processed in parallel by the data paths. The convolution operation specifies a convolution kernel utilized to filter a multi-channel input and generate a multi-channel output of the convolution operation. A number of threads are generated to process blocks of the multi-channel output, each block comprising a set of windows of a number of channels of the multi-channel output. Each window is a portion of the array of elements in a single layer of the multi-channel output. Each thread processes a block in accordance with an arbitrary width of the block, processing a set of instructions for each sub-block of the block having a well-defined width, the instructions optimized for the vector processing unit.Type: ApplicationFiled: May 29, 2019Publication date: August 20, 2020Inventors: Asaf HARGIL, Ali SAZEGARI
-
Patent number: 9086872Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.Type: GrantFiled: June 30, 2009Date of Patent: July 21, 2015Assignee: Intel CorporationInventors: Asaf Hargil, Doron Orenstein
-
Patent number: 9081562Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.Type: GrantFiled: March 15, 2013Date of Patent: July 14, 2015Assignee: Intel CorporationInventors: Asaf Hargil, Doron Orenstein
-
Publication number: 20140189296Abstract: A loop remainder mask instruction indicates a current iteration count of a loop as a first operand, an iteration limit of a loop as a second operand, and a destination. The loop contains iterations and each iteration includes a data element of the array. A processor receives the loop remainder mask instruction, decodes the instruction for execution, and stores a result of the execution in the destination. The result indicates a number of data elements of the array past an end of a preceding portion of the array that are to be handled separately from the preceding portion, the end of the preceding portion being where the current iteration count is recorded.Type: ApplicationFiled: December 14, 2011Publication date: July 3, 2014Inventors: Elmoustapha Ould-Ahmed-Vall, Robert Valentine, Jesus Corbal, Andrey Naraikin, Suleyman Sair, Asaf Hargil, Miland B. Girkar, Bret T. Toll, Mark J. Charney
-
Publication number: 20130232321Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.Type: ApplicationFiled: March 15, 2013Publication date: September 5, 2013Inventors: Asaf Hargil, Doron Orenstein
-
Patent number: 8386547Abstract: A technique to accelerate range detection in a spline calcuation. In one embodiment, an instruction and corresponding logic are provided to perform range detection within a computer or processor.Type: GrantFiled: October 31, 2008Date of Patent: February 26, 2013Assignee: Intel CorporationInventors: Asaf Hargil, Evgeny Fiksman, Artiom Myaskouvskey, Doron Orenstien
-
Publication number: 20100332794Abstract: Receiving an instruction indicating first and second operands. Each of the operands having packed data elements that correspond in respective positions. A first subset of the data elements of the first operand and a first subset of the data elements of the second operand each corresponding to a first lane. A second subset of the data elements of the first operand and a second subset of the data elements of the second operand each corresponding to a second lane. Storing result, in response to instruction, including: (1) in first lane, only lowest order data elements from first subset of first operand interleaved with corresponding lowest order data elements from first subset of second operand; and (2) in second lane, only highest order data elements from second subset of first operand interleaved with corresponding highest order data elements from second subset of second operand.Type: ApplicationFiled: June 30, 2009Publication date: December 30, 2010Inventors: Asaf Hargil, Doron Orenstein
-
Publication number: 20100135417Abstract: A video processing device may comprise a video processing logic to control the enhancement operations performed on the video processing device. The video processing logic may determine a short term frame rate average value in response to receiving a plurality of video frames. Further, the video processing logic may generate a derivative of the short term frame rate using the short term frame rate value. The video processing logic may then activate monitoring of a processor usage if the derivative of the short term frame rate is below a first threshold value. The video processing logic may then reduce the performance of rendering of the plurality of video frames if a processor usage average value is above a second threshold. While restoring the performance, the video processing logic may restore the enhancement operations in steps after determining that processor resources are available.Type: ApplicationFiled: December 2, 2008Publication date: June 3, 2010Inventor: Asaf Hargil
-
Publication number: 20100115014Abstract: A technique to accelerate range detection in a spline calcuation. In one embodiment, an instruction and corresponding logic are provided to perform range detection within a computer or processor.Type: ApplicationFiled: October 31, 2008Publication date: May 6, 2010Inventors: Asaf Hargil, Evgeny Fiksman, Artiom Myaskouvskey, Doron Orenstien