Patents by Inventor Tony Werner
Tony Werner has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10761757
Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
Type: Grant
Filed: June 30, 2018
Date of Patent: September 1, 2020
Assignee: Intel Corporation
Inventors: Krishnakumar Nair, Andrew Yang, Michael Rotzin, Nitin Garegrat, Tom Schebye, Tony Werner
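The block-wise conversion this abstract describes can be sketched in NumPy. The function name `convert_tensor_blocks`, the 2D block shape, and the float32-to-float16 conversion are illustrative assumptions, not the patented implementation:

```python
import numpy as np

def convert_tensor_blocks(src, block_shape, dst_dtype=np.float16):
    """Convert each block of a source tensor to a second numeric
    representation, storing each destination block in the region that
    mirrors the source's predefined block arrangement."""
    dst = np.empty(src.shape, dtype=dst_dtype)
    bh, bw = block_shape
    for i in range(0, src.shape[0], bh):          # blocks in a fixed order
        for j in range(0, src.shape[1], bw):
            # Convert one source block and write it to the matching
            # destination region, preserving the block layout.
            dst[i:i+bh, j:j+bw] = src[i:i+bh, j:j+bw].astype(dst_dtype)
    return dst
```

Iterating block by block (rather than converting the whole tensor at once) keeps each destination block coherent with its source block's position, which is the property the claim emphasizes.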
-
Patent number: 10620951
Abstract: Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the element and each corresponding non-zero element at row K, column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the specified output matrix.
Type: Grant
Filed: June 22, 2018
Date of Patent: April 14, 2020
Assignee: Intel Corporation
Inventors: Omid Azizi, Guy Boudoukh, Tony Werner, Andrew Yang, Michael Rotzin, Chen Koren, Eriko Nurvitadhi
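A software analogue of the squeeze-and-accumulate scheme in this abstract can be sketched as follows. Packing each column into (logical row, value) pairs stands in for the hardware's position field; `pack_sparse` and `smm` are assumed names for illustration, not the instruction defined in the patent:

```python
import numpy as np

def pack_sparse(b):
    """Squeeze each column of a sparse second matrix: drop zero-valued
    elements and keep each survivor tagged with its logical row, the
    software analogue of the per-element position field."""
    return [[(k, b[k, n]) for k in range(b.shape[0]) if b[k, n] != 0]
            for n in range(b.shape[1])]

def smm(a, b):
    """Multiply using only the non-zero elements of the packed second
    matrix, accumulating each product into the output at (M, N)."""
    packed = pack_sparse(b)
    c = np.zeros((a.shape[0], b.shape[1]))
    for n, column in enumerate(packed):
        for k, val in column:       # logical position recovers row K
            c[:, n] += a[:, k] * val
    return c
```

Because zero entries of the second matrix never enter the loop, the work scales with the number of non-zeros per column rather than the full column height, which is the point of the folding.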
-
Publication number: 20190392297
Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.
Type: Application
Filed: December 28, 2017
Publication date: December 26, 2019
Applicant: Intel Corporation
Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
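The control flow in this abstract — an MCC receiving a tensor instruction and fanning work out to a network of MPUs — can be mimicked with a toy model. The class names, the row-band split, and the two-unit network are assumptions for illustration; the publication does not specify this partitioning:

```python
import numpy as np

class MatrixProcessingUnit:
    """Toy stand-in for one MPU: performs matrix multiplication."""
    def multiply(self, a, b):
        return a @ b

class MasterControl:
    """Toy MCC: receives an instruction with tensor operands and
    invokes a set of operations across the MPUs it controls."""
    def __init__(self, mpus):
        self.mpus = mpus

    def matmul(self, a, b):
        # Split the first operand into row bands, one per MPU.
        bands = np.array_split(a, len(self.mpus), axis=0)
        parts = [mpu.multiply(band, b)
                 for mpu, band in zip(self.mpus, bands)]
        # Reassemble the partial results into one tensor value.
        return np.vstack(parts)
```

Each MPU sees only its band of the first operand, so the units could run concurrently; the MCC's job is dispatch and reassembly rather than arithmetic.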
-
Publication number: 20190042237
Abstract: Disclosed embodiments relate to sparse matrix multiplication (SMM) acceleration using column folding and squeezing. In one example, a processor, in response to a SMM instruction having fields to specify locations of first, second, and output matrices, the second matrix being a sparse matrix, uses execution circuitry to pack the second matrix by replacing one or more zero-valued elements with non-zero elements yet to be processed, each of the replaced elements further including a field to identify its logical position within the second matrix, and, the execution circuitry further to, for each non-zero element at row M and column K of the specified first matrix, generate a product of the element and each corresponding non-zero element at row K, column N of the packed second matrix, and accumulate each generated product with a previous value of a corresponding element at row M and column N of the specified output matrix.
Type: Application
Filed: June 22, 2018
Publication date: February 7, 2019
Inventors: Omid Azizi, Guy Boudoukh, Tony Werner, Andrew Yang, Michael Rotzin, Chen Koren, Eriko Nurvitadhi
-
Publication number: 20190042094
Abstract: An apparatus and method for converting tensor data. For example, one embodiment of a method comprises: fetching source tensor blocks of a source tensor data structure, each source tensor block comprising a plurality of source tensor data elements having a first numeric representation, wherein the source tensor data structure comprises a predefined structural arrangement of source tensor blocks; converting the one or more source tensor blocks into one or more destination tensor blocks comprising a plurality of destination tensor data elements having a second numeric representation different from the first numeric representation, wherein the sets of one or more source tensor blocks are converted to one or more corresponding destination tensor blocks in a specified order based on the first and second numeric representations; and storing each individual destination tensor block in a designated memory region to maintain coherency with the predefined structural arrangement of the source tensor blocks.
Type: Application
Filed: June 30, 2018
Publication date: February 7, 2019
Inventors: Krishnakumar Nair, Andrew Yang, Michael Rotzin, Nitin Garegrat, Tom Schebye, Tony Werner
-
Patent number: 9886377
Abstract: Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and convert the operation into a set of matrix operations to operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and provide memory handles corresponding to each of the matrix operands to one of the ALUs to access the respective matrix operands when executing a matrix operation.
Type: Grant
Filed: October 5, 2015
Date of Patent: February 6, 2018
Assignee: Intel Corporation
Inventors: Tony Werner, Aravind Kalaiah, Andrew Yang, Carey Kloss, Horace Lau, Naveen Gandham Rao, Amir Khosrowshahi
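The decomposition step this abstract describes — converting one large operation into a set of matrix operations on sub-portions of the inputs — is classic tiling, and can be sketched as follows. The name `tiled_matmul` and the fixed tile size are assumptions for illustration:

```python
import numpy as np

def tiled_matmul(a, b, tile=2):
    """Convert one large matmul into a set of smaller matrix operations
    on sub-portions (tiles) of the data inputs, in the spirit of the
    controller circuitry described in the abstract."""
    m, k = a.shape
    _, n = b.shape
    c = np.zeros((m, n))
    for i in range(0, m, tile):
        for j in range(0, n, tile):
            for p in range(0, k, tile):
                # Each tile product is one small matrix operation that
                # an ALU could execute independently over local memory.
                c[i:i+tile, j:j+tile] += (
                    a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile])
    return c
```

The benefit in hardware is that each tile fits in the ALUs' local memory, so the large inputs in external memory are streamed through in sub-portions.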
-
Patent number: 9886418
Abstract: Described herein are methods, systems, and apparatuses to utilize a matrix operation by accessing each of the operation's matrix operands via a respective single memory handle. This use of a single memory handle for each matrix operand eliminates significant overhead in memory allocation, data tracking, and subroutine complexity present in prior art solutions. The result of the matrix operation can also be accessible via a single memory handle identifying the matrix elements of the result.
Type: Grant
Filed: April 28, 2015
Date of Patent: February 6, 2018
Assignee: Intel Corporation
Inventors: Andrew Yang, Carey Kloss, Prashant Arora, Tony Werner, Naveen Gandham Rao, Amir Khosrowshahi
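The single-memory-handle idea can be modeled with a small registry: callers pass one opaque handle per operand instead of separate pointers, shapes, and strides, and the result comes back as a handle too. `OperandPool` and its methods are hypothetical names for this sketch, not an API from the patent:

```python
import numpy as np

class OperandPool:
    """Toy registry mapping a single integer memory handle to a whole
    matrix operand, so an operation is addressed entirely by handles."""
    def __init__(self):
        self._store = {}
        self._next = 0

    def alloc(self, matrix):
        """Register an operand and return its single memory handle."""
        handle = self._next
        self._store[handle] = np.asarray(matrix)
        self._next += 1
        return handle

    def matmul(self, ha, hb):
        # Operands are fetched by handle; the result of the matrix
        # operation is likewise reachable via a single new handle.
        result = self._store[ha] @ self._store[hb]
        return self.alloc(result)

    def get(self, handle):
        return self._store[handle]
```

Bundling shape, layout, and storage behind one handle is what removes the per-call allocation and tracking overhead the abstract mentions.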
-
Publication number: 20170097884
Abstract: Described herein are one or more integrated circuits (ICs) comprising controller circuitry to receive a command to execute an operation for data inputs stored in an external memory or a local memory, and convert the operation into a set of matrix operations to operate on sub-portions of the data inputs. The IC(s) further comprise at least one processing circuitry to execute the set of matrix operations, the processing circuitry to include ALUs, a local memory external to the ALUs and accessible by the ALUs, and processing control circuitry to create at least one matrix operand in the local memory (from the data inputs of the operation) comprising at least one of a scalar, a vector, or a 2D matrix, and provide memory handles corresponding to each of the matrix operands to one of the ALUs to access the respective matrix operands when executing a matrix operation.
Type: Application
Filed: October 5, 2015
Publication date: April 6, 2017
Applicant: Intel Corporation
Inventors: Tony Werner, Aravind Kalaiah, Andrew Yang, Carey Kloss, Horace Lau, Naveen Gandham Rao, Amir Khosrowshahi
-
Publication number: 20170060811
Abstract: Described herein are methods, systems, and apparatuses to utilize a matrix operation by accessing each of the operation's matrix operands via a respective single memory handle. This use of a single memory handle for each matrix operand eliminates significant overhead in memory allocation, data tracking, and subroutine complexity present in prior art solutions. The result of the matrix operation can also be accessible via a single memory handle identifying the matrix elements of the result.
Type: Application
Filed: April 28, 2015
Publication date: March 2, 2017
Applicant: Intel Corporation
Inventors: Andrew Yang, Carey Kloss, Prashant Arora, Tony Werner, Naveen Gandham Rao, Amir Khosrowshahi