Patents by Inventor Tony L. Werner

Tony L. Werner has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEEP LEARNING HARDWARE

Publication number: 20240112006

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: December 8, 2023

Publication date: April 4, 2024

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
MULTI-VARIATE STRIDED READ OPERATIONS FOR ACCESSING MATRIX OPERANDS

Publication number: 20230333855

Abstract: In one embodiment, a matrix processor comprises a memory to store a matrix operand and a strided read sequence, wherein: the matrix operand is stored out of order in the memory; and the strided read sequence comprises a sequence of read operations to read the matrix operand in a correct order from the memory. The matrix processor further comprises circuitry to: receive a first instruction to be executed by the matrix processor, wherein the first instruction is to instruct the matrix processor to perform a first operation on the matrix operand; read the matrix operand from the memory based on the strided read sequence; and execute the first instruction by performing the first operation on the matrix operand.

Type: Application

Filed: May 19, 2023

Publication date: October 19, 2023

Applicant: Intel Corporation

Inventors: Nitin N. Garegrat, Tony L. Werner, Jeff DelChiaro, Michael Rotzin, Robert T. Rhoades, Ujwal Basavaraj Sajjanar, Anne Q. Ye
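
Illustrative sketch (not taken from the patent): the strided read idea in the abstract above can be pictured with a flat list standing in for the memory. Here a 2x3 matrix is assumed to have been written column by column, and the read sequence is assumed to be a list of (offset, stride, count) triples. That encoding, and the names build_read_sequence and strided_read, are hypothetical and chosen only for illustration.

```python
def build_read_sequence(rows, cols):
    """Return one (offset, stride, count) read per matrix row.

    Assumes the matrix was written column by column, so consecutive
    elements of a row sit `rows` slots apart in memory.
    """
    return [(r, rows, cols) for r in range(rows)]


def strided_read(memory, sequence):
    """Apply each strided read in order and return the rows it yields."""
    result = []
    for offset, stride, count in sequence:
        result.append([memory[offset + i * stride] for i in range(count)])
    return result


# Logical matrix [[1, 2, 3], [4, 5, 6]] stored column by column.
memory = [1, 4, 2, 5, 3, 6]
matrix = strided_read(memory, build_read_sequence(rows=2, cols=3))
print(matrix)
```

Keeping the read sequence as data rather than code means the same reader can serve a different storage layout by swapping in a different list of triples.
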
Distributed convolution for neural networks

Patent number: 11748625

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Grant

Filed: December 30, 2016

Date of Patent: September 5, 2023

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
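
Illustrative sketch (not taken from the patent): the partition, distribute, exchange, and combine steps in the abstract above can be shown with a one-dimensional sliding-window convolution. The "processing elements" are simulated by a loop, and the data "transmitted" between them is the small overlap (halo) each partition needs from its neighbours. The function names and the 1-D simplification are assumptions made for illustration.

```python
def conv1d_valid(signal, kernel):
    """Sliding-window sum of products (no kernel flip, no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def distributed_conv1d(signal, kernel, num_elements):
    """Split `signal` across simulated processing elements and merge results."""
    halo = len(kernel) - 1
    chunk = -(-len(signal) // num_elements)  # ceiling division
    partitions = [signal[i:i + chunk] for i in range(0, len(signal), chunk)]

    output = []
    for idx, part in enumerate(partitions):
        # Simulated transfer: borrow the first `halo` values that follow
        # this partition, which are owned by later processing elements.
        following = [x for p in partitions[idx + 1:] for x in p][:halo]
        # Partial operation on local data plus the borrowed boundary values.
        output.extend(conv1d_valid(part + following, kernel))
    return output


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]
print(distributed_conv1d(signal, kernel, num_elements=3))
```

The partition size depends on the number of available elements, which mirrors the partitioning criterion stated in the abstract. A real device would run the partial operations concurrently.
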
DEEP LEARNING HARDWARE

Publication number: 20230222331

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: March 15, 2023

Publication date: July 13, 2023

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
Multi-variate strided read operations for accessing matrix operands

Patent number: 11687341

Abstract: In one embodiment, a matrix processor comprises a memory to store a matrix operand and a strided read sequence, wherein: the matrix operand is stored out of order in the memory; and the strided read sequence comprises a sequence of read operations to read the matrix operand in a correct order from the memory. The matrix processor further comprises circuitry to: receive a first instruction to be executed by the matrix processor, wherein the first instruction is to instruct the matrix processor to perform a first operation on the matrix operand; read the matrix operand from the memory based on the strided read sequence; and execute the first instruction by performing the first operation on the matrix operand.

Type: Grant

Filed: August 29, 2019

Date of Patent: June 27, 2023

Assignee: Intel Corporation

Inventors: Nitin N. Garegrat, Tony L. Werner, Jeff DelChiaro, Michael Rotzin, Robert T. Rhoades, Ujwal Basavaraj Sajjanar, Anne Q. Ye
DEEP LEARNING HARDWARE

Publication number: 20220245438

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: April 25, 2022

Publication date: August 4, 2022

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS

Publication number: 20220121954

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 28, 2021

Publication date: April 21, 2022

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
Dimension shuffling using matrix processors

Patent number: 10949496

Abstract: In one embodiment, a matrix operation may be performed to reorder a plurality of dimensions of an input matrix stored in two-dimensional memory. Data associated with the input matrix may be accessed using one or more strided memory operations, wherein the one or more strided memory operations are configured to access the two-dimensional memory at a plurality of locations that are separated by a particular interval. The data accessed using the one or more strided memory operations may be stored in a result matrix, wherein the data accessed using each strided memory operation is stored in the result matrix in non-transpose form or transpose form.

Type: Grant

Filed: December 30, 2016

Date of Patent: March 16, 2021

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Amir Khosrowshahi
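
Illustrative sketch (not taken from the patent): the abstract above describes reading memory at locations separated by a fixed interval and storing each read in the result either as is or transposed. The sketch below uses a flat list in place of the two-dimensional memory; the names strided_access and shuffle, and the choice of one read per starting offset, are assumptions made for illustration.

```python
def strided_access(memory, start, interval, count):
    """Read `count` values beginning at `start`, `interval` slots apart."""
    return [memory[start + i * interval] for i in range(count)]


def shuffle(memory, interval, count, num_reads, transpose=False):
    """Collect one strided read per starting offset into a result matrix.

    With transpose=False each read becomes a row of the result;
    with transpose=True each read becomes a column instead.
    """
    rows = [strided_access(memory, start, interval, count)
            for start in range(num_reads)]
    if transpose:
        return [list(col) for col in zip(*rows)]
    return rows


# Two channels of three values each, stored channel after channel.
memory = [0, 1, 2, 10, 11, 12]
print(shuffle(memory, interval=3, count=2, num_reads=3))
print(shuffle(memory, interval=3, count=2, num_reads=3, transpose=True))
```
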
Programmable matrix processing engine

Patent number: 10896039

Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.

Type: Grant

Filed: January 31, 2019

Date of Patent: January 19, 2021

Assignee: Intel Corporation

Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
MULTI-VARIATE STRIDED READ OPERATIONS FOR ACCESSING MATRIX OPERANDS

Publication number: 20190391811

Abstract: In one embodiment, a matrix processor comprises a memory to store a matrix operand and a strided read sequence, wherein: the matrix operand is stored out of order in the memory; and the strided read sequence comprises a sequence of read operations to read the matrix operand in a correct order from the memory. The matrix processor further comprises circuitry to: receive a first instruction to be executed by the matrix processor, wherein the first instruction is to instruct the matrix processor to perform a first operation on the matrix operand; read the matrix operand from the memory based on the strided read sequence; and execute the first instruction by performing the first operation on the matrix operand.

Type: Application

Filed: August 29, 2019

Publication date: December 26, 2019

Applicant: Intel Corporation

Inventors: Nitin N. Garegrat, Tony L. Werner, Jeff DelChiaro, Michael Rotzin, Robert T. Rhoades, Ujwal Basavaraj Sajjanar, Anne Q. Ye
Winograd algorithm on a matrix processing architecture

Patent number: 10482155

Abstract: In one embodiment, a matrix operation may be performed, wherein the matrix operation comprises a matrix multiplication operation on a plurality of matrix operands. Matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the plurality of matrix operands. The plurality of matrix operands may be extracted from the matrix data, wherein the plurality of matrix operands comprises a first matrix operand and a second matrix operand. A first transform may be performed on the first matrix operand to obtain a transformed matrix operand, wherein performing matrix multiplication using the transformed matrix operand is faster than performing matrix multiplication using the first matrix operand. Matrix multiplication may be performed on the transformed matrix operand to obtain a partial result. A second transform may be performed on the partial result to obtain a result of the matrix multiplication operation.

Type: Grant

Filed: December 30, 2016

Date of Patent: November 19, 2019

Assignee: Intel Corporation

Inventors: Tony L. Werner, Aravind Kalaiah
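
Illustrative sketch (not taken from the patent): the transform, multiply, transform pattern in the abstract above resembles the standard Winograd minimal filtering form. The sketch below shows the well-known one-dimensional F(2,3) case, which produces two outputs of a 3-tap filter with four multiplications instead of six. Whether the patent uses this exact variant is an assumption. The function name winograd_f23 is hypothetical.

```python
from fractions import Fraction


def winograd_f23(d, g):
    """Two outputs of a 3-tap filter `g` over four inputs `d`."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = (Fraction(x) for x in g)

    # First transform: applied to the filter (can be precomputed once).
    u = (g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2)
    # Input transform: additions and subtractions only.
    v = (d0 - d2, d1 + d2, d2 - d1, d1 - d3)

    # Four multiplications produce the partial result.
    m = [a * b for a, b in zip(u, v)]

    # Second transform: combine partial products into the outputs.
    return [m[0] + m[1] + m[2], m[1] - m[2] - m[3]]


def direct_f23(d, g):
    """Reference computation using six multiplications."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]


print(winograd_f23([1, 2, 3, 4], [1, 1, 1]))
print(direct_f23([1, 2, 3, 4], [1, 1, 1]))
```

Fraction keeps the halving in the filter transform exact, so the two functions can be compared with plain equality.
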
MAX POOLING IN A MATRIX PROCESSING ARCHITECTURE

Publication number: 20190171690

Abstract: In one embodiment, an apparatus comprises a multi-dimensional memory and a plurality of processing elements to perform a matrix operation, wherein the matrix operation comprises a max pooling operation on one or more matrix operands. The plurality of processing elements comprises one or more matrix processors, and the plurality of processing elements is configured to: receive matrix data from the multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands; extract the one or more matrix operands from the matrix data; perform the max pooling operation using the one or more matrix operands; and obtain a result of the max pooling operation.

Type: Application

Filed: February 4, 2019

Publication date: June 6, 2019

Applicant: Intel Corporation

Inventors: Horace Lau, Tony L. Werner
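
Illustrative sketch (not taken from the patent): max pooling, the operation named in the abstract above, replaces each window of a matrix with its largest value. The window size, stride, and function name below are assumptions made for illustration, and the sketch says nothing about how the patented hardware arranges the computation.

```python
def max_pool_2d(matrix, size=2, stride=2):
    """Return the maximum of each `size` x `size` window of `matrix`."""
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            max(matrix[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, cols - size + 1, stride)
        ]
        for r in range(0, rows - size + 1, stride)
    ]


operand = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
print(max_pool_2d(operand))
```
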
PROGRAMMABLE MATRIX PROCESSING ENGINE

Publication number: 20190171450

Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.

Type: Application

Filed: January 31, 2019

Publication date: June 6, 2019

Applicant: Intel Corporation

Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
Programmable matrix processing engine

Patent number: 10228937

Abstract: An apparatus may comprise a multi-dimensional memory, a plurality of matrix processors, and a matrix routine memory. The matrix routine memory may store a plurality of programmable matrix routines, wherein each programmable matrix routine comprises a plurality of instructions associated with a particular matrix operation, wherein the plurality of instructions is to be executed by the plurality of matrix processors. Further, the plurality of matrix processors may be configured to: receive a command to perform a matrix operation; receive matrix data from the multi-dimensional memory; extract one or more matrix operands from the matrix data; identify a programmable matrix routine associated with the matrix operation; receive the programmable matrix routine from the matrix routine memory; execute the programmable matrix routine using the one or more matrix operands; and obtain a result of the matrix operation based on execution of the programmable matrix routine.

Type: Grant

Filed: December 30, 2016

Date of Patent: March 12, 2019

Assignee: Intel Corporation

Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
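
Illustrative sketch (not taken from the patent): the abstract above describes looking up a stored routine for a requested matrix operation and executing its instructions on the operands. The sketch below models the routine memory as a dictionary and each routine as a list of instruction names run on a small stack. The instruction set, the routine names, and the stack model are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


# Hypothetical routine memory: operation name -> list of instructions.
ROUTINE_MEMORY = {
    "matmul": ["matmul"],
    "gram": ["dup", "transpose", "swap", "matmul"],  # computes A^T * A
}


def execute(operation, operands, routine_memory=ROUTINE_MEMORY):
    """Fetch the routine for `operation` and run it on `operands`."""
    stack = list(operands)
    for instruction in routine_memory[operation]:
        if instruction == "dup":
            stack.append(stack[-1])
        elif instruction == "swap":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif instruction == "transpose":
            stack.append(transpose(stack.pop()))
        elif instruction == "matmul":
            right = stack.pop()
            left = stack.pop()
            stack.append(matmul(left, right))
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return stack.pop()


print(execute("gram", [[[1, 2], [3, 4]]]))
```

Because routines are plain data, a new operation can be supported by adding an entry to the dictionary without changing the executor.
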
Max pooling in a matrix processing architecture

Patent number: 10198401

Abstract: In one embodiment, an apparatus comprises a multi-dimensional memory and a plurality of processing elements to perform a matrix operation, wherein the matrix operation comprises a max pooling operation on one or more matrix operands. The plurality of processing elements comprises one or more matrix processors, and the plurality of processing elements is configured to: receive matrix data from the multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands; extract the one or more matrix operands from the matrix data; perform the max pooling operation using the one or more matrix operands; and obtain a result of the max pooling operation.

Type: Grant

Filed: December 30, 2016

Date of Patent: February 5, 2019

Assignee: Intel Corporation

Inventors: Horace Lau, Tony L. Werner
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS

Publication number: 20180189652

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
PROGRAMMABLE MATRIX PROCESSING ENGINE

Publication number: 20180189057

Abstract: In one embodiment, a matrix operation may be performed on one or more matrix operands. For example, matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the one or more matrix operands. The one or more matrix operands may be extracted from the matrix data. A matrix routine associated with the matrix operation may be identified. The matrix routine may be executed on a matrix processor using the one or more matrix operands. A result of the matrix operation may be obtained based on the matrix routine executed by the matrix processor.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Inventors: Tony L. Werner, Aravind Kalaiah, Vijay Korthikanti, Horace Lau
WINOGRAD ALGORITHM ON A MATRIX PROCESSING ARCHITECTURE

Publication number: 20180189237

Abstract: In one embodiment, a matrix operation may be performed, wherein the matrix operation comprises a matrix multiplication operation on a plurality of matrix operands. Matrix data may be received from a multi-dimensional memory, wherein the matrix data is associated with the plurality of matrix operands. The plurality of matrix operands may be extracted from the matrix data, wherein the plurality of matrix operands comprises a first matrix operand and a second matrix operand. A first transform may be performed on the first matrix operand to obtain a transformed matrix operand, wherein performing matrix multiplication using the transformed matrix operand is faster than performing matrix multiplication using the first matrix operand. Matrix multiplication may be performed on the transformed matrix operand to obtain a partial result. A second transform may be performed on the partial result to obtain a result of the matrix multiplication operation.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Tony L. Werner, Aravind Kalaiah
MATRIX STORAGE USING DATA SHIFTING MEMORY

Publication number: 20180188972

Abstract: In one embodiment, an apparatus comprises a memory and a memory controller. The memory comprises a plurality of memory modules, wherein each memory module comprises a plurality of storage locations. The memory controller may be configured to write data of a matrix to the memory. For example, the memory controller may be configured to write a particular row or a particular column of the matrix to the memory by: shifting a plurality of matrix elements of the particular row or the particular column; and writing the plurality of matrix elements to the plurality of memory modules.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Andrew Yang, Carey K. Kloss, Tony L. Werner, Horace Lau
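
Illustrative sketch (not taken from the patent): one common way to shift matrix elements across memory modules is to rotate each row by its row index before writing, so that the elements of any row and of any column all land in different modules. The abstract above does not state the shift amount, so this rotation rule, the class name, and its methods are assumptions made for illustration.

```python
class ShiftedMatrixMemory:
    """n memory modules with n storage locations each (square matrix only)."""

    def __init__(self, n):
        self.n = n
        self.modules = [[None] * n for _ in range(n)]

    def module_for(self, r, c):
        """Module that holds element (r, c) after rotating row r by r."""
        return (c + r) % self.n

    def write_row(self, r, values):
        """Write one matrix row, one element per module, at location r."""
        for c, value in enumerate(values):
            self.modules[self.module_for(r, c)][r] = value

    def read_row(self, r):
        return [self.modules[self.module_for(r, c)][r] for c in range(self.n)]

    def read_column(self, c):
        return [self.modules[self.module_for(r, c)][r] for r in range(self.n)]


memory = ShiftedMatrixMemory(3)
for r, row in enumerate([[1, 2, 3], [4, 5, 6], [7, 8, 9]]):
    memory.write_row(r, row)
print(memory.read_row(2))
print(memory.read_column(1))
```

With this layout a column read touches each module once, just as a row read does, which is the property that makes the shift worthwhile.
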
DIMENSION SHUFFLING USING MATRIX PROCESSORS

Publication number: 20180189227

Abstract: In one embodiment, a matrix operation may be performed to reorder a plurality of dimensions of an input matrix stored in two-dimensional memory. Data associated with the input matrix may be accessed using one or more strided memory operations, wherein the one or more strided memory operations are configured to access the two-dimensional memory at a plurality of locations that are separated by a particular interval. The data accessed using the one or more strided memory operations may be stored in a result matrix, wherein the data accessed using each strided memory operation is stored in the result matrix in non-transpose form or transpose form.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Amir Khosrowshahi
