Patents by Inventor Carey K. Kloss

Carey K. Kloss has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEEP LEARNING HARDWARE

Publication number: 20240112006

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: December 8, 2023

Publication date: April 4, 2024

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
Distributed convolution for neural networks

Patent number: 11748625

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Grant

Filed: December 30, 2016

Date of Patent: September 5, 2023

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
DEEP LEARNING HARDWARE

Publication number: 20230222331

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: March 15, 2023

Publication date: July 13, 2023

Inventors: Horce H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
DEEP LEARNING HARDWARE

Publication number: 20220245438

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: April 25, 2022

Publication date: August 4, 2022

Inventors: Horce H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS

Publication number: 20220121954

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 28, 2021

Publication date: April 21, 2022

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
Distributed matrix multiplication for neural networks

Patent number: 10922380

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Grant

Filed: December 31, 2018

Date of Patent: February 16, 2021

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
DEEP LEARNING HARDWARE

Publication number: 20190392297

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: December 28, 2017

Publication date: December 26, 2019

Applicant: Intel Corporation

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
DISTRIBUTED MATRIX MULTIPLICATION FOR NEURAL NETWORKS

Publication number: 20190138569

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 31, 2018

Publication date: May 9, 2019

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
Distributed matrix multiplication for neural networks

Patent number: 10169296

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Grant

Filed: December 30, 2016

Date of Patent: January 1, 2019

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
DISTRIBUTED MATRIX MULTIPLICATION FOR NEURAL NETWORKS

Publication number: 20180189236

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
MATRIX STORAGE USING DATA SHIFTING MEMORY

Publication number: 20180188972

Abstract: In one embodiment, an apparatus comprises a memory and a memory controller. The memory comprises a plurality of memory modules, wherein each memory module comprises a plurality of storage locations. The memory controller may be configured to write data of a matrix to the memory. For example, the memory controller may be configured to write a particular row or a particular column of the matrix to the memory by: shifting a plurality of matrix elements of the particular row or the particular column; and writing the plurality of matrix elements to the plurality of memory modules.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Andrew Yang, Carey K. Kloss, Tony L. Werner, Horace Lau
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS

Publication number: 20180189652

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi