Patents by Inventor Amir KHOSROWSHAHI

Amir KHOSROWSHAHI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

DEEP LEARNING HARDWARE

Publication number: 20240112006

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.
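
As a rough illustration of the control flow described in this abstract, the Python sketch below models a host instruction arriving at a master control unit, which fans the work out to several matrix processing units and assembles a tensor result. The class names, the tuple-shaped instruction, and the row-wise split are assumptions made for illustration; the abstract does not specify any of them.

```python
class MatrixProcessingUnit:
    """Hypothetical MPU: its only job here is matrix multiplication."""

    def matmul(self, a, b):
        return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class MasterControl:
    """Hypothetical stand-in for the MCC: owns tensor memory and dispatches to MPUs."""

    def __init__(self, num_mpus):
        self.mpus = [MatrixProcessingUnit() for _ in range(num_mpus)]
        self.memory = {}  # tensor name -> tensor data (list of rows)

    def store(self, name, tensor):
        self.memory[name] = tensor

    def execute(self, instruction):
        # Assumed instruction shape: (operation, left operand name, right operand name).
        op, lhs_name, rhs_name = instruction
        if op != "matmul":
            raise ValueError(f"unsupported operation: {op}")
        lhs, rhs = self.memory[lhs_name], self.memory[rhs_name]
        chunk = -(-len(lhs) // len(self.mpus))  # ceiling division
        result = []
        for index, mpu in enumerate(self.mpus):
            rows = lhs[index * chunk:(index + 1) * chunk]
            if rows:  # an MPU with no rows assigned stays idle
                result.extend(mpu.matmul(rows, rhs))
        return result


mcc = MasterControl(num_mpus=2)
mcc.store("a", [[1, 2], [3, 4], [5, 6]])
mcc.store("b", [[1, 0], [0, 1]])
result = mcc.execute(("matmul", "a", "b"))
```

In this sketch the MPUs run one after another; the abstract's network of interconnected MPUs is not modeled.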

Type: Application

Filed: December 8, 2023

Publication date: April 4, 2024

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
Distributed convolution for neural networks

Patent number: 11748625

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
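
The partition, distribute, and combine steps in this abstract can be sketched on a one-dimensional convolution. The code below is a simplified illustration, not the patented method: it uses the cross-correlation form common in neural networks, simulates the processing elements with a loop, and models the data exchanged between elements as the few boundary samples each partition borrows from its neighbor. All function names are hypothetical.

```python
def correlate_valid(signal, kernel):
    """Sliding dot product with no padding ("valid" mode)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def distributed_correlate(signal, kernel, num_elements):
    """Split the output range across elements; assumes len(signal) >= len(kernel)."""
    halo = len(kernel) - 1                 # samples needed from the neighboring partition
    n_out = len(signal) - halo             # total number of output samples
    base, extra = divmod(n_out, num_elements)
    outputs, start = [], 0
    for i in range(num_elements):
        count = base + (1 if i < extra else 0)
        # Local partition plus the halo samples borrowed from the next element.
        local = signal[start:start + count + halo]
        outputs.extend(correlate_valid(local, kernel))   # partial operation
        start += count
    return outputs                                       # combined result


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
combined = distributed_correlate(signal, kernel, num_elements=3)
```

The combined output matches what a single element would compute over the whole signal.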

Type: Grant

Filed: December 30, 2016

Date of Patent: September 5, 2023

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
DEEP LEARNING HARDWARE

Publication number: 20230222331

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: March 15, 2023

Publication date: July 13, 2023

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
DEEP LEARNING HARDWARE

Publication number: 20220245438

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: April 25, 2022

Publication date: August 4, 2022

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS

Publication number: 20220121954

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 28, 2021

Publication date: April 21, 2022

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
Optical analog matrix multiplier for optical neural networks

Patent number: 11251876

Abstract: Embodiments of the present disclosure are directed toward techniques and apparatus comprising at least one layer of an ONN that includes an optical matrix multiplier provided in a semiconductor substrate to receive a plurality of optical signal inputs and to linearly transform the plurality of optical signal inputs into a plurality of optical signal outputs. The optical matrix multiplier comprises one or more 2×2 unitary optical matrices optically interconnected to implement a singular value decomposition (SVD) of a matrix, and a nonlinear optical device coupled with the optical matrix multiplier in the semiconductor substrate, to receive the optical signal outputs and to provide an optical output that is generated in a nonlinear manner in response to the optical signal outputs of the optical matrix multiplier reaching saturation or attenuation. Additional embodiments may be described and claimed.
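
A small numerical sketch may help with the structure this abstract describes: a linear transform built from 2×2 unitary blocks in singular-value-decomposition form (U · Σ · Vᵀ), followed by a nonlinearity. The code assumes real-valued rotations as the 2×2 unitary blocks and models the nonlinear device as simple clipping at a saturation level. Both are simplifications chosen for illustration, and every name in it is hypothetical.

```python
import math


def rotation(theta):
    """A real 2x2 rotation, used here as the simplest 2x2 unitary matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def svd_layer(x, theta_u, singular_values, theta_v, saturation=1.0):
    """Apply U * diag(singular_values) * V^T to x, then clip at +/- saturation."""
    y = matvec(transpose(rotation(theta_v)), x)           # V^T: first unitary stage
    y = [s * yi for s, yi in zip(singular_values, y)]     # Sigma: per-channel gain or loss
    y = matvec(rotation(theta_u), y)                      # U: second unitary stage
    return [max(-saturation, min(saturation, yi)) for yi in y]


out = svd_layer([1.0, 1.0], theta_u=0.0, singular_values=[2.0, 0.5], theta_v=0.0)
```

With both angles set to zero the two unitary stages reduce to the identity, so only the singular values and the clipping act on the input.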

Type: Grant

Filed: November 17, 2020

Date of Patent: February 15, 2022

Assignee: Intel Corporation

Inventors: Wenhua Lin, Amir Khosrowshahi, Casimir Wierzynski
Optical analog matrix multiplier for optical neural networks

Patent number: 11218223

Abstract: Embodiments of the present disclosure are directed toward techniques and apparatus comprising at least one layer of an ONN that includes an optical matrix multiplier provided in a semiconductor substrate to receive a plurality of optical signal inputs and to linearly transform the plurality of optical signal inputs into a plurality of optical signal outputs. The optical matrix multiplier comprises one or more 2×2 unitary optical matrices optically interconnected to implement a singular value decomposition (SVD) of a matrix, and a nonlinear optical device coupled with the optical matrix multiplier in the semiconductor substrate, to receive the optical signal outputs and to provide an optical output that is generated in a nonlinear manner in response to the optical signal outputs of the optical matrix multiplier reaching saturation or attenuation. Additional embodiments may be described and claimed.

Type: Grant

Filed: November 17, 2020

Date of Patent: January 4, 2022

Assignee: Intel Corporation

Inventors: Wenhua Lin, Amir Khosrowshahi, Casimir Wierzynski
HETEROGENEOUSLY INTEGRATED SILICON PHOTONICS NEURAL NETWORK CHIP

Publication number: 20210132650

Abstract: Embodiments of the present disclosure are directed toward techniques and configurations for a photonics integrated circuit (IC) for an optical neural network (ONN). In embodiments, the photonics IC includes monolithically optoelectronic components in a single semiconductor substrate including a combination of one or more of integrated array of light sources, a plurality of optical modulators, an optical unitary matrix multiplier, non-linear optical amplifiers or attenuators, and a plurality of photodetectors. In embodiments, the optical unitary matrix multiplier comprises a plurality of 2×2 unitary optical matrices optically interconnected, wherein each 2×2 unitary optical matrix comprises a plurality of phase shifters. In embodiments, each 2×2 unitary optical matrix is to phase shift, split, and/or combine one or more of the optical signal inputs. Other embodiments may be described and/or claimed.
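
To illustrate how phase shifting, splitting, and combining can yield a 2×2 unitary matrix, the sketch below composes two phase shifters with two 50:50 beam splitters, in the style of a Mach-Zehnder interferometer. This textbook arrangement is an assumption; the abstract does not say how the phase shifters and splitters are arranged, and the function names are hypothetical.

```python
import cmath
import math


def matmul2(a, b):
    """Product of two 2x2 complex matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def phase_shifter(phi):
    """Delay the first optical path by phase phi; leave the second untouched."""
    return [[cmath.exp(1j * phi), 0], [0, 1]]


# Ideal, lossless 50:50 beam splitter.
BEAM_SPLITTER = [[1 / math.sqrt(2), 1j / math.sqrt(2)],
                 [1j / math.sqrt(2), 1 / math.sqrt(2)]]


def mzi(theta, phi):
    """Input phase shifter, splitter, internal phase shifter, splitter."""
    m = phase_shifter(phi)
    for stage in (BEAM_SPLITTER, phase_shifter(theta), BEAM_SPLITTER):
        m = matmul2(stage, m)
    return m


def is_unitary(m, tol=1e-12):
    dagger = [[m[j][i].conjugate() for j in range(2)] for i in range(2)]
    product = matmul2(m, dagger)
    return all(abs(product[i][j] - (1 if i == j else 0)) < tol
               for i in range(2) for j in range(2))


block = mzi(theta=0.3, phi=1.1)
unitary = is_unitary(block)
```

Because every stage is itself unitary, the composed block conserves total optical power for any choice of the two phases.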

Type: Application

Filed: November 17, 2020

Publication date: May 6, 2021

Inventors: Wenhua Lin, Casimir Wierzynski, Amir Khosrowshahi, Bharadwaj Parthasarathy, Jin Hong, Robert Blum
HIGH EFFICIENCY OPTICAL NEURAL NETWORK

Publication number: 20210133547

Abstract: Techniques and configurations for an optical neural network (ONN) with layers of optical matrix multipliers and an optical nonlinearity function are described herein. The techniques provide for programmable matrix multipliers, allowing for a partitioned use of a part of a matrix as needed, for computation efficiency. The techniques provide for multiple pass-through the same optical matrix die on the same photonic integrated circuit (PIC) chip and for connecting multiple layers of the ONN and running through them in sequence. The techniques further provide for scaling the ONN to different sizes. Additional embodiments may be described and claimed.

Type: Application

Filed: November 17, 2020

Publication date: May 6, 2021

Inventors: Wenhua Lin, Amir Khosrowshahi, Casimir Wierzynski
OPTICAL ANALOG MATRIX MULTIPLIER FOR OPTICAL NEURAL NETWORKS

Publication number: 20210135764

Abstract: Embodiments of the present disclosure are directed toward techniques and apparatus comprising at least one layer of an ONN that includes an optical matrix multiplier provided in a semiconductor substrate to receive a plurality of optical signal inputs and to linearly transform the plurality of optical signal inputs into a plurality of optical signal outputs. The optical matrix multiplier comprises one or more 2×2 unitary optical matrices optically interconnected to implement a singular value decomposition (SVD) of a matrix, and a nonlinear optical device coupled with the optical matrix multiplier in the semiconductor substrate, to receive the optical signal outputs and to provide an optical output that is generated in a nonlinear manner in response to the optical signal outputs of the optical matrix multiplier reaching saturation or attenuation. Additional embodiments may be described and claimed.

Type: Application

Filed: November 17, 2020

Publication date: May 6, 2021

Inventors: Wenhua Lin, Amir Khosrowshahi, Casimir Wierzynski
Dimension shuffling using matrix processors

Patent number: 10949496

Abstract: In one embodiment, a matrix operation may be performed to reorder a plurality of dimensions of an input matrix stored in two-dimensional memory. Data associated with the input matrix may be accessed using one or more strided memory operations, wherein the one or more strided memory operations are configured to access the two-dimensional memory at a plurality of locations that are separated by a particular interval. The data accessed using the one or more strided memory operations may be stored in a result matrix, wherein the data accessed using each strided memory operation is stored in the result matrix in non-transpose form or transpose form.
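
The idea of reordering dimensions with strided reads can be shown on a small case. In the sketch below a flat Python list stands in for the memory, and a row-major matrix is transposed by reading each column with a fixed stride. The function names and the flat-list memory model are assumptions for illustration only.

```python
def strided_read(memory, start, stride, count):
    """Read `count` values beginning at `start`, each `stride` cells apart."""
    return [memory[start + i * stride] for i in range(count)]


def transpose_via_strides(memory, rows, cols):
    """Transpose a row-major rows x cols matrix using one strided read per column."""
    # Column c starts at offset c, and its elements are separated by `cols` cells.
    return [strided_read(memory, c, cols, rows) for c in range(cols)]


memory = [1, 2, 3,
          4, 5, 6]          # a 2 x 3 matrix stored row by row
shuffled = transpose_via_strides(memory, rows=2, cols=3)
```

Each strided read collects one column of the input, which becomes one row of the result.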

Type: Grant

Filed: December 30, 2016

Date of Patent: March 16, 2021

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Amir Khosrowshahi
Distributed matrix multiplication for neural networks

Patent number: 10922380

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.
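
One way to picture the scheme in this abstract is a ring of processing elements, each holding a block of rows from both operands and passing its block of the second operand to a neighbor after every partial multiplication. The Python sketch below simulates that in a single process. The ring layout, the row-wise partitioning, and all names are assumptions for illustration; the abstract does not prescribe a particular partitioning or communication pattern.

```python
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def split_ranges(n, parts):
    """Cut range(n) into `parts` contiguous (start, stop) pieces of near-equal size."""
    base, extra = divmod(n, parts)
    ranges, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def distributed_matmul(a, b, num_elements):
    """Compute a @ b with simulated elements that pass blocks of b around a ring."""
    n_cols = len(b[0])
    row_ranges = split_ranges(len(a), num_elements)   # partition of a's rows
    k_ranges = split_ranges(len(b), num_elements)     # partition of b's rows
    a_parts = [a[r0:r1] for r0, r1 in row_ranges]
    # Element j starts out holding block j of b, tagged with its block index.
    held = [(j, b[k0:k1]) for j, (k0, k1) in enumerate(k_ranges)]
    partials = [[[0] * n_cols for _ in part] for part in a_parts]

    for _ in range(num_elements):
        for i, a_part in enumerate(a_parts):
            j, b_block = held[i]
            k0, k1 = k_ranges[j]
            # Partial product: the matching column slice of a's rows times the held block.
            update = matmul([row[k0:k1] for row in a_part], b_block)
            for acc_row, upd_row in zip(partials[i], update):
                for c, value in enumerate(upd_row):
                    acc_row[c] += value
        held = held[1:] + held[:1]   # ring shift: element i receives element i+1's block
    return [row for part in partials for row in part]


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
b = [[1, 0], [0, 1], [2, 3]]
product = distributed_matmul(a, b, num_elements=2)
```

After as many steps as there are elements, every element has seen every block of the second operand and holds its own rows of the final product.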

Type: Grant

Filed: December 31, 2018

Date of Patent: February 16, 2021

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
Dynamic management of numerical representation in a distributed matrix processor architecture

Patent number: 10552119

Abstract: A system receives and executes a sequence of tensor instructions, for example, instructions for performing a neural network computation. The system may be implemented as a multiprocessor architecture, for example, hardware for performing a neural network computation. A tensor instruction specifies a tensor computation receiving one or more input tensors for determining an output tensor. The system stores a decimal position associated with a plurality of values of a tensor. The system performs the tensor computation of a tensor instruction to determine a plurality of values of the output tensor. The system collects statistics describing the plurality of values of the output tensor and determines a decimal position for the plurality of values based on the collected statistics.
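
A minimal sketch of the idea, under stated assumptions: values of a tensor are stored as integers that share one radix-point position, and that position is re-chosen from a statistic of the output values. Here the statistic is the largest magnitude, the word is 16 bits wide, and the "decimal position" is read as a count of fractional bits. These are illustrative choices rather than details taken from the patent, and the function names are hypothetical.

```python
import math


def choose_fraction_bits(values, word_bits=16):
    """Pick the number of fractional bits so the largest magnitude still fits."""
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0:
        return word_bits - 1
    integer_bits = max(0, math.floor(math.log2(max_abs)) + 1)
    return word_bits - 1 - integer_bits        # one bit is reserved for the sign


def quantize(values, fraction_bits, word_bits=16):
    """Convert to signed integers sharing one radix-point position."""
    scale = 2.0 ** fraction_bits
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return [min(hi, max(lo, round(v * scale))) for v in values]


def dequantize(integers, fraction_bits):
    scale = 2.0 ** fraction_bits
    return [i / scale for i in integers]


output_tensor = [0.5, -3.25, 1.75]
bits = choose_fraction_bits(output_tensor)     # statistic: maximum magnitude
stored = quantize(output_tensor, bits)
restored = dequantize(stored, bits)
```

A tensor with larger values would get fewer fractional bits, trading precision for range.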

Type: Grant

Filed: April 29, 2016

Date of Patent: February 4, 2020

Assignee: Intel Corporation

Inventors: Urs Koster, William Howard Constable, Luke James Hornof, Carey Kevin Kloss, Amir Khosrowshahi, Scott Gray
DEEP LEARNING HARDWARE

Publication number: 20190392297

Abstract: A network of matrix processing units (MPUs) is provided on a device, where each MPU is connected to at least one other MPU in the network, and each MPU is to perform matrix multiplication operations. Computer memory stores tensor data and a master control central processing unit (MCC) is provided on the device to receive an instruction from a host device, where the instruction includes one or more tensor operands based on the tensor data. The MCC invokes a set of operations on one or more of the MPUs based on the instruction, where the set of operations includes operations on the tensor operands. A result is generated from the set of operations, the result embodied as a tensor value.

Type: Application

Filed: December 28, 2017

Publication date: December 26, 2019

Applicant: Intel Corporation

Inventors: Horace H. Lau, Prashant Arora, Olivia K. Wu, Tony Werner, Carey K. Kloss, Amir Khosrowshahi, Andrew Yang, Aravind Kalaiah, Vijay Anand R. Korthikanti
DISTRIBUTED MATRIX MULTIPLICATION FOR NEURAL NETWORKS

Publication number: 20190138569

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 31, 2018

Publication date: May 9, 2019

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
Distributed matrix multiplication for neural networks

Patent number: 10169296

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Grant

Filed: December 30, 2016

Date of Patent: January 1, 2019

Assignee: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
DIMENSION SHUFFLING USING MATRIX PROCESSORS

Publication number: 20180189227

Abstract: In one embodiment, a matrix operation may be performed to reorder a plurality of dimensions of an input matrix stored in two-dimensional memory. Data associated with the input matrix may be accessed using one or more strided memory operations, wherein the one or more strided memory operations are configured to access the two-dimensional memory at a plurality of locations that are separated by a particular interval. The data accessed using the one or more strided memory operations may be stored in a result matrix, wherein the data accessed using each strided memory operation is stored in the result matrix in non-transpose form or transpose form.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Amir Khosrowshahi
DISTRIBUTED MATRIX MULTIPLICATION FOR NEURAL NETWORKS

Publication number: 20180189236

Abstract: In one embodiment, a matrix operation associated with a plurality of input matrices may be performed. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Carey K. Kloss, Aravind Kalaiah, Amir Khosrowshahi
DISTRIBUTED CONVOLUTION FOR NEURAL NETWORKS

Publication number: 20180189652

Abstract: In one embodiment, a matrix operation may be performed using a plurality of input matrices, wherein the matrix operation is associated with one or more convolution operations. The plurality of input matrices may be partitioned into a plurality of input partitions, wherein the plurality of input matrices is partitioned based on a number of available processing elements. The plurality of input partitions may be distributed among a plurality of processing elements, wherein each input partition is distributed to a particular processing element of the plurality of processing elements. A plurality of partial matrix operations may be performed using the plurality of processing elements, and partial matrix data may be transmitted between the plurality of processing elements while performing the plurality of partial matrix operations. A result of the matrix operation may be determined based on the plurality of partial matrix operations.

Type: Application

Filed: December 30, 2016

Publication date: July 5, 2018

Applicant: Intel Corporation

Inventors: Vijay Anand R. Korthikanti, Aravind Kalaiah, Tony L. Werner, Carey K. Kloss, Amir Khosrowshahi
Matrix operands for linear algebra operations

Patent number: 9886418

Abstract: Described herein are methods, systems, and apparatuses to utilize a matrix operation by accessing each of the operation's matrix operands via a respective single memory handle. This use of a single memory handle for each matrix operand eliminates significant overhead in memory allocation, data tracking, and subroutine complexity present in prior art solutions. The result of the matrix operation can also be accessible via a single memory handle identifying the matrix elements of the result.
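
The single-handle idea can be sketched as follows. In this illustrative Python model, each matrix lives in a flat store and is identified by one handle object, and a matrix operation takes handles as operands and returns a handle for its result. The `MatrixHandle` and `MatrixMemory` names and their fields are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatrixHandle:
    """Hypothetical handle: one value that identifies a whole matrix in memory."""
    offset: int
    rows: int
    cols: int


class MatrixMemory:
    """Flat storage in which every matrix is reached through a single handle."""

    def __init__(self):
        self._cells = []

    def store(self, rows):
        handle = MatrixHandle(len(self._cells), len(rows), len(rows[0]))
        for row in rows:
            self._cells.extend(row)
        return handle

    def load(self, handle):
        start = handle.offset
        return [self._cells[start + r * handle.cols:start + (r + 1) * handle.cols]
                for r in range(handle.rows)]

    def multiply(self, left, right):
        """Operands and result are each referred to by exactly one handle."""
        a, b = self.load(left), self.load(right)
        product = [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
        return self.store(product)


memory = MatrixMemory()
left = memory.store([[1, 2], [3, 4]])
right = memory.store([[5, 6], [7, 8]])
result_handle = memory.multiply(left, right)
result = memory.load(result_handle)
```

The caller never tracks individual rows or elements; it only passes handles around.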

Type: Grant

Filed: April 28, 2015

Date of Patent: February 6, 2018

Assignee: Intel Corporation

Inventors: Andrew Yang, Carey Kloss, Prashant Arora, Tony Werner, Naveen Gandham Rao, Amir Khosrowshahi
