Patents by Inventor Om Ji Omer
Om Ji Omer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11949414
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
Type: Grant
Filed: December 22, 2020
Date of Patent: April 2, 2024
Assignee: INTEL CORPORATION
Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
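The LUT-and-shift scheme in this abstract can be illustrated in software. The sketch below (not the patented circuit; function names are illustrative) precomputes a 16x16 table of products of four-bit numbers, builds a wider multiply from shifted table lookups, and accumulates results as the "accumulation storage" would:

```python
# Illustrative sketch of LUT-based multiply-accumulate: every product of two
# 4-bit numbers is precomputed, and an 8-bit multiply is assembled from four
# shifted LUT lookups.

# 16x16 lookup table of products of 4-bit numbers.
LUT = [[r * c for c in range(16)] for r in range(16)]

def lut_multiply(a: int, b: int) -> int:
    """Multiply two 8-bit unsigned operands using only 4-bit LUT lookups."""
    a_lo, a_hi = a & 0xF, (a >> 4) & 0xF
    b_lo, b_hi = b & 0xF, (b >> 4) & 0xF
    acc = 0
    # Each partial product is a LUT lookup shifted by the nibble positions,
    # mirroring the shift-and-adder logic in the subarray.
    for a_nib, a_shift in ((a_lo, 0), (a_hi, 4)):
        for b_nib, b_shift in ((b_lo, 0), (b_hi, 4)):
            acc += LUT[a_nib][b_nib] << (a_shift + b_shift)
    return acc

def lut_mac(pairs) -> int:
    """Accumulate products, standing in for the accumulation storage."""
    acc = 0
    for a, b in pairs:
        acc += lut_multiply(a, b)
    return acc
```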
-
Patent number: 11875555
Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
Type: Grant
Filed: November 24, 2021
Date of Patent: January 16, 2024
Assignee: Intel Corporation
Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
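The "self-confidence" signal described here is a margin between the top two class predictions. A minimal sketch of that idea (function names and the threshold are assumptions, not the patented training procedure):

```python
# Hedged sketch: score each region by the margin between its highest and
# second-highest predicted class scores; small margins flag low-confidence
# regions that extra training could target.

def self_confidence(class_scores):
    """Top-1 minus top-2 score for one region's class predictions."""
    top_two = sorted(class_scores, reverse=True)[:2]
    return top_two[0] - top_two[1]

def low_confidence_regions(predictions, threshold=0.1):
    """Indices of regions whose prediction margin falls below threshold."""
    return [i for i, scores in enumerate(predictions)
            if self_confidence(scores) < threshold]
```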
-
Publication number: 20240005135
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
Type: Application
Filed: April 18, 2023
Publication date: January 4, 2024
Applicant: Intel Corporation
Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
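The re-encoding idea can be sketched numerically: split a signed 8-bit value into two signed low-precision halves and skip the high-half multiply when its higher-order bits are all zero. The exact split below (x = hi * 16 + lo) and the function names are assumptions for illustration, not the claimed circuit:

```python
# Illustrative sketch of re-encode + sparsity: x = hi * 16 + lo with lo in
# [-8, 7]; small-magnitude values have hi == 0, so their high-half multiply
# can be skipped entirely. (At the extremes of int8, hi needs one extra bit.)

def re_encode(x: int):
    """Split signed x into (hi, lo) halves with x == hi * 16 + lo."""
    lo = ((x + 8) % 16) - 8
    hi = (x - lo) >> 4
    return hi, lo

def sparse_multiply(a: int, b: int):
    """Multiply via the two halves of a; report whether the high half was skipped."""
    hi, lo = re_encode(a)
    product = lo * b
    skipped = hi == 0
    if not skipped:
        product += (hi * b) << 4  # high half contributes shifted by 4 bits
    return product, skipped
```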
-
Publication number: 20230333999
Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory address (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.
Type: Application
Filed: June 22, 2023
Publication date: October 19, 2023
Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney
-
Patent number: 11783170
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Grant
Filed: January 25, 2023
Date of Patent: October 10, 2023
Assignee: INTEL CORPORATION
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
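The rulebook-driven, per-weight-plane accumulation pattern common to sparse voxel convolution can be emulated in a few lines. The data layout below (dicts keyed by weight plane and voxel id) is an assumption for illustration, not the patented instruction format:

```python
# Minimal sketch of rulebook-based sparse convolution: each weight plane k
# lists (input voxel, output voxel) pairs, and partial channel-wise MAC
# results are accumulated per output voxel, plane by plane.

def sparse_conv(features, weights, rulebook, num_outputs, out_channels):
    """features: {voxel_id: [in_ch floats]}; weights: {plane: out_ch x in_ch
    matrix}; rulebook: {plane: [(in_voxel, out_voxel), ...]}."""
    outputs = {v: [0.0] * out_channels for v in range(num_outputs)}
    for plane, pairs in rulebook.items():
        w = weights[plane]  # weight matrix for this weight plane
        for in_v, out_v in pairs:
            f = features[in_v]
            for oc in range(out_channels):
                # channel-wise multiply-accumulate (a partial accumulation)
                outputs[out_v][oc] += sum(w[oc][ic] * f[ic]
                                          for ic in range(len(f)))
    return outputs
```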
-
Patent number: 11714998
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
Type: Grant
Filed: June 23, 2020
Date of Patent: August 1, 2023
Assignee: INTEL CORPORATION
Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
-
Publication number: 20230205692
Abstract: Apparatus and method for leveraging simultaneous multithreading for bulk compute operations. For example, one embodiment of a processor comprises: a plurality of cores including a first core to simultaneously process instructions of a plurality of threads; a cache hierarchy coupled to the first core and the memory, the cache hierarchy comprising a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache; and a plurality of compute units coupled to the first core including a first compute unit associated with the L1 cache, a second compute unit associated with the L2 cache, and a third compute unit associated with the L3 cache, wherein the first core is to offload instructions for execution by the compute units, the first core to offload instructions from a first thread to the first compute unit, instructions from a second thread to the second compute unit, and instructions from a third thread to the third compute unit.
Type: Application
Filed: December 23, 2021
Publication date: June 29, 2023
Inventors: Anant Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om Ji Omer, Sreenivas Subramoney, Avishaii Abuhatzera, Belliappa Kuttanna
-
Publication number: 20230169315
Abstract: Systems, apparatuses and methods may provide for technology that generates a vector output based on a bitmask, wherein the vector output includes non-zero bit indices in a first portion of the vector output, and wherein the non-zero bit indices correspond to non-zero values in the bitmask. The technology may also generate an offset based on the bitmask, wherein the offset indicates a start position in the vector output for the non-zero bit indices.
Type: Application
Filed: September 1, 2022
Publication date: June 1, 2023
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Om Ji Omer
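One possible reading of this bitmask-expansion abstract, sketched in software (the exact vector layout is an assumption): pack the indices of set bits into the first portion of a fixed-length vector, with an offset marking where that packed run ends.

```python
# Hedged sketch of bitmask expansion: the vector's first portion holds the
# indices of non-zero bits; the offset separates them from the padded
# remainder. The layout is one interpretation of the abstract, not the
# claimed hardware behavior.

def expand_bitmask(bitmask: int, width: int):
    """Return (vector, offset) for a width-bit bitmask."""
    nz = [i for i in range(width) if (bitmask >> i) & 1]  # non-zero bit indices
    vector = nz + [0] * (width - len(nz))                 # pad the remainder
    offset = len(nz)                                      # end of packed run
    return vector, offset
```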
-
Publication number: 20230169319
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Application
Filed: January 25, 2023
Publication date: June 1, 2023
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Patent number: 11620818
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Grant
Filed: December 22, 2020
Date of Patent: April 4, 2023
Assignee: INTEL CORPORATION
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Publication number: 20220196798
Abstract: According to various embodiments, a radar device is described comprising a processor configured to generate a scene comprising an object based on a plurality of receive wireless signals, generate a ground truth object parameter of the object and generate a dataset representative of the scene and a radar detector configured to determine an object parameter of the object using a machine learning algorithm and the dataset, determine an error value of the machine learning algorithm using a cost function, the object parameter, and the ground truth object parameter and adjust the machine learning algorithm values to reduce the error value.
Type: Application
Filed: July 14, 2021
Publication date: June 23, 2022
Inventors: Chulong Chen, Wenling Margaret Huang, Saiveena Kesaraju, Ivan Simões Gaspar, Pradyumna S. Singh, Biji George, Dipan Kumar Mandal, Om Ji Omer, Sreenivas Subramoney, Yuval Amizur, Leor Banin, Hao Chen, Nir Dvorecki, Shengbo Xu
-
Patent number: 11347828
Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
Type: Grant
Filed: March 27, 2020
Date of Patent: May 31, 2022
Assignee: Intel Corporation
Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
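The broadcast/unicast dataflow above can be emulated step by step: one operand set is broadcast along array rows, the other is delivered per array location, and adders reduce the multiplier outputs into the result. This is a software emulation under assumed naming, not the patented datapath:

```python
# Illustrative emulation of the 2D multiplier array: for each reduction
# step k, A's column k is broadcast across the array rows, B's row k is
# unicast per column, and the per-location products are summed (the adders).

def array_matmul(A, B):
    """Compute C = A @ B via an emulated broadcast/unicast MAC array."""
    rows, inner = len(A), len(A[0])
    cols = len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for k in range(inner):
        a_col = [A[r][k] for r in range(rows)]  # broadcast to each array row
        b_row = B[k]                            # unicast to each array column
        for r in range(rows):
            for c in range(cols):
                # multiplier at location (r, c); adder accumulates into C
                C[r][c] += a_col[r] * b_row[c]
    return C
```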
-
Publication number: 20220148311
Abstract: Systems, apparatuses and methods may provide for technology that identifies a plurality of segments based on semantic features and instance features associated with a scene, fuses the plurality of segments into a plurality of instances, and selects classification labels for the plurality of instances. In one example, the plurality of segments is fused into the plurality of instances via a learnable self-attention based network.
Type: Application
Filed: January 24, 2022
Publication date: May 12, 2022
Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
-
Publication number: 20220113974
Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more "supertiles" of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
Type: Application
Filed: December 23, 2021
Publication date: April 14, 2022
Applicant: INTEL CORPORATION
Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
-
Publication number: 20220114234
Abstract: A matrix processing engine is provided for efficient matrix computation performed by a dense matrix compute circuit (performing SIMD operations) and a scalar computing core (performing SISD operations). These two processing components operate together to produce output data tiles by feeding results of the dense SIMD operations to the scalar computing core using thread packing and an in-line buffer for accumulating and packing the dense result data. This permits the scalar computing to spawn threads to operate on the dense results as available and without requiring partial or intermediate data read/writes between the dense and scalar computations.
Type: Application
Filed: December 22, 2021
Publication date: April 14, 2022
Applicant: INTEL CORPORATION
Inventors: Biji George, Sreenivas Subramoney, Om Ji Omer, Anoop Viswam
-
Publication number: 20220084310
Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
Type: Application
Filed: November 24, 2021
Publication date: March 17, 2022
Applicant: Intel Corporation
Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
-
Publication number: 20210111722
Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
Type: Application
Filed: December 22, 2020
Publication date: April 15, 2021
Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
-
Publication number: 20210110187
Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
Type: Application
Filed: December 22, 2020
Publication date: April 15, 2021
Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Publication number: 20210090328
Abstract: Systems, apparatuses and methods provide technology for optimizing processing of sparse data, such as 3D pointcloud data sets. The technology may include generating a locality-aware rulebook based on an input unstructured sparse data set, such as a 3D pointcloud data set, the locality-aware rulebook storing spatial neighborhood information for active voxels in the input unstructured sparse data set, computing an average receptive field (ARF) value based on the locality-aware rulebook, and determining, from a plurality of tile size and loop order combinations, a tile size and loop order combination for processing the unstructured sparse data based on the computed ARF value. The technology may also include providing the locality-aware rulebook and the tile size and loop order combination to a compute engine such as a neural network, the compute engine to process the unstructured sparse data using the locality-aware rulebook and the tile size and loop order combination.
Type: Application
Filed: December 7, 2020
Publication date: March 25, 2021
Inventors: Prashant Laddha, Anirud Thyagharajan, Om Ji Omer, Sreenivas Subramoney
-
Publication number: 20200320375
Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
Type: Application
Filed: June 23, 2020
Publication date: October 8, 2020
Applicant: Intel Corporation
Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking