Patents by Inventor Om Ji Omer

Om Ji Omer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250021819
    Abstract: Systems, apparatus, articles of manufacture, and methods for quality and capacity-aware grouped query attention are disclosed. To accomplish such groupings, example instructions cause a machine to create a plurality of groups of query heads present in a key value cache using an evolutionary algorithm based on at least two objectives, quantify an amount of error introduced by a first group of query heads in the plurality of groups of query heads, and retain the query heads of the first group of query heads in a non-grouped arrangement when the error meets an error threshold.
    Type: Application
    Filed: September 27, 2024
    Publication date: January 16, 2025
    Applicant: Intel Corporation
    Inventors: Vinay Joshi, Om Ji Omer, Prashant Laddha, Shambhavi Sinha
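The grouping-with-fallback idea in this abstract can be sketched in a few lines. The sketch below (illustrative only; the patent uses an evolutionary, multi-objective search that is not reproduced here) merges each proposed group of key/value heads into their mean, quantifies the error that merge introduces, and retains the heads non-grouped when the error meets a threshold:

```python
import numpy as np

def group_error(kv_heads, group):
    """Error introduced by replacing a group of KV heads with their mean."""
    merged = kv_heads[group].mean(axis=0)
    return float(np.linalg.norm(kv_heads[group] - merged))

def apply_grouping(kv_heads, groups, error_threshold):
    """Merge each proposed group unless its merge error meets the threshold,
    in which case its heads are retained in a non-grouped arrangement."""
    result = []
    for group in groups:
        if group_error(kv_heads, group) >= error_threshold:
            result.extend([h] for h in group)  # keep heads ungrouped
        else:
            result.append(list(group))
    return result
```

Here `kv_heads` is a hypothetical array of per-head parameters; in practice the groups would come from the evolutionary search over quality and capacity objectives.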
  • Patent number: 12140696
    Abstract: According to various embodiments, a radar device is described comprising a processor configured to generate a scene comprising an object based on a plurality of receive wireless signals, generate a ground truth object parameter of the object and generate a dataset representative of the scene and a radar detector configured to determine an object parameter of the object using a machine learning algorithm and the dataset, determine an error value of the machine learning algorithm using a cost function, the object parameter, and the ground truth object parameter and adjust the machine learning algorithm values to reduce the error value.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: November 12, 2024
    Assignee: Intel Corporation
    Inventors: Chulong Chen, Wenling Margaret Huang, Saiveena Kesaraju, Ivan Simões Gaspar, Pradyumna S. Singh, Biji George, Dipan Kumar Mandal, Om Ji Omer, Sreenivas Subramoney, Yuval Amizur, Leor Banin, Hao Chen, Nir Dvorecki, Shengbo Xu
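The training loop described in this abstract (predict an object parameter, score it against generated ground truth with a cost function, adjust the model to reduce the error) follows the standard supervised pattern. A minimal sketch, with a linear model standing in for the machine learning algorithm and synthetic features standing in for the radar scene:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "scene": feature rows derived from received signals, plus the
# generated ground-truth object parameter (illustrative stand-ins only).
X = rng.normal(size=(64, 4))
true_w = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ true_w

def cost(w):
    """Squared-error cost between predicted and ground-truth parameters."""
    return float(np.mean((X @ w - y) ** 2))

def train_step(w, lr=0.05):
    """Adjust the model values to reduce the error value (gradient step)."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

w = np.zeros(4)
for _ in range(200):
    w = train_step(w)
```

After the loop, `cost(w)` is far below the initial cost, mirroring the abstract's "adjust the machine learning algorithm values to reduce the error value."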
  • Publication number: 20240289168
    Abstract: Systems, apparatuses and methods may provide for technology that identifies a type of a first activation function, identifies a derivative level of the first activation function, and generates a first instruction based on the type of the first activation function and the derivative level of the first activation function. The technology also includes an accelerator having logic coupled to one or more substrates, the logic including a compute engine including a plurality of arithmetic operators, a multiplexer network coupled to the compute engine, and a controller coupled to the multiplexer network, the controller to detect the first instruction, decode the first instruction to identify the first activation function, and drive the multiplexer network to form first connections between two or more of the plurality of arithmetic operators in accordance with the first activation function, wherein the first connections are to cause the compute engine to conduct the first activation function.
    Type: Application
    Filed: November 10, 2023
    Publication date: August 29, 2024
    Inventors: Krishnan Ananthanarayanan, Martin Langhammer, Om Ji Omer, Bogdan Pasca, Kamlesh Pillai, Pramod Udupa
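A software analogue of the controller and multiplexer network can illustrate the decode step: the instruction's activation type and derivative level select how primitive arithmetic operators are connected. The activation names and the two-case encoding below are illustrative, not the patent's actual instruction format:

```python
def decode_activation(op_type, derivative_level):
    """Toy decode: map (activation type, derivative level) to a function
    composed from primitive operators (compare, select, multiply)."""
    if op_type == "relu":
        if derivative_level == 0:
            return lambda x: max(x, 0.0)              # compare + select
        return lambda x: 1.0 if x > 0.0 else 0.0      # first derivative
    if op_type == "square":
        if derivative_level == 0:
            return lambda x: x * x                    # multiplier reused
        return lambda x: 2.0 * x                      # first derivative
    raise ValueError(f"unknown activation: {op_type}")
```

In the hardware described, the same selection would instead drive multiplexers to form physical connections between arithmetic operators in the compute engine.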
  • Publication number: 20240201949
    Abstract: Systems, apparatuses and methods may provide for technology that includes a compute-in-memory (CiM) enabled memory array to conduct digital bit-serial multiply and accumulate (MAC) operations on multi-bit input data and weight data stored in the CiM enabled memory array, an adder tree coupled to the CiM enabled memory array, an accumulator coupled to the adder tree, and an input bit selection stage coupled to the CiM enabled memory array, wherein the input bit selection stage restricts serial bit selection on the multi-bit input data to non-zero values during the digital MAC operations.
    Type: Application
    Filed: February 28, 2024
    Publication date: June 20, 2024
    Inventors: Sagar Varma Sayyaparaju, Om Ji Omer, Sreenivas Subramoney
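The key saving in this scheme is that the serial bit selection visits only the non-zero bits of each input. A minimal software sketch of a digital bit-serial MAC with that restriction (the adder tree and accumulator collapse into a single running sum here):

```python
def bit_serial_mac(inputs, weights):
    """Consume each multi-bit input one set bit at a time: for set bit b of
    input x, add (w << b) into the accumulator. Zero bits are never
    selected, mirroring the input bit selection stage."""
    acc = 0
    for x, w in zip(inputs, weights):
        b = 0
        while x:
            if x & 1:           # only non-zero bit positions are selected
                acc += w << b   # shifted contribution into the adder tree
            x >>= 1
            b += 1
    return acc
```

An input of all zeros contributes no cycles at all, which is where the sparsity benefit comes from.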
  • Patent number: 11949414
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 2, 2024
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
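The arithmetic behind this abstract — a lookup table of four-bit products combined by shift-and-add logic — can be demonstrated with a plain nibble decomposition. This is a functional sketch of the math, not of the subarray layout:

```python
# 16x16 lookup table of all products of four-bit numbers,
# as stored in the memory subarray.
LUT = [[r * c for c in range(16)] for r in range(16)]

def lut_multiply_8bit(a, b):
    """Multiply two unsigned 8-bit operands using only 4-bit LUT lookups
    plus shift-and-add: a*b = (16*a_hi + a_lo) * (16*b_hi + b_lo)."""
    a_lo, a_hi = a & 0xF, a >> 4
    b_lo, b_hi = b & 0xF, b >> 4
    return (LUT[a_lo][b_lo]
            + (LUT[a_hi][b_lo] << 4)   # shift by element position
            + (LUT[a_lo][b_hi] << 4)
            + (LUT[a_hi][b_hi] << 8))
```

In the patent, the multiplexer selects LUT rows/columns from operand elements and the partial products accumulate in per-subarray storage; here the four lookups and shifts are simply summed.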
  • Patent number: 11875555
    Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
    Type: Grant
    Filed: November 24, 2021
    Date of Patent: January 16, 2024
    Assignee: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
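The self-confidence signal described here — the margin between the highest- and second-highest-predicted class — is easy to compute per region. A short sketch:

```python
import numpy as np

def self_confidence(probs):
    """Margin between the highest- and second-highest-predicted class for
    each region; small margins flag low-confidence regions for additional
    training. probs: (..., num_classes) array of class predictions."""
    top2 = np.sort(probs, axis=-1)[..., -2:]   # two largest per region
    return top2[..., 1] - top2[..., 0]

probs = np.array([[0.7, 0.2, 0.1],     # confident region
                  [0.4, 0.35, 0.25]])  # low-confidence region
margins = self_confidence(probs)
low_conf_mask = margins < 0.1          # candidate regions for focused training
```

Regions flagged by the mask could then drive the modified training data or the reweighted classification loss the abstract describes; the `0.1` cutoff is an arbitrary illustrative value.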
  • Publication number: 20240005135
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Application
    Filed: April 18, 2023
    Publication date: January 4, 2024
    Applicant: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
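The re-encoding idea — one higher-precision signed number expressed as two lower-precision signed numbers, with zero pieces skipped — can be shown with a simplified integer analogue (the patent covers general precision formats; 8-bit-to-two-4-bit below is illustrative only):

```python
def reencode(x):
    """Split a signed 8-bit value into two signed 4-bit digits (hi, lo)
    with x == hi * 16 + lo."""
    lo = ((x + 8) & 0xF) - 8   # signed remainder in [-8, 7]
    hi = (x - lo) >> 4         # signed quotient; fits 4 bits for x in [-136, 119]
    return hi, lo

def low_precision_multiply(x, y):
    """Compute x * y using only the narrow digits of x: two low-precision
    multiplies plus a shift, with zero digits skipped (sparsity)."""
    hi, lo = reencode(x)
    acc = 0
    if hi:                     # reduce computing on zero values
        acc += (hi * y) << 4
    if lo:
        acc += lo * y
    return acc
```

The two narrow multiplies recombine exactly because `x == hi * 16 + lo`; the sparsity check simply drops a multiply whenever a digit is zero.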
  • Publication number: 20230333999
    Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory address (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.
    Type: Application
    Filed: June 22, 2023
    Publication date: October 19, 2023
    Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney
  • Patent number: 11783170
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: October 10, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11714998
    Abstract: An apparatus to facilitate accelerating neural networks with low precision-based multiplication and exploiting sparsity in higher order bits is disclosed. The apparatus includes a processor comprising a re-encoder to re-encode a first input number of signed input numbers represented in a first precision format as part of a machine learning model, the first input number re-encoded into two signed input numbers of a second precision format, wherein the first precision format is a higher precision format than the second precision format. The processor further includes a multiply-add circuit to perform operations in the first precision format using the two signed input numbers of the second precision format; and a sparsity hardware circuit to reduce computing on zero values at the multiply-add circuit, wherein the processor to execute the machine learning model using the re-encoder, the multiply-add circuit, and the sparsity hardware circuit.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: August 1, 2023
    Assignee: Intel Corporation
    Inventors: Avishaii Abuhatzera, Om Ji Omer, Ritwika Chowdhury, Lance Hacking
  • Publication number: 20230205692
    Abstract: Apparatus and method for leveraging simultaneous multithreading for bulk compute operations. For example, one embodiment of a processor comprises: a plurality of cores including a first core to simultaneously process instructions of a plurality of threads; a cache hierarchy coupled to the first core and the memory, the cache hierarchy comprising a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache; and a plurality of compute units coupled to the first core including a first compute unit associated with the L1 cache, a second compute unit associated with the L2 cache, and a third compute unit associated with the L3 cache, wherein the first core is to offload instructions for execution by the compute units, the first core to offload instructions from a first thread to the first compute unit, instructions from a second thread to the second compute unit, and instructions from a third thread to the third compute unit.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: Anant Nori, Rahul Bera, Shankar Balachandran, Joydeep Rakshit, Om Ji Omer, Sreenivas Subramoney, Avishaii Abuhatzera, Belliappa Kuttanna

  • Publication number: 20230169315
    Abstract: Systems, apparatuses and methods may provide for technology that generates a vector output based on a bitmask, wherein the vector output includes non-zero bit indices in a first portion of the vector output, and wherein the non-zero bit indices correspond to non-zero values in the bitmask. The technology may also generate an offset based on the bitmask, wherein the offset indicates a start position in the vector output for the non-zero bit indices.
    Type: Application
    Filed: September 1, 2022
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Om Ji Omer
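One plausible reading of this abstract is that set-bit indices from several bitmasks are packed into a single vector output, with each offset recording the start position of that bitmask's indices. A hedged sketch of that layout (the patent's exact encoding may differ):

```python
def compress_bitmasks(bitmasks, width):
    """Pack the set-bit indices of each bitmask into one vector output;
    offsets[i] is the start position in the vector for bitmask i's
    non-zero bit indices."""
    vector, offsets = [], []
    for mask in bitmasks:
        offsets.append(len(vector))   # start position for this mask's indices
        vector.extend(i for i in range(width) if (mask >> i) & 1)
    return vector, offsets
```

This kind of index packing is a common front end for sparse compute: downstream logic reads only the listed non-zero positions instead of scanning the full bitmask.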
  • Publication number: 20230169319
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11620818
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 4, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Publication number: 20220196798
    Abstract: According to various embodiments, a radar device is described comprising a processor configured to generate a scene comprising an object based on a plurality of receive wireless signals, generate a ground truth object parameter of the object and generate a dataset representative of the scene and a radar detector configured to determine an object parameter of the object using a machine learning algorithm and the dataset, determine an error value of the machine learning algorithm using a cost function, the object parameter, and the ground truth object parameter and adjust the machine learning algorithm values to reduce the error value.
    Type: Application
    Filed: July 14, 2021
    Publication date: June 23, 2022
    Inventors: Chulong Chen, Wenling Margaret Huang, Saiveena Kesaraju, Ivan Simões Gaspar, Pradyumna S. Singh, Biji George, Dipan Kumar Mandal, Om Ji Omer, Sreenivas Subramoney, Yuval Amizur, Leor Banin, Hao Chen, Nir Dvorecki, Shengbo Xu
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
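The dataflow in this abstract — one operand broadcast along rows of a 2D multiplier array, the other unicast per column, with column adders reducing the products — is the familiar outer-product formulation of matrix multiply. A functional sketch (NumPy stands in for the hardware array):

```python
import numpy as np

def broadcast_unicast_matmul(A, B):
    """A: (M, K), B: (K, N). For each step k, column A[:, k] is broadcast
    to every array column while row B[k, :] is unicast one value per
    column; the column adders accumulate the resulting products."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))
    for k in range(K):
        acc += np.outer(A[:, k], B[k, :])   # all multipliers fire in parallel
    return acc
```

Each loop iteration models one broadcast/unicast step of the compute engine; the accumulator plays the role of the adder columns.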
  • Publication number: 20220148311
    Abstract: Systems, apparatuses and methods may provide for technology that identifies a plurality of segments based on semantic features and instance features associated with a scene, fuses the plurality of segments into a plurality of instances, and selects classification labels for the plurality of instances. In one example, the plurality of segments is fused into the plurality of instances via a learnable self-attention based network.
    Type: Application
    Filed: January 24, 2022
    Publication date: May 12, 2022
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer
  • Publication number: 20220114234
    Abstract: A matrix processing engine is provided for efficient matrix computation performed by a dense matrix compute circuit (performing SIMD operations) and a scalar computing core (performing SISD operations). These two processing components operate together to produce output data tiles by feeding results of the dense SIMD operations to the scalar computing core using thread packing and an in-line buffer for accumulating and packing the dense result data. This permits the scalar computing to spawn threads to operate on the dense results as available and without requiring partial or intermediate data read/writes between the dense and scalar computations.
    Type: Application
    Filed: December 22, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Biji George, Sreenivas Subramoney, Om Ji Omer, Anoop Viswam
  • Publication number: 20220113974
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
  • Publication number: 20220084310
    Abstract: A computer model is trained to classify regions of a space (e.g., a pixel of an image or a voxel of a point cloud) according to a multi-label classification. To improve the model's accuracy, the model's self-confidence is determined with respect to its own predictions of regions in a training space. The self-confidence is determined based on the class predictions, such as a difference between the highest-predicted class and a second-highest-predicted class. When these are similar, it may reflect areas for potential improvement by focusing training on these low-confidence areas. Additional training may be performed by including modified training data in subsequent training iterations that focuses on low-confidence areas. As another example, additional training may be performed using the self-confidence to modify a classification loss used to refine parameters of the model.
    Type: Application
    Filed: November 24, 2021
    Publication date: March 17, 2022
    Applicant: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Benjamin Ummenhofer, Om Ji Omer