Patents by Inventor Bharadwaj Pudipeddi
Bharadwaj Pudipeddi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210232451
Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model in the artificial intelligence processor, wherein the model corresponds to a same model processed by the plurality of artificial intelligence processors during a previous processing iteration by the plurality of artificial intelligence processors on data from the data set.
Type: Application
Filed: March 27, 2020
Publication date: July 29, 2021
Inventors: Bharadwaj Pudipeddi, Maral Mesmakhosroshahi, Jinwen Xi, Saurabh M. Kulkarni, Marc Tremblay, Matthias Baenninger, Nuno Claudino Pereira Lopes
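The restart-and-reload flow described in the abstract can be sketched in software as a retry loop that restores the model from the previous completed iteration (a simplified stand-in for the hardware mechanism; `step`, the dict-based model, and the toy failure are assumptions for illustration):

```python
def run_with_recovery(step, model, batches, max_retries=3):
    """Process batches with step(model, batch); on a computing error,
    reload the model that completed the previous iteration and retry."""
    checkpoint = dict(model)
    for batch in batches:
        for _ in range(max_retries):
            try:
                model = step(dict(checkpoint), batch)
                checkpoint = dict(model)   # model as of this completed iteration
                break
            except RuntimeError:
                continue                   # restart: retry from the reloaded model
        else:
            raise RuntimeError("unrecoverable after retries")
    return model

failures = {"left": 1}
def step(m, b):
    """Toy training step that fails once on batch 2 (assumed for the demo)."""
    if b == 2 and failures["left"]:
        failures["left"] -= 1
        raise RuntimeError("transient compute error")
    return {"sum": m["sum"] + b}

recovered = run_with_recovery(step, {"sum": 0}, [1, 2, 3])
```

The transient failure on the second batch is absorbed by reloading the checkpoint taken after the first batch, so processing completes as if no error occurred.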
-
Publication number: 20210064986
Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
Type: Application
Filed: September 3, 2019
Publication date: March 4, 2021
Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
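The field extraction and LUT construction described above can be illustrated in plain Python: split each float32 into its exponent and mantissa bit fields, then deduplicate each set into a table plus per-element indices. This is a minimal sketch of the idea, not the patented encoder; `build_lut` and the toy weight list are assumptions for the example:

```python
import struct

def extract_fields(value):
    """Split a float32 into its 8-bit biased exponent and 23-bit mantissa."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return (bits >> 23) & 0xFF, bits & 0x7FFFFF

def build_lut(values):
    """Deduplicate values into a lookup table plus per-element index values."""
    lut = sorted(set(values))
    index = {v: i for i, v in enumerate(lut)}
    return lut, [index[v] for v in values]

weights = [0.5, 0.25, 0.5, 1.5]               # toy parameters (assumed)
exponents, mantissas = zip(*(extract_fields(w) for w in weights))
exp_lut, exp_idx = build_lut(exponents)        # exponent LUT + index values
man_lut, man_idx = build_lut(mantissas)        # mantissa LUT + index values
```

Because many trained weights share exponent and mantissa patterns, the tables are typically far smaller than the raw parameter array, and each original value is recoverable by reassembling the indexed fields.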
-
Publication number: 20210019634
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
Type: Application
Filed: September 30, 2019
Publication date: January 21, 2021
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
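The portion-at-a-time execution paradigm can be sketched as a loop that fetches one layer, runs it, and releases it before fetching the next, so only one portion's weights occupy device memory at any moment. The dict-based `parameter_server` and the toy layers are stand-ins assumed for illustration, not the real server protocol:

```python
def run_in_portions(parameter_server, order, x):
    """Execute a model one portion (layer) at a time on a constrained device."""
    for name in order:
        layer = parameter_server[name]  # download this portion from the server
        x = layer(x)                    # execute it on the target device
        del layer                       # free device memory for the next portion
    return x

# Hypothetical server holding the master copy as named callables.
server = {
    "dense1": lambda v: 2 * v,
    "relu": lambda v: max(v, 0),
    "dense2": lambda v: v - 3,
}
y = run_in_portions(server, ["dense1", "relu", "dense2"], 5)
```

The peak memory requirement is set by the largest single portion rather than by the whole model, which is what makes execution of a large model feasible on a small device.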
-
Publication number: 20210019152
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
Type: Application
Filed: September 30, 2019
Publication date: January 21, 2021
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
-
Publication number: 20210019151
Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be manually or automatically adjusted to reduce the communication overhead.
Type: Application
Filed: September 20, 2019
Publication date: January 21, 2021
Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Gautham Popuri, Layali Rashid, Tiyasa Mitra, III, Mohit Mittal, Maral Mesmakhosroshahi
-
Publication number: 20200342288
Abstract: A distributed training system including a parameter server is configured to compress the weight matrices according to a clustering algorithm, and the compressed representation of each weight matrix may thereafter be distributed to training workers. The compressed representation may comprise a centroid index matrix and a centroid table, wherein each element of the centroid index matrix corresponds to an element of the corresponding weight matrix and comprises an index into the centroid table, and wherein each element of the centroid table comprises a centroid value. In a further example aspect, a training worker may compute an activation result directly from the compressed representation of a weight matrix and a training data matrix by performing gather-reduce-add operations that accumulate all the elements of the training data matrix that correspond to the same centroid value to generate partial sums, multiplying each partial sum by its corresponding centroid value, and summing the resulting products.
Type: Application
Filed: September 26, 2019
Publication date: October 29, 2020
Inventors: Jinwen Xi, Bharadwaj Pudipeddi
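The gather-reduce-add computation the abstract describes amounts to grouping input elements by centroid before multiplying, so each centroid value is multiplied only once per row instead of once per weight. A minimal sketch follows; the centroid table and index values are invented for the example:

```python
def compressed_dot(index_row, centroid_table, x):
    """Dot product of a centroid-compressed weight row with input vector x.

    Gather-reduce-add: every x element whose weight maps to the same
    centroid is accumulated into one partial sum, then each partial sum
    is multiplied by its centroid value and the products are summed."""
    partial = [0.0] * len(centroid_table)
    for c, xi in zip(index_row, x):
        partial[c] += xi                            # gather-reduce-add
    return sum(p * v for p, v in zip(partial, centroid_table))

centroid_table = [0.5, 2.0]      # two centroid values (toy, assumed)
index_row = [0, 1, 0, 1]         # compressed form of w = [0.5, 2.0, 0.5, 2.0]
result = compressed_dot(index_row, centroid_table, [1.0, 2.0, 3.0, 4.0])
```

With k centroids the row needs k multiplications instead of one per element, and the worker never has to materialize the uncompressed weight matrix.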
-
Patent number: 10627798
Abstract: In an embodiment of the invention, an apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; a field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA; wherein the apparatus triggers a switch of an FPGA image in the FPGA to another FPGA image. In another embodiment of the invention, a method comprises: triggering, by an apparatus, a switch of an FPGA image in a field programmable gate array (FPGA) to another FPGA image; wherein the apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; the field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA.
Type: Grant
Filed: June 29, 2018
Date of Patent: April 21, 2020
Assignee: BiTMICRO Networks, Inc.
Inventors: Federico Sambilay, Jr., Bharadwaj Pudipeddi, Richard A. Cantong, Joevanni Parairo
-
Publication number: 20190155735
Abstract: In an embodiment of the invention, an apparatus comprises: a central processing unit (CPU); a volatile memory controller; a non-volatile memory controller; a volatile memory coupled to the volatile memory controller; and a non-volatile memory coupled to the non-volatile memory controller; wherein a ratio of the non-volatile memory to the volatile memory is much less than a typical ratio. In another embodiment of the invention, a method comprises: receiving, by a central processing unit (CPU), a command; evaluating, by the CPU, the command; executing, by the CPU, a data software assist to perform the command or activating, by the CPU, a hardware accelerator module to perform the command; and responding, by the CPU, to the command.
Type: Application
Filed: June 29, 2018
Publication date: May 23, 2019
Inventors: Bharadwaj Pudipeddi, Richard A. Cantong, Marlon B. Verdan, Joevanni Parairo, Marvin Fenol
-
Publication number: 20190158384
Abstract: In an embodiment of the invention, an apparatus comprises: a requestor configured to transmit a first operand and a second operand, wherein the first operand is partitioned; a shared network configured to transmit the operands; a processing load balancer for receiving the operands; a plurality of processing elements that are configured to process the operands; and a private network configured to multicast the operands to the processing elements. In another embodiment of the invention, a method comprises: transmitting a first operand and a second operand from a requestor, wherein the first operand is partitioned; transmitting the operands along a shared network; receiving the operands by a processing load balancer; multicasting the operands by a private network; and processing the operands by a plurality of processing elements.
Type: Application
Filed: June 25, 2018
Publication date: May 23, 2019
Inventors: Bharadwaj Pudipeddi, Federico Sambilay, Richard A. Cantong
-
Publication number: 20190129882
Abstract: Disclosed techniques include platform optimization for multi-platform module design for performance scalability. A compute platform pluggable module form factor and functionality is obtained, where the form factor enables single socket plugging within a plurality of sockets on a compute platform. The form factor employs electrical connections in each socket. A scaling form factor commensurate with adjacent sockets on the compute platform is established. The adjacent sockets each provide similar functionality for modules, and the adjacent sockets can be used interchangeably without loss of functionality of the compute platform. A single, integrated, rigid module is provided according to the scaling form factor that plugs into the adjacent sockets of the compute platform. The module provides expanded functionality over a single-plug form factor module. The expanded functionality is enabled through use of the electrical connections of the adjacent sockets.
Type: Application
Filed: October 30, 2018
Publication date: May 2, 2019
Inventors: Bharadwaj Pudipeddi, Anthony Gallippi, Vijay Devadiga
-
Patent number: 10216596
Abstract: Embodiments of the invention provide a system and method to vastly improve the remote write latency (write to remote server) and to reduce the load that is placed on the remote server by issuing auto-log (automatic log) writes through an integrated networking port in the SSD (solid state drive). Embodiments of the invention also provide a system and method for a PCI-e attached SSD to recover after a failure detection by appropriating a remote namespace.
Type: Grant
Filed: December 31, 2016
Date of Patent: February 26, 2019
Assignee: BiTMICRO Networks, Inc.
Inventor: Bharadwaj Pudipeddi
-
Publication number: 20190018386
Abstract: In an embodiment of the invention, an apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; a field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA; wherein the apparatus triggers a switch of an FPGA image in the FPGA to another FPGA image. In another embodiment of the invention, a method comprises: triggering, by an apparatus, a switch of an FPGA image in a field programmable gate array (FPGA) to another FPGA image; wherein the apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; the field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA.
Type: Application
Filed: June 29, 2018
Publication date: January 17, 2019
Inventors: Federico Sambilay Jr., Bharadwaj Pudipeddi, Richard A. Cantong, Joevanni Parairo
-
Patent number: 10007561
Abstract: The invention is an apparatus for dynamic provisioning available as a multi-mode device that can be dynamically configured for balancing between storage performance and hardware acceleration resources on reconfigurable hardware such as an FPGA. An embodiment of the invention provides a cluster of these multi-mode devices that form a group of resilient storage and acceleration elements without requiring a dedicated standby storage spare. Yet another embodiment of the invention provides an interconnection network attached cluster configured to dynamically provision full acceleration and storage resources to meet an application's needs and end-of-life requirements of an SSD.
Type: Grant
Filed: April 10, 2017
Date of Patent: June 26, 2018
Assignee: BiTMICRO Networks, Inc.
Inventors: Bharadwaj Pudipeddi, Jeffrey Bunting, Lihan Chang
-
Patent number: 7813288
Abstract: A transaction detection device is described having inputs to couple to communication lines. Each communication line is to transport notification of a packet observed by a link probe within a computing system containing point-to-point links between nodes, each of said nodes having at least one processing core. The transaction detection device also comprises logic circuitry to determine from said notifications whether a looked-for transaction has occurred within said computing system.
Type: Grant
Filed: November 21, 2005
Date of Patent: October 12, 2010
Assignee: Intel Corporation
Inventors: Robert Roth, Bharadwaj Pudipeddi, Richard Glass, Madhu Athreya
-
Patent number: 7779210
Abstract: In one embodiment, the present invention includes a method for receiving a request for data in a home agent of a system from a first agent, prefetching the data from a memory and accessing a directory entry to determine whether a copy of the data is cached in any system agent, and forwarding the data to the first agent without waiting for snoop responses from other system agents if the directory entry indicates that the data is not cached. Other embodiments are described and claimed.
Type: Grant
Filed: October 31, 2007
Date of Patent: August 17, 2010
Assignee: Intel Corporation
Inventors: Bharadwaj Pudipeddi, Ghassan Khadder
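The latency win in the abstract comes from overlapping the memory prefetch with the directory lookup and skipping the snoop round-trip when the directory shows the line is uncached. A toy model of that decision follows; the dict-based `home_agent` structure and return labels are assumptions for illustration, not the hardware design:

```python
def handle_read(home_agent, requester, addr):
    """Home-agent read handling: prefetch the line from memory while the
    directory entry is checked; if no other agent caches a copy, forward
    the data without waiting for snoop responses."""
    data = home_agent["memory"][addr]                   # prefetch from memory
    sharers = home_agent["directory"].get(addr, set())  # directory lookup
    if not (sharers - {requester}):
        return data, "forward-early"                    # no snoops needed
    return data, "wait-for-snoops"                      # a cached copy may exist

agent = {"memory": {0x40: 7}, "directory": {0x40: set()}}
result = handle_read(agent, "core0", 0x40)
```

When no directory entry lists another sharer, the requester sees memory latency alone; snoop responses are only awaited when a cached copy might hold newer data.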
-
Patent number: 7765352
Abstract: A power control unit (PCU) may reduce the core wake-up latency in a computer system by concurrently waking-up the remaining cores after the first core is woken-up. The power control unit may detect arrival of a first, second, and a third interrupt directed at a first, second, and a third core. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after waking-up of the first core is complete. The power control unit may then wake-up the second and the third core concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may at least equal twice the time required for a first credit to be returned and next credit to be accepted.
Type: Grant
Filed: September 1, 2009
Date of Patent: July 27, 2010
Assignee: Intel Corporation
Inventors: Bharadwaj Pudipeddi, James S. Burns
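The wake-up policy above reduces to a timing check: if the second interrupt lands within a window after the first core finishes waking, the remaining cores are woken together rather than serially. A sketch of that decision logic follows; the function name, list encoding, and unit-less times are assumptions made for illustration:

```python
def plan_wakeup(irq_times, first_wake_done, window):
    """Choose which pending cores to wake now.

    irq_times[0] is the first interrupt (its core is already being woken);
    the rest are interrupts for the second, third, ... cores. If the second
    interrupt arrives within `window` after the first core's wake-up
    completes, wake all remaining cores concurrently; otherwise wake only
    the second core and handle the rest serially."""
    pending = irq_times[1:]
    if len(pending) > 1 and pending[0] <= first_wake_done + window:
        return pending          # concurrent wake-up of the remaining cores
    return pending[:1]          # serial wake-up: second core only, for now

# Second interrupt at t=12 falls within the window after the first core
# finishes waking at t=10, so cores two and three wake together.
plan = plan_wakeup([0, 12, 15], first_wake_done=10, window=4)
```

Waking the stragglers concurrently overlaps their wake-up latencies instead of stacking them end to end, which is where the latency reduction comes from.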
-
Publication number: 20090319712
Abstract: A power control unit (PCU) may reduce the core wake-up latency in a computer system by concurrently waking-up the remaining cores after the first core is woken-up. The power control unit may detect arrival of a first, second, and a third interrupt directed at a first, second, and a third core. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after waking-up of the first core is complete. The power control unit may then wake-up the second and the third core concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may at least equal twice the time required for a first credit to be returned and next credit to be accepted.
Type: Application
Filed: September 1, 2009
Publication date: December 24, 2009
Inventors: Bharadwaj Pudipeddi, James S. Burns
-
Patent number: 7603504
Abstract: A power control unit (PCU) may reduce the core wake-up latency in a computer system by concurrently waking-up the remaining cores after the first core is woken-up. The power control unit may detect arrival of a first, second, and a third interrupt directed at a first, second, and a third core. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after waking-up of the first core is complete. The power control unit may then wake-up the second and the third core concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may at least equal twice the time required for a first credit to be returned and next credit to be accepted.
Type: Grant
Filed: December 18, 2007
Date of Patent: October 13, 2009
Assignee: Intel Corporation
Inventors: Bharadwaj Pudipeddi, James S. Burns
-
Publication number: 20090158068
Abstract: A power control unit (PCU) may reduce the core wake-up latency in a computer system by concurrently waking-up the remaining cores after the first core is woken-up. The power control unit may detect arrival of a first, second, and a third interrupt directed at a first, second, and a third core. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after waking-up of the first core is complete. The power control unit may then wake-up the second and the third core concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may at least equal twice the time required for a first credit to be returned and next credit to be accepted.
Type: Application
Filed: December 18, 2007
Publication date: June 18, 2009
Inventors: Bharadwaj Pudipeddi, James S. Burns
-
Publication number: 20090113139
Abstract: In one embodiment, the present invention includes a method for receiving a request for data in a home agent of a system from a first agent, prefetching the data from a memory and accessing a directory entry to determine whether a copy of the data is cached in any system agent, and forwarding the data to the first agent without waiting for snoop responses from other system agents if the directory entry indicates that the data is not cached. Other embodiments are described and claimed.
Type: Application
Filed: October 31, 2007
Publication date: April 30, 2009
Inventors: Bharadwaj Pudipeddi, Ghassan Khadder