Patents by Inventor Bharadwaj Pudipeddi

Bharadwaj Pudipeddi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210232451
    Abstract: Embodiments of the present disclosure include an error recovery method comprising detecting a computing error, restarting a first artificial intelligence processor of a plurality of artificial intelligence processors processing a data set, and loading a model in the artificial intelligence processor, wherein the model corresponds to a same model processed by the plurality of artificial intelligence processors during a previous processing iteration by the plurality of artificial intelligence processors on data from the data set.
    Type: Application
    Filed: March 27, 2020
    Publication date: July 29, 2021
    Inventors: Bharadwaj PUDIPEDDI, Maral MESMAKHOSROSHAHI, Jinwen XI, Saurabh M. KULKARNI, Marc TREMBLAY, Matthias BAENNINGER, Nuno CLAUDINO PEREIRA LOPES
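The recovery scheme in this abstract can be sketched in a few lines: restart the failed processor and reload the model snapshot from the previous iteration so all processors resume consistently. This is a minimal illustration, not the patented implementation; the `Worker` class, `recover` function, and checkpoint format are assumptions made for the example.

```python
# Minimal sketch: on a computing error, restart the failed AI processor
# (modeled here as a Worker) and reload the model state from the previous
# processing iteration, so every processor resumes with the same model.
import copy

class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.model = None
        self.failed = False

    def load_model(self, model):
        # Reload a private copy of the shared model snapshot.
        self.model = copy.deepcopy(model)
        self.failed = False

def recover(workers, previous_iteration_model):
    """Restart failed workers with the model from the previous iteration."""
    for w in workers:
        if w.failed:
            w.load_model(previous_iteration_model)
    # All workers now hold the same model and can rejoin the data set.
    return all(w.model == previous_iteration_model for w in workers)
```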
  • Publication number: 20210064986
    Abstract: Systems, methods, and apparatuses are provided for compressing values. A plurality of parameters may be obtained from a memory, each parameter comprising a floating-point number that is used in a relationship between artificial neurons or nodes in a model. A mantissa value and an exponent value may be extracted from each floating-point number to generate a set of mantissa values and a set of exponent values. The set of mantissa values may be compressed to generate a mantissa lookup table (LUT) and a plurality of mantissa LUT index values. The set of exponent values may be encoded to generate an exponent LUT and a plurality of exponent LUT index values. The mantissa LUT, mantissa LUT index values, exponent LUT, and exponent LUT index values may be provided to one or more processing entities to train the model.
    Type: Application
    Filed: September 3, 2019
    Publication date: March 4, 2021
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi, Marc Tremblay
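The mantissa/exponent split described above can be illustrated with Python's `math.frexp` and `math.ldexp`. This sketch only deduplicates each set of values into a LUT plus index values; the patent's actual compression and encoding of the mantissa and exponent sets may differ, so treat the scheme below as an assumption for brevity.

```python
# Illustrative sketch: split each float into mantissa and exponent, then
# replace each set of values with a lookup table (LUT) plus per-parameter
# LUT index values, as the abstract describes.
import math

def compress(params):
    mantissas, exponents = zip(*(math.frexp(p) for p in params))
    m_lut = sorted(set(mantissas))   # mantissa LUT
    e_lut = sorted(set(exponents))   # exponent LUT
    m_idx = [m_lut.index(m) for m in mantissas]  # mantissa LUT indices
    e_idx = [e_lut.index(e) for e in exponents]  # exponent LUT indices
    return m_lut, m_idx, e_lut, e_idx

def decompress(m_lut, m_idx, e_lut, e_idx):
    # Rebuild each float from its mantissa and exponent LUT entries.
    return [math.ldexp(m_lut[i], e_lut[j]) for i, j in zip(m_idx, e_idx)]
```

Because `frexp`/`ldexp` are exact, this particular round trip is lossless; a real implementation would trade some precision for a smaller mantissa LUT.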
  • Publication number: 20210019634
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. This paradigm of executing one portion of the AI model at a time allows for dynamic execution of the large AI model.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 21, 2021
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Jinwen Xi, Maral Mesmakhosroshahi
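The portion-by-portion execution paradigm above can be sketched as a loop that holds only one layer's parameters on the device at a time. The `ParameterServer` class, `fetch` method, and trivially simple "layers" below are all illustrative assumptions, not the patented design.

```python
# Hypothetical sketch: the memory-constrained target device downloads one
# portion (layer) of the model from the parameter server, executes it,
# and frees it before fetching the next portion.

class ParameterServer:
    """Holds the master copy of the model as a list of layer weights."""
    def __init__(self, layers):
        self.layers = layers  # here, one scale factor per layer

    def fetch(self, i):
        return self.layers[i]

def run_model(server, n_layers, x):
    for i in range(n_layers):
        weight = server.fetch(i)   # download one portion of the AI model
        x = x * weight             # execute that portion on the device
        del weight                 # free device memory before the next portion
    return x
```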
  • Publication number: 20210019152
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be adjusted to reduce the communication overhead. Multi-level parallel parameters reduction may be performed at the parameter server and the target device.
    Type: Application
    Filed: September 30, 2019
    Publication date: January 21, 2021
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Sujeeth Subramanya Bharadwaj, Devangkumar Patel, Jinwen Xi, Maral Mesmakhosroshahi
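The microbatch/minibatch grouping above can be sketched as follows: per-microbatch results are accumulated locally and reduced once per minibatch, so communication with the parameter server scales with minibatches rather than samples. The function names and the stand-in "gradient" computation are assumptions for the example.

```python
# Illustrative sketch: divide input samples into microbatches, execute the
# microbatches of one minibatch in sequence, and perform the parameter
# reduction only once per minibatch to cut communication overhead.

def microbatches(samples, micro_size):
    for i in range(0, len(samples), micro_size):
        yield samples[i:i + micro_size]

def train_minibatch(samples, micro_size):
    """Accumulate per-microbatch 'gradients'; reduce once at the end."""
    grad_accum = 0.0
    for mb in microbatches(samples, micro_size):
        grad_accum += sum(mb) / len(mb)   # stand-in for a gradient step
    return grad_accum  # one reduction per minibatch, not per sample
```

Adjusting `micro_size` (manually or automatically) trades per-step memory against the number of accumulation steps per reduction.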
  • Publication number: 20210019151
    Abstract: Methods, systems, apparatuses, and computer program products are described herein that enable execution of a large AI model on a memory-constrained target device that is communicatively connected to a parameter server, which stores a master copy of the AI model. The AI model may be dissected into smaller portions (e.g., layers or sub-layers), and each portion may be executed as efficiently as possible on the target device. After execution of one portion of the AI model is finished, another portion of the AI model may be downloaded and executed at the target device. To improve efficiency, the input samples may be divided into microbatches, and a plurality of microbatches executing in sequential order may form a minibatch. The size of the group of microbatches or minibatch can be manually or automatically adjusted to reduce the communication overhead.
    Type: Application
    Filed: September 20, 2019
    Publication date: January 21, 2021
    Inventors: Bharadwaj Pudipeddi, Marc Tremblay, Gautham Popuri, Layali Rashid, Tiyasa Mitra, III, Mohit Mittal, Maral Mesmakhosroshahi
  • Publication number: 20200342288
    Abstract: A distributed training system including a parameter server is configured to compress the weight matrices according to a clustering algorithm; the compressed representation of a weight matrix may thereafter be distributed to training workers. The compressed representation may comprise a centroid index matrix and a centroid table, wherein each element of the centroid index matrix corresponds to an element of the corresponding weight matrix and comprises an index into the centroid table, and wherein each element of the centroid table comprises a centroid value. In a further example aspect, a training worker may compute an activation result directly from the compressed representation of a weight matrix and a training data matrix by performing gather-reduce-add operations that accumulate all the elements of the training data matrix that correspond to the same centroid value to generate partial sums, multiplying each partial sum by its corresponding centroid value, and summing the resulting products.
    Type: Application
    Filed: September 26, 2019
    Publication date: October 29, 2020
    Inventors: Jinwen Xi, Bharadwaj Pudipeddi
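The gather-reduce-add computation in this abstract has a compact expression: since many weights share a centroid value, a dot product needs only one multiply per centroid. The sketch below shows the idea for a single weight vector; the name `dot_from_compressed` is an assumption, not the patent's terminology.

```python
# Sketch of gather-reduce-add over a centroid-compressed weight vector:
# accumulate the input elements that share a centroid into partial sums,
# then multiply each partial sum by its centroid value and add them up.

def dot_from_compressed(centroid_idx, centroid_table, x):
    partial = [0.0] * len(centroid_table)
    for j, k in enumerate(centroid_idx):
        partial[k] += x[j]                  # gather + reduce-add per centroid
    return sum(c * p for c, p in zip(centroid_table, partial))
```

With `k` centroids and `n` weights this does `n` adds but only `k` multiplies, versus `n` multiplies for the uncompressed dot product.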
  • Patent number: 10627798
    Abstract: In an embodiment of the invention, an apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; a field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA; wherein the apparatus triggers a switch of an FPGA image in the FPGA to another FPGA image. In another embodiment of the invention, a method comprises: triggering, by an apparatus, a switch of an FPGA image in a field programmable gate array (FPGA) to another FPGA image; wherein the apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; the field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: April 21, 2020
    Assignee: BiTMICRO Networks, Inc.
    Inventors: Federico Sambilay, Jr., Bharadwaj Pudipeddi, Richard A. Cantong, Joevanni Parairo
  • Publication number: 20190155735
    Abstract: In an embodiment of the invention, an apparatus comprises: a central processing unit (CPU); a volatile memory controller; a non-volatile memory controller; a volatile memory coupled to the volatile memory controller; and a non-volatile memory coupled to the non-volatile memory controller; wherein the ratio of non-volatile memory to volatile memory is much less than a typical ratio. In another embodiment of the invention, a method comprises: receiving, by a central processing unit (CPU), a command; evaluating, by the CPU, the command; executing, by the CPU, a data software assist to perform the command or activating, by the CPU, a hardware accelerator module to perform the command; and responding, by the CPU, to the command.
    Type: Application
    Filed: June 29, 2018
    Publication date: May 23, 2019
    Inventors: Bharadwaj Pudipeddi, Richard A. Cantong, Marlon B. Verdan, Joevanni Parairo, Marvin Fenol
  • Publication number: 20190158384
    Abstract: In an embodiment of the invention, an apparatus comprises: a requestor configured to transmit a first operand and a second operand, wherein the first operand is partitioned; a shared network configured to transmit the operands; a processing load balancer for receiving the operands; a plurality of processing elements that are configured to process the operands; and a private network configured to multicast the operands to the processing elements. In another embodiment of the invention, a method comprises: transmitting a first operand and a second operand from a requestor, wherein the first operand is partitioned; transmitting the operands along a shared network; receiving the operands by a processing load balancer; multicasting the operands by a private network; and processing the operands by a plurality of processing elements.
    Type: Application
    Filed: June 25, 2018
    Publication date: May 23, 2019
    Inventors: Bharadwaj Pudipeddi, Federico Sambilay, Richard A. Cantong
  • Publication number: 20190129882
    Abstract: Disclosed techniques include platform optimization for multi-platform module design for performance scalability. A compute platform pluggable module form factor and functionality are obtained, where the form factor enables single-socket plugging within a plurality of sockets on a compute platform. The form factor employs electrical connections in each socket. A scaling form factor commensurate with adjacent sockets on the compute platform is established. The adjacent sockets each provide similar functionality for modules, and the adjacent sockets can be used interchangeably without loss of functionality of the compute platform. A single, integrated, rigid module is provided according to the scaling form factor that plugs into the adjacent sockets of the compute platform. The module provides expanded functionality over a single-plug form factor module. The expanded functionality is enabled through use of the electrical connections of the adjacent sockets.
    Type: Application
    Filed: October 30, 2018
    Publication date: May 2, 2019
    Inventors: Bharadwaj Pudipeddi, Anthony Gallippi, Vijay Devadiga
  • Patent number: 10216596
    Abstract: Embodiments of the invention provide a system and method to vastly improve the remote write latency (write to remote server) and to reduce the load that is placed on the remote server by issuing auto-log (automatic log) writes through an integrated networking port in the SSD (solid state drive). Embodiments of the invention also provide a system and method for a PCI-e attached SSD to recover after a failure detection by appropriating a remote namespace.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: February 26, 2019
    Assignee: BiTMICRO Networks, Inc.
    Inventor: Bharadwaj Pudipeddi
  • Publication number: 20190018386
    Abstract: In an embodiment of the invention, an apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; a field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA; wherein the apparatus triggers a switch of an FPGA image in the FPGA to another FPGA image. In another embodiment of the invention, a method comprises: triggering, by an apparatus, a switch of an FPGA image in a field programmable gate array (FPGA) to another FPGA image; wherein the apparatus comprises: a non-volatile memory device; a complex programmable logic device (CPLD) coupled to the non-volatile memory device; the field programmable gate array (FPGA) coupled to the CPLD; and a host coupled to the FPGA.
    Type: Application
    Filed: June 29, 2018
    Publication date: January 17, 2019
    Inventors: Federico Sambilay Jr., Bharadwaj Pudipeddi, Richard A. Cantong, Joevanni Parairo
  • Patent number: 10007561
    Abstract: The invention is an apparatus for dynamic provisioning, available as a multi-mode device that can be dynamically configured to balance between storage performance and hardware acceleration resources on reconfigurable hardware such as an FPGA. An embodiment of the invention provides a cluster of these multi-mode devices that form a group of resilient storage and acceleration elements without requiring a dedicated standby storage spare. Yet another embodiment of the invention provides an interconnection-network-attached cluster configured to dynamically provision full acceleration and storage resources to meet an application's needs and the end-of-life requirements of an SSD.
    Type: Grant
    Filed: April 10, 2017
    Date of Patent: June 26, 2018
    Assignee: BiTMICRO Networks, Inc.
    Inventors: Bharadwaj Pudipeddi, Jeffrey Bunting, Lihan Chang
  • Patent number: 7813288
    Abstract: A transaction detection device is described having inputs that couple to communication lines. Each communication line transports notification of a packet observed by a link probe within a computing system containing point-to-point links between nodes, each of said nodes having at least one processing core. The transaction detection device also comprises logic circuitry to determine from said notifications whether a looked-for transaction has occurred within said computing system.
    Type: Grant
    Filed: November 21, 2005
    Date of Patent: October 12, 2010
    Assignee: Intel Corporation
    Inventors: Robert Roth, Bharadwaj Pudipeddi, Richard Glass, Madhu Athreya
  • Patent number: 7779210
    Abstract: In one embodiment, the present invention includes a method for receiving a request for data in a home agent of a system from a first agent, prefetching the data from a memory and accessing a directory entry to determine whether a copy of the data is cached in any system agent, and forwarding the data to the first agent without waiting for snoop responses from other system agents if the directory entry indicates that the data is not cached. Other embodiments are described and claimed.
    Type: Grant
    Filed: October 31, 2007
    Date of Patent: August 17, 2010
    Assignee: Intel Corporation
    Inventors: Bharadwaj Pudipeddi, Ghassan Khadder
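The directory shortcut in this abstract can be modeled in a few lines: the home agent prefetches the data in parallel with the directory lookup and, when the directory shows no cached copies, forwards the data immediately instead of waiting for snoop responses. The function name, the dict-based directory, and the status strings below are all illustrative assumptions.

```python
# Toy model of directory-based snoop avoidance at the home agent:
# if the directory entry shows the line is cached nowhere, forward the
# prefetched data to the requester without waiting for snoop responses.

def handle_request(addr, memory, directory):
    data = memory[addr]                  # prefetch alongside the lookup
    if not directory.get(addr):          # no system agent caches this line
        return data, "forwarded-without-snoop"
    return data, "waited-for-snoops"     # otherwise snoop the sharers first
```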
  • Patent number: 7765352
    Abstract: A power control unit (PCU) may reduce core wake-up latency in a computer system by concurrently waking up the remaining cores after the first core is woken up. The power control unit may detect the arrival of a first, a second, and a third interrupt directed at a first, a second, and a third core, respectively. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after wake-up of the first core is complete. The power control unit may then wake up the second and the third cores concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may be at least equal to twice the time required for a first credit to be returned and the next credit to be accepted.
    Type: Grant
    Filed: September 1, 2009
    Date of Patent: July 27, 2010
    Assignee: Intel Corporation
    Inventors: Bharadwaj Pudipeddi, James S. Burns
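The wake-up policy this family of filings describes (it also appears in the related publications and patents listed below) can be sketched as a simple decision function: wake the first core alone, and if a later interrupt lands inside the window after that wake-up completes, wake all remaining pending cores concurrently. The event representation and window handling here are illustrative assumptions, not the PCU's actual mechanism.

```python
# Toy sketch of the concurrent wake-up policy: if the second interrupt
# arrives within a window ("first period") after the first core's wake-up
# completes, the remaining cores are woken concurrently rather than
# serially.

def plan_wakeups(interrupt_times, first_wake_done, window):
    """Return (first interrupt, interrupts served by a concurrent wake-up)."""
    first, *rest = sorted(interrupt_times)
    in_window = any(first_wake_done <= t <= first_wake_done + window
                    for t in rest)
    # Waking the rest together amortizes the wake-up latency.
    return first, (rest if in_window else [])
```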
  • Publication number: 20090319712
    Abstract: A power control unit (PCU) may reduce core wake-up latency in a computer system by concurrently waking up the remaining cores after the first core is woken up. The power control unit may detect the arrival of a first, a second, and a third interrupt directed at a first, a second, and a third core, respectively. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after wake-up of the first core is complete. The power control unit may then wake up the second and the third cores concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may be at least equal to twice the time required for a first credit to be returned and the next credit to be accepted.
    Type: Application
    Filed: September 1, 2009
    Publication date: December 24, 2009
    Inventors: Bharadwaj Pudipeddi, James S. Burns
  • Patent number: 7603504
    Abstract: A power control unit (PCU) may reduce core wake-up latency in a computer system by concurrently waking up the remaining cores after the first core is woken up. The power control unit may detect the arrival of a first, a second, and a third interrupt directed at a first, a second, and a third core, respectively. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after wake-up of the first core is complete. The power control unit may then wake up the second and the third cores concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may be at least equal to twice the time required for a first credit to be returned and the next credit to be accepted.
    Type: Grant
    Filed: December 18, 2007
    Date of Patent: October 13, 2009
    Assignee: Intel Corporation
    Inventors: Bharadwaj Pudipeddi, James S. Burns
  • Publication number: 20090158068
    Abstract: A power control unit (PCU) may reduce core wake-up latency in a computer system by concurrently waking up the remaining cores after the first core is woken up. The power control unit may detect the arrival of a first, a second, and a third interrupt directed at a first, a second, and a third core, respectively. The power control unit may check whether the second interrupt occurs within a first period, wherein the first period is counted after wake-up of the first core is complete. The power control unit may then wake up the second and the third cores concurrently if the second interrupt occurs within the first period after the wake-up activity of the first core is complete. The first period may be at least equal to twice the time required for a first credit to be returned and the next credit to be accepted.
    Type: Application
    Filed: December 18, 2007
    Publication date: June 18, 2009
    Inventors: Bharadwaj Pudipeddi, James S. Burns
  • Publication number: 20090113139
    Abstract: In one embodiment, the present invention includes a method for receiving a request for data in a home agent of a system from a first agent, prefetching the data from a memory and accessing a directory entry to determine whether a copy of the data is cached in any system agent, and forwarding the data to the first agent without waiting for snoop responses from other system agents if the directory entry indicates that the data is not cached. Other embodiments are described and claimed.
    Type: Application
    Filed: October 31, 2007
    Publication date: April 30, 2009
    Inventors: Bharadwaj Pudipeddi, Ghassan Khadder