Patents by Inventor Vikram Saletore
Vikram Saletore has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11966843
Abstract: Methods, apparatus, systems and articles of manufacture for distributed training of a neural network are disclosed. An example apparatus includes a neural network trainer to select a plurality of training data items from a training data set based on a toggle rate of each item in the training data set. A neural network parameter memory is to store neural network training parameters. A neural network processor is to generate training data results from distributed training over multiple nodes of the neural network using the selected training data items and the neural network training parameters. The neural network trainer is to synchronize the training data results and to update the neural network training parameters.
Type: Grant
Filed: June 13, 2022
Date of Patent: April 23, 2024
Assignee: Intel Corporation
Inventors: Meenakshi Arunachalam, Arun Tejusve Raghunath Rajan, Deepthi Karkada, Adam Procter, Vikram Saletore
-
Publication number: 20230274157
Abstract: Systems and methods include technology that identifies a first shard of shards of training data, where the training data is stored on a storage array, where the training data is divided into shards. The technology stores the first shard onto a first data storage of a first compute node, and copies the first shard from the first data storage of the first compute node onto a second data storage of the first compute node. The technology trains a machine learning model on the first shard stored in the second data storage during a first epoch of a training phase.
Type: Application
Filed: May 4, 2023
Publication date: August 31, 2023
Applicant: Intel Corporation
Inventors: Vikram Saletore, Yaser Afshar
-
Publication number: 20220309349
Abstract: Methods, apparatus, systems and articles of manufacture for distributed training of a neural network are disclosed. An example apparatus includes a neural network trainer to select a plurality of training data items from a training data set based on a toggle rate of each item in the training data set. A neural network parameter memory is to store neural network training parameters. A neural network processor is to generate training data results from distributed training over multiple nodes of the neural network using the selected training data items and the neural network training parameters. The neural network trainer is to synchronize the training data results and to update the neural network training parameters.
Type: Application
Filed: June 13, 2022
Publication date: September 29, 2022
Inventors: Meenakshi Arunachalam, Arun Tejusve Raghunath Rajan, Deepthi Karkada, Adam Procter, Vikram Saletore
-
Patent number: 11029971
Abstract: Systems, apparatuses and methods may provide for technology that identifies a first set of compute nodes and a second set of compute nodes, wherein the first set of compute nodes execute more slowly than the second set of compute nodes. The technology may also automatically determine a compute node configuration that results in a relatively low difference in completion time between the first set of compute nodes and the second set of compute nodes with respect to a neural network workload. In an example, the technology applies the compute node configuration to an execution of the neural network workload on one or more nodes in the first set of compute nodes and one or more nodes in the second set of compute nodes.
Type: Grant
Filed: January 28, 2019
Date of Patent: June 8, 2021
Assignee: Intel Corporation
Inventors: Meenakshi Arunachalam, Kushal Datta, Vikram Saletore, Vishal Verma, Deepthi Karkada, Vamsi Sripathi, Rahul Khanna, Mohan Kumar
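As a rough illustration of the balancing idea in this abstract, the sketch below splits a global batch across nodes in proportion to a measured throughput, so that the estimated per-node completion times come out close. The function names, the throughput figures, and the proportional split itself are assumptions made for illustration; the abstract does not say how the configuration is determined.

```python
def split_batch(global_batch, throughputs):
    """Assign each node a share of the batch proportional to its throughput.

    throughputs: samples/second per node (hypothetical measured values).
    Returns a list of per-node batch sizes that sums to global_batch.
    """
    total = sum(throughputs)
    shares = [int(global_batch * t / total) for t in throughputs]
    # Hand out any remainder left by rounding down, fastest nodes first.
    remainder = global_batch - sum(shares)
    fastest_first = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    for i in fastest_first[:remainder]:
        shares[i] += 1
    return shares


def completion_times(shares, throughputs):
    """Estimated seconds each node needs to process its share."""
    return [s / t for s, t in zip(shares, throughputs)]


# Two slower nodes (100 samples/s) and two faster nodes (300 samples/s).
throughputs = [100, 100, 300, 300]
shares = split_batch(800, throughputs)
times = completion_times(shares, throughputs)
```

With an even split of 200 samples per node, the slower nodes would need an estimated 2 seconds while the faster ones finish in under 1; the proportional split narrows that gap.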
-
Patent number: 10922610
Abstract: Systems, apparatuses and methods may provide for technology that conducts a first timing measurement of a blockage timing of a first window of the training of the neural network. The blockage timing measures a time that processing is impeded at layers of the neural network during the first window of the training due to synchronization of one or more synchronizing parameters of the layers. Based upon the first timing measurement, the technology is to determine whether to modify a synchronization barrier policy to include a synchronization barrier to impede synchronization of one or more synchronizing parameters of one of the layers during a second window of the training. The technology is further to impede the synchronization of the one or more synchronizing parameters of the one of the layers during the second window if the synchronization barrier policy is modified to include the synchronization barrier.
Type: Grant
Filed: September 14, 2017
Date of Patent: February 16, 2021
Assignee: Intel Corporation
Inventors: Adam Procter, Vikram Saletore, Deepthi Karkada, Meenakshi Arunachalam
-
Publication number: 20190155620
Abstract: Systems, apparatuses and methods may provide for technology that identifies a first set of compute nodes and a second set of compute nodes, wherein the first set of compute nodes execute more slowly than the second set of compute nodes. The technology may also automatically determine a compute node configuration that results in a relatively low difference in completion time between the first set of compute nodes and the second set of compute nodes with respect to a neural network workload. In an example, the technology applies the compute node configuration to an execution of the neural network workload on one or more nodes in the first set of compute nodes and one or more nodes in the second set of compute nodes.
Type: Application
Filed: January 28, 2019
Publication date: May 23, 2019
Inventors: Meenakshi Arunachalam, Kushal Datta, Vikram Saletore, Vishal Verma, Deepthi Karkada, Vamsi Sripathi, Rahul Khanna, Mohan Kumar
-
Publication number: 20190080233
Abstract: Systems, apparatuses and methods may provide for technology that conducts a first timing measurement of a blockage timing of a first window of the training of the neural network. The blockage timing measures a time that processing is impeded at layers of the neural network during the first window of the training due to synchronization of one or more synchronizing parameters of the layers. Based upon the first timing measurement, the technology is to determine whether to modify a synchronization barrier policy to include a synchronization barrier to impede synchronization of one or more synchronizing parameters of one of the layers during a second window of the training. The technology is further to impede the synchronization of the one or more synchronizing parameters of the one of the layers during the second window if the synchronization barrier policy is modified to include the synchronization barrier.
Type: Application
Filed: September 14, 2017
Publication date: March 14, 2019
Inventors: Adam Procter, Vikram Saletore, Deepthi Karkada, Meenakshi Arunachalam
-
Publication number: 20190042934
Abstract: Methods, apparatus, systems and articles of manufacture for distributed training of a neural network are disclosed. An example apparatus includes a neural network trainer to select a plurality of training data items from a training data set based on a toggle rate of each item in the training data set. A neural network parameter memory is to store neural network training parameters. A neural network processor is to generate training data results from distributed training over multiple nodes of the neural network using the selected training data items and the neural network training parameters. The neural network trainer is to synchronize the training data results and to update the neural network training parameters.
Type: Application
Filed: December 1, 2017
Publication date: February 7, 2019
Inventors: Meenakshi Arunachalam, Arun Tejusve Raghunath Rajan, Deepthi Karkada, Adam Procter, Vikram Saletore
-
Patent number: 9524219
Abstract: Durable atomic transactions for non-volatile media are described. A processor includes an interface to a non-volatile storage medium and a functional unit to perform instructions associated with an atomic transaction. The instructions are to update data at a set of addresses in the non-volatile storage medium atomically. The functional unit is operable to perform a first instruction to create the atomic transaction that declares a size of the data to be updated atomically. The functional unit is also operable to perform a second instruction to start execution of the atomic transaction. The functional unit is further operable to perform a third instruction to commit the atomic transaction to the set of addresses in the non-volatile storage medium, wherein the updated data is not visible to other functional units of the processing device until the atomic transaction is complete.
Type: Grant
Filed: September 27, 2013
Date of Patent: December 20, 2016
Assignee: Intel Corporation
Inventors: Robert Bahnsen, Sridharan Sakthivelu, Vikram A. Saletore, Krishnaswamy Viswanathan, Matthew E. Tolentino, Kanivenahalli Govindaraju, Vincent J. Zimmer
-
Publication number: 20150095600
Abstract: Durable atomic transactions for non-volatile media are described. A processor includes an interface to a non-volatile storage medium and a functional unit to perform instructions associated with an atomic transaction. The instructions are to update data at a set of addresses in the non-volatile storage medium atomically. The functional unit is operable to perform a first instruction to create the atomic transaction that declares a size of the data to be updated atomically. The functional unit is also operable to perform a second instruction to start execution of the atomic transaction. The functional unit is further operable to perform a third instruction to commit the atomic transaction to the set of addresses in the non-volatile storage medium, wherein the updated data is not visible to other functional units of the processing device until the atomic transaction is complete.
Type: Application
Filed: September 27, 2013
Publication date: April 2, 2015
Inventors: Robert Bahnsen, Sridharan Sakthivelu, Vikram A. Saletore, Krishnaswamy Viswanathan, Matthew E. Tolentino, Kanivenahalli Govindaraju, Vincent J. Zimmer
-
Patent number: 7305493
Abstract: An apparatus and a system may include an adaptation module, a plurality of Direct Transport Interfaces (DTIs), a DTI accelerator, and a Transport Control Protocol/Internet Protocol (TCP/IP) accelerator. The adaptation module may provide a translated sockets call from an application program to one of the DTIs, where an included set of memory structures may couple the translated sockets call to the DTI accelerator, which may in turn couple the set of memory structures to the TCP/IP accelerator. An article may include data causing a machine to perform a method including: receiving an application program sockets call at the adaptation module, deriving a translated sockets call from the application program sockets call, receiving the translated sockets call at a DTI, coupling the translated sockets call to a DTI accelerator using a set of memory structures in the DTI, and coupling the set of memory structures to a TCP/IP accelerator.
Type: Grant
Filed: November 27, 2002
Date of Patent: December 4, 2007
Assignee: Intel Corporation
Inventors: Gary L. McAlpine, David B. Minturn, Hemal V. Shah, Annie Foong, Greg J. Regnier, Vikram A. Saletore
-
Patent number: 7290114
Abstract: Provided are a method, system, and program for sharing data in a user virtual address range with a kernel virtual address range. A user address in a user address space and length defining a user address range referencing physical locations in a memory are received. A determination is made of determining at least one page in the memory including the physical locations referenced by the user address range. For each determined page, one kernel address in a kernel address space is generated to reference the determined page, wherein at least one user address and at least one kernel address reference one page in the memory.
Type: Grant
Filed: November 17, 2004
Date of Patent: October 30, 2007
Assignee: Intel Corporation
Inventors: Paul M. Stillwell, Jr., Vikram A. Saletore
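The page-determination step in this abstract can be pictured with a little arithmetic. The sketch below assumes a 4096-byte page and treats addresses as plain integers; the function names, the `kernel_base` value, and the one-page-apart kernel addresses are hypothetical and only illustrate the "one kernel address per determined page" idea, not the patented method.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; real systems vary


def pages_in_range(user_addr, length, page_size=PAGE_SIZE):
    """Return the page numbers touched by [user_addr, user_addr + length)."""
    if length <= 0:
        return []
    first = user_addr // page_size
    last = (user_addr + length - 1) // page_size
    return list(range(first, last + 1))


def map_to_kernel(user_addr, length, kernel_base, page_size=PAGE_SIZE):
    """Pair each touched page with one hypothetical kernel address.

    kernel_base is an illustrative starting address; consecutive kernel
    addresses are handed out one page apart.
    """
    pages = pages_in_range(user_addr, length, page_size)
    return {page: kernel_base + i * page_size for i, page in enumerate(pages)}


# A 6000-byte range starting 100 bytes into page 2 spans pages 2 and 3.
mapping = map_to_kernel(2 * PAGE_SIZE + 100, 6000, kernel_base=0xC0000000)
```

Note that a range shorter than one page can still touch two pages when it straddles a page boundary, which is why the last page is computed from the final byte rather than from the length alone.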
-
Publication number: 20060107020
Abstract: Provided are a method, system, and program for sharing data in a user virtual address range with a kernel virtual address range. A user address in a user address space and length defining a user address range referencing physical locations in a memory are received. A determination is made of determining at least one page in the memory including the physical locations referenced by the user address range. For each determined page, one kernel address in a kernel address space is generated to reference the determined page, wherein at least one user address and at least one kernel address reference one page in the memory.
Type: Application
Filed: November 17, 2004
Publication date: May 18, 2006
Inventors: Paul Stillwell, Vikram Saletore
-
Publication number: 20060072563
Abstract: In general, the disclosure describes a variety of techniques that can enhance packet processing operations.
Type: Application
Filed: October 5, 2004
Publication date: April 6, 2006
Inventors: Greg Regnier, Vikram Saletore, Gary McAlpine, Ram Huggahalli, Ravishankar Iyer, Ramesh Illikkal, David Minturn, Donald Newell, Srihari Makineni
-
Publication number: 20050188055
Abstract: The present inventive subject matter relates to the field of network computing, and more specifically to methods, systems, and software for accelerated performance of server clusters, server farms, and server grids. Some such embodiments include methods, systems, and software, that when executing cause content into the memories of servers in a cluster, sharing the contents of the memories amongst all servers in the cluster over a high-speed interconnect to form a high-speed cluster-wide memory. Some such embodiments include servicing content requests from a server that may or may not have the requested content in its local memory, but is able to directly access the requested content in the memory of another server in the cluster over the high-speed cluster wide memory. One such embodiment includes caching the content obtained from the memory of the other server for use in servicing subsequent requests for that content.
Type: Application
Filed: December 31, 2003
Publication date: August 25, 2005
Inventor: Vikram Saletore
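The request flow described in this abstract (serve from local memory, otherwise read from a peer's memory and cache the result) might be sketched as follows. The `Server` class, its attributes, and the in-process dictionary lookups are all hypothetical stand-ins; a real cluster would reach peer memory over the high-speed interconnect rather than through Python object references.

```python
class Server:
    """A toy server with a local in-memory content store (hypothetical)."""

    def __init__(self, name, content=None):
        self.name = name
        self.memory = dict(content or {})
        self.peers = []

    def serve(self, key):
        """Return (content, source) for a request, or (None, None) if absent."""
        if key in self.memory:
            return self.memory[key], self.name
        # Stand-in for a direct read of a peer's memory over the interconnect.
        for peer in self.peers:
            if key in peer.memory:
                value = peer.memory[key]
                self.memory[key] = value  # cache for subsequent requests
                return value, peer.name
        return None, None


a = Server("a", {"/index.html": "<h1>home</h1>"})
b = Server("b", {"/about.html": "<h1>about</h1>"})
a.peers, b.peers = [b], [a]

first = a.serve("/about.html")   # fetched from b's memory, then cached on a
second = a.serve("/about.html")  # now served from a's own memory
```

The second request never touches the peer, which is the caching behavior the last sentence of the abstract describes.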
-
Publication number: 20040103225
Abstract: An apparatus and a system may include an adaptation module, a plurality of Direct Transport Interfaces (DTIs), a DTI accelerator, and a Transport Control Protocol/Internet Protocol (TCP/IP) accelerator. The adaptation module may provide a translated sockets call from an application program to one of the DTIs, where an included set of memory structures may couple the translated sockets call to the DTI accelerator, which may in turn couple the set of memory structures to the TCP/IP accelerator. An article may include data causing a machine to perform a method including: receiving an application program sockets call at the adaptation module, deriving a translated sockets call from the application program sockets call, receiving the translated sockets call at a DTI, coupling the translated sockets call to a DTI accelerator using a set of memory structures in the DTI, and coupling the set of memory structures to a TCP/IP accelerator.
Type: Application
Filed: November 27, 2002
Publication date: May 27, 2004
Applicant: Intel Corporation
Inventors: Gary L. McAlpine, David B. Minturn, Hemal V. Shah, Annie Foong, Greg J. Regnier, Vikram A. Saletore
-
Patent number: 4438354
Abstract: A switched capacitor gain stage (110, 120) having a programmable gain factor. This gain factor is determined by the connection of desired gain determining components (14-17; 25-28) contained within a component array (100, 101). A sample and hold circuit (46) is provided for the storage of the error voltage of the entire gain-integrator stage. This stored error voltage (V_error) is inverted and integrated one time for each integration of the input voltage (V_in), thus eliminating the effects of the inherent offset voltages of the circuit from the output voltage (V_out).
Type: Grant
Filed: August 14, 1981
Date of Patent: March 20, 1984
Assignee: American Microsystems, Incorporated
Inventors: Yusuf A. Haque, Vikram Saletore, Jeffrey A. Schuler
-
Patent number: 4431986
Abstract: A digital to analog converter (100) utilizes a current mirror connected to a reference voltage (V_REF) to generate a constant reference current (I_REF). A voltage divider (R_1 and R_2) is used in conjunction with a plurality of MOS transistors (X_1-X_N) serving as current mirrors having specific current carrying capabilities which are controlled by selected binary digits (bits) of a digital signal. By the appropriate connection of desired ones of said plurality of MOS transistors, a specific fraction of said reference current is caused to flow through said plurality of MOS transistors. The amount of current flowing through said plurality of MOS transistors generates an output voltage (V_OUT) from the digital to analog converter of this invention. This output voltage may be positive or negative with respect to the reference voltage, thus the output voltage is bipolar.
Type: Grant
Filed: October 9, 1981
Date of Patent: February 14, 1984
Assignee: American Microsystems, Incorporated
Inventors: Yusuf A. Haque, Vikram Saletore, Jeffrey A. Schuler