Patents by Inventor Swagath Venkataramani
Swagath Venkataramani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240143982
Abstract: Fused channel and/or fused filter convolutions for fast deep neural network execution are provided. In one aspect, a system includes: a processor, connected to a memory, configured to: implement an approximated datapath in a deep neural network having a sequence of adders and multipliers for adding up operands to provide accumulated sums for two or more groups of neurons in the deep neural network, and multiplying the accumulated sums to obtain a product; and make an inference using the deep neural network based on the product from the approximated datapath. A method for approximation in a deep neural network is also provided.
Type: Application
Filed: October 26, 2022
Publication date: May 2, 2024
Inventors: Swagath Venkataramani, Sarada Krithivasan, Vijayalakshmi Srinivasan
-
Publication number: 20240118892
Abstract: Methods and apparatuses relating to processing neural networks are described. In one embodiment, an apparatus to process a neural network includes a plurality of fully connected layer chips coupled by an interconnect; a plurality of convolutional layer chips each coupled by an interconnect to a respective fully connected layer chip of the plurality of fully connected layer chips and each of the plurality of fully connected layer chips and the plurality of convolutional layer chips including an interconnect to couple each of a forward propagation compute intensive tile, a back propagation compute intensive tile, and a weight gradient compute intensive tile of a column of compute intensive tiles between a first memory intensive tile and a second memory intensive tile.
Type: Application
Filed: December 18, 2023
Publication date: April 11, 2024
Inventors: Swagath VENKATARAMANI, Dipankar DAS, Ashish RANJAN, Subarno BANERJEE, Sasikanth AVANCHA, Ashok JAGANNATHAN, Ajaya V. DURG, Dheemanth NAGARAJ, Bharat KAUL, Anand RAGHUNATHAN
-
Patent number: 11941111
Abstract: Indices of non-zero weights may be stored in an index register file included within each of a plurality of processor elements in a systolic array. Non-zero weights may be stored in a register file associated with the index register file. Input values (e.g., dense input values) corresponding to a single block in a data structure may be sent to the plurality of processor elements. Those of the input values corresponding to the indices of non-zero weights in the index register file may be selected for performing a multiply-accumulate (“MAC”) operation based on sending the plurality of input values to one or more of the plurality of processor elements. The indices of the plurality of non-zero weights are stored in an index data stick. The values of the plurality of non-zero weights are stored in a value data stick.
Type: Grant
Filed: July 31, 2021
Date of Patent: March 26, 2024
Assignee: International Business Machines Corporation
Inventors: Sanchari Sen, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan, Sunil K. Shukla
-
Publication number: 20240029786
Abstract: A memory system, a method of assembling the memory system, and a computer system. The memory system includes a global memory device coupled to a plurality of processing elements. The global memory device is positioned external to a chip on which the plurality of processing devices reside. The memory system also includes at least one main scratchpad coupled to the at least one processing element of the plurality of processing devices and the global memory device. The memory system further includes a plurality of auxiliary scratchpads coupled to the plurality of processing elements and the global memory device. The one or more auxiliary scratchpads are configured to store static tensors. At least a portion of the plurality of auxiliary scratchpads are configured as a unitary multichannel device.
Type: Application
Filed: July 22, 2022
Publication date: January 25, 2024
Inventors: Ravi Nair, Swagath Venkataramani, Vijayalakshmi Srinivasan, Arvind Kumar
-
Publication number: 20240028899
Abstract: Embodiments are provided for efficient realization of memory-bound operations in a computing system by a processor. Data may be read from and written to a memory at a granular level using a stickification operation. One or more regions of activation and weight tensor data on the memory may be annotated by coupling the stickification operation with padding.
Type: Application
Filed: July 25, 2022
Publication date: January 25, 2024
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Swagath VENKATARAMANI, Vijayalakshmi SRINIVASAN, Shubham JAIN, Sarada KRITHIVASAN, Sanchari SEN
-
Patent number: 11831467
Abstract: Embodiments for providing enhanced multicast data transfer for ring topology based artificial intelligence systems are disclosed. Multicast data is sent to a plurality of disjointed cores in a multicast group according to a first multicast mode, a second multicast mode, or a third multicast mode, where the first multicast mode sends a first half of the multicast data on a first multicast ring and a second half on a second multicast ring, the second multicast mode sends the multicast data on either the first multicast ring or the second multicast ring, and the third multicast mode replicates the multicast data and sends the multicast data to both the first multicast ring and the second multicast ring.
Type: Grant
Filed: May 13, 2022
Date of Patent: November 28, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Shubham Jain, Swagath Venkataramani, Vijayalakshmi Srinivasan, Sunil K Shukla, Martin A Lutz
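Illustration (not part of the patent record): the three multicast modes named in the abstract can be sketched as a small routing function. The function name `route_multicast`, the ring labels "first" and "second", and the choice to give the first ring the smaller half when the payload length is odd are all assumptions made for this sketch; the abstract does not specify them.

```python
def route_multicast(data: bytes, mode: int, ring: str = "first") -> dict:
    """Return a mapping of ring label -> payload placed on that ring.

    Hypothetical sketch of the three modes described in the abstract.
    """
    if mode == 1:
        # Mode 1: first half on the first ring, second half on the second ring.
        half = len(data) // 2  # assumption: odd lengths round down for ring one
        return {"first": data[:half], "second": data[half:]}
    if mode == 2:
        # Mode 2: the whole payload travels on exactly one of the two rings.
        if ring not in ("first", "second"):
            raise ValueError("ring must be 'first' or 'second'")
        return {ring: data}
    if mode == 3:
        # Mode 3: the payload is replicated and sent on both rings.
        return {"first": data, "second": data}
    raise ValueError("mode must be 1, 2, or 3")
```

For example, `route_multicast(b"abcd", 1)` places `b"ab"` on the first ring and `b"cd"` on the second.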
-
Publication number: 20230370304
Abstract: Embodiments for providing enhanced multicast data transfer for ring topology based artificial intelligence systems are disclosed. Multicast data is sent to a plurality of disjointed cores in a multicast group according to a first multicast mode, a second multicast mode, or a third multicast mode, where the first multicast mode sends a first half of the multicast data on a first multicast ring and a second half on a second multicast ring, the second multicast mode sends the multicast data on either the first multicast ring or the second multicast ring, and the third multicast mode replicates the multicast data and sends the multicast data to both the first multicast ring and the second multicast ring.
Type: Application
Filed: May 13, 2022
Publication date: November 16, 2023
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Shubham JAIN, Swagath VENKATARAMANI, Vijayalakshmi SRINIVASAN, Sunil K. SHUKLA, Martin A. LUTZ
-
Patent number: 11810340
Abstract: A system includes a determination component that determines output for successively larger neural networks of a set; and a consensus component that determines consensus between a first neural network and a second neural network of the set. A linear chain of increasingly complex neural networks trained on progressively larger inputs is utilized (e.g., increasingly complex neural networks is generally representative of increased accuracy). Outputs of progressively larger networks are computed until a consensus point is reached, where two or more successive large networks yield a same inference output. At such point of consensus the larger neural network of the set reaching consensus can be deemed appropriately sized (or of sufficient complexity) for a classification task at hand.
Type: Grant
Filed: November 29, 2017
Date of Patent: November 7, 2023
Assignee: International Business Machines Corporation
Inventors: Pradip Bose, Alper Buyuktosunoglu, Schuyler Eldridge, Karthik V. Swaminathan, Swagath Venkataramani
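Illustration (not part of the patent record): the consensus idea in the abstract can be sketched as a loop over models ordered from smallest to largest that stops once two successive models give the same output. The function name `infer_until_consensus` and the representation of each network as a plain Python callable are assumptions for this sketch only.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


def infer_until_consensus(
    models: Sequence[Callable[[Any], Any]], x: Any
) -> Optional[Tuple[int, Any]]:
    """Evaluate models in order of increasing size.

    Returns (index of the larger agreeing model, agreed output), or None
    if no two successive models agree. Larger models after the consensus
    point are never evaluated.
    """
    previous = None
    for index, model in enumerate(models):
        output = model(x)
        # Consensus: this model and the one just before it agree.
        if index > 0 and output == previous:
            return index, output
        previous = output
    return None
```

With stand-in models that return fixed labels 0, 1, 1, 2 in that order, the loop stops at the third model (index 2) and never runs the fourth.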
-
Publication number: 20230344667
Abstract: Embodiments for providing single-producer-multiple consumers synchronization and multicast data transfer by a processor are disclosed. Multicast data transfer is synchronized based on an identification tag and a request from each one of a plurality of recipients for the multicast data. The multicast data is transferred to each of the plurality of recipients based on the identification tag, the request from each one of the plurality of recipients, and a list of the plurality of recipients.
Type: Application
Filed: April 22, 2022
Publication date: October 26, 2023
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Vijayalakshmi SRINIVASAN, Scot RIDER, Swagath VENKATARAMANI, Kailash GOPALAKRISHNAN, Sunil K. SHUKLA, Brian William CURRAN, Martin A. LUTZ
-
Publication number: 20230267003
Abstract: Processing input data for transmittal to a data consumer such as an artificial intelligence engine is performed by arranging the input data into a uniform structure made up of sticks of data combined to form pages of sticks. A stick is any well-sized set of input data elements whereby the size of the stick is fixed. A masking pattern is established for sticks of data having certain ranges of invalid data for consumption of partial sticks while maintaining validity of the input data being transferred. The mask pattern is derived based on set-active-mask-and-value (SAMV) instructions. The derived mask pattern is carried forward for subsequent load instructions to the data consumer.
Type: Application
Filed: February 23, 2022
Publication date: August 24, 2023
Inventors: Cedric Lichtenau, Vijayalakshmi Srinivasan, Sunil K Shukla, Swagath Venkataramani, Kailash Gopalakrishnan, Holger Horbach, Razvan Peter Figuli, Wei Wang, Yulong Li, Martin A Lutz
-
Patent number: 11681529
Abstract: Systems, methods, and apparatuses relating to access synchronization in a shared memory are described. In one embodiment, a processor includes a decoder to decode an instruction into a decoded instruction, and an execution unit to execute the decoded instruction to: receive a first input operand of a memory address to be tracked and a second input operand of an allowed sequence of memory accesses to the memory address, and cause a block of a memory access that violates the allowed sequence of memory accesses to the memory address. In one embodiment, a circuit separate from the execution unit compares a memory address for a memory access request to one or more memory addresses in a tracking table, and blocks a memory access for the memory access request when a type of access violates a corresponding allowed sequence of memory accesses to the memory address for the memory access request.
Type: Grant
Filed: August 24, 2021
Date of Patent: June 20, 2023
Assignee: Intel Corporation
Inventors: Swagath Venkataramani, Dipankar Das, Sasikanth Avancha, Ashish Ranjan, Subarno Banerjee, Bharat Kaul, Anand Raghunathan
-
Patent number: 11669489
Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.
Type: Grant
Filed: September 30, 2021
Date of Patent: June 6, 2023
Assignee: International Business Machines Corporation
Inventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan
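Illustration (not part of the patent record): the zero-skipping behavior described in the abstract can be modeled in software as a filter that keeps only nonzero operands and tags each with its original index, so a downstream processing element can still pair it with the right weight. The names `skip_zero_operands` and `sparse_dot`, and the use of a dot product as the consuming operation, are assumptions for this sketch; the patent concerns hardware, not this code.

```python
from typing import List, Sequence, Tuple


def skip_zero_operands(operands: Sequence[float]) -> List[Tuple[int, float]]:
    """Drop zero-valued operands; keep (index, value) for the rest."""
    return [(index, value) for index, value in enumerate(operands) if value != 0]


def sparse_dot(operands: Sequence[float], weights: Sequence[float]) -> float:
    """Dot product that only multiplies the nonzero operands.

    The index carried with each operand selects the matching weight.
    """
    return sum(value * weights[index] for index, value in skip_zero_operands(operands))
```

For operands `[0, 3, 0, 5]` only two multiplications are performed, and the result matches the dense dot product.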
-
Publication number: 20230109301
Abstract: A systolic array can be configured to skip distributed operands that have zero-values, resulting in improved resource efficiency. A skip module is introduced to receive operands from memory, identify whether they have a zero value or not, and, if they are nonzero, generate an operand vector including an index before sending the operand vector to a processing element.
Type: Application
Filed: September 30, 2021
Publication date: April 6, 2023
Inventors: Swagath Venkataramani, Sanchari Sen, Vijayalakshmi Srinivasan, Ankur Agrawal, Sunil K Shukla, Bruce Fleischer, Kailash Gopalakrishnan
-
Publication number: 20230099608
Abstract: A system comprises an analog resistive processing unit (RPU) system, and one or more processors. The analog RPU system comprises an array of RPU cells. The one or more processors are configured to: configure the analog RPU system to implement a convolutional neural network comprising a convolutional layer comprising at least one kernel matrix; program the at least one array of RPU cells to store a transformed kernel matrix which is generated by applying a first transformation process to the kernel matrix using a first predefined transformation matrix; and utilize the analog RPU system to perform an analog convolution operation by performing analog matrix-vector multiplication operations using the transformed kernel matrix and input vectors of a transformed data matrix, to thereby generate a transformed convolution output matrix, wherein the transformed data matrix is generated by applying a second transformation process to a data matrix using a second predefined transformation matrix.
Type: Application
Filed: September 24, 2021
Publication date: March 30, 2023
Inventors: Swagath Venkataramani, Shubham Jain, Leland Chang
-
Patent number: 11599795
Abstract: An N modular redundancy method, system, and computer program product include a computer-implemented N modular redundancy method for neural networks, the method including selectively replicating the neural network by employing one of checker neural networks and selective N modular redundancy (N-MR) applied only to critical computations.
Type: Grant
Filed: November 8, 2017
Date of Patent: March 7, 2023
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Pradip Bose, Alper Buyuktosunoglu, Schuyler Eldridge, Karthik V Swaminathan, Augusto Vega, Swagath Venkataramani
-
Publication number: 20230030287
Abstract: Indices of non-zero weights may be stored in an index register file included within each of a plurality of processor elements in a systolic array. Non-zero weights may be stored in a register file associated with the index register file. Input values (e.g., dense input values) corresponding to a single block in a data structure may be sent to the plurality of processor elements. Those of the input values corresponding to the indices of non-zero weights in the index register file may be selected for performing a multiply-accumulate (“MAC”) operation based on sending the plurality of input values to one or more of the plurality of processor elements. The indices of the plurality of non-zero weights are stored in an index data stick. The values of the plurality of non-zero weights are stored in a value data stick.
Type: Application
Filed: July 31, 2021
Publication date: February 2, 2023
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Sanchari SEN, Swagath VENKATARAMANI, Vijayalakshmi SRINIVASAN, Kailash GOPALAKRISHNAN, Sunil K. SHUKLA
-
Patent number: 11556450
Abstract: The embodiments herein describe hybrid parallelism techniques where a mix of data and model parallelism techniques are used to split the workload of a layer across an array of processors. When configuring the array, the bandwidth of the processors in one direction may be greater than the bandwidth in the other direction. Each layer is characterized according to whether it is more feature heavy or weight heavy. Depending on this characterization, the workload of an NN layer can be assigned to the array using a hybrid parallelism technique rather than using solely the data parallelism technique or solely the model parallelism technique. For example, if an NN layer is more weight heavy than feature heavy, data parallelism is used in the direction with the greater bandwidth (to minimize the negative impact of weight reduction) while model parallelism is used in the direction with the smaller bandwidth.
Type: Grant
Filed: October 11, 2019
Date of Patent: January 17, 2023
Assignee: International Business Machines Corporation
Inventors: Swagath Venkataramani, Vijayalakshmi Srinivasan, Philip Heidelberger
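Illustration (not part of the patent record): the assignment rule in the abstract's example can be sketched as a function that picks which parallelism style goes along which direction of a 2D processor array. The name `assign_parallelism`, the direction labels "x" and "y", and the byte-count comparison used to decide "weight heavy" are assumptions. The abstract only states the weight-heavy case; the mirrored rule for feature-heavy layers is also an assumption of this sketch.

```python
def assign_parallelism(
    weight_bytes: int, feature_bytes: int, bandwidth_x: float, bandwidth_y: float
) -> dict:
    """Map each array direction ("x", "y") to "data" or "model" parallelism."""
    # Identify the higher- and lower-bandwidth directions of the array.
    high, low = ("x", "y") if bandwidth_x >= bandwidth_y else ("y", "x")
    if weight_bytes > feature_bytes:
        # Weight-heavy layer (as in the abstract): data parallelism in the
        # higher-bandwidth direction, model parallelism in the other.
        return {high: "data", low: "model"}
    # Feature-heavy layer: mirrored choice (assumption, not stated in the abstract).
    return {high: "model", low: "data"}
```

For a layer with far more weight data than feature data on an array whose "x" direction is faster, the sketch returns data parallelism along "x" and model parallelism along "y".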
-
Patent number: 11551054
Abstract: A convolutional neural network includes a front layer, a back layer, and a plurality of other layers that are connected between the front layer and the back layer. One of the other layers is a transition layer. A first precision is assigned to activations of neurons from the front layer back to the transition layer and a second precision is assigned to activations of the neurons from the transition layer back to the back layer. A third precision is assigned to weights of inputs to neurons from the front layer back to the transition layer and a fourth precision is assigned to weights of inputs to the neurons from the transition layer back to the back layer. In some embodiments the layers forward of the transition layer have a different convolutional kernel than the layers rearward of the transition layer.
Type: Grant
Filed: August 27, 2019
Date of Patent: January 10, 2023
Assignee: International Business Machines Corporation
Inventors: Jungwook Choi, Swagath Venkataramani, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
-
Publication number: 20220405555
Abstract: A combined function specified by an instruction is performed. The combined function includes a plurality of operations performed as part of one invocation of the combined function. The performing the combined function includes performing a convolution using a first tensor and a second tensor to obtain one or more intermediate results, in which the second tensor includes an adjusted weight tensor created using a plurality of multipliers. Values of a bias tensor are added to the one or more intermediate results to obtain one or more combined function results for the combined function.
Type: Application
Filed: June 17, 2021
Publication date: December 22, 2022
Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Sunil K. Shukla, Swagath Venkataramani
-
Publication number: 20220405348
Abstract: A tensor of a first select dimension is reformatted to provide one or more sub-tensors of a second select dimension. The reformatting includes determining a number of sub-tensors to be used to represent the tensor. The reformatting further includes creating the number of sub-tensors, in which a sub-tensor is to start on a boundary of a memory unit. Data of the tensor is rearranged to fit within the number of sub-tensors.
Type: Application
Filed: June 17, 2021
Publication date: December 22, 2022
Inventors: Cedric Lichtenau, Kailash Gopalakrishnan, Vijayalakshmi Srinivasan, Anthony Saporito, Sunil K. Shukla, Swagath Venkataramani
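Illustration (not part of the patent record): a heavily simplified, one-dimensional sketch of the steps in the abstract: determine how many sub-tensors are needed, create them so each starts on a memory-unit boundary, and rearrange the data into them. The name `reformat_tensor`, the flat-list representation, the `unit_size` parameter (elements per memory unit), and zero padding of the final sub-tensor are all assumptions; the abstract does not describe the layout at this level of detail.

```python
import math
from typing import List, Sequence


def reformat_tensor(data: Sequence[int], unit_size: int, pad: int = 0) -> List[List[int]]:
    """Split flat tensor data into sub-tensors of exactly unit_size elements.

    Because every sub-tensor fills one whole memory unit, each one starts
    on a unit boundary when the sub-tensors are stored back to back.
    """
    if unit_size <= 0:
        raise ValueError("unit_size must be positive")
    # Step 1: determine the number of sub-tensors needed.
    count = math.ceil(len(data) / unit_size)
    sub_tensors = []
    for k in range(count):
        # Step 2: rearrange the data into the k-th sub-tensor.
        chunk = list(data[k * unit_size:(k + 1) * unit_size])
        # Step 3: pad the tail so the next sub-tensor stays boundary-aligned.
        chunk += [pad] * (unit_size - len(chunk))
        sub_tensors.append(chunk)
    return sub_tensors
```

Five elements with a unit size of four yield two sub-tensors, the second padded with three zeros.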