Patents by Inventor Rekha Singhal
Rekha Singhal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250139790
Abstract: Multi-object tracking (MOT) in video sequences plays a critical role in various computer vision applications. The primary objective of MOT is to accurately localize and track objects across consecutive frames. However, existing MOT approaches often suffer from computational limitations and low frame rates on commodity machines, which hinders real-time performance. The present disclosure provides a method and system for performing content-aware multi-object tracking. The system first classifies videos into slow-moving and fast-moving object content depending on the features of the objects to be tracked in the frames. The system then applies a computationally intensive DeepSORT algorithm to track the objects while selectively skipping frames.
Type: Application
Filed: September 9, 2024
Publication date: May 1, 2025
Applicant: Tata Consultancy Services Limited
Inventors: Ratul Kishore Saha, Rekha Singhal, Manoj Karunakaran Nambiar
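The content-aware idea above can be illustrated with a minimal sketch (an assumption about the approach, not the patented implementation): estimate a clip's object speed from mean inter-frame centroid displacement, then run the expensive tracker only on every k-th frame for slow-moving content. All names and thresholds below are hypothetical.

```python
# Illustrative sketch: content-aware frame skipping for multi-object tracking.
# Clip "speed" is estimated from mean inter-frame centroid displacement;
# the heavy tracker is then invoked only on a subset of frames for slow content.

def mean_displacement(frames):
    """frames: list of per-frame lists of (x, y) object centroids, index-aligned."""
    total, count = 0.0, 0
    for prev, curr in zip(frames, frames[1:]):
        for (x0, y0), (x1, y1) in zip(prev, curr):
            total += ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5
            count += 1
    return total / count if count else 0.0

def select_frames(frames, slow_threshold=2.0, skip=3):
    """Return the indices of frames on which the expensive tracker should run."""
    if mean_displacement(frames) < slow_threshold:   # slow-moving content
        return list(range(0, len(frames), skip))     # selectively skip frames
    return list(range(len(frames)))                  # fast content: track every frame
```

A real system would interpolate or propagate track positions on the skipped frames; here only the frame-selection decision is sketched.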
-
Publication number: 20250086111
Abstract: High-performance deployment of DNN recommendation models (RMs) relies heavily on embedding tables, and their performance bottleneck lies in the latency of embedding access. To optimize the deployment of RMs, a method and system are disclosed that leverage heterogeneous memory types on FPGAs to improve overall performance by maximizing the availability of frequently accessed data in faster memory. The system, using an optimizer, dynamically allocates partitions of the embedding tables based on the history of input accesses. A disclosed pre-optimizer block determines whether smaller tables should be partitioned or placed entirely in smaller memories, improving overall efficiency. The performance of the RM improves through a reduction in average embedding fetch latency and, effectively, inference latency via a modified round-trip computation.
Type: Application
Filed: August 14, 2024
Publication date: March 13, 2025
Applicant: Tata Consultancy Services Limited
Inventors: Ashwin Krishnan, Manoj Karunakaran Nambiar, Rekha Singhal
-
Patent number: 12182029
Abstract: Works in the literature fail to leverage embedding access patterns and memory units' access/storage capabilities, which, when combined, can yield high-speed heterogeneous systems by dynamically re-organizing embedding table partitions across hardware during inference. A method and system for optimal deployment of embedding tables across a heterogeneous memory architecture for high-speed recommendation inference is disclosed, which dynamically partitions and organizes embedding tables across fast memory architectures to reduce access time. Partitions are chosen to take advantage of the past access patterns of those tables, ensuring that frequently accessed data is available in fast memory most of the time. Partitioning and replication are used to co-optimize memory access time and resources.
Type: Grant
Filed: August 25, 2023
Date of Patent: December 31, 2024
Assignee: Tata Consultancy Services Limited
Inventors: Ashwin Krishnan, Manoj Karunakaran Nambiar, Chinmay Narendra Mahajan, Rekha Singhal
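The placement step described above can be sketched as a simple greedy heuristic (a hypothetical illustration, not the patented optimizer): rank embedding-table partitions by observed accesses per byte and fill the bounded fast memory with the hottest partitions first.

```python
# Hypothetical sketch: greedy placement of embedding-table partitions
# across fast (bounded) and slow memory, driven by past access counts.

def place_partitions(partitions, fast_capacity):
    """partitions: list of (name, size, access_count) tuples.
    Returns (fast, slow) name lists, filling fast memory with the
    partitions that have the highest access density (hits per byte)."""
    ranked = sorted(partitions, key=lambda p: p[2] / p[1], reverse=True)
    fast, slow, used = [], [], 0
    for name, size, hits in ranked:
        if used + size <= fast_capacity:
            fast.append(name)   # hot partition fits in fast memory
            used += size
        else:
            slow.append(name)   # spill the rest to slow memory
    return fast, slow
```

The patent additionally uses replication to co-optimize access time and resources; that dimension is omitted here for brevity.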
-
Publication number: 20240420464
Abstract: The disclosure addresses problems associated with the systematic integration of multi-modal data for effective training, and with handling the large volumes of data that result from the high resolution of the multiple modalities. Embodiments herein provide a method and a system for distributed training of a multi-modal data fusion transformer. A distributed training approach called the Distributed Architecture for Fusion-Transformer Training Acceleration (DAFTA) is proposed for processing large multimodal remote sensing data. DAFTA can handle any combination of remote sensing modalities. Additionally, similarity of the feature space is leveraged to optimize the training process and to achieve training with a reduced dataset that is equivalent to the complete dataset. The proposed approach provides a systematic and efficient method for managing large remote sensing data and enables accurate and timely insights for various applications.
Type: Application
Filed: June 13, 2024
Publication date: December 19, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Shruti Kunal Kunde, Ravi Kumar Singh, Chaman Banolia, Rekha Singhal, Balamuralidhar Purushothaman, Shailesh Shankar Deshpande
-
Publication number: 20240265243
Abstract: This disclosure relates generally to neural network inferencing, and more particularly to a method and system for neural network inferencing in the logarithmic domain. Conventional techniques train a neural network in the logarithmic domain and then perform inferencing; this leads to lower accuracy, difficulty in converting large models, and an inability to perform optimization. The present disclosure converts a pre-trained neural network into the logarithmic domain using a bit-manipulation-based logarithm number system technique, where the neural network may be pre-trained in either the real or the logarithmic domain. The method converts the weights, the neural network layers, and the activation functions into the logarithmic domain, and uses a 32-bit integer variable to store each logarithm number, which leads to memory efficiency. The disclosed method is used for inferencing of convolutional neural networks for natural language processing, image recognition, and so on.
Type: Application
Filed: January 25, 2024
Publication date: August 8, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Archisman Bhowmick, Mayank Mishra, Rekha Singhal, Aditya Singh Rathore
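To make the logarithm-number-system idea concrete, here is a minimal sketch assuming a fixed-point base-2 log encoding with 16 fractional bits (the encoding and bit widths are illustrative assumptions; the patent's exact bit-manipulation scheme is not specified here). The key property is that multiplication in the real domain becomes integer addition in the log domain.

```python
import math

# Illustrative logarithm number system (LNS): a positive value is stored
# as its base-2 logarithm in 32-bit fixed point with 16 fractional bits.
FRAC_BITS = 16

def to_lns(x):
    """Encode a positive float as a fixed-point log2 integer code."""
    return round(math.log2(x) * (1 << FRAC_BITS))

def from_lns(code):
    """Decode a fixed-point log2 integer code back to a float."""
    return 2.0 ** (code / (1 << FRAC_BITS))

def lns_mul(a_code, b_code):
    """Multiplication in the real domain is addition in the log domain."""
    return a_code + b_code
```

In an inference kernel, the expensive multiplies of a convolution reduce to integer adds of these codes, with accuracy bounded by the number of fractional bits.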
-
Patent number: 12050563
Abstract: The present disclosure provides scalable acceleration of data processing in a machine learning pipeline, which is unavailable in conventional methods. Initially, the system receives a dataset and a data processing code. A plurality of sample datasets are obtained based on the received dataset using a sampling technique. A plurality of performance parameters corresponding to each of the sample datasets are obtained based on the data processing code using a profiling technique. A plurality of scalable performance parameters corresponding to each of a plurality of larger datasets are predicted based on the performance parameters and the data processing code using a curve fitting technique. Simultaneously, a plurality of anti-patterns are located in the data processing code using a pattern matching technique.
Type: Grant
Filed: October 25, 2022
Date of Patent: July 30, 2024
Assignee: Tata Consultancy Services Limited
Inventors: Mayank Mishra, Archisman Bhowmick, Rekha Singhal
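The curve-fitting step can be sketched as follows, assuming (hypothetically) that runtime scales as a power law in dataset size: fit t = a * n^b to the profiled samples via linear regression in log-log space, then extrapolate to larger datasets. The power-law model is an illustrative choice, not the patent's prescribed family of curves.

```python
import math

# Illustrative sketch: predict runtime on larger datasets by fitting
# t = a * n**b to profiled (size, runtime) samples in log-log space.

def fit_power_law(sizes, times):
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Ordinary least squares slope/intercept in log-log coordinates.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = math.exp(my - b * mx)
    return a, b

def predict(a, b, size):
    """Extrapolated runtime for an unseen, larger dataset size."""
    return a * size ** b
```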
-
Publication number: 20240235961
Abstract: Cloud and fog computing are complementary technologies used for complex Internet of Things (IoT) based deployments of applications. With an increase in the number of internet-connected devices, the volume of data generated and processed at higher speeds has increased substantially. Serving a large amount of data and workloads for predictive decisions in real time using fog computing without Service-Level Objective (SLO) violations is a challenge. The present disclosure provides systems and methods for inference management wherein a suitable execution workflow is automatically generated to execute machine learning (ML)/deep learning (DL) inference requests using fog together with the various types of instances (e.g., Function-as-a-Service (FaaS) instances, Machine-Learning-as-a-Service (MLaaS) instances, and the like) provided by cloud vendors/platforms. The generated workflow minimizes the cost of deployment as well as SLO violations.
Type: Application
Filed: December 21, 2023
Publication date: July 11, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Chetan Dnyandeo Phalak, Dheeraj Chahal, Rekha Singhal
-
Publication number: 20240220245
Abstract: Data processing code in machine learning pipelines is written primarily using the data frame APIs provided by Pandas and similar libraries. Though these libraries are easy to use, their temporal performance is worse than that of similar code written using NumPy or other high-performance libraries. Embodiments herein provide a system and method for accelerating slower data processing code in machine learning pipelines by automatically generating accelerated data processing code. Initially, a code is received and pre-processed into a predefined format to obtain a standardized code. The system then identifies code statements having operations to be performed on a data frame, along with an ordered list of data frame columns, to generate a filtered dictionary code. Further, a data processing representation is generated using the filtered dictionary code and the ordered list of data frame columns. Finally, an accelerated data processing code is recommended based on the data processing representation.
Type: Application
Filed: December 19, 2023
Publication date: July 4, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Mayank Mishra, Rekha Singhal
-
Publication number: 20240160949
Abstract: A technical limitation of conventional gradient-based meta learners is their inability to adapt to scenarios where input tasks are sampled from multiple distributions. Training multiple models, with one model per distribution, adds to the training time owing to increased compute. A method and system are provided for generating meta-subnets for efficient model generalization in a multi-distribution scenario using a Binary Mask Perceptron (BMP) technique or a Multi-modal Meta Supermasks (MMSUP) technique. The BMP utilizes an adaptor that determines a binary mask, thus training only those layers which are relevant for a given input distribution, leading to improved training accuracy in a cross-domain scenario. The MMSUP further determines relevant subnets for each input distribution, thus generalizing well compared to standard MAML. Both BMP and MMSUP beat Multi-MAML in terms of training time, as they train a single model on multiple distributions, whereas Multi-MAML trains multiple models.
Type: Application
Filed: August 23, 2023
Publication date: May 16, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Shruti Kunal Kunde, Rekha Singhal, Varad Anant Pimpalkhute
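The binary-mask mechanism can be illustrated with a toy sketch (all names are hypothetical; the actual adaptor that predicts the mask from the input distribution is a learned component not shown here): a 0/1 mask over layers marks which layers are trainable for a given task distribution.

```python
# Toy sketch of the binary-mask idea: only layers selected by the mask
# are marked trainable for the current input distribution's tasks.

def apply_layer_mask(layers, mask):
    """layers: list of dicts with a 'name' key; mask: 0/1 per layer.
    Sets a 'trainable' flag on each layer and returns the names of the
    layers that will receive gradient updates."""
    for layer, bit in zip(layers, mask):
        layer["trainable"] = bool(bit)
    return [layer["name"] for layer in layers if layer["trainable"]]
```

Freezing the masked-out layers is what saves compute relative to training a full model per distribution.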
-
Publication number: 20240119008
Abstract: Works in the literature fail to leverage embedding access patterns and memory units' access/storage capabilities, which, when combined, can yield high-speed heterogeneous systems by dynamically re-organizing embedding table partitions across hardware during inference. A method and system for optimal deployment of embedding tables across a heterogeneous memory architecture for high-speed recommendation inference is disclosed, which dynamically partitions and organizes embedding tables across fast memory architectures to reduce access time. Partitions are chosen to take advantage of the past access patterns of those tables, ensuring that frequently accessed data is available in fast memory most of the time. Partitioning and replication are used to co-optimize memory access time and resources.
Type: Application
Filed: August 25, 2023
Publication date: April 11, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Ashwin Krishnan, Manoj Karunakaran Nambiar, Chinmay Narendra Mahajan, Rekha Singhal
-
Publication number: 20240112095
Abstract: The disclosure generally relates to FPGA-based online 3D bin packing. Online 3D bin packing is the process of packing boxes into larger bins, i.e., Long Distance Containers (LDCs), such that the space inside each LDC is used to the maximum extent. The use of deep reinforcement learning (Deep RL) for this process is effective and popular. However, since existing processor-based implementations are limited by the Von Neumann architecture and take a long time to evaluate each alignment for a box, only a few potential alignments are considered, resulting in sub-optimal packing efficiency. This disclosure describes an architecture for bin packing that leverages pipelining and parallel processing on an FPGA for faster and exhaustive evaluation of all alignments for each box, resulting in increased efficiency. In addition, a suitable general-purpose processor is employed to train the neural network within the algorithm, making the disclosed techniques computationally light, fast, and efficient.
Type: Application
Filed: August 25, 2023
Publication date: April 4, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Ashwin Krishnan, Harshad Khadilkar, Rekha Singhal, Ansuma Basumatary, Manoj Karunakaran Nambiar, Arijit Mukherjee, Kavya Borra
-
Publication number: 20240070540
Abstract: Existing approaches for switching between different hardware accelerators in a heterogeneous accelerator setup have the disadvantage that the complete potential of the heterogeneous hardware accelerators does not get used, as the switching relies on the load on the accelerators or on random switching in which the entire task gets reassigned to a different hardware accelerator. The disclosure herein generally relates to data model training, and more particularly to a method and system for data model training using heterogeneous hardware accelerators. In this approach, the system switches between hardware accelerators when the measured accuracy of the data model after any epoch is below an accuracy threshold.
Type: Application
Filed: July 31, 2023
Publication date: February 29, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Mayank Mishra, Ravi Kumar Singh, Rekha Singhal
-
Publication number: 20240062045
Abstract: This disclosure relates generally to a method and system for latency-optimized heterogeneous deployment of a convolutional neural network (CNN). State-of-the-art methods for the optimal deployment of convolutional neural networks provide reasonable accuracy; however, for unseen networks the same level of accuracy is not attained. The disclosed method provides an automated and unified framework that optimally partitions the CNN and maps these partitions to hardware accelerators, yielding a latency-optimized deployment configuration. The method provides an optimal partitioning of the CNN for deployment on heterogeneous hardware platforms by searching for network-partition and hardware pairs optimized for latency while including the communication cost between hardware. The method employs a performance-model-based optimization algorithm to optimally deploy the components of a deep learning pipeline across the right heterogeneous hardware for high performance.
Type: Application
Filed: July 27, 2023
Publication date: February 22, 2024
Applicant: Tata Consultancy Services Limited
Inventors: Nupur Sumeet, Manoj Karunakaran Nambiar, Rekha Singhal, Karan Rawat
-
Publication number: 20230421504
Abstract: Heterogeneous cloud storage services offered by different cloud service providers have unique deliverable performance. One key challenge is to find the maximum achievable data transfer rate from one cloud service to another. The disclosure herein generally relates to cloud computing, and more particularly to a method and system for parameter tuning in a cloud network. The system obtains the optimum values of the parameters of a source cloud and a destination cloud in a cloud pair by performing parameter tuning. The optimum parameter values and the corresponding data transfer rates are used as training data to generate a data model. The data model processes real-time information with respect to cloud pairs and predicts the corresponding data transfer rate.
Type: Application
Filed: May 23, 2023
Publication date: December 28, 2023
Applicant: Tata Consultancy Services Limited
Inventors: Dheeraj Chahal, Surya Chaitanya Venkata Palepu, Mayank Mishra, Rekha Singhal, Manju Ramesh
-
Publication number: 20230419180
Abstract: Hardly any work in the literature attempts to employ Function-as-a-Service (FaaS) or serverless architectures to accelerate the training or re-training of meta-learning architectures. Embodiments of the present disclosure provide a method and system for meta learning using distributed training on a serverless architecture. The system, interchangeably referred to as MetaFaaS, is a meta-learning based scalable architecture using a serverless distributed setup. The hierarchical nature of gradient-based architectures is leveraged to facilitate distributed training on the serverless architecture. Further, a compute-efficient architecture for meta-learning, efficient Adaptive Learning of hyperparameters for Fast Adaptation (eALFA), is provided. The serverless-architecture-based training of models during meta learning enables unlimited scalability and a reduction in training time by using an optimal number of serverless instances.
Type: Application
Filed: April 3, 2023
Publication date: December 28, 2023
Applicant: Tata Consultancy Services Limited
Inventors: Shruti Kunal Kunde, Varad Anant Pimpalkhute, Rekha Singhal
-
Publication number: 20230409967
Abstract: State-of-the-art methods require the size of a DL model, or of its gradients, to be less than the maximum data item size of the storage used as a communication channel for model training on a serverless platform. Embodiments of the present disclosure provide a method and system for training large DL models via a serverless architecture using a communication channel even when the gradients are larger than the maximum size of one data item allowed by the channel. The gradients generated by each worker during the current training instance are chunked into segments and stored in the communication channel. The corresponding segments of each worker are aggregated by aggregators and stored back. Each of the aggregated segments is then read by each worker to generate an aggregated model to be used during the successive training instance. Optimization techniques are used for reading from and writing to the channel, resulting in significant improvements in the performance and cost of training.
Type: Application
Filed: April 27, 2023
Publication date: December 21, 2023
Applicant: Tata Consultancy Services Limited
Inventors: Dheeraj Chahal, Surya Chaitanya Venkata Palepu, Mayank Mishra, Ravi Kumar Singh, Rekha Singhal
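The chunk-and-aggregate exchange can be sketched as follows (a hedged illustration: the real communication channel is a storage service with an item-size limit; here a plain dict stands in for it, and averaging stands in for the aggregation step):

```python
# Sketch of chunked gradient exchange: each worker's gradient vector is
# split into segments so no stored item exceeds the channel's size limit;
# aggregators then combine the corresponding segments across workers.

def chunk(grad, seg_len):
    """Split a flat gradient list into fixed-length segments."""
    return [grad[i:i + seg_len] for i in range(0, len(grad), seg_len)]

def store(channel, worker, grad, seg_len):
    """Write each segment to the channel keyed by (worker, segment index)."""
    for idx, seg in enumerate(chunk(grad, seg_len)):
        channel[(worker, idx)] = seg

def aggregate_segment(channel, workers, idx):
    """Average one segment index across all workers (the aggregator's job)."""
    segs = [channel[(w, idx)] for w in workers]
    return [sum(vals) / len(vals) for vals in zip(*segs)]
```

Each worker would then read back every aggregated segment and concatenate them to rebuild the full averaged gradient.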
-
Patent number: 11775264
Abstract: This disclosure relates generally to the configuring/building of applications. Typically, a deep learning (DL) application having multiple models composed and interspersed with corresponding transformation functions has no mechanism for efficient deployment on the underlying system resources. The disclosed system accelerates the development of applications that compose multiple models, where each model could be a primitive model or a composite model itself. In an embodiment, the disclosed system optimally deploys a composable model application and its transformation functions on the underlying resources using performance prediction models, thereby accelerating the development and deployment of the application.
Type: Grant
Filed: September 2, 2021
Date of Patent: October 3, 2023
Assignee: Tata Consultancy Services Limited
Inventors: Rekha Singhal, Mayank Mishra, Dheeraj Chahal, Shruti Kunde, Manju Ramesh
-
Publication number: 20230305544
Abstract: The large training times incurred during the self-learning of ML models in digital twins are debilitating and can adversely affect the functioning of industrial plants. Embodiments of the present disclosure provide a method and system for accelerated self-learning using an application-agnostic meta learner trained on an optimal set of meta features selected from classification meta features, regression meta features, and domain meta features, based on a domain-meta-feature taxonomy created for a plurality of industrial plants across a plurality of domains. Optimal feature selection is enabled using ML/DL, which provides static feature selection, while a Q-learning based approach is disclosed that enables dynamic feature selection. The Q-learning based approach has two implementations, with static and dynamic rewards.
Type: Application
Filed: February 13, 2023
Publication date: September 28, 2023
Applicant: Tata Consultancy Services Limited
Inventors: Shruti Kunal Kunde, Amey Sanjaykumar Pandit, Rekha Singhal, Sharod Roy Choudhury
-
Publication number: 20230185778
Abstract: The present disclosure provides scalable acceleration of data processing in a machine learning pipeline, which is unavailable in conventional methods. Initially, the system receives a dataset and a data processing code. A plurality of sample datasets are obtained based on the received dataset using a sampling technique. A plurality of performance parameters corresponding to each of the sample datasets are obtained based on the data processing code using a profiling technique. A plurality of scalable performance parameters corresponding to each of a plurality of larger datasets are predicted based on the performance parameters and the data processing code using a curve fitting technique. Simultaneously, a plurality of anti-patterns are located in the data processing code using a pattern matching technique.
Type: Application
Filed: October 25, 2022
Publication date: June 15, 2023
Applicant: Tata Consultancy Services Limited
Inventors: Mayank Mishra, Archisman Bhowmick, Rekha Singhal
-
Publication number: 20230185625
Abstract: Recent techniques for workload characterization of an application to be executed in a serverless execution environment or cloud are based on benchmark approximation: multiple microbenchmarks are run against multiple VM configurations, and a score is calculated and used for mapping future workloads to the appropriate configuration. Embodiments herein disclose a method and system for workload-characterization-based capacity planning of an actual application running on-premise with different configurations of the same machine, providing a cost-effective and high-performance serverless execution environment. The resource demand of each API in the application workflow is evaluated. Based on the resource demand of each API, a mapping to the serverless platform on the cloud is performed. Additionally, characterization of the threads within each API is performed, and each thread is mapped to a serverless instance based on its resource requirements.
Type: Application
Filed: December 5, 2022
Publication date: June 15, 2023
Applicant: Tata Consultancy Services Limited
Inventors: Dheeraj Chahal, Rekha Singhal, Surya Chaitanya Venkata Palepu
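The API-to-instance mapping step can be sketched minimally (the configuration sizes, function names, and memory-only demand model below are illustrative assumptions, not the disclosure's actual mapping procedure): pick, for each API, the smallest serverless memory configuration that covers its measured demand.

```python
# Hypothetical sketch: map each API's measured memory demand to the
# smallest serverless memory configuration that can serve it.

SIZES_MB = (128, 256, 512, 1024, 2048)   # illustrative instance sizes

def map_api(demand_mb, sizes=SIZES_MB):
    """Return the smallest configuration covering the demand."""
    for size in sorted(sizes):
        if size >= demand_mb:
            return size
    raise ValueError("demand exceeds the largest available configuration")

def plan(workflow):
    """workflow: dict of api_name -> measured memory demand in MB."""
    return {api: map_api(mb) for api, mb in workflow.items()}
```

A fuller version would also account for CPU and per-thread demands, as the disclosure characterizes threads within each API separately.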