Patents by Inventor Samarjit Das

Samarjit Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Systems and methods for enhanced data generation in fault diagnosis

Patent number: 12670926

Abstract: A method of generating audio to obtain manipulated audio data includes receiving textual descriptions of audio associated with operation of a device, receiving audio data associated with the operation of the device, generating, based on the textual descriptions, descriptive text inputs of audio features associated with the operation of the device, generating the manipulated audio data based on the descriptive text inputs and the audio data, the manipulated audio data including the one or more audio features indicative of faults associated with the descriptive text inputs, training a machine learning (ML) model to diagnose the faults using the manipulated audio data, the ML model being trained to generate an output indicative of the faults based on audio data obtained during the operation of the device, and, based on convergence during the training, outputting a trained ML model configured to generate the output indicative of the faults.

Type: Grant

Filed: August 7, 2024

Date of Patent: June 30, 2026

Assignee: Robert Bosch GmbH

Inventors: Pongtep Angkititrakul, Long Huang, Jonathan Francis, Samarjit Das
LLM-ASSISTED AUDIO SYNTHESIS FRAMEWORK

Publication number: 20260141914

Abstract: Training of an audio foundation model (AFM) is performed using a dataset constructed using low-level audio property control and high-level composition planning. A plurality of digital audio compositions are generated, using a large language model (LLM) as a planner agent. The planner agent is prompted to generate composition plans defining logical combinations of foreground and background digital sounds, event occurrences within the compositions, and digital sound properties. The foreground and background digital sounds have consistent audio quality. An audio composition tool generates the plurality of digital audio compositions according to the composition plans. Descriptive text is generated for each of the digital audio compositions using a summarizer agent. The summarizer agent is implemented as an LLM, prompted to describe the digital audio compositions. The compositions and the corresponding descriptive text are combined to form audio-text pairs.

Type: Application

Filed: November 20, 2024

Publication date: May 21, 2026

Inventors: Wei-Cheng Lin, Ho-Hsiang Wu, Luca Bondi, Shabnam Ghaffarzadegan, Abinaya Kumar, Samarjit Das
Systems and methods of processing audio data with a multi-rate learnable audio frontend

Patent number: 12633297

Abstract: Methods and systems of processing audio data with a multi-stage audio frontend model are provided. A one-dimensional audio waveform is received as input and processed using a multi-stage audio frontend model to convert the one-dimensional waveform into a two-dimensional matrix representing features of the audio waveform. The multi-stage learnable audio frontend model is configured to apply a first filterbank to the audio waveform to generate a first time-frequency representation of the audio waveform; apply a first decimation filter to the audio waveform to generate a first decimated audio input; apply a second filterbank to the first decimated audio input to generate a second time-frequency representation of the audio waveform; and stack the first time-frequency representation and the second time-frequency representation together to generate the two-dimensional matrix.

Type: Grant

Filed: September 14, 2023

Date of Patent: May 19, 2026

Assignee: Robert Bosch GmbH

Inventors: Luca Bondi, Irtsam Ghazi, Charles Shelton, Samarjit Das
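
The pipeline in the multi-rate frontend abstract above (first filterbank, decimation, second filterbank, stacking) can be illustrated with a minimal sketch. It assumes a fixed DFT-magnitude filterbank and a 2-tap averaging decimator in place of the learnable components the abstract describes; all function names and parameter values are hypothetical.

```python
import cmath
import math

def filterbank(signal, frame_len, hop, n_bands):
    """Fixed DFT-magnitude filterbank (a stand-in for a learnable one).
    Returns n_bands rows, each with one value per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    rows = [[0.0] * n_frames for _ in range(n_bands)]
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_len]
        for k in range(n_bands):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            rows[k][t] = abs(acc)
    return rows

def decimate_by_2(signal):
    """Crude anti-alias filter (2-tap average), then keep every other sample."""
    smoothed = [(signal[i] + signal[i + 1]) / 2 for i in range(len(signal) - 1)]
    return smoothed[::2]

def multi_rate_frontend(signal, frame_len=16, hop=8, n_bands=4):
    """Stack time-frequency features computed at two sample rates."""
    first = filterbank(signal, frame_len, hop, n_bands)
    decimated = decimate_by_2(signal)
    # Halve frame length and hop so both branches yield aligned frames.
    second = filterbank(decimated, frame_len // 2, hop // 2, n_bands)
    n_frames = min(len(first[0]), len(second[0]))
    return [row[:n_frames] for row in first + second]

# Hypothetical input: a 64-sample sine wave.
signal = [math.sin(2 * math.pi * 0.1 * n) for n in range(64)]
features = multi_rate_frontend(signal)
print(len(features), "rows x", len(features[0]), "frames")
```

Halving the frame length and hop in the decimated branch is one way to keep the two representations time-aligned so they can be stacked; the abstract does not specify how alignment is achieved.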
Video management system and method for audio event search and classification

Patent number: 12608422

Abstract: A method includes defining, by a text encoder, a set of text embeddings for a text prompt indicative of a search query for video content having audio data indicative of a sound feature that is defined as a search parameter of the search query, and ranking a plurality of audio embeddings indicative of a plurality of audio signals and provided in a vector database using the set of text embeddings of the search query. The method further includes detecting a relevant audio record associated with an identified audio embedding from among the ranked audio embeddings, and outputting a relevant video content associated with the relevant audio record to have a computing device play the video content, the relevant video content being obtained from among a plurality of video content.

Type: Grant

Filed: July 24, 2024

Date of Patent: April 21, 2026

Assignee: Robert Bosch GmbH

Inventors: Irtsam Ghazi, Ho-Hsiang Wu, Ajit Belsarkar, Luca Bondi, Wei-Cheng Lin, Samarjit Das
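
The ranking step in the audio event search abstract above can be sketched as follows. This is an illustration only: it assumes cosine similarity as the ranking score and an in-memory dictionary in place of a vector database, and the embeddings, record IDs, and video file names are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_audio_records(text_embedding, audio_index, top_k=3):
    """Rank stored audio embeddings against a text-query embedding."""
    scored = [(cosine(text_embedding, emb), record_id)
              for record_id, emb in audio_index.items()]
    scored.sort(reverse=True)
    return [(record_id, score) for score, record_id in scored[:top_k]]

# Hypothetical index: audio record ID -> embedding, plus a record-to-video map.
audio_index = {
    "clip_glass_break": [0.9, 0.1, 0.0],
    "clip_dog_bark": [0.1, 0.9, 0.1],
    "clip_siren": [0.0, 0.2, 0.9],
}
video_for_audio = {
    "clip_glass_break": "cam3_0412.mp4",
    "clip_dog_bark": "cam1_0077.mp4",
    "clip_siren": "cam2_0250.mp4",
}

query = [0.8, 0.2, 0.1]  # stand-in for a text encoder's output
results = rank_audio_records(query, audio_index, top_k=2)
best_video = video_for_audio[results[0][0]]
print(results, best_video)
```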
SYSTEM AND METHOD FOR CLAP4SED

Publication number: 20260080895

Abstract: A method for real-time sound event detection on an embedded device includes pretraining a contrastive language-audio pretraining model as an audio foundation model and preparing offline multimodal query prototypes for sound events of interest. The pretrained model and query prototypes are deployed on an embedded device. The device receives an input audio stream and extracts audio embeddings using the pretrained model. Similarity scores are calculated between the extracted audio embeddings and the prepared query prototypes. The presence of a sound event is determined based on the calculated similarity scores, and a real-time sound event detection result is output. The system includes a memory storing the pretrained model and query prototypes, an audio input interface, and a processor configured to perform the extraction, calculation, determination, and output operations. A non-transitory computer-readable medium stores instructions that, when executed, cause a processor to perform the method.

Type: Application

Filed: September 16, 2024

Publication date: March 19, 2026

Inventors: Wei-Cheng LIN, Ho-Hsiang WU, Irtsam GHAZI, Luca BONDI, Ajit BELSARKAR, Samarjit DAS
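
The detection step in the CLAP4SED abstract above (similarity scores against offline query prototypes, then a presence decision) might look like the sketch below. It assumes a prototype is the normalized mean of a few example embeddings and that a fixed similarity threshold decides presence; neither detail is stated in the abstract, and the vectors are toy values.

```python
import math

def normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def build_prototype(embeddings):
    """Average normalized example embeddings (e.g. text and audio) offline."""
    unit = [normalize(e) for e in embeddings]
    dim = len(unit[0])
    mean = [sum(e[i] for e in unit) / len(unit) for i in range(dim)]
    return normalize(mean)

def detect_events(frame_embedding, prototypes, threshold=0.7):
    """Return the sound events whose prototype similarity meets the threshold."""
    frame = normalize(frame_embedding)
    scores = {name: sum(a * b for a, b in zip(frame, proto))
              for name, proto in prototypes.items()}
    return {name: s for name, s in scores.items() if s >= threshold}

# Hypothetical prototypes built once, before deployment.
prototypes = {
    "alarm": build_prototype([[1.0, 0.0], [0.9, 0.1]]),
    "speech": build_prototype([[0.0, 1.0], [0.1, 0.9]]),
}

# One embedding from the incoming audio stream.
detected = detect_events([0.95, 0.05], prototypes)
print(detected)
```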
SYSTEMS AND METHODS FOR ENHANCED DATA GENERATION IN FAULT DIAGNOSIS

Publication number: 20260045272

Abstract: A method of generating audio to obtain manipulated audio data includes receiving textual descriptions of audio associated with operation of a device, receiving audio data associated with the operation of the device, generating, based on the textual descriptions, descriptive text inputs of audio features associated with the operation of the device, generating the manipulated audio data based on the descriptive text inputs and the audio data, the manipulated audio data including the one or more audio features indicative of faults associated with the descriptive text inputs, training a machine learning (ML) model to diagnose the faults using the manipulated audio data, the ML model being trained to generate an output indicative of the faults based on audio data obtained during the operation of the device, and, based on convergence during the training, outputting a trained ML model configured to generate the output indicative of the faults.

Type: Application

Filed: August 7, 2024

Publication date: February 12, 2026

Inventors: Pongtep ANGKITITRAKUL, Long HUANG, Jonathan FRANCIS, Samarjit DAS
VIDEO MANAGEMENT SYSTEM AND METHOD FOR AUDIO EVENT SEARCH AND CLASSIFICATION

Publication number: 20260030294

Abstract: A method includes defining, by a text encoder, a set of text embeddings for a text prompt indicative of a search query for video content having audio data indicative of a sound feature that is defined as a search parameter of the search query, and ranking a plurality of audio embeddings indicative of a plurality of audio signals and provided in a vector database using the set of text embeddings of the search query. The method further includes detecting a relevant audio record associated with an identified audio embedding from among the ranked audio embeddings, and outputting a relevant video content associated with the relevant audio record to have a computing device play the video content, the relevant video content being obtained from among a plurality of video content.

Type: Application

Filed: July 24, 2024

Publication date: January 29, 2026

Inventors: Irtsam Ghazi, Ho-Hsiang Wu, Ajit Belsarkar, Luca Bondi, Wei-Cheng Lin, Samarjit Das
ACTIVE MACHINE LEARNING SYSTEM FOR ANOMALOUS EVENT DETECTION AND CLASSIFICATION IN INDUSTRIAL APPLICATIONS

Publication number: 20260016819

Abstract: Active learning for anomalous event detection and classification in industrial applications. Initial samples from an industrial environment may be received. Unlabeled samples may then be classified using a target query strategy to determine the top-ranked relevant or most important samples. The top-ranked samples may then be manually annotated. A model may then be optimized based on the initial samples from the industrial environment and the annotated top-ranked samples. The model's performance may then be evaluated.

Type: Application

Filed: July 10, 2024

Publication date: January 15, 2026

Inventors: Shabnam GHAFFARZADEGAN, Luca BONDI, Abinaya KUMAR, Ho-Hsiang WU, Wei-Cheng LIN, Samarjit DAS
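
The query step in the active learning abstract above, which picks the top-ranked unlabeled samples for manual annotation, can be illustrated with entropy-based uncertainty sampling. This is one common query strategy chosen for the sketch, not necessarily the one claimed; the sample IDs and probabilities are made up.

```python
import math

def entropy(probs):
    """Shannon entropy of a class-probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_annotation(unlabeled_probs, top_k):
    """Return the IDs of the top_k samples the model is least certain about."""
    ranked = sorted(unlabeled_probs,
                    key=lambda sid: entropy(unlabeled_probs[sid]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical model outputs on an unlabeled pool (normal vs. anomalous).
pool = {
    "s1": [0.98, 0.02],
    "s2": [0.55, 0.45],
    "s3": [0.70, 0.30],
    "s4": [0.50, 0.50],
}
picked = select_for_annotation(pool, top_k=2)
print(picked)
```

The selected samples would then be labeled by an annotator and added to the training set before the model is optimized again.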
DEEP REINFORCEMENT ACTIVE MACHINE LEARNING SYSTEM FOR AUDIO EVENT DETECTION AND CLASSIFICATION

Publication number: 20260018185

Abstract: Active machine learning systems for anomalous event detection and classification. Initial samples from an industrial environment may be received and labeled. Initially, a training pool of audio samples may be labeled. These labeled samples may be used to train an audio event classifier to detect and categorize sounds. Environment states may be calculated using outputs from the classifier. A batch of audio samples may then be selected from an unlabeled pool for annotation, guided by a reinforcement learning agent. These selected samples may be annotated and added to the labeled training pool. The classifier may be retrained with this updated pool. Rewards may be calculated for each of the annotated samples based on their annotations. The environment states may be updated using the retrained classifier, and the exploration-exploitation parameter of the reinforcement learning agent may be adjusted. The reinforcement learning agent may be retrained using the updated environment states and rewards.

Type: Application

Filed: July 10, 2024

Publication date: January 15, 2026

Inventors: Ana Elisa MENDEZ MENDEZ, Shabnam GHAFFARZADEGAN, Samarjit DAS
SYSTEM AND METHOD FOR MULTI-CONDITIONED AUDIO GENERATION

Publication number: 20250356121

Abstract: A method for audio generation includes defining an audio input condition for an obtained input using an encoder, where the obtained input is indicative of one or more audio characteristics. The method further includes defining an audio style condition of a selected audio style profile employing an audio feature extraction neural network, and outputting a generated audio data indicative of a desired generated audio using a multi-conditioned latent diffusion model that employs the audio input condition and the audio style condition as adapters to the multi-conditioned latent diffusion model.

Type: Application

Filed: May 17, 2024

Publication date: November 20, 2025

Inventors: Pongtep Angkititrakul, Long Huang, Samarjit Das
Dynamic spatiotemporal beamforming self-diagnostic system

Patent number: 12459120

Abstract: A method for self-diagnosing a data acquisition system for acquiring calibrated images of an area by a controller includes requesting a signal, from a sensor associated with a mobile platform in the area, removing from the signal, background noise associated with the mobile platform, thereby focusing the measurement to a foreground signal, wherein the background noise is removed from the foreground signal via a subspace approximation using singular value decomposition, requesting a previous-in-time signal indicative of a previous-in-time measurement of the parameter, wherein previous-in-time background noise is removed from a previous-in-time foreground signal via subspace approximation using singular value decomposition and in response to a change detection indicating a difference between a spectrogram of the background noise and a previous-in-time spectrogram of the previous-in-time background noise exceeding a predetermined threshold at a predetermined frequency, outputting a status signal indicative of a change in oper

Type: Grant

Filed: December 31, 2020

Date of Patent: November 4, 2025

Assignee: Robert Bosch GmbH

Inventors: Jonathan Jenner Macoskey, Samarjit Das
SYSTEM AND METHOD FOR KNOWLEDGE-BASED AUDIO-TEXT MODELING VIA AUTOMATIC MULTIMODAL GRAPH CONSTRUCTION

Publication number: 20250335705

Abstract: Knowledge-based audio-text modeling via automatic multimodal graph construction is performed. An audio dataset is received, the audio dataset including clips of audio data, wherein each of the clips of the audio data is paired with corresponding metadata descriptive of the audio contents of the respective clip of the audio data. Graph nodes of interest are identified from a semantic network, the graph nodes being descriptive of semantics of the knowledge domain of the contents of the audio dataset. A large language model (LLM) is utilized for categorizing the metadata into the graph nodes and for inferring supplemental data for the graph nodes for which there is no metadata, producing an extracted knowledge graph. The extracted knowledge graph is validated utilizing the LLM to perform relation verification of edges between the graph nodes of the extracted knowledge graph, thereby mitigating hallucination effects in the categorizing and inferring of the supplemental data.

Type: Application

Filed: April 25, 2024

Publication date: October 30, 2025

Inventors: Wei-Cheng Lin, Ho-Hsiang Wu, Shabnam Ghaffarzadegan, Luca Bondi, Abinaya Kumar, Samarjit Das
SYSTEMS AND METHODS FOR MULTI-MODAL CONTINUAL PRE-TRAINING OF AUDIO ENCODERS

Publication number: 20250322823

Abstract: A method for training an audio encoder includes receiving first training data comprising first audio data, performing a first training task on an audio encoder using the first training data, receiving second training data comprising first image data and second audio data, and performing a second training task on the audio encoder using the second training data. The method also includes receiving third training data comprising first text data and third audio data, performing a third training task on the audio encoder using the third training data, and performing at least one downstream task using the audio encoder.

Type: Application

Filed: April 12, 2024

Publication date: October 16, 2025

Inventors: HO-HSIANG WU, GYUHAK KIM, LUCA BONDI, SAMARJIT DAS
MOBILE ROBOT WITH OPTIMAL CONTROL STRATEGIES UNDER SENSOR UNCERTAINTIES

Publication number: 20250216848

Abstract: A computer-implemented system and method relate to a mobile robot. State data is generated using sensor data from at least one sensor. A current confident zone is identified on a unified confident zone map using the state data. The unified confident zone map includes confident zones. Each confident zone is indicative of a given confidence level of given state data of a selected sensor modality for a given location. Assessment data is generated that indicates whether the current confident zone is deemed a failure zone. A mobile robot is controlled based on a control command. The control command relates to a recovery plan of moving the mobile robot out of the current confident zone when the assessment data indicates that the current confident zone is the failure zone. The control command relates to another plan when the assessment data indicates that the current confident zone is not the failure zone.

Type: Application

Filed: December 29, 2023

Publication date: July 3, 2025

Inventors: Sandeep Reddy BADDAM, Jonathan FRANCIS, I, Sirajum MUNIR, Sushanta RAKSHIT, Martin COORS, Samarjit DAS, Vivek JAIN
METHODS AND SYSTEMS FOR CLASSIFYING VEHICLES AS ELECTRIC OR NONELECTRIC BASED ON AUDIO

Publication number: 20250218187

Abstract: Methods and systems for training a neural network to identify an electric vehicle based on audio. Video data is generated from a camera with a field of view including a roadway. Audio data is generated from a microphone, the audio data associated with vehicles traveling across the roadway. The video data is segmented into segments, each having a start time and a finish time that corresponds to a respective vehicle traveling across the roadway in and out of the field of view. Each video segment is labeled with a label indicating the respective vehicle in that segment as either an electric vehicle or a non-electric vehicle. The audio data is segmented into segments, each having a start time and end time associated with a respective one of the video segments. A neural network is trained based on the audio segments and the labels of the associated video segments.

Type: Application

Filed: December 28, 2023

Publication date: July 3, 2025

Inventors: Ibrahim Eshera, Charles Shelton, Samarjit Das
SYSTEM AND METHOD FOR LEARNING SENSOR MEASUREMENT UNCERTAINTY

Publication number: 20250216851

Abstract: A computer-implemented system and method include generating a set of state data using sensor data of a particular sensor modality at a set of locations in a region. Each state data includes a corresponding position estimate of a vehicle. A set of contour ranges is generated. Each contour range is indicative of a respective error range of given state data with respect to corresponding ground truth data for a given location. The region is categorized into at least (i) a first confident level associated with a first error range and (ii) a second confident level associated with a second error range. A first confident zone corresponds to locations associated with the first confident level. A second confident zone corresponds to locations associated with the second confident level. A confident zone map includes at least the first confident zone and the second confident zone.

Type: Application

Filed: December 29, 2023

Publication date: July 3, 2025

Inventors: Sandeep Reddy BADDAM, Jonathan FRANCIS, I, Sirajum MUNIR, Sushanta RAKSHIT, Martin COORS, Samarjit DAS, Vivek JAIN
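
The categorization described in the sensor measurement uncertainty abstract above can be sketched as a per-location error check. This is a simplified illustration: it assumes 2D positions, Euclidean error against ground truth, and a single hypothetical threshold (`first_range`) separating the two confident levels.

```python
import math

def build_confident_zone_map(estimates, ground_truth, first_range=0.5):
    """Assign each location to a confident zone based on position error.

    Locations with error within first_range go to the "first" zone;
    all others go to the "second" zone.
    """
    zone_map = {}
    for location, estimate in estimates.items():
        error = math.dist(estimate, ground_truth[location])
        zone_map[location] = "first" if error <= first_range else "second"
    return zone_map

# Hypothetical position estimates (metres) from one sensor modality.
estimates = {"A": (1.0, 1.1), "B": (5.0, 7.0)}
ground_truth = {"A": (1.0, 1.0), "B": (5.0, 5.0)}

zone_map = build_confident_zone_map(estimates, ground_truth)
print(zone_map)
```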
METHODS AND SYSTEMS FOR SPEECH EMOTION RETRIEVAL VIA NATURAL LANGUAGE PROMPTS

Publication number: 20250217638

Abstract: Methods and systems for generating training data for training a contrastive language-audio machine-learning model. A plurality of audio segments are retrieved from a speech emotion recognition (SER) database along with metadata associated with the audio segments. The metadata of each audio segment includes an emotion class. Words or terms associated with emotions are retrieved from a lexicon. A large language model (LLM) is executed on (i) the classes of emotion associated with the audio segments and (ii) the words or terms from the lexicon. This generates a plurality of text captions associated with emotion, which are stored in a caption pool. For each audio segment retrieved from the SER database, that audio segment is paired with one or more of the text captions from the caption pool that were generated based on the emotion class associated with that audio segment. This yields audio-text pairs for training a contrastive learning model.

Type: Application

Filed: December 29, 2023

Publication date: July 3, 2025

Inventors: Wei-Cheng Lin, Ho-Hsiang Wu, Shabnam Ghaffarzadegan, Luca Bondi, Abinaya Kumar, Samarjit Das
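
The pairing step in the speech emotion retrieval abstract above can be sketched as follows. The LLM call is replaced here by a hypothetical template function (`template_caption`), and the lexicon, segment IDs, and emotion classes are invented for illustration.

```python
import random

def build_caption_pool(emotion_classes, lexicon, caption_fn):
    """Generate captions per emotion class from lexicon terms."""
    return {emotion: [caption_fn(emotion, term) for term in lexicon.get(emotion, [])]
            for emotion in emotion_classes}

def pair_audio_with_captions(segments, pool, per_segment=1, seed=0):
    """Pair each audio segment with captions generated for its emotion class."""
    rng = random.Random(seed)
    pairs = []
    for segment_id, emotion in segments:
        captions = pool.get(emotion, [])
        for caption in rng.sample(captions, min(per_segment, len(captions))):
            pairs.append((segment_id, caption))
    return pairs

def template_caption(emotion, term):
    """Stand-in for an LLM; a real system would prompt a model instead."""
    return f"a speaker who sounds {term} ({emotion})"

# Hypothetical lexicon and SER metadata.
lexicon = {"anger": ["furious", "irritated"], "joy": ["cheerful", "delighted"]}
segments = [("utt_001", "anger"), ("utt_002", "joy"), ("utt_003", "anger")]

pool = build_caption_pool(["anger", "joy"], lexicon, template_caption)
pairs = pair_audio_with_captions(segments, pool)
print(pairs)
```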
SYSTEM AND METHOD OF FUSING WIRELESS AND VISUAL FEATURES FOR ROBUST ROBOT STATE ESTIMATION

Publication number: 20250216852

Abstract: A computer-implemented system and method relate to operating a mobile robot with respect to a reference location. First state data is generated using sensor data obtained from a first set of sensors of a first sensor modality. Second state data is generated using second sensor data obtained from a second set of sensors. The second set of sensors provide wireless sensing. The second state data is generated from wireless features of the second sensor data. A first distribution of the first state data is generated. A second distribution of the second state data is generated. A posterior distribution is computed by fusing the first distribution and the second distribution. Optimal state data and associated uncertainty data are generated using the posterior distribution. The optimal state data includes a position estimate of the mobile robot. The mobile robot is controlled using at least the optimal state data.

Type: Application

Filed: December 29, 2023

Publication date: July 3, 2025

Inventors: Sandeep Reddy BADDAM, Sirajum MUNIR, Jonathan FRANCIS, Sushanta RAKSHIT, Martin COORS, Samarjit DAS, Vivek JAIN
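
The fusion step in the wireless and visual features abstract above can be illustrated in one dimension. The sketch assumes both distributions are independent Gaussians with a flat prior, so the posterior is their normalized product; the abstract does not state the distribution family, and the numbers are hypothetical.

```python
def fuse_gaussians(mean_a, var_a, mean_b, var_b):
    """Fuse two independent 1D Gaussian estimates of the same quantity.

    The posterior precision is the sum of the input precisions, and the
    posterior mean is the precision-weighted average of the input means.
    """
    precision = 1.0 / var_a + 1.0 / var_b
    fused_var = 1.0 / precision
    fused_mean = fused_var * (mean_a / var_a + mean_b / var_b)
    return fused_mean, fused_var

# Hypothetical position estimates along one axis (metres, metres squared).
visual = (2.0, 0.25)    # tighter estimate
wireless = (3.0, 1.0)   # looser estimate

fused_mean, fused_var = fuse_gaussians(*visual, *wireless)
print(fused_mean, fused_var)
```

The fused variance is smaller than either input variance, which is why the posterior can serve as both a position estimate and its associated uncertainty.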
METHODS AND SYSTEMS FOR DETERMINING A QUANTITY OF FUEL DISPENSED AT A FUELING STATION BASED ON AUDIO

Publication number: 20250207966

Abstract: Methods and systems for determining a quantity of fuel dispensed at a fueling station based on audio, as well as training such a system. Audio data is generated from one or more microphones, wherein the audio data is associated with stages of a refueling operation at a fueling station. A machine learning model is executed on the audio data to segment the audio data into segments, with each segment associated with a respective one of the stages of the refueling operation. The model also determines that one of the segments is associated with a fuel flow stage indicating fuel is flowing from a fuel storage. This allows the system to determine a quantity of fuel being dispensed, based on the time of the one segment.

Type: Application

Filed: December 20, 2023

Publication date: June 26, 2025

Inventors: IBRAHIM ESHERA, CHARLES SHELTON, SAMARJIT DAS
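
The last step of the fuel quantity abstract above, deriving a quantity from the duration of the fuel flow segment, can be sketched as below. It assumes a constant, known flow rate and takes the segment labels as given (the machine learning model that produces them is out of scope); the stage names and values are hypothetical.

```python
def dispensed_quantity(segments, flow_rate_l_per_s):
    """Estimate dispensed fuel from the duration of fuel-flow segments.

    segments: list of (stage, start_seconds, end_seconds) tuples.
    Assumes a constant flow rate while fuel is flowing.
    """
    total_litres = 0.0
    for stage, start_s, end_s in segments:
        if stage == "fuel_flow":
            total_litres += (end_s - start_s) * flow_rate_l_per_s
    return total_litres

# Hypothetical segmentation of one refueling operation.
segments = [
    ("nozzle_removed", 0.0, 5.0),
    ("fuel_flow", 5.0, 65.0),
    ("nozzle_returned", 65.0, 70.0),
]
litres = dispensed_quantity(segments, flow_rate_l_per_s=0.5)
print(litres)
```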
Mobile Robot with Audio Perception System

Publication number: 20250189970

Abstract: A mobile robot includes a microphone array with a set of microphones. The microphone array is at least partially disposed on the mobile robot. The mobile robot receives audio signals from the microphone array. Audio feature data of acoustic activity is extracted from the audio signals. Direction of arrival (DOA) data of the acoustic activity is generated based on the audio signals. A machine learning model is configured to generate audio event data using the audio feature data. The audio event data identifies at least one sound source of the audio feature data. A knowledge graph is queried using the audio event data to obtain entity data. The entity data has a predetermined relation with the audio event data. Semantic audio scene data is generated using the audio event data, the DOA data, and the entity data. The mobile robot performs an action based on the semantic audio scene data.

Type: Application

Filed: December 7, 2023

Publication date: June 12, 2025

Inventors: Pongtep Angkititrakul, Jonathan Francis, Luca Bondi, Samarjit Das
