Patents by Inventor Samarjit Das

Samarjit Das has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260141914
    Abstract: Training of an audio foundation model (AFM) is performed using a dataset constructed using low-level audio property control and high-level composition planning. A plurality of digital audio compositions are generated, using a large language model (LLM) as a planner agent. The planner agent is prompted to generate composition plans defining logical combinations of foreground and background digital sounds, event occurrences within the compositions, and digital sound properties. The foreground and background digital sounds have consistent audio quality. An audio composition tool generates the plurality of digital audio compositions according to the composition plans. Descriptive text is generated for each of the digital audio compositions using a summarizer agent. The summarizer agent is implemented as an LLM, prompted to describe the digital audio compositions. The compositions and the corresponding descriptive text are combined to form audio-text pairs.
    Type: Application
    Filed: November 20, 2024
    Publication date: May 21, 2026
    Inventors: Wei-Cheng Lin, Ho-Hsiang Wu, Luca Bondi, Shabnam Ghaffarzadegan, Abinaya Kumar, Samarjit Das
  • Patent number: 12633297
    Abstract: Methods and systems of processing audio data with a multi-stage audio frontend model are provided. A one-dimensional audio waveform is received as input and processed using a multi-stage audio frontend model to convert the one-dimensional waveform into a two-dimensional matrix representing features of the audio waveform. The multi-stage learnable audio frontend model is configured to apply a first filterbank to the audio waveform to generate a first time-frequency representation of the audio waveform; apply a first decimation filter to the audio waveform to generate a first decimated audio input; apply a second filterbank to the first decimated audio input to generate a second time-frequency representation of the audio waveform; and stack the first time-frequency representation and the second time-frequency representation together to generate the two-dimensional matrix.
    Type: Grant
    Filed: September 14, 2023
    Date of Patent: May 19, 2026
    Assignee: Robert Bosch GmbH
    Inventors: Luca Bondi, Irtsam Ghazi, Charles Shelton, Samarjit Das
  • Patent number: 12608422
    Abstract: A method includes defining, by a text encoder, a set of text embeddings for a text prompt indicative of a search query for video content having audio data indicative of a sound feature that is defined as a search parameter of the search query, and ranking a plurality of audio embeddings indicative of a plurality of audio signals and provided in a vector database using the set of text embeddings of the search query. The method further includes detecting a relevant audio record associated with an identified audio embedding from among the ranked audio embeddings, and outputting a relevant video content associated with the relevant audio record to have a computing device play the video content, the relevant video content being obtained from among a plurality of video content.
    Type: Grant
    Filed: July 24, 2024
    Date of Patent: April 21, 2026
    Assignee: Robert Bosch GmbH
    Inventors: Irtsam Ghazi, Ho-Hsiang Wu, Ajit Belsarkar, Luca Bondi, Wei-Cheng Lin, Samarjit Das
  • Publication number: 20260080895
    Abstract: A method for real-time sound event detection on an embedded device includes pretraining a contrastive language-audio pretraining model as an audio foundation model and preparing offline multimodal query prototypes for sound events of interest. The pretrained model and query prototypes are deployed on an embedded device. The device receives an input audio stream and extracts audio embeddings using the pretrained model. Similarity scores are calculated between the extracted audio embeddings and the prepared query prototypes. The presence of a sound event is determined based on the calculated similarity scores, and a real-time sound event detection result is output. The system includes a memory storing the pretrained model and query prototypes, an audio input interface, and a processor configured to perform the extraction, calculation, determination, and output operations. A non-transitory computer-readable medium stores instructions that, when executed, cause a processor to perform the method.
    Type: Application
    Filed: September 16, 2024
    Publication date: March 19, 2026
    Inventors: Wei-Cheng Lin, Ho-Hsiang Wu, Irtsam Ghazi, Luca Bondi, Ajit Belsarkar, Samarjit Das
  • Publication number: 20260045272
    Abstract: A method of generating audio to obtain manipulated audio data includes receiving textual descriptions of audio associated with operation of a device, receiving audio data associated with the operation of the device, generating, based on the textual descriptions, descriptive text inputs of audio features associated with the operation of the device, generating the manipulated audio data based on the descriptive text inputs and the audio data, the manipulated audio data including the one or more audio features indicative of faults associated with the descriptive text inputs, training a machine learning (ML) model to diagnose the faults using the manipulated audio data, the ML model being trained to generate an output indicative of the faults based on audio data obtained during the operation of the device, and, based on convergence during the training, outputting a trained ML model configured to generate the output indicative of the faults.
    Type: Application
    Filed: August 7, 2024
    Publication date: February 12, 2026
    Inventors: Pongtep Angkititrakul, Long Huang, Jonathan Francis, Samarjit Das
  • Publication number: 20260030294
    Abstract: A method includes defining, by a text encoder, a set of text embeddings for a text prompt indicative of a search query for video content having audio data indicative of a sound feature that is defined as a search parameter of the search query, and ranking a plurality of audio embeddings indicative of a plurality of audio signals and provided in a vector database using the set of text embeddings of the search query. The method further includes detecting a relevant audio record associated with an identified audio embedding from among the ranked audio embeddings, and outputting a relevant video content associated with the relevant audio record to have a computing device play the video content, the relevant video content being obtained from among a plurality of video content.
    Type: Application
    Filed: July 24, 2024
    Publication date: January 29, 2026
    Inventors: Irtsam Ghazi, Ho-Hsiang Wu, Ajit Belsarkar, Luca Bondi, Wei-Cheng Lin, Samarjit Das
  • Publication number: 20260016819
    Abstract: Active learning for anomalous event detection and classification in industrial applications. Initial samples from an industrial environment may be received. Unlabeled samples may then be classified using a target query strategy to determine the top-ranked relevant or most important samples. The top-ranked samples may then be manually annotated. A model may then be optimized based on the initial samples from the industrial environment and the annotated top-ranked samples. The model's performance may then be evaluated.
    Type: Application
    Filed: July 10, 2024
    Publication date: January 15, 2026
    Inventors: Shabnam Ghaffarzadegan, Luca Bondi, Abinaya Kumar, Ho-Hsiang Wu, Wei-Cheng Lin, Samarjit Das
  • Publication number: 20260018185
    Abstract: Active machine learning systems for anomalous event detection and classification. Initial samples from an industrial environment may be received and labeled. Initially, a training pool of audio samples may be labeled. These labeled samples may be used to train an audio event classifier to detect and categorize sounds. Environment states may be calculated using outputs from the classifier. A batch of audio samples may then be selected from an unlabeled pool for annotation, guided by a reinforcement learning agent. These selected samples may be annotated and added to the labeled training pool. The classifier may be retrained with this updated pool. Rewards may be calculated for each of the annotated samples based on their annotations. The environment states may be updated using the retrained classifier, and the exploration-exploitation parameter of the reinforcement learning agent may be adjusted. The reinforcement learning agent may be retrained using the updated environment states and rewards.
    Type: Application
    Filed: July 10, 2024
    Publication date: January 15, 2026
    Inventors: Ana Elisa Mendez Mendez, Shabnam Ghaffarzadegan, Samarjit Das
  • Publication number: 20250356121
    Abstract: A method for audio generation includes defining an audio input condition for an obtained input using an encoder, where the obtained input is indicative of one or more audio characteristics. The method further includes defining an audio style condition of a selected audio style profile employing an audio feature extraction neural network, and outputting generated audio data indicative of a desired generated audio using a multi-conditioned latent diffusion model that employs the audio input condition and the audio style condition as adapters to the multi-conditioned latent diffusion model.
    Type: Application
    Filed: May 17, 2024
    Publication date: November 20, 2025
    Inventors: Pongtep Angkititrakul, Long Huang, Samarjit Das
  • Patent number: 12459120
    Abstract: A method for self-diagnosing a data acquisition system for acquiring calibrated images of an area by a controller includes requesting a signal, from a sensor associated with a mobile platform in the area, removing from the signal, background noise associated with the mobile platform, thereby focusing the measurement to a foreground signal, wherein the background noise is removed from the foreground signal via a subspace approximation using singular value decomposition, requesting a previous-in-time signal indicative of a previous-in-time measurement of the parameter, wherein previous-in-time background noise is removed from a previous-in-time foreground signal via subspace approximation using singular value decomposition, and in response to a change detection indicating a difference between a spectrogram of the background noise and a previous-in-time spectrogram of the previous-in-time background noise exceeding a predetermined threshold at a predetermined frequency, outputting a status signal indicative of a change in oper
    Type: Grant
    Filed: December 31, 2020
    Date of Patent: November 4, 2025
    Assignee: Robert Bosch GmbH
    Inventors: Jonathan Jenner Macoskey, Samarjit Das
  • Publication number: 20250335705
    Abstract: Knowledge-based audio-text modeling via automatic multimodal graph construction is performed. An audio dataset is received, the audio dataset including clips of audio data, wherein each of the clips of the audio data is paired with corresponding metadata descriptive of the audio contents of the respective clip of the audio data. Graph nodes of interest are identified from a semantic network, the graph nodes being descriptive of semantics of the knowledge domain of the contents of the audio dataset. A large language model (LLM) is utilized for categorizing the metadata into the graph nodes and for inferring supplemental data for the graph nodes for which there is no metadata, producing an extracted knowledge graph. The extracted knowledge graph is validated utilizing the LLM to perform relation verification of edges between the graph nodes of the extracted knowledge graph, thereby mitigating hallucination effects in the categorizing and inferring of the supplemental data.
    Type: Application
    Filed: April 25, 2024
    Publication date: October 30, 2025
    Inventors: Wei-Cheng Lin, Ho-Hsiang Wu, Shabnam Ghaffarzadegan, Luca Bondi, Abinaya Kumar, Samarjit Das
  • Publication number: 20250322823
    Abstract: A method for training an audio encoder includes receiving first training data comprising first audio data, performing a first training task on an audio encoder using the first training data, receiving second training data comprising first image data and second audio data, and performing a second training task on the audio encoder using the second training data. The method also includes receiving third training data comprising first text data and third audio data, performing a third training task on the audio encoder using the third training data, and performing at least one downstream task using the audio encoder.
    Type: Application
    Filed: April 12, 2024
    Publication date: October 16, 2025
    Inventors: Ho-Hsiang Wu, Gyuhak Kim, Luca Bondi, Samarjit Das
  • Publication number: 20250216848
    Abstract: A computer-implemented system and method relate to a mobile robot. State data is generated using sensor data from at least one sensor. A current confident zone is identified on a unified confident zone map using the state data. The unified confident zone map includes confident zones. Each confident zone is indicative of a given confidence level of given state data of a selected sensor modality for a given location. Assessment data is generated that indicates whether the current confident zone is deemed a failure zone. A mobile robot is controlled based on a control command. The control command relates to a recovery plan of moving the mobile robot out of the current confident zone when the assessment data indicates that the current confident zone is the failure zone. The control command relates to another plan when the assessment data indicates that the current confident zone is not the failure zone.
    Type: Application
    Filed: December 29, 2023
    Publication date: July 3, 2025
    Inventors: Sandeep Reddy Baddam, Jonathan Francis, I, Sirajum Munir, Sushanta Rakshit, Martin Coors, Samarjit Das, Vivek Jain
  • Publication number: 20250216851
    Abstract: A computer-implemented system and method include generating a set of state data using sensor data of a particular sensor modality at a set of locations in a region. Each state data includes a corresponding position estimate of a vehicle. A set of contour ranges is generated. Each contour range is indicative of a respective error range of given state data with respect to corresponding ground truth data for a given location. The region is categorized into at least (i) a first confident level associated with a first error range and (ii) a second confident level associated with a second error range. A first confident zone corresponds to locations associated with the first confident level. A second confident zone corresponds to locations associated with the second confident level. A confident zone map includes at least the first confident zone and the second confident zone.
    Type: Application
    Filed: December 29, 2023
    Publication date: July 3, 2025
    Inventors: Sandeep Reddy Baddam, Jonathan Francis, I, Sirajum Munir, Sushanta Rakshit, Martin Coors, Samarjit Das, Vivek Jain
  • Publication number: 20250218187
    Abstract: Methods and systems for training a neural network to identify an electric vehicle based on audio. Video data is generated from a camera with a field of view including a roadway. Audio data is generated from a microphone, the audio data associated with vehicles traveling across the roadway. The video data is segmented into segments, each having a start time and a finish time that corresponds to a respective vehicle traveling across the roadway in and out of the field of view. Each video segment is labeled with a label indicating the respective vehicle in that segment as either an electric vehicle or a non-electric vehicle. The audio data is segmented into segments, each having a start time and end time associated with a respective one of the video segments. A neural network is trained based on the audio segments and the labels of the associated video segments.
    Type: Application
    Filed: December 28, 2023
    Publication date: July 3, 2025
    Inventors: Ibrahim Eshera, Charles Shelton, Samarjit Das
  • Publication number: 20250217638
    Abstract: Methods and systems for generating training data for training a contrastive language-audio machine-learning model. A plurality of audio segments are retrieved from a speech emotion recognition (SER) database along with metadata associated with the audio segments. The metadata of each audio segment includes an emotion class. Words or terms associated with emotions are retrieved from a lexicon. A large language model (LLM) is executed on (i) the classes of emotion associated with the audio segments and (ii) the words or terms from the lexicon. This generates a plurality of text captions associated with emotion, which are stored in a caption pool. For each audio segment retrieved from the SER database, that audio segment is paired with one or more of the text captions from the caption pool that were generated based on the emotion class associated with that audio segment. This yields audio-text pairs for training a contrastive learning model.
    Type: Application
    Filed: December 29, 2023
    Publication date: July 3, 2025
    Inventors: Wei-Cheng Lin, Ho-Hsiang Wu, Shabnam Ghaffarzadegan, Luca Bondi, Abinaya Kumar, Samarjit Das
  • Publication number: 20250216852
    Abstract: A computer-implemented system and method relate to operating a mobile robot with respect to a reference location. First state data is generated using sensor data obtained from a first set of sensors of a first sensor modality. Second state data is generated using second sensor data obtained from a second set of sensors. The second set of sensors provides wireless sensing. The second state data is generated from wireless features of the second sensor data. A first distribution of the first state data is generated. A second distribution of the second state data is generated. A posterior distribution is computed by fusing the first distribution and the second distribution. Optimal state data and associated uncertainty data are generated using the posterior distribution. The optimal state data includes a position estimate of the mobile robot. The mobile robot is controlled using at least the optimal state data.
    Type: Application
    Filed: December 29, 2023
    Publication date: July 3, 2025
    Inventors: Sandeep Reddy Baddam, Sirajum Munir, Jonathan Francis, Sushanta Rakshit, Martin Coors, Samarjit Das, Vivek Jain
  • Publication number: 20250207966
    Abstract: Methods and systems for determining a quantity of fuel dispensed at a fueling station based on audio, as well as training such a system. Audio data is generated from one or more microphones, wherein the audio data is associated with stages of a refueling operation at a fueling station. A machine learning model is executed on the audio data to segment the audio data into segments, with each segment associated with a respective one of the stages of the refueling operation. The model also determines that one of the segments is associated with a fuel flow stage indicating fuel is flowing from a fuel storage. This allows the system to determine a quantity of fuel being dispensed, based on the time of the one segment.
    Type: Application
    Filed: December 20, 2023
    Publication date: June 26, 2025
    Inventors: Ibrahim Eshera, Charles Shelton, Samarjit Das
  • Publication number: 20250189970
    Abstract: A mobile robot includes a microphone array with a set of microphones. The microphone array is at least partially disposed on the mobile robot. The mobile robot receives audio signals from the microphone array. Audio feature data of acoustic activity is extracted from the audio signals. Direction of arrival (DOA) data of the acoustic activity is generated based on the audio signals. A machine learning model is configured to generate audio event data using the audio feature data. The audio event data identifies at least one sound source of the audio feature data. A knowledge graph is queried using the audio event data to obtain entity data. The entity data has a predetermined relation with the audio event data. Semantic audio scene data is generated using the audio event data, the DOA data, and the entity data. The mobile robot performs an action based on the semantic audio scene data.
    Type: Application
    Filed: December 7, 2023
    Publication date: June 12, 2025
    Inventors: Pongtep Angkititrakul, Jonathan Francis, Luca Bondi, Samarjit Das
  • Patent number: 12322411
    Abstract: Systems and methods for converting a primary one-dimensional signal into a secondary one-dimensional signal of another modality. The primary signal is spliced into a plurality of consecutive frames. A first linear transformation transforms the frames into corresponding vectors. Positional encodings are provided on the vectors to encode relative positional information associated with each sample within each frame. A multi-head self-attention machine-learning model compares relative importance of the samples within each vector to each other in that vector to yield high-level representation vectors. A second linear transformation transforms the high-level representation vectors into corresponding secondary signal frames. The secondary signal frames are concatenated into a reconstructed one-dimensional secondary signal having a different modality than the primary signal.
    Type: Grant
    Filed: September 29, 2023
    Date of Patent: June 3, 2025
    Assignee: Robert Bosch GmbH
    Inventors: Long Huang, Pongtep Angkititrakul, Samarjit Das
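
Illustrative note on publication 20260080895: the abstract describes computing similarity scores between extracted audio embeddings and prepared query prototypes, then deciding whether a sound event is present based on those scores. The following is a minimal sketch of that scoring step only, not the method claimed in the application. It assumes cosine similarity and a fixed threshold, neither of which the abstract specifies, and the names `cosine_similarity`, `detect_events`, the prototype labels, and the toy three-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_events(audio_embedding, prototypes, threshold=0.5):
    """Score one audio embedding against every query prototype.

    Assumption: an event counts as present when its cosine similarity
    meets or exceeds `threshold`. The abstract does not name a metric
    or a decision rule, so both are placeholders.
    """
    scores = {
        label: cosine_similarity(audio_embedding, proto)
        for label, proto in prototypes.items()
    }
    detected = sorted(label for label, s in scores.items() if s >= threshold)
    return scores, detected


# Hypothetical prototypes; a real system would obtain these from the
# pretrained model's encoders rather than hand-written vectors.
prototypes = {
    "glass_break": [1.0, 0.0, 0.0],
    "siren": [0.0, 1.0, 0.0],
}
scores, detected = detect_events([0.9, 0.1, 0.0], prototypes)
print(detected)
```

In this toy example the input embedding points almost entirely along the "glass_break" prototype, so only that label clears the assumed threshold. In a streaming setting the same scoring would be repeated for each incoming audio window.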