Patents by Inventor Yilin Shen
Yilin Shen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12211486
Abstract: A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.
Type: Grant
Filed: January 10, 2022
Date of Patent: January 28, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Avik Ray, Yilin Shen, Hongxia Jin
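The two attention constraints in this abstract can be expressed as regularization terms. The sketch below is a minimal illustration, assuming cosine similarity between attention rows for constraint (i) and the mean self-attention weight for constraint (ii); the patent does not specify the exact penalty forms.

```python
import numpy as np

def attention_regularizer(attn, slot_labels):
    """Illustrative regularizer for the two constraints:
    (i) tokens with dissimilar slot labels get different attention
        distributions, and
    (ii) a token's attention should not focus primarily on itself.

    attn: (n, n) row-stochastic matrix; row i is token i's attention.
    slot_labels: (n,) integer label per token.
    """
    # (i) Penalize cosine similarity between attention rows of tokens
    # that carry different slot labels.
    normed = attn / np.linalg.norm(attn, axis=1, keepdims=True)
    sim = normed @ normed.T
    diff_label = np.not_equal.outer(slot_labels, slot_labels)
    dissim_penalty = sim[diff_label].mean() if diff_label.any() else 0.0

    # (ii) Penalize attention mass each token places on itself.
    self_penalty = np.diag(attn).mean()
    return dissim_penalty + self_penalty
```

Minimizing this term pushes attention off the diagonal while separating the attention patterns of differently labeled tokens.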
-
Patent number: 12210835
Abstract: In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.
Type: Grant
Filed: September 16, 2022
Date of Patent: January 28, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Peixi Xiong, Yilin Shen, Hongxia Jin
-
Publication number: 20250029005
Abstract: A method includes accessing a plurality of weight matrices of a machine learning model. The method also includes, for each weight matrix, decomposing the weight matrix into a U matrix, an S matrix, and a V matrix using singular value decomposition. The S matrix is a diagonal matrix, and a singular group corresponds to each element in the S matrix. The method further includes, for each weight matrix, determining an importance score of each singular group. The importance score of the singular group represents a change in loss if the singular group is removed from the machine learning model. The method also includes, for each weight matrix, ranking the singular groups across the plurality of weight matrices based on the importance scores. In addition, the method includes, for each weight matrix, identifying one or more of the singular groups to prune based on the ranking of the singular groups.
Type: Application
Filed: May 20, 2024
Publication date: January 23, 2025
Inventors: Ting Hua, Xiao Li, Shangqian Gao, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
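The pipeline this abstract describes, decompose each weight matrix via SVD, score each singular group, rank groups globally, and prune the lowest ranked, can be sketched as below. The publication scores each group by the change in loss when it is removed; here the squared singular value serves as a simple magnitude-based stand-in for that loss-aware score, which is an assumption for illustration.

```python
import numpy as np

def prune_singular_groups(weights, keep_ratio=0.5):
    """Globally rank and prune singular groups across weight matrices.

    weights: list of 2-D arrays (one per layer).
    Returns the reconstructed (pruned) weight matrices.
    """
    groups, svds = [], []
    for m, W in enumerate(weights):
        # W = U @ diag(S) @ Vt; each (u_i, s_i, v_i) is a singular group.
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        svds.append((U, S, Vt))
        for i, s in enumerate(S):
            # Proxy importance score (assumption): squared singular value.
            groups.append((m, i, s ** 2))

    # Rank all singular groups across ALL matrices, then keep the top ones.
    groups.sort(key=lambda g: g[2], reverse=True)
    n_keep = int(len(groups) * keep_ratio)
    keep = {(m, i) for m, i, _ in groups[:n_keep]}

    pruned = []
    for m, (U, S, Vt) in enumerate(svds):
        mask = np.array([(m, i) in keep for i in range(len(S))])
        # Reconstruct using only the retained singular groups.
        pruned.append((U[:, mask] * S[mask]) @ Vt[mask])
    return pruned
```

With `keep_ratio=1.0` every group survives and each matrix is reconstructed exactly; lower ratios trade reconstruction fidelity for fewer parameters.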
-
Publication number: 20250021826
Abstract: In one embodiment, a method includes accessing at least a portion of a training dataset for a trained neural network that includes multiple layers, where each layer includes a number of parameters, and where the training dataset includes multiple training samples that each include an input and a ground-truth output used to train the trained neural network. The method further includes training a hypernetwork to generate a layer-specific compression mask for each of one or more of the multiple layers of the trained neural network. The method further includes generating, by the trained hypernetwork, a final layer-specific compression mask for the trained neural network and compressing the trained neural network by reducing, for each of the one or more layers of the neural network, the number of parameters of that layer according to the final layer-specific compression mask.
Type: Application
Filed: April 8, 2024
Publication date: January 16, 2025
Inventors: Shangqian Gao, Ting Hua, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
-
Patent number: 12183062
Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
Type: Grant
Filed: January 31, 2022
Date of Patent: December 31, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin
-
Publication number: 20240414448
Abstract: Provided is a U-shaped network for image restoration. The U-shaped network is lightweight based on a transformer block and is suitable to be deployed on-device, such as in a smartphone. The U-shaped network uses the transformer block to implement encoder, decoder and bottleneck functions. Decoders are connected to respective encoders using skip connections based on element-wise addition, which avoids the dimension expansion of concatenation. The transformer block uses a configuration of scaling and pool mixing to process input image data without the need for self-attention computations, which permits reductions in memory, latency, and computational demand while maintaining good output image quality.
Type: Application
Filed: November 21, 2023
Publication date: December 12, 2024
Applicant: Samsung Electronics Co., Ltd.
Inventors: Karim Ahmed, Yi Wei, Vasili Ramanishka, Yilin Shen, Hongxia Jin
-
Publication number: 20240394592
Abstract: A method includes accessing a training dataset having multiple samples, where each sample includes a data point for each of multiple modalities. The method also includes generating, using a first encoder associated with a first modality of the multiple modalities, first modality embeddings for data points of the first modality in the training dataset. The method further includes, for each first modality embedding, determining a similarity metric to other first modality embeddings. The method also includes generating, using a second encoder associated with a second modality of the multiple modalities, second modality embeddings for data points of the second modality in the training dataset. In addition, the method includes training the second encoder based on a contrastive loss function to align the first modality embeddings and the second modality embeddings from different samples of the training dataset, where the contrastive loss function is weighted using the similarity metrics.
Type: Application
Filed: February 6, 2024
Publication date: November 28, 2024
Inventors: Rakshith Sharma Srinivasa, Jaejin Cho, Chouchang Yang, Yashas Malur Saidutta, Ching-Hua Lee, Yilin Shen, Hongxia Jin
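A similarity-weighted contrastive loss of the kind described above can be sketched as follows. The abstract only states that the loss is weighted using similarity metrics computed among the first-modality embeddings; the specific weighting scheme here (down-weighting negatives whose first-modality anchors are already similar, via `1 - cosine similarity`) is an illustrative assumption.

```python
import numpy as np

def weighted_contrastive_loss(emb_a, emb_b, tau=0.1):
    """Contrastive alignment of modality-B embeddings to modality-A
    embeddings, with negatives weighted by modality-A similarity.

    emb_a, emb_b: (n, d) arrays; row i of each comes from the same sample.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a, b = normalize(emb_a), normalize(emb_b)
    sim_ab = a @ b.T / tau            # cross-modal logits
    sim_aa = a @ a.T                  # similarity metrics within modality A
    weights = 1.0 - sim_aa            # similar anchors -> weaker negatives
    np.fill_diagonal(weights, 1.0)    # positives keep full weight

    logits = np.exp(sim_ab) * weights
    loss = -np.log(np.diag(logits) / logits.sum(axis=1))
    return loss.mean()
```

Well-aligned embeddings (each `emb_b` row matching its `emb_a` row) should yield a much lower loss than mismatched ones.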
-
Publication number: 20240386575
Abstract: Provided are a method and apparatus for obtaining a foreground image from an input image containing a foreground object in a scene. Embodiments use multi-scale convolutional attention values, one or more hamburger heads and one or more multilayer perceptrons to obtain a segmentation map of the input image. In some embodiments, progressive segmentation is applied to obtain the segmentation map.
Type: Application
Filed: December 1, 2023
Publication date: November 21, 2024
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jing Zhu, Karim Ahmed, Wenbo Li, Yilin Shen, Hongxia Jin
-
Publication number: 20240377829
Abstract: A method includes determining a specified object to locate within a surrounding environment. The method also includes causing a robot to capture an image and a depth map of the surrounding environment. The method further includes predicting, using a scene understanding model, one or more rooms and one or more objects captured in the image. The method also includes updating a second map of the surrounding environment based on the predicted rooms, the predicted objects, the depth map, and a location of the robot. The method further includes determining a likelihood of the specified object being in a candidate room and a likelihood of the specified object being near a candidate object using a pre-trained large language model. The method also includes causing the robot to move to a next location for the robot to search for the specified object, based on the likelihoods and the second map.
Type: Application
Filed: November 3, 2023
Publication date: November 14, 2024
Inventors: Yilin Shen, Kaiwen Zhou, Hongxia Jin
-
Patent number: 12127726
Abstract: A method includes obtaining, using at least one processor of an electronic device, an image-query understanding model. The method also includes obtaining, using the at least one processor, an image and a user query associated with the image, where the image includes a target image area and the user query includes a target phrase. The method further includes retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model.
Type: Grant
Filed: April 15, 2021
Date of Patent: October 29, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Yu Wang, Yilin Shen, Hongxia Jin
-
Publication number: 20240339123
Abstract: A method includes receiving an audio input and generating a noisy time-frequency representation based on the audio input. The method also includes providing the noisy time-frequency representation to a noise management model trained to predict a denoising mask and a signal presence probability (SPP) map indicating a likelihood of a presence of speech. The method further includes determining an enhanced spectrogram using the denoising mask and the noisy time-frequency representation. The method also includes providing the enhanced spectrogram and the SPP map as inputs to a keyword classification model trained to determine a likelihood of a keyword being present in the audio input. In addition, the method includes, responsive to determining that a keyword is in the audio input, transmitting the audio input to a downstream application associated with the keyword.
Type: Application
Filed: September 20, 2023
Publication date: October 10, 2024
Inventors: Chou-Chang Yang, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Ching-Hua Lee, Yilin Shen, Hongxia Jin
-
Publication number: 20240331715
Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
Type: Application
Filed: August 29, 2023
Publication date: October 3, 2024
Inventors: Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
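The step of turning a predicted mask into beamforming filter weights can be sketched with a standard mask-driven design: build a mask-weighted spatial covariance per frequency and take its principal eigenvector as the filter. This is one common choice (a max-SNR-style beamformer), assumed here for illustration; the publication does not commit to a specific filter design.

```python
import numpy as np

def mask_beamform(noisy_stft, speech_mask):
    """Mask-driven beamforming sketch.

    noisy_stft: (mics, freqs, frames) complex STFT of the mic array.
    speech_mask: (freqs, frames) mask in [0, 1] from the estimation model.
    Returns a single-channel (freqs, frames) enhanced STFT.
    """
    M, F, T = noisy_stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Y = noisy_stft[:, f, :]                         # (M, T) snapshots
        m = speech_mask[f]                              # (T,) mask weights
        # Mask-weighted spatial covariance of the speech component.
        R = (Y * m) @ Y.conj().T / max(m.sum(), 1e-8)
        # Principal eigenvector = beamforming filter for this frequency.
        _, vecs = np.linalg.eigh(R)                     # ascending order
        w = vecs[:, -1]
        out[f] = w.conj() @ Y                           # apply the filter
    return out
```

With identical signals on all microphones and an all-ones mask, the filter reduces to a phase-aligned sum, so the output magnitude grows with the square root of the microphone count.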
-
Publication number: 20240311693
Abstract: A method includes obtaining input data associated with a new concept to be learned by a trained machine learning model. The method also includes identifying initial weights of the trained machine learning model and one or more previous weight deltas associated with the trained machine learning model. The method further includes identifying one or more additional weight deltas based on the input data and guided by the initial weights and the one or more previous weight deltas. In addition, the method includes integrating the one or more additional weight deltas into the trained machine learning model. The one or more additional weight deltas are integrated into the trained machine learning model by identifying updated weights for the trained machine learning model based on the initial weights, the one or more previous weight deltas, and the one or more additional weight deltas.
Type: Application
Filed: February 29, 2024
Publication date: September 19, 2024
Inventors: James S. Smith, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Lingyu Zhang, Ting Hua
-
Publication number: 20240203143
Abstract: A method includes obtaining an image, a set of attribute labels, and a set of object labels and performing prompt tuning of a pre-trained vision-language model having first and second textual encoders and a vision encoder. The model is trained during prompt tuning to select one attribute label and one object label that match content in the image. Performing the prompt tuning includes, for each attribute label-object label pair, generating object textual features associated with the object label using the first textual encoder, generating attribute textual features associated with the attribute label using the second textual encoder, and generating image features associated with the image using the vision encoder. Intermediate outputs from initial layers of the textual encoders and the vision encoder are combined to generate layer-specific learnable prompt tokens that are appended to inputs of specified layers in the first and second textual encoders and the vision encoder.
Type: Application
Filed: August 23, 2023
Publication date: June 20, 2024
Inventors: Lingyu Zhang, Ting Hua, Yilin Shen, Hongxia Jin
-
Publication number: 20240185850
Abstract: A method includes extracting, using a keyword detection model, audio features from audio data. The method also includes processing the audio features by a first layer of the keyword detection model configured to predict a first likelihood that the audio data includes speech. The method also includes processing the audio features by a second layer of the keyword detection model configured to predict a second likelihood that the audio data includes keyword-like speech. The method also includes processing the audio features by a third layer of the keyword detection model configured to predict a third likelihood, for each of a plurality of possible keywords, that the audio data includes the keyword. The method also includes identifying a keyword included in the audio data. The method also includes generating instructions to perform an action based at least in part on the identified keyword.
Type: Application
Filed: July 14, 2023
Publication date: June 6, 2024
Inventors: Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
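The three-stage structure above (speech, then keyword-like speech, then per-keyword likelihoods) naturally supports an early-exit cascade. The sketch below assumes an early-exit policy with fixed thresholds, which the abstract does not specify; the three stage functions stand in for the model's layers.

```python
def cascade_keyword_detect(feats, stage1, stage2, stage3,
                           thresholds=(0.5, 0.5)):
    """Three-stage keyword detection cascade (illustrative).

    stage1(feats) -> P(speech)
    stage2(feats) -> P(keyword-like speech)
    stage3(feats) -> dict mapping each candidate keyword to a likelihood
    Returns the best keyword, or None if an early stage rejects the audio.
    """
    if stage1(feats) < thresholds[0]:
        return None                       # no speech: exit cheaply
    if stage2(feats) < thresholds[1]:
        return None                       # speech, but not keyword-like
    keyword_probs = stage3(feats)
    return max(keyword_probs, key=keyword_probs.get)
```

Because most audio frames contain no keyword, the cheap early stages reject them before the per-keyword classifier ever runs.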
-
Publication number: 20240119077
Abstract: A method of performing multimodal tasks using a multimodal model that includes a text encoder and a vision encoder may include obtaining a text feature from a query via the text encoder; obtaining an image feature from one or more input images via the vision encoder; and outputting a response to the query based on similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.
Type: Application
Filed: September 14, 2023
Publication date: April 11, 2024
Applicant: Samsung Electronics Co., Ltd.
Inventors: Shangqian Gao, Burak Uzkent, Yilin Shen, Hongxia Jin
-
Publication number: 20240104309
Abstract: A method includes receiving an input for a large language model (LLM) from a user. The method also includes generating one or more token embeddings based on the input. The method further includes generating one or more prompt embeddings based on the input using a contextual prompt generator (CPG), the one or more prompt embeddings representing new or updated information that is not contained in existing knowledge of the LLM. The method also includes providing the one or more token embeddings and the one or more prompt embeddings to the LLM. In addition, the method includes outputting a prediction based on the one or more token embeddings and the one or more prompt embeddings using the LLM, wherein the prediction reflects the new or updated information represented by the one or more prompt embeddings.
Type: Application
Filed: September 12, 2023
Publication date: March 28, 2024
Inventors: Yen-Chang Hsu, Harshavardhan Kamarthi, Yilin Shen, Hongxia Jin
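The step of providing both embedding sets to the LLM amounts to building one combined input sequence. The sketch below assumes the prompt embeddings are prepended to the token embeddings, and uses a plain callable as a stand-in for the learned contextual prompt generator (CPG); both choices are illustrative assumptions.

```python
import numpy as np

def build_llm_input(token_emb, context, cpg):
    """Combine CPG prompt embeddings with token embeddings for an LLM.

    token_emb: (t, d) token embeddings from the user input.
    cpg: callable mapping the context to a (p, d) array of prompt
         embeddings carrying new or updated information.
    """
    prompt_emb = cpg(context)
    assert prompt_emb.shape[1] == token_emb.shape[1], "dims must match"
    # Prepend prompts so the LLM can attend to the injected knowledge.
    return np.concatenate([prompt_emb, token_emb], axis=0)
```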
-
Publication number: 20240080423
Abstract: A method includes obtaining raw image data, where the raw image data includes data values each having most significant bits and least significant bits. The method also includes providing the raw image data to a trained machine learning model and generating processed image data using the trained machine learning model. The method further includes presenting an image based on the processed image data. The trained machine learning model is trained to modulate a feature map associated with the most significant bits of the data values of the raw image data based on the least significant bits of the data values of the raw image data in order to generate a fusion of the most significant bits and the least significant bits of the data values of the raw image data.
Type: Application
Filed: November 18, 2022
Publication date: March 7, 2024
Inventors: Wenbo Li, Zhipeng Mo, Yi Wei, Burak Uzkent, Qian Lou, Yilin Shen, Hongxia Jin
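The MSB/LSB split that feeds the model's two streams is a simple bit operation. The sketch below shows only that split (the learned modulation itself is a trained model); the 10-bit raw depth and 8/2 split are assumptions for illustration, since the publication does not fix the bit widths.

```python
import numpy as np

def split_msb_lsb(raw, bits=10, msb_bits=8):
    """Split raw sensor values into MSB and LSB streams.

    raw: integer array of raw pixel values with `bits` significant bits.
    Returns (msb, lsb) such that (msb << (bits - msb_bits)) + lsb == raw.
    """
    lsb_bits = bits - msb_bits
    msb = raw >> lsb_bits                 # most significant bits
    lsb = raw & ((1 << lsb_bits) - 1)     # least significant bits
    return msb, lsb
```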
-
Publication number: 20240046946
Abstract: A method includes obtaining, using at least one processing device, noisy speech signals and extracting, using the at least one processing device, acoustic features from the noisy speech signals. The method also includes receiving, using the at least one processing device, a predicted speech mask from a speech mask prediction model based on a first acoustic feature subset and receiving, using the at least one processing device, a predicted noise mask from a noise mask prediction model based on a second acoustic feature subset. The method further includes providing, using the at least one processing device, predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model. In addition, the method includes generating, using the at least one processing device, a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.
Type: Application
Filed: November 22, 2022
Publication date: February 8, 2024
Inventors: Chou-Chang Yang, Ching-Hua Lee, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Yilin Shen, Hongxia Jin
-
Patent number: 11875231
Abstract: An electronic device for complex task machine learning includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive an unknown command for performing a task and generate a prompt regarding the unknown command. The at least one processor is also configured to receive one or more instructions in response to the prompt, where each of the one or more instructions provides information on performing at least a portion of the task. The at least one processor is further configured to determine at least one action for each one of the one or more instructions. In addition, the at least one processor is configured to create a complex action for performing the task based on the at least one action for each one of the one or more instructions.
Type: Grant
Filed: October 23, 2019
Date of Patent: January 16, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Avik Ray, Yilin Shen, Hongxia Jin