Patents by Inventor Yilin Shen

Yilin Shen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12260874
    Abstract: A method includes obtaining, using at least one processing device, noisy speech signals and extracting, using the at least one processing device, acoustic features from the noisy speech signals. The method also includes receiving, using the at least one processing device, a predicted speech mask from a speech mask prediction model based on a first acoustic feature subset and receiving, using the at least one processing device, a predicted noise mask from a noise mask prediction model based on a second acoustic feature subset. The method further includes providing, using the at least one processing device, predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model. In addition, the method includes generating, using the at least one processing device, a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.
    Type: Grant
    Filed: November 22, 2022
    Date of Patent: March 25, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Chou-Chang Yang, Ching-Hua Lee, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Yilin Shen, Hongxia Jin
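The staged masking pipeline this abstract describes can be illustrated with a minimal NumPy sketch. The three model functions below are hypothetical stand-ins (in the patent each is a trained prediction model), and applying each mask multiplicatively to a magnitude spectrogram is one common convention, not necessarily the patented one:

```python
import numpy as np

# Hypothetical stand-ins for the three trained models in the abstract;
# in practice each would be a trained neural network.
def speech_mask_model(features):
    # Predict a speech mask in (0, 1) from one acoustic feature subset.
    return 1.0 / (1.0 + np.exp(-features))

def noise_mask_model(features):
    # Predict a noise mask in (0, 1) from another acoustic feature subset.
    return 1.0 - 1.0 / (1.0 + np.exp(-features))

def filtering_mask_model(speech_feats, noise_feats):
    # Combine predicted speech and noise features into a final filtering mask.
    return speech_feats / (speech_feats + noise_feats + 1e-8)

def enhance(noisy_spectrogram, subset_a, subset_b):
    speech_mask = speech_mask_model(subset_a)
    noise_mask = noise_mask_model(subset_b)
    # Apply each mask to the noisy input to get predicted features.
    speech_feats = speech_mask * noisy_spectrogram
    noise_feats = noise_mask * noisy_spectrogram
    # The filtering mask is applied to the noisy signal to yield clean speech.
    filtering_mask = filtering_mask_model(speech_feats, noise_feats)
    return filtering_mask * noisy_spectrogram

noisy = np.abs(np.random.default_rng(0).normal(size=(4, 8)))
clean = enhance(noisy, noisy, noisy)
```

The key structural point is that the filtering-mask model consumes the *outputs* of the speech- and noise-mask stages rather than the raw features, mirroring the three-stage flow in the abstract.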
  • Publication number: 20250090033
    Abstract: A method for performing cuffless blood pressure (BP) measurement, including: obtaining a first physiological signal and a second physiological signal associated with a user; providing the first physiological signal as an input to a first transformer model; providing the second physiological signal as an input to a second transformer model; providing an output of the first transformer model and an output of the second transformer model as inputs to a third transformer model; providing an output of the third transformer model to at least one BP estimation model; and generating an estimated BP value corresponding to the first physiological signal and the second physiological signal based on an output of the at least one BP estimation model.
    Type: Application
    Filed: September 12, 2024
    Publication date: March 20, 2025
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Suhas BETTAPALLI NAGARAJ, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Jaejin Cho, Ching-Hua Lee, Chouchang Yang, Yilin Shen, Hongxia Jin
  • Publication number: 20250094709
    Abstract: A method for performing multi-token prediction by an apparatus includes receiving, from an artificial intelligence (AI) assistance device, a request for an output token sequence that is subsequent to an input token sequence indicated by the request, predicting, by a trained machine learning model, a plurality of candidate output tokens, estimating joint probability distributions of one or more combinations of the plurality of candidate output tokens, calculating joint probabilities of the one or more combinations by masking the joint probability distributions with a co-occurrence weighted mask, determining, based on the joint probabilities, whether to reduce the number of candidate output tokens included in each combination of the one or more combinations, identifying, based on the joint probabilities, a combination of the one or more combinations as the output token sequence, and outputting, to the AI assistance device, a response to the request, the response comprising the output token sequence.
    Type: Application
    Filed: September 6, 2024
    Publication date: March 20, 2025
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Shikhar TULI, Chi-Heng Lin, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
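The joint-probability masking step at the center of this abstract can be sketched for the simplest case of a two-position combination. Everything here is illustrative: the candidate vocabulary, the co-occurrence mask, and the per-position probabilities are assumed inputs, and real multi-token decoding would operate over many positions and a full vocabulary:

```python
import numpy as np

def rank_token_combinations(probs_a, probs_b, cooccurrence, top_k=3):
    # probs_a, probs_b: probabilities over a small candidate vocabulary for
    # two consecutive output positions.
    # cooccurrence: (vocab, vocab) co-occurrence weighted mask.
    # Joint probability of each candidate pair, masked by co-occurrence.
    joint = np.outer(probs_a, probs_b) * cooccurrence
    # Rank all pairs by masked joint probability and keep the top_k.
    flat = np.argsort(joint, axis=None)[::-1][:top_k]
    pairs = [np.unravel_index(i, joint.shape) for i in flat]
    return [(int(i), int(j), float(joint[i, j])) for i, j in pairs]

# Toy example: two candidate tokens per position, identity co-occurrence mask.
best = rank_token_combinations(np.array([0.6, 0.4]),
                               np.array([0.5, 0.5]),
                               np.eye(2))
```

The masked joint probabilities are what the method compares when deciding whether to shrink a combination or emit it as the output token sequence.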
  • Publication number: 20250094820
    Abstract: A method for enabling an improved device control capability of a small language model (SLM) transferrable to a hub device configured to be operable by a user in an environment is disclosed. The method includes performing fine-tuning of the SLM based on a data set including base plans and contrastive plans; generating computer codes corresponding to the fine-tuned SLM; and transferring the generated computer codes to the hub device to be connected with a group of electronic devices in the environment.
    Type: Application
    Filed: September 4, 2024
    Publication date: March 20, 2025
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Sudipta PAUL, Lingyu ZHANG, Yilin SHEN, Hongxia JIN
  • Publication number: 20250095666
    Abstract: A method for generating a customized speech enhancement (SE) model includes: obtaining noisy-clean speech data from a source domain; obtaining noisy speech data from a target domain; obtaining raw speech data; training the customized SE model, using the noisy-clean speech data, the noisy speech data, and the raw speech data, based on at least one of self-supervised representation-based adaptation (SSRA), ensemble mapping, or self-supervised adaptation loss; generating the customized SE model by denoising the noisy speech data using the trained customized SE model; and providing the customized SE model to a user device to use the denoised noisy speech data.
    Type: Application
    Filed: September 13, 2024
    Publication date: March 20, 2025
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ching-Hua LEE, Chouchang YANG, Rakshith Sharma SRINIVASA, Yashas Malur SAIDUTTA, Jaejin CHO, Yilin SHEN, Hongxia JIN
  • Publication number: 20250095638
    Abstract: A method includes: receiving one or more training text sentences; generating one or more training vectors based on inputting the one or more training text sentences into a text encoder, the one or more training vectors corresponding to one or more operations that an electronic device is configured to perform; generating one or more speech vectors based on one or more speech utterances input into a speech encoder; generating a similarity matrix that compares each of the one or more training vectors with each of the one or more speech vectors; and updating at least one of the text encoder and the speech encoder based on the similarity matrix.
    Type: Application
    Filed: September 20, 2024
    Publication date: March 20, 2025
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jaejin CHO, Rakshith Sharma Srinivasa, Chou-chang Yang, Yashas Malur Saidutta, Ching-Hua Lee, Yilin Shen, Hongxia Jin
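The similarity-matrix step in the abstract above can be sketched as follows. The cosine-similarity matrix and the CLIP-style symmetric cross-entropy shown here are a common way to train paired encoders; the patent's exact loss and update rule are not specified in the abstract, so treat this as an assumption-laden illustration:

```python
import numpy as np

def similarity_matrix(text_vecs, speech_vecs, temperature=0.07):
    # Row i, column j: cosine similarity between text vector i and speech
    # vector j, scaled by a temperature.
    T = text_vecs / np.linalg.norm(text_vecs, axis=1, keepdims=True)
    S = speech_vecs / np.linalg.norm(speech_vecs, axis=1, keepdims=True)
    return (T @ S.T) / temperature

def symmetric_alignment_loss(sim):
    # Matching text/speech pairs sit on the diagonal; a symmetric
    # cross-entropy pulls them together and pushes mismatches apart.
    def xent(logits):
        logits = logits - logits.max(axis=1, keepdims=True)
        log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_softmax.diagonal().mean()
    return 0.5 * (xent(sim) + xent(sim.T))

# Demo with perfectly aligned encoders (identical vectors on both sides).
vecs = np.random.default_rng(3).normal(size=(3, 4))
sim = similarity_matrix(vecs, vecs)
loss = symmetric_alignment_loss(sim)
```

Gradients of such a loss with respect to either encoder's parameters give the "updating at least one of the text encoder and the speech encoder" step.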
  • Patent number: 12211486
    Abstract: A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.
    Type: Grant
    Filed: January 10, 2022
    Date of Patent: January 28, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Avik Ray, Yilin Shen, Hongxia Jin
  • Patent number: 12210835
    Abstract: In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.
    Type: Grant
    Filed: September 16, 2022
    Date of Patent: January 28, 2025
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Peixi Xiong, Yilin Shen, Hongxia Jin
  • Publication number: 20250029005
    Abstract: A method includes accessing a plurality of weight matrices of a machine learning model. The method also includes, for each weight matrix, decomposing the weight matrix into a U matrix, an S matrix, and a V matrix using singular value decomposition. The S matrix is a diagonal matrix, and a singular group corresponds to each element in the S matrix. The method further includes, for each weight matrix, determining an importance score of each singular group. The importance score of the singular group represents a change in loss if the singular group is removed from the machine learning model. The method also includes, for each weight matrix, ranking the singular groups across the plurality of weight matrices based on the importance scores. In addition, the method includes, for each weight matrix, identifying one or more of the singular groups to prune based on the ranking of the singular groups.
    Type: Application
    Filed: May 20, 2024
    Publication date: January 23, 2025
    Inventors: Ting Hua, Xiao Li, Shangqian Gao, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
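The decompose-score-prune flow in this abstract can be sketched with plain NumPy. One simplification is flagged in the comments: the patent scores each singular group by the change in model loss if it is removed, which requires the training objective; the sketch substitutes the squared singular value (Frobenius-norm energy) as a proxy importance score:

```python
import numpy as np

def svd_importance_scores(W):
    # Decompose W into U, S, V^T; each singular value defines a "singular
    # group" (a column of U, a singular value, and a row of V^T).
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    # Proxy importance: squared singular value, the Frobenius-norm energy
    # lost if the group is removed. (The patent instead measures the change
    # in model loss, which needs the training objective.)
    return U, S, Vt, S ** 2

def prune_singular_groups(W, keep_ratio=0.5):
    U, S, Vt, scores = svd_importance_scores(W)
    k = max(1, int(len(S) * keep_ratio))
    keep = np.argsort(scores)[::-1][:k]        # highest-importance groups
    # Reconstruct the weight matrix from the surviving singular groups only.
    return (U[:, keep] * S[keep]) @ Vt[keep, :]

W = np.random.default_rng(1).normal(size=(6, 4))
W_pruned = prune_singular_groups(W, keep_ratio=0.5)
```

Pruning singular groups rather than individual weights keeps the factored structure intact, so the compressed matrix can be stored as two thin factors.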
  • Publication number: 20250021826
    Abstract: In one embodiment, a method includes accessing at least a portion of a training dataset for a trained neural network that includes multiple layers, where each layer includes a number of parameters, and where the training dataset includes multiple training samples that each include an input and a ground-truth output used to train the trained neural network. The method further includes training a hypernetwork to generate a layer-specific compression mask for each of one or more of the multiple layers of the trained neural network. The method further includes generating, by the trained hypernetwork, a final layer-specific compression mask for the trained neural network and compressing the trained neural network by reducing, for each of the one or more layers of the neural network, the number of parameters of that layer according to the final layer-specific compression mask.
    Type: Application
    Filed: April 8, 2024
    Publication date: January 16, 2025
    Inventors: Shangqian Gao, Ting Hua, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
  • Patent number: 12183062
    Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
    Type: Grant
    Filed: January 31, 2022
    Date of Patent: December 31, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin
  • Publication number: 20240414448
    Abstract: Provided is a U-shaped network for image restoration. The U-shaped network is lightweight based on a transformer block and is suitable to be deployed on-device, such as in a smartphone. The U-shaped network uses the transformer block to implement encoder, decoder and bottleneck functions. Decoders are connected to respective encoders using skip connections based on element-wise addition, which avoids the dimension expansion of concatenation. The transformer block uses a configuration of scaling and pool mixing to process input image data without the need for self-attention computations, which permits reductions in memory, latency, and computational demand while maintaining good output image quality.
    Type: Application
    Filed: November 21, 2023
    Publication date: December 12, 2024
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Karim AHMED, Yi WEI, Vasili RAMANISHKA, Yilin SHEN, Hongxia JIN
  • Publication number: 20240394592
    Abstract: A method includes accessing a training dataset having multiple samples, where each sample includes a data point for each of multiple modalities. The method also includes generating, using a first encoder associated with a first modality of the multiple modalities, first modality embeddings for data points of the first modality in the training dataset. The method further includes, for each first modality embedding, determining a similarity metric to other first modality embeddings. The method also includes generating, using a second encoder associated with a second modality of the multiple modalities, second modality embeddings for data points of the second modality in the training dataset. In addition, the method includes training the second encoder based on a contrastive loss function to align the first modality embeddings and the second modality embeddings from different samples of the training dataset, where the contrastive loss function is weighted using the similarity metrics.
    Type: Application
    Filed: February 6, 2024
    Publication date: November 28, 2024
    Inventors: Rakshith Sharma Srinivasa, Jaejin Cho, Chouchang Yang, Yashas Malur Saidutta, Ching-Hua Lee, Yilin Shen, Hongxia Jin
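A similarity-weighted contrastive loss of the kind this abstract describes can be sketched as below. The specific weighting scheme (down-weighting negatives whose first-modality anchors are already similar) is one plausible choice, not the patent's definition, and the embeddings are random placeholders:

```python
import numpy as np

def l2_normalize(X):
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def weighted_contrastive_loss(emb_a, emb_b, temperature=0.1):
    # emb_a: first-modality embeddings (the alignment target);
    # emb_b: second-modality embeddings being trained to align with them.
    A, B = l2_normalize(emb_a), l2_normalize(emb_b)
    logits = (B @ A.T) / temperature           # cross-modal similarities
    # Similarity metric among first-modality embeddings: negatives whose
    # anchors are already similar contribute less repulsion.
    sim = A @ A.T
    weights = 1.0 - sim
    np.fill_diagonal(weights, 1.0)             # keep the positive pair
    exp_logits = np.exp(logits) * weights
    # InfoNCE-style: positive logit minus log of the weighted partition sum.
    log_prob = logits.diagonal() - np.log(exp_logits.sum(axis=1))
    return -log_prob.mean()

rng = np.random.default_rng(4)
loss = weighted_contrastive_loss(rng.normal(size=(4, 8)),
                                 rng.normal(size=(4, 8)))
```

In the abstract's setting only the second encoder is trained, so gradients of this loss would flow through `emb_b` while `emb_a` stays fixed.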
  • Publication number: 20240386575
    Abstract: Provided are a method and apparatus for obtaining a foreground image from an input image containing a foreground object in a scene. Embodiments use multi-scale convolutional attention values, one or more hamburger heads and one or more multilayer perceptrons to obtain a segmentation map of the input image. In some embodiments, progressive segmentation is applied to obtain the segmentation map.
    Type: Application
    Filed: December 1, 2023
    Publication date: November 21, 2024
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jing Zhu, Karim Ahmed, Wenbo Li, Yilin Shen, Hongxia Jin
  • Publication number: 20240377829
    Abstract: A method includes determining a specified object to locate within a surrounding environment. The method also includes causing a robot to capture an image and a depth map of the surrounding environment. The method further includes using a scene understanding model, predicting one or more rooms and one or more objects captured in the image. The method also includes updating a second map of the surrounding environment based on the predicted rooms, the predicted objects, the depth map, and a location of the robot. The method further includes determining a likelihood of the specified object being in a candidate room and a likelihood of the specified object being near a candidate object using a pre-trained large language model. The method also includes causing the robot to move to a next location for the robot to search for the specified object, based on the likelihoods and the second map.
    Type: Application
    Filed: November 3, 2023
    Publication date: November 14, 2024
    Inventors: Yilin Shen, Kaiwen Zhou, Hongxia Jin
  • Patent number: 12127726
    Abstract: A method includes obtaining, using at least one processor of an electronic device, an image-query understanding model. The method also includes obtaining, using the at least one processor, an image and a user query associated with the image, where the image includes a target image area and the user query includes a target phrase. The method further includes retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model.
    Type: Grant
    Filed: April 15, 2021
    Date of Patent: October 29, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yu Wang, Yilin Shen, Hongxia Jin
  • Publication number: 20240339123
    Abstract: A method includes receiving an audio input and generating a noisy time-frequency representation based on the audio input. The method also includes providing the noisy time-frequency representation to a noise management model trained to predict a denoising mask and a signal presence probability (SPP) map indicating a likelihood of a presence of speech. The method further includes determining an enhanced spectrogram using the denoising mask and the noisy time-frequency representation. The method also includes providing the enhanced spectrogram and the SPP map as inputs to a keyword classification model trained to determine a likelihood of a keyword being present in the audio input. In addition, the method includes, responsive to determining that a keyword is in the audio input, transmitting the audio input to a downstream application associated with the keyword.
    Type: Application
    Filed: September 20, 2023
    Publication date: October 10, 2024
    Inventors: Chou-Chang Yang, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Ching-Hua Lee, Yilin Shen, Hongxia Jin
  • Publication number: 20240331715
    Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
    Type: Application
    Filed: August 29, 2023
    Publication date: October 3, 2024
    Inventors: Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
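The mask-to-beamformer step in this abstract can be sketched per frequency bin. The abstract does not name the beamformer, so the sketch uses a simple max-SNR-style choice (principal eigenvector of the mask-weighted speech covariance); an MVDR beamformer, which also uses the noise covariance, would be an equally plausible reading:

```python
import numpy as np

def mask_based_beamformer(noisy_stft, mask):
    # noisy_stft: (mics, frames, freqs) complex STFT of the multi-mic input.
    # mask: (frames, freqs) speech mask from the mask estimation model.
    M, T, F = noisy_stft.shape
    out = np.zeros((T, F), dtype=complex)
    for f in range(F):
        X = noisy_stft[:, :, f]                 # (mics, frames)
        # Speech spatial covariance estimated with the mask as weights.
        Xs = X * mask[:, f]
        cov_s = (Xs @ Xs.conj().T) / T
        # Beamforming weights: principal eigenvector of the speech
        # covariance (a max-SNR-style simplification).
        vals, vecs = np.linalg.eigh(cov_s)
        w = vecs[:, -1]
        # Apply the weights across microphones for every frame.
        out[:, f] = w.conj() @ X
    return out

rng = np.random.default_rng(5)
stft = rng.normal(size=(2, 10, 5)) + 1j * rng.normal(size=(2, 10, 5))
enhanced = mask_based_beamformer(stft, np.ones((10, 5)))
```

The output is a single-channel time-frequency representation; an inverse STFT would then yield the clean speech audio the abstract outputs.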
  • Publication number: 20240311693
    Abstract: A method includes obtaining input data associated with a new concept to be learned by a trained machine learning model. The method also includes identifying initial weights of the trained machine learning model and one or more previous weight deltas associated with the trained machine learning model. The method further includes identifying one or more additional weight deltas based on the input data and guided by the initial weights and the one or more previous weight deltas. In addition, the method includes integrating the one or more additional weight deltas into the trained machine learning model. The one or more additional weight deltas are integrated into the trained machine learning model by identifying updated weights for the trained machine learning model based on the initial weights, the one or more previous weight deltas, and the one or more additional weight deltas.
    Type: Application
    Filed: February 29, 2024
    Publication date: September 19, 2024
    Inventors: James S. Smith, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Lingyu Zhang, Ting Hua
  • Publication number: 20240203143
    Abstract: A method includes obtaining an image, a set of attribute labels, and a set of object labels and performing prompt tuning of a pre-trained vision-language model having first and second textual encoders and a vision encoder. The model is trained during prompt tuning to select one attribute label and one object label that match content in the image. Performing the prompt tuning includes, for each attribute label-object label pair, generating object textual features associated with the object label using the first textual encoder, generating attribute textual features associated with the attribute label using the second textual encoder, and generating image features associated with the image using the vision encoder. Intermediate outputs from initial layers of the textual encoders and the vision encoder are combined to generate layer-specific learnable prompt tokens that are appended to inputs of specified layers in the first and second textual encoders and the vision encoder.
    Type: Application
    Filed: August 23, 2023
    Publication date: June 20, 2024
    Inventors: Lingyu Zhang, Ting Hua, Yilin Shen, Hongxia Jin