Patents by Inventor Yilin Shen

Yilin Shen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240386575
    Abstract: Provided are a method and an apparatus for obtaining a foreground image from an input image containing a foreground object in a scene. Embodiments use multi-scale convolutional attention values, one or more hamburger heads, and one or more multilayer perceptrons to obtain a segmentation map of the input image. In some embodiments, progressive segmentation is applied to obtain the segmentation map.
    Type: Application
    Filed: December 1, 2023
    Publication date: November 21, 2024
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jing Zhu, Karim Ahmed, Wenbo Li, Yilin Shen, Hongxia Jin
  • Publication number: 20240377829
    Abstract: A method includes determining a specified object to locate within a surrounding environment. The method also includes causing a robot to capture an image and a depth map of the surrounding environment. The method further includes predicting, using a scene understanding model, one or more rooms and one or more objects captured in the image. The method also includes updating a second map of the surrounding environment based on the predicted rooms, the predicted objects, the depth map, and a location of the robot. The method further includes determining a likelihood of the specified object being in a candidate room and a likelihood of the specified object being near a candidate object using a pre-trained large language model. The method also includes causing the robot to move to a next location for the robot to search for the specified object, based on the likelihoods and the second map.
    Type: Application
    Filed: November 3, 2023
    Publication date: November 14, 2024
    Inventors: Yilin Shen, Kaiwen Zhou, Hongxia Jin
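A minimal sketch of the scoring step this abstract describes: combine an LLM-derived likelihood that the target is in a candidate room with the likelihood that it is near a candidate object in that room, then pick the best next search location. The function name, the simple product-scoring rule, and the example values are illustrative assumptions, not the patented method.

```python
def score_locations(room_probs, object_probs, room_of_object):
    """room_probs: {room: P(target in room)} from the language model.
    object_probs: {obj: P(target near obj)} from the language model.
    room_of_object: {obj: room} read off the semantic map."""
    scores = {}
    for obj, p_near in object_probs.items():
        room = room_of_object[obj]
        # Score a candidate object by joint room and proximity likelihood.
        scores[obj] = room_probs.get(room, 0.0) * p_near
    best = max(scores, key=scores.get)
    return best, scores

best, scores = score_locations(
    {"kitchen": 0.7, "bedroom": 0.2},
    {"counter": 0.8, "nightstand": 0.6},
    {"counter": "kitchen", "nightstand": "bedroom"},
)
```

With these toy likelihoods the robot would head for the counter (0.7 × 0.8 beats 0.2 × 0.6).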
  • Patent number: 12127726
    Abstract: A method includes obtaining, using at least one processor of an electronic device, an image-query understanding model. The method also includes obtaining, using the at least one processor, an image and a user query associated with the image, where the image includes a target image area and the user query includes a target phrase. The method further includes retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model.
    Type: Grant
    Filed: April 15, 2021
    Date of Patent: October 29, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yu Wang, Yilin Shen, Hongxia Jin
  • Publication number: 20240339123
    Abstract: A method includes receiving an audio input and generating a noisy time-frequency representation based on the audio input. The method also includes providing the noisy time-frequency representation to a noise management model trained to predict a denoising mask and a signal presence probability (SPP) map indicating a likelihood of a presence of speech. The method further includes determining an enhanced spectrogram using the denoising mask and the noisy time-frequency representation. The method also includes providing the enhanced spectrogram and the SPP map as inputs to a keyword classification model trained to determine a likelihood of a keyword being present in the audio input. In addition, the method includes, responsive to determining that a keyword is in the audio input, transmitting the audio input to a downstream application associated with the keyword.
    Type: Application
    Filed: September 20, 2023
    Publication date: October 10, 2024
    Inventors: Chou-Chang Yang, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Ching-Hua Lee, Yilin Shen, Hongxia Jin
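The enhancement step in this abstract reduces to an element-wise product: the enhanced spectrogram is the predicted denoising mask applied to the noisy time-frequency representation. A hedged sketch, with toy arrays standing in for the trained noise management model's outputs:

```python
import numpy as np

def enhance(noisy_tf, denoising_mask):
    """Apply a predicted denoising mask to a noisy T-F representation."""
    assert noisy_tf.shape == denoising_mask.shape
    return denoising_mask * noisy_tf

# Toy stand-ins for model outputs (the real mask comes from a network).
noisy = np.array([[2.0, 4.0],
                  [6.0, 8.0]])
mask = np.array([[0.5, 0.25],
                 [1.0, 0.0]])
enhanced = enhance(noisy, mask)
```

The enhanced spectrogram and the SPP map then go on to the keyword classification model.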
  • Publication number: 20240331715
    Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
    Type: Application
    Filed: August 29, 2023
    Publication date: October 3, 2024
    Inventors: Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
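A hedged sketch of mask-driven beamforming as described above: the predicted mask weights each time-frequency bin when estimating speech and noise spatial covariance matrices, and filter weights are derived from them. A simple max-SNR-style beamformer at a single frequency bin is used here as an assumption; the patented weight derivation may differ.

```python
import numpy as np

def beamform_weights(stft_bin, mask):
    """stft_bin: (mics, frames) complex STFT at one frequency bin.
    mask: (frames,) speech presence mask in [0, 1]."""
    speech_cov = (mask * stft_bin) @ stft_bin.conj().T / mask.sum()
    noise_cov = ((1 - mask) * stft_bin) @ stft_bin.conj().T / (1 - mask).sum()
    noise_cov += 1e-3 * np.eye(len(noise_cov))  # diagonal loading
    # Max-SNR beamformer: principal eigenvector of noise_cov^-1 @ speech_cov.
    vals, vecs = np.linalg.eig(np.linalg.inv(noise_cov) @ speech_cov)
    w = vecs[:, np.argmax(vals.real)]
    return w / np.linalg.norm(w)

# Two mics: speech arrives in phase ([1, 1]), noise out of phase ([1, -1]).
X = np.array([[1.0, 2.0, 1.0, 1.0],
              [1.0, 2.0, -1.0, -1.0]])
mask = np.array([1.0, 1.0, 0.0, 0.0])
w = beamform_weights(X, mask)
```

The resulting weights point (up to sign) along the in-phase speech direction, suppressing the out-of-phase noise.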
  • Publication number: 20240311693
    Abstract: A method includes obtaining input data associated with a new concept to be learned by a trained machine learning model. The method also includes identifying initial weights of the trained machine learning model and one or more previous weight deltas associated with the trained machine learning model. The method further includes identifying one or more additional weight deltas based on the input data and guided by the initial weights and the one or more previous weight deltas. In addition, the method includes integrating the one or more additional weight deltas into the trained machine learning model. The one or more additional weight deltas are integrated into the trained machine learning model by identifying updated weights for the trained machine learning model based on the initial weights, the one or more previous weight deltas, and the one or more additional weight deltas.
    Type: Application
    Filed: February 29, 2024
    Publication date: September 19, 2024
    Inventors: James S. Smith, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Lingyu Zhang, Ting Hua
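The integration rule in this abstract can be sketched in a few lines: updated weights are the initial weights plus the previous weight deltas plus the newly learned delta(s). The purely additive form is an assumption for illustration.

```python
import numpy as np

def integrate_deltas(initial_weights, previous_deltas, new_deltas):
    """Fold previous and newly learned weight deltas into the model."""
    updated = initial_weights.copy()
    for delta in list(previous_deltas) + list(new_deltas):
        updated += delta
    return updated

w0 = np.zeros((2, 2))          # initial weights
prev = [np.eye(2) * 0.1]       # delta from a previously learned concept
new = [np.eye(2) * 0.05]       # delta learned for the new concept
w = integrate_deltas(w0, prev, new)
```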
  • Publication number: 20240203143
    Abstract: A method includes obtaining an image, a set of attribute labels, and a set of object labels and performing prompt tuning of a pre-trained vision-language model having first and second textual encoders and a vision encoder. The model is trained during prompt tuning to select one attribute label and one object label that match content in the image. Performing the prompt tuning includes, for each attribute label-object label pair, generating object textual features associated with the object label using the first textual encoder, generating attribute textual features associated with the attribute label using the second textual encoder, and generating image features associated with the image using the vision encoder. Intermediate outputs from initial layers of the textual encoders and the vision encoder are combined to generate layer-specific learnable prompt tokens that are appended to inputs of specified layers in the first and second textual encoders and the vision encoder.
    Type: Application
    Filed: August 23, 2023
    Publication date: June 20, 2024
    Inventors: Lingyu Zhang, Ting Hua, Yilin Shen, Hongxia Jin
  • Publication number: 20240185850
    Abstract: A method includes extracting, using a keyword detection model, audio features from audio data. The method also includes processing the audio features by a first layer of the keyword detection model configured to predict a first likelihood that the audio data includes speech. The method also includes processing the audio features by a second layer of the keyword detection model configured to predict a second likelihood that the audio data includes keyword-like speech. The method also includes processing the audio features by a third layer of the keyword detection model configured to predict a third likelihood, for each of a plurality of possible keywords, that the audio data includes the keyword. The method also includes identifying a keyword included in the audio data. The method also includes generating instructions to perform an action based at least in part on the identified keyword.
    Type: Application
    Filed: July 14, 2023
    Publication date: June 6, 2024
    Inventors: Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
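The three-layer flow above suggests a coarse-to-fine cascade: gate on speech presence, then on keyword-like speech, then score each candidate keyword. The thresholds and the early-exit policy below are illustrative assumptions, not values from the application.

```python
def detect_keyword(p_speech, p_keyword_like, keyword_probs,
                   gate=0.5, accept=0.7):
    """Three-stage cascade: speech gate, keyword-likeness gate, per-keyword score."""
    if p_speech < gate or p_keyword_like < gate:
        return None  # early exit: no speech, or nothing keyword-like
    best = max(keyword_probs, key=keyword_probs.get)
    return best if keyword_probs[best] >= accept else None

hit = detect_keyword(0.9, 0.8, {"hi_bixby": 0.95, "play_music": 0.2})
miss = detect_keyword(0.3, 0.9, {"hi_bixby": 0.95})
```

The cascade lets the cheap early layers reject most non-speech audio before the per-keyword layer runs.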
  • Publication number: 20240119077
    Abstract: A method of performing a multimodal task by using a multimodal model that includes a text encoder and a vision encoder may include obtaining a text feature from a query via the text encoder; obtaining an image feature from one or more input images via the vision encoder; and outputting a response to the query based on similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.
    Type: Application
    Filed: September 14, 2023
    Publication date: April 11, 2024
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Shangqian Gao, Burak Uzkent, Yilin Shen, Hongxia Jin
  • Publication number: 20240104309
    Abstract: A method includes receiving an input for a large language model (LLM) from a user. The method also includes generating one or more token embeddings based on the input. The method further includes generating one or more prompt embeddings based on the input using a contextual prompt generator (CPG), the one or more prompt embeddings representing new or updated information that is not contained in existing knowledge of the LLM. The method also includes providing the one or more token embeddings and the one or more prompt embeddings to the LLM. In addition, the method includes outputting a prediction based on the one or more token embeddings and the one or more prompt embeddings using the LLM, wherein the prediction reflects the new or updated information represented by the one or more prompt embeddings.
    Type: Application
    Filed: September 12, 2023
    Publication date: March 28, 2024
    Inventors: Yen-Chang Hsu, Harshavardhan Kamarthi, Yilin Shen, Hongxia Jin
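The input-assembly step above can be sketched as a concatenation along the sequence axis: prompt embeddings from the contextual prompt generator are prepended to the ordinary token embeddings before the combined sequence is fed to the LLM. The prepend position is an assumption, and fixed arrays stand in for the CPG and the tokenizer.

```python
import numpy as np

def build_llm_input(token_embeddings, prompt_embeddings):
    """Prepend CPG-generated prompt embeddings to token embeddings."""
    assert token_embeddings.shape[1] == prompt_embeddings.shape[1]
    return np.concatenate([prompt_embeddings, token_embeddings], axis=0)

tokens = np.ones((4, 8))    # 4 token embeddings, hidden size 8
prompts = np.zeros((2, 8))  # 2 prompt embeddings carrying new information
x = build_llm_input(tokens, prompts)
```

The LLM then attends over both streams, so its prediction can reflect information absent from its frozen weights.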
  • Publication number: 20240080423
    Abstract: A method includes obtaining raw image data, where the raw image data includes data values each having most significant bits and least significant bits. The method also includes providing the raw image data to a trained machine learning model and generating processed image data using the trained machine learning model. The method further includes presenting an image based on the processed image data. The trained machine learning model is trained to modulate a feature map associated with the most significant bits of the data values of the raw image data based on the least significant bits of the data values of the raw image data in order to generate a fusion of the most significant bits and the least significant bits of the data values of the raw image data.
    Type: Application
    Filed: November 18, 2022
    Publication date: March 7, 2024
    Inventors: Wenbo Li, Zhipeng Mo, Yi Wei, Burak Uzkent, Qian Lou, Yilin Shen, Hongxia Jin
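The bit split this abstract relies on is mechanical: each raw value is separated into its most significant and least significant bits so the MSB feature map can later be modulated by the LSBs. A sketch assuming a 16-bit raw value split 8/8 (the actual bit depths are not stated here):

```python
import numpy as np

def split_msb_lsb(raw, lsb_bits=8):
    """Split raw sensor values into MSB and LSB components."""
    msb = raw >> lsb_bits            # most significant bits
    lsb = raw & ((1 << lsb_bits) - 1)  # least significant bits
    return msb, lsb

raw = np.array([0x1234, 0x00FF], dtype=np.uint16)
msb, lsb = split_msb_lsb(raw)
```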
  • Publication number: 20240046946
    Abstract: A method includes obtaining, using at least one processing device, noisy speech signals and extracting, using the at least one processing device, acoustic features from the noisy speech signals. The method also includes receiving, using the at least one processing device, a predicted speech mask from a speech mask prediction model based on a first acoustic feature subset and receiving, using the at least one processing device, a predicted noise mask from a noise mask prediction model based on a second acoustic feature subset. The method further includes providing, using the at least one processing device, predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model. In addition, the method includes generating, using the at least one processing device, a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.
    Type: Application
    Filed: November 22, 2022
    Publication date: February 8, 2024
    Inventors: Chou-Chang Yang, Ching-Hua Lee, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Yilin Shen, Hongxia Jin
  • Patent number: 11875231
    Abstract: An electronic device for complex task machine learning includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive an unknown command for performing a task and generate a prompt regarding the unknown command. The at least one processor is also configured to receive one or more instructions in response to the prompt, where each of the one or more instructions provides information on performing at least a portion of the task. The at least one processor is further configured to determine at least one action for each one of the one or more instructions. In addition, the at least one processor is configured to create a complex action for performing the task based on the at least one action for each one of the one or more instructions.
    Type: Grant
    Filed: October 23, 2019
    Date of Patent: January 16, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Avik Ray, Yilin Shen, Hongxia Jin
  • Patent number: 11854528
    Abstract: An apparatus for detecting unsupported utterances in natural language understanding, includes a memory storing instructions, and at least one processor configured to execute the instructions to classify a feature that is extracted from an input utterance of a user, as one of in-domain and out-of-domain (OOD) for a response to the input utterance, obtain an OOD score of the extracted feature, and identify whether the feature is classified as OOD. The at least one processor is further configured to execute the instructions to, based on the feature being identified to be classified as in-domain, identify whether the obtained OOD score is greater than a predefined threshold, and based on the OOD score being identified to be greater than the predefined threshold, re-classify the feature as OOD.
    Type: Grant
    Filed: August 13, 2021
    Date of Patent: December 26, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yen-Chang Hsu, Yilin Shen, Avik Ray, Hongxia Jin
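The two-step decision above is a small piece of control flow: an in-domain classification is overridden to OOD whenever the OOD score exceeds a predefined threshold. The score scale and threshold value here are assumptions for illustration.

```python
def final_label(predicted, ood_score, threshold=0.8):
    """Re-classify an in-domain prediction as OOD when its OOD score is high."""
    if predicted == "in-domain" and ood_score > threshold:
        return "OOD"  # override: the score contradicts the classifier
    return predicted

overridden = final_label("in-domain", 0.9)
kept = final_label("in-domain", 0.5)
```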
  • Patent number: 11775815
    Abstract: An electronic device including a deep memory model includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive input data to the deep memory model. The at least one processor is also configured to extract a history state of an external memory coupled to the deep memory model based on the input data. The at least one processor is further configured to update the history state of the external memory based on the input data. In addition, the at least one processor is configured to output a prediction based on the extracted history state of the external memory.
    Type: Grant
    Filed: August 8, 2019
    Date of Patent: October 3, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yilin Shen, Yue Deng, Avik Ray, Hongxia Jin
  • Publication number: 20230289590
    Abstract: A method of training a model includes configuring a first transformer for visual learning with a first set of weights, configuring a second transformer for textual learning with a second set of weights, adjusting at least the second set of weights based on minimizing a weight difference between the first set of weights and the second set of weights, replacing the first set of weights for the first transformer with the adjusted second set of weights, and updating the first transformer based on the adjusted second set of weights.
    Type: Application
    Filed: September 8, 2022
    Publication date: September 14, 2023
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Burak Uzkent, Vasili Ramanishka, Yilin Shen, Hongxia Jin
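The training procedure above can be sketched as a single alignment step: nudge the textual transformer's weights toward the visual transformer's to shrink their difference, then replace the visual transformer's weights with the adjusted set. The one-shot interpolation below stands in for the actual minimization; the step size is an assumption.

```python
import numpy as np

def align_and_share(w_visual, w_textual, step=0.5):
    """Adjust textual weights toward visual weights, then share them back."""
    adjusted = w_textual + step * (w_visual - w_textual)
    # Replace the visual transformer's weights with the adjusted set.
    return adjusted.copy(), adjusted

w_v = np.ones((2, 2))
w_t = np.zeros((2, 2))
new_v, new_t = align_and_share(w_v, w_t)
```

After the update both transformers carry the same (adjusted) weights, which is the weight-sharing effect the abstract describes.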
  • Patent number: 11741307
    Abstract: A method includes applying, by at least one processor, a natural language understanding (NLU) model to an input utterance in order to obtain initial slot probability distributions. The method also includes performing, by the at least one processor, a confidence calibration by applying a calibration probability distribution to the initial slot probability distributions in order to generate calibrated slot probability distributions. The calibration probability distribution has a higher number of dimensions than the initial slot probability distributions. The method further includes identifying, by the at least one processor, uncertainties associated with words in the input utterance based on the calibrated slot probability distributions. In addition, the method includes identifying, by the at least one processor, a new concept contained in the input utterance that is not recognized by the NLU model based on the identified uncertainties.
    Type: Grant
    Filed: October 20, 2020
    Date of Patent: August 29, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yilin Shen, Hongxia Jin
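A hedged sketch of the calibration idea: each word's slot scores are extended with an extra "unknown" dimension (a simple stand-in for the higher-dimensional calibration distribution) and renormalized, and words whose mass shifts heavily onto that dimension are flagged as uncertain, i.e. candidate new concepts. The unnormalized score inputs, the fixed unknown score, and the threshold are all assumptions.

```python
import numpy as np

def calibrate(slot_scores, unknown_score=1.0):
    """Extend unnormalized slot scores with an 'unknown' dimension, normalize."""
    extended = np.concatenate([slot_scores, [unknown_score]])
    return extended / extended.sum()

def uncertain_words(per_word_scores, threshold=0.2):
    """Flag words whose calibrated 'unknown' mass exceeds the threshold."""
    flagged = []
    for word, scores in per_word_scores.items():
        if calibrate(np.asarray(scores))[-1] > threshold:
            flagged.append(word)
    return flagged

# "book" gets a confident slot score; "flibber" is nearly uniform.
flagged = uncertain_words({"book": [5.0, 1.0], "flibber": [1.1, 1.0]})
```

Confident words keep almost all their mass on a real slot, while near-uniform words leak mass onto the unknown dimension and get flagged.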
  • Patent number: 11720814
    Abstract: A recognition method includes retrieving an input including data of a first window size. The method further includes classifying the input based on comparison of warping distance of the input with a pruning threshold.
    Type: Grant
    Filed: December 27, 2018
    Date of Patent: August 8, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yilin Shen, Yue Deng, Hongxia Jin
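Classification by warping distance with a pruning threshold can be sketched with standard dynamic time warping plus early abandoning: once every cell in the current row exceeds the threshold, the distance provably will too, so the computation stops. The early-abandon rule is a common DTW technique and an assumption about the patented pruning.

```python
def dtw_within(a, b, threshold):
    """Return DTW distance of a vs. b, or None if it must exceed threshold."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(b)  # row 0 of the DTW cost matrix
    for x in a:
        cur = [inf]
        for j, y in enumerate(b):
            cost = abs(x - y)
            # Min over diagonal, above, and left predecessors.
            cur.append(cost + min(prev[j], prev[j + 1], cur[-1]))
        if min(cur) > threshold:
            return None  # pruned: every path already exceeds the threshold
        prev = cur
    return prev[-1]
```

A classifier would call this against each reference template and keep the smallest non-pruned distance.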
  • Patent number: 11721090
    Abstract: A recommendation method includes retrieving content consumption data including content consumed and content not consumed. Based on the content consumption data, a first piece of content that was not consumed is identified. A first feature of the first piece of content related to negative consumption of the first piece of content is determined. A first system is used to revise the first feature to a second feature. A second piece of content including the second feature is provided to an electronic device. The second piece of content is a revised instance of the first piece of content.
    Type: Grant
    Filed: July 20, 2018
    Date of Patent: August 8, 2023
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Yue Deng, Yilin Shen, Hongxia Jin
  • Publication number: 20230245435
    Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
    Type: Application
    Filed: January 31, 2022
    Publication date: August 3, 2023
    Inventors: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin