Patents by Inventor Yilin Shen
Yilin Shen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12127726
Abstract: A method includes obtaining, using at least one processor of an electronic device, an image-query understanding model. The method also includes obtaining, using the at least one processor, an image and a user query associated with the image, where the image includes a target image area and the user query includes a target phrase. The method further includes retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model.
Type: Grant
Filed: April 15, 2021
Date of Patent: October 29, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Yu Wang, Yilin Shen, Hongxia Jin
-
Publication number: 20240339123
Abstract: A method includes receiving an audio input and generating a noisy time-frequency representation based on the audio input. The method also includes providing the noisy time-frequency representation to a noise management model trained to predict a denoising mask and a signal presence probability (SPP) map indicating a likelihood of a presence of speech. The method further includes determining an enhanced spectrogram using the denoising mask and the noisy time-frequency representation. The method also includes providing the enhanced spectrogram and the SPP map as inputs to a keyword classification model trained to determine a likelihood of a keyword being present in the audio input. In addition, the method includes, responsive to determining that a keyword is in the audio input, transmitting the audio input to a downstream application associated with the keyword.
Type: Application
Filed: September 20, 2023
Publication date: October 10, 2024
Inventors: Chou-Chang Yang, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Ching-Hua Lee, Yilin Shen, Hongxia Jin
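The abstract above describes masking a noisy time-frequency representation and gating on speech presence before keyword classification. The following is a minimal sketch of that data flow, not the patented implementation: the model outputs are replaced by random stand-in tensors, and the shapes and the gating threshold are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shape: 64 frequency bins x 100 frames (assumed, not from the patent).
noisy_tf = np.abs(rng.standard_normal((64, 100)))  # noisy time-frequency magnitudes

# Stand-ins for the noise management model's two predicted outputs:
denoise_mask = rng.random((64, 100))  # per-bin gain in [0, 1]
spp_map = rng.random((64, 100))       # speech presence probability per bin

# Enhanced spectrogram: element-wise masking of the noisy representation.
enhanced = denoise_mask * noisy_tf

# A keyword classifier would consume both `enhanced` and `spp_map`; here we
# only gate on overall speech presence first (the threshold is an assumption).
speech_likely = spp_map.mean() > 0.3
```

In a real system the two model outputs would come from a trained network; the point here is only that the mask is applied element-wise and the SPP map travels alongside the enhanced spectrogram.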
-
Publication number: 20240331715
Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
Type: Application
Filed: August 29, 2023
Publication date: October 3, 2024
Inventors: Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
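Mask-driven beamforming of the kind summarized above is commonly realized by estimating speech and noise spatial covariance matrices from the mask and deriving MVDR filter weights. The sketch below shows that standard recipe with a random stand-in for the estimated mask; the array geometry, sizes, and the use of MVDR specifically are assumptions, not details from the filing.

```python
import numpy as np

rng = np.random.default_rng(1)
mics, freqs, frames = 4, 8, 50

# Multi-channel noisy STFT: (mics, freqs, frames), complex-valued.
y = rng.standard_normal((mics, freqs, frames)) + 1j * rng.standard_normal((mics, freqs, frames))

# Stand-in for the mask estimation model's output: speech mask in [0, 1].
mask = rng.random((freqs, frames))

w_mvdr = np.zeros((freqs, mics), dtype=complex)
for f in range(freqs):
    yf = y[:, f, :]                                                    # (mics, frames)
    phi_s = (mask[f] * yf) @ yf.conj().T / mask[f].sum()               # speech covariance
    phi_n = ((1 - mask[f]) * yf) @ yf.conj().T / (1 - mask[f]).sum()   # noise covariance
    steer = np.linalg.eigh(phi_s)[1][:, -1]       # principal eigenvector as steering vector
    num = np.linalg.solve(phi_n, steer)
    w_mvdr[f] = num / (steer.conj() @ num)        # MVDR weights per frequency bin

# Apply the beamformer to isolate a single-channel clean speech estimate.
clean = np.einsum('fm,mft->ft', w_mvdr.conj(), y)
```

The key property is that the filter weights are a deterministic function of the mask-weighted covariances, so a better mask directly yields better spatial filtering.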
-
Publication number: 20240311693
Abstract: A method includes obtaining input data associated with a new concept to be learned by a trained machine learning model. The method also includes identifying initial weights of the trained machine learning model and one or more previous weight deltas associated with the trained machine learning model. The method further includes identifying one or more additional weight deltas based on the input data and guided by the initial weights and the one or more previous weight deltas. In addition, the method includes integrating the one or more additional weight deltas into the trained machine learning model. The one or more additional weight deltas are integrated into the trained machine learning model by identifying updated weights for the trained machine learning model based on the initial weights, the one or more previous weight deltas, and the one or more additional weight deltas.
Type: Application
Filed: February 29, 2024
Publication date: September 19, 2024
Inventors: James S. Smith, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Lingyu Zhang, Ting Hua
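The integration step described above, updated weights computed from initial weights plus accumulated deltas, can be sketched in a few lines. This is an illustrative toy, assuming additive deltas of the same shape as the layer; how the deltas are actually learned and "guided" is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(2)

# Initial weights of a toy trained layer and deltas from previously learned concepts.
w_init = rng.standard_normal((4, 4))
previous_deltas = [0.1 * rng.standard_normal((4, 4))]

# A new delta learned for the new concept (stand-in for the guided update).
new_delta = 0.1 * rng.standard_normal((4, 4))

# Integration: updated weights are the initial weights plus all accumulated deltas.
w_updated = w_init + sum(previous_deltas) + new_delta
```

Keeping the deltas separate from `w_init` is what lets earlier concepts survive: each new concept adds a delta rather than overwriting the base weights.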
-
Publication number: 20240203143
Abstract: A method includes obtaining an image, a set of attribute labels, and a set of object labels and performing prompt tuning of a pre-trained vision-language model having first and second textual encoders and a vision encoder. The model is trained during prompt tuning to select one attribute label and one object label that match content in the image. Performing the prompt tuning includes, for each attribute label-object label pair, generating object textual features associated with the object label using the first textual encoder, generating attribute textual features associated with the attribute label using the second textual encoder, and generating image features associated with the image using the vision encoder. Intermediate outputs from initial layers of the textual encoders and the vision encoder are combined to generate layer-specific learnable prompt tokens that are appended to inputs of specified layers in the first and second textual encoders and the vision encoder.
Type: Application
Filed: August 23, 2023
Publication date: June 20, 2024
Inventors: Lingyu Zhang, Ting Hua, Yilin Shen, Hongxia Jin
-
Publication number: 20240185850
Abstract: A method includes extracting, using a keyword detection model, audio features from audio data. The method also includes processing the audio features by a first layer of the keyword detection model configured to predict a first likelihood that the audio data includes speech. The method also includes processing the audio features by a second layer of the keyword detection model configured to predict a second likelihood that the audio data includes keyword-like speech. The method also includes processing the audio features by a third layer of the keyword detection model configured to predict a third likelihood, for each of a plurality of possible keywords, that the audio data includes the keyword. The method also includes identifying a keyword included in the audio data. The method also includes generating instructions to perform an action based at least in part on the identified keyword.
Type: Application
Filed: July 14, 2023
Publication date: June 6, 2024
Inventors: Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
-
Publication number: 20240119077
Abstract: A method of performing multimodal tasks using a multimodal model that includes a text encoder and a vision encoder may include obtaining a text feature from a query via the text encoder; obtaining an image feature from one or more input images via the vision encoder; and outputting a response to the query based on similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.
Type: Application
Filed: September 14, 2023
Publication date: April 11, 2024
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Shangqian GAO, Burak UZKENT, Yilin SHEN, Hongxia JIN
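The pruning and sharing mechanism above can be illustrated on a single pair of weight vectors. In this sketch the hypernetwork outputs are replaced by random binary vectors, and the exact way sharing and pruning compose is an assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 8

# Weight vectors of one corresponding layer in the text and vision encoders.
w_text = rng.standard_normal(d)
w_vision = rng.standard_normal(d)

# Stand-ins for the hypernetwork outputs: a binary pruning vector
# (0 = prune this weight) and a binary sharing vector (1 = the vision
# encoder reuses the text encoder's weight at this position).
pruning_vec = (rng.random(d) > 0.3).astype(float)
sharing_vec = (rng.random(d) > 0.5).astype(float)

# Effective weights: shared entries copy the text weights into the vision
# encoder, then pruned entries are zeroed in both encoders.
w_vision_eff = pruning_vec * (sharing_vec * w_text + (1 - sharing_vec) * w_vision)
w_text_eff = pruning_vec * w_text
```

Sharing reduces the parameter count because shared entries need to be stored only once, and pruning removes entries entirely, which is consistent with the joint objective the abstract describes.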
-
Publication number: 20240104309
Abstract: A method includes receiving an input for a large language model (LLM) from a user. The method also includes generating one or more token embeddings based on the input. The method further includes generating one or more prompt embeddings based on the input using a contextual prompt generator (CPG), the one or more prompt embeddings representing new or updated information that is not contained in existing knowledge of the LLM. The method also includes providing the one or more token embeddings and the one or more prompt embeddings to the LLM. In addition, the method includes outputting a prediction based on the one or more token embeddings and the one or more prompt embeddings using the LLM, wherein the prediction reflects the new or updated information represented by the one or more prompt embeddings.
Type: Application
Filed: September 12, 2023
Publication date: March 28, 2024
Inventors: Yen-Chang Hsu, Harshavardhan Kamarthi, Yilin Shen, Hongxia Jin
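Mechanically, the flow above amounts to generating a few extra embedding vectors from the input and feeding them to the LLM alongside the token embeddings. The sketch below assumes a linear toy CPG with random weights, a prepend-style composition, and a 16-dimensional model; all three are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
d_model = 16

# Token embeddings for a 5-token user input (stand-in for the LLM's tokenizer
# plus embedding table).
token_emb = rng.standard_normal((5, d_model))

# A toy contextual prompt generator: maps pooled input features to two prompt
# embeddings that would carry the new or updated information.
cpg_weights = rng.standard_normal((d_model, 2 * d_model))
pooled = token_emb.mean(axis=0)
prompt_emb = (pooled @ cpg_weights).reshape(2, d_model)

# The LLM receives the prompt embeddings prepended to the token embeddings.
llm_input = np.concatenate([prompt_emb, token_emb], axis=0)
```

Because the prompt embeddings live in the same space as token embeddings, the LLM itself needs no architectural change to consume them.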
-
Publication number: 20240080423
Abstract: A method includes obtaining raw image data, where the raw image data includes data values each having most significant bits and least significant bits. The method also includes providing the raw image data to a trained machine learning model and generating processed image data using the trained machine learning model. The method further includes presenting an image based on the processed image data. The trained machine learning model is trained to modulate a feature map associated with the most significant bits of the data values of the raw image data based on the least significant bits of the data values of the raw image data in order to generate a fusion of the most significant bits and the least significant bits of the data values of the raw image data.
Type: Application
Filed: November 18, 2022
Publication date: March 7, 2024
Inventors: Wenbo Li, Zhipeng Mo, Yi Wei, Burak Uzkent, Qian Lou, Yilin Shen, Hongxia Jin
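The MSB/LSB split that the abstract builds on is plain bit arithmetic. The sketch below assumes 10-bit raw sensor values split into a 5/5 partition (the actual bit depths are not specified in the abstract) and shows that the split is lossless, which is what makes feeding both halves to a model sensible.

```python
import numpy as np

# Toy 10-bit raw sensor values; split into 5 most- and 5 least-significant bits.
raw = np.array([0b1011010110, 0b0000111111, 0b1111100000], dtype=np.uint16)

msb = raw >> 5          # top 5 bits: coarse intensity
lsb = raw & 0b11111     # bottom 5 bits: fine detail

# The patent's model modulates MSB-derived features using LSB features;
# reconstructing the raw value shows no information is lost by the split.
reconstructed = (msb << 5) | lsb
```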
-
Publication number: 20240046946
Abstract: A method includes obtaining, using at least one processing device, noisy speech signals and extracting, using the at least one processing device, acoustic features from the noisy speech signals. The method also includes receiving, using the at least one processing device, a predicted speech mask from a speech mask prediction model based on a first acoustic feature subset and receiving, using the at least one processing device, a predicted noise mask from a noise mask prediction model based on a second acoustic feature subset. The method further includes providing, using the at least one processing device, predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model. In addition, the method includes generating, using the at least one processing device, a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.
Type: Application
Filed: November 22, 2022
Publication date: February 8, 2024
Inventors: Chou-Chang Yang, Ching-Hua Lee, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Yilin Shen, Hongxia Jin
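The three-model pipeline above (speech mask, noise mask, then a filtering mask computed from both) can be sketched with random stand-in masks and a Wiener-style combination standing in for the filtering-mask predictor. Shapes and the Wiener form are assumptions for illustration, not the patented models.

```python
import numpy as np

rng = np.random.default_rng(4)

# Acoustic features of a noisy utterance: (frequency bins, frames).
noisy = np.abs(rng.standard_normal((32, 40)))

# Stand-ins for the two predicted masks (trained models would output these).
speech_mask = rng.random((32, 40))
noise_mask = rng.random((32, 40))

# Features fed to the filtering-mask prediction model.
speech_feat = speech_mask * noisy
noise_feat = noise_mask * noisy

# A simple Wiener-style stand-in for the predicted filtering mask.
filtering_mask = speech_feat / (speech_feat + noise_feat + 1e-8)

# Clean speech estimate.
clean = filtering_mask * noisy
```

Separating speech and noise estimation before predicting the final filter lets each sub-model specialize, which is the structural idea the abstract describes.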
-
Patent number: 11875231
Abstract: An electronic device for complex task machine learning includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive an unknown command for performing a task and generate a prompt regarding the unknown command. The at least one processor is also configured to receive one or more instructions in response to the prompt, where each of the one or more instructions provides information on performing at least a portion of the task. The at least one processor is further configured to determine at least one action for each one of the one or more instructions. In addition, the at least one processor is configured to create a complex action for performing the task based on the at least one action for each one of the one or more instructions.
Type: Grant
Filed: October 23, 2019
Date of Patent: January 16, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Avik Ray, Yilin Shen, Hongxia Jin
-
Patent number: 11854528
Abstract: An apparatus for detecting unsupported utterances in natural language understanding includes a memory storing instructions, and at least one processor configured to execute the instructions to classify a feature that is extracted from an input utterance of a user as one of in-domain and out-of-domain (OOD) for a response to the input utterance, obtain an OOD score of the extracted feature, and identify whether the feature is classified as OOD. The at least one processor is further configured to execute the instructions to, based on the feature being identified to be classified as in-domain, identify whether the obtained OOD score is greater than a predefined threshold, and based on the OOD score being identified to be greater than the predefined threshold, re-classify the feature as OOD.
Type: Grant
Filed: August 13, 2021
Date of Patent: December 26, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yen-Chang Hsu, Yilin Shen, Avik Ray, Hongxia Jin
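The re-classification rule in the abstract is a small piece of decision logic: an in-domain prediction is overridden when its OOD score exceeds a threshold. A minimal sketch, with an assumed threshold value (the patent only says it is predefined):

```python
def reclassify(label: str, ood_score: float, threshold: float = 0.5) -> str:
    """Override an in-domain classification when its OOD score is too high.

    The 0.5 default threshold is an assumption for illustration; the patent
    leaves the threshold as a predefined value.
    """
    if label == "in-domain" and ood_score > threshold:
        return "ood"
    return label
```

Usage: `reclassify("in-domain", 0.9)` yields `"ood"`, while `reclassify("in-domain", 0.2)` keeps the original label. The asymmetry matters: features already classified as OOD are never flipped back.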
-
Patent number: 11775815
Abstract: An electronic device including a deep memory model includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive input data to the deep memory model. The at least one processor is also configured to extract a history state of an external memory coupled to the deep memory model based on the input data. The at least one processor is further configured to update the history state of the external memory based on the input data. In addition, the at least one processor is configured to output a prediction based on the extracted history state of the external memory.
Type: Grant
Filed: August 8, 2019
Date of Patent: October 3, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Yilin Shen, Yue Deng, Avik Ray, Hongxia Jin
-
Publication number: 20230289590
Abstract: A method of training a model includes configuring a first transformer for visual learning with a first set of weights, configuring a second transformer for textual learning with a second set of weights, adjusting at least the second set of weights based on minimizing a weight difference between the first set of weights and the second set of weights, replacing the first set of weights for the first transformer with the adjusted second set of weights, and updating the first transformer based on the adjusted second set of weights.
Type: Application
Filed: September 8, 2022
Publication date: September 14, 2023
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Burak UZKENT, Vasili Ramanishka, Yilin Shen, Hongxia Jin
-
Patent number: 11741307
Abstract: A method includes applying, by at least one processor, a natural language understanding (NLU) model to an input utterance in order to obtain initial slot probability distributions. The method also includes performing, by the at least one processor, a confidence calibration by applying a calibration probability distribution to the initial slot probability distributions in order to generate calibrated slot probability distributions. The calibration probability distribution has a higher number of dimensions than the initial slot probability distributions. The method further includes identifying, by the at least one processor, uncertainties associated with words in the input utterance based on the calibrated slot probability distributions. In addition, the method includes identifying, by the at least one processor, a new concept contained in the input utterance that is not recognized by the NLU model based on the identified uncertainties.
Type: Grant
Filed: October 20, 2020
Date of Patent: August 29, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Yilin Shen, Hongxia Jin
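One way to picture the abstract's higher-dimensional calibrated distributions is to append an extra "none of the known slots" dimension and flag words whose calibrated distribution has high entropy. This is only a toy analogue of the patented calibration; the extra-mass heuristic and the entropy threshold are assumptions.

```python
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=-1)

# Initial slot distributions over 3 slot labels for a 4-word utterance.
slot_probs = np.array([
    [0.90, 0.05, 0.05],   # confident
    [0.40, 0.35, 0.25],   # uncertain -> candidate new concept
    [0.85, 0.10, 0.05],
    [0.34, 0.33, 0.33],   # very uncertain
])

# Toy "calibration": append an extra dimension carrying the leftover
# confidence mass, making the calibrated distribution higher-dimensional
# than the initial one, then renormalize.
extra = 1.0 - slot_probs.max(axis=1, keepdims=True)
calibrated = np.concatenate([slot_probs, extra], axis=1)
calibrated /= calibrated.sum(axis=1, keepdims=True)

# Words with high calibrated entropy are treated as unrecognized concepts
# (the 1.0-nat threshold is an assumption).
new_concept_words = entropy(calibrated) > 1.0
```

With these numbers the two uncertain words are flagged while the confident ones are not, which mirrors how calibrated uncertainty can surface concepts the NLU model does not recognize.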
-
Patent number: 11720814
Abstract: A recognition method includes retrieving an input including data of a first window size. The method further includes classifying the input based on a comparison of the warping distance of the input with a pruning threshold.
Type: Grant
Filed: December 27, 2018
Date of Patent: August 8, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yilin Shen, Yue Deng, Hongxia Jin
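Warping distance with a pruning threshold is the classic dynamic-time-warping (DTW) setup with early abandoning: if every partial path already exceeds the threshold, the candidate cannot win and is discarded. The sketch below is a generic DTW implementation with that pruning rule, not the patented method; the templates and threshold are made up.

```python
import numpy as np

def dtw_distance(a, b, prune_at=np.inf):
    """DTW distance with early abandoning: returns inf as soon as every
    cell in a row exceeds `prune_at` (the pruning threshold)."""
    n, m = len(a), len(b)
    prev = np.full(m + 1, np.inf)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = np.full(m + 1, np.inf)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        if cur[1:].min() > prune_at:
            return np.inf          # abandoned: cannot beat the threshold
        prev = cur
    return prev[m]

# Classify an input window by its nearest template under DTW.
templates = {"wave": [0.0, 1.0, 2.0, 1.0, 0.0], "ramp": [0.0, 1.0, 2.0, 3.0, 4.0]}
query = [0.0, 1.1, 1.9, 1.0, 0.1]
label = min(templates, key=lambda k: dtw_distance(query, templates[k], prune_at=5.0))
```

Pruning pays off when many templates are compared: most candidates are abandoned after a few rows instead of being evaluated in full.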
-
Patent number: 11721090
Abstract: A recommendation method includes retrieving content consumption data including content consumed and content not consumed. Based on the content consumption data, a first piece of content not consumed is identified. A first feature of the first piece of content related to negative consumption of the first piece of content is determined. A first system is used to revise the first feature to a second feature. A second piece of content including the second feature is provided to an electronic device. The second piece of content is a revised instance of the first piece of content.
Type: Grant
Filed: July 20, 2018
Date of Patent: August 8, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yue Deng, Yilin Shen, Hongxia Jin
-
Publication number: 20230245435
Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, where each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
Type: Application
Filed: January 31, 2022
Publication date: August 3, 2023
Inventors: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin
-
Patent number: 11681923
Abstract: Intent determination based on one or more multi-model structures can include generating an output from each of a plurality of domain-specific models in response to a received input. The domain-specific models can comprise simultaneously trained machine learning models that are trained using a corresponding local loss metric for each domain-specific model and a global loss metric for the plurality of domain-specific models. The presence or absence of an intent corresponding to one or more domain-specific models can be determined by classifying the output of each domain-specific model.
Type: Grant
Filed: December 27, 2019
Date of Patent: June 20, 2023
Assignee: SAMSUNG ELECTRONICS CO., LTD.
Inventors: Yu Wang, Yilin Shen, Yue Deng, Hongxia Jin
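The local-plus-global loss structure above can be shown on toy logits: each domain model gets its own cross-entropy term, and a global term couples them during simultaneous training. Using the mean of the local losses as the global metric is an assumption made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# Outputs of two toy domain-specific models for one input (binary logits for
# intent presence), plus ground-truth intent labels per domain.
logits = {"music": rng.standard_normal(2), "weather": rng.standard_normal(2)}
labels = {"music": 1, "weather": 0}

def cross_entropy(z, y):
    z = z - z.max()                       # numerically stable log-softmax
    logp = z - np.log(np.exp(z).sum())
    return -logp[y]

# Local loss per domain, plus a global loss over all domains (here their
# mean, as a stand-in for the patent's global loss metric).
local_losses = {d: cross_entropy(logits[d], labels[d]) for d in logits}
global_loss = np.mean(list(local_losses.values()))
total_loss = sum(local_losses.values()) + global_loss
```

Training all domain models against `total_loss` simultaneously is what distinguishes this setup from training each domain classifier in isolation.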
-
Publication number: 20230177332
Abstract: A method includes accessing, using at least one processor of an electronic device, a machine learning model. The machine learning model is trained by directing a gradient direction of gradients to one or more flat local minima and using a dynamic learning rate for one or more additional tasks. The method also includes receiving, using the at least one processor, an input from an input source. The method further includes providing, using the at least one processor, the input to the machine learning model. The method also includes receiving, using the at least one processor, an output from the machine learning model. In addition, the method includes instructing, using the at least one processor, at least one action based on the output from the machine learning model.
Type: Application
Filed: December 2, 2022
Publication date: June 8, 2023
Inventors: Sima Behpour, Yilin Shen, Hongxia Jin