Patents by Inventor Yilin Shen
Yilin Shen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12211486
Abstract: A method includes identifying multiple tokens contained in an input utterance. The method also includes generating slot labels for at least some of the tokens contained in the input utterance using a trained machine learning model. The method further includes determining at least one action to be performed in response to the input utterance based on at least one of the slot labels. The trained machine learning model is trained to use attention distributions generated such that (i) the attention distributions associated with tokens having dissimilar slot labels are forced to be different and (ii) the attention distribution associated with each token is forced to not focus primarily on that token itself.
Type: Grant
Filed: January 10, 2022
Date of Patent: January 28, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Avik Ray, Yilin Shen, Hongxia Jin
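The two attention constraints in this abstract can be expressed as regularization terms. The sketch below is a minimal illustration, assuming cosine similarity between attention rows for constraint (i) and the mean self-attention weight for constraint (ii); the patent does not specify the exact penalty forms.

```python
import numpy as np

def attention_regularizer(attn, slot_labels):
    """Illustrative regularizer for the two constraints:
    (i) tokens with dissimilar slot labels get different attention
        distributions, and
    (ii) a token's attention should not focus primarily on itself.

    attn: (n, n) row-stochastic matrix; row i is token i's attention.
    slot_labels: (n,) integer label per token.
    """
    # (i) Penalize cosine similarity between attention rows of tokens
    # that carry different slot labels.
    normed = attn / np.linalg.norm(attn, axis=1, keepdims=True)
    sim = normed @ normed.T
    diff_label = np.not_equal.outer(slot_labels, slot_labels)
    dissim_penalty = sim[diff_label].mean() if diff_label.any() else 0.0

    # (ii) Penalize attention mass each token places on itself.
    self_penalty = np.diag(attn).mean()
    return dissim_penalty + self_penalty
```

Minimizing this term pushes attention off the diagonal while separating the attention patterns of differently labeled tokens.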
-
Patent number: 12210835
Abstract: In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.
Type: Grant
Filed: September 16, 2022
Date of Patent: January 28, 2025
Assignee: Samsung Electronics Co., Ltd.
Inventors: Peixi Xiong, Yilin Shen, Hongxia Jin
-
Publication number: 20250029005
Abstract: A method includes accessing a plurality of weight matrices of a machine learning model. The method also includes, for each weight matrix, decomposing the weight matrix into a U matrix, an S matrix, and a V matrix using singular value decomposition. The S matrix is a diagonal matrix, and a singular group corresponds to each element in the S matrix. The method further includes, for each weight matrix, determining an importance score of each singular group. The importance score of the singular group represents a change in loss if the singular group is removed from the machine learning model. The method also includes, for each weight matrix, ranking the singular groups across the plurality of weight matrices based on the importance scores. In addition, the method includes, for each weight matrix, identifying one or more of the singular groups to prune based on the ranking of the singular groups.
Type: Application
Filed: May 20, 2024
Publication date: January 23, 2025
Inventors: Ting Hua, Xiao Li, Shangqian Gao, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
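The pipeline this abstract describes, decompose each weight matrix via SVD, score each singular group, rank groups globally, and prune the lowest ranked, can be sketched as below. The publication scores each group by the change in loss when it is removed; here the squared singular value serves as a simple magnitude-based stand-in for that loss-aware score, which is an assumption for illustration.

```python
import numpy as np

def prune_singular_groups(weights, keep_ratio=0.5):
    """Globally rank and prune singular groups across weight matrices.

    weights: list of 2-D arrays (one per layer).
    Returns the reconstructed (pruned) weight matrices.
    """
    groups, svds = [], []
    for m, W in enumerate(weights):
        # W = U @ diag(S) @ Vt; each (u_i, s_i, v_i) is a singular group.
        U, S, Vt = np.linalg.svd(W, full_matrices=False)
        svds.append((U, S, Vt))
        for i, s in enumerate(S):
            # Proxy importance score (assumption): squared singular value.
            groups.append((m, i, s ** 2))

    # Rank all singular groups across ALL matrices, then keep the top ones.
    groups.sort(key=lambda g: g[2], reverse=True)
    n_keep = int(len(groups) * keep_ratio)
    keep = {(m, i) for m, i, _ in groups[:n_keep]}

    pruned = []
    for m, (U, S, Vt) in enumerate(svds):
        mask = np.array([(m, i) in keep for i in range(len(S))])
        # Reconstruct using only the retained singular groups.
        pruned.append((U[:, mask] * S[mask]) @ Vt[mask])
    return pruned
```

With `keep_ratio=1.0` every group survives and each matrix is reconstructed exactly; lower ratios trade reconstruction fidelity for fewer parameters.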
-
Publication number: 20250021826
Abstract: In one embodiment, a method includes accessing at least a portion of a training dataset for a trained neural network that includes multiple layers, where each layer includes a number of parameters, and where the training dataset includes multiple training samples that each include an input and a ground-truth output used to train the trained neural network. The method further includes training a hypernetwork to generate a layer-specific compression mask for each of one or more of the multiple layers of the trained neural network. The method further includes generating, by the trained hypernetwork, a final layer-specific compression mask for the trained neural network and compressing the trained neural network by reducing, for each of the one or more layers of the neural network, the number of parameters of that layer according to the final layer-specific compression mask.
Type: Application
Filed: April 8, 2024
Publication date: January 16, 2025
Inventors: Shangqian Gao, Ting Hua, Yen-Chang Hsu, Yilin Shen, Hongxia Jin
-
Patent number: 12183062
Abstract: A method includes obtaining a batch of training data including multiple paired image-text pairs and multiple unpaired image-text pairs, where each paired image-text pair and each unpaired image-text pair includes an image and a text. The method also includes training a machine learning model using the training data based on an optimization of a combination of losses. The losses include, for each paired image-text pair, (i) a first multi-modal representation loss based on the paired image-text pair and (ii) a second multi-modal representation loss based on two or more unpaired image-text pairs, selected from among the multiple unpaired image-text pairs, wherein each of the two or more unpaired image-text pairs includes either the image or the text of the paired image-text pair.
Type: Grant
Filed: January 31, 2022
Date of Patent: December 31, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Changsheng Zhao, Burak Uzkent, Yilin Shen, Hongxia Jin
-
Publication number: 20240414448
Abstract: Provided is a U-shaped network for image restoration. The U-shaped network is lightweight based on a transformer block and is suitable to be deployed on-device, such as in a smartphone. The U-shaped network uses the transformer block to implement encoder, decoder and bottleneck functions. Decoders are connected to respective encoders using skip connections based on element-wise addition, which avoids the dimension expansion of concatenation. The transformer block uses a configuration of scaling and pool mixing to process input image data without the need for self-attention computations, which permits reductions in memory, latency, and computational demand while maintaining good output image quality.
Type: Application
Filed: November 21, 2023
Publication date: December 12, 2024
Applicant: Samsung Electronics Co., Ltd.
Inventors: Karim Ahmed, Yi Wei, Vasili Ramanishka, Yilin Shen, Hongxia Jin
-
Publication number: 20240394592
Abstract: A method includes accessing a training dataset having multiple samples, where each sample includes a data point for each of multiple modalities. The method also includes generating, using a first encoder associated with a first modality of the multiple modalities, first modality embeddings for data points of the first modality in the training dataset. The method further includes, for each first modality embedding, determining a similarity metric to other first modality embeddings. The method also includes generating, using a second encoder associated with a second modality of the multiple modalities, second modality embeddings for data points of the second modality in the training dataset. In addition, the method includes training the second encoder based on a contrastive loss function to align the first modality embeddings and the second modality embeddings from different samples of the training dataset, where the contrastive loss function is weighted using the similarity metrics.
Type: Application
Filed: February 6, 2024
Publication date: November 28, 2024
Inventors: Rakshith Sharma Srinivasa, Jaejin Cho, Chouchang Yang, Yashas Malur Saidutta, Ching-Hua Lee, Yilin Shen, Hongxia Jin
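A similarity-weighted contrastive loss of the kind described above can be sketched as follows. The abstract only states that the loss is weighted using similarity metrics computed among the first-modality embeddings; the specific weighting scheme here (down-weighting negatives whose first-modality anchors are already similar, via `1 - cosine similarity`) is an illustrative assumption.

```python
import numpy as np

def weighted_contrastive_loss(emb_a, emb_b, tau=0.1):
    """Contrastive alignment of modality-B embeddings to modality-A
    embeddings, with negatives weighted by modality-A similarity.

    emb_a, emb_b: (n, d) arrays; row i of each comes from the same sample.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    a, b = normalize(emb_a), normalize(emb_b)
    sim_ab = a @ b.T / tau            # cross-modal logits
    sim_aa = a @ a.T                  # similarity metrics within modality A
    weights = 1.0 - sim_aa            # similar anchors -> weaker negatives
    np.fill_diagonal(weights, 1.0)    # positives keep full weight

    logits = np.exp(sim_ab) * weights
    loss = -np.log(np.diag(logits) / logits.sum(axis=1))
    return loss.mean()
```

Well-aligned embeddings (each `emb_b` row matching its `emb_a` row) should yield a much lower loss than mismatched ones.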
-
Publication number: 20240386575
Abstract: Provided are a method and apparatus for obtaining a foreground image from an input image containing a foreground object in a scene. Embodiments use multi-scale convolutional attention values, one or more hamburger heads and one or more multilayer perceptrons to obtain a segmentation map of the input image. In some embodiments, progressive segmentation is applied to obtain the segmentation map.
Type: Application
Filed: December 1, 2023
Publication date: November 21, 2024
Applicant: Samsung Electronics Co., Ltd.
Inventors: Jing Zhu, Karim Ahmed, Wenbo Li, Yilin Shen, Hongxia Jin
-
Publication number: 20240377829
Abstract: A method includes determining a specified object to locate within a surrounding environment. The method also includes causing a robot to capture an image and a depth map of the surrounding environment. The method further includes predicting, using a scene understanding model, one or more rooms and one or more objects captured in the image. The method also includes updating a second map of the surrounding environment based on the predicted rooms, the predicted objects, the depth map, and a location of the robot. The method further includes determining a likelihood of the specified object being in a candidate room and a likelihood of the specified object being near a candidate object using a pre-trained large language model. The method also includes causing the robot to move to a next location for the robot to search for the specified object, based on the likelihoods and the second map.
Type: Application
Filed: November 3, 2023
Publication date: November 14, 2024
Inventors: Yilin Shen, Kaiwen Zhou, Hongxia Jin
-
Patent number: 12127726
Abstract: A method includes obtaining, using at least one processor of an electronic device, an image-query understanding model. The method also includes obtaining, using the at least one processor, an image and a user query associated with the image, where the image includes a target image area and the user query includes a target phrase. The method further includes retraining, using the at least one processor, the image-query understanding model using a correlation between the target image area and the target phrase to obtain a retrained image-query understanding model.
Type: Grant
Filed: April 15, 2021
Date of Patent: October 29, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Yu Wang, Yilin Shen, Hongxia Jin
-
Publication number: 20240339123
Abstract: A method includes receiving an audio input and generating a noisy time-frequency representation based on the audio input. The method also includes providing the noisy time-frequency representation to a noise management model trained to predict a denoising mask and a signal presence probability (SPP) map indicating a likelihood of a presence of speech. The method further includes determining an enhanced spectrogram using the denoising mask and the noisy time-frequency representation. The method also includes providing the enhanced spectrogram and the SPP map as inputs to a keyword classification model trained to determine a likelihood of a keyword being present in the audio input. In addition, the method includes, responsive to determining that a keyword is in the audio input, transmitting the audio input to a downstream application associated with the keyword.
Type: Application
Filed: September 20, 2023
Publication date: October 10, 2024
Inventors: Chou-Chang Yang, Yashas Malur Saidutta, Rakshith Sharma Srinivasa, Ching-Hua Lee, Yilin Shen, Hongxia Jin
-
Publication number: 20240331715
Abstract: A method includes receiving, during a first time window, a set of noisy audio signals from a plurality of audio input devices. The method also includes generating a noisy time-frequency representation based on the set of noisy audio signals. The method further includes providing the noisy time-frequency representation as an input to a mask estimation model trained to output a mask used to predict a clean time-frequency representation of clean speech audio from the noisy time-frequency representation. The method also includes determining beamforming filter weights based on the mask. The method further includes applying the beamforming filter weights to the noisy time-frequency representation to isolate the clean speech audio from the set of noisy audio signals. In addition, the method includes outputting the clean speech audio.
Type: Application
Filed: August 29, 2023
Publication date: October 3, 2024
Inventors: Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
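The step of turning a predicted mask into beamforming filter weights can be sketched with a standard mask-driven design: build a mask-weighted spatial covariance per frequency and take its principal eigenvector as the filter. This is one common choice (a max-SNR-style beamformer), assumed here for illustration; the publication does not commit to a specific filter design.

```python
import numpy as np

def mask_beamform(noisy_stft, speech_mask):
    """Mask-driven beamforming sketch.

    noisy_stft: (mics, freqs, frames) complex STFT of the mic array.
    speech_mask: (freqs, frames) mask in [0, 1] from the estimation model.
    Returns a single-channel (freqs, frames) enhanced STFT.
    """
    M, F, T = noisy_stft.shape
    out = np.zeros((F, T), dtype=complex)
    for f in range(F):
        Y = noisy_stft[:, f, :]                         # (M, T) snapshots
        m = speech_mask[f]                              # (T,) mask weights
        # Mask-weighted spatial covariance of the speech component.
        R = (Y * m) @ Y.conj().T / max(m.sum(), 1e-8)
        # Principal eigenvector = beamforming filter for this frequency.
        _, vecs = np.linalg.eigh(R)                     # ascending order
        w = vecs[:, -1]
        out[f] = w.conj() @ Y                           # apply the filter
    return out
```

With identical signals on all microphones and an all-ones mask, the filter reduces to a phase-aligned sum, so the output magnitude grows with the square root of the microphone count.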
-
Publication number: 20240311693
Abstract: A method includes obtaining input data associated with a new concept to be learned by a trained machine learning model. The method also includes identifying initial weights of the trained machine learning model and one or more previous weight deltas associated with the trained machine learning model. The method further includes identifying one or more additional weight deltas based on the input data and guided by the initial weights and the one or more previous weight deltas. In addition, the method includes integrating the one or more additional weight deltas into the trained machine learning model. The one or more additional weight deltas are integrated into the trained machine learning model by identifying updated weights for the trained machine learning model based on the initial weights, the one or more previous weight deltas, and the one or more additional weight deltas.
Type: Application
Filed: February 29, 2024
Publication date: September 19, 2024
Inventors: James S. Smith, Yen-Chang Hsu, Yilin Shen, Hongxia Jin, Lingyu Zhang, Ting Hua
-
Publication number: 20240203143
Abstract: A method includes obtaining an image, a set of attribute labels, and a set of object labels and performing prompt tuning of a pre-trained vision-language model having first and second textual encoders and a vision encoder. The model is trained during prompt tuning to select one attribute label and one object label that match content in the image. Performing the prompt tuning includes, for each attribute label-object label pair, generating object textual features associated with the object label using the first textual encoder, generating attribute textual features associated with the attribute label using the second textual encoder, and generating image features associated with the image using the vision encoder. Intermediate outputs from initial layers of the textual encoders and the vision encoder are combined to generate layer-specific learnable prompt tokens that are appended to inputs of specified layers in the first and second textual encoders and the vision encoder.
Type: Application
Filed: August 23, 2023
Publication date: June 20, 2024
Inventors: Lingyu Zhang, Ting Hua, Yilin Shen, Hongxia Jin
-
Publication number: 20240185850
Abstract: A method includes extracting, using a keyword detection model, audio features from audio data. The method also includes processing the audio features by a first layer of the keyword detection model configured to predict a first likelihood that the audio data includes speech. The method also includes processing the audio features by a second layer of the keyword detection model configured to predict a second likelihood that the audio data includes keyword-like speech. The method also includes processing the audio features by a third layer of the keyword detection model configured to predict a third likelihood, for each of a plurality of possible keywords, that the audio data includes the keyword. The method also includes identifying a keyword included in the audio data. The method also includes generating instructions to perform an action based at least in part on the identified keyword.
Type: Application
Filed: July 14, 2023
Publication date: June 6, 2024
Inventors: Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Ching-Hua Lee, Chou-Chang Yang, Yilin Shen, Hongxia Jin
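The three-stage structure above (speech, then keyword-like speech, then per-keyword likelihoods) naturally supports an early-exit cascade. The sketch below assumes an early-exit policy with fixed thresholds, which the abstract does not specify; the three stage functions stand in for the model's layers.

```python
def cascade_keyword_detect(feats, stage1, stage2, stage3,
                           thresholds=(0.5, 0.5)):
    """Three-stage keyword detection cascade (illustrative).

    stage1(feats) -> P(speech)
    stage2(feats) -> P(keyword-like speech)
    stage3(feats) -> dict mapping each candidate keyword to a likelihood
    Returns the best keyword, or None if an early stage rejects the audio.
    """
    if stage1(feats) < thresholds[0]:
        return None                       # no speech: exit cheaply
    if stage2(feats) < thresholds[1]:
        return None                       # speech, but not keyword-like
    keyword_probs = stage3(feats)
    return max(keyword_probs, key=keyword_probs.get)
```

Because most audio frames contain no keyword, the cheap early stages reject them before the per-keyword classifier ever runs.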
-
Publication number: 20240119077
Abstract: A method of performing multimodal tasks using a multimodal model that includes a text encoder and a vision encoder may include obtaining a text feature from a query via the text encoder; obtaining an image feature from one or more input images via the vision encoder; and outputting a response to the query based on similarity between the text feature and the image feature, wherein weight vectors of the text encoder and the vision encoder are pruned and shared according to a sharing vector and a pruning vector that are generated by a hypernetwork, and wherein the hypernetwork and the multimodal model are jointly trained to minimize at least one of a difference between the weight vectors in the text encoder and the vision encoder, a difference between the weight vectors in different layers of the text encoder, and a number of parameters in the multimodal model.
Type: Application
Filed: September 14, 2023
Publication date: April 11, 2024
Applicant: Samsung Electronics Co., Ltd.
Inventors: Shangqian Gao, Burak Uzkent, Yilin Shen, Hongxia Jin
-
Publication number: 20240104309
Abstract: A method includes receiving an input for a large language model (LLM) from a user. The method also includes generating one or more token embeddings based on the input. The method further includes generating one or more prompt embeddings based on the input using a contextual prompt generator (CPG), the one or more prompt embeddings representing new or updated information that is not contained in existing knowledge of the LLM. The method also includes providing the one or more token embeddings and the one or more prompt embeddings to the LLM. In addition, the method includes outputting a prediction based on the one or more token embeddings and the one or more prompt embeddings using the LLM, wherein the prediction reflects the new or updated information represented by the one or more prompt embeddings.
Type: Application
Filed: September 12, 2023
Publication date: March 28, 2024
Inventors: Yen-Chang Hsu, Harshavardhan Kamarthi, Yilin Shen, Hongxia Jin
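The step of providing both embedding sets to the LLM amounts to building one combined input sequence. The sketch below assumes the prompt embeddings are prepended to the token embeddings, and uses a plain callable as a stand-in for the learned contextual prompt generator (CPG); both choices are illustrative assumptions.

```python
import numpy as np

def build_llm_input(token_emb, context, cpg):
    """Combine CPG prompt embeddings with token embeddings for an LLM.

    token_emb: (t, d) token embeddings from the user input.
    cpg: callable mapping the context to a (p, d) array of prompt
         embeddings carrying new or updated information.
    """
    prompt_emb = cpg(context)
    assert prompt_emb.shape[1] == token_emb.shape[1], "dims must match"
    # Prepend prompts so the LLM can attend to the injected knowledge.
    return np.concatenate([prompt_emb, token_emb], axis=0)
```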
-
Publication number: 20240080423
Abstract: A method includes obtaining raw image data, where the raw image data includes data values each having most significant bits and least significant bits. The method also includes providing the raw image data to a trained machine learning model and generating processed image data using the trained machine learning model. The method further includes presenting an image based on the processed image data. The trained machine learning model is trained to modulate a feature map associated with the most significant bits of the data values of the raw image data based on the least significant bits of the data values of the raw image data in order to generate a fusion of the most significant bits and the least significant bits of the data values of the raw image data.
Type: Application
Filed: November 18, 2022
Publication date: March 7, 2024
Inventors: Wenbo Li, Zhipeng Mo, Yi Wei, Burak Uzkent, Qian Lou, Yilin Shen, Hongxia Jin
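The MSB/LSB split that feeds the model's two streams is a simple bit operation. The sketch below shows only that split (the learned modulation itself is a trained model); the 10-bit raw depth and 8/2 split are assumptions for illustration, since the publication does not fix the bit widths.

```python
import numpy as np

def split_msb_lsb(raw, bits=10, msb_bits=8):
    """Split raw sensor values into MSB and LSB streams.

    raw: integer array of raw pixel values with `bits` significant bits.
    Returns (msb, lsb) such that (msb << (bits - msb_bits)) + lsb == raw.
    """
    lsb_bits = bits - msb_bits
    msb = raw >> lsb_bits                 # most significant bits
    lsb = raw & ((1 << lsb_bits) - 1)     # least significant bits
    return msb, lsb
```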
-
Publication number: 20240046946
Abstract: A method includes obtaining, using at least one processing device, noisy speech signals and extracting, using the at least one processing device, acoustic features from the noisy speech signals. The method also includes receiving, using the at least one processing device, a predicted speech mask from a speech mask prediction model based on a first acoustic feature subset and receiving, using the at least one processing device, a predicted noise mask from a noise mask prediction model based on a second acoustic feature subset. The method further includes providing, using the at least one processing device, predicted speech features determined using the predicted speech mask and predicted noise features determined using the predicted noise mask to a filtering mask prediction model. In addition, the method includes generating, using the at least one processing device, a clean speech signal using a predicted filtering mask output by the filtering mask prediction model.
Type: Application
Filed: November 22, 2022
Publication date: February 8, 2024
Inventors: Chou-Chang Yang, Ching-Hua Lee, Rakshith Sharma Srinivasa, Yashas Malur Saidutta, Yilin Shen, Hongxia Jin
-
Patent number: 11875231
Abstract: An electronic device for complex task machine learning includes at least one memory and at least one processor coupled to the at least one memory. The at least one processor is configured to receive an unknown command for performing a task and generate a prompt regarding the unknown command. The at least one processor is also configured to receive one or more instructions in response to the prompt, where each of the one or more instructions provides information on performing at least a portion of the task. The at least one processor is further configured to determine at least one action for each one of the one or more instructions. In addition, the at least one processor is configured to create a complex action for performing the task based on the at least one action for each one of the one or more instructions.
Type: Grant
Filed: October 23, 2019
Date of Patent: January 16, 2024
Assignee: Samsung Electronics Co., Ltd.
Inventors: Avik Ray, Yilin Shen, Hongxia Jin