Patents by Inventor Yingbo Zhou
Yingbo Zhou has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11056099
Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, trained with multi-objective learning criteria on training data comprising speech samples temporally labeled with ground truth transcriptions.
Type: Grant
Filed: September 5, 2019
Date of Patent: July 6, 2021
Assignee: salesforce.com, inc.
Inventors: Yingbo Zhou, Caiming Xiong
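A multi-objective training criterion like the one described above boils down to combining several loss terms into one scalar that is then minimized. A minimal sketch, assuming a simple weighted-sum combination (the loss values and weights below are illustrative, not from the patent):

```python
def multi_objective_loss(losses, weights):
    """Combine several training objectives into a single scalar loss
    as a weighted sum, one common form of multi-objective criterion."""
    if len(losses) != len(weights):
        raise ValueError("one weight is required per objective")
    return sum(w * l for w, l in zip(weights, losses))

# Example: a transcription loss combined with a down-weighted auxiliary loss.
total = multi_objective_loss([2.5, 0.8], [1.0, 0.3])  # 2.5 + 0.24 = 2.74
```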
-
Publication number: 20210089882
Abstract: Systems and methods are provided for a near-zero-cost (NZC) query framework for differentially private deep learning. To protect the privacy of training data during learning, the near-zero-cost query framework transfers knowledge from an ensemble of teacher models, trained on partitions of the data, to a student model. Privacy guarantees may be understood intuitively and expressed rigorously in terms of differential privacy. Other features are also provided.
Type: Application
Filed: October 21, 2019
Publication date: March 25, 2021
Inventors: Lichao Sun, Jia Li, Caiming Xiong, Yingbo Zhou
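The teacher-to-student knowledge transfer above is typically done by having the student learn from a privacy-protected aggregate of the teachers' predictions. A toy sketch of one standard aggregation step, assuming Laplace-noised vote counts (the function and its noise mechanism are illustrative, not the patent's exact NZC query):

```python
import random

def noisy_teacher_vote(teacher_labels, num_classes, noise_scale, rng=None):
    """Aggregate the labels predicted by an ensemble of teacher models
    with a Laplace-noised vote count; the student trains on the noisy
    winner, which is what yields a differential-privacy guarantee."""
    rng = rng or random.Random()
    counts = [0] * num_classes
    for label in teacher_labels:
        counts[label] += 1
    # Laplace(0, b) noise sampled as the difference of two Exp(1/b) draws.
    noisy = [c + rng.expovariate(1.0 / noise_scale)
               - rng.expovariate(1.0 / noise_scale) for c in counts]
    return max(range(num_classes), key=lambda k: noisy[k])
```

With a strong teacher majority, the noisy winner almost surely matches the true majority vote, so utility is preserved while individual teachers' influence is masked.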
-
Patent number: 10958925
Abstract: Systems and methods for dense captioning of a video include a multi-layer encoder stack configured to receive information extracted from a plurality of video frames, a proposal decoder coupled to the encoder stack and configured to receive one or more outputs from the encoder stack, a masking unit configured to mask the one or more outputs from the encoder stack according to one or more outputs from the proposal decoder, and a decoder stack coupled to the masking unit and configured to receive the masked one or more outputs from the encoder stack. The dense captioning is generated based on one or more outputs of the decoder stack. In some embodiments, the one or more outputs from the proposal decoder include a differentiable mask. In some embodiments, during training, error in the dense captioning is back-propagated to the decoder stack, the encoder stack, and the proposal decoder.
Type: Grant
Filed: November 18, 2019
Date of Patent: March 23, 2021
Assignee: salesforce.com, inc.
Inventors: Yingbo Zhou, Luowei Zhou, Caiming Xiong, Richard Socher
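The masking unit above gates the encoder outputs with the proposal decoder's differentiable mask before the caption decoder sees them. A minimal per-position sketch, assuming the mask is applied by elementwise multiplication (this toy function is an assumption for illustration, not the patented mechanism):

```python
def apply_proposal_mask(encoder_outputs, proposal_mask):
    """Elementwise gating of encoder outputs by the proposal decoder's
    (differentiable) mask: a mask value of 0 hides a position, 1 passes
    it through unchanged, and fractional values attenuate it."""
    return [e * m for e, m in zip(encoder_outputs, proposal_mask)]

# Example: hide the first position, keep the second, attenuate the third.
masked = apply_proposal_mask([1.0, 2.0, 3.0], [0.0, 1.0, 0.5])
```

Because multiplication is differentiable, training error can flow back through the mask into the proposal decoder, which is what lets all three components be trained jointly.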
-
Patent number: 10783875
Abstract: A system for domain adaptation includes a domain adaptation model configured to adapt a representation of a signal in a first domain to a second domain to generate an adapted representation, and a plurality of discriminators corresponding to a plurality of bands of values of a domain variable. Each of the plurality of discriminators is configured to discriminate between the adapted representation and representations of one or more other signals in the second domain.
Type: Grant
Filed: July 3, 2018
Date of Patent: September 22, 2020
Assignee: salesforce.com, inc.
Inventors: Ehsan Hosseini-Asl, Caiming Xiong, Yingbo Zhou, Richard Socher
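With one discriminator per band of the domain variable, each adapted sample must be routed to the discriminator whose band it falls into. A minimal sketch of that routing step, assuming ascending band boundaries (the function and its interface are illustrative assumptions):

```python
def discriminator_band(value, band_edges):
    """Pick which discriminator judges a sample: discriminator i covers
    values below band_edges[i], and the final discriminator covers
    everything at or above the last edge. band_edges must be ascending."""
    for i, edge in enumerate(band_edges):
        if value < edge:
            return i
    return len(band_edges)
```

For example, with edges [0.3, 0.7] there are three discriminators covering the domain-variable ranges below 0.3, from 0.3 to 0.7, and above 0.7.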
-
Publication number: 20200104699
Abstract: Embodiments for training a neural network using sequential tasks are provided. A plurality of sequential tasks are received. For each task in the plurality of tasks, a copy of the neural network that includes a plurality of layers is generated. From the copy of the neural network, a task-specific neural network is generated by performing an architectural search on the plurality of layers in the copy of the neural network. The architectural search identifies a plurality of candidate choices in the layers of the task-specific neural network. Parameters in the task-specific neural network that correspond to the plurality of candidate choices and that maximize architectural weights at each layer are identified. The parameters are retrained and merged with the neural network. The neural network trained on the plurality of sequential tasks is a trained neural network.
Type: Application
Filed: October 31, 2018
Publication date: April 2, 2020
Inventors: Yingbo Zhou, Xilai Li, Caiming Xiong
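The selection step above, choosing per layer the candidate that maximizes the architectural weight, can be sketched as a simple argmax over each layer's learned weights (a toy stand-in, assuming weights are stored as a list of lists; not the patent's exact search procedure):

```python
def select_architecture(arch_weights):
    """For each layer, keep the candidate choice whose architectural
    weight is largest. arch_weights[layer][candidate] holds the learned
    weight for that candidate operation at that layer."""
    return [max(range(len(layer)), key=lambda i: layer[i])
            for layer in arch_weights]

# Example: two layers, with 2 and 3 candidate operations respectively.
chosen = select_architecture([[0.1, 0.9], [0.7, 0.2, 0.1]])
```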
-
Publication number: 20200084465
Abstract: Systems and methods for dense captioning of a video include a multi-layer encoder stack configured to receive information extracted from a plurality of video frames, a proposal decoder coupled to the encoder stack and configured to receive one or more outputs from the encoder stack, a masking unit configured to mask the one or more outputs from the encoder stack according to one or more outputs from the proposal decoder, and a decoder stack coupled to the masking unit and configured to receive the masked one or more outputs from the encoder stack. The dense captioning is generated based on one or more outputs of the decoder stack. In some embodiments, the one or more outputs from the proposal decoder include a differentiable mask. In some embodiments, during training, error in the dense captioning is back-propagated to the decoder stack, the encoder stack, and the proposal decoder.
Type: Application
Filed: November 18, 2019
Publication date: March 12, 2020
Inventors: Yingbo Zhou, Luowei Zhou, Caiming Xiong, Richard Socher
-
Patent number: 10573295
Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, trained with multi-objective learning criteria on training data comprising speech samples temporally labeled with ground truth transcriptions.
Type: Grant
Filed: January 23, 2018
Date of Patent: February 25, 2020
Assignee: salesforce.com, inc.
Inventors: Yingbo Zhou, Caiming Xiong
-
Patent number: 10542270
Abstract: Systems and methods for dense captioning of a video include a multi-layer encoder stack configured to receive information extracted from a plurality of video frames, a proposal decoder coupled to the encoder stack and configured to receive one or more outputs from the encoder stack, a masking unit configured to mask the one or more outputs from the encoder stack according to one or more outputs from the proposal decoder, and a decoder stack coupled to the masking unit and configured to receive the masked one or more outputs from the encoder stack. The dense captioning is generated based on one or more outputs of the decoder stack. In some embodiments, the one or more outputs from the proposal decoder include a differentiable mask. In some embodiments, during training, error in the dense captioning is back-propagated to the decoder stack, the encoder stack, and the proposal decoder.
Type: Grant
Filed: January 18, 2018
Date of Patent: January 21, 2020
Assignee: salesforce.com, inc.
Inventors: Yingbo Zhou, Luowei Zhou, Caiming Xiong, Richard Socher
-
Publication number: 20200005765
Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, trained with multi-objective learning criteria on training data comprising speech samples temporally labeled with ground truth transcriptions.
Type: Application
Filed: September 5, 2019
Publication date: January 2, 2020
Inventors: Yingbo Zhou, Caiming Xiong
-
Publication number: 20190295530
Abstract: A system for domain adaptation includes a domain adaptation model configured to adapt a representation of a signal in a first domain to a second domain to generate an adapted representation, and a plurality of discriminators corresponding to a plurality of bands of values of a domain variable. Each of the plurality of discriminators is configured to discriminate between the adapted representation and representations of one or more other signals in the second domain.
Type: Application
Filed: July 3, 2018
Publication date: September 26, 2019
Applicant: salesforce.com, inc.
Inventors: Ehsan Hosseini-Asl, Caiming Xiong, Yingbo Zhou, Richard Socher
-
Publication number: 20190286073
Abstract: A method for training parameters of a first domain adaptation model includes evaluating a cycle consistency objective using a first task-specific model associated with a first domain and a second task-specific model associated with a second domain. The evaluation of the cycle consistency objective is based on one or more first training representations adapted from the first domain to the second domain by the first domain adaptation model and from the second domain to the first domain by a second domain adaptation model, and one or more second training representations adapted from the second domain to the first domain by the second domain adaptation model and from the first domain to the second domain by the first domain adaptation model. The method further includes evaluating a learning objective based on the cycle consistency objective, and updating parameters of the first domain adaptation model based on the learning objective.
Type: Application
Filed: August 3, 2018
Publication date: September 19, 2019
Inventors: Ehsan Hosseini-Asl, Caiming Xiong, Yingbo Zhou, Richard Socher
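The round trips described above (first domain to second and back, and second domain to first and back) can be condensed into a scalar toy objective that penalizes round-trip reconstruction error. A minimal sketch, assuming scalar "representations" and an absolute-error penalty (both simplifying assumptions for illustration):

```python
def cycle_consistency_loss(x, y, a_to_b, b_to_a):
    """Toy cycle-consistency objective: adapt x from domain A to B and
    back, and y from B to A and back, then penalize how far each
    round trip lands from its starting point."""
    return abs(b_to_a(a_to_b(x)) - x) + abs(a_to_b(b_to_a(y)) - y)

# Perfectly inverse adapters incur zero cycle loss.
loss = cycle_consistency_loss(2.0, 5.0, lambda v: v + 1.0, lambda v: v - 1.0)
```

Minimizing this term pushes the two adaptation models toward being inverses of each other, which is what keeps adapted representations faithful to their sources.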
-
Publication number: 20190149834
Abstract: Systems and methods for dense captioning of a video include a multi-layer encoder stack configured to receive information extracted from a plurality of video frames, a proposal decoder coupled to the encoder stack and configured to receive one or more outputs from the encoder stack, a masking unit configured to mask the one or more outputs from the encoder stack according to one or more outputs from the proposal decoder, and a decoder stack coupled to the masking unit and configured to receive the masked one or more outputs from the encoder stack. The dense captioning is generated based on one or more outputs of the decoder stack. In some embodiments, the one or more outputs from the proposal decoder include a differentiable mask. In some embodiments, during training, error in the dense captioning is back-propagated to the decoder stack, the encoder stack, and the proposal decoder.
Type: Application
Filed: January 18, 2018
Publication date: May 16, 2019
Inventors: Yingbo Zhou, Luowei Zhou, Caiming Xiong, Richard Socher
-
Publication number: 20190130897
Abstract: The disclosed technology teaches a deep end-to-end speech recognition model, trained with multi-objective learning criteria on training data comprising speech samples temporally labeled with ground truth transcriptions.
Type: Application
Filed: January 23, 2018
Publication date: May 2, 2019
Applicant: salesforce.com, inc.
Inventors: Yingbo Zhou, Caiming Xiong
-
Publication number: 20190130896
Abstract: The disclosed technology teaches regularizing a deep end-to-end speech recognition model to reduce overfitting and improve generalization: synthesizing sample speech variations on original speech samples labelled with text transcriptions, and modifying a particular original speech sample to independently vary the tempo and pitch of the original speech sample while retaining its labelled text transcription, thereby producing multiple sample speech variations with multiple degrees of variation from the original speech sample. The disclosed technology includes training a deep end-to-end speech recognition model on thousands to millions of original speech samples and their sample speech variations, such that the model outputs recognized text transcriptions corresponding to speech detected in the original speech samples and the sample speech variations.
Type: Application
Filed: December 21, 2017
Publication date: May 2, 2019
Applicant: salesforce.com, inc.
Inventors: Yingbo Zhou, Caiming Xiong, Richard Socher
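Because tempo and pitch are varied independently while the transcription label is kept, the augmentation above amounts to enumerating a grid of (tempo, pitch) settings per sample. A minimal sketch of that enumeration, assuming a simple dict-based sample spec (the field names and function are illustrative, not the patent's data format):

```python
import itertools

def synthesize_variations(sample_id, transcript, tempo_factors, pitch_shifts):
    """Enumerate augmented (sample-spec, label) pairs that independently
    vary tempo and pitch while retaining the labelled text transcription.
    Actual audio resynthesis would happen downstream from these specs."""
    return [({"audio": sample_id, "tempo": t, "pitch": p}, transcript)
            for t, p in itertools.product(tempo_factors, pitch_shifts)]

# Example: 3 tempo factors x 3 pitch shifts -> 9 labelled variations.
variations = synthesize_variations("s1", "hello", [0.9, 1.0, 1.1], [-2, 0, 2])
```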
-
Patent number: 9432671
Abstract: A method, non-transitory computer readable medium, and apparatus for classifying machine printed text and handwritten text in an input are disclosed. For example, the method defines a perspective for an auto-encoder; receives the input for the auto-encoder, a document comprising the machine printed text and the handwritten text; performs an encoding on the input using the auto-encoder to generate a classifier; applies the classifier to the input; and generates an output that separates the machine printed text and the handwritten text in the input based on the classifier, in accordance with the perspective.
Type: Grant
Filed: May 22, 2014
Date of Patent: August 30, 2016
Assignee: Xerox Corporation
Inventors: Michael Robert Campanelli, Safwan R. Wshah, Yingbo Zhou
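One common way an auto-encoder trained from a single "perspective" can act as a classifier is through its reconstruction error: regions resembling the perspective it was trained on (say, machine printed text) reconstruct well, while others do not. A toy decision rule under that assumption (the threshold rule is an illustrative stand-in, not the patented classifier):

```python
def classify_by_reconstruction(reconstruction_error, threshold):
    """Toy decision rule: low auto-encoder reconstruction error maps a
    text region to 'printed', high error to 'handwritten'. A fixed
    threshold stands in for the learned decision boundary."""
    return "printed" if reconstruction_error <= threshold else "handwritten"
```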
-
Publication number: 20150339543
Abstract: A method, non-transitory computer readable medium, and apparatus for classifying machine printed text and handwritten text in an input are disclosed. For example, the method defines a perspective for an auto-encoder; receives the input for the auto-encoder, a document comprising the machine printed text and the handwritten text; performs an encoding on the input using the auto-encoder to generate a classifier; applies the classifier to the input; and generates an output that separates the machine printed text and the handwritten text in the input based on the classifier, in accordance with the perspective.
Type: Application
Filed: May 22, 2014
Publication date: November 26, 2015
Applicant: Xerox Corporation
Inventors: Michael Robert Campanelli, Safwan R. Wshah, Yingbo Zhou
-
Patent number: 8872909
Abstract: A system for extracting finger vein and finger texture images from a finger of a person at the same time, the system including an image capture device configured to capture at least one image of at least one finger in a contactless manner, a feature extraction module configured to extract unique finger vein features and finger texture features from the at least one captured image, and a processing module configured to normalize the at least one captured image and integrate the extracted finger vein features and finger texture features.
Type: Grant
Filed: June 1, 2011
Date of Patent: October 28, 2014
Assignee: The Hong Kong Polytechnic University
Inventors: Ajay Kumar, Yingbo Zhou
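The normalize-and-integrate step above is often realized in biometric systems as score-level fusion: normalize each modality's match scores to a common range, then combine them. A minimal sketch under that assumption (min-max normalization and a weighted sum are illustrative choices, not necessarily the patent's method):

```python
def min_max_normalize(scores):
    """Rescale a list of match scores to [0, 1], a common normalization
    step before fusing scores from different modalities."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(vein_score, texture_score, vein_weight=0.5):
    """Weighted-sum fusion of normalized finger vein and finger texture
    match scores, a simple stand-in for the integration module."""
    return vein_weight * vein_score + (1.0 - vein_weight) * texture_score
```

Combining two independent modalities this way generally improves verification accuracy over either modality alone, which is the motivation for extracting both from one contactless capture.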
-
Publication number: 20110304720Abstract: A system for extracting finger vein and finger texture images from a finger of a person at the same time, the device including an image capture device configured to capture at least one image of at least one finger in a contactless manner, a feature extraction module configured to extract unique finger vein features and finger texture features from the at least one captured image, and a processing module configured to normalize the at least one captured image and integrate the extracted finger vein features and finger texture features.Type: ApplicationFiled: June 1, 2011Publication date: December 15, 2011Applicant: The Hong Kong Polytechnic UniversityInventors: Ajay Kumar, Yingbo Zhou