Patents by Inventor Hamidreza VAEZI JOZE
Hamidreza VAEZI JOZE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230291993
Abstract: Systems and methods are provided for determining faces and bodies of people in an image by adaptively scaling images and by iteratively using a deep neural network for inferencing. A camera captures an image including faces and bodies of people. A face/body determiner determines faces and bodies of people appearing in the image by resizing the image into a predetermined pixel dimension as input to the deep neural network. A region cropper determines a crop region associated with a low level of confidence in detecting faces and bodies that are too small to determine with an acceptable level of confidence. The region cropper resizes the crop region into the predetermined pixel dimension as input to the deep neural network. The face and body determiner determines other faces and bodies appearing in the resized crop region. An aggregator aggregates locations of the determined faces and bodies in the image.
Type: Application
Filed: May 19, 2023
Publication date: September 14, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Hamidreza VAEZI JOZE, Zehua WEI
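The two-pass flow described in this abstract (detect on the whole image, crop the low-confidence region, detect again, aggregate) can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `detect_adaptive`, `make_toy_detector`, and the 0.5 threshold are hypothetical, and the toy detector is a stand-in for the deep neural network: it is assumed to resize whatever region it receives to the fixed input size, so its confidence depends only on how large an object is relative to that region.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]   # (x0, y0, x1, y1)
Detection = Tuple[Box, float]             # (box, confidence)
# Hypothetical detector interface: receives a region of the full image and
# returns detections in region-local coordinates.
Detector = Callable[[Box], List[Detection]]


def union_box(boxes: List[Box]) -> Box:
    """Smallest box that covers every box in the list."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def to_full_image(box: Box, region: Box) -> Box:
    """Translate a region-local box back into full-image coordinates."""
    ox, oy = region[0], region[1]
    return (box[0] + ox, box[1] + oy, box[2] + ox, box[3] + oy)


def detect_adaptive(full: Box, detector: Detector,
                    threshold: float = 0.5) -> List[Detection]:
    """Two-pass detection: whole image first, then the low-confidence crop."""
    first = [(to_full_image(b, full), c) for b, c in detector(full)]
    confident = [d for d in first if d[1] >= threshold]
    uncertain = [d[0] for d in first if d[1] < threshold]
    if not uncertain:
        return confident
    # Crop region: here simply the union of all low-confidence boxes (an assumption).
    crop = union_box(uncertain)
    second = [(to_full_image(b, crop), c) for b, c in detector(crop)]
    # Aggregate both passes; a real system would also de-duplicate overlaps.
    return confident + [d for d in second if d[1] >= threshold]


def make_toy_detector(objects: List[Box]) -> Detector:
    """Toy stand-in for the network: confidence grows with the share of the
    region's width that an object occupies."""
    def detector(region: Box) -> List[Detection]:
        width = region[2] - region[0]
        found = []
        for (x0, y0, x1, y1) in objects:
            inside = (x0 >= region[0] and y0 >= region[1]
                      and x1 <= region[2] and y1 <= region[3])
            if inside:
                conf = min(1.0, 5.0 * (x1 - x0) / width)
                local = (x0 - region[0], y0 - region[1],
                         x1 - region[0], y1 - region[1])
                found.append((local, conf))
        return found
    return detector


if __name__ == "__main__":
    people = [(100, 100, 400, 400), (800, 800, 820, 820), (850, 850, 880, 880)]
    for box, conf in detect_adaptive((0, 0, 1000, 1000), make_toy_detector(people)):
        print(box, round(conf, 2))
```

In the toy run, the two small boxes fall below the threshold on the first pass and are only accepted after the detector is re-run on the crop that encloses them.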
-
Patent number: 11700445
Abstract: Systems and methods are provided for determining faces and bodies of people in an image by adaptively scaling images and by iteratively using a deep neural network for inferencing. A camera captures an image including faces and bodies of people. A face/body determiner determines faces and bodies of people appearing in the image by resizing the image into a predetermined pixel dimension as input to the deep neural network. A region cropper determines a crop region associated with a low level of confidence in detecting faces and bodies that are too small to determine with an acceptable level of confidence. The region cropper resizes the crop region into the predetermined pixel dimension as input to the deep neural network. The face and body determiner determines other faces and bodies appearing in the resized crop region. An aggregator aggregates locations of the determined faces and bodies in the image.
Type: Grant
Filed: October 28, 2021
Date of Patent: July 11, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hamidreza Vaezi Joze, Zehua Wei
-
Publication number: 20230153379
Abstract: A transformer is described herein for using transformer-based technology to process data items (e.g., image items). The transformer increases the efficiency of the transformer-based technology by using a modified attention component. In operation, the modified attention component accepts embedding vectors that represent a plurality of item tokens, together with a classification token. A first stage of the modified attention component generates original attention information based on the embedding vectors. A second stage generates score information based on a portion of the original attention information that pertains to the classification token. A third stage produces modified attention information by removing attention values from the original attention information, as guided by a sampling operation that is performed on the score information. The second and third stages do not rely on machine-trained values, which expedites the deployment of these functions in existing transformers.
Type: Application
Filed: November 14, 2021
Publication date: May 18, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mohsen FAYYAZ, Soroush ABBASI KOOHPAYEGANI, Eric Chris Wolfgang SOMMERLADE, Hamidreza VAEZI JOZE
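The three stages named in this abstract can be sketched in plain Python as follows. This is an illustration under assumptions, not the method claimed in the application: the abstract does not say how scores are normalized, which sampling operation is used, or which attention values are removed. Here the scores are the classification token's attention over the item tokens, sampling is weighted random choice with replacement, and removal drops whole rows of unsampled tokens. All function names are hypothetical. In line with the abstract, stages 2 and 3 use no trained parameters.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix) -> Matrix:
    """Stage 1: scaled dot-product attention weights (row i attends to all tokens)."""
    scale = math.sqrt(len(keys[0]))
    return [softmax([sum(q * k for q, k in zip(qv, kv)) / scale for kv in keys])
            for qv in queries]


def cls_scores(attn: Matrix) -> List[float]:
    """Stage 2: score each item token by the classification token's attention
    to it. The classification token is assumed to sit at index 0."""
    row = attn[0][1:]
    total = sum(row)
    return [v / total for v in row]


def sample_tokens(scores: List[float], n_samples: int,
                  rng: random.Random) -> List[int]:
    """Stage 3a: draw item-token indices in proportion to their scores.
    Duplicates collapse, so fewer than n_samples tokens may survive."""
    picks = rng.choices(range(len(scores)), weights=scores, k=n_samples)
    return sorted(set(picks))


def prune_attention(attn: Matrix, kept_items: List[int]) -> Matrix:
    """Stage 3b: keep the classification row plus the rows of sampled tokens."""
    rows = [0] + [i + 1 for i in kept_items]
    return [attn[r] for r in rows]


if __name__ == "__main__":
    rng = random.Random(0)
    # One classification token followed by five item tokens, 4-dimensional each.
    embeddings = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
    attn = attention(embeddings, embeddings)
    kept = sample_tokens(cls_scores(attn), n_samples=3, rng=rng)
    print("kept item tokens:", kept)
    print("rows after pruning:", len(prune_attention(attn, kept)))
```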
-
Publication number: 20230133854
Abstract: Systems and methods are provided for determining faces and bodies of people in an image by adaptively scaling images and by iteratively using a deep neural network for inferencing. A camera captures an image including faces and bodies of people. A face/body determiner determines faces and bodies of people appearing in the image by resizing the image into a predetermined pixel dimension as input to the deep neural network. A region cropper determines a crop region associated with a low level of confidence in detecting faces and bodies that are too small to determine with an acceptable level of confidence. The region cropper resizes the crop region into the predetermined pixel dimension as input to the deep neural network. The face and body determiner determines other faces and bodies appearing in the resized crop region. An aggregator aggregates locations of the determined faces and bodies in the image.
Type: Application
Filed: October 28, 2021
Publication date: May 4, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Hamidreza VAEZI JOZE, Zehua WEI
-
Publication number: 20220405521
Abstract: An image processor receives first image data representing an image. The first image data comprising a plurality of color values corresponding to a plurality of pixels in the image. The image processor determines, using a trained machine learning model, second image data based on the first image data. The second image data comprises surface spectral reflection values corresponding to the plurality of pixels in the image, where the surface spectral reflection values are distributed across a plurality of wavelengths of visible light in the image. The image processor then performs at least one image processing operation with respect to the image using the second image data.
Type: Application
Filed: June 21, 2021
Publication date: December 22, 2022
Applicant: Microsoft Technology Licensing, LLC
Inventor: Hamidreza Vaezi JOZE
-
Patent number: 11526972
Abstract: Described herein are technologies related to correcting image degradations in images. An image is received, and values for features that are usable to correct for image degradation associated with blur, noise, and low light conditions are generated by separate encoders based upon the received image. A fusion network learned by way of network architecture search fuses these values, and the fused values are employed to generate an improved image, such that the image degradations associated with blur, noise, and low light conditions are simultaneously corrected.
Type: Grant
Filed: February 1, 2021
Date of Patent: December 13, 2022
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Hamidreza Vaezi Joze, Rajeev Yasarla
-
Publication number: 20220358332
Abstract: A method of training a neural network for detecting target features in images is described. The neural network is trained using a first data set that includes labeled images, where at least some of the labeled images having subjects with labeled features, including: dividing each of the labeled images of the first data set into a respective plurality of tiles, and generating, for each of the plurality of tiles, a plurality of feature anchors that indicate target features within the corresponding tile. Target features that correspond to the plurality of feature anchors are detected in a second data set of unlabeled images. Images of the second data set having target features that were not detected are labeled. A third data set that includes the first data set and the labeled images of the second data set is generated. The neural network is trained using the third data set.
Type: Application
Filed: May 7, 2021
Publication date: November 10, 2022
Applicant: Microsoft Technology Licensing, LLC
Inventors: Hamidreza Vaezi JOZE, Vivek PRADEEP, Karthik VIJAYAN
-
Publication number: 20220245776
Abstract: Described herein are technologies related to correcting image degradations in images. An image is received, and values for features that are usable to correct for image degradation associated with blur, noise, and low light conditions are generated by separate encoders based upon the received image. A fusion network learned by way of network architecture search fuses these values, and the fused values are employed to generate an improved image, such that the image degradations associated with blur, noise, and low light conditions are simultaneously corrected.
Type: Application
Filed: February 1, 2021
Publication date: August 4, 2022
Inventors: Hamidreza VAEZI JOZE, Rajeev YASARLA
-
Patent number: 11354934
Abstract: Electronic fingerprint readers are often used for security such as log-in authentication for the identification of a user for selective access to a computing system. As computing devices shrink in overall size and with downward pressure on device pricing, smaller and less expensive fingerprint readers are increasingly desired. While whole fingerprint readers have the greatest accuracy in user identification, the whole fingerprint is often not required for user identification. Often, only a portion of the user's fingerprint is required to adequately identify the user and thus a small segment fingerprint reader may be sufficient for user authentication. However, the smaller the sensing area of the small-segment fingerprint reader, the more likely that the fingerprint reader misidentifies the user or fails to collect sufficient information to identify the user. Systems and methods for improving identification accuracy of small-segment fingerprint readers are disclosed in detail herein.
Type: Grant
Filed: May 3, 2018
Date of Patent: June 7, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventor: Hamidreza Vaezi Joze
-
Patent number: 10931976
Abstract: In an embodiment described herein, a method for face-speech bridging by cycle video/audio reconstruction is described. The method comprises encoding audio data and video data via a mutual autoencoders that comprise an audio autoencoder and a video autoencoder, wherein the mutual autoencoders share a common space with corresponding embeddings derived by each of the audio autoencoder and the video autoencoder. Additionally, the method comprises substituting embeddings from a non-corrupted modality for corresponding corrupted embeddings in a corrupted modality in real-time based at least in part on corrupted audio data or corrupted video data. The method also comprises synthesizing reconstructed audio data and reconstructed video data based on, at least in part, the substituted embeddings.
Type: Grant
Filed: October 14, 2019
Date of Patent: February 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hamidreza Vaezi Joze, Hassan Akbari
-
Patent number: 10929676
Abstract: Implementations described herein discloses a multi-modality video recognition system. Specifically, the multi-modality video recognition system is configured to train a plurality of classifier networks, each of the classifier network trained with a different one of the plurality of video streams, wherein each of the plurality of different classifier networks includes multiple intermediate layers, determine correlation matrices of related intermediate layers of each of the plurality of the different classifier networks, and align the correlation matrices of the related intermediate layers of each of the plurality of the different classifier networks.
Type: Grant
Filed: February 27, 2019
Date of Patent: February 23, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hamidreza Vaezi Joze, Mahdi Abavisani
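As an illustration of what "determining and aligning correlation matrices" of intermediate layers could look like, the sketch below computes a channel-by-channel Pearson correlation matrix for the activations of two networks and measures their mismatch. The abstract does not specify the correlation measure or the alignment objective. The Pearson form, the squared Frobenius distance, and the names `correlation_matrix` and `alignment_loss` are assumptions made for this example, as is the requirement that both layers have the same number of channels.

```python
import math
from typing import List

Matrix = List[List[float]]


def correlation_matrix(features: Matrix) -> Matrix:
    """Pearson correlation between feature channels.

    `features` is samples x channels (one row per input clip). Assumes no
    channel is constant, otherwise its norm would be zero.
    """
    n = len(features)
    d = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    norms = [math.sqrt(sum(r[j] ** 2 for r in centered)) for j in range(d)]
    return [[sum(r[i] * r[j] for r in centered) / (norms[i] * norms[j])
             for j in range(d)]
            for i in range(d)]


def alignment_loss(corr_a: Matrix, corr_b: Matrix) -> float:
    """Squared Frobenius distance between two correlation matrices.

    Minimizing this during training would pull the two networks' layer
    statistics together (an assumed objective, for illustration only).
    """
    return sum((x - y) ** 2
               for row_a, row_b in zip(corr_a, corr_b)
               for x, y in zip(row_a, row_b))


if __name__ == "__main__":
    # Hypothetical activations of one intermediate layer in two networks,
    # each trained on a different video stream (4 samples x 2 channels).
    feats_a = [[1.0, 2.0], [2.0, 4.1], [3.0, 5.9], [4.0, 8.2]]
    feats_b = [[10.0, 1.0], [20.0, 2.2], [30.0, 2.9], [40.0, 4.1]]
    loss = alignment_loss(correlation_matrix(feats_a), correlation_matrix(feats_b))
    print("alignment loss:", loss)
```

Comparing correlation matrices rather than raw activations means the two networks do not need matching feature scales, only matching channel counts at the layers being compared.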
-
Patent number: 10679044
Abstract: Methods, apparatuses, and computer-readable mediums for generating human action data sets are disclosed by the present disclosure. In an aspect, an apparatus may receive a set of reference images, where each of the images within the set of reference images includes a person, and a background image. The apparatus may identify body parts of the person from the set of reference image and generate a transformed skeleton image by mapping each of the body parts of the person to corresponding skeleton parts of a target skeleton. The apparatus may generate a mask of the transformed skeleton image. The apparatus may generate, using machine learning, a frame of the person formed according to the target skeleton within the background image.
Type: Grant
Filed: March 23, 2018
Date of Patent: June 9, 2020
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Hamidreza Vaezi Joze, Ilya Zharkov, Vivek Pradeep, Mehran Khodabandeh
-
Publication number: 20200143169
Abstract: Implementations described herein discloses a multi-modality video recognition system. Specifically, the multi-modality video recognition system is configured to train a plurality of classifier networks, each of the classifier network trained with a different one of the plurality of video streams, wherein each of the plurality of different classifier networks includes multiple intermediate layers, determine correlation matrices of related intermediate layers of each of the plurality of the different classifier networks, and align the correlation matrices of the related intermediate layers of each of the plurality of the different classifier networks.
Type: Application
Filed: February 27, 2019
Publication date: May 7, 2020
Inventors: Hamidreza VAEZI JOZE, Mahdi ABAVISANI
-
Publication number: 20190340414
Abstract: Electronic fingerprint readers are often used for security such as log-in authentication for the identification of a user for selective access to a computing system. As computing devices shrink in overall size and with downward pressure on device pricing, smaller and less expensive fingerprint readers are increasingly desired. While whole fingerprint readers have the greatest accuracy in user identification, the whole fingerprint is often not required for user identification. Often, only a portion of the user's fingerprint is required to adequately identify the user and thus a small segment fingerprint reader may be sufficient for user authentication. However, the smaller the sensing area of the small-segment fingerprint reader, the more likely that the fingerprint reader misidentifies the user or fails to collect sufficient information to identify the user. Systems and methods for improving identification accuracy of small-segment fingerprint readers are disclosed in detail herein.
Type: Application
Filed: May 3, 2018
Publication date: November 7, 2019
Inventor: Hamidreza VAEZI JOZE
-
Publication number: 20190294871
Abstract: Methods, apparatuses, and computer-readable mediums for generating human action data sets are disclosed by the present disclosure. In an aspect, an apparatus may receive a set of reference images, where each of the images within the set of reference images includes a person, and a background image. The apparatus may identify body parts of the person from the set of reference image and generate a transformed skeleton image by mapping each of the body parts of the person to corresponding skeleton parts of a target skeleton. The apparatus may generate a mask of the transformed skeleton image. The apparatus may generate, using machine learning, a frame of the person formed according to the target skeleton within the background image.
Type: Application
Filed: March 23, 2018
Publication date: September 26, 2019
Inventors: Hamidreza VAEZI JOZE, Ilya ZHARKOV, Vivek PRADEEP, Mehran KHODABANDEH