Patents by Inventor Udi Barzelay
Udi Barzelay has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12045717
Abstract: A system and method for generating hard training data from easy training data. Training data including visual data with synthetic semantic implants ("VSSI") having at least one cue is received. An annotator identifies at least one cue in the VSSI and annotates the VSSI to indicate the cue to create a modified training data set. A data scrambler removes at least one cue from the VSSI to create the tagged training data, which can then be used to train a classifier to identify transitions between segments when the cues are not present.
Type: Grant
Filed: December 9, 2020
Date of Patent: July 23, 2024
Assignee: International Business Machines Corporation
Inventors: Daniel Nechemia Rotman, Yevgeny Yaroker, Udi Barzelay
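The cue-removal idea above can be sketched in miniature. This is an illustrative toy, not the patented method: each training sample is a frame sequence with annotated cue positions, and a "scrambler" deletes the cue frames while preserving the segment-transition labels they marked, yielding harder examples.

```python
# Hypothetical sketch: delete annotated cue frames from a sample while
# keeping the transition labels they indicated, producing a "hard" sample.

def make_hard_sample(frames, cue_indices):
    """Remove cue frames; return remaining frames and shifted transition labels."""
    cue_set = set(cue_indices)
    hard_frames = [f for i, f in enumerate(frames) if i not in cue_set]
    # Each transition label shifts left by the number of cues removed before it.
    labels = []
    for i in sorted(cue_set):
        removed_before = sum(1 for c in cue_set if c < i)
        labels.append(i - removed_before)
    return hard_frames, labels

frames = ["a", "a", "CUE", "b", "b", "CUE", "c"]
hard, transitions = make_hard_sample(frames, cue_indices=[2, 5])
# hard has no cue frames; transitions still mark where segments change
```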
-
Patent number: 11948382
Abstract: A method for synthesizing negative training data associated with training models to detect text within documents and images. The method includes one or more computer processors receiving a set of dictates associated with generating one or more negative training datasets for training a set of models to classify a plurality of features found within a data source. The method further includes identifying a set of rules related to generating negative training data to detect text based on the received set of dictates. The method further includes compiling one or more arrays of elements of hard-negative training data into a negative training data dataset based on the identified set of rules and one or more dictates. The method further includes determining metadata corresponding to an array of elements of hard-negative training data.
Type: Grant
Filed: December 18, 2020
Date of Patent: April 2, 2024
Assignee: International Business Machines Corporation
Inventors: Ophir Azulai, Udi Barzelay
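A minimal sketch of the rule-driven generation step, under loose assumptions (the rule schema and element form here are invented for illustration, not taken from the patent): a set of rules drives a generator that compiles an array of hard-negative elements — symbol runs that superficially resemble text — together with metadata describing the batch.

```python
import random

# Illustrative sketch: rules drive a generator that compiles an array of
# hard-negative "text-like" elements plus metadata about how they were made.

def synthesize_hard_negatives(rules, n, seed=0):
    rng = random.Random(seed)
    alphabet = rules["alphabet"]          # glyph stand-ins that resemble text
    lo, hi = rules["length_range"]
    elements = [
        "".join(rng.choice(alphabet) for _ in range(rng.randint(lo, hi)))
        for _ in range(n)
    ]
    metadata = {"count": n, "rules": rules}
    return elements, metadata

rules = {"alphabet": "|/\\-_=~", "length_range": (4, 9)}
negatives, meta = synthesize_hard_negatives(rules, n=5)
```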
-
Patent number: 11842278
Abstract: An example system includes a processor to receive an image containing an object to be detected. The processor is to detect the object in the image via a binary object detector trained via a self-supervised training on raw and unlabeled videos.
Type: Grant
Filed: January 26, 2023
Date of Patent: December 12, 2023
Assignee: International Business Machines Corporation
Inventors: Elad Amrani, Tal Hakim, Rami Ben-Ari, Udi Barzelay
-
Publication number: 20230343124
Abstract: Described are techniques for font attribute detection. The techniques include receiving a document having different font attributes amongst a plurality of words respectively comprised of at least one character. The techniques further include generating a dense image document from the document by setting the plurality of words to a predefined size, removing blank spaces from the document, and altering an order of characters relative to the document. The techniques further include determining characteristics of the characters in the dense image document and aggregating the characteristics for at least one word. The techniques further include annotating the at least one word with a font attribute based on the aggregated characteristics.
Type: Application
Filed: April 26, 2022
Publication date: October 26, 2023
Inventors: Ophir Azulai, Daniel Nechemia Rotman, Udi Barzelay
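The per-word aggregation step lends itself to a small sketch. This illustrates only the final pooling (the interface is hypothetical; the character-level attribute predictions would come from a model): per-character labels are reduced to a single word-level annotation by majority vote.

```python
from collections import Counter

# Hypothetical aggregation step: pool per-character font-attribute labels
# into one per-word annotation by majority vote.

def aggregate_word_attributes(char_predictions):
    """char_predictions: list of attribute labels, one per character."""
    counts = Counter(char_predictions)
    return counts.most_common(1)[0][0]

word_attr = aggregate_word_attributes(["bold", "bold", "italic", "bold"])
```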
-
Patent number: 11776287
Abstract: An approach to identifying text within an image may be presented. The approach can receive an image. The approach can classify an image on a pixel-by-pixel basis as to whether each pixel is text. The approach can generate bounding boxes around groups of pixels that are classified as text. The approach can mask sections of an image where pixels are not classified as text. The approach may be used as a pre-processing technique for optical character recognition in documents, scanned images, or still images.
Type: Grant
Filed: April 27, 2021
Date of Patent: October 3, 2023
Assignee: International Business Machines Corporation
Inventors: Udi Barzelay, Ophir Azulai, Inbar Shapira
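The post-classification steps — grouping text pixels and boxing each group — can be sketched in pure Python. This is a simplified stand-in (a real pipeline would get the per-pixel mask from a segmentation model, and non-text regions would then be masked out):

```python
# Sketch: given a binary per-pixel text mask, flood-fill connected
# components of text pixels and return one bounding box per component.

def text_pixel_boxes(mask):
    """mask: 2D list of 0/1 text classifications. Returns (x0, y0, x1, y1) boxes."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not seen[y][x]:
                stack, y0, y1, x0, x1 = [(y, x)], y, y, x, x
                seen[y][x] = True
                while stack:  # flood fill one connected component
                    cy, cx = stack.pop()
                    y0, y1 = min(y0, cy), max(y1, cy)
                    x0, x1 = min(x0, cx), max(x1, cx)
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

mask = [[0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 1]]
boxes = text_pixel_boxes(mask)
```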
-
Patent number: 11741732
Abstract: In some examples, a system for detecting text in an image includes a memory device to store a text detection model trained using images of up-scaled text, and a processor configured to perform text detection on an image to generate original bounding boxes that identify potential text in the image. The processor is also configured to generate a secondary image that includes up-scaled portions of the image associated with bounding boxes below a threshold size, and perform text detection on the secondary image to generate secondary bounding boxes that identify potential text in the secondary image. The processor is also configured to compare the original bounding boxes with the secondary bounding boxes to identify original bounding boxes that are false positives, and generate an image file that includes the original bounding boxes, wherein those original bounding boxes that are identified as false positives are removed.
Type: Grant
Filed: December 22, 2021
Date of Patent: August 29, 2023
Assignee: International Business Machines Corporation
Inventors: Ophir Azulai, Udi Barzelay, Oshri Pesah Naparstek
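The comparison step can be sketched with a standard IoU match. This is a simplified assumption about how "compare" works (the abstract does not specify the matching rule): an original box with no sufficiently overlapping counterpart among the secondary-pass boxes is treated as a false positive and dropped.

```python
# Sketch: drop original detections that are not re-confirmed by the
# secondary (up-scaled) detection pass, matching boxes by IoU overlap.

def iou(a, b):
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    ix = max(0, min(ax1, bx1) - max(ax0, bx0))
    iy = max(0, min(ay1, by1) - max(ay0, by0))
    inter = ix * iy
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union else 0.0

def drop_false_positives(original, secondary, threshold=0.5):
    return [b for b in original
            if any(iou(b, s) >= threshold for s in secondary)]

orig_boxes = [(0, 0, 10, 10), (50, 50, 60, 60)]
secondary_boxes = [(1, 1, 10, 10)]          # only the first box reappears
kept = drop_false_positives(orig_boxes, secondary_boxes)
```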
-
Publication number: 20230245481
Abstract: A method, computer system, and a computer program product for text detection is provided. The present invention may include training a text detection model. The present invention may include performing text detection on an inputted image using the trained text detection model. The present invention may include determining whether at least one of a plurality of bounding boxes generated using the inputted image has an aspect ratio above a threshold. The present invention may include, based upon determining that at least one of the plurality of bounding boxes generated using the inputted image has the aspect ratio above the threshold, upscaling any text within the at least one bounding box and performing text detection on a new image using the trained text detection model. The present invention may include outputting an output image.
Type: Application
Filed: January 31, 2022
Publication date: August 3, 2023
Inventors: Ophir Azulai, Udi Barzelay, Oshri Pesah Naparstek
-
Publication number: 20230196807
Abstract: In some examples, a system for detecting text in an image includes a memory device to store a text detection model trained using images of up-scaled text, and a processor configured to perform text detection on an image to generate original bounding boxes that identify potential text in the image. The processor is also configured to generate a secondary image that includes up-scaled portions of the image associated with bounding boxes below a threshold size, and perform text detection on the secondary image to generate secondary bounding boxes that identify potential text in the secondary image. The processor is also configured to compare the original bounding boxes with the secondary bounding boxes to identify original bounding boxes that are false positives, and generate an image file that includes the original bounding boxes, wherein those original bounding boxes that are identified as false positives are removed.
Type: Application
Filed: December 22, 2021
Publication date: June 22, 2023
Inventors: Ophir Azulai, Udi Barzelay, Oshri Pesah Naparstek
-
Publication number: 20230169344
Abstract: An example system includes a processor to receive an image containing an object to be detected. The processor is to detect the object in the image via a binary object detector trained via a self-supervised training on raw and unlabeled videos.
Type: Application
Filed: January 26, 2023
Publication date: June 1, 2023
Inventors: Elad Amrani, Tal Hakim, Rami Ben-Ari, Udi Barzelay
-
Patent number: 11636385
Abstract: An example system includes a processor to receive raw and unlabeled videos. The processor is to extract speech from the raw and unlabeled videos. The processor is to extract positive frames and negative frames from the raw and unlabeled videos based on the extracted speech for each object to be detected. The processor is to extract region proposals from the positive frames and negative frames. The processor is to extract features based on the extracted region proposals. The processor is to cluster the region proposals and assign a potential score to each cluster. The processor is to train a binary object detector to detect objects based on positive samples randomly selected based on the potential score.
Type: Grant
Filed: November 4, 2019
Date of Patent: April 25, 2023
Assignee: International Business Machines Corporation
Inventors: Elad Amrani, Udi Barzelay, Rami Ben-Ari, Tal Hakim
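The speech-based frame labeling at the start of this pipeline can be sketched as follows. This is a toy under simplifying assumptions (perfect transcripts, exact word match): frames whose time-aligned speech mentions the target object word become positive candidates; the rest become negatives.

```python
# Toy sketch of the frame-labeling step: split frames into positive and
# negative candidates by whether the aligned speech mentions the object word.

def split_frames_by_speech(frames, transcripts, object_word):
    positives, negatives = [], []
    for frame, text in zip(frames, transcripts):
        if object_word in text.lower().split():
            positives.append(frame)
        else:
            negatives.append(frame)
    return positives, negatives

frames = ["f0", "f1", "f2"]
speech = ["look at the dog", "a quiet street", "the dog runs"]
pos, neg = split_frames_by_speech(frames, speech, "dog")
```

The remaining stages (region proposals, features, clustering, potential scores) would operate on these candidate sets.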
-
Publication number: 20220343103
Abstract: An approach to identifying text within an image may be presented. The approach can receive an image. The approach can classify an image on a pixel-by-pixel basis as to whether each pixel is text. The approach can generate bounding boxes around groups of pixels that are classified as text. The approach can mask sections of an image where pixels are not classified as text. The approach may be used as a pre-processing technique for optical character recognition in documents, scanned images, or still images.
Type: Application
Filed: April 27, 2021
Publication date: October 27, 2022
Inventors: Udi Barzelay, Ophir Azulai, Inbar Shapira
-
Publication number: 20220318555
Abstract: Approaches presented herein enable action recognition. More specifically, a plurality of video segments having one or more action representations is received. One or more sub-action representations in the plurality of video segments are learned. An embedding in a space of a distance metric learning (DML) network for each of the one or more sub-action representations is determined. Based on the embedding, a set of respective trajectory distances between each of the one or more sub-action representations and one or more class representatives in the space of the DML network is computed, and the one or more action representations are classified based on the set of respective trajectory distances.
Type: Application
Filed: March 31, 2021
Publication date: October 6, 2022
Inventors: Rami Ben-Ari, Ophir Azulai, Udi Barzelay, Mor Shpigel Nacson
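The classification rule can be illustrated with a deliberately simplified sketch. The specifics here are assumptions (a single representative vector per class, Euclidean distance, summation along the trajectory — the abstract does not pin these down): the class whose representative is closest, in total, to the sub-action embeddings wins.

```python
import math

# Illustrative sketch: classify an action by summing embedding-space
# distances from each sub-action embedding to a per-class representative,
# then picking the class with the smallest total distance.

def classify_by_trajectory(sub_action_embeddings, class_representatives):
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    best_class, best_total = None, float("inf")
    for label, rep in class_representatives.items():
        total = sum(dist(e, rep) for e in sub_action_embeddings)
        if total < best_total:
            best_class, best_total = label, total
    return best_class

trajectory = [(0.9, 0.1), (1.1, -0.1), (1.0, 0.0)]
reps = {"jump": (1.0, 0.0), "run": (-1.0, 0.0)}
label = classify_by_trajectory(trajectory, reps)
```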
-
Patent number: 11450111
Abstract: A video scene detection machine learning model is provided. A computing device receives feature vectors corresponding to audio and video components of a video. The computing device provides the feature vectors as input to a trained neural network. The computing device receives, from the trained neural network, a plurality of output feature vectors that correspond to shots of the video. The computing device applies optimal sequence grouping to the output feature vectors. The computing device further trains the trained neural network based, at least in part, on the applied optimal sequence grouping.
Type: Grant
Filed: August 27, 2020
Date of Patent: September 20, 2022
Assignee: International Business Machines Corporation
Inventors: Daniel Nechemia Rotman, Rami Ben-Ari, Udi Barzelay
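Optimal sequence grouping can be sketched as a small dynamic program, assuming (as is common for this term) that it means splitting an ordered shot sequence into k contiguous groups that minimize total within-group variance; the 1-D features here are stand-ins for the network's shot vectors.

```python
# Sketch of optimal sequence grouping as a dynamic program over 1-D
# stand-in shot features: find k-1 scene boundaries minimizing the total
# within-group variance of contiguous groups.

def group_cost(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def optimal_sequence_grouping(values, k):
    n = len(values)
    INF = float("inf")
    # best[j][i] = min cost of splitting values[:i] into j contiguous groups
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):
                c = best[j - 1][s] + group_cost(values[s:i])
                if c < best[j][i]:
                    best[j][i], cut[j][i] = c, s
    # Recover interior boundaries by walking the cut table backwards.
    bounds, i = [], n
    for j in range(k, 0, -1):
        bounds.append(i)
        i = cut[j][i]
    return sorted(bounds[1:])

shot_features = [0.1, 0.2, 0.15, 5.0, 5.2, 5.1]
boundaries = optimal_sequence_grouping(shot_features, k=2)
# the scene change between the low-valued and high-valued shots is recovered
```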
-
Patent number: 11416757
Abstract: An example system includes a processor to receive input data comprising noisy positive data and clean negative data. The processor is to cluster the input data. The processor is to compute a potential score for each cluster of the clustered input data. The processor is to iteratively refine cluster quality of the clusters using the potential scores of the clusters as weights. The processor is to train a classifier by sampling the negative set uniformly and the positive set in a non-uniform manner based on the potential score.
Type: Grant
Filed: November 4, 2019
Date of Patent: August 16, 2022
Assignee: International Business Machines Corporation
Inventors: Elad Amrani, Udi Barzelay, Rami Ben-Ari, Tal Hakim
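The sampling scheme at the end can be sketched as follows. The interface is hypothetical (in practice the scores would come from the clustering step): negatives are drawn uniformly, while each noisy positive is drawn with probability proportional to its cluster's potential score, down-weighting likely-mislabeled positives.

```python
import random

# Illustrative sketch: uniform sampling for clean negatives, score-weighted
# (non-uniform) sampling for noisy positives.

def sample_batch(positives, pos_scores, negatives, n, seed=0):
    rng = random.Random(seed)
    pos = rng.choices(positives, weights=pos_scores, k=n)  # non-uniform
    neg = rng.choices(negatives, k=n)                      # uniform
    return pos, neg

positives = ["p_clean", "p_noisy"]
scores = [0.99, 0.01]            # high score = likely a true positive
negatives = ["n0", "n1", "n2"]
pos, neg = sample_batch(positives, scores, negatives, n=10)
```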
-
Publication number: 20220198186
Abstract: A method for synthesizing negative training data associated with training models to detect text within documents and images. The method includes one or more computer processors receiving a set of dictates associated with generating one or more negative training datasets for training a set of models to classify a plurality of features found within a data source. The method further includes identifying a set of rules related to generating negative training data to detect text based on the received set of dictates. The method further includes compiling one or more arrays of elements of hard-negative training data into a negative training data dataset based on the identified set of rules and one or more dictates. The method further includes determining metadata corresponding to an array of elements of hard-negative training data.
Type: Application
Filed: December 18, 2020
Publication date: June 23, 2022
Inventors: Ophir Azulai, Udi Barzelay
-
Publication number: 20220180182
Abstract: A system and method for generating hard training data from easy training data. Training data including visual data with synthetic semantic implants ("VSSI") having at least one cue is received. An annotator identifies at least one cue in the VSSI and annotates the VSSI to indicate the cue to create a modified training data set. A data scrambler removes at least one cue from the VSSI to create the tagged training data, which can then be used to train a classifier to identify transitions between segments when the cues are not present.
Type: Application
Filed: December 9, 2020
Publication date: June 9, 2022
Inventors: Daniel Nechemia Rotman, Yevgeny Yaroker, Udi Barzelay, Joseph Shtok
-
Publication number: 20220067386
Abstract: A video scene detection machine learning model is provided. A computing device receives feature vectors corresponding to audio and video components of a video. The computing device provides the feature vectors as input to a trained neural network. The computing device receives, from the trained neural network, a plurality of output feature vectors that correspond to shots of the video. The computing device applies optimal sequence grouping to the output feature vectors. The computing device further trains the trained neural network based, at least in part, on the applied optimal sequence grouping.
Type: Application
Filed: August 27, 2020
Publication date: March 3, 2022
Inventors: Daniel Nechemia Rotman, Rami Ben-Ari, Udi Barzelay
-
Publication number: 20220067546
Abstract: An example system includes a processor to learn a shared embedding space on unlabeled videos using speech visual correspondence. The processor can learn a number of additional embeddings including a question plus video embedding and an answer embedding using the shared embedding space to generate a trained visual question answering model. The processor can execute a visual question answering based on the trained visual question answering model.
Type: Application
Filed: August 31, 2020
Publication date: March 3, 2022
Inventors: Elad Amrani, Rami Ben-Ari, Daniel Nechemia Rotman, Udi Barzelay
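Under the shared-embedding assumption, the answering step reduces to nearest-neighbor retrieval. This sketch assumes precomputed embeddings (the vectors and candidate set below are invented for illustration): the question-plus-video embedding is compared against candidate answer embeddings by cosine similarity, and the closest answer is returned.

```python
import math

# Sketch of the answering step: pick the candidate answer whose embedding
# is most similar (cosine) to the question-plus-video embedding.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def answer(question_video_embedding, answer_embeddings):
    return max(answer_embeddings,
               key=lambda k: cosine(question_video_embedding, answer_embeddings[k]))

qv = (0.8, 0.6)
candidates = {"a dog": (0.9, 0.5), "a car": (-0.7, 0.7)}
best = answer(qv, candidates)
```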
-
Publication number: 20220044105
Abstract: An example system includes a processor to receive unannotated multimodal data. The processor can estimate the probability that an associated pair of different modalities in the unannotated multimodal data is correctly associated, using a multimodal similarity function and a local density estimation. The processor can also train a multimodal representation learning model on the unannotated multimodal data using the estimated probability as a weight for the associated pair in a loss function.
Type: Application
Filed: August 4, 2020
Publication date: February 10, 2022
Inventors: Elad Amrani, Rami Ben-Ari, Daniel Nechemia Rotman, Udi Barzelay
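The weighting idea can be shown in one line of arithmetic. This toy stubs out the hard part (the abstract derives the probabilities from a similarity function plus local density estimation; here they are given directly): each pair's loss term is scaled by its estimated probability of being a correct match, so likely-mismatched pairs contribute little.

```python
# Toy sketch: scale each pair's loss by the estimated probability that the
# pair of modalities is correctly associated, then sum.

def weighted_loss(pair_losses, match_probabilities):
    assert len(pair_losses) == len(match_probabilities)
    return sum(p * l for p, l in zip(match_probabilities, pair_losses))

losses = [2.0, 2.0]
probs = [1.0, 0.1]        # second pair likely mismatched -> down-weighted
total = weighted_loss(losses, probs)
```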
-
Patent number: 11164005
Abstract: Embodiments may provide techniques for identifying images that reduce resource utilization through reduced sampling of video frames for visual recognition. For example, in an embodiment, a method of visual recognition processing may be implemented in a computer system comprising a processor, memory accessible by the processor, and computer program instructions stored in the memory and executable by the processor, the method comprising: coarsely segmenting video frames of a video stream into a plurality of clusters based on scenes of the video stream; sampling a plurality of video frames from each cluster; determining a quality of each cluster; and re-clustering the video frames of the video stream to improve the quality of at least some of the clusters.
Type: Grant
Filed: April 12, 2020
Date of Patent: November 2, 2021
Assignee: International Business Machines Corporation
Inventors: Yevgeny Burshtein, Daniel Nechemia Rotman, Dror Porat, Udi Barzelay
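The quality-then-re-cluster loop can be sketched with 1-D stand-ins for frame features. The quality measure and the split rule here are simplifying assumptions (within-cluster variance and a naive median split; a real system would use a proper clustering method): clusters whose variance exceeds a threshold are deemed low quality and re-clustered.

```python
# Rough sketch: score coarse clusters by within-cluster variance and split
# any low-quality (high-variance) cluster to improve it.

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def recluster_low_quality(clusters, max_variance):
    refined = []
    for c in clusters:
        if len(c) > 1 and variance(c) > max_variance:
            c = sorted(c)
            mid = len(c) // 2
            refined.extend([c[:mid], c[mid:]])  # naive split stands in for re-clustering
        else:
            refined.append(c)
    return refined

coarse = [[0.1, 0.2, 9.0, 9.1], [4.0, 4.1]]
refined = recluster_low_quality(coarse, max_variance=1.0)
```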