Patents by Inventor Yonit Hoffman
Yonit Hoffman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250131035
Abstract: Systems and methods for spatial-textual clustering-based recognition of text in videos are disclosed. A method includes performing textual clustering on a first subset of a set of predictions that correspond to numeric characters only and performing spatial-textual clustering on a second subset of the set of predictions that correspond to alphabetical characters only. The method includes, for each cluster of predictions associated with the first subset of the set of predictions, choosing a first cluster representative to correct any errors in each cluster of predictions associated with the first subset of the set of predictions and outputting any recognized numeric characters. The method includes, for each cluster of predictions associated with the second subset of the set of predictions, choosing a second cluster representative to correct any errors in each cluster of predictions associated with the second subset of the set of predictions and outputting any recognized alphabetical characters.
Type: Application
Filed: October 9, 2024
Publication date: April 24, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yonit HOFFMAN, Maayan YEDIDIA, Avner LEVI
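As an illustrative aside, the per-cluster correction step described in this abstract can be sketched as a majority vote over the predictions in a cluster: the most frequent string becomes the cluster representative and overwrites outlier recognitions. This is a minimal sketch; the function names are hypothetical and not taken from the patent.

```python
from collections import Counter

def choose_cluster_representative(cluster):
    """Pick the most frequent predicted string in a cluster of OCR
    predictions; ties resolve to the earliest-seen prediction."""
    return Counter(cluster).most_common(1)[0][0]

def correct_cluster(cluster):
    """Replace every prediction in the cluster with the representative,
    correcting sporadic per-frame recognition errors."""
    rep = choose_cluster_representative(cluster)
    return [rep] * len(cluster)

# Example: a numeric cluster where one frame misread "0" as the letter "O".
predictions = ["2024", "2024", "2O24", "2024"]
corrected = correct_cluster(predictions)
```

A real system would cluster predictions spatially and textually first (as the abstract describes) before this correction step; the sketch only shows the representative-selection idea.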
-
Patent number: 12272377
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Grant
Filed: March 5, 2024
Date of Patent: April 8, 2025
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
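The post-processing step in this abstract, smoothing per-segment scores and deriving a confidence per candidate window size, can be illustrated with a simple moving-average sketch. The helpers below are hypothetical illustrations, not the patented implementation.

```python
def smooth_scores(scores, window):
    """Moving-average smoothing of one class's per-segment scores;
    `window` is an odd smoothing width, clipped at the signal edges."""
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out

def window_confidence(smoothed, window):
    """Confidence that the class occurs in some span of `window`
    consecutive segments: the best mean smoothed score over all spans."""
    spans = [sum(smoothed[i:i + window]) / window
             for i in range(len(smoothed) - window + 1)]
    return max(spans)

# Raw scores for one class across five segments, then a size-2 window check.
raw = [0.1, 0.9, 0.8, 0.85, 0.2]
smoothed = smooth_scores(raw, 3)
```

Evaluating `window_confidence` for several candidate window sizes and keeping the best-scoring one mirrors, in spirit, the "plurality of candidate window sizes" the abstract mentions.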
-
Patent number: 12266175
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Grant
Filed: December 29, 2022
Date of Patent: April 1, 2025
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Mordechai Kadosh, Zvi Figov, Eliyahu Strugo, Mattan Serry, Michael Ben-Haym
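The thresholding-and-indexing flow in this abstract can be sketched in a few lines: threshold each scene's probability, then record the temporal extent covered by intro-labeled scenes. All names and the fixed threshold are illustrative assumptions, not details from the patent.

```python
def label_intro_scenes(scene_probs, threshold=0.5):
    """Classify each scene as part of the opening song (True) or not,
    by thresholding its per-scene probability."""
    return [p >= threshold for p in scene_probs]

def intro_time_span(scene_bounds, labels):
    """Given (start, end) seconds per scene and the labels above,
    return the temporal extent covered by intro-labeled scenes,
    or None if no scene qualifies."""
    spans = [b for b, lab in zip(scene_bounds, labels) if lab]
    if not spans:
        return None
    return (min(s for s, _ in spans), max(e for _, e in spans))

# Three scenes; the first two look like the opening song.
bounds = [(0, 15), (15, 70), (70, 95)]
probs = [0.9, 0.8, 0.1]
span = intro_time_span(bounds, label_intro_scenes(probs))
```

The returned span is the kind of "temporal location" the abstract says may be saved as index data for the multimedia file.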
-
Publication number: 20250095319
Abstract: The technology relates to methods and systems for performing two-stage suppression of bounding boxes generated during object detection techniques for digital images. The two-stage suppression includes a per-class suppression stage and a class-agnostic suppression stage. In an example method, preliminary bounding boxes are generated for multiple objects in a digital image. A first subset of bounding boxes is selected by performing a per-class suppression of the preliminary bounding boxes. A second subset of bounding boxes is selected by performing a class-agnostic suppression of the first subset of bounding boxes. Based on the second subset of bounding boxes, at least one of an enriched image or a video index is generated.
Type: Application
Filed: March 12, 2024
Publication date: March 20, 2025
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shay AMRAM, Moti KADOSH, Yonit HOFFMAN, Zvi FIGOV
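The two stages described here map naturally onto greedy non-maximum suppression run twice: once within each class, then once across classes. The sketch below illustrates that structure with IoU-based NMS; thresholds and helper names are assumptions for the example, not values from the patent.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(dets, thresh):
    """Greedy non-maximum suppression over (box, score, cls) triples."""
    kept = []
    for d in sorted(dets, key=lambda d: -d[1]):
        if all(iou(d[0], k[0]) < thresh for k in kept):
            kept.append(d)
    return kept

def two_stage_suppression(dets, per_class_t=0.5, agnostic_t=0.7):
    """Stage 1: NMS within each class. Stage 2: class-agnostic NMS
    across the survivors, removing cross-class duplicates."""
    by_class = {}
    for d in dets:
        by_class.setdefault(d[2], []).append(d)
    stage1 = [d for ds in by_class.values() for d in nms(ds, per_class_t)]
    return nms(stage1, agnostic_t)
```

The class-agnostic second pass is what distinguishes this from ordinary per-class NMS: two different class labels on essentially the same box collapse to the higher-scoring one.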
-
Publication number: 20250095161
Abstract: Examples of the present disclosure describe systems and methods for track aware object detection. In examples, image content comprising one or more objects is received. Frames in the image content are identified. Candidate bounding boxes are created around objects to be tracked in the frames and a confidence score is assigned to each candidate bounding box. The candidate bounding boxes for each object are compared to a predicted bounding box that is generated based on a current track for the object. Candidate bounding boxes that are determined to be similar to the predicted bounding box and/or that exceed a confidence score threshold are selected. The selected candidate bounding boxes are filtered until a single candidate bounding box that is most representative of each object to be tracked remains. The frame comprising the representative bounding box for each object is then added to a current track for the object.
Type: Application
Filed: December 29, 2023
Publication date: March 20, 2025
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Shay AMRAM, Zvi FIGOV, Moti KADOSH, Yonit HOFFMAN
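The select-then-filter logic in this abstract can be sketched as: keep candidates that are either similar enough to the track's predicted box or confident enough on their own, then reduce to the single most representative candidate. Thresholds and the tie-break rule below are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def select_candidate(candidates, predicted_box, conf_thresh=0.3,
                     iou_thresh=0.4):
    """From (box, confidence) candidates, keep those similar to the
    track's predicted box or above the confidence threshold, then
    return the single best one by (similarity, confidence)."""
    keep = [(box, conf) for box, conf in candidates
            if conf >= conf_thresh or iou(box, predicted_box) >= iou_thresh]
    if not keep:
        return None
    return max(keep, key=lambda c: (iou(c[0], predicted_box), c[1]))
```

Returning `None` when nothing survives corresponds to the track receiving no update for that frame.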
-
Publication number: 20240420342
Abstract: An object tracking tool integrates scene transition detection and/or dynamic queue resizing. By integrating shot transition detection, the object tracking tool can change which operations are performed depending on whether a shot transition has been detected. For example, if a shot transition is not detected, lower-complexity interpolation operations can be performed to determine spatial information for objects, instead of using higher-complexity object detection operations, which can reduce computational complexity. As another example, depending on whether a shot transition has been detected, the object tracking tool can adjust operations performed when associating identifiers with objects, which can improve accuracy of object tracking operations. With dynamic queue resizing, an object tracking tool can selectively adjust the maximum size of a queue used to store frames for object tracking.
Type: Application
Filed: June 13, 2023
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Zvi FIGOV, Yonit HOFFMAN, Moti KADOSH
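The "lower-complexity interpolation instead of detection" idea is easy to illustrate: when no shot transition separates two anchor detections of the same object, intermediate boxes can be filled in by linear interpolation rather than running the detector on every frame. This is a generic sketch of that trade-off, not the patented mechanism.

```python
def interpolate_boxes(box_a, box_b, n_between):
    """Linearly interpolate n_between boxes between two detections of
    the same object. Only valid when no shot transition separates the
    two anchor frames; across a transition, positions are unrelated
    and full object detection should run instead."""
    boxes = []
    for i in range(1, n_between + 1):
        t = i / (n_between + 1)
        boxes.append(tuple(a + t * (b - a) for a, b in zip(box_a, box_b)))
    return boxes

# One frame between two anchor detections: the box lands halfway.
mid = interpolate_boxes((0, 0, 10, 10), (10, 10, 20, 20), 1)
```

The shot-transition check acts as the gate: interpolation only between anchors inside the same shot.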
-
Publication number: 20240420469
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include a textless version of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the corresponding texted shots.
Type: Application
Filed: April 26, 2024
Publication date: December 19, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA
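The "sequences of shots having similar durations" step can be sketched as matching runs of shot durations between the texted and textless cuts within a tolerance. The tolerance and run length below are illustrative assumptions; frame-content comparison (the abstract's next step) is omitted.

```python
def durations_match(d1, d2, tol=0.25):
    """Two shot durations (in seconds) match within a small tolerance."""
    return abs(d1 - d2) <= tol

def find_matching_sequences(shots_a, shots_b, length=3, tol=0.25):
    """Return (i, j) index pairs where a run of `length` consecutive
    shot durations in shots_a matches a run in shots_b. Candidate
    pairs would then be confirmed by comparing representative frames."""
    pairs = []
    for i in range(len(shots_a) - length + 1):
        for j in range(len(shots_b) - length + 1):
            if all(durations_match(shots_a[i + k], shots_b[j + k], tol)
                   for k in range(length)):
                pairs.append((i, j))
    return pairs

# Texted cut vs. textless reel: the first three shots line up.
texted = [2.0, 3.0, 4.0, 9.0]
textless = [2.1, 3.0, 3.9]
```

Duration matching alone over-generates candidates, which is why the abstract follows it with image comparison on representative frames.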
-
Patent number: 12141200
Abstract: Systems and methods for spatial-textual clustering-based recognition of text in videos are disclosed. A method includes performing textual clustering on a first subset of a set of predictions that correspond to numeric characters only and performing spatial-textual clustering on a second subset of the set of predictions that correspond to alphabetical characters only. The method includes, for each cluster of predictions associated with the first subset of the set of predictions, choosing a first cluster representative to correct any errors in each cluster of predictions associated with the first subset of the set of predictions and outputting any recognized numeric characters. The method includes, for each cluster of predictions associated with the second subset of the set of predictions, choosing a second cluster representative to correct any errors in each cluster of predictions associated with the second subset of the set of predictions and outputting any recognized alphabetical characters.
Type: Grant
Filed: May 27, 2022
Date of Patent: November 12, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Maayan Yedidia, Avner Levi
-
Publication number: 20240370661
Abstract: Multimedia content is summarized with the use of summary prompts that are created with audio and visual insights obtained from the multimedia content. An aggregated timeline temporally aligns the audio and visual insights. The aggregated timeline is segmented into coherent segments that each include a unique combination of audio and visual insights. These segments are grouped into chunks, based on prompt size constraints, and are used with identified summarization styles to create the summary prompts. The summary prompts are provided to summarization models to obtain summaries having content and summarization styles based on the summary prompts.
Type: Application
Filed: June 9, 2023
Publication date: November 7, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA DOTAN, Oron NIR
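The "grouped into chunks, based on prompt size constraints" step is essentially greedy packing of consecutive timeline segments under a size budget. The sketch below uses character counts as a stand-in for the real prompt-size measure (likely tokens); all names are illustrative.

```python
def chunk_segments(segments, max_chars):
    """Group consecutive timeline segments into chunks whose combined
    text fits a prompt-size budget. An oversized single segment
    becomes its own chunk rather than being dropped."""
    chunks, current, size = [], [], 0
    for seg in segments:
        if current and size + len(seg) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(seg)
        size += len(seg)
    if current:
        chunks.append(current)
    return chunks

# Each chunk would then be combined with a summarization style to form one prompt.
chunks = chunk_segments(["aaaa", "bbbb", "cc"], 8)
```

Keeping segments consecutive preserves the temporal coherence of the aggregated timeline inside each prompt.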
-
Publication number: 20240363139
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Application
Filed: March 5, 2024
Publication date: October 31, 2024
Inventors: Lihi Ahuva SHILOH PERL, Ben FISHMAN, Gilad PUNDAK, Yonit HOFFMAN
-
Publication number: 20240303442
Abstract: Systems and methods are provided for extracting and processing terms referenced in multimedia content with the use of different term extraction models to determine the relevance of categories to the referenced terms and to rank the categories by relative dominance for the multimedia content. The most dominant category for the multimedia content and/or particular segment(s) of the multimedia content can then be identified and used to link to supplemental content and/or to identify multimedia content related to topics of interest.
Type: Application
Filed: May 17, 2023
Publication date: September 12, 2024
Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA
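Ranking categories "by relative dominance" can be sketched as accumulating each extracted term's per-category relevance (optionally weighted, e.g. by how often the term occurs) and sorting. This is a generic aggregation sketch; the data shapes and weighting are assumptions, not details from the application.

```python
def rank_categories(term_categories, term_weights=None):
    """Aggregate per-term category relevance into a dominance ranking.
    term_categories maps term -> {category: relevance}; optional
    term_weights weight terms (e.g., by frequency in the content)."""
    totals = {}
    for term, cats in term_categories.items():
        w = (term_weights or {}).get(term, 1.0)
        for cat, rel in cats.items():
            totals[cat] = totals.get(cat, 0.0) + w * rel
    return sorted(totals.items(), key=lambda kv: -kv[1])

# Two extracted terms; "sports" accumulates the most relevance.
ranked = rank_categories({"goal": {"sports": 0.9},
                          "coach": {"sports": 0.5, "business": 0.4}})
dominant = ranked[0][0]
```

The top-ranked entry plays the role of the "most dominant category" used for linking to supplemental content.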
-
Publication number: 20240221379
Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign a probability to each scene that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating or not correlating to the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
Type: Application
Filed: December 29, 2022
Publication date: July 4, 2024
Inventors: Yonit HOFFMAN, Mordechai KADOSH, Zvi FIGOV, Eliyahu STRUGO, Mattan SERRY, Michael BEN-HAYM
-
Patent number: 12026200
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Grant
Filed: July 14, 2022
Date of Patent: July 2, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Tom Hirshberg, Maayan Yedidia, Zvi Figov
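Since prominence here "reflects, at least in part, an extent to which the person appears," a minimal sketch is to accumulate each person's detected frames into total screen time and rank on that. A real system would fold in many more accumulated features (size, position, face visibility); the single-feature version below is an assumption for illustration.

```python
def rank_prominence(appearances, fps=30.0):
    """Rank people by accumulated on-screen presence. `appearances`
    maps person_id -> list of frame indices where the person was
    detected; prominence here is simply total screen time in seconds."""
    scores = {pid: len(frames) / fps for pid, frames in appearances.items()}
    return sorted(scores.items(), key=lambda kv: -kv[1])

# p1 appears for 60 frames (2 s at 30 fps), p2 for 15 frames (0.5 s).
ranking = rank_prominence({"p1": list(range(60)), "p2": list(range(15))})
```

An application could use the top of this ranking to, say, pick thumbnail subjects or order chapter highlights.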
-
Patent number: 12020483
Abstract: A system for indexing animated content receives detections extracted from a media file, where each one of the detections includes an image extracted from a corresponding frame of the media file that corresponds to a detected instance of an animated character. The system determines, for each of the received detections, an embedding defining a set of characteristics for the detected instance. The embedding associated with each detection is provided to a grouping engine that is configured to dynamically configure at least one grouping parameter based on a total number of the detections received. The grouping engine is also configured to sort the detections into groups using the grouping parameter and the embedding for each detection. A character ID is assigned to each one of the groups of detections, and the system indexes the groups of detections in a database in association with the character ID assigned to each group.
Type: Grant
Filed: August 26, 2022
Date of Patent: June 25, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yonit Hoffman, Irit Ofer, Avner Levi, Haim Sabo, Reut Amior
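The grouping-engine idea, a distance threshold that is configured dynamically from the total detection count, can be sketched with a greedy embedding grouper. The specific tightening rule and threshold values below are invented for illustration; only the shape of the idea (count-dependent parameter, then grouping, then character IDs) comes from the abstract.

```python
def group_detections(embeddings, base_thresh=0.8):
    """Greedy grouping of character-detection embeddings: each detection
    joins the first group whose seed embedding lies within a distance
    threshold; the threshold tightens when many detections arrive
    (a stand-in for the dynamically configured grouping parameter)."""
    n = len(embeddings)
    thresh = base_thresh * (1.0 if n < 100 else 0.5)
    groups = []  # list of (seed embedding, [member indices])
    for idx, emb in enumerate(embeddings):
        placed = False
        for seed, members in groups:
            dist = sum((a - b) ** 2 for a, b in zip(emb, seed)) ** 0.5
            if dist < thresh:
                members.append(idx)
                placed = True
                break
        if not placed:
            groups.append((emb, [idx]))
    # Assign a character ID to each group for indexing.
    return {f"character_{i}": members
            for i, (_, members) in enumerate(groups)}

groups = group_detections([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)])
```

The returned mapping mirrors the final step of the abstract: detections indexed under the character ID of their group.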
-
Publication number: 20240203146
Abstract: Systems and methods for detecting text in videos. To address problems with conventional Optical Character Recognition (OCR) systems, the present disclosure provides detection of text for improved OCR. Aspects of the present disclosure can, therefore, be utilized to detect a textual logo in videos, including when the text of the textual logo is clearly visible and when the text is inferred. Thus, examples capture the appearance time of a textual logo from a video view perspective. Aspects use a multi-threshold pipeline for detecting video frames including the textual logo. A textual-visual scoring system is additionally used to leverage visual aspects of text in logos. A shot detection system is used to detect inferred text beyond a detected video frame. One or more verification models can be further applied.
Type: Application
Filed: December 16, 2022
Publication date: June 20, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Eliyahu Strugo, Yonit Hoffman
-
Patent number: 11995892
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include a textless version of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the corresponding texted shots.
Type: Grant
Filed: May 26, 2022
Date of Patent: May 28, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Mattan Serry, Zvi Figov, Yonit Hoffman, Maayan Yedidia
-
Patent number: 11948599
Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
Type: Grant
Filed: January 6, 2022
Date of Patent: April 2, 2024
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
-
Publication number: 20240020338
Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
Type: Application
Filed: July 14, 2022
Publication date: January 18, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Yonit HOFFMAN, Tom HIRSHBERG, Maayan YEDIDIA, Zvi FIGOV
-
Publication number: 20230419663
Abstract: Examples of the present disclosure describe systems and methods for video genre classification. In one example implementation, video content is received. A plurality of sliding windows of the video content is sampled. The plurality of sliding windows comprises audio data and video data. The audio data is analyzed to identify a set of audio features. The video data is analyzed to identify a set of video features. The set of audio features and the set of video features is provided to a classifier. The classifier is configured to detect a genre for the video content using the set of audio features and the set of video features. The video content is indexed based on the genre.
Type: Application
Filed: June 27, 2022
Publication date: December 28, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Oron NIR, Mattan SERRY, Yonit HOFFMAN, Michael BEN-HAYM, Zvi FIGOV, Eliyahu STRUGO, Avi NEEMAN
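The pipeline here, sample sliding windows, extract audio and video features, feed both to a classifier, can be sketched with a window sampler and a toy nearest-prototype classifier over the concatenated features. The classifier choice and all names are illustrative assumptions; the patent does not specify this model.

```python
def sliding_windows(n_frames, window, stride):
    """Start indices of sliding windows sampled over the video."""
    return list(range(0, max(n_frames - window, 0) + 1, stride))

def classify_genre(audio_feats, video_feats, prototypes):
    """Toy nearest-prototype classifier over the concatenated audio
    and video feature vector; `prototypes` maps genre -> feature
    vector of the same combined length."""
    feats = list(audio_feats) + list(video_feats)
    def dist(proto):
        return sum((a - b) ** 2 for a, b in zip(feats, proto))
    return min(prototypes, key=lambda g: dist(prototypes[g]))

# Two-dimensional audio and video features against two genre prototypes.
protos = {"news": [0.0, 0.0, 0.0, 0.0], "sports": [1.0, 1.0, 1.0, 1.0]}
genre = classify_genre([0.9, 0.9], [1.0, 0.8], protos)
```

Per-window predictions from such a classifier could then be aggregated (e.g., by majority) into the single genre used to index the video.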
-
Publication number: 20230316753
Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include a textless version of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the corresponding texted shots.
Type: Application
Filed: May 26, 2022
Publication date: October 5, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA