Patents by Inventor Yonit Hoffman

Yonit Hoffman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250131035
    Abstract: Systems and methods for spatial-textual clustering-based recognition of text in videos are disclosed. A method includes performing textual clustering on a first subset of a set of predictions that correspond to numeric characters only and performing spatial-textual clustering on a second subset of the set of predictions that correspond to alphabetical characters only. The method includes, for each cluster of predictions associated with the first subset of the set of predictions, choosing a first cluster representative to correct any errors in each cluster of predictions associated with the first subset of the set of predictions and outputting any recognized numeric characters. The method includes, for each cluster of predictions associated with the second subset of the set of predictions, choosing a second cluster representative to correct any errors in each cluster of predictions associated with the second subset of the set of predictions and outputting any recognized alphabetical characters.
    Type: Application
    Filed: October 9, 2024
    Publication date: April 24, 2025
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yonit HOFFMAN, Maayan YEDIDIA, Avner LEVI
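
The clustering-and-representative approach described in the abstract above can be pictured with a minimal sketch. In the Python snippet below, the toy OCR predictions, the one-character-tolerance text match, and the majority-vote representative are all assumptions made for illustration, not details taken from the filing: predictions are split into numeric-only and alphabetical-only subsets, clustered textually or spatial-textually, and corrected via a cluster representative.

```python
from collections import Counter

# Each prediction: (frame_index, recognized_text, (x, y, width, height)); toy OCR output.
predictions = [
    (0, "SCORE", (10, 10, 80, 20)), (1, "SCORF", (11, 10, 80, 20)),  # one misread character
    (2, "SCORE", (10, 11, 80, 20)),
    (0, "42", (200, 10, 30, 20)), (1, "42", (201, 10, 30, 20)),
    (2, "47", (200, 11, 30, 20)),                                     # "2" misread as "7"
]

def spatially_close(box_a, box_b, tol=15):
    """Rough spatial proximity test on the top-left corners of the text boxes."""
    return abs(box_a[0] - box_b[0]) <= tol and abs(box_a[1] - box_b[1]) <= tol

def textually_similar(a, b):
    """Allow one character of disagreement between equal-length strings."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= 1

def cluster(preds, use_spatial):
    """Greedy clustering: add a prediction to the first cluster whose seed it resembles."""
    clusters = []
    for frame, text, box in preds:
        for c in clusters:
            _, seed_text, seed_box = c[0]
            if textually_similar(text, seed_text) and (not use_spatial or spatially_close(box, seed_box)):
                c.append((frame, text, box))
                break
        else:
            clusters.append([(frame, text, box)])
    return clusters

numeric = [p for p in predictions if p[1].isdigit()]        # numeric characters only
alphabetical = [p for p in predictions if p[1].isalpha()]   # alphabetical characters only

# Textual clustering for the numeric subset, spatial-textual clustering for the alphabetical one.
for subset, use_spatial in ((numeric, False), (alphabetical, True)):
    for c in cluster(subset, use_spatial):
        representative = Counter(text for _, text, _ in c).most_common(1)[0][0]
        print(representative, "recognized in frames", sorted(frame for frame, _, _ in c))
```
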
  • Patent number: 12272377
    Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
    Type: Grant
    Filed: March 5, 2024
    Date of Patent: April 8, 2025
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
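
The post-processing described above (smoothed scores and per-window confidences over several candidate window sizes) can be sketched briefly. The snippet below is only a minimal illustration: the per-segment probabilities, the moving-average smoother, and the particular confidence definition are assumptions for demonstration, not the patented method.

```python
# Per-segment class probabilities from a (hypothetical) trained classifier,
# e.g. one score per one-second segment for a single audio event class.
probs = [0.1, 0.2, 0.9, 0.85, 0.8, 0.3, 0.2, 0.9, 0.95, 0.1]
candidate_window_sizes = [3, 5]   # assumed values, not from the patent

def moving_average(values, window):
    """Smooth a score sequence with a centered moving average."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

for window in candidate_window_sizes:
    smoothed = moving_average(probs, window)
    # One possible "class window confidence": the best average smoothed probability
    # over any contiguous run of `window` segments.
    confidences = [sum(smoothed[i:i + window]) / window
                   for i in range(len(smoothed) - window + 1)]
    print(f"window={window}  max confidence={max(confidences):.2f}")
```
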
  • Patent number: 12266175
    Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign to each scene a probability that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating to, or not correlating to, the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
    Type: Grant
    Filed: December 29, 2022
    Date of Patent: April 1, 2025
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yonit Hoffman, Mordechai Kadosh, Zvi Figov, Eliyahu Strugo, Mattan Serry, Michael Ben-Haym
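
As a rough illustration of the scene-probability classification described above, the following sketch thresholds hypothetical per-scene probabilities and saves the opening-song span as index data. The scene boundaries, probabilities, and threshold are invented for the example.

```python
# Hypothetical scenes: (start_seconds, end_seconds, probability that the scene
# belongs to the opening song), as produced by an upstream classifier.
scenes = [(0, 12, 0.10), (12, 45, 0.85), (45, 80, 0.92), (80, 120, 0.15)]
THRESHOLD = 0.5  # assumed cutoff, not a value from the patent

# Classify each scene, then save the temporal span of the opening song as index data.
opening = [(start, end) for start, end, prob in scenes if prob >= THRESHOLD]
if opening:
    index_entry = {"opening_song_start": opening[0][0], "opening_song_end": opening[-1][1]}
    print(index_entry)  # {'opening_song_start': 12, 'opening_song_end': 80}
```
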
  • Publication number: 20250095319
    Abstract: The technology relates to methods and systems for performing two-stage suppression of bounding boxes generated during object detection techniques for digital images. The two-stage suppression includes a per-class suppression stage and a class-agnostic suppression stage. In an example method, preliminary bounding boxes are generated for multiple objects in a digital image. A first subset of bounding boxes is selected by performing a per-class suppression of the preliminary bounding boxes. A second subset of bounding boxes is selected by performing a class-agnostic suppression of the first subset of bounding boxes. Based on the second subset of bounding boxes, at least one of an enriched image or a video index is generated.
    Type: Application
    Filed: March 12, 2024
    Publication date: March 20, 2025
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Shay AMRAM, Moti KADOSH, Yonit HOFFMAN, Zvi FIGOV
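
The two-stage suppression described above resembles running non-maximum suppression first within each class and then across classes. The sketch below illustrates that flow with a greedy NMS helper and hypothetical detections; the IoU thresholds and the toy data are assumptions, not values from the application.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def nms(dets, iou_thresh):
    """Greedy non-maximum suppression over (box, score, label) detections."""
    kept = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(iou(det[0], k[0]) < iou_thresh for k in kept):
            kept.append(det)
    return kept

# Hypothetical preliminary detections for one image: (box, score, class label).
detections = [
    ((10, 10, 60, 60), 0.90, "cat"), ((12, 12, 62, 62), 0.70, "cat"),
    ((11, 11, 61, 61), 0.60, "dog"),  # overlaps the cat box but carries a different class
    ((200, 200, 260, 260), 0.80, "dog"),
]

# Stage 1: per-class suppression - suppress only within each class.
stage1 = []
for label in {d[2] for d in detections}:
    stage1 += nms([d for d in detections if d[2] == label], iou_thresh=0.5)

# Stage 2: class-agnostic suppression - suppress across classes.
stage2 = nms(stage1, iou_thresh=0.5)
print(stage2)   # the surviving boxes could feed an enriched image or a video index
```
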
  • Publication number: 20250095161
    Abstract: Examples of the present disclosure describe systems and methods for track aware object detection. In examples, image content comprising one or more objects is received. Frames in the image content are identified. Candidate bounding boxes are created around objects to be tracked in the frames and a confidence score is assigned to each candidate bounding box. The candidate bounding boxes for each object are compared to a predicted bounding box that is generated based on a current track for the object. Candidate bounding boxes that are determined to be similar to the predicted bounding box and/or that exceed a confidence score threshold are selected. The selected candidate bounding boxes are filtered until a single candidate bounding box that is most representative of each object to be tracked remains. The frame comprising the representative bounding box for each object is then added to a current track for the object.
    Type: Application
    Filed: December 29, 2023
    Publication date: March 20, 2025
    Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Shay AMRAM, Zvi FIGOV, Moti KADOSH, Yonit HOFFMAN
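
One way to picture the track-aware selection described above is to score each candidate box against the box predicted from the object's current track. In the Python sketch below, the candidate boxes, the thresholds, and the representativeness score are illustrative assumptions only.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

# Hypothetical candidate boxes (box, confidence) for one tracked object in the current frame.
candidates = [((100, 100, 150, 150), 0.55), ((102, 101, 152, 151), 0.80), ((300, 40, 340, 90), 0.90)]
predicted_box = (101, 100, 151, 150)       # extrapolated from the object's current track
CONF_THRESHOLD, IOU_THRESHOLD = 0.6, 0.4   # assumed values

# Keep candidates that are similar to the predicted box and/or sufficiently confident ...
selected = [(box, score) for box, score in candidates
            if iou(box, predicted_box) >= IOU_THRESHOLD or score >= CONF_THRESHOLD]

# ... then filter down to a single most representative candidate: here, the one that
# best balances agreement with the track and detector confidence.
best = max(selected, key=lambda cand: iou(cand[0], predicted_box) * cand[1])
print(best)   # this box (with its frame) would be added to the object's current track
```
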
  • Publication number: 20240420342
    Abstract: An object tracking tool integrates scene transition detection and/or dynamic queue resizing. By integrating shot transition detection, the object tracking tool can change which operations are performed depending on whether a shot transition has been detected. For example, if a shot transition is not detected, lower-complexity interpolation operations can be performed to determine spatial information for objects, instead of using higher-complexity object detection operations, which can reduce computational complexity. As another example, depending on whether a shot transition has been detected, the object tracking tool can adjust operations performed when associating identifiers with objects, which can improve accuracy of object tracking operations. With dynamic queue resizing, an object tracking tool can selectively adjust the maximum size of a queue used to store frames for object tracking.
    Type: Application
    Filed: June 13, 2023
    Publication date: December 19, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Zvi FIGOV, Yonit HOFFMAN, Moti KADOSH
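
The abstract above describes branching between higher-complexity detection and lower-complexity interpolation depending on shot transitions, plus dynamic resizing of a frame queue. The sketch below illustrates that control flow; the toy frames, the pixel-difference cut detector, the stand-in detector, and the queue sizes are all assumptions made for the example.

```python
from collections import deque

def shot_transition(prev_frame, frame, threshold=40.0):
    """Crude cut detector: mean absolute pixel difference (an assumed heuristic)."""
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame) > threshold

def detect_objects(frame):
    """Stand-in for a higher-complexity object detector."""
    return [(5, 5, 20, 20)]

def extrapolate(prev_box, last_box):
    """Lower-complexity spatial update: linear extrapolation from the last two boxes."""
    return tuple(2 * b - a for a, b in zip(prev_box, last_box))

frames = [[10] * 64, [11] * 64, [220] * 64, [222] * 64]   # toy grayscale frames; the jump is a cut
frame_queue = deque(maxlen=2)                              # maximum size adjusted dynamically below
tracks, active, next_id = {}, [], 0

for i, frame in enumerate(frames):
    if i == 0 or shot_transition(frames[i - 1], frame):
        frame_queue = deque(frame_queue, maxlen=8)         # grow the queue around a transition
        active = []
        for box in detect_objects(frame):                  # full detection; assign fresh identifiers
            tracks[next_id] = [box]
            active.append(next_id)
            next_id += 1
    else:
        for obj_id in active:                              # cheap path: no detector call needed
            track = tracks[obj_id]
            prev_box = track[-2] if len(track) >= 2 else track[-1]
            track.append(extrapolate(prev_box, track[-1]))
    frame_queue.append(i)

print(tracks)
print("frame indices retained in queue:", list(frame_queue))
```
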
  • Publication number: 20240420469
    Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
    Type: Application
    Filed: April 26, 2024
    Publication date: December 19, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA
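
A compact way to picture the textless matching flow above: compare shot-duration sequences first, then compare representative-frame content for sequences whose durations line up. In the sketch below the shots, the scalar frame signatures, and the tolerances are invented for illustration.

```python
# Hypothetical shots: (start_s, end_s, representative_frame_signature). The scalar
# signature stands in for real frame content such as a perceptual hash.
texted_shots = [(0, 4, 0.91), (4, 9, 0.40), (9, 15, 0.75)]
textless_shots = [(100, 104, 0.90), (104, 109, 0.41), (109, 115, 0.74)]

def durations(shots):
    return [end - start for start, end, _ in shots]

def similar(seq_a, seq_b, tol):
    return len(seq_a) == len(seq_b) and all(abs(a - b) <= tol for a, b in zip(seq_a, seq_b))

# Step 1: do the two sequences of shots have similar durations?
if similar(durations(texted_shots), durations(textless_shots), tol=0.5):
    # Step 2: compare image content of the representative frames.
    if similar([sig for _, _, sig in texted_shots],
               [sig for _, _, sig in textless_shots], tol=0.05):
        pairs = list(zip(texted_shots, textless_shots))
        print("matched texted/textless sequences:", pairs)
        # A video processing system could now swap each texted shot for its textless twin.
```
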
  • Patent number: 12141200
    Abstract: Systems and methods for spatial-textual clustering-based recognition of text in videos are disclosed. A method includes performing textual clustering on a first subset of a set of predictions that correspond to numeric characters only and performing spatial-textual clustering on a second subset of the set of predictions that correspond to alphabetical characters only. The method includes, for each cluster of predictions associated with the first subset of the set of predictions, choosing a first cluster representative to correct any errors in each cluster of predictions associated with the first subset of the set of predictions and outputting any recognized numeric characters. The method includes, for each cluster of predictions associated with the second subset of the set of predictions, choosing a second cluster representative to correct any errors in each cluster of predictions associated with the second subset of the set of predictions and outputting any recognized alphabetical characters.
    Type: Grant
    Filed: May 27, 2022
    Date of Patent: November 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yonit Hoffman, Maayan Yedidia, Avner Levi
  • Publication number: 20240370661
    Abstract: Multimedia content is summarized with the use of summary prompts that are created with audio and visual insights obtained from the multimedia content. An aggregated timeline temporally aligns the audio and visual insights. The aggregated timeline is segmented into coherent segments that each include a unique combination of audio and visual insights. These segments are grouped into chunks, based on prompt size constraints, and are used with identified summarization styles to create the summary prompts. The summary prompts are provided to summarization models to obtain summaries having content and summarization styles based on the summary prompts.
    Type: Application
    Filed: June 9, 2023
    Publication date: November 7, 2024
    Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA DOTAN, Oron NIR
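
The chunking step described above (grouping timeline segments into prompt-sized chunks and pairing them with a summarization style) can be shown in a few lines. The segments, token counts, size limit, and style string in the sketch below are hypothetical.

```python
# Hypothetical coherent segments from an aggregated audio/visual timeline, each with
# the insights it covers and a rough token count; values are illustrative only.
segments = [
    {"time": "00:00-01:00", "insights": "host greets guest; upbeat music", "tokens": 60},
    {"time": "01:00-03:30", "insights": "product demo on screen; narration", "tokens": 140},
    {"time": "03:30-05:00", "insights": "audience Q&A; applause", "tokens": 90},
    {"time": "05:00-06:00", "insights": "closing remarks; logo shown", "tokens": 40},
]
MAX_PROMPT_TOKENS = 200                        # assumed prompt size constraint
STYLE = "Summarize in three bullet points:"    # one possible summarization style

# Group consecutive segments into chunks that respect the prompt size constraint.
chunks, current, used = [], [], 0
for seg in segments:
    if current and used + seg["tokens"] > MAX_PROMPT_TOKENS:
        chunks.append(current)
        current, used = [], 0
    current.append(seg)
    used += seg["tokens"]
if current:
    chunks.append(current)

# Build one summary prompt per chunk; these would be sent to a summarization model.
for chunk in chunks:
    body = "\n".join(f'{s["time"]}: {s["insights"]}' for s in chunk)
    print(f"{STYLE}\n{body}\n---")
```
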
  • Publication number: 20240363139
    Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
    Type: Application
    Filed: March 5, 2024
    Publication date: October 31, 2024
    Inventors: Lihi Ahuva SHILOH PERL, Ben FISHMAN, Gilad PUNDAK, Yonit HOFFMAN
  • Publication number: 20240303442
    Abstract: Systems and methods are provided for extracting and processing terms referenced in multimedia content with the use of different term extraction models to determine the relevance of categories to the referenced terms and to rank the categories by relative dominance for the multimedia content. The most dominant category for the multimedia content and/or particular segment(s) of the multimedia content can then be identified and used to link to supplemental content and/or to identify multimedia content related to topics of interest.
    Type: Application
    Filed: May 17, 2023
    Publication date: September 12, 2024
    Inventors: Tom HIRSHBERG, Yonit HOFFMAN, Zvi FIGOV, Maayan YEDIDIA
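
As a minimal illustration of ranking categories by relative dominance, the sketch below counts category hits over extracted terms and normalizes them. The term list and the term-to-category mapping are toy stand-ins for the output of the term extraction models mentioned above.

```python
from collections import Counter

# Hypothetical terms extracted from a video, and a toy term-to-category mapping.
extracted_terms = ["goal", "penalty", "referee", "stadium", "interview", "coach", "transfer"]
term_to_categories = {
    "goal": ["Sports"], "penalty": ["Sports"], "referee": ["Sports"],
    "stadium": ["Sports", "Travel"], "interview": ["News"], "coach": ["Sports"],
    "transfer": ["Sports", "Finance"],
}

# Score category relevance per term, then rank categories by relative dominance.
category_counts = Counter(cat for term in extracted_terms
                          for cat in term_to_categories.get(term, []))
total = sum(category_counts.values())
dominance = {cat: count / total for cat, count in category_counts.items()}

ranked = sorted(dominance.items(), key=lambda kv: kv[1], reverse=True)
print("dominant category:", ranked[0][0])   # e.g. "Sports"
print("ranking:", ranked)                   # could drive links to supplemental content
```
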
  • Publication number: 20240221379
    Abstract: Disclosed is a method for automatically detecting an introduction/opening song within a multimedia file. The method includes designating sequential blocks of time in the multimedia file as scene(s) and detecting certain feature(s) associated with each scene. The extracted scene feature(s) may be analyzed and used to assign to each scene a probability that the scene is part of the introduction/opening song. The probabilities may be used to classify each scene as either correlating to, or not correlating to, the introduction/opening song. The temporal location of the opening song may be saved as index data associated with the multimedia file.
    Type: Application
    Filed: December 29, 2022
    Publication date: July 4, 2024
    Inventors: Yonit HOFFMAN, Mordechai KADOSH, Zvi FIGOV, Eliyahu STRUGO, Mattan SERRY, Michael BEN-HAYM
  • Patent number: 12026200
    Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
    Type: Grant
    Filed: July 14, 2022
    Date of Patent: July 2, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yonit Hoffman, Tom Hirshberg, Maayan Yedidia, Zvi Figov
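
The prominence ranking described above can be approximated by accumulating per-person features and scoring them. In the sketch below, the per-frame detections and the specific prominence formula (appearance count weighted by on-screen size) are assumptions chosen only to make the idea concrete.

```python
# Hypothetical per-frame person detections: (frame_index, person_id, face_area_fraction).
detections = [
    (0, "A", 0.10), (1, "A", 0.12), (2, "A", 0.11), (3, "A", 0.10),
    (1, "B", 0.04), (2, "B", 0.05),
    (3, "C", 0.30),
]

# Accumulate person-specific feature information across the video.
features = {}
for frame, person, area in detections:
    f = features.setdefault(person, {"frames": 0, "total_area": 0.0})
    f["frames"] += 1
    f["total_area"] += area

def prominence(f):
    """One possible prominence score: appearance count plus a size-weighted bonus."""
    return f["frames"] + 2.0 * f["total_area"]

ranking = sorted(features, key=lambda p: prominence(features[p]), reverse=True)
print(ranking)   # e.g. ['A', 'B', 'C'] - an application could feature 'A' in previews
```
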
  • Patent number: 12020483
    Abstract: A system for indexing animated content receives detections extracted from a media file, where each one of the detections includes an image extracted from a corresponding frame of the media file that corresponds to a detected instance of an animated character. The system determines, for each of the received detections, an embedding defining a set of characteristics for the detected instance. The embedding associated with each detection is provided to a grouping engine that is configured to dynamically configure at least one grouping parameter based on a total number of the detections received. The grouping engine is also configured to sort the detections into groups using the grouping parameter and the embedding for each detection. A character ID is assigned to each one of the groups of detections, and the system indexes the groups of detections in a database in association with the character ID assigned to each group.
    Type: Grant
    Filed: August 26, 2022
    Date of Patent: June 25, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yonit Hoffman, Irit Ofer, Avner Levi, Haim Sabo, Reut Amior
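
The grouping behavior described above (a grouping parameter that depends on how many detections arrived, then grouping by embedding similarity and assigning character IDs) is sketched below. The embeddings, the distance-threshold heuristic, and the greedy grouping are illustrative assumptions.

```python
import math

# Hypothetical detections of animated characters: each carries a small embedding
# computed from the cropped image; values are illustrative only.
detections = [
    {"frame": 0, "embedding": [0.9, 0.1]}, {"frame": 5, "embedding": [0.88, 0.12]},
    {"frame": 9, "embedding": [0.1, 0.9]}, {"frame": 14, "embedding": [0.12, 0.88]},
    {"frame": 20, "embedding": [0.85, 0.15]},
]

# Dynamically configure a grouping parameter from the total number of detections:
# with more detections, require tighter matches (an assumed heuristic).
threshold = 0.5 if len(detections) < 10 else 0.3

# Greedy grouping by embedding distance.
groups = []
for det in detections:
    for group in groups:
        if math.dist(det["embedding"], group[0]["embedding"]) <= threshold:
            group.append(det)
            break
    else:
        groups.append([det])

# Assign a character ID to each group and index the detections under it.
index = {f"character_{i}": [d["frame"] for d in group] for i, group in enumerate(groups)}
print(index)   # e.g. {'character_0': [0, 5, 20], 'character_1': [9, 14]}
```
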
  • Publication number: 20240203146
    Abstract: Systems and methods for detecting text in videos. To address problems with conventional Optical Character Recognition (OCR) systems, the present disclosure provides detection of text for improved OCR. Aspects of the present disclosure can, therefore, be utilized to detect a textual logo in videos, including when the text of the textual logo is clearly visible and when the text is inferred. Thus, examples capture the appearance time of a textual logo from a video view perspective. Aspects use a multi-threshold pipeline for detecting video frames including the textual logo. A textual-visual scoring system is additionally used to leverage visual aspects of text in logos. A shot detection system is used to detect inferred text beyond a detected video frame. One or more verification models can be further applied.
    Type: Application
    Filed: December 16, 2022
    Publication date: June 20, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Eliyahu Strugo, Yonit Hoffman
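
A rough sketch of the multi-threshold idea above: accept frames on a strong textual match, or on a weaker textual match backed by a visual score, then use shot boundaries to infer the logo on neighboring frames. All scores and thresholds below are invented for the example.

```python
# Hypothetical per-frame signals for a target textual logo: OCR similarity to the logo
# string and a visual similarity score for the logo's appearance; values are toy data.
frames = [
    {"t": 0, "text_score": 0.95, "visual_score": 0.90, "shot": 0},
    {"t": 1, "text_score": 0.40, "visual_score": 0.85, "shot": 0},   # text blurred, logo visible
    {"t": 2, "text_score": 0.10, "visual_score": 0.15, "shot": 1},
    {"t": 3, "text_score": 0.92, "visual_score": 0.88, "shot": 2},
]
HIGH_TEXT, LOW_TEXT, VISUAL = 0.9, 0.3, 0.8   # assumed thresholds in the multi-threshold pipeline

detected = set()
for f in frames:
    # Clear textual detection, or a weaker textual signal backed by the visual score.
    if f["text_score"] >= HIGH_TEXT or (f["text_score"] >= LOW_TEXT and f["visual_score"] >= VISUAL):
        detected.add(f["t"])

# Use shot boundaries to infer the logo on frames of a shot that already contains a detection.
detected_shots = {f["shot"] for f in frames if f["t"] in detected}
inferred = {f["t"] for f in frames if f["shot"] in detected_shots}
print(sorted(inferred))   # e.g. [0, 1, 3] - appearance time of the textual logo
```
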
  • Patent number: 11995892
    Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
    Type: Grant
    Filed: May 26, 2022
    Date of Patent: May 28, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Mattan Serry, Zvi Figov, Yonit Hoffman, Maayan Yedidia
  • Patent number: 11948599
    Abstract: A computing system for a plurality of classes of audio events is provided, including one or more processors configured to divide a run-time audio signal into a plurality of segments and process each segment of the run-time audio signal in a time domain to generate a normalized time domain representation of each segment. The processor is further configured to feed the normalized time domain representation of each segment to an input layer of a trained neural network. The processor is further configured to generate, by the neural network, a plurality of predicted classification scores and associated probabilities for each class of audio event contained in each segment of the run-time input audio signal. In post-processing, the processor is further configured to generate smoothed predicted classification scores, associated smoothed probabilities, and class window confidence values for each class for each of a plurality of candidate window sizes.
    Type: Grant
    Filed: January 6, 2022
    Date of Patent: April 2, 2024
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Lihi Ahuva Shiloh Perl, Ben Fishman, Gilad Pundak, Yonit Hoffman
  • Publication number: 20240020338
    Abstract: A video-processing technique uses machine-trained logic to detect and track people that appear in video information. The technique then ranks the prominence of these people in the video information, to produce ranking information. The prominence of a person reflects a level of importance of the person in the video information, corresponding to the capacity of the person to draw the attention of a viewer. For instance, the prominence of the person reflects, at least in part, an extent to which the person appears in the video information. The technique performs its ranking based on person-specific feature information. The technique produces each instance of person-specific feature information by accumulating features pertaining to a particular person. One or more application systems make use of the ranking information to control the presentation of the video information.
    Type: Application
    Filed: July 14, 2022
    Publication date: January 18, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Yonit HOFFMAN, Tom HIRSHBERG, Maayan YEDIDIA, Zvi FIGOV
  • Publication number: 20230419663
    Abstract: Examples of the present disclosure describe systems and methods for video genre classification. In one example implementation, video content is received. A plurality of sliding windows of the video content is sampled. The plurality of sliding windows comprises audio data and video data. The audio data is analyzed to identify a set of audio features. The video data is analyzed to identify a set of video features. The set of audio features and the set of video features is provided to a classifier. The classifier is configured to detect a genre for the video content using the set of audio features and the set of video features. The video content is indexed based on the genre.
    Type: Application
    Filed: June 27, 2022
    Publication date: December 28, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Oron NIR, Mattan SERRY, Yonit HOFFMAN, Michael BEN-HAYM, Zvi FIGOV, Eliyahu STRUGO, Avi NEEMAN
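
The sliding-window classification described above can be sketched as follows: sample overlapping windows, derive audio and video features for each, classify every window, and index the content by the dominant genre. The features, the toy classifier, and the window parameters are assumptions, not details from the filing.

```python
from collections import Counter

# Hypothetical per-second audio/video features for a piece of video content;
# real systems would compute these with learned audio and visual encoders.
audio = [0.9, 0.8, 0.85, 0.2, 0.1, 0.15, 0.9, 0.95]   # e.g. "laughter energy"
video = [0.1, 0.2, 0.15, 0.8, 0.9, 0.85, 0.2, 0.1]    # e.g. "fast-cut motion"
WINDOW, STRIDE = 4, 2                                  # assumed sliding-window parameters

def classify(audio_feats, video_feats):
    """Toy stand-in for a trained genre classifier over one window's features."""
    return "comedy" if sum(audio_feats) > sum(video_feats) else "action"

# Sample sliding windows, classify each, then index the content by the dominant genre.
votes = Counter()
for start in range(0, len(audio) - WINDOW + 1, STRIDE):
    votes[classify(audio[start:start + WINDOW], video[start:start + WINDOW])] += 1

genre = votes.most_common(1)[0][0]
print({"genre": genre, "window_votes": dict(votes)})   # index entry for the video content
```
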
  • Publication number: 20230316753
    Abstract: Systems, methods, and a computer-readable medium are provided for matching textless elements to texted elements in video content. A video processing system including a textless matching system may divide a video into shots, identify shots having similar durations, identify sequences of shots having similar durations, and compare image content in representative frames of the sequences to determine whether the sequences match. When the sequences are determined to match, the sequences may be paired, wherein the first sequence may include shots with overlaid text and the second sequence may include textless versions of the corresponding texted shots included in the first sequence. In some examples, the video processing system may further replace the determined corresponding texted shots.
    Type: Application
    Filed: May 26, 2022
    Publication date: October 5, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Mattan SERRY, Zvi FIGOV, Yonit HOFFMAN, Maayan YEDIDIA