Patents by Inventor Ratin Kumar

Ratin Kumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20260100041
    Abstract: In various examples, multiple sensors of an ego-machine may be used to generate corresponding streams of sensor data, and a multi-modal language model may be used to select a stream from among multiple streams to evaluate for a given detection task. Taking image data generated using multiple exterior cameras as an example, a vision language model (VLM) may be prompted to evaluate an image from each camera (e.g., resized to a designated resolution supported by the VLM) and identify a relevant camera for a designated task. The VLM may be prompted at a designated frame rate, and once the VLM identifies a camera or corresponding video stream, the VLM may be prompted with a (e.g., subsequent, higher resolution) frame of image data from the identified camera to perform one or more subsequent tasks (e.g., a designated detection task, identify an ROI within the higher resolution frame of image data, etc.).
    Type: Application
    Filed: October 3, 2024
    Publication date: April 9, 2026
    Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
  • Publication number: 20260100042
    Abstract: In various examples, a multi-modal language model such as a vision language model (VLM) may be iteratively prompted to identify and/or refine a region of interest (ROI) to evaluate for a designated detection task. For example, an initial prompt may broadly focus the VLM on identifying one or more ROIs within an image. After identifying an initial ROI, the VLM may be prompted to refine the ROI, for example, by prompting the VLM to evaluate the initial ROI and identify an ROI with more specific content than the initial prompt did, or by evaluating successive frames generated over time until a measure of confidence that an identified ROI contains the designated content meets a designated threshold, upon which the multi-modal language model may be prompted to perform the detection task on the identified ROI.
    Type: Application
    Filed: October 3, 2024
    Publication date: April 9, 2026
    Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
  • Publication number: 20260097781
    Abstract: In various examples, a multi-modal language model may be split up and hosted by multiple devices. For example, a modality (e.g., vision, audio) encoder and/or projector of the multi-modal language model (e.g., a vision language model) may be hosted on one device (e.g., an in-vehicle SoC) that encodes raw sensor data into corresponding tokens and streams the tokens to a second device (e.g., an external graphic processing unit (GPU) or artificial intelligence (AI) accelerator) that hosts an inference server and a language model (LM) of the multi-modal language model. The LM may return a response indicating the result(s) of the requested detection task, and the response may be used to take some responsive action (e.g., control one or more operations of an ego-machine).
    Type: Application
    Filed: October 3, 2024
    Publication date: April 9, 2026
    Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
  • Patent number: 12597266
    Abstract: In various examples, low power proximity based threat detection using optical flow for vehicle systems and applications is provided. Some embodiments may use a tiered framework that uses sensor fusion techniques to detect and track the movement of a threat candidate, and perform a threat classification and/or intent prediction as the threat candidate approaches. Relative depth indications from optical flow, computed using data from image sensors, can be used to initially segment and track a moving object over a sequence of image frames. Additional sensors and processing may be brought online when a moving object becomes close enough to be considered a higher risk threat candidate. A threat response system may generate a risk score based on a predicted intent of a threat candidate, and when the risk score exceeds a certain threshold, then the threat response system may respond accordingly based on the threat classification and/or risk score.
    Type: Grant
    Filed: June 1, 2023
    Date of Patent: April 7, 2026
    Assignee: NVIDIA Corporation
    Inventors: Shagan Sah, Niranjan Avadhanam, Rajath Shetty, Ratin Kumar, Yile Chen
  • Publication number: 20260023553
    Abstract: In various examples, a technique for resolving a standards violation includes receiving a violation notification of a standards violation detected in a software codebase. The technique also includes determining additional information relevant to the standards violation and included in one or more information sources. The technique further includes generating a prompt based at least on the violation notification and the additional information, generating, using a machine learning model and based at least on the prompt, one or more corrective suggestions associated with the standards violation, and modifying the software codebase based at least on the one or more software code changes.
    Type: Application
    Filed: July 17, 2024
    Publication date: January 22, 2026
    Inventors: Niral Lalit Pathak, Rajath Bellipady Shetty, Chandana Neerukonda, Ratin Kumar
  • Patent number: 12487581
    Abstract: Approaches provide for performance of a complex (e.g., compound) task that may involve multiple discrete tasks not obvious from an instruction to perform the complex task. A set of conditions for an environment can be determined using captured image data, and the instruction analyzed to determine a set of final conditions to exist in the environment after performance of the instruction. These initial and end conditions are used to determine a sequence of discrete tasks to be performed to cause a robot or automated device to perform the instruction. This can involve use of a symbolic or visual planner in at least some embodiments, as well as a search of possible sequences of actions available for the robot or automated device. A robot can be caused to perform the sequence of discrete tasks, and feedback provided such that the sequence of tasks can be modified as appropriate.
    Type: Grant
    Filed: March 17, 2022
    Date of Patent: December 2, 2025
    Assignee: NVIDIA Corporation
    Inventors: Christopher Jason Paxton, Shagan Sah, Ratin Kumar, Dieter Fox
  • Publication number: 20250292687
    Abstract: A vision language model (VLM) may be used to evaluate signs that designate restricted or toll lanes, determine whether it is permissible (and/or the cost) to merge into a restricted or toll lane, and/or determine when to merge out of a restricted or toll lane based on the cost. Frames from one or more (e.g., front-facing) camera(s) may be evaluated for applicable signs (e.g., using a sign recognition DNN or a VLM). If detected, the (e.g., cropped) image of the sign may be provided as input to a VLM with a textual prompt instructing the VLM to determine whether to drive in the restricted or toll lane (e.g., whether it can be taken within budget) and/or what the cost would be. The generated response may be provided to an ADAS to trigger an initiation of a merge left or right or a determination to stay in the current lane.
    Type: Application
    Filed: August 1, 2024
    Publication date: September 18, 2025
    Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar
  • Publication number: 20250292595
    Abstract: Some embodiments relate to driver or occupant monitoring using vision language models (VLMs). Any number of DNNs in a detection pipeline may be replaced with a VLM, and the VLM may be prompted to determine whether a corresponding feature is present in an image or sampled frames from a video. To facilitate using the VLM(s) to control one or more downstream actions, the VLM(s) may be prompted using structured inputs, and a designated output format for a corresponding structured output may be enforced in any suitable manner. As such, any number of VLMs may be used to perform any number of driver and/or occupant monitoring tasks (e.g., driver drowsiness detection, driver distraction detection, driver or occupant out-of-position detection, driver or occupant identification, seatbelt usage detection, occupant presence detection, occupant classification, child presence detection, gesture recognition, occlusion detection, and/or others).
    Type: Application
    Filed: August 1, 2024
    Publication date: September 18, 2025
    Inventors: Chandana Neerukonda, Niral Lalit Pathak, Rajath Bellipady Shetty, Ratin Kumar, Niranjan Avadhanam
  • Publication number: 20250292557
    Abstract: In some embodiments, the same vision language model (VLM) may be used to support different types of detection tasks (e.g., one foundational VLM supporting some or all detection tasks performed by an ego-machine, one VLM for interior sensing tasks and one for exterior sensing tasks, etc.), and an inference scheduler may be used to serve or handle inference requests for the VLM(s) to perform the different tasks. In some embodiments, the scheduler prioritizes inference requests based on safety (e.g., prioritizing inference requests to perform ADAS tasks such as pedestrian detection, bicycle detection, or trajectory planning over requests to perform driver or occupant monitoring tasks, prioritizing exterior sensing tasks over interior sensing tasks, etc.). As such, the scheduler may queue, manage, and distribute inference requests from different detection applications to the VLM(s), and receive and return responses to corresponding detection task managers.
    Type: Application
    Filed: August 1, 2024
    Publication date: September 18, 2025
    Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar
  • Publication number: 20250289456
    Abstract: Some embodiments relate to environmental text perception using vision language models (VLMs). For example, an Advanced Driver Assistance System (ADAS) may identify candidate parking spaces, and a VLM may be used to evaluate parking signs and determine whether it is permissible and/or the cost to park in a candidate parking space. For example, frames from corresponding (e.g., front-facing, repeater, side pillar) camera(s) may be evaluated for corresponding parking signs (e.g., using a sign recognition DNN or a VLM). If a parking sign is detected, the image of the sign may be provided as input to a VLM with a textual prompt instructing the VLM to determine whether it is permissible to park at a corresponding location (and if so, the cost). The generated response may be provided to the ADAS to confirm or invalidate the candidate parking space, and a representation of the results may be provided to the driver.
    Type: Application
    Filed: August 1, 2024
    Publication date: September 18, 2025
    Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar
  • Publication number: 20250222958
    Abstract: A system and method for an on-demand shuttle, bus, or taxi service able to operate on private and public roads provides situational awareness and confidence displays. The shuttle may include ISO 26262 Level 4 or Level 5 functionality and can vary the route dynamically on-demand, and/or follow a predefined route or virtual rail. The shuttle is able to stop at any predetermined station along the route. The system allows passengers to request rides and interact with the system via a variety of interfaces, including without limitation a mobile device, desktop computer, or kiosks. Each shuttle preferably includes an in-vehicle controller, which preferably is an AI Supercomputer designed and optimized for autonomous vehicle functionality, with computer vision, deep learning, and real time ray tracing accelerators. An AI Dispatcher performs AI simulations to optimize system performance according to operator-specified system parameters.
    Type: Application
    Filed: January 8, 2024
    Publication date: July 10, 2025
    Inventors: Gary Hicok, Michael Cox, Miguel Sainz, Martin Hempel, Ratin Kumar, Timo Roman, Gordon Grigor, David Nister, Justin Ebert, Chin-Hsien Shih, Tony Tam, Ruchi Bhargava
  • Publication number: 20250170958
    Abstract: Systems and methods for a self-adjusting vehicle mirror. The mirror automatically locates the face of the driver or another passenger, and orients the mirror to provide the driver/passenger face with a desired view from the mirror. The mirror may continue to reorient itself as the driver or passenger shifts position, to continuously provide a desired field of view even as he or she changes position over time. In certain embodiments, the mirror system of the disclosure can be a self-contained system, with the mirror, mirror actuator, camera, and computing device all contained within the mirror housing as a single integrated unit.
    Type: Application
    Filed: January 27, 2025
    Publication date: May 29, 2025
    Inventors: Feng Hu, Niranjan Avadhanam, Ratin Kumar, Simon John Baker
  • Publication number: 20250136130
    Abstract: Various embodiments of the present disclosure relate to operator assistance based on extracting natural language characters from one or more sensed objects. For instance, particular embodiments may generate a natural language utterance based on extracting natural language text in a nearby traffic sign. In an illustrative example, particular embodiments may detect, via object detection and within image data, one or more regions of the image data depicting the traffic sign. Particular embodiments can then extract one or more first natural language characters represented in the traffic sign based at least on performing optical character recognition within the one or more regions of the image data in response to detecting the one or more regions of the image data depicting the traffic sign.
    Type: Application
    Filed: November 1, 2023
    Publication date: May 1, 2025
    Inventors: Rajath Shetty, Ratin Kumar, Niral Lalit Pathak, Niranjan Avadhanam
  • Publication number: 20250136134
    Abstract: Various embodiments of the present disclosure relate to operator assistance based on operator monitoring. For instance, during long drives, a driver may become drowsy or may not otherwise be alert. As such, particular embodiments have the capability of starting a conversation with the driver based on driver interests and/or detecting that the driver is getting drowsy. In an illustrative example, a Driver Monitoring System (DMS) camera of a vehicle may employ a component that derives pixel-level information showing head nodding, hands dropping, or the like. Based on image pattern characteristics in the image data, particular embodiments generate a score representing an alertness level. A representation of the alertness level can be provided as input to a machine learning model so that the model may generate a suitable natural language or other response, such as starting a conversation with personalized trivia, sending a control signal to honk a horn, or the like.
    Type: Application
    Filed: November 1, 2023
    Publication date: May 1, 2025
    Inventors: Rajath Shetty, Ratin Kumar, Niral Lalit Pathak, Niranjan Avadhanam
  • Publication number: 20250065844
    Abstract: In various examples, infrared image data may be used to detect a subcutaneous characteristic(s) (e.g., a palm vein topology) of a person (e.g., a person requesting entry to a vehicle, a vehicle occupant) and authenticate the user based on the detected subcutaneous characteristic(s). For example, infrared image data representing one or more acquired subcutaneous characteristics (e.g., a topology of veins and/or other blood vessels in a region of the authenticating user's palm, hand, neck, forearm, face, fingertip, eye, etc.) may be generated. Hand and/or palm detection may be applied to detect a region depicting the user's hand or palm, and that region (or some subset thereof) may be segmented to generate a representation of an acquired vein topology. The acquired vein topology may be compared with one or more reference vein topologies stored in a database to determine whether the acquired vein topology matches one of the reference vein topologies.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 27, 2025
    Inventors: Rajath Shetty, Braeden Chance Syrnyk, Ratin Kumar
  • Publication number: 20250065920
    Abstract: A system and method for an on-demand shuttle, bus, or taxi service able to operate on private and public roads provides situational awareness and confidence displays. The shuttle may include ISO 26262 Level 4 or Level 5 functionality and can vary the route dynamically on-demand, and/or follow a predefined route or virtual rail. The shuttle is able to stop at any predetermined station along the route. The system allows passengers to request rides and interact with the system via a variety of interfaces, including without limitation a mobile device, desktop computer, or kiosks. Each shuttle preferably includes an in-vehicle controller, which preferably is an AI Supercomputer designed and optimized for autonomous vehicle functionality, with computer vision, deep learning, and real time ray tracing accelerators. An AI Dispatcher performs AI simulations to optimize system performance according to operator-specified system parameters.
    Type: Application
    Filed: November 8, 2024
    Publication date: February 27, 2025
    Inventors: Gary Hicok, Michael Cox, Miguel Sainz, Martin Hempel, Ratin Kumar, Timo Roman, Gordon Grigor, David Nister, Justin Ebert, Chin-Hsien Shih, Tony Tam, Ruchi Bhargava
  • Publication number: 20250045996
    Abstract: In various examples, a virtually animated and interactive agent may be rendered for visual and audible communication with one or more users with an application. For example, a conversational artificial intelligence (AI) assistant may be rendered and displayed for visual communication in addition to audible communication with end-users. As such, the AI assistant may leverage the visual domain—in addition to the audible domain—to more clearly communicate with users, including interacting with a virtual environment in which the AI assistant is rendered. Similarly, the AI assistant may leverage audio, video, and/or text inputs from a user to determine a request, mood, gesture, and/or posture of a user for more accurately responding to and interacting with the user.
    Type: Application
    Filed: October 21, 2024
    Publication date: February 6, 2025
    Inventors: Rev Lebaredian, Simon Yuen, Santanu Dutta, Jonathan Michael Cohen, Ratin Kumar
  • Patent number: 12208732
    Abstract: Systems and methods for a self-adjusting vehicle mirror. The mirror automatically locates the face of the driver or another passenger, and orients the mirror to provide the driver/passenger face with a desired view from the mirror. The mirror may continue to reorient itself as the driver or passenger shifts position, to continuously provide a desired field of view even as he or she changes position over time. In certain embodiments, the mirror system of the disclosure can be a self-contained system, with the mirror, mirror actuator, camera, and computing device all contained within the mirror housing as a single integrated unit.
    Type: Grant
    Filed: January 27, 2020
    Date of Patent: January 28, 2025
    Assignee: NVIDIA Corporation
    Inventors: Feng Hu, Niranjan Avadhanam, Ratin Kumar, Simon John Baker
  • Patent number: 12205210
    Abstract: In various examples, a virtually animated and interactive agent may be rendered for visual and audible communication with one or more users with an application. For example, a conversational artificial intelligence (AI) assistant may be rendered and displayed for visual communication in addition to audible communication with end-users. As such, the AI assistant may leverage the visual domain—in addition to the audible domain—to more clearly communicate with users, including interacting with a virtual environment in which the AI assistant is rendered. Similarly, the AI assistant may leverage audio, video, and/or text inputs from a user to determine a request, mood, gesture, and/or posture of a user for more accurately responding to and interacting with the user.
    Type: Grant
    Filed: May 12, 2021
    Date of Patent: January 21, 2025
    Assignee: NVIDIA Corporation
    Inventors: Rev Lebaredian, Simon Yuen, Santanu Dutta, Jonathan Michael Cohen, Ratin Kumar
  • Publication number: 20240404296
    Abstract: In various examples, low power proximity based threat detection using optical flow for vehicle systems and applications is provided. Some embodiments may use a tiered framework that uses sensor fusion techniques to detect and track the movement of a threat candidate, and perform a threat classification and/or intent prediction as the threat candidate approaches. Relative depth indications from optical flow, computed using data from image sensors, can be used to initially segment and track a moving object over a sequence of image frames. Additional sensors and processing may be brought online when a moving object becomes close enough to be considered a higher risk threat candidate. A threat response system may generate a risk score based on a predicted intent of a threat candidate, and when the risk score exceeds a certain threshold, then the threat response system may respond accordingly based on the threat classification and/or risk score.
    Type: Application
    Filed: June 1, 2023
    Publication date: December 5, 2024
    Inventors: Shagan Sah, Niranjan Avadhanam, Rajath Shetty, Ratin Kumar, Yile Chen
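
Illustrative sketch (not taken from the patent): the tiered framework described in the threat detection abstract above (publication number 20240404296) might be pictured roughly as follows. The abstract does not specify data structures, units, or threshold values, so every name and number here (Candidate, relative_depth, intent_risk, NEAR_DEPTH, RISK_THRESHOLD, select_tier) is a hypothetical assumption chosen only to show the escalation logic: stay in low-power tracking while the object is far, bring additional processing online when it is close, and respond only when the risk score exceeds a threshold.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    TRACK = "track"        # low-power optical-flow tracking only
    CLASSIFY = "classify"  # additional sensors and processing brought online
    RESPOND = "respond"    # risk score exceeded the response threshold


@dataclass
class Candidate:
    # Hypothetical normalized relative depth from optical flow (0 = adjacent, 1 = far).
    relative_depth: float
    # Hypothetical risk score in [0, 1] derived from predicted intent.
    intent_risk: float


# Assumed values for illustration; the abstract only says "a certain threshold".
NEAR_DEPTH = 0.3
RISK_THRESHOLD = 0.7


def select_tier(candidate: Candidate) -> Tier:
    """Pick the processing tier for a tracked moving object."""
    if candidate.relative_depth > NEAR_DEPTH:
        # Still far away: the risk score is ignored and only tracking runs.
        return Tier.TRACK
    if candidate.intent_risk > RISK_THRESHOLD:
        # Close and the risk score exceeds the threshold: trigger a response.
        return Tier.RESPOND
    # Close but not yet above the threshold: keep classifying and predicting intent.
    return Tier.CLASSIFY
```

For example, select_tier(Candidate(relative_depth=0.2, intent_risk=0.9)) would select the respond tier under these assumed thresholds, while the same risk score at relative_depth=0.8 would remain in low-power tracking.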