Patents by Inventor Ratin Kumar
Ratin Kumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260100041
Abstract: In various examples, multiple sensors of an ego-machine may be used to generate corresponding streams of sensor data, and a multi-modal language model may be used to select a stream from among multiple streams to evaluate for a given detection task. Taking image data generated using multiple exterior cameras as an example, a vision language model (VLM) may be prompted to evaluate an image from each camera (e.g., resized to a designated resolution supported by the VLM) and identify a relevant camera for a designated task. The VLM may be prompted at a designated frame rate, and once the VLM identifies a camera or corresponding video stream, the VLM may be prompted with a (e.g., subsequent, higher resolution) frame of image data from the identified camera to perform one or more subsequent tasks (e.g., a designated detection task, identify an ROI within the higher resolution frame of image data, etc.).
Type: Application
Filed: October 3, 2024
Publication date: April 9, 2026
Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
-
Publication number: 20260100042
Abstract: In various examples, a multi-modal language model such as a vision language model (VLM) may be iteratively prompted to identify and/or refine a region of interest (ROI) to evaluate for a designated detection task. For example, an initial prompt may broadly focus the VLM on identifying one or more ROIs within an image. After identifying an initial ROI, the VLM may be prompted to refine the ROI, for example, by prompting the VLM to evaluate the initial ROI and identify an ROI with more specific content than the initial prompt did, or by evaluating successive frames generated over time until a measure of confidence that an identified ROI contains the designated content meets a designated threshold, upon which the multi-modal language model may be prompted to perform the detection task on the identified ROI.
Type: Application
Filed: October 3, 2024
Publication date: April 9, 2026
Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
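The refinement loop described in the abstract above can be illustrated with a minimal sketch. This is not the method claimed in the application: `RoiGuess`, `refine_roi`, and `fake_propose` are hypothetical names, the 0.8 threshold is an arbitrary assumption, and `fake_propose` is a stand-in for a real model call so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels


@dataclass
class RoiGuess:
    box: Box
    confidence: float  # assumed to lie in [0.0, 1.0]


def refine_roi(
    frames: Iterable[Any],
    propose: Callable[[Any, Optional[Box]], RoiGuess],
    threshold: float = 0.8,  # assumed value, not taken from the source
) -> Optional[RoiGuess]:
    """Query `propose` on successive frames until confidence meets `threshold`.

    `propose` stands in for prompting a model with a frame plus the best ROI
    found so far. Returns None if the frames run out first.
    """
    best: Optional[RoiGuess] = None
    for frame in frames:
        prior = best.box if best else None
        guess = propose(frame, prior)
        if best is None or guess.confidence > best.confidence:
            best = guess
        if best.confidence >= threshold:
            return best
    return None


def fake_propose(frame: Any, prior: Optional[Box]) -> RoiGuess:
    """Stand-in for a model call: tightens the box and grows more confident."""
    if prior is None:
        return RoiGuess((0, 0, 100, 100), 0.4)
    x0, y0, x1, y1 = prior
    confidence = min(1.0, 0.4 + 0.25 * (x0 // 10 + 1))
    return RoiGuess((x0 + 10, y0 + 10, x1 - 10, y1 - 10), confidence)


result = refine_roi(["frame-1", "frame-2", "frame-3", "frame-4"], fake_propose)
print(result)
```

With the stand-in above, the loop stops on the third frame, once the confidence first reaches the threshold; the fourth frame is never evaluated.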
-
Publication number: 20260097781
Abstract: In various examples, a multi-modal language model may be split up and hosted by multiple devices. For example, a modality (e.g., vision, audio) encoder and/or projector of the multi-modal language model (e.g., a vision language model) may be hosted on one device (e.g., an in-vehicle SoC) that encodes raw sensor data into corresponding tokens and streams the tokens to a second device (e.g., an external graphics processing unit (GPU) or artificial intelligence (AI) accelerator) that hosts an inference server and a language model (LM) of the multi-modal language model. The LM may return a response indicating the result(s) of the requested detection task, and the response may be used to take some responsive action (e.g., control one or more operations of an ego-machine).
Type: Application
Filed: October 3, 2024
Publication date: April 9, 2026
Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
-
Patent number: 12597266
Abstract: In various examples, low power proximity based threat detection using optical flow for vehicle systems and applications are provided. Some embodiments may use a tiered framework that uses sensor fusion techniques to detect and track the movement of a threat candidate, and perform a threat classification and/or intent prediction as the threat candidate approaches. Relative depth indications from optical flow, computed using data from image sensors, can be used to initially segment and track a moving object over a sequence of image frames. Additional sensors and processing may be brought online when a moving object becomes close enough to be considered a higher risk threat candidate. A threat response system may generate a risk score based on a predicted intent of a threat candidate, and when the risk score exceeds a certain threshold, then the threat response system may respond accordingly based on the threat classification and/or risk score.
Type: Grant
Filed: June 1, 2023
Date of Patent: April 7, 2026
Assignee: NVIDIA Corporation
Inventors: Shagan Sah, Niranjan Avadhanam, Rajath Shetty, Ratin Kumar, Yile Chen
-
Publication number: 20260023553
Abstract: In various examples, a technique for resolving a standards violation includes receiving a violation notification of a standards violation detected in a software codebase. The technique also includes determining additional information relevant to the standards violation and included in one or more information sources. The technique further includes generating a prompt based at least on the violation notification and the additional information, generating, using a machine learning model and based at least on the prompt, one or more corrective suggestions associated with the standards violation, and modifying the software codebase based at least on the one or more software code changes.
Type: Application
Filed: July 17, 2024
Publication date: January 22, 2026
Inventors: Niral Lalit PATHAK, Rajath Bellipady SHETTY, Chandana NEERUKONDA, Ratin KUMAR
-
Patent number: 12487581
Abstract: Approaches provide for performance of a complex (e.g., compound) task that may involve multiple discrete tasks not obvious from an instruction to perform the complex task. A set of conditions for an environment can be determined using captured image data, and the instruction analyzed to determine a set of final conditions to exist in the environment after performance of the instruction. These initial and end conditions are used to determine a sequence of discrete tasks to be performed to cause a robot or automated device to perform the instruction. This can involve use of a symbolic or visual planner in at least some embodiments, as well as a search of possible sequences of actions available for the robot or automated device. A robot can be caused to perform the sequence of discrete tasks, and feedback provided such that the sequence of tasks can be modified as appropriate.
Type: Grant
Filed: March 17, 2022
Date of Patent: December 2, 2025
Assignee: Nvidia Corporation
Inventors: Christopher Jason Paxton, Shagan Sah, Ratin Kumar, Dieter Fox
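As a rough illustration of searching for a task sequence between initial and final conditions, as the abstract above describes, the sketch below runs a breadth-first search over symbolic states. It is a generic STRIPS-style toy, not the planner from the patent: the `Action` type, the `plan` function, and every condition and action name (such as `open_drawer`) are invented for this example.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    preconditions: FrozenSet[str]  # conditions that must hold before acting
    add: FrozenSet[str]            # conditions made true by the action
    remove: FrozenSet[str]         # conditions made false by the action


def plan(
    initial: Iterable[str], goal: Iterable[str], actions: List[Action]
) -> Optional[List[str]]:
    """Breadth-first search for a shortest action sequence reaching `goal`."""
    start, target = frozenset(initial), frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if target <= state:  # every goal condition holds in this state
            return steps
        for action in actions:
            if action.preconditions <= state:
                nxt = (state - action.remove) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [action.name]))
    return None  # no sequence of the available actions reaches the goal


def fs(*items: str) -> FrozenSet[str]:
    return frozenset(items)


# Hypothetical world for the instruction "put the mug in the drawer".
ACTIONS = [
    Action("open_drawer", fs("drawer_closed", "hand_empty"),
           fs("drawer_open"), fs("drawer_closed")),
    Action("pick_mug", fs("mug_on_table", "hand_empty"),
           fs("holding_mug"), fs("mug_on_table", "hand_empty")),
    Action("place_mug_in_drawer", fs("holding_mug", "drawer_open"),
           fs("mug_in_drawer", "hand_empty"), fs("holding_mug")),
    Action("close_drawer", fs("drawer_open", "hand_empty"),
           fs("drawer_closed"), fs("drawer_open")),
]
INITIAL = {"drawer_closed", "mug_on_table", "hand_empty"}
GOAL = {"mug_in_drawer", "drawer_closed"}

print(plan(INITIAL, GOAL, ACTIONS))
# ['open_drawer', 'pick_mug', 'place_mug_in_drawer', 'close_drawer']
```

The instruction itself never mentions opening or closing the drawer; those steps fall out of the search because picking up the mug first leaves the hand occupied and the drawer shut, which is a dead end in this toy world.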
-
Publication number: 20250292687
Abstract: A vision language model (VLM) may be used to evaluate signs that designate restricted or toll lanes, determine whether it is permissible (and/or the cost) to merge into a restricted or toll lane, and/or determine when to merge out of a restricted or toll lane based on the cost. Frames from one or more (e.g., front-facing) camera(s) may be evaluated for applicable signs (e.g., using a sign recognition DNN or a VLM). If detected, the (e.g., cropped) image of the sign may be provided as input to a VLM with a textual prompt instructing the VLM to determine whether to drive in the restricted or toll lane (e.g., whether it can be taken within budget) and/or what the cost would be. The generated response may be provided to an ADAS to trigger an initiation of a merge left or right or a determination to stay in the current lane.
Type: Application
Filed: August 1, 2024
Publication date: September 18, 2025
Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar
-
Publication number: 20250292595
Abstract: Some embodiments relate to driver or occupant monitoring using vision language models (VLMs). Any number of DNNs in a detection pipeline may be replaced with a VLM, and the VLM may be prompted to determine whether a corresponding feature is present in an image or sampled frames from a video. To facilitate using the VLM(s) to control one or more downstream actions, the VLM(s) may be prompted using structured inputs, and a designated output format for a corresponding structured output may be enforced in any suitable manner. As such, any number of VLMs may be used to perform any number of driver and/or occupant monitoring tasks (e.g., driver drowsiness detection, driver distraction detection, driver or occupant out-of-position detection, driver or occupant identification, seatbelt usage detection, occupant presence detection, occupant classification, child presence detection, gesture recognition, occlusion detection, and/or others).
Type: Application
Filed: August 1, 2024
Publication date: September 18, 2025
Inventors: Chandana Neerukonda, Niral Lalit Pathak, Rajath Bellipady Shetty, Ratin Kumar, Niranjan Avadhanam
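The abstract above leaves the enforcement of a designated output format open ("in any suitable manner"). One simple possibility, sketched here purely as an assumption, is to ask for JSON in the prompt and reject any reply that does not match a fixed set of typed fields. The field names, `build_prompt`, and `parse_reply` are hypothetical and do not come from the application.

```python
import json
from typing import Any, Dict

# Hypothetical output contract; the field names are assumptions for this sketch.
EXPECTED_FIELDS = {"feature": str, "present": bool, "confidence": float}


def build_prompt(feature: str) -> str:
    """Build a structured prompt that names the required JSON reply format."""
    return (
        f"Is the feature '{feature}' present in the image? "
        'Reply with JSON only: {"feature": string, "present": boolean, '
        '"confidence": number between 0 and 1}.'
    )


def parse_reply(raw: str) -> Dict[str, Any]:
    """Validate a model reply against EXPECTED_FIELDS or raise ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != set(EXPECTED_FIELDS):
        raise ValueError("reply does not match the expected fields")
    checked: Dict[str, Any] = {}
    for name, expected_type in EXPECTED_FIELDS.items():
        value = data[name]
        # A JSON integer is acceptable where a float is expected; a boolean is not.
        if expected_type is float and type(value) is int:
            value = float(value)
        if type(value) is not expected_type:
            raise ValueError(f"field {name!r} has the wrong type")
        checked[name] = value
    if not 0.0 <= checked["confidence"] <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    return checked


print(build_prompt("seatbelt_fastened"))
print(parse_reply('{"feature": "seatbelt_fastened", "present": true, "confidence": 0.93}'))
```

A downstream action would then consume only the validated dictionary, and a reply that fails validation could be retried or discarded rather than acted on.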
-
Publication number: 20250292557
Abstract: In some embodiments, the same vision language model (VLM) may be used to support different types of detection tasks (e.g., one foundational VLM supporting some or all detection tasks performed by an ego-machine, one VLM for interior sensing tasks and one for exterior sensing tasks, etc.), and an inference scheduler may be used to serve or handle inference requests for the VLM(s) to perform the different tasks. In some embodiments, the scheduler prioritizes inference requests based on safety (e.g., prioritizing inference requests to perform ADAS tasks such as pedestrian detection, bicycle detection, or trajectory planning over requests to perform driver or occupant monitoring tasks, prioritizing exterior sensing tasks over interior sensing tasks, etc.). As such, the scheduler may queue, manage, and distribute inference requests from different detection applications to the VLM(s), and receive and return responses to corresponding detection task managers.
Type: Application
Filed: August 1, 2024
Publication date: September 18, 2025
Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar
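Safety-based prioritization of the kind described in the abstract above is commonly built on a priority queue. The sketch below is one such arrangement under stated assumptions: the three tiers in `PRIORITY` and the `InferenceScheduler` class are invented for illustration, and the application does not specify this data structure.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple

# Assumed tiers (lower number is served first); not taken from the source.
PRIORITY = {"adas": 0, "exterior": 1, "interior": 2}


class InferenceScheduler:
    """Queue inference requests and hand them out by safety tier, FIFO within a tier."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str, Any]] = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def submit(self, category: str, task: str, payload: Any = None) -> None:
        heapq.heappush(
            self._heap, (PRIORITY[category], next(self._order), task, payload)
        )

    def next_request(self) -> Optional[Tuple[str, Any]]:
        """Return the most urgent (task, payload) pair, or None when idle."""
        if not self._heap:
            return None
        _, _, task, payload = heapq.heappop(self._heap)
        return task, payload


scheduler = InferenceScheduler()
scheduler.submit("interior", "driver_drowsiness")
scheduler.submit("adas", "pedestrian_detection")
scheduler.submit("interior", "seatbelt_usage")
scheduler.submit("exterior", "parking_sign_reading")

served = []
while (request := scheduler.next_request()) is not None:
    served.append(request[0])
print(served)
# ['pedestrian_detection', 'parking_sign_reading', 'driver_drowsiness', 'seatbelt_usage']
```

The arrival counter keeps requests within the same tier in first-come order and ensures the heap never has to compare task names or payloads.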
-
Publication number: 20250289456
Abstract: Some embodiments relate to environmental text perception using vision language models (VLMs). For example, an Advanced Driver Assistance System (ADAS) may identify candidate parking spaces, and a VLM may be used to evaluate parking signs and determine whether it is permissible and/or the cost to park in a candidate parking space. For example, frames from corresponding (e.g., front-facing, repeater, side pillar) camera(s) may be evaluated for corresponding parking signs (e.g., using a sign recognition DNN or a VLM). If a parking sign is detected, the image of the sign may be provided as input to a VLM with a textual prompt instructing the VLM to determine whether it is permissible to park at a corresponding location (and if so, the cost). The generated response may be provided to the ADAS to confirm or invalidate the candidate parking space, and a representation of the results may be provided to the driver.
Type: Application
Filed: August 1, 2024
Publication date: September 18, 2025
Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar
-
Publication number: 20250222958
Abstract: A system and method for an on-demand shuttle, bus, or taxi service able to operate on private and public roads provides situational awareness and confidence displays. The shuttle may include ISO 26262 Level 4 or Level 5 functionality and can vary the route dynamically on-demand, and/or follow a predefined route or virtual rail. The shuttle is able to stop at any predetermined station along the route. The system allows passengers to request rides and interact with the system via a variety of interfaces, including without limitation a mobile device, desktop computer, or kiosks. Each shuttle preferably includes an in-vehicle controller, which preferably is an AI Supercomputer designed and optimized for autonomous vehicle functionality, with computer vision, deep learning, and real time ray tracing accelerators. An AI Dispatcher performs AI simulations to optimize system performance according to operator-specified system parameters.
Type: Application
Filed: January 8, 2024
Publication date: July 10, 2025
Inventors: Gary HICOK, Michael COX, Miguel SAINZ, Martin HEMPEL, Ratin KUMAR, Timo ROMAN, Gordon GRIGOR, David NISTER, Justin EBERT, Chin-Hsien SHIH, Tony TAM, Ruchi BHARGAVA
-
Publication number: 20250170958
Abstract: Systems and methods for a self-adjusting vehicle mirror. The mirror automatically locates the face of the driver or another passenger, and orients the mirror to provide the driver/passenger face with a desired view from the mirror. The mirror may continue to reorient itself as the driver or passenger shifts position, to continuously provide a desired field of view even as he or she changes position over time. In certain embodiments, the mirror system of the disclosure can be a self-contained system, with the mirror, mirror actuator, camera, and computing device all contained within the mirror housing as a single integrated unit.
Type: Application
Filed: January 27, 2025
Publication date: May 29, 2025
Inventors: Feng Hu, Niranjan Avadhanam, Ratin Kumar, Simon John Baker
-
Publication number: 20250136130
Abstract: Various embodiments of the present disclosure relate to operator assistance based on extracting natural language characters from one or more sensed objects. For instance, particular embodiments may generate a natural language utterance based on extracting natural language text in a nearby traffic sign. In an illustrative example, particular embodiments may detect, via object detection and within image data, one or more regions of the image data depicting the traffic sign. Particular embodiments can then extract one or more first natural language characters represented in the traffic sign based at least on performing optical character recognition within the one or more regions of the image data in response to detecting the one or more regions of the image data depicting the traffic sign.
Type: Application
Filed: November 1, 2023
Publication date: May 1, 2025
Inventors: Rajath SHETTY, Ratin KUMAR, Niral Lalit PATHAK, Niranjan AVADHANAM
-
Publication number: 20250136134
Abstract: Various embodiments of the present disclosure relate to operator assistance based on operator monitoring. For instance, during long drives, a driver may become drowsy or may not otherwise be alert. As such, particular embodiments have the capability of starting a conversation with the driver based on driver interests and/or detecting that the driver is getting drowsy. In an illustrative example, a Driver Monitoring System (DMS) camera of a vehicle may employ a component that derives pixel-level information showing head nodding, hands dropping, or the like. Based on image pattern characteristics in the image data, particular embodiments generate a score representing an alertness level. A representation of the alertness level can be provided as input to a machine learning model so that the model may generate a suitable natural language or other response, such as starting a conversation with personalized trivia, sending a control signal to honk a horn, or the like.
Type: Application
Filed: November 1, 2023
Publication date: May 1, 2025
Inventors: Rajath SHETTY, Ratin KUMAR, Niral Lalit PATHAK, Niranjan AVADHANAM
-
Publication number: 20250065844
Abstract: In various examples, infrared image data may be used to detect a subcutaneous characteristic(s) (e.g., a palm vein topology) of a person (e.g., a person requesting entry to a vehicle, a vehicle occupant) and authenticate the user based on the detected subcutaneous characteristic(s). For example, infrared image data representing one or more acquired subcutaneous characteristics (e.g., a topology of veins and/or other blood vessels in a region of the authenticating user's palm, hand, neck, forearm, face, fingertip, eye, etc.) may be generated. Hand and/or palm detection may be applied to detect a region depicting the user's hand or palm, and that region (or some subset thereof) may be segmented to generate a representation of an acquired vein topology. The acquired vein topology may be compared with one or more reference vein topologies stored in a database to determine whether the acquired vein topology matches one of the reference vein topologies.
Type: Application
Filed: August 22, 2023
Publication date: February 27, 2025
Inventors: Rajath SHETTY, Braeden Chance Syrnyk, Ratin Kumar
-
Publication number: 20250065920
Abstract: A system and method for an on-demand shuttle, bus, or taxi service able to operate on private and public roads provides situational awareness and confidence displays. The shuttle may include ISO 26262 Level 4 or Level 5 functionality and can vary the route dynamically on-demand, and/or follow a predefined route or virtual rail. The shuttle is able to stop at any predetermined station along the route. The system allows passengers to request rides and interact with the system via a variety of interfaces, including without limitation a mobile device, desktop computer, or kiosks. Each shuttle preferably includes an in-vehicle controller, which preferably is an AI Supercomputer designed and optimized for autonomous vehicle functionality, with computer vision, deep learning, and real time ray tracing accelerators. An AI Dispatcher performs AI simulations to optimize system performance according to operator-specified system parameters.
Type: Application
Filed: November 8, 2024
Publication date: February 27, 2025
Inventors: Gary HICOK, Michael COX, Miguel SAINZ, Martin HEMPEL, Ratin KUMAR, Timo ROMAN, Gordon GRIGOR, David NISTER, Justin EBERT, Chin-Hsien SHIH, Tony TAM, Ruchi BHARGAVA
-
Publication number: 20250045996
Abstract: In various examples, a virtually animated and interactive agent may be rendered for visual and audible communication with one or more users with an application. For example, a conversational artificial intelligence (AI) assistant may be rendered and displayed for visual communication in addition to audible communication with end-users. As such, the AI assistant may leverage the visual domain, in addition to the audible domain, to more clearly communicate with users, including interacting with a virtual environment in which the AI assistant is rendered. Similarly, the AI assistant may leverage audio, video, and/or text inputs from a user to determine a request, mood, gesture, and/or posture of a user for more accurately responding to and interacting with the user.
Type: Application
Filed: October 21, 2024
Publication date: February 6, 2025
Inventors: Rev Lebaredian, Simon Yuen, Santanu Dutta, Jonathan Michael Cohen, Ratin Kumar
-
Patent number: 12208732
Abstract: Systems and methods for a self-adjusting vehicle mirror. The mirror automatically locates the face of the driver or another passenger, and orients the mirror to provide the driver/passenger face with a desired view from the mirror. The mirror may continue to reorient itself as the driver or passenger shifts position, to continuously provide a desired field of view even as he or she changes position over time. In certain embodiments, the mirror system of the disclosure can be a self-contained system, with the mirror, mirror actuator, camera, and computing device all contained within the mirror housing as a single integrated unit.
Type: Grant
Filed: January 27, 2020
Date of Patent: January 28, 2025
Assignee: NVIDIA Corporation
Inventors: Feng Hu, Niranjan Avadhanam, Ratin Kumar, Simon John Baker
-
Patent number: 12205210
Abstract: In various examples, a virtually animated and interactive agent may be rendered for visual and audible communication with one or more users with an application. For example, a conversational artificial intelligence (AI) assistant may be rendered and displayed for visual communication in addition to audible communication with end-users. As such, the AI assistant may leverage the visual domain, in addition to the audible domain, to more clearly communicate with users, including interacting with a virtual environment in which the AI assistant is rendered. Similarly, the AI assistant may leverage audio, video, and/or text inputs from a user to determine a request, mood, gesture, and/or posture of a user for more accurately responding to and interacting with the user.
Type: Grant
Filed: May 12, 2021
Date of Patent: January 21, 2025
Assignee: NVIDIA Corporation
Inventors: Rev Lebaredian, Simon Yuen, Santanu Dutta, Jonathan Michael Cohen, Ratin Kumar
-
Publication number: 20240404296
Abstract: In various examples, low power proximity based threat detection using optical flow for vehicle systems and applications are provided. Some embodiments may use a tiered framework that uses sensor fusion techniques to detect and track the movement of a threat candidate, and perform a threat classification and/or intent prediction as the threat candidate approaches. Relative depth indications from optical flow, computed using data from image sensors, can be used to initially segment and track a moving object over a sequence of image frames. Additional sensors and processing may be brought online when a moving object becomes close enough to be considered a higher risk threat candidate. A threat response system may generate a risk score based on a predicted intent of a threat candidate, and when the risk score exceeds a certain threshold, then the threat response system may respond accordingly based on the threat classification and/or risk score.
Type: Application
Filed: June 1, 2023
Publication date: December 5, 2024
Inventors: Shagan Sah, Niranjan Avadhanam, Rajath Shetty, Ratin Kumar, Yile Chen