Patents by Inventor Ratin Kumar

Ratin Kumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Machine-learning techniques for dialog processing

Patent number: 12670913

Abstract: Apparatuses, systems, and techniques to identify speakers based on content of speech. In at least one embodiment, one or more speakers are identified based on content of speech.

Type: Grant

Filed: April 10, 2020

Date of Patent: June 30, 2026

Assignee: NVIDIA Corporation

Inventors: Anshul Jain, Sumit Kumar Bhattacharya, Ratin Kumar, Jason Conrad Roche, Shubhadeep Das, Bangqi Wang, Rajath Bellipady Shetty

Query response generation

Patent number: 12664185

Abstract: In various examples, a conversational artificial intelligence (AI) platform uses structured data and unstructured data to generate responses to queries from users. In an example, if data for a response to a query is not stored in a structured data structure, the conversational AI platform searches for the data in an unstructured data structure.

Type: Grant

Filed: February 17, 2022

Date of Patent: June 23, 2026

Assignee: NVIDIA Corporation

Inventors: Shubhadeep Das, Sumit Bhattacharya, Ratin Kumar
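
The fallback described in the "Query response generation" abstract (check the structured store first, then search unstructured data) might be sketched as follows. This is an illustrative sketch only, not the patented method: the data, the keyword-matching search, and the names STRUCTURED, UNSTRUCTURED, lookup_structured, search_unstructured, and answer are all assumptions made for the example.

```python
from typing import List, Optional

# Hypothetical stand-ins for the two data sources named in the abstract.
STRUCTURED = {"store_hours": "9am to 5pm"}
UNSTRUCTURED = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Parking is available behind the building.",
]


def lookup_structured(key: str) -> Optional[str]:
    """Return the structured value for a key, or None if it is not stored."""
    return STRUCTURED.get(key)


def search_unstructured(terms: List[str]) -> Optional[str]:
    """Return the first document containing every term (case-insensitive).

    Plain substring matching is an assumption; the abstract does not say
    how the unstructured search is performed.
    """
    for doc in UNSTRUCTURED:
        text = doc.lower()
        if all(term.lower() in text for term in terms):
            return doc
    return None


def answer(key: str, terms: List[str]) -> Optional[str]:
    """Try the structured store first; fall back to unstructured search."""
    hit = lookup_structured(key)
    if hit is not None:
        return hit
    return search_unstructured(terms)
```

Calling answer("return_policy", ["return", "policy"]) misses the structured store and falls through to the document search, while answer("store_hours", ["hours"]) is served directly from the structured store.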

SENSOR STREAM SELECTION FOR MULTI-MODAL LANGUAGE MODELS FOR AUTONOMOUS MACHINES AND APPLICATIONS

Publication number: 20260100041

Abstract: In various examples, multiple sensors of an ego-machine may be used to generate corresponding streams of sensor data, and a multi-modal language model may be used to select a stream from among multiple streams to evaluate for a given detection task. Taking image data generated using multiple exterior cameras as an example, a vision language model (VLM) may be prompted to evaluate an image from each camera (e.g., resized to a designated resolution supported by the VLM) and identify a relevant camera for a designated task. The VLM may be prompted at a designated frame rate, and once the VLM identifies a camera or corresponding video stream, the VLM may be prompted with a (e.g., subsequent, higher resolution) frame of image data from the identified camera to perform one or more subsequent tasks (e.g., a designated detection task, identify an ROI within the higher resolution frame of image data, etc.).

Type: Application

Filed: October 3, 2024

Publication date: April 9, 2026

Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar

ITERATIVE INPUT REFINEMENT FOR MULTI-MODAL LANGUAGE MODELS

Publication number: 20260100042

Abstract: In various examples, a multi-modal language model such as a vision language model (VLM) may be iteratively prompted to identify and/or refine a region of interest (ROI) to evaluate for a designated detection task. For example, an initial prompt may broadly focus the VLM on identifying one or more ROIs within an image. After identifying an initial ROI, the VLM may be prompted to refine the ROI, for example, by prompting the VLM to evaluate the initial ROI and identify an ROI with more specific content than the initial prompt did, or by evaluating successive frames generated over time until a measure of confidence that an identified ROI contains the designated content meets a designated threshold, upon which the multi-modal language model may be prompted to perform the detection task on the identified ROI.

Type: Application

Filed: October 3, 2024

Publication date: April 9, 2026

Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar
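
The refinement loop described in the "Iterative input refinement" abstract (keep prompting over successive frames until a confidence measure meets a threshold) might look like the sketch below. It is a minimal illustration under stated assumptions: the model is replaced by a hypothetical stub, fake_vlm, that simply shrinks the previous ROI and reports a rising confidence, and the names ROI, refine_until_confident, and fake_vlm do not come from the source.

```python
from typing import Callable, Iterable, Optional, Tuple

ROI = Tuple[int, int, int, int]  # (x, y, width, height)
Proposer = Callable[[int, Optional[ROI]], Tuple[ROI, float]]


def refine_until_confident(
    frames: Iterable[int], propose_roi: Proposer, threshold: float
) -> Optional[ROI]:
    """Refine the ROI frame by frame until confidence meets the threshold.

    Returns the accepted ROI, or None if no frame reaches the threshold.
    """
    roi: Optional[ROI] = None
    for frame in frames:
        roi, confidence = propose_roi(frame, roi)
        if confidence >= threshold:
            return roi
    return None


def fake_vlm(frame: int, previous: Optional[ROI]) -> Tuple[ROI, float]:
    """Hypothetical stub standing in for a prompted VLM.

    The first call returns the whole image; later calls return the
    centered half-size crop of the previous ROI. Confidence rises by
    0.25 per frame index, purely for demonstration.
    """
    if previous is None:
        roi = (0, 0, 640, 480)
    else:
        x, y, w, h = previous
        roi = (x + w // 4, y + h // 4, w // 2, h // 2)
    return roi, 0.25 * (frame + 1)


accepted = refine_until_confident(range(4), fake_vlm, threshold=0.75)
```

In a real system the accepted ROI would then be passed to the model with the detection prompt; the stub here only demonstrates the stopping condition.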

TOKENIZED DATA STREAMING FOR MULTI-MODAL LANGUAGE MODELS

Publication number: 20260097781

Abstract: In various examples, a multi-modal language model may be split up and hosted by multiple devices. For example, a modality (e.g., vision, audio) encoder and/or projector of the multi-modal language model (e.g., a vision language model) may be hosted on one device (e.g., an in-vehicle SoC) that encodes raw sensor data into corresponding tokens and streams the tokens to a second device (e.g., an external graphics processing unit (GPU) or artificial intelligence (AI) accelerator) that hosts an inference server and a language model (LM) of the multi-modal language model. The LM may return a response indicating the result(s) of the requested detection task, and the response may be used to take some responsive action (e.g., control one or more operations of an ego-machine).

Type: Application

Filed: October 3, 2024

Publication date: April 9, 2026

Inventors: Rajath Bellipady Shetty, Niral Lalit Pathak, Ratin Kumar

Low power proximity-based presence detection using optical flow

Patent number: 12597266

Abstract: In various examples, low power proximity based threat detection using optical flow for vehicle systems and applications is provided. Some embodiments may use a tiered framework that uses sensor fusion techniques to detect and track the movement of a threat candidate, and perform a threat classification and/or intent prediction as the threat candidate approaches. Relative depth indications from optical flow, computed using data from image sensors, can be used to initially segment and track a moving object over a sequence of image frames. Additional sensors and processing may be brought online when a moving object becomes close enough to be considered a higher risk threat candidate. A threat response system may generate a risk score based on a predicted intent of a threat candidate, and when the risk score exceeds a certain threshold, the threat response system may respond accordingly based on the threat classification and/or risk score.

Type: Grant

Filed: June 1, 2023

Date of Patent: April 7, 2026

Assignee: NVIDIA Corporation

Inventors: Shagan Sah, Niranjan Avadhanam, Rajath Shetty, Ratin Kumar, Yile Chen

ENFORCING STANDARDS WITH LARGE LANGUAGE MODELS

Publication number: 20260023553

Abstract: In various examples, a technique for resolving a standards violation includes receiving a violation notification of a standards violation detected in a software codebase. The technique also includes determining additional information relevant to the standards violation and included in one or more information sources. The technique further includes generating a prompt based at least on the violation notification and the additional information, generating, using a machine learning model and based at least on the prompt, one or more corrective suggestions associated with the standards violation, and modifying the software codebase based at least on the one or more software code changes.

Type: Application

Filed: July 17, 2024

Publication date: January 22, 2026

Inventors: Niral Lalit Pathak, Rajath Bellipady Shetty, Chandana Neerukonda, Ratin Kumar

Interpreting discrete tasks from complex instructions for robotic systems and applications

Patent number: 12487581

Abstract: Approaches provide for performance of a complex (e.g., compound) task that may involve multiple discrete tasks not obvious from an instruction to perform the complex task. A set of conditions for an environment can be determined using captured image data, and the instruction analyzed to determine a set of final conditions to exist in the environment after performance of the instruction. These initial and end conditions are used to determine a sequence of discrete tasks to be performed to cause a robot or automated device to perform the instruction. This can involve use of a symbolic or visual planner in at least some embodiments, as well as a search of possible sequences of actions available for the robot or automated device. A robot can be caused to perform the sequence of discrete tasks, and feedback provided such that the sequence of tasks can be modified as appropriate.

Type: Grant

Filed: March 17, 2022

Date of Patent: December 2, 2025

Assignee: NVIDIA Corporation

Inventors: Christopher Jason Paxton, Shagan Sah, Ratin Kumar, Dieter Fox

ENVIRONMENTAL TEXT PERCEPTION AND TOLL EVALUATION USING VISION LANGUAGE MODELS

Publication number: 20250292687

Abstract: A vision language model (VLM) may be used to evaluate signs that designate restricted or toll lanes, determine whether it is permissible (and/or the cost) to merge into a restricted or toll lane, and/or determine when to merge out of a restricted or toll lane based on the cost. Frames from one or more (e.g., front-facing) camera(s) may be evaluated for applicable signs (e.g., using a sign recognition DNN or a VLM). If detected, the (e.g., cropped) image of the sign may be provided as input to a VLM with a textual prompt instructing the VLM to determine whether to drive in the restricted or toll lane (e.g., whether it can be taken within budget) and/or what the cost would be. The generated response may be provided to an ADAS to trigger an initiation of a merge left or right or a determination to stay in the current lane.

Type: Application

Filed: August 1, 2024

Publication date: September 18, 2025

Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar

DRIVER AND OCCUPANT MONITORING USING VISION LANGUAGE MODELS

Publication number: 20250292595

Abstract: Some embodiments relate to driver or occupant monitoring using vision language models (VLMs). Any number of DNNs in a detection pipeline may be replaced with a VLM, and the VLM may be prompted to determine whether a corresponding feature is present in an image or sampled frames from a video. To facilitate using the VLM(s) to control one or more downstream actions, the VLM(s) may be prompted using structured inputs, and a designated output format for a corresponding structured output may be enforced in any suitable manner. As such, any number of VLMs may be used to perform any number of driver and/or occupant monitoring tasks (e.g., driver drowsiness detection, driver distraction detection, driver or occupant out-of-position detection, driver or occupant identification, seatbelt usage detection, occupant presence detection, occupant classification, child presence detection, gesture recognition, occlusion detection, and/or others).

Type: Application

Filed: August 1, 2024

Publication date: September 18, 2025

Inventors: Chandana Neerukonda, Niral Lalit Pathak, Rajath Bellipady Shetty, Ratin Kumar, Niranjan Avadhanam

SCHEDULING AND PRIORITIZATION OF VISION LANGUAGE MODEL INFERENCE REQUESTS

Publication number: 20250292557

Abstract: In some embodiments, the same vision language model (VLM) may be used to support different types of detection tasks (e.g., one foundational VLM supporting some or all detection tasks performed by an ego-machine, one VLM for interior sensing tasks and one for exterior sensing tasks, etc.), and an inference scheduler may be used to serve or handle inference requests for the VLM(s) to perform the different tasks. In some embodiments, the scheduler prioritizes inference requests based on safety (e.g., prioritizing inference requests to perform ADAS tasks such as pedestrian detection, bicycle detection, or trajectory planning over requests to perform driver or occupant monitoring tasks, prioritizing exterior sensing tasks over interior sensing tasks, etc.). As such, the scheduler may queue, manage, and distribute inference requests from different detection applications to the VLM(s), and receive and return responses to corresponding detection task managers.

Type: Application

Filed: August 1, 2024

Publication date: September 18, 2025

Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar
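
The safety-based prioritization described in the "Scheduling and prioritization" abstract could be modeled with a priority queue, as in the sketch below. This is an assumption-laden illustration, not the claimed scheduler: the category names and their ranks in PRIORITY, the InferenceScheduler class, and the first-in-first-out tie-break are all hypothetical choices made for the example.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Hypothetical priority ranks; a lower number is served first.
# The ordering follows the abstract's examples: ADAS tasks before
# exterior sensing, exterior sensing before interior sensing.
PRIORITY = {"adas": 0, "exterior": 1, "interior": 2}


class InferenceScheduler:
    """Queue inference requests and release them in priority order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # Monotonic counter keeps arrival order within one priority level.
        self._arrival = itertools.count()

    def submit(self, category: str, request: str) -> None:
        """Enqueue a request under one of the PRIORITY categories."""
        entry = (PRIORITY[category], next(self._arrival), request)
        heapq.heappush(self._heap, entry)

    def next_request(self) -> Optional[str]:
        """Return the highest-priority pending request, or None if idle."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


scheduler = InferenceScheduler()
scheduler.submit("interior", "drowsiness detection")
scheduler.submit("adas", "pedestrian detection")
scheduler.submit("interior", "seatbelt usage detection")
scheduler.submit("exterior", "sign evaluation")
served = [scheduler.next_request() for _ in range(4)]
```

Here the pedestrian detection request is served first even though it was submitted second, and the two interior requests keep their arrival order.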

ENVIRONMENTAL TEXT PERCEPTION AND PARKING EVALUATION USING VISION LANGUAGE MODELS

Publication number: 20250289456

Abstract: Some embodiments relate to environmental text perception using vision language models (VLMs). For example, an Advanced Driver Assistance System (ADAS) may identify candidate parking spaces, and a VLM may be used to evaluate parking signs and determine whether it is permissible and/or the cost to park in a candidate parking space. For example, frames from corresponding (e.g., front-facing, repeater, side pillar) camera(s) may be evaluated for corresponding parking signs (e.g., using a sign recognition DNN or a VLM). If a parking sign is detected, the image of the sign may be provided as input to a VLM with a textual prompt instructing the VLM to determine whether it is permissible to park at a corresponding location (and if so, the cost). The generated response may be provided to the ADAS to confirm or invalidate the candidate parking space, and a representation of the results may be provided to the driver.

Type: Application

Filed: August 1, 2024

Publication date: September 18, 2025

Inventors: Niral Lalit Pathak, Chandana Neerukonda, Rajath Bellipady Shetty, Niranjan Avadhanam, Ratin Kumar

SYSTEMS AND METHODS FOR COMPUTER-ASSISTED SHUTTLES, BUSES, ROBO-TAXIS, RIDE-SHARING AND ON-DEMAND VEHICLES WITH SITUATIONAL AWARENESS

Publication number: 20250222958

Abstract: A system and method for an on-demand shuttle, bus, or taxi service able to operate on private and public roads provides situational awareness and confidence displays. The shuttle may include ISO 26262 Level 4 or Level 5 functionality and can vary the route dynamically on-demand, and/or follow a predefined route or virtual rail. The shuttle is able to stop at any predetermined station along the route. The system allows passengers to request rides and interact with the system via a variety of interfaces, including without limitation a mobile device, desktop computer, or kiosks. Each shuttle preferably includes an in-vehicle controller, which preferably is an AI Supercomputer designed and optimized for autonomous vehicle functionality, with computer vision, deep learning, and real time ray tracing accelerators. An AI Dispatcher performs AI simulations to optimize system performance according to operator-specified system parameters.

Type: Application

Filed: January 8, 2024

Publication date: July 10, 2025

Inventors: Gary Hicok, Michael Cox, Miguel Sainz, Martin Hempel, Ratin Kumar, Timo Roman, Gordon Grigor, David Nister, Justin Ebert, Chin-Hsien Shih, Tony Tam, Ruchi Bhargava

AUTOMATICALLY-ADJUSTING MIRROR FOR USE IN VEHICLES

Publication number: 20250170958

Abstract: Systems and methods for a self-adjusting vehicle mirror. The mirror automatically locates the face of the driver or another passenger, and orients the mirror to provide the driver/passenger face with a desired view from the mirror. The mirror may continue to reorient itself as the driver or passenger shifts position, to continuously provide a desired field of view even as he or she changes position over time. In certain embodiments, the mirror system of the disclosure can be a self-contained system, with the mirror, mirror actuator, camera, and computing device all contained within the mirror housing as a single integrated unit.

Type: Application

Filed: January 27, 2025

Publication date: May 29, 2025

Inventors: Feng Hu, Niranjan Avadhanam, Ratin Kumar, Simon John Baker

MACHINE OPERATION ASSISTANCE USING LANGUAGE MODEL-AUGMENTED PERCEPTION

Publication number: 20250136130

Abstract: Various embodiments of the present disclosure relate to operator assistance based on extracting natural language characters from one or more sensed objects. For instance, particular embodiments may generate a natural language utterance based on extracting natural language text in a nearby traffic sign. In an illustrative example, particular embodiments may detect, via object detection and within image data, one or more regions of the image data depicting the traffic sign. Particular embodiments can then extract one or more first natural language characters represented in the traffic sign based at least on performing optical character recognition within the one or more regions of the image data in response to detecting the one or more regions of the image data depicting the traffic sign.

Type: Application

Filed: November 1, 2023

Publication date: May 1, 2025

Inventors: Rajath Shetty, Ratin Kumar, Niral Lalit Pathak, Niranjan Avadhanam

MACHINE OPERATION ASSISTANCE USING LANGUAGE MODEL-AUGMENTED OPERATOR MONITORING

Publication number: 20250136134

Abstract: Various embodiments of the present disclosure relate to operator assistance based on operator monitoring. For instance, during long drives, a driver may become drowsy or may not otherwise be alert. As such, particular embodiments have the capability of starting a conversation with the driver based on driver interests and/or detecting that the driver is getting drowsy. In an illustrative example, a Driver Monitoring System (DMS) camera of a vehicle may employ a component that derives pixel-level information showing head nodding, hands dropping, or the like. Based on image pattern characteristics in the image data, particular embodiments generate a score representing an alertness level. A representation of the alertness level can be provided as input to a machine learning model so that the model may generate a suitable natural language or other response, such as starting a conversation with personalized trivia, sending a control signal to honk a horn, or the like.

Type: Application

Filed: November 1, 2023

Publication date: May 1, 2025

Inventors: Rajath Shetty, Ratin Kumar, Niral Lalit Pathak, Niranjan Avadhanam

USER AUTHENTICATION FOR VEHICLE ACCESS AND IN-CABIN EXPERIENCE USING INFRARED IMAGING

Publication number: 20250065844

Abstract: In various examples, infrared image data may be used to detect a subcutaneous characteristic(s) (e.g., a palm vein topology) of a person (e.g., a person requesting entry to a vehicle, a vehicle occupant) and authenticate the user based on the detected subcutaneous characteristic(s). For example, infrared image data representing one or more acquired subcutaneous characteristics (e.g., a topology of veins and/or other blood vessels in a region of the authenticating user's palm, hand, neck, forearm, face, fingertip, eye, etc.) may be generated. Hand and/or palm detection may be applied to detect a region depicting the user's hand or palm, and that region (or some subset thereof) may be segmented to generate a representation of an acquired vein topology. The acquired vein topology may be compared with one or more reference vein topologies stored in a database to determine whether the acquired vein topology matches one of the reference vein topologies.

Type: Application

Filed: August 22, 2023

Publication date: February 27, 2025

Inventors: Rajath Shetty, Braeden Chance Syrnyk, Ratin Kumar

SYSTEMS AND METHODS FOR COMPUTER-ASSISTED SHUTTLES, BUSES, ROBO-TAXIS, RIDE-SHARING AND ON-DEMAND VEHICLES WITH SITUATIONAL AWARENESS

Publication number: 20250065920

Abstract: A system and method for an on-demand shuttle, bus, or taxi service able to operate on private and public roads provides situational awareness and confidence displays. The shuttle may include ISO 26262 Level 4 or Level 5 functionality and can vary the route dynamically on-demand, and/or follow a predefined route or virtual rail. The shuttle is able to stop at any predetermined station along the route. The system allows passengers to request rides and interact with the system via a variety of interfaces, including without limitation a mobile device, desktop computer, or kiosks. Each shuttle preferably includes an in-vehicle controller, which preferably is an AI Supercomputer designed and optimized for autonomous vehicle functionality, with computer vision, deep learning, and real time ray tracing accelerators. An AI Dispatcher performs AI simulations to optimize system performance according to operator-specified system parameters.

Type: Application

Filed: November 8, 2024

Publication date: February 27, 2025

Inventors: Gary Hicok, Michael Cox, Miguel Sainz, Martin Hempel, Ratin Kumar, Timo Roman, Gordon Grigor, David Nister, Justin Ebert, Chin-Hsien Shih, Tony Tam, Ruchi Bhargava

CONVERSATIONAL AI PLATFORM WITH RENDERED GRAPHICAL OUTPUT

Publication number: 20250045996

Abstract: In various examples, a virtually animated and interactive agent may be rendered for visual and audible communication with one or more users with an application. For example, a conversational artificial intelligence (AI) assistant may be rendered and displayed for visual communication in addition to audible communication with end-users. As such, the AI assistant may leverage the visual domain—in addition to the audible domain—to more clearly communicate with users, including interacting with a virtual environment in which the AI assistant is rendered. Similarly, the AI assistant may leverage audio, video, and/or text inputs from a user to determine a request, mood, gesture, and/or posture of a user for more accurately responding to and interacting with the user.

Type: Application

Filed: October 21, 2024

Publication date: February 6, 2025

Inventors: Rev Lebaredian, Simon Yuen, Santanu Dutta, Jonathan Michael Cohen, Ratin Kumar

Automatically-adjusting mirror for use in vehicles

Patent number: 12208732

Abstract: Systems and methods for a self-adjusting vehicle mirror. The mirror automatically locates the face of the driver or another passenger, and orients the mirror to provide the driver/passenger face with a desired view from the mirror. The mirror may continue to reorient itself as the driver or passenger shifts position, to continuously provide a desired field of view even as he or she changes position over time. In certain embodiments, the mirror system of the disclosure can be a self-contained system, with the mirror, mirror actuator, camera, and computing device all contained within the mirror housing as a single integrated unit.

Type: Grant

Filed: January 27, 2020

Date of Patent: January 28, 2025

Assignee: NVIDIA Corporation

Inventors: Feng Hu, Niranjan Avadhanam, Ratin Kumar, Simon John Baker
