Patents by Inventor Avnish Sikka

Avnish Sikka has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processing complex utterances for natural language understanding

Publication number: 20230032575

Abstract: A system capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse text of incoming utterances. The models may identify complex utterance structures and may identify what command portions of an utterance go with what conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true so the system may cause the commands to be executed (and stopped) at the appropriate times.

Type: Application

Filed: August 8, 2022

Publication date: February 2, 2023

Inventors: Cengiz Erbas, Thomas Kollar, Avnish Sikka, Spyridon Matsoukas, Simon Peter Reavely
Processing complex utterances for natural language understanding

Patent number: 11410646

Abstract: A system capable of performing natural language understanding (NLU) on utterances including complex command structures such as sequential commands (e.g., multiple commands in a single utterance), conditional commands (e.g., commands that are only executed if a condition is satisfied), and/or repetitive commands (e.g., commands that are executed until a condition is satisfied). Audio data may be processed using automatic speech recognition (ASR) techniques to obtain text. The text may then be processed using machine learning models that are trained to parse text of incoming utterances. The models may identify complex utterance structures and may identify what command portions of an utterance go with what conditional statements. Machine learning models may also identify what data is needed to determine when the conditionals are true so the system may cause the commands to be executed (and stopped) at the appropriate times.

Type: Grant

Filed: March 28, 2019

Date of Patent: August 9, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Cengiz Erbas, Thomas Kollar, Avnish Sikka, Spyridon Matsoukas, Simon Peter Reavely
Item recognition using context data

Patent number: 10043069

Abstract: A system for recognizing objects and/or text in image data may use context data to perform object/text recognition. The system may also use context data when determining potential functions to execute in response to recognizing the object/text. Context data may be gathered based on device sensor data, user profile data such as the behavior of a user or the behavior of those in a user's social network, or other factors. Recognition processing and/or function selection may be configured to account for context data when operating to improve output results.

Type: Grant

Filed: March 4, 2014

Date of Patent: August 7, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Yue Liu, Utkarsh Prateek, Avnish Sikka, Matthew Daniel Hart, Emilie Noelle McConville, Sonjeev Jahagirdar
Determining camera auto-focus settings

Patent number: 9854155

Abstract: A system and method of determining a tilt angle of a portable computing device using a sensor indicating gravitational pull on the device; determining the tilt angle of a camera of the device; identifying a tilt angle range from a plurality of predetermined tilt angle ranges; determining a first focal length setting using a first array that associates the tilt angle range with the first focal length setting; determining an adjustment increment using a second array that associates the adjustment increment with the tilt angle range; and determining a second focal length setting of the camera using the adjustment increment according to an autofocus scan range algorithm. A portable computing device including a processor; a camera; and a memory device including instructions operable to be executed by the processor to perform a set of actions, enabling the portable computing device to perform the method.

Type: Grant

Filed: June 16, 2015

Date of Patent: December 26, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Avnish Sikka, Yue Liu
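The lookup flow described in the abstract for patent 9854155 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the tilt ranges, focal length values, increments, and all function names are hypothetical, and the "autofocus scan range algorithm" is stood in for by a simple stepped scan.

```python
import math

# Hypothetical tilt angle ranges in degrees (lower bound inclusive, upper exclusive).
TILT_RANGES = [(0.0, 30.0), (30.0, 60.0), (60.0, 90.1)]
# First array: tilt range index -> starting focal length setting (arbitrary units).
START_FOCUS = [100, 250, 400]
# Second array: tilt range index -> adjustment increment used during the scan.
INCREMENT = [10, 25, 50]


def tilt_angle_degrees(gx, gy, gz):
    """Angle between the device z axis and gravity, from accelerometer readings."""
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0:
        raise ValueError("zero gravity vector")
    # Clamp to guard against rounding slightly outside [0, 1].
    cos_tilt = max(0.0, min(1.0, abs(gz) / norm))
    return math.degrees(math.acos(cos_tilt))


def tilt_range_index(angle):
    """Pick the predetermined range that contains the tilt angle."""
    for i, (low, high) in enumerate(TILT_RANGES):
        if low <= angle < high:
            return i
    raise ValueError(f"tilt angle {angle} outside known ranges")


def focus_scan(angle, steps=3):
    """Return candidate focal length settings: the start value, then stepped values."""
    i = tilt_range_index(angle)
    return [START_FOCUS[i] + k * INCREMENT[i] for k in range(steps)]
```

Calling `focus_scan(tilt_angle_degrees(gx, gy, gz))` would give the first setting followed by the incremented settings for that tilt range.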
Determining camera auto-focus settings

Patent number: 9826156

Abstract: A system and method of determining a tilt angle of a portable computing device using sensor data; identifying a tilt angle range from a plurality of predetermined tilt angle ranges; and determining focal length settings for image capture devices of the portable computing device using the tilt angle, adjustment increments, and autofocus scan range algorithms. A portable computing device including a processor; a first image capture device on a first side of the portable computing device, and a second image capture device on a second side of the portable computing device, the second side located opposite of the first side; and a memory device including instructions operable to be executed by the processor to perform a set of actions, enabling the portable computing device to perform the method.

Type: Grant

Filed: June 16, 2015

Date of Patent: November 21, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Yue Liu, Avnish Sikka
Merging optical character recognized text from frames of image data

Patent number: 9659224

Abstract: Disclosed are techniques for merging optical character recognized (OCR'd) text from frames of image data. In some implementations, a device sends frames of image data to a server, where each frame includes at least a portion of a captured textual item. The server performs optical character recognition (OCR) on the image data of each frame. When OCR'd text from respective frames is returned to the device from the server, the device can perform matching operations on the text, for instance, using bounding boxes and/or edit distance processing. The device can merge any identified matches of OCR'd text from different frames. The device can then display the merged text with any corrections.

Type: Grant

Filed: March 31, 2014

Date of Patent: May 23, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Matthew Joseph Cole, Sonjeev Jahagirdar, Matthew Daniel Hart, David Paul Ramos, Ankur Datta, Utkarsh Prateek, Emilie Noelle McConville, Prashant Hegde, Avnish Sikka
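The edit distance matching mentioned in the abstract for patent 9659224 might look something like the sketch below. It is only an illustration: the per-result confidence scores, the `max_ratio` threshold, and the rule of keeping the higher-confidence text are assumptions not stated in the abstract, and bounding-box matching is left out.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming, two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def merge_frames(frame_a, frame_b, max_ratio=0.34):
    """Merge two lists of (text, confidence) OCR results.

    Two entries are treated as the same item when their edit distance,
    divided by the longer length, is at most max_ratio. The entry with the
    higher confidence is kept. Unmatched entries from either frame are retained.
    """
    merged = list(frame_a)
    for text_b, conf_b in frame_b:
        for idx, (text_a, conf_a) in enumerate(merged):
            longest = max(len(text_a), len(text_b)) or 1
            if edit_distance(text_a, text_b) / longest <= max_ratio:
                if conf_b > conf_a:
                    merged[idx] = (text_b, conf_b)
                break
        else:
            # No match found in the merged list: this is new text.
            merged.append((text_b, conf_b))
    return merged
```

Here a misread such as `he1lo` in one frame would be replaced by `hello` from another frame when the second reading carries a higher confidence.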
Visual and audio recognition for scene change events

Patent number: 9536161

Abstract: Various embodiments describe systems and methods for utilizing a reduced amount of processing capacity for incoming data over time and, in response to detecting a scene-change-event, notifying one or more data processors that a scene-change-event has occurred and causing incoming data to be processed as new data. In some embodiments, an incoming frame can be compared with a reference frame to determine a difference between the reference frame and the incoming frame. The reference frame may be correlated to a latest scene-change-event. In response to a determination that the difference does not meet one or more difference criteria, a user interface or at least one processor of the computing device can be notified to reduce processing of incoming data over time. In response to a determination that the difference meets the one or more difference criteria, the user interface or the at least one processor can be notified that a scene-change-event has occurred.

Type: Grant

Filed: June 17, 2014

Date of Patent: January 3, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Christopher John Lish, Oleg Rybakov, Sonjeev Jahagirdar, Junxiong Jia, Neil David Cooper, Avnish Sikka
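The reference-frame comparison in the abstract for patent 9536161 can be illustrated with a small sketch. The difference measure (mean absolute pixel difference), the threshold value, and the class and function names are assumptions chosen for illustration; the abstract does not specify which difference criteria are used.

```python
def mean_abs_difference(frame, reference):
    """Mean absolute per-pixel difference between two equal-sized grayscale frames."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(p - q) for p, q in zip(frame, reference)) / len(frame)


class SceneChangeDetector:
    def __init__(self, threshold=20.0):
        self.threshold = threshold   # hypothetical difference criterion
        self.reference = None        # frame from the latest scene-change event

    def process(self, frame):
        """Return True if the frame starts a new scene, False otherwise."""
        if (self.reference is None
                or mean_abs_difference(frame, self.reference) >= self.threshold):
            self.reference = list(frame)
            return True    # caller should treat incoming data as new
        return False       # caller may reduce processing for this frame
```

A caller would feed each incoming frame to `process` and scale back recognition work while it keeps returning `False`.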
Image processing using multiple aspect ratios

Patent number: 9418283

Abstract: A system to recognize text, objects, or symbols in a captured image using machine learning models reduces computational overhead by generating a plurality of thumbnail versions of the image at different downscaled resolutions and aspect ratios, and then processing the downscaled images instead of the entire image, or sections of the entire image. The downscaled images are processed to produce a combined feature vector characterizing the overall image. The combined feature vector is processed using the machine learning model.

Type: Grant

Filed: August 20, 2014

Date of Patent: August 16, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Pradeep Natarajan, Avnish Sikka, Rohit Prasad
Recognizing text from frames of image data using contextual information

Patent number: 9355336

Abstract: Disclosed are techniques for recognizing text from one or more frames of image data using contextual information. In some implementations, image data including a captured textual item is processed to identify an entity in the image data. A context can be selected using the entity, where the context corresponds to a dictionary. Text in the captured textual item can be identified using the dictionary. The identified text can be output to a display device.

Type: Grant

Filed: April 23, 2014

Date of Patent: May 31, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Sonjeev Jahagirdar, Matthew Joseph Cole, David Paul Ramos, Utkarsh Prateek, Emilie Noelle McConville, Ankur Datta, Laura Varnum Finney, Yue Liu, Bhavesh Anil Doshi, Avnish Sikka, Michael Vanne
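The entity-to-context-to-dictionary flow in the abstract for patent 9355336 can be sketched as below. The entity names, dictionaries, and the use of Python's standard `difflib.get_close_matches` as the matching step are all assumptions for illustration; the abstract does not say how text is matched against the dictionary.

```python
import difflib

# Hypothetical context dictionaries and entity-to-context mapping.
CONTEXT_DICTIONARIES = {
    "restaurant_menu": ["appetizer", "entree", "dessert", "beverage"],
    "business_card": ["phone", "email", "address", "manager"],
}
ENTITY_TO_CONTEXT = {
    "fork_icon": "restaurant_menu",
    "company_logo": "business_card",
}


def correct_with_context(raw_words, entity, cutoff=0.7):
    """Snap each raw OCR word to the closest word in the entity's context dictionary.

    Words with no sufficiently close match are left unchanged.
    """
    context = ENTITY_TO_CONTEXT.get(entity)
    vocabulary = CONTEXT_DICTIONARIES.get(context, [])
    corrected = []
    for word in raw_words:
        match = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
        corrected.append(match[0] if match else word)
    return corrected
```

With these sample dictionaries, a misread menu word such as `desscrt` would be mapped to `dessert` once the menu context has been selected.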
Using a front-facing camera to improve OCR with a rear-facing camera

Patent number: 9269009

Abstract: Various embodiments enable a computing device to incorporate frame selection or preprocessing techniques into a text recognition pipeline in an attempt to improve text recognition accuracy in various environments and situations. For example, a mobile computing device can capture images of text using a first camera, such as a rear-facing camera, while capturing images of the environment or a user with a second camera, such as a front-facing camera. Based on the images captured of the environment or user, one or more image preprocessing parameters can be determined and applied to the captured images in an attempt to improve text recognition accuracy.

Type: Grant

Filed: May 20, 2014

Date of Patent: February 23, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Yue Liu, Sonjeev Jahagirdar, Matthew Joseph Cole, Utkarsh Prateek, Emilie Noelle McConville, Daniel Makoto Wilenson, Avnish Sikka
Graphical refinement for points of interest

Patent number: 9269011

Abstract: Various embodiments crowd source images to cover various angles, zoom levels, and elevations of objects and/or points of interest (POIs) while under various lighting conditions. The crowd sourced images are tagged or associated with a particular POI or geographic location and stored in a database for use by an augmented reality (AR) application to recognize objects appearing in a live view of a scene captured by at least one camera of a computing device. The more comprehensive the database, the more accurately an object or POI in the scene will be recognized and/or tracked by the AR application. Accordingly, the more accurately an object is recognized and tracked by the AR application, the more smoothly and continuously the content and movement transitions thereof can be presented to users in the live view.

Type: Grant

Filed: February 11, 2013

Date of Patent: February 23, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Avnish Sikka, James Sassano, Sonjeev Jahagirdar, Pengcheng Wu, Nicholas Randal Sovich
Optimizing pre-processing times for faster response

Patent number: 9262689

Abstract: Embodiments of the subject technology provide for determining a region of a first acquired image based at least on a viewing mode and a set of respective positions of graphical elements to decrease the pre-processing time and perceived latency for the first image. One or more regions of text in the first image are detected, and a set of regions of text that overlap with the region of the image is determined and pre-processed. The subject technology may then pre-process an entirety of a subsequent image (e.g., to pick up missing text from the region of the first image). Thus, additional OCR results may be provided to the user by using the subsequent image(s) and merging subsequent results with previous results from the first image.

Type: Grant

Filed: December 18, 2013

Date of Patent: February 16, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Avnish Sikka, David Paul Ramos, Matthew Daniel Hart, Yue Liu, Emilie Noelle McConville
Text recognition near an edge

Patent number: 9239961

Abstract: The recognition of text in an acquired image is improved by using general and type-specific heuristics that can determine the likelihood that a portion of the text is truncated at an edge of an image, frame, or screen. Truncated text can be filtered such that the user is not provided with an option to perform an undesirable task, such as to dial an incorrect number or connect to an incorrect Web address, based on recognizing an incomplete text string. The general and type-specific heuristics can be combined to improve confidence, and the image data can be pre-processed on the device before processing with an optical character recognition (OCR) engine. Multiple frames can be analyzed to attempt to recognize words or characters that might have been truncated in one or more of the frames.

Type: Grant

Filed: September 24, 2014

Date of Patent: January 19, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Matthew Joseph Cole, Yue Liu, David Paul Ramos, Avnish Sikka
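The combination of a general and a type-specific heuristic described in the abstract for patent 9239961 might be sketched like this. The pixel margin, the ten-digit phone number assumption, and the rule that both heuristics must agree are hypothetical choices made for the example, not details taken from the patent.

```python
import re


def touches_edge(box, image_width, image_height, margin=5):
    """General heuristic: the text box lies within `margin` pixels of an image edge."""
    left, top, right, bottom = box
    return (left <= margin or top <= margin
            or right >= image_width - margin
            or bottom >= image_height - margin)


def looks_like_partial_phone(text, expected_digits=10):
    """Type-specific heuristic: fewer digits than a full phone number (assumed 10)."""
    digits = re.sub(r"\D", "", text)
    return 0 < len(digits) < expected_digits


def is_probably_truncated(text, box, image_width, image_height):
    """Combine both heuristics; flag text only when they agree."""
    return (touches_edge(box, image_width, image_height)
            and looks_like_partial_phone(text))


def actionable_results(results, image_width, image_height):
    """Drop results that are probably truncated so no action is offered for them."""
    return [(text, box) for text, box in results
            if not is_probably_truncated(text, box, image_width, image_height)]
```

A number cut off at the right edge of the frame would be filtered out, so the user is not offered the option to dial an incomplete number.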
Text orientation estimation in camera captured OCR

Patent number: 9224061

Abstract: A system estimates text orientation in images captured using a handheld camera prior to detecting text in the image. Text orientation is estimated based on edges detected within the image, and the image is rotated based on the estimated orientation. Text detection and processing is then performed on the rotated image. Non-text features along a periphery of the image may be sampled to assure that clutter will not undermine the estimation of orientation.

Type: Grant

Filed: August 20, 2014

Date of Patent: December 29, 2015

Assignee: Amazon Technologies, Inc.

Inventors: Pradeep Natarajan, Avnish Sikka, Rohit Prasad
Local image enhancement for text recognition

Patent number: 9058644

Abstract: Various embodiments enable regions of text to be identified in an image captured by a camera of a computing device for preprocessing before being analyzed by a visual recognition engine. For example, each of the identified regions can be analyzed or tested to determine whether a respective region contains a quality associated with poor text recognition results, such as poor contrast, blur, noise, and the like, which can be measured by one or more algorithms. Upon identifying a region with such a quality, an image quality enhancement can be automatically applied to the respective region without user instruction or intervention. Accordingly, once each region has been cleared of the quality associated with poor recognition, the regions of text can be processed with a visual recognition algorithm or engine.

Type: Grant

Filed: March 13, 2013

Date of Patent: June 16, 2015

Assignee: Amazon Technologies, Inc.

Inventors: David Paul Ramos, Chang Yuan, Keith Harrison Goodman, Avnish Sikka
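The test-then-enhance loop in the abstract for patent 9058644 can be illustrated with a small sketch. Only poor contrast is handled here, measured as the spread between the darkest and brightest pixel and corrected with a linear contrast stretch; the measure, the threshold, and the function names are assumptions, and blur and noise checks are omitted.

```python
def region_contrast(pixels):
    """Contrast measure: spread between the darkest and brightest pixel (0-255)."""
    return max(pixels) - min(pixels)


def stretch_contrast(pixels):
    """Linearly rescale pixel values to span the full 0-255 range."""
    low, high = min(pixels), max(pixels)
    if high == low:
        return list(pixels)  # flat region: nothing to stretch
    return [round((p - low) * 255 / (high - low)) for p in pixels]


def enhance_regions(regions, min_contrast=100):
    """Stretch only the regions whose contrast falls below the threshold."""
    return [stretch_contrast(r) if region_contrast(r) < min_contrast else list(r)
            for r in regions]
```

Each region is a flat list of grayscale values; regions that already have enough contrast pass through unchanged before being handed to a recognition engine.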
Aggregating multiple functions into a single platform

Patent number: 8787875

Abstract: Methods and apparatus, including computer program products, for aggregating multiple functions into a single platform. A communications system includes at least one processor, at least one computer readable storage medium storing computer executable instructions that, when executed by the at least one processor, implement components including a workflow module comprising sets of workflow instructions for processing different types of information packets, and selectable communication function modules, the workflow module coordinating processing of a received packet using selected ones of the selectable communication function modules.

Type: Grant

Filed: October 7, 2011

Date of Patent: July 22, 2014

Assignee: Affirmed Networks, Inc.

Inventors: Hassan Ahmed, Anand Krishnamurthy, Terry Durand, Tim Mortsolf, Paul Sherer, Avnish Sikka
Aggregating multiple functions into a single platform

Publication number: 20120190331

Abstract: Methods and apparatus, including computer program products, for aggregating multiple functions into a single platform. A communications system includes at least one processor, at least one computer readable storage medium storing computer executable instructions that, when executed by the at least one processor, implement components including a workflow module comprising sets of workflow instructions for processing different types of information packets, and selectable communication function modules, the workflow module coordinating processing of a received packet using selected ones of the selectable communication function modules.

Type: Application

Filed: October 7, 2011

Publication date: July 26, 2012

Inventors: Hassan Ahmed, Anand Krishnamurthy, Terry Durand, Tim Mortsolf, Paul Sherer, Avnish Sikka