AUGMENTED REALITY-ENHANCED FOOD PREPARATION SYSTEM AND RELATED METHODS

- Miso Robotics, Inc.

A food preparation system is configured to enhance the efficiency of food preparation operations in a commercial kitchen by displaying instructions on a surface in the kitchen work area. The food preparation system includes a plurality of cameras aimed at a kitchen workspace for preparing the plurality of food items and a processor operable to compute an instruction for a kitchen worker to perform a food preparation step based on one or more types of information selected from order information, recipe information, kitchen equipment information, data from the cameras, and food item inventory information. A projector in communication with the processor visually projects the instruction onto a location in the kitchen workspace for the kitchen worker to observe. Related methods for projecting food preparation instructions are described.

Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is an International PCT Application claiming the benefit of priority to US Provisional Application No. 62/467,743, filed Mar. 6, 2017, US Provisional Application No. 62/467,735, filed Mar. 6, 2017, and US Provisional Application No. 62/592,130, filed Nov. 29, 2017.

TECHNICAL FIELD

The present invention relates to food processing, and particularly, to food preparation by augmented reality, artificial intelligence, and visual projection of information.

BACKGROUND ART

Efficiently and accurately preparing food for consumers in restaurant and commercial kitchen environments is challenging because of the wide variety of types of food, cooking techniques, kitchen appliances, kitchen tools, and utensils.

Food preparation is often labor intensive and subject to human error. Workers employed by these businesses require careful and sometimes excessive training to accurately and safely prepare the food, thus increasing costs. It follows that businesses that prepare and sell food typically have high labor costs and experience large amounts of monetary and food loss as well as customer dissatisfaction due to human error.

Commercial kitchens and restaurant kitchens sometimes employ displays and associated computer systems and/or paper tickets to present orders to food preparation workers. Sometimes, the completion of these orders is tracked by workers via touchscreens or keyboards or by moving the paper ticket to a different location. Once the order is presented, food preparation workers rely on training, skill, experience, and management to execute the orders efficiently, using few tools other than, in some instances, manually initiated timers and manually inserted food thermometers.

Such displays, computer systems, timers and thermometers are not ideal during real time operations for several reasons. First, both displays and paper tickets typically only provide information on what items need to be prepared and not instructions that actually help the workers to prepare the items, such as the next step in preparing a food item or how long an individual food item has been on the grill. Second, for both displays and timers, the kitchen worker is required to take her eye off the food item or equipment. Third, displays and tickets are not able to show the information on a given food item in close proximity to the food item itself, which can make correlating displayed information to particular food items challenging. Fourth, timers, whether shown on a display or a separate timing device, must be started manually for each food item, creating complexity and opportunities for human error. Fifth, thermometers must be inserted manually into each food item to be measured, which is time-consuming, draws the kitchen worker's attention away from other tasks, and may require the worker to remember readings for multiple food items. Sixth, none of these systems provides real-time information on the status of food items to assist in preparation, e.g., estimates of whether food items are cooked based on real-time temperature measurements.

Consequently, orders may not always be prepared correctly or efficiently, and in some instances, kitchen workers are subject to safety challenges in the event a kitchen worker is unknowingly or improperly handling a dangerously hot or sharp item. Customers may also be subject to health risks or reduced food quality due to improperly prepared food, e.g., undercooked or overcooked food.

Accordingly, there is still a need to instruct kitchen workers in a more accurate, automatic, and efficient manner, and that overcomes the above-mentioned challenges.

SUMMARY OF THE INVENTION

A novel food preparation system is configured to compute instructions and useful data for enhancing the efficiency of grilling operations in a commercial kitchen based on a wide variety of information. The food preparation system projects the instructions in real time in the kitchen work area.

In embodiments, a food preparation system for preparing food items in a commercial kitchen environment includes at least one camera aimed at the kitchen workspace and a processor in communication with the camera. The camera and processor are operable to recognize and locate the food items and track their state. The processor is further operable to compute an instruction for a kitchen worker to perform a food preparation step based on order information, the state of the food items, and recipe information. The system also includes a projector in communication with the processor and operable to visually project the instruction onto a location in the kitchen workspace for the kitchen worker to observe.

In embodiments, the state of the food items can include such information as, for example, how long the food item has been cooking and its location.

In embodiments, the food preparation system is applied to prep work, grilling, frying, sautéing, assembly, and packaging, and projects the instructions onto the objects themselves or in close proximity to the objects (such as onto the kitchen equipment being used to prepare the objects), or displays an augmented reality view of the objects with the computed instructions superimposed thereon or in close proximity.

In embodiments, a food preparation system includes one or more visible spectrum cameras, one or more infrared cameras, sensors, visual projectors, and servers or computers. The system may also include items such as displays, scales, temperature transducers, wireless communication systems, speakers, microphone, 3D sensors or 3D camera, other types of sensors, lighting, and physical input devices. Headsets with augmented/virtual reality capabilities (hereafter referred to as “AR glasses”) may also be used for projecting the instructions onto a food item or equipment.

In embodiments, a view or representation of the food item is shown on a display, and the view or representation is enhanced with instructions or information applicable to preparing the food item. The enhanced information may be located on the view of the food item, or in close proximity, or otherwise associated with the representation of the food item.

In embodiments, a food preparation system includes the following elements: visible spectrum cameras for visual recognition, infrared cameras for visual recognition and real-time temperature estimates, 3D sensors for visual recognition, computers for processing information, a laser projection system, and instructions projected by the laser projection system onto food items or kitchen equipment such as a commercial grill. Optionally, the food preparation system is connected to the internet, a kitchen display system (KDS), a point-of-sale (POS) system, and/or other communication systems to send and receive information.

The system can draw on a wide variety of information. In embodiments, the computer is operable with one or more of the following types of information: restaurant and order information, sensor and camera data, recipe information, kitchen equipment information, and food item inventory information. In a preferred embodiment, the computer is operable to recognize and locate the food item using the sensor and camera data, and to maintain a current state of the system, including the status of food items, based on the various types of information.

In embodiments, the food preparation system receives orders from customers, and then, using knowledge of the current state of the grill, information collected by cameras and other sensors, recipes for various food items, and other information, projects meaningful visual information onto work areas (either directly with a laser or virtually with AR glasses or an AR glass panel) or directly onto food items to help guide kitchen workers. The food preparation system automatically recognizes and locates food items being prepared. In alternate embodiments, the instructions are not projected; instead, the instructions are shown to the human worker on a display.

In embodiments, the system is integrated with the restaurant POS or with the printer that prints paper tickets, i.e., the system translates the data sent to the printer (for example, PostScript) back into order information.

In embodiments, the system projects individual steps, then monitors for completion of those steps by human kitchen workers, and then projects another step. For example, the system may instruct the kitchen worker to add a raw hamburger patty to the grill, then detect the presence of the patty, and then project another instruction to the worker.
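
By way of illustration only, the project-then-monitor loop described above can be sketched in a few lines of Python. The recipe steps, the detector() callback, and the project() function below are hypothetical placeholders for the vision pipeline and projection system, not elements defined by this disclosure.

```python
# Minimal sketch of the project-then-monitor loop described above.
# detector() and project() are hypothetical stand-ins for the vision
# pipeline and projection system described elsewhere in this document.
import time

RECIPE_STEPS = {
    "burger_patty": ["place patty on grill", "flip patty", "remove patty"],
}

def run_steps(item_id, item_type, detector, project, poll_s=0.5):
    """Project one step at a time and wait for the detector to confirm it."""
    for step in RECIPE_STEPS[item_type]:
        project(item_id, step)                 # e.g. laser-project "F" next to the patty
        while not detector(item_id, step):     # vision system reports step completion
            time.sleep(poll_s)                 # poll until the worker performs the step
```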

In embodiments, a food preparation system for workers in the grill and assembly areas of quick serve restaurant (QSR) kitchens is trained to prepare the following menu items: Hamburgers, Cheeseburgers, Chicken sandwiches, and Veggie burgers.

In embodiments, a food preparation system is comprised of a combination of the following components: an order management module (e.g., a software module to collect data on customer orders); cameras (e.g., a camera to collect visual information on the status of food items, work areas and workers); 3D sensors (e.g., to support identification and analysis of food items, work areas, and workers); a projection system (e.g., to display visual information directly or indirectly on a work surface to kitchen workers to guide their actions); microphone and speakers (e.g., to communicate with kitchen workers and take verbal input); a video display and/or touchscreens (e.g., to display visual information to kitchen workers and, in the case of the touchscreen, to receive input); key pads and other physical input devices (e.g., to receive input from kitchen workers); computational engine (e.g., a programmed processor configured to use information from the order management module and cameras along with specifications for recipes, cooking times, etc. and knowledge of the current state of various food items and to compute instructions for kitchen workers that will be displayed through the projection system); and sensors (e.g., to collect data to support calculations by the computational engine).

Embodiments of the food preparation system can be distinguished from a traditional kitchen display system by the projection of information directly onto the work areas (whether directly or via AR glasses or AR glass panels/screens), and in embodiments, by the projection of the information onto the ingredients or food items themselves. Embodiments can also be distinguished by monitoring the work area and then automatically projecting instructions or helpful information to the kitchen worker when a food item is detected. Embodiments can also be distinguished by projecting food preparation steps necessary to complete an order rather than just individual elements of a customer order.

Embodiments of the food preparation system automatically recognize and locate items that are placed on the grill and display steps to complete preparation of the food items according to the recipe for the food item. In embodiments, recipe information provided in advance to the system is used to compute the steps.

In embodiments, the system is operable to automatically recognize food items, determine from recipe and/or order information the desired estimated internal temperature to which the item should be cooked (or a similar variable or intermediate variable that would provide such information), take real-time temperature measurements, determine when an item is done, and project information to the kitchen worker to take appropriate action.

In embodiments, a food preparation system is operable to: automatically recognize when food items are placed on a cooking surface; automatically track real time status of the food items (e.g., how long the food items have been on the grill); provide instructions to the food preparation worker based on information about the food items from the tracking step.

In embodiments, a food preparation system is operable to project real time status information on food items by, e.g., using an IR camera to estimate the temperature of the food item to help workers know when it is done.

In embodiments, a food preparation system is operable to automatically recognize when an item is placed on the grill, flipped, or moved. In particular embodiments, the automatic recognition is performed using cameras and neural networks to know when the item has actually been placed on the grill, flipped, or otherwise moved.

In embodiments, a food preparation system is operable to automatically recognize food items and food preparation steps completed; to track, in real time, the status of food items, and to display or project real time information on subsequent food preparation steps to be performed (in contrast to a mere request for a customer order or an individual element of a customer order to be completed).

The data employed by the system to recognize and locate items may be obtained from a wide variety of cameras or sensors including one or more of the following: visible spectrum cameras, IR cameras, motion sensors, TOF sensors, structured light sensors, depth sensors, and 3D sensors.

In embodiments, the food preparation steps include where to place items on a grill, when to flip a certain item, or when to remove a certain item from a certain location on the grill.

Embodiments of the food preparation system are capable of robustly displaying information in kitchen environments in a manner convenient to restaurant employees, and capable of automatically and robustly identifying and localizing relevant items in a kitchen environment. Relevant items to identify and locate include, but are not limited to, food items, food preparation equipment, utensils, and restaurant workers. In some implementations, a convolutional neural network (CNN) is used for the visual recognition.

Embodiments of the food preparation system include algorithms that take as input order information, recipe information, and the current state of all elements of the food preparation system (including food items, commercial equipment, restaurant workers, food storage bins, cleanliness of the grill, and past orders), and output instructions to human workers.

In embodiments, a food preparation system is operable to automatically recognize when a food item is put on the grill, start a timer, and project how long the item has been cooking on the grill. The system can optionally be integrated with the POS. In other embodiments, the system is not integrated into the POS and automatically recognizes when a food item is added and displays the corresponding food preparation steps and/or information to assist in food preparation, e.g., the temperature of the food item.

In embodiments, a food preparation system is operable to recognize a food item; project when to flip the food item; and to project when the food item is finished cooking and should be removed.

In embodiments, a food preparation system is operable to automatically assess the time to flip a food item, and the time to remove the food item, based on a volumetric analysis of the food item. The food preparation system further provides instructions on flipping and removing the food item. In preferred embodiments, the instructions are provided by a projection system.

In embodiments, the method automatically recognizes and locates a food item; determines, from recipe data for the food item, the steps necessary to prepare the food item; projects a first step to the kitchen worker; identifies when the first step has been completed; and projects a second step, subsequent to the first step, to the kitchen worker. In particular embodiments, prior to the step of automatically recognizing, the method receives a customer order. The system may constantly monitor the food items so that even if the food items are moved, an applicable or correct subsequent step is displayed or projected to the kitchen worker for the food item.

In embodiments, the food preparation steps are carried out without input from the restaurant POS or customer orders. The determination of cooking time and step time is automatic and does not require any manual input from the human kitchen or restaurant worker.

In embodiments, a food preparation system is operable to automatically track multiple items simultaneously and present the information in a list, table, or matrix to the worker.

In embodiments, a method for preparing food comprises aiming one or more sensors at a food preparation area; inspecting the food preparation area using the sensors to obtain image data; and determining identity and position information of the food item or food preparation item based on the image data from the inspecting step.

In embodiments, the step of aiming the combination of sensors includes aiming an infrared (IR) camera, RGB camera, and/or depth sensor at the food preparation area.

In embodiments, the method further comprises processing the image data from the combination of sensors to obtain combined image data with data from each sensor registered in the same coordinate system at each point in the image.

In embodiments, the step of determining is performed using a trained neural network.

In embodiments, the method further comprises determining an output to command a robotic arm, to instruct a kitchen worker (e.g., via a projector), or to otherwise assist in food preparation.

In embodiments, the command or instructions are shown on a display.

In embodiments, the food preparation item is an item selected from the group consisting of a kitchen implement, a kitchen worker, and an appendage of the kitchen worker.

In embodiments, a food preparation system that assists a kitchen worker in the preparation of a food item comprises a combination of sensors to inspect a food preparation area in the kitchen environment. The combination of sensors includes an Infrared (IR) camera that generates IR image data and a second sensor adapted to generate second image data. The system further includes a processor to pre-process the IR image data and second image data into combined image data. The processor is further operable to automatically recognize at least one food item or food preparation item.

In embodiments, the combination includes the IR camera, a visible spectrum camera, and a depth sensor. The processor is operable to combine the data from the combination of sensors and cameras.

In embodiments, the processor is operable to determine a position estimate of the at least one food item or food preparation item.

In embodiments, the processor is operable to automatically recognize objects in the food preparation area including food items, kitchen implements, a kitchen worker, or an appendage of a kitchen worker.

In embodiments, the processor is further operable to compute an output based on the combined image data, wherein the output comprises a probability that at least one food item or food preparation item is present in a particular region of the combined image data.

In embodiments, the processor employs a trained convolutional neural network.

In embodiments, an automated food preparation system comprises any one or more of the components described herein.

In embodiments, a food preparation method comprises any one or more of the steps described herein.

The guidance provided by embodiments of the food preparation system can be used to create one or more benefits, including but not limited to: improving throughput, reducing waste, increasing capacity, reducing training requirements for workers, improving product quality, and improving product consistency.

These advantages as well as other objects and advantages of the present invention will become apparent from the detailed description to follow, together with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a food preparation system in a kitchen environment;

FIG. 2 schematically depicts an example architecture of an automated food preparation system;

FIG. 3 is a flow diagram of a method for projecting an instruction onto a location in the kitchen workspace viewable by the kitchen worker;

FIGS. 4A-4B depict various software modules of a system for projecting an instruction onto a location in the kitchen workspace in accordance with embodiments of the invention; and

FIG. 5 is a flow diagram of a method for processing data from multiple sensors in accordance with an embodiment of the invention.

DISCLOSURE OF THE INVENTION

Before the present invention is described in detail, it is to be understood that this invention is not limited to particular variations set forth herein as various changes or modifications may be made to the invention described and equivalents may be substituted without departing from the spirit and scope of the invention. As will be apparent to those of skill in the art upon reading this disclosure, each of the individual embodiments described and illustrated herein has discrete components and features which may be readily separated from or combined with the features of any of the other several embodiments without departing from the scope or spirit of the present invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process act(s) or step(s) to the objective(s), spirit or scope of the present invention. All such modifications are intended to be within the scope of the claims made herein.

Methods recited herein may be carried out in any order of the recited events which is logically possible, as well as the recited order of events. Furthermore, where a range of values is provided, it is understood that every intervening value, between the upper and lower limit of that range and any other stated or intervening value in that stated range is encompassed within the invention. Also, it is contemplated that any optional feature of the inventive variations described may be set forth and claimed independently, or in combination with any one or more of the features described herein.

All existing subject matter mentioned herein (e.g., publications, patents, patent applications and hardware) is incorporated by reference herein in its entirety except insofar as the subject matter may conflict with that of the present invention (in which case what is present herein shall prevail). Amongst other patent applications and patents listed herein, provisional patent application no. 62/467,735, filed Mar. 6, 2017, and entitled “VISUAL INSTRUCTION DISPLAY SYSTEM TO ENHANCE EFFICIENCY OF WORKERS” is incorporated herein by reference in its entirety.

Reference to a singular item includes the possibility that there are plural of the same items present. More specifically, as used herein and in the appended claims, the singular forms “a,” “an,” “said” and “the” include plural referents unless the context clearly dictates otherwise. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

It is also to be appreciated that unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

Food Preparation System Overview

FIG. 1 depicts a food preparation system 100 for projecting instructions to a location in the kitchen environment viewable by kitchen workers in accordance with an embodiment of the invention. Particularly, the food preparation system 100 is shown including: cameras 110 for visual recognition, a laser projection system 120, various food items 130, and viewable instructions 140 projected by the laser projection system onto the food items or onto the commercial grill 150 as desired. Additional components or features can be included (or operate) with the food preparation system (not shown) including, but not limited to: a computer for processing information, an Internet connection, a point of sale system, a kitchen display system (KDS), local and remote server(s), and a human worker to read the instructions and act accordingly.

In preferred embodiments, the food preparation system takes or receives orders from a system that collects customer orders, and then, as described further herein, projects meaningful visual information onto the work areas (either directly with a laser, for instance, or virtually with AR glasses) to help guide kitchen workers in the preparation of food items. The food preparation system is operable to determine which instructions to project (and when and where to project the instructions) based on various types of information including knowledge of the current state of the grill, status of food items being prepared, information collected by cameras and other sensors, recipes for various food items, past orders, and other information. In embodiments, the system automatically monitors the work area for evidence that the step has been completed and then projects a next step.

It is also to be understood that the food preparation system may be used with a wide variety of kitchen equipment including, but not limited to, various types of grills, stove tops, assembly areas, areas for “prepping” food items (e.g., cutting and dicing), packaging areas, and storage areas as desired. Examples of grills include a standard electric grill with a cooking surface that is 24 inches deep and 48 inches wide. Such a grill can be used for preparation of burger patties, chicken, steaks, and onions. Another electric grill, 24 inches deep and 24 inches wide, is used for the preparation of veggie patties and buns. Another type of equipment is an assembly area and packaging area that contains work surfaces and bins with the various toppings that the customer can choose from. Yet another type of equipment is storage for the various ingredients. Indeed, the invention described herein may be used with a wide variety of kitchen equipment as desired.

FIG. 2 schematically depicts an example architecture of a food preparation system 210 in accordance with an embodiment of the invention. System 210 is shown including a computer 212 with storage 214, a processor 216, and a communication interface 218. The storage may be preloaded with various information relating to recipe data, kitchen equipment parameters, and computer instructions for the processor to read and perform.

System 210 is also shown including sensors 222 which, as described further herein, can be used in combination with the cameras 220 to recognize and locate various food items as well as determine doneness of a food item being cooked.

In embodiments, three cameras are employed. A first camera is trained on a first type of grill such as a 24×48 inch grill. The first camera can be mounted approximately 48 inches up from the back surface of the grill, at the midpoint of the grill width and set back approximately 6 inches.

A second camera can be trained on a second type of grill such as the 24×24 grill and positioned similarly to that described in connection with the first camera.

A third camera may be located in the assembly area and trained on the assembly area.

Examples of cameras include but are not limited to the Blackfly 2.3 MP Color USB3 Vision with Sony Pregius IMX249 sensors with Fujinon CF12.5HA-1 lens with focal length of 12.5 mm, each of which is commercially available. However, other cameras may be used. Additionally, in embodiments, supplemental lighting and/or bandpass filters may be employed to improve the quality of images captured by the cameras.

Additionally, other types of visual sensors may be used that provide additional information, e.g., depth sensing systems incorporating projected infrared grids, such as the one used by the Xbox Kinect v1 sensor (developed by PrimeSense), and/or depth sensing systems employing time-of-flight technologies, such as the one used by the Xbox Kinect v2, manufactured by Microsoft Corporation.

Additionally, 3D, Lidar and ultrasonic-based sensors can be employed to provide data to locate and identify food items, workers, and equipment.

In embodiments, the plurality of sensors includes a visible spectrum camera (e.g., a black and white, or RGB camera), a depth sensor, and an infrared (IR) camera.

The infrared or IR camera generates IR image data by measuring the intensity of infrared waves and providing data representing such measurements over the observed area. In embodiments, the focal length of the camera lens and the orientation of the optics have been set such that the area imaged includes the work area. Preferably, the IR camera is adapted to measure the intensity of IR waves over an area and generates IR image data. Preferably, the IR wavelength ranges from 7.2 to 13 microns, but other wavelengths in the IR may be used. An exemplary IR sensor is the CompactPro high resolution thermal imaging camera manufactured by Seek Thermal Corporation (Santa Barbara, Calif.), which can provide an image of size 320×240 with each value a 16-bit unsigned integer representing measured thermal intensity.

In embodiments, the visible spectrum camera is an RGB camera that generates image data. The RGB image comprises a 960 by 540 grid with intensity data for red, green, and blue portions of the spectrum for each pixel in the form of 8-bit unsigned integers. In embodiments, the focal length of the camera lens and orientation of the optics have been set such that the area imaged includes the work surface. An exemplary visible spectrum camera is the Kinect One sensor manufactured by Microsoft Corporation (Redmond, Wash.).

A depth sensor incorporates a time of flight (TOF) camera to generate data on the distance of each point in the field of view from the camera. The TOF camera is a range imaging camera system that resolves distance based on the known speed of light, measuring the time-of-flight of a light signal between the camera and the subject for each point of the image. In embodiments, the image comprises a 960 by 540 grid with a value of the distance from the sensor for each point in the form of a 16-bit unsigned integer. An exemplary depth sensor is the Kinect One sensor manufactured by Microsoft Corporation (Redmond, Wash.).

Without intending to be bound to theory, we have discovered that IR camera sensors providing IR image data have the potential to mitigate or overcome the above-mentioned shortcomings associated with conventional automated cooking equipment. Due to the temperature differences typically present when an uncooked food is placed on a hot grill or other high temperature cooking surface or when a kitchen worker or kitchen worker's appendage is imaged against a predominantly room temperature background, IR camera sensors are able to provide high contrast and high signal-to-noise image data that is an important starting point for determining identity and location of kitchen objects, including food items, food preparation items and human workers. In contrast, the signal-to-noise ratio is significantly lower when using only traditional RGB images than when using IR images. This occurs because some kitchen backgrounds, work surfaces, and cooking surfaces can be similar to food items in color, but temperatures are generally significantly different. Based on the foregoing, embodiments of the invention include IR-camera sensors in combination with other types of sensors as described herein. Use of IR sensors for assisting with food preparation is also described in provisional patent application no. 62/592,130, filed Nov. 29, 2017, and entitled “AN INFRARED-BASED AUTOMATED KITCHEN ASSISTANT SYSTEM FOR RECOGNIZING AND PREPARING FOOD AND RELATED METHODS”, incorporated herein by reference in its entirety.

FIG. 2 also shows human input 224 which may be in the form of a keyboard, touchscreen display or another means to provide input to the system. If available, a point of sale (POS) 226 can be coupled to the computer 212 which sends real time customer order information to the system.

Projection System

As described herein, the invention is directed to projecting instructions to the kitchen worker via a projection system. A wide range of projection systems may be employed to project the instructions onto a location or food item in the kitchen workspace.

For example, projection of visual information can be performed via a laser projection system such as the Clubmax 800 from Pangolin Laser Systems, Inc. Such commercially available laser projection systems comprise a laser source, mirrors, galvanometer scanners, various electronic components, and other optical components capable of projecting a laser beam at a given area in the system's field of view. One advantage of such laser projection systems is that they have, over the ranges under consideration for this application, essentially infinite depth of field.

Other methods may be used to perform the projection including, but not limited to, augmented reality (AR) glasses and digital light processing (DLP) projectors. In embodiments, AR glasses are employed with beacons placed on or near the grill to properly orient the projected visual information. Such beacons can be rigidly attached to a fixed surface, preferably the grill, and emit visual, infrared, or ultraviolet light that is recognized by sensors in the AR glasses. An example of a beacon is shown in FIG. 1, represented by reference numeral 160. Additional examples of AR-type glasses include, without limitation, the Microsoft HoloLens, Google Glass, and Oculus Rift. See also U.S. Pat. Nos. 9,483,875 and 9,285,589.

The eyepiece's object recognition software may process the images being received by the eyepiece's forward-facing camera in order to determine what is in the field of view.

In other embodiments, the GPS coordinates of the location as determined by the eyepiece's GPS are used to determine what is in the field of view.

In other embodiments, an RFID or other beacon in the environment may be broadcasting a location. Any one or combination of the above may be used by the eyepiece to identify the location and the identity of what is in the field of view.

In embodiments, a DLP-type projector is employed to project the instructions. For example, a DLP projection system rated at 1500 lumens, mounted 4 feet above the surface of the grill at the midpoint of its long axis on the back side, so as to be out of the way of the workers operating the grill, can be used. The DLP projector uses optics to align the focal plane of the image with the grill surface. With these optics in place, the DLP projector is able to project images onto the grill surface that are easily readable by workers operating the grill.

Instructions may be projected on a wide variety of locations, whether flat or irregularly shaped. In embodiments, instructions may be projected onto food items using a standard video projector and projection mapping software that uses 3D knowledge of the work surface to be projected upon to modify the projected image so that it appears clear to the worker. Such software uses data on the orientation and shape of the surface to be projected upon, such as data provided by a 3D sensor, and modifies the projected image accordingly.
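
As an illustrative sketch (not taken from this disclosure), a planar projection mapping of the kind described above can be approximated with a homography fitted from a handful of calibration correspondences between grill-surface coordinates and projector pixels. The calibration values below are made up for the example.

```python
# Hedged sketch: mapping a point on the (assumed planar) grill surface to
# projector pixel coordinates with a homography fitted from calibration points.
# The four correspondences below are made-up calibration values.
import numpy as np

def fit_homography(src_xy, dst_xy):
    """Direct linear transform for a 3x3 homography from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src_xy, dst_xy):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def project_point(H, x, y):
    """Apply the homography to grill coordinates (inches) -> projector pixels."""
    u, v, w = H @ np.array([x, y, 1.0])
    return u / w, v / w

# Example calibration: grill corners (inches) -> projector pixels (hypothetical).
grill = [(0, 0), (48, 0), (48, 24), (0, 24)]
pixels = [(80, 60), (1840, 70), (1830, 1010), (90, 1000)]
H = fit_homography(grill, pixels)
print(project_point(H, 24, 12))   # centre of the grill in projector pixels
```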

Indeed, a wide range of projection systems may be incorporated into the food preparation system described herein and the invention is only intended to be limited as recited in the appended claims.

With reference again to FIG. 2, other modalities to deliver instructions or output to the kitchen worker include, for example, audio 240 or a display 250. For example, the computed instructions may be communicated and shown to the worker via the restaurant's KDS.

FIG. 3 is a flowchart illustrating an overview of a food preparation assistance method 300 for projecting food preparation instructions to a kitchen worker in accordance with an embodiment of the invention.

Step 310 states to receive a customer order. Order data can be electronically collected from a POS system using any of a number of techniques, including, but not limited to: querying a remote server that collects order data, querying a local server that collects order data, and intercepting data sent to a printer to create order tickets. Preferably, the order data is collected from a local or remote server via the communication interface.

Step 320 states to compute an instruction for a kitchen worker to perform a food preparation step. The computation carried out in step 320 is performed by a processor programmed with software and is based on a number of types of information and/or data including, but not limited to, camera and sensor data, the current state of the food items, customer order information from step 310, recipe information, kitchen equipment information, food inventory information, and estimates of upcoming demand based on such items as historical order information correlated to relevant variables such as day of the week, time of day, presence of holiday, etc.

Step 330 states projecting the instruction onto a location in the workspace viewable by the kitchen worker. As described further herein in connection with the projection system, an instruction can be projected (directly or virtually) onto the grill or a food item to indicate to the kitchen worker that the food item needs to be manipulated (e.g., removed from the grill, flipped, etc.).

Step 340 states to update the state of food items, customer order, recipe, kitchen equipment, and food item inventory information. As described further herein, in embodiments, a kitchen scene understanding engine or module computes and updates a state of the food items in the kitchen using data from the sensors and cameras, as well as input from other means.

In embodiments, the state system monitors the time a food item has been cooking based on an internal clock. The system automatically detects the food item placed on the grill using camera data and automatically commences an internal electronic stopwatch-type timer.

In other embodiments, the state system monitors the surface temperature of the food item being cooked based on IR camera readings. Yet in other embodiments, the state system monitors the internal temperature of the food item based on extrapolating or applying predictive algorithms to: a) surface temperature data of the food item, b) volumetric analysis of the food item based on the visual camera data, and/or c) temperature data of the grill surface. The system is updated and computes instructions once a threshold condition is met, such as time elapsed, an internal temperature target (e.g., a threshold temperature for a minimum amount of time), or a surface temperature target (e.g., a threshold temperature for a minimum amount of time).
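
A minimal sketch, in Python, of how such threshold conditions might be evaluated against the tracked state is shown below; the field names, units, and thresholds are illustrative assumptions rather than parameters specified by this disclosure.

```python
# Illustrative sketch of the threshold checks described above; the field names
# and threshold values are assumptions, not taken from the specification.
import time
from dataclasses import dataclass, field

@dataclass
class TrackedItem:
    item_type: str
    placed_at: float                                    # time placement was detected
    temp_history: list = field(default_factory=list)    # (timestamp, estimated_temp_F)

def time_done(item, required_s):
    """Elapsed-time condition: item has cooked for at least required_s seconds."""
    return time.time() - item.placed_at >= required_s

def temp_done(item, target_f, hold_s):
    """Temperature condition: estimate has stayed at/above target_f for hold_s seconds."""
    if not item.temp_history:
        return False
    below = [t for t, temp in item.temp_history if temp < target_f]
    held_since = max(below) if below else item.temp_history[0][0]
    return time.time() - held_since >= hold_s
```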

FIG. 4A depicts an overview of various software modules or engines 400 in accordance with an embodiment of the invention and which are run on the processor or server(s) described herein. FIG. 4A shows a Kitchen Scene Understanding Engine (KSUE) 430 receiving camera and sensor data 420. As described further herein, the KSUE computes the food item type and location information.

FIG. 4A also shows Food Preparation Supervisory System (FPSS) 494 receiving the location information from the KSUE. The FPSS keeps track of the state of the system and is operable, as described further herein, to compute instructions to present to the kitchen worker. The FPSS is a computational engine utilizing a wide range of inputs 410 such as but not limited to state of all relevant items, recipe, inventory, POS and order information, human inputs, and kitchen equipment specifications. The FPSS also accepts and is updated with the type and location of the food items and kitchen objects by the KSUE 430. Particularly, the system automatically detects the items, and updates the system state accordingly with the updated information.

FIG. 4A shows output information 412 including a restaurant display, restaurant KDS, data log, and in some embodiments, a robotic kitchen assistant (RKA) to carry out food preparation steps.

After the instructions for the kitchen worker are computed by the FPSS 494, the instructions are delivered to the projection system 496 to either virtually or actually project the instructions on the food item or kitchen location.

Now with reference to FIG. 4B, an expanded view of the Kitchen Scene Understanding Engine is shown. Particularly, the sensor data 420, including cameras and IR image data arising from viewing objects in the kitchen environment, is provided to the kitchen scene understanding engine 430. The sensor image data 420 is pre-processed 440 in order that the multi-sensor image data are aligned, or registered into one reference frame (e.g., the IR image reference frame).

The combined image data serves as an input layer 450 to a trained convolutional neural network (CNN) 460.

As shown with reference to step 460, a CNN processes the image input data to produce the CNN output layer 470. In embodiments, the CNN has been trained to identify food items and food preparation items, kitchen items, and other objects as may be necessary for the preparation of food items. Such items include but are not limited to human workers, kitchen implements, and food.

For each set of combined image data provided as an input layer to the CNN, the CNN outputs a CNN output layer 470 containing location in the image data and associated confidence levels for objects the CNN has been trained to recognize. In embodiments, the location data contained in the output layer 470 is in the form of a “bounding box” in the image data defined by two corners of a rectangle.
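
For illustration, one plausible in-memory representation of a single entry of such an output layer (a bounding box given by two corners plus per-class confidences) is sketched below; the field names are assumptions, not terms defined by this disclosure.

```python
# Sketch of one detection in the CNN output layer described above: a bounding
# box given by two corners plus a confidence for each trained class.
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class Detection:
    box_min: Tuple[int, int]               # (col, row) of one corner in image coordinates
    box_max: Tuple[int, int]               # (col, row) of the opposite corner
    class_confidences: Dict[str, float]    # e.g. {"burger_patty": 0.93, "spatula": 0.02}

    def best_class(self):
        """Class with the highest confidence for this bounding box."""
        return max(self.class_confidences, key=self.class_confidences.get)

d = Detection((120, 80), (220, 170), {"burger_patty": 0.93, "chicken_breast": 0.04})
print(d.best_class())   # "burger_patty"
```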

As described further herein, one embodiment of the CNN 460 is a combination of a region proposal network and CNN. An example of a region proposal network and CNN is described in Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 39, Issue 6, June 2017, which is hereby incorporated by reference in its entirety. Examples of other types of convolutional neural networks are described in Patent Publication No. US 20170169315, entitled “Deeply learned convolutional neural networks (cnns) for object localization and classification”; Patent Publication No. US 20170206431, entitled “Object detection and classification in images”; and U.S. Pat. No. 9,542,621, entitled “Spatial pyramid pooling networks for image processing”, each of which is herein incorporated by reference in its entirety.

Optionally, the accuracy of the object's location within the image may be further computed. In some embodiments, for example, IR image data measured within the area defined by the bounding box taken from the CNN output layer is further processed to more accurately determine an object's location. Techniques to do so include various computer vision and segmentation algorithms known in the art, such as Ohta, Yu-Ichi, Takeo Kanade, and Toshiyuki Sakai. “Color information for region segmentation.” Computer graphics and image processing 13.3 (1980): 222-241; and Beucher, Serge, and Fernand Meyer. “The morphological approach to segmentation: the watershed transformation.” Optical Engineering-New York-Marcel Dekker Incorporated-34 (1992): 433-433.

In some embodiments, determining location information includes determining information on orientation including angular position, angle, or attitude.

It is to be appreciated that the direct incorporation of the IR image data into the image data that, along with the RGB and depth data, makes up the input layer 450 to the CNN 460 improves the performance of the system. Although determining exactly why the inclusion of a given sensor improves the capabilities of a CNN is challenging because of the nature of CNNs, we conjecture, and without intending to be bound to theory, that the IR data offer higher signal-to-noise ratios for certain objects of a given temperature in a kitchen environment where such objects are often placed on work surfaces or imaged against backgrounds with significantly different temperatures. In cases where the CNN is used to recognize foods by the extent to which they are cooked, the IR data provides helpful information to the CNN on the thermal state of the food item and work surface, which can be a cooking surface.

With reference again to FIG. 4B, the CNN output layer 470 is then further processed to translate the location data of the identified objects given in the two dimensional coordinate system of the image into a three dimensional coordinate system such as a world coordinate frame or system reference frame. In embodiments, the world coordinate frame is the same frame used by ancillary apparatus such as robotic kitchen assistants or other automated food preparation and cooking devices. Step 480 may be carried out using standard transformations as is known to those of ordinary skill in the art.
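
A minimal sketch of such a standard transformation, assuming a pinhole camera model with placeholder intrinsic and extrinsic calibration values, is given below for illustration.

```python
# Hedged sketch of the 2D-image-to-3D-world conversion mentioned above, using a
# pinhole camera model. Intrinsics (fx, fy, cx, cy) and the extrinsic pose are
# placeholder calibration values, not from the specification.
import numpy as np

def pixel_to_world(u, v, depth_m, fx, fy, cx, cy, R_world_cam, t_world_cam):
    """Back-project pixel (u, v) with measured depth into the world frame."""
    x_cam = (u - cx) * depth_m / fx
    y_cam = (v - cy) * depth_m / fy
    p_cam = np.array([x_cam, y_cam, depth_m])
    return R_world_cam @ p_cam + t_world_cam      # rigid transform into world frame

R = np.eye(3)                     # placeholder: camera axes aligned with world axes
t = np.array([0.0, 0.0, 1.2])     # placeholder: camera 1.2 m above the world origin
print(pixel_to_world(480, 270, 0.9, 540.0, 540.0, 480.0, 270.0, R, t))
```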

The resulting object output vector 490 represents a single observation on the presence of a food or other type of object. Particularly, the object output vector 490 contains the location of recognized objects in the 3D or world coordinate frame and a confidence level that each such recognized object is the object the CNN has been trained to identify.

In embodiments, the output vector comprises instances of known food items and the degree that each is cooked (namely, “degree of doneness”). In embodiments, the measure of cooking is the internal temperature of the object, such as a steak cooked to medium rare corresponding to an internal temperature of 130 to 135 degrees Fahrenheit. In embodiments, the CNN is trained to detect not just individual objects and their location, but the internal temperature of the objects. Measurements of the internal temperature of the food item can be taken with temperature sensors and used in the output vector for the training of the CNN. In some embodiments, these temperature measurements are taken dynamically by a thermocouple that is inserted into the food item.

In embodiments, an alternate or additional thermal model is used to track the estimated internal temperature of various food items to determine when they are cooked to the appropriate level. In these cases, data can be provided by the Kitchen Scene Understanding Engine on how long the various items have been cooked and their current surface temperature and/or temperature history as measured by the IR camera.

Kitchen Bayesian Belief Engine 492 receives the object output vector 490 and assembles/aggregates the real-time continuous stream of these vectors into a set of beliefs which represents the state of all recognized food and kitchen implements in the kitchen area. In a sense, the output of the engine 430 is an atlas or aggregated set of information on the types of food, kitchen implements, and workers within the work space. An example of a final set of beliefs is represented as a list of objects that are believed to exist with associated classification confidences and location estimates, and in embodiments internal temperatures.
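
By way of illustration only, one simple aggregation scheme of this general kind (nearest-neighbour association plus exponential smoothing of confidence and position) is sketched below; it is not the belief-update algorithm of the Kitchen Bayesian Belief Engine itself.

```python
# Simplified sketch of aggregating per-frame detections into persistent beliefs,
# using nearest-neighbour association and exponential smoothing. This is one
# illustrative approach, not the aggregation method defined by the specification.
import numpy as np

class BeliefStore:
    def __init__(self, match_radius_m=0.05, alpha=0.3):
        self.beliefs = []                 # each: {"label", "position", "confidence"}
        self.match_radius_m = match_radius_m
        self.alpha = alpha                # smoothing weight for new observations

    def update(self, label, position, confidence):
        position = np.asarray(position, dtype=float)
        for b in self.beliefs:
            close = np.linalg.norm(b["position"] - position) < self.match_radius_m
            if b["label"] == label and close:
                b["position"] = (1 - self.alpha) * b["position"] + self.alpha * position
                b["confidence"] = (1 - self.alpha) * b["confidence"] + self.alpha * confidence
                return
        self.beliefs.append({"label": label, "position": position, "confidence": confidence})

store = BeliefStore()
store.update("burger_patty", (0.30, 0.12, 0.0), 0.91)   # observation from one frame
store.update("burger_patty", (0.31, 0.12, 0.0), 0.95)   # next frame, same patty
print(store.beliefs)
```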

A Food Preparation Supervisory System 494 is shown receiving the updated food item state information from the Kitchen Scene Understanding Engine 430. As described above with reference to FIG. 4A, the Food Preparation Supervisory System 494 can also receive data from the order management system and inputs from workers, and computes visual instructions for the grill and assembly workers. The visual instructions are sent to the projection system as described above to instruct the worker accordingly. This Food Preparation Supervisory computation engine 494 has recipes on all the various food items and estimates (and in embodiments, may assist in defining or creating the estimates) for the time required to perform all relevant steps in the process. This computation engine preferably tracks the state of all relevant items, including but not limited to: grill temperature, items currently on grill, items in assembly area, total number of items processed, number of workers in the assembly area, tasks being performed by workers in assembly area, grill cleanliness, ingredient levels in storage bins, and inputs from restaurant workers.

The computational engine then sends the computed instructions to the projection system for viewing by the kitchen worker. In embodiments, commands are sent to a robotic kitchen assistant and optionally to the KDS or other displays and data logs.

FIG. 5 Pre-Processing Sensor Data

As stated above, in embodiments, the invention collects data from multiple sensors and cameras. This data is preferably pre-processed prior to being fed to the convolutional neural network for object recognition. With reference to FIG. 5, a flow diagram serves to illustrate details of a method 500 for pre-processing the data from multiple sensors in accordance with an embodiment of the invention.

Step 510 states to create a multi-sensor point cloud. Image data from RGB and depth sensors are combined into a point cloud as is known in the art. In embodiments, the resulting point cloud has a size of m by n with X, Y, Z, and RGB at each point (herein we refer to the combined RGB and depth image point cloud as “the RGBD point cloud”). In embodiments, the size of the RGBD point cloud is 960 by 540.
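
An illustrative sketch of this step, assuming already-registered RGB and depth images and placeholder pinhole intrinsics, follows.

```python
# Sketch of step 510: combining depth and RGB images into an m-by-n point cloud
# with X, Y, Z and RGB at each point. The intrinsics are placeholder values.
import numpy as np

def make_rgbd_cloud(depth_m, rgb, fx, fy, cx, cy):
    """depth_m: (H, W) float metres; rgb: (H, W, 3) uint8 -> (H, W, 6) cloud."""
    h, w = depth_m.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    x = (us - cx) * depth_m / fx
    y = (vs - cy) * depth_m / fy
    return np.dstack([x, y, depth_m, rgb.astype(np.float32)])

depth = np.full((540, 960), 1.0, dtype=np.float32)        # flat surface 1 m away
rgb = np.zeros((540, 960, 3), dtype=np.uint8)
cloud = make_rgbd_cloud(depth, rgb, 540.0, 540.0, 480.0, 270.0)
print(cloud.shape)   # (540, 960, 6)
```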

Step 520 states to transform the multi-sensor point cloud to the IR sensor coordinates. The process of transforming an image from one frame to another is commonly referred to as registration (see, e.g., Lucas, Bruce D., and Takeo Kanade. “An iterative image registration technique with an application to stereo vision.” (1981): 674-679). Particularly, in embodiments, the RGBD point cloud is transformed into the frame of the IR camera using extrinsic transformations and re-projection. In embodiments, because the field of view of the RGB and depth sensors is larger than the field of view of the IR sensor, a portion of the RGB and depth data is cropped during registration and the resulting RGBD point cloud becomes 720 by 540.

Step 530 states to register the multi-sensor point cloud to the IR sensor data and coordinates. The transformed RGBD point cloud is registered into the IR frame by projecting the RGBD data into the IR image frame. In embodiments, the resulting combined sensor image input data is 720 by 540, with RGBD and IR data at each point. In embodiments, values are converted to 8-bit unsigned integers. In other embodiments, the registration process is reversed and the IR image is projected into the RGBD frame.
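
The registration of steps 520-530 can be sketched as below; the extrinsic transform, IR intrinsics, and array sizes are placeholders for illustration and are not the calibration values of any particular embodiment.

```python
# Hedged sketch of steps 520-530: transform the RGBD point cloud into the IR
# camera frame and project it onto the IR image grid, yielding per-pixel
# RGBD + IR data. Extrinsics and IR intrinsics are placeholder values.
import numpy as np

def register_to_ir(cloud, R_ir_rgbd, t_ir_rgbd, fx, fy, cx, cy, ir_image):
    """cloud: (H, W, 6) RGBD point cloud; ir_image: (h_ir, w_ir) thermal intensities."""
    h_ir, w_ir = ir_image.shape
    out = np.zeros((h_ir, w_ir, cloud.shape[2] + 1), dtype=np.float32)
    pts = cloud[..., :3].reshape(-1, 3) @ R_ir_rgbd.T + t_ir_rgbd   # into IR frame
    feats = cloud.reshape(-1, cloud.shape[2])
    z = pts[:, 2]
    valid = z > 0
    u = np.round(fx * pts[valid, 0] / z[valid] + cx).astype(int)    # project to IR pixels
    v = np.round(fy * pts[valid, 1] / z[valid] + cy).astype(int)
    keep = (u >= 0) & (u < w_ir) & (v >= 0) & (v < h_ir)
    out[v[keep], u[keep], :-1] = feats[valid][keep]                 # RGBD channels
    out[..., -1] = ir_image                                         # append IR intensity
    return out
```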

In embodiments with multiple sensors, including an IR camera, the registration of the data from the various sensors simplifies the training of the CNN. Registering the IR data and the RGB and depth data in the same frame of reference converts the input (namely, the image input data 450 of FIG. 4B) into a more convenient form for the CNN, improving the accuracy of the CNN to recognize food items and/or reducing the number of labeled input images required to train the CNN.

Following step 530, the registered multi-sensor image data is fed into the CNN, described above in connection with the Kitchen Scene Understanding Engine.

EXAMPLE

Described below is a prophetic example for illustrating the steps and aspects of the subject invention.

Initially, the processor or computer is configured to receive customer order information and operable to execute or run the software engines and modules described above.

When a first order is received, the system evaluates the preparation time of each of the various items. The calculation of preparation time can include assembly and packaging.

Once preparation time is calculated, the system identifies the sequence of steps to cook the items so as to: (a) Minimize the time between completion of the food item and transfer of the food item for pick-up by the consumer; and (b) Ensure the even flow of completed items to the assembly area (based on a range of variables, including but not limited to, activities of current staff).

Once the computer has selected the initial item to be cooked, the system projects a representative symbol onto the grill, which is an instruction for the grill operator to put the chosen food item onto the projected image on the grill. In embodiments, the symbols are as follows: “C” for chicken; “B” for burger patty; “V” for vegetable patty; and “N” for bun. Other abbreviations, symbols, and insignia may be used to represent the food item.

The system selects the specific location of the area onto which to project the symbol by dividing up the grill space as follows: (a) The 24-inch-deep × 48-inch-wide grill is divided into a grid of squares, with 4 squares on the short edge and 8 squares on the long edge; (b) The 24-inch-deep × 24-inch-wide grill is divided into two separate grids of squares. The first grid runs the full depth and half the width of the grilling surface and is comprised of 8 squares in 4 rows and 2 columns. The second grid runs the full depth and the other half of the width of the grill surface and consists of 3 columns of 6 squares. Other grid dimensions may be used.

The squares on the grills can be denoted by their row and column position, with the bottom left square being denoted as e.g., (1,1). Further, squares on the 24×48 grill can include the letter A and shall be denoted as e.g., A(1,1). Squares on the 24×24 grill can include the letter B and shall be denoted as e.g., B(1,1).
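
For illustration, the grid-naming convention above for the 24×48 inch grill (“A”) can be captured with a pair of hypothetical helper functions; the function names are not part of this disclosure.

```python
# Sketch of the grid naming convention above for the 24 x 48 inch grill ("A"):
# 4 rows of 6-inch squares along the depth, 8 columns along the width, with
# (1,1) at the bottom-left. The helper names are illustrative only.
def square_label_a(x_in, y_in, square_in=6.0, rows=4, cols=8):
    """Map a point (inches from the bottom-left corner) to its A(row, col) square."""
    col = min(int(x_in // square_in) + 1, cols)
    row = min(int(y_in // square_in) + 1, rows)
    return f"A({row},{col})"

def square_center_a(row, col, square_in=6.0):
    """Centre of square A(row, col) in grill coordinates (inches), e.g. for projection."""
    return ((col - 0.5) * square_in, (row - 0.5) * square_in)

print(square_label_a(27.0, 4.0))   # A(1,5)
print(square_center_a(1, 5))       # (27.0, 3.0)
```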

For the first item to be cooked, the system selects the appropriate position based on which grill is to be used. Subsequent items are then placed to fill out rows. The specific locations may be chosen in order to optimize throughput and minimize errors.

The system monitors the necessary preparation time and cooking steps for each item and signals appropriate instructions to the worker.

The following instructions are used: “F” to flip the item; “O” to apply grilled onions to the item; and “A” to apply cheese to the item.

The letter-number combinations X1, X2, X3, X4, . . . , and X8 shall mean the instruction for the kitchen worker to remove the food item and put it into any one of eight different positions for later assembly.

For this example, a first order shall consist of: (a) Cheeseburger with grilled onions, lettuce, tomato, and special sauce; (b) Chicken sandwich with cheese, grilled onions, and special sauce; (c) Hamburger with lettuce, tomato, and special sauce.

A second order which comes in 3 minutes after the first order shall consist of: (a) Chicken sandwich with cheese, grilled onions, and special sauce; (b) Hamburger with lettuce, tomato, and special sauce; and (c) Hamburger with lettuce, tomato, and special sauce.

Cooking times may be pre-stored, uploaded with recipe data, or otherwise input into storage. In this example, the cooking times for the various items are as follows:

Order 1, cheeseburger: 4 minutes
Order 1, chicken sandwich with cheese: 7 minutes
Order 1, hamburger: 4 minutes
Order 2, chicken sandwich with cheese: 7 minutes
Order 2, hamburger: 4 minutes
Order 2, hamburger: 4 minutes

Upon receiving the initial order, the system calculates the total production time of each item.

The system looks at the state of all relevant elements of the food preparation system, recipes, time required for each step, available restaurant worker capacity, and other variables, and determines an optimal cooking order and set of steps that are then communicated to grill workers via the projection system.
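
A deliberately simplified sketch of the back-scheduling idea (start each item of an order so that all items finish together) is shown below. It uses the cooking times from the table above but ignores grill capacity, assembly and packaging time, and worker availability, which is why its offsets differ from the worked example that follows.

```python
# Simplified sketch of back-scheduling an order so that everything finishes at
# the same time. Real behaviour also depends on grill capacity and assembly
# staffing; this sketch ignores those factors.
COOK_MIN = {"cheeseburger": 4, "chicken sandwich with cheese": 7, "hamburger": 4}

def start_offsets(items):
    """Minutes after order receipt at which to start each item so all finish together."""
    finish = max(COOK_MIN[i] for i in items)
    return {i: finish - COOK_MIN[i] for i in dict.fromkeys(items)}

order_1 = ["cheeseburger", "chicken sandwich with cheese", "hamburger"]
print(start_offsets(order_1))   # chicken first, burgers started later
```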

In this example, the system signals that the chicken should be placed on the grill and cooked for one minute before the two hamburgers are added. The system also signals when to flip each item and when to put on cheese.

When the second order comes in, there are three items on the grill, and the chicken needs to be flipped in 30 seconds. The system calculates that there is sufficient time for the worker to place the new chicken patty on the grill and communicates that step to the grill worker.
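
The feasibility check when a new order arrives can be sketched as a simple comparison against the next scheduled action. This is an assumption about how such a check might look, not the disclosed logic; the 30-second deadline comes from the example above, while the placement duration and safety buffer are hypothetical.

```python
def can_insert_task(now_s: float, next_deadline_s: float,
                    task_duration_s: float, buffer_s: float = 5.0) -> bool:
    """Signal a new placement only if the worker can finish it, plus a safety
    buffer, before the next scheduled action (e.g., a flip due in 30 seconds)."""
    return now_s + task_duration_s + buffer_s <= next_deadline_s

# Example: the chicken must be flipped in 30 s; placing a new chicken patty is
# assumed to take about 15 s, so the system signals the placement now.
print(can_insert_task(now_s=0.0, next_deadline_s=30.0, task_duration_s=15.0))  # True
```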

Alternate Embodiments

In embodiments, instead of relying on a predetermined time to cook a specific food item, one or more of the sensors and cameras are operable to detect placement of the food item on the grill and to determine doneness of the food item. Particularly, in embodiments, IR data is used to determine whether a food item is done cooking by comparing an estimated internal temperature to a maximum or target internal temperature for the food item. The computational engine is operable to receive the IR data and is trained to predict the internal temperature of the food item based on the IR data, and to evaluate whether the recognized food item is done. Alternatively, an internal temperature of the food item is estimated based on one or more of the following, alone or in combination: grill temperature, time cooked, volumetric analysis of the food item, and IR surface temperature data.
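
The doneness check can be pictured as estimating an internal temperature and comparing it to a target. The sketch below substitutes a crude linear formula with placeholder coefficients for the trained model described above; it is illustrative only.

```python
def is_done(surface_temp_c: float, minutes_on_grill: float,
            target_internal_c: float, k: float = 0.8,
            rate_c_per_min: float = 2.0) -> bool:
    """Crude stand-in for a trained model: estimate internal temperature from
    the IR surface reading and time on the grill, then compare to the target.
    The coefficients are placeholders, not values from the disclosure."""
    estimated_internal_c = k * surface_temp_c + rate_c_per_min * minutes_on_grill
    return estimated_internal_c >= target_internal_c

# Example: a 90 C surface reading after 6 minutes, checked against a 74 C target.
print(is_done(90.0, 6.0, target_internal_c=74.0))  # True under these placeholder values
```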

In embodiments, a display (such as a monitor or tablet) showing the kitchen location or food item is augmented with the instructions. The processor, cameras, sensors, and display operate together to fuse or superimpose the instructions on top of the kitchen item, food item, or location. Such superimposition AR can be used in lieu of or in combination with the AR projection described herein. In embodiments, the screen is a transparent screen (e.g., a large glass panel) positioned between the worker and the food items or kitchen equipment, and instructions are presented on the screen so that they appear to be located on top of the applicable equipment, utensil, or food item to be manipulated.
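
The display-based superimposition can be illustrated with a short overlay routine. This sketch assumes OpenCV is available and that a separate detector (e.g., a trained CNN) supplies the food item's bounding box; it is not the disclosed implementation.

```python
import cv2           # assumed dependency, used only to illustrate the overlay
import numpy as np

def overlay_instruction(frame: np.ndarray, bbox, text: str) -> np.ndarray:
    """Superimpose an instruction on a camera frame at the detected food item's
    bounding box (x, y, w, h) for display on a monitor or tablet."""
    x, y, w, h = bbox
    out = frame.copy()
    cv2.rectangle(out, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.putText(out, text, (x, max(y - 10, 20)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)
    return out

# Example: mark a patty detected at pixel box (400, 300, 120, 120) with the flip symbol.
frame = np.zeros((720, 1280, 3), dtype=np.uint8)
annotated = overlay_instruction(frame, (400, 300, 120, 120), "F")
```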

In embodiments, a portion of the tasks necessary to prepare various food items may be performed by a robotic kitchen assistant as described in, for example, provisional patent application no. 62/467,743, filed Mar. 6, 2017, and entitled “ROBOTIC SYSTEM FOR PREPARING FOOD ITEMS IN A COMMERCIAL KITCHEN”, and co-pending PCT Application No. ***to be assigned***, inventors D. Zito et al., filed Mar. 5, 2018, entitled “ROBOTIC KITCHEN ASSISTANT FOR PREPARING FOOD ITEMS IN A COMMERCIAL KITCHEN AND RELATED METHODS” corresponding to attorney docket number MIS001PCT, each of which is incorporated herein by reference in its entirety.

The invention described herein delegates some tasks to the human worker and other tasks to the Robotic Kitchen Assistant (RKA). For example, frying or sautéing vegetables such as onions or peppers on the grill may be delegated to the human worker, while operating the deep fryer to cook French fries is performed automatically by the RKA.

Though the invention has been described in the context of kitchen environments, the system or aspects of the system may be applied to any process involving directing human workers using projected information in dynamic environments. The invention is intended to be limited only as recited in the appended claims.

Other modifications and variations can be made to the disclosed embodiments without departing from the subject invention.

Claims

1. A food preparation system for preparing a plurality of food items in a commercial kitchen environment, the system comprising:

at least one camera aimed at a kitchen workspace for preparing the plurality of food items;
a processor operable to compute an instruction for a kitchen worker to perform a food preparation step based on data from the at least one camera, order information, and recipe information; and
a projector in communication with the processor and operable to visually project the instruction onto a location in the kitchen workspace for the kitchen worker to observe.

2. The food preparation system of claim 1, wherein the projector is configured to project the instruction on the food item or in close proximity to the food item.

3. The food preparation system of claim 1, wherein the projector comprises a laser for projecting the instruction.

4. The food preparation system of claim 1, wherein the projector comprises AR glasses.

5. (canceled)

6. The food preparation system of claim 1, wherein the instruction comprises at least one of text, indicia, symbols, figures, or diagrams.

7. (canceled)

8. (canceled)

9. The food preparation system of claim 1, further comprising at least one sensor in addition to the at least one camera.

10. (canceled)

11. The food preparation system of claim 1, further comprising a communication interface to connect with the internet.

12. The food preparation system of claim 1, wherein the processor is in communication with a point-of-sale (POS) system and adapted to receive the order information.

13. The food preparation system of claim 1, wherein the processor is operable to recognize and locate the food item based on data from the at least one camera.

14. (canceled)

15. (canceled)

16. The food preparation system of claim 1, wherein the at least one camera comprises an IR camera, and wherein the processor is operable to determine a next food preparation step based on image data of the food item from the IR camera.

17. (canceled)

18. (canceled)

19. (canceled)

20. (canceled)

21. A method to assist a kitchen worker to prepare a plurality of food items in a kitchen workspace in order to complete a customer order, the method comprising:

receiving the customer order;
computing an instruction for a kitchen worker to perform a food preparation step on the plurality of food items in the kitchen workspace based on order information, camera data, and recipe information; and
projecting the instruction onto a location in the workspace viewable by the kitchen worker.

22. The method of claim 21, wherein the step of projecting is performed directly onto the location.

23. (canceled)

24. (canceled)

25. (canceled)

26. The method of claim 21, further comprising recognizing and locating the food items based on the camera data.

27. The method of claim 26, wherein the step of recognizing and locating is performed with a trained CNN.

28. The method of claim 26, further comprising monitoring a state of the food items.

29. The method of claim 28, further comprising providing an instruction based on a change in the state.

30. (canceled)

31. (canceled)

32. The method of claim 21, further comprising computing the total time to carry out the customer order.

33. The method of claim 32, further comprising computing a first set of steps to complete a first customer order, including computing a duration for each step and a time to commence each step.

34. (canceled)

35. The method of claim 21, wherein the step of projecting includes projecting a symbol onto a grill or the food item.

36. (canceled)

37. (canceled)

38. The method of claim 21, further comprising displaying the instruction and wherein the displaying comprises use of an AR display adapted to superimpose instructions onto another location shown in the display.

39. The method of claim 38, wherein the location is on top of a food item.

40. (canceled)

41. (canceled)

42. (canceled)

43. (canceled)

44. (canceled)

45. (canceled)

46. (canceled)

47. A method to assist a kitchen worker to prepare a plurality of food items in a kitchen workspace to complete a customer order, the method comprising:

detecting when a cooking threshold has been reached;
computing an instruction for a kitchen worker to perform a food preparation step on the plurality of food items in the kitchen workspace; and
projecting the instruction onto a location in the workspace viewable by the kitchen worker.

48. The method of claim 47, wherein the cooking threshold is a temperature.

49.-88. (canceled)

Patent History
Publication number: 20210030199
Type: Application
Filed: Mar 5, 2018
Publication Date: Feb 4, 2021
Applicant: Miso Robotics, Inc. (Pasadena, CA)
Inventors: Sean Olson (Pacific Palisades, CA), David Zito (Pasadena, CA), Ryan W. Sinnet (Pasadena, CA), Robert Anderson (Pasadena, CA), Benjamin Pelletier, Grant Stafford (Pasadena, CA), Zachary Zweig Vinegar (Los Angeles, CA), William Werst (Pasadena, CA)
Application Number: 16/490,775
Classifications
International Classification: A47J 36/32 (20060101); A23L 5/10 (20060101); A47J 37/06 (20060101); G06T 7/73 (20060101); G06Q 20/20 (20060101); G02B 27/01 (20060101);