TRIGGERING IMAGE PROCESSING BASED ON INFRARED DATA ANALYSIS
A method for triggering image processing based on infrared data analysis may include receiving first infrared input data captured using a first group of infrared sensors; analyzing the first infrared input data to detect an engagement of a person with a retail shelf; receiving second infrared input data captured using a second group of infrared sensors after the capturing of the first infrared input data; analyzing the second infrared input data to determine a completion of the engagement of the person with the retail shelf; in response to the determined completion of the engagement of the person with the retail shelf, analyzing an image of the retail shelf captured using an image sensor after the completion of the engagement of the person with the retail shelf; and using the analysis of the image to determine a state of the retail shelf.
This application claims the benefit of priority of U.S. Provisional Application No. 63/113,490, filed Nov. 13, 2020. The foregoing application is incorporated herein by reference in its entirety.
BACKGROUND
I. Technical Field
The present disclosure relates generally to systems and methods for deriving information from sensors in a retail environment, and more specifically to systems and methods for deriving information from image, infrared, and vibration sensors in a retail environment.
II. Background Information
Shopping in stores is a prevalent part of modern daily life. Storeowners (also known as “retailers”) stock a wide variety of products in retail stores and add associated labels and promotions in the retail stores. Managing and operating retail stores efficiently is an ongoing effort that consumes tremendous resources. Placing cameras in retail stores and using image analysis to derive information for enhancing and improving retail store operation and management is becoming prevalent. However, at large scale, image analysis remains expensive, and the level of detail and accuracy of the information derived from image analysis is still insufficient for many tasks.
The disclosed devices and methods are directed to providing new ways for deriving information in retail stores in an efficient manner.
SUMMARY
Embodiments consistent with the present disclosure provide methods, systems, and computer-readable media for deriving information from sensors in a retail environment. Some non-limiting examples of such sensors may include image sensors, infrared sensors, vibration sensors, and so forth.
In some embodiments, methods, systems, and computer-readable media are provided for triggering image processing based on infrared data analysis.
In some examples, first infrared input data captured using a first group of one or more infrared sensors may be received. The first infrared input data may be analyzed to detect an engagement of a person with a retail shelf. Second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data may be received. The second infrared input data may be analyzed to determine a completion of the engagement of the person with the retail shelf. In one example, in response to the determined completion of the engagement of the person with the retail shelf, at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf may be analyzed. The analysis of the at least one image may be used to determine a state of the retail shelf.
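For illustration only, the following sketch outlines this triggering flow in Python. The helper functions (read_infrared, capture_shelf_image, estimate_shelf_state) and the activation thresholds are hypothetical placeholders assumed for the example, not part of the disclosed systems.

```python
# A minimal sketch of the triggering flow described above, assuming
# hypothetical sensor-reading and image-analysis helpers.
import time
import numpy as np

ENGAGEMENT_THRESHOLD = 0.6   # assumed IR activation level indicating "engagement"
CLEARED_THRESHOLD = 0.2      # assumed IR activation level indicating "no person present"

def read_infrared(sensor_group):
    """Placeholder: return a 1-D array of readings from a group of IR sensors."""
    return np.random.rand(len(sensor_group))

def capture_shelf_image(camera_id):
    """Placeholder: return an image (H x W x 3 array) from the shelf camera."""
    return np.zeros((480, 640, 3), dtype=np.uint8)

def estimate_shelf_state(image):
    """Placeholder for the image analysis step (e.g., counting facings)."""
    return {"facings": int(image.mean())}

def monitor_shelf(first_group, second_group, camera_id, poll_interval=0.5):
    # Step 1: watch the first IR group for the start of an engagement.
    while read_infrared(first_group).max() < ENGAGEMENT_THRESHOLD:
        time.sleep(poll_interval)
    # Step 2: watch the second IR group for completion of the engagement.
    while read_infrared(second_group).max() > CLEARED_THRESHOLD:
        time.sleep(poll_interval)
    # Step 3: only now trigger the (relatively expensive) image capture and analysis.
    image = capture_shelf_image(camera_id)
    return estimate_shelf_state(image)
```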
In some examples, the first group of one or more infrared sensors may be a group of one or more passive infrared sensors. In some examples, the first group of one or more infrared sensors may be identical to the second group of one or more infrared sensors. In some examples, the first group of one or more infrared sensors may be a group of one or more infrared sensors positioned below a second retail shelf, where the second retail shelf is positioned above the retail shelf.
In some examples, the determined state of the retail shelf may include inventory data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the determined state of the retail shelf may include facings data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the determined state of the retail shelf may include a planogram compliance status associated with the retail shelf after the engagement of the person with the retail shelf.
In some examples, the analysis of the at least one image and an analysis of one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf may be used to determine a change associated with the retail shelf during the engagement of the person with the retail shelf.
In some examples, the at least one image sensor may be at least one image sensor mounted to a second retail shelf. In some examples, the at least one image sensor may be at least one image sensor mounted to an image capturing robot.
In some examples, in response to the determined completion of the engagement of the person with the retail shelf, the capturing of the at least one image of the retail shelf using the at least one image sensor may be triggered.
In some examples, the first infrared input data may be analyzed to determine a type of the engagement of the person with the retail shelf. Further, in some examples, in response to a first determined type of the engagement, the analysis of the at least one image of the retail shelf may be triggered, and in response to a second determined type of the engagement, analyzing the at least one image of the retail shelf may be forgone.
In some examples, the first infrared input data may be analyzed to determine a type of the engagement of the person with the retail shelf. Further, in one example, in response to a first determined type of the engagement, a first analysis step may be included in the analysis of the at least one image of the retail shelf, and in response to a second determined type of the engagement, a second analysis step may be included in the analysis of the at least one image of the retail shelf. The second analysis step may differ from the first analysis step.
In some examples, the determination of the completion of the engagement of the person with the retail shelf may be a determination that the person cleared an environment of the retail shelf.
In some examples, a convolution of at least part of the first infrared input data may be calculated. Further, in some examples, in response to a first value of the calculated convolution of the at least part of the first infrared input data, the engagement of a person with a retail shelf may be detected, and in response to a second value of the calculated convolution of the at least part of the first infrared input data, detecting the engagement of a person with a retail shelf may be forgone.
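As a hedged illustration of the convolution-based gating described above, the following sketch applies a simple difference kernel to a short series of infrared readings; the kernel values and the detection threshold are assumptions chosen only for the example.

```python
# A sketch of using a convolution of infrared readings as a cheap engagement
# detector. Kernel and threshold are illustrative assumptions; a deployed
# system might learn them from labeled data.
import numpy as np

def engagement_from_convolution(ir_samples, threshold=1.5):
    """ir_samples: 1-D array of successive IR readings from one sensor group."""
    # np.convolve flips the kernel, so these weights effectively subtract the
    # earlier samples from the later ones -- responding to a rise in IR level,
    # one plausible signature of a person approaching the shelf.
    kernel = np.array([1.0, 1.0, -1.0, -1.0])
    convolved = np.convolve(ir_samples, kernel, mode="valid")
    peak = convolved.max()
    # First value range -> detect engagement; second range -> forgo detection.
    return peak >= threshold, peak

detected, value = engagement_from_convolution(np.array([0.1, 0.1, 0.2, 1.4, 1.6, 1.5]))
```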
In some examples, in response to the detected engagement of a person with a retail shelf, one or more images of the retail shelf captured before the completion of the engagement of the person with the retail shelf may be analyzed to determine at least one aspect of the engagement. In one example, a virtual shopping cart associated with the person may be updated based on the determined at least one aspect of the engagement. In one example, the analysis of the at least one image of the retail shelf captured after the completion of the engagement of the person with the retail shelf and the determined at least one aspect of the engagement may be used to determine the state of the retail shelf.
In some embodiments, methods, systems, and computer-readable media are provided for triggering image processing based on vibration data analysis.
In some examples, vibration data captured using one or more vibration sensors mounted to a shelving unit including a plurality of retail shelves may be received. The vibration data may be analyzed to determine whether a vibration is a result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. In one example, in response to a determination that the vibration is the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, analysis of at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves may be triggered, and in response to a determination that the vibration is not the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, triggering the analysis of the at least one image may be forgone. In one example, information may be provided based on a result of the analysis of the at least one image of the at least part of the plurality of retail shelves.
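The following sketch illustrates one possible way to decide whether a vibration stems from shelf engagement and to gate image analysis accordingly. The band-energy heuristic, sampling rate, and thresholds are illustrative assumptions; the disclosure does not prescribe a specific vibration classifier.

```python
# A minimal sketch of gating image analysis on vibration data, under the
# assumption that engagement-like vibrations concentrate energy in a low
# frequency band.
import numpy as np

def is_engagement_vibration(samples, rate_hz=200.0, band=(2.0, 30.0), energy_threshold=0.05):
    """samples: 1-D array from an accelerometer mounted on the shelving unit."""
    spectrum = np.abs(np.fft.rfft(samples - samples.mean())) ** 2
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate_hz)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    band_energy = spectrum[in_band].sum() / max(spectrum.sum(), 1e-9)
    return band_energy > energy_threshold

def on_vibration(samples, analyze_images):
    # Trigger the image-analysis callback only for engagement-like vibrations.
    if is_engagement_vibration(samples):
        return analyze_images()
    return None  # forgo analysis for e.g. passing carts or HVAC rumble
```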
In some examples, the plurality of retail shelves may include at least a first retail shelf and a second retail shelf. The vibration data may be analyzed to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In one example, in response to the determination that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves, including images depicting the second retail shelf in the at least one image may be avoided.
In some examples, the at least one image may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. In one example, the vibration data may be analyzed to determine the completion of the engagement of the person with the at least one retail shelf. In one example, one or more images of the at least one retail shelf may be analyzed to determine the completion of the engagement of the person with the at least one retail shelf. In one example, infrared data captured using at least one infrared sensor may be analyzed to determine a completion of the engagement of the person with the at least one retail shelf. In one example, the analysis of the at least one image of the at least part of the plurality of retail shelves may be used to determine a state of at least one retail shelf after the completion of the engagement. For example, the determined state of the at least one retail shelf may include inventory data associated with products on the at least one retail shelf after the completion of the engagement, where the inventory data is determined using the analysis of the at least one image. In another example, the determined state of the at least one retail shelf may include facings data associated with products on the at least one retail shelf after the completion of the engagement, where the facings data is determined using the analysis of the at least one image. In yet another example, the determined state of the at least one retail shelf may include planogram compliance status of the at least one retail shelf after the completion of the engagement, and the planogram compliance status may be determined using the analysis of the at least one image. In an additional example, the analysis of the at least one image and an analysis of one or more images of the at least one retail shelf captured using the at least one image sensor before the engagement may be used to determine a change associated with the at least one retail shelf during the engagement.
In some examples, the at least one image may be captured using at least one image sensor mounted to a retail shelf not included in the at least one retail shelf. In some examples, the at least one image may be captured using at least one image sensor mounted to an image capturing robot. In some examples, the at least one image may be captured using at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image may be captured using at least one image sensor included in a personal mobile device.
In some examples, in response to the determination that the vibration is a result of the engagement of the person with the at least one retail shelf, capturing of the at least one image of the at least part of the plurality of retail shelves may be triggered.
In some examples, the vibration data may be analyzed to determine a type of the engagement of the person with the at least one retail shelf. In one example, in response to a first determined type of the engagement, a first analysis step may be included in the analysis of the at least one image of the at least part of the plurality of retail shelves, and in response to a second determined type of the engagement, a second analysis step may be included in the analysis of the at least one image of the at least part of the plurality of retail shelves. The second analysis step may differ from the first analysis step.
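A minimal sketch of such type-dependent selection of analysis steps follows; the engagement-type labels and the per-type analysis functions are hypothetical stand-ins.

```python
# A sketch of selecting different image-analysis steps by engagement type.
# The taxonomy and analysis steps are illustrative assumptions only.
def count_facings(image):
    return {"facings": 12}        # placeholder first analysis step

def check_planogram(image):
    return {"compliant": True}    # placeholder second analysis step

ANALYSIS_BY_ENGAGEMENT = {
    "product_picked": count_facings,     # first type -> first analysis step
    "shelf_restocked": check_planogram,  # second type -> second analysis step
}

def analyze_after_engagement(engagement_type, image):
    step = ANALYSIS_BY_ENGAGEMENT.get(engagement_type)
    return step(image) if step else None  # unknown type: forgo analysis
```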
In some examples, the vibration data may be analyzed to determine a type of the engagement of the person with the at least one retail shelf. In one example, in response to a first determined type of the engagement, the analysis of the at least one image of the at least part of the plurality of retail shelves may be triggered, and in response to a second determined type of the engagement, triggering the analysis of the at least one image of the at least part of the plurality of retail shelves may be forgone.
In some embodiments, methods, systems, and computer-readable media are provided for forgoing image processing in response to infrared data analysis.
In some examples, infrared input data captured using one or more infrared sensors may be received. The infrared input data may be analyzed to detect a presence of an object in an environment of a retail shelf. In one example, in response to no detected presence of an object in the environment of the retail shelf, at least one image of the retail shelf captured using at least one image sensor may be analyzed, and in response to a detection of presence of an object in the environment of the retail shelf, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone.
In some examples, the at least one image sensor may be at least one image sensor mounted to a second retail shelf. In some examples, the at least one image sensor may be at least one image sensor mounted to an image capturing robot. In some examples, the at least one image sensor may be at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image sensor may be a part of a personal mobile device.
In some examples, the analysis of the at least one image may be used to determine a state of the retail shelf. In some examples, the environment of the retail shelf may include an area between the at least one image sensor and at least part of the retail shelf. In some examples, the one or more infrared sensors may be one or more infrared sensors physically coupled with the at least one image sensor. In some examples, the one or more infrared sensors may be one or more passive infrared sensors. In some examples, the object may be at least one of a person, a robot, and an inanimate object.
In some examples, the infrared input data may be analyzed to determine a portion of a field of view of the at least one image sensor associated with the object. In one example, in response to a first determined portion of the field of view of the at least one image sensor associated with the object, the at least one image of the retail shelf captured using the at least one image sensor may be analyzed, and in response to a second determined portion of the field of view of the at least one image sensor associated with the object, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone. In one example, the field of view of the at least one image sensor may differ from the field of view of the one or more infrared sensors.
In some examples, the infrared input data may be analyzed to determine a type of the object. In one example, in response to a first determined type of the object, the at least one image of the retail shelf captured using the at least one image sensor may be analyzed, and in response to a second determined type of the object, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone.
In some examples, the infrared input data may be analyzed to determine a duration associated with the presence of an object in the environment of the retail shelf. The determined duration may be compared with a threshold. In one example, in response to a first result of the comparison, the at least one image of the retail shelf captured using the at least one image sensor may be analyzed, and in response to a second result of the comparison, analyzing the at least one image of the retail shelf captured using the at least one image sensor may be forgone. In one example, the threshold may be selected based on at least one product type associated with the retail shelf. In one example, the threshold may be selected based on a status of the retail shelf determined using image analysis of one or more images of the retail shelf captured using the at least one image sensor before the capturing of the infrared input data. In one example, the threshold may be selected based on a time of day.
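The following sketch illustrates one way the duration comparison and context-dependent threshold selection might be combined, assuming (for illustration only) that presences shorter than the threshold allow image analysis to proceed; the category labels and numeric values are hypothetical.

```python
# A sketch of the duration-versus-threshold decision, with the threshold
# chosen from context (product type, prior shelf status, time of day).
from datetime import datetime

def select_threshold(product_type, shelf_status, now=None):
    now = now or datetime.now()
    threshold = 2.0                      # default tolerated presence, in seconds (assumed)
    if product_type == "high_turnover":  # hypothetical product-type label
        threshold = 1.0
    if shelf_status == "low_stock":      # status from earlier image analysis
        threshold = 1.5
    if 17 <= now.hour <= 20:             # busy evening hours: be more tolerant
        threshold *= 2
    return threshold

def should_analyze_image(presence_duration_s, product_type, shelf_status):
    # Assumed rule: short presences let image analysis proceed; longer
    # presences forgo it, since the shelf is likely occluded.
    return presence_duration_s < select_threshold(product_type, shelf_status)
```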
In some examples, in response to no detected presence of an object in the environment of the retail shelf, the at least one image of the retail shelf may be captured using the at least one image sensor, and in response to a detection of presence of an object in the environment of the retail shelf, the capturing of the at least one image of the retail shelf may be forgone.
In some embodiments, methods, systems, and computer-readable media are provided for robust action recognition in a retail environment.
In some examples, infrared data captured using one or more infrared sensors from a retail environment may be received. Further, at least one image captured using at least one image sensor from the retail environment may be received. The infrared data and the at least one image may be analyzed to detect an action performed in the retail environment. In one example, information based on the detected action may be provided.
In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf and moving a product on a retail shelf. In some examples, detecting the action performed in the retail environment may include recognizing a type of the action. In some examples, detecting the action performed in the retail environment may include at least one of identifying a product type associated with the action and determining a quantity of products associated with the action. In some examples, the at least one image may include at least one three-dimensional image.
In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the infrared data to detect the action performed in the retail environment.
In some examples, a convolution of at least part of the infrared data may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the at least one image to detect the action performed in the retail environment.
In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the infrared data may be analyzed to determine a wavelength associated with the infrared data. In one example, in response to a first combination of the value of the calculated convolution and the wavelength associated with the infrared data, the action performed in the retail environment may be detected, and in response to a second combination of the value of the calculated convolution and the wavelength associated with the infrared data, the detection of the action performed in the retail environment may be forgone.
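For illustration, the sketch below combines an image-convolution value with an estimated infrared wavelength and reports an action only for one combination of the two; the kernel, the accepting ranges, and the wavelength interpretation (roughly the 8-14 micrometer band associated with body heat) are assumptions made for the example.

```python
# A sketch of the "combination" rule: act only when the image-derived
# convolution value and the infrared wavelength jointly fall in an
# accepting region. Ranges are illustrative assumptions.
import numpy as np

def image_convolution_value(image_gray):
    """Convolve a grayscale image with an edge-like kernel and summarize it."""
    kernel = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=float)
    h, w = image_gray.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image_gray[i:i + kh, j:j + kw] * kernel).sum()
    return float(np.abs(out).mean())

def detect_action(image_gray, ir_wavelength_um):
    conv_value = image_convolution_value(image_gray)
    # Accepting combination: strong image response together with a wavelength
    # consistent with human body heat (roughly 8-14 micrometers).
    if conv_value > 10.0 and 8.0 <= ir_wavelength_um <= 14.0:
        return "action_detected"
    return None  # any other combination: forgo detection
```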
In some examples, the infrared data may include a time series of samples captured using the one or more infrared sensors at different points in time. In one example, the time series of samples may be analyzed to select the at least one image from a plurality of images. In one example, two samples of the time series of samples may be compared to one another, and a result of the comparison may be used to analyze the at least one image to detect the action performed in the retail environment.
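A small sketch of using the infrared time series to select a frame follows; pairing each infrared sample with the nearest-in-time captured frame, and treating the largest sample-to-sample drop as the event of interest, are assumptions for illustration.

```python
# A sketch of selecting the frame to analyze from an IR time series.
import numpy as np

def select_frame(ir_samples, ir_times, frames, frame_times):
    """ir_samples/ir_times: IR readings and capture times; frames/frame_times: candidate images."""
    diffs = np.diff(np.asarray(ir_samples))
    # Compare consecutive samples; the most negative difference marks the
    # assumed moment of interest (largest drop in IR level).
    event_time = ir_times[int(np.argmin(diffs)) + 1]
    nearest = int(np.argmin(np.abs(np.asarray(frame_times) - event_time)))
    return frames[nearest]
```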
In some examples, the at least one image may include a plurality of frames of a video captured using the at least one image sensor. In one example, two frames of the plurality of frames may be compared to one another, and a result of the comparison may be used to analyze the infrared data to detect the action performed in the retail environment.
In some examples, the infrared data may be analyzed to select a portion of the at least one image, and the selected portion of the at least one image may be analyzed to detect the action performed in the retail environment.
In some examples, the infrared data may be analyzed to attempt to detect the action performed in the retail environment, and in response to a failure of the attempt to successfully detect the action, the at least one image may be analyzed to detect the action performed in the retail environment. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. In yet another example, in response to a failure to successfully detect the action, the capturing of the at least one image using the at least one image sensor may be triggered.
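The two-stage strategy above might look like the following sketch, in which a cheap infrared detector runs first and image capture and analysis are triggered only when the infrared attempt is missing or below a confidence threshold; the detector functions and the 0.8 threshold are hypothetical.

```python
# A sketch of infrared-first detection with an image-analysis fallback.
CONFIDENCE_THRESHOLD = 0.8  # assumed confidence threshold

def detect_action_from_ir(ir_data):
    """Placeholder: return (action_or_None, confidence in [0, 1])."""
    return None, 0.3

def detect_action_from_image(image):
    """Placeholder: return action_or_None from a more expensive image model."""
    return "pick_product"

def robust_detect(ir_data, capture_image):
    action, confidence = detect_action_from_ir(ir_data)
    if action is not None and confidence >= CONFIDENCE_THRESHOLD:
        return action
    # Infrared attempt failed (or was not confident enough): trigger capture
    # and analyze the resulting image instead.
    image = capture_image()
    return detect_action_from_image(image)
```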
In some embodiments, methods, systems, and computer-readable media are provided for using vibration data analysis and image analysis for robust action recognition in a retail environment.
In some examples, vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf may be received. Further, at least one image captured using at least one image sensor from a retail environment including the shelving unit may be received. The vibration data and the at least one image may be analyzed to detect an action performed in the retail environment. In one example, information based on the detected action may be provided.
In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf and moving a product on a retail shelf. In some examples, detecting the action performed in the retail environment may include recognizing a type of the action. In some examples, detecting the action performed in the retail environment may include at least one of identifying a product type associated with the action and determining a quantity of products associated with the action. In some examples, the at least one image may include at least one three-dimensional image.
In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the vibration data to detect the action performed in the retail environment.
In some examples, a convolution of at least part of the vibration data may be calculated to obtain a value of the calculated convolution. Further, the value of the calculated convolution may be used to analyze the at least one image to detect the action performed in the retail environment.
In some examples, a convolution of at least part of the at least one image may be calculated to obtain a value of the calculated convolution. Further, the vibration data may be analyzed to determine a frequency associated with the vibration data. In one example, in response to a first combination of the value of the calculated convolution and the frequency associated with the vibration data, the action performed in the retail environment may be detected, and in response to a second combination of the value of the calculated convolution and the frequency associated with the vibration data, the detection of the action performed in the retail environment may be forgone.
In some examples, the vibration data may include a time series of samples captured using the one or more vibration sensors at different points in time. For example, the time series of samples may be analyzed to select the at least one image from a plurality of images. In another example, two samples of the time series of samples may be compared to one another, and a result of the comparison may be used to analyze the at least one image to detect the action performed in the retail environment.
In some examples, the at least one image may include a plurality of frames of a video captured using the at least one image sensor. In one example, two frames of the plurality of frames may be compared to one another, and a result of the comparison may be used to analyze the vibration data to detect the action performed in the retail environment.
In some examples, the vibration data may be analyzed to select a portion of the at least one image, and the selected portion of the at least one image may be analyzed to detect the action performed in the retail environment.
In some examples, the vibration data may be analyzed to attempt to detect the action performed in the retail environment, and in response to a failure of the attempt to successfully detect the action, the at least one image may be analyzed to detect the action performed in the retail environment. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. In one example, in response to a failure to successfully detect the action, the capturing of the at least one image using the at least one image sensor may be triggered.
Consistent with other disclosed embodiments, a non-transitory computer-readable medium may include instructions that, when executed by a processor, cause the processor to perform any of the methods described herein.
The foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the claims.
The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate various disclosed embodiments. In the drawings:
The following detailed description refers to the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the following description to refer to the same or similar parts. While several illustrative embodiments are described herein, modifications, adaptations and other implementations are possible. For example, substitutions, additions, or modifications may be made to the components illustrated in the drawings, and the illustrative methods described herein may be modified by substituting, reordering, removing, or adding steps to the disclosed methods. Accordingly, the following detailed description is not limited to the disclosed embodiments and examples. Instead, the proper scope is defined by the appended claims.
The present disclosure is directed to systems and methods for processing images captured in a retail store. As used herein, the term “retail store” or simply “store” refers to an establishment offering products for sale by direct selection by customers physically or virtually shopping within the establishment. The retail store may be an establishment operated by a single retailer (e.g., supermarket) or an establishment that includes stores operated by multiple retailers (e.g., a shopping mall). Embodiments of the present disclosure include receiving an image depicting a store shelf having at least one product displayed thereon. As used herein, the term “store shelf” or simply “shelf” refers to any suitable physical structure which may be used for displaying products in a retail environment. In one embodiment, the store shelf may be part of a shelving unit including a number of individual store shelves. In another embodiment, the store shelf may include a display unit having single-level or multi-level surfaces.
Consistent with the present disclosure, the system may process images and image data acquired by a capturing device to determine information associated with products displayed in the retail store. The term “capturing device” refers to any device configured to acquire image data representative of products displayed in the retail store. Examples of capturing devices may include a digital camera, a time-of-flight camera, a stereo camera, an active stereo camera, a depth camera, a Lidar system, a laser scanner, CCD based devices, or any other sensor based system capable of converting received light into electric signals. The term “image data” refers to any form of data generated based on optical signals in the near-infrared, infrared, visible, and ultraviolet spectrums (or any other suitable radiation frequency range). Consistent with the present disclosure, the image data may include pixel data streams, digital images, digital video streams, data derived from captured images, and data that may be used to construct a 3D image. The image data acquired by a capturing device may be transmitted by wired or wireless transmission to a remote server. In one embodiment, the capturing device may include a stationary camera with communication layers (e.g., a dedicated camera fixed to a store shelf, a security camera, and so forth). Such an embodiment is described in greater detail below with reference to
In some embodiments, the capturing device may include one or more image sensors. The term “image sensor” refers to a device capable of detecting and converting optical signals in the near-infrared, infrared, visible, and ultraviolet spectrums into electrical signals. The electrical signals may be used to form image data (e.g., an image or a video stream) based on the detected signal. Examples of image sensors may include semiconductor charge-coupled devices (CCD), active pixel sensors in complementary metal-oxide-semiconductor (CMOS), or N-type metal-oxide-semiconductors (NMOS, Live MOS). In some cases, the image sensor may be part of a camera included in the capturing device.
Embodiments of the present disclosure further include analyzing images to detect and identify different products. As used herein, the term “detecting a product” may broadly refer to determining an existence of the product. For example, the system may determine the existence of a plurality of distinct products displayed on a store shelf. By detecting the plurality of products, the system may acquire different details relative to the plurality of products (e.g., how many products on a store shelf are associated with a same product type), but it does not necessarily gain knowledge of the type of product. In contrast, the term “identifying a product” may refer to determining a unique identifier associated with a specific type of product that allows inventory managers to uniquely refer to each product type in a product catalogue. Additionally or alternatively, the term “identifying a product” may refer to determining a unique identifier associated with a specific brand of products that allows inventory managers to uniquely refer to products, e.g., based on a specific brand in a product catalogue. Additionally or alternatively, the term “identifying a product” may refer to determining a unique identifier associated with a specific category of products that allows inventory managers to uniquely refer to products, e.g., based on a specific category in a product catalogue. In some embodiments, the identification may be made based at least in part on visual characteristics of the product (e.g., size, shape, logo, text, color, and so forth). The unique identifier may include any codes that may be used to search a catalog, such as a series of digits, letters, symbols, or any combinations of digits, letters, and symbols. Consistent with the present disclosure, the terms “determining a type of a product” and “determining a product type” may also be used interchangeably in this disclosure with reference to the term “identifying a product.”
Embodiments of the present disclosure further include determining at least one characteristic of the product for determining the type of the product. As used herein, the term “characteristic of the product” refers to one or more visually discernable features attributed to the product. Consistent with the present disclosure, the characteristic of the product may assist in classifying and identifying the product. For example, the characteristic of the product may be associated with the ornamental design of the product, the size of the product, the shape of the product, the colors of the product, the brand of the product, a logo or text associated with the product (e.g., on a product label), and more. In addition, embodiments of the present disclosure further include determining a confidence level associated with the determined type of the product. The term “confidence level” refers to any indication, numeric or otherwise, of a level (e.g., within a predetermined range) indicative of an amount of confidence the system has that the determined type of the product is the actual type of the product. For example, the confidence level may have a value between 1 and 10; alternatively, the confidence level may be expressed as a percentage.
In some cases, the system may compare the confidence level to a threshold. The term “threshold” as used herein denotes a reference value, a level, a point, or a range of values, for which, when the confidence level is above it (or below it depending on a particular use case), the system may follow a first course of action and, when the confidence level is below it (or above it depending on a particular use case), the system may follow a second course of action. The value of the threshold may be predetermined for each type of product or may be dynamically selected based on different considerations. In one embodiment, when the confidence level associated with a certain product is below a threshold, the system may obtain contextual information to increase the confidence level. As used herein, the term “contextual information” (or “context”) refers to any information having a direct or indirect relationship with a product displayed on a store shelf. In some embodiments, the system may retrieve different types of contextual information from captured image data and/or from other data sources. In some cases, contextual information may include recognized types of products adjacent to the product under examination. In other cases, contextual information may include text appearing on the product, especially where that text may be recognized (e.g., via OCR) and associated with a particular meaning. Other examples of types of contextual information may include logos appearing on the product, a location of the product in the retail store, a brand name of the product, a price of the product, product information collected from multiple retail stores, product information retrieved from a catalog associated with a retail store, etc.
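As a hedged example of the confidence-threshold-context flow described above, the sketch below boosts a low-confidence product classification using contextual cues such as neighboring product types, recognized label text, and shelf expectations; the weights and threshold are illustrative assumptions.

```python
# A sketch of using contextual information to raise a below-threshold
# product-type confidence. Weights and threshold are assumed values.
def classify_with_context(base_type, base_confidence, context, threshold=0.75):
    """context: dict with optional keys 'neighbor_types', 'label_text', 'expected_types'."""
    if base_confidence >= threshold:
        return base_type, base_confidence
    confidence = base_confidence
    # Neighboring facings of the same type make the hypothesis more plausible.
    if base_type in context.get("neighbor_types", []):
        confidence += 0.10
    # OCR text matching the hypothesized type is a strong contextual cue.
    if base_type.lower() in context.get("label_text", "").lower():
        confidence += 0.15
    # A planogram/catalog expectation for this shelf location also helps.
    if base_type in context.get("expected_types", []):
        confidence += 0.05
    return (base_type, confidence) if confidence >= threshold else (None, confidence)
```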
Reference is now made to
System 100 may also include an image processing unit 130 to execute the analysis of images captured by the one or more capturing devices 125. Image processing unit 130 may include a server 135 operatively connected to a database 140. Image processing unit 130 may include one or more servers connected by a communication network, a cloud platform, and so forth. Consistent with the present disclosure, image processing unit 130 may receive raw or processed data from capturing device 125 via respective communication links, and provide information to different system components using a network 150. Specifically, image processing unit 130 may use any suitable image analysis technique including, for example, object recognition, object detection, image segmentation, feature extraction, optical character recognition (OCR), object-based image analysis, shape region techniques, edge detection techniques, pixel-based detection, artificial neural networks, convolutional neural networks, etc. In addition, image processing unit 130 may use classification algorithms to distinguish between the different products in the retail store. In some embodiments, image processing unit 130 may utilize suitably trained machine learning algorithms and models to perform the product identification. Network 150 may facilitate communications and data exchange between different system components when these components are coupled to network 150 to enable output of data derived from the images captured by the one or more capturing devices 125. In some examples, the types of outputs that image processing unit 130 can generate may include identification of products, indicators of product quantity, indicators of planogram compliance, indicators of service-improvement events (e.g., a cleaning event, a restocking event, a rearrangement event, etc.), and various reports indicative of the performances of retail stores 105. Additional examples of the different outputs enabled by image processing unit 130 are described below with reference to
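For illustration only, the following sketch shows the kind of outputs described for image processing unit 130, built on a hypothetical detect_products helper standing in for trained detection, OCR, and classification models.

```python
# A minimal sketch of shelf-level outputs (identification, quantity,
# planogram compliance) derived from per-image detections. The detector
# and its labels are placeholders, not a disclosed API.
def detect_products(image):
    """Placeholder: return a list of {'type': str, 'expected': bool} detections."""
    return [{"type": "cola_330ml", "expected": True},
            {"type": "cola_330ml", "expected": True},
            {"type": "chips_salt", "expected": False}]

def summarize_shelf(image):
    detections = detect_products(image)
    quantities = {}
    for det in detections:
        quantities[det["type"]] = quantities.get(det["type"], 0) + 1
    compliant = all(det["expected"] for det in detections)
    return {
        "identified_products": sorted(quantities),   # which product types are present
        "quantities": quantities,                    # indicator of product quantity
        "planogram_compliance": compliant,           # indicator of compliance
    }
```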
Consistent with the present disclosure, network 150 may be any type of network (including infrastructure) that provides communications, exchanges information, and/or facilitates the exchange of information between the components of system 100. For example, network 150 may include or be part of the Internet, a Local Area Network, wireless network (e.g., a Wi-Fi/802.11 network), or other suitable connections. In other embodiments, one or more components of system 100 may communicate directly through dedicated communication links, such as, for example, a telephone network, an extranet, an intranet, the Internet, satellite communications, off-line communications, wireless communications, transponder communications, a local area network (LAN), a wide area network (WAN), a virtual private network (VPN), and so forth.
In one example configuration, server 135 may be a cloud server that processes images received directly (or indirectly) from one or more capturing device 125 and processes the images to detect and/or identify at least some of the plurality of products in the image based on visual characteristics of the plurality of products. The term “cloud server” refers to a computer platform that provides services via a network, such as the Internet. In this example configuration, server 135 may use virtual machines that may not correspond to individual hardware. For example, computational and/or storage capabilities may be implemented by allocating appropriate portions of desirable computation/storage power from a scalable repository, such as a data center or a distributed computing environment. In one example, server 135 may implement the methods described herein using customized hard-wired logic, one or more Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs), firmware, and/or program logic which, in combination with the computer system, cause server 135 to be a special-purpose machine.
In another example configuration, server 135 may be part of a system associated with a retail store that communicates with capturing device 125 using a wireless local area network (WLAN) and may provide similar functionality as a cloud server. In this example configuration, server 135 may communicate with an associated cloud server (not shown) and cloud database (not shown). The communications between the store server and the cloud server may be used in a quality enforcement process, for upgrading the recognition engine and the software from time to time, for extracting information from the store level to other data users, and so forth. Consistent with another embodiment, the communications between the store server and the cloud server may be discontinuous (purposely or unintentionally) and the store server may be configured to operate independently from the cloud server. For example, the store server may be configured to generate a record indicative of changes in product placement that occurred when there was a limited connection (or no connection) between the store server and the cloud server, and to forward the record to the cloud server once connection is reestablished.
As depicted in
Database 140 may be included on a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible or non-transitory computer-readable medium. Database 140 may also be part of server 135 or separate from server 135. When database 140 is not part of server 135, server 135 may exchange data with database 140 via a communication link. Database 140 may include one or more memory devices that store data and instructions used to perform one or more features of the disclosed embodiments. In one embodiment, database 140 may include any suitable databases, ranging from small databases hosted on a workstation to large databases distributed among data centers. Database 140 may also include any combination of one or more databases controlled by memory controller devices (e.g., server(s), etc.) or software. For example, database 140 may include document management systems, Microsoft SQL databases, SharePoint databases, Oracle™ databases, Sybase™ databases, other relational databases, or non-relational databases, such as MongoDB and others.
Consistent with the present disclosure, image processing unit 130 may communicate with output devices 145 to present information derived based on processing of image data acquired by capturing devices 125. The term “output device” is intended to include all possible types of devices capable of outputting information from server 135 to users or other computer systems (e.g., a display screen, a speaker, a desktop computer, a laptop computer, mobile device, tablet, a PDA, etc.), such as 145A, 145B, 145C and 145D. In one embodiment, each of the different system components (i.e., retail stores 105, market research entity 110, suppliers 115, and users 120) may be associated with an output device 145, and each system component may be configured to present different information on the output device 145. In one example, server 135 may analyze acquired images including representations of shelf spaces. Based on this analysis, server 135 may compare shelf spaces associated with different products, and output device 145A may present market research entity 110 with information about the shelf spaces associated with different products. The shelf spaces may also be compared with sales data, expired products data, and more. Consistent with the present disclosure, market research entity 110 may be a part of (or may work with) supplier 115. In another example, server 135 may determine product compliance to a predetermined planogram, and output device 145B may present to supplier 115 information about the level of product compliance at one or more retail stores 105 (for example in a specific retail store 105, in a group of retail stores 105 associated with supplier 115, in all retail stores 105, and so forth). The predetermined planogram may be associated with contractual obligations and/or other preferences related to the retailer methodology for placement of products on the store shelves. In another example, server 135 may determine that a specific store shelf has a type of fault in the product placement, and output device 145C may present to a manager of retail store 105 a user-notification that may include information about a correct display location of a misplaced product, information about a store shelf associated with the misplaced product, information about a type of the misplaced product, and/or a visual depiction of the misplaced product. In another example, server 135 may identify which products are available on the shelf and output device 145D may present to user 120 an updated list of products.
The components and arrangements shown in
Processing device 202, shown in
Consistent with the present disclosure, the methods and processes disclosed herein may be performed by server 135 as a result of processing device 202 executing one or more sequences of one or more instructions contained in a non-transitory computer-readable storage medium. As used herein, a non-transitory computer-readable storage medium refers to any type of physical memory on which information or data readable by at least one processor can be stored. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, nonvolatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, any other optical data storage medium, any physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM or any other flash memory, NVRAM, a cache, a register, any other memory chip or cartridge, and networked versions of the same. The terms “memory” and “computer-readable storage medium” may refer to multiple structures, such as a plurality of memories or computer-readable storage mediums located within server 135, or at a remote location. Additionally, one or more computer-readable storage mediums can be utilized in implementing a computer-implemented method. The term “computer-readable storage medium” should be understood to include tangible items and exclude carrier waves and transient signals.
According to one embodiment, server 135 may include network interface 206 (which may also be any communications interface) coupled to bus 200. Network interface 206 may provide one-way or two-way data communication to a local network, such as network 150. Network interface 206 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection to a corresponding type of telephone line. As another example, network interface 206 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. In another embodiment, network interface 206 may include an Ethernet port connected to radio frequency receivers and transmitters and/or optical (e.g., infrared) receivers and transmitters. The specific design and implementation of network interface 206 depends on the communications network(s) over which server 135 is intended to operate. As described above, server 135 may be a cloud server or a local server associated with retail store 105. In any such implementation, network interface 206 may be configured to send and receive electrical, electromagnetic, or optical signals, through wires or wirelessly, that may carry analog or digital data streams representing various types of information. In another example, the implementation of network interface 206 may be similar or identical to the implementation described below for network interface 306.
Server 135 may also include peripherals interface 208 coupled to bus 200. Peripherals interface 208 may be connected to sensors, devices, and subsystems to facilitate multiple functionalities. In one embodiment, peripherals interface 208 may be connected to I/O system 210 configured to receive signals or input from devices and provide signals or output to one or more devices that allow data to be received and/or transmitted by server 135. In one embodiment I/O system 210 may include or be associated with output device 145. For example, I/O system 210 may include a touch screen controller 212, an audio controller 214, and/or other input controller(s) 216. Touch screen controller 212 may be coupled to a touch screen 218. Touch screen 218 and touch screen controller 212 can, for example, detect contact, movement, or break thereof using any of a plurality of touch sensitivity technologies, including but not limited to capacitive, resistive, infrared, and surface acoustic wave technologies as well as other proximity sensor arrays or other elements for determining one or more points of contact with touch screen 218. Touch screen 218 may also, for example, be used to implement virtual or soft buttons and/or a keyboard. In addition to or instead of touch screen 218, I/O system 210 may include a display screen (e.g., CRT, LCD, etc.), virtual reality device, augmented reality device, and so forth. Specifically, touch screen controller 212 (or display screen controller) and touch screen 218 (or any of the alternatives mentioned above) may facilitate visual output from server 135. Audio controller 214 may be coupled to a microphone 220 and a speaker 222 to facilitate voice-enabled functions, such as voice recognition, voice replication, digital recording, and telephony functions. Specifically, audio controller 214 and speaker 222 may facilitate audio output from server 135. The other input controller(s) 216 may be coupled to other input/control devices 224, such as one or more buttons, keyboards, rocker switches, thumb-wheel, infrared port, USB port, image sensors, motion sensors, depth sensors, and/or a pointer device such as a computer mouse or a stylus.
In some embodiments, processing device 202 may use memory interface 204 to access data and a software product stored on a memory device 226. Memory device 226 may include operating system programs for server 135 that perform operating system functions when executed by the processing device. By way of example, the operating system programs may include Microsoft Windows™, Unix™, Linux™, Apple™ operating systems, personal digital assistant (PDA) type operating systems such as Apple iOS, Google Android, Blackberry OS, or other types of operating systems.
Memory device 226 may also store communication instructions 228 to facilitate communicating with one or more additional devices (e.g., capturing device 125), one or more computers (e.g., output devices 145A-145D) and/or one or more servers. Memory device 226 may include graphical user interface instructions 230 to facilitate graphic user interface processing; image processing instructions 232 to facilitate image data processing-related processes and functions; sensor processing instructions 234 to facilitate sensor-related processing and functions; web browsing instructions 236 to facilitate web browsing-related processes and functions; and other software instructions 238 to facilitate other processes and functions. Each of the above identified instructions and applications may correspond to a set of instructions for performing one or more functions described above. These instructions need not be implemented as separate software programs, procedures, or modules. Memory device 226 may include additional instructions or fewer instructions. Furthermore, various functions of server 135 may be implemented in hardware and/or in software, including in one or more signal processing and/or application specific integrated circuits. For example, server 135 may execute an image processing algorithm to identify in received images one or more products and/or obstacles, such as shopping carts, people, and more.
In one embodiment, memory device 226 may store database 140. Database 140 may include product type model data 240 (e.g., an image representation, a list of features, a model obtained by training machine learning algorithm using training examples, an artificial neural network, and more) that may be used to identify products in received images; contract-related data 242 (e.g., planograms, promotions data, etc.) that may be used to determine if the placement of products on the store shelves and/or the promotion execution are consistent with obligations of retail store 105; catalog data 244 (e.g., retail store chain's catalog, retail store's master file, etc.) that may be used to check if all product types that should be offered in retail store 105 are in fact in the store, if the correct price is displayed next to an identified product, etc.; inventory data 246 that may be used to determine if additional products should be ordered from suppliers 115; employee data 248 (e.g., attendance data, records of training provided, evaluation and other performance-related communications, productivity information, etc.) that may be used to assign specific employees to certain tasks; and calendar data 250 (e.g., holidays, national days, international events, etc.) that may be used to determine if a possible change in a product model is associated with a certain event. In other embodiments of the disclosure, database 140 may store additional types of data or fewer types of data. Furthermore, various types of data may be stored in one or more memory devices other than memory device 226.
The components and arrangements shown in
According to one embodiment, network interface 306 may be used to facilitate communication with server 135. Network interface 306 may be an Ethernet port connected to radio frequency receivers and transmitters and/or optical receivers and transmitters. The specific design and implementation of network interface 306 depends on the communications network(s) over which capturing device 125 is intended to operate. For example, in some embodiments, capturing device 125 may include a network interface 306 designed to operate over a GSM network, a GPRS network, an EDGE network, a Wi-Fi or WiMax network, a Bluetooth® network, etc. In another example, the implementation of network interface 306 may be similar or identical to the implementation described above for network interface 206.
In the example illustrated in
Consistent with the present disclosure, capturing device 125 may include digital components that collect data from image sensor 310, transform it into an image, and store the image on a memory device 314 and/or transmit the image using network interface 306. In one embodiment, capturing device 125 may be fixedly mountable to a store shelf or to other objects in the retail store (such as walls, ceilings, floors, refrigerators, checkout stations, displays, dispensers, rods which may be connected to other objects in the retail store, and so forth). In one embodiment, capturing device 125 may be split into at least two housings such that only image sensor 310 and lens 312 may be visible on the store shelf, and the rest of the digital components may be located in a separate housing. An example of this type of capturing device is described below with reference to
Consistent with the present disclosure, capturing device 125 may use memory interface 304 to access memory device 314. Memory device 314 may include high-speed, random access memory and/or non-volatile memory such as one or more magnetic disk storage devices, one or more optical storage devices, and/or flash memory (e.g., NAND, NOR) to store captured image data. Memory device 314 may store operating system instructions 316, such as DARWIN, RTXC, LINUX, iOS, UNIX, OS X, WINDOWS, or an embedded operating system such as VxWorks. Operating system 316 can include instructions for handling basic system services and for performing hardware dependent tasks. In some implementations, operating system 316 may include a kernel (e.g., UNIX kernel, LINUX kernel, and so forth). In addition, memory device 314 may store capturing instructions 318 to facilitate processes and functions related to image sensor 310; graphical user interface instructions 320 that enable a user associated with capturing device 125 to control the capturing device and/or to acquire images of an area-of-interest in a retail establishment; and application instructions 322 to facilitate a process for monitoring compliance of product placement or other processes.
The components and arrangements shown in
With reference to
Differing numbers of capturing devices 125 may be used to cover shelving unit 402. In addition, there may be an overlap region in the horizontal field of views of some of capturing devices 125. For example, the horizontal fields of view of capturing devices (e.g., adjacent capturing devices) may at least partially overlap with one another. In another example, one capturing device may have a lower field of view than the field of view of a second capturing device, and the two capturing devices may have at least partially overlapping fields of view. According to one embodiment, each capturing device 125 may be equipped with network interface 306 for communicating with server 135. In one embodiment, the plurality of capturing devices 125 in retail store 105 may be connected to server 135 via a single WLAN. Network interface 306 may transmit information associated with a plurality of images captured by the plurality of capturing devices 125 for analysis purposes. In one example, server 135 may determine an existence of an occlusion event (such as, by a person, by store equipment, such as a ladder, cart, etc.) and may provide a notification to resolve the occlusion event. In another example, server 135 may determine if a disparity exists between at least one contractual obligation and product placement as determined based on automatic analysis of the plurality of images. The transmitted information may include raw images, cropped images, processed image data, data about products identified in the images, and so forth. Network interface 306 may also transmit information identifying the location of the plurality of capturing devices 125 in retail store 105.
With reference to
In a second embodiment, server 135 may receive image data acquired by crowd sourcing. In one exemplary implementation, server 135 may provide a request to a detected mobile device for an updated image of the area-of-interest in aisle 400. The request may include an incentive (e.g., $2 discount) to user 120 for acquiring the image. In response to the request, user 120 may acquire and transmit an up-to-date image of the area-of-interest. After receiving the image from user 120, server 135 may transmit the accepted incentive or agreed upon reward to user 120. The incentive may comprise a text notification and a redeemable coupon. In some embodiments, the incentive may include a redeemable coupon for a product associated with the area-of-interest. Server 135 may generate image-related data based on aggregation of data from images received from crowd sourcing and from images received from a plurality of cameras fixedly connected to store shelves. Additional details of this embodiment are described in Applicant's International Patent Application No. PCT/IB2017/000919, which is incorporated herein by reference.
With reference to
As discussed above with reference to
System 500 may also include a data conduit 508 extending between first housing 502 and second housing 504. Data conduit 508 may be configured to enable transfer of control signals from the at least one processor to image capture device 506 and to enable collection of image data acquired by image capture device 506 for transmission by the network interface. Consistent with the present disclosure, the term “data conduit” may refer to a communications channel that may include either a physical transmission medium such as a wire or a logical connection over a multiplexed medium such as a radio channel. In some embodiments, data conduit 508 may be used for conveying image data from image capture device 506 to at least one processor located in second housing 504. Consistent with one implementation of system 500, data conduit 508 may include flexible printed circuits and may have a length of at least about 5 cm, at least about 10 cm, at least about 15 cm, etc. The length of data conduit 508 may be adjustable to enable placement of first housing 502 separately from second housing 504. For example, in some embodiments, data conduit may be retractable within second housing 504 such that the length of data conduit exposed between first housing 502 and second housing 504 may be selectively adjusted.
In one embodiment, the length of data conduit 508 may enable first housing 502 to be mounted on a first side of a horizontal store shelf facing the aisle (e.g., store shelf 510 illustrated in
Consistent with the present disclosure, image capture device 506 may be associated with a lens (e.g., lens 312) having a fixed focal length selected according to a distance expected to be encountered between retail shelving units on opposite sides of an aisle (e.g., distance d1 shown in
Consistent with the present disclosure, second housing 504 may include a power port 512 for conveying energy from a power source to first housing 502. In one embodiment, second housing 504 may include a section for at least one mobile power source 514 (e.g., in the depicted configuration the section is configured to house four batteries). The at least one mobile power source may provide sufficient power to enable image capture device 506 to acquire more than 1,000 pictures, more than 5,000 pictures, more than 10,000 pictures, or more than 15,000 pictures, and to transmit them to server 135. In one embodiment, mobile power source 514 located in a single second housing 504 may power two or more image capture devices 506 mounted on the store shelf. For example, as depicted in
As shown in
In some embodiments of the disclosure, the at least one processor of system 500 may cause at least one image capture device 506 to periodically capture images of products located on an opposing retail shelving unit (e.g., images of products located on a shelf across an aisle from the shelf on which first housing 502 is mounted). The term “periodically capturing images” includes capturing an image or images at predetermined time intervals (e.g., every minute, every 30 minutes, every 150 minutes, every 300 minutes, etc.), capturing video, capturing an image every time a status request is received, and/or capturing an image subsequent to receiving input from an additional sensor, for example, an associated proximity sensor. Images may also be captured based on various other triggers or in response to various other detected events. In some embodiments, system 500 may receive an output signal from at least one sensor located on an opposing retail shelving unit. For example, system 500B may receive output signals from a sensing system located on second retail shelving unit 604. The output signals may be indicative of a sensed lifting of a product from second retail shelving unit 604 or a sensed positioning of a product on second retail shelving unit 604. In response to receiving the output signal from the at least one sensor located on second retail shelving unit 604, system 500B may cause image capture device 506 to capture one or more images of second retail shelving unit 604. Additional details on a sensing system, including the at least one sensor that generates output signals indicative of a sensed lifting of a product from an opposing retail shelving unit, are discussed below with reference to
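As a non-limiting illustration of the trigger logic described above, the following Python sketch shows one way a controller might decide when to acquire a new image; the names should_capture and CAPTURE_INTERVAL_SECONDS are hypothetical and the interval value is a placeholder, not a value prescribed by this disclosure.

```python
import time

CAPTURE_INTERVAL_SECONDS = 30 * 60  # e.g., capture at least every 30 minutes

def should_capture(last_capture_time, now, shelf_sensor_event=False, status_request=False):
    """Return True when a new image of the opposing shelving unit should be acquired.

    Capture is triggered periodically, on an explicit status request, or when a
    sensor on the opposing shelving unit reports a product lift/placement event.
    """
    if status_request or shelf_sensor_event:
        return True
    return (now - last_capture_time) >= CAPTURE_INTERVAL_SECONDS

# usage
last = time.time() - 40 * 60
print(should_capture(last, time.time()))                                   # True: interval elapsed
print(should_capture(time.time(), time.time(), shelf_sensor_event=True))   # True: sensor trigger
```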
Consistent with embodiments of the disclosure, system 500 may detect an object 608 in a selected area between first retail shelving unit 602 and second retail shelving unit 604. Such detection may be based on the output of one or more dedicated sensors (e.g., motion detectors, etc.) and/or may be based on image analysis of one or more images acquired by an image acquisition device. Such images, for example, may include a representation of a person or other object recognizable through various image analysis techniques (e.g., trained neural networks, Fourier transform analysis, edge detection, filters, face recognition, and so forth). The selected area may be associated with distance d1 between first retail shelving unit 602 and second retail shelving unit 604. The selected area may be within the field of view of image capture device 506 or an area where the object causes an occlusion of a region of interest (such as a shelf, a portion of a shelf being monitored, and more). Upon detecting object 608, system 500 may cause image capture device 506 to forgo image acquisition while object 608 is within the selected area. In one example, object 608 may be an individual, such as a customer or a store employee. In another example, detected object 608 may be an inanimate object, such as a cart, box, carton, one or more products, cleaning robots, etc. In the example illustrated in
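The following minimal sketch illustrates one possible way to implement the forgo-acquisition behavior described above, assuming a single motion-sensor reading summarizes whether the selected area between the shelving units is occupied; the function names and the 0.5 threshold are hypothetical.

```python
def object_in_selected_area(motion_sensor_reading: float, threshold: float = 0.5) -> bool:
    # Treat readings above the threshold as an object present in the selected area.
    return motion_sensor_reading > threshold

def decide_capture(motion_sensor_reading: float) -> str:
    # Forgo image acquisition while the selected area is occupied; capture otherwise.
    if object_in_selected_area(motion_sensor_reading):
        return "forgo_acquisition"
    return "acquire_image"

print(decide_capture(0.9))  # forgo_acquisition: object detected in the aisle area
print(decide_capture(0.1))  # acquire_image: area is clear
```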
As shown in
Consistent with the present disclosure, system 500 may be mounted on a retail shelving unit that includes at least two adjacent horizontal shelves (e.g., shelves 622A and 622B) forming a substantially continuous surface for product placement. The store shelves may include standard store shelves or customized store shelves. A length of each store shelf 622 may be at least 50 cm, less than 200 cm, or between 75 cm and 175 cm. In one embodiment, first housing 502 may be fixedly mounted on the retail shelving unit in a slit between two adjacent horizontal shelves. For example, first housing 502G may be fixedly mounted on retail shelving unit 620 in a slit between horizontal shelf 622B and horizontal shelf 622C. In another embodiment, first housing 502 may be fixedly mounted on a first shelf and second housing 504 may be fixedly mounted on a second shelf. For example, first housing 502I may be mounted on horizontal shelf 622D and second housing 504I may be mounted on horizontal shelf 622E. In another embodiment, first housing 502 may be fixedly mounted on a retail shelving unit on a first side of a horizontal shelf facing the opposing retail shelving unit and second housing 504 may be fixedly mounted on retail shelving unit 620 on a second side of the horizontal shelf orthogonal to the first side. For example, first housing 502H may be mounted on a first side 624 of horizontal shelf 622C next to a label and second housing 504H may be mounted on a second side 626 of horizontal shelf 622C that faces down (e.g., towards the ground or towards a lower shelf). In another embodiment, second housing 504 may be mounted closer to the back of the horizontal shelf than to the front of the horizontal shelf. For example, second housing 504H may be fixedly mounted on horizontal shelf 622C on second side 626 closer to third side 628 of the horizontal shelf 622C than to first side 624. Third side 628 may be parallel to first side 624. As mentioned above, data conduit 508 (e.g., data conduit 508H) may have an adjustable or selectable length for extending between first housing 502 and second housing 504. In one embodiment, when first housing 502H is fixedly mounted on first side 624, the length of data conduit 508H may enable second housing 504H to be fixedly mounted on second side 626 closer to third side 628 than to first side 624.
As mentioned above, at least one processor contained in a single second housing 504 may control a plurality of image capture devices 506 contained in a plurality of first housings 502 (e.g., system 500J). In some embodiments, the plurality of image capture devices 506 may be configured for location on a single horizontal shelf and may be directed to substantially the same area of the opposing first retail shelving unit (e.g., system 500D in
Consistent with the present disclosure, a central communication device 630 may be located in retail store 105 and may be configured to communicate with server 135 (e.g., via an Internet connection). The central communication device may also communicate with a plurality of systems 500 (for example, less than ten, ten, eleven, twelve, more than twelve, and so forth). In some cases, at least one system of the plurality of systems 500 may be located in proximity to central communication device 630. In the illustrated example, system 500F may be located in proximity to central communication device 630. In some embodiments, at least some of systems 500 may communicate directly with at least one other system 500. The communications between some of the plurality of systems 500 may happen via a wired connection, such as the communications between system 500J and system 500I and the communications between system 500H and system 500G. Additionally or alternatively, the communications between some of the plurality of systems 500 may occur via a wireless connection, such as the communications between system 500G and system 500F and the communications between system 500I and system 500F. In some examples, at least one system 500 may be configured to transmit captured image data (or information derived from the captured image data) to central communication device 630 via at least two mediating systems 500, at least three mediating systems 500, at least four mediating systems 500, or more. For example, system 500J may convey captured image data to central communication device 630 via system 500I and system 500F.
Consistent with the present disclosure, two (or more) systems 500 may share information to improve image acquisition. For example, system 500J may be configured to receive from a neighboring system 500I information associated with an event that system 500I had identified, and control image capture device 506 based on the received information. For example, system 500J may forgo image acquisition based on an indication from system 500I that an object has entered or is about to enter its field of view. Systems 500I and 500J may have overlapping fields of view or non-overlapping fields of view. In addition, system 500J may also receive (from system 500I) information that originates from central communication device 630 and control image capture device 506 based on the received information. For example, system 500I may receive instructions from central communication device 630 to capture an image when supplier 115 inquires about a specific product that is placed in a retail unit opposing system 500I. In some embodiments, a plurality of systems 500 may communicate with central communication device 630. In order to reduce or avoid network congestion, each system 500 may identify an available transmission time slot. Thereafter, each system 500 may determine a default time slot for future transmissions based on the identified transmission time slot.
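A minimal sketch of the congestion-avoidance idea described above: each system scans for an available transmission time slot and adopts the first free one as its default for future transmissions. The slot granularity and the choose_default_slot name are assumptions for illustration only.

```python
def choose_default_slot(occupied_slots, total_slots=60):
    """Pick the first free time slot (e.g., a second within each minute) for future
    transmissions to the central communication device, to reduce congestion."""
    for slot in range(total_slots):
        if slot not in occupied_slots:
            return slot
    return None  # no free slot; fall back to a backoff strategy (not shown)

# usage: slots 0, 1, 2 and 5 are already claimed by neighboring systems
print(choose_default_slot({0, 1, 2, 5}))  # -> 3
```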
In addition to adjustment mechanism 642, first housing 502 may include a first physical adapter (not shown) configured to operate with multiple types of image capture device 506 and a second physical adapter (not shown) configured to operate with multiple types of lenses. During installation, the first physical adapter may be used to connect a suitable image capture device 506 to system 500 according to the level of recognition requested (e.g., detecting a barcode from products, detecting text and price from labels, detecting different categories of products, and so forth). Similarly, during installation, the second physical adapter may be used to associate a suitable lens to image capture device 506 according to the physical conditions at the store (e.g., the distance between the aisles, the horizontal field of view required from image capture device 506, and/or the vertical field of view required from image capture device 506). The second physical adapter provides the employee/installer the ability to select the focal length of lens 312 during installation according to the distance between retail shelving units on opposite sides of an aisle (e.g., distance d1 and/or distance d2 shown in
In addition to adjustment mechanism 642 and the different physical adapters, system 500 may modify the image data acquired by image capture device 506 based on at least one attribute associated with opposing retail shelving unit 640. Consistent with the present disclosure, the at least one attribute associated with retail shelving unit 640 may include a lighting condition, the dimensions of opposing retail shelving unit 640, the size of products displayed on opposing retail shelving unit 640, the type of labels used on opposing retail shelving unit 640, and more. In some embodiments, the attribute may be determined, based on analysis of one or more acquired images, by at least one processor contained in second housing 504. Alternatively, the attribute may be automatically sensed and conveyed to the at least one processor contained in second housing 504. In one example, the at least one processor may change the brightness of captured images based on the detected light conditions. In another example, the at least one processor may modify the image data by cropping the image such that it will include only the products on the retail shelving unit (e.g., not including the floor or the ceiling), only an area of the shelving unit relevant to a selected task (such as a planogram compliance check), and so forth.
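The brightness and cropping modifications mentioned above could be sketched as follows, assuming the image is held as a NumPy array and the shelf region has already been determined (e.g., from the attribute analysis); the adjust_and_crop name and the gain value are illustrative, not part of the disclosure.

```python
import numpy as np

def adjust_and_crop(image: np.ndarray, brightness_gain: float,
                    shelf_rows: slice, shelf_cols: slice) -> np.ndarray:
    """Scale brightness according to a detected lighting condition and crop the
    frame to the region covering the opposing shelving unit (excluding floor/ceiling)."""
    adjusted = np.clip(image.astype(np.float32) * brightness_gain, 0, 255).astype(np.uint8)
    return adjusted[shelf_rows, shelf_cols]

frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)  # stand-in for a captured frame
cropped = adjust_and_crop(frame, brightness_gain=1.2,
                          shelf_rows=slice(60, 420), shelf_cols=slice(0, 640))
print(cropped.shape)  # (360, 640, 3): floor and ceiling rows removed
```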
Consistent with the present disclosure, during installation, system 500 may enable real-time display 646 of field of view 644 on a handheld device 648 of a user 650 installing image capturing device 506K. In one embodiment, real-time display 646 of field of view 644 may include augmented markings 652 indicating a location of a field of view 654 of an adjacent image capture device 506L. In another embodiment, real-time display 646 of field of view 644 may include augmented markings 656 indicating a region of interest in opposing retail shelving unit 640. The region of interest may be determined based on a planogram, identified product type, and/or part of retail shelving unit 640. For example, the region of interest may include products with a greater likelihood of planogram incompliance. In addition, system 500K may analyze acquired images to determine if field of view 644 includes the area that image capturing device 506K is supposed to monitor (for example, from labels on opposing retail shelving unit 640, products on opposing retail shelving unit 640, images captured from other image capturing devices that may capture other parts of opposing retail shelving unit 640 or capture the same part of opposing retail shelving unit 640 but in a lower resolution or at a lower frequency, and so forth). In additional embodiments, system 500 may further comprise an indoor location sensor which may help determine if the system 500 is positioned at the right location in retail store 105.
In some embodiments, an anti-theft device may be located in at least one of first housing 502 and second housing 504. For example, the anti-theft device may include a specific RF label or a pin-tag radio-frequency identification device, which may be the same or similar to a type of anti-theft device that is used by retail store 105 in which system 500 is located. The RF label or the pin-tag may be incorporated within the body of first housing 502 and second housing 504 and may not be visible. In another example, the anti-theft device may include a motion sensor whose output may be used to trigger an alarm in the case of motion or disturbance, in case of motion that is above a selected threshold, and so forth.
At step 702, the method includes fixedly mounting on first retail shelving unit 602 at least one first housing 502 containing at least one image capture device 506 such that an optical axis (e.g., optical axis 606) of at least one image capture device 506 is directed to second retail shelving unit 604. In one embodiment, fixedly mounting first housing 502 on first retail shelving unit 602 may include placing first housing 502 on a side of store shelf 622 facing second retail shelving unit 604. In another embodiment, fixedly mounting first housing 502 on retail shelving unit 602 may include placing first housing 502 in a slit between two adjacent horizontal shelves. In some embodiments, the method may further include fixedly mounting on first retail shelving unit 602 at least one projector (such as projector 632) such that light patterns projected by the at least one projector are directed to second retail shelving unit 604. In one embodiment, the method may include mounting the at least one projector to first retail shelving unit 602 at a selected distance from first housing 502 containing image capture device 506. In one embodiment, the selected distance may be at least 5 cm, at least 10 cm, at least 15 cm, less than 40 cm, less than 30 cm, between about 5 cm and about 20 cm, or between about 10 cm and about 15 cm. In one embodiment, the selected distance may be calculated according to a distance between first retail shelving unit 602 and second retail shelving unit 604, such as d1 and/or d2, for example selecting the distance to be a function of d1 and/or d2, a linear function of d1 and/or d2, a function of d1*log(d1) and/or d2*log(d2) such as a1*d1*log(d1) for some constant a1, and so forth.
At step 704, the method includes fixedly mounting on first retail shelving unit 602 second housing 504 at a location spaced apart from the at least one first housing 502; second housing 504 may include at least one processor (e.g., processing device 302). In one embodiment, fixedly mounting second housing 504 on the retail shelving unit may include placing second housing 504 on a different side of store shelf 622 than the side first housing 502 is mounted on.
At step 706, the method includes extending at least one data conduit 508 between at least one first housing 502 and second housing 504. In one embodiment, extending at least one data conduit 508 between at least one first housing 502 and second housing 504 may include adjusting the length of data conduit 508 to enable first housing 502 to be mounted separately from second housing 504. At step 708, the method includes capturing images of second retail shelving unit 604 using at least one image capture device 506 contained in at least one first housing 502 (e.g., first housing 502A, first housing 502B, or first housing 502C). In one embodiment, the method further includes periodically capturing images of products located on second retail shelving unit 604. In another embodiment the method includes capturing images of second retail shelving unit 604 after receiving a trigger from at least one additional sensor in communication with system 500 (wireless or wired).
At step 710, the method includes transmitting at least some of the captured images from second housing 504 to a remote server (e.g., server 135) configured to determine planogram compliance relative to second retail shelving unit 604. In some embodiments, determining planogram compliance relative to second retail shelving unit 604 may include determining at least one characteristic of planogram compliance based on detected differences between the at least one planogram and the actual placement of the plurality of product types on second retail shelving unit 604. Consistent with the present disclosure, the characteristic of planogram compliance may include at least one of: product facing, product placement, planogram compatibility, price correlation, promotion execution, product homogeneity, restocking rate, and planogram compliance of adjacent products.
At step 722, at least one processor contained in a second housing may receive from at least one image capture device contained in at least one first housing fixedly mounted on a retail shelving unit a plurality of images of an opposing retail shelving unit. For example, at least one processor contained in second housing 504A may receive from at least one image capture device 506 contained in first housing 502A (fixedly mounted on first retail shelving unit 602) a plurality of images of second retail shelving unit 604. The plurality of images may be captured and collected during a period of time (e.g., a minute, an hour, six hours, a day, a week, or more).
At step 724, the at least one processor contained in the second housing may analyze the plurality of images acquired by the at least one image capture device. In one embodiment, at least one processor contained in second housing 504A may use any suitable image analysis technique (for example, object recognition, object detection, image segmentation, feature extraction, optical character recognition (OCR), object-based image analysis, shape region techniques, edge detection techniques, pixel-based detection, artificial neural networks, convolutional neural networks, etc.) to identify objects in the plurality of images. In one example, the at least one processor contained in second housing 504A may determine the number of products located in second retail shelving unit 604. In another example, the at least one processor contained in second housing 504A may detect one or more objects in an area between first retail shelving unit 602 and second retail shelving unit 604.
At step 726, the at least one processor contained in the second housing may identify in the plurality of images a first image that includes a representation of at least a portion of an object located in an area between the retail shelving unit and the opposing retail shelving unit. At step 728, the at least one processor contained in the second housing may identify in the plurality of images a second image that does not include any object located in an area between the retail shelving unit and the opposing retail shelving unit. In one example, the object in the first image may be an individual, such as a customer or a store employee. In another example, the object in the first image may be an inanimate object, such as carts, boxes, products, etc.
At step 730, the at least one processor contained in the second housing may instruct a network interface contained in the second housing, fixedly mounted on the retail shelving unit separate from the at least one first housing, to transmit the second image to a remote server and to avoid transmission of the first image to the remote server. In addition, the at least one processor may issue a notification when an object blocks the field of view of the image capturing device for more than a predefined period of time (e.g., at least 30 minutes, at least 75 minutes, at least 150 minutes).
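A minimal sketch of steps 726 through 730, assuming each captured frame has already been annotated by the image analysis with a flag indicating whether an object occupies the area between the shelving units; the Frame dataclass and the 30-minute notification threshold are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Frame:
    timestamp: float           # capture time in seconds
    has_object_in_aisle: bool  # result of the occlusion analysis at steps 726/728

def select_frame_to_transmit(frames: List[Frame]) -> Optional[Frame]:
    # Transmit the most recent frame with no object between the shelving units;
    # occluded frames are withheld from the remote server.
    clear = [f for f in frames if not f.has_object_in_aisle]
    return max(clear, key=lambda f: f.timestamp) if clear else None

def occlusion_notification_needed(frames: List[Frame], threshold_seconds: float = 30 * 60) -> bool:
    # Issue a notification if every frame in the window is occluded for at least the threshold.
    if not frames or any(not f.has_object_in_aisle for f in frames):
        return False
    return (max(f.timestamp for f in frames) - min(f.timestamp for f in frames)) >= threshold_seconds

# usage: two occluded frames spanning more than 30 minutes
frames = [Frame(0.0, True), Frame(1900.0, True)]
print(select_frame_to_transmit(frames))       # None: nothing suitable to transmit
print(occlusion_notification_needed(frames))  # True: field of view blocked too long
```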
Embodiments of the present disclosure may automatically assess compliance of one or more store shelves with a planogram. For example, embodiments of the present disclosure may use signals from one or more sensors to determine placement of one or more products on store shelves. The disclosed embodiments may also use one or more sensors to determine empty spaces on the store shelves. The placements and empty spaces may be automatically assessed against a digitally encoded planogram. A planogram refers to any data structure or specification that defines at least one product characteristic relative to a display structure associated with a retail environment (such as store shelf or area of one or more shelves). Such product characteristics may include, among other things, quantities of products with respect to areas of the shelves, product configurations or product shapes with respect to areas of the shelves, product arrangements with respect to areas of the shelves, product density with respect to areas of the shelves, product combinations with respect to areas of the shelves, etc. Although described with reference to store shelves, embodiments of the present disclosure may also be applied to end caps or other displays; bins, shelves, or other organizers associated with a refrigerator or freezer units; or any other display structure associated with a retail environment.
The embodiments disclosed herein may use any sensors configured to detect one or more parameters associated with products (or a lack thereof). For example, embodiments may use one or more of pressure sensors, weight sensors, light sensors, resistive sensors, capacitive sensors, inductive sensors, vacuum pressure sensors, high pressure sensors, conductive pressure sensors, infrared sensors, photo-resistor sensors, photo-transistor sensors, photo-diode sensors, ultrasonic sensors, or the like. Some embodiments may use a plurality of different kinds of sensors, for example, associated with the same or overlapping areas of the shelves and/or associated with different areas of the shelves. Some embodiments may use a plurality of sensors configured to be placed adjacent a store shelf, configured for location on the store shelf, configured to be attached to, or configured to be integrated with the store shelf. In some cases, at least part of the plurality of sensors may be configured to be placed next to a surface of a store shelf configured to hold products. For example, the at least part of the plurality of sensors may be configured to be placed relative to a part of a store shelf such that the at least part of the plurality of sensors may be positioned between the part of a store shelf and products placed on the part of the shelf. In another embodiment, the at least part of the plurality of sensors may be configured to be placed above and/or within and/or under the part of the shelf.
In one example, the plurality of sensors may include light detectors configured to be located such that a product placed on the part of the shelf may block at least some of the ambient light from reaching the light detectors. The data received from the light detectors may be analyzed to detect a product or to identify a product based on the shape of a product placed on the part of the shelf. In one example, the system may identify the product placed above the light detectors based on data received from the light detectors that may be indicative of at least part of the ambient light being blocked from reaching the light detectors. Further, the data received from the light detectors may be analyzed to detect vacant spaces on the store shelf. For example, the system may detect vacant spaces on the store shelf based on the received data that may be indicative of no product being placed on a part of the shelf. In another example, the plurality of sensors may include pressure sensors configured to be located such that a product placed on the part of the shelf may apply detectable pressure on the pressure sensors. Further, the data received from the pressure sensors may be analyzed to detect a product or to identify a product based on the shape of a product placed on the part of the shelf. In one example, the system may identify the product placed above the pressure sensors based on data received from the pressure sensors being indicative of pressure being applied on the pressure sensors. In addition, the data from the pressure sensors may be analyzed to detect vacant spaces on the store shelf, for example based on the readings being indicative of no product being placed on a part of the shelf, for example, when the pressure readings are below a selected threshold. Consistent with the present disclosure, inputs from different types of sensors (such as pressure sensors, light detectors, etc.) may be combined and analyzed together, for example to detect products placed on a store shelf, to identify shapes of products placed on a store shelf, to identify types of products placed on a store shelf, to identify vacant spaces on a store shelf, and so forth.
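To make the combined analysis of light detectors and pressure sensors concrete, here is an illustrative sketch that fuses two co-located sensor grids into a per-cell occupancy map; the 50% ambient-light threshold and the pressure threshold are placeholders, not values taken from the disclosure.

```python
import numpy as np

def occupancy_map(light_levels: np.ndarray, pressures: np.ndarray,
                  ambient_light: float = 1.0, pressure_threshold: float = 0.05) -> np.ndarray:
    """Fuse co-located light-detector and pressure readings into a per-cell
    occupancy map: a cell is 'occupied' when ambient light is mostly blocked
    or measurable pressure is applied; otherwise it is treated as vacant."""
    blocked = light_levels < 0.5 * ambient_light  # a product blocks ambient light
    pressed = pressures > pressure_threshold      # a product applies pressure on the pad
    return blocked | pressed

light = np.array([[0.9, 0.1], [0.8, 0.05]])     # normalized light readings per cell
force = np.array([[0.00, 0.30], [0.01, 0.40]])  # normalized pressure readings per cell
print(occupancy_map(light, force))
# [[False  True]
#  [False  True]]  -> right column occupied, left column vacant
```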
With reference to
Detection elements associated with shelf 800 may be associated with different areas of shelf 800. For example, detection elements 801A and 801B are associated with area 805A while other detection elements are associated with area 805B. Although depicted as rows, areas 805A and 805B may comprise any areas of shelf 800, whether contiguous (e.g., a square, a rectangular, or other regular or irregular shape) or not (e.g., a plurality of rectangles or other regular and/or irregular shapes). Such areas may also include horizontal regions between shelves (as shown in
One or more processors (e.g., processing device 202) configured to communicate with the detection elements (e.g., detection elements 801A and 801B) may detect first signals associated with a first area (e.g., areas 805A and/or 805B) and second signals associated with a second area. In some embodiments, the first area may, in part, overlap with the second area. For example, one or more detection elements may be associated with the first area as well as the second area and/or one or more detection elements of a first type may be associated with the first area while one or more detection elements of a second type may be associated with the second area overlapping, at least in part, the first area. In other embodiments, the first area and the second area may be spatially separate from each other.
The one or more processors may, using the first and second signals, determine that one or more products have been placed in the first area while the second area includes at least one empty area. For example, if the detection elements include pressure sensors, the first signals may include weight signals that match profiles of particular products (such as the mugs or plates depicted in the example of
The one or more processors may similarly process signals from other types of sensors. For example, if the detection elements include resistive or inductive sensors, the first signals may include resistances, voltages, and/or currents that match profiles of particular products (such as the mugs or plates depicted in the example of
Any of the profile matching described above may include direct matching of a subject to a threshold. For example, direct matching may include testing one or more measured values against the profile value(s) within a margin of error; mapping a received pattern onto a profile pattern with a residual having a maximum, minimum, integral, or the like within the margin of error; performing an autocorrelation, Fourier transform, convolution, or other operation on received measurements or a received pattern and comparing the resultant values or function against the profile within a margin of error; or the like. Additionally or alternatively, profile matching may include fuzzy matching between measured values and/or patterns and a database of profiles such that the profile with the highest level of confidence according to the fuzzy search is selected. Moreover, as depicted in the example of
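The direct and fuzzy matching variants described above might look like the following sketch, where a measured pattern is tested against a stored profile within a margin of error, and the stored profile with the smallest residual (highest confidence) is selected; the profile values and margin are illustrative.

```python
import numpy as np

def direct_match(measured: np.ndarray, profile: np.ndarray, margin: float = 0.1) -> bool:
    # Direct matching: every measured value lies within the margin of error of the profile.
    return bool(np.all(np.abs(measured - profile) <= margin))

def best_fuzzy_match(measured: np.ndarray, profiles: dict) -> str:
    # Fuzzy matching: pick the stored profile with the smallest residual
    # (i.e., the highest confidence) relative to the measured pattern.
    residuals = {name: float(np.linalg.norm(measured - p)) for name, p in profiles.items()}
    return min(residuals, key=residuals.get)

measured = np.array([0.52, 0.49, 0.51])
profiles = {"mug": np.array([0.5, 0.5, 0.5]), "plate": np.array([0.2, 0.8, 0.2])}
print(direct_match(measured, profiles["mug"]))  # True: within the margin of error
print(best_fuzzy_match(measured, profiles))     # "mug": closest stored profile
```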
Any of the profile matching described above may include use of one or more machine learning techniques. For example, one or more artificial neural networks, random forest models, or other models trained on measurements annotated with product identifiers may process the measurements from the detection elements and identify products therefrom. In such embodiments, the one or more models may use additional or alternative input, such as images of the shelf (e.g., from capturing devices 125 of
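As one hedged example of the machine-learning variant, a generic classifier (here a random forest from scikit-learn, chosen only for illustration) could be trained on annotated detection-element measurements and then used to identify products from new readings; the toy data below stands in for real pressure-pad measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training set: each row is a flattened pressure-pad reading, each label a product type.
rng = np.random.default_rng(0)
mug_readings = rng.normal(0.5, 0.05, size=(50, 9))
plate_readings = rng.normal(0.2, 0.05, size=(50, 9))
X = np.vstack([mug_readings, plate_readings])
y = np.array(["mug"] * 50 + ["plate"] * 50)

# Train a model on measurements annotated with product identifiers.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

new_measurement = rng.normal(0.5, 0.05, size=(1, 9))  # unseen pad reading
print(model.predict(new_measurement))                 # likely ['mug']
```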
Based on detected products and/or empty spaces, determined using the first signals and second signals, the one or more processors may determine one or more aspects of planogram compliance. For example, the one or more processors may identify products and their locations on the shelves, determine quantities of products within particular areas (e.g., identifying stacked or clustered products), identify facing directions associated with the products (e.g., whether a product is outward facing, inward facing, askew, or the like), or the like. Identification of the products may include identifying a product type (e.g., a bottle of soda, a loaf of bread, a notepad, or the like) and/or a product brand (e.g., a Coca-Cola® bottle instead of a Sprite® bottle, a Starbucks® coffee tumbler instead of a Tervis® coffee tumbler, or the like). Product facing direction and/or orientation, for example, may be determined based on a detected orientation of an asymmetric shape of a product base using pressure sensitive pads, detected density of products, etc. For example, the product facing may be determined based on locations of detected product bases relative to certain areas of a shelf (e.g., along a front edge of a shelf), etc. Product facing may also be determined using image sensors, light sensors, or any other sensor suitable for detecting product orientation.
The one or more processors may generate one or more indicators of the one or more aspects of planogram compliance. For example, an indicator may comprise a data packet, a data file, or any other data structure indicating any variations from a planogram, e.g., with respect to product placement such as encoding intended coordinates of a product and actual coordinates on the shelf, with respect to product facing direction and/or orientation such as encoding indicators of locations that have products not facing a correct direction and/or in an undesired orientation, or the like.
In addition to or as an alternative to determining planogram compliance, the one or more processors may detect a change in measurements from one or more detection elements. Such measurement changes may trigger a response. For example, a change of a first type may trigger capture of at least one image of the shelf (e.g., using capturing devices 125 of
With reference to
Moreover, although depicted as located on shelf 850, some detection elements may be located next to shelf 850 (e.g., for magnetometers or the like), across from shelf 850 (e.g., for image sensors or other light sensors, light detection and ranging (LIDAR) sensors, radio detection and ranging (RADAR) sensors, or the like), above shelf 850 (e.g., for acoustic sensors or the like), below shelf 850 (e.g., for pressure sensors, light detectors, or the like), or any other appropriate spatial arrangement. Further, although depicted as standalone in the example of
Detection elements associated with shelf 850 may be associated with different areas of shelf 850, e.g., area 855A, area 855B, or the like. Although depicted as rows, areas 855A and 855B may comprise any areas of shelf 850, whether contiguous (e.g., a square, a rectangular, or other regular or irregular shape) or not (e.g., a plurality of rectangles or other regular and/or irregular shapes).
One or more processors (e.g., processing device 202) in communication with the detection elements (e.g., detection elements 851A and 851B) may detect first signals associated with a first area and second signals associated with a second area. Any of the processing of the first and second signals described above with respect to
In both
With reference to
Method 1000 may include a step 1005 of receiving first signals from a first subset of detection elements (e.g., detection elements 801A and 801B of
As described above with respect to arrangements 910 and 940 of
In some embodiments, such as those including pressure sensors or other contact sensors as depicted in the example of
In embodiments including proximity sensors as depicted in the example of
Method 1000 may include step 1010 of using the first signals to identify at least one pattern associated with a product type of the plurality of products. For example, any of the pattern matching techniques described above with respect to
In some embodiments, step 1010 may further include accessing a memory storing data (e.g., memory device 226 of
In the example of
Additionally or alternatively, step 1010 may include using the at least one pattern to determine a number of products placed on the at least one area of the store shelf associated with the first subset of detection elements. For example, any of the pattern matching techniques described above may be used to identify the presence of one or more product types and then to determine the number of products of each product type (e.g., by detecting a number of similarly sized and shaped product bases and optionally by detecting weight signals associated with each detected base). In another example, an artificial neural network configured to determine the number of products of selected product types may be used to analyze the signals received by step 1005 (such as signals from pressure sensors, from light detectors, from contact sensors, and so forth) to determine the number of products of selected product types placed on an area of a shelf (such as an area of a shelf associated with the first subset of detection elements). In yet another example, a machine learning algorithm trained using training examples to determine the number of products of selected product types may be used to analyze the signals received by step 1005 (such as signals from pressure sensors, from light detectors, from contact sensors, and so forth) to determine the number of products of selected product types placed on an area of a shelf (such as an area of a shelf associated with the first subset of detection elements). Additionally or alternatively, step 1010 may include extrapolating from a stored pattern associated with a single product (or type of product) to determine the number of products matching the first signals. In such embodiments, step 1010 may further include determining, for example based on product dimension data stored in a memory, a number of additional products that can be placed on the at least one area of the store shelf associated with the second subset of detection elements. For example, step 1010 may include extrapolating based on stored dimensions of each product and stored dimensions of the shelf area to determine an area and/or volume available for additional products. Step 1010 may further include extrapolation of the number of additional products based on the stored dimensions of each product and determined available area and/or volume.
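The extrapolation described at the end of step 1010 can be illustrated with a short calculation that divides the detected empty area by a stored product footprint; the packing-efficiency factor and dimensions below are assumptions for the example only.

```python
import math

def additional_capacity(empty_area_cm2: float, product_footprint_cm2: float,
                        packing_efficiency: float = 0.9) -> int:
    """Extrapolate how many more products of a given footprint fit in the empty
    shelf area detected by the second subset of detection elements."""
    if product_footprint_cm2 <= 0:
        return 0
    return math.floor(empty_area_cm2 * packing_efficiency / product_footprint_cm2)

# A 40 cm x 30 cm empty region and a stored product base of 8 cm x 8 cm:
print(additional_capacity(40 * 30, 8 * 8))  # -> 16 additional products
```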
Method 1000 may include step 1015 of receiving second signals from a second subset of detection elements (e.g., detection elements 851A and 851B of
Method 1000 may include step 1025 of determining, based on the at least one pattern associated with a detected product and the at least one empty space, at least one aspect of planogram compliance. As explained above with respect to
For example, the at least one aspect may include product homogeneity, and step 1025 may further include counting occurrences where a product of the second type is placed on an area of the store shelf associated with the first type of product. For example, by accessing a memory including base patterns (or any other type of pattern associated with product types, such as product models), the at least one processor may detect different products and product types. A product of a first type may be recognized based on a first pattern, and a product of a second type may be recognized based on a second, different pattern (optionally also based on weight signal information to aid in differentiating between products). Such information may be used, for example, to monitor whether a certain region of a shelf includes an appropriate or intended product or product type. Such information may also be useful in determining whether products or product types have been mixed (e.g., product homogeneity). Regarding planogram compliance, detection of different products and their relative locations on a shelf may aid in determining whether a product homogeneity value, ratio, etc. has been achieved. For example, the at least one processor may count occurrences where a product of a second type is placed on an area of the store shelf associated with a product of a first type.
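One possible way to count such homogeneity violations, assuming the pattern matching has already produced a per-cell readout of detected product types for the area of interest; the function and data names are hypothetical.

```python
def homogeneity_violations(detected_grid, intended_type):
    """Count occurrences where a product of another type sits in an area intended
    for `intended_type` (empty cells, represented as None, are ignored)."""
    return sum(1 for cell in detected_grid
               if cell is not None and cell != intended_type)

# per-cell product types recovered from pattern matching for an area assigned to "soda_A"
area_readout = ["soda_A", "soda_A", None, "soda_B", "soda_A"]
print(homogeneity_violations(area_readout, "soda_A"))  # -> 1 misplaced product
```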
Additionally or alternatively, the at least one aspect of planogram compliance may include a restocking rate, and step 1025 may further include determining the restocking rate based on a sensed rate at which products are added to the at least one area of the store shelf associated with the second subset of detection elements. Restocking rate may be determined, for example, by monitoring a rate at which detection element signals change as products are added to a shelf (e.g., when areas of a pressure sensitive pad change from a default value to a product-present value).
Additionally or alternatively, the at least one aspect of planogram compliance may include product facing, and step 1025 may further include determining the product facing based on a number of products determined to be placed on a selected area of the store shelf at a front of the store shelf. Such product facing may be determined by determining a number of products along a certain length of a front edge of a store shelf and determining whether the number of products complies with, for example, a specified density of products, a specified number of products, and so forth.
Step 1025 may further include transmitting an indicator of the at least one aspect of planogram compliance to a remote server. For example, as explained above with respect to
Method 1000 may further include additional steps. For example, method 1000 may include identifying a change in at least one characteristic associated with one or more of the first signals (e.g., signals from a first group or type of detection elements), and in response to the identified change, triggering an acquisition of at least one image of the store shelf. The acquisition may be implemented by activating one or more of capturing devices 125 of
Additionally or alternatively, method 1000 may be combined with method 1050 of
Method 1050 may include a step 1055 of determining a change in at least one characteristic associated with one or more first signals. For example, the first signals may have been captured as part of method 1000 of
Method 1050 may include step 1060 of using the first signals to identify at least one pattern associated with a product type of the plurality of products. For example, any of the pattern matching techniques described above with respect to
Method 1050 may include step 1065 of determining a type of event associated with the change. For example, a type of event may include a product removal, a product placement, movement of a product, or the like.
Method 1050 may include step 1070 of triggering an acquisition of at least one image of the store shelf when the change is associated with a first event type. For example, a first event type may include removal of a product, moving of a product, or the like, such that the first event type may trigger a product-related task for an employee of the retail store depending on analysis of the at least one image. The acquisition may be implemented by activating one or more of capturing devices 125 of
Method 1050 may include a step (not shown) of forgoing the acquisition of at least one image of the store shelf when the change is associated with a second event type. For example, a second event type may include replacement of a removed product by a customer, stocking of a shelf by an employee, or the like. As another example, a second event type may include removal, placement, or movement of a product that is detected within a margin of error of the detection elements and/or detected within a threshold (e.g., removal of only one or two products; movement of a product by less than 5 cm, 20 cm, or the like; moving of a facing direction by less than 10 degrees; or the like), such that no image acquisition is required.
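Steps 1065 through the forgoing step could be sketched as a small decision rule, assuming the change has been summarized as a unit-count delta and a displacement; the thresholds (two units, 5 cm) mirror the illustrative margins mentioned above but are not prescribed values.

```python
def classify_event(delta_units: int, displacement_cm: float) -> str:
    # Hypothetical rule: small movements or one-off removals are "minor" (second event type);
    # anything larger is "major" (first event type) and should trigger image acquisition.
    if abs(delta_units) <= 2 and displacement_cm < 5.0:
        return "minor"
    return "major"

def handle_change(delta_units: int, displacement_cm: float) -> str:
    return ("acquire_image" if classify_event(delta_units, displacement_cm) == "major"
            else "forgo_acquisition")

print(handle_change(delta_units=-4, displacement_cm=0.0))  # acquire_image: several products removed
print(handle_change(delta_units=-1, displacement_cm=2.0))  # forgo_acquisition: within the margin
```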
In some embodiments, server 135 may provide market research entity 110 with information including shelf organization, analysis of SKU productivity trends, and various reports aggregating information on products appearing across large numbers of retail stores 105. For example, as shown in
In some embodiments, server 135 may generate reports that summarize performance of the current assortment and the planogram compliance. These reports may advise supplier 115 of the category and the item performance based on individual SKU, sub segments of the category, vendor, and region. In addition, server 135 may provide suggestions or information upon which decisions may be made regarding how or when to remove markdowns and when to replace underperforming products. For example, as shown in
In some embodiments, server 135 may cause real-time automated alerts when products are out of shelf (or near out of shelf), when pricing is inaccurate, when intended promotions are absent, and/or when there are issues with planogram compliance, among others. In the example shown in FIG. 11C, GUI 1120 may include a first display area 1122 for showing the average scores (for certain metrics) of a specific retail store 105 over a selected period of time. GUI 1120 may also include a second display area 1124 for showing a map of the specific retail store 105 with real-time indications of selected in-store execution events that require attention, and a third display area 1126 for showing a list of the selected in-store execution events that require attention. In another example, shown in
Consistent with the present disclosure, the near real-time display of retail store 105 may be presented to the online customer in a manner enabling easy virtual navigation in retail store 105. For example, as shown in
In some embodiments, a method, such as methods 700, 720, 1000, 1050, 1200, 1300, 1400, 1500 and 1600, may comprise one or more steps. In some examples, these methods, as well as all individual steps therein, may be performed by various aspects of capturing device 125, server 135, a cloud platform, a computational node, and so forth. For example, a system comprising at least one processor, such as processing device 202 and/or processing device 302, may perform any of these methods as well as all individual steps therein, for example by processing device 202 and/or processing device 302 executing software instructions stored within memory device 226 and/or memory device 314. In some examples, these methods, as well as all individual steps therein, may be performed by dedicated hardware. In some examples, a computer readable medium, such as a non-transitory computer readable medium, may store data and/or computer implementable instructions for carrying out any of these methods as well as all individual steps therein. Some non-limiting examples of possible execution manners of a method may include continuous execution (for example, returning to the beginning of the method once the method's normal execution ends), periodic execution, execution at selected times, execution upon the detection of a trigger (some non-limiting examples of such a trigger may include a trigger from a user, a trigger from another process, a trigger from an external device, etc.), and so forth.
In some embodiments, machine learning algorithms (also referred to as machine learning models in the present disclosure) may be trained using training examples, for example by Step 1010, Step 1204, Step 1208, Step 1210, Step 1304, Step 1306, Step 1404, Step 1406, Step 1506 and Step 1606, and in the cases described herein. Some non-limiting examples of such machine learning algorithms may include classification algorithms, data regression algorithms, image segmentation algorithms, visual detection algorithms (such as object detectors, face detectors, person detectors, motion detectors, edge detectors, etc.), visual recognition algorithms (such as face recognition, person recognition, object recognition, etc.), speech recognition algorithms, mathematical embedding algorithms, natural language processing algorithms, support vector machines, random forests, nearest neighbors algorithms, deep learning algorithms, artificial neural network algorithms, convolutional neural network algorithms, recurrent neural network algorithms, linear machine learning models, non-linear machine learning models, ensemble algorithms, and so forth. For example, a trained machine learning algorithm may comprise an inference model, such as a predictive model, a classification model, a data regression model, a clustering model, a segmentation model, an artificial neural network (such as a deep neural network, a convolutional neural network, a recurrent neural network, etc.), a random forest, a support vector machine, and so forth. In some examples, the training examples may include example inputs together with the desired outputs corresponding to the example inputs. Further, in some examples, training machine learning algorithms using the training examples may generate a trained machine learning algorithm, and the trained machine learning algorithm may be used to estimate outputs for inputs not included in the training examples. In some examples, engineers, scientists, processes and machines that train machine learning algorithms may further use validation examples and/or test examples. For example, validation examples and/or test examples may include example inputs together with the desired outputs corresponding to the example inputs, a trained machine learning algorithm and/or an intermediately trained machine learning algorithm may be used to estimate outputs for the example inputs of the validation examples and/or test examples, the estimated outputs may be compared to the corresponding desired outputs, and the trained machine learning algorithm and/or the intermediately trained machine learning algorithm may be evaluated based on a result of the comparison. In some examples, a machine learning algorithm may have parameters and hyper-parameters, where the hyper-parameters may be set manually by a person or automatically by a process external to the machine learning algorithm (such as a hyper-parameter search algorithm), and the parameters of the machine learning algorithm may be set by the machine learning algorithm based on the training examples. In some implementations, the hyper-parameters may be set based on the training examples and the validation examples, and the parameters may be set based on the training examples and the selected hyper-parameters. For example, given the hyper-parameters, the parameters may be conditionally independent of the validation examples.
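For illustration only, the training-and-validation workflow described above could look like the following scikit-learn sketch; the synthetic data, model choice, and hyper-parameters are assumptions and not part of the disclosure.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Toy training examples: example inputs paired with desired outputs.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out validation examples to evaluate the trained model.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1)

model = RandomForestClassifier(n_estimators=100, random_state=1)  # hyper-parameters set externally
model.fit(X_train, y_train)                                       # parameters set from training examples

# Compare estimated outputs on the validation examples against the desired outputs.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_accuracy:.2f}")
```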
In some embodiments, trained machine learning algorithms (also referred to as machine learning models and trained machine learning models in the present disclosure) may be used to analyze inputs and generate outputs, for example by Step 1010, Step 1204, Step 1208, Step 1210, Step 1304, Step 1306, Step 1404, Step 1406, Step 1506 and Step 1606, and in the cases described below. In some examples, a trained machine learning algorithm may be used as an inference model that when provided with an input generates an inferred output. For example, a trained machine learning algorithm may include a classification algorithm, the input may include a sample, and the inferred output may include a classification of the sample (such as an inferred label, an inferred tag, and so forth). In another example, a trained machine learning algorithm may include a regression model, the input may include a sample, and the inferred output may include an inferred value corresponding to the sample. In yet another example, a trained machine learning algorithm may include a clustering model, the input may include a sample, and the inferred output may include an assignment of the sample to at least one cluster. In an additional example, a trained machine learning algorithm may include a classification algorithm, the input may include an image, and the inferred output may include a classification of an item depicted in the image. In yet another example, a trained machine learning algorithm may include a regression model, the input may include an image, and the inferred output may include an inferred value corresponding to an item depicted in the image (such as an estimated property of the item, such as size, volume, age of a person depicted in the image, cost of a product depicted in the image, and so forth). In an additional example, a trained machine learning algorithm may include an image segmentation model, the input may include an image, and the inferred output may include a segmentation of the image. In yet another example, a trained machine learning algorithm may include an object detector, the input may include an image, and the inferred output may include one or more detected objects in the image and/or one or more locations of objects within the image. In some examples, the trained machine learning algorithm may include one or more formulas and/or one or more functions and/or one or more rules and/or one or more procedures, the input may be used as input to the formulas and/or functions and/or rules and/or procedures, and the inferred output may be based on the outputs of the formulas and/or functions and/or rules and/or procedures (for example, selecting one of the outputs of the formulas and/or functions and/or rules and/or procedures, using a statistical measure of the outputs of the formulas and/or functions and/or rules and/or procedures, and so forth).
In some embodiments, artificial neural networks may be configured to analyze inputs and generate corresponding outputs, for example by Step 1010, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606, and in the cases described below. Some non-limiting examples of such artificial neural networks may comprise shallow artificial neural networks, deep artificial neural networks, feedback artificial neural networks, feed forward artificial neural networks, autoencoder artificial neural networks, probabilistic artificial neural networks, time delay artificial neural networks, convolutional artificial neural networks, recurrent artificial neural networks, long short term memory artificial neural networks, and so forth. In some examples, an artificial neural network may be configured manually. For example, a structure of the artificial neural network may be selected manually, a type of an artificial neuron of the artificial neural network may be selected manually, a parameter of the artificial neural network (such as a parameter of an artificial neuron of the artificial neural network) may be selected manually, and so forth. In some examples, an artificial neural network may be configured using a machine learning algorithm. For example, a user may select hyper-parameters for the artificial neural network and/or the machine learning algorithm, and the machine learning algorithm may use the hyper-parameters and training examples to determine the parameters of the artificial neural network, for example using back propagation, using gradient descent, using stochastic gradient descent, using mini-batch gradient descent, and so forth. In some examples, an artificial neural network may be created from two or more other artificial neural networks by combining the two or more other artificial neural networks into a single artificial neural network.
Some non-limiting examples of image data may include images, grayscale images, color images, 2D images, 3D images, videos, 2D videos, 3D videos, frames, footages, data derived from other image data, and so forth. In some embodiments, analyzing image data (for example by the methods, steps and modules described herein, such as Step 724, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606) may comprise analyzing the image data to obtain a preprocessed image data, and subsequently analyzing the image data and/or the preprocessed image data to obtain the desired outcome. One of ordinary skill in the art will recognize that the following are examples, and that the image data may be preprocessed using other kinds of preprocessing methods. In some examples, the image data may be preprocessed by transforming the image data using a transformation function to obtain a transformed image data, and the preprocessed image data may comprise the transformed image data. For example, the transformed image data may comprise one or more convolutions of the image data. For example, the transformation function may comprise one or more image filters, such as low-pass filters, high-pass filters, band-pass filters, all-pass filters, and so forth. In some examples, the transformation function may comprise a nonlinear function. In some examples, the image data may be preprocessed by smoothing at least parts of the image data, for example using Gaussian convolution, using a median filter, and so forth. In some examples, the image data may be preprocessed to obtain a different representation of the image data. For example, the preprocessed image data may comprise: a representation of at least part of the image data in a frequency domain; a Discrete Fourier Transform of at least part of the image data; a Discrete Wavelet Transform of at least part of the image data; a time/frequency representation of at least part of the image data; a representation of at least part of the image data in a lower dimension; a lossy representation of at least part of the image data; a lossless representation of at least part of the image data; a time ordered series of any of the above; any combination of the above; and so forth. In some examples, the image data may be preprocessed to extract edges, and the preprocessed image data may comprise information based on and/or related to the extracted edges. In some examples, the image data may be preprocessed to extract image features from the image data. Some non-limiting examples of such image features may comprise information based on and/or related to: edges; corners; blobs; ridges; Scale Invariant Feature Transform (SIFT) features; temporal features; and so forth. In some examples, analyzing the image data may include calculating at least one convolution of at least a portion of the image data, and using the calculated at least one convolution to calculate at least one resulting value and/or to make determinations, identifications, recognitions, classifications, and so forth.
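A minimal NumPy sketch of the preprocessing described above, showing a smoothing convolution followed by an edge-extraction convolution; the kernels, image size, and function name are placeholders chosen for the example.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 'valid' 2D convolution (the kernel is flipped, as in a true convolution)."""
    kernel = np.flip(kernel)
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.rand(32, 32)                                         # stand-in for a grayscale frame
box_blur = np.ones((3, 3)) / 9.0                                       # smoothing (low-pass) kernel
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # edge-extraction kernel

smoothed = convolve2d(image, box_blur)          # preprocessed image data: smoothed frame
edges = np.abs(convolve2d(smoothed, sobel_x))   # preprocessed image data: extracted vertical edges
print(smoothed.shape, edges.shape)              # (30, 30) (28, 28)
```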
In some embodiments, analyzing image data (for example by the methods, steps and modules described herein, such as Step 724, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606) may comprise analyzing the image data and/or the preprocessed image data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms, face detection algorithms, visual event detection algorithms, action detection algorithms, motion detection algorithms, background subtraction algorithms, inference models, and so forth. Some non-limiting examples of such inference models may include: an inference model preprogrammed manually; a classification model; a regression model; a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth. In some embodiments, analyzing image data (for example by the methods, steps and modules described herein, such as Step 724, Step 1210, Step 1306, Step 1406, Step 1506 and Step 1606) may comprise analyzing pixels, voxels, point cloud, range data, etc. included in the image data.
Some non-limiting examples of infrared data (also referred to as infrared input data in the present disclosure) may include any data captured using infrared sensors. Some non-limiting examples of infrared sensors may include at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors, photovoltaic infrared sensors and thermographic cameras. For example, an infrared sensor may include a radiation-sensitive optoelectronic component with a spectral sensitivity in the infrared wavelength range (780 nm to 50 μm). In some examples, the infrared data may be or include an infrared image and/or an infrared video, and any technique for analyzing image data may be used to analyze the infrared image and/or the infrared video, including the image analysis techniques described above. In some examples, the infrared data may be or include a time series data of a plurality of data instances captured using infrared sensors and indexed in time order, and any technique for analyzing time series data may be used to analyze the infrared data. In some examples, the infrared data may be or include a single measured value, and the analysis of the infrared data may include basing a determination on the single measured value. In some embodiments, analyzing infrared data (for example by the methods, steps and modules described herein, such as Step 1204, Step 1208, Step 1404 and Step 1506) may comprise analyzing the infrared data to obtain a preprocessed infrared data, and subsequently analyzing the infrared data and/or the preprocessed infrared data to obtain the desired outcome. One of ordinary skill in the art will recognize that the following are examples, and that the infrared data may be preprocessed using other kinds of preprocessing methods. In some examples, the infrared data may be preprocessed by transforming the infrared data using a transformation function to obtain a transformed infrared data, and the preprocessed infrared data may comprise the transformed infrared data. For example, the transformed infrared data may comprise one or more convolutions of the infrared data. For example, the transformation function may comprise at least one of low-pass filters, high-pass filters, band-pass filters, all-pass filters, and so forth. In some examples, the transformation function may comprise a nonlinear function. In some examples, the infrared data may be preprocessed by smoothing at least parts of the infrared data, for example using Gaussian convolution, using a median filter, and so forth. In some examples, the infrared data may be preprocessed to obtain a different representation of the infrared data. For example, the preprocessed infrared data may comprise: a representation of at least part of the infrared data in a lower dimension; a lossy representation of at least part of the infrared data; a lossless representation of at least part of the infrared data; a time ordered series of any of the above; any combination of the above; and so forth. In some examples, analyzing the infrared data may include calculating at least one convolution of at least a portion of the infrared data, and using the calculated at least one convolution to calculate at least one resulting value and/or to make determinations, identifications, recognitions, classifications, and so forth.
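The following is a minimal sketch, assuming NumPy/SciPy and synthetic time series readings, of preprocessing infrared data as described above: smoothing with a median filter, obtaining a lower-dimensional (lossy) representation, and calculating a resulting value from a convolution of part of the data. The units and window sizes are illustrative assumptions.

```python
import numpy as np
from scipy.signal import medfilt

infrared_series = 21.0 + np.random.randn(600) * 0.3  # placeholder readings, e.g. degrees C

smoothed = medfilt(infrared_series, kernel_size=5)    # smoothing with a median filter

# Lower-dimensional (lossy) representation: mean over non-overlapping windows of 10 samples.
lower_dim = infrared_series[: len(infrared_series) // 10 * 10].reshape(-1, 10).mean(axis=1)

# Convolution of at least part of the data with a short kernel, and a single resulting value.
kernel = np.array([0.25, 0.5, 0.25])
convolved = np.convolve(infrared_series, kernel, mode="valid")
resulting_value = float(convolved.max())
```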
In some embodiments, analyzing infrared data (for example by the methods, steps and modules described herein, such as Step 1204, Step 1208, Step 1404 and Step 1506) may comprise analyzing the infrared data and/or the preprocessed infrared data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms, motion detection algorithms, inference models, and so forth. Some non-limiting examples of such inference models may include: an inference model preprogrammed manually; a classification model; a regression model; a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth.
In some embodiments, infrared data may be captured using one or more infrared sensors (for example by the methods, steps and modules described herein, such as Step 1202, Step 1206, Step 1402 and Step 1502). Some non-limiting examples of such infrared sensors may include at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In some examples, at least one of the one or more infrared sensors may be fixedly mounted on one side of an aisle and directed such that it may capture infrared data of the middle of the aisle and/or of the opposing side of the aisle. For example, the at least one of the one or more infrared sensors may be positioned on one side of aisle 400, for example in a similar fashion to capturing devices 125A, 125B, and 125C as illustrated in
Some non-limiting examples of vibration data may include any data captured using vibration sensors. Some non-limiting examples of vibration sensors may include at least one of accelerometers, piezoelectric sensors, piezoresistive sensors, capacitive MEMS sensors, displacement sensors, velocity sensors, laser based vibration sensors, and so forth. In some examples, the vibration data may be or include a vibration image and/or a vibration video, and any technique for analyzing image data may be used to analyze the vibration image and/or the vibration video, including the image analysis techniques described above. In some examples, the vibration data may be or include a time series data of a plurality of data instances captured using vibration sensors and indexed in time order, and any technique for analyzing time series data may be used to analyze the vibration data. In some examples, the vibration data may be or include a single measured value, and the analysis of the vibration data may include basing a determination on the single measured value. In some embodiments, analyzing vibration data (for example by the methods, steps and modules described herein, such as Step 1304 and Step 1606) may comprise analyzing the vibration data to obtain a preprocessed vibration data, and subsequently analyzing the vibration data and/or the preprocessed vibration data to obtain the desired outcome. One of ordinary skill in the art will recognize that the following are examples, and that the vibration data may be preprocessed using other kinds of preprocessing methods. In some examples, the vibration data may be preprocessed by transforming the vibration data using a transformation function to obtain a transformed vibration data, and the preprocessed vibration data may comprise the transformed vibration data. For example, the transformed vibration data may comprise one or more convolutions of the vibration data. For example, the transformation function may comprise at least one of low-pass filters, high-pass filters, band-pass filters, all-pass filters, and so forth. In some examples, the transformation function may comprise a nonlinear function. In some examples, the vibration data may be preprocessed by smoothing at least parts of the vibration data, for example using Gaussian convolution, using a median filter, and so forth. In some examples, the vibration data may be preprocessed to obtain a different representation of the vibration data. For example, the preprocessed vibration data may comprise: a representation of at least part of the vibration data in a lower dimension; a lossy representation of at least part of the vibration data; a lossless representation of at least part of the vibration data; a time ordered series of any of the above; any combination of the above; and so forth. In some examples, analyzing the vibration data may include calculating at least one convolution of at least a portion of the vibration data, and using the calculated at least one convolution to calculate at least one resulting value and/or to make determinations, identifications, recognitions, classifications, and so forth.
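The following is a minimal sketch, assuming NumPy/SciPy and synthetic accelerometer samples, of preprocessing vibration data as described above: a band-pass transformation, smoothing with a median filter, and a single resulting value (root-mean-square energy) usable for later determinations. The sampling setup and filter cutoffs are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, medfilt

vibration = np.random.randn(1024) * 0.01        # placeholder accelerometer samples

# Transformation function: a 4th-order band-pass filter (normalized cutoff frequencies).
b, a = butter(4, [0.05, 0.4], btype="band")
band_passed = filtfilt(b, a, vibration)

smoothed = medfilt(band_passed, kernel_size=5)  # smoothing with a median filter

# Single resulting value derived from the transformed vibration data.
rms_energy = float(np.sqrt(np.mean(band_passed ** 2)))
```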
In some embodiments, analyzing vibration data (for example by the methods, steps and modules described herein, such as Step 1304 and Step 1606) may comprise analyzing the vibration data and/or the preprocessed vibration data using one or more rules, functions, procedures, artificial neural networks, object detection algorithms, motion detection algorithms, inference models, and so forth. Some non-limiting examples of such inference models may include: an inference model preprogrammed manually; a classification model; a regression model; a result of training algorithms, such as machine learning algorithms and/or deep learning algorithms, on training examples, where the training examples may include examples of data instances, and in some cases, a data instance may be labeled with a corresponding desired label and/or result; and so forth.
In some embodiments, vibration data may be captured using one or more vibration sensors (for example by the methods, steps and modules described herein, such as Step 1302 and Step 1602). Some non-limiting examples of such vibration sensors may include at least one of an accelerometer, a piezoelectric sensor, a piezoresistive sensor, a capacitive MEMS sensor, a displacement sensor, a velocity sensor, a laser based vibration sensor, and so forth. In some examples, at least one of the one or more vibration sensors may be physically connected to at least one retail shelf, for example above the at least one retail shelf, below the at least one retail shelf, to the side of the at least one retail shelf, to an internal part of the at least one retail shelf, and so forth. For example, the at least one of the one or more vibration sensors may be physically connected to retail shelf 622E, for example in a similar fashion to housing 504I as illustrated in
Processing images and videos captured from a retail environment may be a burdensome task. Processing the images and videos within the retail environment may require placing expensive hardware in the retail environment. Further, image and video processing may consume a significant amount of power, which may be challenging for battery-powered systems. On the other hand, transmitting images and videos to a remote system (such as a server or a cloud platform) for processing may be challenging due to the large size of images and videos. Therefore, it is desirable to reduce the number of images and videos processed, and to limit the transmission and processing to those images and videos, or those parts of images and videos, that include relevant information.
In some examples, systems, methods and computer-readable media for triggering image processing based on infrared data analysis are provided.
In some examples, Step 1202 may comprise receiving first infrared input data captured using a first group of one or more infrared sensors. For example, receiving the first infrared input data by Step 1202 may comprise at least one of reading the first infrared input data, receiving the first infrared input data from an external device (for example, using a digital communication device), capturing the first infrared input data using the first group of one or more infrared sensors, and so forth. In some examples, the first group of one or more infrared sensors may be a group of at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the first group of one or more infrared sensors may be a group of one or more passive infrared sensors. In some examples, the first group of one or more infrared sensors may be a group of one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the first group of one or more infrared sensors may be a group of one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth.
In some examples, Step 1204 may comprise analyzing the first infrared input data received by Step 1202 to detect an engagement of a person with a retail shelf. In one example, a machine learning model may be trained using training examples to detect engagements of people with retail shelves from infrared data. An example of such training example may include sample infrared data, together with a label indicating whether the sample infrared data corresponds to an engagement of a person with a retail shelf. In one example, Step 1204 may use the trained machine learning model to analyze the first infrared input data received by Step 1202 to detect the engagement of the person with the retail shelf. In another example, Step 1204 may compare the first infrared input data or a preprocessed version of the first infrared input data (such as a function of the first infrared input data) with a threshold, and may use a result of the comparison to detect the engagement of the person with the retail shelf. For example, the threshold may differentiate between an ambient temperature of an environment of the retail shelf and a typical human body temperature. In an additional example, the threshold may be selected based on a statistical measure of infrared data captured using the first group of one or more infrared sensors of Step 1202 over time. In some examples, Step 1204 may calculate a convolution of at least part of the first infrared input data received by Step 1202. Further, in response to a first value of the calculated convolution of the at least part of the first infrared input data, Step 1204 may detect the engagement of a person with a retail shelf, and in response to a second value of the calculated convolution of the at least part of the first infrared input data, Step 1204 may forgo detecting the engagement of a person with a retail shelf.
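The following is a minimal sketch, with assumed temperature units and an illustrative threshold, of the threshold-based example above: detecting an engagement when smoothed infrared readings exceed a threshold chosen to differentiate between an ambient temperature and a typical human body temperature. The helper name and values are illustrative assumptions.

```python
import numpy as np

AMBIENT_C = 21.0
BODY_C = 34.0
ENGAGEMENT_THRESHOLD_C = (AMBIENT_C + BODY_C) / 2.0  # differentiates ambient vs. body heat

def detect_engagement(first_infrared_input: np.ndarray) -> bool:
    """Return True when the preprocessed (smoothed) readings exceed the threshold."""
    smoothed = np.convolve(first_infrared_input, np.ones(5) / 5.0, mode="valid")
    return bool(smoothed.max() > ENGAGEMENT_THRESHOLD_C)

# Example: readings rising as a person approaches the retail shelf.
readings = np.concatenate([np.full(50, AMBIENT_C), np.linspace(AMBIENT_C, BODY_C, 50)])
print(detect_engagement(readings))  # True
```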
In some examples, Step 1206 may comprise receiving second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data of Step 1202. For example, receiving the second infrared input data by Step 1206 may comprise at least one of reading the second infrared input data, receiving the second infrared input data from an external device (for example, using a digital communication device), capturing the second infrared input data using the second group of one or more infrared sensors, and so forth. In some examples, the second group of one or more infrared sensors may be a group of at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the second group of one or more infrared sensors may be a group of one or more passive infrared sensors. In one example, the first group of one or more infrared sensors may be identical to the second group of one or more infrared sensors. In another example, the first group of one or more infrared sensors may differ from the second group of one or more infrared sensors. In yet another example, the first group of one or more infrared sensors and the second group of one or more infrared sensors may include at least one common infrared sensor. In an additional example, the first group of one or more infrared sensors and the second group of one or more infrared sensors may include no common infrared sensor. In some examples, the second group of one or more infrared sensors may be a group of one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the second group of one or more infrared sensors may be a group of one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth.
In some examples, Step 1208 may comprise analyzing the second infrared input data received by Step 1206 to determine a completion of the engagement of the person with the retail shelf detected by Step 1204. In one example, a machine learning model may be trained using training examples to determine completions of engagements of people with retail shelves from infrared data. An example of such training example may include sample infrared data, together with a label indicating whether the sample infrared data corresponds to a completion of an engagement of a person with a retail shelf. In one example, Step 1208 may use the trained machine learning model to analyze the second infrared input data received by Step 1206 to determine the completion of the engagement of the person with the retail shelf. In another example, Step 1208 may compare the second infrared input data or a preprocessed version of the second infrared input data (such as a function of the second infrared input data) with a threshold, and may use a result of the comparison to determine the completion of the engagement of the person with the retail shelf. For example, the threshold may differentiate between an ambient temperature of an environment of the retail shelf and a typical human body temperature. In another example, the threshold may be selected based on an analysis of the first infrared input data received by Step 1202, for example, based on a value of a statistical measure of the first infrared input data. In an additional example, the threshold may be selected based on a statistical measure of infrared data captured using the second group of one or more infrared sensors of Step 1206 over time. In yet another example, the threshold of Step 1208 may be identical or different from the threshold of Step 1204. In some examples, the determination of the completion of the engagement of the person with the retail shelf by Step 1208 may be a determination that the person cleared an environment of the retail shelf. In some examples, Step 1208 may calculate a convolution of at least part of the second infrared input data received by Step 1206. Further, in response to a first value of the calculated convolution of the at least part of the second infrared input data, Step 1208 may determine a completion of the engagement of the person with the retail shelf detected by Step 1204, and in response to a second value of the calculated convolution of the at least part of the second infrared input data, Step 1208 may determine that the engagement of the person with the retail shelf detected by Step 1204 is not completed.
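The following is a minimal sketch, with assumed units and a threshold selected from a statistical measure of the first infrared input data (one of the examples above), of determining a completion of the engagement: the most recent second infrared readings drop back toward ambient, indicating that the person cleared the environment of the retail shelf. The helper name, window length, and values are illustrative assumptions.

```python
import numpy as np

def determine_completion(first_infrared_input: np.ndarray,
                         second_infrared_input: np.ndarray) -> bool:
    # Threshold selected based on a statistical measure of the first infrared input data.
    threshold = np.percentile(first_infrared_input, 25) + 1.0
    recent = second_infrared_input[-20:]       # most recent readings
    return bool(recent.max() < threshold)      # no reading above near-ambient levels

first = np.concatenate([np.full(50, 21.0), np.linspace(21.0, 34.0, 50)])
second = np.concatenate([np.linspace(34.0, 21.0, 80), np.full(40, 21.0)])
print(determine_completion(first, second))     # True: engagement completed
```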
In some examples, Step 1210 may comprise, for example in response to the determined completion of the engagement of the person with the retail shelf by Step 1208, analyzing at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf. The analysis of the at least one image of the retail shelf may include any image analysis described herein. For example, Step 1210 may analyze the at least one image of the retail shelf using at least one of image processing instructions 232, Step 724, Step 726 and Step 728. In another example, Step 1210 may analyze the at least one image of the retail shelf using any of the techniques for analyzing image data described above. In yet another example, Step 1210 may analyze the at least one image of the retail shelf using at least one of an image classification algorithm, an object recognition algorithm, a product recognition algorithm, a label recognition algorithm, a logo recognition algorithm and a semantic segmentation algorithm. In some examples, a machine learning model may be trained using training examples to analyze images. An example of such training example may include a sample image, together with a label indicating a desired outcome corresponding to the analysis of the sample image. In one example, Step 1210 may use the trained machine learning model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to obtain an outcome of the analysis. In some examples, Step 1210 may use an artificial neural network to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to obtain an outcome of the analysis, for example as described above. In some examples, Step 1210 may base the analysis of the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf on a calculated convolution of at least part of the at least one image. In some examples, for example in response to the determined completion of the engagement of the person with the retail shelf by Step 1208, Step 1210 may further comprise triggering the capturing of the at least one image of the retail shelf using the at least one image sensor. In some examples, the at least one image sensor of Step 1210 may be at least one image sensor mounted to a second retail shelf. For example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf. In another example, the second retail shelf may be positioned above the retail shelf. In yet another example, the second retail shelf may be positioned above the retail shelf and the at least one image sensor may be positioned below the second retail shelf. In some examples, the at least one image sensor of Step 1210 may be at least one image sensor mounted to an image capturing robot. In some examples, the at least one image sensor of Step 1210 may be at least one image sensor mounted to a ceiling of a retail store. In some examples, the at least one image sensor of Step 1210 may be part of a personal mobile device.
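The following is a minimal sketch, with hypothetical capture_image and classify_shelf_image helpers that are not part of any real library, of triggering image capture and analysis only in response to the determined completion of the engagement, and forgoing both otherwise.

```python
from typing import Callable, Optional

def on_engagement_completed(completed: bool,
                            capture_image: Callable[[], object],
                            classify_shelf_image: Callable[[object], str]) -> Optional[str]:
    """Capture and analyze an image of the retail shelf only after the engagement completes."""
    if not completed:
        return None                       # forgo triggering capture and analysis
    image = capture_image()               # triggered capture using the image sensor
    return classify_shelf_image(image)    # any image analysis described herein

# Example with stand-in helpers.
outcome = on_engagement_completed(
    True,
    capture_image=lambda: "raw-image-bytes",
    classify_shelf_image=lambda img: "analysis-outcome")
print(outcome)
```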
In some examples, Step 1212 may comprise using the analysis of the at least one image to determine a state of the retail shelf. In some examples, Step 1210 may analyze the at least one image to obtain an outcome of the analysis. In one example, in response to a first outcome of the analysis of Step 1210, Step 1212 may determine a first state of the retail shelf, and in response to a second outcome of the analysis of Step 1210, Step 1212 may determine a second state of the retail shelf, the second state of the retail shelf may differ from the first state of the retail shelf. In some examples, Step 1210 may recognize products and/or labels associated with the retail shelf, and Step 1212 may determine the state of the retail shelf based on the products and/or labels associated with the retail shelf. In some examples, a machine learning model may be trained using training examples to determine states of retail shelves from images. An example of such training example may include a sample image of a sample retail shelf, together with a label indicating a state of the sample retail shelf. In one example, Steps 1210 and 1212 may use the trained machine learning model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine the state of the retail shelf. In some examples, Steps 1210 and 1212 may use an artificial neural network to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine the state of the retail shelf. In some examples, Steps 1210 and 1212 may use an image classification model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine the state of the retail shelf, for example where each class of the classification model corresponds to a different state of the retail shelf. In some examples, Steps 1210 and 1212 may use a regression model to analyze the at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf to determine at least one aspect of the state of the retail shelf (such as number of products on the retail shelf, score corresponding to the retail shelf, size of an empty space on the retail shelf, and so forth). In some examples, the state of the retail shelf determined by Step 1212 may include an inventory data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the state of the retail shelf determined by Step 1212 may include facings data associated with products on the retail shelf after the engagement of the person with the retail shelf. In some examples, the state of the retail shelf determined by Step 1212 may include planogram compliance status associated with the retail shelf after the engagement of the person with the retail shelf. In some examples, the state of the retail shelf determined by Step 1212 may include an empty space indication associated with the retail shelf after the engagement of the person with the retail shelf.
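The following is a minimal sketch, assuming scikit-learn and synthetic image-feature vectors, of the image classification example above, where each class of the classification model corresponds to a different state of the retail shelf. The state names, feature dimensions, and data are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
SHELF_STATES = ["fully stocked", "partially stocked", "empty space"]

# Training examples: sample image feature vectors labeled with a state of a sample shelf.
features = rng.normal(size=(90, 16))
labels = np.repeat([0, 1, 2], 30)
features += labels[:, None]                  # make the classes separable for the demo

model = LogisticRegression(max_iter=1000).fit(features, labels)

# At run time, features extracted from the post-engagement image are classified.
new_image_features = rng.normal(size=(1, 16)) + 2.0
state = SHELF_STATES[int(model.predict(new_image_features)[0])]
print(state)                                 # e.g. "empty space"
```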
In some examples, Step 1212 may comprise using the analysis of the at least one image by Step 1210 and an analysis of one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf to determine a change associated with the retail shelf during the engagement of the person with the retail shelf. Some non-limiting examples of such change may include a product placed on the retail shelf, a product moved from one position on the retail shelf to another position on the retail shelf, a product removed from the retail shelf, and so forth. For example, Step 1212 may compare the state of the retail shelf before the engagement of the person with the retail shelf (determined based on the analysis of the one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf) and the state of the retail shelf after the completion of the engagement of the person with the retail shelf (determined based on the analysis of the at least one image by Step 1210) to determine the change associated with the retail shelf during the engagement of the person with the retail shelf. In another example, Steps 1210 and 1212 may compare the at least one image of Step 1210 and the one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf to determine the change associated with the retail shelf during the engagement of the person with the retail shelf.
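The following is a minimal sketch, with hypothetical per-product-type counts standing in for the outcomes of the image analyses, of determining a change associated with the retail shelf during the engagement by comparing the shelf state before and after the engagement. The product names are illustrative assumptions.

```python
from collections import Counter

def shelf_change(counts_before: Counter, counts_after: Counter) -> dict:
    """Positive values: products placed on the shelf; negative values: products removed."""
    product_types = set(counts_before) | set(counts_after)
    return {p: counts_after[p] - counts_before[p] for p in product_types
            if counts_after[p] != counts_before[p]}

before = Counter({"cereal-a": 6, "cereal-b": 4})   # from images captured before the engagement
after = Counter({"cereal-a": 5, "cereal-b": 4, "cereal-c": 1})  # from the post-engagement image
print(shelf_change(before, after))                 # {'cereal-a': -1, 'cereal-c': 1}
```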
In some examples, Step 1204 may further comprise analyzing the first infrared input data received by Step 1202 to determine a type of the engagement of the person with the retail shelf. For example, a classification model may be used to analyze the first infrared input data received by Step 1202 and classify it to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of engagement. In one example, in response to a first determined type of the engagement, Step 1210 may trigger the analysis of the at least one image of the retail shelf, and in response to a second determined type of the engagement, method 1200 may forgo analyzing the at least one image of the retail shelf. In another example, in response to a first determined type of the engagement, Step 1210 may include a first analysis step in the analysis of the at least one image of the retail shelf (and may exclude a second analysis step from the analysis of the at least one image of the retail shelf), and in response to a second determined type of the engagement, Step 1210 may include the second analysis step in the analysis of the at least one image of the retail shelf (and may exclude the first analysis step from the analysis of the at least one image of the retail shelf), the second analysis step may differ from the first analysis step. In one example, the first type of engagement may include a physical contact (for example, with items placed on the retail shelf, with the retail shelf, with items associated with the retail shelf, etc.), and the second type of engagement may include no physical contact. In another example, the first type of engagement may include engagement associated with a first portion of the retail shelf, and the second type of engagement may include engagement associated with a second portion of the retail shelf. In yet another example, the first type of engagement may include engagement from a first distance, and the second type of engagement may include engagement from a second distance. In an additional example, the first type of engagement may include engagement associated with a first time duration, and the second type of engagement may include engagement associated with a second time duration.
In some examples, for example in response to the detected engagement of a person with a retail shelf, method 1200 may analyze one or more images of the retail shelf captured before the completion of the engagement of the person with the retail shelf to determine at least one aspect of the engagement. For example, the at least one aspect of the engagement may include a change associated with the retail shelf during the engagement of the person with the retail shelf, as described above. In another example, the at least one aspect of the engagement may include at least one of a product type associated with the engagement (such as a product type of a product taken from the retail shelf during the engagement, a product type of a product placed on the retail shelf during the engagement, a product type of a product moved from one location to another on the retail shelf during the engagement, etc.), a quantity of products associated with the engagement (such as a quantity of products taken from the retail shelf during the engagement, a quantity of products placed on the retail shelf during the engagement, a quantity of products moved from one location to another on the retail shelf during the engagement, etc.), and so forth. In one example, method 1200 may further comprise updating a virtual shopping cart associated with the person based on the determined at least one aspect of the engagement (for example, based on the determined product type, based on the determined quantity of products, and so forth). In one example, Step 1212 may further comprise using the analysis of the at least one image captured after the completion of the engagement of the person with the retail shelf and the determined at least one aspect of the engagement to determine the state of the retail shelf.
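The following is a minimal sketch, with a hypothetical virtual-cart mapping, of updating a virtual shopping cart associated with the person based on the determined product type and quantity associated with the engagement. The helper name and product identifiers are illustrative assumptions.

```python
from collections import Counter

def update_virtual_cart(cart: Counter, product_type: str, quantity_change: int) -> Counter:
    """Add products taken from the shelf; remove products returned to the shelf."""
    cart[product_type] += quantity_change
    if cart[product_type] <= 0:
        del cart[product_type]
    return cart

cart = Counter()
update_virtual_cart(cart, "cereal-a", +1)   # one product taken from the retail shelf
update_virtual_cart(cart, "cereal-a", -1)   # the product was returned to the retail shelf
print(cart)                                 # Counter()
```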
In some examples, systems, methods and computer-readable media for triggering image processing based on vibration data analysis are provided.
In some examples, Step 1302 may comprise receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including a plurality of retail shelves. For example, receiving the vibration data by Step 1302 may comprise at least one of reading the vibration data, receiving the vibration data from an external device (for example, using a digital communication device), capturing the vibration data using the one or more vibration sensors, and so forth.
In some examples, Step 1304 may comprise analyzing the vibration data to determine whether a vibration is a result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. In one example, a machine learning model may be trained using training examples to determine whether vibrations are a result of engagements of people with retail shelves. An example of such training example may include sample vibration data, together with a label indicating whether the sample vibration data corresponds to an engagement of a person with a retail shelf. In one example, Step 1304 may use the trained machine learning model to analyze the vibration data received by Step 1302 to determine whether the vibration is the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. In another example, Step 1304 may compare the vibration data or a preprocessed version of the vibration data (such as a function of the vibration data) with a threshold, and may use a result of the comparison to determine whether the vibration is the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves. For example, the threshold may differentiate between ambient vibrations from an environment of the retail shelf and vibrations originating from the retail shelf. In an additional example, the threshold may be selected based on a statistical measure of historic vibration data captured using the one or more vibration sensors of Step 1302 over time. In some examples, Step 1304 may calculate a convolution of at least part of the vibration data received by Step 1302. Further, in response to a first value of the calculated convolution of the at least part of the vibration data, Step 1304 may determine that the vibration is the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves, and in response to a second value of the calculated convolution of the at least part of the vibration data, Step 1304 may determine that the vibration is not the result of an engagement of a person with at least one retail shelf of the plurality of retail shelves.
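The following is a minimal sketch, with assumed units and an illustrative threshold, of the comparison example above: deciding whether a vibration is a result of an engagement with a retail shelf by comparing the root-mean-square energy of the vibration data with a threshold that differentiates ambient vibrations from vibrations originating at the shelf. The helper name and signal shapes are illustrative assumptions.

```python
import numpy as np

ENGAGEMENT_RMS_THRESHOLD = 0.05   # illustrative value; could also be derived statistically

def vibration_from_engagement(vibration_data: np.ndarray) -> bool:
    rms = float(np.sqrt(np.mean(vibration_data ** 2)))
    return rms > ENGAGEMENT_RMS_THRESHOLD

ambient = np.random.randn(512) * 0.01                                # background vibrations
engaged = ambient + np.sin(np.linspace(0, 40 * np.pi, 512)) * 0.2    # contact with the shelf
print(vibration_from_engagement(ambient), vibration_from_engagement(engaged))  # False True
```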
In some examples, Step 1306 may comprise, for example in response to a determination by Step 1304 that the vibration is the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, triggering analysis of at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves. In some examples, Step 1308 may comprise, for example in response to a determination by Step 1304 that the vibration is not the result of the engagement of the person with the at least one retail shelf of the plurality of retail shelves, forgoing triggering the analysis of the at least one image. In some examples, the triggering of the analysis of the at least one image may comprise transmitting a signal (for example to an external device) configured to cause the analysis of the at least one image (for example by the external device), performing the analysis of the at least one image, storing a selected value at a selected location in a memory configured to cause another process to perform the analysis of the at least one image, and so forth. The analysis of the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves may include any image analysis described herein. For example, Step 1306 may analyze the at least one image of the at least part of the plurality of retail shelves using at least one of image processing instructions 232, Step 724, Step 726 and Step 728. In another example, Step 1306 may analyze the at least one image of the at least part of the plurality of retail shelves using any of the techniques for analyzing image data described above. In yet another example, Step 1306 may analyze the at least one image of the at least part of the plurality of retail shelves using at least one of an image classification algorithm, an object recognition algorithm, a product recognition algorithm, a label recognition algorithm, a logo recognition algorithm and a semantic segmentation algorithm. In some examples, a machine learning model may be trained using training examples to analyze images. An example of such training example may include a sample image, together with a label indicating a desired outcome corresponding to the analysis of the sample image. In one example, Step 1306 may use the trained machine learning model to analyze the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves to obtain an outcome of the analysis. In some examples, Step 1306 may use an artificial neural network to analyze the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves to obtain an outcome of the analysis, for example as described above. In some examples, Step 1306 may base the analysis of the at least one image of at least part of the plurality of retail shelves captured after the beginning of the engagement of the person with the at least one retail shelf of the plurality of retail shelves on a calculated convolution of at least part of the at least one image.
Additionally or alternatively to triggering analysis of at least one image, Step 1306 may comprise, for example in response to the determination by Step 1304 that the vibration is the result of the engagement of the person with the at least one retail shelf, triggering capturing of the at least one image of the at least part of the plurality of retail shelves, and in some examples, Step 1308 may comprise, for example in response to the determination by Step 1304 that the vibration is not the result of the engagement of the person with the at least one retail shelf, forgoing triggering the capturing of the at least one image.
In some examples, Step 1310 may comprise providing information based on a result of the analysis triggered by Step 1306 of the at least one image of the at least part of the plurality of retail shelves. For example, providing the information based on the result of the analysis triggered by Step 1306 of the at least one image of the at least part of the plurality of retail shelves may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, through a user interface, etc.), and so forth.
In some examples, the plurality of retail shelves of method 1300 may include at least a first retail shelf and a second retail shelf. Additionally or alternatively to Step 1304, method 1300 may comprise analyzing the vibration data to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In one example, a machine learning model may be trained using training examples to determine particular retail shelves corresponding to engagement of people from vibration data. An example of such training example may include sample vibration data, together with a label indicating a particular retail shelf, of a plurality of alternative retail shelves, corresponding to the engagement reflected in the sample vibration data. In one example, method 1300 may use the trained machine learning model to analyze the vibration data received by Step 1302 to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In another example, method 1300 may compare the vibration data or a preprocessed version of the vibration data (such as a function of the vibration data) with a threshold, and may use a result of the comparison to determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves. In some examples, method 1300 may calculate a convolution of at least part of the vibration data received by Step 1302. Further, in response to a first value of the calculated convolution of the at least part of the vibration data, method 1300 may determine that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves, and in response to a second value of the calculated convolution of the at least part of the vibration data, method 1300 may determine that the vibration is not a result of an engagement with the first retail shelf of the plurality of retail shelves and/or that the vibration is a result of an engagement with the second retail shelf of the plurality of retail shelves. Further, in some examples, for example in response to the determination that the vibration is a result of an engagement with the first retail shelf of the plurality of retail shelves and not a result of an engagement with the second retail shelf of the plurality of retail shelves, method 1300 may avoid including images depicting the second retail shelf in the at least one image of Steps 1306, 1308 and 1310.
In some examples, the at least one image of method 1300 may be captured using at least one image sensor mounted to a retail shelf not included in the at least one retail shelf. In one example, the retail shelf not included in the at least one retail shelf may be on an opposite side of an aisle from the at least one retail shelf, for example as illustrated in
Additionally or alternatively to determining whether the vibration is the result of an engagement of a person with the at least one retail shelf, Step 1304 may analyze the vibration data received by Step 1302 to determine a type of the engagement of the person with the at least one retail shelf. For example, a classification model may be used to analyze the vibration data received by Step 1302 and classify it to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of engagement. In one example, in response to a first determined type of the engagement, Step 1306 may trigger the analysis of the at least one image of the at least part of the plurality of retail shelves, and in response to a second determined type of the engagement, Step 1308 may forgo triggering the analysis of the at least one image of the at least part of the plurality of retail shelves. In another example, in response to a first determined type of the engagement, Step 1306 may include a first analysis step in the analysis of the at least one image of the at least part of the plurality of retail shelves (and may exclude a second analysis step from the analysis of the at least one image of the at least part of the plurality of retail shelves), and in response to a second determined type of the engagement, Step 1306 may include the second analysis step in the analysis of the at least one image of the at least part of the plurality of retail shelves (and may exclude the first analysis step from the analysis of the at least one image of the at least part of the plurality of retail shelves), the second analysis step may differ from the first analysis step. In one example, the first type of engagement may include a physical contact (for example, with items placed on the retail shelf, with the retail shelf, with items associated with the retail shelf, etc.), and the second type of engagement may include no physical contact. In another example, the first type of engagement may include engagement associated with a first portion of the at least one retail shelf, and the second type of engagement may include engagement associated with a second portion of the at least one retail shelf. In yet another example, the first type of engagement may include engagement associated with a first type of action, and the second type of engagement may include engagement associated with a second type of action. Some non-limiting examples of such types of actions may include removal of at least one item (such as a product) from the at least one retail shelf, placement of at least one item (such as a product) on the at least one retail shelf, repositioning of at least one item (such as a product) on the at least one retail shelf, and so forth.
In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. In one example, Step 1304 may comprise analyzing the vibration data to determine the completion of the engagement of the person with the at least one retail shelf. In one example, a machine learning model may be trained using training examples to determine completions of engagements of people with retail shelves. An example of such training example may include sample vibration data, together with a label indicating whether the sample vibration data corresponds to completion of engagement of a person with a retail shelf. In one example, Step 1304 may use the trained machine learning model to analyze the vibration data received by Step 1302 to determine the completion of the engagement of the person with the at least one retail shelf. In another example, Step 1304 may compare the vibration data or a preprocessed version of the vibration data (such as a function of the vibration data) with a threshold, and may use a result of the comparison to determine the completion of the engagement of the person with the at least one retail shelf. For example, the threshold may differentiate between ambient vibrations from an environment of the retail shelf and vibrations resulting from such engagement. In an additional example, the threshold may be selected based on a statistical measure of historic vibration data captured using the one or more vibration sensors of Step 1302 over time. In some examples, Step 1304 may calculate a convolution of at least part of the vibration data received by Step 1302. Further, in response to a first value of the calculated convolution of the at least part of the vibration data, Step 1304 may determine the completion of the engagement of the person with the at least one retail shelf, and in response to a second value of the calculated convolution of the at least part of the vibration data, Step 1304 may forgo the determination of the completion of the engagement of the person with the at least one retail shelf.
In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. In some examples, method 1300 may comprise analyzing one or more images of the at least one retail shelf to determine the completion of the engagement of the person with the at least one retail shelf. In one example, a machine learning model may be trained using training examples to determine completions of engagements of people with retail shelves from images. An example of such training example may include a sample image, together with a label indicating whether the sample image corresponds to completion of engagement of a person with a retail shelf. In one example, method 1300 may use the trained machine learning model to analyze the one or more images of the at least one retail shelf to determine the completion of the engagement of the person with the at least one retail shelf. In one example, method 1300 may calculate a convolution of at least part of the one or more images of the at least one retail shelf. Further, in response to a first value of the calculated convolution of the at least part of the one or more images, method 1300 may determine the completion of the engagement of the person with the at least one retail shelf, and in response to a second value of the calculated convolution of the at least part of the one or more images, method 1300 may forgo the determination of the completion of the engagement of the person with the at least one retail shelf. In some examples, method 1300 may analyze infrared data captured using at least one infrared sensor to determine a completion of the engagement of the person with the at least one retail shelf, for example as described above.
In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. Further, in some examples, method 1300 may use the analysis of Step 1306 of the at least one image of the at least part of the plurality of retail shelves to determine a state of at least one retail shelf after the completion of the engagement, for example as described above in relation to Step 1210. In one example, the determined state of the at least one retail shelf may include an inventory data associated with products on the at least one retail shelf after the completion of the engagement, and the inventory data may be determined using the analysis of the at least one image by Step 1306, for example as described above in relation to Step 1212. In another example, the determined state of the at least one retail shelf may include facings data associated with products on the at least one retail shelf after the completion of the engagement, and the facings data may be determined using the analysis of the at least one image by Step 1306, for example as described above in relation to Step 1212. In yet another example, the determined state of the at least one retail shelf may include planogram compliance status of the at least one retail shelf after the completion of the engagement, and the planogram compliance status may be determined using the analysis of the at least one image by Step 1306, for example as described above in relation to Step 1212.
In some examples, the at least one image of method 1300 may be at least one image of the at least part of the plurality of retail shelves captured after a completion of the engagement of the person with the at least one retail shelf. Further, in some examples, method 1300 may use the analysis of the at least one image by Step 1306 and an analysis of one or more images of the at least one retail shelf captured using the at least one image sensor before the engagement to determine a change associated with the at least one retail shelf during the engagement, for example as described above in relation to Steps 1210 and 1212. Some non-limiting examples of such change may include a product placed on the retail shelf, a product moved from one position on the retail shelf to another position on the retail shelf, a product removed from the retail shelf, and so forth.
In some examples, systems, methods and computer-readable media for forgoing image processing in response to infrared data analysis are provided.
In some examples, Step 1402 may comprise receiving infrared input data captured using one or more infrared sensors. For example, receiving the infrared input data by Step 1402 may comprise at least one of reading the infrared input data, receiving the infrared input data from an external device (for example, using a digital communication device), capturing the infrared input data using the one or more infrared sensors, and so forth. In some examples, the one or more infrared sensors may be at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the one or more infrared sensors may be one or more passive infrared sensors. In some examples, the one or more infrared sensors may be one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the one or more infrared sensors may be one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth. In some examples, the one or more infrared sensors may be one or more infrared sensors mounted to a second retail shelf. In one example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf.
In some examples, Step 1404 may comprise analyzing the infrared input data received by Step 1402 to detect a presence of an object in an environment of a retail shelf. In one example, a machine learning model may be trained using training examples to detect presence of objects in environments from infrared data. An example of such training example may include sample infrared data, together with a label indicating whether the sample infrared data corresponds to a presence of an object in an environment. In one example, Step 1404 may use the trained machine learning model to analyze the infrared input data received by Step 1402 to detect the presence of the object in the environment of the retail shelf. In another example, Step 1404 may compare the infrared input data or a preprocessed version of the infrared input data (such as a function of the infrared input data) with a threshold, and may use a result of the comparison to detect the presence of the object in the environment of the retail shelf. For example, the threshold may differentiate between an ambient temperature of an environment of the retail shelf and a typical human body temperature, or between typical temperatures of a refrigeration unit including the retail shelf and an ambient temperature. In an additional example, the threshold may be selected based on a statistical measure of infrared data captured using the one or more infrared sensors of Step 1402 over time. In some examples, Step 1404 may calculate a convolution of at least part of the infrared input data received by Step 1402. Further, in response to a first value of the calculated convolution of the at least part of the infrared input data, Step 1404 may detect the presence of the object in the environment of the retail shelf, and in response to a second value of the calculated convolution of the at least part of the infrared input data, Step 1404 may avoid detecting the presence of the object in the environment of the retail shelf. In some examples, the one or more infrared sensors may be one or more infrared sensors physically coupled with the at least one image sensor (such as capturing devices 125A, 125B, and 125C as illustrated in
In some examples, Step 1406 may comprise, for example in response to no detected presence of an object in the environment of the retail shelf by Step 1404, analyzing at least one image of the retail shelf captured using at least one image sensor. In some examples, Step 1408 may comprise, for example in response to a detection of presence of an object in the environment of the retail shelf by Step 1404, forgoing analyzing the at least one image of the retail shelf captured using the at least one image sensor. In some examples, analyzing at least one image of the retail shelf captured using at least one image sensor by Step 1406 may include any image analysis described herein. For example, Step 1406 may analyze the at least one image using at least one of image processing instructions 232, Step 724, Step 726 and Step 728. In another example, Step 1406 may analyze the at least one image using any of the techniques for analyzing image data described above. In yet another example, Step 1406 may analyze the at least one image using at least one of an image classification algorithm, an object recognition algorithm, a product recognition algorithm, a label recognition algorithm, a logo recognition algorithm and a semantic segmentation algorithm. In some examples, a machine learning model may be trained using training examples to analyze images. An example of such training example may include a sample image, together with a label indicating a desired outcome corresponding to the analysis of the sample image. In one example, Step 1406 may use the trained machine learning model to analyze the at least one image to obtain an outcome of the analysis. In some examples, Step 1406 may use an artificial neural network to analyze the at least one image to obtain an outcome of the analysis, for example as described above. In some examples, Step 1406 may base the analysis of the at least one image on a calculated convolution of at least part of the at least one image. Additionally or alternatively to triggering analysis of at least one image, Step 1406 may comprise, for example in response to no detected presence of an object in the environment of the retail shelf by Step 1404, triggering capturing of the at least one image, and in some examples, Step 1408 may comprise, for example in response to a detection of presence of an object in the environment of the retail shelf by Step 1404, forgoing triggering the capturing of the at least one image.
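The following is a minimal sketch, with hypothetical detect_object_presence and analyze_shelf_image helpers, of the gating behavior described for Steps 1404, 1406 and 1408: image analysis (and, optionally, image capturing) is forgone whenever an object is detected in the environment of the retail shelf. The threshold and helper names are illustrative assumptions.

```python
from typing import Callable, Optional
import numpy as np

def maybe_analyze_shelf(infrared_input: np.ndarray,
                        detect_object_presence: Callable[[np.ndarray], bool],
                        analyze_shelf_image: Callable[[], str]) -> Optional[str]:
    if detect_object_presence(infrared_input):
        return None                     # Step 1408: forgo analyzing the image
    return analyze_shelf_image()        # Step 1406: analyze the image of the retail shelf

result = maybe_analyze_shelf(
    np.full(100, 34.0),                                   # body-temperature readings
    detect_object_presence=lambda data: data.max() > 27.5,
    analyze_shelf_image=lambda: "analysis-outcome")
print(result)                                             # None: presence detected
```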
In some examples, the at least one image sensor of Step 1406 and Step 1408 may be at least one image sensor mounted to a second retail shelf. In one example, the second retail shelf may be on an opposite side of an aisle from the retail shelf, for example as illustrated in
In some examples, method 1400 may further comprise using the analysis of the at least one image by Step 1406 to determine a state of the retail shelf, for example as described above in relation to Step 1210. In one example, the determined state of the retail shelf may include an inventory data associated with products on the retail shelf, and the inventory data may be determined using the analysis of the at least one image by Step 1406, for example as described above in relation to Step 1212. In another example, the determined state of the retail shelf may include facings data associated with products on the retail shelf, and the facings data may be determined using the analysis of the at least one image by Step 1406, for example as described above in relation to Step 1212. In yet another example, the determined state of the retail shelf may include planogram compliance status of the retail shelf, and the planogram compliance status may be determined using the analysis of the at least one image by Step 1406, for example as described above in relation to Step 1212.
In some examples, Step 1404 may analyze the infrared input data to determine a portion of a field of view of the at least one image sensor associated with the object, for example using a regression model, using a semantic segmentation model, using a background subtraction model, and so forth. Further, in some examples, in response to a first portion of the field of view of the at least one image sensor associated with the object determined by Step 1404, Step 1406 may analyze the at least one image of the retail shelf captured using the at least one image sensor, and in response to a second portion of the field of view of the at least one image sensor associated with the object determined by Step 1404, Step 1408 may forgo analyzing the at least one image of the retail shelf captured using the at least one image sensor. In one example, the field of view of the at least one image sensor may differ from the field of view of the one or more infrared sensors. In another example, the field of view of the at least one image sensor and the field of view of the one or more infrared sensors may be identical or substantially identical. In some examples, Step 1404 may analyze the infrared input data to determine a type of the object, for example using an object recognition algorithm, using a classification model, and so forth. Further, in some examples, in response to a first type of the object determined by Step 1404, Step 1406 may analyze the at least one image of the retail shelf captured using the at least one image sensor, and in response to a second type of the object determined by Step 1404, Step 1408 may forgo analyzing the at least one image of the retail shelf captured using the at least one image sensor.
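One possible way to realize the field-of-view and object-type gating described above is sketched below, assuming the infrared analysis yields a normalized bounding rectangle for the object and a coarse type label; the overlap cutoff and the handling of a "person" type are illustrative assumptions, not disclosed values.

```python
from typing import Tuple

# Hypothetical normalized rectangles: (x_min, y_min, x_max, y_max) in [0, 1].
Rect = Tuple[float, float, float, float]

def overlap_fraction(a: Rect, b: Rect) -> float:
    """Fraction of rectangle b covered by rectangle a."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    b_area = max(1e-9, (b[2] - b[0]) * (b[3] - b[1]))
    return (ix * iy) / b_area

def should_analyze_image(object_region: Rect, shelf_region: Rect,
                         object_type: str) -> bool:
    """Analyze only when the object neither occludes the shelf nor is a person."""
    if object_type == "person":          # second type of object: forgo analysis
        return False
    return overlap_fraction(object_region, shelf_region) < 0.2   # illustrative cutoff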
In some examples, Step 1404 may analyze the infrared input data to determine a duration associated with the presence of an object in the environment of the retail shelf, for example using a regression model, using a Markov model, using a Viterbi algorithm, and so forth. In some examples, method 1400 may further comprise comparing the duration determined by Step 1404 with a threshold. Further, in response to a first result of the comparison, Step 1406 may analyze the at least one image of the retail shelf captured using the at least one image sensor, and in response to a second result of the comparison, Step 1408 may forgo analyzing the at least one image of the retail shelf captured using the at least one image sensor. In one example, the threshold may be selected based on at least one product type associated with the retail shelf. For example, in response to a first product type associated with the retail shelf, a first threshold may be selected, and in response to a second product type associated with the retail shelf, a second threshold may be selected, the second threshold may differ from the first threshold. In one example, the threshold may be selected based on a status of the retail shelf determined using image analysis (for example using Steps 1210 and 1212 or using method 1200) of one or more images of the retail shelf captured using the at least one image sensor before the capturing of the infrared input data by Step 1402. For example, in response to a first status of the retail shelf, a first threshold may be selected, and in response to a second status of the retail shelf, a second threshold may be selected, the second threshold may differ from the first threshold. In one example, the threshold may be selected based on a time of day. For example, in response to a first time of day, a first threshold may be selected, and in response to a second time of day, a second threshold may be selected, the second threshold may differ from the first threshold.
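The duration-threshold selection described above could be sketched as a simple lookup keyed by product type, shelf status and time of day; all of the specific product types, statuses, hours and values below are assumptions made for illustration and do not appear in the disclosure.

```python
import datetime

# Illustrative duration thresholds, in seconds (assumed values).
THRESHOLD_BY_PRODUCT_TYPE = {"canned_goods": 2.0, "fresh_produce": 5.0}
THRESHOLD_BY_SHELF_STATUS = {"fully_stocked": 3.0, "nearly_empty": 1.0}

def select_duration_threshold(product_type: str,
                              shelf_status: str,
                              now: datetime.time) -> float:
    """Pick a duration threshold from product type, shelf status and time of day."""
    threshold = THRESHOLD_BY_PRODUCT_TYPE.get(product_type, 3.0)
    threshold = min(threshold, THRESHOLD_BY_SHELF_STATUS.get(shelf_status, threshold))
    if datetime.time(17, 0) <= now <= datetime.time(20, 0):
        threshold *= 0.5   # assumed busier evening hours: react to shorter engagements
    return threshold

def decide(duration_s: float, threshold_s: float) -> str:
    """First result of the comparison triggers analysis, second forgoes it."""
    return "analyze_image" if duration_s >= threshold_s else "forgo_analysis"
```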
In some examples, method 1400 may further comprise, in response to no presence of an object in the environment of the retail shelf detected by Step 1404, capturing the at least one image of the retail shelf using the at least one image sensor, and in response to a detection of presence of an object in the environment of the retail shelf by Step 1404, forgoing the capturing of the at least one image of the retail shelf.
Using only one type of modality (such as image data, infrared data, vibration data, etc.) to detect and/or recognize actions may result in unsatisfactory results, such as low accuracy, low precision, low sensitivity, results with low confidence levels, failure to successfully determine aspects of the actions (such as a type of an action, a product type associated with an action, a quantity associated with an action, etc.), and so forth. For example, using only image data to detect and/or recognize actions may fail due to image blur, occlusions, insufficient pixel resolution, insufficient frame rate, ambiguity in the visual data, and so forth. In another example, using only infrared data to detect and/or recognize actions may fail due to ambient noise, ambiguity in the infrared data, and so forth. In yet another example, using only vibration data to detect and/or recognize actions may fail due to ambient noise, ambiguity in the vibration data, and so forth. Analyzing data from multiple modalities together to detect and/or recognize actions may improve the results. For example, combining data from multiple modalities may overcome many of the problems faced when using only one modality, and may therefore improve accuracy, improve precision, improve sensitivity, provide results with higher confidence levels, enable determination of additional aspects of the actions (such as a type of an action, a product type associated with an action, a quantity associated with an action, etc.), and so forth.
In some examples, systems, methods and computer-readable media for using infrared data analysis and image analysis for robust action recognition in retail environment are provided.
In some examples, Step 1502 may comprise receiving infrared data captured using one or more infrared sensors from a retail environment. For example, receiving the infrared data by Step 1502 may comprise at least one of reading the infrared data, receiving the infrared data from an external device (for example, using a digital communication device), capturing the infrared data using the one or more infrared sensors from the retail environment, and so forth. In some examples, the one or more infrared sensors may be at least one of active infrared sensors, passive infrared sensors, thermal infrared sensors, pyroelectric infrared sensors, thermoelectric infrared sensors, photoconductive infrared sensors and photovoltaic infrared sensors. In one example, the one or more infrared sensors may be one or more passive infrared sensors. In some examples, the one or more infrared sensors may be one or more infrared sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the one or more infrared sensors may be one or more infrared sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth. In some examples, the one or more infrared sensors may be one or more infrared sensors mounted to a second retail shelf. In one example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf.
In some examples, Step 1504 may comprise receiving at least one image captured using at least one image sensor from a retail environment (for example, from the retail environment of Step 1502), for example as described above. In some examples, receiving at least one image by Step 1504 may comprise at least one of reading the at least one image, receiving the at least one image from an external device (for example, using a digital communication device), capturing the at least one image using the at least one image sensor from the retail environment, and so forth. In some examples, the at least one image sensor of Step 1504 may be at least one image sensor mounted to a retail shelf, for example as illustrated in
In some examples, Step 1506 may comprise analyzing the infrared data received by Step 1502 and the at least one image received by Step 1504 to detect an action performed in the retail environment. In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf and moving a product on a retail shelf. Some other non-limiting examples of such action may include placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. In some examples, a machine learning model may be trained using training examples to detect actions from infrared data and images. An example of such training example may include a sample infrared data and a sample image, together with a label indicating whether the sample infrared data and the sample image correspond to an action performed in an environment. In one example, Step 1506 may use the trained machine learning model to analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 to detect the action performed in the retail environment. In some examples, Step 1506 may use an artificial neural network to analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 to detect the action performed in the retail environment.
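As one hedged illustration of training a model on paired infrared and image examples, the sketch below concatenates a flattened infrared frame with a coarse image histogram and fits a logistic-regression classifier (assuming scikit-learn is available); the feature choices and dimensions are assumptions made for illustration, not the disclosed method.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ir_features(ir_frame: np.ndarray) -> np.ndarray:
    # Flatten the (assumed fixed-size) infrared frame into a feature vector.
    return ir_frame.ravel().astype(float)

def image_features(image: np.ndarray, bins: int = 16) -> np.ndarray:
    # Coarse, normalized intensity histogram as a stand-in image descriptor.
    hist, _ = np.histogram(image, bins=bins, range=(0, 255))
    return hist / max(1, image.size)

def fuse(ir_frame: np.ndarray, image: np.ndarray) -> np.ndarray:
    return np.concatenate([ir_features(ir_frame), image_features(image)])

def train_action_detector(examples):
    """examples: iterable of (sample infrared frame, sample image, label) triples,
    where the label indicates whether an action was performed (needs both classes)."""
    X = np.stack([fuse(ir, img) for ir, img, _ in examples])
    y = np.array([label for _, _, label in examples])
    return LogisticRegression(max_iter=1000).fit(X, y)

def detect_action(model, ir_frame, image) -> bool:
    return bool(model.predict(fuse(ir_frame, image)[np.newaxis, :])[0])
```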
In some examples, Step 1506 may calculate a convolution of at least part of the at least one image received by Step 1504 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the infrared data received by Step 1502 to detect the action performed in the retail environment. For example, Step 1506 may analyze the infrared data received by Step 1502 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1506 may analyze the infrared data received by Step 1502 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1506 may analyze the infrared data received by Step 1502 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
In some examples, Step 1506 may calculate a convolution of at least part of the infrared data received by Step 1502 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the at least one image received by Step 1504 to detect the action performed in the retail environment. For example, Step 1506 may analyze at least one image received by Step 1504 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1506 may analyze the at least one image received by Step 1504 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1506 may analyze the at least one image received by Step 1504 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
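The two preceding paragraphs describe symmetric variants of the same idea: a convolution of one modality yields a value that selects a parameter or analysis step for the other modality. A minimal sketch, with an illustrative kernel and illustrative cutoffs, might look like this.

```python
import numpy as np

def convolution_value(data: np.ndarray, kernel: np.ndarray) -> float:
    """Scalar summary of a convolution of (part of) one modality."""
    return float(np.abs(np.convolve(data.ravel(), kernel, mode="valid")).max())

def analyze_with_parameter(data: np.ndarray, sensitivity: float) -> bool:
    """Toy parametric detector: fires when the data variation exceeds sensitivity."""
    return float(np.ptp(data)) > sensitivity

def cross_modal_detect(primary: np.ndarray, secondary: np.ndarray) -> bool:
    """Use a convolution of the secondary modality to parameterize analysis of
    the primary modality (the roles of image and infrared data are interchangeable)."""
    value = convolution_value(secondary, np.array([1.0, -1.0]))
    # A "first value" of the convolution selects a strict sensitivity, a
    # "second value" selects a more permissive one (cutoffs are illustrative).
    sensitivity = 5.0 if value < 1.0 else 2.0
    return analyze_with_parameter(primary, sensitivity)
```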
In some examples, the infrared data received by Step 1502 may include a time series of samples captured using the one or more infrared sensors at different points in time. In some examples, Step 1506 may compare two samples of the time series of samples, and may use a result of the comparison to analyze the at least one image received by Step 1504 to detect the action performed in the retail environment. For example, Step 1506 may analyze at least one image received by Step 1504 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1506 may analyze the at least one image received by Step 1504 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1506 may analyze the at least one image received by Step 1504 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
In some examples, the at least one image received by Step 1504 may include a plurality of frames of a video captured using the at least one image sensor. In some examples, Step 1506 may compare two frames of the plurality of frames, and may use a result of the comparison to analyze the infrared data received by Step 1502 to detect the action performed in the retail environment. For example, Step 1506 may analyze the infrared data received by Step 1502 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1506 may analyze the infrared data received by Step 1502 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1506 may analyze the infrared data received by Step 1502 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
In some examples, Step 1506 may analyze the infrared data received by Step 1502 to select a portion of the at least one image received by Step 1504. For example, in response to a first infrared data received by Step 1502, Step 1506 may select a first portion of the at least one image received by Step 1504, and in response to a second infrared data received by Step 1502, Step 1506 may select a second portion of the at least one image received by Step 1504, the second portion may differ from the first portion. In another example, the infrared data received by Step 1502 may include spatial properties, and Step 1506 may select the portion of the at least one image received by Step 1504 based on the spatial properties. For example, the spatial properties may include an indication of a region in the retail environment, and Step 1506 may select a portion of the at least one image received by Step 1504 corresponding to the indicated region of the retail environment. Further, in some examples, Step 1506 may analyze the selected portion of the at least one image to detect the action performed in the retail environment, for example using the image analysis described above.
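A possible realization of selecting an image portion from the spatial properties of the infrared data is sketched below, assuming a low-resolution two-dimensional infrared frame and an image that cover the same field of view; the temperature cutoff is an illustrative assumption.

```python
import numpy as np

def select_image_portion(ir_frame: np.ndarray, image: np.ndarray,
                         temp_threshold: float = 29.0) -> np.ndarray:
    """Crop the image region corresponding to the warmest infrared cells.

    Assumes ir_frame is a small 2-D grid aligned with the image, so each
    infrared cell maps to a rectangular image patch.
    """
    hot = np.argwhere(ir_frame > temp_threshold)
    if hot.size == 0:
        return image                               # nothing indicated: keep all
    cell_h = image.shape[0] // ir_frame.shape[0]
    cell_w = image.shape[1] // ir_frame.shape[1]
    r0, c0 = hot.min(axis=0)
    r1, c1 = hot.max(axis=0) + 1
    return image[r0 * cell_h:r1 * cell_h, c0 * cell_w:c1 * cell_w]
```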
In some examples, Step 1506 may comprise analyzing the infrared data received by Step 1502 to attempt to detect the action performed in the retail environment, for example using a pattern recognition algorithm. In some examples, for example in response to a failure of the attempt to successfully detect the action, Step 1506 may analyze the at least one image received by Step 1504 to detect the action performed in the retail environment, for example using a visual action recognition algorithm. In one example, for example in response to a failure to successfully detect the action, method 1500 may trigger the capturing of the at least one image using the at least one image sensor. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. Some non-limiting examples of such aspect may include at least one of a type of the action, a product type associated with the action, and a quantity of products associated with the action.
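The attempt-then-fallback flow described above could be sketched as follows; the `ir_detector`, `image_detector` and `capture_image` callables are hypothetical placeholders, and the confidence threshold is an illustrative assumption.

```python
from typing import Callable, Optional, Tuple
import numpy as np

def detect_with_fallback(
    ir_data: np.ndarray,
    capture_image: Callable[[], np.ndarray],
    ir_detector: Callable[[np.ndarray], Tuple[Optional[str], float]],
    image_detector: Callable[[np.ndarray], Optional[str]],
    min_confidence: float = 0.8,
) -> Optional[str]:
    """Try the infrared detector first; fall back to image analysis on failure.

    ir_detector returns an (action, confidence) pair; a missing action or a
    confidence below the selected threshold counts as a failed attempt.
    """
    action, confidence = ir_detector(ir_data)
    if action is not None and confidence >= min_confidence:
        return action
    # Failure of the attempt: trigger capture and analysis of the image.
    image = capture_image()
    return image_detector(image)
```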
In some examples, Step 1508 may comprise providing information based on the action detected by Step 1506. For example, providing the information based on the action detected by Step 1506 may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, through a user interface, etc.), and so forth.
In some examples, detecting the action performed in the retail environment by Step 1506 may further include recognizing a type of the action. For example, Step 1506 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of action. In another example, Step 1506 may analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 (for example using the classification model, using a machine learning model trained using training examples to recognize types of actions from records including both infrared data and images, using an artificial neural network, and so forth) to recognize the type of the action. Some non-limiting examples of such types of actions may include picking an item, picking a product, placing an item, placing a product, moving an item, moving a product, placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. Further, in some examples, the information provided by Step 1508 may be based on the type of the action. In one example, the information provided by Step 1508 may include an indication of the type of the action. In one example, in response to a first type of the action, Step 1508 may provide first information, and in response to a second type of the action, Step 1508 may provide second information, the second information may differ from the first information. In one example, in response to a first type of the action, Step 1508 may provide the information, and in response to a second type of the action, Step 1508 may forgo providing the information.
In some examples, detecting the action performed in the retail environment by Step 1506 may further include identifying a product type associated with the action. For example, Step 1506 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different product type. In another example, Step 1506 may analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 (for example using the classification model, using a machine learning model trained using training examples to identify product types of products associated with actions from records including both infrared data and images, using an artificial neural network, and so forth) to identify the product type. In one example, the action may include at least one of picking, placing and moving a product, and the product type associated with the action may be a product type of the product. In one example, the action may include at least one of placing and removing a label (such as a shelf label), and the product type associated with the action may be a product type indicated by the label (for example, by text printed on the label, by a logo on the label, by a picture on the label, by a visual code on the label, and so forth). In one example, the action may include at least one of placing and removing a promotional sign, and the product type associated with the action may be a product type associated with the promotional sign. In one example, the action may include changing a price of products of a particular product type, and the product type associated with the action may be the particular product type. Further, in some examples, the information provided by Step 1508 may be based on the product type associated with the action. In one example, the information provided by Step 1508 may include an indication of the product type (for example, textual indication, a picture of a product of the product type, a barcode associated with the product type, and so forth). In one example, in response to a first product type associated with the action, Step 1508 may provide first information, and in response to a second product type associated with the action, Step 1508 may provide second information, the second information may differ from the first information. In one example, in response to a first product type associated with the action, Step 1508 may provide the information, and in response to a second product type associated with the action, Step 1508 may forgo providing the information.
In some examples, detecting the action performed in the retail environment by Step 1506 may further include determining a quantity of products associated with the action. For example, Step 1506 may use a regression model to determine the quantity of products associated with the action. In another example, Step 1506 may analyze the infrared data received by Step 1502 and the at least one image received by Step 1504 (for example using the classification model, using a machine learning model trained using training examples to determine quantity of products associated with actions from records including both infrared data and images, using an artificial neural network, and so forth) to determine the quantity of products associated with the action. In one example, the action may include at least one of picking, placing and moving at least one product, and the quantity of products associated with the action may be the quantity of products picked, placed and/or moved in the action. In one example, the action may include at least one of placing and removing a promotional sign, and the quantity of products associated with the action may be a quantity of products indicated in the promotional sign. Further, in some examples, the information provided by Step 1508 may be based on the quantity of products associated with the action. In one example, the information provided by Step 1508 may include an indication of the quantity of products associated with the action. In one example, in response to a first quantity of products associated with the action, Step 1508 may provide first information, and in response to a second quantity of products associated with the action, Step 1508 may provide second information, the second information may differ from the first information. In one example, in response to a first quantity of products associated with the action, Step 1508 may provide the information, and in response to a second quantity of products associated with the action, Step 1508 may forgo providing the information.
In some examples, the infrared data received by Step 1502 may include a time series of samples captured using the one or more infrared sensors at different points in time. In some examples, Step 1504 may further comprise analyzing the time series of the samples captured using the one or more infrared sensors at the different points in time to select the at least one image of a plurality of images. For example, in response to a first result of the analysis of the time series of samples, Step 1504 may select a first subgroup of the plurality of images, and in response to a second result of the analysis of the time series of samples, Step 1504 may select a second subgroup of the plurality of images, the second subgroup may differ from the first subgroup. In another example, Step 1504 may analyze the time series of the samples captured using the one or more infrared sensors at the different points in time to select a particular point in time (for example, a point in time corresponding to an extremum of the samples, a point in time corresponding to a sample satisfying a particular criterion, and so forth), each image of the plurality of images may correspond to a different point in time (for example, based on the capturing time of the image), and Step 1504 may select the image of the plurality of images corresponding to the particular point in time (or corresponding to a point in time nearest to the particular point in time of the points in time corresponding to the plurality of images).
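Selecting the image nearest in time to an extremum of the infrared time series could be sketched as below, assuming the infrared samples and the images carry timestamps on a shared clock; the maximum-reading criterion is one illustrative choice of extremum.

```python
import numpy as np

def select_image_at_extremum(ir_times: np.ndarray, ir_samples: np.ndarray,
                             image_times: np.ndarray, images: list):
    """Pick the image captured nearest to the extremum of the infrared series.

    ir_times and image_times are assumed to be timestamps in seconds on a
    shared clock; the extremum here is simply the maximum infrared reading.
    """
    peak_time = ir_times[int(np.argmax(ir_samples))]
    nearest = int(np.argmin(np.abs(image_times - peak_time)))
    return images[nearest]
```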
In some examples, Step 1506 may calculate a convolution of at least part of the at least one image to obtain a value of the calculated convolution. Further, in some examples, Step 1506 may analyze the infrared data to determine a wavelength associated with the infrared data. For example, the wavelength associated with the infrared data may be the most prominent wavelength in the infrared data, the most prominent wavelength in a selected range of wavelengths in the infrared data, the second most prominent wavelength in the infrared data, and so forth. In one example, in response to a first combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may detect the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may forgo the detection of the action performed in the retail environment. In another example, in response to a first combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may determine a first type of the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the wavelength associated with the infrared data, Step 1506 may determine a second type of the action performed in the retail environment, the second type may differ from the first type.
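Assuming the infrared data is spectral, i.e., reports an intensity per wavelength bin, the combination of an image-convolution value with the most prominent wavelength could be sketched as follows; the kernel, the wavelength band and the decision rule are illustrative assumptions only.

```python
import numpy as np

def dominant_wavelength(wavelengths_um: np.ndarray,
                        intensities: np.ndarray) -> float:
    """Most prominent wavelength in the (assumed spectral) infrared data."""
    return float(wavelengths_um[int(np.argmax(intensities))])

def classify_from_combination(image_patch: np.ndarray,
                              wavelengths_um: np.ndarray,
                              intensities: np.ndarray) -> str:
    """Combine an image-convolution value with the dominant infrared wavelength."""
    kernel = np.array([1.0, 0.0, -1.0])
    conv_value = float(np.abs(
        np.convolve(image_patch.ravel(), kernel, mode="valid")).max())
    wavelength = dominant_wavelength(wavelengths_um, intensities)
    # Illustrative decision rule: emission near 10 um (roughly body temperature)
    # together with strong image edges is reported as an action; other
    # combinations are not.
    if 8.0 <= wavelength <= 12.0 and conv_value > 30.0:
        return "action_detected"
    return "no_action"
```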
In some examples, systems, methods and computer-readable media for using vibration data analysis and image analysis for robust action recognition in retail environment are provided.
In some examples, Step 1602 may comprise receiving vibration data captured using one or more vibration sensors mounted to a shelving unit including at least one retail shelf. For example, receiving the vibration data by Step 1602 may comprise at least one of reading the vibration data, receiving the vibration data from an external device (for example, using a digital communication device), capturing the vibration data using the one or more vibration sensors mounted to a shelving unit including at least one retail shelf, and so forth. In some examples, the one or more vibration sensors may be at least one of active vibration sensors, passive vibration sensors, piezoelectric vibration sensors, accelerometer-based vibration sensors, capacitive vibration sensors and strain gauge vibration sensors. In one example, the one or more vibration sensors may be one or more passive vibration sensors. In some examples, the one or more vibration sensors may be one or more vibration sensors positioned below a second retail shelf. In one example, the second retail shelf may be positioned above the retail shelf. For example, the one or more vibration sensors may be one or more vibration sensors mounted to the second retail shelf, mounted to a surface (for example, of a wall, of a rack, etc.) connecting the second retail shelf and the retail shelf, and so forth. In some examples, the one or more vibration sensors may be one or more vibration sensors mounted to a second retail shelf. In one example, the second retail shelf may be positioned on an opposite side of an aisle from the retail shelf.
In some examples, Step 1604 may comprise receiving at least one image captured using at least one image sensor from a retail environment (for example, a retail environment including the shelving unit of Step 1602), for example as described above. In some examples, receiving at least one image by Step 1604 may comprise at least one of reading the at least one image, receiving the at least one image from an external device (for example, using a digital communication device), capturing the at least one image using the at least one image sensor from the retail environment, and so forth. In some examples, the at least one image sensor of Step 1604 may be at least one image sensor mounted to a second retail shelf, for example as illustrated in
In some examples, Step 1606 may comprise analyzing the vibration data received by Step 1602 and the at least one image received by Step 1604 to detect an action performed in the retail environment. In some examples, the action may include at least one of picking a product from a retail shelf, placing a product on a retail shelf and moving a product on a retail shelf. Some other non-limiting examples of such action may include placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. In some examples, a machine learning model may be trained using training examples to detect actions from vibration data and images. An example of such training example may include a sample vibration data and a sample image, together with a label indicating whether the sample vibration data and the sample image correspond to an action performed in an environment. In one example, Step 1606 may use the trained machine learning model to analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 to detect the action performed in the retail environment. In some examples, Step 1606 may use an artificial neural network to analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 to detect the action performed in the retail environment.
In some examples, Step 1606 may calculate a convolution of at least part of the at least one image received by Step 1604 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the vibration data received by Step 1602 to detect the action performed in the retail environment. For example, Step 1606 may analyze the vibration data received by Step 1602 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1606 may analyze the vibration data received by Step 1602 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1606 may analyze the vibration data received by Step 1602 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
In some examples, Step 1606 may calculate a convolution of at least part of the vibration data received by Step 1602 to obtain a value of the calculated convolution, and may use the value of the calculated convolution to analyze the at least one image received by Step 1604 to detect the action performed in the retail environment. For example, Step 1606 may analyze at least one image received by Step 1604 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the value of the calculated convolution. In another example, in response to a first value of the calculated convolution, Step 1606 may analyze the at least one image received by Step 1604 using a first analysis step to detect the action performed in the retail environment, and in response to a second value of the calculated convolution, Step 1606 may analyze the at least one image received by Step 1604 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
In some examples, the vibration data received by Step 1602 may include a time series of samples captured using the one or more vibration sensors at different points in time. In some examples, Step 1606 may compare two samples of the time series of samples, and may use a result of the comparison to analyze the at least one image received by Step 1604 to detect the action performed in the retail environment. For example, Step 1606 may analyze at least one image received by Step 1604 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1606 may analyze the at least one image received by Step 1604 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1606 may analyze the at least one image received by Step 1604 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
In some examples, the at least one image received by Step 1604 may include a plurality of frames of a video captured using the at least one image sensor. In some examples, Step 1606 may compare two frames of the plurality of frames, and may use a result of the comparison to analyze the vibration data received by Step 1602 to detect the action performed in the retail environment. For example, Step 1606 may analyze the vibration data received by Step 1602 using a parametric model to detect the action performed in the retail environment, and the parameter may be selected based on the result of the comparison. In another example, in response to a first result of the comparison, Step 1606 may analyze the vibration data received by Step 1602 using a first analysis step to detect the action performed in the retail environment, and in response to a second result of the comparison, Step 1606 may analyze the vibration data received by Step 1602 using a second analysis step to detect the action performed in the retail environment, the second analysis step may differ from the first analysis step.
In some examples, Step 1606 may analyze the vibration data received by Step 1602 to select a portion of the at least one image received by Step 1604. For example, in response to a first vibration data received by Step 1602, Step 1606 may select a first portion of the at least one image received by Step 1604, and in response to a second vibration data received by Step 1602, Step 1606 may select a second portion of the at least one image received by Step 1604, the second portion may differ from the first portion. In another example, the vibration data received by Step 1602 may include spatial properties, and Step 1606 may select the portion of the at least one image received by Step 1604 based on the spatial properties. For example, the spatial properties may include an indication of a region in the retail environment, and Step 1606 may select a portion of the at least one image received by Step 1604 corresponding to the indicated region of the retail environment. Further, in some examples, Step 1606 may analyze the selected portion of the at least one image to detect the action performed in the retail environment, for example using the image analysis described above.
In some examples, Step 1606 may comprise analyzing the vibration data received by Step 1602 to attempt to detect the action performed in the retail environment, for example using a pattern recognition algorithm. In some examples, for example in response to a failure of the attempt to successfully detect the action, Step 1606 may analyze the at least one image received by Step 1604 to detect the action performed in the retail environment, for example using a visual action recognition algorithm. In one example, for example in response to a failure to successfully detect the action, method 1600 may trigger the capturing of the at least one image using the at least one image sensor. In one example, the failure to successfully detect the action may be a failure to successfully detect the action at a confidence level higher than a selected threshold. In another example, the failure to successfully detect the action may be a failure to determine at least one aspect of the action. Some non-limiting examples of such aspect may include at least one of a type of the action, a product type associated with the action, and a quantity of products associated with the action.
In some examples, Step 1608 may comprise providing information based on the action detected by Step 1606. For example, providing the information based on the action detected by Step 1606 may comprise at least one of storing the information in memory, transmitting the information to an external device, providing the information to a user (for example, visually, audibly, textually, through a user interface, etc.), and so forth.
In some examples, detecting the action performed in the retail environment by Step 1606 may further include recognizing a type of the action. For example, Step 1606 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different type of action. In another example, Step 1606 may analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 (for example using the classification model, using a machine learning model trained using training examples to recognize types of actions from records including both vibration data and images, using an artificial neural network, and so forth) to recognize the type of the action. Some non-limiting examples of such types of actions may include picking an item, picking a product, placing an item, placing a product, moving an item, moving a product, placing a label (such as a shelf label), removing a label (such as a shelf label), placing a promotional sign, removing a promotional sign, changing a price, cleaning, restocking, rearranging products, and so forth. Further, in some examples, the information provided by Step 1608 may be based on the type of the action. In one example, the information provided by Step 1608 may include an indication of the type of the action. In one example, in response to a first type of the action, Step 1608 may provide first information, and in response to a second type of the action, Step 1608 may provide second information, the second information may differ from the first information. In one example, in response to a first type of the action, Step 1608 may provide the information, and in response to a second type of the action, Step 1608 may forgo providing the information.
In some examples, detecting the action performed in the retail environment by Step 1606 may further include identifying a product type associated with the action. For example, Step 1606 may use a classification model to classify the action to a particular class of a plurality of alternative classes, each class of the plurality of alternative classes may correspond to a different product type. In another example, Step 1606 may analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 (for example using the classification model, using a machine learning model trained using training examples to identify product types of products associated with actions from records including both vibration data and images, using an artificial neural network, and so forth) to identify the product type. In one example, the action may include at least one of picking, placing and moving a product, and the product type associated with the action may be a product type of the product. In one example, the action may include at least one of placing and removing a label (such as a shelf label), and the product type associated with the action may be a product type indicated by the label (for example, by text printed on the label, by a logo on the label, by a picture on the label, by a visual code on the label, and so forth). In one example, the action may include at least one of placing and removing a promotional sign, and the product type associated with the action may be a product type associated with the promotional sign. In one example, the action may include changing a price of products of a particular product type, and the product type associated with the action may be the particular product type. Further, in some examples, the information provided by Step 1608 may be based on the product type associated with the action. In one example, the information provided by Step 1608 may include an indication of the product type (for example, textual indication, a picture of a product of the product type, a barcode associated with the product type, and so forth). In one example, in response to a first product type associated with the action, Step 1608 may provide first information, and in response to a second product type associated with the action, Step 1608 may provide second information, the second information may differ from the first information. In one example, in response to a first product type associated with the action, Step 1608 may provide the information, and in response to a second product type associated with the action, Step 1608 may forgo providing the information.
In some examples, detecting the action performed in the retail environment by Step 1606 may further include determining a quantity of products associated with the action. For example, Step 1606 may use a regression model to determine the quantity of products associated with the action. In another example, Step 1606 may analyze the vibration data received by Step 1602 and the at least one image received by Step 1604 (for example using the classification model, using a machine learning model trained using training examples to determine quantity of products associated with actions from records including both vibration data and images, using an artificial neural network, and so forth) to determine the quantity of products associated with the action. In one example, the action may include at least one of picking, placing and moving at least one product, and the quantity of products associated with the action may be the quantity of products picked, placed and/or moved in the action. In one example, the action may include at least one of placing and removing a promotional sign, and the quantity of products associated with the action may be a quantity of products indicated in the promotional sign. Further, in some examples, the information provided by Step 1608 may be based on the quantity of products associated with the action. In one example, the information provided by Step 1608 may include an indication of the quantity of products associated with the action. In one example, in response to a first quantity of products associated with the action, Step 1608 may provide first information, and in response to a second quantity of products associated with the action, Step 1608 may provide second information, the second information may differ from the first information. In one example, in response to a first quantity of products associated with the action, Step 1608 may provide the information, and in response to a second quantity of products associated with the action, Step 1608 may forgo providing the information.
In some examples, the vibration data received by Step 1602 may include a time series of samples captured using the one or more vibration sensors at different points in time. In some examples, Step 1604 may further comprise analyzing the time series of the samples captured using the one or more vibration sensors at the different points in time to select the at least one image of a plurality of images. For example, in response to a first result of the analysis of the time series of samples, Step 1604 may select a first subgroup of the plurality of images, and in response to a second result of the analysis of the time series of samples, Step 1604 may select a second subgroup of the plurality of images, the second subgroup may differ from the first subgroup. In another example, Step 1604 may analyze the time series of the samples captured using the one or more vibration sensors at the different points in time to select a particular point in time (for example, a point in time corresponding to an extremum of the samples, a point in time corresponding to a sample satisfying a particular criterion, and so forth), each image of the plurality of images may correspond to a different point in time (for example, based on the capturing time of the image), and Step 1604 may select the image of the plurality of images corresponding to the particular point in time (or corresponding to a point in time nearest to the particular point in time of the points in time corresponding to the plurality of images).
In some examples, Step 1606 may calculate a convolution of at least part of the at least one image to obtain a value of the calculated convolution. Further, Step 1606 may analyze the vibration data to determine a frequency associated with the vibration data, for example using spectral analysis of the vibration data, using narrow-band frequency analysis, and so forth. Some non-limiting examples of such determined frequency associated with the vibration data may include a prominent periodic frequency, a prominent frequency in a selected range of frequencies, the second most prominent periodic frequency, and so forth. In one example, in response to a first combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may detect the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may forgo the detection of the action performed in the retail environment. In another example, in response to a first combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may determine a first type of the action performed in the retail environment, and in response to a second combination of the value of the calculated convolution and the frequency associated with the vibration data, Step 1606 may determine a second type of the action performed in the retail environment, the second type may differ from the first type.
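A minimal sketch of combining an image-convolution value with a prominent vibration frequency, using a plain FFT as the spectral analysis, is given below; the sampling rate, kernel and decision rule are illustrative assumptions.

```python
import numpy as np

def prominent_frequency(vibration: np.ndarray, sample_rate_hz: float) -> float:
    """Most prominent periodic frequency in the vibration data (simple FFT)."""
    spectrum = np.abs(np.fft.rfft(vibration - vibration.mean()))
    freqs = np.fft.rfftfreq(len(vibration), d=1.0 / sample_rate_hz)
    return float(freqs[int(np.argmax(spectrum))])

def detect_from_combination(image_patch: np.ndarray, vibration: np.ndarray,
                            sample_rate_hz: float = 200.0) -> bool:
    """Combine an image-convolution value with the vibration frequency."""
    kernel = np.array([1.0, -1.0])
    conv_value = float(np.abs(
        np.convolve(image_patch.ravel(), kernel, mode="valid")).max())
    freq = prominent_frequency(vibration, sample_rate_hz)
    # Illustrative rule: low-frequency vibration together with a visible change
    # in the image is treated as a product being picked or placed.
    return freq < 10.0 and conv_value > 20.0
```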
The foregoing description has been presented for purposes of illustration. It is not exhaustive and is not limited to the precise forms or embodiments disclosed. Modifications and adaptations will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed embodiments. Additionally, although aspects of the disclosed embodiments are described as being stored in memory, one skilled in the art will appreciate that these aspects can also be stored on other types of computer readable media, such as secondary storage devices, for example, hard disks or CD ROM, or other forms of RAM or ROM, USB media, DVD, Blu-ray, 4K Ultra HD Blu-ray, or other optical drive media.
Computer programs based on the written description and disclosed methods are within the skill of an experienced developer. The various programs or program modules can be created using any of the techniques known to one skilled in the art or can be designed in connection with existing software. For example, program sections or program modules can be designed in or by means of .Net Framework, .Net Compact Framework (and related languages, such as Visual Basic, C, etc.), Java, C++, Objective-C, HTML, HTML/AJAX combinations, XML, or HTML with included Java applets.
Moreover, while illustrative embodiments have been described herein, the scope includes any and all embodiments having equivalent elements, modifications, omissions, combinations (e.g., of aspects across various embodiments), adaptations and/or alterations as would be appreciated by those skilled in the art based on the present disclosure. The limitations in the claims are to be interpreted broadly based on the language employed in the claims and not limited to examples described in the present specification or during the prosecution of the application. The examples are to be construed as non-exclusive. Furthermore, the steps of the disclosed methods may be modified in any manner, including by reordering steps and/or inserting or deleting steps. It is intended, therefore, that the specification and examples be considered as illustrative only, with a true scope and spirit being indicated by the following claims and their full scope of equivalents.
Claims
1. A non-transitory computer-readable medium including instructions that when executed by a processor cause the processor to perform a method for triggering image processing based on infrared data analysis, the method comprising:
- receiving first infrared input data captured using a first group of one or more infrared sensors;
- analyzing the first infrared input data to detect an engagement of a person with a retail shelf;
- receiving second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data;
- analyzing the second infrared input data to determine a completion of the engagement of the person with the retail shelf;
- in response to the determined completion of the engagement of the person with the retail shelf, analyzing at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf; and
- using the analysis of the at least one image to determine a state of the retail shelf.
2. The non-transitory computer-readable medium of claim 1, wherein the first group of one or more infrared sensors is a group of one or more passive infrared sensors.
3. The non-transitory computer-readable medium of claim 1, wherein the first group of one or more infrared sensors is identical to the second group of one or more infrared sensors.
4. The non-transitory computer-readable medium of claim 1, wherein the first group of one or more infrared sensors is a group of one or more infrared sensors positioned below a second retail shelf, the second retail shelf is positioned above the retail shelf.
5. The non-transitory computer-readable medium of claim 1, wherein the determined state of the retail shelf includes an inventory data associated with products on the retail shelf after the engagement of the person with the retail shelf.
6. The non-transitory computer-readable medium of claim 1, wherein the determined state of the retail shelf includes facings data associated with products on the retail shelf after the engagement of the person with the retail shelf.
7. The non-transitory computer-readable medium of claim 1, wherein the determined state of the retail shelf includes planogram compliance status associated with the retail shelf after the engagement of the person with the retail shelf.
8. The non-transitory computer-readable medium of claim 1, wherein the method further comprises using the analysis of the at least one image and an analysis of one or more images of the retail shelf captured using the at least one image sensor before the engagement of the person with the retail shelf to determine a change associated with the retail shelf during the engagement of the person with the retail shelf.
9. The non-transitory computer-readable medium of claim 1, wherein the at least one image sensor is at least one image sensor mounted to a second retail shelf.
10. The non-transitory computer-readable medium of claim 1, wherein the at least one image sensor is at least one image sensor mounted to an image capturing robot.
11. The non-transitory computer-readable medium of claim 1, wherein the method further comprises, in response to the determined completion of the engagement of the person with the retail shelf, triggering the capturing of the at least one image of the retail shelf using the at least one image sensor.
12. The non-transitory computer-readable medium of claim 1, wherein the method further comprises:
- analyzing the first infrared input data to determine a type of the engagement of the person with the retail shelf;
- in response to a first determined type of the engagement, triggering the analyzing the at least one image of the retail shelf; and
- in response to a second determined type of the engagement, forgoing analyzing the at least one image of the retail shelf.
13. The non-transitory computer-readable medium of claim 1, wherein the method further comprises:
- analyzing the first infrared input data to determine a type of the engagement of the person with the retail shelf;
- in response to a first determined type of the engagement, including a first analysis step in the analysis of the at least one image of the retail shelf; and
- in response to a second determined type of the engagement, including a second analysis step in the analysis of the at least one image of the retail shelf, the second analysis step differs from the first analysis step.
14. The non-transitory computer-readable medium of claim 1, wherein the determination of the completion of the engagement of the person with the retail shelf is a determination that the person cleared an environment of the retail shelf.
15. The non-transitory computer-readable medium of claim 1, wherein the method further comprises:
- calculating a convolution of at least part of the first infrared input data;
- in response to a first value of the calculated convolution of the at least part of the first infrared input data, detecting the engagement of a person with a retail shelf; and
- in response to a second value of the calculated convolution of the at least part of the first infrared input data, forgoing detecting the engagement of a person with a retail shelf.
16. The non-transitory computer-readable medium of claim 1, wherein the method further comprises, in response to the detected engagement of a person with a retail shelf, analyzing one or more images of the retail shelf captured before the completion of the engagement of the person with the retail shelf to determine at least one aspect of the engagement.
17. The non-transitory computer-readable medium of claim 16, wherein the method further comprises updating a virtual shopping cart associated with the person based on the determined at least one aspect of the engagement.
18. The non-transitory computer-readable medium of claim 16, wherein the method further comprises using the analysis of the at least one image of the retail shelf captured after the completion of the engagement of the person with the retail shelf and the determined at least one aspect of the engagement to determine the state of the retail shelf.
19. A method for triggering image processing based on infrared data analysis, the method comprising:
- receiving first infrared input data captured using a first group of one or more infrared sensors;
- analyzing the first infrared input data to detect an engagement of a person with a retail shelf;
- receiving second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data;
- analyzing the second infrared input data to determine a completion of the engagement of the person with the retail shelf;
- in response to the determined completion of the engagement of the person with the retail shelf, analyzing at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf; and
- using the analysis of the at least one image to determine a state of the retail shelf.
20. A system for triggering image processing based on infrared data analysis, the system comprising:
- at least one processing unit configured to: receive first infrared input data captured using a first group of one or more infrared sensors; analyze the first infrared input data to detect an engagement of a person with a retail shelf; receive second infrared input data captured using a second group of one or more infrared sensors after the capturing of the first infrared input data; analyze the second infrared input data to determine a completion of the engagement of the person with the retail shelf; in response to the determined completion of the engagement of the person with the retail shelf, analyze at least one image of the retail shelf captured using at least one image sensor after the completion of the engagement of the person with the retail shelf; and use the analysis of the at least one image to determine a state of the retail shelf.
21.-100. (canceled)