GUIDED IMAGE CAPTURE USER INTERFACE
A system and method for generating user interfaces that present a set of templates to guide a user in capturing correctly sized images of items under different conditions is disclosed. The method includes generating a first user interface configured to receive and present product information for an item including dimensions of the item, receiving a first image, generating a second user interface to present a template, the template including a bounding box sized to match the dimensions of the item, the second user interface configured to present the bounding box overlaid over a second image, receiving input to capture a portion of the second image within the bounding box, responsive to the input to capture the portion of the second image, generating a third user interface to present the first image and the captured portion of the second image as variants of a face of the item, and storing the captured portion of the second image as a variant of the face of the item and the information of the item in a database.
The present application claims priority, under 35 U.S.C. § 119, of U.S. Provisional Patent Application No. 62/492,840, filed May 1, 2017 and entitled “Guided Image Capture User Interface,” which is incorporated by reference in its entirety.
BACKGROUND

Field of the Invention

The specification generally relates to obtaining product images to build a product database for a computer vision system. In particular, the specification relates to a system and method for generating user interfaces that present a set of templates to guide a user in capturing correctly sized images of items under different conditions.
Description of the Background Art

Typically, computer vision systems are used to gain a high-level understanding from digital images or videos. Computer vision tasks include methods for acquiring, processing, analyzing, and understanding digital images, and for extracting high-dimensional data from the real world in order to produce numerical or symbolic information. One important aspect of computer vision is creating a database against which new images can be compared. In particular, with regard to the recognition of objects or products, a complete database is important. The image recognition process includes receiving a query image of a product and searching the database to determine whether one of the images stored in the database matches the query image. If there is a positive match, the image recognition succeeds. However, even when the database contains information about the product, the image recognition does not always succeed because the database may have only limited information about each product.
Previous attempts at recognizing products have deficiencies. For example, recognizing a product may fail because the stored images in the database that are searched for a match do not look “similar” to the query image of the product. The right kind of images, and a sufficient number of them, should be stored in the database to support robust image recognition. Unfortunately, current methods do not provide a reliable and efficient solution to this problem.
SUMMARY

The techniques introduced herein overcome the deficiencies and limitations of the prior art, at least in part, with a system and method for generating user interfaces that present a set of templates to guide a user in capturing correctly sized images of items under different conditions. In one embodiment, the system includes one or more processors and a memory storing instructions which, when executed, cause the one or more processors to perform steps to generate and present user interfaces. These steps include generating a first user interface configured to receive and present product information for an item including dimensions of the item. The steps further include receiving a first image and generating a second user interface to present a template, the template including a bounding box sized to match the dimensions of the item, the second user interface configured to present the bounding box overlaid over a second image. The system then receives input to capture a portion of the second image within the bounding box. Responsive to the input to capture the portion of the second image, the system generates a third user interface to present the first image and the captured portion of the second image as variants of a face of the item, and stores the captured portion of the second image as a variant of the face of the item and the information of the item in a database.
Other aspects include corresponding methods, systems, apparatuses, and computer program products for these and other innovative aspects.
The features and advantages described herein are not all-inclusive and many additional features and advantages will be apparent in view of the figures and description. Moreover, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes and not to limit the scope of the techniques described.
The techniques introduced herein are illustrated by way of example, and not by way of limitation in the figures of the accompanying drawings in which like reference numerals are used to refer to similar elements.
The network 105 can be a conventional type, wired or wireless, and may have numerous different configurations including a star configuration, token ring configuration, or other configurations. Furthermore, the network 105 may include a local area network (LAN), a wide area network (WAN) (e.g., the Internet), and/or other interconnected data paths across which multiple devices may communicate. In some embodiments, the network 105 may be a peer-to-peer network. The network 105 may also be coupled to or include portions of a telecommunications network for sending data in a variety of different communication protocols. In some embodiments, the network 105 may include Bluetooth communication networks or a cellular communications network for sending and receiving data including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, email, etc. Although
In some embodiments, the system 100 includes a recognition server 101 coupled to the network 105. The recognition server 101 may be, or may be implemented by, a computing device including a processor, a memory, applications, a database, and network communication capabilities. In the example of
In some embodiments, the recognition server 101 sends and receives data to and from other entities of the system 100 via the network 105. For example, the recognition server 101 sends and receives data including images to and from the client device 115. The images received by the recognition server 101 can include an image captured by the client device 115, an image copied from a web site or an email, or an image from any other source. Although only a single recognition server 101 is shown in
The client device 115 may be a computing device that includes a memory, a processor and a camera, for example a laptop computer, a desktop computer, a tablet computer, a mobile telephone, a smartphone, a personal digital assistant (PDA), a mobile email device, a webcam, a user wearable computing device or any other electronic device capable of accessing a network 105. The client device 115 provides general graphics and multimedia processing for any type of application. For example, the client device 115 may include a graphics processor unit (GPU) for handling graphics and multimedia processing. The client device 115 includes a display for viewing information provided by the recognition server 101. While
The client device 115 is adapted to send and receive data to and from the recognition server 101. For example, the client device 115 sends a captured image to the recognition server 101, and the recognition server 101 provides data in JavaScript Object Notation (JSON) format about one or more objects recognized in the captured image to the client device 115. The client device 115 may support use of a graphical application program interface (API) such as Metal on Apple iOS™ or RenderScript on Android™ for determination of feature location and feature descriptors during image processing.
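The disclosure does not specify the schema of the JSON recognition data, so the following is only a minimal sketch; the field names (for example, "recognized_items", "region_of_interest", "score") are assumptions for illustration, not the actual API.

```python
import json

# Hypothetical JSON recognition result returned by the recognition server 101.
# All field names below are illustrative assumptions, not the patented format.
response_text = """
{
  "recognized_items": [
    {
      "upc": "012345678905",
      "product_name": "Example Coffee, 12 oz",
      "facing": "front",
      "region_of_interest": {"x": 0.12, "y": 0.34, "width": 0.20, "height": 0.45},
      "score": 0.93
    }
  ]
}
"""

result = json.loads(response_text)
for item in result["recognized_items"]:
    roi = item["region_of_interest"]
    # The client device could use these fields to label the recognized product on screen.
    print(f'{item["upc"]} ({item["facing"]}): score={item["score"]}, roi={roi}')
```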
The image recognition application 103 may include software and/or logic to provide the functionality for presenting a set of templates that guide a user in capturing correctly sized images of items under different conditions. In some embodiments, the image recognition application 103 can be implemented using programmable or specialized hardware, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). In some embodiments, the image recognition application 103 can be implemented using a combination of hardware and software. In other embodiments, the image recognition application 103 may be stored and executed on a combination of the client devices 115 and the recognition server 101, or by any one of the client devices 115 or the recognition server 101.
In some embodiments, the image recognition application 103b may be a thin-client application with some functionality executed on the client device 115 and additional functionality executed on the recognition server 101 by the image recognition application 103a. For example, the image recognition application 103b on the client device 115 could include software and/or logic for capturing an image, transmitting the image to the recognition server 101, and displaying image recognition results. In another example, the image recognition application 103a on the recognition server 101 could include software and/or logic for generating a series of templates for use in the image captures. The image recognition application 103a or 103b may include further functionality described herein, such as processing the image and performing feature identification. The operation of the image recognition application 103 and the functions listed above are described in more detail below with reference to
The processor 235 may execute software instructions by performing various input/output, logical, and/or mathematical operations. The processor 235 may have various computing architectures to process data signals including, for example, a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, and/or an architecture implementing a combination of instruction sets. The processor 235 may be physical and/or virtual, and may include a single processing unit or a plurality of processing units and/or cores. In some implementations, the processor 235 may be capable of generating and providing electronic display signals to a display device, supporting the display of images, capturing and transmitting images, performing complex tasks including various types of feature extraction and sampling, etc. In some implementations, the processor 235 may be coupled to the memory 237 via the bus 220 to access data and instructions therefrom and store data therein. The bus 220 may couple the processor 235 to the other components of the computing device 200 including, for example, the memory 237, the communication unit 241, the image recognition application 103, and the data storage 243. It will be apparent to one skilled in the art that other processors, operating systems, sensors, displays, and physical configurations are possible.
The memory 237 may store and provide access to data for the other components of the computing device 200. The memory 237 may be included in a single computing device or distributed among a plurality of computing devices as discussed elsewhere herein. In some implementations, the memory 237 may store instructions and/or data that may be executed by the processor 235. The instructions and/or data may include code for performing the techniques described herein. For example, in one embodiment, the memory 237 may store the image recognition application 103. The memory 237 is also capable of storing other instructions and data, including, for example, an operating system, hardware drivers, other software applications, databases, etc. The memory 237 may be coupled to the bus 220 for communication with the processor 235 and the other components of the computing device 200.
The memory 237 may include one or more non-transitory computer-usable (e.g., readable, writeable) devices, such as a static random access memory (SRAM) device, a dynamic random access memory (DRAM) device, an embedded memory device, a discrete memory device (e.g., a PROM, FPROM, ROM), a hard disk drive, or an optical disk drive (CD, DVD, Blu-ray™, etc.), which can be any tangible apparatus or device that can contain, store, communicate, or transport instructions, data, computer programs, software, code, routines, etc., for processing by or in connection with the processor 235. In some implementations, the memory 237 may include one or more of volatile memory and non-volatile memory. It should be understood that the memory 237 may be a single device or may include multiple types of devices and configurations.
The display device 239 is a liquid crystal display (LCD), a light emitting diode (LED) display, or any other similarly equipped display device, screen, or monitor. The display device 239 represents any device equipped to display user interfaces, electronic images, and data as described herein. In different embodiments, the display is binary (only two different values for pixels), monochrome (multiple shades of one color), or allows multiple colors and shades. The display device 239 is coupled to the bus 220 for communication with the processor 235 and the other components of the computing device 200. It should be noted that the display device 239 is shown in
The communication unit 241 is hardware for receiving and transmitting data by linking the processor 235 to the network 105 and other processing systems. The communication unit 241 receives data such as requests from the client device 115 and transmits the requests to the controller 201, for example a request to process an image. The communication unit 241 also transmits information including recognition results to the client device 115 for display, for example, in response to processing the image. The communication unit 241 is coupled to the bus 220. In one embodiment, the communication unit 241 may include a port for direct physical connection to the client device 115 or to another communication channel. For example, the communication unit 241 may include an RJ45 port or similar port for wired communication with the client device 115. In another embodiment, the communication unit 241 may include a wireless transceiver (not shown) for exchanging data with the client device 115 or any other communication channel using one or more wireless communication methods, such as IEEE 802.11, IEEE 802.16, Bluetooth® or another suitable wireless communication method.
In yet another embodiment, the communication unit 241 may include a cellular communications transceiver for sending and receiving data over a cellular communications network such as via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, WAP, e-mail or another suitable type of electronic communication. In still another embodiment, the communication unit 241 may include a wired port and a wireless transceiver. The communication unit 241 also provides other conventional connections to the network 105 for distribution of files and/or media objects using standard network protocols such as TCP/IP, HTTP, HTTPS, and SMTP as will be understood to those skilled in the art.
The data storage 243 is a non-transitory memory that stores data for providing the functionality described herein. The data storage 243 may be a dynamic random access memory (DRAM) device, a static random access memory (SRAM) device, flash memory, or some other memory device. In some embodiments, the data storage 243 may also include a non-volatile memory or similar permanent storage device and media including a hard disk drive, a floppy disk drive, a CD-ROM device, a DVD-ROM device, a DVD-RAM device, a DVD-RW device, a flash memory device, or some other mass storage device for storing information on a more permanent basis.
In the illustrated embodiment, the data storage 243 is communicatively coupled to the bus 220. The data storage 243 stores data for analyzing a received image and results of the analysis and other functionality as described herein. For example, the data storage 243 may store a database table for a plurality of items or objects (e.g., stock keeping units) for image recognition purposes. A stock keeping unit (SKU) is a distinct item, such as a product offered for sale. The term stock keeping unit or SKU may also refer to a unique identifier that refers to the particular product. The database table includes all attributes that make the item distinguishable as a distinct object from all other items. For example, the attributes of a product include a unique identifier (e.g., Universal Product Code (UPC)), product name, physical dimensions (e.g., width, height, depth, etc.), size (e.g., liters, gallons, ounces, pounds, kilograms, fluid ounces, etc.), facing side (e.g., front, back, side, top, bottom, etc.), description, brand manufacturer, color, packaging version, material, model number, price, discount, base image, etc. The attributes may be automatically detected and determined by the system 100, or received from user input, or obtained based on a combination of the two. In some embodiments, the data storage 243 also stores variant images associated with one or more sides of an item and templates used in capturing the variant images. In other embodiments, the data storage 243 further stores a received image of the item and the set of features determined for the received image.
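The disclosure describes the product table only by listing its attributes, so the following is a minimal sketch of one possible in-memory record shape, assuming hypothetical field names and units; it is not the actual database schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ProductRecord:
    """One illustrative row of the SKU table described above (field names are assumptions)."""
    upc: str                       # unique identifier, e.g., Universal Product Code
    product_name: str
    width_cm: float                # physical dimensions of the packaged product
    height_cm: float
    depth_cm: float
    facing_side: str               # "front", "back", "side", "top", or "bottom"
    brand_manufacturer: str = ""
    packaging_version: str = ""
    base_image_path: Optional[str] = None
    variant_image_paths: List[str] = field(default_factory=list)  # variant images per face

record = ProductRecord(
    upc="012345678905",
    product_name="Example Coffee, 12 oz",
    width_cm=10.0, height_cm=18.0, depth_cm=6.5,
    facing_side="front",
)
# The physical dimensions are what later drive the sizing of the capture template.
print(record.product_name, round(record.width_cm / record.height_cm, 3))
```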
The capture device 247 may be operable to digitally capture an image or data of an object of interest. For example, the capture device 247 may be a high definition (HD) camera, a regular 2D camera, a multi-spectral camera, a structured light 3D camera, a time-of-flight 3D camera, a stereo camera, a standard smartphone camera, or a wearable computing device. The capture device 247 is coupled to the bus to provide the images and other processed metadata to the processor 235, the memory 237, or the data storage 243. It should be noted that the capture device 247 is shown in
In some embodiments, the image recognition application 103 may include a controller 201, an image processing module 203, a user interface module 205, a product image capture module 207, a guided capture module 209, and a dynamic template generation module 211. The components of the image recognition application 103 are communicatively coupled via the bus 220. The components of the image recognition application 103 may each include software and/or logic to provide their respective functionality. In some embodiments, the components of the image recognition application 103 can each be implemented using programmable or specialized hardware including a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). In some embodiments, the components of the image recognition application 103 can each be implemented using a combination of hardware and software executable by the processor 235. In some embodiments, the components of the image recognition application 103 may each be stored in the memory 237 and be accessible and executable by the processor 235. In some embodiments, the components of the image recognition application 103 may each be adapted for cooperation and communication with the processor 235, the memory 237, and other components of the image recognition application 103 via the bus 220.
The controller 201 may include software and/or logic to control the operation of the other components of the image recognition application 103. The controller 201 controls the other components of the image recognition application 103 to perform the methods described below with reference to
In some embodiments, the controller 201 sends and receives data, via the communication unit 241, to and from one or more of the client device 115 and the recognition server 101. For example, the controller 201 receives, via the communication unit 241, an image from a client device 115 operated by a user and sends the image to the image processing module 203. In another example, the controller 201 receives data for providing a graphical user interface to a user from the user interface module 205 and sends the data to a client device 115, causing the client device 115 to present the user interface to the user.
In some embodiments, the controller 201 receives data from other components of the image recognition application 103 and stores the data in the data storage 243. For example, the controller 201 receives data including features identified for an image from the image processing module 203 and stores the data in the data storage 243. In other embodiments, the controller 201 retrieves data from the data storage 243 and sends the data to other components of the image recognition application 103. For example, the controller 201 retrieves data including an item or product from the data storage 243 and sends the retrieved data to the user interface module 205.
In some embodiments, the communications between the image recognition application 103 and other components of the computing device 200 as well as between the components of the image recognition application 103 can occur autonomously and independent of the controller 201.
The image processing module 203 may include software and/or logic to provide the functionality for receiving and processing one or more images from the client device 115. For example, the images may include an image of a product in a retail store. If no information of this product is found in a product database, this image serves as a starting point in creating the database for future recognition of the product.
In some embodiments, the image processing module 203 receives one or more images from the client device 115 for recognition, and the images may include one or more objects of interest. For example, the image can be an image of a packaged product (e.g., a coffee package, a breakfast cereal box, a soda bottle, etc.) on a shelf of a retail store. A packaged product of a brand manufacturer may include textual and pictorial information printed on its surface that distinguishes it from packaged products belonging to one or more other brand manufacturers. The packaged products may also sit in an orientation on the shelf exposed to the user looking at the shelf. For example, a box-like packaged product might be oriented with the front, the back, the side, the top, or the bottom of the product exposed to the user looking at the shelf. It should be understood that there can be other products displayed on shelves without packaging.
In some embodiments, the image processing module 203 determines whether successful recognition is likely on the received image and instructs the user interface module 205 to generate graphical data including instructions for the user to retake the image if a section of the image captured by the client device 115 has limited information for complete recognition (e.g., a feature rich portion is cut off), the image is too blurry, the image has an illumination artifact (e.g., excessive reflection), etc. In other embodiments, the image processing module 203 may receive a single image as it is without any distortion.
In some embodiments, the image processing module 203 determines a set of features for the image. For example, the image processing module 203 may determine a location (X-Y coordinates), an orientation, and an image descriptor for each feature identified in the image. In some embodiments, the image processing module 203 uses corner detection algorithms for determining feature location. For example, the corner detection algorithms can include the Shi-Tomasi corner detection algorithm, the Harris and Stephens corner detection algorithm, etc. In some embodiments, the image processing module 203 uses feature description algorithms for determining efficient image feature descriptors. For example, the feature description algorithms may include Binary Robust Independent Elementary Features (BRIEF), Scale-Invariant Feature Transform (SIFT), etc. An image descriptor of a feature may be a 256-bit bitmask which describes the image sub-region covered by the feature. In some embodiments, the image processing module 203 may compare the intensities of each of 256 pixel pairs near the feature and, based on each comparison, set or clear one bit in the 256-bit bitmask.
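The disclosure names corner detectors and BRIEF/SIFT descriptors generally without committing to an implementation. The sketch below uses OpenCV's ORB (an oriented-FAST corner detector paired with a 256-bit BRIEF-style binary descriptor) as one assumed way to obtain the location/orientation/descriptor triplet described above; the input file name is hypothetical.

```python
import cv2

# Minimal sketch of feature extraction, assuming OpenCV is available and
# "product_photo.jpg" exists. ORB is a stand-in for the BRIEF-style descriptor.
image = cv2.imread("product_photo.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(gray, None)

for kp in keypoints[:5]:
    # kp.pt is the (x, y) location of the feature; kp.angle is its orientation in degrees.
    print(kp.pt, kp.angle)

# Each ORB descriptor is 32 bytes, i.e., a 256-bit bitmask per feature.
print(descriptors.shape)
```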
In some embodiments, the image processing module 203 matches the features of the image with the features of templates associated with a plurality of items for performing image recognition. For example, the image processing module 203 uses the database table storing information for products in the data storage 243 for analyzing the features of the image. The image processing module 203 identifies a region of interest (ROI) bordering each of the matched items in the image. A region of interest can be of any shape, for example, a polygon, a circle with a center point and a diameter, a rectangle having a width, a height and one or more reference points for the region (e.g., a center point, one or more corner points for the region), etc. For example, the region of interest may be a recognition rectangle bordering the matched item in its entirety. In another example, the region of interest may border the exposed label containing pictorial and textual information associated with the matched item.
In some embodiments, the image processing module 203 recognizes an item or product associated with the region of interest based on matching the image features from the image with the template features stored for a plurality of items. The image processing module 203 determines symbolic information or metadata in association with a recognition result for an identified item; the symbolic information may include a Universal Product Code (UPC), a position (e.g., a position in relative X-Y coordinates, a slot position on a shelf, a particular shelf off the ground, etc.), a facing side (e.g., top, bottom, front, back, or side), dimensions (e.g., width, height, etc.) of the region of interest, and other metadata (e.g., packaging version). In some embodiments, the image processing module 203 determines the coordinate position and the dimensions of the items recognized in the image in relative units. The relative units do not correspond to physical dimensions, such as inches.
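The disclosure does not specify how the matched features yield a recognition rectangle. One common approach, sketched below under that assumption, is to match binary descriptors with a Hamming-distance matcher and project the stored template's corners into the query image with a RANSAC-estimated homography; the function and variable names are illustrative only.

```python
import cv2
import numpy as np

def match_against_template(query_desc, query_kp, tmpl_desc, tmpl_kp, tmpl_shape):
    """Hedged sketch: match query features to one stored template and return a ROI."""
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(query_desc, tmpl_desc), key=lambda m: m.distance)
    if len(matches) < 10:
        return None  # not enough matched features to claim a recognition result

    # Corresponding point coordinates in the template and the query image.
    src = np.float32([tmpl_kp[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([query_kp[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    homography, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    if homography is None:
        return None

    # Project the template's corners into the query image: a recognition
    # rectangle bordering the matched item in its entirety.
    h, w = tmpl_shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, homography)
```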
In some embodiments, the image processing module 203 sends data including the recognition result to the product image capture module 207, the guided capture module 209, and the dynamic template generation module 211 to start a guided information capture process to capture one or more images of an item. In other embodiments, the image processing module 203 stores the data in the data storage 243.
The user interface module 205 may include software and/or logic for providing user interfaces to a user. In some embodiments, the user interface module 205 receives instructions from the image processing module 203 to generate a user interface for the display of recognition results on the client device 115. In some embodiments, the user interface module 205 communicates with the product image capture module 207, the guided capture module 209, and the dynamic template generation module 211 to generate graphical user interfaces that provide instructions and parameters on the display of the client device 115 such that a user can be instructed and use the parameters to capture an image of an item. In some embodiments, the user interface module 205 generates a graphical user interface for displaying the product database as a tabular representation for searching by the user. In other embodiments, the user interface module 205 sends graphical user interface data to an application (e.g., a browser) in the client device 115 via the communication unit 241, causing the application to display the data as a graphical user interface.
While the present disclosure is described in the context of being part of the image recognition application 103 here and below, it should be understood that this is just one implementation example, and that the present disclosure, particularly the product image capture module 207, the guided capture module 209, and the dynamic template generation module 211 may be implemented in a number of various other configurations. For example, the product image capture module 207, the guided capture module 209, and the dynamic template generation module 211 may be used together as a stand-alone application to add images or augment a database used by a computer vision system. In another configuration, the product image capture module 207, the guided capture module 209, and the dynamic template generation module 211 may be used together as a mobile application for a mobile phone or tablet. In such cases, the stand-alone application may be a product image capture (PIC) application that includes the controller 201, the image processing module 203, the user interface module 205, the product image capture module 207, the guided capture module 209, and the dynamic template generation module 211.
The product image capture module 207 may include software and/or logic for generating user interfaces to receive information of an item (e.g., a product) and guide a user to capture one or more images of the item based on the received information. In some embodiments, the product image capture module 207 communicates with the user interface module 205, the guided capture module 209, the dynamic template generation module 211 as well as other components of the image recognition application 103 to perform the methods described below with reference to
One goal of the product image capture module 207 working together with other components of the image recognition application 103 (e.g., as a universal OS application) is to implement a guided image capture process to create a library of captured core product images. Another goal is to generate improved user interfaces for instructing users to capture images. The user interfaces are improved in that they are generated based on one or more cards, they improve a workflow of capturing and storing product information, and they are more aesthetically pleasing.
In some embodiments, the product image capture module 207 determines one or more cards, and communicates with the user interface module 205, the guided capture module 209, and the dynamic template generation module 211 to generate user interfaces based on the one or more cards. A card includes format and layout information of labels and fields displayed on a user interface. For example, the product image capture module 207 may determine a card to include a label name, a location of the label, a font and a font size for the label, a color of the label, etc., and cause a user interface to be generated for displaying the label with the format specified in the card. A label may be related to product information such as a version, a facing, etc. For example, based on the determined card, the product image capture module 207 determines that a user interface should include a “UPC” label in a first line of the user interface with “Times New Roman” font in size 15 pt, a “product name” label in a second line of the user interface with wrapping text and with “Helvetica Neue light” font in size 10 pt, etc. Alternatively, a label may correspond to an action such as “continue,” “submit,” “back,” etc., which, once selected by a user via the user interface, causes the user interface to be updated. In some embodiments, a label is also associated with one or more fields that accept input from the user. For example, a “category” label is associated with a field into which the user can enter the actual category to which a product belongs.
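The card's concrete data format is not given in the disclosure; the following is a minimal sketch of one way a card could be represented and walked to lay out labels and fields, with all keys and values being illustrative assumptions drawn from the example above.

```python
# Hypothetical card: format and layout information for labels and fields on one UI screen.
product_info_card = {
    "labels": [
        {"name": "UPC", "line": 1, "font": "Times New Roman", "size_pt": 15,
         "field": {"type": "text", "required": True}},
        {"name": "Product Name", "line": 2, "font": "Helvetica Neue Light",
         "size_pt": 10, "wrap": True, "field": {"type": "text"}},
        {"name": "Category", "line": 3, "font": "Helvetica Neue Light",
         "size_pt": 10, "field": {"type": "text"}},
    ],
    "actions": ["continue", "submit", "back"],  # action labels that update the UI when selected
}

def render_card(card):
    """Stand-in for UI generation: print one layout line per label in the card."""
    for label in card["labels"]:
        print(f'line {label["line"]}: {label["name"]} ({label["font"]}, {label["size_pt"]} pt)')
    print("actions:", ", ".join(card["actions"]))

render_card(product_info_card)
```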
In some embodiments, the product image capture module 207 communicates with the user interface module 205, the guided capture module 209, and the dynamic template generation module 211 to generate user interfaces that improve a product information workflow.
In other embodiments, the product image capture module 207 works with the user interface module 205 to generate user interfaces that provide functionality for packaged information uploading as well as incremental information uploading.
The product image capture module 207 provides alternative information upload options to accommodate different needs. Using the packaged information upload, the product image capture module 207 stores 322 the information in the local cache until it is ready to submit 324 to a cloud repository, for example, responsive to detecting good network connectivity and network bandwidth. This avoids data transmission when the network connectivity is limited, but may require an explicit user uploading request and more bandwidth for transmitting a package that is larger than an increment. Using the incremental information upload, the product image capture module 207 handles information upload automatically with no need for a user uploading request, but may require APIs to take care of details of each product. The product image capture module 207 is therefore able to efficiently utilize system and network resources (e.g., by automatically balancing the data amount and network connectivity/bandwidth), and improves the product information workflow.
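As a rough illustration of the packaged-upload path described above, the sketch below caches captured product records locally and submits them as one package once connectivity looks good; the bandwidth probe, threshold, and upload call are assumptions, not the disclosed implementation.

```python
class PackagedUploader:
    """Hedged sketch of packaged information upload: cache locally, submit when connectivity is good."""

    def __init__(self, bandwidth_probe, upload_fn, min_bandwidth_mbps=5.0):
        self.cache = []                       # local cache of pending product records
        self.bandwidth_probe = bandwidth_probe
        self.upload_fn = upload_fn
        self.min_bandwidth_mbps = min_bandwidth_mbps

    def add(self, record):
        self.cache.append(record)             # store the information in the local cache (322)

    def try_submit(self):
        # Submit the whole package to the cloud repository (324) only when
        # the probed bandwidth clears a (hypothetical) threshold.
        if self.cache and self.bandwidth_probe() >= self.min_bandwidth_mbps:
            self.upload_fn(list(self.cache))
            self.cache.clear()
            return True
        return False

uploader = PackagedUploader(bandwidth_probe=lambda: 12.0, upload_fn=print)
uploader.add({"upc": "012345678905", "image": "front_variant_1.jpg"})
uploader.try_submit()
```

An incremental uploader would instead call the upload function once per captured item, trading larger total overhead for no explicit user upload request.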
In some embodiments, the product image capture module 207 also communicates with a profile module (not shown) to authenticate a user, for example, by matching the user-inputted credentials to the stored credentials. Based on the authentication result, the product image capture module 207 allows or prohibits the user from capturing product images and adding product information to a database. As depicted in the of
The guided capture module 209 may include software and/or logic for generating and providing instructions and parameters to help a user capture one or more images of an item or a product that facilitate image recognition and retrieval when stored in a database. For example, the guided capture module 209 may generate and provide different size parameters to a user such that images of a product can be taken from different distances from a shelf on which the product is placed. The guided capture module 209 may generate and provide different size parameters for capturing images of a product at different perspectives or angles, and of different sides, the top, or the bottom of a product. The guided capture module 209 may also instruct the user to take product images of the same product placed at different positions of the shelf. Since product images taken from different distances, for a product at different positions of a shelf, share more similarities with a random query image of the product, a comparison between the query image and the product images stored in a product database may result in more positive matches, and therefore enables more robust image recognition and greater accuracy. In some embodiments, the guided capture module 209 may also communicate with the product image capture module 207, the dynamic template generation module 211, as well as other components of the image recognition application 103 to apply a machine learning algorithm that trains on the received images and learns from the training how to improve the searchability of the database.
The dynamic template generation module 211 may include software and/or logic for dynamically generating one or more templates, and sending the one or more templates to the guided capture module 209 for use in guiding the user to capture product images. In some embodiments, the dynamic template generation module 211 uses a machine learning algorithm to determine additional templates that can be added to existing templates for a particular product to enhance the database so that, when it is used for recognition, the recognition will be more robust and accurate. For example, for a given product, images corresponding to three templates may already have been captured, stored in the database, and used for recognition. The dynamic template generation module 211 may train the machine learning algorithm to analyze all images currently representing a product and make recommendations on what additional images to capture to improve future recognitions. As a result, the dynamic template generation module 211 will generate one or more new templates and send them to the guided capture module 209 such that recommended images that fit within the new templates can be captured.
Referring now to
In some embodiments, an item or a product in a database can be broken into a hierarchical structure that includes a UPC, one or more versions, one or more faces, and one or more variants from the top level to the bottom level. The UPC is a unique index used to identify the product. A product identified by a UPC may have multiple versions, where a version represents a different product packaging instance. For example, a product may have a Christmas version, a Thanksgiving version, etc. Each version of a product may have different packaging for the product and contain several faces, e.g., the distinct sides of the product. For example, a product may have front, back, top, bottom, left-side, and right-side faces. Each face may include many variants or variant images representing the variety of ways the product appears in the real world. For example, a version of a product includes five variant images of the front to show the slight differences of the product in different angles or different lighting conditions. Sometimes a variant is also referred to as an entry.
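As a minimal sketch of the UPC → version → face → variant hierarchy just described (class and field names are assumptions, and variants are represented simply as image paths):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Face:
    name: str                                            # "front", "back", "top", "bottom", "left", "right"
    variants: List[str] = field(default_factory=list)    # paths to variant images of this face

@dataclass
class Version:
    name: str                                            # e.g., "Christmas", "Thanksgiving"
    faces: Dict[str, Face] = field(default_factory=dict)

@dataclass
class Product:
    upc: str                                             # unique index identifying the product
    versions: Dict[str, Version] = field(default_factory=dict)

product = Product(upc="012345678905")
holiday = Version(name="Christmas")
holiday.faces["front"] = Face(name="front", variants=["front_v1.jpg", "front_v2.jpg"])
product.versions["Christmas"] = holiday
print(len(product.versions["Christmas"].faces["front"].variants))  # -> 2 variants of the front face
```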
In some embodiments, the guided capture module 209 performs a guided information capture process to capture variants of a face for a version of a product.
In
The search result is shown in the user interface 410 of
Once the user selects 414 to add a new version, the guided capture module 209 presents the “Select Face” screen in the user interface 416, where the user selects which face (top, front, right side, etc.) he or she will be capturing. For example, the user chooses the “front” 418 to capture as depicted by the callout.
Responsive to a selection of a product face (e.g., front) to capture in the user interface 416 of
In
In
The camera button 440 is highlighted to show that the user may select it to add a variant image to the face, e.g., the front face. Responsive to receiving the selection of the camera button 440, the guided information capture process starts as the user will operate to obtain product information based on instructions and parameters provided by the user interfaces.
Once the user selects the camera button 440, in
When the user positions the camera such that the actual product appears in the camera view of the user interface 442, the guided capture module 209 is configured to present the template, or the rectangle, overlaid over the image of the actual product shown in the camera view. The user adjusts the position of the camera such that the actual product fits within the rectangle and takes a picture.
When the guided capture module 209 determines that an image has been captured, the guided capture module 209 updates the user interface 444 to display a cropping screen 446. This cropping screen 446 is different from the cropping screen depicted in
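The cropping step itself amounts to keeping only the portion of the captured frame inside the template's bounding box. A minimal sketch follows, assuming pixel coordinates for the box and a hypothetical captured-frame file; it is not the disclosed cropping-screen implementation.

```python
import cv2

def crop_to_template(frame, box):
    """Keep only the pixels of the captured frame that fall inside the template's bounding box."""
    x, y, w, h = box                  # top-left corner plus width/height, in frame pixels (assumed)
    return frame[y:y + h, x:x + w]    # NumPy slicing returns the in-box region

frame = cv2.imread("captured_frame.jpg")                 # assumed captured camera frame
variant = crop_to_template(frame, box=(420, 310, 640, 880))
cv2.imwrite("front_variant_2.jpg", variant)              # stored as a new variant of the face
```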
Responsive to the user selection of the “Continue” button in the user interface 444, the guided capture module 209 saves the newly captured image as a second image of the product in the database, and presents the “Product Face” view again in the user interface 450 of
Upon receiving the user input for adding the next variant image, the guided capture module 209 generates and presents the user interface 460 in
As shown at 602 of
Referring now to
In some embodiments, the dynamic template generation module 211 dynamically generates one or more templates, and sends the one or more templates to the guided capture module 209 for use in guiding the user to capture product images. In some embodiments, the dynamic template generation module 211 communicates with a machine learning system to receive feedback on the captured information and to make adjustments to the template's size and location in the camera view. For example, if five images have been captured and stored in the database, the dynamic template generation module 211 may train the machine learning algorithm to analyze all images currently representing a product and make recommendations on what additional images to capture to improve future recognitions. As a result, the dynamic template generation module 211 will generate a new template and send the new template to the guided capture module 209 such that a recommended image that fits within this new template can be captured.
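The disclosure leaves the recommendation logic to a machine learning system; purely as an illustration of the idea, the sketch below uses a simple counting heuristic to propose a template for whichever capture condition is underrepresented among the stored variants. The condition names, box scales, and function are assumptions, not the trained model the patent contemplates.

```python
def recommend_next_template(variants_by_condition, conditions=("near", "mid", "far")):
    """Hedged placeholder for the ML recommendation: pick the least-covered capture condition."""
    counts = {c: len(variants_by_condition.get(c, [])) for c in conditions}
    sparsest = min(counts, key=counts.get)
    # Hypothetical mapping from capture distance to the relative size of the template's box.
    scale = {"near": 0.9, "mid": 0.6, "far": 0.3}[sparsest]
    return {"condition": sparsest, "box_scale": scale}

existing = {"near": ["v1.jpg", "v2.jpg"], "mid": ["v3.jpg"]}   # no "far" variants captured yet
print(recommend_next_template(existing))  # -> {'condition': 'far', 'box_scale': 0.3}
```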
Once the information for the item is received from the user or retrieved from the database, at 912, the guided capture module 209 communicates with the product image capture module 207 and the user interface module 205 to generate a user interface to instruct the user to capture an image of the item. In some embodiments, the information of the item includes dimensions of the item. The guided capture module 209 may generate a user interface for presenting a template and instructing the user to capture a variant image of the item based on the template. The variant image corresponds to a face of the item. The template is a bounding box that is sized to match the dimensions of the item.
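Sizing the bounding box to match the item's dimensions essentially means preserving the item's aspect ratio while scaling the box to the camera preview. The sketch below illustrates that arithmetic under assumed inputs (physical dimensions in centimeters, a preview size in pixels, and a margin factor); it is not the disclosed template-generation code.

```python
def template_box(item_width_cm, item_height_cm, view_w_px, view_h_px, margin=0.85):
    """Return (x, y, w, h) of a preview-space box with the item's aspect ratio (illustrative)."""
    aspect = item_width_cm / item_height_cm
    # Fit the largest box with this aspect ratio inside the margin-shrunk camera view.
    box_h = view_h_px * margin
    box_w = box_h * aspect
    if box_w > view_w_px * margin:
        box_w = view_w_px * margin
        box_h = box_w / aspect
    x = (view_w_px - box_w) / 2          # center the box in the preview
    y = (view_h_px - box_h) / 2
    return int(x), int(y), int(box_w), int(box_h)

# A 10 cm x 18 cm product on a 1080x1920 preview yields a tall, centered bounding box.
print(template_box(10.0, 18.0, view_w_px=1080, view_h_px=1920))
```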
At 914, the image is captured by the user and the guided capture module 209 adds the image of the item to the database. At 916, the guided capture module 209 determines whether more images need to be captured. If so, the method 900 returns back to 912 to generate a user interface to instruct the user to capture more images. If no more images are needed, for example, the number of the received images is sufficient for robust image recognition on the database, the method 900 ends.
Referring now to
Referring to
A system and method for generating user interfaces that present a set of templates to guide a user in capturing correctly sized images of items under different conditions has been described. In the above description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the techniques introduced above. It will be apparent, however, to one skilled in the art that the techniques can be practiced without these specific details. In other instances, structures and devices are shown in block diagram form in order to avoid obscuring the description and for ease of understanding. For example, the techniques are described in one embodiment above primarily with reference to software and particular hardware. However, the present invention applies to any type of computing system that can receive data and commands, and present information as part of any peripheral devices providing services.
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
Some portions of the detailed descriptions described above are presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are, in some circumstances, used by those skilled in the data processing arts to convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing”, “computing”, “calculating”, “determining”, “displaying”, or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.
The techniques also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may comprise a general-purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a non-transitory computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, and magnetic disks, read-only memories (ROMs), random access memories (RAMs), EPROMs, EEPROMs, magnetic or optical cards, flash memories including USB keys with non-volatile memory, or any type of media suitable for storing electronic instructions, each coupled to a computer system bus.
Some embodiments can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. One embodiment is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
Furthermore, some embodiments can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer readable medium can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
A data processing system suitable for storing and/or executing program code can include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
Finally, the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the required method steps. The required structure for a variety of these systems will appear from the description above. In addition, the techniques are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the various embodiments as described herein.
The foregoing description of the embodiments has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the specification to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the embodiments be limited not by this detailed description, but rather by the claims of this application. As will be understood by those familiar with the art, the examples may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. Likewise, the particular naming and division of the modules, routines, features, attributes, methodologies and other aspects are not mandatory or significant, and the mechanisms that implement the description or its features may have different names, divisions and/or formats. Furthermore, as will be apparent to one of ordinary skill in the relevant art, the modules, routines, features, attributes, methodologies and other aspects of the specification can be implemented as software, hardware, firmware or any combination of the three. Also, wherever a component, an example of which is a module, of the specification is implemented as software, the component can be implemented as a standalone program, as part of a larger program, as a plurality of separate programs, as a statically or dynamically linked library, as a kernel loadable module, as a device driver, and/or in every and any other way known now or in the future to those of ordinary skill in the art of computer programming. Additionally, the specification is in no way limited to embodiment in any specific programming language, or for any specific operating system or environment. Accordingly, the disclosure is intended to be illustrative, but not limiting, of the scope of the specification, which is set forth in the following claims.
Claims
1. A computer-implemented method comprising:
- generating a first user interface configured to receive and present product information for an item including dimensions of the item;
- receiving a first image;
- generating a second user interface to present a template, the template including a bounding box sized to match the dimensions of the item, the second user interface configured to present the bounding box overlaid over a second image;
- receiving input to capture a portion of the second image within the bounding box;
- responsive to the input to capture the portion of the second image, generating a third user interface to present the first image and the captured portion of the second image as variants of a face of the item; and
- storing the captured portion of the second image as a variant of the face of the item and the information of the item in a database.
2. The computer-implemented method of claim 1, further comprising generating an initial user interface instructing the capture of an initial image, and using the initial image to retrieve the product information for the item and present the product information in the first user interface.
3. The computer-implemented method of claim 1, further comprising determining the template based on one from the group of a distance between the item and a capture device capturing the second image of the item, a position of the item, and lighting condition of the item.
4. The computer-implemented method of claim 1, further comprising:
- determining an aspect ratio based on the dimensions of the item;
- identifying a capture zone from the bounding box based on the aspect ratio; and
- wherein receiving the input to capture the portion of the second image within the bounding box comprises receiving the portion of the second image that is cropped based on the capture zone.
5. The computer-implemented method of claim 1, further comprising generating a new template based on training received images of the item using a machine learning algorithm.
6. The computer-implemented method of claim 1, further comprising generating a set of templates to be overlaid at different positions of a set of user interfaces, and receiving a set of images of the item that fit within the set of templates.
7. The computer-implemented method of claim 1, wherein generating the first user interface is based on a card, the card including format and layout information of labels and fields displayed on the first user interface.
8. The computer-implemented method of claim 1, further comprising authenticating the user.
9. A system comprising:
- one or more processors; and
- a memory, the memory storing instructions, which when executed cause the one or more processors to: generate a first user interface configured to receive and present product information for an item including dimensions of the item; receive a first image; generate a second user interface to present a template, the template including a bounding box sized to match the dimensions of the item, the second user interface configured to present the bounding box overlaid over a second image; receive input to capture a portion of the second image within the bounding box; responsive to the input to capture the portion of the second image, generate a third user interface to present the first image and the captured portion of the second image as variants of a face of the item; and store the captured portion of the second image as a variant of the face of the item and the information of the item in a database.
10. The system of claim 9, wherein the instructions further cause the one or more processors to generate an initial user interface instructing the capture of an initial image, and use the initial image to retrieve the product information for the item and present the product information in the first user interface.
11. The system of claim 9, wherein the instructions further cause the one or more processors to determine the template based on one from the group of a distance between the item and a capture device capturing the second image of the item, a position of the item, and lighting condition of the item.
12. The system of claim 9, wherein the instructions further cause the one or more processors to:
- determine an aspect ratio based on the dimensions of the item;
- identify a capture zone from the bounding box based on the aspect ratio; and
- wherein receiving the input to capture the portion of the second image within the bounding box comprises receiving the portion of the second image that is cropped based on the capture zone.
13. The system of claim 9, wherein the instructions further cause the one or more processors to generate a new template based on training received images of the item using a machine learning algorithm.
14. The system of claim 9, wherein the instructions cause the one or more processors to generate a set of templates to be overlaid at different positions of a set of user interfaces, and receive a set of images of the item that fit within the set of templates.
15. The system of claim 9, wherein the instructions cause the one or more processors to generate the first user interface based on a card, the card including format and layout information of labels and fields displayed on the first user interface.
16. A computer program product comprising a non-transitory computer readable medium storing a computer readable program, wherein the computer readable program when executed causes a computer to:
- generate a first user interface configured to receive and present product information for an item including dimensions of the item;
- receive a first image;
- generate a second user interface to present a template, the template including a bounding box sized to match the dimensions of the item, the second user interface configured to present the bounding box overlaid over a second image;
- receive input to capture a portion of the second image within the bounding box;
- responsive to the input to capture the portion of the second image, generate a third user interface to present the first image and the captured portion of the second image as variants of a face of the item; and
- store the captured portion of the second image as a variant of the face of the item and the information of the item in a database.
17. The computer program product of claim 16, wherein the computer readable program causes the computer to generate an initial user interface instructing the capture of an initial image, and use the initial image to retrieve the product information for the item and present the product information in the first user interface.
18. The computer program product of claim 16, wherein the computer readable program causes the computer to determine the template based on one from the group of a distance between the item and a capture device capturing the second image of the item, a position of the item, and lighting condition of the item.
19. The computer program product of claim 16, wherein the computer readable program causes the computer to:
- determine an aspect ratio based on the dimensions of the item;
- identify a capture zone from the bounding box based on the aspect ratio; and
- wherein receiving the input to capture the portion of the second image within the bounding box comprises receiving the portion of the second image that is cropped based on the capture zone.
20. The computer program product of claim 16, wherein the computer readable program causes the computer to generate a new template based on training received images of the item using a machine learning algorithm.
Type: Application
Filed: Aug 8, 2017
Publication Date: Nov 1, 2018
Applicant: Ricoh Company, Ltd. (Tokyo)
Inventors: Jamey Graham (Cupertino, CA), Daniel G. Van Olst (Cupertino, CA)
Application Number: 15/671,278