MOBILE DEVICE IMAGE ITEM REPLACEMENTS
A system for replacing physical items in images is discussed. A depicted item can be selected and removed from an image via image mask data and pixel merging techniques. Virtual light source positions can be generated based on real-world light source data from the image. A rendered simulation of a virtual item can then be integrated into the image to create a modified image for display.
This application is a continuation of U.S. patent application Ser. No. 16/521,359, filed Jul. 24, 2019, the content of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
Embodiments of the present disclosure relate generally to image manipulation and, more particularly, but not by way of limitation, to image processing.
BACKGROUND
Increasingly, users would like to simulate an object (e.g., chair, table, lamp) in a physical room without having access to the object. For example, a user may be browsing a network site (e.g., website) and see a floor lamp that may or may not match the style of the user's living room. The user may take a picture of his living room and overlay an image of the floor lamp in the picture to simulate what the floor lamp would look like in the living room. However, it can be difficult to adjust the floor lamp within the modeling environment using a mobile client device, which has limited resources (e.g., a small screen, limited processing power). Additionally, if the user's living room already has a floor lamp, it is difficult to replace the physical floor lamp in the image with a simulated floor lamp through the mobile client device (e.g., in images or video generated by the mobile client device).
To easily identify the discussion of any particular element or act, the most significant digit or digits in a reference number refer to the figure (“FIG.”) number in which that element or act is first introduced.
The description that follows includes systems, methods, techniques, instruction sequences, and computing machine program products that embody illustrative embodiments of the disclosure. In the following description, for the purposes of explanation, numerous specific details are set forth in order to provide an understanding of various embodiments of the inventive subject matter. It will be evident, however, to those skilled in the art, that embodiments of the inventive subject matter may be practiced without these specific details. In general, well-known instruction instances, protocols, structures, and techniques are not necessarily shown in detail.
With reference to
In various implementations, the client device 110 comprises a computing device that includes at least a display and communication capabilities that provide access to the networked system 102 via the network 104. The client device 110 comprises, but is not limited to, a remote device, work station, computer, general-purpose computer, Internet appliance, hand-held device, wireless device, portable device, wearable computer, cellular or mobile phone, personal digital assistant (PDA), smart phone, tablet, ultrabook, netbook, laptop, desktop, multi-processor system, microprocessor-based or programmable consumer electronic system, game console, set-top box, network personal computer (PC), mini-computer, and so forth. In an example embodiment, the client device 110 comprises one or more of a touch screen, accelerometer, gyroscope, biometric sensor, camera (e.g., an RGB based camera, a depth sensing camera), microphone, Global Positioning System (GPS) device, and the like.
The client device 110 communicates with the network 104 via a wired or wireless connection. For example, one or more portions of the network 104 comprise an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the public switched telephone network (PSTN), a cellular telephone network, a wireless network, a WI-FI® network, a Worldwide Interoperability for Microwave Access (WiMax) network, another type of network, or any suitable combination thereof.
Users (e.g., the user 106) comprise a person, a machine, or other means of interacting with the client device 110. In some example embodiments, the user 106 is not part of the network architecture 100, but interacts with the network architecture 100 via the client device 110 or another means. For instance, the user 106 provides input (e.g., touch-screen input or alphanumeric input) to the client device 110 and the input is communicated to the networked system 102 via the network 104. In this instance, the networked system 102, in response to receiving the input from the user 106, communicates information to the client device 110 via the network 104 to be presented to the user 106. In this way, the user 106 can interact with the networked system 102 using the client device 110.
An API server 120 and a web server 122 are coupled to, and provide programmatic and web interfaces respectively to, one or more application servers 140. The application server 140 can host a physical item replacement system 150, which can comprise one or more modules or applications, and which can be embodied as hardware, software, firmware, or any combination thereof. The application server 140 is, in turn, shown to be coupled to a database server 124 that facilitates access to one or more information storage repositories, such as a database 126. In an example embodiment, the database 126 comprises one or more storage devices that store information to be accessed by the physical item replacement system 150. Additionally, in some embodiments, the information may be cached locally on the client device 110. Further, while the client-server-based network architecture 100 shown in
The removal engine 210 is configured to receive a selection of a region in an image and remove an object depicted in the region using areas surrounding the selected region. For example, the removal engine 210 can generate an image mask for a given image that indicates which region includes the object to be replaced (e.g., the mask is used to denote or create an image hole in the original image to be filled in via inpainting or other interpolation approaches).
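As a rough illustration of filling an image hole from its surroundings, the following Python sketch propagates known pixel values inward by averaging neighbors. It is a simple interpolation stand-in, not the inpainting network of any embodiment; the function name `fill_hole`, the grayscale list-of-lists image layout, and the convention that a mask value of 0 marks the hole are assumptions made for this example.

```python
from typing import List

Gray = List[List[float]]
Mask = List[List[int]]


def fill_hole(image: Gray, mask: Mask, max_passes: int = 100) -> Gray:
    """Fill hole pixels (mask == 0) from surrounding known pixels.

    Each pass assigns every hole pixel that touches at least one known
    4-neighbour the mean of those neighbours, so the fill grows inward
    from the hole boundary. Hypothetical helper for illustration only.
    """
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    known = [[m == 1 for m in row] for row in mask]

    for _ in range(max_passes):
        updates = []
        for y in range(height):
            for x in range(width):
                if known[y][x]:
                    continue
                neighbours = ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                vals = [
                    out[ny][nx]
                    for ny, nx in neighbours
                    if 0 <= ny < height and 0 <= nx < width and known[ny][nx]
                ]
                if vals:
                    updates.append((y, x, sum(vals) / len(vals)))
        if not updates:
            break  # hole is filled, or no known pixel borders it
        # Apply after the scan so one pass only uses values known before it.
        for y, x, value in updates:
            out[y][x] = value
            known[y][x] = True
    return out
```

A learned inpainting model would typically produce more plausible texture than this averaging scheme; the sketch only shows how areas surrounding the selected region can supply the replacement pixel values.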
The mask engine 215 is configured to generate the mask data based on an input selection received from the user. For example, the user can perform a circle gesture on the item depicted on a touch screen to indicate that the encircled image is to be removed, or the user can tap on the item and a segmented portion of the image that contains the depicted item is then stored as the mask area. In some example embodiments, the mask engine 215 comprises an image segmentation neural network that segments an image into different areas. The segmented areas can then be selected via tapping, as described above.
The pose engine 220 is configured to determine the pose of a selected item to be removed. The determined pose is then used to arrange the virtual item that is to replace the removed item in the same pose. In some example embodiments, the pose engine 220 is trained on images of different classes of objects (e.g., images of chairs and lamps), and the pose engine 220 attempts to generate the pose data using the model for a given object type (e.g., if a chair object category is detected, the pose engine 220 applies a neural network model that has been trained on images of chair poses/orientations). As such, according to some example embodiments, the pose engine 220 comprises a plurality of pose detection neural networks, where each neural network is trained for a different type of object.
The light engine 225 manages detecting light sources in an image, which can be used by the model engine 227 to position virtual light sources for virtual object rendering. The model engine 227 is configured to manage a virtual 3D modeling environment for rendering of a virtual item for overlay over the image captured by the capture engine 205. The display engine 230 is configured to generate user interfaces for interaction with a user of a client device, and receive interactions (e.g., selection of a region in an image) from the user through said user interfaces.
At operation 315, the removal engine 210 removes the object from the image. In some example embodiments, at operation 315 the removal engine 210 removes the object by merging areas surrounding the object into the object's image area (e.g., inpainting, interpolation). At operation 320, the model engine 227 generates a render of a virtual object to replace the removed object in the image. For example, after the object in the image has been removed via inpainting, the model engine 227 generates a render of a 3D chair model for integration into the image. At operation 325, the model engine 227 generates a modified image by overlaying and integrating (e.g., blending) the render into the image.
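The overlay and blending of operation 325 can be sketched, under stated assumptions, as per-pixel alpha compositing. The function name `blend_render`, the RGB-tuple image layout, and the separate alpha matte are hypothetical choices for this example; the description does not specify a particular blending formula.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def blend_render(
    image: Image,
    render: Image,
    alpha: List[List[float]],
    top: int,
    left: int,
) -> Image:
    """Composite `render` over `image` with its corner at (top, left).

    `alpha` holds one coverage value in [0, 1] per render pixel:
    1.0 keeps the render pixel, 0.0 keeps the underlying image pixel.
    Render pixels that fall outside the image are skipped.
    """
    out = [row[:] for row in image]
    for ry, render_row in enumerate(render):
        for rx, src in enumerate(render_row):
            y, x = top + ry, left + rx
            if not (0 <= y < len(out) and 0 <= x < len(out[0])):
                continue
            a = alpha[ry][rx]
            dst = out[y][x]
            # Standard "over" blend applied to each colour channel.
            out[y][x] = tuple(
                round(a * s + (1.0 - a) * d) for s, d in zip(src, dst)
            )
    return out
```

Soft alpha values along the edge of the rendered item are one way to avoid a hard cut-out boundary where the render meets the inpainted background.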
At operation 410, the mask engine 215 generates an image mask from the specified region. For example, if the user drags a rectangular UI shape element over the depicted chair, then at operation 410 the mask engine 215 generates an image mask where the pixels of the rectangular region are masked (e.g., set to “0”) while the surrounding areas are unaltered or set to another value (e.g., set to “1”). After the user input is received and region data stored (e.g., stored as an image mask), the stored region data can be input into a machine learning scheme to remove the depicted physical object from the image. In some example embodiments, at operation 410 the mask data is applied to the image to create a “hole” in the image corresponding to the masked areas. For example, all pixels in the original image of the chair denoted by the rectangular region can be deleted or otherwise removed to create a hole in the original image where the chair was originally depicted. According to some example embodiments, the original image with the hole created by the image mask is the data used for inpainting and interpolation.
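A minimal Python sketch of the rectangular mask and the resulting hole follows. The helper names `rect_mask` and `punch_hole`, the half-open rectangle bounds, and the use of `None` to mark deleted pixels are assumptions for illustration; only the 0/1 mask convention comes from the description above.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def rect_mask(
    height: int, width: int, top: int, left: int, bottom: int, right: int
) -> List[List[int]]:
    """Return a mask that is 0 inside the rectangle and 1 elsewhere.

    The rectangle covers rows top..bottom-1 and columns left..right-1.
    """
    return [
        [0 if top <= y < bottom and left <= x < right else 1 for x in range(width)]
        for y in range(height)
    ]


def punch_hole(
    image: Sequence[Sequence[T]], mask: Sequence[Sequence[int]]
) -> List[List[Optional[T]]]:
    """Replace every masked pixel (mask == 0) with None to create the hole."""
    return [
        [pixel if m == 1 else None for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]
```

For instance, `rect_mask(3, 4, 1, 1, 3, 3)` masks the 2 x 2 block in rows 1 to 2 and columns 1 to 2 of a 3 x 4 image, and `punch_hole` then removes exactly those pixels.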
At operation 505, the removal engine 210 segments an image into regions. For example, the removal engine 210 implements a convolutional neural network trained to perform image segmentation to label different areas of an image (e.g., a background area, a chair area, a human face area, etc.) and create masks to denote the different regions/segments. At operation 510, the mask engine 215 receives a selection within the image. For example, the user may tap or mouse click on a chair to be removed in the image. At operation 515, the removal engine 210 identifies the region corresponding to the selection. For example, if, at operation 510, the user selects any pixel depicting a chair region, then at operation 515 the removal engine 210 identifies all pixels labeled as being a chair region at operation 505, or selects an image mask for the chair region. At operation 520, the mask engine 215 stores the pixel data of the region for input into the neural network for object removal. For example, at operation 520, the mask engine 215 stores an image mask for the region selected via a tap gesture.
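Operations 510 through 520 can be sketched as a lookup into a per-pixel label map. The label map here is hand-written rather than produced by a segmentation network, and the function name `mask_from_tap` and the string labels are hypothetical.

```python
from typing import Hashable, List, Sequence


def mask_from_tap(
    labels: Sequence[Sequence[Hashable]], tap_y: int, tap_x: int
) -> List[List[int]]:
    """Build an image mask from a tap on a segmented image.

    `labels` is a per-pixel label map (e.g., the output of a segmentation
    step). Every pixel sharing the tapped pixel's label is masked (0);
    all other pixels are left unmasked (1).
    """
    selected = labels[tap_y][tap_x]
    return [[0 if label == selected else 1 for label in row] for row in labels]


# Usage: a tap anywhere on the chair selects the whole chair segment.
label_map = [
    ["background", "chair", "chair"],
    ["background", "chair", "background"],
]
chair_mask = mask_from_tap(label_map, tap_y=0, tap_x=2)
```

Because the lookup is by label, a tap on any chair pixel yields the same mask, which matches the behavior described for operation 515.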
Attorney Docket No. 4536.018US2
operation 605 the classification engine 207 determines that the selected object is a type of chair and therefore generates and stores a chair category for the item. At operation 610, the pose engine 220 selects a pose estimation scheme based on the classification generated at operation 605. For example, at operation 610, the pose engine 220 selects a convolutional neural network trained to detect chair poses based on chair training images.
At operation 615, the pose engine 220 determines the pose of the depicted physical object. For example, at operation 615 the pose engine 220 applies the selected machine learning scheme for the given classification assigned to the depicted object to determine that the chair backside is facing the wall, away from the user at an angle.
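The selection of a pose estimation scheme by object category can be pictured as a dispatch table. In the sketch below, the estimator functions are stubs that return fixed values in place of trained neural networks, and the names `POSE_SCHEMES`, `determine_pose`, and the (yaw, pitch, roll) pose representation are all assumptions for illustration.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical pose representation: (yaw, pitch, roll) in degrees.
Pose = Tuple[float, float, float]


def estimate_chair_pose(image: Any) -> Pose:
    """Stub standing in for a network trained on chair images."""
    return (135.0, 0.0, 0.0)


def estimate_lamp_pose(image: Any) -> Pose:
    """Stub standing in for a network trained on lamp images."""
    return (0.0, 0.0, 0.0)


# One pose estimation scheme per object category.
POSE_SCHEMES: Dict[str, Callable[[Any], Pose]] = {
    "chair": estimate_chair_pose,
    "lamp": estimate_lamp_pose,
}


def determine_pose(image: Any, category: str) -> Pose:
    """Pick the scheme matching the stored category, then apply it."""
    try:
        scheme = POSE_SCHEMES[category]
    except KeyError:
        raise ValueError(f"no pose scheme for category {category!r}") from None
    return scheme(image)
```

Keeping the category-to-scheme mapping in one table means a new object type can be supported by registering one more estimator, without changing the selection logic.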
At operation 620, the model engine 227 arranges the virtual object to match the pose of the depicted physical object. For example, the model engine 227 arranges a chair 3D virtual model so that the backside of the chair is not facing the virtual camera (where the virtual camera is set by the user's perspective, as discussed in further detail with reference to
At operation 625, the model engine 227 arranges virtual light sources in a modeling environment (e.g., a 3D model rendering environment executing on the user device) to cast virtual light rays on the virtual item to mimic the real-world environment depicted in the image (e.g., the room being imaged and displayed in real time on the display device). At operation 630, the model engine 227 generates a render of the arranged and virtually illuminated virtual item, which can then be blended into the image and displayed on the mobile device screen.
At operation 705, the light engine 225 separates the image into regions, such as a top left region, a top right region, a bottom left region, and a bottom right region. At operation 710, the light engine 225 determines the brightest regions based on luminance or pixel values in the regions. For example, the light engine 225 determines that the top right region is the brightest region. At operation 715, the light engine 225 stores virtual light position data (e.g., top right region as the brightest region), and the model engine 227 uses the position data to position a virtual light in the upper right portion of the virtual room (e.g., above and to the right of a virtual item in the modeling environment). For example, the light engine 225 can further store subarea position data indicating that, within the top right region, the top left portion is brightest, thereby indicating to the model engine 227 to position a virtual light source to correspond to the subarea position data.
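A minimal sketch of the region comparison in operations 705 through 715 follows, assuming a grayscale luminance grid and mean luminance as the brightness measure. The function name `brightest_quadrant` and the bounds tuple are hypothetical; applying the function a second time to the winning region yields the subarea position data.

```python
from typing import Dict, List, Optional, Tuple

Bounds = Tuple[int, int, int, int]  # (y0, y1, x0, x1), end-exclusive


def brightest_quadrant(
    lum: List[List[float]], bounds: Optional[Bounds] = None
) -> Tuple[str, Bounds]:
    """Return the name and bounds of the brightest quadrant within `bounds`.

    Brightness is the mean luminance of the quadrant. The area being
    split must be at least 2 x 2 pixels so no quadrant is empty.
    """
    if bounds is None:
        bounds = (0, len(lum), 0, len(lum[0]))
    y0, y1, x0, x1 = bounds
    mid_y, mid_x = (y0 + y1) // 2, (x0 + x1) // 2
    quadrants: Dict[str, Bounds] = {
        "top_left": (y0, mid_y, x0, mid_x),
        "top_right": (y0, mid_y, mid_x, x1),
        "bottom_left": (mid_y, y1, x0, mid_x),
        "bottom_right": (mid_y, y1, mid_x, x1),
    }

    def mean(b: Bounds) -> float:
        vals = [lum[y][x] for y in range(b[0], b[1]) for x in range(b[2], b[3])]
        return sum(vals) / len(vals)

    name = max(quadrants, key=lambda n: mean(quadrants[n]))
    return name, quadrants[name]


# Usage: find the brightest region, then the brightest subarea inside it.
luminance = [
    [10, 10, 200, 90],
    [10, 10, 80, 90],
    [10, 10, 20, 20],
    [10, 10, 20, 20],
]
region, region_bounds = brightest_quadrant(luminance)
subarea, _ = brightest_quadrant(luminance, region_bounds)
```

With this sample grid, the top right region is brightest and, within it, the top left subarea is brightest, mirroring the example given above.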
As illustrated by the examples of
In some example embodiments, image processing or rendering techniques are implemented to simulate the lighting of the environment, in addition to placement of virtual light sources. For example, the image of the physical environment can be analyzed to determine a lighting scheme (e.g., overall brightness or luminance value of the image, identification of lighter and darker areas, etc.) and the lighting scheme can be simulated by darkening the render of the virtual object (e.g., darkening the texture surface, darkening the spectral quality, reflectance, and so on) in addition to simulating the lighting sources via virtual light source placement. In this way, for example, a virtual render of a chair in a shadowy corner can be first darkened using a global exposure setting for the rendered object, and then virtual rays from one or more virtual light sources can reflect off the virtual chair to further increase simulation accuracy.
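The global darkening step can be sketched as a single gain applied to the rendered pixels before any virtual light rays are simulated. The function names, the Rec. 601 luma weights used to measure brightness, and the mid-gray reference level of 128 are assumptions chosen for this example rather than values given in the description.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def mean_luminance(image: Image) -> float:
    """Average luma of an RGB image using Rec. 601 channel weights."""
    total = 0.0
    count = 0
    for row in image:
        for r, g, b in row:
            total += 0.299 * r + 0.587 * g + 0.114 * b
            count += 1
    return total / count


def apply_exposure(
    render: Image, scene_luminance: float, reference: float = 128.0
) -> Image:
    """Darken a render so it matches a dim scene.

    The gain is scene_luminance / reference, capped at 1.0 so this step
    only darkens; brightening is left to the virtual light sources.
    """
    gain = min(1.0, scene_luminance / reference)
    return [
        [tuple(min(255, round(c * gain)) for c in pixel) for pixel in row]
        for row in render
    ]
```

In use, `mean_luminance` would be evaluated on the captured image (or on the area around the removed item) and its result passed to `apply_exposure` along with the render.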
Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules can constitute either software modules (e.g., code embodied on a machine-readable medium) or hardware modules. A “hardware module” is a tangible unit capable of performing certain operations and can be configured or arranged in a certain physical manner. In various example embodiments, one or more computer systems (e.g., a standalone computer system, a client computer system, or a server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) can be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.
In some embodiments, a hardware module can be implemented mechanically, electronically, or any suitable combination thereof. For example, a hardware module can include dedicated circuitry or logic that is permanently configured to perform certain operations. For example, a hardware module can be a special-purpose processor, such as a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC). A hardware module may also include programmable logic or circuitry that is temporarily configured by software to perform certain operations. For example, a hardware module can include software executed by a general-purpose processor or other programmable processor. Once configured by such software, hardware modules become specific machines (or specific components of a machine) uniquely tailored to perform the configured functions and are no longer general-purpose processors. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) can be driven by cost and time considerations.
Accordingly, the phrase “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instant in time. For example, where a hardware module comprises a general-purpose processor configured by software to become a special-purpose processor, the general-purpose processor may be configured as respectively different special-purpose processors (e.g., comprising different hardware modules) at different times. Software accordingly configures a particular processor or processors, for example, to constitute a particular hardware module at one instant of time and to constitute a different hardware module at a different instant of time.
Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules can be regarded as being communicatively coupled. Where multiple hardware modules exist contemporaneously, communications can be achieved through signal transmission (e.g., over appropriate circuits and buses) between or among two or more of the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module can perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module can then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules can also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).
The various operations of example methods described herein can be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors constitute processor-implemented modules that operate to perform one or more operations or functions described herein. As used herein, “processor-implemented module” refers to a hardware module implemented using one or more processors.
Similarly, the methods described herein can be at least partially processor-implemented, with a particular processor or processors being an example of hardware. For example, at least some of the operations of a method can be performed by one or more processors or processor-implemented modules. Moreover, the one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), with these operations being accessible via a network 104 (e.g., the Internet) and via one or more appropriate interfaces (e.g., an application programming interface (API)).
The performance of certain of the operations may be distributed among the processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processors or processor-implemented modules can be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the processors or processor-implemented modules are distributed across a number of geographic locations.
The memory/storage 1530 can include a memory 1532, such as a main memory, or other memory storage, and a storage unit 1536, both accessible to the processors 1510 such as via the bus 1502. The storage unit 1536 and memory 1532 store the instructions 1516 embodying any one or more of the methodologies or functions described herein. The instructions 1516 can also reside, completely or partially, within the memory 1532, within the storage unit 1536, within at least one of the processors 1510 (e.g., within the processor's cache memory), or any suitable combination thereof, during execution thereof by the machine 1500. Accordingly, the memory 1532, the storage unit 1536, and the memory of the processors 1510 are examples of machine-readable media.
As used herein, the term “machine-readable medium” means a device able to store the instructions 1516 and data temporarily or permanently and may include, but not be limited to, random-access memory (RAM), read-only memory (ROM), buffer memory, flash memory, optical media, magnetic media, cache memory, other types of storage (e.g., erasable programmable read-only memory (EPROM)), or any suitable combination thereof. The term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store the instructions 1516. The term “machine-readable medium” shall also be taken to include any medium, or combination of multiple media, that is capable of storing instructions (e.g., the instructions 1516) for execution by a machine (e.g., the machine 1500), such that, the instructions, when executed by one or more processors of the machine (e.g., the processors 1510), cause the machine to perform any one or more of the methodologies described herein. Accordingly, a “machine-readable medium” refers to a single storage apparatus or device, as well as “cloud-based” storage systems or storage networks that include multiple storage apparatus or devices. The term “machine-readable medium” excludes signals per se.
The I/O components 1550 can include a wide variety of components to receive input, provide output, produce output, transmit information, exchange information, capture measurements, and so on. The specific I/O components 1550 that are included in a particular machine will depend on the type of machine. For example, portable machines such as mobile phones will likely include a touch input device or other such input mechanisms, while a headless server machine will likely not include such a touch input device. It will be appreciated that the I/O components 1550 can include many other components that are not shown in
Communication can be implemented using a wide variety of technologies. The I/O components 1550 may include communication components 1564 operable to couple the machine 1500 to a network 1580 or devices 1570 via a coupling 1582 and a coupling 1572, respectively. For example, the communication components 1564 include a network interface component or other suitable device to interface with the network 1580. In further examples, the communication components 1564 include wired communication components, wireless communication components, cellular communication components, near field communication (NFC) components, BLUETOOTH® components (e.g., BLUETOOTH® Low Energy), WI-FI® components, and other communication components to provide communication via other modalities. The devices 1570 may be another machine or any of a wide variety of peripheral devices (e.g., a peripheral device coupled via a Universal Serial Bus (USB)).
Moreover, the communication components 1564 can detect identifiers or include components operable to detect identifiers. For example, the communication components 1564 can include radio frequency identification (RFID) tag reader components, NFC smart tag detection components, optical reader components (e.g., an optical sensor to detect one-dimensional bar codes such as a Universal Product Code (UPC) bar code, multi-dimensional bar codes such as a Quick Response (QR) code, Aztec Code, Data Matrix, Dataglyph, MaxiCode, PDF417, Ultra Code, Uniform Commercial Code Reduced Space Symbology (UCC RSS)-2D bar codes, and other optical codes), acoustic detection components (e.g., microphones to identify tagged audio signals), or any suitable combination thereof. In addition, a variety of information can be derived via the communication components 1564, such as location via Internet Protocol (IP) geo-location, location via WI-FI® signal triangulation, location via detecting a BLUETOOTH® or NFC beacon signal that may indicate a particular location, and so forth.
In various example embodiments, one or more portions of the network 1580 can be an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), the Internet, a portion of the Internet, a portion of the public switched telephone network (PSTN), a plain old telephone service (POTS) network, a cellular telephone network, a wireless network, a WI-FI® network, another type of network, or a combination of two or more such networks. For example, the network 1580 or a portion of the network 1580 may include a wireless or cellular network, and the coupling 1582 may be a Code Division Multiple Access (CDMA) connection, a Global System for Mobile communications (GSM) connection, or another type of cellular or wireless coupling. In this example, the coupling 1582 can implement any of a variety of types of data transfer technology, such as Single Carrier Radio Transmission Technology (1xRTT), Evolution-Data Optimized (EVDO) technology, General Packet Radio Service (GPRS) technology, Enhanced Data rates for GSM Evolution (EDGE) technology, third Generation Partnership Project (3GPP) including 3G, fourth generation wireless (4G) networks, Universal Mobile Telecommunications System (UMTS), High-Speed Packet Access (HSPA), Worldwide Interoperability for Microwave Access (WiMAX), Long-Term Evolution (LTE) standard, or others defined by various standard-setting organizations, other long-range protocols, or other data-transfer technology.
The instructions 1516 can be transmitted or received over the network 1580 using a transmission medium via a network interface device (e.g., a network interface component included in the communication components 1564) and utilizing any one of a number of well-known transfer protocols (e.g., Hypertext Transfer Protocol (HTTP)). Similarly, the instructions 1516 can be transmitted or received using a transmission medium via the coupling 1572 (e.g., a peer-to-peer coupling) to the devices 1570. The term “transmission medium” shall be taken to include any intangible medium that is capable of storing, encoding, or carrying the instructions 1516 for execution by the machine 1500, and includes digital or analog communications signals or other intangible media to facilitate communication of such software.
Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.
Although an overview of the inventive subject matter has been described with reference to specific example embodiments, various modifications and changes may be made to these embodiments without departing from the broader scope of embodiments of the present disclosure. Such embodiments of the inventive subject matter may be referred to herein, individually or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any single disclosure or inventive concept if more than one is, in fact, disclosed.
The embodiments illustrated herein are described in sufficient detail to enable those skilled in the art to practice the teachings disclosed. Other embodiments may be used and derived therefrom, such that structural and logical substitutions and changes may be made without departing from the scope of this disclosure. The Detailed Description, therefore, is not to be taken in a limiting sense, and the scope of various embodiments is defined only by the appended claims, along with the full range of equivalents to which such claims are entitled.
As used herein, the term “or” may be construed in either an inclusive or exclusive sense. Moreover, plural instances may be provided for resources, operations, or structures described herein as a single instance. Additionally, boundaries between various resources, operations, modules, engines, and data stores are somewhat arbitrary, and particular operations are illustrated in a context of specific illustrative configurations. Other allocations of functionality are envisioned and may fall within a scope of various embodiments of the present disclosure. In general, structures and functionality presented as separate resources in the example configurations may be implemented as a combined structure or resource. Similarly, structures and functionality presented as a single resource may be implemented as separate resources. These and other variations, modifications, additions, and improvements fall within a scope of embodiments of the present disclosure as represented by the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.
Claims
1.-20. (canceled)
21. A method comprising:
- generating, using one or more processors of a user device, an image of a physical environment;
- receiving, on a display device of the user device, a selection of an object to be replaced in the image;
- determining a three-dimensional orientation of the object as depicted within the image using a pose detection neural network comprising a convolutional neural network trained to detect three-dimensional orientation of objects in a plurality of object training images, the objects of the plurality of object training images being of a same type as the object detected in the image;
- removing, from the image, the object using regions that are proximate to the object in the image; and
- generating a modified image that depicts a render of a virtual model that replaces the object in the physical environment.
22. The method of claim 21, further comprising:
- generating the render of the virtual model in the three-dimensional orientation and as illuminated by one or more virtual light sources based on a lighting scheme in the image.
23. The method of claim 22, further comprising:
- determining the lighting scheme of the image.
24. The method of claim 23, wherein determining the lighting scheme comprises determining one or more bright regions of the image.
25. The method of claim 24, further comprising:
- positioning, in a virtual environment, the one or more virtual light sources based on locations of the one or more bright regions of the image.
26. The method of claim 24, wherein the determining of the one or more bright regions of the image comprises determining an area of pixels in the image having higher brightness values.
27. The method of claim 21, wherein, in the image, the object is depicted in an object image region, and the regions that are proximate to the object in the image are proximate regions that are external to the object image region.
28. The method of claim 27, wherein the object is removed by merging the proximate regions and the object image region.
29. The method of claim 28, wherein the proximate regions and the object image region are merged using a neural network that implements partial convolutional layers.
30. The method of claim 27, wherein the object is removed by interpolating the proximate regions and the object image region.
31. The method of claim 21, further comprising:
- displaying the image on a display device of the user device; and
- receiving selection of the object through the display device of the user device.
32. The method of claim 31, wherein receiving selection of the object comprises receiving selection of a selected region of the image that depicts the object.
33. The method of claim 32, further comprising:
- generating an image mask using the selected region.
34. The method of claim 32, further comprising:
- segmenting the image into segment regions using an image segmentation convolutional neural network (CNN), wherein the selected region is identified from a user input on the image as displayed on the display device of the user device.
35. The method of claim 34, wherein the user input is one of: a tap gesture or a click.
36. The method of claim 32, wherein receiving selection of the object through the display device comprises:
- receiving, on the display device of the user device, a swipe gesture over at least a portion of the object as depicted in the image.
37. A system comprising:
- one or more processors;
- a display device; and
- a memory storing instructions that, when executed by the one or more processors, cause the system to perform operations comprising:
- generating an image of a physical environment;
- receiving, on the display device, a selection of an object to be replaced in the image;
- determining a three-dimensional orientation of the object as depicted within the image using a pose detection neural network comprising a convolutional neural network trained to detect three-dimensional orientation of objects in a plurality of object training images, the objects of the plurality of object training images being of a same type as the object detected in the image;
- removing, from the image, the object using regions that are proximate to the object in the image; and
- generating a modified image that depicts a render of a virtual model that replaces the object in the physical environment.
38. The system of claim 37, the operations further comprising:
- generating the render of the virtual model in the three-dimensional orientation and as illuminated by one or more virtual light sources based on a lighting scheme in the image.
39. The system of claim 38, the operations further comprising:
- determining the lighting scheme of the image.
40. A machine-readable storage device embodying instructions that, when executed by a device, cause the device to perform operations comprising:
- generating an image of a physical environment;
- receiving, on a display device, a selection of an object to be replaced in the image;
- determining a three-dimensional orientation of the object as depicted within the image using a pose detection neural network comprising a convolutional neural network trained to detect three-dimensional orientation of objects in a plurality of object training images, the objects of the plurality of object training images being of a same type as the object detected in the image;
- removing, from the image, the object using regions that are proximate to the object in the image; and
- generating a modified image that depicts a render of a virtual model that replaces the object in the physical environment.
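By way of illustration only, the bright-region determination of claims 24 to 26 and the light positioning of claim 25 might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the grayscale list-of-rows image format, the fixed brightness threshold of 200, and the function names `bright_region_centroid` and `light_offset_from_center` are all hypothetical and do not appear in the disclosure.

```python
from typing import List, Optional, Tuple

# Assumption: a grayscale image is a list of rows of brightness values in [0, 255].
Image = List[List[float]]


def bright_region_centroid(
    image: Image, threshold: float = 200.0
) -> Optional[Tuple[float, float]]:
    """Return the (row, col) centroid of pixels at or above the threshold.

    Returns None when no pixel is bright enough. The threshold value is an
    illustrative assumption, not a value taken from the disclosure.
    """
    count = 0
    row_sum = 0.0
    col_sum = 0.0
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if value >= threshold:
                count += 1
                row_sum += r
                col_sum += c
    if count == 0:
        return None
    return (row_sum / count, col_sum / count)


def light_offset_from_center(
    image: Image, centroid: Tuple[float, float]
) -> Tuple[float, float]:
    """Map a pixel centroid to an (x, y) offset in [-1, 1] from the image center.

    Positive x points right and positive y points up, so a bright region in
    the top-right of the image yields a virtual light up and to the right.
    """
    height = len(image)
    width = len(image[0])
    row, col = centroid
    half_w = (width - 1) / 2
    half_h = (height - 1) / 2
    x = (col - half_w) / half_w if half_w > 0 else 0.0
    y = (half_h - row) / half_h if half_h > 0 else 0.0
    return (x, y)


# Usage: a 4x4 image whose top-right 2x2 block is bright.
example = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
centroid = bright_region_centroid(example)
offset = light_offset_from_center(example, centroid) if centroid else None
```

In this sketch a single centroid stands in for one virtual light source; an embodiment with several bright regions could, for example, group the bright pixels into connected areas first and compute one offset per area.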
Type: Application
Filed: Oct 25, 2021
Publication Date: May 19, 2022
Inventors: Xiaoyi Huang (Palo Alto, CA), Jingwen Wang (Palo Alto, CA), Yi Wu (San Jose, CA), Xin Ai (Palo Alto, CA)
Application Number: 17/509,784