SURFACE DISTINCTION FOR MOBILE RENDERED AUGMENTED REALITY

An augmented reality (AR) system hosted and executed on a mobile client enables surface distinction for AR objects to interact with multiple surfaces captured within a camera view of the mobile client. The AR system receives an image from a camera of a mobile client, where the image depicts planar surfaces. The AR system identifies clusters of feature points corresponding to respective planar surfaces and generates meshes from the identified clusters. The AR system generates a 3D virtual coordinate plane that includes the identified clusters and generated meshes that correspond to respective surfaces. The AR system determines a spatial relationship between identified surfaces (e.g., a height between surfaces). The AR system receives a user interaction based on the identified surfaces and provides an AR object for display at the mobile client based on the received interaction (e.g., an AR object traveling between identified surfaces).

Description
CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/971,766, filed Feb. 7, 2020, which is incorporated by reference in its entirety.

TECHNICAL FIELD

The disclosure generally relates to the field of mobile rendered augmented reality and more specifically to surface distinction in mobile rendered augmented reality environments.

BACKGROUND

Conventional augmented reality (AR) systems do not enable a user to control an AR object to interact between two surfaces in mobile computer environments. An AR object was either limited to a fixed position, or the range of interactions between the AR object and the environment was confined to a single surface. In particular, conventional AR systems for mobile clients are unable to identify surfaces within an environment and determine spatial relationships between the identified surfaces (e.g., the height or distance between points on respective surfaces). Moreover, conventional solutions often require immense computing resources, power consumption, and memory capacity that are lacking in mobile computer environments. Accordingly, there is a need for a practical, mobile surface distinction solution that allows AR users to have an immersive experience where AR objects can naturally interact with identified surfaces.

BRIEF DESCRIPTION OF DRAWINGS

The disclosed embodiments have other advantages and features which will be more readily apparent from the detailed description, the appended claims, and the accompanying figures (or drawings). A brief introduction of the figures is below.

FIG. 1 illustrates an augmented reality (AR) game system environment, in accordance with at least one embodiment.

FIG. 2 is a block diagram of the surface distinction application of FIG. 1, in accordance with at least one embodiment.

FIG. 3 is a flowchart illustrating a process for providing an AR object for display based on surface distinction, in accordance with at least one embodiment.

FIGS. 4A and 4B are flowcharts illustrating a process for providing an AR object for display based on the process of FIG. 3, in accordance with at least one embodiment.

FIGS. 5A, 5B, 5C, 5D, and 5E illustrate a process for controlling an AR object based on user interactions, at a mobile client, with planar surfaces detected by an AR system, in accordance with at least one embodiment.

FIG. 6 illustrates a block diagram including components of a machine able to read instructions from a machine-readable medium and execute them in a processor (or controller), in accordance with at least one embodiment.

DETAILED DESCRIPTION

The Figures (FIGS.) and the following description relate to preferred embodiments by way of illustration only. It should be noted that from the following discussion, alternative embodiments of the structures and methods disclosed herein will be readily recognized as viable alternatives that may be employed without departing from the principles of what is claimed.

Reference will now be made in detail to several embodiments, examples of which are illustrated in the accompanying figures. It is noted that wherever practicable similar or like reference numbers may be used in the figures and may indicate similar or like functionality. The figures depict embodiments of the disclosed system (or method) for purposes of illustration only. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated herein may be employed without departing from the principles described herein.

Configuration Overview

In one example embodiment of a disclosed system, method and computer readable storage medium, surfaces within a user's environment are identified to enable an augmented reality (AR) object to be provided for display on a mobile client based on the identified surfaces. Conventional AR systems for mobile clients are unable to identify surfaces within an environment and determine spatial relationships between the identified surfaces (e.g., the height or distance between points on respective surfaces). The systems and methods described herein may selectively limit the data stored to conserve memory resources. Furthermore, the surfaces may be identified using generated meshes rather than individual feature points, which optimizes processing resources (e.g., by analyzing one mesh as opposed to multiple feature points that make up the mesh). Accordingly, described is a configuration that performs these functions, which may be referred to herein as “surface distinction,” and enables AR objects to interact with the identified surfaces in a mobile rendered AR system while optimizing for the memory, power, and processing constraints (e.g., of the mobile client).

In one example configuration, a camera coupled with the mobile client (e.g., integrated with the mobile client, or connected to the mobile client via a wired or wireless connection) captures a camera view of an environment. The environment may correspond to the physical world, which includes surfaces within a field of view of the camera (i.e., the “camera view”). A processor (e.g., of the mobile device) processes program code that causes the processor to execute specified functions as are further described herein. Accordingly, the processor receives the image and several feature points, identified from the image, associated with the surfaces in the environment. The processor generates a three-dimensional (3D) virtual coordinate space using the feature points. The processor identifies two or more clusters from the feature points, each cluster corresponding to a different surface. The processor generates meshes from the clusters, where each mesh is defined by coordinates in the 3D virtual coordinate space (e.g., the shape of the mesh may be outlined by the coordinates). Using these coordinates, the processor may determine a height difference between two surfaces (e.g., a difference between two Z-coordinates of respective meshes). The processor receives a user interaction between the two surfaces. For example, the user makes a swipe on the display starting from one surface and ending at the other surface. The processor provides an AR object for display at the mobile client, where the AR object is configured to interact with the two surfaces based on the user interaction. For example, the processor provides an AR avatar for display that runs from one surface to the other surface.
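By way of illustration only (this sketch is not part of the original disclosure), the height-difference determination described above may be expressed over two hypothetical mesh coordinate sets in the 3D virtual coordinate space; the array names and numeric values are illustrative assumptions.

    import numpy as np

    # Hypothetical boundary coordinates (X, Y, Z) of two generated meshes in the
    # 3D virtual coordinate space: a floor at Z = 0 and a tabletop at Z = 0.72.
    floor_mesh = np.array([[0.0, 0.0, 0.0], [1.2, 0.0, 0.0], [1.2, 1.5, 0.0], [0.0, 1.5, 0.0]])
    table_mesh = np.array([[0.4, 0.3, 0.72], [0.9, 0.3, 0.72], [0.9, 0.8, 0.72], [0.4, 0.8, 0.72]])

    # Height difference between the two surfaces, taken as the difference between
    # representative Z-coordinates of the respective meshes.
    height_difference = table_mesh[:, 2].mean() - floor_mesh[:, 2].mean()
    print(f"Height between surfaces: {height_difference:.2f}")  # 0.72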

Surface distinction allows the user to control AR objects within an AR application as though the objects are interacting with reality around the user, presenting an immersive AR experience for the user. In particular, the methods described herein allow surface distinction in a mobile client executed AR application without consuming excessive processing and/or battery power.

Augmented Reality System Environment

FIG. 1 illustrates an AR system environment, in accordance with at least one embodiment. The AR system environment may be an AR game system environment that enables a user to play AR games on a mobile client 100, and in some embodiments, presents immersive gaming experiences to the user via surface distinction. The system environment includes a mobile client 100, an AR system 110, an AR engine 120, a surface distinction application 130, a database 140, and a network 150. The AR system 110, in some example embodiments, may include the mobile client 100, the AR engine 120, the surface distinction application 130, and the database 140. In other example embodiments, the AR system 110 may include the AR engine 120, the surface distinction application 130, and the database 140, but not the mobile client 100, such that the AR system 110 communicatively couples (e.g., wireless communication) to the mobile client 100 from a remote server.

The mobile client 100 is a mobile device that is or incorporates a computer. The mobile client may be, for example, a relatively small computing device in which processing (e.g., processor and/or controller) and power resources (e.g., battery) may be limited, with a form factor such as a smartphone, tablet, wearable device (e.g., smartwatch), and/or a portable internet-enabled device. The limitations of such devices stem from the physical and engineering constraints that must be adhered to in designing such products for portability and use away from constant power sources.

The mobile client 100 has general and/or special purpose processors, memory, storage, networking components (either wired or wireless). The mobile client 100 can communicate over one or more communication connections (e.g., a wired connection such as ethernet or a wireless communication via cellular signal (e.g., LTE, 5G), WiFi, satellite) and includes a global positioning system (GPS) used to determine a location of the mobile client 100. The mobile client 100 also includes a screen 103 (e.g., a display) and a display driver to provide for display interfaces on the display associated with the mobile client 100. The mobile client 100 executes an operating system, such as GOOGLE ANDROID OS and/or APPLE iOS, and includes a display and/or a user interface that the user can interact with.

The mobile client 100 also includes one or more cameras (e.g., the camera 102) that can capture forward and rear facing images and/or videos. The one or more cameras 102 may be configured to capture depths of objects within an image. For example, the one or more cameras 102 may be a dual camera, Light Detection and Ranging (LiDAR) camera, ultrasonic imaging camera, or any suitable camera capable of determining distance between an object and the camera.

In some embodiments, the mobile client 100 couples to the AR system 110, which enables it to execute an AR application (e.g., the AR client 101). The AR engine 120 interacts with the mobile client 100 to execute the AR client 101 (e.g., an AR game). For example, the AR engine 120 may be a game engine such as UNITY and/or UNREAL ENGINE. The AR engine 120 displays, and the user interacts with, the AR game via the mobile client 100. Although the AR application refers to an AR gaming application in many instances described herein, the AR application may be a retail application integrating AR for modeling purchasable products, an educational application integrating AR for demonstrating concepts within a learning curriculum, or any suitable interactive application in which AR may be used to augment the interactions. In some embodiments, the AR engine 120 is integrated into and/or hosted on the mobile client 100. In other embodiments, the AR engine 120 is hosted external to the mobile client 100 and communicatively couples to the mobile client 100 over the network 150. The AR system 110 may comprise program code that executes functions as described herein.

In some example embodiments, the AR system 110 includes the surface distinction application 130. The surface distinction application enables surface distinction in an AR game such that AR objects (e.g., virtual objects rendered by the AR engine 120) may appear to interact with various surfaces in an environment of the user. The user may capture an image and/or video of an environment, which may include one or more objects (e.g., a table, a book, etc.) captured within a camera view of the camera 102 of the mobile client 100. While the surface distinction application 130 may identify surfaces depicted within both images and videos, many instances described herein will refer to surface distinction in images captured by a mobile client. The AR engine 120 renders an AR object, where the rendering may be based on the identified surfaces within the environment (e.g., the surface of a table).

During game play, the surface distinction application 130 identifies and distinguishes surfaces (e.g., floors and walls) in an image of the environment. For example, the surface distinction application 130 may distinguish a surface of a floor from a surface of a table. In some embodiments, the surface distinction application 130 provides an AR object for display as interacting with one or more surfaces. For example, the AR object is an AR avatar (e.g., an AR representation of a human resembling the user) and the avatar is displayed sitting on the floor, climbing from a table to a book resting on the table, etc. FIGS. 5A-5E, described further herein, illustrate one example of the surface distinction application 130 identifying surfaces within a living room and enabling a user to control an AR avatar that is configured to travel between identified surfaces (e.g., walking in a visually natural way between surfaces). In some embodiments, the AR system 110 includes applications instead of and/or in addition to the surface distinction application 130. In some embodiments, the surface distinction application 130 may be hosted on and/or executed by the mobile client 100. In other embodiments, the surface distinction application 130 is communicatively coupled to the mobile client 100.

The database 140 stores images or videos that may be used by the surface distinction application 130 to identify surfaces within an image or video. The mobile client 100 may transmit images or videos collected by the camera 102 during the execution of the AR client 101 to the database 140. The data stored within the database 140 may be collected from a single client (e.g., the mobile client 100) or multiple clients (e.g., other mobile clients that are communicatively coupled to the AR system 110 through the network 150). The surface distinction application 130 may use images and/or videos of environments stored in the database 140 to train a model to classify objects within the environments. In turn, the classified objects may be used to determine a particular surface depicted in the image. The classification of objects within an image is further described in the description of FIG. 2.

The network 150 transmits data between the mobile client 100 and the AR system 110. The network 150 may be a local area and/or wide area network that uses wired and/or wireless communication systems, such as the internet. In some embodiments, the network 150 includes encryption capabilities to ensure the security of data, such as secure sockets layer (SSL), transport layer security (TLS), virtual private networks (VPNs), internet protocol security (IPsec), etc.

Example Surface Detection Configuration

FIG. 2 is a block diagram of the surface distinction application 130 of FIG. 1, in accordance with at least one embodiment. The surface distinction application 130 includes a network interface 210, a cluster module 220, a mesh generation module 230, and a rendering module 240. In some embodiments, the surface distinction application 130 includes modules other than those shown in FIG. 2. For example, the surface distinction application 130 may include a machine learning model trained to classify objects within images received from the mobile client 100. The modules may be embodied as program code (e.g., software comprised of instructions stored on non-transitory computer readable storage medium and executable by at least one processor such as the processor 602 in FIG. 6) and/or hardware (e.g., application specific integrated circuit (ASIC) chips or field programmable gate array (FPGA) chips with firmware). When executed or operated, the modules provide at least the functionality described herein.

The network interface 210 may be a communication interface for the surface distinction application 130 to communicate with various components in the AR system 110, such as the mobile client 100, the AR engine 120, and the database 140. The mobile client 100 may transmit requests with data payloads to the surface distinction application 130 via the network interface 210. The requests may be to identify surfaces within an image, modify the display of AR objects (e.g., in response to user interactions with the mobile client 100 with a request to control the state of an AR object), or any suitable action to provide an interactive AR experience using the AR client 101. The requests may have data payloads such as an image, a video, or information about a user interaction (e.g., coordinates on the screen 103 that the user has interacted with in a request to control an AR object). Although not depicted in FIG. 2, the “network interface” may be referred to as an “application interface” in embodiments where the surface distinction application is hosted and executed on the mobile client 100. The application interface may be a communication interface for the surface distinction application 130 to communicate with various components in the mobile client 100 such as the camera 102 and the screen 103.

Requests received by the network interface 210 from the mobile client 100 may be automatically generated during execution of the AR client 101 (e.g., during gameplay). Alternatively or additionally, the requests may be generated responsive to a user interaction with the mobile client 100. For example, the user may tap the screen 103 at a location corresponding to a book, and the AR client 101 may send a request to the surface distinction application 130 to identify a surface, based on the user interaction, within an image depicting the table. The requests from the mobile client 100 may identify the mobile client 100 and/or the AR client 101. Other components within the AR system 110 may communicate with the surface distinction application 130 via the network interface 210. For example, the AR engine 120 may provide feature points of an image to the surface distinction application 130 via the network interface 210. The network interface 210 may take various forms. For example, in some embodiments, the network interface 210 takes the form of an application programming interface (API) such as REST (representational state transfer), SOAP (Simple Object Access Protocol), RPC (remote procedure call), or another suitable type.

The network interface 210 may receive an image captured within the camera view of the camera 102 of the mobile client 100. The network interface 210 may transmit the image to the AR engine 120, which determines feature points in the image of the environment and provides the feature points to the cluster module 220 (e.g., via the network interface 210). Alternatively, the mobile client 100 may provide the image to the AR engine 120, which subsequently provides feature points to the cluster module 220. The feature points may provide information about the content of an image for subsequent image processing, where the information indicates features of structures within the image such as surfaces, corners, points, objects, etc. The use of feature points by the surface distinction application 130 is further described with respect to the cluster module 220 and mesh generation module 230.

The cluster module 220 groups the feature points received from the AR engine 120 into clusters. The clusters may be indicative of distinct features in the image such as objects and surfaces (e.g., chairs, tables, walls, floors, etc.). An object depicted in an image from a camera view of the environment may be referred to herein as a “real-world object” as such objects have physical form and existence. In some embodiments, the cluster module 220 identifies clusters using Euclidean clustering, where feature points within a threshold distance of one another form a cluster. The cluster module 220 may identify clusters using alternative methods, in other embodiments. The cluster module 220 may generate a three-dimensional (3D) virtual coordinate space using the received feature points. Each feature point may be characterized by Cartesian coordinates, which may indicate a depth and a height of the feature point within the 3D virtual coordinate space.
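A minimal sketch of the Euclidean clustering step is shown below, assuming the feature points have already been expressed as (X, Y, Z) coordinates; it uses single-linkage clustering with a distance threshold (via SciPy) so that points within the threshold of a neighbor fall into the same cluster. The threshold value and synthetic points are illustrative assumptions, not values from the disclosure.

    import numpy as np
    from scipy.cluster.hierarchy import fclusterdata

    def cluster_feature_points(points, threshold=0.15):
        """Group feature points so that points within `threshold` of a neighbor
        share a cluster (single-linkage Euclidean clustering). Returns a dict
        mapping cluster label -> array of member points."""
        labels = fclusterdata(points, t=threshold, criterion='distance', method='single')
        return {label: points[labels == label] for label in np.unique(labels)}

    # Illustrative feature points: a floor patch near Z = 0 and a tabletop near Z = 0.7.
    rng = np.random.default_rng(0)
    floor = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1, 200), rng.normal(0.0, 0.005, 200)])
    table = np.column_stack([rng.uniform(0.4, 0.6, 60), rng.uniform(0.4, 0.6, 60), rng.normal(0.7, 0.005, 60)])
    clusters = cluster_feature_points(np.vstack([floor, table]))
    print(len(clusters))  # expected: 2 clusters, one per surface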

In some embodiments, the surface distinction application 130 (e.g., via the cluster module 220) may store a first set of feature points of a first image corresponding to what is depicted in a camera view of a mobile device and subsequently discard the first set of feature points so that memory resources may be saved for a second set of feature points (e.g., when the camera view changes to depict different objects). For example, a first image may depict a table and the cluster module 220 may receive a first set of feature points identified for the table. A subsequently received second image may depict a chair and the cluster module 220 may receive a second set of feature points identified for the chair. The cluster module 220 may determine, using the received feature points, that the camera view of the mobile device is capturing different objects or a different view of the environment. For example, the cluster module 220 may compare the values of the first set and second set of feature points to determine that they correspond to different objects. The cluster module 220 may determine to discard the feature points corresponding to objects that are not captured within the most recent camera view. For example, the cluster module 220 may discard the first set of feature points (e.g., responsive to determining that the first set and second set of feature points are different or exceed a threshold measure of difference). In this way, the surface distinction application 130 may conserve memory resources.
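One way to realize the discard decision described above is sketched below: the stored and newly received feature point sets are compared by nearest-neighbor overlap, and the stored set is released when the overlap falls below a threshold. The radius and overlap threshold are illustrative assumptions; the disclosure does not prescribe a particular comparison.

    import numpy as np
    from scipy.spatial import cKDTree

    def should_discard(old_points, new_points, radius=0.05, min_overlap=0.3):
        """Return True if the new feature points overlap too little with the stored
        set, suggesting the camera view now captures different objects."""
        distances, _ = cKDTree(old_points).query(new_points, k=1)
        overlap = np.mean(distances < radius)   # fraction of new points near an old point
        return overlap < min_overlap

    # Illustrative use: a table (first image) replaced by a chair elsewhere in the room.
    table_points = np.random.default_rng(1).uniform([0.0, 0.0, 0.69], [0.5, 0.5, 0.71], (100, 3))
    chair_points = np.random.default_rng(2).uniform([2.0, 2.0, 0.0], [2.5, 2.5, 0.45], (100, 3))
    if should_discard(table_points, chair_points):
        table_points = None   # release the stale set to conserve memory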

A cluster indicates physical surfaces within the environment. The surfaces may be of a real-world object (e.g., the top of a table or seat of a chair) or standalone surfaces such as floors. Cluster attributes such as its size and location within the captured image relative to other clusters may correspond to similar attributes of the surface relative to other surfaces in the environment. For example, the cluster module 220 may identify a cluster in an image of a room with a box. The identified cluster may correspond to the surfaces of the box, indicate a location of the box within the image of the room, and scale according to a size of the box in the image. For example, in response to an image where the box is closer to the mobile client 100, and therefore appears larger, the cluster corresponding to the box would also become larger (e.g., more feature points in the cluster). Feature point density may be used to determine the presence of an object. In some embodiments, feature point density is proportional to image texture, where an object with varying colors or surface features (e.g., protrusions, corners, ridges, etc.) may be associated with a higher density of feature points than a flat surface whose image texture does not vary as much.

In some embodiments, the cluster module 220 identifies a plurality of clusters from the feature points received from the AR engine 120, e.g., each cluster corresponding to a real-world object in the camera view of the environment. In response, the cluster module 220 determines a size of each of the clusters. Based on the determined sizes, the cluster module 220 selects one of the clusters. For example, the cluster module 220 may select the largest cluster. In another example embodiment, the cluster module 220 receives user input provided by the mobile client 100 via the network interface 210. The user input may be a tap or swipe on the screen 103, a click or drag with a computer cursor, or any other suitable user input via the mobile client 100 indicating a selection of a real-world object displayed on the screen 103. The cluster module 220 subsequently selects a cluster corresponding to the selected real-world object.
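By way of illustration, selecting a cluster by size may be as simple as the following; the cluster contents here are hypothetical placeholders standing in for the clustering output.

    import numpy as np

    # Hypothetical output of the clustering step: cluster label -> feature point array.
    clusters = {
        1: np.zeros((240, 3)),   # e.g., floor
        2: np.zeros((60, 3)),    # e.g., tabletop
    }

    # Select the cluster with the most feature points (a proxy for apparent surface size).
    largest_label = max(clusters, key=lambda label: len(clusters[label]))
    print(largest_label)   # 1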

The cluster module 220 may distinguish clusters of feature points associated with real-world objects from clusters of feature points associated with flat, two-dimensional surfaces (2D) (e.g., walls, floors) in the camera view. In particular, the cluster module 220 applies a Random Sample Consensus (RANSAC) algorithm to the image to identify these 2D surfaces. In some embodiments, a custom matrix system application programming interface (API) enables the cluster module 220 to use the RANSAC methodology on the mobile client 100.
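The disclosure does not specify a particular RANSAC implementation (or the custom matrix system API), so the following is only a bare-bones sketch of RANSAC plane fitting over 3D feature points: repeatedly sample three points, fit a plane, and keep the plane with the most inliers.

    import numpy as np

    def ransac_plane(points, iterations=200, inlier_threshold=0.01, seed=0):
        """Fit a plane to 3D feature points with a basic RANSAC loop. Returns
        ((normal, d), inlier_mask) for the plane normal . x + d = 0."""
        rng = np.random.default_rng(seed)
        best_inliers = np.zeros(len(points), dtype=bool)
        best_plane = None
        for _ in range(iterations):
            sample = points[rng.choice(len(points), 3, replace=False)]
            normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(normal)
            if norm < 1e-9:                       # degenerate (collinear) sample
                continue
            normal /= norm
            d = -normal.dot(sample[0])
            inliers = np.abs(points @ normal + d) < inlier_threshold
            if inliers.sum() > best_inliers.sum():
                best_inliers, best_plane = inliers, (normal, d)
        return best_plane, best_inliers

    # Illustrative points: a dominant floor plane plus a smaller tabletop cluster.
    rng = np.random.default_rng(5)
    floor = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 1, 200), rng.normal(0.0, 0.002, 200)])
    table = np.column_stack([rng.uniform(0.4, 0.6, 60), rng.uniform(0.4, 0.6, 60), rng.normal(0.7, 0.002, 60)])
    (normal, d), inliers = ransac_plane(np.vstack([floor, table]))
    print(inliers.sum())   # roughly 200 inliers on the dominant (floor) plane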

The mesh generation module 230 generates a 3D mesh for a real-world object or surface from its corresponding cluster. The mesh generation module 230 may use triangulation (e.g., Delaunay triangulation) to generate a 3D mesh from the cluster of feature points. In some embodiments, the mesh generation module 230 builds a stencil shader and accordingly generates a 3D mesh that corresponds to bounds of the identified cluster. Therefore, the 3D mesh approximates the shape of the real-world object. The mesh generation module 230 may iterate through each cluster identified by the cluster module 220 to generate 3D meshes. In some embodiments, the mesh generation module 230 may generate a 3D mesh of a user-specified cluster responsive to a user selection of a point or area within the image (e.g., using the mobile client 100) corresponding to the cluster.
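A minimal sketch of mesh generation by Delaunay triangulation is shown below, assuming a mostly horizontal cluster so that triangulating the X-Y projection of the points suffices; stencil-shader construction and non-horizontal surfaces are outside the scope of this sketch.

    import numpy as np
    from scipy.spatial import Delaunay

    def mesh_from_cluster(cluster_points):
        """Generate a simple 3D surface mesh from a cluster of feature points by
        triangulating their X-Y projection and keeping the original Z values."""
        triangulation = Delaunay(cluster_points[:, :2])    # 2D Delaunay over X, Y
        return cluster_points, triangulation.simplices     # vertices + triangle indices

    # Illustrative cluster: a roughly planar tabletop at Z = 0.72.
    rng = np.random.default_rng(3)
    tabletop = np.column_stack([rng.uniform(0, 0.8, 60), rng.uniform(0, 0.5, 60), np.full(60, 0.72)])
    vertices, triangles = mesh_from_cluster(tabletop)
    print(len(vertices), len(triangles))   # 60 vertices and the resulting triangles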

In some embodiments, the mesh generation module 230 may detect 2D surfaces depicted within an image. The mesh generation module 230 may determine features of the mesh (e.g., depth, convexity, etc.) and use the determined features to determine whether the mesh corresponds to a 3D object. The mesh generation module 230 allows surface distinction to be performed using a mesh rather than individual feature points. By using the generated mesh instead of individual feature points to detect surfaces, the surface distinction application 130 processes, for example, one 3D mesh as opposed to multiple feature points (e.g., millions of feature points). In this way, the surface distinction application 130 may optimize processing resources.

In one example, a cluster determined by the cluster module 220 corresponds to a box. The mesh generation module 230 may generate a 3D mesh that is shaped similarly to the box. The 3D mesh of the box would correspond to a height, length, and width of the box. In addition, the mesh generation module 230 may be configured to refine the density of the mesh. For example, a lower density mesh may provide the primary points for the 3D mesh of the object in terms of dimensions. A more refined, or higher density, mesh may additionally capture a corresponding texture of the object (e.g., ridges, bumps, etc.). The mesh generation module 230 provides the 3D mesh to the rendering module 240 using, for example, a .NET bridge.

In some embodiments, the mesh generation module 230 generates the 3D mesh for a cluster after classifying a real-world object within the received image. For example, the surface distinction application 130 may include a machine learning model that classifies real-world objects in images of the environment. The mesh generation module 230 may apply an image to the machine learning model, which outputs a type for each of the real-world objects in the image. The mesh generation module 230 may generate the 3D mesh for the selected real-world object based on the type output by the machine learning model. Once the mesh generation module 230 generates the 3D mesh, the module may determine coordinates of the 3D mesh within the 3D virtual coordinate space generated by the cluster module 220. These determined coordinates may be boundary coordinates outlining, for example, edges, surfaces, cavities, etc. of the 3D mesh as located within the virtual coordinate space representing the environment in the camera view.

In some embodiments, the surface distinction application 130 includes a machine learning model training engine that accesses the database 140 for images of real-world objects to train the machine learning model. The machine learning model training engine may generate training sets using the accessed images and labels identifying the objects depicted within the images. The training engine may then use the training sets to train the machine learning model.
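A minimal sketch of the training step is shown below, assuming the images from the database 140 have already been reduced to fixed-length feature vectors; the feature dimension, label set, and classifier choice are illustrative assumptions rather than the disclosed training engine.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical training set derived from database 140: (feature vector, label) pairs.
    # Random vectors stand in for real image features to keep the sketch self-contained.
    rng = np.random.default_rng(4)
    features = rng.normal(size=(200, 128))
    labels = rng.choice(["table", "chair", "book", "floor"], size=200)

    model = RandomForestClassifier(n_estimators=50, random_state=0).fit(features, labels)
    print(model.predict(features[:3]))   # predicted object types for three samples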

The rendering module 240 provides for display, on the mobile client 100, an AR object that is generated, or “rendered,” by the AR engine 120. To provide the AR object for display at the mobile client 100, the rendering module 240 may determine a location on a virtual coordinate space to position the rendered AR object and transmit the location and the rendered AR object to the mobile client 100 for display.

The rendering module 240 may determine various states to display an AR object depending on an identified surface on which the AR object is rendered and/or a location within the identified surface. For example, the rendering module 240 may render an AR avatar that appears to look down when the AR avatar is displayed at the edge of a surface. The rendering module 240 may determine that the edge of the surface has a coordinate within the virtual coordinate space with a non-zero Z-coordinate or value greater than a threshold Z-coordinate and display the AR avatar in a state where it appears to look down. The rendering module 240 may also provide an AR object for display that indicates the presence of other identified surfaces to which it may travel. For example, the AR avatar may be rendered to point to a table's surface to which it may climb. When the AR object is at a location on the virtual coordinate space that overlaps with the generated 3D mesh of a surface, the rendering module 240 provides for display the AR object as interacting with the surface (e.g., standing on the surface, traveling on the surface, sitting on the surface, etc.).

In some embodiments, the rendering module 240 provides the 3D meshes of the surfaces for display on the mobile client 100. The mesh generation module 230 may provide the generated 3D meshes for display on the mobile client's display 103. In some embodiments, the mesh generation module 230 provides an indication of identified surfaces for display by overlaying 3D meshes of the surfaces over the image from the camera view of the mobile client 100.

The rendering module 240 may receive, from the mobile client 100, data indicative of user interactions indicating one or more locations at which an AR object is requested to be provided at the screen 103. For example, a user may tap or swipe the screen 103 (e.g., a touchscreen display) and the AR client 101 may provide one or more points (e.g., coordinates corresponding to areas of the screen 103) to the surface distinction application 130. The rendering module 240 then determines one or more corresponding coordinates on the 3D virtual coordinate space to provide the AR object for display.
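The disclosure does not prescribe how a screen point is mapped into the 3D virtual coordinate space; one common approach, sketched below under assumed camera intrinsics and pose, is to cast a ray from the camera through the tapped pixel and intersect it with a detected surface plane (e.g., a plane fit by RANSAC).

    import numpy as np

    def tap_to_surface_point(u, v, fx, fy, cx, cy, plane_normal, plane_d, cam_pos, cam_rot):
        """Cast a ray through screen pixel (u, v) and intersect it with a surface
        plane normal . x + d = 0. cam_rot rotates camera coordinates to world
        coordinates; cam_pos is the camera position in the virtual space."""
        ray_cam = np.array([(u - cx) / fx, (v - cy) / fy, 1.0])   # pinhole model
        ray_world = cam_rot @ ray_cam
        ray_world /= np.linalg.norm(ray_world)
        denom = plane_normal @ ray_world
        if abs(denom) < 1e-9:
            return None                                           # ray parallel to the surface
        t = -(plane_normal @ cam_pos + plane_d) / denom
        return None if t <= 0 else cam_pos + t * ray_world

    # Illustrative pose: camera 1.5 m above the floor plane (Z = 0), looking straight down.
    cam_pos = np.array([0.0, 0.0, 1.5])
    cam_rot = np.array([[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]])
    point = tap_to_surface_point(640, 360, fx=800, fy=800, cx=640, cy=360,
                                 plane_normal=np.array([0.0, 0.0, 1.0]), plane_d=0.0,
                                 cam_pos=cam_pos, cam_rot=cam_rot)
    print(point)   # [0. 0. 0.] -- the tap lands on the floor directly below the camera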

The rendering module 240 may determine a set of coordinates in the 3D virtual coordinate space that correspond to a path between two identified surfaces: a floor and a table. The rendering module 240 may use the Z-coordinates of respective identified surfaces (e.g., coordinates corresponding to the center of each surface) within the 3D virtual coordinate space to determine height differences between two surfaces. In addition to determining a height difference, the rendering module 240 may determine a depth difference using X and Y coordinates of the respective identified surfaces. Using the determined height and depth differences, the rendering module 240 may determine a shortest path between the centers of each surface. The rendering module 240 may provide a path (e.g., stepping stones) along this shortest path for display at the mobile client 100. For example, the rendering module 240 requests that the AR engine 120 render AR stepping stones and subsequently transmits the rendered stepping stones to the mobile client 100 for display.
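The height/depth computation and the stepping-stone path may be sketched as follows, assuming each surface is represented by the boundary coordinates of its 3D mesh and taking the mesh centroid as the surface center; the mesh values and stone count are illustrative.

    import numpy as np

    def stepping_stone_path(mesh_a, mesh_b, num_stones=5):
        """Compute the height (Z) and horizontal (X/Y) differences between two
        surface meshes and interpolate stepping-stone positions along the
        straight line between their centers."""
        center_a, center_b = mesh_a.mean(axis=0), mesh_b.mean(axis=0)
        height_diff = center_b[2] - center_a[2]
        depth_diff = np.linalg.norm(center_b[:2] - center_a[:2])
        fractions = np.linspace(0, 1, num_stones + 2)[1:-1]       # exclude the endpoints
        stones = center_a + fractions[:, None] * (center_b - center_a)
        return height_diff, depth_diff, stones

    floor_mesh = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [2.0, 2.0, 0.0], [0.0, 2.0, 0.0]])
    table_mesh = np.array([[1.0, 1.0, 0.72], [1.6, 1.0, 0.72], [1.6, 1.5, 0.72], [1.0, 1.5, 0.72]])
    height, depth, stones = stepping_stone_path(floor_mesh, table_mesh)
    print(height, depth)   # about 0.72 height difference, about 0.39 horizontal offset
    print(stones)          # candidate AR stepping-stone coordinates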

The state in which the rendering module 240 determines to display an AR object may include the size or ratio of the AR object, which may vary depending on the location within the 3D virtual coordinate space at which the AR object is placed. Continuing the previous example, the rendering module 240 may display the AR avatar, as it travels along the AR stepping stones, at increasingly larger size ratios if the avatar is traveling toward the user or decreasingly smaller size ratios if the avatar is traveling away. Thus, the rendering module 240 promotes a more visually natural display of the AR objects as they interact with identified surfaces.
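In a 3D engine the perspective projection typically handles apparent size automatically; the sketch below simply makes an inverse-distance size ratio explicit as one possible realization of the scaling described above. The reference distance and positions are assumptions.

    import numpy as np

    def display_scale(object_position, camera_position, reference_distance=1.0):
        """Size ratio for an AR object so it appears smaller as it moves away from
        the camera (simple inverse-distance perspective scaling)."""
        distance = np.linalg.norm(np.asarray(object_position) - np.asarray(camera_position))
        return reference_distance / max(distance, 1e-6)

    camera = [0.0, 0.0, 1.5]
    print(display_scale([0.0, 0.5, 0.0], camera))    # avatar near the user: larger ratio
    print(display_scale([0.0, 2.5, 0.72], camera))   # avatar on the far table: smaller ratio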

The rendering module 240 may render instructions for display to direct a user to perform a particular user interaction. The rendered instructions may include text, pictures, animations, AR objects, etc. For example, the rendering module 240 may provide an arrow for display pointing from one surface to another, indicating that the user should make a swiping motion on the screen 103 following the arrow to cause a set of AR steps to be generated, thus creating a path between the two surfaces for an AR avatar to travel between. The rendering module 240 may determine the instructions using the coordinates of the 3D meshes corresponding to the surfaces. For example, the rendering module 240 may use the Cartesian coordinates of two points within respective surface meshes to determine a line in the 3D virtual coordinate space between the two meshes. The rendering module 240 may then use the line to determine a location to provide the instructional arrow for display.

Processes for Providing AR Objects Based on Surface Distinction

FIG. 3 is a flowchart illustrating an example process 300 for providing an AR object for display based on surface distinction, in accordance with at least one embodiment. The process 300 may be performed by the surface distinction application 130. The surface distinction application 130 may perform operations of the process 300 in parallel or in different orders, or may perform different, additional, or fewer steps. For example, prior to identifying 306 the clusters, the surface distinction application 130 may receive, from the mobile client, a request, based on a user interaction, to identify a particular surface (e.g., the user taps on a book displayed in an image shown on the screen 103), and an identified 306 cluster corresponds to the user's selected surface. Furthermore, while an image is referenced in the process 300, the surface distinction application may also be applied to videos. The process 300 may be performed by a surface distinction application hosted on a mobile client or hosted on a remote server that is communicatively coupled to the mobile client.

The surface distinction application 130 receives 302 an image from a camera of a mobile client, where the image depicts planar surfaces. As referred to herein, “planar surfaces” are 2D surfaces with minimal or no texture (e.g., cavities, ridges, protrusions, etc.). The terms “planar surface” and “surface” may be used interchangeably herein unless the context in which either term is used indicates otherwise. Planar surfaces may include relatively flat surfaces such as floors, walls, surfaces of flat objects (e.g., a top of a box) or semi-flat objects (e.g., a page of an open book), or any suitable 2D surface. The planar surface may have a slope. For example, the planar surface may be the exterior of a triangular roof. In one example, the mobile client 100 transmits an image from a camera view (e.g., field of view from the camera 102) of a park with grass, a bench, and a tree. Planar surfaces depicted in this image include the grass and the bench (e.g., the seat and the back of the bench). The surface distinction application 130 may provide the received image to the AR engine 120 to identify feature points within the image. An interface of the surface distinction application such as the network interface 210 or the application interface (e.g., for surface distinction applications hosted on the mobile client 100) may receive 302 the image.

The surface distinction application 130 receives 304 feature points associated with the planar surfaces. The AR engine 120 may identify feature points within the received image to provide to the surface distinction application 130. The received feature points may be characterized by coordinates to indicate the position of distinct, 3D features within the 2D image relative to other objects within the image. For example, a coordinate corresponding to a point on a bench within an image of a park may be represented by Cartesian coordinates. The coordinate may indicate the depth of the point on the bench. The depth may be represented as a distance away from a reference point (e.g., an origin in the 3D virtual coordinate space), which may be a point on an object within the image or the camera 102. An interface of the surface distinction application such as the network interface 210 or the application interface (e.g., for surface distinction applications hosted on the mobile client 100) may receive 304 the feature points.

The surface distinction application 130 identifies 306 clusters based on the feature points, each cluster corresponding to a respective planar surface of the planar surfaces. The cluster module 220 may apply a RANSAC algorithm to the image to identify clusters of feature points that correspond to the respective planar surfaces. For example, the cluster module 220 identifies a first cluster of feature points identified in the image of the park that correspond to where in the image that grass is depicted and a second cluster of feature points that correspond to the bench seat.

The surface distinction application 130 determines 308, based on the identified clusters, locations of the planar surfaces in a 3D virtual coordinate space. The cluster module 220 may generate a 3D virtual coordinate space using the received feature points. Using the generated 3D virtual coordinate space, the mesh generation module 230 may determine the coordinates of the identified clusters, where those coordinates indicate the locations of the planar surfaces. The mesh generation module 230 may generate meshes using the feature points in the identified clusters and determine additional coordinates outlining the shape of the surface. For example, the mesh generation module 230 determines the coordinates of the feature points in a cluster representing a bench in an image of a park and generates, using those feature points, a 3D mesh (e.g., using Delaunay triangulation) where the 3D mesh is characterized by more coordinates than the feature points in the cluster. Each coordinate composing the 3D mesh may be a location of a planar surface of the bench.

The surface distinction application 130 provides 310 for display at the mobile client an AR object, where the AR object is provided for display at a location of the determined locations. The rendering module 240 may provide 310 an AR object (e.g., an AR ball) for display at the mobile client 100, where the AR ball is displayed as resting on the surface of a real-world bench captured within the camera view of the mobile client 100. In this example, the mesh generation module 230 may have determined at least one location of a surface of the bench (e.g., using coordinates of the 3D mesh of the bench within the 3D virtual coordinate space). The AR ball may be rendered by the AR engine 120 and positioned, by the rendering module 240, within the 3D virtual coordinate space at a location that is equivalent to a coordinate of the 3D mesh of the bench such that the AR ball appears to rest atop a surface of the bench.

FIGS. 4A and 4B are flowcharts illustrating an example process 400 for providing an AR object for display based on the process 300 of FIG. 3, in accordance with at least one embodiment. The process 400 may be performed by the surface distinction application 130. The surface distinction application 130 may perform operations of the process 400 in parallel or in different orders, or may perform different, additional, or fewer steps. Furthermore, while an image is referenced in the process 400, the surface distinction application may also be applied to videos.

The surface distinction application 130 generates 402 a 3D virtual coordinate space following receiving 304 feature points associated with planar surfaces (e.g., from the AR engine 120). The cluster module 220 may use the received feature points to generate the coordinate space. The feature points may be characterized by a coordinate system when received such that the feature points indicate at least a height and a depth for features in an image. The cluster module 220 may use the coordinate system provided, convert the feature points to another coordinate system, or perform coordinate transformation to rotate axes.
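If the received feature points use axes that differ from the desired virtual coordinate space (for example, a tilted “up” direction), the axis rotation mentioned above can be performed with a standard rotation matrix; the sketch below aligns an assumed surface normal with the +Z axis and is not a disclosed implementation.

    import numpy as np

    def rotation_aligning(vector, target=np.array([0.0, 0.0, 1.0])):
        """Rotation matrix that rotates `vector` onto `target` (e.g., aligning a
        detected floor normal with the +Z axis of the virtual coordinate space)."""
        a = vector / np.linalg.norm(vector)
        b = target / np.linalg.norm(target)
        v, c = np.cross(a, b), a.dot(b)
        if np.isclose(c, -1.0):                   # vector opposite to +Z: rotate 180 degrees about X
            return np.diag([1.0, -1.0, -1.0])
        skew = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
        return np.eye(3) + skew + skew @ skew / (1.0 + c)

    # Feature points whose "up" direction is tilted; rotate them so Z measures height.
    R = rotation_aligning(np.array([0.0, 0.2, 0.98]))
    points = np.array([[0.1, 0.2, 0.0], [0.5, 0.4, 0.7]])
    print(points @ R.T)   # points expressed in the rotated coordinate system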

The surface distinction application 130 identifies 404 and 406 a first and second cluster, respectively, from the feature points. The cluster module 220 may perform identification 404 and 406 as part of identifying 306 clusters. Each identified cluster may correspond to a respective planar surface. Continuing the example described with reference to the process 300, the cluster module 220 may identify 404 a first cluster from the feature points corresponding to grass in an image of a park and identify 406 a second cluster from the feature points corresponding to a bench in the image.

By way of example, the surface distinction application 130 generates 408 and 410 a first and a second mesh, respectively. The mesh generation module 230 may generate 408 a first 3D mesh from the first cluster and generate 410 a second 3D mesh from the second cluster. For example, the first mesh represents the grass identified by the first cluster of feature points and the second mesh represents the bench identified by the second cluster of feature points. The first mesh may be associated with a first Z-coordinate in the 3D virtual coordinate space. The first mesh may include a feature point with a Z-coordinate of 0 to indicate a height of the grass in the image relative to other objects (e.g., a mesh with a negative Z-coordinate has a feature that is below the point at the grass with a Z-coordinate of 0). The second mesh may be associated with a second Z-coordinate in the 3D virtual coordinate space. The second mesh may include a feature point with a non-zero Z-coordinate to indicate that a point on the bench has a surface that is higher than the grass at the Z-coordinate of 0. The 3D meshes may also be similarly characterized by X and Y coordinates, which may be used to indicate depths of surfaces relative to other surfaces in the image.

The surface distinction application 130 determines 412 a height difference between a first planar surface and a second planar surface based on the first and second Z-coordinates. The mesh generation module 230 may determine 412 a difference between two Z-coordinates to determine the height difference between two surfaces at the respective points. For example, the mesh generation module 230 determines a height difference between grass and a bench depicted in an image of a park using Z-coordinates of feature points of the respective 3D meshes of the grass and bench.

The surface distinction application 130 provides 414 for display at the mobile client indications of respective locations of the first and second planar surfaces. The indications of the locations of surfaces identified by the surface distinction application 130 may be the 3D meshes generated by the mesh generation module 230. The rendering module 240 may provide 414 3D meshes of identified planar surfaces for display at the mobile client 100 (e.g., using the screen 103), where the 3D meshes are displayed overlaying the corresponding surfaces. For example, the rendering module 240 may provide 414 a 3D mesh of the grass overlaying the grass in the image of the park and provide 414 a 3D mesh of the bench overlaying the surface of the bench. The user of the mobile client 100 may see, on the screen 103, the image and the 3D meshes overlaying the image.

The surface distinction application 130 receives 416 a user interaction between the first and second planar surfaces. In one example, the network interface 210 receives 416 a user interaction of a swipe of the user's finger across the screen 103 between a starting coordinate at the 3D mesh corresponding to the grass and an ending coordinate at the 3D mesh corresponding to the bench.

The surface distinction application provides 418 for display at the mobile client an AR object. The AR object may be configured to interact with the first and second planar surfaces based on the user interaction. For example, the rendering module 240 provides 418 for display an AR avatar at the mobile client 100 that is configured to travel between the grass and the bench at the locations where the user's finger swiped across the image. The rendering module 240 may provide 418 the AR object for display as a part of providing 310 for display the AR object at a determined location of a planar surface in the 3D virtual coordinate space.

Example AR Application with Surface Distinction

FIGS. 5A, 5B, 5C, 5D, and 5E illustrate an example process for controlling an AR object based on user interactions, at a mobile client, with planar surfaces detected by an AR system, in accordance with at least one embodiment. Various environments, both real and virtual, are depicted in FIGS. 5A-5E. As referred to herein, an “environment” may refer to a real-world environment and a “virtual environment” may refer to an environment that has been captured by a computer (e.g., via imaging) for processing and may not necessarily be presented to the user of the mobile client 100. Virtual environments are presented herein to promote clarity in describing the process for identifying surfaces.

FIG. 5A shows an environment 500a of a living room where example objects and respective first, second, and third surfaces 510, 511, and 512 exist. In this example, the first surface 510 is a floor of the living room, the second surface 511 is a surface of a table on the surface 510, and the third surface 512 is a surface of a book on the surface 511. Although not shown in FIG. 5A, a mobile client 100 may capture an image or video of the environment 500a while using the AR client 101. A captured image may be transmitted to the surface distinction application 130 for processing, as shown in FIGS. 5B and 5C. The network interface 210 may receive this image.

FIG. 5B shows a virtual environment 500b where feature points 520 have been identified within an image of the environment 500a that includes depictions of surfaces 510, 511, and 512. The feature points 520 depicted are examples of feature points and may vary in size and/or density, and there may be more or fewer points than depicted in FIG. 5B. The feature points 520 may indicate corners and edges of surfaces depicted within the image, which may correspond to boundary points of surfaces. For example, the surface 511 corresponding to the top of a table is represented by feature points. The cluster module 220 may apply a RANSAC algorithm to identify a 2D surface, from the feature points, that corresponds to the top of the table. The cluster module 220 may generate a 3D virtual coordinate space using the feature points 520.

FIG. 5C shows a virtual environment 500c where first, second, and third 3D meshes 530, 531, and 532 have been identified within the environment 500b for respective first, second, and third surfaces 510, 511, and 512. The mesh generation module 230 may use clusters of feature points 520 shown in environment 500b to generate the 3D meshes for the surfaces depicted within the image of environment 500a. The rendering module 240 may provide the generated meshes for display at the mobile client 100 to indicate to a user the surfaces identified by the surface distinction application 130.

FIG. 5D shows an environment 500d where a mobile client 100 is executing the AR client 101 while capturing an image of the living room of environment 500a. The mobile client 100 displays depictions of surfaces 510, 511, and 512, an AR avatar 540, and one or more AR stepping stones 541. The rendering module 240 may provide the AR avatar 540 and the AR stepping stones 541 for display on the mobile client 100. The user of the mobile client 100 may swipe his finger across the display, where the swipe may be characterized by a starting coordinate on the 3D virtual coordinate space corresponding to the surface 510 of the living room floor and an ending coordinate corresponding to the surface 511 of the top of the table. This user interaction may be received by the network interface 210 as a request for the rendering module 240 to render the AR stepping stones 541 along the path of the swipe. The rendering module 240 may determine a path within the 3D coordinate space that maps to a line between the starting and ending coordinates and generate the AR stepping stones 541 along the determined line.

FIG. 5E shows an environment 500e where the mobile client 100 displays the AR avatar 540 at a location of an identified surface 511, the top of the table. Although not shown, the user may have performed an additional user interaction with the display of the mobile client 100 to request that the AR avatar 540 move up the AR stepping stones 541. The network interface 210 may receive the user interaction and the rendering module 240 may provide the AR avatar 540 in a state depicted as traveling from the floor to the table. The user may continue to perform user interactions such as tapping the display of the mobile client 100 where the surface 512 of the book is depicted to request that the AR avatar 540 interact with the surface 512 (e.g., sit on the book). The rendering module 240 may decrease the size of the AR objects depending on their location within the 3D virtual coordinate space to promote a natural appearance when displayed on the mobile client 100. For example, as the AR avatar 540 moves from the floor to the table, the rendering module 240 may decrease the size ratio of the AR avatar 540 to indicate that the AR object is moving away from the user and towards the table.

Computing Machine Architecture

FIG. 6 is a block diagram illustrating components of an example machine able to read instructions from a machine-readable medium and execute them in a processor (or controller). Specifically, FIG. 6 shows a diagrammatic representation of a machine in the example form of a computer system 600 within which program code (e.g., software) for causing the machine to perform any one or more of the methodologies discussed herein may be executed. The program code may correspond to functional configuration of the modules and/or processes described with FIGS. 1-5E. The program code may be comprised of instructions 624 executable by one or more processors 602. In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server machine or a client machine in a server-client network environment, or as a peer machine in a peer-to-peer (or distributed) network environment.

The machine may be a portable computing device or machine (e.g., smartphone, tablet, wearable device (e.g., smartwatch)) capable of executing instructions 624 (sequential or otherwise) that specify actions to be taken by that machine. Further, while only a single machine is illustrated, the term “machine” shall also be taken to include any collection of machines that individually or jointly execute instructions 624 to perform any one or more of the methodologies discussed herein.

The example computer system 600 includes at least one processor 602 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP), one or more application specific integrated circuits (ASICs), one or more radio-frequency integrated circuits (RFICs), or any combination of these), a main memory 604, and a static memory 606, which are configured to communicate with each other via a bus 608. The computer system 600 may further include visual display interface 610. The visual interface may include a software driver that enables displaying user interfaces on a screen (or display). The visual interface may display user interfaces directly (e.g., on the screen) or indirectly on a surface, window, or the like (e.g., via a visual projection unit). For ease of discussion the visual interface may be described as a screen. The visual interface 610 may include or may interface with a touch enabled screen. The computer system 600 may also include alphanumeric input device 612 (e.g., a keyboard or touch screen keyboard), a cursor control device 614 (e.g., a mouse, a trackball, a joystick, a motion sensor, or other pointing instrument), a storage unit 616, a signal generation device 618 (e.g., a speaker), and a network interface device 620, which also are configured to communicate via the bus 608.

The storage unit 616 includes a machine-readable medium 622 on which is stored instructions 624 (e.g., software) embodying any one or more of the methodologies or functions described herein. The instructions 624 (e.g., software) may also reside, completely or at least partially, within the main memory 604 or within the processor 602 (e.g., within a processor's cache memory) during execution thereof by the computer system 600, the main memory 604 and the processor 602 also constituting machine-readable media. The instructions 624 (e.g., software) may be transmitted or received over a network 626 via the network interface device 620.

While machine-readable medium 622 is shown in an example embodiment to be a single medium, the term “machine-readable medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, or associated caches and servers) able to store instructions (e.g., instructions 624). The term “machine-readable medium” shall also be taken to include any medium that is capable of storing instructions (e.g., instructions 624) for execution by the machine and that cause the machine to perform any one or more of the methodologies disclosed herein. The term “machine-readable medium” includes, but is not limited to, data repositories in the form of solid-state memories, optical media, and magnetic media.

Additional Configuration Considerations

While using an AR application (e.g., an AR game), a user may want to have an AR object interact with surfaces in the real-world environment. Conventional implementations for mobile clients do not allow for surface distinction. Since conventional systems are unable to spatially differentiate surfaces, they limit AR objects to interact with a single surface and may cause AR objects to be displayed disproportionate to a particular surface without knowledge of the depth or height of one surface relative to another. The methods described herein detect surfaces within an image and determine spatial relationships between the surfaces through clustering and mesh generation. Furthermore, the methods described herein may selectively store feature point data corresponding to the most recent camera view of the phone. For example, a user may direct their mobile phone's camera at a table, but then turn around to capture a chair. The feature points corresponding to the table may be discarded. Thus, by storing information from a recent camera view, the methods described herein promote efficient memory use and as a consequence, promote efficient power and processing use as well by not expending those resources on the data that was discarded. Additionally, the methods herein include a clustering algorithm that scans identified feature points, groups the ones closest together, and generates a mesh. By using a generated mesh instead of individual feature points to detect surfaces, processing resources of a device are optimized (e.g., processing one mesh vs. millions of feature points). Accordingly, the methods described herein enable surface distinction on mobile client rendered AR systems without consuming excessive amounts of processing power, thus presenting an immersive gaming experience to the user.

Throughout this specification, plural instances may implement components, operations, or structures described as a single instance. Although individual operations of one or more methods are illustrated and described as separate operations, one or more of the individual operations may be performed concurrently, and nothing requires that the operations be performed in the order illustrated. Structures and functionality presented as separate components in example configurations may be implemented as a combined structure or component. Similarly, structures and functionality presented as a single component may be implemented as separate components. These and other variations, modifications, additions, and improvements fall within the scope of the subject matter herein.

Certain embodiments are described herein as including logic or a number of components, modules, or mechanisms. Modules may constitute either software modules (e.g., code embodied on a machine-readable medium or in a transmission signal) or hardware modules. A hardware module is a tangible unit capable of performing certain operations and may be configured or arranged in a certain manner. In example embodiments, one or more computer systems (e.g., a standalone, client, or server computer system) or one or more hardware modules of a computer system (e.g., a processor or a group of processors) may be configured by software (e.g., an application or application portion) as a hardware module that operates to perform certain operations as described herein.

In various embodiments, a hardware module may be implemented mechanically or electronically. For example, a hardware module may comprise dedicated circuitry or logic that is permanently configured (e.g., as a special-purpose processor, such as a field programmable gate array (FPGA) or an application-specific integrated circuit (ASIC)) to perform certain operations. A hardware module may also comprise programmable logic or circuitry (e.g., as encompassed within a general-purpose processor or other programmable processor) that is temporarily configured by software to perform certain operations. It will be appreciated that the decision to implement a hardware module mechanically, in dedicated and permanently configured circuitry, or in temporarily configured circuitry (e.g., configured by software) may be driven by cost and time considerations.

Accordingly, the term “hardware module” should be understood to encompass a tangible entity, be that an entity that is physically constructed, permanently configured (e.g., hardwired), or temporarily configured (e.g., programmed) to operate in a certain manner or to perform certain operations described herein. As used herein, “hardware-implemented module” refers to a hardware module. Considering embodiments in which hardware modules are temporarily configured (e.g., programmed), each of the hardware modules need not be configured or instantiated at any one instance in time. For example, where the hardware modules comprise a general-purpose processor configured using software, the general-purpose processor may be configured as respective different hardware modules at different times. Software may accordingly configure a processor, for example, to constitute a particular hardware module at one instance of time and to constitute a different hardware module at a different instance of time.

Hardware modules can provide information to, and receive information from, other hardware modules. Accordingly, the described hardware modules may be regarded as being communicatively coupled. Where multiple of such hardware modules exist contemporaneously, communications may be achieved through signal transmission (e.g., over appropriate circuits and buses) that connect the hardware modules. In embodiments in which multiple hardware modules are configured or instantiated at different times, communications between such hardware modules may be achieved, for example, through the storage and retrieval of information in memory structures to which the multiple hardware modules have access. For example, one hardware module may perform an operation and store the output of that operation in a memory device to which it is communicatively coupled. A further hardware module may then, at a later time, access the memory device to retrieve and process the stored output. Hardware modules may also initiate communications with input or output devices, and can operate on a resource (e.g., a collection of information).

The various operations of example methods described herein may be performed, at least partially, by one or more processors that are temporarily configured (e.g., by software) or permanently configured to perform the relevant operations. Whether temporarily or permanently configured, such processors may constitute processor-implemented modules that operate to perform one or more operations or functions. The modules referred to herein may, in some example embodiments, comprise processor-implemented modules.

Similarly, the methods described herein may be at least partially processor-implemented. For example, at least some of the operations of a method may be performed by one or more processors or processor-implemented hardware modules. The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the processor or processors may be located in a single location (e.g., within a home environment, an office environment, or a server farm), while in other embodiments the processors may be distributed across a number of locations.

The one or more processors may also operate to support performance of the relevant operations in a “cloud computing” environment or as a “software as a service” (SaaS). For example, at least some of the operations may be performed by a group of computers (as examples of machines including processors), these operations being accessible via a network (e.g., the Internet) and via one or more appropriate interfaces (e.g., application program interfaces (APIs)).

The performance of certain of the operations may be distributed among the one or more processors, not only residing within a single machine, but deployed across a number of machines. In some example embodiments, the one or more processors or processor-implemented modules may be located in a single geographic location (e.g., within a home environment, an office environment, or a server farm). In other example embodiments, the one or more processors or processor-implemented modules may be distributed across a number of geographic locations.

Some portions of this specification are presented in terms of algorithms or symbolic representations of operations on data stored as bits or binary digital signals within a machine memory (e.g., a computer memory). These algorithms or symbolic representations are examples of techniques used by those of ordinary skill in the data processing arts to convey the substance of their work to others skilled in the art. As used herein, an “algorithm” is a self-consistent sequence of operations or similar processing leading to a desired result. In this context, algorithms and operations involve physical manipulation of physical quantities. Typically, but not necessarily, such quantities may take the form of electrical, magnetic, or optical signals capable of being stored, accessed, transferred, combined, compared, or otherwise manipulated by a machine. It is convenient at times, principally for reasons of common usage, to refer to such signals using words such as “data,” “content,” “bits,” “values,” “elements,” “symbols,” “characters,” “terms,” “numbers,” “numerals,” or the like. These words, however, are merely convenient labels and are to be associated with appropriate physical quantities.

Unless specifically stated otherwise, discussions herein using words such as “processing,” “computing,” “calculating,” “determining,” “presenting,” “displaying,” or the like may refer to actions or processes of a machine (e.g., a computer) that manipulates or transforms data represented as physical (e.g., electronic, magnetic, or optical) quantities within one or more memories (e.g., volatile memory, non-volatile memory, or a combination thereof), registers, or other machine components that receive, store, transmit, or display information.

As used herein any reference to “one embodiment” or “an embodiment” means that a particular element, feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.

Some embodiments may be described using the expressions “coupled” and “connected” along with their derivatives. It should be understood that these terms are not intended as synonyms for each other. For example, some embodiments may be described using the term “connected” to indicate that two or more elements are in direct physical or electrical contact with each other. In another example, some embodiments may be described using the term “coupled” to indicate that two or more elements are in direct physical or electrical contact. The term “coupled,” however, may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other. The embodiments are not limited in this context.

As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).

In addition, use of “a” or “an” is employed to describe elements and components of the embodiments herein. This is done merely for convenience and to give a general sense of the invention. This description should be read to include one or at least one, and the singular also includes the plural unless it is obvious that it is meant otherwise.

Upon reading this disclosure, those of skill in the art will appreciate still additional alternative structural and functional designs for a system and a process for surface distinction in an augmented reality system executed on a mobile client through the disclosed principles herein. Thus, while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims.

Claims

1. A non-transitory computer readable storage medium comprising stored instructions, the instructions when executed by a processor cause the processor to:

receive an image from a camera of a mobile client, the image depicting a plurality of planar surfaces;
receive a plurality of feature points associated with the plurality of planar surfaces, the plurality of feature points identified within the image;
generate, based on the plurality of feature points, a three-dimensional (3D) virtual coordinate space, each feature point of the plurality of feature points having corresponding coordinates within the 3D virtual coordinate space;
identify a first cluster from the plurality of feature points, the first cluster corresponding to a first planar surface of the plurality of planar surfaces;
generate a first mesh from the first cluster, the first mesh associated with a first Z-coordinate in the 3D virtual coordinate space;
identify a second cluster from the plurality of feature points, the second cluster corresponding to a second planar surface of the plurality of planar surfaces;
generate a second mesh from the second cluster, the second mesh associated with a second Z-coordinate in the 3D virtual coordinate space;
determine, based on the first Z-coordinate and second Z-coordinate, a height difference between the first planar surface and the second planar surface;
receive a user interaction between the first and second planar surfaces; and
provide for display at the mobile client an augmented reality (AR) object from an AR engine, the AR object configured to interact with the first and second planar surfaces based on the user interaction.

2. The non-transitory computer readable storage medium of claim 1, wherein instructions to identify the first cluster from the plurality of feature points comprise further instructions that when executed by the processor cause the processor to apply a random sample consensus (RANSAC) algorithm to the received image.

3. The non-transitory computer readable storage medium of claim 1, wherein the first mesh and second mesh are associated with approximations of respective shapes of real-world objects.

4. The non-transitory computer readable storage medium of claim 1, wherein the instructions further comprise instructions that when executed by the processor cause the processor to provide for display the first mesh and the second mesh on the mobile client.

5. The non-transitory computer readable storage medium of claim 1, wherein the user interaction comprises a user input indicating a starting coordinate corresponding to the first planar surface and an ending coordinate corresponding to the second planar surface.

6. The non-transitory computer readable storage medium of claim 5, wherein the instructions further comprise instructions that when executed by the processor cause the processor to provide for display the AR engine rendered object appearing to move from the starting coordinate at the first planar surface to the ending coordinate at the second planar surface.

7. The non-transitory computer readable storage medium of claim 5, wherein instructions to provide for display at the mobile client the AR object from the AR engine comprise further instructions that when executed by the processor cause the processor to:

determine, based on the starting coordinate, a first ratio of a plurality of dimensions of the AR object;
determine, based on the ending coordinate, a second ratio of the plurality of dimensions of the AR object;
provide for display the AR object using the first ratio at the starting coordinate at a first time; and
provide for display the AR object using the second ratio at the ending coordinate at a second time subsequent to the first time.
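As a reading aid only, and not as part of the claim language, the following Python sketch shows one way a per-coordinate display ratio such as the first and second ratios recited above could be derived; the inverse-distance scaling rule, the camera origin, and all names are assumptions for illustration.

    # Hypothetical sketch: scale an AR object's display dimensions by the
    # distance of a surface coordinate from the camera so the object appears
    # proportionate to whichever surface it currently occupies. The
    # inverse-distance rule is an assumption, not the claimed computation.
    import math

    def display_ratio(coordinate, camera=(0.0, 0.0, 0.0), reference_distance=1.0):
        """Return a scale factor for the AR object's dimensions at this coordinate."""
        distance = math.dist(camera, coordinate)
        return reference_distance / max(distance, 1e-6)

    start = (0.2, 0.1, 0.75)   # e.g., a point on a nearby tabletop
    end = (1.5, 0.4, 0.00)     # e.g., a point on the floor, farther away

    first_ratio = display_ratio(start)    # applied at a first time
    second_ratio = display_ratio(end)     # applied at a second, later time

Under this assumed rule, the AR object rendered at the nearer starting coordinate appears larger than the same object rendered at the farther ending coordinate at the later time.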

8. The non-transitory computer readable storage medium of claim 5, wherein the instructions further comprise instructions that when executed by the processor cause the processor to:

determine a path in the 3D virtual coordinate space comprising the starting coordinate and the ending coordinate; and
provide for display a plurality of AR engine rendered objects on the mobile client along the determined path.

9. The non-transitory computer readable storage medium of claim 1, wherein instructions to generate the first mesh from the first cluster comprise further instructions that when executed by the processor cause the processor to:

apply a machine learning model to the image, the machine learning model configured to classify real-world objects in the environment;
determine, based on a classification of an object in the environment by the machine learning model, an object type, the object including the first planar surface; and
generate, based on the determined object type and the first cluster, the first mesh.
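For illustration only, and not as claim language, a small Python sketch of generating a mesh from a cluster using a classified object type follows; the template names, the classifier label, and the bounding-box fit are hypothetical assumptions.

    # Hypothetical sketch: an object type returned by a classifier selects a
    # mesh template, which is then fit to the cluster's planar extent.
    MESH_TEMPLATES = {"table": "flat_rectangle", "floor": "ground_plane", "chair": "seat_plane"}

    def mesh_for_cluster(object_type, cluster_points):
        """Pick a template by object type, then fit it to the cluster's extent."""
        template = MESH_TEMPLATES.get(object_type, "generic_plane")
        xs = [p[0] for p in cluster_points]
        ys = [p[1] for p in cluster_points]
        zs = [p[2] for p in cluster_points]
        return {"template": template,
                "bounds": ((min(xs), min(ys)), (max(xs), max(ys))),
                "z": sum(zs) / len(zs)}

    # Example: a classifier (not shown) labels the object containing the first
    # planar surface as a "table".
    first_mesh = mesh_for_cluster("table", [(0.2, 0.4, 0.74), (0.5, 0.3, 0.76)])

Here the object type merely selects a template; a real system would fit the mesh to the full cluster geometry produced by the clustering step.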

10. The non-transitory computer readable storage medium of claim 1, wherein the AR engine is a game engine.

11. A method comprising:

receiving an image from a camera of a mobile client, the image depicting a plurality of planar surfaces;
receiving a plurality of feature points associated with the plurality of planar surfaces, the plurality of feature points identified within the image;
generating, based on the plurality of feature points, a three-dimensional (3D) virtual coordinate space, each feature point of the plurality of feature points having corresponding coordinates within the 3D virtual coordinate space;
identifying a first cluster from the plurality of feature points, the first cluster corresponding to a first planar surface of the plurality of planar surfaces;
generating a first mesh from the first cluster, the first mesh associated with a first Z-coordinate in the 3D virtual coordinate space;
identifying a second cluster from the plurality of feature points, the second cluster corresponding to a second planar surface of the plurality of planar surfaces;
generating a second mesh from the second cluster, the second mesh associated with a second Z-coordinate in the 3D virtual coordinate space;
determining, based on the first Z-coordinate and second Z-coordinate, a height difference between the first planar surface and the second planar surface;
receiving a user interaction between the first and second planar surfaces; and
providing for display at the mobile client an augmented reality (AR) object from an AR engine, the AR object configured to interact with the first and second planar surfaces based on the user interaction.

12. The method of claim 11, wherein the user interaction comprises a user input indicating a starting coordinate corresponding to the first planar surface and an ending coordinate corresponding to the second planar surface.

13. The method of claim 12, further comprising providing for display the AR object appearing to move from the starting coordinate at the first planar surface to the ending coordinate at the second planar surface.

14. The method of claim 12, wherein providing for display at the mobile client the AR object from the AR engine comprises:

determining, based on the starting coordinate, a first ratio of a plurality of dimensions of the AR object;
determining, based on the ending coordinate, a second ratio of the plurality of dimensions of the AR object;
providing for display the AR object using the first ratio at the starting coordinate at a first time; and
providing for display the AR object using the second ratio at the ending coordinate at a second time subsequent to the first time.

15. The method of claim 12, further comprising:

determining a path in the 3D virtual coordinate space comprising the starting coordinate and the ending coordinate; and
providing for display a plurality of AR objects on the mobile client along the determined path.

16. A system comprising:

an application interface configured to: receive an image from a camera of a mobile client, the image depicting a plurality of planar surfaces; and receive a plurality of feature points associated with the plurality of planar surfaces, the plurality of feature points identified within the image;
a cluster module configured to: generate, based on the plurality of feature points, a three-dimensional (3D) virtual coordinate space, each feature point of the plurality of feature points having corresponding coordinates within the 3D virtual coordinate space; identify a first cluster from the plurality of feature points, the first cluster corresponding to a first planar surface of the plurality of planar surfaces; and identify a second cluster from the plurality of feature points, the second cluster corresponding to a second planar surface of the plurality of planar surfaces;
a mesh generation module configured to: generate a first mesh from the first cluster, the first mesh associated with a first Z-coordinate in the 3D virtual coordinate space; and generate a second mesh from the second cluster, the second mesh associated with a second Z-coordinate in the 3D virtual coordinate space;
a rendering module configured to determine, based on the first Z-coordinate and second Z-coordinate, a height difference between the first planar surface and the second planar surface;
the application interface further configured to receive a user interaction between the first and second planar surfaces; and
the rendering module further configured to provide for display at the mobile client an augmented reality (AR) object from an AR engine, the AR object configured to interact with the first and second planar surfaces based on the user interaction.

17. The system of claim 16, wherein the user interaction comprises a user input indicating a starting coordinate corresponding to the first planar surface and an ending coordinate corresponding to the second planar surface.

18. The system of claim 17, wherein the rendering module is further configured to provide for display the AR object appearing to move from the starting coordinate at the first planar surface to the ending coordinate at the second planar surface.

19. The system of claim 17, wherein the rendering module is configured to provide for display at the mobile client the AR object from the AR engine by:

determining, based on the starting coordinate, a first ratio of a plurality of dimensions of the AR object;
determining, based on the ending coordinate, a second ratio of the plurality of dimensions of the AR object;
providing for display the AR object using the first ratio at the starting coordinate at a first time; and
providing for display the AR object using the second ratio at the ending coordinate at a second time subsequent to the first time.

20. A method comprising:

receiving an image from a camera of a mobile client, the image depicting a plurality of planar surfaces;
receiving a plurality of feature points associated with the plurality of planar surfaces, the plurality of feature points identified within the image;
identifying a plurality of clusters based on the plurality of feature points, each cluster corresponding to a respective planar surface of the plurality of planar surfaces;
determining, based on the identified plurality of clusters, a plurality of locations of the plurality of planar surfaces in a three-dimensional (3D) virtual coordinate space; and
providing for display at the mobile client an augmented reality (AR) object, the AR object at a location of the plurality of determined locations.

21. The method of claim 20, further comprising:

determining a path in the 3D virtual coordinate space comprising a starting coordinate and an ending coordinate, the starting coordinate corresponding to the location of the plurality of determined locations; and
providing for display the AR object appearing to move along the determined path.
Patent History
Publication number: 20210248826
Type: Application
Filed: Feb 8, 2021
Publication Date: Aug 12, 2021
Inventors: Ketaki Lalitha Uthra Shriram (San Francisco, CA), Jhanvi Samyukta Lakshmi Shriram (San Francisco, CA), Luis Pedro Oliveira da Costa Fonseca (Porto)
Application Number: 17/170,431
Classifications
International Classification: G06T 19/00 (20060101); G06T 7/73 (20060101); G06T 15/20 (20060101); G06T 7/50 (20060101); A63F 13/537 (20060101);