LOCALIZATION AND MAPPING UTILIZING VISUAL ODOMETRY
In one embodiment, a method includes determining correspondence data between a sequence of images based on identified features in the sequence of images and a predicted pose based on motion data, and determining current state information based on the correspondence data and the motion data. The current state information comprises at least a current pose of the wearable device relative to the environment captured by the one or more cameras. Furthermore, the method comprises receiving map points in a three-dimensional map and their associated descriptors based on the identified features in the sequence of images, and identifying one or more of the map points in the sequence of images based on the descriptors associated with those map points. The current state information is further determined based on the identified one or more of the map points.
This disclosure generally relates to simultaneous localization and mapping (SLAM), and more specifically methods, apparatus, and system for SLAM using visual inertial odometry.
BACKGROUND
Mobile devices like AR/VR headsets face several practical design constraints, such as the need to minimize power consumption, in-device memory requirements, and weight. An important feature of AR/VR devices is the ability to solve the simultaneous localization and mapping problem, which is needed to enable, for example, world-locked rendering, such as displaying a virtual pet at the same spot on a real-world table regardless of where the viewer moves. However, achieving this feature requires either a large memory to store a map or continuously retrieving a live map online. Since accessing and storing map data is expensive, bulky, and power-consuming, it is desirable for AR/VR devices to be able to solve for their own localization, locally and globally, with optimized power performance and mobility.
SUMMARY OF PARTICULAR EMBODIMENTS
To address the foregoing problems, disclosed are methods, apparatuses, and a system to perform simultaneous localization and mapping (SLAM) using visual inertial odometry (VIO). The present disclosure provides a self-sufficient VIO-based SLAM tracking system which comprises a tracking engine and a mapping engine to resolve the above issues. The tracking engine comprises a tracking unit, a filter unit, and an inertial measurement unit (IMU) integration unit to determine a location and a state of a user. The tracking unit is configured to find correspondences between observed objects in sequential frames (e.g., by matching the descriptors of each patch). To help with the search for correspondences, the tracking unit may leverage predicted poses generated by the IMU integration unit, so that the tracking process may be performed as a guided search. The filter unit receives the correspondences processed by the tracking unit, along with relevant IMU data, and generates state information for a wearable device. Furthermore, the mapping engine may perform global mapping operations with the tracking engine at a much lower frequency than the tracking engine to be cost-efficient and power-saving.
The embodiments disclosed herein are only examples, and the scope of this disclosure is not limited to them. Particular embodiments may include all, some, or none of the components, elements, features, functions, operations, or steps of the embodiments disclosed herein. According to one embodiment of a method, the method comprises, by a computing system, receiving, at an IMU integration unit, motion data captured by one or more motion sensors of a wearable device. The method further comprises generating, at the IMU integration unit, a predicted pose of the wearable device based on the motion data of the wearable device. The method yet further comprises receiving, at a tracking unit, a sequence of images of an environment captured by one or more cameras. The method additionally comprises identifying, at the tracking unit, features in the sequence of images. The method additionally comprises determining, at the tracking unit, correspondence data between the sequence of images based on the identified features in the sequence of images and the predicted pose received from the IMU integration unit. The method additionally comprises determining, at a filter unit, current state information of the wearable device based on the correspondence data received from the tracking unit and the motion data received from the IMU integration unit. The current state information comprises at least a current pose of the wearable device relative to the environment captured by the one or more cameras. Furthermore, the method comprises receiving, at the tracking unit, map points in a three-dimensional map and associated descriptors for the map points based on the features in the sequence of images. The method additionally comprises identifying, at the tracking unit, one or more of the map points in the sequence of images based on one or more of the descriptors associated with the one or more of the map points.
The current state information is further determined based on the identified one or more of the map points.
Embodiments according to the invention are in particular disclosed in the attached claims directed to a method, a storage medium, a system and a computer program product, wherein any feature mentioned in one claim category, e.g. method, can be claimed in another claim category, e.g. system, as well. The dependencies or references back in the attached claims are chosen for formal reasons only. However, any subject matter resulting from a deliberate reference back to any previous claims (in particular multiple dependencies) can be claimed as well, so that any combination of claims and the features thereof are disclosed and can be claimed regardless of the dependencies chosen in the attached claims. The subject-matter which can be claimed comprises not only the combinations of features as set out in the attached claims but also any other combination of features in the claims, wherein each feature mentioned in the claims can be combined with any other feature or combination of other features in the claims. Furthermore, any of the embodiments and features described or depicted herein can be claimed in a separate claim and/or in any combination with any embodiment or feature described or depicted herein or with any of the features of the attached claims.
Certain aspects of the present disclosure and their embodiments may provide solutions to these or other challenges. Proposed herein are various embodiments which address one or more of the issues disclosed herein. The methods disclosed in the present disclosure may provide a self-sufficient, VIO-based tracking engine to localize the device in an environment and provide current state information of the user, in order to realize simultaneous localization and mapping locally. Furthermore, the methods disclosed in the present disclosure also provide a mapping engine to assist the tracking engine with global mapping, so that the methods disclosed in the present disclosure may generate permanent virtual tags in the global map by integrating the built state information for other users. In addition, the mapping engine performs map retrieval at a much lower frequency than the tracking engine to save power and cost.
Particular embodiments of the present disclosure may include or be implemented in conjunction with an artificial reality system. Artificial reality is a form of reality that has been adjusted in some manner before presentation to a user, which may include, e.g., a virtual reality (VR), an augmented reality (AR), a mixed reality (MR), a hybrid reality, or some combination and/or derivatives thereof. Artificial reality content may include completely generated content or generated content combined with captured content (e.g., real-world photographs). The artificial reality content may include video, audio, haptic feedback, or some combination thereof, and any of which may be presented in a single channel or in multiple channels (such as stereo video that produces a three-dimensional effect to the viewer). Additionally, in some embodiments, artificial reality may be associated with applications, products, accessories, services, or some combination thereof, that are, e.g., used to create content in an artificial reality and/or used in (e.g., perform activities in) an artificial reality. The artificial reality system that provides the artificial reality content may be implemented on various platforms, including a head-mounted display (HMD) connected to a host computer system, a standalone HMD, a mobile device or computing system, or any other hardware platform capable of providing artificial reality content to one or more viewers.
The patent or application file contains drawings executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
The accompanying drawing figures incorporated in and forming a part of this specification illustrate several aspects of the disclosure, and together with the description serve to explain the principles of the disclosure.
Currently, AR/VR devices face multiple challenges, such as rendering a permanent virtual tag or object in a real-world map in a precise and cost-efficient way and manufacturing a lightweight wearable device. Retrieving an online map continuously to perform simultaneous localization and mapping is expensive and power-consuming. An existing solution that avoids retrieving the online map constantly is to equip the AR/VR devices with a memory for storing maps; however, the trade-off of this solution is reduced mobility of the AR/VR device because of the increased weight and volume. Particular embodiments disclosed in the present disclosure provide a self-sufficient VIO-based SLAM tracking system, which comprises a tracking engine and a mapping engine operating at different frequencies to provide continuous tracking of a pose of the user in an environment and a localization of the user in a live map.
Particular embodiments disclosed in the present disclosure provide a tracking engine in the tracking system comprising a tracking unit, an IMU integration unit, and a filter unit to generate a state of the user in an environment at high frequency. The filter unit in the present disclosure estimates the state of the user in the environment based on the correspondence data identified in a series of images sent from the tracking unit and aggregated IMU measurements sent from the IMU integration unit. Furthermore, the IMU integration unit provides predicted poses to the tracking unit to facilitate the identification of the correspondence data. The filter unit also sends the most recently updated state to the IMU integration unit to refine IMU measurements. Therefore, the tracking engine disclosed in the present disclosure is able to perform precise, self-sufficient tracking and localization for the user or a device.
Particular embodiments disclosed in the present disclosure further provide a mapping engine in the tracking system comprising a mapping unit. The mapping unit in the present disclosure retrieves a corresponding global map based on key points in the images sent from the tracking unit and the state of the user sent from the filter unit. The mapping unit may retrieve the corresponding map from an on-device storage or from a cloud, periodically or on demand, so that the tracking system may perform a global localization for the user in a cost-efficient way. In addition, the mapping unit disclosed in the present disclosure further builds or updates live maps or local maps based on the received key points in the images if needed. Furthermore, the mapping unit may send the mapped points in the maps, which correspond to the key points and the descriptors in the images, to an anchor interface for sharing with other users utilizing the same global map as a persistent anchor.
In particular embodiments, the tracking system 100 may be implemented in any suitable computing device, such as, for example, a personal computer, a laptop computer, a cellular telephone, a smartphone, a tablet computer, an augmented/virtual reality device, a head-mounted device, a portable smart device, a wearable smart device, or any suitable device which is compatible with the tracking system. In the present disclosure, a user which is being tracked and localized by the tracking device may refer to a device mounted on a movable object, such as a vehicle, or a device attached to a person. In the present disclosure, a user may be an individual (human user), an entity (e.g., an enterprise, business, or third-party application), or a group (e.g., of individuals or entities) that interacts or communicates with the tracking system 100. In particular embodiments, the IMU integration unit 118, the tracking unit 114, and the filter unit 116 in the tracking engine 110 are located within a head-mounted device, and the mapping unit 132 in the mapping engine 130 is implemented in a local computing device separate from the head-mounted device. In particular embodiments, the IMU integration unit 118 is located within a head-mounted device, and the tracking unit 114, the filter unit 116, and the mapping unit 132 are implemented in a local computing device separate from the head-mounted device. The local computing device comprises one or more processors configured to implement the tracking unit 114, the filter unit 116, and the mapping unit 132. In one embodiment, each of the processors is configured to implement the tracking unit 114, the filter unit 116, and the mapping unit 132 separately.
This disclosure contemplates any suitable network to connect each element in the tracking system 100 or to connect the tracking system 100 with other systems. As an example and not by way of limitation, one or more portions of network may include an ad hoc network, an intranet, an extranet, a virtual private network (VPN), a local area network (LAN), a wireless LAN (WLAN), a wide area network (WAN), a wireless WAN (WWAN), a metropolitan area network (MAN), a portion of the Internet, a portion of the Public Switched Telephone Network (PSTN), a cellular telephone network, or a combination of two or more of these. Network may include one or more networks.
The IMU integration unit 300 integrates rotational velocity measurements to track an orientation of the user, integrates acceleration measurements to track a velocity of the user, and double-integrates acceleration (using the orientation obtained from the rotational velocity) to track a position of the user. In particular embodiments, the IMU integration unit 300 determines predicted poses 310 of the user based on the rotational velocity and specific forces detected from the user (e.g., body acceleration plus gravity in the body frame) included in the raw IMU data 304. The IMU integration unit 300 sends the predicted poses 310 to a tracking unit 306 for assisting with feature search. The IMU integration unit 300 further aggregates one or more IMU measurements to provide pre-integration data 314 to a filter unit 312 for estimating a state 316 of the user. In particular embodiments, the IMU integration unit 300 may also receive the state 316 of the user from the filter unit 312 to calibrate its IMU measurements. Furthermore, the IMU integration unit 300 may send low-latency poses 320 to one or more warp engines 318 for late-stage warping. In particular embodiments, a low-latency pose 320 may be specific to a pose in a relatively short time period, for example, less than 0.5 seconds.
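The integration chain described above can be pictured in a few lines of code. The Python sketch below is only an illustration under simplifying assumptions (ideal, bias-free IMU; small-angle quaternion update; all function and variable names are hypothetical, not from the disclosure); a production integrator would also model sensor biases and noise.

```python
import numpy as np

def quat_mul(a, b):
    """Hamilton product of quaternions in [w, x, y, z] order."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_rotate(q, v):
    """Rotate vector v from body frame to world frame by unit quaternion q."""
    qv = np.concatenate([[0.0], v])
    q_conj = q * np.array([1.0, -1.0, -1.0, -1.0])
    return quat_mul(quat_mul(q, qv), q_conj)[1:]

def integrate_imu(q, v, p, gyro, accel, dt, g=np.array([0.0, 0.0, -9.81])):
    """One dead-reckoning step: gyro -> orientation, specific force
    (body-frame acceleration plus gravity reaction) -> velocity -> position."""
    dq = np.concatenate([[1.0], 0.5 * gyro * dt])  # small-angle orientation update
    q = quat_mul(q, dq)
    q = q / np.linalg.norm(q)
    a_world = quat_rotate(q, accel) + g            # remove gravity in world frame
    p = p + v * dt + 0.5 * a_world * dt * dt       # double integration -> position
    v = v + a_world * dt                           # single integration -> velocity
    return q, v, p
```

A quick sanity check for the sign conventions: a stationary device whose accelerometer reads only the gravity reaction force should accumulate no velocity or position change.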
The filter unit 900 operates in close coordination with the tracking unit 902 and the IMU integration unit 906 and estimates the state of the user by tightly coupling the pre-integration data 908 with vision measurements. Therefore, the filter unit 900 may improve robustness and accuracy and provide reliable uncertainty estimates. The state of the user may contain a configurable selection of: current pose and velocity, a sliding window of past poses, positions of a subset of currently visible features, poses of map anchors for hosting visible map features or for being used in rendering, and calibration parameters. In particular embodiments, the pose of map anchors may be used for rendering a virtual tag/object. Detailed demonstrations regarding map anchors may be further described in
The mapping unit 1000 may retrieve map data at two levels. The first level of map-data retrieval is performed between the mapping unit 1000 and the cloud server 1014. The cloud server 1014 stores a global map, and the mapping unit 1000 stores a smaller, local map, e.g., in the on-device storage 1010. The mapping unit 1000 may download local map data based on the images and/or matched descriptors 1004 sent by the tracking unit 1002. In particular embodiments, the mapping unit 1000 or a client device equipped with the tracking system may download map data based on GPS data. The second level of map-data retrieval is performed between the mapping unit 1000 and the tracking unit 1002. The mapping unit 1000 sends 3D map-points and descriptors 1018 to the tracking unit 1002, and the tracking unit 1002 determines the user's location relative to them. In particular embodiments, when the mapping unit 1000 receives the descriptors 1004 from the tracking unit 1002, the mapping unit 1000 may perform matching for the descriptors 1004 and send the matched 3D map-points 1018 back to the tracking unit 1002. In particular embodiments, the mapping unit 1000 may send 3D map-points 1018 based on the state 1008 of the user sent by the filter unit 1006. For example, if the user is looking at a particular region in a map, the corresponding 3D map-points 1018 in the map will be sent to the tracking unit 1002.
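The two retrieval levels can be pictured as a cache-plus-matching scheme. The class below is a hypothetical illustration only (the region-keyed cloud layout and all names are assumptions, not the disclosed interface): level one downloads a local map from the cloud on a cache miss, and level two answers descriptor queries from the tracking unit with matched 3D map points.

```python
import numpy as np

class MappingUnit:
    """Sketch of two-level map-data retrieval (illustrative, not the
    disclosed implementation). `cloud` maps a region id to a tuple of
    (3D points, Nx3) and (descriptors, NxD)."""

    def __init__(self, cloud):
        self.cloud = cloud   # stand-in for the cloud server's global map
        self.cached = None   # (region_id, (points, descriptors)) local map

    def fetch_region(self, region_id):
        # Level 1: download a local map from the cloud only on a cache miss,
        # mimicking periodic / on-demand retrieval to save power.
        if self.cached is None or self.cached[0] != region_id:
            self.cached = (region_id, self.cloud[region_id])
        return self.cached[1]

    def match(self, region_id, query_descs):
        # Level 2: return the 3D map points whose descriptors best match
        # the descriptors sent by the tracking unit (nearest neighbor).
        points, descs = self.fetch_region(region_id)
        dists = np.linalg.norm(descs[None, :, :] - query_descs[:, None, :], axis=2)
        return points[dists.argmin(axis=1)]
```

The nearest-neighbor match here is a deliberately simple stand-in; a real system would use approximate search and ratio tests to reject ambiguous matches.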
At step 1520, the method 1500 may determine correspondence data between key points in the images using the predicted poses. In particular embodiments, the method 1500 may execute the tracking system to determine the correspondence data between key points in the images using the predicted poses. In particular embodiments, the tracking unit of the tracking engine in the tracking system may determine the correspondence data based on corresponding features in a sequence of images. In particular embodiments, the tracking system may identify a first feature in a first image of the sequence of images and search, in a second image of the sequence of images, for a second feature that corresponds to the first feature in the first image. In particular embodiments, the search for features in the images may be performed along an epipolar line segment determined using the predicted pose.
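As a concrete illustration of the guided search, the sketch below (all names and the pixel-band threshold are assumptions) restricts descriptor matching in the second image to keypoints lying within a small band around the epipolar line ax + by + c = 0 predicted from the IMU pose, rather than comparing against every keypoint.

```python
import numpy as np

def guided_search(desc1, kps2, descs2, epiline, band=2.0):
    """Match a feature from image 1 against image-2 keypoints lying near
    the predicted epipolar line (a, b, c) with ax + by + c = 0.
    Returns the best keypoint and its descriptor distance."""
    a, b, c = epiline
    norm = np.hypot(a, b)
    best, best_dist = None, np.inf
    for kp, desc in zip(kps2, descs2):
        # Keep only candidates within `band` pixels of the epipolar line.
        if abs(a * kp[0] + b * kp[1] + c) / norm > band:
            continue
        d = np.linalg.norm(desc1 - desc)  # descriptor distance
        if d < best_dist:
            best, best_dist = kp, d
    return best, best_dist
```

Pruning candidates by the epipolar constraint is what makes the IMU-predicted pose useful here: it shrinks the search region and rejects geometrically impossible matches before any descriptor comparison.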
At step 1530, the method 1500 may send the correspondence data to a filter unit of the tracking engine in the tracking system. In particular embodiments, the method 1500 may execute the tracking system to send the correspondence data from the tracking unit to the filter unit of the tracking engine, or any suitable processor which may integrate data from camera(s) and IMU(s).
At step 1540, the method 1500 may receive the correspondence data at the filter unit of the tracking engine in the tracking system. In particular embodiments, the filter unit may receive the correspondence data from the tracking unit and also receive pre-integration data from the IMU integration unit. In particular embodiments, the pre-integration data sent from the IMU integration unit may comprise aggregated IMU measurements, such as IMU measurements adjusted by calculating a position error based on an initial state estimation.
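One way to picture the pre-integration step is as a summary of all IMU samples between two camera frames into a single relative-motion measurement, so the filter consumes one aggregate instead of hundreds of raw samples. The sketch below is a simplified, bias-free illustration (small-angle rotation-vector accumulation; names are hypothetical).

```python
import numpy as np

def preintegrate(samples, dt):
    """Aggregate raw IMU samples between two camera frames into deltas
    of rotation (rotation-vector approximation), velocity, and position.
    `samples` is a list of (gyro, accel) tuples at a fixed period `dt`."""
    d_theta = np.zeros(3)  # accumulated rotation
    d_v = np.zeros(3)      # accumulated velocity change
    d_p = np.zeros(3)      # accumulated position change
    for gyro, accel in samples:
        d_p += d_v * dt + 0.5 * accel * dt * dt  # position before velocity update
        d_v += accel * dt
        d_theta += gyro * dt
    return d_theta, d_v, d_p
```

For a constant 1 m/s² acceleration over one second, the aggregate correctly yields a 1 m/s velocity delta and a 0.5 m position delta, matching the closed-form kinematics.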
At step 1550, the method 1500 may determine a state of the user in an environment including a user's pose at the filter unit of the tracking engine in the tracking system. In particular embodiments, the filter unit may determine current state information based on the correspondence data and the motion data. The current state information may comprise a current pose of the user relative to the environment captured by the camera(s). In particular embodiments, the filter unit may further send the current state information to the IMU integration unit to be used to generate a next predicted pose of the user. The current state information may comprise a current pose and velocity of the user and IMU calibration data. In particular embodiments, the filter unit may determine the current state information using an optimization algorithm. In particular embodiments, the IMU integration unit may operate at a higher frequency than the tracking unit and the filter unit. For example, the IMU integration unit may operate at 200-1000 Hz, and the tracking unit and the filter unit may operate at 5-10 Hz.
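The filter unit's correction of the IMU-predicted pose with vision-derived measurements can be illustrated, in a deliberately simplified scalar-covariance form, by a single Kalman-style update. The disclosed filter maintains a much richer state (past poses, feature positions, calibration); this is only a sketch with hypothetical names.

```python
def fuse_pose(pred_pos, pred_cov, meas_pos, meas_cov):
    """One-dimensional Kalman-style fusion of an IMU-predicted position
    with a vision-derived position measurement. Returns the fused
    estimate and its reduced covariance."""
    k = pred_cov / (pred_cov + meas_cov)      # Kalman gain
    fused = pred_pos + k * (meas_pos - pred_pos)
    new_cov = (1.0 - k) * pred_cov            # uncertainty shrinks after fusion
    return fused, new_cov
```

With equal confidence in prediction and measurement, the fused estimate lands halfway between them, and the resulting covariance is halved, which is the intuition behind the filter's "reliable uncertainty estimates."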
Particular embodiments may repeat one or more steps of the method of
At step 1620, the method 1600 may receive the key points, features, descriptors, and state information at the mapping unit of the mapping engine in the tracking system.
At step 1630, the method 1600 may retrieve a corresponding map from a storage or a cloud based on the key points, features, descriptors, and state information. In particular embodiments, the corresponding map may be retrieved from an on-device storage. In particular embodiments, the corresponding map may be retrieved from live maps in the cloud or a local server.
At step 1640, the method 1600 may send the corresponding map to the tracking unit. In particular embodiments, the corresponding map may comprise 3D map points which are identified based on the key points, features, descriptors, and state information.
At step 1650, the method 1600 may receive the corresponding map at the tracking unit from the mapping unit. In particular embodiments, the mapping unit may be configured to operate on demand or at a lower frequency than the IMU integration unit, the tracking unit, and the filter unit. In particular embodiments, the mapping unit may operate at 0-2 Hz.
At step 1660, the method 1601 may send association data from the tracking unit to the filter unit of the tracking engine. The association data may comprise matched map points which are determined based on the association between the map points and the key points and descriptors.
At step 1670, the method 1601 may receive the association data from the tracking unit at the filter unit of the tracking engine.
At step 1680, the method 1601 may determine a pose of the user in the corresponding map based on the association data. In particular embodiments, the filter unit may update the pose of the user in the global map, which is the received corresponding map, and localize the user in the global map based on the association data. In particular embodiments, the updated state information may further comprise a current position of the user relative to a three-dimensional map. Furthermore, the filter unit may perform triangulation between a sequence of poses and the matched map points in the association data to refine the state information of the user.
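The triangulation refinement mentioned above can be illustrated with a standard linear (DLT) triangulation of one point from its observations in a sequence of posed views. The 3x4 projection matrices and pixel observations below are hypothetical inputs, not the disclosed data structures.

```python
import numpy as np

def triangulate(poses, observations):
    """Linear (DLT) triangulation of a single 3D point. `poses` are 3x4
    projection matrices P = K [R | t]; `observations` are (u, v) pixel
    coordinates of the point in each view. Solves A X = 0 via SVD."""
    rows = []
    for P, (u, v) in zip(poses, observations):
        rows.append(u * P[2] - P[0])  # two linear constraints per view
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                        # null vector = homogeneous point
    return X[:3] / X[3]               # dehomogenize
```

Two views with a known baseline suffice in this noise-free sketch; with real measurements, more views and a nonlinear refinement on top of the DLT initialization would be used.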
Particular embodiments may repeat one or more steps of the methods of
This disclosure contemplates any suitable number of computer systems 1700. This disclosure contemplates computer system 1700 taking any suitable physical form. As an example and not by way of limitation, computer system 1700 may be an embedded computer system, a system-on-chip (SOC), a single-board computer system (SBC) (such as, for example, a computer-on-module (COM) or system-on-module (SOM)), a desktop computer system, a laptop or notebook computer system, an interactive kiosk, a mainframe, a mesh of computer systems, a mobile telephone, a personal digital assistant (PDA), a server, a tablet computer system, an augmented/virtual reality device, or a combination of two or more of these. Where appropriate, computer system 1700 may include one or more computer systems 1700; be unitary or distributed; span multiple locations; span multiple machines; span multiple data centers; or reside in a cloud, which may include one or more cloud components in one or more networks. Where appropriate, one or more computer systems 1700 may perform without substantial spatial or temporal limitation one or more steps of one or more methods described or illustrated herein. As an example and not by way of limitation, one or more computer systems 1700 may perform in real time or in batch mode one or more steps of one or more methods described or illustrated herein. One or more computer systems 1700 may perform at different times or at different locations one or more steps of one or more methods described or illustrated herein, where appropriate.
In particular embodiments, computer system 1700 includes a processor 1702, memory 1704, storage 1706, an input/output (I/O) interface 1708, a communication interface 1710, and a bus 1712. Although this disclosure describes and illustrates a particular computer system having a particular number of particular components in a particular arrangement, this disclosure contemplates any suitable computer system having any suitable number of any suitable components in any suitable arrangement.
In particular embodiments, processor 1702 includes hardware for executing instructions, such as those making up a computer program. As an example and not by way of limitation, to execute instructions, processor 1702 may retrieve (or fetch) the instructions from an internal register, an internal cache, memory 1704, or storage 1706; decode and execute them; and then write one or more results to an internal register, an internal cache, memory 1704, or storage 1706. In particular embodiments, processor 1702 may include one or more internal caches for data, instructions, or addresses. This disclosure contemplates processor 1702 including any suitable number of any suitable internal caches, where appropriate. As an example and not by way of limitation, processor 1702 may include one or more instruction caches, one or more data caches, and one or more translation lookaside buffers (TLBs). Instructions in the instruction caches may be copies of instructions in memory 1704 or storage 1706, and the instruction caches may speed up retrieval of those instructions by processor 1702. Data in the data caches may be copies of data in memory 1704 or storage 1706 for instructions executing at processor 1702 to operate on; the results of previous instructions executed at processor 1702 for access by subsequent instructions executing at processor 1702 or for writing to memory 1704 or storage 1706; or other suitable data. The data caches may speed up read or write operations by processor 1702. The TLBs may speed up virtual-address translation for processor 1702. In particular embodiments, processor 1702 may include one or more internal registers for data, instructions, or addresses. This disclosure contemplates processor 1702 including any suitable number of any suitable internal registers, where appropriate. Where appropriate, processor 1702 may include one or more arithmetic logic units (ALUs); be a multi-core processor; or include one or more processors 1702. 
Although this disclosure describes and illustrates a particular processor, this disclosure contemplates any suitable processor.
In particular embodiments, memory 1704 includes main memory for storing instructions for processor 1702 to execute or data for processor 1702 to operate on. As an example and not by way of limitation, computer system 1700 may load instructions from storage 1706 or another source (such as, for example, another computer system 1700) to memory 1704. Processor 1702 may then load the instructions from memory 1704 to an internal register or internal cache. To execute the instructions, processor 1702 may retrieve the instructions from the internal register or internal cache and decode them. During or after execution of the instructions, processor 1702 may write one or more results (which may be intermediate or final results) to the internal register or internal cache. Processor 1702 may then write one or more of those results to memory 1704. In particular embodiments, processor 1702 executes only instructions in one or more internal registers or internal caches or in memory 1704 (as opposed to storage 1706 or elsewhere) and operates only on data in one or more internal registers or internal caches or in memory 1704 (as opposed to storage 1706 or elsewhere). One or more memory buses (which may each include an address bus and a data bus) may couple processor 1702 to memory 1704. Bus 1712 may include one or more memory buses, as described below. In particular embodiments, one or more memory management units (MMUs) reside between processor 1702 and memory 1704 and facilitate accesses to memory 1704 requested by processor 1702. In particular embodiments, memory 1704 includes random access memory (RAM). This RAM may be volatile memory, where appropriate. Where appropriate, this RAM may be dynamic RAM (DRAM) or static RAM (SRAM). Moreover, where appropriate, this RAM may be single-ported or multi-ported RAM. This disclosure contemplates any suitable RAM. Memory 1704 may include one or more memories 1704, where appropriate. 
Although this disclosure describes and illustrates particular memory, this disclosure contemplates any suitable memory.
In particular embodiments, storage 1706 includes mass storage for data or instructions. As an example and not by way of limitation, storage 1706 may include a hard disk drive (HDD), a floppy disk drive, flash memory, an optical disc, a magneto-optical disc, magnetic tape, or a Universal Serial Bus (USB) drive or a combination of two or more of these. Storage 1706 may include removable or non-removable (or fixed) media, where appropriate. Storage 1706 may be internal or external to computer system 1700, where appropriate. In particular embodiments, storage 1706 is non-volatile, solid-state memory. In particular embodiments, storage 1706 includes read-only memory (ROM). Where appropriate, this ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), or flash memory or a combination of two or more of these. This disclosure contemplates mass storage 1706 taking any suitable physical form. Storage 1706 may include one or more storage control units facilitating communication between processor 1702 and storage 1706, where appropriate. Where appropriate, storage 1706 may include one or more storages 1706. Although this disclosure describes and illustrates particular storage, this disclosure contemplates any suitable storage.
In particular embodiments, I/O interface 1708 includes hardware, software, or both, providing one or more interfaces for communication between computer system 1700 and one or more I/O devices. Computer system 1700 may include one or more of these I/O devices, where appropriate. One or more of these I/O devices may enable communication between a person and computer system 1700. As an example and not by way of limitation, an I/O device may include a keyboard, keypad, microphone, monitor, mouse, printer, scanner, speaker, still camera, stylus, tablet, touch screen, trackball, video camera, another suitable I/O device or a combination of two or more of these. An I/O device may include one or more sensors. This disclosure contemplates any suitable I/O devices and any suitable I/O interfaces 1708 for them. Where appropriate, I/O interface 1708 may include one or more device or software drivers enabling processor 1702 to drive one or more of these I/O devices. I/O interface 1708 may include one or more I/O interfaces 1708, where appropriate. Although this disclosure describes and illustrates a particular I/O interface, this disclosure contemplates any suitable I/O interface.
In particular embodiments, communication interface 1710 includes hardware, software, or both providing one or more interfaces for communication (such as, for example, packet-based communication) between computer system 1700 and one or more other computer systems 1700 or one or more networks. As an example and not by way of limitation, communication interface 1710 may include a network interface controller (NIC) or network adapter for communicating with an Ethernet or other wire-based network or a wireless NIC (WNIC) or wireless adapter for communicating with a wireless network, such as a WI-FI network. This disclosure contemplates any suitable network and any suitable communication interface 1710 for it. As an example and not by way of limitation, computer system 1700 may communicate with an ad hoc network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a metropolitan area network (MAN), or one or more portions of the Internet or a combination of two or more of these. One or more portions of one or more of these networks may be wired or wireless. As an example, computer system 1700 may communicate with a wireless PAN (WPAN) (such as, for example, a BLUETOOTH WPAN), a WI-FI network, a WI-MAX network, a cellular telephone network (such as, for example, a Global System for Mobile Communications (GSM) network), or other suitable wireless network or a combination of two or more of these. Computer system 1700 may include any suitable communication interface 1710 for any of these networks, where appropriate. Communication interface 1710 may include one or more communication interfaces 1710, where appropriate. Although this disclosure describes and illustrates a particular communication interface, this disclosure contemplates any suitable communication interface.
In particular embodiments, bus 1712 includes hardware, software, or both coupling components of computer system 1700 to each other. As an example and not by way of limitation, bus 1712 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HYPERTRANSPORT (HT) interconnect, an Industry Standard Architecture (ISA) bus, an INFINIBAND interconnect, a low-pin-count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a serial advanced technology attachment (SATA) bus, a Video Electronics Standards Association local (VLB) bus, or another suitable bus or a combination of two or more of these. Bus 1712 may include one or more buses 1712, where appropriate. Although this disclosure describes and illustrates a particular bus, this disclosure contemplates any suitable bus or interconnect.
Herein, a computer-readable non-transitory storage medium or media may include one or more semiconductor-based or other integrated circuits (ICs) (such as, for example, field-programmable gate arrays (FPGAs) or application-specific ICs (ASICs)), hard disk drives (HDDs), hybrid hard drives (HHDs), optical discs, optical disc drives (ODDs), magneto-optical discs, magneto-optical drives, floppy diskettes, floppy disk drives (FDDs), magnetic tapes, solid-state drives (SSDs), RAM-drives, SECURE DIGITAL cards or drives, any other suitable computer-readable non-transitory storage media, or any suitable combination of two or more of these, where appropriate. A computer-readable non-transitory storage medium may be volatile, non-volatile, or a combination of volatile and non-volatile, where appropriate.
Herein, “or” is inclusive and not exclusive, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A or B” means “A, B, or both,” unless expressly indicated otherwise or indicated otherwise by context. Moreover, “and” is both joint and several, unless expressly indicated otherwise or indicated otherwise by context. Therefore, herein, “A and B” means “A and B, jointly or severally,” unless expressly indicated otherwise or indicated otherwise by context.
The scope of this disclosure encompasses all changes, substitutions, variations, alterations, and modifications to the example embodiments described or illustrated herein that a person having ordinary skill in the art would comprehend. The scope of this disclosure is not limited to the example embodiments described or illustrated herein. Moreover, although this disclosure describes and illustrates respective embodiments herein as including particular components, elements, features, functions, operations, or steps, any of these embodiments may include any combination or permutation of any of the components, elements, features, functions, operations, or steps described or illustrated anywhere herein that a person having ordinary skill in the art would comprehend. Furthermore, reference in the appended claims to an apparatus or system or a component of an apparatus or system being adapted to, arranged to, capable of, configured to, enabled to, operable to, or operative to perform a particular function encompasses that apparatus, system, or component, whether or not it or that particular function is activated, turned on, or unlocked, as long as that apparatus, system, or component is so adapted, arranged, capable, configured, enabled, operable, or operative. Additionally, although this disclosure describes or illustrates particular embodiments as providing particular advantages, particular embodiments may provide none, some, or all of these advantages.
According to various embodiments, an advantage of features herein is that the tracking engine of the VIO-based tracking system may perform, at high frequency, localization and mapping for the device in the environment self-sufficiently, and provide, at low frequency, descriptors and related features to the mapping unit to update or associate with the online map, in order to save power and cost. Particular embodiments of the present disclosure enable the tracking system to locate the device in the environment precisely and self-sufficiently using the current state information of the device (including a user's pose) determined from the processed images and pre-integrated motion data. Furthermore, by associating a map from the mapping engine with the information observed at the tracking engine discontinuously or on demand, particular embodiments disclosed in the present disclosure may provide a lightweight, power-efficient, continuously-tracking wearable device which comprises the tracking engine, and a local device which comprises the mapping engine.
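The frequency split described above (high-frequency tracking and localization, low-frequency or on-demand mapping) can be illustrated with a minimal scheduling sketch. The function name, tick structure, and rate ratios below are illustrative assumptions for exposition only, not part of the disclosed system:

```python
def run_loop(num_ticks, track_every=10, map_every=100):
    """Schedule sketch: IMU integration runs every tick; the tracking and
    filter units run at a lower rate; the mapping unit runs at a lower rate
    still (or on demand), saving power and bandwidth."""
    counts = {"imu": 0, "track": 0, "map": 0}
    for tick in range(num_ticks):
        counts["imu"] += 1              # high-frequency pose prediction
        if tick % track_every == 0:
            counts["track"] += 1        # guided search + filter update
        if tick % map_every == 0:
            counts["map"] += 1          # map association, infrequent/on-demand
    return counts
```

For example, over 1000 ticks this schedule would invoke the IMU integration 1000 times, the tracking/filter path 100 times, and the mapping path only 10 times, reflecting the claimed frequency ordering.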
While processes in the figures may show a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary (e.g., alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, etc.).
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.
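As a hedged illustration of the epipolar-guided correspondence search recited in the claims below, the sketch uses the predicted relative pose from IMU integration to form an essential matrix and restricts candidate matches to a narrow band around the resulting epipolar line before comparing descriptors. All names, the normalized-camera-coordinate convention, the precomputed descriptor distances, and the band threshold are assumptions for illustration, not the disclosed implementation:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def epipolar_guided_match(x1, candidates, R, t, band=0.01):
    """Given a feature x1 (normalized homogeneous coords) in frame 1 and a
    predicted relative pose (R, t), keep only frame-2 candidates lying near
    the epipolar line l = E @ x1, then pick the candidate with the smallest
    descriptor distance."""
    E = skew(t) @ R                       # essential matrix from predicted pose
    line = E @ x1                         # epipolar line in frame 2
    n = np.hypot(line[0], line[1])        # normalizer for point-to-line distance
    best, best_dist = None, np.inf
    for x2, desc_dist in candidates:      # (point, precomputed descriptor distance)
        if abs(line @ x2) / n <= band:    # candidate must lie near the line
            if desc_dist < best_dist:
                best, best_dist = x2, desc_dist
    return best
```

Guiding the search this way turns an image-wide descriptor comparison into a one-dimensional search along a line segment, which is the cost saving the predicted pose enables.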
Claims
1. A method comprising, by a computing system:
- receiving, at an IMU integration unit, motion data captured by one or more motion sensors of a wearable device;
- generating, at the IMU integration unit, a predicted pose of the wearable device based on the motion data of the wearable device;
- receiving, at a tracking unit, a sequence of images of an environment captured by one or more cameras;
- identifying, at the tracking unit, features in the sequence of images;
- determining, at the tracking unit, correspondence data between the sequence of images based on the identified features in the sequence of images and the predicted pose received from the IMU integration unit;
- determining, at a filter unit, current state information of the wearable device based on the correspondence data received from the tracking unit and the motion data received from the IMU integration unit, the current state information comprising at least a current pose of the wearable device relative to the environment captured by the one or more cameras;
- receiving, at a mapping unit of the computing system, regional map data from a remote map server, the regional map data being associated with a portion of a three-dimensional map hosted by the remote map server;
- receiving, at the tracking unit and from the mapping unit, map points and associated descriptors for the map points from the regional map data received from the remote map server; and
- identifying, at the tracking unit, one or more of the map points in the sequence of images based on one or more of the descriptors associated with the one or more of the received map points, wherein the determining of the current state information is further based on the identified one or more of the map points within the sequence of images.
2. The method of claim 1, wherein the determining of the correspondence data comprises:
- identifying a first feature in a first image of the sequence of images; and
- searching, in a second image of the sequence of images, for a second feature that corresponds to the first feature in the first image;
- wherein the searching is performed along an epipolar line segment determined using the predicted pose.
3. The method of claim 1, wherein the current state information is determined based on an aggregation of the motion data.
4. The method of claim 1, wherein the current state information is determined using an optimization algorithm.
5. The method of claim 1, wherein the current state information is used to generate a next predicted pose of the wearable device.
6. The method of claim 1, wherein the IMU integration unit operates at a higher frequency than the tracking unit and the filter unit.
7. The method of claim 1, wherein the wearable device is an augmented-reality device, wherein the method further comprises:
- rendering augmented-reality content based on the current pose.
8. The method of claim 1, wherein the current state information further comprises a current position of the wearable device relative to the three-dimensional map.
9. The method of claim 1,
- wherein the mapping unit is configured to operate on demand or at a lower frequency than the IMU integration unit, the tracking unit, and the filter unit.
10. The method of claim 9,
- wherein the IMU integration unit is located within a head-mounted device; and
- wherein the tracking unit, the filter unit, and the mapping unit are implemented in a local computing device separated from the head-mounted device.
11. The method of claim 9,
- wherein the IMU integration unit, the tracking unit, and the filter unit are located within a head-mounted device; and
- wherein the mapping unit is implemented in a local computing device separated from the head-mounted device.
12. The method of claim 10, wherein the local computing device comprises one or more processors, wherein the one or more processors are configured to implement the tracking unit, the filter unit, and the mapping unit.
13. One or more computer-readable non-transitory storage media embodying software that is operable when executed to:
- receive motion data captured by one or more motion sensors of a wearable device;
- generate a predicted pose of the wearable device based on the motion data of the wearable device;
- receive a sequence of images of an environment captured by one or more cameras;
- identify features in the sequence of images;
- determine correspondence data between the sequence of images based on the identified features in the sequence of images and the predicted pose;
- determine current state information of the wearable device based on the correspondence data and the motion data, the current state information comprising at least a current pose of the wearable device relative to the environment captured by the one or more cameras;
- receive regional map data from a remote map server, the regional map data being associated with a portion of a three-dimensional map hosted by the remote map server;
- receive map points and associated descriptors for the map points from the regional map data received from the remote map server; and
- identify one or more of the map points in the sequence of images based on one or more of the descriptors associated with the one or more of the received map points, wherein the determining of the current state information is further based on the identified one or more of the map points within the sequence of images.
14. The media of claim 13, wherein the determining of the correspondence data comprises:
- identifying a first feature in a first image of the sequence of images; and
- searching, in a second image of the sequence of images, for a second feature that corresponds to the first feature in the first image;
- wherein the searching is performed along an epipolar line segment determined using the predicted pose.
15. The media of claim 13, wherein the current state information is determined based on an aggregation of the motion data.
16. The media of claim 13, wherein the current state information is determined using an optimization algorithm.
17. The media of claim 13, wherein the current state information is used to generate a next predicted pose of the wearable device.
18. The media of claim 13, wherein the wearable device is an augmented-reality device, wherein the software is further operable when executed to:
- render augmented-reality content based on the current pose.
19. The media of claim 13, wherein the current state information further comprises a current position of the wearable device relative to the three-dimensional map.
20. A system comprising: one or more processors; and one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to:
- receive, at an IMU integration unit, motion data captured by one or more motion sensors of a wearable device;
- generate, at the IMU integration unit, a predicted pose of the wearable device based on the motion data of the wearable device;
- receive, at a tracking unit, a sequence of images of an environment captured by one or more cameras;
- identify, at the tracking unit, features in the sequence of images;
- determine, at the tracking unit, correspondence data between the sequence of images based on the identified features in the sequence of images and the predicted pose received from the IMU integration unit;
- determine, at a filter unit, current state information of the wearable device based on the correspondence data received from the tracking unit and the motion data received from the IMU integration unit, the current state information comprising at least a current pose of the wearable device relative to the environment captured by the one or more cameras;
- receive, at a mapping unit, regional map data from a remote map server, the regional map data being associated with a portion of a three-dimensional map hosted by the remote map server;
- receive, at the tracking unit and from the mapping unit, map points and associated descriptors for the map points from the regional map data received from the remote map server; and
- identify, at the tracking unit, one or more of the map points in the sequence of images based on one or more of the descriptors associated with the one or more of the received map points, wherein the determining of the current state information is further based on the identified one or more of the map points within the sequence of images.
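The map-point identification step recited in claims 1, 13, and 20 — associating received map points with observed image features via the descriptors associated with those map points — might be sketched as a nearest-neighbour search in descriptor space. The function name, the L2 metric, and the acceptance threshold below are illustrative assumptions only:

```python
import numpy as np

def match_map_points(map_descs, image_descs, max_dist=0.5):
    """Associate each received map-point descriptor with its nearest observed
    image-feature descriptor (L2 distance), accepting a match only when the
    distance falls under a threshold."""
    matches = {}
    for i, md in enumerate(map_descs):
        dists = [np.linalg.norm(np.asarray(md) - np.asarray(fd))
                 for fd in image_descs]
        j = int(np.argmin(dists))
        if dists[j] <= max_dist:
            matches[i] = j               # map point i observed as image feature j
    return matches
```

The resulting correspondences between stored map points and live observations are what allow the filter unit to anchor the current state estimate to the three-dimensional map rather than to visual odometry alone.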
Type: Application
Filed: Aug 9, 2019
Publication Date: Feb 11, 2021
Inventors: Jakob Julian Engel (Seattle, WA), Anastasios Mourikis (Seattle, WA), Raul Mur Artal (Redmond, WA)
Application Number: 16/537,111