APPARATUS AND METHOD OF 1:1 MATCHING HEAD MOUNTED DISPLAY VIEW TO HEAD MOVEMENT THAT CONTROLS ARTICULATED CAMERA
Tracking a user head position detects a change to a new head position and, in response, a remote camera is instructed to move to a next camera position. A camera image frame, having an indication of camera position, is received from the camera. Upon the camera position not aligning with the next camera position, an assembled image frame is formed, using image data from past views, and rendered to appear to the user as if the camera moved in 1:1 alignment with the user's head to the next camera position.
This disclosure relates generally to telepresence via remote user control of movable cameras, and, more particularly, to matching of remote user head motion and rotation to the video presented to the user.
BACKGROUND

A user can operate a joystick or other manual interface to remotely control a camera-equipped drone while watching the drone's camera image on a display. This technique can be acceptable for certain applications, but it has limitations. One is that high user skill may be required. Another is that, for some applications and users, watching the camera view on a head mounted display (HMD) may be preferable. However, a user controlling a drone with a joystick or other manual interface while watching the camera view on an HMD can find the experience unsettling. This can be due to the orientation of the view on the HMD changing without any inner-ear sense of a corresponding change of head orientation.
The HMD can be provided with a controller or sensor package that observes the user's head position and orientation and transmits corresponding signals to the drone, with the objective of the drone tracking that position and orientation. However, technical issues in this technique can make it unsuitable for various applications. One is matching the orientation of the image on the HMD to the movement of the user's head with low latency to avoid motion sickness.
SUMMARY

This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
Disclosed apparatuses include an apparatus that can include a head mounted display, configured to be worn by a user, a head position tracker configured to track a position of the user's head, and generate a corresponding head position signal, and a camera position controller, configured to detect a change in the position of the user's head from a current head position to a new head position and, based at least in part on detecting the change, to communicate a command to a movable support for a camera to a next camera position, the next camera position being aligned with the new head position. The apparatus can include a data storage, configured to store a surface map, the surface map including a population of past views of respective portions of an environment and, for each past view, information identifying its viewing position. The apparatus can include an image assembly module, coupled to the data storage, and configured to determine, based on the next camera position, a next camera viewing region, the next camera viewing region being a region of the environment that will be in a camera field of the camera when in the next camera position, receive a camera image frame from the camera, the camera image frame including an indication of camera position, determine, based at least in part on the indicated camera position, whether the image frame covers all of the next camera viewing region and, upon determining the camera image frame does not cover all of the next camera viewing region, generate an assembled image frame that encompasses the next camera viewing region, the assembled image frame including image data from at least one of the past views, and can include a rendering module, configured to render a 3D image from the assembled image frame, the 3D image appearing as if viewed from the next camera position.
Technical features provided by the assembly and rendering can include, as will be understood from this disclosure, the 3D image appearing to the user as if the camera moved in 1:1 non-delayed alignment with the user's head.
Disclosed methods include a method that can include storing a surface map, the surface map including a population of past views of respective portions of an environment and, for each past view, information identifying its viewing position, tracking a position of a user's head, detecting, based on the tracking, a change in a position of the user's head from a current head position to a new head position, upon detecting the change in the position of the user's head, communicating a command to a movable support for a camera to a next camera position, the next camera position being aligned with the new head position. The method can include determining, based on the next camera position, a next camera viewing region, the next camera viewing region being a region of the environment that will be in a camera field of the camera when in the next camera position, receiving a camera image frame from the camera, the camera image frame including a camera position stamp, and determining, based at least in part on the camera position stamp, whether the image frame covers all of the next camera viewing region. The method can include, upon determining the camera image frame does not cover all of the next camera viewing region, generating an assembled image frame that encompasses the next camera viewing region, the assembled image frame including image data from at least one of the past views, and can include rendering a 3D image from the assembled image frame, the 3D image appearing as if viewed from the next camera position.
The drawing figures depict one or more implementations in accord with the present teachings, by way of example only, not by way of limitation. In the figures, like reference numerals refer to the same or similar elements.
In the following detailed description, numerous specific details are set forth by way of examples in order to provide a thorough understanding of the disclosed subject matter. It will be apparent to persons of ordinary skill, upon reading this description, that various aspects can be practiced without such details. As will be understood by persons of skill upon reading this disclosure, benefits and advantages provided and enabled by disclosed subject matter and its various implementations can include, but are not limited to, a solution to the technical problem arising in head motion control of drone-supported or other remote articulated or movable cameras, namely, mismatches between the user's head motion and images on the user's head-mounted display (HMD). Technical features include 1:1 matching of the motion of the camera on the drone to the user's head movement. Here, “1:1” includes tracking the head motion and using the result to render at least a portion of the camera image data, captured from known past locations and orientations of the camera, into a view of the local environment as would be seen by the user from his current location and orientation, rendered to appear to the user as if the camera moved in 1:1 non-delayed alignment with the user's head to the next camera position.
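By way of a hedged illustration only, and not as part of the disclosed implementation, the decision at the core of the technique can be sketched as follows. All names, and the modeling of coverage regions as axis-aligned rectangles (xmin, ymin, xmax, ymax), are simplifying assumptions for this sketch.

```python
# Sketch of one cycle: decide which image sources feed the HMD view.
# Regions are axis-aligned rectangles (xmin, ymin, xmax, ymax) -- an
# assumption; the disclosure does not prescribe a region representation.

def covers(outer, inner):
    """True if rectangle `outer` fully contains rectangle `inner`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def one_cycle(camera_frame, past_views, viewing_region):
    """Return the frames whose data enters the assembled image.

    camera_frame: dict with a "position" stamp and a covered "region",
    or None if no live frame is available yet.
    past_views: surface map frames, each with "position" and "region".
    viewing_region: the next camera viewing region.
    """
    sources = []
    if camera_frame is not None:
        sources.append(camera_frame)
        if covers(camera_frame["region"], viewing_region):
            return sources  # live frame alone suffices; no assembly needed
    # Camera lags the head: fill the remainder of the viewing region
    # from past views that overlap it.
    for view in past_views:
        r = view["region"]
        overlap = not (r[2] <= viewing_region[0] or r[0] >= viewing_region[2]
                       or r[3] <= viewing_region[1] or r[1] >= viewing_region[3])
        if overlap:
            sources.append(view)
    return sources
```

A later rendering step would then re-project the selected sources to the next camera position, as described below in connection with the adjusted rendering.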
It will be understood that
Referring to
The head position tracker 110 can be configured to track user head location in three-dimensions that can be defined, for example, by orthogonal (X, Y, Z) axes. The (X, Y) axes can be, for example, a two-dimensional position on a floor space or other area, and (Z) can be an elevation. The “Z” feature can provide the user with head control of an elevation of the drone 106, for example by squatting or standing on his toes. The X, Y, Z axes can be termed a “user location reference.” In an aspect, the directions of the X, Y, Z axes can be defined by the world view that is presented on the user's HMD 102.
Regarding axes for tracking user head orientation, the head position tracker 110 can be configured to track left-to-right head rotation about the
Regarding a frame of reference for the location of camera 108,
The camera position controller 112 and an image processor 114, described in greater detail later, can store a translation map between the camera location reference and the user location reference, i.e., an X′, Y′, Z′ to X, Y, Z translation, and between the head orientation reference and the camera orientation reference, i.e., an AZ, EL, TL to YW, PT, RL translation.
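As a hedged illustration of such a translation map, the location translation can be modeled, under simplifying assumptions not stated in the disclosure, as a fixed offset plus a yaw rotation between the two location references, and the orientation translation as constant per-axis offsets.

```python
import math

# Sketch of a user-to-camera reference translation. The offset-plus-yaw
# model and constant orientation offsets are illustrative assumptions.

def make_translation(offset, yaw_deg):
    """Return a function mapping user (X, Y, Z) to camera (X', Y', Z')."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    ox, oy, oz = offset
    def to_camera(p):
        x, y, z = p
        # Rotate about the vertical axis, then translate.
        return (c * x - s * y + ox, s * x + c * y + oy, z + oz)
    return to_camera

def orientation_to_camera(yw, pt, rl, az0=0.0, el0=0.0, tl0=0.0):
    """Map head (YW, PT, RL) to camera (AZ, EL, TL) by constant offsets."""
    return (yw + az0, pt + el0, rl + tl0)
```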
As described above, the head position tracker 110 can be integrated with the HMD 102, and can include resources external to the HMD 102. For example, the head position tracker 110 can include an external camera (not visible in
The camera 108 can be configured to provide a given field of view. The field of view can be termed, for purposes of description, as a “camera field.” The camera field is preferably, but not necessarily, greater than or equal to the field of view provided by the HMD 102. Exemplary relation of camera field to HMD field is described in greater detail later.
The camera 108 can be configured to generate time-stamped R x S pixel camera frames. The generation can be at a fixed frame rate FR, a variable frame rate, or event-triggered, or any combination thereof. The camera 108 can transmit the frames to the wireless base transceiver 104, for delivery to the image processor 114. The image processor 114 can include a frame storage (not visible in
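A minimal record for such time-stamped, position-stamped frames, and a frame storage holding them, can be sketched as follows; the field names and storage structure are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Sketch of a camera frame record carrying the R x S pixel payload plus
# position, orientation, and time stamps, and of a simple frame storage.

@dataclass
class CameraFrame:
    pixels: bytes                              # R*S pixel payload (opaque here)
    position: Tuple[float, float, float]       # X', Y', Z' camera location stamp
    orientation: Tuple[float, float, float]    # AZ, EL, TL orientation stamp
    timestamp: float                           # capture time

@dataclass
class FrameStorage:
    frames: List[CameraFrame] = field(default_factory=list)

    def add(self, frame: CameraFrame) -> None:
        self.frames.append(frame)

    def latest(self) -> CameraFrame:
        """Return the most recently captured frame."""
        return max(self.frames, key=lambda f: f.timestamp)
```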
The surface map is not limited to frames provided by the camera 108. The surface map can include, for example, 3D polygonal or point cloud data constructed from depth sensors. The surface map can also include visual tracking information, such as image feature tracking.
In one or more implementations, each surface map frame can be indexed, or searchable, according to its camera position and camera orientation. A surface map can be referred to as “complete” if every surface (within resolution limits) of the inspection site appears in at least one of the surface map frames. A complete surface map can include, as arbitrary example populations, more or less than a hundred surface map frames. The population can likewise include a thousand surface frames or tens of millions of surface map frames, and less than or more than any of these example populations.
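One hedged way to sketch such indexing is a nearest-position query over the surface map; a linear scan stands in here for a real spatial index, and the dictionary layout is an assumption.

```python
import math

# Sketch of searching surface map frames by viewing position so the image
# assembler can look up past views near a target camera position. A linear
# scan is used for clarity; a grid or k-d tree would scale to the large
# frame populations mentioned in the disclosure.

def nearest_frames(surface_map, target_position, k=3):
    """Return the k surface map frames whose viewing position stamp is
    closest to `target_position`. Each frame is a dict with a "position"
    key holding an (x, y, z) tuple."""
    def dist(frame):
        return math.dist(frame["position"], target_position)
    return sorted(surface_map, key=dist)[:k]
```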
In example processes in or on systems and methods according to this disclosure, a surface map of an inspection site can be constructed prior to a virtual presence. Such a surface map can be constructed, for example, using the same drone 106 and camera 108. Such a surface map can be constructed, for example, using another drone supporting another camera. Alternatively, or additionally, the surface map can be constructed, at least in part, prior to a virtual presence, and updated with frames generated during the virtual presence operations.
Each of the ten surface map frames S(Lx, Rx, Tx) captures a surface area of the remote inspection site 202.
The
Referring to
The image assembly process can be performed, for example, by the image processor 114 operating on the surface map stored in a frame memory, with the image processor 114 executing computer executable instructions stored in an instruction memory, as described in greater detail later.
The result of the image assembly can be termed a "raw HMD view" because the assembly frame group, even though it covers the viewing area A1, consists of surface map frames that captured the surface area from camera locations and camera orientations different from the camera location and camera orientation, i.e., CP(Next), that matches the user's new head position. Accordingly, in processes according to this disclosure, an adjusted rendering can be applied. The adjusted rendering can be an inverse of the distortion of the raw HMD view, using the location and orientation information of each surface map frame in the assembly frame group. The result of the adjusted rendering can be a 3D image of what, or close to what, the user would likely see if, as a hypothetical, the camera 108 had tracked the change in head position with zero delay (which is the viewing position corresponding to CP(Next)). The result is what the user would "likely see" because the assembly frame group is formed of past frames. It does not reflect a current camera view of A1.
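The per-pixel core of such an adjusted rendering can be sketched, under strong simplifying assumptions (a pinhole camera looking down +Z, no rotation, and known depth per pixel), as lifting a pixel from a past view into 3D and re-projecting it from the hypothetical next camera position. None of this is prescribed by the disclosure; it illustrates the geometric idea only.

```python
# Sketch of re-projecting one pixel of a past surface map frame as if
# viewed from the next camera position. Assumptions: pinhole model,
# cameras looking down +Z with no rotation, depth known per pixel.

def reproject_pixel(u, v, depth, old_cam, new_cam, focal=1.0):
    """Lift pixel (u, v) with `depth` from a past view (camera at old_cam)
    to a world point, then project it into the view from new_cam.
    Returns the new image coordinates, or None if the point falls
    behind the new camera."""
    # Back-project: pixel + depth -> world point.
    x = old_cam[0] + u * depth / focal
    y = old_cam[1] + v * depth / focal
    z = old_cam[2] + depth
    # Forward-project into the hypothetical next camera position.
    rz = z - new_cam[2]
    if rz <= 0:
        return None  # behind the new camera; this pixel contributes nothing
    return ((x - new_cam[0]) * focal / rz, (y - new_cam[1]) * focal / rz)
```

With identical old and new poses the mapping is the identity, which is a useful sanity check; a real implementation would additionally invert the rotation between the past frame's orientation stamp and CP(Next).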
In processes in one or more systems and methods according to this disclosure, the output of the adjusted rendering can be presented as the HMD view. Processes can also include applying a late stage correction to the output of the adjusted rendering. The late stage correction can include, for example, update information from new camera frames. This may occur if the camera field now covers any of A1. This is not the case for the example illustrated in
The adjusted rendering and the late stage correction can be performed, for example, by the image processor 114 executing computer executable instructions stored in the instruction memory described above.
The example assumes that, at T20, the head tracker 110 detected the user's head had moved to another new location and orientation, labeled CP(Nxt1). However, the camera 108 lags the user's head because of system delays as described above.
Referring to
An adjusted rendering can then be applied to the result of assembling S(L2, R2, T9) and the camera frame at CP(T20). As described above, the adjusted rendering can be an inverse of the distortion of the raw HMD view, using the location and orientation information of each surface map frame in the assembly frame group. The result of the adjusted rendering can be a 3D image of what the user would likely see if, as a hypothetical, the camera 108 had tracked the change in head position with zero delay (which is the viewing position corresponding to CP(Nxt1)). In this example, the assembly frame group includes the current camera field at CP(T20), and therefore the adjusted rendering output has a higher likelihood of success.
The result of the adjusted rendering applied to the result of assembling S(L2, R2, T9) and the camera frame at CP(T20) can be presented as the HMD view. Processes can also include applying a late stage correction to the output of the adjusted rendering, for example, to compensate for a late detection of additional head movement by the user.
The above-described adjusted rendering and the late stage correction can be performed, for example, by the image processor 114 executing computer executable instructions stored in the instruction memory described above.
Referring to
The flow 700 can then, preferably while operations at 710 are ongoing, proceed to 712 and 714 to collect data and download image data. Operations at 712 can include generation of image frames and, for each frame, applying a camera position and time stamp and transmitting the frame over the wireless link LK back to the image processor 114. The flow 700 can then proceed to 716 and perform operations of assembling the frame data into a form usable by, e.g., aligned with, the user's HMD worldview, and adding informational content useable for or by that worldview. After the assembly operations, the flow 700 can proceed to 718 and perform the above-described adjusted rendering. After the adjusted rendering, the flow 700 can proceed to 720 and apply LSR, for example, to adjust for additional head motion by the user, or incorporate new information, or both, and then proceed to 722 to display the information in the user HMD.
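The late stage correction at 720 can be sketched, again as an illustrative assumption rather than the disclosed implementation, as a small image-space shift proportional to the residual head rotation detected after rendering, using the small-angle approximation tan(θ) ≈ θ.

```python
# Sketch of a late stage correction: shift already-rendered image
# coordinates to compensate for small additional head rotation detected
# after the adjusted rendering. Sign convention and the pure-shift
# approximation are assumptions for illustration.

def late_stage_correct(pixel_uv, delta_yaw, delta_pitch, focal=1.0):
    """Approximate re-projection for small residual rotations (radians):
    shift image coordinates opposite to the residual head rotation."""
    u, v = pixel_uv
    return (u - focal * delta_yaw, v - focal * delta_pitch)
```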
Referring to
Features of the surface map module 914 can include storing, for example, in the frame memory 908, a population of the surface map frames S(Lx, Rx, Tx) forming a surface map, as described above. The surface map module 914 can be further configured to update the set of frames, and therefore update and extend the surface map, in response to new image information received from the camera 108 as the user moves his head location, or orientation, or both in performing virtual presence inspection of the regions of interest.
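Such an update can be sketched as keeping, per viewing position and orientation, the most recently captured frame; the keying scheme and dictionary representation are assumptions for illustration.

```python
# Sketch of updating the surface map with newly received camera frames:
# keep at most one frame per (position, orientation) key, preferring the
# newer time stamp. The keying scheme is an illustrative assumption.

def update_surface_map(surface_map, new_frame):
    """surface_map: dict keyed by (position, orientation) tuples; values
    are frames with a "timestamp". Replaces stale entries in place and
    returns the map."""
    key = (new_frame["position"], new_frame["orientation"])
    old = surface_map.get(key)
    if old is None or new_frame["timestamp"] > old["timestamp"]:
        surface_map[key] = new_frame
    return surface_map
```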
While the foregoing has described what are considered to be the best mode and/or other examples, it is understood that various modifications may be made therein and that the subject matter disclosed herein may be implemented in various forms and examples, and that the teachings may be applied in numerous applications, only some of which have been described herein. It is intended by the following claims to claim any and all applications, modifications and variations that fall within the true scope of the present teachings.
Unless otherwise stated, all measurements, values, ratings, positions, magnitudes, sizes, and other specifications that are set forth in this specification, including in the claims that follow, are approximate, not exact. They are intended to have a reasonable range that is consistent with the functions to which they relate and with what is customary in the art to which they pertain.
The scope of protection is limited solely by the claims that now follow. That scope is intended to be as broad as is consistent with the ordinary meaning of the language that is used in the claims when interpreted in light of this specification and the prosecution history that follows, and to encompass all structural and functional equivalents. Notwithstanding, none of the claims are intended to embrace subject matter that fails to satisfy the requirement of Sections 101, 102, or 103 of the Patent Act, nor should they be interpreted in such a way. Any unintended embracement of such subject matter is hereby disclaimed.
Except as stated immediately above, nothing that has been stated or illustrated is intended or should be interpreted to cause a dedication of any component, step, feature, object, benefit, advantage, or equivalent to the public, regardless of whether it is or is not recited in the claims.
It will be understood that the terms and expressions used herein have the ordinary meaning as is accorded to such terms and expressions with respect to their corresponding respective areas of inquiry and study except where specific meanings have otherwise been set forth herein.
Relational terms such as first and second and the like may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms “comprises,” “comprising,” and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. An element preceded by “a” or “an” does not, without further constraints, preclude the existence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The Abstract of the Disclosure is provided to allow the reader to quickly identify the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, it can be seen that various features are grouped together in various examples for the purpose of streamlining the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that any claim requires more features than the claim expressly recites. Rather, as the following claims reflect, inventive subject matter lies in less than all features of a single disclosed example. Thus the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separately claimed subject matter.
Claims
1. An apparatus comprising:
- a head mounted display configured to be worn by a user;
- a head position tracker configured to track a position of the user's head, and generate a corresponding head position signal;
- a camera position controller configured to detect a change in the position of the user's head from a current head position to a new head position and, based at least in part on detecting the change, to communicate a command to a movable support for a camera to a next camera position, the next camera position being aligned with the new head position;
- a data storage configured to store a surface map, the surface map including a population of past views of respective portions of an environment and, for each past view, information identifying its viewing position;
- an image assembly module, coupled to the data storage, and configured to determine, based on the next camera position, a next camera viewing region, the next camera viewing region being a region of an environment that will be in a camera field of the camera when in the next camera position, receive a camera image frame from the camera, the camera image frame including an indication of camera position, determine, based at least in part on the indicated camera position, whether the image frame covers all of the next camera viewing region, and upon determining the camera image frame does not cover all of the next camera viewing region, generate an assembled image frame that encompasses the next camera viewing region, the assembled image frame including image data from at least one of the past views; and
- a rendering module configured to render a 3D image from the assembled image frame, the 3D image appearing as if viewed from the next camera position.
2. The apparatus of claim 1, wherein the rendering module is an adjusted rendering module, configured to perform a rendering of the assembled image frame and to include in the rendering an adjustment for differences between the viewing positions of past views and the next camera position.
3. The apparatus of claim 2, wherein:
- the indication of position of the camera is a camera position stamp,
- the past views are surface map frames,
- each surface map frame includes a viewing position stamp, and
- differences between the viewing positions of past views and the next camera viewing position are based on differences between the viewing position stamps of the surface map frames and the next camera position.
4. The apparatus of claim 3, wherein rendering the assembled image frame includes:
- forming an image frame group, the image frame group including, as members, surface map frames that each map at least a portion of a respective surface region that is within the next camera viewing region, wherein
- the assembled image frame includes members of the image frame group.
5. The apparatus of claim 4, wherein the surface map frames further include respective time stamps, and the image assembly module is further configured to include, in forming the image frame group, the respective time stamps of the surface map frames.
6. The apparatus of claim 4, wherein the image assembly module is further configured to:
- determine if the camera image frame includes at least a portion of the next camera viewing region, and
- upon determining that the camera image frame includes at least a portion of the next camera viewing region, to include at least a portion of the camera image frame in the assembled image frame.
7. The apparatus of claim 1, wherein:
- the surface map further includes depth sensor data associated with sensed portions of the environment, and
- the image assembly module is further configured to:
- determine, upon determining that the camera image frame does not cover all of the next camera viewing region, whether the sensed portions of the environment are within the next camera viewing region, and
- include, upon determining that the sensed portions of the environment are within the next camera viewing region, at least a portion of the depth sensor data in the assembled image frame.
8. The apparatus of claim 1, wherein the image assembly module is further configured to update the surface map based at least in part on the received camera image frame.
9. The apparatus of claim 1, further comprising:
- a late stage re-projection module configured to perform a late stage re-projection of the 3D image for display on the head mounted display.
10. The apparatus of claim 1, wherein the camera position controller is further configured to:
- receive camera position information from the movable camera support, and
- generate the camera control signal further based on the camera position information.
11. A method comprising:
- storing a surface map, the surface map including a population of past views of respective portions of an environment and, for each past view, information identifying its viewing position;
- tracking a position of a user's head;
- detecting, based on the tracking, a change in a position of the user's head from a current head position to a new head position;
- upon detecting the change in the position of the user's head, communicating a command to a movable support for a camera to a next camera position, the next camera position being aligned with the new head position;
- determining, based on the next camera position, a next camera viewing region, the next camera viewing region being a region of the environment that will be in a camera field of the camera when in the next camera position;
- receiving a camera image frame from the camera, the camera image frame including a camera position stamp;
- determining, based at least in part on the camera position stamp, whether the image frame covers all of the next camera viewing region;
- upon determining the camera image frame does not cover all of the next camera viewing region, generating an assembled image frame that encompasses the next camera viewing region, the assembled image frame including image data from at least one of the past views; and
- rendering a 3D image from the assembled image frame, the 3D image appearing as if viewed from the next camera position.
12. The method of claim 11, wherein the rendering includes an adjusted rendering that includes adjusting for differences between the viewing positions of past views and the next camera position.
13. The method of claim 12, wherein:
- the past views are surface map frames,
- each surface map frame includes a viewing position stamp, and
- differences between the viewing positions of past views and the next camera viewing position are based on differences between the viewing position stamps of the surface map frames and the next camera position.
14. The method of claim 13, wherein generating the assembled image frame includes:
- forming an image frame group, the image frame group including, as members, surface map frames that each map at least a portion of a respective surface region that is within the next camera viewing region, wherein
- the assembled image frame includes members of the image frame group.
15. The method of claim 14, wherein:
- the surface map frames further include respective time stamps, and
- forming the image frame group is further based, at least in part, on the respective time stamps of the surface map frames.
16. The method of claim 14, wherein generating the assembled image frame further includes:
- determining whether the camera image frame includes at least a portion of the next camera viewing region, and
- upon determining that the camera image frame includes at least a portion of the next camera viewing region, including at least a portion of the camera image frame in the assembled image frame.
17. The method of claim 16, wherein the surface map further includes depth sensor data associated with sensed portions of the environment, and wherein generating the assembled image frame further includes:
- upon determining that the camera image frame does not cover all of the next camera viewing region, determining whether the sensed portions of the environment are within the next camera viewing region, and
- upon determining that the sensed portions of the environment are within the next camera viewing region, including at least a portion of the depth sensor data in the assembled image frame.
18. The method of claim 11, wherein rendering the 3D image includes rendering the 3D image for display on a head mounted display worn by the user, and wherein the method further includes displaying the 3D image on the head mounted display.
19. The method of claim 11, further comprising:
- detecting, based on the tracking the position of the user's head, a further change in the position of the user's head; and
- late stage correcting the 3D image to a corrected 3D image, based at least in part on the detected further change in the position of the user's head.
20. The method of claim 11, further comprising updating the surface map, based at least in part on the received camera image frame.
Type: Application
Filed: Jun 5, 2017
Publication Date: Dec 6, 2018
Applicant: MICROSOFT TECHNOLOGY LICENSING, LLC (Redmond, WA)
Inventors: Alexandre DA VEIGA (San Francisco, CA), Roger Sebastian Kevin SYLVAN (Seattle, WA), Kenneth Liam KIEMELE (Redmond, WA), Nikolai Michael FAALAND (Sammamish, WA), Aaron Mackay BURNS (Newcastle, WA)
Application Number: 15/614,594