MAPPING SOUND SPATIALIZATION FIELDS TO PANORAMIC VIDEO
Systems and methods are disclosed for mapping a sound spatialization field to a displayed panoramic image as the viewing angle of the panoramic image changes. As the viewing angle of the image data changes, the audio data is processed to rotate the captured sound spatialization field to the same extent. Thus, the audio data remains mapped to the image data whether the image data is rotated about a single axis or about more than one axis.
It is known to map audio to video images for a fixed frame of reference. For example, when a car is displayed to a user on a screen moving from left to right, the audio can be mixed so as to appear to move with the car. The frame of reference is fixed in that the user does not change the viewing angle of the displayed images. Panoramic video systems are also known which simulate immersion of a user within a three-dimensional scene, and which allow a dynamic image frame of reference. Such systems may be experienced by a user over a television, or by a head mounted display unit, which occludes the real world view and instead displays recorded images of the panorama to the user. In such systems, a user may dynamically change their field of view of the panorama to pan left, right, straight ahead, etc. Thus, in the above example, instead of the car moving from left to right in the user's field of view, the user can change the viewing angle of the panorama so that the car remains stationary in the user's field of view (for example centered on the television) while the background panorama changes.
In such instances, a static audio field will not properly track a change of the viewing angle. The volume of the audio may work properly, for example as the apparent distance between the car of the above example and the user's vantage point changes. However, while the user may track the car with the controller so that it remains stationary in his field of view (for example centered on the television), the audio of the car will appear to move from left to right within the sound field.
SUMMARY
Disclosed herein are systems and methods for mapping a sound spatialization field to a displayed panoramic image as the viewing angle of the panoramic image changes. In one example, the present technology includes an image capture device and a microphone array for capturing image and audio data of a real person, place or thing. The images captured may be around a 360° panorama, and the microphone array captures a spherical sound spatialization field of the panorama. The audio data may be processed and stored in a variety of multi-channel formats, including for example ambisonic B-format.
A user may thereafter experience the image and audio data via a display and a sound transmitter such as for example an array of loudspeakers. The user has a controller which allows the user to change the view provided on the display to pan to different areas of the captured panoramic image. The image may be changed to rotate at least in a horizontal plane around the panorama, but may also be changed about any of one or more of three orthogonal axes.
As the viewing angle of the image data changes, the present system processes the audio data to rotate the captured sound spatialization field to the same extent. Thus, the audio data remains mapped to the image data whether the image data is rotated about a single axis or about more than one axis.
In one embodiment, the present technology relates to a method of mapping audio data of a real person, place and/or thing to image data of a panorama including the real person, place and/or thing, comprising: (a) processing the image data of the panorama including the real person, place and/or thing to show the image data from a selected viewing angle; and (b) processing audio data of the real person, place and/or thing to map a sound spatialization field of the audio data to align with the selected viewing angle of the image data.
In another embodiment, the present technology relates to a system for presenting panoramic image data and associated audio data from a user-selected perspective, the image and audio data captured from a real person, place and/or thing, the system comprising: a display for displaying images from the panoramic image data; an audio transmitter for providing audio associated with the panoramic image data; a controller for varying the image data displayed on the display; and a computing device for mapping a sound spatialization field to the panoramic image data so that the audio transmitted by the audio transmitter matches an image displayed by the display.
In a further embodiment, the present technology relates to a computer-readable storage medium for programming a processor to perform a method of mapping audio data of a real person, place and/or thing to image data of a panorama including the real person, place and/or thing, comprising: (a) displaying a first image generated from the image data of a first portion of the panorama including the real person, place or thing; (b) playing audio data to recreate a sound spatialization field aligned in three-dimensions with the image data of the real person place or thing; (c) receiving an indication to change a viewing angle of the image displayed in said step (a); (d) processing the image data to rotate an image displayed about one or more orthogonal axes; (e) displaying a second image generated from the image data of a second portion of the panorama including the real person, place or thing based on processing the image data in said step (d); (f) processing the audio data to rotate the sound spatialization field about the one or more orthogonal axes to the same extent the image was rotated in said step (d); and (g) playing the audio data to recreate the sound spatialization field processed in said step (f).
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter. Furthermore, the claimed subject matter is not limited to implementations that solve any or all disadvantages noted in any part of this disclosure.
Embodiments of the present technology will now be described with reference to
In examples, the images used in the system of the Panoramic Imaging Application may be of real events, people, places or things. As just some non-limiting examples, the images may be of a sporting event or music concert, where the user has the ability to view the event from on the field of play, on the stage or anywhere else the image-gathering cameras are positioned.
The present technology operates in conjunction with the technology described in the Panoramic Imaging Application by recording the audio from the captured scene. Thereafter, as explained below, when the captured images are displayed to the user, the associated audio may be played as well. The present system maps a sound spatialization field to the captured panoramic image. Thus, as a user views the panoramic images from different viewing angles, the sound spatialization field moves with the images.
Humans hear sound in three dimensions, using for example head related transfer functions (HRTFs) and head motion. As such, in examples, audio may be recorded on multiple channels using multiple recording devices to provide a spatialized effect of a three-dimensional sound spatialization field (“SSF” in the drawings). One method of providing a 3D sound spatialization field is by recording acoustic sources using a technique referred to as ambisonics. The ambisonic approach is described for example in the publication by M. A. Gerzon, “Ambisonics in Multichannel Broadcasting and Video,” Journal of the Audio Engineering Society, Vol. 33, No. 11, pp. 859-871 (October, 1985), which publication is incorporated by reference herein in its entirety.
Ambisonic recording is one of a variety of technologies which may be used in the present system for effectively recording sound directions and amplitudes, and reproducing them over loudspeaker systems so that listeners can perceive sounds located in three-dimensional space. In embodiments, the ambisonic system records sound signals in “ambisonic B-format” over four discrete channels. The B-format channel information includes three microphone channels (X, Y, Z), in addition to an omnidirectional channel (W). In further embodiments, audio signals may be recorded using fewer or greater numbers of channels. In one further embodiment, 2D (horizontal-only) 360-degree signals may be recorded using three channels.
In an embodiment using four channels, the sound signals convey directionally encoded information with a resolution equal to first-order microphones (cardioid, figure-eight, etc.). In one example, an ambisonic system may use a specialized microphone array, called a SoundField™ microphone. One example of a SoundField microphone is marketed under the brand name TetraMic™ by Core Sound LLC, Teaneck, N.J., USA.
Reproduction of the B-format sound signals may be done using two or more loudspeakers, depending in part upon the required reproduction (2D or 3D). It is understood that more than two loudspeakers may be used in further embodiments. In one further embodiment, there may be 4 loudspeakers, and in a further embodiment, there may be 8 loudspeakers.
In operation, the user may manipulate the controller 112 by tilting it about x, y and/or z axes to control the panoramic images displayed on display 118. As one example, where the display 118 is perpendicular to the y-axis, the user may tilt the controller about the z-axis (along arrow A-A in
The controller 112 may be a known device, including for example a 3-axis accelerometer and/or other sensors for sensing movement of the controller. The controller 112 may communicate with the computing device 108 via wireless communication protocols, such as for example Bluetooth. It is understood that the controller 112 may operate by other mechanisms to affect movement of the image in further embodiments.
Sounds recorded in ambisonic B-format using microphone array 102 of
x² + y² + z² ≤ 1,
where x is the distance along the X, or left-right axis; y is the distance along the Y, or front-back axis; and z is the distance along the Z or up-down axis.
When a monophonic signal is positioned on the surface of the sphere, its coordinates x, y and z are given by:
x=(sin A)(cos B),
y=(cos A)(cos B), and
z=sin B,
referenced to the center front position of the sphere, where A is the horizontal angle subtended at the listening position, and B is the vertical angle subtended at the listening position.
These coordinates may be used as multipliers to produce the B-format output signals X, Y, Z and W as follows:
X=(input signal)(sin A)(cos B),
Y=(input signal)(cos A)(cos B),
Z=(input signal)(sin B), and
W=(input signal)(0.707).
The 0.707 multiplier on W is equal to sin 45°, and gives a more even distribution of signal levels within the four channels. These multiplying coefficients can be used to position monophonic sounds anywhere on the surface of the sound field.
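As an illustration of the encoding above, the following sketch pans a monophonic sample onto the four B-format channels using the stated multipliers. The function name is illustrative only; a real implementation would operate per-sample over buffered audio:

```python
import math

def encode_b_format(signal, a_deg, b_deg):
    """Pan a monophonic sample to first-order ambisonic B-format
    channels using the multipliers above. A is the horizontal angle
    and B the vertical angle, in degrees, referenced to the center
    front of the listening position."""
    a, b = math.radians(a_deg), math.radians(b_deg)
    x = math.sin(a) * math.cos(b)  # left-right axis
    y = math.cos(a) * math.cos(b)  # front-back axis
    z = math.sin(b)                # up-down axis
    # A source so positioned lies on the unit sphere: x² + y² + z² = 1.
    X, Y, Z = signal * x, signal * y, signal * z
    W = signal * 0.707  # sin 45°; evens signal levels across channels
    return W, X, Y, Z

# A unit-amplitude sample panned 90° in the horizontal plane at ear
# level (A=90, B=0) lands almost entirely on the left-right X channel.
W, X, Y, Z = encode_b_format(1.0, 90, 0)
```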
While embodiments of the present system described above and hereafter use ambisonic recording and playback of audio data, it is understood that other sound recording and playback systems may be used. For example, the present technology may be adapted to operate with other formats such as Stereo Quadraphonic, Quadraphonic Sound, CD-4, Dolby MP, Dolby surround AC-3 and other surround sound technologies, Dolby Pro-logic, Lucas Film THX, etc. A further discussion of the capture and playback of sound spatialization fields, by ambisonic and other theories, is provided in the following publications, each of which is incorporated by reference herein in its entirety:
- Bamford, J. & Vanderkooy, J., “Ambisonic Sound For Us,” Preprint from 99th AES Convention, Audio Engineering Society (Preprint No 4138) (October, 1995);
- Begault, D., “Challenges to the Successful Implementation of 3-D Sound,” Journal of the Audio Engineering Society, Vol. 39, No 11, pp 864-870 (1991);
- Gerzon, M., “Optimum Reproduction Matrices For Multi-Speaker Stereo,” Journal of the Audio Engineering Society, Vol. 40, No 7/8, pp 571-589 (1992);
- Gerzon, M., “Surround Sound Psychoacoustics,” Wireless World December, Vol. 80, pp 483-485 (1974);
- Malham, D. G., “Computer Control of Ambisonic Soundfields,” Preprint from 82nd AES Convention, Audio Engineering Society (Preprint No 2463) (March, 1987);
- Malham, D. G. & Clarke, J., “Control Software for a Programmable Soundfield Controller,” Proceedings of the Institute of Acoustics Autumn Conference on Reproduced Sound 8, Windermere, pp 265-272 (1992);
- Malham, D. G. & Myatt, A., “3-D Sound Spatialization Using Ambisonic Techniques,” Computer Music Journal, Vol. 19, No 4, pp 58-70 (1995);
- Naef, M., Staadt, O., Gross, M., “Spatialized Audio Rendering for Immersive Virtual Environments,” In Proceedings of the ACM Symposium on Virtual Reality Software and Technology, H. Sun and Q. Peng, Eds. ACM Press, 65-72 (2002);
- Poletti, M., “The Design of Encoding Functions for Stereophonic and Polyphonic Sound Systems,” Journal of the Audio Engineering Society, Vol. 44, No 11, pp 948-963 (1996);
- Vanderkooy, J. & Lipshitz, S., “Anomalies of Wavefront Reconstruction in Stereo and Surround-Sound Reproduction,” Preprint from 83rd AES Convention, Audio Engineering Society (Preprint No 2554) (October, 1987); and
- U.S. Pat. No. 6,259,795, entitled “Methods and Apparatus For Processing Spatialized Audio,” issued Jul. 10, 2001.
Operation of the present system for mapping of a recorded sound spatialization field to a recorded panoramic image will now be described with reference to the flowchart of
In step 208, the recorded audio data and captured frame of image data are time stamped. This will allow easy synchronization of the image and audio data when played back as explained below. In step 212, the captured image data is processed into cylindrical image data of a panorama. In one embodiment described in the above referenced Panoramic Imaging Application, the image data is processed into left and right cylindrical images which together provide a stereoscopic view of a panorama, possibly around 360°. In further embodiments, the computing device 104 may skip step 212 when the image data is captured and instead store the raw image data. In such embodiments, the raw image data may be processed into the cylindrical view of the panorama (stereoscopic or otherwise) at the time the image is displayed to the user.
In step 216, the computing device 104 (present but not shown in
When recorded, the sound spatialization field is aligned to the captured images in the device 100/array 102. That is, the capture device 100 is able to determine the vector orientation of an object, for example audio source 1 of
As explained below, when an image is initially displayed during playback of the image data, the system may initially position the unit vector between the user's head and the center of the display 118. Having also defined the same reference vector for the sound spatialization field, the field may initially map to the reference vector during audio playback so that the sound spatialization field is initially correctly mapped to the displayed initial image. The image and sound spatialization field may thereafter be rotated in 3D space as explained hereinafter. The captured image data and recorded sound spatialization field may be stored and/or transmitted to another computing device in step 218.
After image and audio data has been captured by the capture device 100 and microphone 102, a user may experience the image and audio data at another time and place, from the data stored on the computing device 104 where the data was initially stored or from a computing device 108 which received a transmission of the data (computing device 108 is referred to in the following description). The operation of the system 106 for presenting this experience to the user is now explained with reference to the flowchart of
In step 224, the audio data is formatted to recreate the sound spatialization field around the user via the loudspeakers 110. As explained below, a user may alternatively experience the audio using headphones or earbuds. In such embodiments, the data would be specifically formatted to recreate the sound spatialization field for those sound transmission mediums.
In step 228 (
Referring now to
where yaw is the rotation angle about the z-axis of the current image, pitch is the rotation angle about the x-axis of the current image, and roll is the rotation angle about the y-axis of the current image.
Once the orientation matrix OM is calculated, it is possible to map the X, Y and Z coordinates of the computed B-format data for sound sources into an orientation matching the view orientation. In particular, with reference to
The omnidirectional channel W may also be factored in. Because W is omnidirectional, the rotation leaves it unchanged (W′ = W).
Using this process, the position of all audio sources may be computed in room coordinates. Unlike the image data, where only those objects in the field of view are displayed, the full spherical sound spatialization field is produced from the loudspeakers, even for objects not appearing on the display.
Initially, where the image is displayed at the reference vector, the values for X′, Y′, Z′ and W′ will simply be the same as the B-format data values X, Y, Z and W for a given audio source. However, as explained below, as the image view is adjusted, the above matrix transformation will map the sound spatialization field to the adjusted view. Further detail with regard to mapping multiple audio sources in the sound spatialization field to view angle of the image is disclosed in U.S. Pat. No. 6,259,795, previously incorporated by reference above. A known software application applying an orientation matrix to re-orient a sound spatialization field is also commercially available under the brand name Rapture 3D from Blue Ripple Sound Limited, London, UK.
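The orientation-matrix figures themselves are not reproduced in this text, but a conventional construction can be sketched: build elementary rotations for yaw (z-axis), pitch (x-axis) and roll (y-axis), multiply them, and apply the result to the directional channels while W passes through unchanged. The rotation order and sign conventions below are assumptions for illustration, not taken from the patent:

```python
import math

def orientation_matrix(yaw, pitch, roll):
    """3x3 rotation combining yaw about z, pitch about x and roll
    about y (all in radians). Order and signs are illustrative."""
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    # Elementary rotations as row-major 3x3 lists.
    Rz = [[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]]   # yaw
    Rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]   # pitch
    Ry = [[cr, 0, sr], [0, 1, 0], [-sr, 0, cr]]   # roll
    def matmul(A, B):
        return [[sum(A[i][k] * B[k][j] for k in range(3))
                 for j in range(3)] for i in range(3)]
    return matmul(Rz, matmul(Rx, Ry))

def rotate_b_format(W, X, Y, Z, om):
    """Rotate the directional B-format channels by the orientation
    matrix; the omnidirectional W channel is unaffected."""
    Xp = om[0][0] * X + om[0][1] * Y + om[0][2] * Z
    Yp = om[1][0] * X + om[1][1] * Y + om[1][2] * Z
    Zp = om[2][0] * X + om[2][1] * Y + om[2][2] * Z
    return W, Xp, Yp, Zp

# With zero yaw, pitch and roll the matrix is the identity, so the
# rotated values equal the original B-format values, matching the
# reference-vector case described above.
om = orientation_matrix(0.0, 0.0, 0.0)
```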
As noted above, the image data may be formed into a cylindrical view of the panorama. In such embodiments, it is conceivable that the viewing angle changes only with respect to rotation about the z-axis, with the displayed images remaining fixed with respect to rotation about the x- and y-axes. In such embodiments, the matrix transformation would alter only the z-axis orientation of the sound spatialization field, with the orientation of the field about the x- and y-axes remaining fixed. Rotation of the image about two or all three axes is also contemplated.
Referring now to step 230, the computing device 108 next ensures time synchronization between the image data and audio data. Further details of a suitable synchronization operation of step 230 are disclosed in applicant's co-pending U.S. patent application Ser. No. 12/772,802, entitled “Heterogeneous Image Sensor Synchronization,” filed May 3, 2010, which application is incorporated herein by reference in its entirety. However, as noted above, the video and corresponding audio were both time stamped when created. These time stamps may be used to ensure synchronous playback of the audio and video. Additionally, genlock and other known audio/video synchronization techniques may be used.
In step 232, the current image frame is displayed to the user 116 on display 118. In embodiments, the display may be a television. However, in further embodiments, the display may be a head mounted display where the image of the real world is occluded and the user sees only the displayed image.
In step 234, the properly transformed, mapped and synchronized audio signal is converted to an output signal for the loudspeakers 110 to recreate the sound spatialization field around the user. In particular, as is known, the X′, Y′, Z′ and W′ components of the rotated B-format data for each audio source may be processed through one or more filtering elements of a formatting engine 132. As is known, these filtering elements may comprise a finite impulse response filter of length between 1 and 4 ms, though other filters and other filter lengths may be used. The filtered outputs may then be summed together, converted from digital to analog signals by a D/A converter, and output to the loudspeakers 110. The conversion operation of step 234 is a known operation. Further details of step 234 are provided for example in U.S. Pat. No. 6,021,206, entitled “Methods and Apparatus for Processing Spatialised Audio,” issued Feb. 1, 2000, which patent is incorporated by reference herein in its entirety.
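As a simplified sketch of the final output stage (the actual pipeline includes the FIR filtering and D/A conversion described above), a naive first-order decode to a horizontal ring of loudspeakers projects the directional channels onto each speaker direction and adds the omnidirectional channel. The 0.5 and √2 scaling used here is one common convention and is an assumption, not taken from the patent:

```python
import math

def decode_to_speakers(W, X, Y, Z, speaker_azimuths_deg):
    """Naive first-order decode of B-format to a horizontal
    loudspeaker ring. Each feed is the omnidirectional channel plus
    the projection of the directional channels onto the speaker
    direction. Real decoders are considerably more involved."""
    feeds = []
    for az_deg in speaker_azimuths_deg:
        a = math.radians(az_deg)
        ux, uy = math.sin(a), math.cos(a)  # document's axis convention
        feeds.append(0.5 * (math.sqrt(2) * W + ux * X + uy * Y))
    return feeds

# Four speakers at front, left, back and right. For a source encoded
# at the center-front position (X=0, Y=1), the front speaker receives
# the largest feed and the rear speaker receives almost nothing.
feeds = decode_to_speakers(0.707, 0.0, 1.0, 0.0, [0, 90, 180, 270])
```

Note the Z channel is ignored here because the ring is horizontal; a 3D (periphonic) layout would also project onto each speaker's elevation.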
In step 238, the computing device 108 looks to whether the user has moved the controller 112. As indicated above, the controller has systems such as a three-axis accelerometer to determine when movement has occurred. If no movement is detected, the next image frame of data is retrieved from memory in step 242 and the system returns to step 224 to format the audio for that new frame as described above. Absent controller movement, the computing device 108 thus continues to process and provide video of the panorama from the same view angle, together with the mapped sound spatialization field.
On the other hand, if movement of the controller 112 is detected in step 238, the change in position of the controller about the x (pitch), y (roll) and/or z (yaw) axes is determined by the controller 112 and/or computing device 108. Movement of the controller 112 forward/back, side-to-side and up and down may also be tracked to effect a corresponding change in the view angle and sound spatialization field. Systems are known for tracking the movement of the controller in six degrees of freedom, such as for example those available from Polhemus, Colchester, Vt., USA.
In step 250, once the change in position of the controller is determined, a corresponding change in the viewing angle of the image on the display 118 is effected. The process by which the image is changed upon controller movement to change the viewing angle to a new area of the panorama is known. In general, however, a rotation of the controller 112 will effect a rotation of the image about the z-axis. This allows the user to pan around 360° of the panoramic image over time.
In embodiments, the sound spatialization field is mapped to the adjusted orientation of the image data. In further embodiments, the sound spatialization field may be mapped to the orientation of the controller 112. In such embodiments, the pitch (x-axis), roll (y-axis) and yaw (z-axis) orientation of the controller 112 may be used as inputs to the orientation matrix OM, and the sound spatialization field adjusted accordingly upon a change in position of the controller.
Rotation of the controller about the x-axis may move the displayed image up or down on the display. And rotation of the controller about the y-axis may rotate the displayed image away from horizontal. As noted above, the system may alternatively ignore rotation of the controller about the x-axis and/or y-axis. In some embodiments, the system may only be sensitive to rotations of the image about the z-axis to pan the image left and right. Once the new view angle in the x, y and z orientations is determined in step 250, the next frame of image data at that view angle is retrieved from memory in step 254.
The flow then returns to step 224 to format the audio data for the current view angle. The ambisonic B-format data may be obtained as described above in step 224, and the transformation matrix may be applied to the B-format data as described above in step 228. As the image data has now been rotated about the x, y and/or z axes, the sound spatialization field will also undergo a corresponding rotation about the x, y and/or z axes so that the sound spatialization field remains mapped to the image data.
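The playback loop just described (steps 224 through 254) can be summarized in the following simplified sketch, which models only z-axis (yaw) rotation of the field for the cylindrical-panorama case. The frame and controller structures are hypothetical stand-ins, and the sign convention relating the yaw angle to the view change is illustrative:

```python
import math

def playback_loop(frames, controller_yaws_deg):
    """Sketch of steps 224-254: for each image frame, read the
    controller yaw, rotate the B-format field to match the view, and
    emit the (frame id, yaw, rotated field) triple. Frames are
    (image_id, (W, X, Y, Z)) tuples; only z-axis rotation is modeled."""
    out = []
    for (image_id, (W, X, Y, Z)), yaw_deg in zip(frames,
                                                 controller_yaws_deg):
        a = math.radians(yaw_deg)
        # Rotate the directional channels about the z-axis; the
        # omnidirectional W channel is unchanged.
        Xp = math.cos(a) * X - math.sin(a) * Y
        Yp = math.sin(a) * X + math.cos(a) * Y
        out.append((image_id, yaw_deg, (W, Xp, Yp, Z)))
    return out

# Two frames of a center-front source; the user pans 90° between them,
# so the second frame's field is rotated while the first is untouched.
frames = [("f0", (0.707, 0.0, 1.0, 0.0)),
          ("f1", (0.707, 0.0, 1.0, 0.0))]
result = playback_loop(frames, [0, 90])
```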
As one example,
In the embodiments described above, the sound spatialization field, mapped to the image data, is recreated around the user 116 via loudspeakers 110. Through ambisonics or some other stereophonic or surround sound technology, the loudspeakers are able to create the impression of sound sources within the space around the user which were captured by the microphone array 102 around the captured panorama. In a further embodiment shown in
Computer 610 typically includes a variety of computer readable media. Computer readable media can be any available media that can be accessed by computer 610 and includes both volatile and nonvolatile media, removable and non-removable media. By way of example, and not limitation, computer readable media may comprise computer storage media and communication media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computer 610. Communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media. The term “modulated data signal” means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media includes wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared and other wireless media. Combinations of any of the above are also included within the scope of computer readable media.
The system memory 630 includes computer storage media in the form of volatile and/or nonvolatile memory such as read only memory (ROM) 631 and random access memory (RAM) 632. A basic input/output system 633 (BIOS), containing the basic routines that help to transfer information between elements within computer 610, such as during start-up, is typically stored in ROM 631. RAM 632 typically contains data and/or program modules that are immediately accessible to and/or presently being operated on by processing unit 620. By way of example, and not limitation,
The computer 610 may also include other removable/non-removable, volatile/nonvolatile computer storage media. By way of example only,
The drives and their associated computer storage media discussed above and illustrated in
The computer 610 may operate in a networked environment using logical connections to one or more remote computers, such as a remote computer 680. The remote computer 680 may be a personal computer, a server, a router, a network PC, a peer device or other common network node, and typically includes many or all of the elements described above relative to the computer 610, although only a memory storage device 681 has been illustrated in
When used in a LAN networking environment, the computer 610 is connected to the LAN 671 through a network interface or adapter 670. When used in a WAN networking environment, the computer 610 typically includes a modem 672 or other means for establishing communications over the WAN 673, such as the Internet. The modem 672, which may be internal or external, may be connected to the system bus 621 via the user input interface 660, or other appropriate mechanism. In a networked environment, program modules depicted relative to the computer 610, or portions thereof, may be stored in the remote memory storage device. By way of example, and not limitation,
The foregoing detailed description of the inventive system has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the inventive system to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the inventive system and its practical application to thereby enable others skilled in the art to best utilize the inventive system in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the inventive system be defined by the claims appended hereto.
Claims
1. A method of mapping audio data of a real person, place and/or thing to image data of a panorama including the real person, place and/or thing, comprising:
- (a) processing the image data of the panorama including the real person, place and/or thing to show the image data from a selected viewing angle; and
- (b) processing audio data of the real person, place and/or thing to map a sound spatialization field of the audio data to align with the selected viewing angle of the image data.
2. The method of claim 1, wherein the mapping of said step (b) maps the sound spatialization field to the image data in three dimensions about three orthogonal axes.
3. The method of claim 1, wherein the mapping of said step (b) maps the sound spatialization field to the image data in a single, horizontal dimension.
4. The method of claim 1, wherein the mapping of said step (b) maps the sound spatialization field to the image data around 360° of one or more orthogonal axes.
5. The method of claim 1, wherein the processing of said step (b) is performed on ambisonic B-format data representing the sound spatialization field.
6. The method of claim 5, wherein the ambisonic B-format data is processed by transforming the B-format data using a computed orientation matrix receiving pitch, yaw and roll data from a viewing angle of the image data relative to a reference position.
7. The method of claim 1, further comprising the step of time synchronizing the processed audio data to the processed image data.
8. The method of claim 1, said step (b) of processing the audio data comprising the step of processing the audio data to recreate the sound spatialization field via loudspeakers surrounding the user.
9. The method of claim 1, said step (b) of processing the audio data comprising the step of processing the audio data to recreate the sound spatialization field via binaural sound transmission.
10. A system for presenting panoramic image data and associated audio data from a user-selected perspective, the image and audio data captured from a real person, place and/or thing, the system comprising:
- a display for displaying images from the panoramic image data;
- an audio transmitter for providing audio associated with the panoramic image data;
- a controller for varying of the image data displayed on the display; and
- a computing device for mapping a sound spatialization field to the panoramic image data so that the audio transmitted by the audio transmitter matches an image displayed by the display.
11. The system of claim 10, wherein the audio transmitter is one of a plurality of loudspeakers and a binaural source sound transmission system worn by the user.
12. The system of claim 10, wherein the computing device performs the mapping based on a determined view of the image data relative to a reference position of the image data.
13. The system of claim 10, wherein rotation of the controller about one or more of three orthogonal axes results in rotation of an image presented by the image data about one or more of the three orthogonal axes.
14. The system of claim 13, wherein the computing device performs the mapping based on a determined orientation of the controller relative to one or more of the three orthogonal axes.
15. The system of claim 10, wherein the display is one of a television and a head mounted display.
16. The system of claim 10, wherein the audio data is processed in four channels according to the ambisonic standard.
17. A computer-readable storage medium for programming a processor to perform a method of mapping audio data of a real person, place and/or thing to image data of a panorama including the real person, place and/or thing, comprising:
- (a) displaying a first image generated from the image data of a first portion of the panorama including the real person, place or thing;
- (b) playing audio data to recreate a sound spatialization field aligned in three-dimensions with the image data of the real person place or thing;
- (c) receiving an indication to change a viewing angle of the image displayed in said step (a);
- (d) processing the image data to rotate an image displayed about one or more orthogonal axes;
- (e) displaying a second image generated from the image data of a second portion of the panorama including the real person, place or thing based on processing the image data in said step (d);
- (f) processing the audio data to rotate the sound spatialization field about the one or more orthogonal axes to the same extent the image was rotated in said step (d); and
- (g) playing the audio data to recreate the sound spatialization field processed in said step (f).
18. The computer-readable storage medium of claim 17, wherein said step (d) rotates the image about a horizontal axis in response to the indication in said step (c), the sound spatialization field rotating about the single horizontal axis to the same degree.
19. The computer-readable storage medium of claim 18, wherein said step (d) rotates the image 360° about the horizontal axis in response to the indication in said step.
20. The computer-readable storage medium of claim 19, wherein said steps (a) and (d) display stereoscopic images of the panorama.
Type: Application
Filed: Dec 22, 2010
Publication Date: Jun 28, 2012
Applicant: MICROSOFT CORPORATION (Redmond, WA)
Inventors: Alex Garden (Bellevue, WA), Ben Vaught (Seattle, WA), Michael Rondinelli (Canonsburg, PA)
Application Number: 12/976,823
International Classification: H04N 13/00 (20060101); H04N 9/475 (20060101);