SYSTEMS AND METHODS TO FACILITATE INTERACTIONS WITH VIRTUAL CONTENT

According to some embodiments, a graphics platform may receive a video signal, including an image of a person, from a video camera. The graphics platform may then add a virtual object to the video signal to create a viewer or broadcast signal. 3D information associated with a spatial relationship between the person and the virtual object may be determined. The graphics platform may then create a supplemental signal based on the 3D information, wherein the supplemental signal includes sufficient information to enable the person to interact with the virtual object as if such object were ‘seen’ or sensed from the person's perspective. The supplemental signal may comprise video, audio and/or pressure information, as necessary to enable the person to interact with the virtual object as if he or she were physically present with it.


Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This patent application claims the benefit of U.S. Provisional Patent Application No. 61/440,675 entitled “Interaction with Content Through Human Computer Interface” and filed on Feb. 8, 2011. The entire contents of that application are hereby incorporated by reference.

FIELD

The present invention relates to systems and methods to provide video signals that include both a person and a virtual object. Some embodiments relate to systems and methods to efficiently and dynamically generate a supplemental video signal to be displayed for the person.

BACKGROUND

An audio, visual or audio-visual program (e.g., a television broadcast) may include virtual content (e.g., computer generated, holographic, etc.). For example, a sports anchorperson might be seen (from the vantage point of the ‘audience’) evaluating the batting stance of a computer generated baseball player who is not physically present in the studio. Moreover, in some cases, the person may interact with virtual content (e.g., by walking around and pointing to various portions of the baseball player's body). It can be difficult, however, for the person to interact accurately and naturally with virtual content that he or she cannot actually see. This difficulty arises whether or not the studio anchorperson actually appears in the final cut of the scene as broadcast to the ‘audience.’ In some cases, a monitor in the studio might display the blended broadcast image (that is, an image including both the person and the virtual content). With this approach, however, the person may have to keep glancing at the monitor to determine whether he or she is standing in the right area and/or looking in the right direction. An anchorperson's difficulty in determining where or how to interact with the virtual image can distract viewers of the broadcast and detract from the quality of the anchorperson's overall interaction, making the entire scene, including the virtual content, look less believable and more difficult to produce.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is an illustration of a video system.

FIG. 2 provides examples of images associated with a scene.

FIG. 3 is an illustration of a video system in accordance with some embodiments.

FIG. 4 provides examples of images associated with a scene according to some embodiments.

FIGS. 5A and 5B are flow charts of methods in accordance with some embodiments of the present invention.

FIG. 6 is a block diagram of a system that may be provided in accordance with some embodiments.

FIG. 7 is a block diagram of a graphics platform in accordance with some embodiments of the present invention.

FIG. 8 is a tabular representation of a portion of data representing a virtual object and 3D information about a person, such as his or her position and/or orientation in accordance with some embodiments of the present invention.

DETAILED DESCRIPTION

Applicants have recognized that there is a need for methods, systems, apparatus, means and computer program products to efficiently and dynamically facilitate interactions between a person and virtual content. For example, FIG. 1 illustrates a system 100 wherein a set or scene 110 includes a person 120 and a virtual object 130. That is, the virtual object 130 is not actually physically present within the scene 110; instead, the image of the virtual object will be added either simultaneously or later (e.g., by a graphics rendering engine). A video camera 140 may be pointed at the scene 110 to generate a video signal provided to a graphics platform 150. By way of example, FIG. 2 illustrates an image 210 associated with such a video signal. Note that the image 210 generated by the camera 140 includes an image of a person 220 (e.g., a news anchorperson) but not a virtual object. The person and the virtual image need not both appear in the final scene as presented to the audience. The invention solves the problem of allowing the person to relate to the virtual image, irrespective of whether the person and the image are both ultimately presented to the viewing audience.

Referring again to FIG. 1, the graphics platform 150 may receive information about the virtual object 130, such as the object's location, pose, motion, appearance, audio, color, etc. The graphics platform 150 may use this information to create a viewer signal, such as a broadcast signal or other signal to be output to a viewer (whether recorded or not), that includes images of both the person 120 and the virtual object 130, or only one of the two. For example, FIG. 2 illustrates an image 212 associated with such a viewer signal. In this example, which is not limiting, the image 212 output by the graphics platform 150 includes images of both a person 222 and a virtual object 232 (e.g., a dragon).

Referring again to FIG. 1, it may be desirable for the person 120 to appear to interact with the virtual object 130. For example, the person 120 might want to appear to maintain eye contact with the virtual object 130 (e.g., along a line of sight 225 illustrated in FIG. 2). This can be difficult, however, because the person 120 cannot see the virtual object 130. In some cases, a monitor 160 might be provided with a display 162 so that the person 120 can view the broadcast or viewer signal. In this way, the person 120 can periodically glance at the display to determine whether he or she is in approximately the correct position and/or orientation with respect to the virtual object 130. Such an approach, however, can be distracting for both the person 120 and viewers (who may wonder why the person keeps looking away).

To efficiently and dynamically facilitate interactions between a person and virtual content, FIG. 3 illustrates a system 300 according to some embodiments. As before, a set or scene 310 includes a person 320 and a virtual object 330, and a video camera 340 may be pointed at the scene 310 to generate video and audio signals provided to a graphics platform 350. According to some embodiments, other types of sensory signals, such as audio, thermal, or haptic signals (e.g., created by the graphics platform), could be used to signal the position or location of a virtual image in relation to the person (e.g., a “beep” might indicate when an anchorperson's hand is touching the “bat” of a virtual batter). According to some embodiments, audio may replace the “supplemental video.” Note, however, that the video camera may still generate the “viewer video” that, besides being served to the audience, is also used to model the person's pose, appearance and/or location in the studio. Further note that some or all of this modeling may be done by other sensors.
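
The “beep” example above reduces to a containment test. The following is a minimal sketch, assuming the tracked hand position and the virtual bat's extent are expressed in the same studio coordinate frame (in metres) and that the bat is approximated by an axis-aligned bounding box; the `Box` type and the `audio_cue` function are hypothetical names, not part of any particular graphics package.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned bounding box approximating a virtual object."""
    min_corner: Vec3
    max_corner: Vec3

    def contains(self, p: Vec3) -> bool:
        # Inside only if every coordinate lies within the box's range on that axis.
        return all(lo <= c <= hi for lo, c, hi in zip(self.min_corner, p, self.max_corner))


def audio_cue(hand: Vec3, virtual_bat: Box) -> Optional[str]:
    """Return the name of a sound to play when the tracked hand touches the virtual bat."""
    return "beep" if virtual_bat.contains(hand) else None


bat = Box(min_corner=(0.0, 0.9, 0.0), max_corner=(0.08, 1.8, 0.08))
print(audio_cue((0.04, 1.2, 0.04), bat))  # beep
print(audio_cue((0.50, 1.2, 0.04), bat))  # None
```

A bounding box is the coarsest possible proxy; a tighter shape (e.g., a capsule around the bat's axis) would follow the same pattern.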

The graphics platform 350 may, according to some embodiments, execute a rendering application, such as the Brainstorm eStudio® three dimensional real-time graphics software package. Note that the graphics platform 350 could be implemented using a Personal Computer (PC) running a Windows® Operating System (“OS”), an Apple® computing platform, or a cloud-based program (e.g., Google® Chrome®). The graphics platform 350 may use information about the virtual object 330 (e.g., the object's location, motion, appearance, etc.) to create a broadcast or viewer signal that includes images of both the person 320 and the virtual object 330. For example, FIG. 4 illustrates an image 412 that includes images of both a person 422 and a virtual object 432.

Referring again to FIG. 3, to facilitate an appearance of interactions between the person 320 and the virtual object 330, 3D information about the person may be provided to the graphics platform 350 through the processing of data captured by the video camera 340 and/or various other sensors. As used herein, the phrase “3D information” might include location or position information, body pose, line of sight direction, etc. The graphics platform 350 may then use this information to generate a supplemental video signal to be provided to a display 360 associated with the person 320. For example, a Head Mounted Video Display (HMVD) may be used to display the supplemental video signal. In particular, the supplemental video signal may include an image generated by the graphics platform 350 that includes a view of the virtual object 330 as it would be seen from the person's perspective. That is, the graphics platform 350 may render a supplemental video feed in substantially real-time based on a spatial relationship between the person 320 and the virtual object 330. Note that, as used herein, the phrase “graphics platform” may refer to any device (or set of devices) that can perform the functions of the various embodiments described herein. Alternatively, or in addition, the graphics platform 350 may send an audio signal that indicates, for example, that the person is within a certain distance of a virtual object, or the direction of the virtual object relative to the person. A similar effect might be created using a pressure signal or another type of signal (e.g., thermal) to indicate positioning.
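
Rendering the supplemental view “as it would be seen from the person's perspective” amounts to expressing points of the virtual object in a coordinate frame attached to the person's eyes and projecting them with a pinhole model. The sketch below assumes a world frame with +y up, a tracked eye position and gaze direction (e.g., from the camera-based or sensor-based tracking described above), and a simple pinhole display model with focal length `focal_px` and principal point (`cx`, `cy`) in pixels; all function names and parameters are illustrative assumptions, not the API of any rendering product mentioned here.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def normalize(v: Vec3) -> Vec3:
    n = math.sqrt(dot(v, v))
    return (v[0] / n, v[1] / n, v[2] / n)


def to_eye_coords(point: Vec3, eye: Vec3, gaze: Vec3, up: Vec3 = (0.0, 1.0, 0.0)) -> Vec3:
    """Express a world point as (right, up, forward) offsets from the person's eye.

    The gaze direction must not be parallel to `up`.
    """
    forward = normalize(gaze)
    right = normalize(cross(forward, up))
    true_up = cross(right, forward)
    d = sub(point, eye)
    return (dot(d, right), dot(d, true_up), dot(d, forward))


def project(p_eye: Vec3, focal_px: float, cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection to display pixels (image y grows downward); None if behind the viewer."""
    x, y, z = p_eye
    if z <= 0.0:
        return None
    return (cx + focal_px * x / z, cy - focal_px * y / z)


# Person's eyes at 1.7 m looking along +z; a point of the virtual object 2 m ahead.
eye, gaze = (0.0, 1.7, 0.0), (0.0, 0.0, 1.0)
print(project(to_eye_coords((-1.0, 1.7, 2.0), eye, gaze), 800.0, 640.0, 360.0))  # (1040.0, 360.0)
```

A real renderer would apply the same view transform to every vertex of the virtual object (typically as a 4x4 matrix on the GPU); the per-point form is used here only to keep the example self-contained.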

FIG. 4 illustrates that the person 422 may be wearing a display 462 wherein an image of the supplemental video signal is projected onto lenses worn by the person 422. Moreover, as illustrated by the supplemental image 414, the supplemental video signal may include an image 474 of the virtual object as it would appear from the person's point of view. As illustrated in FIG. 4, the image 474 of the virtual object may comprise a skeleton or Computer Aided Design (“CAD”) vector representation view of the virtual object. According to other embodiments, a high definition version might be provided instead. Note that a background behind the image 474 may or may not be seen in the supplemental image 414. In either case, viewing any representation of the virtual object from the performing person's perspective may allow the person 422 to interact more realistically with the virtual object 432 (e.g., the person 422 may know how to position himself or herself or gesture relative to the virtual object, or where to look in order to appear to maintain eye contact along a line of sight 425). According to some embodiments, the actual line of sight of the person 422 may be determined, such as by using retina detectors incorporated into the display 462 (to take into account that the person 422 can look around without turning his or her head). According to other embodiments, the view could be controlled using a joystick. For example, when the joystick is in a default or home position, the viewpoint might represent a normal line of sight, and when the joystick is moved or engaged, the viewpoint might be adjusted away from the normal position.
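
The joystick behaviour described above can be sketched as an angular offset added to the person's normal line of sight. This assumes the joystick reports a deflection in the range [-1, 1] on each axis (0 at the home position), that the normal line of sight is given as yaw and pitch angles in degrees, and that `max_offset_deg` is a hypothetical tuning parameter; `adjusted_gaze` is an illustrative name.

```python
import math
from typing import Tuple


def adjusted_gaze(yaw_deg: float, pitch_deg: float,
                  stick_x: float, stick_y: float,
                  max_offset_deg: float = 45.0) -> Tuple[float, float, float]:
    """Return a unit gaze vector; a centred joystick (0, 0) yields the normal line of sight.

    Yaw is measured about the vertical (+y) axis from +z toward +x; pitch is positive upward.
    """
    # Clamp deflections so a noisy reading cannot push the view past the configured limit.
    sx = max(-1.0, min(1.0, stick_x))
    sy = max(-1.0, min(1.0, stick_y))
    yaw = math.radians(yaw_deg + sx * max_offset_deg)
    pitch = math.radians(pitch_deg + sy * max_offset_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


print(adjusted_gaze(0.0, 0.0, 0.0, 0.0))  # (0.0, 0.0, 1.0): home position, normal line of sight
```

Returning the stick to its home position restores the unmodified line of sight, which matches the default behaviour described in the text.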

FIG. 5A illustrates a method that might be performed, for example, by some or all of the elements described herein. The flow charts described herein do not imply a fixed order to the steps, and embodiments of the present invention may be practiced in any order that is practicable. Note that any of the methods described herein may be performed by hardware, software, or any combination of these approaches. For example, a computer-readable storage medium may store thereon instructions that when executed by a machine result in performance according to any of the embodiments described herein.

At 502, 3D information about a virtual object is received at a graphics platform. For example, a location and dimensions of the virtual object may be determined by the graphics platform. At 504, 3D information associated with a person in a scene may be determined. The 3D information associated with the person might include the person's location, orientation, line of sight, pose, etc. and may be received, for example, from a video camera and/or one or more Real-Time Locating System (“RTLS”) sensors using technologies such as RFID, infrared, and Ultra-wideband. At 506, the graphics platform may create: (i) a “viewer signal” (possibly a video and/or audio signal) of the scene in relation to the person (whether or not actually including the person), and (ii) a supplemental signal of the scene (e.g., a video and/or audio signal), wherein the viewer signal and the supplemental signal are from different perspectives based at least in part on the 3D information. For example, a viewer signal may include the virtual element and an animated figure of the person. As another example, the viewer signal might represent the scene from the point of view of a video camera filming the scene while the supplemental video signal represents the scene from the person's point of view. According to some embodiments, the supplemental signal is displayed (or otherwise transmitted, e.g., as audio) to the person to help him or her interact with the virtual object. In an embodiment of this invention, the performing person may be a robot.

FIG. 5B is a flow chart of a method that may be performed in accordance with some embodiments described herein. At 512, a video signal including an image of a person may be received, and a virtual object may be added to the video signal to create a viewer signal at 514. The virtual object may be any type of virtual image, including for example, a Computer Generated Image (“CGI”) object, including a virtual human and/or a video game character, object or sound.

At 516, location information associated with a spatial relationship between the person and the virtual object may be determined. According to some embodiments, the location information may be determined by sensors or by analyzing the video signal from the camera. Moreover, a plurality of video signals might be received and analyzed by a graphics platform to model the person's appearance and to determine a three dimensional location of the person. Other types of location information may include a distance between the person and the virtual object, one or more angles associated with the person and the virtual object, and/or an orientation of the person (e.g., where he or she is currently looking). Note that other types of RTLS sensors may also be used (e.g., sensors using sound waves or any other way of measuring distance).
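
The distance and angle quantities listed above can be computed from tracked floor-plane positions. A minimal sketch, assuming positions are (x, z) coordinates on the studio floor in metres, that the person's facing direction is a heading in degrees measured from +z toward +x, and that `spatial_relationship` is a hypothetical helper name:

```python
import math
from typing import Tuple


def spatial_relationship(person_xz: Tuple[float, float], facing_deg: float,
                         object_xz: Tuple[float, float]) -> Tuple[float, float]:
    """Return (distance, relative_bearing_deg) from the person to a virtual object.

    A relative bearing of 0 means the object is straight ahead; the result is
    wrapped into [-180, 180).
    """
    dx = object_xz[0] - person_xz[0]
    dz = object_xz[1] - person_xz[1]
    distance = math.hypot(dx, dz)
    bearing = math.degrees(math.atan2(dx, dz))  # object's heading, same convention as facing_deg
    relative = (bearing - facing_deg + 180.0) % 360.0 - 180.0
    return distance, relative


print(spatial_relationship((0.0, 0.0), 0.0, (3.0, 4.0)))  # distance 5.0, bearing of about 36.9 degrees
```

The relative bearing is what a cue generator would need to say, for example, “turn toward the object” or to decide whether the object is within the person's field of view.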

At 518, a supplemental signal may be created based on the location information. In particular, the supplemental signal may include a view or other perception of the virtual object as it would be seen or perceived from the person's perspective. The perception of the virtual object might comprise a marker (e.g., a dot or “x” indicating where a person should look, or a sound when a person looks in the right direction), an image of lower resolution than the viewer signal, an image updated at a lower frame rate than the viewer signal, and/or a dynamically generated occlusion zone. According to some embodiments, the supplemental signal is further based on an orientation of the person's line of sight (e.g., the supplemental video signal may be updated when a person turns his or her head). Moreover, multiple people and/or virtual objects may be involved in the scene and/or included in the supplemental signal. In this case, a supplemental signal may be created for each person, and each supplemental signal would include a view or perception of the virtual objects as they would be seen or perceived from that person's perspective.
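
One way to produce a supplemental signal updated at a lower frame rate is to re-render it only on a subset of the viewer-signal frames. The sketch below assumes both rates are known in frames per second and that the supplemental rate does not exceed the viewer rate; `supplemental_refresh_frames` is a hypothetical name.

```python
from typing import List


def supplemental_refresh_frames(viewer_fps: float, supplemental_fps: float,
                                n_frames: int) -> List[int]:
    """Indices of viewer-signal frames on which the supplemental view is re-rendered."""
    if not 0.0 < supplemental_fps <= viewer_fps:
        raise ValueError("supplemental rate must be positive and no higher than the viewer rate")
    step = viewer_fps / supplemental_fps  # viewer frames per supplemental frame (may be fractional)
    refresh, next_due = [], 0.0
    for i in range(n_frames):
        if i + 1e-9 >= next_due:  # small tolerance absorbs floating-point drift
            refresh.append(i)
            next_due += step
    return refresh


print(supplemental_refresh_frames(60.0, 15.0, 12))  # [0, 4, 8]
```

Between refreshes, the person's display would simply keep showing the most recently rendered supplemental view.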

The supplemental signal may then be transmitted to a secondary device (e.g., a display device). According to some embodiments, the secondary device may be worn by the person, such as an eyeglasses display, a retinal display, a contact lens display, or a hearing aid (for rendering sound information). Moreover, according to some embodiments, the supplemental signal is wirelessly transmitted to the secondary device, so that the supplemental signal and its presentation to the performing person may be almost transparent to a viewer of the final broadcast.

Moreover, according to some embodiments, a command from the person may be detected and, responsive to that detection, the virtual object may be adjusted. Such a command might comprise, for example, an audible command and/or a body gesture command. For example, a graphics platform might detect that the person has “grabbed” a virtual object and then move the image of the virtual object as the person moves his or her hands. As another example, a person may gesture or verbally order that the motion of a virtual object be paused and/or modified. As another example, a guest or other third person (or group of persons) without access to the devices enabling perception of the virtual image may gesture or verbally order motion of the virtual object, causing the virtual object to move (and that movement to be perceived from the perspective of the original person wearing the detection device). For example, when an audience claps or laughs, the sound might cause the virtual object to take a bow, which the person may then be able to perceive via information provided in the supplemental feed.
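
The “grab” interaction can be sketched as a small state machine: when a grab gesture is recognized, the offset between the hand and the object is stored, and the object then follows the hand until a release gesture. Gesture recognition itself is out of scope here; the `"grab"`/`"release"` labels, the `VirtualObject` type and the `GrabController` class are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class VirtualObject:
    position: Vec3


class GrabController:
    """Moves a virtual object with the person's hand between 'grab' and 'release' gestures."""

    def __init__(self) -> None:
        self.offset: Optional[Vec3] = None  # object position minus hand position while held

    def update(self, obj: VirtualObject, hand: Vec3, gesture: Optional[str]) -> None:
        if gesture == "grab" and self.offset is None:
            self.offset = tuple(o - h for o, h in zip(obj.position, hand))
        elif gesture == "release":
            self.offset = None
        if self.offset is not None:
            # Keep the original hand-to-object offset so the object does not jump into the hand.
            obj.position = tuple(h + d for h, d in zip(hand, self.offset))


obj = VirtualObject((2.0, 1.0, 0.0))
ctrl = GrabController()
ctrl.update(obj, (1.0, 1.0, 0.0), "grab")
ctrl.update(obj, (4.0, 1.0, 0.0), None)  # hand moves while holding
print(obj.position)  # (5.0, 1.0, 0.0)
```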

As used herein, the phrases “video feed” and “image” may refer to any signal conveying information about a moving or still image, including audio signals and including a High Definition Serial Digital Interface (“HD-SDI”) signal transmitted in accordance with the Society of Motion Picture and Television Engineers (SMPTE) 292M standard. Although HD signals may be described in some examples presented herein, note that embodiments may be associated with any other type of video feed, including a standard broadcast feed and/or a three dimensional image feed. Moreover, video feeds and/or received images might comprise, for example, an HD-SDI signal exchanged through a fiber cable and/or a satellite transmission. Moreover, the video cameras described herein may be any device capable of generating a video feed, such as a Sony® studio (or outside) broadcast camera.

Thus, systems and methods may be provided to improve the production of video presentations involving augmented reality technology. Specifically, some embodiments may produce an improved immersive video that mixes subjects and a virtual environment. This might be achieved, for example, by reconstructing the subject's video and/or presenting the subject with a “subject-view” of the virtual environment. This may facilitate interactions between subjects and the virtual elements and, according to some embodiments, let a subject alter a progression and/or appearance of virtual imagery through gestures or audible sounds.

Augmented reality may fuse real scene video with computer generated imagery. In such a fusion, the virtual environment may be rendered from the perspective of a camera or other device that is used to capture the real scene video (or audio). Hence, knowledge of the camera's parameters may be required along with distances of real and virtual objects relative to the camera to resolve occlusion. For example, the image of part of a virtual element may be occluded by the image of a physical element in the scene or vice versa. Another aspect of enhancing video presentation through augmented reality is handling the interaction between the real and the virtual elements.

For example, a sports anchorperson may analyze maneuvers during a game or play segment. In preparation for a show, a producer might request a graphical presentation of a certain play in a game that the anchor wants to analyze. This virtual playbook might comprise a code module that, when executed on a three dimensional rendering engine, generates a three dimensional rendering of the play. The synthesized play may then be projected from the perspective of the studio camera. To analyze the play, the anchor's video image may be rendered so that he or she appears to be standing on the court (while actually remaining in a studio) among the virtual players. He or she may then deliver the analysis while virtually engaging with the players. To position himself or herself relative to the virtual players, the anchor typically looks at a camera screen and rehearses the movements beforehand. Even then, it may be a challenge to make the interaction between a real person and a virtual person look natural.

Thus, when one or more persons interact with virtual content they may occasionally shift their focus to a video feed of the broadcast signal. This may create two problems. First, a person may appear to program viewers as unfocused because his or her gaze is directed slightly off from the camera shooting the program. Second, the person might not easily interact with the virtual elements, or move through or around a group of virtual elements (whether such virtual elements are static or dynamic). A person who appears somewhat disconnected from the virtual content may undermine the immersive effect of the show. Also note that interactions may be laborious from a production standpoint (requiring several re-takes and re-shoots when the person looks away from the camera, misses a line due to interacting incorrectly with virtual elements, etc.).

FIG. 6 is a block diagram of a system 600 that may be provided, in accordance with some embodiments, to improve interactions with virtual content. The system 600 creates an augmented reality environment by using a camera 640 to capture a video sequence of a real-world scene 610 that includes a person 620, and by generating a graphical sequence that includes a virtual object 630. For example, the broadcast camera 640 may record the person 620 in a studio (possibly in front of a “green screen”) or on location “in the field.” A “virtual camera” (the perspective used by a graphic engine to render a virtual environment) may be aligned with the broadcast camera 640 so that the rendered environment matches the person's scale, movements, etc. Typically, the broadcast camera's perspective (including position, roll, pan, tilt, and focal length) is extracted using sensors mounted on the broadcast camera 640 or by analyzing the video frames received from the broadcast camera 640. According to some embodiments described herein, the system 600 may provide capabilities to improve the accuracy and realism of the mixed real and virtual production. Note that a person 620 who cannot “see” the virtual object 630 he or she interacts with will tend to interact less naturally and accurately with the virtual content. The system 600 disclosed herein may improve and extend the interaction between the person 620 and the virtual object 630 and allow the person 620 to spatially and temporally affect the rendering of the virtual object 630 during the production.
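
Aligning the “virtual camera” with the broadcast camera 640 means rendering with the same position, orientation and focal length. The sketch below projects a world point through a pinhole camera described by position, pan, tilt, roll and a focal length in pixels. It assumes a +y-up world frame, pan measured about +y from +z toward +x, tilt positive upward and roll about the optical axis; real camera-tracking systems use their own conventions, lens-distortion models and units, so `camera_project` is only an illustrative, hypothetical helper.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def camera_project(point: Vec3, cam_pos: Vec3, pan_deg: float, tilt_deg: float,
                   roll_deg: float, focal_px: float, cx: float, cy: float
                   ) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates of a pan/tilt/roll pinhole camera."""
    p, t, r = (math.radians(a) for a in (pan_deg, tilt_deg, roll_deg))
    forward = (math.cos(t) * math.sin(p), math.sin(t), math.cos(t) * math.cos(p))
    right = (-math.cos(p), 0.0, math.sin(p))  # horizontal unit vector, perpendicular to forward
    up = (right[1] * forward[2] - right[2] * forward[1],  # up = right x forward
          right[2] * forward[0] - right[0] * forward[2],
          right[0] * forward[1] - right[1] * forward[0])
    # Roll rotates the right/up pair about the optical axis.
    right_r = tuple(math.cos(r) * a + math.sin(r) * b for a, b in zip(right, up))
    up_r = tuple(math.cos(r) * b - math.sin(r) * a for a, b in zip(right, up))
    d = (point[0] - cam_pos[0], point[1] - cam_pos[1], point[2] - cam_pos[2])
    z = _dot(d, forward)
    if z <= 0.0:
        return None  # behind the camera
    return (cx + focal_px * _dot(d, right_r) / z, cy - focal_px * _dot(d, up_r) / z)


# Camera at the origin looking along +z; a point 1 m above the axis, 5 m away.
print(camera_project((0.0, 1.0, 5.0), (0.0, 0.0, 0.0), 0.0, 0.0, 0.0, 1000.0, 960.0, 540.0))  # (960.0, 340.0)
```

Feeding the graphic engine the same parameters that the camera-mounted sensors (or vision-based estimation) report is what keeps the rendered environment registered with the live video.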

According to some embodiments, the video of the person 620 captured by the camera 640 may be altered to refine his or her pose (and/or possibly appearance) before it is mixed with the virtual environment. This may be done by determining the person's three dimensional model, including a three dimensional surface and skeleton representation (for example, based on an analysis of videos from multiple views), so that the image of the person 620 at a certain location and pose in the scene may be altered in relation to the virtual object. According to some embodiments, the person 620 may be equipped with an HMVD 660 (e.g., three dimensional glasses, virtual retinal displays, etc.) through which he or she can view the virtual environment, including the virtual object 630, from his or her perspective. That is, the virtual object 630 may be displayed to the person from his or her own perspective in a way that enhances the person's ability to navigate through the virtual world and to interact with the content without overly complicating the production workflow.

According to some embodiments, a 3D model of the person 620 (including his or her location, pose, surface, texture and color characteristics) may be obtained through an analysis of video from the broadcast camera 640 and, potentially, auxiliary cameras and/or sensors 642 (attached to the person or external to the person). Once a 3D model of the person 620 is obtained, the image of the person 620 may be reconstructed into a new image that shows the person with a new pose and/or appearance relative to the virtual elements. According to some embodiments, the “viewer video” may be served to the audience. In addition, according to some embodiments, the person's location and pose may be submitted to a three dimensional graphic engine 680 to render a supplemental virtual environment view (e.g., a second virtual camera view) from the perspective of the person 620. This second virtual camera view is presented to the person 620 through the HMVD 660 or any other display device. In one embodiment, the second virtual camera view may be presented in a semi-transparent manner, so the person 620 can still see his or her surrounding real-world environment (e.g., studio, cameras, another person 622, etc.). In yet another embodiment, the second virtual camera view might be presented to the person on a separate screen in the studio. Such an approach may eliminate the need to wear a visible display such as the HMVD 660, but requires the person 620 to look at a monitor instead of directly at the virtual object 630.

To improve the interaction among real and virtual objects, some embodiments use computer-vision techniques to recognize and track the people 620, 622 in the scene 610. A three dimensional model of an anchor may be estimated and used to reconstruct his or her image at different orientations and poses relative to the virtual players or objects 630, 632. The anchor (or any object relative to the anchor) may be reconstructed, according to some embodiments, at different relative sizes, locations, or appearances. Three dimensional reconstruction of objects may be done, for example, based on an analysis of video sequences from which three dimensional information of static and dynamic objects was extracted.

With two or more camera views, according to some embodiments, an object's or person's structure and characteristics might be modeled. For example, based on stereoscopic matching of corresponding pixels from two or more views of a physical object, the cameras' parameters (pose) may be estimated. Note that knowledge of the cameras' poses in turn may provide the corresponding real-world coordinates for each of the object's pixels. As a result, when fusing the image of a physical object with virtual content (e.g., computer-generated imagery), the physical object's position in real-world coordinates may be considered relative to the virtual content to resolve problems of overlap and order (e.g., occlusion issues).
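
For the simplest two-view case, a calibrated and rectified stereo pair, the depth of a matched pixel follows from its disparity, Z = f·B/d, where f is the focal length in pixels, B the baseline between the cameras in metres and d the horizontal disparity in pixels; the pixel can then be back-projected to camera coordinates. This is a textbook simplification offered under those assumptions, not the full multi-view pose estimation described above, and the function names are hypothetical.

```python
from typing import Tuple


def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth (metres) of a point seen in a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0.0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return focal_px * baseline_m / disparity_px


def back_project(u: float, v: float, depth: float,
                 focal_px: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Camera-frame point (x right, y down, z forward) for pixel (u, v) at the given depth."""
    return ((u - cx) * depth / focal_px, (v - cy) * depth / focal_px, depth)


z = depth_from_disparity(1000.0, 0.2, 50.0)  # 4.0 m
print(back_project(1060.0, 540.0, z, 1000.0, 960.0, 540.0))  # (0.4, 0.0, 4.0)
```

Transforming the resulting camera-frame point by the camera's estimated pose would give the real-world coordinates used to resolve overlap and order against the virtual content.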

According to some embodiments, ordering among physical and graphical elements may be facilitated using a depth map. A depth map of a video image may provide the distance between a point in the scene 610 (projected in the image) and the camera 640. Hence, a depth map may be used to determine, for example, what part of the image of a physical element should be rendered into the computer generated image and what part is occluded by a virtual element (and therefore should not be rendered). According to some embodiments, this information may be encoded in a binary occlusion mask. For example, a mask pixel set to “1” might indicate that a physical element's image pixel should be keyed-in (i.e., rendered), while “0” indicates that it should not be keyed-in. A depth map may be generated, according to some embodiments, either by processing the video sequences of multiple views of the scene or by a three dimensional camera such as a Light Detection And Ranging (“LIDAR”) camera. A LIDAR camera may be associated with an optical remote sensing technology that measures the distance to, or other properties of, a target by illuminating the target with light (e.g., using laser pulses). A LIDAR camera may use ultraviolet, visible, or near infrared light to locate and image objects based on the reflected light's time of flight. This information may then be used in connection with any of the embodiments described herein. Other technologies utilizing RF, infrared, and Ultra-wideband signals may be used to measure relative distances of objects in the scene. Note that a similar effect might be achieved using sound waves to determine an anchorperson's location.
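
The depth-map keying rule above, together with the time-of-flight principle behind a LIDAR camera, can be sketched in a few lines. This assumes per-pixel depth maps for the physical scene and for the rendered virtual content are already registered to the same camera view (nested lists of metres, with `math.inf` where no virtual content is drawn), and uses the convention from the text: 1 keys in the physical pixel, 0 leaves the virtual element in front. The function names are hypothetical.

```python
import math
from typing import List

SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_depth(round_trip_s: float) -> float:
    """Distance to a target from a light pulse's round-trip time (the light travels there and back)."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def occlusion_mask(physical_depth: List[List[float]],
                   virtual_depth: List[List[float]]) -> List[List[int]]:
    """1 where the physical element is nearer to the camera than the virtual one, else 0."""
    return [[1 if p < v else 0 for p, v in zip(p_row, v_row)]
            for p_row, v_row in zip(physical_depth, virtual_depth)]


physical = [[2.0, 5.0, 3.0]]
virtual = [[3.0, 4.0, math.inf]]  # no virtual content at the third pixel
print(occlusion_mask(physical, virtual))  # [[1, 0, 1]]
```

Ties (equal depths) are resolved in favour of the virtual element here; a production keyer would also have to cope with depth noise and soft edges.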

Note that a covered scene might include one or more persons (physical objects) 620, 622 who perform relative to virtual objects or elements 630, 632. The person 620 may have a general idea as to the whereabouts and motion of these virtual elements 630, 632, although he or she cannot “see” them in real life. The scene may be captured by a broadcast (main) camera 640. A control-system 660 may drive the broadcast camera 640 automatically or via an operator. In addition to the main camera 640, there may be any number of additional cameras and/or sensors 642 positioned at the scene 610 to capture video or any other telemetry (e.g., RF, UWB, audio, etc.) measuring the appearance and structure of the scene 610.

The control-system 660, operated either automatically or by an operator, may manage the production process. For instance, a game (e.g., a sequence of computer-generated imagery data) including a court/field with playing athletes (e.g., the virtual objects 630, 632) may be selected from a CGI database 670. A camera perspective may then be determined and submitted to the broadcast camera 640 as well as to a three dimensional graphic engine 680. The graphic engine 680 may, according to some embodiments, receive the broadcast camera's model directly from camera-mounted sensors. According to other embodiments, vision-based methods may be utilized to estimate the broadcast camera's model. The three dimensional graphic engine 680 may render the game from the same camera perspective as the broadcast camera 640. Next, the virtual rendering of the game may be fused with the video capture of the people 620, 622 in the scene 610 to show all of the elements in an immersive fashion. According to some embodiments, a “person” may be able to know where he or she is in relation to the virtual object without having to disengage from the scene itself. In a “horror” movie, for instance, the person may never have to see the virtual object in order to react logically to its eerie presence, the totality of which is transmitted to the viewing audience. Note that this interaction may be conveyed to the audience, such as by merging the virtual and physical into the “viewer video.”

Based on analyses of the video, data, and/or audio streams and the telemetry signals fed to the video processor unit 650, various information details may be derived, such as: (i) the image foreground region of the physical elements (or persons), (ii) three dimensional modeling and characteristics, and/or (iii) real world locations. Relative to the virtual elements' presence in the scene 610, as defined by the game (e.g., in the CGI database 670), the physical elements may be reconstructed. Moreover, the pose and appearance of each physical element may be reconstructed resulting in a new video or rendition (replacing the main camera 640 video) in which the new pose and appearance is in a more appropriate relation to the virtual objects 630, 632. The video processor 650 may also generate an occlusion mask that, together with the video, may be fed into a mixer 690, where fusion of the video and the computer-generated-imagery takes place.
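
The mixer 690 can be pictured as a per-pixel key: where the occlusion mask is 1 the (possibly reconstructed) camera video is shown, and elsewhere the computer-generated imagery is shown. The sketch below also accepts fractional mask values as a soft key for blended edges; that extension, the RGB-tuple pixel format and the `mix` function are assumptions for illustration, not a description of any particular mixer.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def mix(video: List[List[Pixel]], cgi: List[List[Pixel]],
        mask: List[List[float]]) -> List[List[Pixel]]:
    """Per-pixel key: output = m * video + (1 - m) * cgi, with m in [0, 1]."""
    out = []
    for v_row, g_row, m_row in zip(video, cgi, mask):
        out.append([
            tuple(int(round(m * vc + (1.0 - m) * gc)) for vc, gc in zip(v, g))
            for v, g, m in zip(v_row, g_row, m_row)
        ])
    return out


red, blue = (255, 0, 0), (0, 0, 255)
print(mix([[red, red]], [[blue, blue]], [[1, 0]]))  # [[(255, 0, 0), (0, 0, 255)]]
```

With a binary mask this reduces to plain keying; fractional values simply blend the two sources at the boundary.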

In some embodiments, the person 620 who is interacting with one or more virtual objects 630, 632 may use, for example, an HMVD 660 to “see” or perceive these virtual elements from his or her vantage point. Moreover, the person 620 (or other persons) may be able to affect the virtual object's motion using gestures or voice. In some embodiments, a head-mounted device that tracks the person's gaze may be used as a means of interaction.
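Rendering from the person's vantage point requires a view transform built from the person's tracked head position and line of sight. A minimal sketch, assuming a world frame with y pointing up and a gaze target supplied by the head-mounted tracking; the output frame (x to the person's right, y up, z along the line of sight) and all helper names are assumptions for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a: Vec3) -> Vec3:
    n = math.sqrt(_dot(a, a))
    if n == 0:
        raise ValueError("zero-length vector (e.g. gaze parallel to world_up)")
    return (a[0] / n, a[1] / n, a[2] / n)


def to_person_view(point_world: Vec3, eye: Vec3, gaze_target: Vec3,
                   world_up: Vec3 = (0.0, 1.0, 0.0)) -> Vec3:
    """Express a world point in the person's eye frame (x right, y up, z forward)."""
    forward = _normalize(_sub(gaze_target, eye))
    right = _normalize(_cross(forward, world_up))
    up = _cross(right, forward)  # already unit length: right is perpendicular to forward
    d = _sub(point_world, eye)
    return (_dot(right, d), _dot(up, d), _dot(forward, d))
```

The resulting eye-frame coordinates could then be projected with a pinhole model matched to the worn display's optics.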

The video processor 650, which computes the person's location and pose in real-world coordinates, may send the person's calculated perspective (or an altered version of it) to the three dimensional graphic engine 680. The graphic engine 680, in turn, may render the virtual elements and/or environment from the received perspective (vantage point) and wirelessly send this computer-generated imagery to the person's HMVD 660. The person's gestures and/or voice may be measured by the plurality of cameras and sensors 642, then recognized and translated by the video processor 650 into commands. These commands may be used to alter the progression and appearance of the virtual play (e.g., pause, slow down, replay, or apply a special effect).
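Translating recognized gestures or voice into playback commands could be organized as a small state machine over the virtual play's timeline. The command labels (`"pause"`, `"slow_down"`, `"replay"`, etc.), the halving of the playback rate, and the five-second replay window are hypothetical choices, not part of the described system.

```python
from dataclasses import dataclass


@dataclass
class VirtualPlayback:
    """Playback state of the virtual play: timeline position and rate multiplier."""
    time_s: float = 0.0
    rate: float = 1.0
    paused: bool = False

    def advance(self, dt_s: float) -> None:
        """Move the timeline forward by dt_s of wall-clock time."""
        if not self.paused:
            self.time_s += dt_s * self.rate


def apply_command(state: VirtualPlayback, command: str,
                  replay_window_s: float = 5.0) -> None:
    """Apply a (hypothetical) recognizer label to the playback state."""
    if command == "pause":
        state.paused = True
    elif command == "resume":
        state.paused = False
    elif command == "slow_down":
        state.rate = max(0.125, state.rate / 2)  # floor keeps motion visible
    elif command == "normal_speed":
        state.rate = 1.0
    elif command == "replay":
        state.time_s = max(0.0, state.time_s - replay_window_s)
        state.paused = False
    else:
        raise ValueError(f"unrecognized command: {command!r}")
```

Keeping the command vocabulary separate from the recognizer means gesture and voice inputs can share the same downstream effects.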

Thus, based on the person's location and movements, a version of the virtual image may be transmitted separately to the person 620. This version can be simple, such as a CAD-like drawing or an audio “beep,” or more complex, such as the entire look and feel of the virtual image used for the broadcast program. As the person 620 moves, or as the virtual objects 630, 632 move around the person 620, the person 620 may see the virtual image from his or her own perspective, while the virtual image presented as part of the programming may change accordingly. The person 620 may interact with the virtual objects 630, 632 (to make them appear, disappear, move, change, multiply, shrink, grow, etc.) through gestures, voice, or any other means. This image may then be transmitted to the person's “normal” eyeglasses (or contact lenses), through which the image is beamed onto the person's retina (e.g., by a virtual retinal display) or projected onto the eyeglass lenses. A similar effect could be obtained using hearing devices (e.g., where a certain sound is transmitted as the person interacts with the virtual object).
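For the simple audio “beep” version of the supplemental signal, one possible mapping (borrowed from parking-sensor style cues, and purely an assumption here) shortens the interval between beeps as the person approaches a virtual object and falls silent beyond a chosen range. The function name and the range and interval defaults are illustrative.

```python
import math
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def beep_interval_s(person: Vec3, obj: Vec3,
                    max_range_m: float = 3.0,
                    min_interval_s: float = 0.1,
                    max_interval_s: float = 1.0) -> Optional[float]:
    """Seconds between beeps for a hypothetical proximity cue.

    Linearly interpolates from min_interval_s (touching the object) to
    max_interval_s (at the edge of range); returns None (silence) when the
    person is max_range_m or farther from the virtual object.
    """
    d = math.dist(person, obj)
    if d >= max_range_m:
        return None
    frac = d / max_range_m
    return min_interval_s + frac * (max_interval_s - min_interval_s)
```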

Note that some embodiments may be applied to facilitate interaction between two or more persons captured by two or more different cameras at different locations (and, possibly, at different times). For example, an interviewer at the studio, with the help of an HMVD, may “see” the video reconstruction of an interviewee. This video reconstruction may be from the perspective of the interviewer. Similarly, the interviewee may be able to “see” the interviewer from his or her perspective. Such a capability may facilitate a more realistic interaction between the two people.

FIG. 7 is a block diagram of a graphics platform 700 that might be associated with, for example, the system 300 of FIG. 3 and/or the system 600 of FIG. 6 in accordance with some embodiments of the present invention. The graphics platform 700 comprises a processor 710, such as one or more INTEL® Pentium® processors, coupled to communication devices 720 configured to communicate with remote devices (not shown in FIG. 7). The communication devices 720 may be used, for example, to receive a video feed, information about a virtual object, and/or location information about a person.

The processor 710 is also in communication with an input device 740. The input device 740 may comprise, for example, a keyboard, a mouse, a computer media reader, or even a system such as that described by this invention. Such an input device 740 may be used, for example, to enter information about a virtual object, a background, or remote and/or studio camera set-ups. The processor 710 is also in communication with an output device 750. The output device 750 may comprise, for example, a display screen, a printer, or an audio speaker. Such an output device 750 may be used, for example, to provide information about a camera set-up to an operator.

The processor 710 is also in communication with a storage device 730. The storage device 730 may comprise any appropriate information storage device, including combinations of magnetic storage devices (e.g., hard disk drives), optical storage devices, and/or semiconductor memory devices such as Random Access Memory (RAM) devices and Read Only Memory (ROM) devices.

The storage device 730 stores a graphics platform application 735 for controlling the processor 710. The processor 710 performs instructions of the application 735, and thereby operates in accordance with any embodiments of the present invention described herein. For example, the processor 710 may receive a scene signal, whether or not it includes an image of a person, from a video camera. The processor 710 may insert a virtual object into the scene signal to create a viewer signal, such that the viewer signal includes a view of the virtual object as would be seen from the video camera's perspective. The processor 710 may also create a supplemental signal, such that the supplemental signal includes information related to the view of the virtual object as would be seen from the person's perspective.

As used herein, information may be “received” by or “transmitted” to, for example: (i) the graphics platform 700 from other devices; or (ii) a software application or module within graphics platform 700 from another software application, module, or any other source.

As shown in FIG. 7, the storage device 730 also stores a rendering engine application 735 and virtual object and location data 800. One example of such a database 800 that may be used in connection with the graphics platform 700 will now be described in detail with respect to FIG. 8. The illustration and accompanying descriptions of the database presented herein are exemplary, and any number of other database arrangements could be employed besides those suggested by the figures.

FIG. 8 is a tabular representation of a portion of a virtual object and location data table 800 in accordance with some embodiments of the present invention. The table 800 includes entries associated with virtual objects and location information about people in a scene. The table 800 also defines fields for each of the entries. The fields might specify a virtual object or person identifier, a three dimensional location of an object or person, angular orientation information, a distance between a person and an object, occlusion information, field of view data, etc. The information in the database 800 may be periodically created and updated based on information received from a location sensor worn by a person.
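An entry of the table 800, together with two of the derived fields it lists (the person-object distance and field-of-view data), might be represented as follows. The field names, the yaw convention (0° facing +z, increasing toward +x), and the purely horizontal field-of-view test are assumptions made for this sketch.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class LocationRecord:
    """One row of a hypothetical virtual object / location table."""
    entity_id: str            # virtual object or person identifier
    location_m: Vec3          # 3D position in studio coordinates (meters)
    yaw_deg: float = 0.0      # heading about the vertical (y) axis
    occluded: bool = False    # occlusion information
    fov_deg: float = 90.0     # horizontal field of view (meaningful for people)


def distance_m(a: LocationRecord, b: LocationRecord) -> float:
    """Euclidean distance between two entries."""
    return math.dist(a.location_m, b.location_m)


def in_horizontal_fov(person: LocationRecord, obj: LocationRecord) -> bool:
    """True if obj lies within the person's horizontal field of view."""
    dx = obj.location_m[0] - person.location_m[0]
    dz = obj.location_m[2] - person.location_m[2]
    bearing = math.degrees(math.atan2(dx, dz))  # 0 deg = facing +z
    # Wrap the angular offset into [-180, 180) before comparing.
    offset = (bearing - person.yaw_deg + 180.0) % 360.0 - 180.0
    return abs(offset) <= person.fov_deg / 2
```

Rows like these could be refreshed each time the worn location sensor reports a new position.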

Thus, embodiments described herein may use three dimensional information to adjust and/or tune the rendering of a person or object in a scene, and thereby simplify preparation of a program segment. This may let the person focus on content delivery, knowing that his or her performance may be refined by reconstructing his or her pose, location, and, according to some embodiments, appearance relative to the virtual environment. Moreover, the person may receive a different image and/or perspective from what is provided to the viewer. This may significantly improve the person's ability to interact with the virtual content (e.g., reducing the learning curve for the person and allowing production to happen with fewer takes). In addition, to change or move a virtual element, a person need not interact with a physical monitor (such as a touch screen) or pre-coordinate his or her movements to make it appear that an interaction is happening. That is, according to some embodiments described herein, a person's movements (or spoken words, etc.) can cause the virtual images to change, move, etc. Further, embodiments may reduce the use of bulky monitors, which may free up studio space and increase the portability of the operation (freeing a person to work in a variety of studio environments, including indoor and outdoor environments).

The following illustrates various additional embodiments of the invention. These do not constitute a definition of all possible embodiments, and those skilled in the art will understand that the present invention is applicable to many other embodiments. Further, although the following embodiments are briefly described for clarity, those skilled in the art will understand how to make any changes, if necessary, to the above-described apparatus and methods to accommodate these and other embodiments and applications.

Although three dimensional effects have been described in some of the examples presented herein, note that other effects might be incorporated in addition to (or instead of) three dimensional effects in accordance with the present invention. Moreover, although specific hardware and data configurations have been described herein, note that any number of other configurations may be provided in accordance with embodiments of the present invention (e.g., some of the information associated with the databases and engines described herein may be split, combined, and/or handled by external systems). Further note that embodiments may be associated with any number of different types of broadcast programs (e.g., sports, news, and weather programs).

The present invention has been described in terms of several embodiments solely for the purpose of illustration. Persons skilled in the art will recognize from this description that the invention is not limited to the embodiments described, but may be practiced with modifications and alterations limited only by the spirit and scope of the appended claims.

Claims

1. A method comprising:

receiving, at a graphics platform, 3D information about a virtual object;
determining 3D information associated with a person in a scene; and
creating, by the graphics platform: (i) a viewer signal of the scene and (ii) a supplemental signal, wherein the viewer signal and the supplemental signal are from different perspectives.

2. The method of claim 1, wherein the viewer signal is a viewer video signal generated from a perspective of a video camera filming the scene and wherein the viewer video signal includes the virtual object and a representation of the person.

3. The method of claim 1, wherein the supplemental signal is a view of the virtual object from the perspective of the person.

4. The method of claim 1, wherein the supplemental signal is a sensory signal indicative of the location of the virtual object relative to the person.

5. The method of claim 4, wherein the sensory signal is associated with at least one of video information, audio information, pressure information, or thermal information.

6. The method of claim 1, wherein the 3D information associated with the person is received from one or more sensors.

7. The method of claim 1, wherein the 3D information associated with the person is determined by analyzing at least one of (i) video from at least one video camera or (ii) telemetry data from one or more sensors.

8. The method of claim 7, wherein a plurality of video signals are received and analyzed by the graphics platform to determine at least one of: (i) a three dimensional location of the person, (ii) a distance between the person and virtual object, (iii) one or more angles associated with the person and virtual object, or (iv) an orientation of the person.

9. The method of claim 1, wherein the view of the virtual object within the supplemental signal is associated with at least one of: (i) a marker, (ii) a lower or different resolution image as compared with the viewer signal, (iii) a lower or different frame rate image as compared with the viewer signal, or (iv) a dynamically generated occlusion zone.

10. The method of claim 9, further comprising:

transmitting the supplemental signal to a display device worn by the person.

11. The method of claim 10, wherein the display device comprises at least one of: (i) an eyeglasses display, (ii) a retinal display, (iii) a contact lens display, or (iv) a hearing aid or other in-ear device.

12. The method of claim 10, wherein the supplemental signal is wirelessly transmitted to the display device.

13. The method of claim 1, wherein the supplemental signal is further based on an orientation of the person's line of sight and/or body position.

14. The method of claim 1, wherein multiple people and virtual objects are able to interact, whether or not any or all such images are included in the final scene presented to the viewing audience, and further comprising:

creating a supplemental signal for each person, wherein each supplemental signal includes a view or a means for perceiving the location of the virtual objects as would be seen or perceived from that person's perspective.

15. The method of claim 1, further comprising:

detecting a command from an entity; and
responsive to said detection, adjusting the virtual object.

16. The method of claim 15, wherein the command comprises at least one of: (i) an audible command, (ii) a gesture command, or (iii) a third party or second virtual object command.

17. The method of claim 1, wherein the virtual object is associated with at least one of: (i) a virtual human, (ii) a video game, or (iii) an avatar.

18. A system, comprising:

a device to capture a feed including an image or location data of a person;
a platform to receive the feed from the device and to render a supplemental feed in substantially real-time, based on 3D information relating to the person and 3D information relating to a virtual object, wherein the supplemental feed includes sufficient information to provide the person with a means of perceiving a location of the virtual object as would be perceived from the person's perspective; and
a device worn by the person to receive and present to the person the supplemental feed.

19. The system of claim 18, wherein the platform includes at least one of: (i) a computer generated image database, (ii) a control system, (iii) a video processor, (iv) an audio processor, (v) a three dimensional graphics engine, (vi) a mixer, or (vii) a location sensor to detect a location of the person.

20. The system of claim 18, wherein the device comprises a light detection and ranging camera.

21. The system of claim 18, wherein the platform is further to render a viewer feed including information of the virtual object as would be necessary to perceive the virtual object from a camera's perspective.

22. The system of claim 18, wherein the platform receives two feeds that include images of the person and performs stereoscopic matching of pixels within the feeds to determine a three dimensional location of the person.

23. The system of claim 18, wherein the platform uses a depth map to create a binary occlusion mask for the supplemental feed.

24. A non-transitory, computer-readable medium storing instructions adapted to be executed by a processor to perform a method, the method comprising:

receiving a signal from a camera, the signal including an image of a person;
inserting a virtual object into the signal to create a viewer signal, wherein the viewer signal includes a view of the virtual object as would be seen from the camera's perspective; and
creating a supplemental video signal, wherein the supplemental video signal includes a view of the virtual object as would be seen from the person's perspective.

25. The medium of claim 24, wherein the method further comprises:

outputting the supplemental video signal to a device to be perceived by the person.

26. The medium of claim 24, wherein the method further comprises:

modeling the person based on the received video signal; and
using the model to adjust image information associated with the person in the viewer signal.

Patent History

Publication number: 20120200667
Type: Application
Filed: Nov 9, 2011
Publication Date: Aug 9, 2012
Inventors: Michael F. Gay (Burbank, CA), Frank Golding (Burbank, CA), Smadar Gefen (Burbank, CA)
Application Number: 13/292,560

Classifications

Current U.S. Class: Signal Formatting (348/43); Mixing Stereoscopic Image Signals (epo) (348/E13.063)
International Classification: H04N 13/00 (20060101);