INTERACTION PROCESSING METHOD AND APPARATUS, TERMINAL AND MEDIUM
Embodiments of this application provide an interaction processing method and apparatus, a terminal and a medium. The method includes the following steps. A target terminal displays, in a video session, a video session interface including an image display region for displaying images associated with one or more users participating in the video session. The target terminal displays a target virtual image of a user of the target terminal in the image display region. The target terminal acquires movement information of the user and controls the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user. Finally, the target terminal transmits movement data of the target virtual image performing the target interaction action to terminals of the other users of the video session, so that the target virtual image performing the target interaction action is rendered on those terminals.
This application is a continuation application of PCT Patent Application No. PCT/CN2022/088332, entitled “AN INTERACTIVE PROCESSING METHOD, DEVICE, TERMINAL AND MEDIUM” filed on Apr. 22, 2022, which claims priority to Chinese Patent Application No. 202110606182.7, filed with the State Intellectual Property Office of the People's Republic of China on May 31, 2021, and entitled “INTERACTION PROCESSING METHOD AND APPARATUS, TERMINAL AND MEDIUM”, all of which are incorporated herein by reference in their entirety.
FIELD OF THE TECHNOLOGY
This application relates to the field of computer technologies, and more particularly to an interaction processing method and apparatus, a terminal and a medium.
BACKGROUND OF THE DISCLOSURE
With the rapid development of science and technology, multiple users in different places can conduct online sessions over the Internet. For example, multiple users may initiate an online video session over the Internet. Video sessions are widely used because they are convenient, fast and simple.
In some video session scenarios, a camera acquires and transmits captured images containing real images of users, so that the real images of all the users are displayed on the screens of the terminals used by the users participating in a video session. In some cases, users may be reluctant to present their real images in a video session because they are concerned about their privacy, so how to improve the privacy of a video session has become a pressing issue.
SUMMARY
Embodiments of this application provide an interaction processing method and apparatus, a terminal and a medium, which can replace a real image of a user with a virtual image to participate in a video session and improve data transmission efficiency while protecting user privacy.
An embodiment of this application provides a method performed by a computing device acting as a target terminal in a video session, the method including:
displaying, by the target terminal, a video session interface, the video session interface including an image display region for displaying images associated with one or more users participating in the video session;
displaying, by the target terminal, a target virtual image of a user of the target terminal in the image display region;
controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user; and
transmitting, by the target terminal, movement data of the target virtual image performing the target interaction action to terminals of the other users of the video session, wherein the movement data enables the terminals of the other users to render the target virtual image performing the target interaction action.
An embodiment of this application provides a computing device including:
a storage apparatus, configured to store a computer program; and
a processor, configured to run the computer program stored in the storage apparatus to implement the above processing method.
An embodiment of this application provides a non-transitory computer-readable memory which stores a computer application program, which, when executed, implements the above processing method.
An embodiment of this application further provides a computer program product or a computer program, the computer program product or the computer program including computer instructions stored in a computer-readable memory. The processor of the terminal reads the computer instructions from the computer-readable memory, and executes the computer instructions, so that the terminal implements the above interaction processing method.
The technical schemes in the embodiments of this application will be clearly and completely described below with reference to the drawings in the embodiments of this application, and it is obvious that the described embodiments are only some of the embodiments of this application rather than all of them. Based on the embodiments of this application, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of this application.
The embodiments of this application involve a virtual image, which refers to an image used to represent a user. The image may be a fictitious model (such as a cartoon model which does not really exist, an animation model, or the like) or a realistic model (such as a person model which resembles a real person but is displayed on a terminal screen, or the like). Common virtual images may include, but are not limited to: a virtual person image (such as a cartoon figure image, an animation figure image, an anime image, etc.), a virtual animation image (such as a cartoon animal image, various object images, etc.), and the like. For convenience of explanation, the following takes the virtual person image as an example for description. When using a terminal, using a virtual image can improve the user's sense of identification with the virtual image, so that the user is more immersed in operations. For example, in a video session scenario (such as a video call scenario or a video conference scenario), replacing a user with a virtual image to participate in a video session and simulating real-person interactions can improve the sense of immersion of the video session objects. Video sessions may include an individual video session and a group video session. Two users participate in an individual video session, and three or more users participate in a group video session. The embodiments of this application do not limit the type of the video session, which is particularly stated here.
Based on this, an embodiment of this application provides an interaction processing scheme, which can replace, in a video session, a target session object with a target virtual image to participate in the video session, so as to quickly display the target virtual image in a video session interface. In addition, movement information of the target session object may also be acquired, and the target virtual image is controlled, according to the movement information, to flexibly follow the target session object to perform a target interaction action, so as to realize participation in the video session through the target virtual image. Using the target virtual image to simulate real-person interactions can avoid displaying a real image of the target session object in the video session, thereby protecting the image privacy of the target session object.
The above-mentioned interaction processing scheme may be performed by target terminals. The target terminals herein may include, but are not limited to: a smart phone, a tablet, a laptop, a desktop, and other terminal devices. An application program configured to perform the interaction processing scheme may be deployed in the target terminals. The application program may include but is not limited to: an Instant Messaging (IM) application program, a content interaction application program, and the like. The IM application program refers to an Internet-based instant messaging and social interaction application program. The IM application program may include, but is not limited to: QQ, WeChat, Enterprise WeChat, a map application program containing a social interaction function, a game APP, and the like. The content interaction application program refers to an application program capable of realizing content interactions, and may be, for example, e-bank, microblog, memories, news, and the like. In this way, a target user can open and use the application program deployed in the target terminal for a video session.
A video session scenario of the interaction processing scheme in an embodiment of this application is exemplarily described below with reference to
In this embodiment of this application, the target virtual image is controlled to perform the target interaction action by rendering the target virtual image, instead of displaying a real image of the target session object in the image display region. Based on this, in this embodiment of this application, it is not necessary to transmit the images about the video session objects collected by a camera. These images include multiple frames of environment images of an environment captured by the camera. By merely transmitting the detected relevant data (such as face data and body data) about the video session objects, the virtual image that performs the target interaction action can be obtained on a peer device by rendering according to the relevant data. Compared with transmitting images, transmitting only the relevant data of the video session objects reduces the amount of data transmitted and improves data transmission efficiency. The above-mentioned flow diagram of transmitting relevant data of a video session object and obtaining, on a peer device, a target virtual image by rendering according to the relevant data of the video session object may be shown in
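As an illustrative, non-limiting sketch (the field names, field count and byte sizes below are assumptions rather than a prescribed format), the relevant data transmitted for each captured frame may be organized as a small structure that a peer device can use for rendering:

    # A minimal sketch of a per-frame "relevant data" payload, assuming the face
    # is described by expression base coefficients and the body by joint angles;
    # all field names are illustrative only.
    import json
    from dataclasses import dataclass, asdict
    from typing import List

    @dataclass
    class MovementData:
        user_id: str
        timestamp_ms: int
        expression_coefficients: List[float]  # e.g. 52 values in [0, 1]
        body_joint_angles: List[float]        # e.g. angles of tracked limb joints
        face_center: List[float]              # normalized (x, y) of the face center
        face_scale: float                     # relative size of the face region

    def serialize(data: MovementData) -> bytes:
        # A few hundred bytes per frame, versus tens of kilobytes for an encoded
        # video frame, which is where the bandwidth saving comes from.
        return json.dumps(asdict(data)).encode("utf-8")

    payload = serialize(MovementData(
        user_id="target", timestamp_ms=0,
        expression_coefficients=[0.0] * 52,
        body_joint_angles=[0.0] * 8,
        face_center=[0.5, 0.5], face_scale=1.0))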
This embodiment of this application can also be combined with a blockchain technology, specifically as follows: The target terminal configured to implement the interaction processing scheme may be a node device in a blockchain network. The target terminal may post the collected movement information (such as limb information and face information) of the target session object to the blockchain network, and record, on a blockchain, the relevant data (the type and information of the target interaction action) of the target interaction action that the target virtual image is controlled to perform according to the movement information. In this way, the collected movement information of the target session object can be prevented from being tampered with during transmission, and each target interaction action performed by the target virtual image can also be effectively traced. In addition, the movement information is stored in the form of blocks, so that distributed storage of the movement information may be implemented.
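A minimal sketch of recording movement information in block form, assuming a simplified block structure (the hashing scheme and the fields below are illustrative assumptions rather than the actual on-chain format):

    import hashlib, json, time

    def make_block(movement_info: dict, prev_hash: str) -> dict:
        # Hash the collected movement information together with the previous
        # block hash so that later tampering becomes detectable and each
        # interaction action can be traced back.
        body = {"movement_info": movement_info,
                "interaction_action": movement_info.get("action_type"),
                "timestamp": time.time(),
                "prev_hash": prev_hash}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return {"hash": digest, **body}

    genesis = make_block({"action_type": "none"}, prev_hash="0" * 64)
    block = make_block({"action_type": "wave_right_hand", "limb_angle": 47.0},
                       prev_hash=genesis["hash"])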
Based on the above-mentioned interaction processing scheme, an embodiment of this application provides a more detailed interaction processing method. The interaction processing method provided in this embodiment of this application will be described below with reference to the accompanying drawings.
Referring to
S201: Display a video session interface in a video session. When a video session object (namely, any user participating in a video session) opens and uses a video session function provided by a target application program (namely, any application program having a video session function), the target application program may display the video session interface corresponding to the video session. The video session interface may be used for presenting images of various video session objects participating in the video session, in particular, presenting the video session objects by means of presenting an environment image containing the video session objects. The environment image may be obtained by capturing an environment, and specifically collected, using a camera configured in the target terminal, from the environment in which a target session object is currently located. A process of capturing the environment image by the camera is performed after the camera is turned on in a video session scenario. If the camera of the terminal used by a certain video session object is not turned on in the video session scenario, only a configuration image (such as a user's ChatHead and a blank image) of the video session object is displayed in the video session interface. Specifically, the video session object may turn on the camera of the target terminal by means of triggering a camera option contained in the video session interface, which will not be described in detail herein. Recognizing a video session object contained in an environment image may be realized on the basis of Artificial Intelligence (AI). AI is a theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, sense an environment, acquire knowledge and use the knowledge to obtain an optimal result. The recognition can be specifically realized on the basis of Computer Vision technology (CV) under an artificial intelligence technology. The CV technology generally includes common biological feature recognition technologies such as face recognition and fingerprint recognition.
An example in which a video session is an individual video session is used. An exemplary video session interface may be seen in
Still referring to
Attribute information such as the display area, shape and display region of an image display region on the video session interface is not fixed. This embodiment of this application supports the target session object in adjusting the attribute information of all or some of the image display regions on the target terminal. For example, the target session object is supported to drag, on the video session interface, any image display region from a first position to a second position (for example, another position, different from the first position, on the video session interface). For another example, the target session object is supported to adjust any image display region from a square shape to a round shape on the video session interface. Specifically, any shape may be selected from various pre-configured shapes, or a shape may be manually created by the target session object. For yet another example, the target session object is supported to perform a gesture of "tapping with two fingers" on the video session interface to enlarge an image display region with a relatively small display area and shrink an image display region with a relatively large display area, and the like.
S202: Display a target virtual image of the target session object associated with the video session in the image display region. This embodiment of this application supports switching from a real video session to a virtual video session in the video session scenario. The so-called real video session means that the video session objects participating in a video session all participate in the video session with their real images; at this time, the real images of the video session objects collected by a camera are displayed in the image display regions. The so-called virtual video session means that all or some of the video session objects participating in a video session use virtual images to participate in the video session. For example, the target session object uses the target virtual image to participate in the video session, and at this time, the virtual image is displayed in the image display region. This manner of video session objects participating in a video session using virtual images protects the image privacy of the video session objects.
In a specific implementation, when the target session object performs a virtual session operation on the video session interface, it is determined to enable a virtual video session mode. In the virtual video session mode, the real image of the target session object will not be displayed in the image display region used for displaying the environment image of the target session object on the video session interface, but the target virtual image that replaces the target session object is displayed. The virtual session operation may include, but is not limited to: an operation generated during triggering of a virtual session option; an operation generated when a shortcut gesture operation (such as a double-click operation, a single-click operation and a drag operation) is performed on the video session interface; or the like. The following mainly describes an example of performing the virtual session operation by means of triggering the virtual session option. Specifically, the virtual session option is displayed on the video session interface, or any option contained in the video session interface includes the virtual session option. When the virtual session option is triggered, it may be determined that the virtual session operation has been performed. With reference to
There may be various manners for determining a target virtual image that represents the target session object to participate in the video session. For example, the target virtual image is configured by a system, or is independently selected by the target session object. The following describes in more detail different manners for determining a target virtual image.
(1) A target virtual image is configured by a system. In a specific implementation, when the target terminal initiates a video session, a target image obtained by capturing an environment may be acquired. The target image may include a real image of the target session object (namely, a target user). Feature recognition processing is performed on the target session object in the target image to obtain a recognition result. A virtual image matching the recognition result is allocated to the target session object, and the virtual image matching the recognition result is determined as the target virtual image. The feature recognition may be recognizing the face of the target session object, which is not limited in this embodiment of this application. In other words, after the virtual video session mode is enabled, the target image captured by the camera can be acquired, and image recognition processing is performed on the target image to obtain a feature recognition result of the target session object. A virtual image matching the feature recognition result is then selected from a virtual image library according to the feature recognition result, and this virtual image is determined as the target virtual image. For example, feature recognition is performed on the face of the target session object to obtain a face recognition result indicating that the target session object is a user with a beard, and the selected target virtual image matching the face recognition result may be a virtual image with a beard. For another example, feature recognition is performed on the head of the target session object to obtain a feature recognition result indicating that the target session object is a user wearing a hat, and the selected target virtual image matching the feature recognition result may be a virtual image wearing a hat.
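A minimal sketch of allocating a matching virtual image from a virtual image library according to the feature recognition result (the attribute tags and library entries are hypothetical):

    # Each candidate virtual image in the library is tagged with attributes; the
    # one best matching the recognized attributes of the target session object
    # is chosen as the target virtual image.
    LIBRARY = {
        "avatar_beard": {"beard"},
        "avatar_hat": {"hat"},
        "avatar_plain": set(),
    }

    def match_virtual_image(recognized_attributes: set) -> str:
        def score(item):
            name, tags = item
            return len(tags & recognized_attributes) - len(tags - recognized_attributes)
        return max(LIBRARY.items(), key=score)[0]

    print(match_virtual_image({"beard"}))  # -> "avatar_beard"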
(2) A target virtual image is independently selected by the target session object. In a specific implementation, an image selection window may be displayed in the terminal screen. The image selection window includes an image selection element. A reference virtual image is displayed in a reference display region in response to a trigger operation performed on the image selection element, and candidate virtual images are displayed in the image selection window. The reference virtual image is updated, in response to an image selection operation performed on the candidate virtual images, to be displayed as a target candidate virtual image selected by the image selection operation. It is determined, in response to a virtual image confirm operation, that the target candidate virtual image is the target virtual image. The so-called “response” corresponds to “request”. For example, when there is a trigger operation, in the image selection window, performed on the image selection element, a request for triggering the image selection element is generated in a background. At this time, a response may be made to the request, that is, to the trigger operation performed on the image selection element. In addition, the target virtual image may include a comic person image, an animal image, an object image, or the like, thereby enriching selectivity of video session objects.
The following describes the above implementation for independently selecting the target virtual image by the target session object with reference to
When any option (or element) in the image selection window is selected, the selected option may be highlighted to prompt the target session object that information displayed in the current image selection window is related to the selected option. For example, when the virtual session option 4021 in the image selection window is selected, the virtual session option 4021 may be highlighted in the image selection window (for example, a brightness of the virtual session option 4021 is higher than that of other options, and a transparency of the virtual session option 4021 is lower than that of other options) to prompt the target session object that the candidate virtual image is displayed in the current image selection window. Furthermore, since a display area of the terminal screen is limited, part of the candidate virtual images may be hidden. In this way, the image selection window may include a sliding axis. The hidden candidate virtual images may be slidably displayed by operating the sliding axis. Of course, in addition to slidably displaying the candidate virtual images in the image selection window via the sliding axis, the candidate virtual images may also be slidably displayed by pressing to slide any position of the image selection window, which is not limited in this embodiment of this application.
It is understood that the images displayed in the image display region differ depending on the moments at which the image selection operation is performed and the camera is turned on. Specifically, in a video session scenario, if the target session object has not turned on the camera, then when the target session object performs the image selection operation and selects the target virtual image, the camera of the target terminal may be automatically triggered to turn on, and at this time, the target virtual image selected by the target session object is displayed in the image display region. If the target session object has turned on the camera but has neither selected the target virtual image nor enabled the virtual video session mode, the environment image obtained by the camera capturing the environment in which the user is currently located may be displayed in the image display region at this time; after the target session object performs the image selection operation, the environment image displayed in the image display region is replaced with an image containing the target virtual image. If the target session object has turned on the camera and has enabled the virtual video session mode but has not selected the target virtual image, the virtual image configured by the system may be displayed in the image display region; after the target session object performs the image selection operation, the virtual image displayed in the image display region is replaced with the target virtual image.
In addition, a background image is also displayed in the image display region contained in the video session interface. A more realistic session environment can be created by means of allocating a background to the target virtual image. Each candidate virtual image displayed in the image selection window may be matched with a default background image, so that the candidate virtual image may be better fused with the background. In a specific implementation, as shown in
In addition to selecting the background image by means of selecting the virtual image as described above, this embodiment of this application also supports the target session object in freely selecting the background image according to the preference of the target session object. In other words, the target session object may independently select the background image. In a specific implementation, a background selection element is displayed in the image selection window. Candidate background images are displayed in the image selection window in response to a user operation performed on the background selection element. A target candidate background image selected by a background selection operation is displayed in the reference display region in response to the background selection operation performed on the candidate background images. The target candidate background image is set as the background image of the image display region in response to a background image confirm operation. An exemplary schematic diagram of selecting a background image in an image selection window can be seen in
This embodiment of this application supports the target session object in selecting the background image only after selecting the target virtual image, thus helping better fuse the target virtual image with the background. In one implementation, when the target session object has not selected the target virtual image, the background selection element in the image selection window is set to be non-selectable, namely, the background selection element cannot be triggered. In this way, the target session object is prompted to select the target virtual image first. On the contrary, when the target session object has selected the target virtual image, the background selection element is set to be selectable, that is, the background selection element can be triggered. Of course, when the target session object selects the target virtual image again after selecting the background image, the background image selected by the target session object remains displayed during the selection of the target virtual image, instead of switching to the background image corresponding to the newly selected target virtual image.
In addition, this embodiment of this application also supports configuring a sound effect for the target session object to further enrich video session manners. The sound effect configured for the target session object may match the target virtual image. In this implementation, when the target virtual image has been selected, it is determined to use a voice audio processing rule matching the target virtual image to perform simulation processing on a sound signal of the target session object received in the video session, so as to obtain the sound effect matching the target virtual image. Using the sound effect matching the target virtual image in the video session improves the identity privacy of the target session object. Alternatively, the target session object may select a sound effect. In a specific implementation, a voice selection element is displayed in the image selection window. A candidate voice audio processing rule may be displayed in the image selection window in response to a selection operation performed on the voice selection element. It is determined, in response to a confirm operation performed on the candidate voice audio processing rule, that the candidate voice audio processing rule is a target voice audio processing rule, the target voice audio processing rule being used for performing simulation processing on a sound signal of the target session object received in the video session.
The above implementation of selecting the voice audio processing rule can be seen in
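As a rough sketch of what a voice audio processing rule might do (a crude pitch shift by resampling; the factor and the approach are assumptions, not the rule actually used by this application):

    import numpy as np

    def pitch_shift(samples: np.ndarray, factor: float) -> np.ndarray:
        # Resample the waveform; factor > 1 raises the pitch (and shortens the
        # clip), which is enough to mask the speaker's natural voice in a sketch.
        old_idx = np.arange(len(samples))
        new_idx = np.arange(0, len(samples), factor)
        return np.interp(new_idx, old_idx, samples)

    sr = 16000
    t = np.linspace(0.0, 1.0, sr, endpoint=False)
    voice = np.sin(2 * np.pi * 220 * t)          # stand-in for a captured sound signal
    processed = pitch_shift(voice, factor=1.3)   # assumed target voice audio processing rule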
In summary, this embodiment of this application supports a video session object to freely select a background image, a virtual image and a sound effect, so as to meet personalized requirements of the video session object on the virtual image, the background image and the sound effect, and an interactivity of the video session object participating in the video session can be improved.
S203: Acquire movement information of the target session object, and control, according to the movement information of the target session object, the target virtual image displayed in the image display region to perform a target interaction action. In the virtual video session mode, the target virtual image may perform the target interaction action according to the movement information of the target session object. The target interaction action corresponds to an action performed by the target session object. In terms of a visual effect, the target virtual image follows the target session object to perform a corresponding action, and the target session object may conduct video sessions with other session objects by using the target virtual image.
The target interaction action performed by the target virtual image matches the collected movement information of the target session object. Matching here can be understood as follows: In one implementation, the target interaction action performed by the target virtual image is similar to the action indicated by the movement information. For example, if the movement information of the target session object is facial movement information, namely, the movement information indicates that the face of the target session object performs an action, the target virtual image may be controlled to perform a facial interaction action. If the facial movement information of the target session object indicates that the target session object winks with the right eye, the right eye of the target virtual image may be controlled to wink. As another example, if the movement information of the target session object is limb information, namely, the movement information indicates that a limb of the target session object performs an action, a target limb of the target virtual image is controlled to perform a limb action. If the limb action indicated by the limb information is raising the right hand of the target session object, the right arm of the target virtual image may be controlled to be raised. As another example, if the movement information of the target session object is position change information, namely, the movement information indicates that the position of the target session object changes, the target virtual image is controlled to perform a position change action in the image display region. If the action indicated by the position change information of the target session object is that the target session object moves to the left along a horizontal direction, the target virtual image may be controlled to move to the left in the image display region.
In other implementations, there is a mapping relationship between the target interaction action performed by the target virtual image and the action indicated by the movement information. For example, if the movement information of the target session object is recognized as emotional information, a target facial resource in a mapping relationship with the emotion indicated by the emotional information may be used to replace the face of the target virtual image for display. At this time, the form (such as the facial form) presented by the target virtual image may not be the same as that presented by the target session object, but both convey the same emotion. If the emotional information of the target session object is laugh, a target facial resource in a mapping relationship with laugh may be used to replace the face of the target virtual image for display, so that the expression of the target virtual image after the replacement is laugh. For another example, when the emotional information of the target session object is anger, a facial resource in a mapping relationship with anger (for example, a facial animation resource with fire in the eyes) may be used to replace the face of the target virtual image for display, so that the expression of the target virtual image after the replacement is anger. The emotional information corresponding to the movement information can be determined by means of model recognition. Mapping relationships between emotions and facial resources can be pre-set, so as to facilitate the above-mentioned presentation of the face of the target virtual image and embody various relatively exaggerated emotions. The implementation process for controlling, according to the movement information of the target session object, the target virtual image to perform the target interaction action will be described in a later embodiment and will not be described in detail herein.
This embodiment of this application supports quitting the virtual video session mode, namely, changing from the virtual video session mode to a normal video session mode. In the normal video session mode, a real image of the target session object can be displayed in the image display region contained in a video session interface. In a specific implementation, the image selection window includes a quit option. The quit option in the image selection window as shown in
An implementation of quitting a virtual video session mode is described in conjunction with
In this embodiment of this application, the target virtual image can be displayed in the video session interface of the video session, and the target virtual image is driven, according to movement information of the target session object, to perform the target interaction action, so that the target session object can participate in the video session by using the target virtual image. This manner of outputting the target virtual image in the video session interface can quickly display the target virtual image. In addition, the target virtual image can be used for representing the target session object to participate in the video session. Using a virtual image to simulate real-person interactions can avoid displaying a real image of the target session object in the video session, to protect an image privacy of the target session object.
Referring to
S1001: Display a video session interface in a video session.
S1002: Display a target virtual image of the target session object associated with the video session in the image display region.
Specific implementations of steps S1001-S1002 can refer to the relevant descriptions of the specific implementations of steps S201-S202 in the embodiment shown in
S1003: Acquire a mesh set added for the target virtual image.
S1004: Acquire movement information of the target session object, and perform mesh deformation on the mesh data of a target mesh in the mesh set according to the movement information of the target session object.
S1005: Render and display, on the basis of mesh data subjected to mesh deformation processing, the target virtual image that performs the target interaction action.
In steps S1003-S1005, the target virtual image provided by this embodiment of this application is a two-dimensional (2D) virtual image. The 2D virtual image can be referred to as a two-dimensional image. Any point on the two-dimensional virtual image can be represented by an x coordinate and a y coordinate, namely, the two-dimensional virtual image is a planar graph. The 2D virtual image has relatively low requirements on the configuration capability of the target terminal, and is convenient and fast to drive. Control efficiency of the target virtual image can be improved when the 2D virtual image is used in a video session scenario. Furthermore, in order to enable the 2D virtual image to perform the target interaction action more realistically and naturally, a number of 3D-to-2D techniques are also used in this embodiment of this application. Specifically: (1) Turning actions can be expressed through deformations of the face, head, hair and other elements of the target virtual image, so that the target virtual image looks full and three-dimensional within a small range of motion. (2) When the head element or the body element swings, real-world physical elasticity is added, such as swinging elasticity of the hair element and ornament elements, so that the target virtual image looks more realistic. (3) When the target virtual image does not perform any interaction action, a feeling of the body (such as the chest) rising and falling during breathing can be added for the target virtual image, and an action of making the body swing slightly can be performed randomly, which makes the target virtual image more realistic; a minimal sketch of this idle effect is given below.
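The following sketch of the idle breathing and slight-sway effect mentioned in item (3) assumes the chest and body are driven by small periodic offsets applied to their mesh vertices (the amplitudes and periods are illustrative assumptions):

    import math, random

    def idle_offsets(t: float, sway_phase: float) -> dict:
        # A small vertical oscillation simulates the chest rising and falling
        # while breathing; a slower, randomly phased horizontal oscillation
        # makes the body sway slightly when no interaction action is performed.
        breath = 0.01 * math.sin(2 * math.pi * 0.25 * t)        # ~4 s breathing cycle
        sway = 0.005 * math.sin(2 * math.pi * 0.1 * t + sway_phase)
        return {"chest_dy": breath, "body_dx": sway}

    phase = random.uniform(0, 2 * math.pi)
    for frame in range(3):
        print(idle_offsets(frame / 30.0, phase))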
In order to enable the 2D virtual image to follow the target session object to perform three-dimensional, fluent actions and expressions, this embodiment of this application also creates meshes for the various object elements contained in the target virtual image. The meshes corresponding to all the object elements contained in the target virtual image constitute the mesh set. One mesh corresponds to one object element, and any mesh is composed of at least three mesh vertices. The object elements may refer to the individual elements constituting the target virtual image. For example, the hair of the target virtual image may be referred to as a hair element. The hair element is composed of a plurality of hair sub-elements (such as a fringe element and a sideburn element). As another example, an arm of the target virtual image may be referred to as a limb element, or the like. In order to improve the fineness and accuracy of the target interaction action performed by the target virtual image, this embodiment of this application supports that the mesh corresponding to one object element may include a plurality of sub-meshes. In this way, part of the object element corresponding to the mesh can be moved by controlling a certain sub-mesh of the mesh. In addition, for convenience of explanation, in this embodiment of this application, a mesh corresponding to an object element that performs an interaction action is referred to as a target mesh. This is stated here.
An exemplary schematic diagram of adding a mesh for a target virtual image can be seen in
In a specific implementation, a mesh is driven by changing the mesh data of the mesh, so as to control the object element corresponding to the mesh in the target virtual image to perform a target interaction action. The mesh set corresponding to the target virtual image contains a plurality of meshes and the mesh data of the various meshes. The mesh data of any mesh refers to the state values of the various mesh vertices contained in the mesh. The state value of any mesh vertex may refer to position information of the mesh vertex, a position relationship between the mesh vertex and other mesh vertices connected to it, or the like. When the various mesh vertices take different state values, the target virtual image rendered by using the mesh data performs different interaction actions. In this way, a target mesh can be driven to perform mesh deformation in real time by using the movement information of the target session object, so as to drive the corresponding object element in the target virtual image to perform the target interaction action, so that the 2D virtual image can also achieve various two-dimensional patch motion effects, and the target virtual image can flexibly follow the target session object to perform interaction actions.
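A minimal sketch of the mesh data described above, assuming each mesh stores its vertex positions as the state values and mesh deformation simply rewrites those positions (the names and numbers are illustrative):

    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    Vertex = Tuple[float, float]  # a 2D virtual image only needs x and y

    @dataclass
    class Mesh:
        element_name: str          # object element this mesh corresponds to, e.g. "mouth"
        vertices: List[Vertex]     # state values of the mesh vertices (at least three)
        triangles: List[Tuple[int, int, int]]

    def deform(mesh: Mesh, offsets: Dict[int, Vertex]) -> Mesh:
        # Mesh deformation: move selected vertices by offsets derived from the
        # movement information; rendering the updated vertices makes the object
        # element appear to perform the interaction action.
        new_vertices = list(mesh.vertices)
        for idx, (dx, dy) in offsets.items():
            x, y = new_vertices[idx]
            new_vertices[idx] = (x + dx, y + dy)
        return Mesh(mesh.element_name, new_vertices, mesh.triangles)

    mouth = Mesh("mouth", [(0.0, 0.0), (1.0, 0.0), (0.5, 0.2)], [(0, 1, 2)])
    opened = deform(mouth, {2: (0.0, 0.3)})  # move a vertex to open the mouth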
The process of performing the mesh deformation on the mesh data of the mesh is described in conjunction with
In the process of rendering the target virtual image on the basis of the mesh data, the target virtual image is rendered according to a hierarchical relationship of virtual images in sequence. As shown in
Based on the descriptions related to the grids added for the target virtual image and the mesh data, the following uses an example in which the movement information is the facial movement information and the limb information to describe a front-end interface and a background technology of step S203 in the embodiment shown in
(1) The movement information of the target session object is facial movement information. Actions performed by the target session object and indicated by the facial movement information may include: an action of turning the head (such as turning the head to a side, raising the head, lowering the head and tilting the head), an action of changing the facial features (for example, the corners of the mouth turning upward, the corners of the mouth turning downward, and the mouth gathering toward the middle to different opening and closing degrees), and the like. An exemplary schematic diagram of controlling, according to facial movement information of a target session object, a target virtual image to perform a target interaction action can be seen in
In this embodiment of this application, the mesh deformation processing described above can be performed on a target mesh corresponding to an object element to control the target virtual image to perform the target interaction action. In a specific implementation, the facial movement information of the target session object may include N pieces of feature point information of the face of the target session object, N being an integer greater than 1. An exemplary schematic for recognizing feature points of the face of the target session object may be seen in
The expression type of the target session object may refer to a type of an expression of the face of the target session object. Performance states represented by the same object element on the face of the target session object are different under different expression types. For example, when the expression type is smile, the performance state of an object element, i.e., the mouth is: The corners of the mouth are upward. For another example, when the expression type is cry, the performance state of an object element, i.e., the mouth, is: The corners of the mouth are downward. An expression base coefficient of an expression type may be a coefficient used for representing a performance state of an object element. For example: A range of the expression base coefficient of the mouth can be defined to be [0,1]. It is assumed that when the expression base coefficient is 0, it indicates that the performance state of the mouth is closing the mouth; when the expression base coefficient is any value (for example, 0.5) within (0, 1), it indicates that the performance state of the mouth is opening the mouth; and when the expression base coefficient is 1, it indicates that the performance state of the mouth is maximally opening the mouth.
The above-mentioned implementation of acquiring an expression base coefficient corresponding to an expression type after update may include: An image containing the target session object may be captured by using a camera, and information extraction processing is performed on feature points of the face of the target session object in the image to obtain the N pieces of feature point information. The expression base coefficients of the various object elements of the face under the current expression type are fitted and generated according to the N pieces of feature point information. Generally, N may be 83, namely, there are 83 pieces of feature point information of the target session object. A quantity of the fitted and generated expression base coefficients of the object elements of the face is 52, namely, the plurality of object elements of the face of the target session object may have 52 performance states in total.
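A minimal sketch of fitting expression base coefficients from facial feature points, assuming each expression base is represented by a displacement of the landmark positions from a neutral face and the coefficients are solved by least squares and clipped to [0, 1] (the counts 83 and 52 follow the description above; everything else is an assumption):

    import numpy as np

    N_POINTS, N_BASES = 83, 52
    rng = np.random.default_rng(0)

    neutral = rng.standard_normal((N_POINTS, 2))            # neutral landmark positions
    bases = rng.standard_normal((N_BASES, N_POINTS, 2))     # displacement per expression base

    def fit_coefficients(landmarks: np.ndarray) -> np.ndarray:
        # Solve landmarks = neutral + sum_k c_k * bases_k for the coefficients c,
        # then clip them to the [0, 1] range used for performance states.
        A = bases.reshape(N_BASES, -1).T                     # (166, 52)
        b = (landmarks - neutral).reshape(-1)                # (166,)
        coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
        return np.clip(coeffs, 0.0, 1.0)

    observed = neutral + 0.7 * bases[3]                      # e.g. "mouth open" at 0.7
    coeffs = fit_coefficients(observed)
    print(round(float(coeffs[3]), 2))                        # approximately 0.7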
It is understood that, in the process of rendering the target virtual image according to the expression base coefficient corresponding to the expression type after update, in terms of a visual effect, the object element of the target virtual image presents a dynamic change process. In other words, an expression base coefficient of the object element in an intermediate performance state also needs to be obtained according to the expression base coefficients of the expression type before the update (namely, before the target session object performs the action) and after the update, so as to perform rendering according to the expression base coefficients before the update, of the intermediate expression state, and after the update, to obtain a relatively continuous dynamic change of the object element. For example, in terms of a visual effect, when the action performed by the target session object is opening the mouth, the object element, i.e., the mouth, of the target virtual image has a dynamic change process from closed to open, namely, the size of the mouth gradually increases; this is implemented by dynamically changing the mesh corresponding to the mouth. An exemplary schematic diagram of dynamic changes of the mesh of the mouth can be seen in
In a specific implementation, an expression base coefficient of the expression type before the update may be acquired; a difference value between the expression base coefficient corresponding to the expression type after the update and the expression base coefficient corresponding to the expression type before the update is calculated to obtain a difference value result; and the mesh deformation processing is performed, according to the difference value result, the expression base coefficient of the expression state after the update and the expression base coefficient of the expression state before the update, on the mesh where the object element corresponding to the expression type after the update is located, thus obtaining intermediate mesh data subjected to mesh deformation processing, so that rendering is performed on the basis of the intermediate mesh data to obtain a relatively continuous dynamic change of the object element. The above-mentioned process can be simply considered as an interpolation process. For example, in the process of increasing the size of the mouth, an interpolation algorithm may be used to insert some pixel points in the added region, so that the various feature points on the mouth can also be clearly presented while the size of the mouth increases, and the whole dynamic process looks fluent and clear. The interpolation algorithm may include, but is not limited to: a linear interpolation algorithm, a Bezier curve interpolation algorithm, and the like. Take the dynamic change of the mouth described in the above example as an example. A flow diagram of determining an intermediate state of the mouth by using an interpolation algorithm can be seen in
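A minimal sketch of the interpolation step, assuming simple linear interpolation of an expression base coefficient between its value before and after the update across a few rendered frames:

    def interpolate_coefficient(before: float, after: float, steps: int):
        # Intermediate expression base coefficients; rendering each one in turn
        # yields a relatively continuous change, e.g. the mouth opening gradually.
        diff = after - before
        return [before + diff * (i / steps) for i in range(1, steps + 1)]

    # Mouth going from closed (0.0) to fully open (1.0) over 5 frames.
    print(interpolate_coefficient(0.0, 1.0, 5))   # [0.2, 0.4, 0.6, 0.8, 1.0]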
(2) The movement information of the target session object is limb information. The limb information may be used for reflecting states of the body of the target session object. The body (or limbs) may include but is not limited to arms (such as large arms, small arms, and palms), thighs, shanks, feet, and the like. For convenience of explanation, the following will use an example, in which the limb is an arm, for description. This is stated here. For example, when the limb information of the target session object indicates that the right hand of the target session object waves, the right hand of the target virtual image may be controlled to also perform an interaction action of waving the hand. An exemplary schematic diagram of this implementation can be seen in
For example,
Of course, in addition to performing the mesh deformation on the meshes corresponding to the body elements of the target virtual image according to the angle values, so that the body elements of the target virtual image perform an action with the same angle values as the action performed by the body elements of the target session object, this embodiment of this application also supports triggering playback of a body animation configured for the target virtual image after the action type of the limb action of the target session object is detected, so as to control the target virtual image to perform a body action similar to the action of the target session object. For example, after it is detected that the target session object performs an action of waving a hand, an action of waving a hand configured for the target virtual image may be triggered to be played. At this time, the angle value between the large arm and the small arm (or the large arm and the shoulder) when the target virtual image performs the action of waving a hand does not necessarily match the angle value of the target session object. For another example, when it is detected that the target session object performs an action of making a heart shape with a single hand, a gesture of making a heart shape with two hands configured for the target virtual image can be triggered. For another example, when it is detected that the target session object performs an "OK" gesture, an "OK" gesture configured for the target virtual image can be triggered, and so on. By this manner of directly triggering the target virtual image to perform a matching body animation after the action type of the target session object is detected, the fluency and speed of the interaction action performed by the target virtual image can be improved to a certain extent.
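A minimal sketch combining the two manners described above: computing the elbow angle from tracked keypoints to drive mesh deformation, and triggering a preset body animation once an action type is detected (the keypoint names, animation names and the decision logic are illustrative assumptions):

    import math

    def joint_angle(shoulder, elbow, wrist) -> float:
        # Angle between the large arm (shoulder->elbow) and the small arm
        # (elbow->wrist), in degrees; used to deform the arm meshes.
        def vec(a, b): return (b[0] - a[0], b[1] - a[1])
        u, v = vec(elbow, shoulder), vec(elbow, wrist)
        dot = u[0] * v[0] + u[1] * v[1]
        norm = math.hypot(*u) * math.hypot(*v)
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

    PRESET_ANIMATIONS = {"wave_hand": "play_wave.anim", "ok_gesture": "play_ok.anim"}

    def drive_arm(shoulder, elbow, wrist, detected_action=None):
        # Prefer the preset animation when a known action type is detected;
        # otherwise fall back to angle-driven mesh deformation.
        if detected_action in PRESET_ANIMATIONS:
            return ("animation", PRESET_ANIMATIONS[detected_action])
        return ("mesh_deformation", joint_angle(shoulder, elbow, wrist))

    print(drive_arm((0, 0), (1, 0), (1, 1)))                 # deform by the measured angle
    print(drive_arm((0, 0), (1, 0), (1, 1), "wave_hand"))    # trigger the preset animation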
In conclusion, steps S1003-S1005 show an implementation provided by this embodiment of this application for controlling, according to the movement information of the target session object, the target virtual image to perform the target interaction action, which is specifically implemented by taking the movement information respectively being the facial movement information and the limb information as an example, and performing the mesh deformation on the mesh according to the movement information. However, it is understood that the implementation of controlling, according to the movement information of the target session object, the target virtual image to perform the target interaction action is not necessarily obtained by performing the mesh deformation on the mesh. The following uses an example in which the movement information is respectively emotional information and position change information, to provide another implementation of a front-end interface and a background technology for controlling a target virtual image to perform a target interaction action.
(1) The movement information of the target session object is emotional information. The emotional information may be used for indicating the type of an emotion of the target session object. For example, types of emotions of the target session object include: laugh, anger, surprise, cry, and the like. It is understood that the above-mentioned manner of driving, according to the type of the emotion of the target session object, the object element of the face of the target virtual image to perform some actions can convey the emotion of the target session object to a certain extent. However, a virtual image in the comic world often uses more exaggerated expressions to express emotions. In order to ensure that the target virtual image presents a more exaggerated expression, this embodiment of this application sets a mapping relationship of exaggerated expressions between the target session object and the target virtual image. When it is detected that a preset exaggerated expression appears on the target session object, the preset exaggerated expression can be used to perform resource (or material) replacement on the original face material of the target virtual image, so as to replace the original face material of the target virtual image with an exaggerated face material, thereby achieving a rapid face change and improving control efficiency. With reference to
In a specific implementation, an emotional state of the target session object may be recognized according to the emotional information to obtain a current emotional state of the target session object. A target facial resource matching the current emotional state is determined on the basis of the current emotional state. The face of the target virtual image is updated in the image display region using the target facial resource to obtain a target virtual image after update. In some embodiments, an emotion recognition model can be used to recognize the emotional state of the target session object. In more detail, a deep learning method is used to perform emotion recognition and emotion classification on a real-time collected image containing the target session object, including but not limited to: surprise, laugh, anger, smile and other emotion classes with specific semantics. After an emotion recognition class corresponding to the target session object is recognized, the target virtual image can be triggered to perform an expression effect of a corresponding exaggerated expression. The expression effect of the exaggerated expression performed by the target virtual image is achieved in the following manner: After an emotion classification result of the target session object is recognized, the original facial resource of the target virtual image can be hidden, and the preset target facial resource can be displayed on the face to display the expression effect of the exaggerated expression. A flow diagram of the above-mentioned replacement of the facial resource can be seen in
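A minimal sketch of the replacement flow, assuming the recognized emotion class is looked up in a pre-set mapping of exaggerated facial resources (the classifier output labels and resource names are hypothetical placeholders):

    # Pre-set mapping relationships between emotion classes and exaggerated
    # facial resources (materials); names are illustrative only.
    EMOTION_TO_FACE_RESOURCE = {
        "laugh": "face_laugh_exaggerated.png",
        "anger": "face_anger_fire_eyes.png",
        "surprise": "face_surprise.png",
    }

    def update_face(avatar: dict, emotion_label: str) -> dict:
        resource = EMOTION_TO_FACE_RESOURCE.get(emotion_label)
        if resource is None:
            return avatar                      # no mapping: keep the original face material
        avatar = dict(avatar)
        avatar["original_face_hidden"] = True  # hide the original facial resource
        avatar["face_overlay"] = resource      # display the preset target facial resource
        return avatar

    avatar = {"original_face_hidden": False, "face_overlay": None}
    print(update_face(avatar, "anger"))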
(2) The movement information of the target session object is position change information. The position change information can be used for indicating movement information of the target session object in an environment. Specifically, position change information of a certain object element contained in the target session object can be used to represent the position change information of the target session object. For example: the face element of the target session object moves to the left in a horizontal direction; the face element of the target session object moves up in a vertical direction; the display area of the face element of the target session object is reduced (indicating that the distance from the target session object to the terminal screen has changed); and the like. Driving, on the basis of the position change information of the target session object, the target virtual image to perform a corresponding position change action in the image display region enables the target session object and the target virtual image to have a better sense of mapping.
In a specific implementation, after the position change information of the target session object is acquired, when it is detected that the position change information is movement information of a target image point, the target virtual image is controlled, according to the movement information of the target image point, to perform movement display in the image display region. If the position change information is movement information of a target image region, a display area of the target virtual image in the image display region may be adjusted according to the movement information of the target image region. The target image point may refer to a point of a display region where the target virtual image is located. For example, the target image point is a central point of the face of the target virtual image. The target image region may refer to an image region of the display region where the target virtual image is located. For example, the target image region is a face region of the target virtual image. In detail, in order to enable the target virtual image to have a three-dimensional perspective effect, this embodiment of this application supports driving, according to the position change information of the target session object, the target virtual image to move and rotate along three axes x, y and z, so as to control the target virtual image to move and rotate in the image display region.
An implementation logic of driving the target virtual image to move along the three axes x, y and z may include: (1) Acquire multiple continuous frames of environment images, recognize the position change information of the target image point (such as the central point of the face) of the target virtual image in the multiple frames of environment images, and drive, according to the position change information, the target virtual image to perform a matching movement in the image display region. The position change information at this time may include any one of: horizontal position change information along the x-axis, longitudinal position change information along the y-axis, and both horizontal position change information along the x-axis and longitudinal position change information along the y-axis. (2) Acquire multiple continuous frames of environment images, recognize change information of the display area occupied, in the environment images, by a target image region (such as a face region) of the target virtual image, and zoom the target virtual image in or out in the image display region according to the change information of the display area, so as to achieve a dynamic effect of controlling the target virtual image to change in the z-axis direction.
An implementation logic of driving the target virtual image to perform a rotation operation may include: during collection of the environment images, recognize the face of the target session object in the environment images, acquire the Euler angles of the current face orientation, and then perform, according to these Euler angles, mesh deformation on the mesh corresponding to the face element of the target virtual image, so as to achieve a rotation effect that controls the face of the target virtual image. Of course, when other body parts (such as the shoulders) of the target session object rotate, mesh deformation can likewise be performed on the mesh corresponding to the shoulder element of the target virtual image by using the above-mentioned implementation, so as to achieve a rotation effect that controls the shoulders of the target virtual image. This embodiment of this application does not limit which object element of the target virtual image is controlled to rotate.
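As a hedged illustration only, rotating the face mesh according to the detected Euler angles could look like the sketch below. The mesh is treated as a plain array of vertex positions, and the yaw-pitch-roll (z-y-x) rotation order is an assumption made for the example, not a convention specified by this application.

```python
import numpy as np

def euler_to_matrix(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Build a rotation matrix from Euler angles (radians), assumed z-y-x order."""
    cz, sz = np.cos(yaw), np.sin(yaw)
    cy, sy = np.cos(pitch), np.sin(pitch)
    cx, sx = np.cos(roll), np.sin(roll)
    rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    return rz @ ry @ rx

def rotate_element_mesh(vertices: np.ndarray, yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Rotate the mesh vertices of one object element (e.g. the face) about its centroid."""
    center = vertices.mean(axis=0)
    rotation = euler_to_matrix(yaw, pitch, roll)
    return (vertices - center) @ rotation.T + center

# Example: turn a toy three-vertex face mesh 15 degrees to the left (yaw only).
face_mesh = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 1.0, 0.0]])
print(rotate_element_mesh(face_mesh, yaw=np.radians(15), pitch=0.0, roll=0.0))
```

The same per-element rotation could be applied to the shoulder mesh, or any other object element, when that body part of the target session object rotates.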
In this embodiment of this application, the target virtual image can be displayed in the video session interface of the video session, and the target virtual image is driven, according to the movement information of the target session object, to perform the target interaction action, so that the target session object can participate in the video session by using the target virtual image. This manner of outputting the target virtual image in the video session interface can display the target virtual image quickly. In addition, the target virtual image can be used for representing the target session object in the video session. Using a virtual image to simulate real-person interactions avoids displaying a real image of the target session object in the video session, thereby protecting the image privacy of the target session object. In addition, the target virtual image is a 2D virtual image, and the mesh corresponding to the 2D virtual image is driven to flexibly follow the actions of the target session object to perform the target interaction action. With this treatment of the 2D virtual image, a 2D virtual image with low production cost can achieve a realistic effect similar to that of a 3D virtual image, thereby reducing the running cost of communicating with a virtual image.
An embodiment of this application further provides an interaction processing apparatus, deployed on the target terminal, the apparatus including:
a display unit 1601, configured to display, in a video session, a video session interface, the video session interface including an image display region configured to display a video session object; the video session object including a target session object corresponding to the target terminal and other session objects participating in the video session; and
a processing unit 1602, configured to display a target virtual image of the target session object in the image display region;
the processing unit 1602 being further configured to acquire movement information of the target session object, and control, according to the movement information of the target session object, the target virtual image displayed in the image display region to execute a target interaction action.
In one embodiment, the processing unit 1602 is further configured to: transmit, to the terminals of the other session objects, relevant data of controlling, according to the movement information, the target virtual image to perform the target interaction action, the terminals of the other session objects performing rendering according to the relevant data to obtain the target virtual image that performs the target interaction action.
In some embodiments, the processing unit 1602 is further configured to:
display an image selection window, the image selection window including an image selection element;
display, in response to a trigger operation performed on the image selection element, a reference virtual image in a reference display region, and display candidate virtual images in the image selection window;
update, in response to an image selection operation performed on the candidate virtual images, the reference virtual image to be displayed as a target candidate virtual image selected by the image selection operation; and
determine, in response to a virtual image confirm operation, that the target candidate virtual image is the target virtual image.
In some embodiments, the processing unit 1602 is further configured to:
display a background selection element in the image selection window;
display candidate background images in the image selection window in response to a user operation performed on the background selection element;
display, in the reference display region in response to a background selection operation performed on the candidate background images, a target candidate background image selected by the background selection operation; and
set the target candidate background image as a background image of the image display region in response to a background image confirm operation.
In some embodiments, the processing unit 1602 is further configured to:
display a voice selection element in the image selection window;
display a candidate voice audio processing rule in the image selection window in response to a selection operation performed on the voice selection element; and
determine, in response to a confirm operation performed on the candidate voice audio processing rule, that the candidate voice audio processing rule is a target voice audio processing rule, the target voice audio processing rule being used for performing simulation processing on a sound signal of the target session object received in the video session, so as to obtain a sound effect matching the target virtual image (a minimal sketch of one such processing rule follows).
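As a loose sketch only, one such voice audio processing rule could be a simple pitch shift implemented by resampling a captured audio chunk; the shift factor, the mono 16 kHz format, and the nearest-sample resampling are assumptions made for illustration, not requirements of this application.

```python
import numpy as np

def pitch_shift(samples: np.ndarray, factor: float) -> np.ndarray:
    """Crude pitch shift by resampling: factor > 1 raises the pitch (and shortens
    the chunk), which might suit, say, a small cartoon-style virtual image."""
    indices = np.arange(0, len(samples), factor)
    return samples[indices.astype(int)]

# Example: one 20 ms chunk of captured voice at 16 kHz, shifted up by 25%.
sample_rate = 16000
t = np.linspace(0.0, 0.02, int(0.02 * sample_rate), endpoint=False)
chunk = 0.1 * np.sin(2 * np.pi * 220.0 * t)   # stand-in for microphone input
processed = pitch_shift(chunk, factor=1.25)   # sound effect matching the avatar
print(len(chunk), len(processed))             # -> 320 256
```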
In some embodiments, the image selection window includes a quit option, and the processing unit 1602 is further configured to:
display the video session interface in response to a click operation performed on the quit option;
display an environment image in the image display region contained in the video session interface, the environment image being obtained by capturing an environment; and
send the environment image to peer devices so that the peer devices display the environment image, the peer devices referring to devices used by the other users participating in the video session.
In some embodiments, when the processing unit 1602 is configured to control, according to the movement information of the target session object, the target virtual image displayed in the image display region to perform a target interaction action, the processing unit is specifically configured to perform any one or more of the following steps:
controlling, when the movement information of the target session object is facial movement information, the target virtual image to perform a facial interaction action;
replacing, when the movement information of the target session object is emotional information, the face of the target virtual image with a target facial resource associated with the emotional information, and displaying the target facial resource;
controlling, when the movement information of the target session object is limb information, a target limb of the target virtual image to perform a limb action; and
controlling, in the image display region when the movement information of the target session object is position change information, the target virtual image to perform a position change action.
In some embodiments, when the processing unit 1602 is configured to control, according to the movement information of the target session object, the target virtual image displayed in the image display region to perform a target interaction action, the processing unit is specifically configured to:
acquire a mesh set added for the target virtual image, the mesh set including a plurality of meshes and mesh data of these meshes, one mesh corresponding to one object element; any mesh being composed of at least three mesh vertices, and the mesh data of any mesh referring to state values of the mesh vertices contained in that mesh; the object elements being the single elements constituting the target virtual image;
perform mesh deformation on the mesh data of a target mesh in the mesh set according to the movement information of the target session object; and
render and display, on the basis of the mesh data subjected to the mesh deformation, the target virtual image that performs the target interaction action, wherein during the rendering and displaying, a position and/or a shape of the object element corresponding to the target mesh changes (a minimal sketch of this mesh deformation flow follows).
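Purely as an illustrative sketch of this flow (the data layout, the choice of deforming by per-vertex offsets, and the print-based rendering stand-in are assumptions, not details fixed by this application):

```python
import numpy as np

# A mesh set keyed by object element; each mesh is an array of vertex state values
# (here simply x, y positions), and one mesh corresponds to one object element.
mesh_set = {
    "mouth": np.array([[0.40, 0.30], [0.60, 0.30], [0.50, 0.25]]),
    "left_eye": np.array([[0.35, 0.60], [0.45, 0.60], [0.40, 0.65]]),
}

def deform_target_mesh(mesh_set: dict, element: str, vertex_offsets: np.ndarray) -> dict:
    """Apply mesh deformation to the target mesh selected by the movement information."""
    deformed = dict(mesh_set)
    deformed[element] = mesh_set[element] + vertex_offsets
    return deformed

def render(mesh_set: dict) -> None:
    """Stand-in for the real renderer: just report each element's vertex positions."""
    for element, vertices in mesh_set.items():
        print(element, vertices.round(3).tolist())

# Example: the user opens their mouth, so the mouth mesh vertices move apart vertically.
open_mouth_offsets = np.array([[0.0, 0.02], [0.0, 0.02], [0.0, -0.03]])
render(deform_target_mesh(mesh_set, "mouth", open_mouth_offsets))
```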
In some embodiments, the movement information includes facial movement information; the facial movement information includes N pieces of feature point information of the face of the target session object, and N is an integer greater than 1. When the processing unit 1602 is configured to perform mesh deformation on the mesh data of a target mesh in the mesh set according to the movement information of the target session object, the processing unit is specifically configured to:
determine an expression type after update of the target session object on the basis of the N pieces of feature point information;
acquire an expression base coefficient corresponding to the expression type after update; and
perform, according to the expression base coefficient corresponding to the expression type after update, the mesh deformation on a mesh where an object element corresponding to the expression type after update is located.
In some embodiments, when the processing unit 1602 is configured to perform, according to the expression base coefficient corresponding to the expression type after update, the mesh deformation on a mesh where an object element corresponding to the expression type after update is located, the processing unit is specifically configured to:
acquire an expression base coefficient of an expression type before update;
calculate a difference value between the expression base coefficient corresponding to the expression type after update and the expression base coefficient corresponding to the expression type before update to obtain a difference value result; and
perform, according to the difference value result, the expression base coefficient corresponding to the expression type after update and the expression base coefficient corresponding to the expression type before update, the mesh deformation on the mesh where the object element corresponding to the expression type after update is located (a minimal sketch of this coefficient-driven deformation follows).
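A hedged sketch of driving the deformation from the before/after expression base coefficients is shown below; treating the coefficients as blendshape-style per-vertex displacement weights and interpolating them linearly over a few frames is an assumption made only for illustration.

```python
import numpy as np

def deform_by_expression_bases(neutral_mesh: np.ndarray,
                               expression_bases: np.ndarray,
                               coeff_before: np.ndarray,
                               coeff_after: np.ndarray,
                               steps: int = 5):
    """Yield intermediate meshes as the expression base coefficients move from the
    value before update to the value after update (difference-based interpolation)."""
    diff = coeff_after - coeff_before   # the difference value result
    for step in range(1, steps + 1):
        coeff = coeff_before + diff * (step / steps)
        # Each expression base is a per-vertex displacement weighted by its coefficient.
        yield neutral_mesh + np.tensordot(coeff, expression_bases, axes=1)

# Toy example: one triangular mouth mesh, two expression bases ("smile", "open").
neutral = np.array([[0.4, 0.3], [0.6, 0.3], [0.5, 0.25]])
bases = np.array([
    [[-0.02, 0.01], [0.02, 0.01], [0.0, 0.0]],   # smile: mouth corners up and out
    [[0.0, 0.0], [0.0, 0.0], [0.0, -0.05]],      # open: lower lip moves down
])
before = np.array([0.0, 0.0])
after = np.array([1.0, 0.5])
for mesh in deform_by_expression_bases(neutral, bases, before, after):
    pass  # each intermediate mesh would be rendered as one animation frame
print(mesh.round(3).tolist())
```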
In some embodiments, the movement information includes limb information, and when the processing unit 1602 is configured to perform mesh deformation on the mesh data of a target mesh in the mesh set according to the movement information of the target session object, the processing unit is specifically configured to:
determine position information of S limb position points of the target session object according to the limb information of the target session object, S being an integer greater than zero;
calculate an angle value of a corresponding body element on the basis of the position information of the S limb position points; and
perform the mesh deformation on the mesh corresponding to the body element in the target virtual image according to the angle value (a minimal sketch of this angle computation follows).
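As a minimal, hedged sketch, the angle value of a body element such as the forearm could be computed from limb position points (for example shoulder, elbow and wrist) as below; the point names and the two-dimensional simplification are illustrative assumptions.

```python
import math

def joint_angle(a, b, c) -> float:
    """Angle at point b (degrees) formed by the segments b->a and b->c,
    e.g. the elbow angle from shoulder (a), elbow (b) and wrist (c) position points."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    return math.degrees(math.acos(dot / (n1 * n2)))

# Example: a bent arm; the resulting angle value would drive the mesh deformation
# of the corresponding arm element of the target virtual image.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(joint_angle(shoulder, elbow, wrist))  # -> 90.0
```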
In some embodiments, the movement information includes emotional information, and when the processing unit 1602 is configured to control, according to the movement information of the target session object, the target virtual image displayed in the image display region to perform a target interaction action, the processing unit is specifically configured to:
recognize an emotional state of the target session object according to the emotional information to obtain a current emotional state of the target session object;
determine, on the basis of the current emotional state, a target facial resource matching the current emotional state; and
update, in the image display region, the face of the target virtual image using the target facial resource to obtain an updated target virtual image (a minimal sketch of this resource lookup follows).
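A minimal sketch of this lookup is shown below; the emotion labels and resource file names are hypothetical placeholders rather than resources defined by this application.

```python
# Hypothetical mapping from a recognized emotional state to a pre-made facial resource.
FACIAL_RESOURCES = {
    "happy": "face_happy.png",
    "sad": "face_sad.png",
    "surprised": "face_surprised.png",
}

def select_facial_resource(current_emotional_state: str,
                           default: str = "face_neutral.png") -> str:
    """Return the target facial resource matching the current emotional state."""
    return FACIAL_RESOURCES.get(current_emotional_state, default)

# Example: the recognized state "happy" selects the matching resource, which would
# then replace the face of the target virtual image in the image display region.
print(select_facial_resource("happy"))  # -> face_happy.png
```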
In some embodiments, the movement information includes position change information, and when the processing unit 1602 is configured to control, according to the movement information of the target session object, the target virtual image displayed in the image display region to perform a target interaction action, the processing unit is specifically configured to:
control, according to the movement information of the target image point when the position change information is movement information of a target image point, the target virtual image to perform movement display in the image display region; and
adjust, when the position change information is movement information of a target image region, a display area of the target virtual image in the image display region according to the movement information of the target image region.
In some embodiments, the processing unit 1602 is further configured to:
acquire, when the target terminal initiates the video session, a target image obtained by capturing the environment;
perform feature recognition on the target session object in the target image to obtain a recognition result; and
allocate a virtual image matching the recognition result to the target session object, and determine this virtual image as the target virtual image (a minimal sketch of this allocation follows).
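Purely for illustration, allocating a matching virtual image from a recognition result might look like the sketch below; the recognized attributes and the candidate image names are hypothetical.

```python
# Hypothetical catalogue of candidate virtual images keyed by recognized attributes.
CANDIDATE_IMAGES = {
    ("long_hair", "glasses"): "avatar_long_hair_glasses",
    ("long_hair", "no_glasses"): "avatar_long_hair",
    ("short_hair", "glasses"): "avatar_short_hair_glasses",
    ("short_hair", "no_glasses"): "avatar_short_hair",
}

def allocate_virtual_image(recognition_result: dict,
                           default: str = "avatar_default") -> str:
    """Pick the target virtual image whose attributes match the recognition result."""
    key = (recognition_result.get("hair", "short_hair"),
           recognition_result.get("glasses", "no_glasses"))
    return CANDIDATE_IMAGES.get(key, default)

# Example: feature recognition on the target image reports long hair and glasses.
print(allocate_virtual_image({"hair": "long_hair", "glasses": "glasses"}))
```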
According to one embodiment of this application, various units in the interaction processing method shown in
In this embodiment of this application, the display unit 1601 can display the target virtual image in the video session interface of the video session, and the processing unit 1602 can drive, according to the movement information of the target session object, the target virtual image to perform the target interaction action, so that the target session object can participate in the video session by using the target virtual image. This manner of outputting the target virtual image in the video session interface can display the target virtual image quickly. In addition, the target virtual image can be used for representing the target session object in the video session. Using the virtual image to simulate real-person interactions avoids displaying a real image of the target session object in the video session, thereby protecting the image privacy of the target session object. Furthermore, this embodiment of this application does not need to transmit videos, collected by the camera, of the video session objects, but transmits only the relevant data of the video session objects. Compared with transmitting video, transmitting only the relevant data reduces the data transmission amount and improves data transmission efficiency.
The storage apparatus 1701 may include a volatile memory such as a random access memory (RAM). The storage apparatus 1701 may also include a non-volatile memory, such as a flash memory or a solid-state drive (SSD). The storage apparatus 1701 may also include a combination of the above kinds of memories.
The processor 1702 may be a central processing unit (CPU). The processor 1702 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or the like. The PLD may be a field-programmable gate array (FPGA), a generic array logic (GAL), or the like.
A computer program is stored in the storage apparatus 1701 of this embodiment of this application. The processor 1702 invokes the computer program in the storage apparatus. When the computer program is executed, the processor 1702 can be configured to implement the method described above, such as the embodiments corresponding to
In one embodiment, the terminal may correspond to the target terminal described above. The storage apparatus 1701 stores the computer program. The computer program is loaded and executed by the processor 1702 to implement the corresponding steps in the interaction processing method embodiment described above.
An embodiment of this application also provides a computer-readable memory, that is, a memory device in an electronic device for storing programs and data. It is understood that the computer-readable memory here can include both a built-in memory of the electronic device and an extended memory supported by the electronic device. The computer-readable memory provides a storage space that stores the processing system of the electronic device. Furthermore, a computer program (including program codes) adapted to be loaded and executed by the processor 1702 is also stored in the storage space. The computer-readable memory here may be either a high-speed RAM or a non-volatile memory, such as at least one disk memory. In some embodiments, there may be at least one computer-readable memory located away from the aforementioned processor.
According to one aspect of this application, a computer program product or a computer program is provided. The computer program product or the computer program includes computer instructions stored in a computer-readable memory. The processor of the terminal reads the computer instructions from the computer-readable memory and executes the computer instructions, so that the terminal implements the interaction processing method provided in the various optional manners described above.
Those of skill in the art would recognize that the various illustrative units and algorithm steps described in connection with the embodiments disclosed in this application may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether these functions are implemented as hardware or software depends on particular application and design constraint conditions of the technical solutions. Professionals may implement the described functions in varying ways for each particular application, but such implementation should not be interpreted as departing from the scope of this application.
It is understood by those of ordinary skill in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable memory; and the program, when executed, may include the processes of the embodiments of the methods described above. The memory may be a magnetic disk, an optical disk, a ROM, a RAM, or the like.
In this application, the term "unit" or "module" refers to a computer program or part of the computer program that has a predefined function and works together with other related parts to achieve a predefined goal, and may be implemented in whole or in part by using software, hardware (e.g., processing circuitry and/or memory configured to perform the predefined functions), or a combination thereof. Each unit or module can be implemented using one or more processors (or processors and memory). Likewise, a processor (or processors and memory) can be used to implement one or more modules or units. Moreover, each module or unit can be part of an overall module that includes the functionalities of the module or unit. The above are only embodiments of this application and are not, of course, intended to limit the scope of this application, so equivalent changes made according to the claims of this application still fall within the scope of this application.
Claims
1. A method performed by a computing device acting as a target terminal in a video session, the method comprising:
- displaying, by the target terminal, a video session interface, the video session interface including an image display region for displaying images associated with one or more users participating in the video session;
- displaying, by the target terminal, a target virtual image of a user of the target terminal in the image display region;
- controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user; and
- transmitting, by the target terminal, movement data of the target virtual image performing the target interaction action, to terminals of the other users of the video session, wherein the movement data renders the target virtual image to perform the target interaction action on the corresponding terminals.
2. The method according to claim 1, wherein the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises any one or more of following operations:
- controlling, when the movement information of the user is facial movement information, the target virtual image to perform a corresponding facial interaction action;
- replacing, when the movement information of the user is emotional information, the target virtual image with a target facial image associated with the emotional information, and displaying the target facial image;
- controlling, when the movement information of the user is limb information, a target limb of the target virtual image to perform a limb action; and
- controlling, when the movement information of the user is position change information, the target virtual image to perform a corresponding position change action in the image display region.
3. The method according to claim 1, wherein the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- acquiring a mesh set added for the target virtual image, the mesh set comprising a plurality of meshes and mesh data of these meshes, one mesh corresponding to one object element; any mesh being composed of at least three mesh vertices, and the mesh data of any mesh referring to state values of the mesh vertices contained in that mesh; the object elements being the single elements constituting the target virtual image;
- performing mesh deformation on the mesh data of a target mesh in the mesh set according to the movement information of the user; and
- rendering, on the basis of mesh data subjected to the mesh deformation, the target virtual image that performs the target interaction action.
4. The method according to claim 1, wherein the movement information comprises emotional information, and the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- recognizing an emotional state of the user according to the emotional information to obtain a current emotional state of the user;
- determining, on the basis of the current emotional state, a target facial resource matching the current emotional state; and
- updating the target virtual image using the target facial resource to obtain an updated target virtual image in the image display region.
5. The method according to claim 1, wherein the movement information comprises position change information, and the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- when the position change information is movement information of a target image point, controlling the target virtual image to perform movement display in the image display region according to the movement information of the target image point; and
- when the position change information is movement information of a target image region, adjusting a display area of the target virtual image in the image display region according to the movement information of the target image region.
6. The method according to claim 1, the method further comprising:
- displaying an image selection window, the image selection window comprising an image selection element;
- displaying, in response to a trigger operation performed on the image selection element, a reference virtual image in a reference display region, and displaying candidate virtual images in the image selection window;
- updating, in response to an image selection operation performed on the candidate virtual images, the reference virtual image to be displayed as a target candidate virtual image selected by the image selection operation; and
- determining, in response to a virtual image confirm operation, that the target candidate virtual image is the target virtual image.
7. The method according to claim 1, the method further comprising:
- when the target terminal initiates the video session, acquiring a target image of the user;
- performing feature recognition on the user in the target image to obtain a recognition result of the user; and
- assigning a virtual image matching the recognition result of the user as the target virtual image.
8. A computing device acting as a target terminal in a video session, the computing device comprising: a storage apparatus and a processor;
- the storage apparatus being configured to store a computer program; and
- the processor being configured to run the computer program stored in the storage apparatus to implement a method including:
- displaying, by the target terminal, a video session interface, the video session interface including an image display region for displaying images associated with one or more users participating in the video session;
- displaying, by the target terminal, a target virtual image of a user of the target terminal in the image display region;
- controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user; and
- transmitting, by the target terminal, movement data of the target virtual image performing the target interaction action, to terminals of the other users of the video session, wherein the movement data renders the target virtual image to perform the target interaction action on the corresponding terminals.
9. The computing device according to claim 8, wherein the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises any one or more of following operations:
- controlling, when the movement information of the user is facial movement information, the target virtual image to perform a corresponding facial interaction action;
- replacing, when the movement information of the user is emotional information, the target virtual image with a target facial image associated with the emotional information, and displaying the target facial image;
- controlling, when the movement information of the user is limb information, a target limb of the target virtual image to perform a limb action; and
- controlling, when the movement information of the user is position change information, the target virtual image to perform a corresponding position change action in the image display region.
10. The computing device according to claim 8, wherein the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- acquiring a mesh set added for the target virtual image, the mesh set comprising a plurality of meshes and mesh data of these meshes, one mesh corresponding to one object element; any mesh being composed of at least three mesh vertices, and the mesh data of any mesh referring to state values of the mesh vertices contained in that mesh; the object elements being the single elements constituting the target virtual image;
- performing mesh deformation on the mesh data of a target mesh in the mesh set according to the movement information of the user; and
- rendering, on the basis of mesh data subjected to the mesh deformation, the target virtual image that performs the target interaction action.
11. The computing device according to claim 8, wherein the movement information comprises emotional information, and the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- recognizing an emotional state of the user according to the emotional information to obtain a current emotional state of the user;
- determining, on the basis of the current emotional state, a target facial resource matching the current emotional state; and
- updating the target virtual image using the target facial resource to obtain an updated target virtual image in the image display region.
12. The computing device according to claim 8, wherein the movement information comprises position change information, and the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- when the position change information is movement information of a target image point, controlling the target virtual image to perform movement display in the image display region according to the movement information of the target image point; and
- when the position change information is movement information of a target image region, adjusting a display area of the target virtual image in the image display region according to the movement information of the target image region.
13. The computing device according to claim 8, wherein the method further comprises:
- displaying an image selection window, the image selection window comprising an image selection element;
- displaying, in response to a trigger operation performed on the image selection element, a reference virtual image in a reference display region, and displaying candidate virtual images in the image selection window;
- updating, in response to an image selection operation performed on the candidate virtual images, the reference virtual image to be displayed as a target candidate virtual image selected by the image selection operation; and
- determining, in response to a virtual image confirm operation, that the target candidate virtual image is the target virtual image.
14. The computing device according to claim 8, wherein the method further comprises:
- when the target terminal initiates the video session, acquiring a target image of the user;
- performing feature recognition on the user in the target image to obtain a recognition result of the user; and
- assigning a virtual image matching the recognition result of the user as the target virtual image.
15. A non-transitory computer-readable memory storing a computer program that, when executed by a processor of a computing device acting as a target terminal in a video session, causes the target terminal to implement a method including:
- displaying, by the target terminal, a video session interface, the video session interface including an image display region for displaying images associated with one or more users participating in the video session;
- displaying, by the target terminal, a target virtual image of a user of the target terminal in the image display region;
- controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user; and
- transmitting, by the target terminal, movement data of the target virtual image performing the target interaction action, to terminals of the other users of the video session, wherein the movement data renders the target virtual image to perform the target interaction action on the corresponding terminals.
16. The non-transitory computer-readable memory according to claim 15, wherein the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises any one or more of following operations:
- controlling, when the movement information of the user is facial movement information, the target virtual image to perform a corresponding facial interaction action;
- replacing, when the movement information of the user is emotional information, the target virtual image with a target facial image associated with the emotional information, and displaying the target facial image;
- controlling, when the movement information of the user is limb information, a target limb of the target virtual image to perform a limb action; and
- controlling, when the movement information of the user is position change information, the target virtual image to perform a corresponding position change action in the image display region.
17. The non-transitory computer-readable memory according to claim 15, wherein the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- acquiring a mesh set added for the target virtual image, the mesh set comprising a plurality of meshes and mesh data of these meshes, one mesh corresponding to one object element; any mesh being composed of at least three mesh vertices, and the mesh data of any mesh referring to state values of the mesh vertices contained in that mesh; the object elements being the single elements constituting the target virtual image;
- performing mesh deformation on the mesh data of a target mesh in the mesh set according to the movement information of the user; and
- rendering, on the basis of mesh data subjected to the mesh deformation, the target virtual image that performs the target interaction action.
18. The non-transitory computer-readable memory according to claim 15, wherein the movement information comprises emotional information, and the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- recognizing an emotional state of the user according to the emotional information to obtain a current emotional state of the user;
- determining, on the basis of the current emotional state, a target facial resource matching the current emotional state; and
- updating the target virtual image using the target facial resource to obtain an updated target virtual image in the image display region.
19. The non-transitory computer-readable memory according to claim 15, wherein the movement information comprises position change information, and the controlling, according to movement information of the user captured by the target terminal, the target virtual image displayed in the image display region to perform a target interaction action corresponding to the movement information of the user comprises:
- when the position change information is movement information of a target image point, controlling the target virtual image to perform movement display in the image display region according to the movement information of the target image point; and
- when the position change information is movement information of a target image region, adjusting a display area of the target virtual image in the image display region according to the movement information of the target image region.
20. The non-transitory computer-readable memory according to claim 15, wherein the method further comprises:
- when the target terminal initiates the video session, acquiring a target image of the user;
- performing feature recognition on the user in the target image to obtain a recognition result of the user; and
- assigning a virtual image matching the recognition result of the user as the target virtual image.
Type: Application
Filed: Apr 13, 2023
Publication Date: Aug 3, 2023
Inventors: Wenjing YIN (Shenzhen), Zebiao HUANG (Shenzhen), Xianyang XU (Shenzhen), Shu-hui CHOU (Shenzhen), Zhimiao YU (Shenzhen)
Application Number: 18/134,166