Abstract: A method and system for realizing an interaction between a video input and a virtual network scene. The method includes receiving input video data for a first user at a first terminal, sending information associated with the input video data through a network to at least a second terminal, processing the information associated with the input video data, and displaying a video on or embedded with a virtual network scene at at least the first terminal and the second terminal. The process for displaying includes generating the video based on at least the information associated with the input video data.
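
The flow summarized above (receive, send, process, display) might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `Terminal`, `embed_frame`, and `share_video` are hypothetical, frames are modeled as small grayscale grids, and the network transfer is simulated by a direct function call.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: a frame is a row-major grid of grayscale pixel values.
Frame = List[List[int]]


def embed_frame(scene: Frame, frame: Frame, top: int, left: int) -> Frame:
    """Return a copy of the scene with the video frame pasted at (top, left)."""
    out = [row[:] for row in scene]
    for r, row in enumerate(frame):
        for c, pixel in enumerate(row):
            # Skip pixels that would fall outside the scene.
            if 0 <= top + r < len(out) and 0 <= left + c < len(out[0]):
                out[top + r][left + c] = pixel
    return out


@dataclass
class Terminal:
    """Hypothetical terminal holding its own copy of the virtual scene."""
    name: str
    scene: Frame
    displayed: Frame = field(default_factory=list)

    def display(self, frame: Frame, top: int, left: int) -> None:
        # Generate the displayed image from the received video information.
        self.displayed = embed_frame(self.scene, frame, top, left)


def share_video(frame: Frame, sender: Terminal, receivers: List[Terminal],
                top: int = 0, left: int = 0) -> None:
    """Show the sender's frame at the sender and at every receiver.

    Assumption: sending "through a network" is simulated by a direct call.
    """
    for terminal in [sender, *receivers]:
        terminal.display(frame, top, left)
```

In this sketch each terminal composites the frame locally, so the first and second terminals end up showing the same video embedded in the scene, while the underlying scene data stays unmodified.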