METHOD FOR ENHANCING RESOLUTION OF STREAMING FILE

Disclosed is a method for enhancing resolution at a server for providing video data for streaming, the method including a processing operation for generating the video data, a generating operation for acquiring grid generation pattern information based on the processed video data and generating a neural network file required to enhance resolution of the video data based on the grid generation pattern information, and a transmitting operation for, in response to reception of a streaming request from a user device, dividing requested video data and a neural network file required to recover resolution of the requested video data and transmitting the divided video data and the divided neural network file to the user device.

Description
BACKGROUND OF THE INVENTION

Field of the Invention

The present disclosure relates to a method for enhancing resolution of a video file based on artificial intelligence. More specifically, the present disclosure relates to a method for recovering high image quality data from low image quality data based on an artificial neural network that is trained by extracting information on a grid pattern occurring in the course of converting image data of a streaming file into the low image quality data.

Related Art

Recently, interest in artificial intelligence has been increasing. Technology using artificial intelligence has been applied in various service fields, and the fields in which such artificial intelligence is actively researched include natural language generation, speech recognition, virtual agents, and the like.

On the other hand, such artificial intelligence can be applied in the vision technology field as well. However, research on upscaling of a low resolution image or video file using artificial intelligence has not been conducted sufficiently. Existing upscaling techniques, which do not use artificial intelligence, mainly compensate for awkwardness by filling the damaged areas of the image with arbitrary colors; examples of such methods include bilinear interpolation and bicubic interpolation.

SUMMARY OF THE INVENTION

The present disclosure is to enhance resolution of a streaming video file using a convenient method.

The present disclosure enables generating video data in a divided file format and generating a neural network file so that resolution of streamed video data can be enhanced upon video streaming.

In an aspect, a method for enhancing resolution includes a processing operation for generating the video data, a generating operation for acquiring grid generation pattern information based on the processed video data and generating a neural network file required to enhance resolution of the video data based on the grid generation pattern information, and a transmitting operation for, in response to reception of a streaming request from a user device, dividing requested video data and a neural network file required to recover resolution of the requested video data and transmitting the divided video data and the divided neural network file to the user device.

According to the present disclosure, it is possible to store video data by reducing a size of the video data and accordingly it is possible to effectively use a storage space.

According to the present disclosure, as data communication is performed by reducing a size of video data, it is possible to reduce an amount of data communication as well and support a data transmission function, including streaming, at a higher speed.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram illustrating a configuration of a system for enhancing resolution according to an embodiment of the present disclosure.

FIG. 2 is a diagram illustrating an example of an image quality enhancing operation according to an embodiment of the present disclosure.

FIG. 3 is a block diagram illustrating a configuration of a server according to an embodiment of the present disclosure.

FIG. 4 is a diagram illustrating a configuration of a material processor according to an embodiment of the present disclosure.

FIG. 5 is a diagram illustrating a configuration of a neural network trainer according to an embodiment of the present disclosure.

FIG. 6 is a diagram illustrating an example of size change performed by the size changer according to an embodiment of the present disclosure.

FIG. 7 is a diagram illustrating an example of a deep-learning training operation according to an embodiment of the present disclosure.

FIG. 8 is a diagram illustrating a user device according to an embodiment of the present disclosure.

FIG. 9 is a diagram illustrating a process of generating and transmitting a neural network file for image quality improvement according to an embodiment of the present disclosure.

FIG. 10 is a flowchart illustrating a process of generating a specialized neural network file based on additional learning according to an embodiment of the present disclosure.

DESCRIPTION OF EXEMPLARY EMBODIMENTS

According to an embodiment of the present disclosure, a method for enhancing resolution at a server for providing video data for streaming includes a processing operation for generating the video data, a generating operation for acquiring grid generation pattern information based on the processed video data and generating a neural network file required to enhance resolution of the video data based on the grid generation pattern information, and a transmitting operation for, in response to reception of a streaming request from a user device, dividing requested video data and a neural network file required to recover resolution of the requested video data and transmitting the divided video data and the divided neural network file to the user device.

The invention may be variously modified in various forms and may have various embodiments, and specific embodiments thereof will be illustrated in the drawings and described in detail.

However, these embodiments are not intended for limiting the invention. Terms used in the below description are used to merely describe specific embodiments, but are not intended for limiting the technical spirit of the invention. An expression of a singular number includes an expression of a plural number, so long as it is clearly read differently.

Terms such as “include” and “have” in this description are intended for indicating that features, numbers, steps, operations, elements, components, or combinations thereof used in the below description exist, and it should be thus understood that the possibility of existence or addition of one or more different features, numbers, steps, operations, elements, components, or combinations thereof is not excluded.

On the other hand, elements of the drawings described in the invention are independently drawn for the purpose of convenience of explanation on different specific functions, and do not mean that the elements are embodied by independent hardware or independent software. For example, two or more elements out of the elements may be combined to form a single element, or one element may be split into plural elements. Embodiments in which the elements are combined and/or split belong to the scope of the invention without departing from the concept of the invention.

FIG. 1 is a diagram illustrating a configuration of a system for enhancing resolution according to an embodiment of the present disclosure.

The system for enhancing resolution according to an embodiment of the present disclosure may include a server 100 and a user device 200, as shown in FIG. 1.

According to an embodiment, the server 100 may include a server for providing a video on demand (VOD) service to the user device 200.

The server 100 may transmit video data to the user device 200 to provide a VOD service. At this point, the server 100 may transmit, to the user device 200, not just original video data, but a downscaled file of the original video data of which resolution is degraded. In addition, according to an embodiment of the present disclosure, the server 100 may calculate a neural network file, which is a file required to recover resolution of the video data (the downscaled file) to a preset match rate or higher, and the server may transmit the neural network file to the user device 200. Accordingly, the user device 200 may enhance the resolution of the low-quality data (the downscaled file), provided from the server 100, based on the neural network file.

In addition, the user device 200 may select video data (e.g., content name selection) to be transmitted, and request streaming or downloading of the selected video data from the server 100. In addition, the user device 200 may calculate user viewing pattern information based on video data selection information and video data reproduction information of the user device, and transmit the user viewing pattern information to the server 100.

FIG. 2 will be referred to in order to briefly explain an operation performed by the user device 200 to enhance resolution.

FIG. 2 is a diagram illustrating an example of an image quality enhancing operation according to an embodiment of the present disclosure.

As shown in FIG. 2, a user device 200 may generate a video file of which resolution is enhanced through a neural network file. At this point, the neural network file according to an embodiment of the present disclosure may be combined with any video file transmitted to the user device 200, thereby enhancing resolution.

A video file transmitted from a server 100 to the user device 200 for a streaming or downloading purpose may be a content divided into multiple segments, as shown in FIG. 2. Correspondingly, the neural network file may also be divided to correspond to the respective video file segments. After being transmitted from the server 100 to the user device 200, the respective neural network file segments and the respective video file segments may be labeled so as to be combined in the user device 200.

In the user device 200, each video file segment may be matched with a corresponding neural network file segment, thereby enhancing resolution. More specifically, the neural network file may include data on an artificial neural network algorithm for recovering resolution of the video file, and accordingly, the user device may perform an artificial neural network computing process using the respective video file segments and the neural network file segments so as to recover resolution.
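As a hedged illustration of this labeling scheme, the pairing step on the device might look like the sketch below; the segment labels, file names, and the final combining call are hypothetical, since the disclosure only requires that each video segment be combined with the neural network file segment carrying the matching label.

```python
# Hypothetical pairing of labeled segments on the user device 200.
# Labels and file names are illustrative, not taken from the disclosure.
video_segments = {0: "video_seg_000.bin", 1: "video_seg_001.bin"}
nn_segments = {0: "nn_seg_000.bin", 1: "nn_seg_001.bin"}

def pair_segments(video_segments, nn_segments):
    """Yield (video segment, neural network segment) pairs with equal labels."""
    for label in sorted(video_segments):
        if label not in nn_segments:
            raise KeyError(f"no neural network file segment for label {label}")
        yield video_segments[label], nn_segments[label]

for video_path, nn_path in pair_segments(video_segments, nn_segments):
    print(f"combine {video_path} with {nn_path}")  # resolution recovery step
```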

According to an embodiment, a video file of the present disclosure may be a downscaled file, that is, low image quality data into which resolution of video data stored in the server has been converted, or may be original video data having resolution equal to or smaller than a reference.

FIG. 3 is a block diagram illustrating a configuration of a server according to an embodiment of the present disclosure.

The server 100 according to an embodiment of the present disclosure may include a communicator 110, a storage 120, and a controller 130, as shown in FIG. 3. The controller 130 may include a material processor 131, a neural network trainer 132, and a result evaluator 133.

The communicator 110 may use a network for data transmission and reception between a user device and a server, and a type of the network is not limited. The network may be, for example, an all-IP network providing a service of transmitting and receiving large-scale data through an internet protocol (IP), or may be a combination of different IP networks. In addition, the network may be one of a wired network, a wireless broadband (Wibro) network, a mobile communication network including WCDMA, a high speed downlink packet access (HSDPA) network, a mobile communication network including a long term evolution (LTE) network, a mobile communication network including LTE advanced (LTE-A) and fifth generation (5G) networks, a satellite communication network, and a Wi-Fi network, or may be a combination of at least one of the aforementioned networks.

The communicator 110 according to an embodiment of the present disclosure may perform data communication with an external web server, a plurality of user devices, and the like. For example, the communicator 110 may receive content data (a photo and a video) including an image from another web server or a user device (including a device for a manager). As the server 100 includes a server for providing a VOD service, the server 100 may transmit a VOD content to the user device 200.

According to various embodiments, the communicator 110 may receive and transmit a VOD file for a VOD service. However, aspects of the present disclosure are not limited thereto, and the communicator 110 may perform a communication function for collecting learning data required to generate a neural network file for enhancing resolution.

The neural network file may contain information necessary to recover resolution of damaged image data to be similar to original data through an artificial neural network algorithm, and may include information on various parameters that need to be selected when the artificial neural network algorithm is driven.

The storage 120 may include, for example, an internal memory or an external memory. The internal memory may include, for example, at least one of a volatile memory (e.g., a dynamic RAM (DRAM), a static RAM (SRAM), or a synchronous dynamic RAM (SDRAM), and the like), and a non-volatile memory (e.g., a one time programmable ROM (OTPROM), a programmable ROM (PROM), an erasable and programmable ROM (EPROM), an electrically erasable and programmable ROM (EEPROM), a mask ROM, a flash ROM, a flash memory (e.g., a NAND flash memory, a NOR flash memory, or the like), a hard drive, or a solid state drive (SSD)).

The external memory may further include a flash drive, for example, a CF (compact flash), a SD (secure digital), a Micro-SD (micro secure digital), a Mini-SD (mini secure digital), a XD (extreme digital), an MMC (multi-media card), a memory stick, or the like. The external memory may be functionally and/or physically connected with an electronic device through various interfaces.

The storage 120 according to an embodiment of the present disclosure may store original data matched with processed data (data corresponding to original image data reduced to a predetermined rate or data corresponding to the reduced original data enlarged to an original data size), which is obtained by processing image data (e.g., photo and video data) received from a user device (or a manager device) or another web server. The original data and the processed data may be used to extract information regarding a grid phenomenon that occurs when resolution is reduced.

In addition, after the information regarding the grid phenomenon is extracted, the storage 120 may store a neural network file for recovering resolution to an original image level, by removing a grid from the processed data through an artificial intelligence algorithm (e.g., a Super-Resolution Convolutional Neural Network (SRCNN)).

The controller 130 may be referred to as a processor, a controller, a microcontroller, a microprocessor, a microcomputer, or the like. The controller may be implemented by any one of hardware, firmware, and software, or a combination thereof.

In the implementation by firmware or software, an embodiment of the present disclosure may be implemented by a module, a procedure, a function, and the like for performing the above-described functions or operations. A software code may be stored in a memory and executed by the controller. The memory may be located inside or outside the user device and a server, and may exchange data with the controller through various well-known means.

The controller 130 according to an embodiment of the present disclosure may generate a neural network file that is a file required to improve resolution of image data through computation based on an artificial neural network.

The controller 130 may include a material processor 131, a neural network trainer 132, a result evaluator 133, and a use pattern calculator 134.

The material processor 131 may collect and process learning material necessary to produce a neural network file required to improve image quality of video data. The material processor 131 may perform primary change (reduction) and secondary change (enlargement of reduced image data) on a collected material. A detailed description of an operation of the material processor 131 will be provided with reference to FIG. 4.

The neural network trainer 132 may train a neural network through artificial intelligence based on processed data that is obtained after collecting and processing of a material by the material processor 131. The neural network trainer 132 may set a parameter required for a training process, and produce a neural network. A detailed description of the neural network trainer 132 will be provided with reference to FIG. 5.

The result evaluator 133 may evaluate a result value obtained by applying the neural network file, produced by the neural network trainer 132, to a user device 200.

Specifically, the result evaluator 133 may determine the degree to which resolution is enhanced in data resulting from applying the neural network file to the user device 200. The result evaluator 133 may determine an error rate between result data and original data, the error rate resulting from applying the neural network file. At this point, a unit of comparison between the result data and the original data may be each frame included in an image or may be a piece into which an image is divided for transmission of the image.

Alternatively, according to various embodiments, based on image identicality, each image data may be divided into a plurality of frame chunks (e.g., when an image is displayed over 100 frames, 100 frames may be set as one chunk and there may be a plurality of frame chunks). Accordingly, a unit of comparison used to compare the result data and the original data may be a chunk unit that is divided based on image identicality.

Further, when it is determined that the error rate between the result data and the original data is equal to or greater than a reference, the result evaluator 133 may request modifying a weight, a bias, and the like that form the neural network file. That is, through the comparison between the original data and the result data, the result evaluator 133 may determine whether it is necessary to modify a parameter forming the neural network file.
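A minimal sketch of this evaluation rule follows, assuming a mean-absolute-error metric and a 10% reference value, neither of which is fixed by the disclosure:

```python
import numpy as np

ERROR_REFERENCE = 0.10  # assumed reference value; the disclosure leaves it preset

def error_rate(original: np.ndarray, result: np.ndarray) -> float:
    """Normalized mean absolute pixel error between two 8-bit frames."""
    diff = np.abs(original.astype(np.float32) - result.astype(np.float32))
    return float(diff.mean() / 255.0)

def needs_parameter_update(original_units, result_units) -> bool:
    """True if any comparison unit (frame or frame chunk) reaches the reference."""
    return any(error_rate(o, r) >= ERROR_REFERENCE
               for o, r in zip(original_units, result_units))
```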

According to an embodiment, the result evaluator 133 may calculate an importance of each image object from the original data, the importance being required to comprehend an image. When it is determined that an error rate for one unit (e.g., one frame, one frame chunk, and the like) between the original data and the result data (data of which resolution is enhanced by applying the neural network file in the user device 200) is equal to or greater than a preset value while an image object with an importance equal to or greater than the preset value is included in the corresponding unit, the result evaluator 133 may request modifying a weight, a bias, and the like forming the neural network file.

The importance for each image object may be calculated based on a size ratio occupied by the corresponding image object in one frame, a repetition rate of the corresponding image object, and the like.
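Read literally, that rule could be expressed as a weighted sum; the weights below are assumptions, since the disclosure does not fix how the two factors are combined:

```python
def object_importance(size_ratio: float, repetition_rate: float,
                      w_size: float = 0.5, w_rep: float = 0.5) -> float:
    """size_ratio: fraction of a frame the object occupies (0..1);
    repetition_rate: fraction of frames in which the object appears (0..1)."""
    return w_size * size_ratio + w_rep * repetition_rate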

According to various embodiments, the result evaluator 133 may calculate an importance for an image object based on a content characteristic of video data. First, the result evaluator 133 may check the content characteristic of the video data. At this point, the content characteristic of the video data may be calculated by the material processor 131. For example, the content characteristic of the video data may be calculated based on information on an uploading path of the video data to the server 100 (e.g., a name of a folder selected by a user or a manager when uploading a video file to the server 100), a content genre or field input by the user or the manager when uploading the corresponding video file to the server 100, and the like. The calculated content characteristic of the video data may be managed as metadata of the corresponding video data.

Accordingly, the result evaluator 133 may check content characteristic information of each video data extracted and stored at a point in time of uploading to the server and calculate an importance for each image object based on the content characteristic information. For example, the result evaluator 133 may classify an image object into categories including a human face, a text (e.g., subtitle), an object, and the like, and determine a category matched with the content characteristic information. Specifically, an item with a high image object importance in a drama content may be set as a human face, and a text item in a lecture content may be set to have a high importance.

FIG. 4 is a diagram illustrating a configuration of a material processor according to an embodiment of the present disclosure.

The material processor 131 according to an embodiment of the present disclosure may include a size changer 131a, an image divider 131b, and a characteristic area extractor 131c, as illustrated in FIG. 4.

In order to produce a neural network file for improving image quality of video data, a material to be input to an input layer, or a feature value of the input material, needs to be prepared for the neural network trainer. The material and data to be input to the input layer may be prepared by the material processor 131.

First, the size changer 131a may perform primary size change for reducing a size of an image of video data from an original size to a preset value, and secondary size change for enlarging an image resulting from the primary change to the original size. At this point, the secondary size change may be selectively performed. Size change performed by the size changer 131a will be described with reference to FIG. 6.

FIG. 6 is a diagram illustrating an example of size change performed by the size changer according to an embodiment of the present disclosure.

As illustrated in FIG. 6, the size changer 131a may perform an operation a, which corresponds to primary size change for reducing an original image 605 to a predetermined rate, and may perform an operation b, which corresponds to secondary size change for enlarging a reduced image 610 resulting from the operation a to the same size as the original image. An image 615 generated after the processing operation (the primary change (a) and the secondary change (b)) may have resolution lower than resolution of the original image 605. This is because only the size of the image is enlarged without increasing the number of pixels that form the image.

When the image 615 (having resolution identical to the resolution of the image 610) is compared with the original image 605, the image 615 has an increased pixel size, and accordingly a grid is formed in a mosaic shape.
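A minimal sketch of operations a and b with Pillow follows; the 50% reduction rate and nearest-neighbor resampling are assumptions chosen to make the mosaic-shaped grid easy to observe:

```python
from PIL import Image

original = Image.open("original.png")              # image 605 (path is illustrative)
w, h = original.size

reduced = original.resize((w // 2, h // 2))        # operation a: primary change (image 610)
processed = reduced.resize((w, h), Image.NEAREST)  # operation b: secondary change (image 615)

# processed has the original's size but the reduced image's pixel count,
# so the enlarged pixels show up as the grid described above.
processed.save("processed.png")
```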

According to an embodiment of the present disclosure, the server 100 may perform neural network training based on the processed image 615 and the original image 605 in order to convert the resolution of the low-quality downscaled image 615 into the resolution of the original image 605. To this end, the size changer 131a of the material processor 131 may perform primary size change for reducing the size of the original image by a preset value, and secondary size change for enlarging the image reduced by the primary size change to the same size as the original image. Further, the material processor 131 may extract the original image and a processed image, generated as a result of the primary size change and the secondary size change, as learning data.

According to various embodiments, the material processor 131 may extract pattern information (location information, color information, and the like) of a grid formed in the processed image (an image enlarged after size reduction), and utilize data on the pattern information as input data for neural network training.

The image divider 131b may divide video data stored in the server 100 by a preset standard. At this point, the image divider 131b may perform an operation of dividing the video data based on the number of frames. Alternatively, the image divider 131b may divide the video data by binding frames into chunks each having a match rate of image objects equal to or greater than a preset reference (e.g., 90%). For example, a unit of division may be a series of frames photographing the same person. In addition, the image divider 131b may divide video data into a plurality of chunks on the basis of a unit transmitted from the server 100 to the user device when a streaming service is provided.

The chunks divided by the image divider 131b may be utilized to train an artificial neural network and evaluate a resolution enhancement result.
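The chunking rule above might be sketched as follows; frame_match_rate() stands in for whatever image-object comparison the server actually applies, and the 90% reference repeats the example value:

```python
MATCH_REFERENCE = 0.90  # example reference from the text

def split_into_chunks(frames, frame_match_rate):
    """Bind consecutive frames into one chunk while the match rate holds."""
    chunks, current = [], [frames[0]]
    for prev, cur in zip(frames, frames[1:]):
        if frame_match_rate(prev, cur) >= MATCH_REFERENCE:
            current.append(cur)
        else:
            chunks.append(current)   # match rate dropped: close the chunk
            current = [cur]
    chunks.append(current)
    return chunks
```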

The characteristic area extractor 131c may extract a characteristic area with a characteristic image included therein with reference to each frame or division unit of video data. The characteristic area extractor 131c may determine whether an image area meeting a preset characteristic area requirement is present in each frame or division unit. According to an embodiment, the characteristic area may be determined according to whether there is an image object of which an image object importance corresponding to a content genre is equal to or greater than a preset value. For example, the characteristic area extractor 131c may set an image object importance of a face image of a main character in a drama content to be higher, and accordingly, a characteristic area may be set as an area in which the face image of the main character is displayed (e.g., an object display area distinguishable from the background).

The characteristic area extractor 131c may extract not just a characteristic area in an image, but also a specific frame or a specific division unit in images of the whole frames of the video data.

A learning importance weight may be applied to the characteristic area extracted by the characteristic area extractor 131c, so that the learning repetition number increases. Alternatively, an increased number of processed data items may be generated for the characteristic area extracted by the characteristic area extractor 131c. For example, when it is assumed that one frame has an area a set as a characteristic area and an area b set as a normal area, five processed images for the area a may be generated through size reduction (e.g., 80%, 50%, 30%, 20%, and 10%) that is performed an increased number of times (e.g., five), and two processed images for the area b may be generated through size reduction that is performed a normal number of times (e.g., two). As a result, a resolution recovery accuracy of a characteristic area selected by the characteristic area extractor 131c may be set to be higher than that of a normal area.
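The characteristic-area weighting could be realized as below; the five reduction rates repeat the example values from the text, while the two rates used for a normal area are assumptions:

```python
from PIL import Image

CHARACTERISTIC_RATES = [0.80, 0.50, 0.30, 0.20, 0.10]  # example rates from the text
NORMAL_RATES = [0.50, 0.20]                            # assumed; the text only says "two"

def processed_variants(area: Image.Image, is_characteristic: bool):
    """Generate more downscaled learning variants for characteristic areas."""
    rates = CHARACTERISTIC_RATES if is_characteristic else NORMAL_RATES
    w, h = area.size
    return [area.resize((max(1, int(w * r)), max(1, int(h * r)))) for r in rates]
```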

FIG. 5 is a diagram illustrating a configuration of a neural network trainer according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the neural network trainer 132 may include a learning importance identifier 132a, a similar data learning supporter 132b, and a neural network file calculator 132c, as illustrated in FIG. 5.

The neural network trainer 132 may perform a deep-learning training process based on an artificial neural network and accordingly generate a neural network file that is a file required to improve image quality of low-resolution video data.

For a brief description of a deep-learning training operation based on a neural network, reference is made to FIG. 7.

FIG. 7 is a diagram illustrating an example of a deep-learning training operation according to an embodiment of the present disclosure.

Referring to FIG. 7, there is illustrated a perceptron, which is a neural network model including an input layer, a hidden layer, and an output layer. According to an embodiment, neural network training according to the present disclosure may be performed using a multi-layer perceptron that is implemented to include at least one hidden layer. Basically, the perceptron may receive an input of multiple signals and output one signal in response.

A weight and a bias required for a computation process using an artificial neural network model may be calculated through backward propagation. The artificial neural network training process extracts proper weight data and bias data through the backward propagation. In the present disclosure as well, a neural network file calculated through an artificial neural network may include the proper weight data, the bias data, and the like.

A training method using an artificial neural network through backward propagation, and parameter modification are well-known technologies and thus a detailed description thereof is omitted.

Preferably, the neural network trainer 132 according to an embodiment of the present disclosure may perform training using a convolution neural network (CNN) model among artificial neural network models. The CNN is characterized by maintaining a shape of input/output data of each layer, effectively recognizing a feature of an adjacent image while maintaining spatial information of an image, and extracting and learning a feature of an image through a plurality of filters.

A basic operating method of the CNN may utilize learning that is performed by scanning an area of a part of one image through a filter and discovering a result for the area. In this case, discovering a filter having a proper weight is the goal of the CNN. In the CNN, the filter may be generally defined as a square matrix such as (4,4) and (3,3). A set value of the filter for the CNN according to an embodiment of the present disclosure is not limited. The filter may calculate a convolution by iterating over input data at a predetermined interval.
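As a concrete reference point, the SRCNN model mentioned earlier can be written in a few lines of PyTorch; the 9-1-5 kernel sizes and 64/32 channel counts follow the published SRCNN design and are an assumption here, since the disclosure does not fix the architecture:

```python
import torch.nn as nn

class SRCNN(nn.Module):
    """Three-layer CNN mapping an upscaled low-quality frame (image 615)
    to a grid-removed estimate of the original frame (image 605)."""
    def __init__(self, channels: int = 3):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, 64, kernel_size=9, padding=4),  # patch extraction
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, kernel_size=1),                   # non-linear mapping
            nn.ReLU(inplace=True),
            nn.Conv2d(32, channels, kernel_size=5, padding=2),  # reconstruction
        )

    def forward(self, x):
        return self.body(x)
```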

According to an embodiment of the present disclosure, when receiving image data provided as learning data, the learning importance identifier 132a may identify a learning importance assigned to a characteristic area or a specific frame chunk of the learning data.

According to an embodiment, the characteristic area extractor 131c may divide one frame into a plurality of areas and set at least one area from among the plurality of divided areas as a characteristic area. In this case, a criterion for dividing a frame is not limited. In addition, when selecting a characteristic area, the characteristic area extractor 131c may assign a different learning importance to each characteristic area according to a set reference element (e.g., an image object importance, a size, and the like).

Based on the above, after receiving learning data from the material processor 131, the learning importance identifier 132a may identify a learning importance included in the learning data. Specifically, the learning data received from the material processor 131 may be identified as a plurality of divided segments, and accordingly, the learning importance identifier 132a may identify a learning importance assigned to each division unit. Alternatively, the learning data received from the material processor 131 may be in an undivided state, and accordingly, the learning importance identifier 132a may identify a learning importance assigned to each frame. In addition, when there are at least two areas to which learning importances are set differently, the learning importance identifier 132a may identify a learning importance for each area in a frame.

Then, the learning importance identifier 132a may extract learning option information, indicated by a learning importance assigned to each division unit or each frame, regarding a learning number, whether learning is performed using similar data, and the like. An operation of extracting the option information indicated by the learning importance may be, for example, in the case of an item with a learning importance of 1, extracting option information indicating that a learning number is three and a learning process using similar data is not performed, and, in the case of an item with a learning importance of 2, extracting option information indicating that a learning number is four and a learning process using similar data is not performed.
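The two option rows spelled out above can be kept as a simple lookup table; importances beyond 2 are not specified in the disclosure, so the table below is only a sketch of the idea:

```python
LEARNING_OPTIONS = {
    1: {"learning_number": 3, "use_similar_data": False},
    2: {"learning_number": 4, "use_similar_data": False},
}

def option_for(importance: int) -> dict:
    """Return the learning option information indicated by an importance."""
    return LEARNING_OPTIONS[importance]
```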

The learning importance identifier 132a may transmit an instruction in accordance with option information indicated by a learning importance to the similar data learning supporter 132b and the neural network file calculator 132c.

The similar data learning supporter 132b may support performing learning using similar data. When the learning importance identifier 132a identifies option information corresponding to a learning importance and accordingly determines that the option information indicates learning through similar data, the learning importance identifier 132a may transmit a relevant instruction to the similar data learning supporter 132b and the neural network file calculator 132c.

Accordingly, upon receiving an instruction signal indicative of similar data learning from the learning importance identifier 132a, the similar data learning supporter 132b may perform an operation of acquiring similar data similar to a target image. The similar data may indicate a similar image that is found through an external web and the like. For example, when a cosmos image is provided as learning data and a similar data learning instruction is received with respect to the corresponding data, the similar data learning supporter 132b may search for and acquire cosmos images through a portal web and the like. From among the found images, the similar data learning supporter 132b may select and acquire similar data based on similarity in the number of objects in the respective found images, similarity in resolution, similarity in color combination, and the like.

The neural network file calculator 132c may set an initial parameter value for performing a process regarding image data through a CNN.

The neural network file calculator 132c may determine a frame size of original data and a reduction rate which is set in the original data when processed data is generated, and may set a corresponding initial parameter.

In addition, according to various embodiments, the neural network file calculator 132c may specify a type of image data required for artificial neural network training, and request inputting the corresponding image data as learning data. As for a content which has a high proportion of human characters such as a drama and a movie, the neural network file calculator 132c may additionally request frame information including a relevant image object as learning data from the similar data learning supporter 132b in order to perform repetitive learning with respect to a main character.

The neural network file calculator 132c may perform learning by inputting a material processed by the material processor 131 into a preset artificial neural network model. In this case, the neural network file calculator 132c may extract information on a grid generated in the course of changing original data into processed data (grid generation pattern information), by inputting the original data and the processed data (reduced to a preset rate) into a CNN algorithm. More specifically, the grid generation pattern information calculated by the neural network file calculator 132c may be calculated based on a difference between the original data and the processed data, and may include pattern information regarding a location of the grid, a color change of the grid, and the like.

The neural network file calculator 132c may generate a neural network file required to recover the original image, by removing the grid from the processed data based on the calculated grid generation pattern information. The neural network file calculator 132c may perform computation by inputting downscaled data (processed data) as input data into an artificial neural network algorithm. When a match rate in resolution between data output from the computation and the original data is equal to or greater than a preset value, the neural network file calculator may terminate the data learning process. In a similar manner, the neural network file calculator 132c may repeatedly perform an operation of inputting a large number of various types of processed data into an input layer to determine a match rate between an artificial neural network computation result and original data.

According to various embodiments, the neural network file calculator 132c may calculate grid generation pattern information that is created when an image of a specific size is reduced by inputting various types of original data and processed data. Accordingly, the neural network file calculator 132c may calculate grid generation pattern information that is commonly created not just in a specific image but also in various images when image reduction is performed.

The neural network file calculator 132c may input processed data into the input layer, and, when a match rate between output data and original data is equal to or greater than the preset value, the neural network file calculator 132c may generate a neural network file including information regarding parameters (e.g., a weight, a bias, a learning rate, and the like) set in a corresponding artificial neural network algorithm, an activation function for each layer, and the like.
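A hedged sketch of this stop-and-package step follows; the match_rate() metric, the 0.95 threshold, and the recorded activation names are assumptions, since the disclosure only requires that the parameters and per-layer activation functions be bundled once the preset match rate is reached:

```python
import torch

MATCH_THRESHOLD = 0.95  # assumed preset value

def train_until_match(model, optimizer, loss_fn, loader, match_rate):
    """Train on (processed, original) pairs until every output matches well enough."""
    while True:
        for processed, original in loader:
            optimizer.zero_grad()
            loss_fn(model(processed), original).backward()
            optimizer.step()
        with torch.no_grad():
            if all(match_rate(model(p), o) >= MATCH_THRESHOLD for p, o in loader):
                return

def save_neural_network_file(model, learning_rate, path="nn_file.pt"):
    """Bundle parameters and algorithm information into a 'neural network file'."""
    torch.save({
        "state_dict": model.state_dict(),                   # weights and biases
        "learning_rate": learning_rate,
        "activations": {"conv1": "relu", "conv2": "relu"},  # illustrative names
    }, path)
```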

That is, when the neural network file generated by the neural network file calculator 132c is transmitted to the user device 200, the user device 200 may receive the neural network file and perform artificial neural network computation on low image quality video data (downscaled data) based on information in the neural network file, and accordingly the user device may perform a function of enhancing resolution of the video data.

FIG. 8 is a diagram illustrating a user device according to an embodiment of the present disclosure.

As illustrated in FIG. 8, a user device 200 according to an embodiment of the present disclosure may include a communicator 210, a storage 220, an input part 230, a display part 240, a camera part 250, and a controller 260. The controller 260 may include a video reproducer 261, a resolution converter 262, and a user information collector 263.

The communicator 210 according to an embodiment of the present disclosure may perform a communication function to receive a neural network file and video data from a server 100. Further, the communicator 210 may perform a communication operation to transmit feedback information collected from the user device 200 to the server 100.

According to an embodiment, the storage 220 may store the neural network file and the video data, each received from the server 100. According to various embodiments, the storage 220 may store or temporarily store result data (resolution enhanced data), which is a result of computation performed by applying a neural network file to downscaled data having resolution equal to or smaller than a preset reference.

The storage 220 may store the generated feedback information. Alternatively, the storage 220 may store information required to calculate feedback information. For example, when one frame is extracted from result data (resolution enhanced data), generated as the result of the computation of the artificial neural network algorithm, to provide feedback, the storage 220 may store reference information regarding the extraction (e.g., whether a user's frowning face is detected during reproduction of a video, a content regarding extracting a frame corresponding to a timing when the frowning face is detected, and the like).

The input part 230 may receive user selection information regarding a content genre, a content name, and the like.

When video data received from the server 100 or result data obtained after resolution enhancement of the video data is reproduced, the display part 240 may display a reproduction screen of a corresponding video.

The camera part 250 may photograph a picture and a video in response to a user request. Image information regarding the picture and the video photographed by the camera part 250 may be uploaded to the server 100 or another web server. Alternatively, image information photographed by the camera part 250 may be transmitted to another user device.

When photographing image information such as a picture and a video, or the like, the camera part 250 may first determine resolution based on a user request. According to an embodiment, based on whether a neural network file for image quality improvement is installed, the camera part 250 may store the photographed picture or video in a manner in which resolution of the photographed picture or video is reduced to a preset level or lower.

According to various embodiments, while resolution enhanced result data is reproduced, the camera part 250 may operate a camera that photographs a user's face regularly at a preset reference interval. Accordingly, it is possible to determine the user's facial expression or frowning face and extract feedback information in response to the determination.

The controller 260 may convert resolution of a video file downloaded from the server 100 or reproduce the video file. Specifically, the controller 260 may include a video reproducer 261, a resolution converter 262, and a user information collector 263.

First, the video reproducer 261 according to an embodiment of the present disclosure may perform a control to reproduce a streamed video file so that the streamed video file is displayed on the display part 240. The video reproducer 261 may determine resolution of video data which is requested to be output. When it is determined that the resolution of the video data requested to be output is equal to or lower than a preset level and needs to be improved, the video reproducer 261 may request resolution enhancement from the resolution converter 262. Then, the video reproducer 261 may reproduce a resolution enhanced file provided through the resolution converter 262.

The resolution converter 262 may determine a current resolution of image data (a picture and a video) and a target resolution requested by a user. At this point, during a streaming operation, the resolution converter 262 may match segmented video data received from the server 100 and a neural network file and then execute an artificial neural network algorithm to convert downscaled data to a desired resolution.
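On the device side, that matching step could look like the sketch below; decode_frames() and the SRCNN class are assumptions carried over from the earlier sketches, and the file layout matches the save_neural_network_file() example above:

```python
import torch

def enhance_segment(video_segment_path, nn_segment_path, model, decode_frames):
    """Recover resolution of one video segment with its matched file segment."""
    nn_file = torch.load(nn_segment_path)
    model.load_state_dict(nn_file["state_dict"])
    model.eval()
    with torch.no_grad():
        return [model(frame.unsqueeze(0)).squeeze(0)   # per-frame enhancement
                for frame in decode_frames(video_segment_path)]
```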

The user information collector 263 may collect user information for feedback. The user information collector 263 may select a frame to be used as feedback information from among result data obtained after resolution enhancement is performed based on an artificial neural network algorithm, and may store the selected frame. For example, while a user reproduces a resolution enhanced video, the user information collector 263 may acquire the user's face information, and, when an event such as the user's frowning face occurs, the user information collector 263 may collect video frame information being displayed at the time when the event occurs.

In addition, the user information collector 263 may collect content information such as an item, a genre, and the like of a content that has been reproduced at or above a reference level. For example, the user information collector 263 may determine a reproduction frequency of an animation content compared to a documentary content (based on a photo image), a reproduction frequency of a subtitle-free content compared to a subtitle-containing content, and the like, and collect information regarding the determination. Reproduction information collected by the user information collector 263 may be provided to the server 100, and the server may generate user pattern information based on the reproduction information.

FIG. 9 is a diagram illustrating a process of generating and transmitting a neural network file for image quality improvement according to an embodiment of the present disclosure.

A server 100 according to an embodiment of the present disclosure may generate a neural network file for resolution enhancement and transmit the neural network file to a user device.

Specifically, the server 100 may first perform an operation 705 to process video data present in a preset data set. In this case, the data set may mean a learning data set for generating a basic neural network file. The data set may consist of random video data having various genres, various subjects, and various formats, and some meta-information including resolution may be normalized. Accordingly, various types of video data included in the data set may have been pre-processed to have the same resolution.

The operation 705 may be an operation for generating data to be learned by an artificial neural network algorithm, and may perform a downscaling process to degrade resolution of video data to generate data appropriate for learning.

According to an embodiment, the operation 705 may be a processing operation (image reproduction, downscaling) for each frame included in a video file. Alternatively, the operation 705 may be an operation of selecting a frame to be input to train an artificial neural network through sampling on a division unit and then processing (downscaling to a preset rate) the corresponding selected frame. For example, when it is assumed that a video file having 2400 frames in total is composed of 100 chunks each consisting of 24 frames, the server may sample one frame per corresponding video division unit and thereby process a total of 100 frames into learning data.
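Written out, the sampling example above reduces to one frame per 24-frame chunk; taking the first frame of each chunk is an assumption, since the text only requires one sample per division unit:

```python
FRAMES_PER_CHUNK = 24  # from the example: 2400 frames = 100 chunks of 24

def sample_learning_frames(frames):
    """One learning frame per chunk (here, the first frame of each chunk)."""
    return [frames[i] for i in range(0, len(frames), FRAMES_PER_CHUNK)]

frames = list(range(2400))  # stand-in for decoded frames
assert len(sample_learning_frames(frames)) == 100
```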

After the operation 705, the server 100 may perform an operation 710 to acquire grid generation pattern information based on processed video data. The processed video data may mean data obtained by reducing a size of original data (here, original data for learning indicates data having resolution equal to or greater than a preset resolution) to a preset rate.

When a size of one image is reduced, downscaling occurs which indicates a state in which the number of pixels to be displayed in the same area is reduced and in turn resolution is automatically reduced. Accordingly, when a size of an image is reduced, a reduced resolution is maintained even after the image is enlarged to an original image size. As a result, a grid phenomenon occurs.

The server may acquire grid generation pattern information by comparing the processed image, in which the grid phenomenon has occurred, with the original image. The acquired grid generation pattern information may be used later to recover resolution by removing a grid from an image in which the grid phenomenon occurs.

After the operation 710 of acquiring the grid generation pattern information, the server 100 may perform an operation 715 to generate a neural network file for image quality improvement based on the grid generation pattern information. Then, the server 100 may generate a basic neural network file by calculating artificial neural network algorithm information (an activation function for each layer, a weight, a bias, and the like) that is required to recover the original image by removing the grid from downscaled image data in which the grid has occurred.

An element provided as a result value, such as a weight, a bias, and the like may be determined based on a match rate between a final resulting product (image quality improved data) and original image data. When the match rate is equal to or greater than a preset level, the server 100 may determine weight and bias information, which has been applied when computing the corresponding artificial neural network, as information to be included in a neural network file.

Then, the server 100 may perform an operation 720 to confirm that a first streaming request (or a download request) regarding video data is received from the user device 200. In response to a user request, the server 100 may perform an operation 725 to transmit a low image quality version of the requested video data (downscaled data) along with a basic neural network file for image quality improvement. Accordingly, as the user device 200 receives the low image quality version of the video (downscaled data), the user device 200 may receive the content easily without being constrained by a network environment. By applying the received basic neural network file to the low image quality version of the video data (downscaled data), it is possible to reproduce a high image quality image at a level desired by a user.

FIG. 10 is a flowchart illustrating a process of generating a specialized neural network file based on additional learning according to an embodiment of the present disclosure.

According to an embodiment of the present disclosure, the controller 130 of the server 100 may perform an operation 805 to acquire new video data and confirm the acquisition of the data. Then, the controller 130 may perform an operation 810 to identify an additional learning condition of the new video data. For example, the controller 130 may determine whether a result of performing a recovery operation based on the basic neural network file generated in the operation 715 of FIG. 9 has a recovery rate equal to or greater than a reference level, and accordingly the controller 130 may determine whether to perform additional learning. In this case, the determination as to the recovery rate may be performed based on a structural similarity (SSIM) (which is an indicator to measure similarity of two images) and a peak signal to noise ratio (PSNR).
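Using scikit-image, that recovery-rate test might be written as follows; the 0.90 SSIM and 30 dB PSNR cut-offs repeat the example values given in the next paragraph, and 8-bit frames are assumed:

```python
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def needs_additional_learning(original, recovered) -> bool:
    """True if recovery with the basic neural network file falls below reference."""
    ssim = structural_similarity(original, recovered, channel_axis=-1)
    psnr = peak_signal_noise_ratio(original, recovered)  # assumes uint8 frames
    return ssim < 0.90 or psnr < 30.0
```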

According to an embodiment, when it is determined that the newly acquired video data satisfies an additional learning condition (e.g., an SSIM less than 0.90, a PSNR less than 30), the controller 130 may perform an operation 815 to perform additional learning on the new video data. The additional learning may mean performing specialized learning of an image with respect to one item of the new video data. According to an embodiment, in order to perform additional learning, the controller 130 may change a meta-value, including resolution, of the new video data according to a standard used in previously performed image learning. After pre-processing for equalizing the standard to that of the data set used when generating the basic neural network file is completed, additional image learning may be performed.

As a result of performing the operation 815, the controller 130 may perform an operation 820 to generate a specialized neural network file regarding the new video data. In this case, the specialized neural network file may be generated by additionally performing artificial neural network training with respect to the newly added video data after initializing an artificial neural network algorithm and its parameters with the values of the basic neural network file. That is, in order to generate the specialized neural network file, an operation of retrieving the basic neural network file generated in the operation 715 of FIG. 9 may need to be performed beforehand.

Then, when an operation 825 is performed to confirm reception of a second streaming request from the user device 200, the controller 130 may perform, in response, an operation 830 to transmit a downscaled file and the specialized neural network file to the user device 200.

The second streaming request may be a request for the specialized neural network file to recover resolution of video data. In addition, the first streaming request and the second streaming request may be differentiated based on a type of a service used by a user. For example, a request received from a user of a low pricing plan may be the first streaming request corresponding to a streaming method for providing a basic neural network file, and a streaming request received from a user of a relatively high pricing plan may be the second streaming request corresponding to a method for transmitting a specialized neural network file in some cases.

Although not illustrated in FIG. 10, the user device 200 may transmit feedback information regarding a state of video data that has been completely reproduced or converted, according to various embodiments. Accordingly, the user device 200 may calculate reproduction associated information for each user, such as a content genre reproduced at or above a reference level, a content characteristic, a primary reproduction requested time, and the like, and transmit the reproduction associated information to the server 100.

Further, the user device 200 may provide a frame sample of resolution enhanced result data to the server 100 at a preset period. Accordingly, the server 100 may compare a result data frame calculated after resolution enhancement, which is received from the user device 200, with an original data frame of the same content. The transmitted frame information may include reproduction location information in a content, and accordingly, the server 100 may retrieve a comparable frame image from original data.

The server 100 may compare an image frame provided as feedback and an original image frame of a corresponding content, and determine a match rate therebetween. When it is determined that the match rate is equal to or smaller than a preset reference, the server 100 may request a re-learning operation to update a neural network file and accordingly perform the re-learning operation.

Meanwhile, a neural network file generated according to various embodiments of the present disclosure may be compressed, when necessary. As an embodiment, the server may compress a neural network file in consideration of performance of the user device 200 and transmit the compressed neural network file to the user device 200.

The neural network file may be compressed using at least one of Pruning, Quantization, Decomposition, or Knowledge Distillation. Pruning is a compression technique for deleting a weight and a bias that are insignificant or do not affect an output value from among weights and biases of a neural network file. Quantization is a compression technique for quantizing respective weights to a preset bit. Decomposition is a compression technique for reducing a size of a weight by performing approximated decomposition of a weight matrix or tensor which is a set of weights. Knowledge Distillation is a compression technique for generating a student model smaller than an original model by using the original model as a teacher model and training the student model. In this case, the student model may be generated through Pruning, Decomposition, or Quantization, described above.
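Two of the four techniques can be illustrated with PyTorch utilities; in the sketch below, the 30% pruning amount, the 8-bit width, and the uniform affine scheme are all assumptions rather than values fixed by the disclosure:

```python
import torch
import torch.nn.utils.prune as prune

def prune_model(model: torch.nn.Module, amount: float = 0.3) -> torch.nn.Module:
    """Pruning: delete (zero out) the least significant weights."""
    for module in model.modules():
        if isinstance(module, (torch.nn.Conv2d, torch.nn.Linear)):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the mask into the weights
    return model

def quantize_tensor(w: torch.Tensor, bits: int = 8):
    """Quantization: store one weight tensor at a preset bit width."""
    qmax = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / qmax or 1e-8   # guard against constant tensors
    q = torch.round((w - w_min) / scale).clamp(0, qmax).to(torch.uint8)
    return q, scale, w_min                   # enough to dequantize later
```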

In this case, a degree of compression in accordance with performance of the user device 200 may be determined in various ways. In an embodiment, the degree of compression of a neural network file may be determined based on simple specification of the user device 200. That is, the degree of compression may be determined based on specification of a processor of the user device 200 and specification of a memory of the user device 200.

In another embodiment, the degree of compression of the neural network file may be determined based on a use state of the user device 200. Specifically, the server 100 may receive use state information from the user device 200, and acquire available resource information of the user device 200 in accordance with the received use state information. The server 100 may determine a degree of compression of a neural network file based on the available resource information. In this case, the available resource information may be information regarding an application being executed by the user device 200, a CPU or GPU occupancy rate determined depending on the application in execution, and information regarding a memory capacity of the user device 200.

For example, a method for enhancing resolution at a server for providing video data for streaming includes a processing operation for generating the video data, a generating operation for acquiring grid generation pattern information based on the processed video data and generating a neural network file required to enhance resolution of the video data based on the grid generation pattern information, and a transmitting operation for, in response to reception of a streaming request from a user device, dividing requested video data and a neural network file required to recover resolution of the requested video data and transmitting the divided video data and the divided neural network file to the user device.

The generating operation may include a file generating operation for generating a basic neural network file based on a plurality of video data items included in a preset data set, an additional learning operation for, in response to a determination that any acquired new video data satisfies an additional learning condition, performing additional learning on the new video data, wherein the additional learning is performed through an artificial neural network algorithm to which the basic neural network file is applied, and a specialized neural network file generating operation for generating a downscaled file of the new video data as a result of the additional learning and a specialized neural network file corresponding to the new video data.

In addition, the additional learning operation may include an operation for determining whether the additional learning condition is satisfied according to a structural similarity (SSIM) and a peak-signal-to-noise ratio (PSNR) that are obtained by performing resolution recovery on the downscaled file of the new video data based on the basic neural network.
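For illustration, such a condition check could look like the sketch below, using scikit-image's PSNR and SSIM implementations on uint8 frames; the threshold values are assumptions, as the disclosure does not specify them:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def satisfies_additional_learning_condition(
        original: np.ndarray, recovered: np.ndarray,
        psnr_floor: float = 30.0, ssim_floor: float = 0.9) -> bool:
    """True when recovery with the basic neural network file falls short,
    i.e. additional learning on the new video data is warranted."""
    psnr = peak_signal_noise_ratio(original, recovered)   # expects uint8 frames
    ssim = structural_similarity(original, recovered, channel_axis=-1)
    return psnr < psnr_floor or ssim < ssim_floor
```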

The processing operation may include a dividing operation for dividing the video data into a plurality of chunks by bundling a plurality of frames having a match rate of image objects equal to or greater than a reference into one chunk, and a size changing operation for performing a primary change to reduce a size of an image included in the video data by a preset value from an original size and selectively performing a secondary change to enlarge the image having gone through the primary change back to the original size.
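A rough sketch of both operations follows; the pixel-equality proxy for the image-object match rate and the OpenCV resizing are assumptions made for the example:

```python
import cv2
import numpy as np

def frame_match_rate(a: np.ndarray, b: np.ndarray) -> float:
    """Crude proxy for the image-object match rate: share of equal pixels."""
    return float((a == b).mean())

def divide_into_chunks(frames: list, reference: float = 0.85) -> list:
    """Bundle consecutive frames whose match rate stays at or above the
    reference value into one chunk."""
    chunks, current = [], [frames[0]]
    for frame in frames[1:]:
        if frame_match_rate(current[-1], frame) >= reference:
            current.append(frame)
        else:
            chunks.append(current)
            current = [frame]
    chunks.append(current)
    return chunks

def size_change(image: np.ndarray, scale: float = 0.5,
                secondary: bool = True) -> np.ndarray:
    """Primary change shrinks the image; the optional secondary change
    enlarges it back to the original size (introducing the grid pattern)."""
    h, w = image.shape[:2]
    small = cv2.resize(image, (int(w * scale), int(h * scale)))  # primary
    return cv2.resize(small, (w, h)) if secondary else small     # secondary
```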

The processing operation may include a characteristic area extracting operation for extracting a characteristic area including a characteristic image on the basis of each frame or division unit of the video data and assigning a learning importance to the extracted characteristic area. The characteristic area may include an image object of which an image object importance corresponding to a content field is equal to or greater than a preset value.
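By way of a hedged example, assuming object detections are already available as (label, box) pairs and that per-content-field importance tables exist (both assumptions, not specified in the disclosure):

```python
# Illustrative per-content-field importance tables (assumed, not disclosed).
IMPORTANCE = {"sports": {"ball": 0.9, "player": 0.8},
              "news":   {"face": 0.9, "caption": 0.7}}

def extract_characteristic_areas(detections, content_field, preset=0.75):
    """Keep detected image objects whose importance for this content field
    is equal to or greater than the preset value, and tag each retained
    area with its learning importance."""
    table = IMPORTANCE.get(content_field, {})
    areas = []
    for label, box in detections:          # detections: (label, (x, y, w, h))
        score = table.get(label, 0.0)
        if score >= preset:
            areas.append({"box": box, "learning_importance": score})
    return areas
```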

The generating operation may include a learning importance identifying operation for identifying a learning importance assigned to a characteristic area or a specific frame chunk of learning video data and extracting option information indicated by the learning importance, and a neural network file calculating operation for inputting original data in an original size and processed data reduced to a preset rate in the learning video data into a convolutional neural network (CNN) algorithm to be learned. A neural network file is generated including a parameter and an activation function of an artificial neural network, the parameter and the activation function causing a match rate between a computation result value obtained by inputting the processed data into the artificial neural network and the original data to be equal to or greater than a preset value. The option information comprises a learning number and information regarding whether learning is performed through similar data.
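An illustrative sketch of the calculating operation, using a small SRCNN-style PyTorch model as a stand-in for the disclosed artificial neural network; the architecture, loss, and the use of PSNR as the match-rate criterion are assumptions:

```python
import torch
import torch.nn as nn

class GridRemovalCNN(nn.Module):
    """Small SRCNN-style network mapping processed (reduced then re-enlarged)
    frames back toward their original-size counterparts."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 3, 5, padding=2))

    def forward(self, x):
        return self.body(x)

def calculate_neural_network_file(processed, original,
                                  learning_number=100, psnr_target=30.0):
    """Train until the match rate (PSNR here) between the network output and
    the original data reaches the preset value, then return the learned
    parameters -- the 'neural network file' of the disclosure."""
    model = GridRemovalCNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    loss_fn = nn.MSELoss()
    for _ in range(learning_number):       # the 'learning number' option
        opt.zero_grad()
        loss = loss_fn(model(processed), original)
        loss.backward()
        opt.step()
        psnr = 10 * torch.log10(1.0 / loss.detach())   # inputs scaled to [0, 1]
        if psnr >= psnr_target:
            break
    return model.state_dict()
```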

The generating operation may further include a similar data acquiring operation for, when performing learning on learning data with a learning importance set thereto, in response to identifying an instruction for performing learning using the similar data, acquiring similar data that is similar to a target image to be learned. The similar data is acquired based on similarity in resolution and color combinations.
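A minimal sketch of such an acquisition test, assuming similarity is judged by equal resolution plus color-histogram intersection (the concrete measure is an assumption; the disclosure names only resolution and color combinations):

```python
import numpy as np

def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
    """Joint RGB histogram as a coarse color-combination signature."""
    hist, _ = np.histogramdd(image.reshape(-1, 3).astype(float),
                             bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def is_similar(a: np.ndarray, b: np.ndarray, hist_floor: float = 0.8) -> bool:
    """Similarity in resolution and color combination."""
    if a.shape[:2] != b.shape[:2]:         # resolutions must match
        return False
    overlap = np.minimum(color_histogram(a), color_histogram(b)).sum()
    return overlap >= hist_floor           # histogram intersection score
```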

While the present disclosure has been described in detail with reference to the above-described examples, it should be understood by those skilled in the art that various changes, modifications, and alterations may be made without departing from the spirit and scope of the invention. It should also be understood that, in order to achieve the effects intended by the present invention, it is not necessary to include all of the functional blocks illustrated in the drawings or to follow the sequences illustrated in the drawings, and that all technical ideas within the equivalent scope belong to the technical scope of the present invention described in the claims.

Claims

1. A method for enhancing resolution at a server for providing video data for streaming, the method comprising:

a processing operation for generating the video data;
a generating operation for acquiring grid generation pattern information based on the processed video data and generating a neural network file required to enhance resolution of the video data based on the grid generation pattern information; and
a transmitting operation for, in response to reception of a streaming request from a user device, dividing requested video data and a neural network file required to recover resolution of the requested video data and transmitting the divided video data and the divided neural network file to the user device.

2. The method of claim 1, wherein the generating operation comprises:

a file generating operation for generating a basic neural network file based on a plurality of video data items included in a preset data set;
an additional learning operation for, in response to a determination that any acquired new video data satisfies an additional learning condition, performing additional learning on the new video data, wherein the additional learning is performed through an artificial neural network algorithm to which the basic neural network file is applied; and
a specialized neural network file generating operation for generating a downscaled file of the new video data as a result of the additional learning and a specialized neural network file corresponding to the new video data.

3. The method of claim 2, wherein the additional learning operation comprises an operation for determining whether the additional learning condition is satisfied according to a structural similarity (SSIM) and a peak-signal-to-noise ratio (PSNR) that are obtained by performing resolution recovery on the downscaled file of the new video data based on the basic neural network.

4. The method of claim 1, wherein the processing operation comprises:

a dividing operation for dividing the video data into a plurality of chunks by bundling a plurality of frames having a match rate of image objects equal to or greater than a reference into one chunk; and
a size changing operation for performing primary change to reduce a size of an image included in the video data by a preset value from an original size and selectively performing secondary change to enlarge the image having gone through the primary change to the original size.

5. The method of claim 3, wherein:

the processing operation comprises a characteristic area extracting operation for extracting a characteristic area including a characteristic image on the basis of each frame or division unit of the video data and assigning a learning importance to the extracted characteristic area; and
the characteristic area comprises an image object of which an image object importance corresponding to a content field is equal to or greater than a preset value.

6. The method of claim 1, wherein the generating operation comprises:

a learning importance identifying operation for identifying a learning importance assigned to a characteristic area or a specific frame chunk of learning video data and extracting option information indicated by the learning importance; and
a neural network file calculating operation for inputting original data in an original size and processed data reduced to a preset rate in the learning video data into a convolution neural network (CNN) algorithm to be learned,
wherein a neural network file is generated including a parameter and an activation function of an artificial neural network, the parameter and the activation function causing a match rate between a computation result value obtained by inputting the processed data into the artificial neural network and the original data to be equal to or greater than a preset value,
wherein the option information comprises a learning number and information regarding whether learning is performed through similar data.

7. The method of claim 6, wherein:

the generating operation further comprises a similar data acquiring operation for, when performing learning on learning data with a learning importance set thereto, in response to identifying an instruction for performing learning using the similar data, acquiring the similar data similar to a target image to be learned; and
the similar data is acquired based on similarity in resolution and color combinations.

8. A method for enhancing resolution at a server for providing video data for streaming, the method comprising:

a processing operation for processing the video data;
a generating operation for acquiring grid generation pattern information based on the processed video data and generating a neural network file required to enhance resolution of the video data based on the grid generation pattern information;
a transmitting operation for, in response to reception of a streaming request from a user device, dividing requested video data and a neural network file required to recover resolution of the requested video data and transmitting the divided video data and the divided neural network file to the user device; and
an operation for matching the divided video data and the divided neural network file and performing artificial neural network algorithm computation on the divided video data using the matched neural network file to recover resolution of the divided video data.
Patent History
Publication number: 20210160556
Type: Application
Filed: Apr 23, 2019
Publication Date: May 27, 2021
Inventor: Kyoung Ik JANG (Incheon)
Application Number: 16/618,335
Classifications
International Classification: H04N 21/2343 (20060101); H04L 29/06 (20060101); H04N 21/25 (20060101); H04N 21/239 (20060101); G06N 3/08 (20060101);