Rapid analysis of data stream for malware presence
A system, method and computer program product for anti-malware processing of a data stream include a plurality of logical data streams formed from a primary data stream, and a plurality of stream buffers, each buffering the data of a corresponding logical data stream. Each of a plurality of processing handlers is associated with one of the data streams and processes the data of that logical data stream buffered by its stream buffer. Each processing handler is associated with a particular functionality, and at least one processing handler scans its logical data stream for malware presence. Each stream buffer has a configurable buffering policy. At least one of the processing handlers decompresses the data into one or more secondary streams. At least one of the processing handlers parses its logical data stream, creating one or more instances of secondary data streams. The scanning can be based on a signature search. At least one of the processing handlers parses its logical data stream to identify headers, and new secondary data streams are instantiated based on regions of interest, located at positions identified by the headers, in stream data that has not yet been received. A set of conditions for creating new instances of data streams is stored, e.g., in a table, a list, and/or a registry.
1. Field of the Invention
The present invention relates to computer security, more particularly, to anti-virus protection of computer networks.
2. Description of the Related Art
Network gateways with anti-virus processing capabilities are widely used in computer networks. The traditional model of operation of these gateways involves scanning content data objects passed through commonly used protocols, such as HTTP, FTP, SMTP, POP3 and the like. In order to perform anti-virus/anti-malware processing of data objects in the data stream, anti-virus gateways usually need to assemble the contents of the data objects that are passing through the connection (e.g., files, HTML pages, email messages, etc.). The need to assemble the entire data object limits the scalability of anti-virus/anti-malware gateway solutions, because the amount of memory required to store entire data objects can become very large for systems that serve many connections and are capable of analyzing large data objects.
To address this problem, a different approach to anti-virus scanning has been developed: “stream anti-virus scanning.” Such systems analyze stream content segment by segment, without assembling the entire transferred data object. Typically, different stages of processing stream data (e.g., decompression, MIME parsing, virus checking) can be interleaved, reducing the processing latency of each segment. In some hardware implementations, processing stages are hardware-assisted to improve performance. Examples of such conventional stream anti-virus scanning systems include the SonicWall Deep Packet Inspection Engine (http://www.sonicwall.com), CP Secure stream anti-virus processors (http://www.cpsecure.com), etc.
Depending on the internal architecture of the system 101, the packet queue 106 and the internal buffer 107 may use the same memory area for storing packets, thus avoiding the overhead of copying data between the modules.
Note that when transmitted through networks, the data often undergoes additional processing, such as encoding, compression, addition of headers for the relevant protocols, etc., which is usually determined by the protocol used to transmit the data, such as HTTP, SMTP, etc. Thus, a stream scanning system needs to have a means for extracting data objects for anti-virus analysis from the data stream that has been processed/encoded/encapsulated, etc. for the relevant network protocols.
The data relating to the body of the email message 203 is processed by an HTML analyzer 205, whose primary purpose is anti-virus analysis of the scripts and other objects present in the HTML part of the email. The attached file 204 is first processed by the unpacking module 206, which extracts from the archive the data of the files it contains (in this example, the executable file 207 and a Microsoft Word document 208). The contents of the executable file 207 are sent for processing to the executable file analyzer 209, while the contents of the MS Word file 208 are sent for analysis to the file analyzer 210, which parses files in the OLE2 document format. The analyzers 209 and 210 analyze the contents of the files based on virus signatures and other rules specific to their particular formats.
In conventional stream analysis systems, the processing and analysis modules (in this example, the modules 202, 205, 206, 209 and 210) need to process data portion by portion, without waiting for the entire data object to be received. This is because many attachments can be fairly large: with current email technology, multi-megabyte attachments are not uncommon, and even attachments that are tens of megabytes in size (or several attachments that collectively add up to several tens of megabytes) occur. Therefore, the design of such stream analysis systems must conform to certain architectural requirements.
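As a minimal illustration of portion-by-portion processing, the sketch below decompresses a gzip- or zlib-encoded object segment by segment with Python's standard `zlib.decompressobj` and searches the output for a byte signature, keeping only a small tail of earlier output so that a signature split across two chunks is still found. The function name `stream_contains` and the single-signature search are illustrative assumptions, not part of the described system.

```python
import zlib
from typing import Iterable, Iterator


def stream_contains(segments: Iterable[bytes], signature: bytes) -> bool:
    """Return True if `signature` occurs in the decompressed content.

    Segments are decompressed as they arrive; apart from the current
    decompressed chunk, only the last len(signature) - 1 bytes of earlier
    output are kept in memory.
    """
    if not signature:
        raise ValueError("signature must be non-empty")
    # wbits = 32 + MAX_WBITS: automatically accept a zlib or a gzip header.
    decomp = zlib.decompressobj(wbits=32 + zlib.MAX_WBITS)
    keep = len(signature) - 1
    tail = b""

    def decompressed_chunks() -> Iterator[bytes]:
        for segment in segments:
            yield decomp.decompress(segment)
        yield decomp.flush()  # any output still held inside the decompressor

    for chunk in decompressed_chunks():
        window = tail + chunk
        if signature in window:
            return True
        # Keep just enough bytes to catch a signature spanning two chunks.
        tail = window[-keep:] if keep else b""
    return False
```

Because `decompress()` is called here without `max_length`, a single highly compressible segment can still expand into a large chunk; a production handler would also bound the output produced per call.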
One of the requirements is being able to effectively manage buffer memory. Another requirement is being able to reconfigure the stream processing logic to handle new threat types during regular updates of the system configuration.
Stream-processing anti-virus scanning has its limitations. Many types of anti-virus analysis algorithms require access not only to the currently available data segment but also to other portions of the data object being analyzed. The locations and sizes of data objects such as email attachments cannot be determined in advance, and anti-virus algorithms usually request access to particular file areas dynamically, depending on the results of the preceding analysis.
Accordingly, there is a need in the art for a system and method for rapid scanning of data streams for viruses and other forms of malware, particularly data streams that contain large and complex data objects, including packed, encoded and encrypted data objects. Such a system and method must not require an unbounded amount of memory for buffering the stream data and must manage the available buffer memory dynamically and efficiently. There is also a need in the art for a system and method that can easily be reconfigured for new types of data encoding and transmission and for new types of malware.
SUMMARY OF THE INVENTION
Accordingly, the present invention is related to a system and method for rapid analysis of a data stream containing complex data objects that substantially obviates one or more of the disadvantages of the related art.
In one aspect of the invention, there is provided a system, method and computer program product for anti-malware processing of a data stream that includes a plurality of secondary data streams formed from a primary data stream, each data stream associated with a stream buffer that performs data buffering for the corresponding data stream. A plurality of processing handlers is associated with the data streams. The processing handlers receive the data from the stream buffer associated with the data stream. Each processing handler performs predefined actions on the received data, such as parsing the data transmission format (e.g., HTTP, FTP, SMTP, POP3, IRC, IMAP, MIME, HTML, ZIP, GZIP, RAR, ARJ, etc.), and scanning the received data for malware presence, typically using a signature search. A set of conditions for creation of new instances of data streams is stored in a list, a table, or a registry.
In a further optional aspect, each stream buffer has a configurable buffering policy that is defined when the instance of the data stream is created, or defined at the time of registration of its parameters in a registry. The buffering policy can include information about the maximum buffer size, whether the stream must be fully buffered, a size of a backtrack buffer, relative importance of buffered data and whether the buffered data may be discarded when available memory is low. At least one of the processing handlers transforms the data into a different format (for example, decompressing the input data or parsing a multi-part data format), creates one or more instances of secondary data streams and outputs the transformed data into these secondary data streams.
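The buffering policy parameters listed above map naturally onto a small configuration record. The following Python sketch is one hypothetical encoding; the field names, the `Importance` levels and the `retained_processed_bytes` rule (drop discardable data under memory pressure, otherwise keep either everything or at most the backtrack size) are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Importance(IntEnum):
    """Hypothetical relative importance levels for buffered data."""
    LOW = 0
    NORMAL = 1
    HIGH = 2


@dataclass(frozen=True)
class BufferingPolicy:
    max_buffer_size: int                  # upper bound on bytes held by the buffer
    fully_buffered: bool = False          # keep the whole stream once received
    backtrack_size: int = 0               # already-processed bytes to retain
    importance: Importance = Importance.NORMAL
    discardable_on_low_memory: bool = True

    def __post_init__(self) -> None:
        if self.max_buffer_size <= 0:
            raise ValueError("max_buffer_size must be positive")
        if self.backtrack_size < 0:
            raise ValueError("backtrack_size must be non-negative")

    def retained_processed_bytes(self, processed: int, low_memory: bool) -> int:
        """How many already-processed bytes the buffer should keep."""
        if low_memory and self.discardable_on_low_memory:
            return 0                      # free memory first
        if self.fully_buffered:
            return processed              # nothing processed may be dropped
        return min(processed, self.backtrack_size)
```

For example, a stream whose format requires random access to earlier data might use `fully_buffered=True` with `discardable_on_low_memory=False`, while a plain-text stream scanned strictly sequentially could use a small backtrack size.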
At least one of the processing handlers can employ signature search techniques to perform anti-virus analysis. Signature search is a widely used approach to detecting known examples of malware. Signature search algorithms use sets of known malware signatures, in the form of “signature databases,” and search the stream data for all of these signatures simultaneously.
At least one of the processing handlers can optionally perform resource-intensive operations (e.g., signature searching) employing hardware acceleration, when the corresponding hardware resources are available.
In a further optional aspect of the invention, each stream buffer keeps track of the amount of data consumed by each of the processing handlers associated with it. New instances of data streams are created at specified offsets in the logical data stream, and the action to be performed when a new instance of a logical data stream is created is defined at the time the logical data stream is instantiated. A processing handler can parse the headers of its data stream to identify the structure of the stream, and new secondary data streams can be instantiated for regions of interest in the part of the data stream that has not been received yet (but is known to be located at specific offsets, identified by parsing the stream headers). The set of conditions for instantiating new stream buffers is stored in, e.g., a table, a list and/or a registry.
Additional features and advantages of the invention will be set forth in the description that follows, and in part will be apparent from the description, or may be learned by practice of the invention. The advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of the invention as claimed.
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
Reference will now be made in detail to the preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings.
In the discussion below, the following terminology is used regarding data streams. Network anti-virus scanners usually receive data in the form of network packets (for example, IP packets). To perform anti-malware analysis, the content of these packets must be assembled into a “stream” corresponding to a transport protocol connection between networked systems. Typically, a network protocol scanner does this using a process known as “TCP stream reassembly.” The resulting stream is referred to below as a “primary” data stream. The content of the primary data stream corresponds to some known protocol format, e.g., HTTP, SMTP and the like. In the discussion below, the assembly of the primary data stream is performed by the forwarding module 102. The content of the primary stream is passed portion by portion (as the stream is assembled) to the stream analysis module 103 for anti-malware analysis. The forwarding module may perform additional protocol analysis, for example, splitting an SMTP data stream into a sequence of individual RFC822-formatted messages. In this case, the analysis module 103 receives multiple primary streams, each corresponding to an individual message. Thus, the exact content of the primary stream is a matter of coordination between the design of the forwarding module 102 and the analysis module 103.
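TCP stream reassembly is performed by the forwarding module, not by the analysis module that is the subject of this description. For readers unfamiliar with it, the sketch below shows the core idea in simplified form: out-of-order segments are held until the gap before them is filled, retransmitted bytes are dropped, and only newly contiguous bytes are passed on. The class name is hypothetical, and sequence-number wraparound, window limits and connection teardown are deliberately ignored.

```python
from typing import Dict


class StreamReassembler:
    """Simplified in-order reassembly of one direction of a TCP connection."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq            # first byte not yet delivered
        self.pending: Dict[int, bytes] = {}    # seq -> out-of-order payload

    def push(self, seq: int, payload: bytes) -> bytes:
        """Accept one segment; return the bytes that have become contiguous."""
        if seq + len(payload) <= self.next_seq:
            return b""                         # pure retransmission
        if seq < self.next_seq:                # trim the already-delivered prefix
            payload = payload[self.next_seq - seq:]
            seq = self.next_seq
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload        # keep the longest copy per seq

        out = bytearray()
        while self.pending:
            start = min(self.pending)
            if start > self.next_seq:
                break                          # gap: wait for the missing bytes
            data = self.pending.pop(start)
            end = start + len(data)
            if end > self.next_seq:            # skip segments already covered
                out += data[self.next_seq - start:]
                self.next_seq = end
        return bytes(out)
```

In the terms used here, each non-empty result of `push` is one portion of the primary data stream.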
A primary data stream can be divided into several logical (secondary) data streams, each of which represents a meaningful portion of the primary data stream (for example, the body of an email or an archived attachment) or the result of some transformation (decompression, decoding, etc.) of the stream data. Since a secondary data stream can itself contain complex data objects, it can in turn be transformed into tertiary data streams, and so on. For simplicity, tertiary, quaternary, etc. data streams that are “born” from a secondary data stream are also referred to as “secondary data streams.”
The invention is broadly applicable to content scanning of data streams. This can be scanning for malware in general or, e.g., scanning for particular types of malware or unwanted content, such as viruses, spam, trojans, rootkits, worms, adware, etc. In the discussion below, anti-virus scanning is used as an illustrative example.
An architecture of a flexible stream-based anti-malware processing system is described below. The approach described herein permits implementing different anti-malware processing scenarios, for example, for different network protocols and data formats, including the processing of secondary streams that may require additional processing. The approach also permits dynamic control over the scanning process, for example, by reconfiguring the scanning scenario for different processing algorithms, data formats and protocols, or regions of the data stream. This is particularly useful where the data objects being scanned have a complex structure, such as encoded, compressed or password-protected files. Furthermore, the system is flexible and can easily be reconfigured to counter new kinds of malware and unwanted content that may require new types of detection algorithms. Likewise, new data formats and protocols, as they appear, can easily be integrated into the system and method described herein.
The present invention, in one aspect, is directed to implementation of the analysis module of a stream scanning system.
In one embodiment of the invention, the system controls a set of logical data streams, each of which can have a number of stream processing handlers. Each data stream has its own instance of a stream buffer. Each instance of a stream buffer has an associated buffer management policy that specifies the rules for allocating and freeing memory blocks for this instance of the stream buffer. Stream processing handlers are typically associated with a particular processing algorithm, for example, decoding, decompressing or scanning stream data. Stream processing handlers can schedule the creation and deletion of new instances of data streams and/or the registration and deregistration of processing handlers, tied to particular regions in the stream data (already received, or expected in the future). The system manages the registry of stream regions and, when the data corresponding to a specific region becomes available, performs the actions associated with that region (including creating new logical data streams and/or registering new processing handlers).
(a) a stream processing manager 301 that coordinates stream processing activities and manages other data structures.
(b) multiple logical data streams 302, each of which is associated with a stream buffer that controls buffering of the content data for the corresponding data stream.
(c) multiple processing handlers 306, such that each data stream 302 can have multiple handlers associated with it.
The stream processing manager 301 receives data corresponding to a primary stream 304 from a network/protocol stream control module (e.g., forwarding module 102 of
In the present invention, the anti-virus analysis module 103 may, while parsing and transforming the primary stream data, create one or more instances of secondary data streams, each of which can represent a meaningful portion of the primary data stream (for example, the body of an email or an archived attachment) or the result of some transformation (decompression, decoding, etc.) of the stream data.
An exemplary procedure for stream processing is as follows:
At time 1, a segment of data is received from the primary stream 304 (for example, with the help of forwarding module 102). At time 2, the stream processing manager 301 transfers the received segment to the data stream 302A which buffers the data using associated stream buffer 303A.
At time 3, the primary stream data 304 buffered in the stream buffer 303A is passed to the processing handler 306A.1 associated with the primary data stream 302A (see 305).
At time 4, the processing handler 306A.1 (for example, a decompression/unpacking module) produces a portion of unpacked data (see 307) that is passed to the secondary data stream 302B, which places the data into its associated stream buffer 303B.
At time 5, the data from the stream buffer 303B associated with data stream 302B is passed to a processing handler 306B.1, which performs the anti-virus analysis of the secondary stream data (307), detects a malicious object (for example, presence of viruses, trojans, worms, rootkits, other unwanted content, etc.) and sends notification to the stream processing manager 301. At time 6, the stream processing manager 301 sends an appropriate notification 312 to the external module (for example, forwarding module 102).
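The six steps above can be sketched as follows. This is a minimal single-threaded Python model, not the patented implementation: the class names, the `write`/`process` methods, the use of zlib decompression as the unpacking handler and a plain substring check as the anti-virus handler are all illustrative assumptions. The stream buffers are simplified away because every handler here consumes all the data it is given.

```python
import zlib
from typing import Dict, List, Tuple


class Handler:
    def process(self, data: bytes) -> None:
        raise NotImplementedError


class DataStream:
    """A logical data stream that forwards each portion to its handlers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.handlers: List[Handler] = []

    def write(self, data: bytes) -> None:
        for handler in self.handlers:          # times 3 and 5: pass data on
            handler.process(data)


class StreamProcessingManager:
    def __init__(self) -> None:
        self.streams: Dict[str, DataStream] = {}
        self.notifications: List[Tuple[str, str]] = []  # for the forwarding module

    def create_stream(self, name: str) -> DataStream:
        stream = DataStream(name)
        self.streams[name] = stream
        return stream

    def notify(self, stream: str, message: str) -> None:
        self.notifications.append((stream, message))    # time 6

    def feed_primary(self, segment: bytes) -> None:
        self.streams["A"].write(segment)                # times 1 and 2


class UnpackHandler(Handler):
    """Decompresses its input into a new secondary stream (time 4)."""

    def __init__(self, manager: StreamProcessingManager, out_name: str) -> None:
        self.out = manager.create_stream(out_name)
        self.decomp = zlib.decompressobj()

    def process(self, data: bytes) -> None:
        chunk = self.decomp.decompress(data)
        if chunk:
            self.out.write(chunk)


class ScanHandler(Handler):
    """Reports a signature in its stream, even across chunk boundaries (time 5)."""

    def __init__(self, manager: StreamProcessingManager, stream: str,
                 signature: bytes) -> None:
        self.manager, self.stream, self.signature = manager, stream, signature
        self.tail = b""
        self.reported = False

    def process(self, data: bytes) -> None:
        window = self.tail + data
        if not self.reported and self.signature in window:
            self.reported = True
            self.manager.notify(self.stream, "malicious object detected")
        keep = len(self.signature) - 1
        self.tail = window[-keep:] if keep else b""


# Wiring for the example: STREAM A -> unpacking handler -> STREAM B -> scanner.
manager = StreamProcessingManager()
primary = manager.create_stream("A")
primary.handlers.append(UnpackHandler(manager, "B"))
manager.streams["B"].handlers.append(ScanHandler(manager, "B", b"EVIL"))
```

Feeding zlib-compressed data that contains the signature through `manager.feed_primary()` in small segments yields a single notification for stream B.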
In one embodiment of the invention, data streams 302 and stream processing handlers 306 may form a hierarchy.
Here, data stream “STREAM A” (302A) receives data 401 directly from the input data stream and is therefore called the “primary data stream.”
It has two associated processing handlers: HANDLER A.1 (306A.1) and HANDLER A.2 (306A.2). Thus, the data buffered by stream buffer of STREAM A (302A) is passed both to HANDLER A.1 (306A.1) and HANDLER A.2 (306A.2).
HANDLER A.1 (306A.1) produces two different output data streams (see 402, 403) that are buffered by STREAM B (302B) and STREAM C (302C).
Both data streams STREAM B (302B) and STREAM C (302C) have a single associated processing handler each: HANDLER B.1 (306B.1) and HANDLER C.1 (306C.1), respectively.
HANDLER A.2 (306A.2) produces a single output data stream 404 buffered by the STREAM D (302D) which has two associated processing handlers: HANDLER D.1 (306D.1) and HANDLER D.2 (306D.2).
HANDLER B.1 (306B.1), HANDLER C.1 (306C.1), HANDLER D.1 (306D.1), HANDLER D.2 (306D.2) do not produce any output data streams (for example, they may perform anti-virus checking or some other function that does not generate any output data).
The process of anti-virus analysis of scripts embedded in HTML pages typically includes a step of “normalization” of the script text, when the plain text is transformed into some form of pseudocode (P-CODE). This normalization reduces the variability of script text, and makes it more convenient to analyze it (for example, using signature searching).
Signature searching is used to detect known byte patterns that uniquely identify presence of malware in a portion of data. Malware signatures may have a different form, from simple byte strings to regular expressions or some customized format. Typically, signature search algorithms use a database of known virus signatures and perform simultaneous search for all signatures in the database. To achieve high throughput, signature search algorithms can be implemented using hardware acceleration.
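Simultaneous search for many signatures is commonly implemented with a multi-pattern automaton such as Aho-Corasick. The Python sketch below, which handles only plain byte-string signatures (no regular expressions), shows the property that matters for stream scanning: the automaton state survives between calls, so a signature split across two data segments is still detected without re-buffering earlier data. The class and signature names are hypothetical; production engines use far more compact tables or hardware acceleration.

```python
from collections import deque
from typing import Dict, List, Tuple


class SignatureScanner:
    """Aho-Corasick automaton over bytes whose state persists across feed() calls."""

    def __init__(self, signatures: Dict[str, bytes]) -> None:
        self.goto: List[Dict[int, int]] = [{}]   # state -> {byte: next state}
        self.fail: List[int] = [0]               # failure links
        self.out: List[List[str]] = [[]]         # names of signatures ending here
        for name, pattern in signatures.items():
            if not pattern:
                raise ValueError(f"empty signature {name!r}")
            state = 0
            for byte in pattern:
                nxt = self.goto[state].get(byte)
                if nxt is None:
                    nxt = len(self.goto)
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[state][byte] = nxt
                state = nxt
            self.out[state].append(name)
        # Breadth-first pass; depth-1 states keep fail = 0 (the root).
        queue = deque(self.goto[0].values())
        while queue:
            r = queue.popleft()
            for byte, u in self.goto[r].items():
                queue.append(u)
                f = self.fail[r]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[u] = self.goto[f].get(byte, 0)
                self.out[u] = self.out[u] + self.out[self.fail[u]]
        self.state = 0
        self.offset = 0                          # bytes scanned so far

    def feed(self, data: bytes) -> List[Tuple[str, int]]:
        """Scan the next segment; return (signature name, end offset) pairs."""
        hits = []
        s = self.state
        for byte in data:
            while s and byte not in self.goto[s]:
                s = self.fail[s]
            s = self.goto[s].get(byte, 0)
            self.offset += 1
            for name in self.out[s]:
                hits.append((name, self.offset))
        self.state = s
        return hits
```

For instance, scanning b"ush" and then b"ers" with the signatures "he", "she" and "hers" reports "she" and "he" ending at offset 4 and "hers" ending at offset 6, although "she" straddles the two segments.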
Here, data stream “STREAM A” (450) receives data directly from the primary data stream 451.
STREAM A (450) has two associated processing handlers: HANDLER A.1 (452) and HANDLER A.2 (453). Thus, the data buffered by STREAM A (450) is passed both to HANDLER A.1 (452) and HANDLER A.2 (453). In this example, HANDLER A.1 (452) parses the HTML format, identifying areas that may contain active content (scripts, applets, etc.). HANDLER A.2 (453) performs signature scanning of the source HTML stream, looking for signatures of malicious data objects that can be found directly in the HTML data stream without further processing. HANDLER A.1 (452) produces output in two distinct data formats (see 454, 455), which are buffered by the data streams STREAM B (456) and STREAM C (457). The content of STREAM B (456) is normalized script text (for example, lower-cased, with trimmed spaces and removed comments). The content of STREAM C (457) is script pseudocode (P-CODE).
Both data streams STREAM B (456) and STREAM C (457) have a single associated processing handler each: HANDLER B.1 (458) and HANDLER C.1 (459), respectively.
Processing handler HANDLER B.1 (458) performs signature search on the content of normalized script text.
Processing handler HANDLER C.1 (459) performs P-CODE analysis (using signature search or optionally employing advanced techniques, e.g. static control flow analysis and emulation).
HANDLER A.2 (453), HANDLER B.1 (458) and HANDLER C.1 (459) do not produce any output data streams; instead, they may generate stream control notifications 312 to indicate the progress of stream analysis.
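The normalization used to produce the content of STREAM B (lower-casing, collapsing whitespace and removing comments) can be sketched with regular expressions. This is a deliberately naive Python illustration and an assumption about one possible approach: it does not tokenize the script, so comment markers inside string literals (for example, `//` in a URL) would be stripped too, and real normalizers are considerably more careful.

```python
import re

_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)   # /* ... */
_LINE_COMMENT = re.compile(r"//[^\n]*")                # // ... to end of line
_WHITESPACE = re.compile(r"\s+")


def normalize_script(text: str) -> str:
    """Naively normalize script text: drop comments, collapse spaces, lower-case."""
    text = _BLOCK_COMMENT.sub(" ", text)
    text = _LINE_COMMENT.sub(" ", text)
    text = _WHITESPACE.sub(" ", text)
    return text.strip().lower()
```

Two superficially different variants such as `eval(x) // a` and `EVAL(x)   /* b */` both normalize to `eval(x)`, so a single signature for the normalized text covers both.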
Note that, optionally, certain aspects of the processing handlers 306 can take advantage of hardware acceleration. For example, customized and standard integrated circuits are available for rapid scanning of a data stream for multiple virus signatures. Many data compression and encoding algorithms (e.g., LZW, LZSS, Inflate) can also be implemented in hardware, with the software-implemented processing handler using customized APIs of those integrated circuits to perform certain operations on the stream data, e.g., virus signature scanning, decoding/decompressing, etc. Any of these solutions can be used in the present invention.
A programming interface of a typical processing handler 306 is illustrated in
The amount of data consumed and processed by the processing handler 306 is returned via the output parameter SIZE_CONSUMED. If the data stream 302 is associated with more than one processing handler 306, each processing handler 306 may consume a different amount of the buffered data. The system therefore keeps track of how much data has been processed by each handler associated with the data stream, and of which portion of the buffered data has not yet been processed by each handler.
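A minimal sketch of this bookkeeping, assuming a Python interface in which `process_data` plays the role of PROCESS_DATA and its return value plays the role of SIZE_CONSUMED. The class names are hypothetical. Each handler's consumed offset is tracked separately, and the prefix that every handler has consumed is dropped from the buffer.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class ProcessingHandler(ABC):
    @abstractmethod
    def process_data(self, data: bytes, final: bool) -> int:
        """Process unconsumed stream data; return SIZE_CONSUMED (a prefix length)."""


class StreamBuffer:
    def __init__(self) -> None:
        self.base = 0                          # stream offset of buffer[0]
        self.buffer = bytearray()
        self.consumed: Dict[ProcessingHandler, int] = {}  # handler -> stream offset

    def register(self, handler: ProcessingHandler) -> None:
        self.consumed[handler] = self.base

    def push(self, data: bytes, final: bool = False) -> None:
        self.buffer += data
        for handler, offset in list(self.consumed.items()):
            pending = bytes(self.buffer[offset - self.base:])
            self.consumed[handler] = offset + handler.process_data(pending, final)
        # Data consumed by every handler can be discarded.
        low = min(self.consumed.values(), default=self.base + len(self.buffer))
        del self.buffer[: low - self.base]
        self.base = low


class LineCollector(ProcessingHandler):
    """Example handler that consumes only complete lines."""

    def __init__(self) -> None:
        self.lines: List[bytes] = []

    def process_data(self, data: bytes, final: bool) -> int:
        end = len(data) if final else data.rfind(b"\n") + 1
        self.lines += data[:end].splitlines()
        return end


class ByteCounter(ProcessingHandler):
    """Example handler that consumes everything it is given."""

    def __init__(self) -> None:
        self.total = 0

    def process_data(self, data: bytes, final: bool) -> int:
        self.total += len(data)
        return len(data)
```

With both example handlers registered, pushing `b"HELO a\nMAIL FR"` leaves only the incomplete line `b"MAIL FR"` in the buffer, because the line collector has not consumed it yet even though the byte counter has.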
In this figure, stream data buffer 602 can be logically split into areas, one of which (607) contains the data processed by all handlers and the other (608) the data that has not yet been processed by any handler.
Buffer offsets are maintained according to the following algorithm:
When the next input data portion 606 arrives, the data stream 302 calls the PROCESS_DATA method (see
The data in the area A can be discarded from the buffer 602.
However, in some circumstances it may be desirable to keep some of the data that has already been processed. For example, the anti-virus processing algorithm can dynamically register a new processing handler 306 that points to data in the area that has already been processed. As a practical example, a first processing handler analyzes the first few bytes of the data stream and “recognizes” that the data has a certain format; it then creates and registers a second processing handler that performs the actual processing of the data in that format. The second processing handler starts processing data from the beginning of the stream, i.e., from the area containing data already processed by the first processing handler. Thus, the new handler is registered with a stream offset in the backtrack area, see 306(N) in
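The backtrack behavior can be sketched as a buffer that, instead of discarding everything already processed, retains up to a configured number of processed bytes. In this hypothetical Python example, a format-sniffing handler has already consumed the start of the stream, and a handler registered afterwards can still read from offset 0 as long as that data lies within the backtrack area. The class name, the `mark_processed`/`read_from` methods and the ZIP check are illustrative assumptions.

```python
class BacktrackBuffer:
    """Stream buffer that keeps up to `backtrack_size` already-processed bytes."""

    def __init__(self, backtrack_size: int) -> None:
        self.backtrack_size = backtrack_size
        self.base = 0                  # stream offset of self.data[0]
        self.data = bytearray()

    def append(self, chunk: bytes) -> None:
        self.data += chunk

    def mark_processed(self, offset: int) -> None:
        """All current handlers have processed the stream up to `offset`."""
        keep_from = max(self.base, offset - self.backtrack_size)
        del self.data[: keep_from - self.base]
        self.base = keep_from

    def read_from(self, offset: int) -> bytes:
        """Data from `offset` onwards, for a handler registered at that offset."""
        if offset < self.base:
            raise LookupError(f"offset {offset} is no longer buffered")
        return bytes(self.data[offset - self.base:])


def looks_like_zip(head: bytes) -> bool:
    # ZIP local file headers start with the bytes "PK\x03\x04".
    return head.startswith(b"PK\x03\x04")
```

Once the stream has advanced more than `backtrack_size` bytes past an offset, a handler registered at that offset can no longer be served, which is why the backtrack size belongs in the stream's buffering policy.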
This case is illustrated in
In the described embodiment, anti-virus/anti-malware processing algorithms can dynamically register new regions and associate actions that will be executed when that region's data becomes available.
Each region descriptor can contain the following information:
Stream offset 908: byte offset from beginning of the data stream, where the region begins;
Region size 909: size of region data (in some cases, it may not be known in advance);
Importance 910: determines whether the buffered data for this region can be discarded in a low-memory condition;
Action data 911: actions that must be performed when the region data becomes available in the input stream;
Buffer ID 912: identifier of the memory buffer corresponding to that region. For the regions not yet reached (future data) or those already discarded, the buffer ID is not present.
When the data stream 801 receives a portion of input data, it updates the value of the current stream offset 901. This offset is compared with the starting stream offsets 908 of the registered regions. If the received data falls inside the range of a registered region, a new memory buffer is allocated to store the data for the region and the region is “activated”; that is, the system 350 executes the actions associated with this region (action data 911).
Action data 911 may contain the name of a procedure to be called or some instructions to be executed by the system 350. The performed actions may involve creating new instances of processing handlers 306 and associating them with the data stream 801 at an offset corresponding to the starting offset 908 of the region.
Processing handlers 306 may in turn create instances of secondary data streams 302 and perform anti-virus processing actions. If the size 909 of the region 803 is defined, the system can automatically deregister the region and release the memory buffer 906 when the current stream offset 901 becomes greater than the starting region offset 908 plus the region size 909.
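The region mechanism described above (descriptor fields 908 to 912, activated against the current stream offset 901) might be modeled as follows. This Python sketch rests on several assumptions not taken from the source: the names `Region` and `RegionRegistry`, an `action` callback that returns a handler function, and deregistering a sized region as soon as the current offset reaches its end. The importance flag is stored but low-memory handling is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

Sink = Callable[[bytes], None]


@dataclass(eq=False)                        # compare regions by identity
class Region:
    offset: int                             # 908: where the region starts
    size: Optional[int]                     # 909: None if not known in advance
    important: bool                         # 910: not modeled in this sketch
    action: Callable[["Region"], Sink]      # 911: run on activation, returns a handler
    buffer: Optional[bytearray] = None      # 912: allocated only while active
    handler: Optional[Sink] = None


class RegionRegistry:
    def __init__(self) -> None:
        self.current_offset = 0             # 901
        self.regions: List[Region] = []

    def register(self, region: Region) -> None:
        self.regions.append(region)

    def on_data(self, chunk: bytes) -> None:
        start = self.current_offset
        end = start + len(chunk)
        self.current_offset = end
        for region in list(self.regions):
            r_end = None if region.size is None else region.offset + region.size
            if region.offset < end and (r_end is None or r_end > start):
                if region.buffer is None:            # first data: activate
                    region.buffer = bytearray()
                    region.handler = region.action(region)
                lo = max(region.offset, start) - start
                hi = (end if r_end is None else min(r_end, end)) - start
                piece = chunk[lo:hi]
                region.buffer += piece
                region.handler(piece)
            if r_end is not None and end >= r_end:   # region fully received
                region.buffer = None                 # release its memory buffer
                self.regions.remove(region)
```

In a typical use, a header-parsing handler learns that an attachment occupies bytes 10 to 14 of the stream, registers a `Region(offset=10, size=5, ...)` whose action starts a scanner, and the scanner then receives exactly those five bytes once they arrive.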
In other embodiments, the data stream 801 can aggressively buffer the received data for the whole stream, even for areas not belonging to registered regions. For such an approach, new regions can be registered in the “past” area of stream (i.e., in the area, where data can be normally discarded) (see 804 in
It should be noted that some of the handlers 306 may be tasked with parsing HTML pages, identifying scripts in the page and checking those scripts for virus signatures. Other stream processing tasks that a handler might face are considerably more complex. For example, many attachments today are sent in archived or encoded form, such as ZIP or RAR archives, and viruses are often embedded in the packed (archived) files. Therefore, if the object being transmitted is an email, the attachment needs to be identified (for example, using headers) and then unpacked.
Yet another difficulty faced by the stream processing approach is encryption or other forms of protection applied to the archived object. In some cases, encryption is a necessary security feature for transmitting confidential data. In other cases, it is specifically intended to defeat anti-virus stream processing systems. For example, many forms of spam transmit attachments in packed and encrypted formats. One approach to this issue relies on the empirical observation that for many forms of malicious transmissions, such as spam emails with archived attachments, the body of the email frequently contains the password needed to unpack the archived file. Since the spammer has no realistic way of communicating with the recipient other than the email itself, sending a packed and encrypted file would be useless if the recipient were unable to open it. Therefore, the body of such an email often includes the password.
The handler, therefore, by parsing the body of the email, can identify the password (or a set of possible passwords) and try using them to unpack the encrypted file. Similarly, many Microsoft Word or Adobe Acrobat PDF files can be protected by passwords (and can also contain virus code), and in some cases the body of the email might also contain the password. The parsing of the email body can be done by the same handler or by a different handler, for example, one specifically instantiated for this purpose.
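A heuristic handler of this kind might start by extracting candidate passwords from the email body. The Python sketch below uses a single hypothetical regular expression for phrasings such as “Password: …” or “the password is …”; real messages use many more phrasings and languages, and the resulting candidates would then be tried by the unpacking handler (not shown).

```python
import re
from typing import List

# Hypothetical pattern: a password keyword, optional "is", optional ':' or '=',
# an optional opening quote, then the candidate token itself.
_PASSWORD_HINT = re.compile(
    r"\b(?:password|passwd|pwd|pass)\b\s*(?:is\s*)?[:=]?\s*[\"']?([^\s\"'<>]{3,64})",
    re.IGNORECASE,
)


def password_candidates(body: str) -> List[str]:
    """Return candidate archive passwords found in an email body, in order."""
    seen: List[str] = []
    for match in _PASSWORD_HINT.finditer(body):
        token = match.group(1).rstrip(".,;:!?)")   # drop trailing punctuation
        if len(token) >= 3 and token not in seen:
            seen.append(token)
    return seen
```

The word boundaries keep ordinary words such as “passenger” from producing candidates, while duplicates (a password repeated in the message) are reported once.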
Having thus described a preferred embodiment, it should be apparent to those skilled in the art that certain advantages of the described method and apparatus have been achieved. It should also be appreciated that various modifications, adaptations and alternative embodiments thereof may be made within the scope and spirit of the present invention. The invention is further defined by the following claims.
Claims
1. A system for anti-malware processing of a data stream, the system comprising:
- a processor;
- a memory coupled to the processor;
- a plurality of elements stored in the memory and executed by the processor, the elements comprising:
- dividing a primary data stream to form a plurality of logical data streams, wherein each logical data stream has a different data format;
- a plurality of stream buffers, each stream buffer buffering data of a corresponding logical data stream;
- a plurality of processing handlers separated from the stream buffers, each processing handler associated with one of the stream buffers and processing the data of the logical data stream buffered by its stream buffer,
- wherein:
- each logical data stream is processed inside the processing handler associated with the stream buffer; and
- each processing handler is associated with a particular data transmission mode and at least one of the processing handlers scans its logical data stream for malware presence, and
- wherein at least several of the plurality of processing handlers are associated with a single stream buffer buffering data, and wherein different processing handlers extract different amounts of the same data buffered by the single stream buffer, and
- wherein the data is deleted from the single stream buffer only after multiple processing handlers associated with the single stream buffer have processed the same data.
2. The system of claim 1, wherein each stream buffer has a configurable buffering policy that is defined when that instance of the stream buffer is created.
3. The system of claim 1, wherein each stream buffer has an individually configurable buffering policy that is defined at a time of its instantiation.
4. The system of claim 3, wherein the buffering policy includes any of information about maximum buffer size, whether the data stream must be fully buffered, a size of a backtrack buffer, relative importance of buffered data for later processing and whether the buffered data can be dropped when available memory is low.
5. The system of claim 1, wherein the corresponding processing handler identifies archived files in the logical data stream corresponding to an email, wherein the archived files represent attachments, and wherein the processing handler parses a body of the email to identify password information needed to unpack the archived files.
6. The system of claim 1, wherein at least one of the processing handlers decompresses the data into one or more logical data streams.
7. The system of claim 1, wherein at least one of the processing handlers parses its logical data stream, creating one or more instances of secondary data streams.
8. The system of claim 7, wherein the logical data stream is transmitted using a transmission mode of any of hypertext (HTTP), mail protocols, including any of SMTP, POP, and IMAP, file transfer protocols, including any of SMB and FTP, and instant messaging protocols, including ICQ protocols.
9. The system of claim 7, wherein the logical data stream uses a transmission mode of any of hypertext formats, including any of HTML and MIME, archive formats, including any of GZIP, ZIP, RAR, and ARJ formats, compound document formats, including OLE2 formats, executable file formats, including PE and ELF formats, dynamic content formats, including SWF formats, and media formats, including GIF, JPEG, PNG, WAV, MPEG, MP3 and AVI formats.
10. The system of claim 1, wherein the at least one processing handler performs malware scanning using a signature search.
11. The system of claim 1, wherein at least one of the processing handlers uses hardware acceleration.
12. The system of claim 1, wherein each stream buffer keeps track of the amount of data consumed by each of its associated processing handlers.
13. The system of claim 1, wherein new instances of logical data streams are created based on a logical data stream offset, and an action to be performed when the new instance of a logical data stream is created is defined at the time of creation of the logical data stream.
14. The system of claim 1, wherein at least one of the processing handlers parses its logical data stream to identify headers and wherein new secondary stream buffers are instantiated based on regions of interest in a future data stream identified by the headers.
15. The system of claim 1, further comprising a set of conditions for creation of new instances of stream buffers, wherein the set of conditions is stored in any of a table, a list, and a registry.
16. A method for anti-malware processing of data stream, the method being performed on a computer having a processor and a memory, the method comprising:
- generating a plurality of logical data streams by dividing a primary data stream, wherein each logical data stream has a different data format;
- instantiating a plurality of stream buffers, each stream buffer buffering data of a corresponding logical data stream;
- instantiating a plurality of processing handlers separated from the stream buffers, wherein each processing handler scans data of the logical data stream buffered by its stream buffer for malware presence;
- associating each processing handler with one of the logical data streams, wherein each logical data stream is processed inside the associated processing handler; and
- associating each processing handler with a particular data transmission mode, and
- wherein at least several of the plurality of processing handlers are associated with a single stream buffer buffering data, and wherein different processing handlers extract different amounts of the same data buffered by the single stream buffer, and
- wherein the data is deleted from the single stream buffer only after multiple processing handlers associated with the single stream buffer have processed the same data.
17. The method of claim 16, further comprising configuring a buffering policy of each stream buffer upon an instantiation of each logical data stream.
18. The method of claim 16, further comprising parsing at least one logical data stream to identify headers, and wherein new secondary data streams are instantiated based on regions of interest in a future data stream identified by the headers.
19. The method of claim 16, further comprising identifying archived files in the logical data stream corresponding to an email, wherein the archived files represent attachments, and parsing a body of the email to identify password information needed to unpack the archived files.
20. The method of claim 16, wherein the buffering policy includes information about any of a maximum buffer size, whether a logical data stream must be fully buffered, a size of a backtrack buffer, a relative importance of the buffered data, and whether portions of the buffered data can be discarded when available memory is low.
21. The method of claim 16, wherein at least one of the processing handlers decompresses the data into one or more secondary data streams.
22. The method of claim 16, further comprising parsing at least one logical data stream using at least one processing handler, and instantiating one or more secondary stream buffers.
23. The method of claim 16, wherein the scanning uses a signature search.
24. The method of claim 16, further comprising instantiating new instances of logical data streams based on a logical data stream offset, wherein an action to be performed when the new instance of a logical data stream is created is defined at the time of creation of the logical data stream.
25. The method of claim 16, further comprising storing a set of conditions for instantiating new stream buffers in any of a table, a list, and a registry.
26. A system for anti-malware processing of data stream, the system comprising:
- a processor; a memory coupled to the processor;
- a plurality of elements stored in the memory and executed by the processor, the elements comprising:
- a plurality of logical data streams separated out from a primary data stream based on a data format;
- a plurality of stream buffers, each stream buffer buffering data of a corresponding logical data stream; and
- a plurality of processing handlers separated from the stream buffers, each processing handler associated with a corresponding stream buffer and processing the logical data stream stored by its stream buffer, and each processing handler associated with a particular data transmission mode, wherein each processing handler scans its logical data stream for malware presence,
- wherein each logical data stream is processed inside the associated processing handler, wherein at least some of the processing handlers identify, using header information, the location of regions of interest in future data in the primary data stream and instantiate new instances of stream buffers to buffer the data in the regions of interest,
- wherein at least one of the processing handlers is adapted to parse an email body for password information, and
- wherein at least one other processing handler is adapted to unpack archived files based on the password information, and
- wherein at least several of the plurality of processing handlers are associated with a single stream buffer buffering data, and wherein different processing handlers extract different amounts of the same data buffered by the single stream buffer, and
- wherein the data is deleted from the single stream buffer only after multiple processing handlers associated with the single stream buffer have processed the same data.
27. A system for anti-malware processing of data stream, the system comprising:
- a processor; a memory coupled to the processor;
- a plurality of elements stored in the memory and executed by the processor, the elements comprising:
- a primary data stream that includes emails with archived password-protected attachments;
- a plurality of stream buffers, each stream buffer buffering data of a corresponding portion of the emails, wherein the portions include any of email body and the archived password-protected attachments; and
- a plurality of processing handlers separated from the stream buffers, each processing handler associated with a corresponding stream buffer and processing the portion of the email stored by its stream buffer,
- wherein each portion of the email is processed inside the associated processing handler;
- wherein each processing handler scans its portion of the email for malware,
- wherein at least some of the processing handlers identify, using header information, the location of attachments in future data in the primary data stream and instantiate new instances of stream buffers to buffer the attachments, and
- wherein at least one of the processing handlers parses the email body for passwords that are used to unpack the attachments, and
- wherein at least several of the plurality of processing handlers are associated with a single stream buffer buffering data, and wherein different processing handlers extract different amounts of the same data buffered by the single stream buffer, and
- wherein the data is deleted from the single stream buffer only after multiple processing handlers associated with the single stream buffer have processed the same data.
28. A system for anti-malware processing of data stream, the system comprising:
- a processor; a memory coupled to the processor;
- a plurality of elements stored in the memory and executed by the processor, the elements comprising:
- a logical data stream receiving input data portion by portion from a primary data stream, wherein the logical data stream receives data of a particular format;
- a stream buffer, assigned to the logical data stream, buffering the last N bytes associated with the logical data stream;
- a plurality of processing handlers separated from the stream buffer assigned to the logical data stream, wherein a portion of buffered data is passed to all the assigned processing handlers and each processing handler is permitted to consume less data than the entire portion, and the remainder of the buffered data that has not yet been consumed is passed to one of the plurality of processing handlers together with the next portion of input data when the next portion becomes available in the stream buffer,
- wherein each portion of the buffered data is processed inside the assigned processing handler;
- wherein at least one of the processing handlers scans its associated data for malware presence,
- wherein the amount of data consumed by different processing handlers from the same portion of buffered data is not the same, and
- wherein at least several of the plurality of processing handlers are associated with a single stream buffer buffering data, and wherein different processing handlers extract different amounts of the same data buffered by the single stream buffer, and
- wherein the data is deleted from the single stream buffer only after multiple processing handlers associated with the single stream buffer have processed the same data.
29. A non-transitory computer readable storage medium having computer executable program logic stored thereon, the computer executable program logic executing on a processor for anti-malware processing of data stream, the computer program logic comprising:
- computer program code means for generating a plurality of logical data streams by dividing a primary data stream, wherein each logical data stream has a different data format;
- computer program code means for instantiating a plurality of stream buffers, each stream buffer buffering data of a corresponding logical data stream;
- computer program code means for instantiating a plurality of processing handlers separated from the stream buffers, wherein each processing handler scans data of the logical data stream buffered by its stream buffer for malware presence;
- computer program code means for associating each processing handler with one of the logical data streams, wherein each logical data stream is processed inside the associated processing handler; and
- computer program code means for associating each processing handler with a particular data transmission mode, and
- wherein at least several of the plurality of processing handlers are associated with a single stream buffer buffering data, and wherein different processing handlers extract different amounts of the same data buffered by the single stream buffer, and
- wherein the data is deleted from the single stream buffer only after multiple processing handlers associated with the single stream buffer have processed the same data.
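The sharing arrangement recited in claims 1, 12, 16 and 28 (several processing handlers reading the same stream buffer, each consuming a different amount of the same data, with data deleted only after every handler has processed it) can be illustrated with a minimal Python sketch. The names `SharedStreamBuffer`, `attach` and `feed` are hypothetical, and handlers are assumed to be plain callables that return how many of the offered bytes they consumed; unconsumed bytes are offered again together with the next portion of input. This is an illustration under those assumptions, not the patented implementation.

```python
"""Hypothetical sketch: one stream buffer shared by several processing handlers."""
from typing import Callable, Dict

# Assumed handler contract: receive the bytes not yet consumed by this handler,
# return how many of them were consumed (0..len(data)).
Handler = Callable[[bytes], int]


class SharedStreamBuffer:
    """Buffers one logical data stream for several processing handlers."""

    def __init__(self) -> None:
        self._data = bytearray()             # bytes not yet consumed by every handler
        self._base = 0                       # stream offset of self._data[0]
        self._handlers: Dict[str, Handler] = {}
        self._consumed: Dict[str, int] = {}  # absolute stream offset reached per handler

    def attach(self, name: str, handler: Handler) -> None:
        self._handlers[name] = handler
        self._consumed[name] = self._base

    def feed(self, chunk: bytes) -> None:
        """Append a portion of input and offer each handler its unconsumed data."""
        self._data.extend(chunk)
        for name, handler in self._handlers.items():
            start = self._consumed[name] - self._base
            pending = bytes(self._data[start:])
            used = handler(pending)
            if not 0 <= used <= len(pending):
                raise ValueError(f"handler {name!r} consumed {used} of {len(pending)} bytes")
            self._consumed[name] += used
        self._trim()

    def _trim(self) -> None:
        """Delete only the prefix that every attached handler has consumed."""
        if not self._consumed:
            return
        low = min(self._consumed.values())
        del self._data[: low - self._base]
        self._base = low

    def consumed(self, name: str) -> int:
        return self._consumed[name]

    def buffered(self) -> int:
        return len(self._data)
```

Keeping one absolute offset per handler means the buffer only needs the minimum of those offsets to decide which prefix is safe to delete, which is one plausible way to realize the consumption tracking of claim 12. A scanner that consumes everything and a parser that consumes only whole 4-byte records can share the buffer; the bytes the parser leaves behind stay buffered until it takes them with the next portion.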
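Claims 2 to 4, 17 and 20 give each stream buffer its own buffering policy (maximum size, full buffering, backtrack size, relative importance, and whether data may be dropped when memory is low). The Python sketch below shows one way such a policy might be represented and consulted; the field names, `BufferState`, `choose_victims`, and the rule that fully buffered streams are never dropped are assumptions made for illustration.

```python
"""Hypothetical sketch of a per-buffer buffering policy and low-memory eviction."""
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BufferingPolicy:
    max_size: int          # maximum number of bytes the buffer may hold
    fully_buffered: bool   # whole stream must be kept (e.g. for later unpacking)
    backtrack_size: int    # bytes kept behind the current read position
    importance: int        # higher value = more valuable for later processing
    droppable: bool        # may be discarded when available memory is low


@dataclass
class BufferState:
    name: str
    policy: BufferingPolicy
    size: int              # bytes currently held


def choose_victims(buffers: List[BufferState], bytes_needed: int) -> List[str]:
    """Pick droppable buffers, least important first, until enough bytes are freed.

    Assumption: a fully buffered stream is never dropped, even if marked droppable.
    Among equally important buffers, larger ones are chosen first.
    """
    candidates = sorted(
        (b for b in buffers if b.policy.droppable and not b.policy.fully_buffered),
        key=lambda b: (b.policy.importance, -b.size),
    )
    victims: List[str] = []
    freed = 0
    for b in candidates:
        if freed >= bytes_needed:
            break
        victims.append(b.name)
        freed += b.size
    return victims
```

Because the policy is fixed when a buffer is instantiated, the eviction decision can be made from the policies alone, without knowing which handler owns each buffer.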
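Claims 10 and 23 scan a logical data stream by signature search, and claim 28 has the stream buffer retain the last N bytes of the stream. A minimal Python sketch of how those two ideas fit together: keeping the last (longest signature length minus one) bytes lets a signature that straddles two portions of input still be found. The class name and the use of `bytes.find` are assumptions for brevity; a production scanner would more likely use a multi-pattern automaton such as Aho-Corasick, or hardware acceleration as in claim 11.

```python
"""Hypothetical sketch: chunked signature search with a retained tail (backtrack)."""
from typing import Dict, List, Tuple


class StreamingSignatureScanner:
    def __init__(self, signatures: Dict[str, bytes]) -> None:
        if not signatures or any(not s for s in signatures.values()):
            raise ValueError("signatures must be non-empty byte strings")
        self._signatures = signatures
        # Retain (longest signature - 1) bytes so a match split across two
        # chunks is still visible when the next chunk arrives.
        self._tail_len = max(len(s) for s in signatures.values()) - 1
        self._tail = b""
        self._offset = 0  # stream offset of self._tail[0]
        self.hits: List[Tuple[str, int]] = []  # (signature name, stream offset)

    def feed(self, chunk: bytes) -> None:
        window = self._tail + chunk
        for name, sig in self._signatures.items():
            start = 0
            while (pos := window.find(sig, start)) != -1:
                # A match lying entirely inside the old tail was reported by an
                # earlier call; only report matches that end in the new chunk.
                if pos + len(sig) > len(self._tail):
                    self.hits.append((name, self._offset + pos))
                start = pos + 1
        keep = min(self._tail_len, len(window))
        self._offset += len(window) - keep
        self._tail = window[len(window) - keep:]
```

For example, with the signature `b"EVIL"` fed as `b"xxEV"` followed by `b"ILyy"`, the match is reported at stream offset 2 even though neither chunk contains it on its own.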
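Claims 13 to 15, 18, 24 and 25 describe a handler that reads headers, learns where regions of interest will appear in data that has not yet arrived, and instantiates new secondary streams when those regions arrive, subject to a stored set of conditions. The Python sketch below assumes a region is described by an absolute offset, a length and a kind; `RegionRouter`, `Trigger`, `SecondaryStream`, the `on_open` action and the dictionary used as the condition table are all hypothetical names chosen for illustration.

```python
"""Hypothetical sketch: header-driven creation of secondary streams."""
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class SecondaryStream:
    kind: str
    start: int                                   # region offset in the primary stream
    data: bytearray = field(default_factory=bytearray)


@dataclass
class Trigger:
    start: int                                   # absolute offset in the primary stream
    length: int                                  # size of the region of interest
    kind: str                                    # key into the condition table
    # Action defined when the trigger is registered, run when the stream opens.
    on_open: Optional[Callable[[SecondaryStream], None]] = None


class RegionRouter:
    """Routes future regions of a primary stream into new secondary streams."""

    def __init__(self, conditions: Dict[str, bool]) -> None:
        self._conditions = conditions            # hypothetical condition table
        self._offset = 0                         # primary-stream bytes seen so far
        self._pending: List[Trigger] = []
        self._open: List[Tuple[Trigger, SecondaryStream]] = []
        self.streams: List[SecondaryStream] = []

    def expect(self, trig: Trigger) -> bool:
        """Called by a header parser; returns True if a stream will be created."""
        if trig.start < self._offset or trig.length <= 0:
            raise ValueError("region must be non-empty and lie in future data")
        if not self._conditions.get(trig.kind, False):
            return False
        self._pending.append(trig)
        return True

    def feed(self, chunk: bytes) -> None:
        lo, hi = self._offset, self._offset + len(chunk)
        # Open a secondary stream for every region that starts in this chunk.
        ready = [t for t in self._pending if t.start < hi]
        self._pending = [t for t in self._pending if t.start >= hi]
        for trig in ready:
            stream = SecondaryStream(trig.kind, trig.start)
            self.streams.append(stream)
            self._open.append((trig, stream))
            if trig.on_open is not None:
                trig.on_open(stream)
        # Copy the overlapping slice of this chunk into each open stream.
        for trig, stream in self._open:
            a, b = max(lo, trig.start), min(hi, trig.start + trig.length)
            if a < b:
                stream.data += chunk[a - lo:b - lo]
        # Close streams whose region has been fully received.
        self._open = [(t, s) for t, s in self._open if len(s.data) < t.length]
        self._offset = hi
```

Because only the region boundaries are recorded ahead of time, the primary stream does not have to be retained while waiting for the region; only the bytes inside it are copied into the new stream.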
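Claims 5, 19, 26 and 27 have a handler parse an email body for password information that is then used to unpack password-protected archived attachments. The matching rule in the Python sketch below (a keyword such as "password", "passwd" or "pwd", followed by "is", ":" or "=") is only an assumed heuristic, and `password_candidates` is a hypothetical name. Candidates are returned in order of appearance, together with a variant with trailing punctuation stripped, since a sentence-ending period may or may not belong to the password.

```python
"""Hypothetical sketch: extracting candidate archive passwords from an email body."""
import re
from typing import List

# Assumed heuristic: keyword, then at least one of "is", ":" or "=",
# then an optionally quoted token.
_PASSWORD_RE = re.compile(
    r"\b(?:password|passwd|pwd)\b(?:\s*(?:is\b|:|=))+\s*[\"']?([^\s\"']+)",
    re.IGNORECASE,
)


def password_candidates(body: str) -> List[str]:
    """Return unique candidate passwords in order of appearance."""
    candidates: List[str] = []
    for match in _PASSWORD_RE.finditer(body):
        raw = match.group(1)
        # Try the token as written and with trailing sentence punctuation removed.
        for cand in (raw, raw.rstrip(".,;:!?")):
            if cand and cand not in candidates:
                candidates.append(cand)
    return candidates
```

Each candidate could then be tried against the attachment; for legacy ZIP encryption, Python's standard library accepts a password through `zipfile.ZipFile.read(name, pwd=...)`, which typically raises `RuntimeError` on a bad password. Formats such as RAR or AES-encrypted ZIP would need other tooling.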
Type: Grant
Filed: Oct 18, 2006
Date of Patent: Oct 18, 2011
Assignee: Kaspersky Lab, ZAO (Moscow)
Inventor: Vyacheslav A. Batenin (Moscow)
Primary Examiner: Taghi Arani
Assistant Examiner: Madhuri Herzog
Attorney: Bardmesser Law Group
Application Number: 11/550,428
International Classification: G06F 11/00 (20060101); G06F 11/30 (20060101); G06F 13/00 (20060101); G08B 23/00 (20060101);