Systems and methods for enhancing performance of software applications
A system for enhancing performance of a software application obtains a list of files to be processed by the software application, and sorts that list in the order of the physical position of files in the list on a hard drive. The files are loaded in the sorted order which can decrease or minimize the total file seeking time, thereby increasing the speed of execution of the software.
The invention relates generally to the field of improving performance of software products, and, more specifically, to systems and methods for efficiently loading files required by a software product.
BACKGROUND
Some software products such as anti-virus scanners, picture/video viewers, etc., process a large number of files. For example, a typical anti-virus scanner may scan all files in one or more selected folders on a hard drive of a computer or even all of the files on the hard drive. These hundreds or even thousands of files are usually loaded in the computer memory (e.g., RAM) prior to being processed.
In a typical scenario, a module of the software product requests the operating system of the computer to provide a list of files to be loaded and processed that are within a specified scope (e.g., a folder, a list of folders, a disk, etc.). The operating system returns a list of file identifiers based on the order the files are listed in a file-system database maintained by the operating system. Then, the software module, with assistance from the operating system, loads the files associated with the file identifiers in the list from permanent storage (e.g., a hard drive). The software processes a file as it is loaded, or may process a set of files after the files in the set are loaded.
The overall performance of the software product generally depends not only on the speed of processing the loaded files, but also on the speed of loading the files. In particular, when a large number of files are to be loaded, such as by anti-virus software, slowly loading several files can significantly affect the overall software performance. Various known techniques, however, only improve the speed of processing of the files, not the speed with which they are loaded prior to processing. According to one method, the loading and processing of files are interleaved such that the processor may analyze a previously loaded file while waiting for one or more other files to be loaded. This can increase the processor utilization, improving the overall performance of the software. These systems, however, do not improve the speed of loading of the several files, which, as described above, also adversely affects software performance. Therefore, there is a need for methods and systems for addressing file loading times, thus improving the overall performance of software products requiring processing of a large number of files.
SUMMARY OF THE INVENTION
In various embodiments of the present invention, the overall performance of a software product is improved by improving the speed of loading several files to be processed. This is achieved, in part, by obtaining additional information about the files from a list of file identifiers provided by the operating system. The additional information may include a file size, the physical position of the file on the disk, etc. Instead of accessing and loading the files in the order listed by the operating system, the files are loaded in an order based on the additional information, such as their physical position on the disk. Doing so generally causes the disk head to move in only one direction during file access. Moreover, successive movements of the disk head while accessing successive files may be shortened. This can significantly decrease the time taken to move the disk head to access each file, thereby significantly improving the speed of loading several files. This, in turn, can improve the overall software performance.
Accordingly, in one aspect, various embodiments feature a computer-implemented method for loading the files required by a software program in computer memory. The method includes obtaining a first list of file identifiers, each being associated with a respective file, and sorting the first list based on a first attribute of each of the associated files. The method also includes selecting, in sorted order, a file identifier in the sorted first list and loading the file associated with the selected file identifier. The selecting and loading steps are repeated until each file identifier in the sorted first list is selected, and the associated file is loaded.
The application program may include one or more of a virus scanner, a spyware scanner, an ad-ware scanner, a malware scanner, a backup program, a multicopy program, a compiler, and a data-mining program. In some embodiments, the file identifier includes a file entry in a directory record, and the first attribute may include location of files associated with the file identifiers. The location may be a cluster location.
In some embodiments, obtaining the first list of file identifiers includes receiving a second list of the file identifiers. For each file identifier in the second list (i) a second attribute of a file associated with the file identifier is compared with a pre-determined threshold, and (ii) based on the comparison, the file identifier is selectively added to the first list. The second attribute may include file size and the threshold may be, for example, one Kbyte.
In some embodiments, the loaded files are analyzed and the analyzing step may include scanning the loaded file for at least one of virus, spyware, adware, and malware. The analyzing step may also include copying the file to another memory location, through a network or directly, compiling the file, and/or extracting information from the file.
In some embodiments, the software program is a boot program, and obtaining the first list of file identifiers includes recording, during a prior execution of the boot program, each file loaded in the computer memory. Obtaining the first list further includes storing a file identifier corresponding to the loaded file in the first list of file identifiers, storing the first list on a non-volatile memory (e.g., a hard disk, flash memory, etc.), and accessing the stored first list during a subsequent execution of the boot program. The loading step may include storing the files in a cache memory, and the method may further include accessing the files from the cache memory, and analyzing the accessed files.
In another aspect, various embodiments feature another computer-implemented method for loading, in computer memory, files required by a software program. The method includes receiving a first list of file identifiers, each being associated with a respective file. For each file identifier in the first list (i) a first attribute of the associated file is compared with a pre-determined threshold, and (ii) based on the comparison, that file identifier is selectively added to a second list. The method also includes sorting the second list based on a second attribute of the files associated with the file identifiers in the second list, selecting, in sorted order, a file identifier in the sorted second list, and loading the file associated with the selected file identifier. The selecting and loading steps are repeated until each file identifier in the sorted second list is selected.
In another aspect, various embodiments feature a system for enhancing performance of a software program. The system includes a sorter module for (i) obtaining a first list of file identifiers, each being associated with a respective file, and (ii) sorting the first list based on a first attribute of each of the associated files. The system also includes a loader module for selecting, in sorted order, each file identifier in the sorted first list, and loading the file associated with the selected file identifier. The application program may include one or more of a virus scanner, a spyware scanner, an ad-ware scanner, a malware scanner, a backup program, a multicopy program, a compiler, and a data-mining program.
In some embodiments, the file identifier includes a file entry in a directory record, and the first attribute may include location of files associated with the file identifiers. The location may include a cluster location. The sorter module may be configured to receive a second list of the file identifiers, and for each file identifier in the second list (i) to compare a second attribute of a file associated with the file identifier with a pre-determined threshold, and (ii) based on the comparison, to add selectively the file identifier to the first list. The second attribute may include file size and the threshold may be, e.g., one Kbyte.
In some embodiments, the system further comprises an analyzer for analyzing the loaded file. The analyzer may be configured to scan the loaded file for at least one of virus, spyware, adware, and malware. The analyzer may also be configured to copy the file to another memory location, through a network or directly, compile the file, and/or to extract information from the file.
In some embodiments, the software program is a boot program, and the sorter module is configured to record, during a prior execution of the boot program, each file loaded in the computer memory. The sorter module is also configured to store a file identifier corresponding to the loaded file in the first list of file identifiers, to store the first list on a non-volatile memory (e.g., hard-disk, flash memory, etc.), and to access the stored first list during a subsequent boot operation. In some embodiments, the system comprises an analyzer module, and the loader module is configured to store the files in a cache memory. The analyzer module is configured to access the files from the cache memory, and to analyze the accessed files.
In another aspect, various embodiments feature another system for enhancing performance of a software program. The system includes a sorter module to receive a first list of file identifiers, each being associated with a respective file. For each file identifier in the first list, the sorter module: (a) compares a first attribute of the associated file with a pre-determined threshold, and (b) based on the comparison, selectively adds that file identifier to a second list. Moreover, the sorter module sorts the second list based on a second attribute of the files associated with the file identifiers in the second list. The system includes a loader module for selecting, in sorted order, each file identifier in the sorted second list, and for loading the file associated with the selected file identifier.
Other aspects and advantages of the invention will become apparent from the following drawings, detailed description, and claims, all of which illustrate the principles of the invention, by way of example only.
In the drawings, like reference characters generally refer to the same parts throughout the different views. Also, the drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention.
A computer hard drive typically includes mechanical parts such as several rotating magnetic disks and one or more read/write heads that access the disks. Data representing the various files used by the computer are stored in cylindrical tracks on one of the magnetic disks. Before the file data can be accessed, i.e., read and/or written, the disk head is moved to a location over the track where that file (or portion of the file) is located. The disk-head movement takes some time, typically on the order of 3-12 ms on an average per file access, depending on the size and quality of the disk drive. In fact, the seek time, i.e., the time required to position the disk head at a required location, is often a significant bottleneck of accessing and loading files.
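To give a rough sense of scale, the short calculation below multiplies the 3-12 ms per-access figure quoted above by a file count. The count of 10,000 files is an illustrative assumption and does not come from this description, and the model deliberately ignores rotational latency and data-transfer time.

```python
# Rough seek-time budget. The file count is a hypothetical example value.
SEEK_MS_LOW = 3    # lower end of the per-access range quoted above
SEEK_MS_HIGH = 12  # upper end of the per-access range quoted above


def total_seek_seconds(file_count: int, seek_ms: float) -> float:
    """Total seek time if every file access costs one average seek."""
    return file_count * seek_ms / 1000.0


file_count = 10_000  # assumed number of files in one scan
low = total_seek_seconds(file_count, SEEK_MS_LOW)
high = total_seek_seconds(file_count, SEEK_MS_HIGH)
print(f"{file_count} files: {low:.0f} s to {high:.0f} s spent seeking")
```

Under these assumptions, seeking alone would account for between half a minute and two minutes of a scan, which is why reducing the per-file head movement can matter.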
In general, there is no correlation between the physical position of a file on the hard disk and the position of its record (i.e., the entry of the file identifier) in the file system database. As a result, in a conventional system in which numerous files are accessed in the order of their record in the file-system database, the disk head typically moves back and forth in a far from optimal manner. For example,
With reference to
With reference to
For each file identifier in the list, in step 304 the size of the associated file is compared with a certain threshold (e.g., 1 Kbyte). A file smaller in size than the threshold may not be stored on the hard drive, and instead, may be stored in the directory record (e.g., a Master File Table (MFT) record) in the file-system database maintained by the operating system. Therefore, such a file is processed immediately in step 306, because the time to seek that file is not related to disk-head movement. In alternative implementations, the files that are below this size threshold may be retrieved subsequent to retrieving the larger files from the disk.
Any file that is larger than the threshold is stored somewhere on the hard drive. In step 308, the file identifier associated with such a file is stored in a second list. Various other file parameters, such as the volume handle, the unique file id, and the file start position (cluster) are also stored and associated with the file identifier in the second list. The steps 302, 304, and the step 306 or 308 are repeated until all of the file identifiers in the first list have been analyzed.
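The threshold test of steps 304 through 308 might be sketched as follows. This is only an illustration under stated assumptions: the FileRecord type, its field names, and the sample values are hypothetical, and how the start cluster is actually obtained from the operating system is outside the scope of the sketch. Files exactly at the threshold are treated as small here, which follows the wording of the claims ("exceeds" versus "less than or equal to").

```python
from dataclasses import dataclass
from typing import List, Tuple

SIZE_THRESHOLD_BYTES = 1024  # the 1 Kbyte example threshold from the text


@dataclass
class FileRecord:
    """Hypothetical record; the field names are illustrative only."""
    file_id: int
    volume: str
    size_bytes: int
    start_cluster: int  # meaningful only for files stored outside the directory record


def partition_by_size(
    first_list: List[FileRecord],
) -> Tuple[List[FileRecord], List[FileRecord]]:
    """Split the first list into (files to process at once, second list)."""
    process_now: List[FileRecord] = []
    second_list: List[FileRecord] = []
    for record in first_list:
        if record.size_bytes > SIZE_THRESHOLD_BYTES:
            second_list.append(record)   # step 308: needs a disk-head seek
        else:
            process_now.append(record)   # step 306: handled immediately
    return process_now, second_list


# Made-up sample data.
first_list = [
    FileRecord(1, "C:", 4096, 900),
    FileRecord(2, "C:", 300, 0),
    FileRecord(3, "C:", 1024, 0),
    FileRecord(4, "C:", 20000, 100),
]
process_now, second_list = partition_by_size(first_list)
print([r.file_id for r in process_now], [r.file_id for r in second_list])
```

Keeping the start cluster alongside each identifier in the second list means the later sort needs no further lookups.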
In step 310, the second list of file identifiers is sorted based on the starting cluster of the associated files. The starting cluster represents the physical position of a file on the hard drive. In step 312, a file identifier is selected from the sorted second list in order, i.e., in the order of the physical position of the associated file, and is loaded in computer memory using standard functions provided by the operating system. The full path of the file to determine its location may be obtained using information stored in the second list.
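A minimal sketch of steps 310 and 312 follows, together with a simple measure of how far the head travels in each ordering. The (path, start cluster) pairs are invented sample values, the head_travel cost model (sum of absolute cluster distances) is a simplifying assumption, and the load callable stands in for whatever operating-system function actually reads the file.

```python
from typing import Callable, Iterable, List, Tuple

# (path, start_cluster) pairs; the values are made up for illustration.
Entry = Tuple[str, int]


def head_travel(entries: Iterable[Entry], start: int = 0) -> int:
    """Sum of absolute cluster distances between consecutive accesses."""
    position = start
    travelled = 0
    for _, cluster in entries:
        travelled += abs(cluster - position)
        position = cluster
    return travelled


def load_in_physical_order(
    second_list: List[Entry], load: Callable[[str], bytes]
) -> List[bytes]:
    """Sort by starting cluster (step 310), then load in that order (step 312)."""
    ordered = sorted(second_list, key=lambda entry: entry[1])
    return [load(path) for path, _ in ordered]


second_list = [("a", 900), ("b", 100), ("c", 500), ("d", 200)]
sorted_list = sorted(second_list, key=lambda entry: entry[1])

unsorted_cost = head_travel(second_list)
sorted_cost = head_travel(sorted_list)

# Stand-in loader that just returns the path as bytes.
loaded = load_in_physical_order(second_list, lambda path: path.encode())
print(unsorted_cost, sorted_cost, loaded)
```

With this sample data the head moves in one direction only after sorting, and the total distance in the simplified model drops from 2400 to 900 cluster units.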
The loaded file is analyzed in step 314. The specific analysis performed depends on the overall functionality of the software product. For example, a virus/malware scanner may scan the loaded file for virus, adware, malware, spyware etc. Similarly, a photo-viewer may display an image in the file in a photo album. A file transfer, backup, and/or multicopy program may copy the file to a new location on the disk drive or to another computer over a network. Compilers and data-mining software products may also access a large number of files. The steps 312, 314 are repeated for all of the file identifiers in the sorted second list. In some embodiments, all or a subset of files are loaded in the memory and then the loaded files are analyzed. As described above with reference to
In an exemplary process 400 described with reference to
During a subsequent execution of the software, the list of required files may be found in step 402. In step 406, that list is sorted in the order of the physical positions of the files. The information about the physical position of a file may be obtained from the operating system using a unique file identifier associated with the file. Then in step 408, the files in the sorted list are loaded in order and may be stored in cache memory. The execution of the software (e.g., the boot component of the operating system) is suspended during the steps 406 and 408 while the required files are loaded. When execution of the software continues, the required files are accessed from the cache memory. Once again, the process 400 enhances software performance by decreasing or minimizing the total time required to load all of the files needed by the software, because the files are loaded in the order of their physical position on the hard drive, thereby avoiding unnecessary disk-head movements.
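The record, store, sort, and preload sequence described above can be sketched as below. Everything specific in it is an assumption made for illustration: the JSON file used to persist the list, the function names, the file names, and the position_of callable, which stands in for an operating-system query that returns a file's physical position.

```python
import json
import os
import tempfile
from typing import Callable, Dict, List


def save_recorded_list(paths: List[str], list_file: str) -> None:
    """Persist the identifiers of the files loaded during one run."""
    with open(list_file, "w", encoding="utf-8") as fh:
        json.dump(paths, fh)


def preload(list_file: str, position_of: Callable[[str], int]) -> Dict[str, bytes]:
    """Read the stored list, sort it by physical position, and fill a cache."""
    with open(list_file, encoding="utf-8") as fh:
        paths = json.load(fh)
    cache: Dict[str, bytes] = {}
    for path in sorted(paths, key=position_of):
        with open(path, "rb") as fh:
            cache[path] = fh.read()
    return cache


with tempfile.TemporaryDirectory() as tmp:
    # First run: pretend these files were loaded and recorded.
    paths = []
    for name in ["kernel.bin", "driver.sys", "config.ini"]:
        path = os.path.join(tmp, name)
        with open(path, "wb") as fh:
            fh.write(name.encode())
        paths.append(path)
    list_file = os.path.join(tmp, "recorded.json")
    save_recorded_list(paths, list_file)

    # Later run: made-up physical positions drive the load order.
    fake_positions = {paths[0]: 700, paths[1]: 50, paths[2]: 300}
    cache = preload(list_file, fake_positions.__getitem__)
    load_order = [os.path.basename(p) for p in cache]

print(load_order)
```

Because a Python dict preserves insertion order, the keys of the cache reflect the order in which the files were read; the suspended software would then be served from the cache rather than from the disk.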
Referring to
The sorter module 522 receives a list of file identifiers associated with the files to be processed by the application 520 from the file-system database 514. Alternatively, the sorter module 522 can record the files processed by the software 520 and store a list of those files. Generally, the files are located on the hard drive 506, but some small files may be located in a table maintained by the file-system database 514.
The sorter module 522 optionally stores and sorts a list of file identifiers selected from the first list received or generated by the sorter module 522. Then, using the os-file-loader 516 the loader module 524 loads the files in the sorted order, as described with reference to
Each functional component described above (e.g., the sorter module, the loader module, and the file-analyzer module) may be implemented as stand-alone software components or as a single functional module. In some embodiments the components may set aside portions of a computer's random access memory to provide control logic that affects the interception, scanning and presentation steps described above. In such an embodiment, the program or programs may be written in any one of a number of high-level languages, such as FORTRAN, PASCAL, C, C++, C#, Java, Tcl, PERL, or BASIC. Further, the program can be written in a script, macro, or functionality embedded in commercially available software, such as EXCEL or VISUAL BASIC.
Additionally, the software may be implemented in an assembly language directed to a microprocessor resident on a computer. For example, the software can be implemented in Intel 80x86 assembly language if it is configured to run on an IBM PC or PC clone. The software may be embedded on an article of manufacture including, but not limited to, computer-readable program means such as a floppy disk, a hard disk, an optical disk, a magnetic tape, a PROM, an EPROM, or CD-ROM.
The invention can be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The foregoing embodiments are therefore to be considered in all respects illustrative rather than limiting on the invention described herein.
Claims
1. A computer-implemented method of loading, in computer memory, files required by a boot program, the method comprising:
- using at least one data processing device to: record, during at least one execution of the boot program, each file loaded in the computer memory; store, for each of the loaded files, a respective file identifier in a first list of file identifiers; store the first list on a non-volatile memory; and access the stored first list during subsequent execution of the boot program;
- for each file identifier in the accessed first list, (i) comparing a first attribute of the file corresponding to that file identifier with a predetermined threshold to identify whether retrieval of that file requires movement of a disk-head or access to a directory structure, and (ii) when the first attribute of that file exceeds the predetermined threshold, selectively adding that file identifier to a second list designated for files that require movement of the disk-head to access;
- sorting the second list based on a second attribute of the files corresponding to the file identifiers in the second list;
- selecting, in sorted order, a file identifier in the sorted second list;
- loading the file corresponding to the selected file identifier; and
- repeating the selecting and loading steps until each file identifier in the sorted second list is selected.
2. The method of claim 1, wherein the boot program comprises at least one of a virus scanner, a spyware scanner, an adware scanner, a malware scanner, a backup program, a multicopy program, a compiler, and a data-mining program.
3. The method of claim 1, wherein each of the file identifiers comprises a file entry in a directory record.
4. The method of claim 1, wherein the second attribute comprises file location.
5. The method of claim 4, wherein the location comprises a cluster location.
6. The method of claim 1, wherein the first attribute comprises file size and the threshold is one Kbyte.
7. The method of claim 1, further comprising, after loading the file, analyzing the file.
8. The method of claim 7, wherein analyzing comprises scanning the loaded file for at least one of viruses, spyware, adware, and malware.
9. The method of claim 1, wherein loading comprises storing the file in a cache memory, the method further comprising:
- accessing the file from the cache memory; and
- analyzing the accessed file.
10. The method of claim 1, further comprising, for each file identifier in the accessed first list, analyzing the file corresponding to that file identifier when the first attribute of that file does not exceed the predetermined threshold.
11. The method of claim 1, further comprising, for each file identifier in the accessed first list, accessing the file corresponding to that file identifier from the directory structure without use of the disk-head and analyzing the accessed file, when the first attribute of that file is less than or equal to the predetermined threshold.
12. The method of claim 1, wherein the directory structure is a Master File Table in a file system database maintained by an operating system.
13. A system for enhancing performance of a software program, the system comprising:
- a processor;
- a hard drive;
- a memory; and
- a boot program stored on the hard drive, the boot program comprising: a sorter module configured to: (i) use the processor to: record, during at least one execution of the boot program, each file loaded in the memory; store, for each of the loaded files, a respective file identifier in a first list of file identifiers; store the first list on the hard drive; and access the stored first list during subsequent execution of the boot program; (ii) for each file identifier in the first list, (a) compare a first attribute of the file corresponding to that file identifier with a predetermined threshold to identify whether retrieval of that file requires movement of a disk-head or access to a directory structure, and (b) when the first attribute of that file exceeds the predetermined threshold, selectively add that file identifier to a second list designated for files that require movement of the disk-head to access; and (iii) use the processor to sort the second list based on a second attribute of the files corresponding to the file identifiers in the second list; and a loader module configured to use the processor to: select, in sorted order, each file identifier in the sorted second list; and load the files corresponding to the selected file identifiers into the memory.
14. The system of claim 13, wherein the boot program comprises at least one of a virus scanner, a spyware scanner, an adware scanner, a malware scanner, a backup program, a multicopy program, a compiler, and a data-mining program.
15. The system of claim 13, wherein each of the file identifiers comprises a file entry in a directory record.
16. The system of claim 13, wherein the second attribute comprises file location on the hard drive.
17. The system of claim 16, wherein the file location comprises a cluster location.
18. The system of claim 13, wherein the first attribute comprises file size and the threshold is one Kbyte.
19. The system of claim 13, wherein the boot program further comprises an analyzer module configured to analyze each of the loaded files using the processor.
20. The system of claim 19, wherein the analyzer module is further configured to scan each of the loaded files for at least one of viruses, spyware, adware, and malware.
21. The system of claim 13, wherein the boot program further comprises an analyzer module, and wherein:
- the loader module is configured to store the files in a cache memory; and
- the analyzer module is configured to access the files from the cache memory, and to analyze the accessed files.
22. The system of claim 13, wherein the boot program further comprises an analyzer module configured to analyze, for each file identifier in the first list, the file corresponding to that file identifier when the first attribute of that file does not exceed the predetermined threshold.
23. The system of claim 13, wherein the sorter module is configured to, for each file identifier in the accessed first list, access the file corresponding to that file identifier from the directory structure without use of the disk-head and cause the accessed file to be analyzed, when the first attribute of that file is less than or equal to the predetermined threshold.
24. The system of claim 13, wherein the directory structure is a Master File Table in a file system database maintained by an operating system.
Type: Grant
Filed: Feb 28, 2012
Date of Patent: Aug 18, 2015
Patent Publication Number: 20130227544
Assignee: AVG Netherlands B.V. (Amsterdam)
Inventors: Yuval Ben-Itzhak (Brno), Ing. Z. Breitenbacher (Brno), Jiří Bracek (Kyjov), Jaroslav Nix (Tišnov), Martin Vejnár (Brno), Tomáš Benna (Kravaře), Marián Jurík (Bánov), Václav Pich (Brno)
Primary Examiner: Hang Pan
Application Number: 13/407,412
International Classification: G06F 9/445 (20060101); G06F 3/06 (20060101);