DEVICES, SYSTEMS, AND METHODS TO SYNCHRONIZE SIMULTANEOUS DMA PARALLEL PROCESSING OF A SINGLE DATA STREAM BY MULTIPLE DEVICES
Disclosed are methods and devices, among which is a system that includes a device that includes one or more pattern-recognition processors in a pattern-recognition cluster, for example. One of the one or more pattern-recognition processors may be initialized to perform as a direct memory access master device able to control the remaining pattern-recognition processors for synchronized processing of a data stream.
1. Field of Invention
Embodiments of the invention relate generally to pattern-recognition processors and, more specifically, in certain embodiments, to the synchronization of direct memory access controlled operations of the pattern-recognition processors.
2. Description of Related Art
In the field of computing, pattern recognition tasks are increasingly challenging. Ever larger volumes of data are transmitted between computers, and the number of patterns that users wish to identify is increasing. For example, spam and malware are often detected by searching for patterns in a data stream, e.g., particular phrases or pieces of code. The number of patterns increases with the variety of spam and malware, as new patterns may be implemented to search for new variants. Searching a data stream for each of these patterns can form a computing bottleneck. Often, as the data stream is received, it is searched for each pattern, one at a time. The delay before the system is ready to search the next portion of the data stream increases with the number of patterns. Thus, pattern recognition may slow the receipt of data.
Furthermore, efforts to increase the speed at which the data stream is searched can lead to synchronization problems with the data, as well as timing issues with regard to control of both the processing of an input data stream to the system and the output of the results of the searched data stream. Accordingly, a system is needed that may increase the speed at which a data stream may be searched, while maintaining a properly timed processing and flow of information both into and out of the system.
Each search criterion may specify one or more target expressions, i.e., patterns. The phrase “target expression” refers to a sequence of data for which the pattern-recognition processor 14 is searching. Examples of target expressions include a sequence of characters that spell a certain word, a sequence of genetic base pairs that specify a gene, a sequence of bits in a picture or video file that form a portion of an image, a sequence of bits in an executable file that form a part of a program, or a sequence of bits in an audio file that form a part of a song or a spoken phrase.
A search criterion may specify more than one target expression. For example, a search criterion may specify all five-letter words beginning with the sequence of letters “cl”, any word beginning with the sequence of letters “cl”, a paragraph that includes the word “cloud” more than three times, etc. The number of possible sets of target expressions is arbitrarily large, e.g., there may be as many target expressions as there are permutations of data that the data stream could present. The search criteria may be expressed in a variety of formats, including as regular expressions, a programming language that concisely specifies sets of target expressions without necessarily listing each target expression.
Each search criterion may be constructed from one or more search terms. Thus, each target expression of a search criterion may include one or more search terms and some target expressions may use common search terms. As used herein, the phrase “search term” refers to a sequence of data that is searched for, during a single search cycle. The sequence of data may include multiple bits of data in a binary format or other formats, e.g., base ten, ASCII, etc. The sequence may encode the data with a single digit or multiple digits, e.g., several binary digits. For example, the pattern-recognition processor 14 may search a text data stream 12 one character at a time, and the search terms may specify a set of single characters, e.g., the letter “a”, either the letters “a” or “e”, or a wildcard search term that specifies a set of all single characters.
Search terms may be smaller or larger than the number of bits that specify a character (or other grapheme—i.e., fundamental unit—of the information expressed by the data stream, e.g., a musical note, a genetic base pair, a base-10 digit, or a sub-pixel). For instance, a search term may be 8 bits and a single character may be 16 bits, in which case two consecutive search terms may specify a single character.
The search criteria 16 may be formatted for the pattern-recognition processor 14 by a compiler 18. Formatting may include deconstructing search terms from the search criteria. For example, if the graphemes expressed by the data stream 12 are larger than the search terms, the compiler may deconstruct the search criterion into multiple search terms to search for a single grapheme. Similarly, if the graphemes expressed by the data stream 12 are smaller than the search terms, the compiler 18 may provide a single search term, with unused bits, for each separate grapheme. The compiler 18 may also format the search criteria 16 to support various regular expressions operators that are not natively supported by the pattern-recognition processor 14.
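The deconstruction step may be pictured with a short sketch. It assumes 16-bit graphemes, 8-bit search terms, and most-significant-byte-first ordering; the function name `deconstruct_grapheme` and the byte order are illustrative assumptions, not details taken from the compiler 18.

```python
def deconstruct_grapheme(code_unit, term_bits=8, grapheme_bits=16):
    """Split one grapheme into consecutive search terms, most significant first.

    Hypothetical helper: the byte order and the bit widths are assumptions
    chosen for illustration only.
    """
    if grapheme_bits % term_bits != 0:
        raise ValueError("grapheme width must be a multiple of the term width")
    count = grapheme_bits // term_bits
    mask = (1 << term_bits) - 1
    # Walk from the highest-order chunk down to the lowest-order chunk.
    return [(code_unit >> (term_bits * i)) & mask for i in reversed(range(count))]


# A 16-bit character becomes two consecutive 8-bit search terms.
terms_for_c = deconstruct_grapheme(0x0063)
```

Under these assumptions, one 16-bit character is searched for as two back-to-back 8-bit search terms.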
The pattern-recognition processor 14 may search the data stream 12 by evaluating each new term from the data stream 12. The word “term” here refers to the amount of data that could match a search term. During a search cycle, the pattern-recognition processor 14 may determine whether the currently presented term matches the current search term in the search criterion. If the term matches the search term, the evaluation is “advanced”, i.e., the next term is compared to the next search term in the search criterion. If the term does not match, the next term is compared to the first search term in the search criterion, thereby resetting the search.
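The advance-or-reset behavior of a single search criterion may be sketched in software as follows. This is a minimal model, not the hardware: `search_stream` is a hypothetical name, each search term is represented as a set of acceptable characters, and the reset is implemented literally as described, so a mismatching term is not itself re-tested against the first search term.

```python
def search_stream(stream, search_terms):
    """Return the indices at which the last search term of the criterion matched.

    `search_terms` is a list of sets; each set holds the terms that satisfy
    that position of the criterion (a hypothetical representation).
    """
    hits = []
    position = 0  # index of the search term currently being evaluated
    for index, term in enumerate(stream):
        if term in search_terms[position]:
            position += 1  # advance to the next search term
            if position == len(search_terms):
                hits.append(index)  # criterion satisfied
                position = 0
        else:
            position = 0  # reset: the next term is compared to the first search term
    return hits


criterion_big = [{"b"}, {"i"}, {"g"}]
result = search_stream("a big dog", criterion_big)
```

In this sketch the stream "a big dog" satisfies the criterion at index 4, the position of the letter "g".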
Each search criterion may be compiled into a different finite state machine in the pattern-recognition processor 14. The finite state machines may run in parallel, searching the data stream 12 according to the search criteria 16. The finite state machines may step through each successive search term in a search criterion as the preceding search term is matched by the data stream 12, or if the search term is unmatched, the finite state machines may begin searching for the first search term of the search criterion.
The pattern-recognition processor 14 may evaluate each new term according to several search criteria, and their respective search terms, at about the same time, e.g., during a single device cycle. The parallel finite state machines may each receive the term from the data stream 12 at about the same time, and each of the parallel finite state machines may determine whether the term advances the parallel finite state machine to the next search term in its search criterion. The parallel finite state machines may evaluate terms according to a relatively large number of search criteria, e.g., more than 100, more than 1000, or more than 10,000. Because they operate in parallel, they may apply the search criteria to a data stream 12 having a relatively high bandwidth, e.g., a data stream 12 of greater than or generally equal to 64 MB per second or 128 MB per second, without slowing the data stream. In some embodiments, the search-cycle duration does not scale with the number of search criteria, so the number of search criteria may have little to no effect on the performance of the pattern-recognition processor 14.
When a search criterion is satisfied (i.e., after advancing to the last search term and matching it), the pattern-recognition processor 14 may report the satisfaction of the criterion to a processing unit, such as a central processing unit (CPU) 20. The central processing unit 20 may control the pattern-recognition processor 14 and other portions of the system 10.
The system 10 may be any of a variety of systems or devices that search a stream of data. For example, the system 10 may be a desktop, laptop, handheld or other type of computer that monitors the data stream 12. The system 10 may also be a network node, such as a router, a server, or a client (e.g., one of the previously-described types of computers). The system 10 may be some other sort of electronic device, such as a copier, a scanner, a printer, a game console, a television, a set-top video distribution or recording system, a cable box, a personal digital media player, a factory automation system, an automotive computer system, or a medical device. (The terms used to describe these various examples of systems, like many of the other terms used herein, may share some referents and, as such, should not be construed narrowly in virtue of the other items listed.)
The data stream 12 may be one or more of a variety of types of data streams that a user or other entity might wish to search. For example, the data stream 12 may be a stream of data received over a network, such as packets received over the Internet or voice or data received over a cellular network. The data stream 12 may be data received from a sensor in communication with the system 10, such as an imaging sensor, a temperature sensor, an accelerometer, or the like, or combinations thereof. The data stream 12 may be received by the system 10 as a serial data stream, in which the data is received in an order that has meaning, such as in a temporally, lexically, or semantically significant order. Or the data stream 12 may be received in parallel or out of order and, then, converted into a serial data stream, e.g., by reordering packets received over the Internet. In some embodiments, the data stream 12 may present terms serially, but the bits expressing each of the terms may be received in parallel. The data stream 12 may be received from a source external to the system 10, or may be formed by interrogating a memory device and forming the data stream 12 from stored data.
Depending on the type of data in the data stream 12, different types of search criteria may be chosen by a designer. For instance, the search criteria 16 may be a virus definition file. Viruses or other malware may be characterized, and aspects of the malware may be used to form search criteria that indicate whether the data stream 12 is likely delivering malware. The resulting search criteria may be stored on a server, and an operator of a client system may subscribe to a service that downloads the search criteria to the system 10. The search criteria 16 may be periodically updated from the server as different types of malware emerge. The search criteria may also be used to specify undesirable content that might be received over a network, for instance unwanted emails (commonly known as spam) or other content that a user finds objectionable.
The data stream 12 may be searched by a third party with an interest in the data being received by the system 10. For example, the data stream 12 may be monitored for text, a sequence of audio, or a sequence of video that occurs in a copyrighted work. The data stream 12 may be monitored for utterances that are relevant to a criminal investigation or civil proceeding or are of interest to an employer.
The search criteria 16 may also include patterns in the data stream 12 for which a translation is available, e.g., in memory addressable by the CPU 20 or the pattern-recognition processor 14. For instance, the search criteria 16 may each specify an English word for which a corresponding Spanish word is stored in memory. In another example, the search criteria 16 may specify encoded versions of the data stream 12, e.g., MP3, MPEG 4, FLAC, Ogg Vorbis, etc., for which a decoded version of the data stream 12 is available, or vice versa.
The pattern-recognition processor 14 may be hardware that is integrated with the CPU 20 into a single component (such as a single device) or may be formed as a separate component. For instance, the pattern-recognition processor 14 may be a separate integrated circuit. The pattern-recognition processor 14 may be referred to as a “co-processor” or a “pattern-recognition co-processor”.
The recognition module 22 may include a row decoder 28 and a plurality of feature cells 30. Each feature cell 30 may specify a search term, and groups of feature cells 30 may form a parallel finite state machine that forms a search criterion. Components of the feature cells 30 may form a search-term array 32, a detection array 34, and an activation-routing matrix 36. The search-term array 32 may include a plurality of input conductors 37, each of which may place each of the feature cells 30 in communication with the row decoder 28.
The row decoder 28 may select particular conductors among the plurality of input conductors 37 based on the content of the data stream 12. For example, the row decoder 28 may be a one byte to 256 row decoder that activates one of 256 rows based on the value of a received byte, which may represent one term. A one-byte term of 0000 0000 may correspond to the top row among the plurality of input conductors 37, and a one-byte term of 1111 1111 may correspond to the bottom row among the plurality of input conductors 37. Thus, different input conductors 37 may be selected, depending on which terms are received from the data stream 12. As different terms are received, the row decoder 28 may deactivate the row corresponding to the previous term and activate the row corresponding to the new term.
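The one-byte-to-256-row decoding may be modeled as a one-hot selection. The sketch below is illustrative only; `decode_row` is a hypothetical name, and list index 0 is taken to be the top row.

```python
def decode_row(term, rows=256):
    """Return a one-hot list in which only the row selected by `term` is 1."""
    if not 0 <= term < rows:
        raise ValueError("term does not fit the decoder width")
    lines = [0] * rows
    lines[term] = 1  # exactly one input conductor is activated per term
    return lines


top = decode_row(0b00000000)     # activates the top row (index 0)
bottom = decode_row(0b11111111)  # activates the bottom row (index 255)
```

Decoding a new term yields a fresh one-hot vector, which corresponds to deactivating the previous row and activating the new one.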
The detection array 34 may couple to a detection bus 38 that outputs signals indicative of complete or partial satisfaction of search criteria to the aggregation module 24. The activation-routing matrix 36 may selectively activate and deactivate feature cells 30 based on the number of search terms in a search criterion that have been matched.
The aggregation module 24 may include a latch matrix 40, an aggregation-routing matrix 42, a threshold-logic matrix 44, a logical-product matrix 46, a logical-sum matrix 48, and an initialization-routing matrix 50.
The latch matrix 40 may implement portions of certain search criteria. Some search criteria, e.g., some regular expressions, count only the first occurrence of a match or group of matches. The latch matrix 40 may include latches that record whether a match has occurred. The latches may be cleared during initialization, and periodically re-initialized during operation, as search criteria are determined to be satisfied or not further satisfiable—i.e., an earlier search term may need to be matched again before the search criterion could be satisfied.
The aggregation-routing matrix 42 may function similar to the activation-routing matrix 36. The aggregation-routing matrix 42 may receive signals indicative of matches on the detection bus 38 and may route the signals to different group-logic lines 53 connecting to the threshold-logic matrix 44. The aggregation-routing matrix 42 may also route outputs of the initialization-routing matrix 50 to the detection array 34 to reset portions of the detection array 34 when a search criterion is determined to be satisfied or not further satisfiable.
The threshold-logic matrix 44 may include a plurality of counters, e.g., 32-bit counters configured to count up or down. The threshold-logic matrix 44 may be loaded with an initial count, and it may count up or down from the count based on matches signaled by the recognition module. For instance, the threshold-logic matrix 44 may count the number of occurrences of a word in some length of text.
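A single counter of this kind may be sketched as below. The class name `ThresholdCounter`, the helper `count_matches`, and the wrap-around at 32 bits are assumptions made for illustration; the description does not state how the counters behave on overflow or underflow.

```python
MASK_32 = 0xFFFFFFFF  # assumed 32-bit counter width


class ThresholdCounter:
    """Hypothetical model of one up/down counter loaded with an initial count."""

    def __init__(self, initial_count, count_down=True):
        self.count = initial_count & MASK_32
        self.step = -1 if count_down else 1

    def on_match(self):
        # Assumption: the counter wraps modulo 2**32.
        self.count = (self.count + self.step) & MASK_32
        return self.count


def count_matches(words, target, initial_count=0):
    """Count up once for every word equal to `target`."""
    counter = ThresholdCounter(initial_count, count_down=False)
    for word in words:
        if word == target:
            counter.on_match()
    return counter.count


occurrences = count_matches("the cloud above the cloud bank hid a cloud".split(), "cloud")
```

Here the match signal is simulated by a word comparison; in the device it would come from the recognition module.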
The outputs of the threshold-logic matrix 44 may be inputs to the logical-product matrix 46. The logical-product matrix 46 may selectively generate “product” results (e.g., “AND” function in Boolean logic). The logical-product matrix 46 may be implemented as a square matrix, in which the number of output products is equal to the number of input lines from the threshold-logic matrix 44, or the logical-product matrix 46 may have a different number of inputs than outputs. The resulting product values may be output to the logical-sum matrix 48.
The logical-sum matrix 48 may selectively generate sums (e.g., “OR” functions in Boolean logic). The logical-sum matrix 48 may also be a square matrix, or the logical-sum matrix 48 may have a different number of inputs than outputs. Since the inputs are logical products, the outputs of the logical-sum matrix 48 may be logical Sums-of-Products (e.g., Boolean logic Sum-of-Product (SOP) form). The output of the logical-sum matrix 48 may be received by the initialization-routing matrix 50.
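The product-then-sum arrangement may be illustrated with a small sketch. The selection lists below stand in for the programmable connections of the two matrices; the function names and the list-of-indices representation are hypothetical.

```python
def logical_products(inputs, product_select):
    """AND together the inputs selected by each row of `product_select`."""
    # Note: an empty row yields 1, since Python's all() is True on empty input.
    return [int(all(inputs[i] for i in row)) for row in product_select]


def logical_sums(products, sum_select):
    """OR together the products selected by each row of `sum_select`."""
    # Note: an empty row yields 0, since Python's any() is False on empty input.
    return [int(any(products[i] for i in row)) for row in sum_select]


threshold_outputs = [1, 0, 1]
products = logical_products(threshold_outputs, [[0, 2], [1, 2]])  # (t0 AND t2), (t1 AND t2)
sums = logical_sums(products, [[0, 1]])                            # p0 OR p1
```

With these example inputs, the single output is the sum-of-products expression (t0 AND t2) OR (t1 AND t2).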
The initialization-routing matrix 50 may reset portions of the detection array 34 and the aggregation module 24 via the aggregation-routing matrix 42. The initialization-routing matrix 50 may also be implemented as a square matrix, or the initialization-routing matrix 50 may have a different number of inputs than outputs. The initialization-routing matrix 50 may respond to signals from the logical-sum matrix 48 and re-initialize other portions of the pattern-recognition processor 14, such as when a search criterion is satisfied or determined to be not further satisfiable.
The aggregation module 24 may include an output buffer 51 that receives the outputs of the threshold-logic matrix 44, the aggregation-routing matrix 42, and the logical-sum matrix 48. The output of the aggregation module 24 may be transmitted from the output buffer 51 to the CPU 20 (
The memory cells 58 may include any of a variety of types of memory cells. For example, the memory cells 58 may be volatile memory, such as dynamic random access memory (DRAM) cells having a transistor and a capacitor. The source and the drain of the transistor may be connected to a plate of the capacitor and the output conductor 56, respectively, and the gate of the transistor may be connected to one of the input conductors 37. In another example of volatile memory, each of the memory cells 58 may include a static random access memory (SRAM) cell. The SRAM cell may have an output that is selectively coupled to the output conductor 56 by an access transistor controlled by one of the input conductors 37. The memory cells 58 may also include nonvolatile memory, such as phase-change memory (e.g., an ovonic device), flash memory, silicon-oxide-nitride-oxide-silicon (SONOS) memory, magneto-resistive memory, or other types of nonvolatile memory. The memory cells 58 may also include flip-flops, e.g., memory cells made out of logic gates.
As illustrated by
To compare a term from the data stream 12 with the search term, the row decoder 28 may select the input conductor 37 coupled to memory cells 58 representing the received term. In
In response, the memory cell 58 controlled by the conductor 60 may output a signal indicative of the data that the memory cell 58 stores, and the signal may be conveyed by the output conductor 56. In this case, because the letter “e” is not one of the terms specified by the search-term cell 54, it does not match the search term, and the search-term cell 54 outputs a 0 value, indicating no match was found.
In
The search-term cells 54 may be configured to search for more than one term at a time. Multiple memory cells 58 may be programmed to store a 1, specifying a search term that matches with more than one term. For instance, the memory cells 58 representing the letters lowercase “a” and uppercase “A” may be programmed to store a 1, and the search-term cell 54 may search for either term. In another example, the search-term cell 54 may be configured to output a match if any character is received. All of the memory cells 58 may be programmed to store a 1, such that the search-term cell 54 may function as a wildcard term in a search criterion.
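A search-term cell may be modeled as 256 one-bit memory cells indexed by the received term. The sketch below assumes one-byte terms; the class name `SearchTermCell` and its methods are illustrative and do not come from the description.

```python
class SearchTermCell:
    """Hypothetical model: one memory cell per possible one-byte term."""

    def __init__(self, matching_terms=()):
        self.memory_cells = [0] * 256
        for term in matching_terms:
            self.memory_cells[term] = 1  # program a 1 for each term that should match

    @classmethod
    def wildcard(cls):
        # Every memory cell stores a 1, so any received term matches.
        return cls(range(256))

    def output(self, term):
        # The selected memory cell drives the output: 1 for a match, 0 otherwise.
        return self.memory_cells[term]


either_a = SearchTermCell([ord("a"), ord("A")])
any_term = SearchTermCell.wildcard()
```

In this model, the cell `either_a` outputs 1 for "a" or "A" and 0 for "e", while `any_term` outputs 1 for every byte value.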
As illustrated by
The activation-routing matrix 36, in turn, may selectively activate the feature cells 63, 64, and 66 by writing to the memory cells 70 in the detection array 34. The activation-routing matrix 36 may activate feature cells 63, 64, or 66 according to the search criterion and which search term is being searched for next in the data stream 12.
In
As illustrated by
In
Next, the activation-routing matrix 36 may activate the feature cell 66, as illustrated by
In
The end of a search criterion or a portion of a search criterion may be identified by the activation-routing matrix 36 or the detection cell 68. These components 36 or 68 may include memory indicating whether their feature cell 63, 64, or 66 specifies the last search term of a search criterion or a component of a search criterion. For example, a search criterion may specify all sentences in which the word “cattle” occurs twice, and the recognition module may output a signal indicating each occurrence of “cattle” within a sentence to the aggregation module, which may count the occurrences to determine whether the search criterion is satisfied.
Feature cells 63, 64, or 66 may be activated under several conditions. A feature cell 63, 64, or 66 may be “always active”, meaning that it remains active during all or substantially all of a search. An example of an always active feature cell 63, 64, or 66 is the first feature cell of the search criterion, e.g., feature cell 63.
A feature cell 63, 64, or 66 may be “active when requested”, meaning that the feature cell 63, 64, or 66 is active when some condition precedent is matched, e.g., when the preceding search terms in a search criterion are matched. An example is the feature cell 64, which is active when requested by the feature cell 63 in
A feature cell 63, 64, or 66 may be “self activated”, meaning that once it is activated, it activates itself as long as its search term is matched. For example, a self activated feature cell having a search term that is matched by any numerical digit may remain active through the sequence “123456xy” until the letter “x” is reached. Each time the search term of the self activated feature cell is matched, it may activate the next feature cell in the search criterion. Thus, an always active feature cell may be formed from a self activating feature cell and an active when requested feature cell: the self activating feature cell may be programmed with all of its memory cells 58 storing a 1, and it may repeatedly activate the active when requested feature cell after each term. In some embodiments, each feature cell 63, 64, and 66 may include a memory cell in its detection cell 68 or in the activation-routing matrix 36 that specifies whether the feature cell is always active, thereby forming an always active feature cell from a single feature cell.
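The three activation conditions may be sketched together in a small simulation. The model below is an assumption-laden simplification: feature cells form a linear chain, an activation requested during one term takes effect for the next term, and the names `FeatureCell` and `run_chain` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FeatureCell:
    terms: frozenset              # terms that match this cell's search term
    always_active: bool = False
    self_activated: bool = False
    active: bool = False


def run_chain(cells, stream):
    """Return the indices at which the last feature cell in the chain matched."""
    for cell in cells:
        cell.active = cell.always_active
    hits = []
    for index, term in enumerate(stream):
        next_active = [cell.always_active for cell in cells]
        for i, cell in enumerate(cells):
            if cell.active and term in cell.terms:
                if cell.self_activated:
                    next_active[i] = True      # stays active while it keeps matching
                if i + 1 < len(cells):
                    next_active[i + 1] = True  # request activation of the next cell
                else:
                    hits.append(index)         # last search term matched
        for cell, flag in zip(cells, next_active):
            cell.active = flag
    return hits


digits = frozenset("0123456789")
chain = [
    FeatureCell(frozenset("n"), always_active=True),  # first cell: always active
    FeatureCell(digits, self_activated=True),         # one or more digits
    FeatureCell(frozenset("x")),                      # active when requested
]
hits = run_chain(chain, "n123456xy")
```

In this sketch the digit cell remains active through "123456" and the final cell matches at index 7, where the letter "x" is reached.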
Search criteria with different numbers of search terms may be formed by allocating more or fewer feature cells to the search criteria. Simple search criteria may consume fewer resources in the form of feature cells than complex search criteria. This is believed to reduce the cost of the pattern-recognition processor 14 (
As illustrated by
In
In
Next, as illustrated by
The pattern-recognition processor cluster 94 may be made up of a plurality of pattern-recognition processors 14. Accordingly, the pattern-recognition processor cluster 94 may search one or more target expressions. The pattern-recognition processor cluster 94 may utilize the pattern-recognition processors 14 collectively to search an individual target expression, or, alternatively, the pattern-recognition processor cluster 94 may utilize each pattern-recognition processor 14 to search an individual target expression.
Moreover, the pattern-recognition processor cluster 94 may be utilized in searching the data stream 12 based on a given search criterion. Each search criterion may specify one or more target expressions and each search criterion may be constructed from one or more search terms. Accordingly, the pattern-recognition processor cluster 94 may utilize the pattern-recognition processors 14 collectively to search each search criterion, or, alternatively, the pattern-recognition processor cluster 94 may utilize each pattern-recognition processor 14 to search a particular search criterion. In this manner, the system 10 may gain greater flexibility in its searching capability because the pattern-recognition processors 14 in the pattern-recognition processor cluster 94 may be flexible in their searching of the data stream 12. Part of the flexibility of the pattern-recognition processor cluster 94 may derive from the use of parallel direct memory access (PDMA) techniques that allow for a direct memory access (DMA) master to control the flow and timing of data to the DMA slaves, as well as the processing of the data in parallel.
The eight data pins 96-110 may be used to receive data from the data stream 12 for pattern-recognition processing. Likewise, the data pins 96-110 may be utilized in transmitting data upon completion of a pattern search by the one or more of the pattern-recognition processors 14 of the pattern-recognition processor cluster 94. Each of the pattern-recognition processors 14 also may include four address pins 112-118. These address pins 112-118 may be used to specify functions to be performed within a pattern-recognition processor 14. Alternatively, the address pins 112-118 may be utilized to select one or more feature cells 30 in the pattern-recognition processor 14. As such, the address pins 112-118 may collectively or singularly be utilized to control the operation of the pattern-recognition processors 14.
Each of the pattern-recognition processors 14 may also include a write strobe pin 120 and a read strobe pin 122. The write strobe pin 120 and the read strobe pin 122 may be utilized to set the pattern-recognition processor 14 to a write mode or a read mode, respectively. The pattern-recognition processors 14 may be placed into a write mode, for example, when the data stream 12 is being transmitted to the pattern-recognition processors 14 for processing. Thus, when the pattern-recognition processors 14 are to receive data along data lines 96-110, the write strobe pin 120 may be selected by, for example, transmitting a high signal to the write strobe pin 120.
The read strobe pin 122 may operate in a similar manner to the write strobe pin 120 described above. The read strobe pin 122 may be selected when the pattern-recognition processors 14 are to be placed into a read mode, for example, when the results of the pattern-recognition processing are to be transmitted along data lines 96-110 to, for example, the CPU 20. To ensure that the data lines 96-110 are operating in a read mode, the pattern-recognition processor 14 may allow data, i.e., matching results, to be transmitted along the data lines 96-110 when the read strobe pin 122 is selected. Selection of the read strobe pin 122 may occur by transmitting a high signal to the read strobe pin 122. Thus, when the read strobe pin 122 is selected, the pattern-recognition processor 14 may be able to transmit data, for example, to the CPU 20 along data lines 96-110. In this manner, the pattern-recognition processors 14 may be able to receive data and transmit data along the same data lines 96-110 without conflict, since the pattern-recognition processors 14 may receive data when the write strobe pin 120 is selected and transmit data when the read strobe pin 122 is selected.
Each of the pattern-recognition processors 14 may further include a chip select pin 124. The chip select pin 124 may be used to activate a given pattern-recognition processor 14. By utilizing a chip select pin 124 on each pattern-recognition processor 14, any individual and/or a plurality of pattern-recognition processors 14 may be activated. This activation may allow the individual pattern-recognition processors 14 to be individually configured. Furthermore, the chip select pin 124 may be utilized to determine the status of any specific pattern-recognition processor 14. In this manner, a minimum number of pattern-recognition processors 14 may be activated at any given time, as determined by the requirements of a given pattern search.
Each of the pattern-recognition processors 14 may also include a universal select pin 126. The universal select pin 126 may be utilized in a manner similar to the chip select pin 124 described above; however, the universal select pin 126 may be used to activate all the pattern-recognition processors 14 of a pattern-recognition processor cluster 94 in parallel. By activating all of the pattern-recognition processors 14 in parallel, synchronization of the processing between multiple pattern-recognition processors 14 may be achieved. For example, all of the pattern-recognition processors 14 may operate concurrently on the data stream 12. This may be accomplished by each of the pattern-recognition processors 14 receiving an activation signal at the universal select pin 126, allowing the pattern-recognition processors 14 to be activated simultaneously so that the pattern-recognition processors 14 may process the data received across the respective data pins 96-110.
Each of the pattern-recognition processors 14 may further include a PDMA M/S pin 128. The setting of the PDMA M/S pins 128 may determine whether a pattern-recognition processor 14 will be placed into a DMA master or DMA slave device mode. A DMA master device may interface with the CPU 20 to request control of the data bus that provides the data stream. The CPU 20 may program the DMA master device with information relating to the size and location of a data stream 12 to be processed, and subsequently yield control of the data bus used to provide the data stream. Upon completion of the processing, the DMA master may notify the CPU 20 that the search is complete and yield control back to the CPU 20. In this manner, a single pattern-recognition processor 14 may control the parallel data stream processing of the system 10 by acting as a DMA master device and transmitting a DMA cycle to all the DMA slave devices so that the data stream 12 may be simultaneously processed by the DMA master and DMA slave devices. The DMA cycle may correspond to one or more search cycles, and a single search cycle may correspond to the time allotted to the pattern-recognition processors 14 for processing one unit of data in the data stream 12, such as a single byte of data or a plurality of bytes of data. Alternatively, a system DMA device could be utilized as the DMA master, whereby the pattern-recognition processors 14 may each be DMA slave devices. Thus, regardless of the location of the DMA master device, utilization of a DMA master device may free the CPU 20 to perform other system tasks.
In one embodiment, a high signal on the M/S pin 128 is utilized to set a pattern-recognition processor 14 as the DMA master device, while a low signal on the remaining M/S pins 128 sets the remaining pattern-recognition processors 14 to DMA slave devices. Alternatively, firmware or a computer program stored on a tangible machine readable medium, such as a memory device in the system 10, may be utilized to configure a pattern-recognition processor 14 as the DMA master device, as well as to configure the remaining pattern-recognition processors 14 of the pattern-recognition cluster 94 as DMA slave devices.
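The pin-based role assignment of this embodiment may be expressed as a simple mapping. The sketch below assumes that exactly one M/S pin is driven high; the function name `assign_roles` and the error raised for any other configuration are illustrative choices, not behavior stated for the device.

```python
def assign_roles(ms_pin_levels):
    """Map each processor's M/S pin level (1 = high, 0 = low) to a DMA role."""
    masters = sum(1 for level in ms_pin_levels if level)
    if masters != 1:
        # Assumption for this sketch: one and only one processor is the master.
        raise ValueError("exactly one M/S pin is expected to be high")
    return ["master" if level else "slave" for level in ms_pin_levels]


roles = assign_roles([0, 1, 0, 0])
```

The same mapping could equally be produced by firmware or software configuration rather than by pin levels.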
Each of the pattern-recognition processors 14 may also include a PDMA cycle pin 130 and a PDMA RD/WR pin 132. The PDMA cycle pin 130 may be utilized to control the movement of data from the data stream 12 into and out of the pattern-recognition processors 14. Furthermore, the DMA master device may utilize the PDMA cycle pins 130 of the DMA slave devices to control the DMA cycles for the pattern-recognition processors 14 so that the pattern-recognition processors 14 may operate on the same data concurrently. Furthermore, the PDMA RD/WR pin 132 for each of the pattern-recognition processors 14 may be utilized to set the pattern-recognition processors 14 to either a parallel read or a parallel write function. A parallel write function may include the pattern-recognition processors 14 receiving data for processing data concurrently. A parallel read function may include the pattern-recognition processors 14 transmitting processed data concurrently. Thus, the DMA master device may control the parallel read and write functions of the pattern-recognition processors 14 of the pattern-recognition processor cluster 94 via the PDMA RD/WR pins 132.
The PDMA control bus 134 may allow for communication between the CPU 20 and a pattern-recognition processor set as a DMA master device 15. In this manner, the DMA master device 15 may receive from the CPU 20 information relating to the size and location of a data stream 12 to be processed by the pattern-recognition processor cluster 94. The DMA master device 15 may then control the DMA slave devices, which may include the remaining pattern-recognition processors 14 of the pattern-recognition processor cluster 94, by transmitting PDMA control signals to the DMA slave devices to allow each processor 14 in the cluster 94 to process the data stream 12 during the same DMA cycle, i.e., for each processor 14 in the cluster 94 to concurrently respond to DMA bus cycles. Thus, the PDMA control signals become output signals of the DMA master device 15 and input signals to the DMA slave devices.
The PDMA control bus 134 may be utilized to transmit the PDMA control signals from the DMA master device 15 to the DMA slave devices, i.e., the pattern-recognition processors 14. These PDMA control signals may include a PDMA cycle signal that may be utilized to control the movement of data from the data stream 12 into and out of the pattern-recognition processors 14. The PDMA cycle signal may be generated based on a DMA cycle signal received by the DMA master device 15 from the CPU 20. The PDMA cycle signal may be driven from the DMA master device 15 and received by the DMA slave devices as, for example, a parallel enable signal that allows the DMA master and DMA slave devices to perform an operation on concurrently received data.
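The role of the PDMA cycle signal as a parallel enable may be modeled as shown below. The names `BusDevice` and `master_drive` are hypothetical, and the sketch assumes that the DMA master passes the DMA cycle signal from the CPU through unchanged, which is only one way the PDMA cycle signal might be generated.

```python
class BusDevice:
    """Hypothetical model of one device attached to the data bus."""

    def __init__(self, name):
        self.name = name
        self.latched = []

    def clock(self, data_value, pdma_cycle):
        # The PDMA cycle signal acts as a parallel enable: the device
        # operates on the bus value only while the signal is asserted.
        if pdma_cycle:
            self.latched.append(data_value)


def master_drive(master, slaves, bus_values, cpu_dma_cycle):
    """Derive the PDMA cycle signal and apply it to master and slaves."""
    for data_value, dma_cycle in zip(bus_values, cpu_dma_cycle):
        pdma_cycle = bool(dma_cycle)  # assumption: passed through unchanged
        for device in [master, *slaves]:
            device.clock(data_value, pdma_cycle)
```

Because every device sees the same enable on the same step, the master and the slaves end up having operated on an identical sequence of data values.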
The PDMA control bus 134 may also be utilized to transmit PDMA RD/WR signals from the DMA master device 15 to the DMA slave devices. The PDMA RD/WR signals may be utilized to set the DMA slave devices, i.e., the pattern-recognition processors 14, to either a parallel read or a parallel write function. A parallel write function may include the DMA slave devices concurrently receiving data for processing, while a parallel read function may include the DMA slave devices concurrently transmitting processed data. Thus, the DMA master device 15 may control the parallel read and write functions of the pattern-recognition processors 14 of the pattern-recognition processor cluster 94 via the PDMA control bus 134.
A second bus used in conjunction with the pattern-recognition processor cluster 94 is the data bus 136. The data bus 136 may include a plurality of bi-directional data lines that may be connected to the data pins 96-110 of each of the pattern-recognition processors 14 and the DMA master device 15 in the pattern-recognition processor cluster 94. The data bus 136 may also receive a data stream 12 and may be utilized to transmit search results to the CPU 20.
An additional input bus may be a command bus 138. The command bus 138 may be utilized to deliver command signals to the pattern-recognition processors 14 and/or the DMA master device 15 in the pattern-recognition processor cluster 94. These command signals may be transmitted to the write strobe pin 120 and/or the read strobe pin 122 of one or more of the pattern-recognition processors 14 and/or the DMA master device 15. Accordingly, the command signals may be utilized to place one of the pattern-recognition processors 14 or the DMA master device 15 into an operational mode. For example, the command signals may place a single pattern-recognition processor 14 into a read mode or a write mode.
An address bus 140 may also be utilized in the operation of the pattern-recognition processor cluster 94. The address bus 140 may be utilized to transmit signals to the address pins 112-118 for specifying functions within the pattern-recognition processors 14 and/or the DMA master device 15. Similarly, the address bus 140 may be utilized to transmit signals to the address pins 112-118 for specifying and/or activating certain registers within the pattern-recognition processors 14 and/or the DMA master device 15.
Additionally, the chip select bus 142 may be utilized to transmit selection signals to control circuitry for determining which of the pattern-recognition processors 14 and/or the DMA master device 15 are to be activated. The selection signals may determine if all of the pattern-recognition processors 14 and the DMA master device 15 are to be activated concurrently via each universal select pin 126, or, for example, whether a specific pattern-recognition processor 14 is to be activated for processing of the data stream 12.
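The selection behavior described above may be sketched as a function that returns which devices in the cluster are activated. The name `select_targets` and its parameters are hypothetical, and the use of a zero-based index to identify a specific processor is an assumption made for illustration.

```python
def select_targets(num_devices, universal_select, chip_index=None):
    """Return the indices of the devices activated by the selection signals."""
    if universal_select:
        # Universal select: every device in the cluster is activated.
        return list(range(num_devices))
    if chip_index is None:
        # No selection signal asserted: nothing is activated.
        return []
    if not 0 <= chip_index < num_devices:
        raise IndexError("chip_index is outside the cluster")
    # Individual select: only the addressed device is activated.
    return [chip_index]
```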
By allowing one of the pattern-recognition processors 14 to become a DMA master device 15, the CPU 20 of the main system may be freed to perform other tasks as a data stream 12 is processed. Accordingly, the DMA master device may control the parallel operation, i.e. data reading, data processing, and data writing characteristics, of the pattern-recognition processor cluster 94, as well as notify the CPU 20 of the completion of any parallel operations. In this manner, the pattern-recognition processor cluster 94 may process a data stream 12 concurrently, i.e. during the same parallel DMA cycle, while the CPU 20 is free to perform tasks separate from the processing of the data stream 12.
While the invention may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. However, it should be understood that the invention is not intended to be limited to the particular forms disclosed. Rather, the invention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention as defined by the following appended claims.
Claims
1. An electronic device, comprising:
- a plurality of pattern-recognition processors, wherein each of the plurality of pattern-recognition processors comprises a parallel direct memory access (PDMA) master/slave input adapted to place that processor into a master or a slave mode, and a PDMA cycle connection adapted to transmit a direct memory access cycle from that processor, when placed into a master mode, to each of the plurality of pattern-recognition processors placed into a slave mode concurrently.
2. The electronic device of claim 1, wherein each of the plurality of pattern-recognition processors comprises a data input adapted to receive data to be processed, wherein the data may be received by the plurality of pattern-recognition processors concurrently.
3. The electronic device of claim 1, wherein each of the plurality of pattern-recognition processors comprises a plurality of address inputs adapted to receive address signals adapted to specify functions to be performed by that processor.
4. The electronic device of claim 3, wherein a function to be performed comprises processing a data stream concurrently with one or more of the other processors during a direct memory access cycle transmitted from the one of the plurality of pattern-recognition processors placed into a master mode.
5. The electronic device of claim 1, wherein that processor comprises a PDMA read/write input adapted to place that processor into a parallel read or a parallel write mode, whereby a data stream may be written into or read out of each of the plurality of pattern-recognition processors concurrently.
6. The electronic device of claim 5, wherein the data stream being written into or read out of each of the plurality of pattern-recognition processors concurrently is performed during a single direct memory access cycle.
7. An electronic system for processing data in parallel comprising:
- a processing unit adapted to activate a direct memory access (DMA) master device; and
- a plurality of processing circuits adapted to process a data stream concurrently and independently of the processor, wherein the plurality of processing circuits includes the DMA master device and one or more DMA slave devices adapted to concurrently respond to DMA bus cycles.
8. The electronic system of claim 7, wherein each of the plurality of processing circuits is a data pattern-recognition processor adapted to search the data according to a search criteria.
9. The electronic system of claim 8, wherein the search criteria comprises at least one sequence of data searched for by the data pattern-recognition processors during a single search cycle.
10. The electronic system of claim 9, wherein the DMA master device is adapted to transmit a direct memory access cycle to the DMA slave devices to initiate concurrent processing of the data stream, wherein the direct memory access cycle corresponds to one or more single search cycles.
11. The electronic system of claim 10, wherein the DMA master device is adapted to place each of the DMA slave devices into a parallel read or a parallel write mode.
12. The electronic system of claim 7, wherein the DMA master device is adapted to notify the processing unit of the completion of the data stream processing.
13. A method for processing data comprising:
- activating a direct memory access (DMA) master device adapted to control the synchronous processing of a stream of data amongst the DMA master device and one or more DMA slave devices;
- receiving the stream of data at the DMA master device and the one or more DMA slave devices; and
- synchronously processing the stream of data in the DMA master device and the one or more DMA slave devices.
14. The method of claim 13, wherein synchronously processing the stream of data concurrently comprises searching the stream of data for at least one sequence of data specified by a search criteria.
15. The method of claim 13, wherein the processing by the plurality of processing circuits is based on received address signals adapted to indicate a particular processing function.
16. The method of claim 15, comprising transmitting the processed stream of data to a processing unit.
17. A method of processing data, comprising:
- activating one of a plurality of processing circuits when a stream of data is to be processed singularly by the one of the plurality of processing circuits;
- activating a plurality of processing circuits when the stream of data is to be processed in parallel by the plurality of processing circuits;
- activating one of the plurality of processing circuits as a direct memory access (DMA) master device adapted to control the synchronous processing of the stream of data amongst the plurality of processing circuits if the plurality of processing circuits are activated;
- receiving the stream of data at all activated processing circuits concurrently; and
- concurrently processing the stream of data at all the activated processing circuits.
18. The method of claim 17, wherein processing the stream of data concurrently comprises searching the stream of data for at least one sequence of data.
19. The method of claim 18, comprising transmitting the processed stream of data to a processing unit.
20. The method of claim 17, wherein the DMA master device is adapted to control the synchronous processing of the stream of data by transmitting a direct memory access cycle to other ones of the plurality of processing circuits, wherein the direct memory access cycle corresponds to one or more single search cycles.
21. The method of claim 20, wherein the single search cycle comprises a time allotted to the plurality of processing circuits to process one unit of data in the stream of data.
22. An electronic device, comprising:
- a processing unit adapted to transfer processing of a data stream to a plurality of processing circuits; and
- a plurality of processing circuits adapted to process the data stream concurrently and transmit the results to the processing unit, wherein the plurality of processing circuits includes a DMA master device and one or more DMA slave devices.
23. The electronic system of claim 22, wherein the plurality of processing circuits are data pattern-recognition processors adapted to process one unit of data during a single search cycle according to a search criteria.
24. The electronic system of claim 23, wherein the single search cycle corresponds to a parallel DMA cycle controlled from the DMA master device and responded to by one or more DMA slave devices.
25. The method of claim 23, wherein the at least one sequence of data to be searched is set at the DMA master device and at the one or more DMA slave devices based on received address and data signals.
26. An electronic device, comprising:
- a plurality of pattern-recognition processors, wherein each of the plurality of pattern-recognition processors comprises a parallel direct memory access (PDMA) master/slave input adapted to place that processor into a master or a slave mode, and a PDMA connection adapted to receive DMA bus cycles concurrently.
Type: Application
Filed: Dec 1, 2008
Publication Date: Jun 3, 2010
Applicant: Micron Technology, Inc. (Boise, ID)
Inventor: Harold B Noyes (Boise, ID)
Application Number: 12/325,986
International Classification: G06F 13/40 (20060101);