Transactional memory having local CAM and NFA resources
A remote processor interacts with a transactional memory that has a memory, local BWC (Byte-Wise Compare) resources, and local NFA (Non-deterministic Finite Automaton) engine resources. The processor causes a byte stream to be transferred into the transactional memory and into the memory. The processor then uses the BWC circuit to find a character signature in the byte stream. The processor obtains information about the character signature from the BWC circuit, and based on the information uses the NFA engine to process the byte stream starting at a byte position determined based at least in part on the results of the BWC circuit. From the time the byte stream is initially written into the transactional memory until the time the NFA engine completes, the byte stream is not read out of the transactional memory.
The described embodiments relate generally to automaton hardware engines.
BACKGROUND INFORMATION

A network processor is a device that executes programs to handle packet traffic in a data network. A network processor is also often referred to as a network flow processor or simply a flow processor. Examples include network processor integrated circuits employed in routers and in other network equipment. Ways of improving network processors are sought.
SUMMARY

In a first novel aspect, an automaton hardware engine employs a transition table organized into 2^n rows, where each row comprises a plurality of n-bit storage locations, and where each storage location can store at most one n-bit entry value. Each row corresponds to an automaton state. In one example, at least two NFAs (Non-deterministic Finite Automatons) are encoded into the table. The first NFA is indexed into the rows of the transition table in a first way, and the second NFA is indexed into the rows of the transition table in a second way. Due to this indexing, all rows are usable to store entry values that point to other rows.
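A minimal C sketch of this row-sharing idea follows. The actual indexing scheme is not spelled out here, so the mapping used below (the first NFA maps state s to row s, the second maps state s to row 2^n-1-s) is purely an assumption for illustration; it simply shows how two small automatons can together occupy every row of one 2^n-row table.

```c
#include <stdint.h>
#include <stdio.h>

#define NBITS 4                 /* n = 4, so entry values are 4 bits wide      */
#define NROWS (1u << NBITS)     /* 2^n = 16 rows, one row per automaton state  */
#define NCOLS 16                /* one column per byte-characteristic bit      */

/* Each storage location holds at most one n-bit entry value (a next state). */
static uint8_t table[NROWS][NCOLS];

/* Assumed indexing: NFA A uses rows from the top down, NFA B from the bottom
 * up; entry values hold NFA-local state numbers that each NFA maps to rows
 * in its own way, so both automatons fit in the one table at the same time. */
static unsigned row_for_nfa_a(unsigned state) { return state; }
static unsigned row_for_nfa_b(unsigned state) { return NROWS - 1u - state; }

int main(void) {
    /* Encode a two-transition fragment of each NFA into the shared table. */
    table[row_for_nfa_a(0)][2] = 1;  /* NFA A: state 0 --class 2--> state 1 */
    table[row_for_nfa_a(1)][5] = 2;  /* NFA A: state 1 --class 5--> state 2 */
    table[row_for_nfa_b(0)][7] = 1;  /* NFA B: state 0 --class 7--> state 1 */
    table[row_for_nfa_b(1)][9] = 2;  /* NFA B: state 1 --class 9--> state 2 */
    printf("NFA A state 0 is row %u; NFA B state 0 is row %u\n",
           row_for_nfa_a(0), row_for_nfa_b(0));
    return 0;
}
```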
In a second novel aspect, an NFA hardware engine includes a pipeline and a controller. The pipeline includes a plurality of stages, where one of the stages includes an automaton transition table. Both a first automaton and a second automaton are encoded in the same transition table. The controller receives NFA engine commands onto the NFA engine and controls the pipeline in response to the NFA engine commands.
In a third novel aspect, a remote processor interacts with a transactional memory. The transactional memory (for example, a Cluster Local Scratch block of an ME island) includes a memory, a local BWC (Byte-Wise Compare) circuit, and local NFA engine resources. The processor causes a byte stream to be transferred into the transactional memory, and more specifically into the memory. The processor then uses the BWC circuit to find a character signature in the byte stream. The processor obtains information about the character signature from the BWC circuit, and based on the information uses the NFA engine to process the byte stream starting at a byte position determined based at least in part on the results of the BWC circuit. From the time the byte stream is initially written into the transactional memory (into the Cluster Local Scratch block) until the time the NFA engine completes, the byte stream is not read out of the transactional memory (out of the Cluster Local Scratch block).
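A minimal software model of this third aspect is sketched below. The CLS memory is modeled as an in-process array, and the function names (cls_write, cls_bwc_find, cls_nfa_run) are invented for illustration; they are not the device's actual command set. Note that between the initial write and the NFA run, the byte stream is only examined inside the model of the transactional memory, never read back out.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CLS_SIZE 4096
static uint8_t cls_mem[CLS_SIZE];            /* models the SRAM memory unit */

/* Model of step (a): write the byte stream into the transactional memory. */
static void cls_write(uint32_t addr, const uint8_t *buf, uint32_t len) {
    memcpy(&cls_mem[addr], buf, len);
}

/* Model of steps (b)-(c): the BWC circuit finds a one-character signature
 * and reports its byte position, or -1 if it is not present. */
static int cls_bwc_find(uint32_t addr, uint32_t len, uint8_t sig) {
    const uint8_t *p = memchr(&cls_mem[addr], sig, len);
    return p ? (int)(p - &cls_mem[addr]) : -1;
}

/* Placeholder for step (d): the NFA engine would walk the stream from here. */
static void cls_nfa_run(uint32_t addr, uint32_t len) {
    printf("NFA started at CLS offset %u over %u bytes\n",
           (unsigned)addr, (unsigned)len);
}

int main(void) {
    const uint8_t stream[] = "GET /index.html HTTP/1.1\r\n";
    uint32_t len = sizeof(stream) - 1;

    cls_write(0, stream, len);                       /* (a) */
    int pos = cls_bwc_find(0, len, 'G');             /* (b)(c) */
    if (pos >= 0)
        cls_nfa_run((uint32_t)pos, len - (uint32_t)pos);  /* (d) */
    return 0;
}
```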
In a fourth novel aspect, an NFA byte detector circuit includes a hardware byte characterizer, a first matching circuit (that performs a TCAM match function), a second matching circuit (that performs a wide match function), a multiplexer that outputs a selected output from either the first or second matching circuits, and a storage device. The storage device includes a first plurality of N storage locations, a second plurality of O storage locations, and a third plurality of P storage locations. N data values stored in the first storage locations of the storage device are supplied to the first matching circuit as an N-bit mask value and are simultaneously supplied to the second matching circuit as N bits of an N+O-bit mask value. O data values stored in the second storage locations of the storage device are supplied to the first matching circuit as the O-bit match value and are simultaneously supplied to the second matching circuit as O bits of the N+O-bit mask value. P data values stored in the third storage locations are supplied onto the select inputs of the multiplexer.
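A rough C model of the shared byte-detector storage is sketched below, assuming N = O = 8 and a one-bit multiplexer select. The TCAM match is given its conventional meaning (a cleared mask bit means don't care); the wide-match function is not specified above, so the 16-bit equality used here is only a stand-in.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative byte-detector model. The N stored bits serve as the TCAM mask,
 * the O stored bits as the TCAM match value, and the same N+O bits also feed
 * the second (wide) matching circuit; the P bits drive the output multiplexer. */
struct byte_detector {
    uint8_t mask;   /* N data values */
    uint8_t match;  /* O data values */
    uint8_t sel;    /* P data values: multiplexer select */
};

/* First matching circuit: TCAM semantics, mask bit 0 = don't care. */
static int tcam_match(const struct byte_detector *d, uint8_t byte) {
    return ((byte ^ d->match) & d->mask) == 0;
}

/* Second matching circuit: a stand-in "wide match" that treats the same
 * N+O stored bits as one 16-bit pattern compared against a 16-bit input. */
static int wide_match(const struct byte_detector *d, uint16_t wide_in) {
    uint16_t pattern = (uint16_t)((uint16_t)d->match << 8) | d->mask;
    return wide_in == pattern;
}

/* Multiplexer: the stored P bits choose which matcher output is reported. */
static int byte_detector_out(const struct byte_detector *d,
                             uint8_t byte, uint16_t wide_in) {
    return d->sel ? wide_match(d, wide_in) : tcam_match(d, byte);
}

int main(void) {
    /* Mask 0xDF ignores bit 5, so 'G' and 'g' both match the detector. */
    struct byte_detector d = { .mask = 0xDF, .match = 'G', .sel = 0 };
    printf("detects 'G': %d, detects 'g': %d\n",
           byte_detector_out(&d, 'G', 0), byte_detector_out(&d, 'g', 0));
    return 0;
}
```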
In a fifth novel aspect, a method of notifying a processor of completion of an NFA operation involves communicating a first command across a bus to an NFA engine. The first command is an instruction to the NFA engine to perform the NFA operation. The processor then communicates a second command across the bus to the NFA engine. The second command is an instruction to the NFA engine to return a reference value. The NFA engine carries out the first and second commands in order. As a result of carrying out the first command, the NFA engine performs the NFA operation, generates a result, and stores the result in a memory. As a result of carrying out the second command, the NFA engine writes the reference value across the bus, thereby informing the processor that the NFA has completed and that the results are available.
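The ordering guarantee can be illustrated with a small software model: because the engine executes queued commands in order, the processor knows that once the reference value comes back, the NFA result list is already in memory. The queue structure, the reference value 0xABCD, and the placeholder result below are illustrative only; the 0xFFFF end-of-list marker follows the result format described later in this document.

```c
#include <stdint.h>
#include <stdio.h>

enum cmd_type { CMD_RUN_NFA, CMD_RETURN_REF };

struct cmd { enum cmd_type type; uint32_t ref; };

static uint16_t result_mem[16];   /* models the result list in SRAM          */
static uint32_t pushed_ref;       /* models the value pushed across the bus  */

static void nfa_engine_execute(const struct cmd *q, int n) {
    for (int i = 0; i < n; i++) {            /* commands complete in order   */
        if (q[i].type == CMD_RUN_NFA) {
            result_mem[0] = 0x5012;          /* placeholder NFA result value */
            result_mem[1] = 0xFFFF;          /* end-of-list marker           */
        } else {
            pushed_ref = q[i].ref;           /* write reference value back   */
        }
    }
}

int main(void) {
    struct cmd q[] = { { CMD_RUN_NFA, 0 }, { CMD_RETURN_REF, 0xABCD } };
    nfa_engine_execute(q, 2);
    if (pushed_ref == 0xABCD)                /* seeing the reference value...  */
        printf("NFA done, first result 0x%04X\n",   /* ...means results exist */
               (unsigned)result_mem[0]);
    return 0;
}
```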
Further details and embodiments and techniques are described in the detailed description below. This summary does not purport to define the invention. The invention is defined by the claims.
The accompanying drawings, where like numerals indicate like components, illustrate embodiments of the invention.
Reference will now be made in detail to background examples and some embodiments of the invention, examples of which are illustrated in the accompanying drawings. In the description and claims below, relational terms such as “top”, “down”, “upper”, “lower”, “bottom”, “left” and “right” may be used to describe relative orientations between different parts of a structure being described, and it is to be understood that the overall structure being described can actually be oriented in any way in three-dimensional space.
In addition to the area of the input/output circuits outlined above, the IB-NFP integrated circuit 13 also includes two additional areas. The first additional area is a tiling area of islands 36-60. Each of the islands is either of a full rectangular shape, or is half the size of the full rectangular shape. For example, the island 41 labeled “PCIE (1)” is a full island. The island 46 below it labeled “ME CLUSTER (5)” is a half island. The functional circuits in the various islands of this tiling area are interconnected by: 1) a configurable mesh Command/Push/Pull (CPP) data bus, 2) a configurable mesh control bus (CB), and 3) a configurable mesh event bus (EB). Each such mesh bus extends over the two-dimensional space of islands with a regular grid or “mesh” pattern. For additional information on the CPP data bus, the control bus, and the event bus, see: U.S. patent application Ser. No. 13/399,433, entitled “Staggered Island Structure in an Island-Based Network Flow Processor,” filed on Feb. 17, 2012 (the entire subject matter of which is incorporated herein by reference).
In addition to this tiling area of islands 36-60, there is a second additional area of larger sized blocks 61-65. The functional circuitry of each of these blocks is not laid out to consist of islands and half-islands in the way that the circuitry of islands 36-60 is laid out. The mesh bus structures do not extend into or over any of these larger blocks. The mesh bus structures do not extend outside of islands 36-60. The functional circuitry of a larger sized block may connect by direct dedicated connections to an interface island and through the interface island achieve connectivity to the mesh buses and other islands.
The arrows in
For each packet received onto the IB-NFP 13, the functional circuitry of ingress NBI island 58 examines fields in the header portion of the packet to determine what storage strategy to use to place the payload of the packet into memory. In one example, NBI island 58 examines the header portion and from that determines whether the packet is an exception packet or whether the packet is a fast-path packet. If the packet is an exception packet then the ingress NBI island 58 determines a first storage strategy to be used to store the packet so that relatively involved exception processing can be performed efficiently, whereas if the packet is a fast-path packet then the ingress NBI island 58 determines a second storage strategy to be used to store the packet for more efficient transmission of the packet from the IB-NFP. Ingress NBI island 58 examines the packet headers of the header portion, performs packet preclassification, determines that the packet is a fast-path packet, and determines that the header portion of the packet should be placed into a CTM (Cluster Target Memory) in ME (Microengine) island 52. The header portion of the packet is therefore communicated across the configurable mesh data bus from ingress NBI island 58 to ME island 52. The CTM is tightly coupled to microengines in the ME island 52. The ME island 52 determines header modification and queuing strategy for the packet based on the packet flow (derived from packet header and contents) and the ME island 52 informs a second NBI island 49 of these. The payload portions of fast-path packets are placed into internal SRAM (Static Random Access Memory) MU block 64. The payload portions of exception packets are placed into external DRAM 18 and 19.
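The storage decision can be summarized in a short sketch: fast-path payloads go to the internal SRAM MU block, exception payloads to external DRAM, with the preclassification test itself stubbed out since it depends on header fields and device configuration not detailed here.

```c
#include <stdbool.h>
#include <stdio.h>

enum payload_dest { DEST_INTERNAL_SRAM_MU, DEST_EXTERNAL_DRAM };

/* Stub: real preclassification examines fields of the packet header portion. */
static bool is_fast_path(const unsigned char *hdr) { (void)hdr; return true; }

/* Fast-path payloads are stored in the internal SRAM MU block; exception
 * payloads are stored in external DRAM. Header portions go to the CTM. */
static enum payload_dest choose_payload_store(const unsigned char *hdr) {
    return is_fast_path(hdr) ? DEST_INTERNAL_SRAM_MU : DEST_EXTERNAL_DRAM;
}

int main(void) {
    unsigned char hdr[64] = { 0 };
    printf("payload goes to %s\n",
           choose_payload_store(hdr) == DEST_INTERNAL_SRAM_MU
               ? "internal SRAM MU block" : "external DRAM");
    return 0;
}
```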
Half island 54 is an interface island through which all information passing into, and out of, SRAM MU block 64 passes. The functional circuitry within half island 54 serves as the interface and control circuitry for the SRAM within block 64. For simplicity purposes in the discussion below, both half island 54 and MU block 64 may be referred to together as the MU island, although it is to be understood that MU block 64 is actually not an island as the term is used here but rather is a block. The payload portion of the incoming fast-path packet is communicated from ingress NBI island 58, across the configurable mesh data bus to SRAM control island 54, and from control island 54, to the interface circuitry in block 64, and to the internal SRAM circuitry of block 64. The internal SRAM of block 64 stores the payloads of the fast-path packets so that they can be accessed for flow determination by the ME island.
In addition, a preclassifier in the ingress NBI island 58 determines that the payload portions for others of the packets should be stored in external DRAM 18 and 19. For example, the payload portions for exception packets are stored in external DRAM 18 and 19. The data payload for the HTTP GET message described in connection with
At this point in the operational example, the packet header portions and their associated payload portions are stored in different places. The payload portions of fast-path packets are stored in internal SRAM in MU block 64, whereas the payload portions of exception packets are stored in external DRAMs 18 and 19. The header portions of all packets are stored in CTM 66 in ME island 52. In the example of the HTTP GET message, the header portion is then further examined by circuitry of the ME island 52, including the NFA engine 2 of the ME island 52. The NFA engine can autonomously read and write SRAM memory unit 84.
When the packets are to be sent out of the IB-NFP, the ME island 52 informs egress NBI island 49 where the packet headers and the packet payloads can be found and provides the egress NBI island 49 with an egress packet descriptor for each packet. The egress packet descriptor indicates a queuing strategy to be used on the packet. The egress NBI island 49 uses the egress packet descriptor to read the packet headers and any header modification from ME island 52 and to read the packet payloads from either internal SRAM 64 or external DRAMs 18 and 19. The egress NBI island 49 places packet descriptors for packets to be output into the correct order. For each packet that is then scheduled to be transmitted, the egress NBI island 49 uses the packet descriptor to obtain the header portion and any header modification and the payload portion and to assemble the packet to be transmitted. The header modification is not actually part of the egress packet descriptor, but rather it is stored with the packet header by the ME when the packet is presented to the NBI. The egress NBI island 49 then performs any indicated packet modification on the packet. The resulting modified packet then passes from the egress NBI island 49 and to the egress MAC island 50. Egress MAC island 50 buffers the packets, and converts them into symbols. The symbols are then delivered by conductors from the MAC island 50 to the four SerDes I/O blocks 25-28. From SerDes I/O blocks 25-28, the 100 Gbps outgoing packet flow passes out of the IB-NFP integrated circuit 13 and to the switch fabric (not shown) of the router.
In the present example, the packet headers 69 of the packet containing the HTTP GET message are stored in CTM 66. Microengine 71 causes these packet headers 69 to be moved from CTM 66 into an SRAM memory unit 84 located in a transactional memory 87 of the ME island 52. The transactional memory 87 is referred to as the Cluster Local Scratch (CLS). The microengine 71 performs this data move by accessing the CTM 66 via DB island bridge 67 and CPP data bus interface 68, and by accessing the CLS 87 via the DB island bridge 67 and CPP data bus interface 85. The ME 71 reads from the CTM and writes to the CLS using CPP bus transactions just as if the CTM and CLS were located on another island, except that the destination values of the CPP bus transactions indicate that the CTM and CLS are local and are located on the same island as the microengine 71. For further detail on how an ME located in an island can act as a CPP bus master and engage in CPP bus transactions with another device located on the same island acting as a CPP bus target, see: U.S. patent application Ser. No. 13/399,433, entitled “Staggered Island Structure in an Island-Based Network Flow Processor,” filed on Feb. 17, 2012 (the entire subject matter of which is incorporated herein by reference).
The CLS 87 includes the novel NFA engine 2 mentioned above in connection with
Meanwhile, after decoding by decoder 97, the command passes through operation FIFO 101 and is translated into a set of opcodes 102 by translator 103. There is one opcode for each stage of the CLS pipeline 92. Each pipeline stage has an input register or FIFO, and an amount of logic referred to here generally as an ALU. Reference numerals 104-109 identify the incoming registers for pipeline stages 110-115, respectively. Reference numerals 116-121 identify the ALUs for pipeline stages 110-115, respectively. Each opcode determines what a corresponding pipeline stage will do during the clock cycle when the command is being processed by that stage. For example, if the command is a Byte-Wise Compare (BWC) CAM operation command, then the ring operation stage performs no function. The BWC CAM operation is a multi-character match, and requires sixty-four bits of data to be read from the SRAM memory unit 84. The read stage 111 outputs a read request via conductors 122. After a pull-id has been posted to the DB island bridge 67 as described above, it may take a substantial period of time for the requested pull data to be returned via pull FIFO 90. The wait stage 112 is controlled by one of the opcodes to slow the pipeline long enough for the returned pull data to be present on the input of the pull stage 113 at the time when processing of the command is being performed by the pull stage. In the example of the BWC CAM operation, an 8-bit value to compare against the 64-bit value read from memory 84 is to be pulled from the master. SRAM memory unit 84 is organized as 64-bit words, so a word read via conductors 123 is sixty-four bits long. In the appropriate clock cycle, the opcode for the execute stage 114 causes the ALU 120 of the execute stage to compare the 8-bit pull data value passed in from the pull stage via register 108 with the sixty-four bits read from the SRAM memory unit 84. The ALU 120 generates an output value. If the command requires an output value to be written to the SRAM memory unit, then the write stage 115 causes an appropriate write to occur across conductors 124. Likewise, if the command requires an output value to be returned to the CPP bus master across the DB island bridge, then the write stage 115 causes an appropriate CPP push bus transaction value to be supplied to push FIFO 91 via conductors 125. In the case of the BWC CAM operation, the output value generated by the ALU 120 of the execute stage is pushed back to the CPP bus master that originally initiated the command. The pushed back CPP bus transaction value includes a value that the master uses to associate the incoming push with the previously issued command. The bus interface of the master then writes the data of the push transaction into the master's memory at the appropriate location.
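The execute-stage compare can be sketched in C as follows: one pulled byte is compared against all eight byte lanes of the 64-bit SRAM word in a single operation. The output format chosen here (a bitmask of matching lanes) is an assumption made for illustration; the hardware's exact result encoding is not detailed above.

```c
#include <stdint.h>
#include <stdio.h>

/* Byte-Wise Compare: compare one 8-bit pull value against each of the eight
 * bytes of a 64-bit SRAM word; return a bitmask of the lanes that matched. */
static uint8_t bwc_cam(uint64_t sram_word, uint8_t pull_byte) {
    uint8_t hits = 0;
    for (int lane = 0; lane < 8; lane++) {
        uint8_t b = (uint8_t)(sram_word >> (8 * lane));
        if (b == pull_byte)
            hits |= (uint8_t)(1u << lane);   /* mark the matching byte lane */
    }
    return hits;                             /* pushed back to the master   */
}

int main(void) {
    /* Build a 64-bit word holding "GET /ind" with byte 0 in the low lane. */
    uint64_t word = 0;
    const char *s = "GET /ind";
    for (int i = 0; i < 8; i++)
        word |= (uint64_t)(uint8_t)s[i] << (8 * i);
    printf("lanes matching 'G': 0x%02X\n", (unsigned)bwc_cam(word, 'G'));
    return 0;
}
```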
The stages 110-115 are pipelined. The CLS pipeline 92 of processing stages does not have an instruction counter and does not fetch and decode instructions. In a first cycle of the clock signal CLK, the ring operation stage 110 performs its functions required by the command, in a second cycle of the clock signal CLK the read stage 111 performs its function required by the command, in a third cycle of the clock signal CLK the wait stage 112 performs its function required by the command, in a fourth cycle of the clock signal CLK the pull stage 113 performs its function required by the command, in a fifth cycle of the clock signal CLK the execute stage 114 performs its function required by the command, and in a sixth cycle of the clock signal CLK the write stage 115 performs its function required by the command. A different command is output from the operation FIFO 101 each cycle of the clock signal, so one command can be executed by the pipeline each cycle. A CPP bus master can use the CPP bus to write CLS pipeline commands into the CLS pipeline 92 that cause data to be written into SRAM memory unit 84, that cause data to be read from the SSB peripherals block 93, that cause data to be read from the SRAM memory unit 84, and that cause other operations to occur.
In addition to the CLS pipeline 92, the CLS 87 includes the SSB peripherals block 93. The SSB peripherals block 93 includes an event manager 128, a random number generator 129, and the novel NFA engine 2.
The IB-NFP integrated circuit 13 has a configurable mesh event bus structure of event ring segments. This configurable event bus structure is configurable to form one or more event rings that pass through event manager circuits in the various islands of the integrated circuit. Event manager circuits are disposed along a ring so that event packets passing through the ring pass into and out of event manager circuits as the packet travels around the ring. An event manager can inject an event packet into the ring such that the event packet then circulates through the ring to other event manager circuits in other islands. An event manager can also monitor event packets passing through the ring. The event ring structure provides a way to communicate events and status information among event manager circuits in the various islands of the IB-NFP. For example, functional circuitry in an island can cause a local event manager to inject an event packet onto an event ring, where the event packet then travels around a ring to other islands and serves to alert other functional circuits in other islands of the event. Filters in an event manager can be programmed by functional circuitry to ignore unwanted event packets, but to detect particular event packets and to alert the functional circuitry of such particular event packets.
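A minimal model of the filter-and-alert behavior is sketched below, assuming an event packet carries a source and a reference value; the event manager ignores packets that do not match its programmed filter and, on a match, alerts the waiting circuitry (modeled here as an autopush of the reference value). The structure layout and the reference value 0xABCD are illustrative only.

```c
#include <stdint.h>
#include <stdio.h>

struct event_packet { uint32_t source; uint32_t reference; };

static uint32_t filter_reference = 0xABCD;   /* programmed by functional circuitry */
static uint32_t autopush_value;              /* what the waiting master later reads */

/* Event manager: ignore unwanted event packets circulating on the ring,
 * and alert the local circuitry when a particular packet is detected. */
static void event_manager_observe(const struct event_packet *pkt) {
    if (pkt->reference == filter_reference)
        autopush_value = pkt->reference;     /* models the alert / autopush */
}

int main(void) {
    struct event_packet pkt = { .source = 2, .reference = 0xABCD };
    event_manager_observe(&pkt);             /* packet returns around the ring */
    printf("circuitry alerted, reference 0x%04X\n", (unsigned)autopush_value);
    return 0;
}
```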
Using this event bus mechanism, a CPP bus master can configure the NFA engine 2 via the CLS pipeline 92 so that when a particular NFA completes, the NFA engine 2 will cause the event manager 128 to inject an event packet into the event ring, where the event packet carries a particular reference value (set up beforehand by the CPP bus master). The reference value indicates why the event packet was generated. Once the event packet has been injected into the ring, the event packet passes around the ring and upon its return is detected by the event manager 128, which in response alerts the autopush block 94 by sending the reference value (indicating why the event packet was generated). The autopush block 94 (see
Stage 2 181 of the NFA pipeline 140 includes sixteen byte detector configuration pipeline registers 188, pipeline registers 189-190, a hardware byte characterizer 191, register 192, sixteen byte detectors 193, and a two-stage combiner 194. Each of the sixteen pipeline registers 188 receives a portion of the config data that was stored in register 187 of stage 1 180 during the previous clock cycle. More particularly, the config data stored in one of the sixteen pipeline registers is configuration data for a corresponding one of the sixteen byte detectors 193. The pipeline register 189 stores another part of the configuration data that was stored in register 187 of stage 1 180 during the previous clock cycle, namely configuration information for the two stage combiner 194. Metadata pertaining to a particular byte is passed down from state machine 186 and is stored in pipeline register 190. This metadata about the byte is available to generate control signals STAGE 2_CTL for the second stage when the second stage is processing that byte. All this configuration and control information configures and controls the other parts of the stage to process the incoming data byte. The incoming data byte is characterized by hardware byte characterizer 191, thereby generating eight BYTE_CONTROL_MATCH[0 . . . 7] values and sixteen BYTE_RE_MATCH[0 . . . 15] values.
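The meanings of the individual BYTE_CONTROL_MATCH and BYTE_RE_MATCH bits are determined by configuration, so the small characterizer model below uses invented character classes (newline, space, digit, upper-case letter, literal 'G') purely to show the shape of the interface: one byte in, eight control-match bits and sixteen re-match bits out.

```c
#include <stdint.h>
#include <ctype.h>
#include <stdio.h>

struct byte_char_out {
    uint8_t  control_match;   /* BYTE_CONTROL_MATCH[0..7]  */
    uint16_t re_match;        /* BYTE_RE_MATCH[0..15]      */
};

/* Illustrative byte characterizer: each asserted bit flags one property of
 * the incoming data byte. The classes below are examples, not the real ones. */
static struct byte_char_out characterize(uint8_t b) {
    struct byte_char_out o = { 0, 0 };
    if (b == '\n')   o.control_match |= 1u << 0;   /* example: newline   */
    if (b == ' ')    o.control_match |= 1u << 1;   /* example: space     */
    if (isdigit(b))  o.re_match      |= 1u << 0;   /* example: [0-9]     */
    if (isupper(b))  o.re_match      |= 1u << 1;   /* example: [A-Z]     */
    if (b == 'G')    o.re_match      |= 1u << 2;   /* example: literal G */
    return o;
}

int main(void) {
    struct byte_char_out o = characterize('G');
    printf("control=0x%02X re=0x%04X\n",
           (unsigned)o.control_match, (unsigned)o.re_match);
    return 0;
}
```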
Stage 3 182 includes pipeline registers 200-202, transition table 203, next state logic 204, a multiplexer 205, and a current states register 206. Transition table 203 is a two-dimensional array of 4-bit storage locations, where each storage location can store a 4-bit “entry value”. The table is “two-dimensional” not necessarily in a spatial sense, but rather is two-dimensional in a logical sense. An entry value indicates one of sixteen states. There are sixteen rows of sixteen such entry values, where each row corresponds to a state. The top row corresponds to state “0000”, the next row down corresponds to state “0001”, and so forth. One or more of the rows can be “active”. A row is indicated to be “active” if its corresponding bit in the “current states vector” 212 is asserted. For example, if the bit 213 of the “current states vector” 212 is asserted, then the top row is active. Within a row, an entry value is “selected” if the row is active and if the entry value is in a column of a byte characteristic bit that is asserted. The sixteen byte characteristic bits 214 are shown coming down from the top of the transition table in the illustration of
The next state logic 204 includes sixteen 16:1 OR gates. Each row of the transition table supplies one bit to each OR gate. For example, the top row supplies the leftmost bit coming into OR gate 215, and supplies the leftmost bit coming into OR gate 216, and so forth. The second row from the top supplies the next leftmost bit coming into OR gate 215, and supplies the next leftmost bit coming into OR gate 216, and so forth. If any of the selected entry values in the top row is “0000”, then the leftmost bit coming into the leftmost OR gate 215 is set. If any of the selected entry values in the top row is “0001”, then the leftmost bit coming into the next leftmost OR gate 216 is set. If any of the sixteen bits supplied from the transition table into OR gate 215 is asserted, then OR gate 215 asserts the leftmost bit of the “next states vector” 211. The leftmost bit of the “next states vector” 211 being set indicates that one or more selected entry values in the transition table are pointing to a next state of “0000”. Similarly, the next leftmost bit of the “next states vector” 211 being set indicates that one or more selected entry values in the transition table are pointing to a next state of “0001”. The bits of the “next states vector” 211 indicate which of the sixteen states will be active in the next cycle of the NFA. The current states register 206 outputs the “current states vector” 212 to the transition table, and receives the “next states vector” 211 from the transition table. At the beginning of NFA operation, the active states are not determined by the transition table 203, but rather are part of the configuration data stored in pipeline register 201. This initial states vector is supplied from the pipeline register 201 via multiplexer 205 to be the current states vector 212. An NFA can start in multiple states, so more than one bit of the initial states vector can be set. The 4-bit transition table entry values can be preloaded into the 4-bit storage locations of the transition table 203 under the control of the command interpreter 143 as a result of executing an NFA engine config command whose subtype field indicates that transition table entry values are to be configured.
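The selection and OR-reduction described above can be modeled in a few lines of C: every entry value whose row is active and whose column's byte-characteristic bit is asserted sets one bit of the next-states vector. How a "no transition" condition is encoded in a 4-bit entry is not described here and is ignored in this sketch; the example transitions loaded in main() are invented.

```c
#include <stdint.h>
#include <stdio.h>

#define NSTATES 16

/* transition_table[row = current state][col = byte-characteristic bit] holds
 * the 4-bit entry value (the number of the next state). */
static uint8_t transition_table[NSTATES][NSTATES];

static uint16_t next_states(uint16_t current_states, uint16_t char_bits) {
    uint16_t next = 0;
    for (int row = 0; row < NSTATES; row++) {
        if (!(current_states & (1u << row)))
            continue;                           /* row not active            */
        for (int col = 0; col < NSTATES; col++) {
            if (char_bits & (1u << col))        /* column asserted: selected */
                next |= (uint16_t)(1u << transition_table[row][col]);
        }
    }
    return next;                                /* feeds the current states register */
}

int main(void) {
    transition_table[0][2] = 1;   /* state 0 --(char class 2)--> state 1          */
    transition_table[1][5] = 14;  /* state 1 --(char class 5)--> result state 14  */
    uint16_t cur = 1u << 0;                 /* initial states vector: state 0 only */
    cur = next_states(cur, 1u << 2);        /* first byte asserts class 2          */
    cur = next_states(cur, 1u << 5);        /* second byte asserts class 5         */
    printf("result state active: %s\n", (cur & (1u << 14)) ? "yes" : "no");
    return 0;
}
```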
Stage 4 183 includes pipeline registers 207 and 209, a result formatting circuit 208, and an output register 210. In one example, state “1110” (14) is the result state. When the transition table indicates a transition to state “1110”, then the previous current active state that gave rise to the transition is communicated as part of a “terminating states vector” 217 into pipeline register 207. Each bit of the 16-bit “terminating states vector” 217 corresponds to a prior state that could have given rise to the transition to the result state. If the bit is set, then the corresponding state is indicated to have given rise to a transition to the result state. The result code 218, as passed to the pipeline register 209 of stage 4 183, indicates the format in which the prior state information will be output. As determined by the result code 218, a result value is typically formatted to include: 1) a 4-bit value that indicates the prior state that gave rise to the transition to the result state, and 2) a 12-bit byte offset from the start of the byte stream processed by the NFA, where the offset identifies the data byte in the byte stream that caused the transition. The two values are a pair and relate to the same transition to the result state. In an NFA, multiple terminations can occur in the same clock cycle, so multiple corresponding 16-bit result values can be generated during one clock cycle as well. The 16-bit result values are stored in output register 210, and are then output from the NFA pipeline 140 via conductors 219 so that they can be stored as a list in SRAM memory unit 84. The end of a list of such 16-bit result values as stored in the SRAM memory unit 84 is marked by a stored 16-bit value of “FFFF”.
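A small sketch of the result encoding follows, packing the 4-bit prior state and the 12-bit byte offset into one 16-bit value and terminating the list with 0xFFFF. The field order (state in the top nibble) is an assumption; the second entry reuses the state-5, offset-84 example given later in this description, while the first entry is invented.

```c
#include <stdint.h>
#include <stdio.h>

/* Pack one 16-bit result value: 4-bit prior state + 12-bit byte offset. */
static uint16_t make_result(uint8_t prior_state, uint16_t byte_offset) {
    return (uint16_t)(((prior_state & 0xFu) << 12) | (byte_offset & 0x0FFFu));
}

int main(void) {
    uint16_t results[3];
    int n = 0;
    results[n++] = make_result(3, 3);    /* invented first result            */
    results[n++] = make_result(5, 84);   /* e.g. fourth newline at offset 84 */
    results[n]   = 0xFFFF;               /* end-of-list marker in SRAM       */

    for (int i = 0; results[i] != 0xFFFF; i++)
        printf("prior state %u, byte offset %u\n",
               (unsigned)(results[i] >> 12), (unsigned)(results[i] & 0x0FFFu));
    return 0;
}
```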
The composition and operation of a byte detector of stage 2 181 is described in further detail in connection with
The TCAM match circuit 221 of
The equal either circuit 222 of
The re-match circuit 223 of
The control match circuit 224 of
In step 302 of the method of
Ingress island 58 in the IB-NFP writes the packet data payload into SRAM in Memory Unit (MU) island 64. Ingress island 58 also writes the packet headers 69 into Cluster Target Memory (CTM) 66 in ME island 52. In addition, the packet headers 69 are copied to the ME 71. In step 303, ME 71 determines that the received packet is the start of a Transmission Control Protocol (TCP) connection based on the packet headers. In step 304, ME 71 then moves the packet headers 69 from CTM 66 to the CLS memory unit SRAM 84. In step 305, the ME 71 issues a Content Addressable Memory (CAM) instruction to cause the CLS pipeline 92 to find the “G” in the TCP payload and to return the byte position of the “G” in the event that a “G” is found in the TCP payload. Searching for the “G” in the TCP payload aids in the determination of whether the TCP payload includes an HTTP “GET” message.
In step 306 of the method of
NFA#1 generates a first result value when “E” followed by “T” followed by “space” followed by a “not space” sequence is found. The first result includes the number of the state from which the transition to the result state occurred. As indicated by the graph of
NFA#1 next generates a second result value when NFA#1 finds a newline after the protocol version field of the parsed stream. The second result value includes the number of the state from which the transition to the result state occurred. As indicated by the graph of
The third result value is generated when NFA#1 finds the second newline after the user-agent field of the parsed stream. The third result value includes the number of the state from which the transition to the result state occurred. As indicated by the graph of
The fourth result value is generated when NFA#1 finds an “H” followed by an “O” followed by an “S” followed by a “T”. The fourth result value includes the number of the state from which the transition to the result state occurred. As indicated by the graph of
The fifth result value is generated when NFA#1 finds the third newline in the parsed stream. The fifth result value includes the number of the state from which the transition to the result state occurred. As indicated by the graph of
The sixth result value is generated when NFA#1 finds the fourth newline in the parsed stream. The sixth result value includes state number 5, and a byte offset of 84 bytes. Next, in step 313 of the method of
In step 314, in response to the NFA engine “GENERATE EVENT” command, the NFA engine 2 supplies an event value to the event manager 128. The event manager 128 in turn pushes an event packet on the event bus (EB).
NFA#2 then completes in step 322. In step 323, as a result of carrying out the NFA engine “GENERATE EVENT” command, the NFA engine 2 supplies an event value to the event manager 128. The event manager 128 pushes an event packet onto the event bus EB. The data field of the event packet includes the second NFA reference value (NFA reference value #2) 401. NFA reference value #2 indicates that NFA#2 has completed. The event manager 128 then detects the event packet on the event bus, and in response causes an autopush into the CLS pipeline (step 324). The autopush in turn causes the CLS pipeline to do a CPP bus push across the CPP bus back to the ME. The push data includes NFA reference value #2 and this is used by the ME 95 as an indication that NFA#2 has completed. In step 325, the ME reads the result values of NFA#2 (see
Although certain specific embodiments are described above for instructional purposes, the teachings of this patent document have general applicability and are not limited to the specific embodiments described above. Accordingly, various modifications, adaptations, and combinations of various features of the described embodiments can be practiced without departing from the scope of the invention as set forth in the claims.
Claims
1. A method comprising:
- (a) causing a byte stream to be transferred into a transactional memory and to be stored into a memory of the transactional memory, wherein the transactional memory includes the memory, a BWC (Byte-Wise Comparator) circuit and an NFA (Non-deterministic Finite Automaton) engine;
- (b) causing the transactional memory to use the BWC circuit to find a character signature in the byte stream thereby determining byte position information indicative of a byte position of the character signature in the byte stream;
- (c) receiving the byte position information from the transactional memory; and
- (d) causing the NFA engine to use a first NFA to process the byte stream starting at a byte position determined based at least in part on the byte position information of (b), wherein (a) through (d) are performed by a processor that is not a part of the transactional memory, and wherein the byte stream is not read out of the transactional memory at any time between the transferring of (a) until the processing of (d) is completed.
2. The method of claim 1, wherein the character signature is a single particular character, wherein the BWC circuit stores a plurality of bytes of the byte stream at a time, and wherein the BWC circuit outputs address information indicative of where the particular character is present in the plurality of bytes stored in the BWC circuit.
3. The method of claim 1, wherein the character signature is a single particular character, and wherein the first NFA of (d) determines if the particular character is a part of a predetermined string of characters.
4. The method of claim 1, wherein the first NFA of (d) determines if the byte stream contains a predetermined character string of an HTTP message.
5. The method of claim 1, further comprising:
- (e) receiving onto the processor one or more result values generated by the first NFA; and
- (f) based at least in part on the one or more result values of (e) causing the NFA engine to process at least a part of the byte stream using a second NFA.
6. The method of claim 5, wherein the second NFA of (f) determines all byte positions of a particular character in the byte stream.
7. The method of claim 1, wherein successive pluralities of bytes of the byte stream are read out of the memory and are stored into the BWC circuit.
8. The method of claim 1, wherein the byte stream is a serial stream of bytes of a network communication.
9. The method of claim 1, wherein the BWC circuit is a part of a pipeline, wherein the pipeline is coupled to a first port of the memory, and wherein the NFA is coupled to a second port of the memory.
10. An apparatus comprising:
- a transactional memory comprising: a memory having a first port and a second port; a Non-deterministic Finite Automaton (NFA) engine that is coupled to the memory via the second port; and a pipeline of processing stages, wherein one of the stages comprises a BWC (Byte-Wise Comparator) circuit, and wherein the pipeline is coupled to the memory via the first port; and a processor that transfers a byte stream into the memory, that causes the pipeline to use the BWC circuit to identify a character signature in the byte stream thereby generating byte position information, that receives a first communication from the transactional memory, and that in response to the first communication causes the NFA engine to use a first NFA to process at least a part of the byte stream.
11. The apparatus of claim 10, wherein the byte stream is not read out of the transactional memory at any time between the transferring of the byte stream into the memory until the first NFA has completed processing the byte stream.
12. The apparatus of claim 10, wherein the processor receives a second communication from the transactional memory that indicates that the first NFA has completed, and that in response to the second communication causes the NFA engine to use a second NFA to process at least a part of the byte stream.
13. The apparatus of claim 12, wherein the byte stream is not read out of the transactional memory at any time between the transferring of the byte stream into the memory until the second NFA has completed processing the byte stream.
14. The apparatus of claim 10, wherein the pipeline of processing stages does not have an instruction counter and does not fetch and decode instructions.
15. The apparatus of claim 10, wherein the NFA engine and the pipeline are relatively more tightly coupled to the memory, and wherein the processor is less tightly coupled to the memory.
16. The apparatus of claim 10, wherein the processor communicates with the transactional memory via a bus, and wherein the NFA engine and the pipeline read from the memory without using the bus.
17. An apparatus comprising:
- a transactional memory comprising: a memory; a Non-deterministic Finite Automaton (NFA) engine that is coupled to the memory; and a pipeline of processing stages, wherein one of the stages comprises a BWC (Byte-Wise Comparator) circuit, and wherein the pipeline is coupled to the memory; and processing means for: 1) transferring a byte stream into the transactional memory and into the memory, 2) causing the pipeline to use the BWC circuit to identify a character signature in the byte stream thereby generating byte position information, 3) receiving a communication from the transactional memory, and 4) in response to the communication causing the NFA engine to use an NFA to process at least a part of the byte stream, wherein the byte stream is not read out of the transactional memory at any time between the transferring of the byte stream into the memory until the NFA has completed its processing of the byte stream.
18. The apparatus of claim 17, wherein the NFA engine is coupled to the memory via a second port of the memory, and wherein the pipeline is coupled to the memory via a first port of the memory.
19. The apparatus of claim 17, wherein the pipeline identifies the character signature in the byte stream by reading multiple bytes of the byte stream out of the memory and loading the multiple bytes into the BWC circuit, wherein the BWC compares each byte of the multiple bytes in parallel with a character signature to determine if said each byte matches the character signature.
20. The apparatus of claim 17, wherein the processing means is also for transferring NFA engine configuration information into the transactional memory and into the memory, wherein the NFA engine configuration information includes transition table entry values.
References Cited

U.S. Patent Documents

Number | Date | Inventor
3676851 | July 1972 | Eastman
5140644 | August 18, 1992 | Kawaguchi |
6744533 | June 1, 2004 | Easwar |
7805392 | September 28, 2010 | Steele |
7810155 | October 5, 2010 | Ravi |
20020059551 | May 16, 2002 | Alamouti |
20050028114 | February 3, 2005 | Gould |
20050035784 | February 17, 2005 | Gould |
20060020595 | January 26, 2006 | Norton |
20060136570 | June 22, 2006 | Pandya |
20080077587 | March 27, 2008 | Wyschogrod |
20080140911 | June 12, 2008 | Pandya |
20090070459 | March 12, 2009 | Cho |
20100229238 | September 9, 2010 | Ma |
20100325080 | December 23, 2010 | Ichino |
20110004694 | January 6, 2011 | Taylor |
20130332660 | December 12, 2013 | Talagala |
20140317134 | October 23, 2014 | Chen |
20140379961 | December 25, 2014 | Lasser |
Other Publications

- Yang, Y.H.E. et al. (Oct. 2011). Optimizing regular expression matching with sr-nfa on multi-core systems. In Parallel Architectures and Compilation Techniques (PACT), 2011 International Conference on (pp. 424-433). IEEE.
- Norton, M. (2004). “Optimizing Pattern Matching for Intrusion Detection”.
- De Beijer, N. et al. (Aug. 2010). “Improving Automata Efficiency by Stretching and Jamming”. In Stringology (pp. 9-24).
- Cascarano N. et al. (2010). “iNFAnt: NFA pattern matching on GPGPU devices”. In: Computer Communication Review, vol. 40:5, pp. 20-26. DOI:10.1145/1880153.1880157.
Type: Grant
Filed: Jan 9, 2014
Date of Patent: Oct 11, 2016
Patent Publication Number: 20150193266
Assignee: Netronome Systems, Inc. (Santa Clara, CA)
Inventors: Gavin J. Stark (Cambridge), Steven W. Zagorianakos (Brookline, NH)
Primary Examiner: Stanley K Hill
Assistant Examiner: Benjamin Buss
Application Number: 14/151,677
International Classification: G06F 9/46 (20060101);