IN-MEMORY COMPUTATION DEVICE WITH INTER-PAGE AND INTRA-PAGE DATA CIRCUITS
An in-memory computation device is described that comprises a memory with a plurality of blocks B(n) of cells, where n ranges from 0 to N−1. A page output circuit PO(n) and page input circuit PI(n) are operatively coupled to block B(n) in the plurality of blocks. A data bus system for providing an external source of input data and a destination for output data is provided. Data circuits are configurable to connect page input circuit PI(n) to one or more of page output circuit PO(n), page output circuit PO(n−1), and the data bus system to source the page input data in a sensing cycle. This configuration can be done between each sensing cycle, or in longer intervals, in order to support a variety of neural network configurations and operations.
The present invention relates to in-memory computing devices, and more particularly, to in-memory computing devices supporting efficient data sharing among multiple computational stages.
Description of Related Art
A neural network is an information processing paradigm that is inspired by the way biological nervous systems process information. With the availability of large training datasets and sophisticated learning algorithms, neural networks have facilitated major advances in numerous domains such as computer vision, speech recognition, and natural language processing.
The basic unit of computation in a neural network is often referred to as a neuron. A neuron receives inputs from other neurons, or from an external source, performs an operation, and provides an output.
In the sum-of-products expression above, each product term is a product of a variable input x_i and a weight w_i. The weight w_i can vary among the terms, corresponding for example to coefficients of the variable inputs x_i. Similarly, outputs from the other neurons in the hidden layer can also be calculated. The outputs of the two neurons in the hidden layer 110 act as inputs to the output neuron in the output layer 104.
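The sum-of-products operation performed by a neuron can be sketched in Python as follows. This is a minimal illustration only: the function name neuron_output and all input and weight values are assumptions chosen for the example, not values taken from the figures.

```python
def neuron_output(inputs, weights):
    """Return the sum of the products inputs[i] * weights[i]."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return sum(x * w for x, w in zip(inputs, weights))


# Hypothetical values: three inputs feed two hidden neurons, whose
# outputs in turn feed a single output neuron.
x = [1, 2, 3]
hidden = [
    neuron_output(x, [2, 0, 1]),
    neuron_output(x, [1, 1, -1]),
]
y = neuron_output(hidden, [3, 4])
```

Each hidden neuron applies its own weights to the same inputs, and the output neuron then treats the two hidden outputs as its inputs.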
Neural networks are used to learn patterns that best represent a large set of data. The hidden layers closer to the input layer in general learn high level generic patterns, and the hidden layers closer to the output layer in general learn more data-specific patterns. Training is a phase in which a neural network learns from training data. During training, the connections in the synaptic layers are assigned weights based on the results of the training session. Inference is a stage in which a trained neural network is used to infer/predict using input data and to produce output data based on the prediction.
In-memory computing is an approach in which memory cells, organized in an in-memory computing device, can be used for both data processing and memory storage. A neural network can be implemented using an in-memory computing device for a number of synaptic layers. The weights for the sum-of-products function can be stored in memory cells of the in-memory computing device. The sum-of-products function can be realized as a circuit operation in the in-memory computing device in which the electrical characteristics of the memory cells of the array effectuate the function.
An engineering issue associated with neural networks relates to movement of data among the synaptic layers. In some embodiments, there can be thousands of neurons in each layer, and the routing of the outputs of the neurons to the inputs of other neurons can be a time consuming aspect of the execution of the neural network.
It is desirable to provide in-memory neural network technology that supports efficient movement of data among the computational components of the system.
SUMMARY
An in-memory computation device is described that comprises a memory configured in a plurality of blocks useable for in-memory computations. Blocks B(n), n going from 0 to N−1, in the plurality of blocks have corresponding page input circuits PI(n) and page output circuits PO(n) that are operatively coupled to sets of bit lines in the blocks. For example, each block B(n) can include a set S(n) of bit lines coupled to its corresponding page output circuit and to its corresponding page input circuit. The device includes in some embodiments a data bus system for providing an external source of input data and a destination for output data. Data circuits are configurable to connect page input circuit PI(n) to one or more of page output circuit PO(n), page output circuit PO(n−1), and the data bus system to source input data for a sensing cycle. This configuration can be done between each sensing cycle, or in longer intervals, in order to support a variety of neural network configurations and operations.
In a device described herein, a plurality of bit line bias circuits is connected to bit lines in the plurality of blocks. Bit line bias circuit Y(n) in the plurality of bit line bias circuits is operatively coupled to block B(n), and to page input circuit PI(n) in the plurality of page input circuits. The bit line bias circuit Y(n) can bias the bit lines in block B(n) in response to input voltages generated by page input circuit PI(n).
Other aspects and advantages of the present invention can be seen on review of the drawings, the detailed description and the claims, which follow.
A detailed description of embodiments of the present invention is provided with reference to the
The memory cells in the array of
The NOR style array can be implemented with many bit lines and word lines, storing thousands or millions of bits. The NOR style array of
In-memory computation circuits as described herein can use memory arrays of other styles in some examples, including for example AND style arrays or NAND style arrays.
In this example, the blocks B(n) in the plurality of blocks comprise corresponding YPASS circuits 301, 302, which combine the signals on the set S(n) of bit lines in the block into an output signal on a data line DL(n). The memory cells coupled to the bit lines in the set S(n) of bit lines store a coefficient vector W(n) represented by the threshold voltages of the memory cells in a row or rows selected by word line signals.
Also, the YPASS circuits include a plurality of bit line bias circuits (not shown in
Two blocks, B(0) and B(1), are shown in the illustrated embodiment. In general, there can be a plurality of blocks B(n), where “n” ranges from 0 to N−1, and N can be any integer greater than 1. In some embodiments, the number N can be 8 for example, or 16. In other embodiments, N can be much higher.
The memory 300 can comprise a single two-dimensional array, in which the blocks are arranged side-by-side in sequence. In other embodiments, the memory 300 can comprise a three-dimensional array, including a plurality of stacked two-dimensional arrays. In this case, each two-dimensional array in the stack can comprise one block. Blocks arranged in sequence can reside on sequential levels of the stack. In other arrangements, each two-dimensional array in the stack can comprise more than one block configured as illustrated.
A data bus system, represented by the input data block 320 and the input data block 321, is provided on the device. The data bus system can be used as an input/output interface for data from an external data source, or from other circuitry on the device that generates data for use as the input vectors in some configurations. The input data block 320 and the input data block 321 can comprise, for example, a data cache memory, a register or a latch coupled to input/output circuitry on the device or other data bus circuits.
The output signals on lines DL0 and DL1 from the YPASS circuits 301, 302 represent an in-memory computation result, combining the signals on the bit lines in the corresponding block B(0) or B(1), that are produced in response to weights stored in memory cells selected by a word line signal on a selected word line in the memory array, and to the input signals X[n*z+1:(n+1)*z] from lines 360, 361 for the corresponding block. Word line drivers 398 and decoders are included to provide the word line signals on selected word lines. In the simplified example shown in
In this illustration, the page input circuits (PI(0) and PI(1)) include respective input registers 314, 316 which store input data in the form of input vectors VI(n) received as input from data circuits on the device for the corresponding block B(n). The page input circuits can include circuits that convert the input vector VI(n), which has a number of bits equal to a multiple M of the number Z of bit lines in the block, to the input signals. Also, the page output circuits (PO(0) and PO(1)) include respective output registers 315, 317 which store output vectors VO(n) generated in response to an in-memory computation using memory cells in the corresponding block B(n) for output to the data circuits on the device. The output vectors VO(n) can have the same number of bits as the input vectors VI(n), or a different number of bits.
Data circuits on the device are configurable to interconnect the input register for a given page (e.g., PI(1)), singly or in any combination, with sources of input data, including the output register of a previous page (e.g., PO(0)), the output register of the same page (e.g., PO(1) as feedback) and the data bus, to source input data for a given sensing cycle. An input vector including the input data applied to the input register in a given sensing cycle can be sourced by a single source, or a combination of multiple sources. The data circuits are configurable to transfer an output vector VO(0) of page output circuit PO(0) as all or part of an input vector VI(1) to the next page input circuit PI(1) as represented by the line 340. Also, data circuits are configurable to feed back the output vector VO(0) of the page output circuit PO(0) as all or part of an input vector VI(0) for the page input circuit PI(0), as represented by line 350. In addition, the data circuits are configurable to connect the page input circuit PI(0) to the data bus system including input data block 320 as represented by line 330, to receive all or part of an input vector VI(0) from another source in the bus system.
Likewise, the data circuits on the device are configurable to transfer an output vector VO(1) of page output circuit PO(1) as all or part of an input vector VI(2) for the next page input circuit as represented by the line 341. Also, data circuits are configurable to feed back the output vector VO(1) of the page output circuit PO(1) as all or part of an input vector VI(1) for the page input circuit PI(1), as represented by line 351. In addition, the data circuits are configurable to connect the page input circuit PI(1) to the data bus system including input data block 321 as represented by line 331, to receive all or part of an input vector VI(1) from another source on the bus system.
Configuration circuits 399 are included on the device. The circuits 399 include logic, configuration parameters storage, or both. The configuration circuits 399 control the configuration of the data circuits for the routing of input vectors and output vectors among the page input and page output circuits. The configuration of the data circuits can be set dynamically for each sensing cycle as suits the needs of a particular implementation, using timed control signals delivered to switches in the data circuits. Alternatively, the configuration circuit can use volatile or nonvolatile configuration registers to set up the data circuits.
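The per-cycle source selection controlled by the configuration circuits can be sketched in Python as follows. This is a behavioral illustration under stated assumptions, not the circuit itself: the names SourceConfig and select_input_vector are hypothetical, and combining multiple sources by simple concatenation in a fixed order is an assumption, since the description only states that the sources can each supply all or part of the input vector.

```python
from dataclasses import dataclass


@dataclass
class SourceConfig:
    """Hypothetical routing setting for page input circuit PI(n) in one cycle."""
    use_prev_output: bool = False  # take data from VO(n-1)
    use_feedback: bool = False     # take data from VO(n)
    use_bus: bool = False          # take data from the data bus system


def select_input_vector(cfg, vo_prev, vo_same, bus_data):
    """Assemble VI(n) from the enabled sources (assumed: concatenation)."""
    parts = []
    if cfg.use_prev_output:
        parts.extend(vo_prev)
    if cfg.use_feedback:
        parts.extend(vo_same)
    if cfg.use_bus:
        parts.extend(bus_data)
    if not parts:
        raise ValueError("at least one source must be enabled")
    return parts


# One configuration per sensing cycle, as if set by timed control signals.
schedule = [
    SourceConfig(use_bus=True),
    SourceConfig(use_prev_output=True),
    SourceConfig(use_prev_output=True, use_feedback=True),
]
vo_prev, vo_same, bus = [1, 0], [0, 1], [1, 1]
vectors = [select_input_vector(c, vo_prev, vo_same, bus) for c in schedule]
```

A static setup using configuration registers would correspond to reusing one SourceConfig for every cycle instead of stepping through a schedule.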
Control circuits, not shown, are coupled to the circuit of
In this example, the blocks B(n) in the plurality of blocks comprise corresponding YPASS circuits 901, 902, which combine the signals on the set S(n) of bit lines in the block into an output signal on a data line DL(n). The memory cells coupled to the bit lines in the set S(n) of bit lines store a coefficient vector W(n) represented by the threshold voltages of the memory cells in a row or rows selected by one or more word line signals.
Also, the word line drivers include a plurality of bias circuits (not shown in
Two blocks, B(0) and B(1), are shown in the illustrated embodiment. In general, there can be a plurality of blocks B(n), where “n” ranges from 0 to N−1, and N can be any integer greater than 1. In some embodiments, the number N can be 8 for example, or 16. In other embodiments, N can be much higher.
The memory 900 can comprise a single two-dimensional array, in which the blocks are arranged side-by-side in sequence. In other embodiments, the memory 900 can comprise a three-dimensional array, including a plurality of stacked two-dimensional arrays. In this case, each two-dimensional array in the stack can comprise one block. Blocks arranged in sequence can reside on sequential levels of the stack. In other arrangements, each two-dimensional array in the stack can comprise more than one block configured as illustrated.
A data bus system, represented by the input data block 920 and the input data block 921, is provided on the device. The data bus system can be used as an input/output interface for data from an external data source, or from other circuitry on the device that generates data for use as the input vectors in some configurations. The input data block 920 and the input data block 921 can comprise, for example, a data cache memory, a register or a latch coupled to input/output circuitry on the device or other data bus circuits.
The output signals on lines DL0 and DL1 from the YPASS circuits 901, 902 represent an in-memory computation result, combining the signals on the bit lines in the corresponding block B(0) or B(1), that are produced in response to weights stored in memory cells selected by one or more word line signals on a selected word line in the memory array, and to the input signals X[n*z+1:(n+1)*z] from lines 960, 961 for the corresponding block. In the simplified example shown in
In this illustration, the page input circuits (PI(0) and PI(1)) include respective input registers 914, 916 which store input data in the form of input vectors VI(n) received as input from data circuits on the device for the corresponding block B(n). The page input circuits can include circuits that convert the input vector VI(n), which has a number of bits equal to a multiple M of the number Z of bit lines in the block, to the input signals. Also, the page output circuits (PO(0) and PO(1)) include respective output registers 915, 917 which store output vectors VO(n) generated in response to an in-memory computation using memory cells in the corresponding block B(n) for output to the data circuits on the device. The output vectors VO(n) can have the same number of bits as the input vectors VI(n), or a different number of bits.
Data circuits on the device are configurable to interconnect the input register for a given page (e.g., PI(1)), singly or in any combination, with sources of input data, including the output register of a previous page (e.g., PO(0)), the output register of the same page (e.g., PO(1) as feedback) and the data bus, to source input data for a given sensing cycle. An input vector including the input data applied to the input register in a given sensing cycle can be sourced by a single source, or a combination of multiple sources. The data circuits are configurable to transfer an output vector VO(0) of page output circuit PO(0) as all or part of an input vector VI(1) to the next page input circuit PI(1) as represented by the line 940. Also, data circuits are configurable to feed back the output vector VO(0) of the page output circuit PO(0) as all or part of an input vector VI(0) for the page input circuit PI(0), as represented by line 950. In addition, the data circuits are configurable to connect the page input circuit PI(0) to the data bus system including input data block 920 as represented by line 930, to receive all or part of an input vector VI(0) from another source in the bus system.
Likewise, the data circuits on the device are configurable to transfer an output vector VO(1) of page output circuit PO(1) as all or part of an input vector VI(2) for the next page input circuit as represented by the line 941. Also, data circuits are configurable to feed back the output vector VO(1) of the page output circuit PO(1) as all or part of an input vector VI(1) for the page input circuit PI(1), as represented by line 951. In addition, the data circuits are configurable to connect the page input circuit PI(1) to the data bus system including input data block 921 as represented by line 931, to receive all or part of an input vector VI(1) from another source on the bus system.
Configuration circuits 999 are included on the device. The circuits 999 include logic, configuration parameters storage, or both. The configuration circuits 999 control the configuration of the data circuits for the routing of input vectors and output vectors among the page input and page output circuits. The configuration of the data circuits can be set dynamically for each sensing cycle as suits the needs of a particular implementation, using timed control signals delivered to switches in the data circuits. Alternatively, the configuration circuit can use volatile or nonvolatile configuration registers to set up the data circuits.
Control circuits, not shown, are coupled to the circuit of
The YPASS circuit 401 includes a bit line bias circuit. In this embodiment, the bit line bias circuit includes a plurality of clamp transistors 490-493, each having one source/drain terminal coupled to a bit line in the block B(0) of the array 410, and another source/drain terminal coupled to a summing node on the data line DL0. The gates of the clamp transistors 490-493 are connected to corresponding input signals in the set of input signals X[1:z] provided by the page input circuit 419.
The YPASS circuit 402 includes a bit line bias circuit. In this embodiment, the bit line bias circuit includes a plurality of clamp transistors 494-497, each having one source/drain terminal coupled to a bit line in the block B(1) of the array 410, and another source/drain terminal coupled to a summing node on the data line DL1. The gates of the clamp transistors 494-497 are connected to corresponding input signals in the set of input signals X[z+1:2z] provided by the page input circuit 459.
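The partitioning of the input signals between blocks, with X[1:z] driving block B(0) and X[z+1:2z] driving block B(1), together with the summing node on each data line, can be illustrated with the following Python sketch. The function names and all numeric values are hypothetical. The per-bit-line contribution is modeled as a plain product of an input level and a weight, which is an idealizing assumption and not a model of the actual cell current.

```python
def block_inputs(x_all, n, z):
    """Return the z input signals for block B(n).

    The text indexes signals from 1, as X[n*z+1 : (n+1)*z] inclusive. With
    Python's 0-based, end-exclusive slices this is x_all[n*z : (n+1)*z].
    """
    if n < 0 or (n + 1) * z > len(x_all):
        raise IndexError("not enough input signals for this block")
    return x_all[n * z:(n + 1) * z]


def data_line_signal(inputs, weights):
    """Idealized summing node: add up one contribution per bit line.

    ASSUMPTION: each bit line contributes inputs[i] * weights[i].
    """
    if len(inputs) != len(weights):
        raise ValueError("need one input and one weight per bit line")
    return sum(x * w for x, w in zip(inputs, weights))


z = 4
x_all = [1, 0, 2, 3, 3, 1, 0, 2]  # hypothetical input levels X[1:2z]
w0 = [2, 5, 1, 1]                  # hypothetical weights on the selected row, B(0)
w1 = [1, 1, 4, 0]                  # hypothetical weights on the selected row, B(1)
dl0 = data_line_signal(block_inputs(x_all, 0, z), w0)
dl1 = data_line_signal(block_inputs(x_all, 1, z), w1)
```

With these values, DL0 sums to 7 and DL1 sums to 4.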
In the example shown in
In the example shown in
Likewise, the page output circuit 469 (PO(1)) includes an N-bit sense amplifier 470, or other type of multilevel sensing circuit or analog-to-digital converter. The output of the sense amplifier 470 is coupled to an output register 471 or other type of register. The output register 471 in this example is connected to a compute unit 472 which can accumulate or otherwise process data stored in the output register 471. In combination, the circuits in the page output circuit 469 convert the signal on the DL1 line into an output vector VO(n) for the computation based on a sum-of-signals generated by memory cells on a selected word line on the block B(n).
In this example, the cell current in each memory cell on the selected word line coupled to a bit line in the set of bit lines can be represented in one example memory system by the equation:
Data circuits on the device are configurable to transfer an output vector VO(0) of page output circuit 429 as an input vector VI(1) for the next page input circuit 459 as represented by the line 443. Also, data circuits are configurable to feed back the output vector VO(0) of the page output circuit 429 as an input vector VI(0) for the page input circuit 419 as represented by line 442. In addition, the data circuits are configurable to connect the page input circuit 419 to the data bus system 440 as represented by line 441, to receive an input vector VI(0) from another source on the bus system.
Likewise, data circuits on the device are configurable to transfer an output vector VO(1) of page output circuit 469 as an input vector VI(2) for the next page input circuit as represented by the line 483. Also, data circuits are configurable to feed back the output vector VO(1) of the page output circuit 469 as an input vector VI(1) for the page input circuit 459 as represented by line 482. In addition, the data circuits are configurable to connect the page input circuit 459 to the data bus system 480 as represented by line 481 to receive an input vector VI(1) from another source on the bus system.
In this example, the page input circuit 500 includes an input register 501 that comprises a plurality of latches 520-523.
Data circuits include a set of switches 510, 511, 512, 513 operable as a multiplexer in response to the signal SW. The switches 510, 511, 512, 513 connect and disconnect the register 501 to the previous page output circuit which provides output vector VO(n−1) on lines 508 of the data circuits for use as input vector VI(n). Also, the data circuits include a multiplexer 551 and input bus register 550. The multiplexer 551 is responsive to the signal SW_IN to connect and disconnect the input data from the input bus register 550 to deliver an input vector VI(n) to the input register 501. Also the data circuits include a bus 509 which is coupled to a multiplexer (see, MUX 642 in
In this example, for an embodiment in which block B(n) includes a number Z of bit lines, the input vector VI(n) includes Z chunks of M bits of data. Input register 501 applies the Z chunks of M bits of input vector VI(n) to conversion circuits 502 that provide the bias voltages X5 to X8 in the Figure to establish the drain level of the selected memory cells in the corresponding block B(n). In this example, the circuits 502 include one M bit digital-to-analog converter 530, 531, 532, 533 for each of the Z chunks in the input vector VI(n) to generate Z analog bias voltages for corresponding bit lines in the block B(n). In this example, M is 2 and Z is 4 with an input vector including 8 bits. In other embodiments, the input vector can include 16 bits, or any number of bits. Also, the number of chunks in the input vector used to generate a bias voltage for one of the corresponding bit lines in the set of bit lines can be determined by the number Z of bit lines in the set.
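The conversion of an input vector of Z chunks of M bits into Z analog bias voltages can be sketched in Python as follows, using the M = 2, Z = 4 case from the text. The linear transfer function, the voltage range of 0.0 to 0.75, the most-significant-bit-first ordering, and the function names are all assumptions made for illustration; the description does not specify them.

```python
def split_chunks(bits, m):
    """Split a flat list of bits into consecutive chunks of m bits."""
    if len(bits) % m != 0:
        raise ValueError("vector length must be a multiple of m")
    return [bits[i:i + m] for i in range(0, len(bits), m)]


def dac(chunk, v_min, v_max):
    """Ideal M-bit DAC (assumed linear): map the code onto [v_min, v_max].

    The chunk is read most significant bit first (an assumption).
    """
    code = int("".join(str(b) for b in chunk), 2)
    full_scale = (1 << len(chunk)) - 1
    return v_min + (v_max - v_min) * code / full_scale


vi = [0, 0, 0, 1, 1, 0, 1, 1]   # 8-bit input vector: Z = 4 chunks of M = 2 bits
chunks = split_chunks(vi, 2)
bias = [dac(c, 0.0, 0.75) for c in chunks]  # one bias voltage per bit line
```

Each of the four chunks drives one converter, so the number of bias voltages equals the number Z of bit lines in the block.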
The data line DL(n) is connected to a sense amplifier 601 or other type of sensing circuit or analog-to-digital converter. The sense amplifier 601 in this example includes an adjustable current source I1 which is connected to node 610, which is also connected to data line DL(n). A precharge transistor 612 is coupled to a sensing node SEN. Also a capacitor 611 is coupled to the sensing node SEN. Transistor M2 is connected in series between node 610 and the sensing node SEN. The sensing node SEN is connected to the gate of transistor M1. Transistor M1 is connected between ground and output node A. The output node A is coupled to a set of switches responsive to respective switch signals SWQ1 to SWQ8 in this example for an 8-bit output. The switches apply data signals Q1 to Q8 to corresponding latches in the output register 620. The output register 620 stores the output vector VO(n) for the stage. In operation, the adjustable current source I1 is operated in coordination with the switch signals SWQ1 to SWQ8 to sense a plurality of levels of the signal on DL(n) and store the resulting sensing results in the output register 620.
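The coordination between the adjustable current source and the switch signals can be pictured with the following Python sketch. It assumes, purely for illustration, that each step sets the current source to one reference level and latches a single comparison result, so that the eight latches end up holding a thermometer-style code. The description does not state the encoding, and the reference levels and the function name sense_levels are hypothetical.

```python
def sense_levels(dl_signal, reference_levels):
    """Compare the data line signal with one reference level per step.

    Step k models setting the adjustable current source I1 to
    reference_levels[k] while switch signal SWQ(k+1) routes the comparison
    result into latch Q(k+1). ASSUMPTION: one comparison per latch.
    """
    return [1 if dl_signal > ref else 0 for ref in reference_levels]


# Hypothetical, evenly spaced reference levels: 0.5, 1.5, ..., 7.5.
refs = [0.5 + k for k in range(8)]
q = sense_levels(3.2, refs)   # contents of latches Q1 to Q8
level = sum(q)                # number of references exceeded
```

Under this assumption a data line signal of 3.2 exceeds the first three references only, so Q1 to Q3 are set and the remaining latches are cleared.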
Data circuits are coupled to the output register 620 for transferring the output vector VO(n) on lines 630 to the page input circuit in the next stage as the input vector VI(n+1), or in feedback on lines 631 as the input vector VI(n) for the page input circuit PI(n) of the same stage through a multiplexer 642, which is controlled by the signal SW_FB. The signal SW_FB is provided by a configuration logic/store such as configuration circuits 399 and configuration circuits 999 shown in
The data line DL(n) is connected to a sense amplifier 820 or other type of sensing circuit or analog-to-digital converter. The sense amplifier 820 in this example includes an adjustable current source I1 which is connected to node 810, which is also connected to data line DL(n). A precharge transistor 812 is coupled to a sensing node SEN. Also a capacitor 811 is coupled to the sensing node SEN. Transistor M2 is connected in series between node 810 and the sensing node SEN. The sensing node SEN is connected to the gate of transistor M1. Transistor M1 is connected between ground and output node A. The output node A is coupled to a set of switches including two members in this example, responsive to respective switch signals SWQ1 to SWQ2 for a 2-bit output. The switches apply data signals Q1 to Q2 to corresponding latches (e.g. 821) in the output register 802. The output register 802 stores a chunk of output vector VO(n) for the stage. In operation, the adjustable current source I1 is operated in coordination with the switch signals SWQ1 to SWQ2 to sense a plurality of levels of the signal on DL(n) and store the resulting sensing results in the output register 802 in each sensing cycle. For example, in each sensing cycle, the sum of the signals from the set of memory cells on one selected word line in the selected set of bit lines can be converted to a 2-bit chunk.
The output of the output register 802 is applied to a layer interface block 803. In this example, the output vector VO(n) can be generated using bit lines on a plurality of layers of a 3D memory. In this example, the interface block 803 provides the chunks of data generated by the sense amplifier 800, which are combined with corresponding chunks generated by interface blocks on other layers of the 3D memory to form an output vector VO(n) having Z chunks of M bits.
In this example, the page output circuit PO(n) is described as comprising chunk-wide portions on a plurality of layers of a 3D memory. The page input circuit PI(n) can be configured in a similar way for connection to a block B(n) coupled to memory cells on the plurality of layers of the 3D memory.
In other embodiments, the interface block 803 can be used to accumulate additional chunks of the output vector in a sequence of sensing cycles using memory cells on a set of word lines in a single layer or single 2D array.
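The assembly of an output vector from chunks, whether the chunks come from different layers of a 3D memory or from a sequence of sensing cycles, can be sketched in Python as follows. The function name assemble_output_vector, the chunk ordering, and the bit values are illustrative assumptions.

```python
def assemble_output_vector(chunks, z, m):
    """Concatenate Z chunks of M bits into one flat output vector VO(n)."""
    if len(chunks) != z:
        raise ValueError(f"expected {z} chunks, got {len(chunks)}")
    vector = []
    for chunk in chunks:
        if len(chunk) != m:
            raise ValueError(f"each chunk must have {m} bits")
        vector.extend(chunk)
    return vector


# Hypothetical 2-bit chunks, one per layer or one per sensing cycle.
collected = [[0, 1], [1, 1], [0, 0], [1, 0]]
vo = assemble_output_vector(collected, z=4, m=2)
```

The same routine covers both cases because only the origin of each chunk differs, not the way the chunks are combined.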
Data circuits are coupled to the interface block 803 for transferring the output vector VO(n) on lines 830 to the page input circuit in the next stage as the input vector VI(n+1), or in feedback on lines 831 as the input vector VI(n) for the page input circuit PI(n) of the same stage through a multiplexer 842, which is controlled by the signal SW_FB. The signal SW_FB is provided by a configuration circuit such as configuration circuits 399 and configuration circuits 999 shown in
An in-memory computing structure is described in which a sum-of-products result latched in a page output circuit can be fed back as an input vector for the page input circuit in the same block, or passed as an input vector to the page input circuit in the next page. Also, the page input circuit in each block can be coupled to a bus system that provides input vectors from other sources on the bus. Thus, the input data can be latched from outside the in-memory computing device, from a feedback loop of the previous state output from the same block, or from the output data in a previous block. With this scheme, a large bandwidth with very little routing capacitance can be achieved. The connection between two blocks can be very short, so that the previous layer or previous page output can be delivered to the next layer, or next block, quickly and with low power. The structure can be applied to artificial intelligence processors used for both training and inference steps in deep learning.
While the present invention is disclosed by reference to the preferred embodiments and examples detailed above, it is to be understood that these examples are intended in an illustrative rather than in a limiting sense. It is contemplated that modifications and combinations will readily occur to those skilled in the art, which modifications and combinations will be within the spirit of the invention and the scope of the following claims.
Claims
1. An in-memory computation device, comprising:
- a memory including a plurality of blocks B(n), where n ranges from 0 to N−1;
- a plurality of page output circuits PO(n), where n ranges from 0 to N−1, connected to the memory;
- a plurality of page input circuits PI(n), where n ranges from 0 to N−1, connected to the memory and responsive to a page input data to apply a bias to the memory in a sensing cycle; the plurality of page output circuits, the plurality of page input circuits and the plurality of blocks being operatively coupled, so that a page output circuit PO(n) in the plurality of page output circuits and a page input circuit PI(n) in the plurality of page input circuits being operatively coupled to a block B(n) in the plurality of blocks, where n ranges from 0 to N−1;
- a data bus system; and
- data circuits configurable to connect the page input circuit PI(n) to one or more of the page output circuit PO(n), the page output circuit PO(n−1), and the data bus system to select a source for the page input data in the sensing cycle.
2. The device of claim 1, including word line drivers coupled to the plurality of blocks in the memory to apply signals on word lines in the blocks, and wherein the page output circuit PO(n) in the plurality of page output circuits includes circuits that generate an output indicating a sum-of-signals on a set of bit lines in block B(n), responsive to a signal on a selected word line.
3. The device of claim 2, wherein the page input circuit PI(n) includes bias circuitry responsive to the input data for biasing bit lines in the block B(n), wherein the signals on the bit lines represent a product of the input data and thresholds of memory cells coupled to the selected word line.
4. The device of claim 1, wherein the page input circuit PI(n) includes bias circuitry responsive to the input data for biasing bit lines in the block B(n).
5. The device of claim 1, wherein the page input circuit PI(n) includes an input register connected to the data circuits, to store an input vector VI(n) including the page input data, and circuits to generate bias signals for bit lines in the block B(n) in response to the input vector VI(n).
6. The device of claim 5, wherein the block B(n) includes a number Z of bit lines, the input vector VI(n) includes Z chunks of M bits, and the page input circuit PI(n) includes digital-to-analog conversion circuits to convert chunks of M bits in the input vector VI(n) to Z analog bias voltages for corresponding bit lines in the block B(n).
7. The device of claim 6, wherein the memory cells in block B(n) on a selected word line store coefficient vector W(n), and the signals on the bit lines in the block B(n) represent a product of chunks in the input vector VI(n) and coefficients in the coefficient vector W(n) for the selected word line.
8. The device of claim 1, wherein the page input circuit PI(n) includes an input register connected to the data circuits, to store an input vector VI(n) including the page input data, and circuits to generate bias signals for bit lines in the block B(n) in response to the input vector VI(n), and page output circuit PO(n) in the plurality of page output circuits generates an output vector VO(n).
9. The device of claim 8, wherein the input vector VI(n) includes Z chunks of M bits, and the page output circuit PO(n) in the plurality of page output circuits includes sensing circuits to generate an output including one or more chunks of M bits indicating a sum-of-signals on the bit lines in block B(n) in response to a word line signal on a selected word line, and an output register to store all or part of output vector VO(n) including the Z chunks of M bits.
10. The device of claim 1, wherein the data circuits transfer all or part of output vector VO(n), including Z chunks of M bits as an input vector VI(n+1) to a page input circuit PI(n+1).
11. The device of claim 1, wherein the memory comprises nonvolatile memory cells.
12. The device of claim 11, wherein the nonvolatile memory cells are charge trapping memory cells.
13. The device of claim 1, including word line drivers coupled to the plurality of blocks in the memory to apply signals on word lines in the blocks, and wherein the page output circuit PO(n) in the plurality of page output circuits includes circuits that generate an output indicating a sum-of-signals on one or more bit lines in block B(n), responsive to a signal or signals on one or more selected word lines; and wherein
- page input circuit PI(n) includes bias circuitry responsive to the input data for biasing word lines in the block B(n), wherein the signals on the bit lines represent a product of the input data and thresholds of memory cells coupled to the one or more selected word lines.
14. An in-memory computation device, comprising:
- a memory including a plurality of blocks B(n), where n ranges from 0 to N−1;
- a plurality of page output circuits connected to the memory, a page output circuit PO(n) in the plurality of page output circuits generating a page output vector VO(n), and being operatively coupled to a block B(n) in the plurality of blocks;
- a plurality of page input circuits connected to the memory, a page input circuit PI(n) in the plurality of page input circuits receiving a page input vector VI(n), and generating input voltages in response to the page input vector VI(n), and being operatively coupled to the block B(n);
- a plurality of bit line bias circuits connected to the plurality of blocks, a bit line bias circuit Y(n) in the plurality of bit line bias circuits being operatively coupled to the block B(n) in the plurality of blocks, and to the page input circuit PI(n) in the plurality of page input circuits, and biasing the bit lines in the block B(n) in response to input voltages generated by the page input circuit PI(n);
- a data bus system; and
- data circuits configurable to connect the page input circuit PI(n) to one or more of the page output circuit PO(n), a page output circuit PO(n−1), and the data bus system to select a source for the page input vector VI(n) in a sensing cycle.
15. The device of claim 14, including word line drivers coupled to a plurality of word lines in the memory to apply signals on selected word lines in the plurality of blocks, and wherein the page output circuit PO(n) in the plurality of page output circuits includes circuits that generate page output vector VO(n) indicating a sum-of-signals on the bit lines in block B(n), responsive to a signal on a selected word line.
16. The device of claim 15, wherein the signals on the bit lines in the block B(n) represent a product of the page input vector VI(n) and thresholds of memory cells coupled to the selected word line.
17. The device of claim 16, wherein the memory comprises nonvolatile memory cells.
18. The device of claim 17, wherein the bit line bias circuit Y(n) in the plurality of bit line bias circuits comprises a bit line clamp transistor having a gate terminal connected to receive corresponding input voltages generated by the page input circuit PI(n).
19. The device of claim 14, wherein the page input vector VI(n) includes a plurality of multi-bit chunks of data, each chunk in the plurality corresponding to one bit line in the block B(n).
20. An in-memory computation device, comprising:
- a memory including a plurality of blocks B(n), where n ranges from 0 to N−1;
- a plurality of page output circuits connected to the memory, a page output circuit PO(n) in the plurality of page output circuits generating a page output vector VO(n), and being operatively coupled to a block B(n) in the plurality of blocks;
- a plurality of page input circuits connected to the memory, a page input circuit PI(n) in the plurality of page input circuits receiving a page input vector VI(n), and generating input voltages in response to the page input vector VI(n), and being operatively coupled to the block B(n);
- a plurality of word line drivers connected to the plurality of blocks, a word line driver WD(n) in the plurality of word line drivers being operatively coupled to the block B(n) in the plurality of blocks, and to the page input circuit PI(n) in the plurality of page input circuits, and biasing the word lines in block B(n) in response to input voltages generated by the page input circuit PI(n);
- a data bus system; and
- data circuits configurable to connect page input circuit PI(n) to one or more of page output circuit PO(n), page output circuit PO(n−1), and the data bus system to select a source for the page input vector VI(n) in a sensing cycle.
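The data flow recited in the claims above can be illustrated with a minimal behavioral sketch in Python. It is not the claimed circuitry, and it rests on several assumptions that do not come from the claims: the names `Source`, `select_input`, and `sense_block` are hypothetical, analog bias voltages and cell thresholds are modeled as plain integers, each selected word line is assumed to contribute one output chunk, and the output is assumed to be clamped to the M-bit range.

```python
from enum import Enum


class Source(Enum):
    """Hypothetical labels for the selectable sources of VI(n)."""
    BUS = "data bus system"
    SAME_PAGE = "PO(n)"      # intra-page feedback
    PREV_PAGE = "PO(n-1)"    # inter-page transfer


def select_input(n, source, bus_data, page_outputs):
    """Model the data circuits: pick the source of input vector VI(n)."""
    if source is Source.BUS:
        return bus_data
    if source is Source.SAME_PAGE:
        return page_outputs[n]
    if n == 0:
        raise ValueError("block 0 has no preceding page output in this model")
    return page_outputs[n - 1]


def sense_block(weights, vi, m_bits):
    """Model one block B(n) over a series of sensing cycles.

    weights: one row per selected word line, each row holding Z coefficients.
    vi:      input vector VI(n) of Z integer chunks (one per bit line).
    Returns an output vector with one chunk per selected word line.
    """
    max_code = (1 << m_bits) - 1
    vo = []
    for row in weights:
        # Each bit line carries input * coefficient; the sum models sum-of-signals.
        total = sum(x * w for x, w in zip(vi, row))
        # Assumed saturation of the sensed value to an M-bit chunk.
        vo.append(min(max(total, 0), max_code))
    return vo


# Two blocks, Z = 2 bit lines, M = 4 bits (illustrative values only).
M_BITS = 4
page_outputs = [None, None]

vi0 = select_input(0, Source.BUS, [3, 2], page_outputs)       # VI(0) from the bus
page_outputs[0] = sense_block([[1, 1], [2, 0]], vi0, M_BITS)  # VO(0)

vi1 = select_input(1, Source.PREV_PAGE, None, page_outputs)   # VI(1) = VO(0)
page_outputs[1] = sense_block([[1, 0], [1, 1]], vi1, M_BITS)  # VO(1)
```

In this sketch block 0 takes its input from the data bus system and block 1 takes VO(0) as VI(1), which corresponds to the inter-page path from PO(n−1) to PI(n). Choosing `Source.SAME_PAGE` instead would model the intra-page path from PO(n) back to PI(n).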
Type: Application
Filed: Mar 8, 2019
Publication Date: Sep 10, 2020
Applicant: MACRONIX INTERNATIONAL CO., LTD. (HSINCHU)
Inventors: Chun-Hsiung HUNG (HSINCHU), Shang-Chi YANG (Changhua)
Application Number: 16/297,504