Control Of Overpolishing Of Multiple Substrates On the Same Platen In Chemical Mechanical Polishing

A polishing method includes simultaneously polishing a first substrate and a second substrate on the same polishing pad, storing a default overpolishing time, determining first and second polishing endpoint times of the first and second substrates with an in-situ monitoring system, determining a difference between the first and second polishing endpoint times, and determining whether the difference exceeds a threshold. If the difference is less than the threshold, then an overpolishing stop time is calculated and polishing of the first substrate and the second substrate is halted simultaneously at the overpolishing stop time. If the difference is greater than the threshold, then first and second overpolishing stop times that equal the first and second endpoint times plus the default overpolishing time are calculated, and polishing of the first and second substrates is halted at the first and second overpolishing stop times, respectively.

Description
TECHNICAL FIELD

The present disclosure relates generally to monitoring and control of multiple substrates during chemical mechanical polishing.

BACKGROUND

An integrated circuit is typically formed on a substrate by the sequential deposition of conductive, semiconductive, or insulative layers on a silicon wafer. One fabrication step involves depositing a filler layer over a non-planar surface and planarizing the filler layer. For certain applications, the filler layer is planarized until the top surface of a patterned layer is exposed. A conductive filler layer, for example, can be deposited on a patterned insulative layer to fill the trenches or holes in the insulative layer. After planarization, the portions of the conductive layer remaining between the raised pattern of the insulative layer form vias, plugs, and lines that provide conductive paths between thin film circuits on the substrate. For other applications, such as oxide polishing, the filler layer is planarized until a predetermined thickness is left over the non-planar surface. In addition, planarization of the substrate surface is usually required for photolithography.

Chemical mechanical polishing (CMP) is one accepted method of planarization. This planarization method typically requires that the substrate be mounted on a carrier head. The exposed surface of the substrate is typically placed against a rotating polishing pad with a durable roughened surface. The carrier head provides a controllable load on the substrate to push it against the polishing pad. A polishing liquid, such as a slurry with abrasive particles, is typically supplied to the surface of the polishing pad.

One problem in CMP is using an appropriate polishing rate to achieve a desirable profile, e.g., a substrate layer that has been planarized to a desired flatness or thickness, or a desired amount of material has been removed. Variations in the initial thickness of a substrate layer, the slurry composition, the polishing pad condition, the relative speed between the polishing pad and a substrate, and the load on a substrate can cause variations in the material removal rate across a substrate, and from substrate to substrate. These variations cause variations in the time needed to reach the polishing endpoint and in the amount of material removed. Therefore, determining the polishing endpoint merely as a function of the polishing time may lead to overpolishing or underpolishing, and it may not be possible to achieve a desired profile merely by applying a constant pressure.

In some systems, a substrate is optically monitored in-situ during polishing, e.g., through a window in the polishing pad. Some optical monitoring systems detect a “polishing endpoint”, after which they continue polishing for a preset overpolishing time. For example, in copper polishing, the optical monitoring system can detect exposure of the underlying layer, and overpolishing can be used to ensure complete removal of any copper residue. However, existing overpolishing and optical monitoring techniques may not satisfy increasing demands of semiconductor device manufacturers.

SUMMARY

In one aspect, a polishing method includes simultaneously polishing a first substrate and a second substrate on the same polishing pad, storing a default overpolishing time, monitoring the first substrate and the second substrate during polishing with an in-situ monitoring system, determining a first polishing endpoint time of the first substrate with the in-situ monitoring system, determining a second polishing endpoint time of the second substrate with the in-situ monitoring system, determining a difference between the first polishing endpoint time and the second polishing endpoint time, and determining whether the difference exceeds a threshold. If the difference is less than the threshold, then an overpolishing stop time is calculated and polishing of the first substrate and the second substrate is halted simultaneously at the overpolishing stop time. If the difference is greater than the threshold, then a first overpolishing stop time that equals the first endpoint time plus the default overpolishing time is calculated and a second overpolishing stop time that equals the second endpoint time plus the default overpolishing time is calculated, and polishing of the first substrate is halted at the first overpolishing stop time and polishing of the second substrate is halted at the second overpolishing stop time.

Implementations can include one or more of the following features. Calculating the overpolishing stop time may include calculating an average of the first polishing endpoint time and the second polishing endpoint time. Calculating the overpolishing stop time may include adding the default overpolishing time to the average. The default overpolishing time may be between five and twenty seconds. The default overpolishing time may be between ten and fifteen seconds. The threshold may be between two and six seconds.

Determining the first polishing endpoint time may include storing a first target value for the first substrate, generating a first sequence of values for the first substrate with the in-situ monitoring system, fitting a first function to the first sequence of values, and determining the first polishing endpoint time by calculating a projected time at which the first substrate will reach the first target value based on the first function. Determining the second polishing endpoint time may include storing a second target value for the second substrate, generating a second sequence of values for the second substrate with the in-situ monitoring system, fitting a second function to the second sequence of values, and determining the second polishing endpoint time by calculating a projected time at which the second substrate will reach the second target value based on the second function. The first function and the second function may be linear functions.

The in-situ monitoring system may include a spectrometric optical monitoring system. Generating the first sequence of values may include measuring a first sequence of spectra from the first substrate during polishing with the optical monitoring system, for each measured spectrum in the first sequence of spectra for the first substrate, determining a best matching reference spectrum from one or more libraries of reference spectra, and for each best matching reference spectrum for the first substrate, determining an index value to generate a sequence of first index values. Generating the second sequence of values may include measuring a second sequence of spectra from the second substrate during polishing with the optical monitoring system, for each measured spectrum in the second sequence of spectra for the second substrate, determining a best matching reference spectrum from the one or more libraries of reference spectra, and for each best matching reference spectrum for the second substrate, determining an index value to generate a sequence of second index values.

The in-situ monitoring system may include an eddy current monitoring system. The first sequence of values and the second sequence of values may be eddy current signal values. Determining the first polishing endpoint time may include detecting clearance of a first overlying layer from a first underlying layer on the first substrate. Detecting clearance of a first overlying layer may include detecting a sudden change in a signal from the in-situ monitoring system. The first substrate and the second substrate may be removed from the polishing pad simultaneously. The polishing pad may be rinsed after removing the first substrate and the second substrate. The default overpolishing time may include a first default overpolishing time for the first substrate and a second default overpolishing time for the second substrate.

In other aspects, polishing systems and computer-program products tangibly embodied on a computer readable medium are provided to carry out these methods.

Certain implementations may have one or more of the following advantages. A good balance can be struck between avoiding defects and having the substrates be uniformly polished. By having the substrates on the same platen reach endpoint at approximately the same time, defects can be avoided, such as scratches caused by rinsing a substrate with water too early or corrosion caused by failing to rinse a substrate in a timely manner. Equalizing polishing times across multiple substrates can also improve throughput. On the other hand, by permitting substrates to be polished for different amounts of time if the difference between their endpoint times exceeds a threshold, significant variations in polishing can be avoided and wafer-to-wafer polishing uniformity can be increased.

The details of one or more embodiments are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a schematic cross-sectional view of an example of a polishing apparatus having two polishing heads.

FIG. 2 illustrates a schematic top view of a substrate having multiple zones.

FIG. 3A illustrates a top view of a polishing pad and shows locations where in-situ measurements are taken on a first substrate.

FIG. 3B illustrates a top view of a polishing pad and shows locations where in-situ measurements are taken on a second substrate.

FIG. 4 illustrates a trace.

FIG. 5 illustrates a plurality of traces for different substrates.

FIG. 6 illustrates a calculation of a desired slope for a substrate based on a time that a function fit to a trace reaches a target value.

FIG. 7 illustrates a calculation of times that a plurality of substrates reach a target value.

FIG. 8 is a flow diagram of an example process for adjusting the polishing rate of a plurality of substrates such that the plurality of substrates have approximately the same thickness at the target time.

FIG. 9 is a flow diagram of an example process for calculating overpolishing times.

FIG. 10 illustrates time vs. endpoint monitor signal curves of two substrates polished simultaneously on the same platen.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

Where multiple substrates are being polished simultaneously, e.g., on the same polishing pad, polishing rate variations between the substrates can lead to the substrates reaching their target thickness at different times. By determining a polishing rate for each substrate from in-situ measurements, a projected endpoint time for a target thickness or a projected thickness for a target endpoint time can be determined for each substrate, and the polishing rate for at least one substrate can be adjusted so that the substrates achieve closer endpoint conditions. By “closer endpoint conditions,” it is meant that the substrates would reach their target thickness closer to the same time than without such adjustment, or if the substrates halt polishing at the same time, that the substrates would have closer to the same thickness than without such adjustment.

Nevertheless, even if the polishing rate for one of the substrates is adjusted based on in-situ measurements, variations in when the substrates reach their target thickness can still occur. On the one hand, if polishing is halted simultaneously for the substrates, then some will not be at the desired thickness. On the other hand, if polishing for the substrates is stopped at different times, then some substrates may have defects and the polishing apparatus may be operating at lower throughput.

A technique for controlling overpolishing is to determine whether the difference between the respective times that the substrates will reach their polishing endpoints exceeds a threshold. If the time difference is below the threshold, then the polishing of the substrates can be halted simultaneously. On the other hand, if the time difference is above the threshold, then the polishing of each substrate can be halted at a different time that depends on the time that a polishing endpoint condition is detected.

FIG. 1 illustrates an example of a polishing apparatus 100 for polishing one or more substrates 10. The substrate can be, for example, a product substrate (e.g., which includes multiple memory or processor dies), a test substrate, a bare substrate, or a gating substrate. The substrate can be at various stages of integrated circuit fabrication, e.g., the substrate can be a bare wafer, or it can include one or more deposited and/or patterned layers. The term substrate can include circular disks and rectangular sheets.

The polishing apparatus 100 includes a rotatable disk-shaped platen 120 on which a polishing pad 110 is situated. The platen is operable to rotate about an axis 125. For example, a motor 121 can turn a drive shaft 124 to rotate the platen 120. The polishing pad 110 can be detachably secured to the platen 120, for example, by a layer of adhesive. The polishing pad 110 can be a two-layer polishing pad with an outer polishing layer 112 and a softer backing layer 114. The polishing apparatus 100 can include a combined slurry/rinse arm 130. During polishing, the arm 130 is operable to dispense a polishing liquid 132, such as a slurry, onto the polishing pad 110. While only one slurry/rinse arm 130 is shown, additional nozzles, such as one or more dedicated slurry arms per carrier head, can be used. The polishing apparatus can also include a polishing pad conditioner to abrade the polishing pad 110 to maintain the polishing pad 110 in a consistent abrasive state.

In this embodiment, the polishing apparatus 100 includes two (or more) carrier heads 140. Each carrier head 140 is operable to hold a substrate 10 (e.g., a first substrate 10a at a first carrier head 140a and a second substrate 10b at a second carrier head 140b) against the polishing pad 110, i.e., the same polishing pad. Each carrier head 140 can have independent control of the polishing parameters, for example pressure, associated with each respective substrate.

In particular, each carrier head 140 can include a retaining ring 142 to retain the substrate 10 below a flexible membrane 144. Each carrier head 140 also includes a plurality of independently controllable pressurizable chambers defined by the membrane, e.g., 3 chambers 146a-146c, which can apply independently controllable pressures to associated zones 148a-148c on the flexible membrane 144 and thus on the substrate 10 (see FIG. 2). Referring to FIG. 2, the center zone 148a can be substantially circular, and the remaining zones 148b-148c can be concentric annular zones around the center zone 148a. Although only three chambers are illustrated in FIGS. 1 and 2 for ease of illustration, there could be two chambers, or four or more chambers, e.g., five chambers.

Returning to FIG. 1, each carrier head 140 is suspended from a support structure 150, e.g., a carousel or track, and is connected by a drive shaft 152 to a carrier head rotation motor 154 so that the carrier head can rotate about an axis 155. Optionally, each carrier head 140 can oscillate laterally, e.g., on sliders on the carousel 150, or by rotational oscillation of the carousel itself. In operation, the platen is rotated about its central axis 125, and each carrier head is rotated about its central axis 155 and translated laterally across the top surface of the polishing pad.

While only two carrier heads 140 are shown, more carrier heads can be provided to hold additional substrates so that the surface area of polishing pad 110 may be used efficiently. Thus, the number of carrier head assemblies adapted to hold substrates for a simultaneous polishing process can be based, at least in part, on the surface area of the polishing pad 110.

The polishing apparatus also includes an in-situ monitoring system 160, which can be used to detect a polishing endpoint, or to determine whether and by how much to adjust a polishing rate, as discussed below. For each substrate, the in-situ monitoring system generates a time-varying sequence of values that depends on the thickness of a layer on that substrate.

For example, the in-situ monitoring system 160 can be an optical monitoring system. In particular, the in-situ monitoring system 160 can be an optical monitoring system that measures a sequence of spectra of light reflected from a substrate during polishing. One monitoring technique is, for each measured spectrum, to identify a matching reference spectrum from a library of reference spectra. Each reference spectrum in the library can have an associated characterizing value, e.g., a thickness value or an index value indicating the time or number of platen rotations at which the reference spectrum is expected to occur. By determining the associated characterizing value for each matching reference spectrum, a time-varying sequence of characterizing values can be generated. This technique is described in U.S. Patent Publication No. 2010-0217430, which is incorporated by reference. Another monitoring technique is to track a characteristic of a spectral feature from the measured spectra, e.g., a wavelength or width of a peak or valley in the measured spectra. The wavelength or width values of the feature from the measured spectra provide the time-varying sequence of values. This technique is described in U.S. Patent Publication No. 2011-0256805, which is incorporated by reference. Another monitoring technique is to fit an optical model to each measured spectrum from the sequence of measured spectra. In particular, a parameter of the optical model is optimized to provide the best fit of the model to the measured spectrum. The parameter values generated for the measured spectra form a time-varying sequence of parameter values. This technique is described in U.S. Patent Application No. 61/608,284, filed Mar. 8, 2012, which is incorporated by reference. Another monitoring technique is to perform a Fourier transform of each measured spectrum to generate a sequence of transformed spectra. A position of one of the peaks in each transformed spectrum is measured, and the position values generated for the measured spectra form a time-varying sequence of position values. This technique is described in U.S. patent application Ser. No. 13/454,002, filed Apr. 23, 2012, which is incorporated by reference.
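As an illustration of the library-matching technique, the following Python sketch matches each measured spectrum to the closest reference spectrum and emits that reference's index value, producing a time-varying sequence of index values. It is a minimal sketch under assumptions not stated in the text: spectra are equal-length lists of intensities sampled at the same wavelengths, each library entry is an (index value, reference spectrum) pair, and the best match is taken to be the reference with the smallest sum of squared differences. The function names are hypothetical.

```python
from typing import Sequence

# Assumed library layout for this sketch: (index_value, reference_spectrum).
LibraryEntry = tuple[float, Sequence[float]]


def sum_squared_difference(measured: Sequence[float], reference: Sequence[float]) -> float:
    """Matching metric assumed here: sum of squared intensity differences."""
    if len(measured) != len(reference):
        raise ValueError("spectra must be sampled at the same wavelengths")
    return sum((m - r) ** 2 for m, r in zip(measured, reference))


def best_match_index(measured: Sequence[float], library: Sequence[LibraryEntry]) -> float:
    """Return the index value of the best matching reference spectrum."""
    index_value, _ = min(library, key=lambda entry: sum_squared_difference(measured, entry[1]))
    return index_value


def index_trace(spectra: Sequence[Sequence[float]],
                library: Sequence[LibraryEntry]) -> list[float]:
    """Turn a sequence of measured spectra into a sequence of index values."""
    return [best_match_index(spectrum, library) for spectrum in spectra]
```

With a hypothetical three-entry library [(0.0, [1.0, 0.8, 0.6]), (1.0, [0.9, 0.9, 0.7]), (2.0, [0.7, 1.0, 0.9])], the measured spectra [1.0, 0.8, 0.61] and [0.72, 0.98, 0.9] map to the index values 0.0 and 2.0.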

Other examples of the in-situ monitoring system 160 include eddy current monitoring systems, capacitive measurement systems, and slurry chemistry monitoring systems. Eddy current monitoring systems are described in U.S. Pat. No. 6,924,641 and U.S. Pat. No. 7,112,960, each of which is incorporated by reference.

The in-situ monitoring system 160 includes a sensor 162 that is supported by and rotates with the platen 120. In this case, the motion of the platen will cause the sensor 162 to scan across each substrate.

As shown in FIG. 3A, due to the rotation of the platen (shown by arrow 204), as the sensor 162 travels below the first carrier head 140a, the sensor 162 can make measurements at positions 201 below the first substrate 10a. This permits the monitoring system 160 to generate a signal with a value that depends on the thickness of the layer of the substrate 10a. Similarly, as shown in FIG. 3B, due to the rotation of the platen, as the sensor 162 travels below the second carrier head, the sensor 162 can make measurements at positions 202 below the second substrate 10b. This permits the monitoring system 160 to generate a signal with a value that depends on the thickness of the layer of the substrate 10b.

Thus, for any given rotation of the platen, based on timing, motor encoder and/or platen position sensor information, the controller 190 can determine which substrate, e.g., substrate 10a or 10b, is the source of the signal. Over multiple rotations of the platen, for each substrate, a sequence of values can be obtained over time.
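A minimal sketch of this attribution step is shown below, assuming (hypothetically) that the controller reads the platen angle from a motor encoder and that each carrier head occupies a fixed angular window along the sensor's path. The window values, the substrate labels used as keys, and the name `source_substrate` are illustrative only; actual windows would depend on the head positions and on any lateral oscillation of the carrier heads.

```python
from typing import Optional

# Hypothetical angular windows (degrees) in which sensor 162 lies below each substrate.
# This sketch assumes no window wraps through 0 degrees.
HEAD_WINDOWS: dict[str, tuple[float, float]] = {
    "10a": (20.0, 70.0),
    "10b": (200.0, 250.0),
}


def source_substrate(platen_angle_deg: float,
                     windows: dict[str, tuple[float, float]] = HEAD_WINDOWS) -> Optional[str]:
    """Return which substrate the sensor is below, or None if it is between heads."""
    angle = platen_angle_deg % 360.0  # normalize encoder readings to [0, 360)
    for substrate, (start, end) in windows.items():
        if start <= angle <= end:
            return substrate
    return None
```

For example, readings at 45 degrees and 405 degrees (one full rotation later) are both attributed to substrate 10a, while a reading at 120 degrees falls between the heads and is discarded.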

Referring to FIG. 4, which illustrates the results for only a single substrate, a time-varying sequence 210 of values 212 is illustrated. This sequence of values can be termed a trace. In general, for a polishing system with a rotating platen, the trace 210 can include one, e.g., exactly one, value per sweep of the sensor below the substrate. If multiple zones on a substrate are being monitored, then there can be one value per zone per sweep. Multiple measurements below the substrate can be combined to generate a single value that is used for control of the endpoint and/or pressure. However, it is also possible for more than one value to be generated per sweep of the sensor 162.
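The sketch below shows one way to reduce several measurements taken during a single sweep to one value per sweep for a given substrate. The text says only that the measurements "can be combined"; keying sweeps by a platen rotation number and combining them with the arithmetic mean are assumptions of this sketch, and `per_sweep_trace` is a hypothetical name.

```python
from collections import defaultdict
from statistics import mean


def per_sweep_trace(samples: list[tuple[int, float]]) -> list[tuple[int, float]]:
    """Combine (rotation_number, value) samples for one substrate into one value per sweep.

    The combining rule (arithmetic mean) is an assumption of this sketch.
    """
    by_sweep: dict[int, list[float]] = defaultdict(list)
    for rotation, value in samples:
        by_sweep[rotation].append(value)
    # One trace point per sweep, in rotation order.
    return [(rotation, mean(values)) for rotation, values in sorted(by_sweep.items())]
```

Other combining rules, such as a median that discards measurements near the substrate edge, would fit the same structure.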

Referring to FIG. 5, a plurality of traces 210, 220 is illustrated. As discussed above, a trace can be generated for each substrate. For example, a first sequence 210 of values 212 (shown by hollow circles) can be generated for the first substrate 10a, and a second sequence 220 of values 222 (shown by solid circles) can be generated for the second substrate 10b.

As shown in FIG. 5, for each trace 210, 220, a polynomial function of known order, e.g., a first-order function (e.g., a line), is fit to the sequence of values for the associated substrate, e.g., using robust line fitting. For example, a first line 214 can be fit to the values 212 for the first substrate, and a second line 224 can be fit to the values 222 of the second substrate. Fitting of a line to the sequence of values can include calculation of the slope S of the line and an x-axis intersection time T at which the line crosses a starting value, e.g., 0.
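The text calls for "robust line fitting" without naming a method. As one possible choice, the sketch below swaps in the Theil-Sen estimator, which takes the median of the slopes between all pairs of samples and is therefore insensitive to a few outlying values. The line is written as value = S·(t − T), so T is the time at which the line crosses the starting value 0. The function name `robust_fit` is hypothetical.

```python
from itertools import combinations
from statistics import median


def robust_fit(times: list[float], values: list[float]) -> tuple[float, float]:
    """Fit value = S * (t - T) with the Theil-Sen estimator and return (S, T)."""
    pairs = list(zip(times, values))
    # Slope estimate: median of the slopes between every pair of samples.
    slopes = [(v2 - v1) / (t2 - t1)
              for (t1, v1), (t2, v2) in combinations(pairs, 2) if t2 != t1]
    if not slopes:
        raise ValueError("need at least two samples at distinct times")
    s = median(slopes)
    if s == 0:
        raise ValueError("flat trace: the line never crosses the starting value")
    # Intercept estimate: median of the per-sample intercepts v - s * t.
    intercept = median(v - s * t for t, v in pairs)
    # The line s * t + intercept equals 0 at t = -intercept / s.
    return s, -intercept / s
```

For the hypothetical trace times [0, 1, 2, 3, 4, 5] and values [-2, 0, 2, 40, 6, 8], the spike at t = 3 does not disturb the result: the fit returns S = 2.0 and T = 1.0, whereas an ordinary least-squares fit would be pulled toward the spike.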

Referring to FIG. 6, at some point during the polishing process, e.g., at a time T0, a polishing parameter for at least one substrate is adjusted to adjust the polishing rate of the substrate such that at a polishing endpoint time, the plurality of substrates are closer to their target thickness than without such adjustment. In some embodiments, the plurality of substrates can have approximately the same thickness at the endpoint time.

In some implementations, one substrate is selected as a reference substrate, and a projected endpoint time TE at which the reference substrate will reach a target value V is determined. For example, as shown in FIG. 6, the first substrate is selected as the reference substrate, although a different substrate could be selected. The target value V is set by the user prior to the polishing operation and stored.

In order to determine the projected time at which the reference substrate will reach the target value, the intersection of the line of the reference substrate, e.g., line 214, with the target value V can be calculated. Assuming that the polishing rate does not deviate from the expected polishing rate through the remainder of the polishing process, the sequence of values should retain a substantially linear progression. Thus, the expected endpoint time TE can be calculated by a simple linear extrapolation of the line to the target value V, e.g., by solving V=S·(TE−T) for TE.

The substrates other than the reference substrate can be defined as adjustable substrates. The point where the line for an adjustable substrate meets the expected endpoint time TE defines a projected endpoint for that adjustable substrate. The linear function of each adjustable substrate, e.g., line 224 in FIG. 6, can thus be used to extrapolate the estimated value, e.g., E2, that will be achieved at the expected endpoint time TE for the associated substrate. For example, the second line 224 can be used to extrapolate the expected value E2 at the expected endpoint time TE for the second substrate.
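The two projections described above reduce to a pair of one-line formulas on a fitted line of the form value = S·(t − T). The sketch below computes the reference substrate's expected endpoint time TE and then the value E2 that an adjustable substrate's line reaches at TE. The fitted slopes, crossing times, and target value are hypothetical numbers chosen for illustration, and the function names are not from the source.

```python
def projected_endpoint(slope: float, t_cross: float, target: float) -> float:
    """Solve target = slope * (TE - t_cross) for the expected endpoint time TE."""
    if slope == 0:
        raise ValueError("a flat line never reaches the target value")
    return t_cross + target / slope


def value_at(slope: float, t_cross: float, t: float) -> float:
    """Evaluate the fitted line value = slope * (t - t_cross) at time t."""
    return slope * (t - t_cross)


# Hypothetical fitted lines: reference (first) substrate and one adjustable substrate.
S1, T1_CROSS = 2.0, 5.0
S2, T2_CROSS = 1.8, 6.0
V = 100.0  # hypothetical target value

TE = projected_endpoint(S1, T1_CROSS, V)  # expected endpoint time of the reference substrate
E2 = value_at(S2, T2_CROSS, TE)           # value the second substrate would reach at TE
```

With these numbers TE is 55 and E2 is about 88.2, below the target value V, so the second substrate lags the reference substrate at the expected endpoint time.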

As shown in FIG. 6, if no adjustments are made to the polishing rate of any of the substrates after time T0 and endpoint is forced at the same time for all substrates, then each substrate can have a different thickness. Here, for example, the second substrate (shown by line 224) would endpoint at an expected value E2 lower than the expected value of the first substrate (and thus at a greater thickness). Alternatively, if endpoint is forced for each substrate individually based on when its function equals the target value, each substrate could have a different endpoint time, which is undesirable because it can lead to defects and loss of throughput.

If, as shown in FIG. 6, the target value will be reached at different times for different substrates, the polishing rate can be adjusted upwardly or downwardly, such that the substrates would reach the target value (and thus target thickness) closer to the same time than without such adjustment, e.g., at approximately the same time, or would have closer to the same value (and thus same thickness), at the target time than without such adjustment, e.g., approximately the same value (and thus approximately the same thickness).

Thus, in the example of FIG. 6, commencing at a time T0, at least one polishing parameter for the second substrate is modified so that the polishing rate of the second substrate is increased (and as a result the slope of the trace 220 is increased). As a result, both substrates would reach the target value (and thus the target thickness) at approximately the same time (or, if polishing of both substrates halts at the same time, both substrates will end with approximately the same thickness).

The reference substrate can be, for example, a predetermined substrate, or a substrate having the earliest or latest projected endpoint time of the substrates. The earliest time is equivalent to the substrate with the thinnest layer if polishing is halted at the same time. Likewise, the latest time is equivalent to the substrate with the thickest layer if polishing is halted at the same time.

For each of the adjustable substrates, a desired slope for the trace can be calculated such that the adjustable substrate reaches the target value at the same time as the reference substrate. For example, the desired slope SD can be calculated from (V−I)=SD*(TE−T0), where I is the value (calculated from the linear function fit to the sequence of values) at the time T0 at which the polishing parameter is to be changed, V is the target value, and TE is the calculated expected endpoint time.
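Rearranged for SD, the relation above gives SD = (V − I)/(TE − T0). A minimal Python sketch follows; the function name and the numbers in the example are hypothetical.

```python
def desired_slope(target: float, current: float, t_end: float, t0: float) -> float:
    """Solve (V - I) = SD * (TE - T0) for the desired slope SD.

    target is V, current is I (the fitted line's value at T0),
    t_end is TE, and t0 is the time at which the parameter changes.
    """
    if t_end <= t0:
        raise ValueError("the expected endpoint time must be later than T0")
    return (target - current) / (t_end - t0)
```

For instance, with V = 100, T0 = 30, TE = 55, and I = 43.2 (a hypothetical line 1.8·(t − 6) evaluated at T0), the adjustable substrate's desired slope is 56.8/25 = 2.272, steeper than its current slope of 1.8.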

In some implementations, there is no reference substrate. For example, the expected endpoint time TE′ can be a predetermined time, e.g., set by the user prior to the polishing process, or can be calculated from an average or other combination of the expected endpoint times of two or more substrates (as calculated by projecting the lines for the various substrates to the target value). In this implementation, the desired slopes are calculated substantially as discussed above (using the expected endpoint time TE′ rather than TE), although a desired slope must also be calculated for the first substrate, e.g., the desired slope SD can be calculated from (V−I)=SD*(TE′−T0).
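For the variant without a reference substrate, the sketch below takes TE′ as the plain average of each substrate's projected endpoint time (one of the options mentioned above) and then computes a desired slope for every substrate, including the first. Lines are assumed to be given as (S, T) pairs for value = S·(t − T); the function name and example numbers are hypothetical.

```python
from statistics import mean


def common_endpoint_slopes(lines: list[tuple[float, float]], target: float,
                           t0: float) -> tuple[float, list[float]]:
    """Return TE' (mean of projected endpoints) and a desired slope per substrate.

    Each entry of lines is (S, T) for the fitted line value = S * (t - T).
    """
    # Project each line to the target value: TE_i = T_i + V / S_i.
    projected = [t_cross + target / slope for slope, t_cross in lines]
    te_prime = mean(projected)
    if te_prime <= t0:
        raise ValueError("TE' must be later than T0")
    # I for each substrate is its fitted line's value at T0; solve (V - I) = SD * (TE' - T0).
    slopes = [(target - slope * (t0 - t_cross)) / (te_prime - t0)
              for slope, t_cross in lines]
    return te_prime, slopes
```

With two hypothetical lines of slope 2.0 crossing zero at T = 5 and T = 15, a target of 100 gives projected endpoints of 55 and 65 and hence TE′ = 60; at T0 = 30 the leading substrate's desired slope drops to 50/30 while the lagging one's rises to 70/30.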

In some implementations (which can also be combined with the implementation shown in FIG. 6), there are different target values for different substrates. This permits the creation of a deliberate but controllable non-uniform thickness between substrates. The target values can be entered by a user, e.g., using an input device on the controller.

For any of the methods described above, the polishing rate is adjusted to bring the slope of a trace closer to the desired slope. The polishing rate can be adjusted by, for example, increasing or decreasing the pressure in a corresponding chamber of a carrier head. The change in polishing rate can be assumed to be directly proportional to the change in pressure, e.g., a simple Prestonian model. For example, for each substrate, where the substrate was polished with a pressure Pold prior to the time T0, a new pressure Pnew to apply after time T0 can be calculated as Pnew=Pold*(SD/S), where S is the slope of the line prior to time T0 and SD is the desired slope.
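Under the Prestonian assumption stated above, the pressure update is a single ratio. The sketch below adds two guards that are assumptions of this sketch rather than part of the described method: it refuses a zero current slope, and it refuses a ratio that would produce a non-positive pressure. The function name is hypothetical.

```python
def new_pressure(p_old: float, slope: float, desired: float) -> float:
    """Pnew = Pold * (SD / S), assuming removal rate is proportional to pressure."""
    if slope == 0:
        raise ValueError("cannot rescale pressure from a zero slope")
    ratio = desired / slope
    if ratio <= 0:
        # Guard added in this sketch: opposite-signed slopes would give a negative pressure.
        raise ValueError("desired and current slopes must have the same sign")
    return p_old * ratio
```

For example, a chamber at a hypothetical 2.0 psi on a substrate whose trace slope is 2.0 would be raised to 2.5 psi to reach a desired slope of 2.5. A real controller would likely also clamp the result to the chamber's allowed pressure range.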

The process of determining projected times that the substrates will reach the target thickness, and adjusting the polishing rates, can be performed just once during the polishing process, e.g., at a specified time, e.g., 40 to 60% through the expected polishing time, or performed multiple times during the polishing process, e.g., every thirty to sixty seconds. At a subsequent time during the polishing process, the rates can again be adjusted, if appropriate. During the polishing process, changes in the polishing rates can be made only a few times, such as four, three, two or only one time. The adjustment can be made near the beginning, at the middle, or toward the end of the polishing process.

Polishing continues after the polishing rates have been adjusted, e.g., after time T0, and the optical monitoring system continues to collect spectra and determine values for each substrate.

Referring to FIG. 7, even after the adjustment of the polishing rate of one or more of the substrates, the substrates may still not reach the target value at the same time. For each trace, a polynomial function of known order, e.g., a first-order function (e.g., a line) is fit to the sequence of values after the time T0 for the associated substrate, e.g., using robust line fitting. For example, a first line 214′ can be fit to the values 212 after the time T0 for the first substrate, and a second line 224′ can be fit to the values 222 after the time T0 for the second substrate.

The time T1 that the first line 214′ equals the target value V can be calculated, and similarly the time T2 that the second line 224′ equals the target value V can be calculated. A time difference ΔT is calculated as |T1−T2|.

In some implementations, e.g., for metal polishing, e.g., copper polishing, after detection of the endpoint for a substrate, the substrate is immediately subjected to an overpolishing process, e.g., to remove metal residue, e.g., copper residue. Although theoretically the polishing process can be stopped as soon as an underlying layer, e.g., a dielectric material, is exposed, in practice stopping the polishing immediately may result in metal residue (e.g., in the form of spots or islands) over the underlying layer. Overpolishing the metal (e.g., copper, in this example) ensures removal of such residues and reduces undesired short circuits. The overpolishing process can be at a uniform pressure for all zones of the substrate, e.g., 1 to 1.5 psi. The overpolishing process typically has a duration of 10 to 15 seconds.

During bulk polishing of a metal such as copper, pressure can be used as a control variable to polish two (or more) substrates on the same platen to substantially the same thickness at a target time. During overpolishing, however, pressure is typically not used as a control variable, since pressure variations may result in poor topography. Instead, the overpolishing time may be adjusted to achieve substantially equal polishing times for the substrates. This avoids defects caused by unequal polishing times while achieving good topography by maintaining substantially the same pressure during overpolishing.

The controller 190 can store a threshold time difference TTD. The threshold time difference TTD can be set by the user or the manufacturer of the equipment. The threshold time difference TTD can be, e.g., 2 to 6 seconds. The controller 190 can also store a default overpolishing time TOP. The default overpolishing time TOP can be set by the user or the manufacturer of the equipment. The default overpolishing time TOP can be, e.g., 5 to 20 seconds.

If the time difference ΔT is less than the threshold time difference TTD, then the controller 190 can halt polishing of the substrates 10a, 10b simultaneously. In this case, the overpolishing time for at least one of the substrates is calculated, but all of the substrates halt polishing at the same time. In some implementations, an overpolishing time can be calculated for each substrate.

For example, the overpolishing time for a substrate (say substrate i) can be calculated as:


$$T_{OP,i} = T_{OP} + T_{AVG} - T_i$$

wherein Ti denotes the endpoint time for the substrate, e.g., T1 for the first substrate 10a and T2 for the second substrate 10b, and TAVG denotes the average endpoint time across all substrates being polished on the same platen, e.g., (T1+T2)/2.

For example, if two substrates have endpoint times T1=57.4 seconds and T2=58 seconds, then TAVG=57.7 seconds. Assuming a default overpolishing time TOP of 15 seconds, the above equation gives an overpolishing time TOP1 of 15.3 seconds for the first substrate. Polishing is halted for both substrates at the time T1+TOP1, so the entire polishing process for both substrates ends at 72.7 seconds. The pressure is kept substantially the same throughout the overpolishing to ensure good topography on both substrates. Alternatively, overpolishing times could be calculated for all of the substrates. Alternatively, an overpolishing stop time can be calculated by adding the default overpolishing time to the average endpoint time.
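The numerical example above can be reproduced in a few lines. This is a sketch that assumes the per-substrate overpolishing time is chosen so that Ti + TOPi equals TAVG + TOP for every substrate. That choice yields the quoted 15.3 s and 72.7 s figures. The function name `overpolish_times` is hypothetical.

```python
def overpolish_times(endpoints: list[float], default_op: float) -> list[float]:
    """Per-substrate overpolishing times so every substrate stops at T_AVG + T_OP."""
    t_avg = sum(endpoints) / len(endpoints)
    return [default_op + t_avg - t_i for t_i in endpoints]


endpoints = [57.4, 58.0]                      # T1, T2 in seconds
ops = overpolish_times(endpoints, default_op=15.0)
stop_times = [t + op for t, op in zip(endpoints, ops)]
```

The first substrate, which reached its endpoint earlier, receives 15.3 s of overpolishing. The second substrate receives 14.7 s. Both stop at 72.7 s.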

On the other hand, if the time difference ΔT is greater than the threshold time difference TTD, then the controller 190 can use the default overpolishing time TOP for each substrate, so that polishing halts for the different substrates at different times.

For example, polishing of the first substrate 10a can be halted at a first time T1+TOP and polishing of the second substrate 10b can be halted at a second time T2+TOP.

This approach strikes a balance between avoiding defects and polishing the substrates uniformly. On the one hand, halting polishing of the substrates on the same platen simultaneously allows the substrates to be lifted from the polishing pad simultaneously. This avoids defects such as scratches caused by rinsing a substrate with water too early, or corrosion caused by failing to rinse a substrate in a timely manner. On the other hand, permitting the substrates to be polished for different amounts of time when the time difference exceeds the threshold avoids significant variations in polishing and improves wafer-to-wafer polishing uniformity.

As seen from the above equation, the adjusted overpolishing times for various substrates are functions of the predetermined parameter TOP. The parameter TOP can be chosen in various ways. For example, the parameter TOP can be selected based on the material being polished. In some cases, the parameter TOP may be adjusted based on observed results. For example, if it is observed that polishing for an adjusted overpolishing time fails to remove all residues, then the parameter TOP may be increased to achieve better removal of the residues. Even though the example in FIGS. 6-7 shows only two substrates being polished together, the number of substrates on a platen can be increased without deviating from the scope of this application.

After overpolishing has been completed for all substrates, rinsing of the polishing pad commences. In addition, all of the carrier heads can lift the substrates off the polishing pad simultaneously.

Referring to FIG. 8, a summary flow chart 600 is illustrated. A plurality of substrates are polished simultaneously with the same polishing pad in a polishing apparatus (step 602), as described above. During this polishing operation, the polishing rate of each substrate is controllable independently of the other substrates by an independently variable polishing parameter, e.g., the pressure applied by a chamber in the carrier head. During the polishing operation, the substrates are monitored as described above to generate a sequence of values for each substrate (step 604). For each substrate, a function, e.g., a linear function, is fit to the sequence of values for that substrate (step 606). Expected endpoint times at which the functions will reach a target value are determined, e.g., by extrapolating the linear functions (step 608). If needed, the polishing parameters of one or more substrates are adjusted to change the polishing rates of those substrates such that the plurality of substrates have closer endpoint conditions (step 610). For example, the plurality of substrates can reach the target thickness at approximately the same time, or can have approximately the same thickness (or a target thickness) at the target time. Polishing, monitoring, and generation of the sequences of values continue after the parameters are adjusted. For each substrate, a function, e.g., a linear function, is fit to the values generated after the change of the polishing parameters (step 612). The endpoint conditions are detected for each substrate based on the sequence of values for that substrate (step 614), e.g., at the time at which the function for the substrate equals a target value. After detection of the endpoint condition, an overpolishing operation is performed (step 616).
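Step 610 can be illustrated with a sketch. The description does not specify how the new polishing parameters are chosen, so this example rests on two hypothetical assumptions, labeled here and in the comments. First, the common target time is the average of the projected endpoint times. Second, the slope of each fitted line scales linearly with the carrier-head pressure. The names `SubstrateState` and `adjust_pressures` are also hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubstrateState:
    name: str
    slope: float        # fitted rate of change of the monitored value (per second)
    intercept: float    # fitted value of the line at t = 0
    pressure: float     # current chamber pressure (psi)


def adjust_pressures(states: list[SubstrateState], now: float,
                     target_value: float) -> tuple[float, dict[str, float]]:
    """Pick new pressures so each substrate's line reaches target_value at one common time.

    Assumptions (hypothetical): the common time is the mean projected endpoint, and
    the slope is proportional to pressure.
    """
    projected = [(target_value - s.intercept) / s.slope for s in states]
    common_time = sum(projected) / len(projected)
    if common_time <= now:
        raise ValueError("common target time must lie in the future")
    new_pressures = {}
    for s in states:
        current_value = s.slope * now + s.intercept
        required_slope = (target_value - current_value) / (common_time - now)
        # Scale pressure by the ratio of required to current slope (linear model).
        new_pressures[s.name] = s.pressure * required_slope / s.slope
    return common_time, new_pressures
```

In this sketch, a substrate that is projected to reach the target early is slowed by lowering its pressure. A substrate that is projected to reach the target late is sped up by raising its pressure.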

Referring to FIG. 9, a flowchart 700 shows exemplary operations for determining overpolishing times when two or more substrates are polished on a same platen. Operations include storing (step 702) a default overpolishing time. The default overpolishing time can be stored in any computer readable storage medium that can communicate with a computer controlling the polishing apparatus. In general, the default overpolishing time is a predetermined time selected based on, for example, the material that is being polished.

A first substrate and a second substrate are monitored to determine polishing endpoint times (step 704); this can be performed by step 614 above. The monitoring can be done in various ways, including, for example, using a spectrometric optical monitoring system, a laser-based monitoring system, or an eddy current monitoring system. Even though the flowchart 700 describes only a first substrate and a second substrate, additional substrates can be polished on the same platen.

The time difference between the endpoint times is calculated (step 706), and an overpolishing time is calculated for each substrate on the platen (step 708). Calculating the overpolishing time includes comparing the time difference to a threshold and applying a different overpolishing calculation depending on whether the difference is above or below the threshold (step 710). If the time difference is less than the threshold, then an overpolishing time can be calculated for a substrate, and the controller can halt polishing of all substrates when that overpolishing time elapses. The overall polishing process (including polishing and overpolishing) therefore ends at the same time for all substrates (step 712). Thus, the overpolishing stop time is the same for all of the substrates. On the other hand, if the time difference is greater than the threshold, then the overpolishing time for each substrate can equal the default overpolishing time, so that the substrates halt polishing at different times (step 714).
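Steps 706 through 714 can be condensed into one function. This is a sketch under assumptions labeled here and in the comments. For more than two substrates, the "difference" is taken as the spread between the latest and earliest endpoint times. A difference exactly equal to the threshold, which the description leaves open, is treated here as exceeding it. The common stop time uses the average-plus-default option described above. The function name `overpolish_stop_times` is hypothetical.

```python
def overpolish_stop_times(endpoints: list[float], default_op: float,
                          threshold: float) -> list[float]:
    """Return the time at which polishing halts for each substrate."""
    # Step 706, generalized (assumption): spread between latest and earliest endpoint.
    spread = max(endpoints) - min(endpoints)
    if spread < threshold:
        # Steps 710/712: one common stop time = average endpoint + default time.
        common = sum(endpoints) / len(endpoints) + default_op
        return [common] * len(endpoints)
    # Step 714: each substrate overpolishes for the default time from its own endpoint.
    return [t + default_op for t in endpoints]
```

For example, with a 4-second threshold and a 15-second default overpolishing time, endpoints at 57.4 s and 58.0 s share a stop time of 72.7 s. Endpoints at 50.0 s and 58.0 s stop separately at 65.0 s and 73.0 s.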

Although the examples above discuss calculating the time difference from projections of functions that are fit to a sequence of values, the time difference can be calculated as the difference between times of detection of clearance of an underlying layer. In general, for some monitoring systems, when an underlying layer is exposed, there is a sudden change in the signal from the sensor. This sudden change can be detected and the time of the sudden change can be used as the endpoint time.

The monitoring systems can be of various types, e.g., a spectrographic monitoring system, a laser monitoring system, or an eddy current monitoring system. For example, when a laser monitoring system is used to monitor polishing of a metal, e.g., copper, the intensity of the reflected light beam, and thus the signal from the in-situ monitoring system, drops as the underlying dielectric layer is exposed. Similarly, when an eddy current monitoring system is used to monitor polishing of a metal, e.g., copper, the signal strength from the in-situ monitoring system can be generally proportional to the metal layer thickness.

FIG. 10 shows traces 402, 404, which are sequences of values generated by the sensor of a monitoring system for two different substrates. For example, the monitoring system can be a reflectivity monitoring system. The two substrates are polished on the same platen to remove the material of an overlying layer, e.g., a metal such as copper, from an underlying layer. If the overlying layer is copper and the underlying layer is a dielectric, the signal strength drops suddenly as the copper is cleared. When the dielectric layer is completely exposed, e.g., when virtually no copper residue remains, the signal strength flattens out. The times 410, 415 at which each sequence of values levels off after the sudden decrease can be used as the endpoint times T1, T2.
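The "sudden drop, then flatten" criterion can be sketched as a two-stage scan of sample-to-sample differences. This is a hypothetical heuristic, not the controller's actual detection logic. The drop threshold, flatness tolerance, window length, and the name `detect_clearing_time` are all assumptions, and a real trace would typically need noise filtering first.

```python
from typing import Optional, Sequence


def detect_clearing_time(times: Sequence[float], signal: Sequence[float],
                         drop_threshold: float, flat_tolerance: float,
                         window: int = 3) -> Optional[float]:
    """Return the time at which the trace levels off after a sudden drop, or None."""
    diffs = [b - a for a, b in zip(signal, signal[1:])]
    # Stage 1: first sample-to-sample decrease larger than drop_threshold.
    drop_idx = next((i for i, d in enumerate(diffs) if -d > drop_threshold), None)
    if drop_idx is None:
        return None
    # Stage 2: first later sample followed by `window` changes within flat_tolerance.
    for i in range(drop_idx + 1, len(diffs) - window + 1):
        if all(abs(d) <= flat_tolerance for d in diffs[i:i + window]):
            return times[i]
    return None


# Hypothetical reflectivity trace: high while copper remains, then a drop, then flat.
trace = [1.0] * 5 + [0.9, 0.6, 0.3, 0.25] + [0.24] * 6
t_end = detect_clearing_time(list(range(len(trace))), trace,
                             drop_threshold=0.2, flat_tolerance=0.02)
```

Applying the same function to each substrate's trace would give the endpoint times T1, T2 used in the threshold comparison.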

The controller 190 can include a central processing unit (CPU) 192, a memory 194, and support circuits 196, e.g., input/output circuitry, power supplies, clock circuits, cache, and the like. In addition to receiving signals from the optical monitoring system 160 (and any other endpoint detection system 180), the controller 190 can be connected to the polishing apparatus 100 to control the polishing parameters, e.g., the rotational rates of the platen(s) and carrier head(s) and the pressure(s) applied by the carrier head. The memory 194 is connected to the CPU 192. The memory, or computer-readable medium, can be one or more readily available memories such as random access memory (RAM), read only memory (ROM), floppy disk, hard disk, or other forms of digital storage. In addition, although illustrated as a single computer, the controller 190 could be a distributed system, e.g., including multiple independently operating processors and memories.

Embodiments of the invention and all of the functional operations described in this specification can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structural means disclosed in this specification and structural equivalents thereof, or in combinations of them. Embodiments of the invention can be implemented as one or more computer program products, i.e., one or more computer programs tangibly embodied in a machine-readable storage medium, for execution by, or to control the operation of, data processing apparatus, e.g., a programmable processor, a computer, or multiple processors or computers. A computer program (also known as a program, software, software application, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file. A program can be stored in a portion of a file that holds other programs or data, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.

The processes and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).

The above described polishing apparatus and methods can be applied in a variety of polishing systems. Either the polishing pad, or the carrier heads, or both can move to provide relative motion between the polishing surface and the substrate. For example, the platen may orbit rather than rotate. The polishing pad can be a circular (or some other shape) pad secured to the platen. Some aspects of the endpoint detection system may be applicable to linear polishing systems, e.g., where the polishing pad is a continuous or a reel-to-reel belt that moves linearly. The polishing layer can be a standard (for example, polyurethane with or without fillers) polishing material, a soft material, or a fixed-abrasive material. Terms of relative positioning are used; it should be understood that the polishing surface and substrate can be held in a vertical orientation or some other orientation.

Particular embodiments of the invention have been described. Other embodiments are within the scope of the following claims.

Claims

1. A polishing method, comprising:

simultaneously polishing a first substrate and a second substrate on the same polishing pad;
storing a default overpolishing time;
monitoring the first substrate and the second substrate during polishing with an in-situ monitoring system;
determining a first polishing endpoint time of the first substrate with the in-situ monitoring system;
determining a second polishing endpoint time of the second substrate with the in-situ monitoring system;
determining a difference between the first polishing endpoint time and the second endpoint time;
determining whether the difference exceeds a threshold; and
if the difference is less than the threshold then calculating an overpolishing stop time and halting polishing of the first substrate and the second substrate simultaneously at the overpolishing stop time, and if the difference is greater than the threshold then calculating a first overpolishing stop time that equals the first endpoint time plus the default overpolishing time and calculating a second overpolishing stop time that equals the second endpoint time plus the default overpolishing time, and halting polishing of the first substrate at the first overpolishing stop time and halting polishing of the second substrate at the second overpolishing stop time.

2. The method of claim 1, wherein calculating the overpolishing stop time comprises calculating an average of the first polishing endpoint time and the second polishing endpoint time.

3. The method of claim 2, wherein calculating the overpolishing stop time comprises adding the default overpolishing time to the average.

4. The method of claim 1, wherein the default overpolishing time is between five and twenty seconds.

5. The method of claim 4, wherein the default overpolishing time is between ten and fifteen seconds.

6. The method of claim 1, wherein the threshold is between two and six seconds.

7. The method of claim 1,

wherein determining the first polishing endpoint time comprises
storing a first target value for the first substrate,
generating a first sequence of values for the first substrate with the in-situ monitoring system,
fitting a first function to the first sequence of values, and
determining the first polishing endpoint time by calculating a projected time at which the first substrate will reach the first target value based on the first function,
and wherein determining the second polishing endpoint time comprises
storing a second target value for the second substrate,
generating a second sequence of values for the second substrate with the in-situ monitoring system,
fitting a second function to the second sequence of values, and
determining the second polishing endpoint time by calculating a projected time at which the second substrate will reach the second target value based on the second function.

8. The method of claim 7, wherein the first function and the second function are linear functions.

9. The method of claim 7, wherein the in-situ monitoring system comprises a spectrometric optical monitoring system.

10. The method of claim 9,

wherein generating the first sequence of values comprises
measuring a first sequence of spectra from the first substrate during polishing with the optical monitoring system,
for each measured spectrum in the first sequence of spectra for the first substrate, determining a best matching reference spectrum from one or more libraries of reference spectra,
for each best matching reference spectrum for the first substrate, determining an index value to generate a sequence of first index values,
and wherein generating the second sequence of values comprises
measuring a second sequence of spectra from the second substrate during polishing with the optical monitoring system,
for each measured spectrum in the second sequence of spectra for the second substrate, determining a best matching reference spectrum from the one or more libraries of reference spectra, and
for each best matching reference spectrum for the second substrate, determining an index value to generate a sequence of second index values.

11. The method of claim 7, wherein the in-situ monitoring system comprises an eddy current monitoring system.

12. The method of claim 11, wherein the first sequence of values and the second sequence of values comprise eddy current signal values.

13. The method of claim 1, wherein determining the first polishing endpoint time comprises detecting clearance of a first overlying layer from a first underlying layer on the first substrate.

14. The method of claim 13, wherein detecting clearance of a first overlying layer comprises detecting a sudden change in a signal from the in-situ monitoring system.

15. The method of claim 1, further comprising removing the first substrate and the second substrate from the polishing pad simultaneously.

16. The method of claim 15, further comprising rinsing the polishing pad after removing the first substrate and the second substrate.

17. The method of claim 1, wherein the default overpolishing time comprises a first default overpolishing time for the first substrate and a second default overpolishing time for the second substrate.

18. A polishing method, comprising:

simultaneously polishing a first substrate and a second substrate on the same polishing pad;
storing a default overpolishing time;
monitoring the first substrate and the second substrate during polishing with an in-situ monitoring system;
determining a first polishing endpoint time of the first substrate with the in-situ monitoring system;
determining a second polishing endpoint time of the second substrate with the in-situ monitoring system;
determining a difference between the first polishing endpoint time and the second endpoint time;
determining that the difference exceeds a threshold; and
calculating a first overpolishing stop time that equals the first endpoint time plus the default overpolishing time and calculating a second overpolishing stop time that equals the second endpoint time plus the default overpolishing time, and halting polishing of the first substrate at the first overpolishing stop time and halting polishing of the second substrate at the second overpolishing stop time.

19. A polishing method, comprising:

simultaneously polishing a first substrate and a second substrate on the same polishing pad;
storing a default overpolishing time;
monitoring the first substrate and the second substrate during polishing with an in-situ monitoring system;
determining a first polishing endpoint time of the first substrate with the in-situ monitoring system;
determining a second polishing endpoint time of the second substrate with the in-situ monitoring system;
determining a difference between the first polishing endpoint time and the second endpoint time;
determining that the difference is less than a threshold; and
calculating an overpolishing stop time and halting polishing of the first substrate and the second substrate simultaneously at the overpolishing stop time.
Patent History
Publication number: 20140024293
Type: Application
Filed: Jul 19, 2012
Publication Date: Jan 23, 2014
Inventors: Jimin Zhang (San Jose, CA), Zhihong Wang (Santa Clara, CA), David H. Mai (Palo Alto, CA), Ingemar Carlsson (Milpitas, CA), Stephen Jew (San Jose, CA), Boguslaw A. Swedek (Cupertino, CA)
Application Number: 13/553,209
Classifications
Current U.S. Class: Computer Controlled (451/5)
International Classification: B24B 49/00 (20120101); B24B 49/12 (20060101); B24B 49/10 (20060101);