With improvements in the mechanical instrumentation of atomic force microscopes (AFMs),\textsuperscript{1–6} high speed cantilever technology,\textsuperscript{7–11} and feedback techniques,\textsuperscript{12–15} scan rates much higher than those of commercial AFM systems are achievable while still assuring precise tracking of the sample and the nanopositioner.\textsuperscript{16–18} As in the case of fast scanning tunneling microscopy (STM),\textsuperscript{19} the data rate generated by a fast scanning AFM soon exceeds the limit of typical commercial AFM/STM controllers. Modern digital components are capable of much higher transfer rates but require a significant amount of custom hardware and software development.\textsuperscript{20,21} We chose to implement a solution as close as possible to a commercially available general purpose data acquisition (DAQ) system. Table I lists the necessary bandwidths for the three translation axes (and therefore the DAQ bandwidth to record these) for three different imaging settings of high speed AFM measurements.

Figure 1 shows the dataflow in a typical AFM setup. The solid lines are the data transfer lines we focus on in this work.

In order to process the data, four steps have to be achieved: (1) convert the analog signals of the AFM into digital data (A/D conversion); (2) transfer the data into the PC main memory; (3) correlate the digital data stream to line and frame sync timing; and (4) display and/or save the data.

To process the required amounts of data with a PC running a standard operating system (in our case, a WINDOWS environment), the data acquisition strategy has to work within the limits imposed by that operating system. In particular, the uncertainty of when the operating system allots time for data processing restricts the amount of CPU processing that can be done after the data is transferred to main memory, in our case over the PCI (peripheral component interconnect) bus. To handle this uncertainty, we minimize the required amount of CPU data processing by establishing a “coherent” data acquisition. To achieve data coherence, the scan signals are timed to be synchronous to the A/D converters (see Fig. 2). The timing for the A/D and D/A channels is derived from the same central clock. The clock signal is fed synchronously to three divider counters (open arrows represent clock synchronization lines). Here the clock speed (in our case 40 MHz) is divided by an integer (power of 2) to provide the required sample pulses for the specific components. The A/D converter and the digital input are timed from the same divider. The data of both inputs is then combined into a 16 bit word that is transferred to main memory (the solid arrows represent data flow). Each word thus contains one data point labeled with the line and frame sync information. Recording the line and frame sync information (generated by a “general purpose timer counter,” GPTC) is not strictly necessary since the data already arrives in an ordered form (see Fig. 3); however, it is very useful for detecting overflows in the post-transfer processing.
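The combined 16 bit word and the overflow check can be illustrated with a minimal Python sketch. The bit layout (A/D value in the upper 14 bits, frame sync in bit 1, line sync in bit 0), the assumption that the line sync goes high at the start of each line, and all function names are illustrative assumptions; they are not taken from the DAQ card documentation or from our LABVIEW code.

```python
import numpy as np

# Assumed layout (hypothetical): bits 15..2 = 14 bit A/D code,
# bit 1 = frame sync, bit 0 = line sync.

def pack_word(adc, line_sync, frame_sync):
    """Combine one 14 bit A/D code and two sync bits into a 16 bit word."""
    return ((adc & 0x3FFF) << 2) | (int(frame_sync) << 1) | int(line_sync)

def unpack_words(words):
    """Split an array of 16 bit words into A/D codes and the two sync bits."""
    words = np.asarray(words, dtype=np.uint16)
    adc = words >> 2                              # 14 bit codes, 0..16383
    line_sync = (words & 0x1).astype(bool)
    frame_sync = ((words >> 1) & 0x1).astype(bool)
    return adc, line_sync, frame_sync

def line_starts(line_sync):
    """Indices at which the line sync bit goes from 0 to 1."""
    s = np.asarray(line_sync, dtype=np.int8)
    return np.flatnonzero(np.diff(s) == 1) + 1

def has_dropped_samples(line_sync, pixels_per_line):
    """True if two consecutive line starts are not pixels_per_line apart."""
    starts = line_starts(line_sync)
    return bool(np.any(np.diff(starts) != pixels_per_line))
```

In a coherent acquisition, consecutive line starts are always exactly one line length apart, so any other spacing indicates samples lost to a buffer overflow.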

The line and pixel scan rates are calculated to ensure that each scan line has an equal and integral number of sample clock ticks, which means that the pixel sample clock needs to be synchronous with the line scan. Each pixel can then be uniquely assigned to a position in a specific line and frame (see Fig. 3). The A/D channel sample rates are derived (by the programmable clock divider) from the same 40 MHz clock and are therefore synchronous to the D/A channels. The data can then be written directly to the PC main memory without any processor involvement.
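Because every line contains the same integral number of samples, the position of a sample follows from its running index alone. The following Python sketch illustrates this; the function name is hypothetical, and it assumes that sample 0 coincides with the first frame sync and that one line corresponds to one run of `pixels_per_line` samples.

```python
def locate_sample(index, pixels_per_line, lines_per_frame):
    """Map a running sample index to (frame, line, pixel).

    Assumes sample 0 is the first pixel of the first line of frame 0.
    """
    samples_per_frame = pixels_per_line * lines_per_frame
    frame, within_frame = divmod(index, samples_per_frame)
    line, pixel = divmod(within_frame, pixels_per_line)
    return frame, line, pixel
```

For a 256 × 256 pixel frame, for example, sample index 256·256 + 3·256 + 5 lies in frame 1, line 3, pixel 5.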

<table>
<thead>
<tr>
<th>TABLE I. Bandwidth requirements for high speed AFM.</th>
</tr>
</thead>
<tbody>
<tr>
<td>Pixel resolution</td>
</tr>
<tr>
<td>Frame rate</td>
</tr>
<tr>
<td>Scan requirements</td>
</tr>
<tr>
<td>Z direction</td>
</tr>
<tr>
<td>X direction</td>
</tr>
<tr>
<td>Y direction</td>
</tr>
</tbody>
</table>

Reference 23.

As a computer platform for our DAQ system we use a PC (2.4 GHz Pentium 4, 512 MB SDRAM) running WINDOWS XP. The requirements on the DAQ card were the following:

• PCI Master-Mode DMA (direct memory access) to allow the transfer of data over the PCI interface directly to the main PC memory without processor involvement.
• Two 2 MS/s input channels with sufficient resolution (at least 14 bits) for data collection. In addition, it was desirable to be able to combine the digitized data with data from digital inputs. Each 16 bit data word then consists of 14 bits from an A/D converter and 2 bits from the simultaneously sampled digital inputs.
• Two 16 bit D/A channels for the generation of the scan signals.
• Two general purpose timer/counters (GPTC) for generating line and frame control information.

One DAQ card that provided these resources was the Adlink DAQ 2002, which offers several operation modes. We chose the continuous double buffered data acquisition mode because of the continuous nature of the data generated by the AFM. With “post trigger mode” selected, the DAQ card waits for a trigger, in our case the first frame sync, and then collects data until a stop command is sent.

We use the internal 40 MHz clock with programmable dividers to time the converters and the “general purpose timer counters” (GPTC). In particular, the 24 bit divider register used for the A/D converter and the digital input allows us to divide the clock to achieve a sampling rate (pixel clock) ranging from 2 Samples/s to 2 MSamples/s.
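The relation between the divider value and the resulting pixel clock can be sketched in a few lines of Python. This is an illustration only: it assumes that any integer divider fitting into the 24 bit register may be used and that the upper limit is the 2 MSamples/s converter rate; the function name is hypothetical.

```python
BASE_CLOCK_HZ = 40_000_000        # internal clock of the DAQ card
DIVIDER_BITS = 24                 # width of the divider register
MAX_SAMPLE_RATE_HZ = 2_000_000    # maximum A/D sampling rate

def divider_for_rate(target_hz):
    """Return (divider, actual_rate_hz) closest to the requested rate.

    Assumes the sample clock is BASE_CLOCK_HZ / divider for any integer
    divider that fits the register and does not exceed the maximum rate.
    """
    min_div = -(-BASE_CLOCK_HZ // MAX_SAMPLE_RATE_HZ)  # ceiling division
    max_div = 2**DIVIDER_BITS - 1
    div = round(BASE_CLOCK_HZ / target_hz)
    div = max(min_div, min(max_div, div))              # clamp to valid range
    return div, BASE_CLOCK_HZ / div
```

Under these assumptions the smallest usable divider is 20 (2 MSamples/s), and the largest register value gives a rate of a little over 2 Samples/s.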

The input range of the A/D converter can be set from the control software (±10 V, ±5 V, ±2.5 V) so that the resolution of the A/D conversion can be adjusted to the output range of the AFM with a maximum resolution of 0.15 mV per bit.

FIG. 1. Dataflow in a typical AFM setup. Solid lines show the data connections on which we focus.

FIG. 2. Synchronization of the D/A, A/D converters, and digital input with data flow directions. The open arrows represent clock synchronization lines; the solid arrows represent data flow. The clock signal is fed synchronously to three dividers to provide the required sample pulses for the components. The A/D converter and the digital input are timed from the same divider. The data of both inputs is then combined into a 16 bit word (that represents the 14 bit signal data labeled with line and frame sync) that is transferred to main memory.

FIG. 3. Synchronization of pixel clock and line scan. Upper trace: fast scan axis; lower trace: slow scan axis. Each line has an equal and integral number of samples. Each frame has an integral number of lines. The dotted line in the fast scan signal represents the signal after lowpass filtering to smooth the digitization steps.

FIG. 4. Calculation of scan parameters to ensure synchronization. A fixed number of lines, a desired (target) buffer size, a desired (target) frame rate, a fixed number of pixels per line, and a fixed number of lines per frame are used as input parameters. The buffer size is then slightly adjusted to fit condition 1. The actual buffer size is used to calculate the actual frame rate using condition 2. The outcome is checked for condition 3. Adjustments to the actual buffer size might be necessary (recursive process). The white blocks represent the conditions that need to be fulfilled. n, m, and k are integer values which are calculated by the program. n and k ensure that the buffer size is an even number.

At the output of the D/A converter, a lowpass filter smooths the digitization steps in the drive signals that originate from the finite resolution of the D/A converter. If these steps (see Fig. 3) were not smoothed, each step would be translated into an abrupt movement of the AFM scanner, which would in turn excite the scanner at its resonances. The lowpass filter has to be dimensioned such that the digitization steps are filtered but the shape of the scan signal is preserved as much as possible. To maintain the strict correlation between scan generation and data recording, it is important that the lowpass filter does not induce too much phase shift. This is of particular concern if triangular scan signals are used; for sinusoidal scan signals, the lowpass filter can have a cutoff frequency just above the scan frequency. The dotted line in Fig. 3 shows the result of an appropriate filter.
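The trade-off between smoothing and phase shift can be estimated with a short Python sketch. It assumes a first-order (RC type) lowpass filter, which is an assumption made only for this illustration; the order and type of the filter actually used are not specified here, and the function names are hypothetical.

```python
import math

def first_order_lag(signal_hz, cutoff_hz):
    """Phase (radians, negative = lag) and gain of a first-order
    lowpass filter at the frequency signal_hz."""
    ratio = signal_hz / cutoff_hz
    phase = -math.atan(ratio)
    gain = 1.0 / math.sqrt(1.0 + ratio**2)
    return phase, gain

def delay_in_pixels(cutoff_hz, pixel_rate_hz):
    """Low-frequency delay of the filter, 1 / (2*pi*cutoff_hz),
    expressed in units of the pixel period."""
    return pixel_rate_hz / (2.0 * math.pi * cutoff_hz)
```

For such a filter, a cutoff frequency of 100 kHz at a pixel clock of 2 MSamples/s would shift the scan signal by roughly three pixel periods relative to the recorded data, which shows why the cutoff cannot be chosen arbitrarily low.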

For the post-transfer processing and DAQ card setup, we wrote a front end in LABVIEW 6.0 consisting of three applications: control panel, data recording, and data display (using an intensity graph for plotting the data).

The performance of the data acquisition system is determined by several factors. The full data acquisition rate of the DAQ card (2 MSamples/s) could be written continuously and directly to main memory. However, the scheme of synchronizing the scan generation and the data acquisition intrinsically restricts the possible combinations of frame rate and frame size to certain values (due to the necessary integer divisions). Figure 4 shows the steps to determine which combinations of number of lines, buffer size, pixels per line, and frame rate are possible. The variables \( n \), \( k \), and \( m \) are integer values.
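The effect of the integer divisions on the achievable frame rates can be illustrated with the following Python sketch. It covers only the quantization of the frame rate by an integer clock divider and does not reproduce the buffer size conditions of Fig. 4; the function name and the assumption that any integer divider is allowed are illustrative.

```python
BASE_CLOCK_HZ = 40_000_000        # internal clock of the DAQ card
MAX_SAMPLE_RATE_HZ = 2_000_000    # maximum A/D sampling rate

def nearest_frame_rate(target_fps, pixels_per_line, lines_per_frame):
    """Return (divider, actual_fps) for the integer divider whose
    frame rate is closest to target_fps."""
    samples_per_frame = pixels_per_line * lines_per_frame
    target_rate = target_fps * samples_per_frame
    if target_rate > MAX_SAMPLE_RATE_HZ:
        raise ValueError("requested settings exceed the maximum sample rate")
    divider = round(BASE_CLOCK_HZ / target_rate)   # integer clock divider
    sample_rate = BASE_CLOCK_HZ / divider          # actual pixel clock
    return divider, sample_rate / samples_per_frame
```

Requesting 30 frames/s at 256 × 256 pixels, for instance, yields a divider of 20 and an actual frame rate of about 30.5 frames/s under these assumptions.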

Since almost the full 2 MSamples/s maximum data acquisition rate of the DAQ card could be transferred into main memory, the main speed restriction proved to be the user defined post-transfer data processing. In particular, the real time display of the data using a LABVIEW intensity graph was processor intensive and became a limiting factor in our implementation. The resources needed depended on the display size of the intensity graph. Using lower level display methods (e.g., OpenGL or DirectX) could lessen this limitation.

Figure 5 shows the 2 MSamples/s limit compared to the actual performance that was achieved with our implementation while displaying the data in real time. The deviation of the actual limit from the card limit originates mostly from the post-transfer data processing for the real time display. To minimize this influence, we implemented two display modes: (a) line by line buildup for low scan speeds and (b) frame at once display for high scan speeds. By choosing the display mode appropriate to the imaging speed, we were able to record data at up to 30 frames/s, obtaining on average 62\% ± 17\% of the theoretical card limit (see Fig. 5).

When only the data acquisition part of this implementation was used (by writing the data directly to disk and displaying it offline after the experiment), almost the full 2 MSamples/s limit of the DAQ card could be recorded. This DAQ system fulfills the requirements for the medium and intermediate goals for high speed AFM imaging described in Table I.

ACKNOWLEDGMENTS

The authors would like to thank Thomas Gutsmann, Jacqueline A. Cutroni, and Marquesa M. Finch for their assistance. This work was funded by the National Science Foundation under Award No. DMR-9988640 and MRL DMR00-80034, the National Institutes of Health under Award No. GM65354, NASA/URETI under Award No. NCC-1-02037, a DOC scholarship (ÖAW), FWF J2395-N02, and Veeco/DI SB030071.

22 Software available for download at http://hansmalab.physics.ucsb.edu
23 Due to the desired triangular scan signals, the bandwidth requirements on the positioner are one order of magnitude higher.