INTRO_FFT(3S)INTRO_FFT(3S)NAME
INTRO_FFT - Introduction to signal processing routines
IMPLEMENTATION
See individual man pages for operating system and hardware availability
DESCRIPTION
The signal processing routines currently consist of Fast Fourier
Transform (FFT) routines, convolution routines, and correlation routines.
The following data types are used in these routines:
* Single precision: Fortran "real" data type, C/C++ "float" data type,
32-bit floating point; these routine names begin with S.
* Single precision complex: Fortran "complex" data type, C/C++
"scsl_complex" data type (defined in <scsl_fft.h>), C++ STL
"complex<float>" data type (defined in <complex.h>, two 32-bit
floating point reals; these routine names begin with C.
* Double precision: Fortran "double precision" data type, C/C++
"double" data type, 64-bit floating point; these routine names begin
with D.
* Double precision complex: Fortran "double complex" data type, C/C++
"scsl_zomplex" data type (defined in <scsl_fft.h>), C++ STL
"complex<double>" data type (defined in <complex.h>), two 64-bit
floating point doubles; these routine names begin with Z.
NOTE: when using the C++ Standard Template Library (STL) to define
complex types, the include files must be used in the following order:
#include <complex.h>
#include <scsl_fft.h>
Often little or no difference exists between these versions, other than
the data types of some inputs and outputs. In this case, the routines
are described on the same man page, and that man page is named after the
real or complex routine.
The man(1) command can find a man page online by either the real,
complex, double precision, or double complex name.
The data types for the scale, table, and work arguments in these routines
vary, depending on the function which is called. In the CC, SC, and CS
routines, the arguments are single precision. In the ZZ, DZ and ZD
routines, the arguments are double precision.
By default, the integer arguments are 4 bytes (32 bits) in size; this is
the size obtained when one links to the SCSL library with -lscs or
-lscs_mp. Another version of SCSL is available, however, in which
Page 1
INTRO_FFT(3S)INTRO_FFT(3S)
integers are 8 bytes (64 bits). This version allows the user access to
larger memory sizes and helps when porting legacy Cray codes. It can be
loaded by using either the -lscs_i8 or -lscs_i8_mp link option. Note
that any program may use only one of the two versions; 4-byte integer and
8-byte integer library calls cannot be mixed.
C/C++ function prototypes for the signal processing routines are provided
in <scsl_fft.h>, when using the default 4-byte integers, and
<scsl_fft_i8.h>, when using 8-byte integers. These header files define
the complex types scsl_complex and scsl_zomplex, which are used in the
prototypes. Alternatively, C++ programs may declare arguments using the
types complex<float> and complex<double> from the standard template
library. But if these types are used, <complex.h> must be included before
<scsl_fft.h> (or <scsl_fft_i8.h>). Note, though, that both complex types
are equivalent: they simply represent (real, imaginary) pairs of floating
point numbers stored contiguously in memory. With the proper casts, you
can simply pass arrays of floating point data to the routines where
complex arguments are expected.
Casts, however, can be avoided. The header files <scsl_fft.h> and
<scsl_fft_i8.h> directly support the use of user-defined complex types or
disabling prototype checking for complex arguments completely. By
defining the symbol SCSL_VOID_ARGS before including <scsl_fft.h> or
<scsl_fft_i8.h> all complex arguments will be prototyped as void *. To
define the symbol SCSL_VOID_ARGS at compile time use the -D compiler
option (i.e., -DSCSL_VOID_ARGS) or use an explicit #define SCSL_VOID_ARGS
in the source code. This allows the use of any complex data structure
without warnings from the compiler, provided the structure is as
described above; that is:
1. The real and imaginary components must be contiguous in memory.
2. Sequential array elements must also be contiguous in memory.
While this allows the use of non-standard complex types without
generating compiler warnings, it has the disadvantage that the compiler
will not catch type mismatches.
Strong type checking can be enabled employing user-defined complex types
instead of SCSL's standard complex types. To do this, define
SCSL_USER_COMPLEX_T=my_complex and SCSL_USER_ZOMPLEX_T=my_zomplex, where
my_complex and my_zomplex are the names of user-defined complex types.
These complex types must be defined before including the <scsl_fft.h> (or
<scsl_fft_i8.h>) header file.
Fortran 90 users on IRIX systems can perform compile-time checking of
SCSL FFT subroutine calls by adding USE SCSL_FFT (for 4-byte integer
arguments) or USE SCSL_FFT_I8 (for 8-byte integer arguments) to the
source code from which the FFT calls are made. Alternatively, the
compile-time checking can be invoked without any source code
modifications by using the -auto_use compiler option, e.g.,
Page 2
INTRO_FFT(3S)INTRO_FFT(3S)
f90 -auto_use SCSL_FFT test.f -lscs
f90 -auto_use SCSL_FFT_I8 -i8 test.f -lscs_i8
FFT routines
These routines apply to one or more FFTs.
Following is a table of supported FFT routines. Each of these routines
is highly optimized for single-processor use. The two-dimensional,
three-dimensional, and one-dimensional multiple routines are also
multitasked (multi-threaded) for all sizes for which there is a
performance benefit; the one-dimensional routines are multitasked if the
data size exceeds the size of the largest processor cache. Each routine
can compute either a forward or an inverse Fourier transform.
In this table, rows of the table represent input and output data types
for the routines in each column.:
* C->C implies 32-bit complex input and output.
* Z->Z implies 64-bit double complex input and 64-bit double complex
output. Each routine in this row is documented with the complex
routine in the above row.
* S->C implies 32-bit real input and 32-bit complex output.
* D->Z implies 64-bit double precision real input and 64-bit double
precision complex output. Each routine in this row is documented
with the real-to-complex routine in the above row.
* C->S implies 32-bit complex input and 32-bit real output.
* Z->D implies 64-bit double complex input and 64-bit double precision
output. Each routine named in this row is documented with the
complex - real routine in the above row.
Columns of the table represent the number of dimensions for which the FFT
is calculated for the routines in each row:
* One-dimensional (single) calculates one FFT in one dimension.
* One-dimensional (multiple) calculates an FFT in one dimension for
each column of a two-dimensional matrix.
* Two-dimensional calculates one FFT in two dimensions.
* Three-dimensional calculates one FFT in three dimensions.
---------------------------------------------------------------------------
1-dimensional 1-dimensional 2-dimensional 3-dimensional
(single) (multiple)
---------------------------------------------------------------------------
Page 3
INTRO_FFT(3S)INTRO_FFT(3S)
C->C CCFFT CCFFTM CCFFTMR CCFFT2D CCFFT3D
Z->Z ZZFFT ZZFFTM ZZFFTMR ZZFFT2D ZZFFT3D
S->C SCFFT SCFFTM SCFFT2D SCFFT3D
D->Z DZFFT DZFFTM DZFFT2D DZFFT3D
C->S CSFFT CSFFTM CSFFT2D CSFFT3D
Z->D ZDFFT ZDFFTM ZDFFT2D ZDFFT3D
---------------------------------------------------------------------------
NOTES
The FFT routines were designed so that they can be implemented
efficiently on many different architectures. The calling sequence is the
same in any implementation. Certain details, however, depend on the
particular implementation.
One area of difference is the size of the table and work arrays.
Different systems may need different sizes. The subroutine call requires
no change, but you may have to change array sizes in the DIMENSION or
type statements that declare the arrays. The following are the required
array sizes for the Origin series. The values of NR and NFR are explained
below:
* CCFFT
table: 2n + NF REAL WORDS
work: 2n REAL WORDS
* ZZFFT
table: 2n + NF DBL PREC WORDS
work: 2n DBL PREC WORDS
* CCFFTMR
table: 2n + NF REAL WORDS
work: 2n REAL WORDS
* ZZFFTMR
table: 2n + NF DBL PREC WORDS work: 2n DBL PREC WORDS
* CCFFT2D table: (2*n1+NF) + (2*n2+NF) REAL WORDS
work: 2*MAX(n1,n2) REAL WORDS
* ZZFFT2D
table: (2*n1+NF) + (2*n2+NF) DBL PREC WORDS work: 2*MAX(n1,n2) DBL
PREC WORDS
* CCFFT3D table: (2*n1+NF) + (2*n2+NF) + (2*n3+NF) REAL WORDS
work: 2*MAX(n1,n2,n3) REAL WORDS
Page 4
INTRO_FFT(3S)INTRO_FFT(3S)
* ZZFFT3D
table: (2*n1+NF) + (2*n2+NF) + (2*n3+NF) DBL PREC WORDS
work: 2*MAX(n1,n2,n3) DBL PREC WORDS
* CCFFTM
table: (NF + 2 * n) REAL WORDS
work: 2n REAL WORDS
* ZZFFTM
table: (NF + 2 * n) DBL PREC WORDS
work: 2n DBL PREC WORDS
* SCFFT, CSFFT
table: (n+NFR) REAL WORDS
work: n+2 REAL WORDS
* DZFFT, ZDFFT
table: (n+NFR) DBL PREC WORDS
work: n + 2 DBL PREC WORDS
* SCFFT2D, CSFFT2D
table: (n+NFR) + (2*n2+NF) REAL WORDS
work: n1+4*n2 REAL WORDS
* DZFFT2D, ZDFFT2D
table: (n1+NFR) + (2*n2+NF) DBL PREC WORDS
work: n1 + 4 * n2 DBL PREC WORDS
* SCFFT3D, CSFFT3D
table: (n1+NFR) + (2*n2+NF) + (2*n3+NF) REAL WORDS
work: n1 + 4 * n3 REAL WORDS
* DZFFT3D, ZDFFT3D
table: (n1+NFR) + (2*n2+NF) + (2*n3+NF) DBL PREC WORDS
work: n1 + 4 * n3 DBL PREC WORDS
* SCFFTM, CSFFTM
table: (n+NFR) REAL WORDS
work: n + 2 REAL WORDS
* DZFFTM, ZDFFTM
table: (n+NFR) DBL PREC WORDS
work: n + 2 DBL PREC WORDS
The second area of difference is the isys parameter array, an array that
gives certain implementation-specific information. All features and
functions of the FFT routines specific to any particular implementation
are confined to this isys array. On any implementation, you can use the
default values by using an argument value of 0.
Page 5
INTRO_FFT(3S)INTRO_FFT(3S)
In the Origin series implementation, isys(0)=0 and isys(0)=1 are
supported. In SCSL versions prior to 1.3, only isys(0)=0 was allowed.
For isys(0)=0, NF=30 and NFR=15, and for isys(0)=1, NF=NFR=256. The NF(R)
words of storage in the table array contain a factorization of the length
of the transform.
The smaller values of NF and NFR for isys(0)=0 are historical. They are
too small to store all the required factors for the highest performing
FFT, so when isys(0)=0, extra space is allocated when the table array is
initialized. To avoid memory leaks, this extra space must be deallocated
when the table array is no longer needed. The routines CCFFTF, CCFFTMF,
etc., are used to release this memory. Due to the potential for memory
leaks, the use of isys(0)=0 should be avoided.
For isys(0)=1, the values of NF and NFR are large enough so that no extra
memory needs to be allocated, and there is no need to call CCFFTF, etc.
to release memory. (If called, these routines will do nothing.)
NOTE: isys(0) = 1 means that isys is an integer array with two elements.
The second element, isys(1), will not be accessed.
Finally, in addition to the work array, the FFT routines also dynamically
allocate scratch space from the stack. The amount of space allocated can
be slightly bigger than the size of the largest processor cache. For
single processor runs, the default stack size is large enough that these
allocations generally cause no problems. But for parallel runs, you need
to ensure that the stack size of slave threads is big enough to hold this
scratch space. Failure to reserve sufficient stack space will cause
programs to dump core due to stack overflows. The stack size of MP
library slave threads is controlled via the MP_SLAVE_STACKSIZE
environment variable or the mp_set_slave_stacksize() library routine. See
the mp(3C), mp(3F) and pe_environ(5) reference pages for more information
on controlling the slave stack size. For pthreads applications, the
thread's stack size is specified as one of many creation attributes
provided in the pthread_attr_t argument to pthread_create(3P). The
stacksize attribute should be set explicitly to a non-default value using
the pthread_attr_setstacksize(3P) call, described in the
pthread_attr_init(3P) man page.
Real-to-complex FFTs
In the formulas on the man pages, there are n real input values, and
n/2+1 complex output values. This property is characteristic of real-
to-complex FFTs.
The mathematical definition of the Fourier transform takes a sequence of
n complex values and transforms it to another sequence of n complex
values. A complex-to-complex FFT routine, such as CCFFT or CCFFTM, will
take n complex input values, and produce n complex output values. In
fact, one easy way to compute a real-to-complex FFT is to store the input
data, x, in a complex array, then call routine CCFFT to compute the FFT.
You get the same answer when using the SCFFT/SCFFTM routine.
Page 6
INTRO_FFT(3S)INTRO_FFT(3S)
A separate real-to-complex FFT routine is more efficient. Because the
input data are real, you can make use of this fact to save almost half of
the computational work. The theory of Fourier transforms tells us that
for real input data, you have to compute only the first n/2 + 1 complex
output values, because the remaining values can be computed from the
first half of the values by the simple formula:
YK,L = conjg(Yn-k,L) for n/2 <= k <= n-1
where the notation conjg(Y) represents the complex conjugate of y.
In fact, in many applications, the second half of the complex output data
are never explicitly computed or stored. Likewise, as explained later,
only the first half of the complex data has to be supplied for the
complex-to-real FFT.
Another implication of FFT theory is that, for real input data, the first
output value, Y(0), will always be a real number; therefore, the
imaginary part will always be 0. If n is an even number, Y(n/2) will
also be real and thus, have a zero imaginary part.
Complex-to-real FFTs
Consider the complex-to-real case. The effect of the computation is
given by the formulas on the man pages, but with X complex and Y real.
In general, the FFT transforms a complex sequence into a complex
sequence; however, in a certain application you may know the output
sequence is real, perhaps because the complex input sequence was the
transform of a real sequence. In this case, you can save about half of
the computational work.
According to the theory of Fourier transforms, for the output sequence,
Y, to be a real sequence, the following identity on the input sequence,
X, must be true:
Xk,L = conjg(Xn-k,L)
for n/2 <= k <= n-1
And, in fact, the following input values
Xk,L for k > n/2
do not have to be supplied, because they can be inferred from the first
half of the input.
Thus, in the complex-to-real routine, CSFFTM, the arrays can be
dimensioned as follows:
Fortran:
COMPLEX X(0:ldx-1, 0:lot-1)
Page 7
INTRO_FFT(3S)INTRO_FFT(3S)
REAL Y(0:ldy-1, 0:lot-1)
C/C++:
scsl_complex x[lot][ldx];
float y[lot][ldy];
C++ STL:
complex<float> x[lot][ldx];
float y[lot][ldy];
where ldx >= n/2 + 1, ldy >= n.
In each column, there are (n/2) + 1 complex input values and n real
output values. Even though only (n/2) + 1 input values are
supplied, the size of the transform is still n in this case, because
implicitly the FFT formula for a sequence of length n is used.
Another implication of the theory is that X(0, L) must be a real
number (that is, must have zero imaginary part). If n is an even
number, X(n/2, L) must also be real. The CSFFTM and CSFFT routines
assume that each of these values is real; if a nonzero imaginary
part is given, it is ignored.
Initialization
The table array stores the trigonometric tables used in calculation of
the FFT. You must initialize table by calling the routine with isign = 0
prior to doing the transforms. If the value of the problem size, n, does
not change, table does not have to be reinitialized.
Because SCFFT and CSFFT use the same format for table, either can be used
to initialize it (note that CCFFT uses a different table format).
Dimensions
In the preceding descriptions and on the specific man pages, it is
assumed that array subscripts were zero-based, as is customary in FFT
applications. However, if you prefer to use the more customary Fortran
style with subscripts starting at 1 you do not have to change the calling
sequence.
-------------------------------------------------------------------------
Routine subscripts starting at 0 subscripts starting at 1
-------------------------------------------------------------------------
CCFFT COMPLEX X(0:N-1) COMPLEX X(N)
COMPLEX Y(0:N-1) COMPLEX Y(N)
CCFFT2D COMPLEX X(0:ldx-1, 0:n2-1) REAL X(ldx, n2)
COMPLEX Y(0:ldy-1, 0:n2-1) COMPLEX Y(ldy, n2)
SCFFT REAL X(0:n-1) REAL X(n)
Page 8
INTRO_FFT(3S)INTRO_FFT(3S)
COMPLEX Y(0:n/2) COMPLEX Y(n/2 + 1)
SCFFT2D REAL X(0:ldx-1, 0:n2-1) COMPLEX X(ldx, n2)
COMPLEX Y(0:ldy-1, 0:n2-1) COMPLEX Y(ldy, n2)
CCFFTM COMPLEX X(0:ldx-1, 0:lot-1) COMPLEX X(ldx, lot)
COMPLEX Y(0:ldy-1, 0:lot-1) COMPLEX Y(ldy, lot)
CCFFTMR COMPLEX X(0:ldx-1, 0:n-1) COMPLEX X(ldx, n)
COMPLEX Y(0:ldy-1, 0:n-1) COMPLEX Y(ldy, n)
-------------------------------------------------------------------------
Convolution and Correlation Routines
These routines feature convolution for Finite Impulse Response (FIR) as
well as correlations. Each of these routines is highly optimized for
single-processor use. The routines which use two-dimensional input
sequences are multitasked (multi-threaded).
The convolution and correlation routines are very general. To achieve
this generality and maximum flexibility, one-dimensional sequences are
defined by 3 parameters. Six parameters are necessary for two-dimensional
sequences. One drawback of this generality are the long calling
sequences.
The following table contains a summary of the filter and correlation
routines. Each routine has its own man page. In this table, rows of the
table represent data types for the routines in each column:
* C implies 32-bit complex data.
* Z implies 64-bit double complex data.
* S implies 32-bit real data.
* D implies 64-bit double precision real data.
Columns of the table represent the type of computation as well as the
number of dimensions for which the convolution or correlation is
calculated for the routines in each row:
* One-dimensional FIR applies a Finite Impulse Response filter to one-
dimensional signals.
* One-dimensional (multiple) FIR applies a Finite Impulse Response
filter to multiple one-dimensional signals.
* Two-dimensional FIR applies a Finite Impulse Response filter to two-
dimensional signals.
* One-dimensional COR calculates the correlation of one-dimensional
sequences.
Page 9
INTRO_FFT(3S)INTRO_FFT(3S)
* One-dimensional (multiple) COR calculates the correlation of multiple
one-dimensional sequences.
* Two-dimensional COR calculates the correlation of two-dimensional
sequences.
------------------------------------------------------------
Type 1D (single) 1D (multiple) 2D
------------------------------------------------------------
C CFIR1D CFIRM1D CFIR2D
Z ZFIR1D ZFIRM1D ZFIR2D
S SFIR1D SFIRM1D SFIR2D
D DFIR1D DFIRM1D DFIR2D
------------------------------------------------------------
C CCOR1D CCORM1D CCOR2D
Z ZCOR1D ZCORM1D ZCOR2D
S SCOR1D SCORM1D SCOR2D
D DCOR1D DCORM1D DCOR2D
------------------------------------------------------------
SEE ALSOINTRO_SCSL(3S)
Page 10