FORTRAN PROGRAMMING GUIDE: Tips, Tricks and Techniques This is an edited version of a Fortran guide available at: http://www3.huji.ac.il/~agay/fortran.html ftp://vms.huji.ac.il/fortran/f77_guide.zip ftp://shum.cc.huji.ac.il/pub/doc/F77.guide and written by Abraham Agay (agay@vms.huji.ac.il) at Hebrew Univ. in Israel. The editing (in some cases extensive, partly to incorporate more Fortran-90 awareness) was done by W. Wiscombe, NASA Goddard (wiscombe@climate.gsfc.nasa.gov). ===================================================================== GENERAL PROGRAMMING RULES Programming rules summarize the programming experience of many programmers and theoretical considerations raised by computer scientists. Some programming rules apply to all programming language, they stem from general principles: A) Modern computers are very powerful, memory is relatively cheap and large, and modern compilers automatically optimize code better than the average programmer. So saving memory or CPU time is no longer of prime importance, what is important is creating programs that are easy to VALIDATE (check that the results are correct) and MAINTAIN (change to suit different future needs). So if you feel the itch to optimize by writing obscure code: DON'T DO IT! View programs as Knuth has suggested, as literate texts, and try to make them more literate, not more obscure. B) The units of program code, procedures and functions, should be made as GENERAL (suitable for many similar applications) and FLEXIBLE (able to handle different types of input and computing requirements) as possible. That principle promotes CODE REUSABILITY (using parts of old programs in new ones), but it also helps improve coding since programs that don't rely on special assumptions tend to be more reliable. C) Programs should have a well organized internal structure. The MODULARITY PROGRAMMING PARADIGM requires partitioning the program to relatively small units, each performing a well defined task. The interaction between code units can be described by a SUBPROGRAMS CALLING TREE, the calling tree also describes the internal structure of the program. There are several tools which generate a diagram of this tree, which should be pasted into the code. A subprogram call has a performance cost, so sometimes you don't partition the program into many subprograms but just arrange it in well defined blocks; optimizing compilers do such things automatically. D) Programs should be fully documented. Documentation consists of: Self-explaining names for variables, procedures etc. Short comments explaining: Variables roles Computation steps Program header containing: General description Bugs and problems Copyright and redistribution information Procedure prologues (describing input & output at a minimum) A text file explaining how to compile, install and use the program. E) Programs should be portable. To achieve this goal you should: Strictly adhere to the FORTRAN STANDARD Avoid using operating system routines Avoid using any processor dependent features Adhering to the standard ------------------------ The problem is that most compilers offer nice LANGUAGE EXTENSIONS, that sometimes are quite tempting to use, and may even be included in future revisions of the standard. However if you use language extensions your program may not compile on other machines without some rewriting, and so may be less useful. F) A program source is really just a text file, as a text file it should be as READABLE and EASY TO EDIT as possible (see below). Readable programs are easier to maintain and validate. G) A program should not depend on assumptions about the correctness of user's input, guesses about expected data etc. That suspicious turn of mind is called DEFENSIVE PROGRAMMING. Make your program check input and key computation results; if wrong try to recover - go back to the last trusted point, or just give a detailed message and abort the program. +-----------------------------------------------------+ | | | SUMMARY OF PROGRAMMING RULES | | ---------------------------- | | a) Clarity is more important than efficiency | | b) Generalize your code | | c) Write modular programs | | d) Document your program | | e) Write standard FORTRAN | | f) Improve your layout | | g) Program defensively | | | +-----------------------------------------------------+ The f77 programming principles guide by David Levine (levine@ics.uci.edu) is excellent, it can be found at: http://www.public.iastate.edu/~aeem/Fortran/guide.html ===================================================================== PROGRAM LAYOUT - THE ART OF MAKING PROGRAMS READABLE Program layout is the art of arranging program code in a READABLE and EASY TO EDIT way. Some practical methods to improve layout are: 1) Separating logical blocks 2) Identing control structures Let's "improve" the layout of this example: PROGRAM IDEXMP INTEGER I99 I99 = 99 WRITE(*,*) ' TAKE BUS ', I99 END First we add an incomplete prologue and some blank lines: PROGRAM IDEXMP C +----------------------------------------------------------------- C | Program: Bus number advice C | Author: Abraham Agay C | Date: 28.11.1995 C +----------------------------------------------------------------- INTEGER I99 I99 = 99 WRITE(*,*) ' TAKE BUS ', I99 END For a small procedure too many horizontal lines of dashes, etc., just adds "visual clutter"; however in a large program with long variable lists and many code sections it may really help to divide the code into manageable blocks in the reader's mind. Another important technique is indentation; control constructs should be properly indented to reveal the internal structure: DO 100 I = 1, 100 DO 200 J = 1, 100 WRITE(*,*) I, J 200 CONTINUE 100 CONTINUE Automatic tools can indent and provide other cleanup services, and it is safer to entrust the job to them. +-----------------------------------------------------+ | IMPROVING PROGRAM LAYOUT IS IMPORTANT! | +-----------------------------------------------------+ ===================================================================== DEBUGGING AND PROGRAM VALIDATION +----------------------------------------------------------+ | THE BEST WAY TO DEBUG A PROGRAM IS TO MAKE NO MISTAKES | +----------------------------------------------------------+ If you write programs in the "right" way, slowly and with a sufficient design phase before writing actual code, you will have no logic errors, and a few syntax errors that the compiler and a high-level checker (flint, NAG_tools, ...) will flag for you. Study listing files ------------------- It used to be automatic to get a listing file from a compiler, and it was the first thing one looked at. Now, it is optional, you have to ASK the compiler to generate a LIST FILE. This is always a good idea except for production codes that have been well tested. Normally, there are a number of listing flags that provide extra information, like cross-reference tables, and these can be invaluable. Static code analyzers --------------------- There are static tools that can help you flush "deeper" bugs. The Fortran FAQ is a good place to look for them. For example, you can get FTNCHEK by anonymous ftp from netlib. It has the reputation of being more primitive than commercial products like flint and NAG_Tools, but it has become increasingly sophisticated due to many years of devotion by its author. Preparing for testing --------------------- Check again all code before testing, NOT immediately after you wrote it, but when your mind is clear. You will find then many of those little slips of mind. Do UNIT testing first; test every possible procedure independently, before testing the complete program. Over 50% of errors are found in the unit testing phase. Going directly to complete-program testing is often just a very wasteful way of doing unit testing. Testing the program ------------------- The best way is to have a "verbose mode" in your program, that writes to a FILE all key results. You get into this mode when you supply a special value (say 1) to an input variable called for example "DEBUG". The DEBUG variable is checked and if it equals 1 you write to FILE the last result. Some programmers use different "debug levels", e.g. (DEBUG .EQ. 1) writes part of the information, (DEBUG .EQ. 2) writes more info, etc. Some programmers develop a "subsystem classification". Each value of the DEBUG variable corresponds to another part of the program, when you pass the value assigned to some subsystem, all info relevant to this subsystem will be written to the "debug file". Verbose mode statements can be easily removed in the final version of the program, if you put them inside 'preprocessor if statements', like this cpp example: #ifdef DEBUG IF (DEBUG .EQ. 1) WRITE(*,*) ' X= ', X #endif +------------------------------------------------------------------ TRY INPUTS AT AND BEYOND THE BOUNDARIES OF THE INPUT VARIABLE SPACE (THIS IS CALLED "FAILURE TESTING"). USE A RANDOM NUMBER GENERATOR TO CALCULATE INPUTS; OTHERWISE, YOU MAY SLANT THE TESTS TOWARD INPUT VARIABLES YOU KNOW WORK, AND THUS FAIL TO DISCOVER IMPORTANT BUGS. +------------------------------------------------------------------ Some sources of error --------------------- Data-types ---------- 1) Implicit variable declarations 2) Using a non-intrinsic function without a type declaration 3) Incompatible argument lists in CALL and routine definition 4) Incompatible declarations of a common block 5) Using constants and parameters of incorrect type Arithmetic ---------- 1) Assuming that dividing two integers will give a floating-point 2) Assuming integer exponents (e.g. 2**(-3)) are computed as real 3) Floating-point comparisons tests 4) Loops with a REAL control variable 5) MOD function with REAL arguments 6) Using constants and parameters of incorrect type FORTRAN related (more or less) ------------------------------ 1) Code lines longer than the allowed maximum 2) Common blocks losing their values while the program runs 3) Aliasing of dummy arguments and common block variables 4) Passing constants to a subprogram and modifying them General ------- 1) Assuming variables are initialized to zero 2) Assuming variables keep their value between procedure calls 3) Letting array indexes go out of bounds 4) Depending on assumptions about the computing order of subexpressions 5) Using trigonometric functions with large arguments on some machines 6) Inconsistent use of physical units See a later section for an explanation of some of these. Some of these bugs can be found by the compiler or a static code checker like FTNCHEK, others need a human. Many of these errors are impossible language constructs, but the old F-77 compilers were pretty lax and didn't detect them. They can be found by a Fortran 90 compiler in many cases; these compilers perform a much more rigorous analysis of code. (Remember that F-90 compilers will accept F-77 code.) ===================================================================== OBJECT-ORIENTED PROGRAMMING (OOP) PARADIGM A programming paradigm is a conceptual model of some programming problem, and an 'ideal' way to solve it. For example, in the section on general programming principles, we mentioned the modular (structured) programming paradigm of the 1970s. Some other 'paradigms' evolved over the years, including Data Flow Analysis and, most recently, OOP. Like many other good things OOP was invented at Xerox a long time ago. The idea of OOP is that sometimes it is more natural to partition a program into LOGICAL ENTITIES communicating by passing messages instead of FUNCTIONAL UNITS as in the modular paradigm. A good example is a modern visualization program, in which you choose the required DATA PROCESSING MODULES from a menu, connect them together with some mouse clicks, and then tell the input module the name of your data file, and the whole thing start producing pictures. The only suitable conceptual model for this kind of magic is to view the program as a set of logical entities: the input reader module reads your 3-D data file, it passes the data to a module that takes a 2-D section, then a graphic module translates the floating-points to a color picture where the colors correspond to magnitude. Object-oriented programming is done with programming languages like C++ that support 'objects'. Objects are data structures with associated functions; maintenance of the data structure values is performed ONLY with the associated functions. The interested reader is referred to the abundant literature on C++. Much of the OOP paradigm is available in Fortran-90, without the sometimes confusing terminology that accompanies it. ===================================================================== RECURSION Recursion is calling a procedure from itself, directly or via calls to other procedures. FORTRAN 77 does not allow recursion, Fortran 90 does, but requires it to be explicitly stated. Recursion is not computationally efficient, as there is an overhead associated with each procedure call; however in the name of good programming we may agree to bear that overhead. Recursion is important because the natural solution to some problems is a recursive algorithm, for example: Identifying pixels belonging to the same 'blob' Searching a tree-structured data base Programming a recursive algorithm is difficult; every small mistake is amplified manyfold and renders the procedure useless. You can simulate recursion in FORTRAN 77, but with the maturity of Fortran-90 compilers, and their low cost, this makes little sense. ===================================================================== FORTRAN TRAPS Passing constants to a subprogram --------------------------------- With a few exceptions, FORTRAN compilers pass arguments to procedures by reference; changing the value of a dummy argument in a procedure then changes the corresponding entity in the invoking CALL statement. If you pass a constant, and modify it, either you will get an access violation error (segmentation fault), or something gets modified. If your compiler pools and reuses constants, you may find that other instances of the constant got modified as well! We have seen a case where 1.0 got changed to 2.0! Floating-point comparisons tests -------------------------------- Using the relational operators .EQ. or .NE. with floating-point numbers is dangerous, because due to radix conversion and roundoff errors, the value of numbers may be different from the mathematical value. Loops with a REAL control variable ---------------------------------- Another version of the floating-point comparisons problem. With a REAL loop control variable, you may get different number of loop iterations than what you expected. (This misfeature is finally being removed in Fortran 95.) DO I = E1, E2, E3 will be executed: ( N times if N > 0 ( 0 times if N <= 0 where N = TRUNC((E2 - E1 + E3) / E3) The TRUNC intrinsic function is very sensitive to small errors in the quotient; if the quotient changes from 1.0001 to 0.9999 the result will be quite different (zero iterations instead of one). MOD function with REAL arguments -------------------------------- Another version of the floating-point comparison problem. A small example program will make it clear: PROGRAM RNDOFF REAL R1, R2, RESULT R1 = 1.0 R2 = 0.2 RESULT = MOD(R1,R2) WRITE(*,'(1X,F3.1)') RESULT END The MOD function really computes: MOD = R1 - (R2 * INT(R1 / R2)) Because of rounding up, R1/R2 is slightly smaller than 5.0, and the INT function gives the value 4 instead of the correct value 5. The program then proceeds to write the WRONG answer 0.2. Array indexes out of bounds --------------------------- By default, array indexes are not checked before being used. The memory protection mechanism discovers out of bound references only when they get out of allocated memory. The best way is to compile the program (until you are sure it works OK) with: $ FORTRAN/CHECK=BOUNDS Source-file (VMS) f77 -C Source-file (SUN) f77 -C Source-file (IRIX) f77 -C Source-file (OSF/1) f77 -C Source-file (ULTRIX) Common blocks losing their values while the program runs -------------------------------------------------------- The FORTRAN standard doesn't guarantee that common blocks will keep their data values. When none of the procedures that declared the common block are activated, the common block may lose the data. Static code checkers like FTNCHEK can warn you if your program is in such danger. Multiple use of same dummy argument or common block variable ------------------------------------------------------------ FORTRAN prohibits using the same variable (storage location) more than once on the same CALL statement or in the same common block. Eager-to-optimize compilers may merely ASSUME that condition, and not check for it, because it makes possible more code optimizations. If you use such dummy aliases, the results of your computations are unpredictable. For example, the following example program will give different results when compiled on VMS with the switches: /ASSUME=DUMMY_ALIASES and /ASSUME=NODUMMY_ALIASES. PROGRAM ALIAS REAL A(5) DO I=1,5 A(I) = 2 ENDDO WRITE(*,*) A CALL SUB(A, A(1)) <<<---- WRITE(*,*) A END SUBROUTINE SUB(X, Y) INTEGER I REAL X(5), Y DO I=1,5 X(I) = Y * X(I) ENDDO RETURN END Depending on assumptions about the computing order of subexpressions -------------------------------------------------------------------- The FORTRAN standard says nothing about the order in which the compiler will compute subexpressions when it computes an expression, except that the compiler will obey parentheses. (The arithmetic ordering rules are of course followed: powers before multiply/divide before add/subtract, etc.) The standard says that your program shouldn't depend on the computing order; a rather artifical example will make it clear: IF ((1 / I) .GT. 0.01 .AND. I .NE. 0) THEN The (I .NE. 0) condition will not protect you from dividing by zero in the ((1 / I) .GT. 0.01) condition, if it's evaluated later! Using one compiler the program may work OK and on another it may abort with a 'divide by zero' error message. Using trigonometric functions with large arguments on some machines ------------------------------------------------------------------- The main problem with computing trigonometric functions with large arguments is the reduction of the argument to the primary range (say between -Pi and Pi). We have to find very accurately the remainder when dividing the argument by the transcendental number Pi. Good algorithms were found around 1982, but some compilers didn't implement them, so they return garbage results, or 0.0 with an error message. Beware especially of PC and old UNIX compilers. Some standards (AT&T system V) says that the result of computing the trigonometric functions in such cases should be 0.0 with an error message. ===================================================================== BAD STATEMENTS Some older standard FORTRAN statements are detrimental to good programming practice by their nature. A short list of 'bad' statements: DIMENSION ENTRY COMMON EQUIVALENCE Some people make a big case against NAMELIST, but numerical modelers find this statement invaluable in implementing a "name-value pair" style of input (the preferred form in the information sciences) and also in quickly implementing debug prints. Those who despise NAMELIST often end up inventing an inferior substitute. ===================================================================== CONSTANTS Using constants modularly ------------------------- Arrange program constants using the fact that constants may depend on previous constants, but don't use intrinsic functions in the declarations (this was relaxed in F-90 to allow intrinsics that return integer or character values). INTEGER RECORD_SIZE, NUM_RECORDS, FILE_SIZE PARAMETER( RECORD_SIZE = 80, * NUM_RECORDS = 1000, * FILE_SIZE = RECORD_SIZE * NUM_RECORDS) In F-90, global constants can be put in a MODULE of their own called CONSTANTS, then just the needed constants imported (e.g. PI) using the statement: USE CONSTANTS, ONLY :: PI Precision --------- Some compilers will automatically convert a single precision constant to a double precision one, if it is assigned to a double variable, some are said to count the digits and decide accordingly, however you should not rely on such features. +---------------------------------------------------+ | DEFINE ALL CONSTANTS IN PARAMETER STATEMENTS | | USE CONSTANTS OF THE SAME TYPE AS THE VARIABLES | +---------------------------------------------------+ ===================================================================== COMMON BLOCKS Common blocks can be used to pass variables to procedures and functions indirectly and so make the argument lists very short, and eliminate non-used redeclarations. HOWEVER THE INDIRECT LINKS CREATED BETWEEN DIFFERENT PARTS OF THE PROGRAM ARE CONFUSING AND HARD TO TRACE, AND ARE A CONSTANT SOURCE OF NASTY BUGS. Common variables may LOSE THEIR VALUES after a subprogram return; you can use the SAVE statement on each redeclaration, or add a redeclaration in the main program to avoid such situations. This strange phenomenon may occur when the currently executing subprograms don't use the common block. A static code checker like FTNCHEK (used with the /volatile switch) can point common blocks that may have this problem. If you must use common blocks, the following remarks may help: 1) Character variables shouldn't be in the same common block with non-character variables. If you pass both kinds, create one common block for the character variables, and one for all other. This is a restriction in the FORTRAN standard, so you have to abide by it. In some implementations (DEC, IRIX) the problem was solved. 2) Declare array dimensions explicitly in the type declaration, or in the common block declaration, don't use the DIMENSION statement. 3) The best policy is to make all redeclarations of a common block exact copies, use the same variable names, in the same order, and with the same data types and sizes. On all redeclarations of a common block, make also the type declaration lists of the common block members exact copies. INCLUDE helps here. 4) Remember to order the variables in a common block in a size decreasing order, first REAL*8, then REAL*4, INTEGER*4, INTEGER*2, BYTE. In such order the 'smaller' variables will not get the 'larger' ones out of alignment. 5) A blank common block may behave differently from named common blocks: a) Blank common variables never become undefined as a result of procedures RETURN b) A blank common block may be redeclared with different size, but don't use that feature. c) A blank common block may not be initialized by a DATA statement +-----------------------------------------+ | AVOID USING COMMON BLOCKS IF POSSIBLE | +-----------------------------------------+ ===================================================================== IMPLICIT DECLARATIONS Don't be tempted to use FORTRAN automatic variable declaration capabilities, this is not a feature but a trap, declaring every variable without disabling implicit declarations is not enough. Following this simple rule can save you a lot of unnecessary nasty bugs. For example: A mistyped variable name will be considered by the compiler as a new variable (it'll possibly assign some default initial value, e.g. 0) and computation will continue without any warning! A missing comma between two variable names in a variable declaration list (even if they are on different continuation lines) will be considered by the compiler as a declaration of one variable! Some examples: DO N = 1 10 (F-77: DON = 110) IF(I.EQ.4) THEN J=1 (F-77: THENJ=1) J = I (F-77: J = Is) C This is Fortunately, blanks are significant in F-90 and many of these dumb errors will be caught by F-90 compilers. Practical solutions ------------------- The compiler can help you check that all variables are properly declared. Analysis tools like 'flint' and NAG_Tools will also discover these kinds of errors. You can either put an 'IMPLICIT NONE' (standard in F-90) at the beginning of every procedure and function, or compile with a compiler option: $ FORTRAN/WARNINGS=DECLARATIONS Source-file (VMS) f77 -u Source-file (SUN) f77 -u Source-file (IRIX) f77 -u Source-file (OSF/1) f77 -u Source-file (ULTRIX) +---------------------------------------------------+ | DECLARE EVERY VARIABLE AND USE MEANINGFUL NAMES | | DON'T USE IMPLICIT DECLARATIONS | +---------------------------------------------------+ ===================================================================== ARRAYS Dimensioning an array at its topmost level of definition -------------------------------------------------------- This is the 'right' way to explicitly declare an array: INTEGER MAXDATA PARAMETER( MAXDATA = 1000) REAL A(MAXDATA) or, in F-90, INTEGER, PARAMETER :: MAXDATA = 1000 REAL :: A(MAXDATA) Array 'internals' ----------------- An array is an allocated area in memory, together with the information needed to interpret it correctly. The information needed for that consists of: 1) The data type of the elements 2) Number of dimensions 3) The start-index and end-index of each dimension 4) Specification of the element order in memory The internal structure is imposed by the compiler: it consistently interprets the storage area as a sequence of specific elements. The interpretation is done by code the compiler inserts in the program, the code implements a formula that translates array indexes to memory addresses. For example, a formula for a one-dimensional array may be similar to: Memory-address = Base-address + Element-size * (Array-index - Start-index) The default start address in FORTRAN is of course = 1. FORTRAN's 'column major' storage order is discussed below. FORTRAN 77 array sizes are determined at the time of compilation, the 'outermost' declaration of the array allocates a chunk of memory, and the size of that memory chunk can't be changed during runtime. Passing arrays to subprograms ----------------------------- Arrays declared at an 'outer' procedure (the calling procedure or some procedure that called it) can be passed to another procedure (the called procedure). Most FORTRAN compilers pass arrays by passing the address of the array; all subprograms that use the array will then work on the same (and only) copy. The FORTRAN standard allows array passing by another method called copying in/out or copy/restore, but this method is not popular. Copying the array on every procedure call, will make large arrays get duplicated in memory again and again; copying in/out only the array addresses is less inefficient but seems pointless. Since F-90 allows array sections to be passed as arguments, however, this strategy is being used again for this purpose by some compilers (array sections are not contiguous in memory). When the array is declared in the 'outermost' procedure, the compiler allocates memory for it. When you pass the array with a 'call' statement, the compiler actually passes the base-address of the same array to the called procedure. When the called procedure operates on the array it works on the same array - uses the same memory storage area, the array is not 'copied' to another memory area. In short, the memory used in an array is allocated in the 'topmost' declaration and reused when you pass the array to other procedures. Redeclaration problem --------------------- When you pass an array to a subprogram you really (in most compilers) pass the memory address of the beginning of the array. At the beginning of the called subprogram the compiler should have also the dimensional information. The dimensional information (with the memory element order) is sufficient for the compiler to generate the code for the called subprogram without knowing anything else about the calling subprogram. Redeclaring explicitly the array size in the called subprogram is not a good programming practice. We need a way to pass the dimensional information to the subprogram, thus making it more flexible. In FORTRAN-77 there were just two methods available: adjustable and assumed size syntax. Here are examples: C ------------------------------------------------------------------ C | Adjustable size array method c ------------------------------------------------------------------ SUBROUTINE A(N,Y) INTEGER N REAL Y(N) <<<--- WRITE (*,*) ' Adjustable array: ', Y(1) RETURN END C ------------------------------------------------------------------ C | Assumed size method C ------------------------------------------------------------------ SUBROUTINE B(Y) REAL Y(*) <<<--- WRITE (*,*) ' Assumed size array: ', Y(1) RETURN END Both methods are basically obsolete with the advent of F-90 and in fact assumed size arrays have been marked "obsolescent". "Automatic" and "assumed-shape" arrays are the new preferred methods, and they basically make the discussion that was formerly here irrelevant. SUBROUTINE Sub1(A) REAL, DIMENSION(:,:) :: A ! assumed-shape array SUBROUTINE Sub2(X) REAL, DIMENSION(:) :: X REAL, DIMENSION(2*SIZE(X)) :: workspace ! automatic array No such sophsiticated array-passing mechanisms exist in standard C (some compilers had it implemented, e.g. gcc's parameterized arrays); you have to resort to strange looking tricks, or use dynamic arrays (allocated at runtime). Assumed size arrays and character strings ----------------------------------------- Assumed-size arrays are a convenient way for passing character strings, but remember that you can't write a complete assumed size array using just the array name. PROGRAM TEST CHARACTER STRING*80 ....... CALL SUB1(STRING) ......... SUBROUTINE SUB1(STRING) CHARACTER STRING*(*) Array storage order ------------------- Array elements are stored in memory one after the other. If you have a one-dimensional array, everything is simple. With multi-dimensional arrays it is more complicated: there are two practical possibilities: * the COLUMN MAJOR ORDER used in FORTRAN, and * ROW MAJOR ORDER used in C. Let's examine a small matrix: A11 A12 A13 A21 A22 A23 A31 A32 A33 FORTRAN's Column major order stores the elements in memory like this: A11 A21 A31 A12 A22 A32 A13 A23 A33 ---> Higher addresses (the first array index varies most rapidly). C row major order stores the elements like this: A11 A12 A13 A21 A22 A23 A31 A32 A33 ---> Higher addresses (the last array index varies most rapidly). (If the matrix is symmetric we will get the same representation in memory either way.) When you WRITE an array in FORTRAN using just the array name, you will get the elements written in FORTRAN's order - the transpose of the mathematical order. For the 3x3 array above Fortran will WRITE (with a format that allows 3 numbers per line): A11 A21 A31 A12 A22 A32 A13 A23 A33 Try this small program: PROGRAM ARRELM INTEGER I, J, ARRAY(9,9) DO I=1,9 DO J=1,9 ARRAY(I,J) = 10*I + J ENDDO ENDDO WRITE(*,'(1X,9I4)') ARRAY END If you always reference array elements using array indexes, and pass whole arrays to subprograms, the memory order doesn't matter, except when optimizing program memory use. More 'sophisticated' (and worse) programs may depend on tricks based on the internal memory order. ===================================================================== USING I/O FORMATS - SOME TRICKS It is preferable to use a 'unified' approach and do all I/O with WRITE(...) and READ(...); avoid PRINT and READ *,... An interesting fact about formats is that you can change format strings 'dynamically' while a program is running (RUNTIME FORMATS). A small program will illustrate this point: PROGRAM RUNFMT REAL X PARAMETER ( X = 0.123456789 ) CHARACTER STRING*80 WRITE(*,*) ' Enter a FORMAT string suitable for one float' WRITE(*,*) ' Don''t forget the ''()'' !' READ(*,*) STRING WRITE(*,FMT=STRING) X END By the way, formatting routines shouldn't truncate numbers but round, (see the chapter on radix conversion and rounding). A vendor may do correct (or nearly correct) IEEE rounding for all values, or use some other scheme; for example DEC rounds {0,1,2,3,4} down and {5,6,7,8,9} up. Most vendors don't do correct IEEE rounding. Embedded format specification ----------------------------- It is better, from a code maintenance and changing point of view, to have the format string inside the I/O statement, e.g.: WRITE(10, '(I5,E15.5)' ) INTVAR, REALVAR Using the 'A' format -------------------- READ(10, '(A)' ) STRING Using just 'A' without a 'size' is standard. It provides a flexible way to read and write strings, without having to specify explicitly their sizes. The length of the associated string is taken as the format 'width parameter'. Internal files and ASCII/binary conversion ------------------------------------------ FORTRAN provides a mechanism similar to formatted I/O statements for files, that allows you to convert numeric data from internal binary representation to 'formatted' (ASCII) representation and vice versa. The 'internal file' conversion mechanism takes character strings, or arrays of character strings, and formally treats them as if they were real files. Internal arrays can be thought of as files that resides in main memory, instead of on disk. They can have more than one 'record' if you use an array of strings, they may even allow some file oriented statements like REWIND. Of course when your program exits, the contents of all internal files are lost. The syntax of internal file statements is like that of formatted READ and WRITE, the only difference is that instead of the LOGICAL UNIT you put the string (or string array) name. The following examples are for INTEGER variables, but of course you can use other types of variables (with proper formats). Converting a string to integer: INTEGER*4 INTVAR CHARACTER STRING*80 ...................................... READ(STRING, FMT='(I5)') INTVAR Converting an integer to string: INTEGER*4 INTVAR CHARACTER STRING*80 ...................................... WRITE(STRING, FMT='(I5)') INTVAR +---------------------------------------------+ | INTERNAL FILES ARE A STANDARD FORTRAN | | METHOD FOR ASCII/BINARY CONVERSIONS | +---------------------------------------------+ ===================================================================== USING SUBSTRINGS - SOME SIMPLE TRICKS When you declare a FORTRAN's string you define the maximal length it can have, if you don't use all of it, e.g. you assign a smaller string to it, it is automatically padded by blanks (spaces). The blank padding at the end of the string is counted when you use the LEN() function to find the string's length, or when you WRITE the string. F-90 provides new intrinsics TRIM and LEN_TRIM to deal with blank padding. The following code shows some elementary 'tricks': INTEGER OFFSET1, OFFSET2 CHARACTER STRING1*20, STRING2*20 ...................................... STRING1 = 'bla bla bla (FORTRAN) bla bla ... ' OFFSET1 = INDEX('(', STRING1) + 1 OFFSET2 = INDEX(')', STRING1) - 1 STRING2 = STRING1(OFFSET1:OFFSET2) INDEX() is an intrinsic standard FORTRAN function; it takes two arguments, both strings, looks for the first string inside the second and returns the place the first string begins inside the second. For example: ST1 = 'good' ST2 = 'Fortran is good' 123456789012345 INDEX(ST1,ST2) is equal 12 The // is the concatenation operator, it takes two strings and 'adds' them one after the other, to form one larger string. You may use the // operator with assumed-size strings operands only in assignment statements. Other FORTRAN statements (e.g. WRITE) may accept string concatenations, but it's against the standard. For example: CHARACTER ST1*7, ST2*3, ST3*5 ST1 = 'Fortran' ST2 = ' is' ST3 = ' good' ST1 // ST2 // ST3 equals 'Fortran is good' +-----------------------------------------------------------+ | LEARN TO USE EMBEDDED AND RUNTIME FORMATS, AND STRING | | MANIPULATION INTRINSICS LIKE INDEX() AND // | +-----------------------------------------------------------+ ===================================================================== OVERFLOWING AND UNDERFLOWING I/O STATEMENTS I/O statements (read, write...) transfer data from a file to some variable list, or from a variable list to a file. Generally each I/O statement transfers one record, the exceptions (in standard FORTRAN) are formatted and list-directed statements which can transfer additional records. Reading ------- If you read only a part of a record, the rest is ignored - the next READ statement will read from the beginning of the next record. This can be overridden in F-90 but is still the default. If you try to read more data than a record contains then an error condition occurs. You will definitely want to trap this error using the following syntax: READ(..., IOSTAT=ios, ...) ... IF(ios < 0) CALL ERRMSG('End of file/record on READ'...) IF(ios > 0) CALL ERRMSG('Error on READ'...) Writing ------- If you try to write to a record more data than it can contain (see next section) an error condition occurs. If you write to a FIXED LENGTH RECORD less data than is required to fill it then: Formatted The record is filled with spaces Unformatted The record is filled with zeros WRITE's should use IOSTAT just like READs. Maximum and default record lengths ---------------------------------- Other important properties of files are the maximum record length allowed and the default maximum record length. We have seen computers with maximum formatted record lengths of only 130 characters, harking back to the days of line printers. Other computers allow much longer records. Experiment. +-------------------------------------------------+ | ALWAYS USE IOSTAT= IN READ, WRITE STATEMENTS | +-------------------------------------------------+ ===================================================================== LOGICAL UNITS AND FILES FORTRAN I/O is performed to or from a LOGICAL UNIT identified by the logical unit number (LUN), the LUN is connected to a device either explicitly with an OPEN statement or implicitly. Explicit and Implicit OPEN -------------------------- Using an 'OPEN' statement is not mandatory but is recommended, otherwise you may rely on system defaults that may be non-portable, or just strange. Using an 'OPEN' statement is a good programming practice and enables you to tune the file characteristics to your needs. For example the following 'open' keywords are useful: UNIT LUN FILE File name STATUS 'OLD' - To use an existing file, 'NEW' - To create a new file ACCESS 'SEQUENTIAL' FORM 'FORMATTED' - For an ASCII file 'UNFORMATTED' - For a binary file ERR Statement label to jump to if 'open' fails Non-standard keywords can optimize I/O performance, or enable you to use DIRECT and INDEXED (non-standard!) files. If you use the 'FILE' keyword in the 'open' statement, you create a 'connection' between the LUN and the file name. If you don't use this keyword a default file name is assumed: FORTnnn.DAT (VMS) fort.nn (IRIX) You can make the program process a file with another name by: $ define FOR010 newinp.inp (VMS) set FOR010 = newinp.inp (IRIX) Many programmers use that feature, it gives more flexibility in file I/O. Preassigned LUN's ----------------- Some LUN's are permanently assigned (preconnected) to the standard input and output devices (the keyboard and screen respectively in an interactive session): Unit VMS name UNIX name ---- ---------- --------- 5 SYS$INPUT stdin 6 SYS$OUTPUT stdout 0 SYS$ERROR stderr But one shouldn't use these numbers (5,6,...) explicitly. A better way: I/O Statement VMS Name UNIX Name Interactive Batch job ------------- ---------- --------- ----------- ---------- read(*,...) SYS$INPUT stdin keyboard batch file write(*,...) SYS$OUTPUT stdout screen log file If you don't specify a logical unit number in an I/O statement, a default unit number is used. The defaults used by DEC (and IRIX?) fortran are: I/O STATEMENT UNIT ------------- ---- PRINT -1 TYPE -2 ACCEPT -3 READ -4 Negative unit numbers that are unavailable to users, prevent conflicts between I/O statements that use the default logical unit numbers and those that use explicit logical unit numbers. You will see these negative logical unit numbers only in error messages. +---------------------------------------------+ | ALWAYS USE READ(*,...) AND WRITE(*,...) | +---------------------------------------------+ ===================================================================== FORTRAN carriage control ------------------------ You usually meet the strange FORTRAN carriage control when you "type" a file written by a FORTRAN program on the screen or print it, and in some mysterious way the first character in every line is missing, and/or some lines are completely missing or mixed up. The cause is an I/O convention (ASA) that got into the FORTRAN standard and forced some operating systems (e.g. VMS, IRIX) to keep supporting it to maintain compatability. In this I/O scheme the first character in every line is not considered a part of the line, but is used to control printing and screen display: Control character Effect ----------------- ------ Space Normal behaviour - printing/CR/LF 0 Double spacing - LF/printing/CR/LF 1 Next page - m*LF/CR/LF/n*LF/printing/CR/LF + Overwrite mode - CR/printing/CR/LF On program startup an extra CR is sent. List directed output may have an extra space character prepended to avoid truncation of the first character. The VMS operating system knows that the file was written by a FORTRAN program, and so has to be treated in a different way, if it has the FORTRAN CARRIAGE CONTROL ATTRIBUTE that is kept with other information on the file. The default carriage control for files written by a FORTRAN program is FORTRAN carriage control on VMS, and 'normal' on others. An old trick is to begin every format with a '1X', in this way a space character will be written at the beginning of each line, to produce normal behavior, and on reading the extra space will be ignored. There are system-dependant ways to convert a file with the FORTRAN carriage control attribute to a normal file (the asa utility on some UNIX machines), or to make FORTRAN programs write normal files (the CARRIAGECONTROL keyword to OPEN on VMS). A small example program: PROGRAM CRCTRL WRITE(*,*) WRITE(*,*) ' LIST DIRECTED FORMAT ' WRITE(*,*) ' ==================== ' WRITE(*,*) ' line = "+x123456789" (seems the same): ' WRITE(*,*) '+x123456789' WRITE(*,*) WRITE(*,*) ' WITH A FORMAT ' WRITE(*,*) ' ============= ' WRITE(*,*) ' line = "x123456789" (nothing special): ' WRITE(*,'(A)') 'x123456789' WRITE(*,*) WRITE(*,*) ' line = "+x123456789" (overwrite): ' WRITE(*,'(A)') '+x123456789' WRITE(*,*) WRITE(*,*) ' line = "0x123456789" (double-space): ' WRITE(*,'(A)') '0x123456789' WRITE(*,*) WRITE(*,*) ' line = "1x123456789" (new-page): ' WRITE(*,'(A)') '1x123456789' WRITE(*,*) END A comparison of the three I/O methods ------------------------------------- | Formatted | List-directed | Unformatted -----------|-----------------|---------------------|-------------------- Syntax | FMT='(format)' | FMT=* | No format specifier -----------|-----------------|---------------------|-------------------- Character | As is | Should be enclosed | As is input | | with quotes 'ABC' | -----------|-----------------|---------------------|-------------------- Carriage | ASA | Prefixes a space | control | | before each record | -----------|-----------------|---------------------|-------------------- +----------------------------------------------+ | USE LIST-DIRECTED OUTPUT WHEREVER POSSIBLE | +----------------------------------------------+