The C programming language reference article from the English Wikipedia on 24-Jul-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

C programming language

Image: The C Programming Language, by Brian Kernighan and Dennis Ritchie, one of the most read and trusted books on C.

The C programming language is a low-level standardized programming language developed in the early 1970s by Ken Thompson and Dennis Ritchie for use on the UNIX operating system. It has since spread to many other operating systems, and is one of the most widely used programming languages. C is prized for its efficiency, and is the most popular programming language for writing system software, though it is also used for writing applications. It is also commonly used in computer science education.

Table of contents
1 Features
2 Problems with C
3 History
4 "Hello, World!" in C
5 Relation to C++
6 Programming tools
7 See also
8 References
9 External links

Features

Overview

C is a relatively minimalist programming language that operates close to the hardware, and is more similar to assembly language than most other programming languages. Indeed, C is sometimes referred to as "portable assembly", reflecting its important difference from assembly languages: C code can be compiled for and run on almost any machine, more than any other language in existence, while an assembly language runs on at most a few very specific models of machine. C is typically called a low-level or medium-level language, indicating how closely it operates with the hardware.

This is no accident; C was created with one important goal in mind: to make it easier to write large programs with fewer errors in the procedural programming paradigm, but without putting a burden on the writer of the C compiler, who would otherwise be encumbered by complex language features. To this end, C has the following important features:

Some useful features that C lacks that are found in other languages include:

Although the list of useful features C lacks is long, this has been important to its acceptance, because it allows new compilers to be written quickly for it on new platforms, and because it keeps the programmer in close control of what the program is doing. This is what often allows C code to run more efficiently than many other languages. Typically only hand-tuned assembly language code runs more quickly, since it has complete control of the machine, but advances in compilers along with new complexity in modern processors have quickly closed this gap.

One consequence of C's wide acceptance and efficiency is that the compilers, libraries, and interpreters of other higher-level languages are often implemented in C.

Types

C has a very simple type system in most respects similar to those of its contemporaries Fortran and Pascal, including types for integers of various sizes, both signed and unsigned, floating-point numbers, characters, and records (structs).

C makes extensive use of pointers, a very simple type of reference that stores the address of a memory location. The pointer can be dereferenced, an operation which retrieves the object stored at the memory location the pointer contains, and the address can be manipulated with pointer arithmetic. At runtime, a pointer is simply a machine address like those manipulated in assembly, but at compile-time it has a complex type that indicates the type of the object it points to, allowing expressions including pointers to be type-checked. Pointers are used widely in C; the C string type is simply a pointer to an array of characters, and dynamic memory allocation, described below, is performed using pointers.

Pointers in C have a special reserved value, NULL, which indicates that they are not pointing to anything. This is useful in constructing many data structures, but causes undefined behavior (typically a crash) if dereferenced. A pointer with the value NULL is called a null pointer. Similarly, C pointers also have a special void pointer type, meant to indicate a pointer that points to an object of unknown type.

C also has language-level support for static, or fixed-size, arrays. The arrays can appear to have more than one dimension, although they are technically arrays of arrays (e.g., tbl[10][20] rather than tbl[10,20]). Dimensions are laid out in row-major order. Arrays are accessed using pointers and pointer arithmetic; the array name is treated as a pointer to the beginning of the array. In many applications, having fixed-size arrays is unreasonable, and so dynamic memory allocation can be used to create dynamically-sized arrays (see Data storage below).

Because C is often used in low-level systems programming, there are cases where it's actually necessary to treat an integer as an address, a floating-point number as an integer, or one type of pointer as another. For these, C supplies casting, an operation that forces an object from one type to another, if this is possible. While sometimes necessary, the use of casts sacrifices some of the safety provided by the type system.

Data storage

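One of the most important functions of a programming language is to provide facilities for managing memory and the objects that are stored in memory. C provides three distinct ways of allocating memory for objects: static memory allocation, automatic memory allocation (on the stack), and dynamic memory allocation (from the heap).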

These three approaches are appropriate in different situations and have various tradeoffs. For example, static memory allocation has no allocation overhead, automatic has only a little during function calls, and dynamic memory allocation can potentially have a great deal of overhead for both allocation and deallocation. On the other hand, stack space is typically much more limited than either static memory or heap space, and only dynamic memory allocation allows allocation of objects whose size is only known at run-time. Most C programs make extensive use of all three.

For most objects, automatic or static allocation is preferred where possible because the error-prone hassle of releasing the allocated memory can be avoided. However, an exception is arrays: statically allocated arrays are limited to a fixed size, and this is inappropriate for many applications; by allocating them in dynamic memory, they can be resized at any time. (See "malloc" for an example of dynamic arrays).

Multidimensional arrays can also be allocated dynamically (although they seldom are), in two different ways:

In many cases, rather than use multi-dimensional arrays, an easier approach is to define a data structure to store what would have been contained within the rows of the multi-dimensional array, and then create a single-dimensional array of pointers to the new data structure. An array of structures can also be used, but using an array of pointers allows the array to be dynamically resized, and since memory for elements does not have to be allocated until it is used, it is more memory-efficient.

Problems with C

A popular saying is that C makes it easy to shoot yourself in the foot. In other words, C permits many operations that are generally not desirable, and thus many simple errors made by a programmer are not detected by the compiler or even when they occur at runtime, leading to programs with unpredictable behavior.

One problem is that automatically and dynamically allocated objects are not initialized; they initially have whatever value is present in the memory space they are assigned. This value is highly unpredictable, and can vary between two machines, two program runs, or even two calls to the same function. If the program attempts to use such an uninitialized value, the results are usually unpredictable. Most modern compilers detect and warn about this problem.

C's pointers are a primary source of danger; because they are unchecked, a pointer can be made to point to any object of any type, including code, and then written to, causing unpredictable effects. Although most pointers point to safe places, they can be moved to unsafe places using pointer arithmetic, the memory they point to may be deallocated and reused (dangling pointers), or they may be uninitialized (wild pointers). Another problem with pointers is that C allows free conversion between any two pointer types; again, most modern compilers warn about this. Other languages attempt to address this problem by using more restrictive reference types.

Although C has native support for static arrays, it does not verify that arrays are indexed into with a valid index (bounds checking). For example, one can write to the sixth element of an array with five elements, yielding unpredictable results. This is called a buffer overflow. Although this is in keeping with the C philosophy of giving the programmer full control, it has also been notorious as the source of a number of security problems in C-based network servers.

Another common problem in C is that heap memory cannot be reused until it is explicitly released by the programmer with free(). The result is that if the programmer accidentally forgets to free memory, but continues to allocate it, more and more memory will be consumed over time. This is called a memory leak. Conversely, it's possible to release memory too soon, and then continue to use it. Because the allocation system can reuse the memory at any time for unrelated reasons, this results in insidiously unpredictable behavior. These issues in particular are ameliorated in languages with automatic garbage collection.

Tools have been created to help C programmers avoid many of these errors in many cases. Automated source code checking and auditing is fruitful in any language, and for C many such tools exist, such as Lint. There are also libraries for performing array bounds checking and a limited form of automatic garbage collection, but they are not a standard part of C.

History

Early developments

The initial development of C occurred at AT&T Bell Labs between 1969 and 1973; according to Ritchie, the most creative period occurred in 1972. It was named "C" because many of its features were derived from an earlier language called "B". Accounts differ regarding the origins of the name "B": Ken Thompson credits the BCPL programming language, but he had also created a language called Bon in honor of his wife Bonnie.

By 1973, the C language had become powerful enough that most of the UNIX kernel, originally written in PDP-11/20 assembly language, was rewritten in C. This was one of the first operating system kernels implemented in a language other than assembly, other early instances being the Multics system (written in PL/I) and TRIPOS (written in BCPL).

K&R C

In 1978, Ritchie and Brian Kernighan published the first edition of The C Programming Language. This book, known to C programmers as "K&R", served for many years as an informal specification of the language. The version of C that it describes is commonly referred to as "K&R C." (The second edition of the book covers the later ANSI C standard, described below.)

K&R introduced the following features to the language:

K&R C is often considered the most basic part of the language that is necessary for a C compiler to support. For many years, even after the introduction of ANSI C, it was considered the "lowest common denominator" that C programmers stuck to when maximum portability was desired, since not all compilers were updated to fully support ANSI C, and reasonably well-written K&R C code is also legal ANSI C.

In the years following the publication of K&R C, several "unofficial" features were added to the language, supported by compilers from AT&T and some other vendors. These included:

ANSI C and ISO C

During the late 1970s, C began to replace BASIC as the leading microcomputer programming language. During the 1980s, it was adopted for use with the IBM PC, and its popularity began to increase significantly. At the same time, Bjarne Stroustrup and others at Bell Labs began work on adding object-oriented programming language constructs to C. The language they produced, called C++, is now the most common application programming language on the Microsoft Windows operating system; C remains more popular in the Unix world.

In 1983, the American National Standards Institute (ANSI) formed a committee, X3J11, to establish a standard specification of C. After a long and arduous process, the standard was completed in 1989 and ratified as ANSI X3.159-1989 "Programming Language C". This version of the language is often referred to as ANSI C. In 1990, the ANSI C standard (with a few minor modifications) was adopted by the International Organization for Standardization (ISO) as ISO/IEC 9899:1990.

One of the aims of the ANSI C standardization process was to produce a superset of K&R C, incorporating many of the unofficial features subsequently introduced. However, the standards committee also included several new features, such as function prototypes (borrowed from C++), and a more capable preprocessor.

ANSI C is now supported by almost all the widely used compilers. Most of the C code being written nowadays is based on ANSI C. Any program written only in standard C, and without hardware-dependent assumptions, should compile and behave correctly on any platform with a conforming C implementation. However, many programs have been written that will only compile on a certain platform, or with a certain compiler, because of (i) the use of non-standard libraries, e.g. for graphical displays, and (ii) some compilers not adhering to the ANSI C standard, or its successor, in their default mode.

C99

After the ANSI standardization process, the C language specification remained relatively static for some time, whereas C++ continued to evolve. (Normative Amendment 1 created a new version of the C language in 1995, but this version is rarely acknowledged.) However, the standard underwent revision in the late 1990s, leading to the publication of ISO/IEC 9899:1999 in 1999. This standard is commonly referred to as "C99". It was adopted as an ANSI standard in March 2000.

The new features in C99 include:

Interest in supporting the new C99 features appears to be mixed. Whereas GCC and several other compilers now support most of the new features of C99, the compilers maintained by Microsoft and Borland do not, and these two companies do not seem to be interested in adding such support.

"Hello, World!" in C

The following simple application prints "Hello, World!" to standard output. Standard output is usually the screen, but it might be a file, some other hardware device, or even the bit bucket, depending on how it is mapped at the time the program is executed. A version of this program appeared for the first time in K&R.

#include <stdio.h>

int main(void)
{
    /* Function for printing text */
    printf("Hello, World!\n");

    /* Return statement properly exits program */
    return 0;
}

The first line of the program is an #include preprocessing directive, which causes the compiler to substitute for that line the entire text of the file (or other entity) it refers to; in this case the standard file stdio.h will replace that line. The angle brackets indicate that the stdio.h file is to be found in whatever place is designated for the compiler to find standard include files.

The next (non-blank) line indicates that a function named "main" is being defined; the main function is special in C programs, as it is the function that is first run when the program starts (for hosted implementations of C, and leaving aside "housekeeping" code). The curly brackets delimit the extent of the function. The int defines "main" as a function that returns, or evaluates to, an integral number; the void indicates that no arguments or data are given to function main by its caller.

The next line "calls", or executes, a function named printf; the included file, stdio.h, contains the information describing how the printf function is to be called. In this call, the printf function is passed a single argument, the constant string "Hello, World!\n"; the \n is translated to a "newline" character, which when displayed causes a line break. printf returns a value, an int, but since it is not used it is discarded.

The return statement tells the program to exit the current function (in this case main), returning the value zero to the function that called the current function. Since the current function is "main", the caller is whatever started our program. Finally, the close curly bracket indicates the end of the function "main".

Note that text surrounded by "/*" and "*/" (comment text) is ignored by the compiler. C99-compliant compilers also allow comments to be introduced with "//", indicating that the comment extends to the end of the current line.

Relation to C++

The C++ programming language was originally derived from C. However, as C and C++ have evolved independently, the division between the two has widened.

C99 created a number of conflicting features. Today, the primary differences between the two languages are:

Some features originally developed in C++ have also appeared in C. Among them are:

Programming tools

See also

References

External links



An early version of this article contained material from FOLDOC, used with permission.