The Programming language reference article from the English Wikipedia on 24-Jul-2004
(provided by Fixed Reference: snapshots of Wikipedia from wikipedia.org)

Programming language

A programming language or computer language is a standardized communication technique for expressing instructions to a computer. It is a set of syntactic and semantic rules used to define computer programs. A language enables a programmer to precisely specify what data a computer will act upon, how these data will be stored/transmitted, and precisely what actions to take under various circumstances.

Table of contents
1 Introduction
2 Features of a programming language
3 History of programming languages
4 Classes of programming languages
5 Languages
6 Formal semantics
7 See also
8 External links

Introduction

A primary purpose of programming languages is to enable programmers to express their intent for a computation more easily than they could with a lower-level language or machine code. For this reason, programming languages are generally designed to use a higher-level syntax, which can be easily communicated and understood by human programmers. Programming languages are important tools for helping software engineers write better programs faster.

Understanding programming languages is crucial for those engaged in computer science because virtually all computation today is expressed in some computer language.

During the last few decades, a large number of computer languages have been introduced, have replaced one another, and have been modified or combined. Although there have been several attempts to make a universal computer language that serves all purposes, all of them have failed. A wide range of languages is needed for several reasons: the purposes of programming vary from commercial software development to scientific work to hobby use; the gap in skill between novices and experts is huge, and some languages are too difficult for beginners to come to grips with; programmers have different preferences; and the acceptable run-time cost may differ greatly between a program running on a microcontroller and one running on a supercomputer.

Many languages are especially suited to particular kinds of work: PHP is a scripting language that is especially suited for Web development; Perl is suitable for text manipulation; the C language has been widely used for development of operating systems and compilers (so-called system programming).

Programming languages make computer programs less dependent on particular machines or environments. This is because programs written in a programming language are translated into the machine code of a particular machine rather than being executed directly by that machine. Machine independence was one ambitious goal of FORTRAN, one of the first programming languages.

There are two mechanisms used to translate a program written in a programming language into the specific machine code of the computer being used.

If the program text is translated as a whole and the resulting internal format is then run, the mechanism is called compilation. The compiler is therefore a program which takes the human-readable program text (called source code) as data input and supplies object code as output. The resulting object code may be machine code which will be executed directly by the computer's CPU, or it may be code matching the specification of a virtual machine.

If the program code is translated at runtime, with each translated step being executed immediately, the mechanism is called interpretation, and the program that performs it is an interpreter. Interpreted programs usually run more slowly than compiled programs, but have more flexibility because they are able to interact with the execution environment. See interpreted language for detail. Languages implemented in this way typically fall into the category of scripting languages, although the two definitions are not identical.

Most languages can be either compiled or interpreted, but most are better suited for one than the other. In some programming systems, programs are compiled in multiple stages, into a variety of intermediate representations. Typically, later stages of compilation are closer to machine code than earlier stages. One common variant of this implementation strategy, first used by BCPL in the late 1960s, was to compile programs to an intermediate representation called "O-code" for a virtual machine, which was then compiled for the actual machine. This successful strategy was later used by Pascal with P-code and Smalltalk with byte code, although in many cases the intermediate code was interpreted rather than being compiled.
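The two-stage strategy described above can be illustrated with a minimal sketch in Python. The nested-tuple expression format, the instruction names PUSH, ADD and MUL, and the functions compile_expr and run are all invented for this illustration; they are not the O-code or P-code of any real system.

```python
# Hypothetical mini-language: arithmetic expressions written as nested tuples,
# for example ("+", 2, ("*", 3, 4)). All names here are illustrative only.

OPCODES = {"+": "ADD", "*": "MUL"}


def compile_expr(expr, code=None):
    """Translate a whole expression into instructions for a stack machine."""
    if code is None:
        code = []
    if isinstance(expr, (int, float)):
        code.append(("PUSH", expr))
    else:
        op, left, right = expr
        compile_expr(left, code)       # code for the operands comes first
        compile_expr(right, code)
        code.append((OPCODES[op],))    # then the instruction that combines them
    return code


def run(code):
    """Interpret the intermediate code on a simple virtual stack machine."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if instr[0] == "ADD" else a * b)
    return stack.pop()


program = ("+", 2, ("*", 3, 4))
intermediate = compile_expr(program)   # stage 1: compile to intermediate code
result = run(intermediate)             # stage 2: interpret the intermediate code
print(intermediate)
print(result)  # 14
```

Here the first stage sees the whole program before anything is executed, while the second stage executes one instruction at a time. A real system could replace the second stage with a further compilation step to machine code.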

Features of a programming language

Each programming language can be thought of as a set of formal specifications concerning syntax, vocabulary, and meaning.

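These specifications usually cover the areas described in the subsections below: data types and data structures, instruction and control flow, reference mechanisms and re-use, and design philosophies.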

Most languages that are widely used, or have been used for a considerable period of time, have standardization bodies that meet regularly to create and publish formal definitions of the language, and discuss extending or supplementing the already extant definitions.

Data types and data structures

Internally, all data in a modern digital computer are stored simply as zeros or ones (binary). The data typically represent information in the real world such as names, bank accounts and measurements and so the low-level binary data are organised by programming languages into these high-level concepts.

The particular system by which data are organized in a program is the type system of the programming language; the design and study of type systems is known as type theory. Languages can be classified as statically typed or dynamically typed. Statically-typed languages can be further subdivided into languages with manifest types, where each variable and function declaration has its type explicitly declared, and type-inferred languages. It is possible to perform type inference on programs written in a dynamically-typed language, but it is also entirely possible to write programs in these languages that make type inference infeasible. Sometimes type-inferred and dynamically-typed languages are called latently typed.

With statically-typed languages, there usually are pre-defined types for individual pieces of data (such as numbers within a certain range, strings of letters, etc.), and programmatically named values (variables) can have only one fixed type, and allow only certain operations: numbers cannot change into names and vice versa. Examples of these languages are: C, C++ and Java.

Dynamically-typed languages treat all data locations interchangeably, so inappropriate operations (like adding names, or sorting numbers alphabetically) will not cause errors until run-time. Examples of these languages are: Objective-C, Lisp, JavaScript, Tcl and Prolog.
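As a small illustration of run-time detection, the following sketch uses Python, which is dynamically typed (Python is not in the list of examples above; it is used here only for illustration). The function total_length and the sample data are hypothetical.

```python
def total_length(items):
    """Add up the lengths of all items; assumes every item has a length."""
    return sum(len(item) for item in items)


value = 42            # the name is bound to an integer...
value = "forty-two"   # ...and later to a string; no declaration forbids this

ok = total_length(["ada", "lisp"])   # 3 + 4 = 7

try:
    total_length(["ada", 42])        # an integer has no length
    caught = False
except TypeError:
    caught = True                    # the mistake surfaces only when the call runs

print(ok, caught)  # 7 True
```

Nothing about the second call is rejected before the program starts; the inappropriate operation is reported only when execution reaches it.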

Type-inferred languages superficially treat all data as not having a type, but actually do sophisticated analysis of the way the program uses the data to determine which elementary operations are performed on the data, and therefore deduce what type the variables have at compile-time. Type-inferred languages can be more flexible to use, while creating more efficient programs; however, this capability is difficult to include in a programming language implementation, so it is relatively rare. Examples of these languages are: Haskell and ML.
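The idea of deducing types from the way data are used can be sketched for a toy expression language. This is a greatly simplified illustration and not the algorithm used by Haskell or ML; the tuple-based expression format and the function infer are assumptions made for the example.

```python
def infer(expr, env=None):
    """Return the deduced type name of expr without evaluating it."""
    env = {} if env is None else env
    if isinstance(expr, int):
        return "int"
    if isinstance(expr, str):          # a bare string is a variable name
        return env[expr]
    tag = expr[0]
    if tag == "lit":                   # ("lit", "text") is a string literal
        return "str"
    if tag == "let":                   # ("let", name, value, body)
        _, name, value, body = expr
        return infer(body, {**env, name: infer(value, env)})
    if tag == "len":                   # ("len", e) needs a str and yields an int
        if infer(expr[1], env) != "str":
            raise TypeError("len expects a str")
        return "int"
    if tag == "+":                     # both operands must share one type
        left, right = infer(expr[1], env), infer(expr[2], env)
        if left != right:
            raise TypeError(f"cannot add {left} and {right}")
        return left
    raise ValueError(f"unknown expression: {expr!r}")


# The variable "name" is never declared with a type; its type is deduced.
good = ("let", "name", ("lit", "Ada"), ("+", ("len", "name"), 1))
bad = ("let", "name", ("lit", "Ada"), ("+", "name", 1))

print(infer(good))  # int
try:
    infer(bad)
    rejected = False
except TypeError:
    rejected = True   # the error is found before any program is run
```

The second program is rejected by analysis alone, which is the sense in which a type-inferred language finds type errors at compile-time without explicit declarations.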

Strongly typed languages do not permit the usage of values as different types; they are rigorous about detecting incorrect type usage, either at runtime for dynamically typed languages, or at compile time for statically typed languages. Ada, Java, ML, and Python are examples of strongly typed languages.

Weakly typed languages do not strictly enforce type rules or have an explicit type-violation mechanism, often allowing for undefined behavior, segmentation violations, or other unsafe behavior if types are assigned incorrectly. C, assembly language, C++, and Tcl are examples of weakly typed languages.

Note that strong vs. weak is a continuum; Java is a strongly typed language relative to C, but is weakly typed relative to ML. Use of these terms is often a matter of perspective, much in the way that an assembly language programmer would consider C to be a high-level language while a Java programmer would consider C to be a low-level language.

Note that strong and static are orthogonal concepts. Java is a strongly, statically typed language. C is a weakly, statically typed language. Python is a strongly, dynamically typed language. Tcl is a weakly, dynamically typed language. But beware that some people incorrectly use the term strongly typed to mean strongly, statically typed, or, even more confusingly, to mean simply statically typed--in the latter usage, C would be called strongly typed, despite the fact that C doesn't catch that many type errors and that it's both trivial and common to defeat its type system (even accidentally).
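The "strongly, dynamically typed" corner can be shown in Python. The sketch below first shows the interpreter refusing to use a string as a number, and then uses the standard struct module to request explicitly the kind of reinterpretation that a weakly typed language such as C permits through a pointer cast. The second half emulates that behaviour and is not a feature of Python's type system; the variable names are illustrative.

```python
import struct

# Strong typing: a string is never silently used as a number.
try:
    "1" + 1
    rejected = False
except TypeError:
    rejected = True

# An explicit conversion is required instead.
converted = int("1") + 1             # 2

# Emulating a weakly typed reinterpretation: take the four bytes of the
# 32-bit float 1.0 and read the same bytes back as an unsigned integer.
raw = struct.pack("<f", 1.0)
as_int = struct.unpack("<I", raw)[0]

print(rejected, converted, hex(as_int))  # True 2 0x3f800000
```

In C the equivalent reinterpretation needs no library support, which is one reason C is placed toward the weak end of the continuum.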

Most languages also provide ways to assemble complex data structures from built-in types and to associate names with these new combined types (using arrays, lists, stacks, files).

Object oriented languages allow the programmer to define data types called "objects" which have their own intrinsic functions and variables (called methods and attributes respectively). A program containing objects allows the objects to operate as independent but interacting sub-programs: this interaction can be designed at coding time to model or simulate real-life interacting objects. This is a very useful and intuitive capability. Languages such as Python and Ruby have developed as OO (object oriented) languages. They are comparatively easy to learn and to use, and are gaining popularity in professional programming circles, as well as being accessible to non-professionals. These more intuitive languages have increased the public availability and power of customised computer applications.
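A minimal sketch of objects with attributes and methods, written in Python; the Account class and its names are hypothetical and chosen only to echo the bank-account example used earlier.

```python
class Account:
    """A data type whose objects carry attributes and methods."""

    def __init__(self, owner, balance=0):
        self.owner = owner        # attribute
        self.balance = balance    # attribute

    def deposit(self, amount):    # method
        self.balance += amount

    def transfer_to(self, other, amount):
        """One object interacting with another through its methods."""
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        other.deposit(amount)


alice = Account("Alice", 100)
bob = Account("Bob")
alice.transfer_to(bob, 30)
print(alice.balance, bob.balance)  # 70 30
```

Each Account object keeps its own state, and the transfer is expressed as an interaction between two objects, much as it would be described in the real world.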

Aside from when and how the correspondence between expressions and types is determined, there's also the crucial question of what types the language defines at all, and what types it allows as the values of expressions (expressed values) and as named values (denoted values). Low-level languages like C typically allow programs to name memory locations, regions of memory, and compile-time constants, while allowing expressions to return values that fit into machine registers; ANSI C extended this by allowing expressions to return struct values as well (see record). Functional languages often allow variables to name run-time computed values directly instead of naming memory locations where values may be stored. Languages that use garbage collection are free to allow arbitrarily complex data structures as both expressed and denoted values.

Finally, in some languages, procedures are allowed only as denoted values (they cannot be returned by expressions or bound to new names); in others, they can be passed as parameters to routines, but cannot otherwise be bound to new names; in others, they are as freely usable as any expressed value, but new ones cannot be created at run-time; and in still others, they are first-class values that can be created at run-time.
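The last case, procedures as first-class values that can be created at run-time, can be sketched in Python as follows; make_scaler and the other names are illustrative only.

```python
def make_scaler(factor):
    """Create and return a new procedure each time it is called."""
    def scale(x):
        return x * factor
    return scale


double = make_scaler(2)            # a procedure created at run-time
triple = make_scaler(3)
alias = double                     # bound to a new name like any other value
results = list(map(triple, [1, 2, 3]))   # passed as a parameter to a routine

print(double(5), alias(5), results)  # 10 10 [3, 6, 9]
```

A language that allows procedures only as denoted values would permit the definition of scale but none of the three uses that follow it.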

Instruction and control flow

Once data have been specified, the machine must be instructed how to perform operations on the data. Elementary statements may be specified using keywords or may be indicated using some well-defined grammatical structure. Each language takes units of these well-behaved statements and combines them using some ordering system. Depending on the language, differing methods of grouping these elementary statements exist. This allows one to write programs that are able to cover a variety of input, instead of being limited to a small number of cases. Furthermore, beyond the data manipulation instructions, other typical instructions in a language are those used for control flow (branches, definitions by cases, loops, backtracking, functional composition).
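Three of the control-flow constructs named above (definition by cases, loops, and functional composition) can be seen together in a short Python sketch. The functions classify and count_by_class are hypothetical examples.

```python
def classify(n):
    """Definition by cases: exactly one branch is taken for any input."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    return "positive"


def count_by_class(numbers):
    """A loop that composes classify with a counting step."""
    counts = {}
    for n in numbers:
        label = classify(n)
        counts[label] = counts.get(label, 0) + 1
    return counts


print(count_by_class([-2, 0, 5, 7]))
# {'negative': 1, 'zero': 1, 'positive': 2}
```

Because the statements are grouped under a loop and a set of cases, the same few lines handle a list of any length rather than a fixed number of inputs.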

Reference mechanisms and re-use

The core of the idea of reference is that there must be a method of indirectly designating storage space. The most common method is through named variables. Depending on the language, further indirection may include references that are pointers to other storage space stored in such variables or groups of variables. Similar to this method of naming storage is the method of naming groups of instructions. Most programming languages use macro calls, procedure calls or function calls as the statements that use these names. Using symbolic names in this way allows a program to achieve significant flexibility, as well as a high measure of reusability. Indirect references to available programs or predefined data divisions allow many application-oriented languages to integrate typical operations as if the programming language included them as higher level instructions.
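Both kinds of naming, of storage and of groups of instructions, can be sketched in Python, where variables hold references to objects. The variable names and the function mean are illustrative assumptions.

```python
scores = [3, 1, 2]     # a name designating some storage
alias = scores         # a second name referring to the same storage
alias.append(4)        # a change made through one name is visible through the other

copy = list(scores)    # separate storage with the same contents
copy.append(5)         # does not affect scores


def mean(values):
    """A named group of instructions that can be re-used on any list."""
    return sum(values) / len(values)


print(scores)                     # [3, 1, 2, 4]
print(mean(scores), mean(copy))   # 2.5 3.0
```

The function is written once and applied to two different pieces of storage simply by passing a different name.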

Design philosophies

For the above-mentioned purposes, each language has been developed using a special design or philosophy. Some aspect or another is particularly stressed by the way the language uses data structures, or by which its special notation encourages certain ways of solving problems or expressing their structure.

Since programming languages are artificial languages, they require a high degree of discipline to accurately specify which operations are desired. Programming languages are not error tolerant; however, the burden of recognising and using the special vocabulary is reduced by help messages generated by the programming language implementation. There are a few languages which offer a high degree of freedom in allowing self-modification in which a program re-writes parts of itself to handle new cases. Typically, only machine language and members of the Lisp family (Common Lisp, Scheme) provide this capability. Some languages such as MUMPS and Perl allow modification of data structures that contain program fragments, and provide methods to transfer program control to those data structures; languages that support dynamic linking and loading such as C, C++, and the Java programming language can emulate self-modification by either embedding a small compiler or calling a full compiler and linking in the resulting object code. Interpreting code by recompiling it in real time is called dynamic recompilation; emulators and other virtual machines exploit this technique for greater performance.
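The approach of keeping a program fragment in a data structure and then transferring control to it can be sketched in Python using the built-in compile and exec functions. Python is not among the languages named above and is used here only as an illustration; the generated function area and the "<generated>" label are arbitrary choices.

```python
# A program fragment held as ordinary string data. It could equally be
# assembled or edited by the running program before being used.
source = "def area(w, h):\n    return w * h\n"

namespace = {}
code_object = compile(source, "<generated>", "exec")  # translate at run time
exec(code_object, namespace)                          # define the function

area = namespace["area"]   # transfer control to the newly created code
print(area(3, 4))  # 12
```

Executing text obtained from an untrusted source in this way is unsafe, which is one practical reason the facility is used sparingly.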

There are a variety of ways to classify programming languages. The distinctions are not clear-cut; a particular language may fall into more than one class. For example, a language may have both compiled and interpreted implementations.

In addition, most compiled languages contain some run-time interpreted features. The most notable example is the familiar I/O format string, which is written in a specialized little language and which is used to describe how to convert program data to or from an external representation. This string is typically interpreted at run time by a specialized format-language interpreter included in the run-time support libraries. Many programmers have found the flexibility of this arrangement to be very valuable.
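Python's printf-style formatting operator offers a convenient way to see a format string being interpreted at run time. In this sketch the format string is itself built by the running program, which shows that it is ordinary data rather than part of the program text; the variable names are illustrative.

```python
fixed = "%-6s|%03d" % ("ab", 7)    # left-justify in 6 columns; zero-pad to 3 digits

width = 8
fmt = "%%%d.3f" % width            # builds the format string "%8.3f" at run time
padded = fmt % 3.14159             # interpreted only now

print(repr(fixed))    # 'ab    |007'
print(repr(fmt))      # '%8.3f'
print(repr(padded))   # '   3.142'
```

Because the format is only a string, a program can choose column widths or precision from its input, which is the flexibility referred to above.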

History of programming languages

The development of programming languages, unsurprisingly, follows closely the development of the physical and electronic processes used in today's computers.

Charles Babbage is often credited with designing the first computer-like machines, which had several programs written for them (in the equivalent of assembly language) by Ada Lovelace.

Alan Turing introduced the theoretical construct of the Turing machine, which behaves in principle in all relevant ways like a modern computer, according to the low-level program it is given as input.

In the 1940s the first recognisably modern, electrically powered computers were created, requiring programmers to operate machines by hand. Some military calculation needs were a driving force in early computer development, such as encryption, decryption, trajectory calculation and massive number crunching needed in the development of atomic bombs. At that time, computers were extremely large, slow and expensive: advances in electronic technology in the post-war years led to the construction of more practical electronic computers. At that time only Konrad Zuse imagined the use of a programming language (developed eventually as Plankalkül) like those of today for solving problems.

Subsequent breakthroughs in electronic technology (transistors, integrated circuits, and chips) drove the development of increasingly reliable and more usable computers. This was paralleled by the development of a variety of standardised computer languages to run on them. The improved availability and ease of use of computers led to a much wider circle of people who could work with computers. The subsequent explosive development has resulted in the Internet, the ubiquity of personal computers, and increased use of computer programming through more accessible languages such as Python and Visual Basic.

Classes of programming languages

Languages

The following languages are major languages used by several thousand to several million programmers worldwide:
Ada | AWK | BASIC | C | C++ | C# | COBOL | ColdFusion | Common Lisp | Delphi | Fortran | IDL | Java | JavaScript | Lisp | Perl | PHP | Prolog | Pascal | Python | SAS | SQL | Visual Basic

Formal semantics

The rigorous definition of the meaning of programming languages is the subject of formal semantics.

See also

External links