Foundations of Programming Languages, by Kent D. Lee.

The interpreter is the machine language program that executes all the programs you write in the interpreted language. The source program you write controls the behavior of the interpreter program. Programming in an interpreted language is a two step process.
Each time your program is executed it is translated into an AST by a part of the interpreter called the parser. There may be an additional step that translates the AST to some lower-level representation, often called bytecode.
In an interpreter, this lower-level representation is still internal to the interpreter program. Then a part of the interpreter, often called a virtual machine, executes the bytecode instructions. Not every interpreter translates the AST to bytecode. Eliminating the compile step has a few implications.
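Python makes both steps visible through its standard library. The sketch below parses a source string into an AST, compiles that AST to bytecode, and lets the virtual machine execute the bytecode:

```python
import ast
import dis

source = "3 + 4 * 2"
tree = ast.parse(source, mode="eval")   # step 1: the parser builds an AST
print(ast.dump(tree))                   # inspect the tree

code = compile(tree, "<expr>", "eval")  # step 2: translate the AST to bytecode
dis.dis(code)                           # inspect the bytecode instructions

print(eval(code))                       # step 3: the virtual machine runs it
```

Running this prints the AST, a disassembly of the bytecode, and finally the value 11.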
This is generally a good thing and can shorten the development cycle. You only have a source code program to keep track of. The interpreter insulates your program from platform dependencies. Of course, source programs for compiled languages are generally platform independent too.
Thankfully, because the Python interpreter is written in C, the same Python interpreter program can be compiled, with some small differences, for each platform. The portability of interpreted languages has made them very popular among programmers, especially when writing code that needs to run across multiple platforms. One huge problem that has driven research into interpreted languages is that of heap memory management. Recall that the heap is the place where memory is dynamically allocated.
The heap is a big space, but if a program runs long enough and continues to allocate and never free space, eventually the heap will fill up and the program will terminate abnormally. Rather than requiring the programmer to free space explicitly, there is a special task or thread that runs periodically as part of the interpreter to check the heap for space that can be freed. This task is called the garbage collector. For a garbage collector to work correctly, space on the heap has to be allocated and accessed in the right way.
Many interpreted languages are designed to ensure that a garbage collector will work correctly. The disadvantage of an interpreted language is in speed of execution. Interpreted programs typically run slower than compiled programs. In a compiled program, parsing and code generation happen once, when the program is compiled.
When running an interpreted program, parsing and code generation happen each time the program is executed. In addition, if an application has real-time dependencies then having the garbage collector running at more or less random intervals may not be desirable.
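In Python, for instance, the garbage collector can be observed directly through the standard gc module. This sketch builds a reference cycle that simple reference counting cannot reclaim, then invokes the collector explicitly:

```python
import gc

class Node:
    def __init__(self):
        self.ref = None

a, b = Node(), Node()
a.ref, b.ref = b, a   # a reference cycle: neither object's count drops to zero
del a, b              # the names are gone, but the cycle keeps both alive

collected = gc.collect()   # run the cycle collector; returns unreachable objects found
print(collected >= 2)      # at least the two Node objects were reclaimed
```

Normally this collection happens automatically at intervals the interpreter chooses, which is exactly the unpredictability that can trouble real-time applications.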
It turns out that one of the biggest advantages of interpreted languages is the portability of programs. The desire to keep this portability while closing the speed gap has driven a lot of research into making interpreted programs run as fast as compiled programs. As discussed earlier in this chapter, the concept of a virtual machine has been around quite a while. A virtual machine is a program that provides insulation from the actual hardware and operating system of a machine while supplying a consistent implementation of a set of low-level instructions, often called bytecode.
Figure 1 illustrates this organization. There is no one specification for bytecode instructions. They are specific to the virtual machine being defined. Python has a virtual machine buried within the interpreter. Prolog is another interpreter that uses a virtual machine as part of its implementation. Some languages, like Java, have taken this idea a step further.
Java has a virtual machine that executes bytecode instructions as does Python. The creators of Java separated the virtual machine from the compiler.
Instead of storing the bytecode instructions internally as in an interpreter, the Java compiler, called javac, compiles a Java source code program to a bytecode file. This file is not machine language so it cannot be executed directly on the hardware.
Java bytecode files all end with a .class extension. You may have noticed these files at some point after compiling a Java program. Programs written using a hybrid language like Java are compiled. However, the compiled bytecode program is interpreted. Source programs in the language are not interpreted directly. By adding this intermediate step the interpreter can be smaller and faster than traditional interpreters. Very little parsing needs to happen to read the program, and executing the program is straightforward because each bytecode instruction usually has a simple implementation.
The hybrid approach is also taken by VB.NET, JScript, and the other .NET platform languages. You might notice that Standard ML and Python were included as examples of interpreted languages. Both ML and Python include interactive interpreters as well as the ability to compile and run low-level bytecode programs. Python bytecode files are named with a .pyc extension. In the case of Python and Standard ML the virtual machine is not a separate program. Both interpreters are written to recognize a bytecode file and execute it just like a source program.
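Python's compilation to a bytecode file can be triggered directly with the standard py_compile module. The file names and temporary directory below are illustrative:

```python
import os
import py_compile
import tempfile

# write a small source program to a temporary directory
src = os.path.join(tempfile.mkdtemp(), "hello.py")
with open(src, "w") as f:
    f.write("print('hello')\n")

pyc = py_compile.compile(src)  # compile to bytecode; returns the .pyc path
print(pyc.endswith(".pyc"))    # True: a bytecode file was produced
```

The interpreter performs the same compilation implicitly when a module is first imported, caching the result so later runs skip the parsing step.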
Java and the .NET programming environments do not include interactive interpreters. The only way to execute programs with these platforms is to compile the program and then run the compiled program using the virtual machine. Programs written for the .NET platform are compiled to bytecode as well. Microsoft submitted some of the .NET specifications for standardization, opening the door to implementations of .NET on other platforms. In theory all .NET programs are portable like Java, but so far implementations of the .NET framework are not as generally available as Java. The Java platform has been implemented and released on all major platforms. In fact, in November 2006 Sun, the company that created Java, announced they were releasing the Java Virtual Machine and related software under the GNU Public License to encourage further development of the language and related tools.
Since then the rights to Java have been purchased by Oracle, where it continues to be supported. Java and .NET language implementations maintain backwards compatibility of their virtual machines. This means that a program compiled for an earlier version of Java or .NET will continue to run on later versions of the virtual machine. Python makes no such guarantee; its bytecode may change from one version of the interpreter to the next. This distinction makes Python more of an interpreted language, while Java and .NET languages are truly virtual machine implementations. Maintaining backwards compatibility of the virtual machine means that programmers can distribute applications for Java and .NET implementations without releasing their source code. .NET and Java applications can be distributed while maintaining the privacy of the source code.
Since intellectual property is an important asset of companies, the ability to distribute programs in binary form is important. The development of virtual machines made memory management and program portability much easier in languages like Java, Standard ML, and the various .NET languages while also providing a means for programmers to distribute programs in binary format so source code could be kept private.
Data transformation is the fundamental operation that is performed by all program- ming languages. Some programming languages mutate data to new values. Other languages transform data by building new values from old values.
However the transformation takes place, these data transformation operations are defined for certain types of data. Not every transformation operation makes sense for every type of value.
How would you add two customers together and what would that mean? Since programming languages define data transformation operations, they similarly define types to specify which operations make sense on which types of data. Types in programming languages include integers, booleans, real numbers (i.e., floating point numbers), and others.
Transformation operations are defined as operators on these values. The plus sign, for example, is defined for adding two numbers together. String concatenation might also be denoted by the plus sign. Or it might be some other symbol. One of the jobs of a programming language implementation is to determine which operation is meant when, for instance, the plus sign is written in a program.
Determining whether the plus sign makes sense in the context of its operands (i.e., the values it is applied to) is part of this job. More generally, the programming language implementation is responsible for checking that the operations performed on its data types are defined, and the programming language is responsible for invoking the correct operation.
There are two different times that this type checking might occur. Some programming languages defer all type checking until the last possible second when the program is actually executing. When the next operation occurs, a programming language implementation may terminate a program and report that the next operation to be executed is not defined.
This is called a dynamically typed programming language. Python is a dynamically typed programming language: no earlier warning is given. When the code is executed, if you try to add two customers together, you find out that this has not been defined. Other programming languages report any operation that is not defined before the program begins execution. In this case the programming language is statically typed. A statically typed language goes through a step before execution where type checking is performed to see if the operations are defined for the given types of operands.
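A Python sketch of exactly this situation, using an invented Customer class for illustration: the program loads and starts without complaint, and the undefined operation is only reported when the + is actually executed.

```python
class Customer:
    def __init__(self, name):
        self.name = name

c1, c2 = Customer("Ada"), Customer("Alan")

try:
    total = c1 + c2          # + is not defined for Customer objects
except TypeError as err:
    message = str(err)
    print("caught:", message)  # reported only now, at run-time
```

A statically typed language would have rejected the same program before it ever ran.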
These languages are statically typed. An important facet of Standard ML is the strong type checking provided by the language. The type inference system, commonly called Hindley-Milner type inference, statically checks the types of all expressions and operations in the language. In addition, the type checking system is polymorphic, meaning that it handles types that may contain type variables. The polymorphic type checker is sound. It will never say a program is typed correctly when it is not.
Interestingly, the type checker has also been proven complete, which means that all correctly typed programs will indeed pass the type checker. No correctly typed program will be rejected by the type checker. We expect soundness out of type checkers but completeness is much harder to prove and it has been proven for Standard ML.
Typically there is more overhead to programming with a statically typed language. But this is not always the case. Standard ML infers the types of most values in the language without requiring the types of its values to be declared. Dynamically typed languages typically require less overhead in declaring values. The problem with dynamically typed languages comes from the lateness of determining if the operation is defined on the objects. If the operation is only determined to be valid right before it is executed, then every single line of a program must be tested to determine if the program is correct or not.
Dynamically typed languages only tell you that an operation is not defined if you actually try to execute that line of code. So, which is better, dynamically or statically typed languages? It depends on the complexity of the program you are writing and its size.
Static typing is certainly desirable if all other things are equal. But static typing typically does increase the work of a programmer up front. On the other hand, static typing is likely to decrease the amount of time you spend testing as evidenced by the Fox Project [10] at Carnegie Mellon.
There are many great resources on the web where you can get more information. However, be careful. While the web is a great source, you should always research your topic enough to independently verify the information you find there. While learning new languages and studying programming language implementation it becomes important to understand models of computation. A compiler translates a high-level programming language into a lower level computation.
These low-level computations are usually expressed in terms of machine language but not always. More important than the actual low-level language is the model of computation.
Some models are based on register machines. Some models are based on stack machines. Still other models may be based on something entirely different.
Chapters 3 and 4 explore stack-based virtual machines in much more detail. JCoCo is an interpreter of Python bytecode instructions.
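A toy stack machine in the same spirit can be written in a few lines of Python. The instruction names here are invented and the machine is far simpler than JCoCo, but it shows the core idea: operands live on a stack, and each instruction pops and pushes values.

```python
def run(program):
    """Execute a list of (opcode, ...) tuples on an operand stack."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":
            stack.append(instr[1])          # push a literal operand
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode: " + op)
    return stack.pop()                      # the result is left on top

# (2 + 3) * 4 expressed in stack code
result = run([("PUSH", 2), ("PUSH", 3), ("ADD",),
              ("PUSH", 4), ("MUL",)])
print(result)  # 20
```

Notice that the stack code needs no parentheses; the order of the instructions encodes the order of evaluation, just as it does in real bytecode.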
Chapter three introduces assembly language programming using JCoCo, providing some insight into how programming languages are implemented. Subsequent chapters in the book will again look at language implementation to better understand the languages you are learning, their strengths and weaknesses. While learning these languages you will also be implementing a compiler for a high level functional language called Small which is a robust subset of Standard ML.
This will give you even more insight into language implementation and knowledge of how to use these languages to solve problems. Finally, in the last two chapters of this text, you will learn about type checking and type inference using Prolog, a language that is well-suited to logic problems like type inference.
Learning how to use Prolog and implement a type checker is a great way to cap off a text on programming languages and language implementation. A great way to summarize the rest of this text is to see it moving from very prescriptive approaches to programming to very descriptive approaches to programming. The word prescriptive means that you dwell on details, thinking very carefully about the details of what you are writing.
For instance, in a prescriptive approach you might ask yourself, how do you set things up to invoke a particular type of instruction? In contrast, descriptive programming relies on programmers describing relationships between things.
Functional programming languages, to some extent, and logic programming languages employ this descriptive approach to programming. Read on to begin the journey from prescriptive to descriptive programming!
What are the three ways of thinking about programming, often called programming paradigms? Name at least one language for each of the three methods of programming described in the previous question. Name one person who had a great deal to do with the development of the imperative programming model. Name another who contributed to the functional model.
Finally, name a person who was responsible for the development of the logic model of programming. What are the primary characteristics of each of the imperative, functional, and logic models?
Name a language, besides Standard ML, that is a functional programming language. What other logic programming languages are there other than Prolog? You might have to get creative on this one. Why is compiling a program preferred over interpreting a program? Why is interpreting a program preferred over compiling a program? What benefits do virtual machine languages have over interpreted languages? What is a bytecode program? Name two languages that use bytecode in their implementation.
Why are types important in a programming language? What does it mean for a programming language to be dynamically typed? What does it mean for a programming language to be statically typed? You should only consult these answers after you have tried each of them for yourself first. Practice problems are meant to help reinforce the material you have just read so make use of them.
They later proved the two approaches were equivalent in their power to express computation. Both von Neumann and Turing contributed to the idea of a stored-program computer. Backus developed BNF notation, which was used in the development of Algol 60. The problems in mathematics were growing complex enough that many mathematicians were developing models and languages for expressing their algorithms.
This was one of the driving factors in the development of computers and Computer Science as a discipline. The run-time stack, global memory, and the heap are the three divisions of data memory. Data on the heap is created at run-time.
An activation record holds information like local variables, the program counter, the stack pointer, and other state information necessary for a function invocation. An activation record is created each time a function is called. An activation record is deleted when a function returns. In the imperative model the primary operation revolves around updating memory: the assignment statement. In the functional model the primary operation is function application.
The functional model emphasizes immutable data. However, some imperative languages have some immutable data as well. For instance, Java strings are immutable. You never write a program in Prolog. You write a database of rules in Prolog that tell the single Prolog program, a depth first search, how to proceed. The programmer provides a database of facts and predicates that tell Prolog about a problem. In Prolog the programmer describes the problem instead of programming the solution.
C was created by Dennis Ritchie. Standard ML was primarily designed by Robin Milner. Python was created by Guido van Rossum. Java was the work of the Green team and James Gosling. Standard ML and Prolog were both designed as languages for automated theorem proving first. Then they became general purpose programming languages later. Both Python and Prolog run on virtual machine implementations.
But, understanding just how to write in the new language takes looking at examples or reading documentation to learn its details. In other words, you need to know the mechanics of putting a program together in the new language.
Are the semicolons in the right places? Do you use begin and end keywords or curly braces? Learning how a program is put together is called learning the syntax of the language. Syntax refers to the words and symbols of a language and how to write the symbols down in some meaningful order. Semantics is the word that is used when deriving meaning from what is written. The semantics of a program refers to what the program will do when it is executed. Informally it is much easier to say what a program does than to describe the syntactic structure of the program.
However, syntax is a lot easier to formally describe than semantics. In either case, if you are learning a new language, you need to learn something about both the syntax and semantics of the language. Semantics describes how or whether such programs will execute. Some questions may concern semantic issues that can be determined statically, meaning before the program is run.
Other semantic issues may be dynamic issues, meaning they can only be determined at run-time. The difference between static semantic issues and syntactic issues is sometimes a difficult distinction to make. Consider an assignment statement like a = b + c and some of the questions that can be asked about it: (1) Do b and c have values? (2) Are the types of b and c compatible with the + operation? (3) Is a the right type to hold the result? (4) Does the assignment statement have the proper form? There are lots of questions that need to be answered about this assignment statement. Some questions could be answered sooner than others. The answers to questions 2 and 3 can be determined at compile-time and are called static semantic issues.
The answer to question 1 is a dynamic issue and is probably not determinable until run-time. In some circumstances, the answer to question 1 might also be a static semantic issue. Question 4 is definitely a syntactic issue. Unlike the dynamic semantic issues, the correct syntax of a program is statically determinable.
Said another way, determining a syntactically valid program can be accomplished without running the program. The syntax of a programming language is specified by a grammar. But before discussing grammars, the parts of a grammar must be defined.
A terminal or token is a symbol in the language. A syntactic category or nonterminal is a set of phrases, or strings of tokens, that will be defined in terms of the terminal and nonterminal symbols of the language. English is often used as a metalanguage for describing programming languages, but because of the ambiguities in English, more formal metalanguages have been developed.
The next section describes a formal metalanguage for describing programming language syntax. BNF is a formal metalanguage for describing language syntax. The word formal is used to indicate that BNF is unambiguous. Unlike English, the BNF language is not open to our own interpretations. There is only one way to read a BNF description. The notation takes its name from John Backus and Peter Naur, a computer magazine writer, who had just attended a conference on Algol.
As they returned from the trip it became apparent that they had very different views of what Algol would look like. As a result of this discussion, John Backus worked on a method for describing the grammar of a language. Peter Naur slightly modified it.
That is the entire language. Whether abbreviated or not, the meaning is the same. The BNF is much easier to understand and is not ambiguous like an English description. Parentheses may be used for grouping. Most interesting grammars are context-free, meaning that the contents of any syntactic category in a sentence are not dependent on the context in which it is used. Informally, a context-free grammar is a set of nonterminals and terminals.
For each nonterminal there are one or more productions with strings of zero or more nonterminals and terminals on the right hand side, as described in the BNF description.
There is one special nonterminal called the start symbol of the grammar. A sentence belongs to the language of a grammar if it can be derived from the grammar. This process is called constructing a derivation. A derivation is a sequence of sentential forms that starts with the start symbol of the grammar and ends with the sentence you are trying to derive. A sentential form is a string of terminals and nonterminals from the grammar. For a grammar G, the language of G is the set of sentences that can be derived from G and is usually written as L(G).
The derivation begins with the start symbol of the grammar and ends with the sentence. The underlined nonterminal in each sentential form is replaced by the right hand side of a production for that nonterminal.
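As an illustration, and not necessarily the exact grammar used in this book, a BNF grammar for infix expressions and a left-most derivation of the sentence 4 + a * b might look like this:

```
E ::= E + T | E - T | T
T ::= T * F | T / F | F
F ::= ( E ) | identifier | number

Left-most derivation of 4 + a * b:
E => E + T => T + T => F + T => 4 + T
  => 4 + T * F => 4 + F * F => 4 + a * F => 4 + a * b
```

Each step replaces the left-most nonterminal with the right hand side of one of its productions, ending with a sentential form that contains only terminals.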
You can check your answers in Section 2. There are typically many different derivations for a particular sentence of a grammar. However, there are two derivations that are of some interest to us in understanding programming languages. Another type of expression is called a prefix expression.
In prefix expressions the operator appears before the operands. The operator precedence is determined by the order of operations in the expression. Because of this, parentheses are not needed in prefix expressions. A sentence can also be represented in tree form; this kind of tree is called a parse tree.
A parse tree is another way of representing a sentence of a given language. A parse tree is constructed with the start symbol of the grammar at the root of the tree.
The children of each node in the tree must appear on the right hand side of a production with the parent on the left hand side of the same production. A program is syntactically valid if there is a parse tree for it using the given grammar.
While there are typically many different derivations of a sentence in a language, there is only one parse tree. This is true as long as the grammar is not ambiguous.
A grammar is ambiguous if and only if there is a sentence in the language of the grammar that has more than one parse tree. Consider the parse tree for the sentence derived in the previous section. Notice the similarities between the derivation and the parse tree. Practice 2: Try it the other way and see why! An abstract syntax tree is like a parse tree except that non-essential information is removed. For example, consider the parse tree from the previous figure.
The abstract syntax tree eliminates all the unnecessary information and leaves just what is essential for evaluating the expression. Abstract syntax trees, often abbreviated ASTs, are used by compilers while generating code and may be used by interpreters when running your program. Abstract syntax trees throw away superfluous information and retain only what is essential to allow a compiler to generate code or an interpreter to execute the program.
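An interpreter might represent an AST with node objects that know how to evaluate themselves. The class and method names below are an invented sketch, not any particular interpreter's implementation:

```python
class Num:
    """Leaf node: a literal number."""
    def __init__(self, value):
        self.value = value

    def eval(self):
        return self.value

class BinOp:
    """Interior node: an operator applied to two subtrees."""
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def eval(self):
        left, right = self.left.eval(), self.right.eval()
        if self.op == "+":
            return left + right
        if self.op == "*":
            return left * right
        raise ValueError("undefined operator: " + self.op)

# AST for 4 + 2 * 3: no parentheses or precedence nodes are needed,
# because the shape of the tree already encodes the evaluation order
tree = BinOp("+", Num(4), BinOp("*", Num(2), Num(3)))
print(tree.eval())  # 10
```

Everything the parse tree recorded about grouping and precedence survives here only as tree shape, which is exactly the "essential information" the text describes.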
A grammar, because it is a well-defined mathematical structure, can be used to construct a program called a parser. A language implementation, like a compiler or an interpreter, has a parser that reads the program from the source file. For a parser to do its job, it must be able to get the stream of tokens from the source file. Forming tokens from the individual characters of a source file is the job of another program often called a tokenizer, or scanner, or lexer.
Lex is the Latin word for word. The words of a program are its tokens. In programming language implementations a little liberty is taken with the definition of word: a word is any terminal or token of a language. It turns out that the tokens of a language can be described by another language called the language of regular expressions. In a regular expression, the dot operator means that one expression is followed by another. The grammar defines the precedence of these operators.
Kleene star has the highest precedence, followed by the dot operator, followed by the choice operator. At its most primitive level, a regular expression may be just a single character. Frequently, a choice between many different characters may be abbreviated with some sensible name. Usually these abbreviations are specified explicitly before the regular expression is given. For brevity's sake, assume that letter and digit have the usual definitions. Then, identifier tokens might be defined by the regular expression letter.(letter + digit)*.
Identifiers must be at least one character long, but can be as long as we wish them to be. Numbers are only non-negative integers in the infix expression language. Floating point numbers cannot be specified in the language as the tokens are currently defined.
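To make this concrete, the token classes above can be written as Python regular expressions. The exact pattern is my own sketch, not the book's, but it follows the same definitions: an identifier is a letter followed by letters or digits, a number is one or more digits, and the operators and parentheses are single characters.

```python
import re

# identifier: letter (letter | digit)*    number: digit digit*
token_pat = re.compile(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|[+\-*/()]")

tokens = token_pat.findall("4 + (a - b) * x")
print(tokens)  # ['4', '+', '(', 'a', '-', 'b', ')', '*', 'x']
```

The alternation in the pattern plays the role of the choice operator, concatenation plays the role of the dot operator, and * is the Kleene star.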
A finite state machine is often called a finite state automaton. The word automaton is just another word for machine. Every regular expression has at least one finite state machine that matches it, and vice versa; every finite state machine has at least one matching regular expression. In fact, there is an algorithm that, given any regular expression, can be used to construct a finite state machine for it.
Formally, a finite state automaton is defined by a set of states, an alphabet of characters, a transition function, a start state, and a set of final states F. A finite state machine has a current state, which initially is the start state. The machine starts in the start state and reads characters one at a time. As characters are read, the finite state machine changes state. Each state has transitions to other states based on the last character read. Each time the machine transitions to a new state, another character is read from the stream of characters.
After reading all the characters of a token, if the current state is in the set of final states, F, then the token is accepted by the finite state machine. Otherwise, it is rejected. Finite state machines are typically represented graphically by drawing the states, transitions, start state, and final states. States in a graphical representation are depicted as nodes in a graph. The start state has an arrow going into it with nothing at the back side of the arrow.
The transitions are represented as arrows going from one state to another and are labelled with the characters that trigger the given transition. Finally, final or accepting states are denoted with a double circle. The start state is 1. Each of states 2 through 9 are accepting states, denoted with a double circle.
State 2 accepts identifier tokens. State 3 accepts number tokens. States 4 to 9 accept operators and the parenthesis tokens. The finite state machine accepts one token at a time. For each new token, the finite state machine starts over in state 1. If, while reading a token, an unexpected character is read, then the stream of tokens is rejected by the finite state machine as invalid. Only valid strings of characters are accepted as tokens. Characters like spaces, tabs, and newline characters are not recognized by the finite state machine.
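A hand-coded sketch of such a machine, reduced to just the identifier and number states for brevity (the state numbering is illustrative, not the figure's exact machine):

```python
def accepts(token):
    """Return True if token is an identifier or a number.
    State 1 is the start state; states 2 (identifier) and
    3 (number) are the accepting (final) states."""
    state = 1
    for ch in token:
        if state == 1:
            if ch.isalpha():
                state = 2          # first letter: an identifier begins
            elif ch.isdigit():
                state = 3          # first digit: a number begins
            else:
                return False       # unexpected character: reject
        elif state == 2:
            if not ch.isalnum():
                return False       # identifiers: letters and digits only
        elif state == 3:
            if not ch.isdigit():
                return False       # numbers: digits only
    return state in (2, 3)         # accept only if we end in a final state

print(accepts("rate2"), accepts("42"), accepts("4x"))  # True True False
```

Note that "4x" is rejected: once the machine is in the number state, a letter has no transition, exactly as in the pictured machine.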
The finite state machine only responds with yes, the string of tokens is in the language accepted by the machine, or no, it is not. Writing a lexer by hand is largely the same process for all programming languages, so there are tools that have been written to do this for us. Typically these tools are called lexer generators. To use a lexer generator you must write regular expressions for the tokens of the language and provide these to the lexer generator.
A lexer generator will generate a lexer program that internally uses a finite state machine like the one pictured in Fig. The lex tool is an example of a lexical generator for the C language. If you are writing an interpreter or compiler using C as the implementation language, then you would use lex or a similar tool to generate your lexer.
The Linux alternative is called flex. Java, Python, Standard ML, and most programming languages have equivalent available lexer generators. Every time you compile a program or run a program in an interpreter the program is first parsed using a parser. A parser is a program that given a sentence, checks to see if the sentence is a member of the language of the given grammar. A parser usually does more than just answer yes or no. A parser frequently builds an abstract syntax tree representation of the source program.
There are two types of parsers that are commonly constructed. Top-down and bottom-up parsers check to see if a sentence belongs to a grammar by constructing a derivation for the sentence, using the grammar.
A parser either reports success, and possibly returns an abstract syntax tree, or reports failure, hopefully with a nice error message. The flow of data is pictured in Fig. Top-down parsers are sometimes called recursive descent parsers because they can be written as a set of mutually recursive functions. A top-down parser performs a left-most derivation of the sentence (i.e., the source program). A top-down parser operates by possibly looking at the next token in the source file and deciding what to do based on the token and where it is in the derivation.
To operate correctly, a top-down parser must be designed using a special kind of grammar called an LL(1) grammar. An LL(1) grammar is simply a grammar where the next choice in a left-most derivation can be deterministically chosen based on the current sentential form and the next token in the input. The first L refers to scanning the input from left to right. The second L signifies that while performing a left-most derivation, there is only 1 symbol of lookahead that is needed to make the decision about which production to choose next in the derivation.
The grammar for infix expressions is not LL(1). Consider deriving a sentence that begins with a number, say 5. We are only allowed to look at the 5 (i.e., one symbol of lookahead). Which production do we choose? Therefore the grammar is not LL(1). Just because this infix expression grammar is not LL(1) does not mean that infix expressions cannot be parsed using a top-down parser.
There are other infix expression grammars that are LL(1). In general, it is possible to transform any context-free grammar into an LL(1) grammar. It is possible, but the resulting grammar is not always easily understandable. The infix grammar given earlier contains rules that are left recursive. Left recursive rules are not allowed in LL(1) grammars. A left recursive rule can be eliminated in a grammar through a straightforward transformation of its production.
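The standard transformation can be sketched on a representative left recursive rule. This is the general technique, applied here to an illustrative production rather than the book's exact grammar:

```
left recursive (not LL(1)):    E  ::= E + T | T

after the transformation:      E  ::= T E'
                               E' ::= + T E' | <empty>
```

The recursion moves to the right hand end of the production, and an empty production lets E' end the repetition when no more + tokens follow.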
Common prefixes in the right hand side of two productions for the same nonterminal are also not allowed in an LL(1) grammar. Common prefixes can be eliminated by introducing a new nonterminal to the grammar, rewriting the affected productions so the common prefix appears once followed by the new nonterminal, and then defining productions for the new nonterminal that cover the differing remainders. An empty production is a production that does not consume any tokens. Empty productions are sometimes convenient in recursive rules.
Once common prefixes and left recursive rules are eliminated from a context-free grammar, the grammar will generally be LL(1). However, this transformation is not usually performed because there are more convenient ways to build a parser, even for non-LL(1) grammars. In fact, most grammars for programming languages are LALR(1). The LA stands for look ahead, with the 1 meaning just one symbol of look ahead.
The LR refers to scanning the input from left to right while constructing a right-most derivation. A bottom-up parser constructs a right-most derivation of a source program in reverse. So, an LALR(1) parser constructs a reverse right-most derivation of a program. Building a bottom-up parser is a somewhat complex task involving the computation of item sets, look ahead sets, a finite state machine, and a stack. The finite state machine and stack together are called a pushdown automaton.
The construction of the pushdown automaton and the look ahead sets are calculated from the grammar. Bottom-up parsers are not usually written by hand. Instead, a parser generator is a program that is given a grammar and builds a parser for the language of the grammar by constructing the pushdown automaton and lookahead sets needed to parse programs in that language.
The original parser generator for Unix was called yacc, which stood for yet another compiler compiler since it was a compiler for grammars that produced a parser for a language.
Since a parser is part of a compiler, yacc was a compiler compiler. The Linux version of yacc is called Bison. Hopefully you see the pun that was used in naming it Bison.
There are versions of yacc for other languages as well; ML-yacc, for example, is introduced and used in a later chapter. Parser generators like Bison produce what is called a bottom-up parser because the right-most derivation is constructed in reverse. In other words, the derivation is done from the bottom up. Usually, a bottom-up parser returns an AST representing a successfully parsed source program. As pictured in the accompanying figure, the parser generator is given a grammar and runs once to build the parser.
The generated parser runs each time a source program is parsed. A bottom-up parser parses a program by constructing a reverse right-most derivation of the source code.
As the reverse derivation proceeds, the parser shifts tokens from the input onto the stack of the pushdown automaton. Then at various points it reduces, deciding based on the look ahead sets that a reduction is necessary.
A right-most derivation for an example expression is constructed as follows. It can be useful to look at the stack of the pushdown automaton as it parses the expression, as pictured in the accompanying figure. In step A the parser is beginning. The dot to the left of the 5 indicates the parser has not yet processed any tokens of the source program and is looking at the 5.
The stack is empty. From step A to step B one token, the 5, is shifted onto the stack. From step B to C the parser looks at the multiplication operator and realizes that a reduction using rule 5 of the grammar must be performed. It is called a reduction because the production is employed in reverse order. The reduction pops the right hand side of rule 5 from the stack and replaces it with the nonterminal F. If you look at this derivation in reverse order, the first step is to replace the number 5 with F.
The rest of the steps of parsing the source program follow the right-most derivation either shifting tokens onto the stack or reducing using rules of the grammar. In step O the entire source has been parsed, the stack is empty, and the source program is accepted as a valid program. The actions taken while parsing include shifting and reducing. These are the two main actions of any bottom-up parser.
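The shift and reduce actions can be sketched in code. In the following hand-written sketch, the grammar is the usual expression grammar, but the rule numbering and the decision logic are my own approximation of what a generated LALR(1) table would encode; a real bottom-up parser drives these decisions from tables, not hand-written conditionals:

```python
# Hand-written shift-reduce parser sketch for the grammar:
#   1: E -> E + T    2: E -> T
#   3: T -> T * F    4: T -> F
#   5: F -> number
# Decisions are made from the stack top and one token of lookahead.

def shift_reduce_parse(tokens):
    tokens = tokens + ["$"]          # end-of-input marker
    stack, trace, i = [], [], 0
    while True:
        look = tokens[i]
        top = stack[-1] if stack else None
        if top is not None and top.isdigit():
            stack[-1] = "F"                       # reduce by rule 5
            trace.append("reduce 5: F -> number")
        elif top == "F":
            if len(stack) >= 3 and stack[-3:-1] == ["T", "*"]:
                stack[-3:] = ["T"]                # reduce by rule 3
                trace.append("reduce 3: T -> T * F")
            else:
                stack[-1] = "T"                   # reduce by rule 4
                trace.append("reduce 4: T -> F")
        elif top == "T" and look != "*":          # lookahead * means shift
            if len(stack) >= 3 and stack[-3:-1] == ["E", "+"]:
                stack[-3:] = ["E"]                # reduce by rule 1
                trace.append("reduce 1: E -> E + T")
            else:
                stack[-1] = "E"                   # reduce by rule 2
                trace.append("reduce 2: E -> T")
        elif stack == ["E"] and look == "$":
            trace.append("accept")                # whole program reduced to E
            return trace
        elif look == "$":
            raise SyntaxError("unexpected end of input")
        else:
            stack.append(look)                    # shift the lookahead token
            trace.append("shift " + look)
            i += 1
```

Running it on the token stream for 5 * 4 + 3 yields a trace of five shifts interleaved with reductions, ending with a reduction by E → E + T and then accept, mirroring the reverse right-most derivation described above.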
In fact, bottom-up parsers are often called shift-reduce parsers. Tracing a parse means recording, for each step, which of the two actions the parser takes: if it is a reduce operation, then what production is being reduced; if it is a shift operation, what token is being shifted onto the stack. Showing the stack after each shift and each reduce operation makes the trace easy to follow. In general, ambiguity in a grammar is a bad thing.
However, some ambiguity may be allowed by parser generators for LALR(1) languages. A classic example of ambiguity in languages arises from nested if-then-else statements.
The BNF for an if-then-else statement might look something like the grammar shown next. Because of this recursive definition, an else in nested if statements is dangling.
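A typical grammar of this shape (the nonterminal names here are illustrative, not necessarily the book's) is:

```
<statement> ::= if <expression> then <statement>
              | if <expression> then <statement> else <statement>
              | <other-statement>
```

Given the nested statement if a then if b then s1 else s2, this grammar produces two different parse trees: one attaches the else to the inner if, the other attaches it to the outer if.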
That is, it is unclear if it goes with the first or second if statement. When a bottom-up parser is generated using this grammar, the parser generator will detect that there is an ambiguity in the grammar.
The problem manifests itself as a conflict between a shift and a reduce operation. The first rule says when looking at an else keyword the parser should shift. The second rule says when the parser is looking at an else it should reduce. To resolve this conflict there is generally a way to specify whether the generated parser should shift or reduce. The default action is usually to shift and that is what makes the most sense in this case. By shifting, the else would go with the nearest if statement.
This is the normal behavior of parsers when encountering this if-then-else ambiguity. New application areas frequently cause new languages to be developed to make programming applications in that area more convenient. A number of languages were created because of the world wide web, and Ruby and Perl have become popular development languages for database and server side programming.
A recent trend in programming languages is to develop domain specific languages for particular embedded platforms. Each new language comes with a language reference, and many of these reference manuals give the grammar of the language using a variation of a context free grammar. All these syntax metalanguages share the same features as grammars.
They all have some way of defining parts of a program or syntactic categories, and they all have a means of defining a language through recursively defined productions. The definitions, concepts, and examples provided in this chapter will help you understand a language reference when the time comes to learn a new language. Not all syntactically valid strings of tokens should be regarded as valid programs, however. An expression that divides by zero is syntactically valid, but of course cannot be evaluated since division by zero is undefined.
This is a semantic issue: the meaning of the expression is undefined because division by zero is undefined, and semantics are not described by a syntactic definition. All that a grammar can ensure is that the program is syntactically valid.
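Python itself makes the distinction concrete (this example is mine, not from the text): compiling a program only checks it against the grammar, so a division by zero compiles without complaint, and the error appears only when the expression is evaluated.

```python
# Syntactically valid: compile() only checks the grammar.
code = compile("4 / 0", "<example>", "eval")

# Semantically undefined: evaluation fails at run time.
try:
    eval(code)
except ZeroDivisionError as err:
    print("runtime error:", err)   # runtime error: division by zero
```

The same split shows up in every language implementation: the parser accepts the program, and a later phase, whether static checks or run-time evaluation, rejects it.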
A BNF grammar defines a context-free language: the left-hand side of each rule contains only one syntactic category, and that category is replaced by one of its alternative definitions regardless of the context in which it occurs. The set of programs in any interesting language is not context-free.
Context-sensitive features may be formally described as a set of restrictions or context conditions. Context-sensitive issues deal mainly with declarations of identifiers and type compatibility. Sometimes, context-sensitive issues like these are said to be part of the static semantics of the language. While a grammar describes how tokens are put together to form a valid program, the grammar does not specify the semantics of the language, nor does it describe the static semantics or context-sensitive characteristics of the language.
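Context conditions like "declare before use" are typically checked by a separate pass over the AST rather than by the grammar itself. The following is my own simplified sketch of such a pass, using Python's ast module; it deliberately ignores scopes, imports, builtins, and function definitions:

```python
import ast

def check_declared_before_use(source):
    # Simplified static-semantics pass: walk the top-level statements
    # in order, record names as they are assigned, and flag any name
    # that is read before an assignment to it. Illustrative only.
    assigned, problems = set(), []
    tree = ast.parse(source)            # the grammar check happens here
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign):
            # Examine the right hand side before recording the targets,
            # so a statement like "a = a + 1" is correctly flagged.
            nodes = list(ast.walk(stmt.value))
            for target in stmt.targets:
                nodes.extend(ast.walk(target))
        else:
            nodes = list(ast.walk(stmt))
        for node in nodes:
            if isinstance(node, ast.Name):
                if isinstance(node.ctx, ast.Load) and node.id not in assigned:
                    problems.append(node.id)
                elif isinstance(node.ctx, ast.Store):
                    assigned.add(node.id)
    return problems

print(check_declared_before_use("x = 1\ny = x + 1\nz = w + 1"))  # prints ['w']
```

The program parses without error; it is only this second pass, working on the AST, that discovers w is used before it is given a value. Real compilers perform the same kind of check, along with type compatibility checks, after parsing.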
Other means are necessary to describe these language characteristics. Some methods, like type inference rules, are formally defined. Most semantic characteristics are defined informally in some kind of English language description.
Issues like declaration of identifiers and type compatibility are all context-sensitive issues. Reading and understanding syntactic descriptions is worthwhile since you will undoubtedly come across new languages in your career as a computer scientist.
There is certainly more that can be said about the topic of programming language syntax. Aho, Sethi, and Ullman [2] have written the widely recognized definitive book on compiler implementation which includes material on syntax definition and parser implementation. There are many other good compiler references as well. The Chomsky hierarchy of languages is also closely tied to grammars and regular expressions. Many books on Discrete Structures in Computer Science introduce this topic and a few good books explore the Chomsky hierarchy more deeply, including an excellent text by Peter Linz [13].
In the next chapter you put this knowledge of syntax definition to good use learning a new language: the JCoCo assembly language. JCoCo is a virtual machine for interpreting Python bytecode instructions. Learning assembly language helps in having a better understanding of how higher level languages work.
What does the word syntax refer to?
How does it differ from semantics?
What is a token?
What is a nonterminal?
What does BNF stand for? What is its purpose?
What kind of derivation does a top-down parser construct?
What is another name for a top-down parser?
What is the difference between lex and yacc?
What kind of machine is needed to implement a bottom-up parser?
What is a context-sensitive issue in a language? Give an example in Java.
What do the terms shift and reduce apply to?
Rewrite the BNF in Sect.
Given the grammar in Sect.
Describe how you might evaluate the abstract syntax tree of an expression to get a result. Write out, in English, an algorithm that describes how this might be done.
Write a regular expression to describe identifier tokens, which must start with a letter and then can be followed by any number of letters, digits, or underscores.
Draw a finite state machine that would accept identifier tokens as specified in the previous exercise.
Be sure to say what is shifted and which rule is being used to reduce at each step.
See the solution to practice problem 2. There are other derivations that would be correct as well. There is only one correct right-most derivation. There is only one parse tree for the expression given this grammar. Once that derivation is complete, you go through the derivation backwards.
The difference in each step of the derivation tells you whether you shift or reduce. Here is the result. Stack contents have the top on the right up to the dot. Everything after the dot has not been read yet. We shift when we must move through the tokens to get to the next place we are reducing. Each step in the reverse derivation provides the reduce operations.
Since there are seven tokens there should be seven shift operations. Shift: 6. Reduce by rule 5: F. Reduce by rule 4: T. Reduce by rule 2: E. Reduce by rule 1: E. Reduce by rule 6: F. Reduce by rule 3: T.

3 Assembly Language

Python is an object-oriented, interpreted language. Internally to the Python interpreter, a Python program is converted to bytecode and interpreted using a virtual machine. Most modern programming languages have support for high-level abstractions while the instructions of a virtual machine are closer to the machine language instructions supported by hardware architectures, making the interpretation of bytecode easier than interpretation of the original source program.
The advantage of virtual machine implementations results from dividing the mapping from high-level abstractions to low-level machine instructions into two parts: high-level abstractions to bytecode and bytecode to machine instructions.
While bytecode is a higher level abstraction than machine language, it is not greatly so. As programmers, if we understand how the underlying machine executes our programs, we better equip ourselves to make good choices about how we program. Just as importantly, having an understanding of how programs are executed can help us diagnose problems when things go wrong.
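The standard library's dis module makes this tangible: disassembling a small function shows the bytecode instructions the Python virtual machine actually interprets (the exact opcode names vary between Python versions).

```python
import dis

def add_one(x):
    return x + 1

# Print a human-readable listing of the function's bytecode.
dis.dis(add_one)

# The same instructions are also available programmatically.
ops = [instr.opname for instr in dis.Bytecode(add_one)]
```

Each instruction, such as loading the argument x or performing the addition, is far closer to a hardware instruction than the source line return x + 1, which is exactly why interpreting bytecode is simpler than interpreting source text.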
This chapter introduces assembly language programming in the bytecode language of the Python virtual machine.