Nolife Compiler (Part 1):
A Context-Sensitive Analyzer
CS4131
Spring 2005


Due Date: Friday, March 18, 2005 at 5pm

Purpose

This project is intended to give you experience building a context-sensitive analyzer. You will use the flex scanner generator and the bison parser generator. You will gain experience manipulating a programming language grammar, building an abstract syntax tree, and performing context-sensitive analysis. Read this document and the accompanying Nolife documents completely before embarking on this adventure.

Project Summary

Your task is to construct a context-sensitive analyzer that accepts the Nolife programming language. The parser obtains its input by calling a lexical analyzer (scanner) one token at a time, and it constructs an abstract syntax tree (AST) that represents the program structure. After the AST is constructed, a second pass over the AST performs type checking. The output of your program will be either a listing of the AST for the input program or a list of error messages.

The Scanner and Parser

You are to construct a scanner using the flex scanner generator system and a parser using the bison parser generator system with the grammar provided. You should proceed in four distinct steps.

  1. Read all of the documentation on flex and bison provided on the CS4131 web page.

  2. Locate the code for the scanner in /classes/cs4131/common/nolife.l. Feed the description to flex, compile the results, and familiarize yourself with its driver and actions. You should write a driver and test the scanner before going further.

  3. The grammar provided in the Nolife language definition has been converted to a form suitable for bison and can be found in /classes/cs4131/common/nolife.y. Make sure that you use the -d option when running bison on its input file nolife.y. This creates the file nolife.tab.h, which is included by both the scanner and the parser.

  4. Extend the simple parser from step 3 by adding actions to be performed on the various reductions. The code in these actions will construct the abstract syntax tree. You will want to use bison's pseudo-variables $$, $1, $2, ..., $i to communicate values between the different reductions.

    The form of the abstract syntax tree is specified in a document titled "An Abstract Syntax Tree for Nolife". That document also specifies the format for listing the tree.
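To make step 4 concrete, the following C sketch shows the kind of constructor functions a bison action might call. It is only an illustration: the node kinds, the field names, and the functions mk_int, mk_var, mk_binop, and dump are hypothetical, the prefix listing printed by dump is not the format required by "An Abstract Syntax Tree for Nolife", and the grammar fragment in the comment uses made-up token names rather than the ones in nolife.y.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical node kinds; the real set is defined in
   "An Abstract Syntax Tree for Nolife". */
typedef enum { N_INTCONST, N_VAR, N_BINOP } NodeKind;

typedef struct Node {
    NodeKind kind;
    int op;                    /* operator character for N_BINOP, e.g. '+' */
    int ival;                  /* value for N_INTCONST */
    const char *name;          /* identifier for N_VAR (must outlive the node) */
    struct Node *left, *right; /* operands for N_BINOP */
} Node;

/* Allocates a zero-filled node of the given kind, exiting on failure. */
static Node *new_node(NodeKind kind) {
    Node *n = calloc(1, sizeof *n);
    if (n == NULL) {
        perror("calloc");
        exit(EXIT_FAILURE);
    }
    n->kind = kind;
    return n;
}

Node *mk_int(int v)          { Node *n = new_node(N_INTCONST); n->ival = v;  return n; }
Node *mk_var(const char *id) { Node *n = new_node(N_VAR);      n->name = id; return n; }

Node *mk_binop(int op, Node *l, Node *r) {
    Node *n = new_node(N_BINOP);
    n->op = op;
    n->left = l;
    n->right = r;
    return n;
}

/* In nolife.y the actions might then read (hypothetical token names):
 *   %union { int ival; char *id; struct Node *node; }
 *   %token <ival> ICONST
 *   %token <id>   ID
 *   %type  <node> expr
 *   %left '+'
 *   expr : expr '+' expr  { $$ = mk_binop('+', $1, $3); }
 *        | ICONST         { $$ = mk_int($1); }
 *        | ID             { $$ = mk_var($1); }
 */

/* Prints the tree in prefix form, e.g. "(+ x (* 2 3))". */
void dump(const Node *n, FILE *out) {
    switch (n->kind) {
    case N_INTCONST: fprintf(out, "%d", n->ival); break;
    case N_VAR:      fprintf(out, "%s", n->name); break;
    case N_BINOP:
        fprintf(out, "(%c ", n->op);
        dump(n->left, out);
        fputc(' ', out);
        dump(n->right, out);
        fputc(')', out);
        break;
    }
}
```

Because mk_var stores the pointer it is given, the scanner should pass the parser a copy of yytext rather than yytext itself, since flex reuses that buffer for later tokens.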

Context-Sensitive Analysis

Compiling a Nolife program requires a large amount of knowledge that cannot be obtained from a context-free parse. This suggests a compiler design that includes a separate context-sensitive analyzer. The analyzer should perform the following tasks:

  1. assigning a type to each expression and subexpression, and
  2. finding any context-sensitive errors in the program.

These tasks can be performed in two separate passes over the tree, or they can be combined into a single tree-walking pass. Since each pass requires an accurate block-structured symbol table, a single-pass implementation avoids rebuilding that table and will be more efficient.
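One way to organize a block-structured symbol table is as a stack of scopes: opening the main program or a subprogram pushes a scope, closing it pops one, and lookups search from the innermost scope outward. The C sketch below assumes that design; the types and the functions scope_enter, scope_exit, declare, and lookup are hypothetical names, and the message wording is arbitrary. It shows how three of the required checks (undeclared references, duplicate declarations in one scope, and unreferenced declarations) fall out of the table operations.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef enum { T_INT, T_FLOAT, T_CHAR } Type;

typedef struct Symbol {
    char *name;
    Type type;
    int referenced;       /* set by lookup(); checked when the scope is left */
    struct Symbol *next;  /* next symbol declared in the same scope */
} Symbol;

typedef struct Scope {
    Symbol *symbols;      /* names declared in this scope */
    struct Scope *outer;  /* enclosing scope; NULL for the global scope */
} Scope;

static Scope *current = NULL;   /* innermost open scope */

static void *xcalloc(size_t size) {
    void *p = calloc(1, size);
    if (p == NULL) { perror("calloc"); exit(EXIT_FAILURE); }
    return p;
}

void scope_enter(void) {
    Scope *s = xcalloc(sizeof *s);
    s->outer = current;
    current = s;
}

/* Reports each name in the innermost scope that was never referenced,
   frees the scope, and returns the number of such names. */
int scope_exit(void) {
    Scope *s = current;
    int unused = 0;
    Symbol *sym = s->symbols;
    while (sym != NULL) {
        Symbol *next = sym->next;
        if (!sym->referenced) {
            fprintf(stderr, "error: '%s' is declared but never referenced\n", sym->name);
            unused++;
        }
        free(sym->name);
        free(sym);
        sym = next;
    }
    current = s->outer;
    free(s);
    return unused;
}

/* Returns -1 if name is already declared in the innermost scope, else 0. */
int declare(const char *name, Type type) {
    for (Symbol *s = current->symbols; s != NULL; s = s->next)
        if (strcmp(s->name, name) == 0) {
            fprintf(stderr, "error: '%s' is declared multiple times in one scope\n", name);
            return -1;
        }
    Symbol *sym = xcalloc(sizeof *sym);
    sym->name = xcalloc(strlen(name) + 1);
    strcpy(sym->name, name);
    sym->type = type;
    sym->next = current->symbols;
    current->symbols = sym;
    return 0;
}

/* Searches the innermost scope first, so a local declaration hides an
   outer one; returns NULL if the name is not declared anywhere. */
Symbol *lookup(const char *name) {
    for (Scope *sc = current; sc != NULL; sc = sc->outer)
        for (Symbol *s = sc->symbols; s != NULL; s = s->next)
            if (strcmp(s->name, name) == 0) {
                s->referenced = 1;
                return s;
            }
    fprintf(stderr, "error: '%s' is referenced but not declared\n", name);
    return NULL;
}
```

A linked list per scope keeps the sketch short; a hash table is a natural replacement if lookups become a bottleneck.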

The next two subsections describe these tasks in more detail.

Mixed Type Expressions

Figure 1: Conversion tables for mixed mode expressions

Nolife supports three basic data types: integer, floating point, and character. Each expression and subexpression has a type that can be determined at compile time. Your lab should determine (1) the type of each subexpression, (2) where coercions must be inserted, and (3) where invalid type combinations exist.

Figure 1 gives type conversion tables for several of the Nolife operators.

  1. The type of a subscripted name is wholly determined by the array's type declaration; it is independent of the type of the subscript expression.
  2. Similarly, the type of a function call is determined by the function's definition rather than by the types of any actual parameters at the call site.
  3. Assignment uses an idiosyncratic and asymmetric rule. For a left hand side of type integer or float, the right hand side is converted to the type of the left hand side. If the left hand side is of type character, the right hand side must have type character.
  4. The result of a NOT always has type integer. The operand of NOT must be of type integer.
  5. Relational operators always produce an integer. Comparisons between characters and numbers make no sense; they are illegal. Comparisons between integers and floats produce integer results. To perform the comparison, the integer is converted to a float.

Your lab will need to insert the appropriate conversions and report any expressions that would require an illegal coercion.
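The rules for NOT, relational operators, and assignment translate directly into small checking functions that return the resulting type and record where an int-to-float conversion node must be inserted. This is a C sketch under stated assumptions: the Type enum, its T_ERROR member (a marker that suppresses cascading messages), the Check struct, and the function names are hypothetical, and the arithmetic operators are omitted because their rules come from the tables in Figure 1. The assignment case follows rule 3 literally, which is flagged as an assumption in the code.

```c
/* Hypothetical type codes; T_ERROR marks an already-reported error so that
   one mistake does not cascade into many messages. */
typedef enum { T_INT, T_FLOAT, T_CHAR, T_ERROR } Type;

/* Outcome of checking a binary operator. */
typedef struct {
    Type result;       /* type of the whole expression */
    int coerce_left;   /* 1: insert an int-to-float conversion above the left operand */
    int coerce_right;  /* 1: same for the right operand */
} Check;

/* Rule 5: relational operators always produce integer; comparing a character
   with a number is illegal (two characters may be compared); in an int/float
   comparison the int operand is converted to float. */
Check check_relational(Type l, Type r) {
    Check c = { T_INT, 0, 0 };
    if (l == T_ERROR || r == T_ERROR || (l == T_CHAR) != (r == T_CHAR)) {
        c.result = T_ERROR;
        return c;
    }
    c.coerce_left  = (l == T_INT && r == T_FLOAT);
    c.coerce_right = (l == T_FLOAT && r == T_INT);
    return c;
}

/* Rule 4: NOT needs an integer operand and always yields integer. */
Type check_not(Type operand) {
    return operand == T_INT ? T_INT : T_ERROR;
}

/* Rule 3: a char target needs a char source; for an int or float target the
   source is converted to the target type. ASSUMPTION: this follows rule 3
   literally and also converts a char source; the tables in Figure 1 decide
   whether that case is legal. *convert is set when a conversion node is needed. */
Type check_assign(Type lhs, Type rhs, int *convert) {
    *convert = 0;
    if (lhs == T_ERROR || rhs == T_ERROR) return T_ERROR;
    if (lhs == T_CHAR) return rhs == T_CHAR ? T_CHAR : T_ERROR;
    *convert = (rhs != lhs);
    return lhs;
}
```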

Context-Sensitive Errors

Your lab should detect the following context-sensitive errors and report them to the user on stderr.

  1. a variable is referenced but not declared
  2. a variable is declared multiple times in a single scope
  3. a variable is declared, but never referenced
  4. any type mismatch (illegal mixed type expression)
  5. any mismatch between actual parameters at a call site and the definition of the called procedure
  6. a constant-valued subscript that is outside the declared bounds of an array
  7. incorrect number of dimensions in a variable reference
  8. a procedure call that invokes a function (incorrectly discarding the return value)
  9. a function call that invokes a procedure (incorrectly using a non-existent return value)
  10. a return statement whose type does not match the function definition
  11. a return statement in the main program
This list is not exhaustive. Extra credit will be given for any additional errors detected. Submit with your lab a test file that exhibits each additional error and a README file explaining the additional errors you catch. Points will be awarded based on a subjective evaluation of their significance.
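Errors 5, 8, and 9 can all be checked at a call site once the called name's definition has been found in the symbol table. In the hedged C sketch below, the Signature record, the CallContext flag (whether the call was parsed as a statement or inside an expression), and check_call are hypothetical, and the code assumes that each actual parameter must have exactly the type of the corresponding formal parameter; if Nolife permits int-to-float conversion of arguments, that comparison should be relaxed accordingly.

```c
#include <stdio.h>

typedef enum { T_INT, T_FLOAT, T_CHAR } Type;

typedef struct {
    const char *name;
    int is_function;      /* 1 = function (returns a value), 0 = procedure */
    Type return_type;     /* meaningful only when is_function is 1 */
    int nparams;
    const Type *params;   /* formal parameter types, in order */
} Signature;

/* Was the call written as a statement or inside an expression? */
typedef enum { AS_STATEMENT, AS_EXPRESSION } CallContext;

/* Reports each problem on stderr and returns the number of errors found. */
int check_call(const Signature *sig, CallContext ctx,
               const Type *actuals, int nactuals, int line) {
    int errors = 0;
    if (ctx == AS_STATEMENT && sig->is_function) {
        fprintf(stderr, "line %d: procedure call to function '%s' discards its value\n",
                line, sig->name);
        errors++;
    }
    if (ctx == AS_EXPRESSION && !sig->is_function) {
        fprintf(stderr, "line %d: procedure '%s' used where a value is required\n",
                line, sig->name);
        errors++;
    }
    if (nactuals != sig->nparams) {
        fprintf(stderr, "line %d: '%s' expects %d argument(s), got %d\n",
                line, sig->name, sig->nparams, nactuals);
        return errors + 1;   /* positional type comparison is meaningless now */
    }
    for (int i = 0; i < nactuals; i++)
        if (actuals[i] != sig->params[i]) {   /* ASSUMPTION: exact match required */
            fprintf(stderr, "line %d: argument %d of '%s' has the wrong type\n",
                    line, i + 1, sig->name);
            errors++;
        }
    return errors;
}
```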

Nolife Specification

Figure 2: Operator precedences in Nolife

The syntax of Nolife is specified in a document titled "Nolife: A Language For Practice Implementation". The grammar given in that document will need to be massaged to create one that is acceptable for input to bison. The following additional information may be of use.

  1. All the operators in Nolife should be left associative.
  2. Operator precedences in Nolife are specified in Figure 2. Multiplication has the highest priority; AND and OR have the lowest.
  3. The scope of a name is the region of the program in which the name can be used. Variables declared in a subprogram are only visible within that subprogram. They obscure an identically named variable in the surrounding scope. The scopes of distinct subprograms are disjoint: a name declared in one subprogram is not visible inside another subprogram.

    The scope defined by the main program is called the global scope. Names declared in the global scope are accessible from the point of declaration to the end of the program, including the body of the main program and any subprograms that do not redeclare the same name. All variables declared in the main program are in the global scope.

  4. Function names are in the global scope. This has two important implications. First, a function can be called from any other function, provided that the calling function has not redefined that name. (This is true even if the function being called appears later in the source text.) Second, any function can be called recursively.
  5. The lifetime of a variable is limited to the execution of the procedure (main program or function) that defines the scope in which it is declared. Thus, the value of a variable x declared in function f ceases to exist after the function returns.
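The scope rules above can be checked with a two-step name resolution: first, enter every function name into the global scope before any function body is examined, which is what lets a call refer to a function defined later in the source text (or to itself); then resolve each name in the current subprogram's own scope before falling back to the global scope. The C sketch below uses flat name lists; NameList, add, resolve, and the Resolution codes are hypothetical. Global variables, unlike function names, should be added as their declarations are reached, since they are visible only from the point of declaration.

```c
#include <string.h>

#define MAX_NAMES 32

/* A flat list of the names declared in one scope. */
typedef struct {
    const char *names[MAX_NAMES];
    int count;
} NameList;

static int contains(const NameList *l, const char *name) {
    for (int i = 0; i < l->count; i++)
        if (strcmp(l->names[i], name) == 0) return 1;
    return 0;
}

/* Silently ignores overflow to keep the sketch short. */
static void add(NameList *l, const char *name) {
    if (l->count < MAX_NAMES) l->names[l->count++] = name;
}

typedef enum { R_LOCAL, R_GLOBAL, R_UNDECLARED } Resolution;

/* Looks in the current subprogram's scope first (a local declaration
   obscures a global one), then in the global scope. Other subprograms'
   locals are never searched because subprogram scopes are disjoint.
   Pass locals == NULL when resolving inside the main program body.
   If a subprogram redeclares a function's name, R_LOCAL is returned, so a
   call through that name should then be reported as an error. */
Resolution resolve(const NameList *globals, const NameList *locals, const char *name) {
    if (locals != NULL && contains(locals, name)) return R_LOCAL;
    if (contains(globals, name)) return R_GLOBAL;
    return R_UNDECLARED;
}
```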

Requirements

Write all of your code in C or C++. It will be tested on wopr and MUST work there. You will receive no special consideration for programs which "work" elsewhere, but not on wopr.

Your code should be well documented. You will need to submit a brief report describing what works, the main difficulties you encountered, and any other information that you believe will help in grading your work. You will submit all of your files, INCLUDING a makefile to build your compiler. I will just type "make" and expect everything to "come out right." (Come out right in this context means I can type "nc < foo.nl" and I will get the AST of foo.nl dumped to stdout.)

Your program will be graded on whether it produces a correct AST for all correct programs and whether it catches the required context-sensitive errors. Extra credit will be given for mnemonic parse error messages. If you modify the AST definition given in the document "An Abstract Syntax Tree for Nolife", you must document your changes and submit that document in a file called "ASTChanges".


Soner Onder