What do you mean by input buffering? How is input buffering implemented? What problem can arise implementing input buffering? Give suitable example. What is sentinel? What is its use?
The Lexical Analyzer scans the characters of the source program one at a time to discover tokens. Because of large amount of time can be consumed scanning characters, specialized buffering techniques have been developed to reduce the amount of overhead required to process an input character.
Buffering techniques:
Often, however, many characters beyond the next token many have to be examined before the next token itself can be determined. For this and other reasons, it is desirable for the lexical analyzer to read its input from an input buffer. Hence we introduce a two-buffer scheme that handles large lookaheads safely. We then consider an improvement involving "sentinels" that saves time checking for the ends of buffers.
Buffer Pairs:
Each buffer is of the same size N, and N is usually the size of a disk block, e.g., 4096 bytes.
if forward at end of first half then begin reload second half; forward := forward + 1 end else if forward at end of second half then begin reload second half; move forward to beginning of first half end else forward := forward + 1; end
Code to advance forward pointer
Problem implementing input buffering:
Advancing forward requires that we first test whether we have reached the end of one of the buffers, and if so, we must reload the other buffer from the input, and move forward to the beginning of the newly loaded buffer. As long as we never need to look so far ahead of the actual lexeme that the sum of the lexeme's length plus the distance we look ahead is greater than N, we shall never overwrite the lexeme in its buffer before determining it.
For example, in a PL/I program we may see: DECALRE (ARG1, ARG2… ARG n) without knowing whether DECLARE is a keyword or an array name until we see the character that follows the right parenthesis. In either case, the token itself ends at the second E. If the lookahead pointer travels beyond the buffer half in which it began, the other half must be loaded with the next characters from the source file. Since the buffer shown in above figure is of limited size there is an implied constraint on how much look ahead can be used before the next token is discovered. In the above example, if the look ahead traveled to the left half and all the way through the left half to the middle, we could not reload the right half, because we would lose characters that had not yet been grouped into tokens. While we can make the buffer larger if we chose or use another buffering scheme, we cannot ignore the fact that overhead is limited.
Sentinels:
forward : = forward + 1; if forward ↑ = eof then begin if forward at end of first half then begin reload second half; forward := forward + 1 end else if forward at end of second half then begin reload first half; move forward to beginning of first half end else /* eof within a buffer signifying end of input */ terminate lexical analysis end
Lookahead code with sentinels
In the previous scheme, we must check each time the move forward pointer that has not moved off one half of the buffer. If it is done, then must reload the other half.
– Therefore the ends of the buffer halves require two tests for each advance of the forward pointer.
– This can reduce the two tests to one if it is extend each buffer half to hold a sentinel character at the end.
Write short notes on:
Draw the syntax tree and DAG from the following expression.
if x>0 then x=3*(y+1) else y=y+1
Write a C program to print all prime no.(1-100).
Discuss Software Maintenance