Assignment 1
Lexer and Recognizer
Author: Dr. Nguyen Hua Phung
Contents
1 Specification
1.1 Phase 1: Lexer
1.2 Phase 2: Recognizer
2 Requirements
3 Change Log
Assignment 1 version 1.0
After completing this assignment, you will be able to:
• formally define the lexicon of a programming language.
• use ANTLR to implement a lexer for a programming language.
• formally define the grammar of a programming language.
• use ANTLR to implement a recognizer for a programming language.
1 Specification
In this assignment, you are required to write a lexer and a recognizer for programs written in BKIT. To complete this assignment, you need to:
• Install Python 3 if you have not installed it yet.
• Download initial.zip and unzip it.
• Download antlr-4.8-complete.jar from https://www.antlr.org/download.html and set the environment variable ANTLR_JAR to the path of this file; then install antlr4-python3-runtime, version 4.8 so that it matches the jar (see the Python Targets section of the same page).
• Remove all files in folders initial/src/main/bkit/utils, initial/src/main/bkit/astgen, initial/src/main/bkit/checker if any.
• Test the initial code with the following three commands:
  python run.py gen
  python run.py test LexerSuite
  python run.py test ParserSuite
• Rename the folder initial to assignment1.
To complete this assignment, you need to:
• Read the specification of the BKIT language carefully.
• Modify BKIT.g4 in the initial code to formally describe the BKIT language. Please fill in your ID in the header of this file.
• Add more tests to LexerSuite and ParserSuite in the initial code.
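The environment steps and the three test commands above can be bundled into a small Python helper. This is a sketch under assumptions, not part of the initial code: the names COMMANDS, check_setup and run_commands are hypothetical, the helper is assumed to be run from the folder that contains run.py, and it only notices commands that exit with a non-zero status (whether run.py also reports failing tests through its exit status is not stated in this description).

```python
import importlib.util
import os
import subprocess
import sys
from typing import List, Mapping

# Hypothetical helper, not part of initial.zip.
# The three commands from the setup steps, without the leading "python".
COMMANDS: List[List[str]] = [
    ["run.py", "gen"],
    ["run.py", "test", "LexerSuite"],
    ["run.py", "test", "ParserSuite"],
]


def check_setup(env: Mapping[str, str]) -> List[str]:
    """Return human-readable setup problems; an empty list means none were found."""
    problems: List[str] = []
    jar = env.get("ANTLR_JAR")
    if not jar:
        problems.append("ANTLR_JAR is not set")
    elif not os.path.isfile(jar):
        problems.append(f"ANTLR_JAR points to a missing file: {jar}")
    # antlr4-python3-runtime is imported as the top-level package "antlr4".
    if importlib.util.find_spec("antlr4") is None:
        problems.append("antlr4-python3-runtime is not installed")
    return problems


def run_commands(workdir: str = ".") -> bool:
    """Run the commands in order with the current interpreter.

    Stops and returns False at the first command that exits with a
    non-zero status; returns True if every command exits with status 0.
    """
    for args in COMMANDS:
        result = subprocess.run([sys.executable, *args], cwd=workdir)
        if result.returncode != 0:
            print("command failed: python", " ".join(args))
            return False
    return True
```

For example, call print(check_setup(os.environ)) first and, if it prints an empty list, call run_commands() from inside the assignment folder.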
This assignment is divided into two phases: the lexer phase and the recognizer phase. The two phases are assessed independently.
1.1 Phase 1: Lexer
In this phase, you are required to write, in ANTLR, a lexer for programs written in BKIT. To complete this phase, you need to:
• Modify BKIT.g4 to detect tokens in BKIT language.
• Write 100 test cases in LexerSuite to test your code.
• For lexical errors, return the following tokens together with the specified lexemes:
– ERROR_CHAR with the <unrecognized char> lexeme: when the lexer detects an unrecognized character.
– UNCLOSE_STRING with the <unclosed string> lexeme: when the lexer detects an unterminated string. The <unclosed string> lexeme does not include the opening quote.
– ILLEGAL_ESCAPE with the <wrong string> lexeme: when the lexer detects an illegal escape in a string. The <wrong string> lexeme runs from the beginning of the string (without the opening quote) to the illegal escape.
– UNTERMINATED_COMMENT without any lexeme: when the lexer detects an unterminated comment.
• You can assume that there is only one error in each test case.
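The lexeme rules for UNCLOSE_STRING and ILLEGAL_ESCAPE are easy to get wrong by one character. The Python sketch below only illustrates where each lexeme starts and ends; it is not ANTLR code and not the BKIT specification. It assumes, hypothetically, that strings are delimited by double quotes, that the only legal escapes are \b, \f, \r, \n, \t, \' and \\, that a string still open at a line break or at end of input is unclosed, and that the illegal-escape lexeme includes the backslash and the character after it. STRING is a hypothetical name for a well-formed literal. Check each assumption against the BKIT language specification.

```python
from typing import Tuple

# Hypothetical escape set; check it against the BKIT specification.
LEGAL_ESCAPES = set("bfrnt'\\")


def classify_string(src: str) -> Tuple[str, str]:
    """Classify the string literal at the start of src.

    Returns (token_name, lexeme); lexemes never include the opening quote.
    """
    if not src.startswith('"'):
        raise ValueError("src must start with an opening double quote")
    # Assumption: a string literal may not span a line break.
    end = src.find("\n", 1)
    if end == -1:
        end = len(src)
    i = 1
    while i < end:
        ch = src[i]
        if ch == '"':
            # Closed string: in this sketch the lexeme excludes both quotes.
            return ("STRING", src[1:i])
        if ch == "\\":
            if i + 1 < end and src[i + 1] in LEGAL_ESCAPES:
                i += 2  # skip the whole legal escape sequence
                continue
            # From after the opening quote up to and including the bad escape,
            # clipped so that a trailing line break is never part of the lexeme.
            return ("ILLEGAL_ESCAPE", src[1:min(i + 2, end)])
        i += 1
    # Reached a line break or end of input without a closing quote.
    return ("UNCLOSE_STRING", src[1:end])
```

For example, classify_string('"abc\\h def"') returns ("ILLEGAL_ESCAPE", "abc\\h"), that is, the characters abc followed by the offending \h, and classify_string('"abc') returns ("UNCLOSE_STRING", "abc").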
1.2 Phase 2: Recognizer
In this phase, you are required to write a recognizer for programs written in BKIT. To complete this phase, you need to:
• Modify BKIT.g4.
• Write 100 test cases in ParserSuite to test your code.
• You can assume that there is at most one error in each test case.
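In this assignment the recognizer is generated by ANTLR from the parser rules in BKIT.g4. To show what a recognizer does, namely accept the input or report where it first fails (which is enough when a test case has at most one error), here is a hand-written top-down recognizer for a hypothetical one-rule grammar, decl : 'Var' ':' ID (',' ID)* ';'. The grammar, the tokenizer and the function names are illustrative only and are not the BKIT grammar.

```python
import re
from typing import Callable, List, Optional

# Hypothetical toy grammar (NOT the BKIT grammar):
#   decl : 'Var' ':' ID (',' ID)* ';'
TOKEN_RE = re.compile(r"\s*([A-Za-z_]\w*|\S)")
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def tokenize(src: str) -> List[str]:
    """Split src into words (keywords or identifiers) and single-character symbols."""
    return TOKEN_RE.findall(src)


def is_id(tok: str) -> bool:
    # 'Var' is a keyword in this toy grammar, so it is not an identifier.
    return tok != "Var" and IDENT_RE.fullmatch(tok) is not None


def recognize(src: str) -> Optional[str]:
    """Return None if src is a valid decl, else the first offending token."""
    toks = tokenize(src) + ["<EOF>"]
    pos = 0

    def match(ok: Callable[[str], bool]) -> bool:
        # Consume the current token only if it satisfies the predicate.
        nonlocal pos
        if ok(toks[pos]):
            pos += 1
            return True
        return False

    if not match(lambda t: t == "Var"):
        return toks[pos]
    if not match(lambda t: t == ":"):
        return toks[pos]
    if not match(is_id):
        return toks[pos]
    while match(lambda t: t == ","):
        if not match(is_id):
            return toks[pos]
    if not match(lambda t: t == ";"):
        return toks[pos]
    # The whole input must be consumed.
    return None if toks[pos] == "<EOF>" else toks[pos]
```

For example, recognize("Var: a, b;") returns None, while recognize("Var: a b;") returns "b", the token where the input first stops matching the rule.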
2 Requirements
Note that you must NOT compress your files when submitting them. You MUST submit the three files BKIT.g4, LexerSuite.py and ParserSuite.py on BKeL.
3 Change Log



