Lexical Analyzer
Software Engineering
Overview
This document details the implementation of a basic lexical analyzer for a simple custom programming language. The analyzer is written in Python and breaks source code into a stream of recognizable tokens.
Custom Language Definition
The custom programming language handled by this lexical analyzer includes:
- Numeric Literals: Both integers and floating-point numbers.
- Identifiers: Variable names that are alphanumeric strings starting with a letter.
- Arithmetic Operators: +, -, *, /.
- Assignment Operator: The = symbol.
- String Literals: Text enclosed in double quotes.
- Comments: Lines starting with #.
- Parentheses: For grouping expressions.
- Control Structures: Basic if statements.
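To make these elements concrete, here is a small sample program in the custom language. The snippet is illustrative; in particular, the exact syntax of the if statement is an assumption, since the definition above only states that basic if statements are supported.

# compute the scaled value
scale = 2.5
label = "result"
if (scale) total = (scale + 1) * 4 / 2 - 3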
Lexical Analyzer Implementation
Token Specification
Each token is defined with a regular expression pattern:
- NUMBER: Matches integers and floating-point numbers (\d+(\.\d+)?).
- ASSIGN: Matches the assignment operator (=).
- OPERATOR: Matches arithmetic operators ([+\-*/]).
- STRING: Matches string literals ("[^"]*").
- IF: Matches the if keyword (\bif\b).
- IDENTIFIER: Matches identifiers ([A-Za-z]\w*).
- COMMENT: Matches comments (#.*).
- PAREN: Matches parentheses ([\(\)]).
- SKIP: Ignores spaces and tabs ([ \t]).
- NEWLINE: Matches newline characters (\n).
- MISMATCH: Catches any other character (.).
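In Python, this specification maps naturally onto an ordered list of (name, pattern) pairs, as in the sketch below; the name TOKEN_SPECIFICATION is illustrative. Order matters: IF must appear before IDENTIFIER so the keyword is not swallowed as an ordinary identifier, and MISMATCH must come last so it only catches what nothing else matched.

TOKEN_SPECIFICATION = [
    ("NUMBER",     r"\d+(\.\d+)?"),  # integers and floating-point numbers
    ("ASSIGN",     r"="),            # assignment operator
    ("OPERATOR",   r"[+\-*/]"),      # arithmetic operators
    ("STRING",     r'"[^"]*"'),      # double-quoted string literals
    ("IF",         r"\bif\b"),       # if keyword (must precede IDENTIFIER)
    ("IDENTIFIER", r"[A-Za-z]\w*"),  # identifiers
    ("COMMENT",    r"#.*"),          # comments to end of line
    ("PAREN",      r"[\(\)]"),       # parentheses
    ("SKIP",       r"[ \t]"),        # spaces and tabs
    ("NEWLINE",    r"\n"),           # newline characters
    ("MISMATCH",   r"."),            # any other character (must be last)
]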
Tokenization Function
The tokenize function breaks a string of source code into tokens based on the defined patterns.
def tokenize(code):
    # Tokenization logic... (a possible implementation is sketched below)
    ...
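Filled in, tokenize might look like the following minimal sketch. It assumes Python's re module, the get_token_regex helper described next, and that whitespace tokens are discarded while COMMENT tokens are kept; those last choices are assumptions, not requirements of the original design.

import re

def tokenize(code):
    # Scan the source left to right using the combined token regex.
    for match in re.finditer(get_token_regex(), code):
        kind = match.lastgroup  # name of the pattern that matched
        value = match.group()   # the matched text
        if kind in ("SKIP", "NEWLINE"):
            continue  # assumption: whitespace is dropped from the token stream
        if kind == "MISMATCH":
            # Unexpected character: see Error Handling below
            raise RuntimeError(f"Unexpected character {value!r}")
        yield (kind, value)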
Utility Function: get_token_regex
Combines individual token regexes into a single pattern for matching.
def get_token_regex():
    # Combining regex logic... (a sketch follows)
    ...
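A minimal sketch, assuming the TOKEN_SPECIFICATION list shown earlier: every pattern is wrapped in a named group, and the groups are joined into a single alternation so that match.lastgroup reports which token kind matched.

def get_token_regex():
    # Each (name, pattern) pair becomes a named group; alternation tries
    # the groups in list order, which is why ordering matters above.
    return "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPECIFICATION)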
Sample Programs and Testing
The test_lexer function demonstrates the lexer's functionality using sample code snippets:
def test_lexer():
    # Testing logic... (a sketch follows)
    ...
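A sketch of the test driver; the sample snippets are illustrative stand-ins for whatever the original test programs were.

def test_lexer():
    # Small programs covering numbers, strings, operators, comments, and if
    samples = [
        'x = 42 + 3.14',
        'name = "lexer"  # trailing comment',
        'if (x) y = x / 2',
    ]
    for code in samples:
        print(f"Source: {code!r}")
        for token in tokenize(code):
            print("   ", token)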
Running the Lexer
To run the lexer and see the tokens for the sample programs, execute:
test_lexer()
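With the sketches above, the first sample would print something along these lines (the exact output depends on the actual implementation):

Source: 'x = 42 + 3.14'
    ('IDENTIFIER', 'x')
    ('ASSIGN', '=')
    ('NUMBER', '42')
    ('OPERATOR', '+')
    ('NUMBER', '3.14')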
Error Handling
The lexer raises a RuntimeError for any unexpected or mismatched characters during tokenization.
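For example, with the tokenize sketch above, a character the language does not define, such as $, aborts tokenization mid-stream:

for token in tokenize('x = $'):
    print(token)
# ('IDENTIFIER', 'x')
# ('ASSIGN', '=')
# RuntimeError: Unexpected character '$'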