• Lex -- a Lexical Analyzer Generator (by
M.E. Lesk and Eric Schmidt)
– Given tokens specified as regular expressions,
Lex automatically generates a routine (yylex)
that recognizes the tokens (and performs
corresponding actions).
• Lex source program
{definition}
%%
{rules}
%%
{user subroutines}
Rules: <regular expression> <action>
Each regular expression specifies a token.
Default action for anything that is not matched: copy to the
output
Action: C/C++ code fragment specifying what to do when
a token is recognized.
• lex program examples: ex1.l and ex2.l
– ‘lex ex1.l’ produces the lex.yy.c file that
contains a routine yylex().
• The int yylex() routine is the scanner that recognizes the
tokens specified by the regular expressions.
– yylex() normally returns a non-zero value (usually a token id).
– yylex() returns 0 when end of file is reached.
– Need a driver to test the routine. main.cpp is an
example.
• You need to have a yywrap() function in the lex file
(return 1), see the function in ex1.l.
– yylex() calls yywrap() at end of input: returning 1 means there is
no more input, returning 0 means another input file has been set up.
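The driver loop described above can be sketched as follows. This is only an illustration: the yylex() here is a hypothetical stand-in that hands out three made-up token ids (257, 258, 257) and then 0, so the block compiles without lex; in a real build yylex() comes from lex.yy.c. The name drive() is also an assumption, not something from ex1.l or main.cpp.

```c
#include <stdio.h>

/* Hypothetical stand-in for the lex-generated scanner: it hands out three
   made-up token ids and then 0, as yylex() does at end of file. */
static const int fake_tokens[] = {257, 258, 257, 0};
static int fake_pos = 0;

int yylex(void) {
    int token = fake_tokens[fake_pos];
    if (token != 0) fake_pos++;   /* stay on the 0 once it is reached */
    return token;
}

/* A real scanner calls this at end of input; 1 means "no more input". */
int yywrap(void) { return 1; }

/* Driver loop: keep asking for tokens until yylex() reports end of file.
   Returns how many tokens were seen. */
int drive(void) {
    int token, count = 0;
    while ((token = yylex()) != 0) {
        printf("token id %d\n", token);
        count++;
    }
    return count;
}
```

A main() that simply calls drive() would play the role of the test driver.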
• Lex regular expression: contains text
characters and operators.
– Letters of alphabet and digits are always text
characters.
• Regular expression integer matches the string
“integer”
– Operators: "\[]^-?.*+|()$/{}%<>
• When these characters appear in a regular
expression, they have special meanings
– Lex regular expressions cannot contain unquoted spaces
(a blank ends the expression)!
– operators (characters that have special meanings):
"\[]^-?.*+|()$/{}%<>
• ‘*’, ‘+’, ‘|’, ‘(’, ‘)’ -- used as in ordinary regular expressions
• ‘"’ -- any character between the quotes is a text character.
– E.g.: "xyz++" == xyz"++"
• ‘\’ -- escape character,
– To turn an operator back into a text character: "xyz++" == ??
– To specify special characters: \40 == " "
• ‘[’ and ‘]’ -- used to specify a set of characters
– e.g: [a-z], [a-zA-Z],
– Every character in it except ^, - and \ is a text character
– [-+0-9], [\40-\176]
• ‘^’ -- not, used as the first character after the left bracket
– E.g [^abc] – any character except ‘a’, ‘b’ or ‘c’.
– [^a-zA-Z] -- ??
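Reading the numeric escapes as C-style octal character codes, \40 is the space character and \176 is the tilde, so the class [\40-\176] covers the printable ASCII range. A minimal C sketch of the same membership test, assuming an ASCII character set (the function name in_printable_class is made up for illustration):

```c
/* Nonzero if c falls in the class [\40-\176], read as octal character
   codes: space (octal 40) through tilde (octal 176) in ASCII. */
int in_printable_class(int c) {
    return c >= '\40' && c <= '\176';
}
```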
– operators (characters that have special
meanings): "\[]^-?.*+|()$/{}%<>
• ‘.’ -- any character except newline
• ‘?’ -- optional: ab?c matches ‘ac’ or ‘abc’
• ‘/’ -- used for trailing context (lookahead):
– e.g. ab/cd -- matches ab only if it is followed by cd
• ‘{’ ‘}’ -- enclose the name of a regular definition, e.g. {letter}
• ‘%’ -- has special meaning in lex (e.g. the %% section separators)
• ‘$’ -- match the end of a line, ‘^’ -- match the
beginning of a line
– ab$ == ab/\n
• ‘<’ ‘>’: start condition (more context sensitivity
support, see the paper for details).
– Order of pattern matching:
• Always matches the longest pattern.
• When multiple patterns match the same longest string, the first pattern listed is used.
– To override, add “REJECT” in the action.
...
%%
Ab {printf("rule 1\n");}
Abc {printf("rule 2\n");}
{letter}({letter}|{digit})* {printf("rule 3\n");}
%%
Input: Abc
What happens with ‘.*’ as a pattern?
Should regular expressions for reserved words come before or
after the regular expression for identifiers?
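The two ordering rules can be mimicked by hand. The sketch below is not how lex works internally (lex builds an automaton); it just tries the three example rules at the start of a string, keeps the longest match, and lets the earlier rule win a tie. The helper names match_literal, match_identifier and pick_rule are made up for this illustration.

```c
#include <ctype.h>
#include <string.h>

/* Length of the prefix of s matched by the literal kw, or 0 if none. */
static size_t match_literal(const char *s, const char *kw) {
    size_t n = strlen(kw);
    return strncmp(s, kw, n) == 0 ? n : 0;
}

/* Length of the prefix of s matched by letter(letter|digit)*, or 0. */
static size_t match_identifier(const char *s) {
    size_t n = 0;
    if (!isalpha((unsigned char)s[0])) return 0;
    while (isalnum((unsigned char)s[n])) n++;
    return n;
}

/* Returns the number (1..3) of the rule that fires at the start of s,
   or 0 if no rule matches; the matched length is stored in *len. */
int pick_rule(const char *s, size_t *len) {
    size_t lens[3];
    int best = 0;
    lens[0] = match_literal(s, "Ab");    /* rule 1 */
    lens[1] = match_literal(s, "Abc");   /* rule 2 */
    lens[2] = match_identifier(s);       /* rule 3 */
    *len = 0;
    for (int i = 0; i < 3; i++) {
        /* Strictly greater: on a tie the earlier rule is kept. */
        if (lens[i] > *len) { *len = lens[i]; best = i + 1; }
    }
    return best;
}
```

For the input Abc, rules 2 and 3 both match three characters and rule 2 wins because it is listed first; for Abcd, rule 3 wins because it matches more. The same tie-break is why a reserved-word rule placed before the identifier rule takes precedence over it.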
– Manipulate the lexeme and/or the input stream:
• yytext -- a char pointer pointing to the matched C string
• yyleng -- the length of the matched string
• I/O routines to manipulate the input stream:
– yyinput() -- get a character from the input stream; returns <= 0
when reaching the end of the input stream, the character
otherwise
» yyinput() for C++, input() for C.
– unput( c ) -- put c back onto the input stream
– Dealing with comments: /* ….. */
» Why is pattern "/*".*"*/" a problem?
%%
…
"/*" {int c1, c2;
c2 = yyinput();
if (c2 <= 0) {lex_error("unfinished comment" …);}
else { c1 = c2; c2 = yyinput();
while (((c1 != '*') || (c2 != '/')) && (c2 > 0)) {c1 = c2; c2 = yyinput();}
if (c2 <= 0) {lex_error( ….);}
}
}
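The same two-character window can be tried outside lex on an in-memory string. This is a sketch under that assumption: it walks a C string instead of calling yyinput(), and the name skip_comment is made up. It stops at the first closing pair, which is what the pattern with ‘.*’ cannot do, since ‘.*’ takes the longest match on the line and never crosses a newline.

```c
#include <stddef.h>

/* p points just past the opening delimiter of a comment. Returns a pointer
   just past the first closing star-slash pair, or NULL if the input ends
   before the comment is closed (an unfinished comment). */
const char *skip_comment(const char *p) {
    char prev = '\0';                 /* the character before *p */
    for (; *p != '\0'; p++) {
        if (prev == '*' && *p == '/')
            return p + 1;             /* comment closed here */
        prev = *p;
    }
    return NULL;
}
```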
– Reporting errors:
• What kind of errors? Not too many.
– Characters that cannot lead to a token
– unterminated comments (can we do it in later phases?)
– unterminated string constants.
– Reporting errors:
• How to keep track of current position (which line, which
column)?
– Use two global variables for this: yyline, yycolumn
%{
int yyline = 1, yycolumn = 1;
%}
...
%%
"\n" {yyline++; yycolumn = 1;}
[ \t]+ {/* do nothing */ yycolumn += yyleng;}
If {yycolumn += yyleng; return (IFNumber);}
"+" {yycolumn += 1; return (PLUSNumber);}
{letter}({letter}|{digit})* {yylval = idtable_insert(yytext); yycolumn += yyleng;
return(IDNumber);}
...
%%
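The bookkeeping in these actions amounts to one update per matched lexeme. A small C sketch of that update, assuming 1-based lines and columns; the struct position and the function advance are hypothetical names standing in for the globals yyline and yycolumn:

```c
/* Current position in the input, 1-based like yyline and yycolumn. */
struct position {
    int line;
    int column;
};

/* Move pos past a matched lexeme of the given length: a newline starts a
   new line at column 1, any other character moves one column right. */
void advance(struct position *pos, const char *lexeme, int length) {
    for (int i = 0; i < length; i++) {
        if (lexeme[i] == '\n') {
            pos->line++;
            pos->column = 1;
        } else {
            pos->column++;
        }
    }
}
```

Calling advance(&pos, yytext, yyleng) from every action would have the same effect as the per-rule updates, at the cost of scanning each lexeme once more.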
• Dealing with identifiers, string constants.
– Data structures:
• Put the lexeme in a string table: e.g. vector of C strings.
• See token.l
– Recognizing constant strings with special characters
• Assume a string cannot cross a line boundary.
• Use yymore()
\"[^"\n]* {int c;
c = yyinput();
if (c != '"') { /* error: unterminated string */ }
else if (yytext[yyleng-1] == '\\') {
unput( c ); yymore();
} else {/* found the whole string, normal processing */}
}
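The yymore() idea, extending the current lexeme when the quote just seen was escaped, can be followed on an in-memory string. The sketch below is an assumption-laden illustration, not the code in token.l: scan_string is a made-up name, and like the rule above it treats any quote preceded by a backslash as escaped, so a string ending in an escaped backslash is not handled.

```c
#include <stddef.h>

/* s points at the opening double quote. Returns the number of characters
   in the whole string constant, both quotes included, or 0 if the line or
   the input ends before the closing quote. */
size_t scan_string(const char *s) {
    size_t i = 1;                          /* skip the opening quote */
    for (;;) {
        /* Like the class [^"\n]: run up to a quote, newline or end. */
        while (s[i] != '"' && s[i] != '\n' && s[i] != '\0')
            i++;
        if (s[i] != '"')
            return 0;                      /* unterminated string constant */
        i++;                               /* take the quote */
        if (s[i - 2] != '\\')
            return i;                      /* unescaped quote closes it */
        /* Escaped quote: keep extending the same lexeme, as yymore() does. */
    }
}
```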
Put it all together
• Check out token.l and main.cpp in lex2.tar
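The string table mentioned earlier for identifiers and string constants could look roughly like this. It is a minimal sketch, not the contents of token.l: the name idtable_insert appears in the rules above, but its behavior here (return the index of the lexeme, adding it if new) is an assumption, and TABLE_MAX and idtable_lookup are made up. The lexeme is copied because yytext is overwritten by the next match.

```c
#include <stdlib.h>
#include <string.h>

#define TABLE_MAX 256              /* arbitrary capacity for this sketch */

static char *table[TABLE_MAX];     /* the stored lexemes */
static int table_size = 0;

/* Returns the index of lexeme in the table, adding a copy if it is new.
   Returns -1 if the table is full or memory runs out. */
int idtable_insert(const char *lexeme) {
    for (int i = 0; i < table_size; i++)
        if (strcmp(table[i], lexeme) == 0)
            return i;              /* already present */
    if (table_size == TABLE_MAX)
        return -1;
    size_t n = strlen(lexeme) + 1;
    char *copy = malloc(n);
    if (copy == NULL)
        return -1;
    memcpy(copy, lexeme, n);       /* copy includes the terminating NUL */
    table[table_size] = copy;
    return table_size++;
}

/* Returns the lexeme stored at index, or NULL if index is out of range. */
const char *idtable_lookup(int index) {
    return (index >= 0 && index < table_size) ? table[index] : NULL;
}
```

The linear search keeps the sketch short; a hash table would be the usual choice once the number of identifiers grows.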

Generate Lexical Analyzers with Lex
