A thorough introduction to language parsing in C#, an overview of the different approaches, and a live-coding session that shows how to build a working JSON parser using Sprache.
Alexey Golub - Writing parsers in C# | 3Shape Meetup
1. Writing parsers in C#
(“Projecting arbitrary character streams into C# objects using monadic
parser combinators”)
Speaker: Alexey Golub @Tyrrrz
2. What is a parser?
• To parse — to resolve text into logical syntactic components
• i.e. IEnumerable<T> Parse(IEnumerable<char> text)
• e.g. double.Parse, XDocument.Parse
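As a rough illustration of the two built-in parsers named above, the sketch below wraps `double.Parse` and `XDocument.Parse`. The class and method names (`BuiltInParsers`, `ParseNumber`, `ParseRootName`) are invented for this example.

```csharp
using System;
using System.Globalization;
using System.Xml.Linq;

public static class BuiltInParsers
{
    // Text -> double. The invariant culture fixes "." as the decimal separator.
    public static double ParseNumber(string text) =>
        double.Parse(text, CultureInfo.InvariantCulture);

    // Text -> XML object model; returns the local name of the root element.
    public static string ParseRootName(string xml) =>
        XDocument.Parse(xml).Root?.Name.LocalName ?? string.Empty;
}
```

Both calls throw when the text is not well-formed (`FormatException` and `XmlException` respectively), which is the "assert" half of a parser's job.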
4. What do parsers do?
• Disambiguate text into domain objects
• Assert that the text is well-formed
[Figure: the text "123 456,93" annotated as numeric literals, with the space marked as the thousands separator and the comma as the decimal separator.]
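The number on this slide can be handled with the framework's own number parser, given a format that matches it. This is a minimal sketch that assumes a space as the group separator and a comma as the decimal separator; `NumberTextParser` is a hypothetical name.

```csharp
using System;
using System.Globalization;

public static class NumberTextParser
{
    // Assumed format for this example: space groups thousands, comma marks decimals.
    private static readonly NumberFormatInfo Format = new NumberFormatInfo
    {
        NumberGroupSeparator = " ",
        NumberDecimalSeparator = ","
    };

    // Returns true and the value when the text is well-formed, false otherwise.
    public static bool TryParse(string text, out double value) =>
        double.TryParse(text, NumberStyles.Number, Format, out value);
}
```

`TryParse` covers both responsibilities at once: the return value reports whether the text is well-formed, and `value` carries the resulting domain object.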
5. Formal language theory
• Alphabet – set of allowed characters
• Language – set of words made from characters in alphabet
• Grammar – set of rules that define how words are generated
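The three terms can be made concrete with a toy language. The sketch below assumes an alphabet of `a` and `b` and a grammar with one rule, S -> "a" S "b" | "" (the empty word); both are chosen only for illustration, and `TinyLanguage` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TinyLanguage
{
    // Alphabet: the set of allowed characters.
    public static readonly HashSet<char> Alphabet = new HashSet<char> { 'a', 'b' };

    // Grammar: S -> "a" S "b" | "".
    // Yields every word that needs at most maxDepth uses of the recursive alternative.
    public static IEnumerable<string> Generate(int maxDepth)
    {
        yield return "";
        if (maxDepth > 0)
        {
            foreach (var inner in Generate(maxDepth - 1))
                yield return "a" + inner + "b";
        }
    }

    // A word can only belong to the language if all its characters are in the alphabet.
    public static bool UsesOnlyAlphabet(string word) =>
        word.All(c => Alphabet.Contains(c));
}
```

The language is the set of all words `Generate` can produce as `maxDepth` grows: the empty word, `ab`, `aabb`, and so on.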
6. Grammar types
• Regular grammar – RHS of a production rule is a terminal or a
terminal plus non-terminal
• Context-free grammar – RHS of a production rule is a finite sequence
of terminals and/or non-terminals
7. Rules of thumb
• If a language needs nested (self-embedding) recursive grammar rules – it’s not regular
• Regular grammar can be represented with regular expressions
• Context-free grammar cannot be directly represented with regular
expressions (in .NET)
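To contrast the two cases, here is a small sketch: a regular language (one or more digits) checked with a regular expression, and a context-free one (balanced parentheses, assumed grammar S -> "(" S ")" S | "") checked with a recursive method. All names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegularVersusContextFree
{
    // Regular language: one or more decimal digits. A regular expression is enough.
    private static readonly Regex Digits = new Regex(@"^[0-9]+\z");

    public static bool IsDigits(string text) => Digits.IsMatch(text);

    // Context-free language: balanced parentheses, S -> "(" S ")" S | "".
    public static bool IsBalanced(string text)
    {
        var position = 0;
        return ParseS(text, ref position) && position == text.Length;
    }

    private static bool ParseS(string text, ref int position)
    {
        // Apply "(" S ")" repeatedly while the next character opens a group.
        while (position < text.Length && text[position] == '(')
        {
            position++;                                     // consume "("
            if (!ParseS(text, ref position)) return false;  // nested S
            if (position >= text.Length || text[position] != ')') return false;
            position++;                                     // consume ")"
        }
        return true; // empty alternative
    }
}
```

The recursion in `ParseS` mirrors the recursion in the grammar rule, which is what the pattern for `IsDigits` has no way to express.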
8. Syntax trees
• Primary goal of a parser is to break down text into syntactic
components
• Syntactic structure of context-free languages is represented by a
syntax tree
• Program can then further evaluate the syntax tree as required
[Figure: example syntax tree with a root, a non-terminal node, and three terminal nodes.]
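A syntax tree maps naturally onto a small class hierarchy, and "further evaluation" becomes a recursive walk over it. The node types below (`NumberNode`, `AddNode`) are assumptions made for this sketch, using addition as the only operation.

```csharp
using System;

public abstract record Node;

// Leaf of the tree: carries a value and has no children.
public sealed record NumberNode(double Value) : Node;

// Inner node: combines two subtrees.
public sealed record AddNode(Node Left, Node Right) : Node;

public static class Evaluator
{
    // Walks the tree bottom-up and computes its numeric value.
    public static double Evaluate(Node node) => node switch
    {
        NumberNode number => number.Value,
        AddNode add => Evaluate(add.Left) + Evaluate(add.Right),
        _ => throw new ArgumentException("Unknown node type", nameof(node))
    };
}
```

The parser's job ends once the tree is built; evaluation, pretty-printing, or validation can all be separate passes over the same structure.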
10. Approaches
• Loop/stack-based manual parsers
• Loop through all characters in the input
• Maintain context on a stack
• Parser generators
• Custom language that defines grammar
• Compiles into code that you can execute
• Parser combinators
• Each parser is a delegate
• Parsers can be combined into higher-order parsers
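The first approach can be sketched in a few lines. This hypothetical `ManualBracketParser` loops through the characters once and keeps the expected closing brackets on a stack; it only checks well-formedness and builds no tree.

```csharp
using System;
using System.Collections.Generic;

public static class ManualBracketParser
{
    public static bool IsWellFormed(string text)
    {
        // Context: the closing brackets we still expect, innermost on top.
        var expected = new Stack<char>();

        foreach (var c in text)
        {
            switch (c)
            {
                case '(': expected.Push(')'); break;
                case '[': expected.Push(']'); break;
                case '{': expected.Push('}'); break;
                case ')':
                case ']':
                case '}':
                    // A closing bracket must match the innermost open one.
                    if (expected.Count == 0 || expected.Pop() != c) return false;
                    break;
                // Any other character is ignored.
            }
        }

        // Well-formed only if nothing is left open.
        return expected.Count == 0;
    }
}
```

This style is fast and dependency-free, but the grammar is implicit in the control flow, which tends to get hard to follow as the language grows.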
14. Parser combinators
• Start by building simple parsers
• Combine them into more complex parsers
• Repeat until you reach the root
• Hierarchy of parsers should resemble target syntax tree
15. Parser combinators (illustrated)
Input: 10 + 5
Simple parsers: NumberParser, WhiteSpaceParser, SignParser
OperatorParser: NumberParser THEN WhiteSpaceParser THEN SignParser THEN WhiteSpaceParser THEN NumberParser
Result: PlusOperator with child nodes Number (10) and Number (5)
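The illustrated example can be sketched with hand-rolled combinators. This is not Sprache's API: the `Parser<T>` delegate shape, the `Then` and `Select` helpers, and the node types are assumptions made for this example, and `SignParser` accepts only `+` because that is the only case shown.

```csharp
using System;

// Assumed shape: a parser takes the input and a start position and returns
// the parsed result plus the next position, or null on failure.
public delegate (T Result, int Next)? Parser<T>(string input, int position);

public abstract record Node;
public sealed record Number(int Value) : Node;
public sealed record PlusOperator(Node Left, Node Right) : Node;

public static class Parsers
{
    // Simple parser: one or more ASCII digits.
    public static readonly Parser<Number> NumberParser = (input, position) =>
    {
        var end = position;
        while (end < input.Length && input[end] >= '0' && input[end] <= '9') end++;
        if (end == position) return null;
        if (!int.TryParse(input.Substring(position, end - position), out var value)) return null;
        return (new Number(value), end);
    };

    // Simple parser: one or more whitespace characters.
    public static readonly Parser<string> WhiteSpaceParser = (input, position) =>
    {
        var end = position;
        while (end < input.Length && char.IsWhiteSpace(input[end])) end++;
        if (end == position) return null;
        return (input.Substring(position, end - position), end);
    };

    // Simple parser: the "+" sign.
    public static readonly Parser<char> SignParser = (input, position) =>
    {
        if (position < input.Length && input[position] == '+') return ('+', position + 1);
        return null;
    };

    // Higher-order parser: runs first, then second where first stopped.
    public static Parser<(TFirst First, TSecond Second)> Then<TFirst, TSecond>(
        this Parser<TFirst> first, Parser<TSecond> second) =>
        (input, position) =>
        {
            var left = first(input, position);
            if (!left.HasValue) return null;
            var right = second(input, left.Value.Next);
            if (!right.HasValue) return null;
            return ((left.Value.Result, right.Value.Result), right.Value.Next);
        };

    // Higher-order parser: transforms the result of a successful parse.
    public static Parser<TOut> Select<TIn, TOut>(this Parser<TIn> parser, Func<TIn, TOut> map) =>
        (input, position) =>
        {
            var result = parser(input, position);
            if (!result.HasValue) return null;
            return (map(result.Value.Result), result.Value.Next);
        };

    // NumberParser THEN WhiteSpaceParser THEN SignParser THEN WhiteSpaceParser THEN NumberParser
    public static readonly Parser<Node> OperatorParser =
        NumberParser
            .Then(WhiteSpaceParser)
            .Then(SignParser)
            .Then(WhiteSpaceParser)
            .Then(NumberParser)
            // The nested pairs unwrap to the left operand; Second is the right operand.
            .Select(r => (Node)new PlusOperator(r.First.First.First.First, r.Second));

    // Runs the root parser and requires it to consume the whole input.
    public static Node ParseExpression(string text)
    {
        var result = OperatorParser(text, 0);
        if (!result.HasValue || result.Value.Next != text.Length)
            throw new FormatException("Input is not a well-formed expression.");
        return result.Value.Result;
    }
}
```

For the input `10 + 5`, `ParseExpression` builds a `PlusOperator` whose children are `Number(10)` and `Number(5)`. The nested tuple access in `Select` is the price of a two-argument `Then`; combinator libraries usually hide this behind LINQ query syntax.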
17. Further reading
• Formal grammar on Wikipedia –
https://en.wikipedia.org/wiki/Formal_grammar
• Parsing in C# by Federico Tomassetti –
https://tomassetti.me/parsing-in-csharp