Parser

I have been fascinated by parsers recently. It all began with my desire to create a programming language one day. I have been looking around the Internet to find resources that would teach me how to create a programming language. That's when I learnt about parsers.

I then learnt that parsers exist not just in programming language compilers and interpreters but also in other tools that handle text, such as servers and search tools. It was surprising to discover that I had been writing tiny parsers all along without realising that they were parsing text.

So, what is a parser?

The word "parse" is a verb and, according to the Oxford Advanced Learner's Dictionary, to parse something means "to divide a sentence into parts and describe the grammar of each word or part".

This is what an average person does when reading a piece of text. Let me show you an example:

johndoe@example.com

You already know that the above text is an email address (albeit a dummy one). But what made you realise that it is an email address?

It was the "@" symbol separating the handle name ("johndoe") from a string composed of the domain name ("example") and the TLD ("com"), which are themselves separated by the "." (dot) symbol.

The "@" symbol alone did not tell you that it is an email address, did it?

That is because the "@" symbol has also become a way of tagging or mentioning usernames on platforms such as X (Twitter), YouTube, Facebook, WhatsApp and Slack. So your brain already knew that, for an email address, there had to be text on both sides of the "@" symbol.

So, any string in which two pieces of text are separated by an "@" is usually treated as an email address, especially if the right-hand piece is itself split by the "." (dot) symbol (dotless domain names are prohibited by ICANN). An ordinary email address also contains only one "@" symbol.
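To make these rules concrete, here is a minimal sketch in Python. The function name looks_like_email is my own invention for illustration, and the checks cover only the informal rules described above; a real email validator has to handle far more cases.

```python
def looks_like_email(text: str) -> bool:
    """Apply the informal rules from the article; not a full validator."""
    # Rule 1: exactly one "@" symbol.
    if text.count("@") != 1:
        return False

    # Rule 2: there must be text on both sides of the "@".
    handle, _, right_hand = text.partition("@")
    if not handle or not right_hand:
        return False

    # Rule 3: the right-hand side must be split by a dot,
    # with text on both sides of that dot.
    name, dot, tld = right_hand.rpartition(".")
    return bool(dot and name and tld)


print(looks_like_email("johndoe@example.com"))  # True
print(looks_like_email("@johndoe"))             # False: nothing before the "@"
print(looks_like_email("johndoe@example"))      # False: no dot on the right
```

The second call shows why the "@" symbol alone is not enough: a social media mention such as "@johndoe" has nothing on the left-hand side.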

Hence, what you did, or what your brain did, was to parse the string "johndoe@example.com" and realise that it was an email address. All you had to do to accomplish that was to identify the "@" symbol separating the pieces of text. Your eyes did not have to scan through each character, checking whether it was an "@" symbol, until they eventually found the one separating the strings. A parser arrives at the same result, although a program usually does have to examine the characters one at a time.

A parser is a program that reads through a piece of text and works out its structure and meaning. As mentioned earlier, parsers are components of many different programs, such as browsers, programming language interpreters and compilers, and web servers.
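As a rough illustration of that definition, the sketch below scans an email address one character at a time and returns its parts. The names EmailAddress and parse_email are hypothetical ones I chose for this example, and the logic assumes only the simple rules discussed in this article.

```python
from dataclasses import dataclass


@dataclass
class EmailAddress:
    handle: str
    domain: str
    tld: str


def parse_email(text: str) -> EmailAddress:
    """Scan the text character by character and split it into its parts."""
    handle_chars = []
    host_chars = []
    seen_at = False

    for position, char in enumerate(text):
        if char == "@":
            if seen_at:
                raise ValueError(f"second '@' at position {position}")
            seen_at = True
        elif seen_at:
            host_chars.append(char)    # everything after the "@"
        else:
            handle_chars.append(char)  # everything before the "@"

    handle = "".join(handle_chars)
    host = "".join(host_chars)
    # Treat the text after the last dot as the TLD.
    domain, dot, tld = host.rpartition(".")

    if not (seen_at and handle and dot and domain and tld):
        raise ValueError(f"not an email address: {text!r}")
    return EmailAddress(handle, domain, tld)


print(parse_email("johndoe@example.com"))
# EmailAddress(handle='johndoe', domain='example', tld='com')
```

The input is a flat string and the output is a structure with named parts. That step from raw text to labelled pieces is the "meaning" a parser recovers.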

There is another component that, in some of the above-mentioned tools, either runs before the parser or forms part of it: the lexer. I will write about lexers in my next article.

  1. OALD definition of "parse"
  2. Prohibition on dotless domain names by ICANN