What is CV/Resume Parsing?
Resume Parsing (also known as CV Parsing, Resume Extraction, CV Extraction) is the conversion of a free-form CV/Resume into structured information suitable for storage, reporting and manipulation by a computer.
Throughout the world, MS Word documents are still the format of choice for describing the skills, qualifications and experience that make an individual a suitable candidate for a particular job. Such a document is easy for us to read and understand; to a computer, however, it is just a long sequence of letters, numbers and punctuation. A parser is a program that can analyse this sequence and extract from it the elements of what the writer actually meant to say.
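To make "structured information" concrete, here is a minimal Python sketch of a rule-based extractor that turns raw resume text into a record. The `ParsedResume` record, the `parse_resume` function and both regular expressions are hypothetical illustrations, not taken from any particular parsing product, and the rule that the first non-empty line holds the candidate's name is a naive assumption.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical, deliberately tiny rule set for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A year range such as "2015 - 2019" or "2019-present" (hyphen or en dash).
PERIOD_RE = re.compile(
    r"\b((?:19|20)\d{2})\s*[-\u2013]\s*((?:19|20)\d{2}|present)\b",
    re.IGNORECASE,
)


@dataclass
class ParsedResume:
    """Structured record produced from free-form resume text."""
    name: Optional[str] = None
    emails: List[str] = field(default_factory=list)
    # (start_year, end_year); end_year is None for "present".
    periods: List[Tuple[int, Optional[int]]] = field(default_factory=list)


def parse_resume(text: str) -> ParsedResume:
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    result = ParsedResume()
    # Naive assumption: the first non-empty line is the candidate's name.
    if lines:
        result.name = lines[0]
    result.emails = EMAIL_RE.findall(text)
    for start, end in PERIOD_RE.findall(text):
        end_year = None if end.lower() == "present" else int(end)
        result.periods.append((int(start), end_year))
    return result
```

Even this toy version shows the gap between matching and understanding: the regular expressions find character patterns, but nothing in them knows that the first line of a real resume could just as easily read "Curriculum Vitae".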
This is a surprisingly difficult task for a computer. Although today’s computers can add up millions of numbers in the blink of an eye, or beat the world chess champion, understanding language in all its generality, even as well as a young child does, remains but a pipe dream.
Part of the reason is that language is almost infinitely varied. There are tens if not hundreds of ways to write down a date, for example, and countless millions of ways to describe what you did in your last job. All these different ways of writing the same thing have to be captured by the complex rules and statistical algorithms involved in resume parsing, and encoding them requires a huge amount of effort and persistence.
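The date problem can be illustrated with a minimal Python sketch that normalises a few month-and-year spellings into a `(year, month)` pair. The format list and the `normalise_month_year` name are assumptions chosen for illustration. The sketch relies on the standard `datetime.strptime`, whose `%B` and `%b` month names follow the current locale (English in the default C locale).

```python
from datetime import datetime
from typing import Optional, Tuple

# A handful of the many ways a month and year can be written.
# Hypothetical, incomplete list: every new variant needs another entry.
DATE_FORMATS = [
    "%B %Y",   # March 2019
    "%b %Y",   # Mar 2019
    "%b. %Y",  # Mar. 2019
    "%m/%Y",   # 03/2019
    "%m.%Y",   # 03.2019
    "%Y-%m",   # 2019-03
]


def normalise_month_year(text: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) for the first format that matches, else None."""
    cleaned = " ".join(text.split())  # collapse stray whitespace
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(cleaned, fmt)
        except ValueError:
            continue  # this rule does not apply; try the next one
        return parsed.year, parsed.month
    return None
```

Each format is one more rule; anything outside the list, such as "Spring 2019", simply returns `None`, which is why real parsers need far larger rule sets or statistical models.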
Although the sheer variety of ways of saying the same thing is a challenge for CV parsing software, an even bigger problem is ambiguity, where the same word or phrase can mean different things in different contexts. For example, "Director" can be a job title in some contexts and the name of a software package in others. A 4-digit number can be part of a telephone number, a Swiss postal code, a year, a software version number, or many other things, depending on the words surrounding it. Seeing the term "Project Manager" in a resume may indicate that the writer was indeed a project manager, but not in the context "I reported to the Project Manager". "Merrill Lynch" may be someone's name, but it is more likely to refer to a company.
Resume parsing software has to resolve all of these ambiguities by looking at the context in which each word or phrase is used.
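A minimal Python sketch of context-based disambiguation for the 4-digit number example, assuming the text has already been split into words. The cue sets, the small Swiss town list and `classify_four_digit` are hypothetical; a production parser would typically combine many more signals, often statistically, rather than a fixed chain of `if` tests.

```python
import re
from typing import List

# Hypothetical cue lists for illustration only; a real parser would need
# far larger vocabularies and would usually weigh cues statistically.
PHONE_CUES = {"tel", "tel.", "tel:", "phone", "phone:", "mobile", "mobile:"}
VERSION_CUES = {"version", "v", "v.", "release"}
YEAR_CUES = {"since", "from", "to", "until", "in", "-"}
MONTH_PREFIXES = {"jan", "feb", "mar", "apr", "may", "jun",
                  "jul", "aug", "sep", "oct", "nov", "dec"}
SWISS_TOWNS = {"zürich", "zurich", "bern", "basel", "geneva", "lausanne"}


def classify_four_digit(tokens: List[str], i: int) -> str:
    """Guess what the 4-digit token at position i means from its neighbours."""
    if not re.fullmatch(r"\d{4}", tokens[i]):
        raise ValueError(f"{tokens[i]!r} is not a 4-digit number")
    # Up to three words of left context, plus the word immediately after.
    before = [t.lower() for t in tokens[max(0, i - 3):i]]
    prev = before[-1] if before else ""
    nxt = tokens[i + 1].lower().strip(",.") if i + 1 < len(tokens) else ""

    # Phone cue words or an international prefix such as "+41" nearby.
    if any(t in PHONE_CUES or t.startswith("+") for t in before):
        return "telephone number"
    # "version 2016": the cue wins even though 2016 also looks like a year.
    if prev in VERSION_CUES:
        return "software version"
    # Swiss postal codes precede the town name, e.g. "8001 Zürich".
    if nxt in SWISS_TOWNS:
        return "postal code"
    # A plausible year with a date-like word just before it.
    if 1900 <= int(tokens[i]) <= 2099 and (
        prev in YEAR_CUES or prev[:3] in MONTH_PREFIXES
    ):
        return "year"
    return "unknown"
```

For example, `2016` in "upgraded to version 2016" is classified as a software version rather than a year, because the version cue is checked first; the order of the rules is itself a design decision.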