

How we implemented a CV parser to improve OfferZen’s candidate journey

3 February 2022, by Rebecca Crompton

Creating a new user profile on a platform can be quite tedious, especially if it requires you to re-enter your entire resume every time. To make this easier for developers looking for a job on OfferZen, we needed to implement an off-the-shelf CV parser that would meet our data security requirements, support a wide range of document formats, and integrate into our Ruby tech stack without adding technical complexity. Here's how we did that and what we learned from the process.


OfferZen is a developer job marketplace where companies reach out to candidates with job opportunities.

We know that creating a profile on any job platform is incredibly time-consuming, and it only gets more so as your years of experience add up. We also know that a clear profile plays a major role in receiving great job offers. That's why we needed a solution that would let developers build their profiles faster while keeping profile quality high.

Early in OfferZen’s growth, we implemented an internal parser for extracting profile information from developers’ LinkedIn CVs. We later removed this feature because LinkedIn started applying encoding measures to its PDFs that made the CVs extremely difficult to parse. As OfferZen grew and candidate volume increased, we found ourselves needing a more general CV parsing solution to create a smoother candidate journey.

Buying a CV parser instead of building

Building a CV parser is challenging: CVs vary widely in format and structure, so reliably extracting information from them requires complex rules to account for all the variations. At the same time, there are companies whose entire business model is parsing CVs, and they use sophisticated technologies like machine learning to extract information accurately.

We recognised that an external CV parsing service would likely do a significantly better job than we ever could, given the time and resources we have to develop and maintain a feature like this. We needed our CV parser to extract information from a variety of file formats and provide accurate results in the profile sections that are the most time-consuming for candidates to create. This would make the candidate journey a lot smoother.

The decision to buy rather than build a CV parsing solution sparked a particularly close collaboration between the designers and developers within the Product Team. Because we were limited by what the external CV parser could do, we needed to carry out the technical discovery process in parallel with the design discovery process. The technical discovery findings then guided the design discovery process as we learnt which features would be available to us.

From all the external CV parsing options we investigated, we chose the parser that allowed for a clean technical integration and exceeded our expectations for what the parser needed to achieve. Here's how we came to that decision, implemented it and what we learnt in the process.

The discovery process

Design discovery

The design discovery process started by engaging with the feedback we’d received from candidates about their goals and pain points while setting up their OfferZen profiles. We also considered our talent advisors’ feedback about the blockers and challenges they faced in supporting candidates through this process. With this feedback, we defined what we wanted the CV parser to achieve for our users and for our business:

  • To reduce the time it takes for candidates to fill out their work history and education
  • To be able to parse various file formats.

With our goals set, we continued the design discovery process by:

  • Researching common UX patterns and principles for CV parsers
  • Testing and analysing these solutions
  • Documenting our learnings and applying them to create our first set of design options for the CV parsing feature.

Technical discovery

With the basic idea of what we wanted to achieve defined in the design discovery stage, we carried out thorough technical research on a number of different external CV parsers. We wanted to find a solution that could accurately extract the information we required without too much added technical complexity.

When assessing our options, we kept four considerations in mind:

API integration and response time:

We needed a service that we could integrate seamlessly into our existing Ruby tech stack.

We also considered the time it would take for the API to respond to a CV parsing request from our application. The CV parsing service’s response time affects the complexity of the integration, the overall performance of our application as well as the user experience.

An API that could return a successful response to a single request in real time would be technically less complex to implement than one that needed multiple requests to process a single CV. The response time would also directly affect our design team, who would need to account in their designs for any delay users might experience, so that candidates still had a good and engaging experience with the feature.
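To make the single-request case concrete, here is a minimal Ruby sketch of what such an integration could look like using only the standard library's Net::HTTP. The endpoint URL, the bearer-token header and the JSON payload shape (a filename plus the Base64-encoded document) are hypothetical placeholders, not the API of the parser we chose. The explicit open and read timeouts are what stop a slow parser from tying up a web request indefinitely.

```ruby
require "net/http"
require "json"
require "uri"

# Minimal sketch of a single-request CV parsing client. The endpoint,
# auth header and JSON payload shape are hypothetical placeholders.
class CvParserClient
  class Error < StandardError; end

  def initialize(endpoint:, api_key:, open_timeout: 5, read_timeout: 30)
    @uri = URI(endpoint)
    @api_key = api_key
    @open_timeout = open_timeout # seconds allowed to establish the connection
    @read_timeout = read_timeout # seconds allowed to wait for the parsed result
  end

  # Builds the POST request. The CV is sent from memory as strict Base64
  # (pack("m0") emits no line breaks); no temporary file is written.
  def build_request(file_bytes, filename:)
    request = Net::HTTP::Post.new(@uri)
    request["Authorization"] = "Bearer #{@api_key}"
    request["Content-Type"] = "application/json"
    request.body = JSON.generate(
      "filename" => filename,
      "document" => [file_bytes].pack("m0")
    )
    request
  end

  # One request in, one parsed result out: no job IDs and no polling.
  def parse(file_bytes, filename:)
    request = build_request(file_bytes, filename: filename)
    response = Net::HTTP.start(@uri.host, @uri.port,
                               use_ssl: @uri.scheme == "https",
                               open_timeout: @open_timeout,
                               read_timeout: @read_timeout) do |http|
      http.request(request)
    end
    raise Error, "parser returned HTTP #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    JSON.parse(response.body)
  rescue Net::OpenTimeout, Net::ReadTimeout => e
    raise Error, "parser timed out (#{e.class})"
  end
end
```

Because a service like this answers in one round trip, the application never has to store a job ID or poll for completion; an API that needed several requests per CV would add both, plus extra loading states in the design.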

Data security:

We needed a service that would protect our candidates’ data during the CV parsing process. To ensure a high level of security, we focused on how the external CV parsers would process data.

Some services could do all CV parsing operations in-memory, while others needed to store data. If the external service did store data, we looked at the purpose they had for doing so, and whether or not any data would be shared.

This was vital because OfferZen is ultimately responsible for keeping our candidates’ data safe. Given OfferZen’s expansion into the EU, we also needed to ensure that the data processing would comply with all major international privacy standards, including the EU’s General Data Protection Regulation (GDPR).

Document type support:

We wanted to find out which CV formats each parser could process. A solution that supported multiple document types would give candidates the flexibility to use whichever CV format they had on hand, without the extra effort of converting the document to a supported type. We also wanted to know whether the parsers could handle CVs exported from other platforms, such as LinkedIn, since these are an easy way for candidates to get an up-to-date CV.
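One way to accept several formats without trusting file extensions is to check a file's leading "magic" bytes before sending it to the parser. The sketch below assumes the accepted formats are PDF, DOCX and legacy DOC; that list is illustrative and not necessarily the set our chosen parser supports.

```ruby
# Hypothetical sketch: detect a CV's container format from its first bytes
# instead of trusting the uploaded file's extension.
module CvFormat
  SIGNATURES = {
    pdf: "%PDF-".b,
    # DOCX is a ZIP archive, so this matches any ZIP file; a stricter check
    # would look inside the archive for word/document.xml.
    docx: "PK\x03\x04".b,
    # Legacy Office files (.doc, but also .xls and .ppt) share the OLE2
    # Compound File Binary container.
    doc: "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1".b
  }.freeze

  # Returns :pdf, :docx, :doc, or nil when the format is not recognised.
  def self.detect(bytes)
    head = bytes.to_s.b.byteslice(0, 8)
    SIGNATURES.each { |format, magic| return format if head.start_with?(magic) }
    nil
  end

  def self.supported?(bytes)
    !detect(bytes).nil?
  end
end
```

Rejecting an unrecognised file before calling the external service gives the candidate immediate feedback and avoids spending a parsing request on a document that cannot succeed.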

Results and quality:

The CV parser needed to extract information relevant to OfferZen’s candidate profiles, such as work history and education. Accuracy was crucial, as we did not want candidates to have to put in extra work to fix incorrect data. We requested access to demo accounts to test the results with real CVs provided by OfferZen team members. We also needed to compare results across different document formats to ensure the quality was consistently high.
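A lightweight way to compare formats is to parse the same CV saved as, say, PDF and DOCX, and measure how many hand-checked work-history entries each result recovers. The sketch below is hypothetical: the `Position` struct, the normalisation rules and the recall metric are assumptions for illustration, not how we actually scored the parsers.

```ruby
require "set"

# Hypothetical record for one work-history entry.
Position = Struct.new(:company, :title, keyword_init: true)

# Lower-case and collapse punctuation/whitespace so "Acme, Inc." and
# "acme inc" compare as equal. ASCII-only: a deliberate simplification.
def normalise(text)
  text.to_s.downcase.gsub(/[^a-z0-9]+/, " ").strip
end

def position_key(position)
  [normalise(position.company), normalise(position.title)]
end

# Fraction of the hand-checked reference entries that the parser found.
def recall(expected, parsed)
  return 1.0 if expected.empty?

  found = parsed.map { |position| position_key(position) }.to_set
  expected.count { |position| found.include?(position_key(position)) }.fdiv(expected.size)
end

# Scores each format's output against the same reference entries.
def recall_by_format(expected, results_by_format)
  results_by_format.transform_values { |parsed| recall(expected, parsed) }
end
```

Running the same reference set through every format makes a drop in quality for one format stand out as a lower score rather than a vague impression.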

As we researched and tested different services, we were able to find even more comparison points:

  • Language detection and flexibility,
  • Automated candidate summaries included in the parsing results,
  • Standard skills extraction and
  • API request configurability

We documented our research on a Miro board so that the designers and developers within the team could give each other feedback asynchronously. This early sharing and feedback sparked conversations that shaped both the design and technical discovery processes and helped us quickly choose the best CV parser for our needs.

After substantial research and testing, we chose a CV parser that returned an instant response to a single request and performed all parsing operations in-memory, so candidate data was never written to a file system or database. Our chosen parser also used AWS as its cloud provider and offered an uptime guarantee that met our requirements. Importantly, it also extracted high-quality information relevant to candidate profiles.

Implementing and releasing the CV parser

Releasing in production

Once the discovery process was completed, we spent time mapping out possible options for the architectural design for the CV parsing feature on our Miro board. It was important to be thorough in identifying all elements of the implementation, as this would set us up to build and release the feature into production in small incremental pieces.

These elements included:

  • The database schema,
  • The ideal flow of requests between our frontend and backend applications and the external API call to the CV parsing service,
  • How we would process all the function calls to avoid any timeouts, and
  • How and where we extract and format the response from the CV parsing service to use in the candidate journey
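The last element, extracting and formatting the response, usually comes down to mapping the parser's JSON onto the profile's own structures and discarding empty or duplicate entries. This is a minimal sketch; the response keys (`positions`, `education`, `skills` and their fields) and the struct names are hypothetical stand-ins, not our parser's actual schema or our database model.

```ruby
require "json"

# Hypothetical profile records; field names are illustrative.
WorkItem = Struct.new(:company, :title, :start_date, :end_date, keyword_init: true)
EducationItem = Struct.new(:institution, :qualification, keyword_init: true)
ProfileDraft = Struct.new(:work_history, :education, :skills, keyword_init: true)

# Maps a parser response onto a draft the candidate can review and edit.
# The response keys used here are placeholders, not a real vendor schema.
def build_profile_draft(response_json)
  data = JSON.parse(response_json)

  work_history = Array(data["positions"]).map { |item|
    WorkItem.new(
      company: item["company"].to_s.strip,
      title: item["title"].to_s.strip,
      start_date: item["start_date"],
      end_date: item["end_date"] # may be nil, e.g. for a current role
    )
  }.reject { |work| work.company.empty? && work.title.empty? }

  education = Array(data["education"]).map { |item|
    EducationItem.new(
      institution: item["institution"].to_s.strip,
      qualification: item["degree"].to_s.strip
    )
  }.reject { |edu| edu.institution.empty? }

  # Drop blanks and case-insensitive duplicates such as "Ruby" and "ruby".
  skills = Array(data["skills"]).map { |skill| skill.to_s.strip }
                                .reject(&:empty?)
                                .uniq(&:downcase)

  ProfileDraft.new(work_history: work_history, education: education, skills: skills)
end
```

Keeping this mapping in one place means a change to the vendor's response format only touches a single function, while the rest of the candidate journey works with the application's own types.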

Having a clear, documented plan allowed us to identify dependencies and prioritise tasks more efficiently. Following a continuous deployment approach, shipping each small piece to production as soon as it was ready, reduced our work in progress and made code reviews easier and more thorough. It also lowered the risk of each release and let us deliver value sooner.

Benefits of testing in production

Because we shipped our changes to production continuously, the URL for the feature was live, but only candidates with the feature flag enabled could access it.
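A feature flag can be as simple as a per-candidate allowlist checked before the page is served. The sketch below is a hypothetical in-memory version that only shows the idea; production apps typically store flags in a database or use a gem such as Flipper, and the flag name `:cv_parser` and the handler are illustrative.

```ruby
require "set"

# Hypothetical in-memory feature flags: each flag is enabled per candidate ID.
class FeatureFlags
  def initialize
    @enabled = Hash.new { |hash, flag| hash[flag] = Set.new }
  end

  def enable(flag, candidate_id)
    @enabled[flag] << candidate_id
  end

  def disable(flag, candidate_id)
    @enabled[flag].delete(candidate_id)
  end

  # fetch skips the default block, so checking a flag never creates one.
  def enabled?(flag, candidate_id)
    @enabled.fetch(flag, Set.new).include?(candidate_id)
  end
end

# Hypothetical handler for the CV upload page: the route is deployed, but
# candidates without the flag get a 404 as if it did not exist.
def cv_upload_page(flags, candidate_id)
  return [404, "Not found"] unless flags.enabled?(:cv_parser, candidate_id)

  [200, "Upload your CV to pre-fill your profile"]
end
```

Enabling the flag for the team and for user-testing participants first, then for everyone, turns the final release into a configuration change rather than a deployment.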

This allowed both the designers and developers within the team to test in a production environment in parallel with implementing the CV parser. The designers could run user testing in a production-like environment to get early feedback from candidates and make incremental improvements. This was incredibly valuable in helping us build the best possible feature to improve the candidate journey.

From a technical perspective, testing in a production environment lets you see ‘real’ results that you wouldn't otherwise see in a testing environment, because the data in a test environment doesn't always reflect real production data.

Testing the external CV parsing service in a production environment gave us more confidence in how the feature would perform and be experienced by a candidate.

We were able to discover and fix implementation and performance bugs early on. Identifying and sharing common errors from the service meant the designers could adapt designs and resolve user-facing errors.

Our user testing was also performed in a production-like environment, since we had access to the live URL. This meant we could test the feature in a way that was ‘real’ to users too, which made the feedback and results more reliable.

Benefits of feedback loops and close collaboration

A key part of the success of the CV parsing solution was the collaboration between the designers and developers. We wanted to keep refining the solution as we moved through the discovery, implementation and release processes, and this was only possible because of the short, continuous feedback loops we used throughout.

Using Miro as a collaboration tool helped us ensure that all research was visible and shared openly from the start. This meant that we could easily and asynchronously ask questions about the designs and implementation early on. The early feedback helped us adapt and shape the discovery process as we went. Having a central place to share research also allowed other key members of the Product Team and the wider organisation to track progress and give additional feedback.

How CV parsing added value to developers seeking jobs

Initially, the primary goal for the CV parsing solution was only to parse the sections that took the most time for candidates to fill in, namely education and work history. As we discovered what our chosen CV parser could do, we adapted the designs to make better use of the additional functionality and improve candidate experience even further. The CV parser was also able to accurately derive candidate skills based on work history, positions and listed tech stacks. This meant that we could use these extracted skills to better fill in profiles.

After the release of the CV parser, our metrics showed increased adoption and quicker profile-building times. Anecdotal feedback from internal users on our team also indicated that candidates were arriving with fuller, better-quality profiles, which allowed more time to be spent on other parts of their profiles.

We know that we still have a way to go in building the best candidate journey on the OfferZen platform, and this feature was only the start. The close collaboration and tight feedback loops within the Product Team meant that we shaped, and are still shaping, the solution together. This is what ultimately resulted in a better final product and happier users.


Rebecca Crompton is a Software Developer at OfferZen. She primarily works on backend development in the Marketplace squad.

Thanks to Rebecca's teammates Elena Aiello and Lydia Dodge who contributed to this article.

