Better Scientific Software with Helen Kershaw
Perhaps the rapper Childish Gambino summed it up best: “Because the internet, mistakes are forever.” In an age where everything online is seemingly stored permanently for everyone to view, it can be daunting for budding software developers to post code into an open-source repository for review.
Helen Kershaw, an NSF NCAR senior software engineer who develops the Data Assimilation Research Testbed (DART), noticed fairly early in her career that software developers are often apprehensive about contributing code to an open-source project. “I think there are two connected difficulties that people have with the code review process. First is showing their code. People will explain a scientific idea with a sketch on a whiteboard or a napkin, no problem. But when it comes to code, there is a real tendency to keep it hidden,” said Kershaw.
Code is often shared on open-source repository websites such as GitHub.com. GitHub’s workflow includes issues that describe why new code is necessary, pull requests that propose merging the new code into the existing code, and reviews whose comments and suggestions should be addressed before the new code is merged into the repository.
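The flow described above (an issue, then a pull request, then review comments that are addressed before a merge) can be sketched as a toy Python model. This is only an illustration: the class and field names are hypothetical, it does not call GitHub's API, and on GitHub itself whether open review comments actually block a merge depends on how a repository is configured.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewComment:
    """A reviewer's comment or suggestion on a pull request (hypothetical model)."""
    author: str
    body: str
    resolved: bool = False


@dataclass
class PullRequest:
    """A request to merge new code, linked to the issue that motivated it."""
    title: str
    issue_number: int
    comments: list[ReviewComment] = field(default_factory=list)
    merged: bool = False

    def open_comments(self) -> list[ReviewComment]:
        # Comments the contributor has not yet responded to.
        return [c for c in self.comments if not c.resolved]

    def merge(self) -> None:
        # Assumption for this sketch: every comment must be resolved first.
        pending = self.open_comments()
        if pending:
            raise RuntimeError(
                f"{len(pending)} review comment(s) still need a response"
            )
        self.merged = True


# Usage: a reviewer leaves one comment, which blocks the merge until resolved.
pr = PullRequest(title="Fix unit conversion", issue_number=12)
pr.comments.append(ReviewComment("reviewer", "Could this be a helper function?"))
try:
    pr.merge()
except RuntimeError as err:
    print(err)

pr.comments[0].resolved = True
pr.merge()
print(pr.merged)
```

In this model the pull request is the start of a conversation rather than a finished product: the merge only succeeds after contributor and reviewer have worked through the comments together.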
Kershaw continued, “You might have heard people say, ‘Oh, I need to polish this before I show it to you.’ There is some psychological effect behind this, and it would be great to change this and get people showing even pseudocode to each other. Second, people will often put a lot of time and effort in before sharing their work, and submitting a pull request is when they are ‘done.’ This leads to a situation where the pull request is the end of work for the contributor, but the beginning for the reviewer, so you can instantly end up in a situation that feels like a conflict.”
Kershaw set out to address these issues as one of six recipients of the 2023 Better Scientific Software (BSSw) Fellowship. The BSSw Fellowship Program aims to “foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes.” It funds activities that promote better scientific software by creating an enduring resource available to the community.
Kershaw’s BSSw project created a code review tutorial, viewable at code-review.org, that can be completed by individuals or teams of two. The tutorial aims to make its users more comfortable with code review by having them practice with Git and GitHub. Users are asked to think critically about whether their solutions suit the problem at hand and are tasked with making sure their pull requests are comprehensible to others reviewing the code.
Before her current role as senior software engineer for DART, Kershaw worked at Brown University’s Center for Computation and Visualization (CCV), where she provided support to university researchers, including both faculty and students. “Working at CCV was a great experience because in a support role people often are coming to you with a problem: they may be quite (very) frustrated already, but they are open to help,” said Kershaw. “So you have to build rapport, team up to work out what problem you are trying to solve, how you’re going to solve it, and what a good solution looks like. With budding software engineers, you want to line up what they are getting out of the project – a skill, an experience – with what you need as a maintainer/supervisor/mentor.”
Kershaw continues her mentorship work at NSF NCAR with the Summer Internships in Parallel Computational Science (SIParCS) program. “Working with students requires a balance between getting quality work from them, and the students finding benefit in this work and progressing their career. And part of making an internship a positive experience is to communicate about what the student wants to get out of the project. What do they want to learn, get experience in, take away?” said Kershaw.
The code-review.org tutorial is used to onboard new staff, including interns in the SIParCS program. Starting with a basic level of comfort with the code review process can help interns to grow into their roles and achieve great results over the course of a summer. “Ideally you can build up small successes that can be extended to a larger project. If you have an interesting project that you’re driving the direction of, it is an engaging way to learn,” Kershaw continued.
Although the BSSw Fellowship Program is sponsored by the U.S. Department of Energy and the U.S. National Science Foundation, the tutorial’s focus on open-source code review using GitHub broadens the potential audience. Said Kershaw, “There is so much code available to browse, read, and try out. And the review process being public and asynchronous: you can take a look at pull requests: How does this work? Why did they do this? What could they have done better? You can be anywhere in the world and get involved.”