Why every researcher should use github

Gurudev Ilangovan

July 26, 2018


Why I am writing this.

Having been in the masters program in industrial engineering and computer science while doing research, I have witnessed a great deal of disconnect between the tools computer scientists use and how researchers in others fields operate. And of all the tools, one of the most powerful and useful tools is git/github. I never used it before getting into the computer science program and now I use it all the time and I find it so incredibly useful. So, I just wanted to write this post for every non-computer science researcher out there and how they can use github to better organize their research.

Why git? And what is github?

This short video explains it best.

Myths

Github is just Google Drive for code

While it is true that you can make use of the full extent of github’s capabilities if you write code, github can still be handy even if you are never going to write a line of code. In essence, it is a version control tool and Google drive is not (though it has that capability). And any project can benefit from version control. If you do happen to write code however, you absolutely should use github.

You need to be able to code to use these things

Nope. You don’t have to write a single line of code if you don’t want to as github has its own GUI now. Using the command line however, is in fact, way easier and is one of the many pluses of github. I use primarily 3 github commands 95% of the time and I probably know all of 10 out of the innumerable ones.

Advantages

Project snapshots

You can create a snapshot of your project at every milestone. While you can still do that manually (making copies of your folder), the process quickly goes out of hand. Have you ever started a project and ended up with 3 or more copies and utterly confused? Google drive might also give you the ability to create snapshots of files and tag them but for a project as a whole, there is no such ability. And keeping snapshots at every major change gives you the freedom to make new changes confidently for you know you can rollback if you mess something up.

Pay it forward

Github is public by default. Unless the work you do has data that is confidential, this is a huge advantage. Whatever problem you solve, you can help others by offering your solutions to people with similar problems. This is a great thing to do.

Project management

While it is system for version control, github also comes with tools that make project management much more tractable. See this short video

A project page or wiki.

A wiki is basically a geeky way of referring to a project page where you can write documentation for yourself or people who might find it useful. And if someone asks you what project you’re working on, you have a link. Click here to know more

Collaboration

This is a biggie. Most of the projects are collaborative efforts and this is where version control really shines (particularly where there is a lot of text or code involved). Basically, you don’t need to have copies of the same text file you are editing or backups before you manually decide to copy and paste and merge content. Git does all of that for you (and if it can’t, helps you do it). If two people are working on a project, git is useful. If five people are, it is indispensable.

A portfolio of your research

Your github profile acts as a portfolio and evidence for all the work that you do. Saying you are proficient in matlab (you probably should move to python or R) is one thing. But showing 1000 lines of code on github is completely different. It gives people a feel for the kind of work you’re into.

Branching, merging and pull requests

Very useful but a little advanced. Check Git out if you are curious.

Command line? No problem!

Finally, if you ever have to run your analysis in the command line on a bigger system, chances are that Google drive will be inaccessible but github will be accessible through the git command line.

Some guidelines

There are innumerable tutorials online. This post is just for saying you absolutely should use github for your projects. But when you start, it’s sufficient if you just know these commands:

  • git init
  • git add (particularly with -A)
  • git commit -a -m “why you’re tagging this version”
  • git push origin master
  • git pull
  • git clone

So when you learn, try to keep an eye out for them. If you don’t understand the other commands right away, don’t worry. I still don’t know many commands but I still use git all the time.

Cheers!

comments powered by Disqus