
Python for biologists: the code of bioinformatics

The increasing need to process big data and develop algorithms in all fields of science means that programming is becoming an essential skill for scientists, with Python the language of choice for the majority of bioinformaticians.

03 September 2019


By Georgie Lorenzen, Science Communications Trainee.

Bioinformatics is still a relatively new field, meaning that biology graduates aren’t necessarily trained in using the programming languages that help us perform data-intensive research: developing and using the algorithms that allow us to decode complex living systems.

Bioinformaticians, especially PhD students and early-career postdoctoral scientists, often learn on the job, which is one of the reasons we offer such a range of scientific training courses here at EI, suited to a variety of needs.

One such need is training in Python, an open-source, high-level programming language that, despite dating back to 1991, has surged in popularity in recent years, becoming the programming language of choice for the majority of bioinformaticians.

 

Training @ EI

With the demand for bioinformatics skills growing as more research projects become data-driven, higher-education curricula are struggling to keep pace.

We offer a diverse training programme, in a state-of-the-art training facility, aimed at life scientists engaged in research projects that use -omics techniques.

Courses range from short workshops on specific software or key programming skills to week-long, hands-on courses that cover complete research workflows. One example is our Metagenomics course, which takes delegates from sample quality control and choosing which sequencing platform(s) to use through to genome annotation and the production of publication-ready figures.

From 22 to 26 July, EI hosted a five-day course on ‘Advanced Python for Biologists’, taught by freelance trainer Martin Jones.

Martin, a trained biologist, has been coding since his PhD. He worked in various academic roles at the University of Edinburgh, culminating in two years of lecturing in bioinformatics, before starting up his business Python for Biologists.

I went to speak to him and some of the delegates to get some tips and find out how they would be using Python in their research.

 

So: why choose Python?

“It’s easy to pick up as your first programming language because the syntax is so easy to read,” Martin told me, adding that its popularity among biologists is due to the fact that “the community is great - there are lots of resources out there for scientists, such as SciPy, meaning it can be used to solve a range of problems.”

“It’s just a tool, therefore the applications are extensive.”

This was evident when I asked Martin what his former students were doing now: RNA-seq, high-throughput sequencing, text mining abstracts from papers, social media mining and natural language processing, to name a few!

Matt Bawn, who works at both QIB and EI researching evolution in pathogens, such as Salmonella, told me: “I’ve actually done this the wrong way round. I don’t use Python at the moment, but one of my colleagues at EI recommended I attend this training.”

Matt currently uses Perl in his work, but wants to switch to Python as it could make him more efficient.

“Python is a higher-level coding language than Perl”, he explained. “If we could only communicate in three letter words, we would need to use more to get our point across than if we were able to use longer words.

“Everyone can produce the same volume of code per day. But if you write in a higher-level code, you can get the point across more quickly, meaning we can convey a greater amount of information in the same amount of time. That’s the way Python works.”
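To make Matt’s point concrete, here is a short, hypothetical illustration of our own (not taken from the course material). It reverse-complements a DNA sequence using Python’s built-in string methods, so two “long words” (`translate` and the `[::-1]` slice) do the work that would otherwise take an explicit loop. It assumes an uppercase sequence containing only A, C, G and T.

```python
# Hypothetical example: reverse-complementing DNA with built-in string tools.
# Assumes an uppercase sequence containing only A, C, G and T.

# Translation table mapping each base to its complementary base
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    # translate() swaps each base for its partner; [::-1] reverses the result
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCGT"))  # → ACGCAT
```

Real-world code would also need to decide what to do with lowercase letters or ambiguous bases such as N; libraries like Biopython handle those cases for you.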

 


How quickly can I learn Python?

“I’d say a week!” Martin said enthusiastically. “I offer a week-long introductory course and most people will have got to grips with it by the end of this. Obviously there is some variation - it’s harder if you’ve never learnt how to program before, because you also have to learn how to think in code.”

Martin explained to me that learning a programming language is just like learning a conversational language: the second one is always easier.

“Learning to code is not easy - but Python is a good place to start, because it’s in English,” Tomasz Wrzesinski, one of the delegates, told me. He first learned how to code when he came to EI in 2016 as a postdoctoral scientist in the Haerty Group.

“The course is good so far! It’s useful. Some things I already knew how to do, but they were buried in the back of my mind so this has been a good refresher. It has taught me how to build more complex programs, which I currently use workarounds for.”

 

What’s the hardest thing about learning to code?

“The hardest thing about learning how to code is learning how to think computationally”, Matt Bawn later told me as the workshop progressed. Indeed, Graham Etherington of the Di Palma Group pointed out the same thing, that the hardest part is “getting used to writing in pure logic.”

Matt further explained that “you have to extract the bits you need to program from your problem and then visualise all the steps it takes to get there.”

“That’s where the creativity comes in, though. There will be many different ways to code, it’s open to interpretation.”
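As a hypothetical illustration of what “extracting the bits you need” can look like (our own example, not one from the course), take the task “report the GC content of every sequence in a FASTA file” and split it into three small steps: parse the records, calculate a value for each one, and print the results. The sketch assumes simple FASTA text, where header lines start with `>` and are followed by one or more sequence lines.

```python
from __future__ import annotations

# Hypothetical example: one problem broken into three small, testable steps.
# Assumes simple FASTA text: ">" header lines followed by sequence lines.


def parse_fasta(text: str) -> dict[str, str]:
    """Step 1: split FASTA text into {header: sequence} records."""
    records: dict[str, str] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequences may span several lines, so join them together
            records[header] += line.upper()
    return records


def gc_content(seq: str) -> float:
    """Step 2: fraction of bases that are G or C (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def report(text: str) -> None:
    """Step 3: print one tab-separated line per sequence."""
    for header, seq in parse_fasta(text).items():
        print(f"{header}\t{gc_content(seq):.2f}")


example = """>seq1
ATGC
GGCC
>seq2
ATATAT
"""
report(example)  # prints "seq1  0.75" and "seq2  0.00" (tab-separated)
```

There are, as Matt says, many other ways to write this; keeping each step in its own function simply makes each piece easier to check on its own.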

Matt pointed out that there is good code and bad code, and that this can lead to variation in the results you see. Ben White of the Anthony Hall Group said that often the hardest thing about learning to code “is other people’s code.”

Ben Ward of the Clavijo Group told me to “accept the bugs will happen to you, and nothing but care and time will cure them,” while Paul Fretter, Head of CiS, agreed, when he told me the hardest thing is “knowing when to blame the OS, the function, or the library you’re using… and when to admit the problem is in your own code.”

Nicola Soranzo of the Davey Group said that the hardest thing is “debugging, i.e. how to find what’s wrong in your program!”
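Here is a hypothetical example of the kind of bug that is easy to write and hard to spot (our own illustration): an off-by-one error when listing the overlapping k-mers (substrings of length k) in a sequence. Checking the output against a small case worked out by hand, using `assert`, exposes it immediately.

```python
from __future__ import annotations

# Hypothetical debugging example: an off-by-one error when listing k-mers.


def kmers_buggy(seq: str, k: int) -> list[str]:
    # Bug: range(len(seq) - k) stops one window early, so the last k-mer is lost
    return [seq[i:i + k] for i in range(len(seq) - k)]


def kmers(seq: str, k: int) -> list[str]:
    # Fix: a sequence of length n contains n - k + 1 overlapping k-mers
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# A small case checked by hand: "ATGCA" has three 3-mers
assert kmers("ATGCA", 3) == ["ATG", "TGC", "GCA"]
print(kmers_buggy("ATGCA", 3))  # ['ATG', 'TGC'] (missing 'GCA')
```

Writing small checks like this as you go is one way to find out whether the problem is in the library you are using or in your own code.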

Another aspect is what you’re working with. Year in Industry student Will Glynn said that “computers don’t ‘think’ the same way humans do,” while PhD student Calum Raine agreed, telling us that “the hardest thing is learning how to intelligently think with complete unintelligence. The computer is very fast but entirely stupid and needs to be meticulously spoonfed.”

Ryan Joynson, another postdoc in the Anthony Hall Group, rounded things off with some sound advice: “no matter what you’ve learnt, there’s probably a faster way to do what you’ve done.”
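A small, hypothetical example of Ryan’s point (our own illustration): counting how often each base appears in a sequence. A first attempt might build a dictionary by hand; Python’s standard library already provides `collections.Counter`, which does the same job in one line.

```python
from collections import Counter

seq = "GATTACAGATTACA"

# First attempt: count each base with an explicit loop and a dictionary
counts_loop = {}
for base in seq:
    counts_loop[base] = counts_loop.get(base, 0) + 1

# Shorter alternative: Counter does the same counting in a single call
counts_counter = Counter(seq)

# Both approaches give the same counts
assert counts_loop == dict(counts_counter)
print(counts_counter.most_common(2))  # → [('A', 6), ('T', 4)]
```

Both versions are correct; the second is simply shorter and leans on code that many other people have already tested.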

 


And finally, Martin’s top tip for anyone trying to learn how to code? PRACTICE!

“Coming on a week course is great as you’ll be immersed and pick it up quickly. But when you go back to the lab, it’s important to put what you have learnt into practice.” Martin tells me that this can be the downfall of many of his delegates, especially biologists who spend the majority of their time in wet labs as opposed to an office.

“If you don’t have any particular problems to solve I recommend making them up. This means you can keep what you have learnt fresh in your mind. I would also recommend chatting to other programmers regularly and discussing your work. You may hear about another method of doing something and by explaining what you are doing to someone else it will be cemented in your mind.”

This article was put together and written by Science Communications Trainee Georgie Lorenzen.

 

Sign up now for these Python training events.

Learn to code

Fancy having a go at coding yourself?

 

Learn Python online via these resources:

 

Download Python

Download Atom (a code editor)

 

Check out this beginner's guide to Python

Learn Python in Y minutes

Try a Codecademy course

You could do Google’s Python Class

Or something more specific on Coursera

There's also a cheat sheet here on Comparitech.

 

Practice! Code Abbey has loads of problems for you to try solving

 

Stuck? Check out forums such as Stack Exchange, the official Python forum or Code Review for answers to your coding queries!

We chatted to Chris, who travelled all the way from Nigeria to do the course...

“I am a web developer at ACEPRD, a research institute in Jos, Nigeria. I develop web applications that are used to run biological analysis.

 

The training course was very interesting and unique. The most important thing I learned was troubleshooting.

I am interested in Python because it’s easier for me to understand and use in developing applications.

Earlham Institute is a nice place with a lot of research work going on there. I wish to visit the place again for another program. I will love to do my PhD studies in the institute if possible.”