Uncategorized

Coding for Data Science Tips 2 – Standardize csv reading between Windows and Mac

When you start learning data science, it is extremely useful to ask other data scientists and data enthusiasts for feedback on the quality of your code and on the process you use to analyse data. To make this easier, we often send notebooks and projects back and forth. But way too often the code looks like this.

Don't do this. It points to a file path that only exists on your computer and makes the life of anyone who wants to help you a lot more… boring.

Instead, try something like this: keep your data in a folder inside your project folder and give it a straightforward name like… data. Functions that load a .csv or .xlsx file resolve relative paths from the working directory, which in a notebook is usually the folder where the notebook is saved, so the path only needs to start at the data folder. The code will look like this:

Much simpler, isn't it? With this, all you need to do to get feedback from another data enthusiast is send them the whole folder that contains the notebook, and the code should work as is.
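A minimal sketch of reading a file from the data folder with a relative path. The file name `my_file.csv` is a hypothetical placeholder, and the sketch uses the standard-library `csv` module so it runs without extra packages; in a notebook you would more likely pass the same path to `pandas.read_csv`, which also accepts `pathlib.Path` objects.

```python
import csv
from pathlib import Path

# Relative path: resolved from the current working directory, which in a
# Jupyter notebook is normally the folder that contains the notebook.
DATA_DIR = Path("data")


def read_data_file(filename):
    """Read a CSV from the project's data/ folder and return its rows as lists."""
    path = DATA_DIR / filename
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.reader(f))


# Hypothetical usage (my_file.csv is a placeholder name):
# rows = read_data_file("my_file.csv")
# With pandas: df = pd.read_csv(DATA_DIR / "my_file.csv")
```

Because nothing in the path mentions your user name or drive letter, anyone who receives the whole project folder can run the notebook unchanged.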

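You can go one step further so the path works in every environment (Mac, Linux and Windows). If you type a Windows-style path with backslashes, add an r before it to make it a raw string, so that Python does not read sequences such as \n as escape characters. A raw string only fixes the escaping, though: backslash-separated paths still fail on Mac and Linux, so for a path that works everywhere, write it with forward slashes or build it with Python's pathlib module.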
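The `r` prefix creates a raw string, so Python keeps backslashes as literal characters instead of reading sequences such as `\n` as escapes; that matters when you paste a Windows-style path. Building the path with `pathlib` sidesteps separators altogether. A small sketch comparing the three options, with `new_file.csv` as a hypothetical file name:

```python
from pathlib import Path

# Without the r prefix, "\n" in this Windows-style path is read as a newline.
escaped = "data\new_file.csv"
# With the r prefix, the backslash is kept as a literal character.
raw = r"data\new_file.csv"
# pathlib joins the parts with the separator of the operating system it runs on.
portable = Path("data") / "new_file.csv"

print(len(escaped), len(raw))  # the raw string is one character longer
print(portable.parts)
```

Note that a raw string with backslashes still only works as a path on Windows; on Mac and Linux a backslash is an ordinary character in a file name, so the `Path` version (or a plain `"data/new_file.csv"`) is the one that travels between systems.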

As usual, the code for all tips is stored at https://github.com/insilicobiologyblog/DScodingTips so you can check it.

Do you have any tips & tricks for coding in Data Science, for data scientists of all levels? Share them with me! =)


Coding for Data Science Tips 1 – Discovering the encoding of a .csv file

“Grrr, why can’t I upload this csv?”

Does this little rant sound familiar? Sometimes CSVs are a struggle to load, mainly because of their encoding: the scheme that maps the bytes in the file to characters. Different files use different encodings, so how can we discover the encoding of our .csv before we start researching how to load it into the notebook?

Simple, the code is pretty straightforward 🙂
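The linked notebook holds the author's own code, which this sketch does not reproduce. As a hedged, standard-library-only alternative, one way to narrow down the encoding is to check for a UTF-8 byte-order mark and then try decoding the raw bytes with a list of candidate encodings. The candidate list and the function name `guess_encoding` are assumptions; a successful decode only shows that the bytes are valid in that encoding, not that it is the one the file was written in. Detection libraries such as `chardet` give statistical guesses instead.

```python
import codecs
from pathlib import Path

# Encodings to try, in order (an assumption: adjust to the files you receive).
# "latin-1" maps every byte to a character, so it never fails and works only
# as a last-resort fallback.
CANDIDATES = ("utf-8", "cp1252", "latin-1")


def guess_encoding(path, candidates=CANDIDATES):
    """Return the first candidate encoding that decodes the whole file, or None."""
    raw = Path(path).read_bytes()
    # A UTF-8 byte-order mark (written, e.g., by Excel's "CSV UTF-8" format)
    # means "utf-8-sig", which strips the mark while decoding.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


# Hypothetical usage with pandas (my_file.csv is a placeholder name):
# enc = guess_encoding("data/my_file.csv")
# df = pd.read_csv("data/my_file.csv", encoding=enc)
```

Trying strict encodings such as UTF-8 first matters: a lenient one like latin-1 would "succeed" on almost anything and hide the real answer.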

Let's hope that the next time you have trouble uploading a CSV, the solution comes more easily 🙂

Check the code on https://github.com/insilicobiologyblog/DScodingTips/blob/main/CodingTips1/DSCodeTips1-DiscoveringEncodingCSVfile.ipynb

Do you have any tips & tricks for coding in Data Science, for data scientists of all levels? Share them with me! =)


Hire a Data Scientist, not a Data Technician

Every time I write a new blog post here or a new post on LinkedIn, I get the same question: "What programming languages do I need to learn to become a data scientist?"

The short answer: "There is no answer to that question. Focus on becoming a Scientist rather than a technician." The market knows that the demand for data scientists has increased, but it doesn't know how to hire them or even… find them.

With the change in thinking that data science has brought to the world, the recruitment paradigm has to change as well. Being a Data Scientist means having a scientific mindset, not a technical one.

So, what can change to ease the life of data scientists and companies? Here are a few ideas.

Change or Ban Technical CVs – Data Science exists wherever data exists; it really doesn't matter whether a data scientist has programmed in R, Python, Java or even Excel, or for how many years. We are not developers, and a data scientist should not be expected to have a developer's knowledge of a specific language. Why not switch to a storytelling format that reflects the candidate's experience with different projects, including academic ones and volunteer work? Then you can assess how the candidate deals with change and how he/she adapts to and contributes to new projects. A good data scientist should be happy to talk about projects and the pros and cons of each one, be careful not to reveal sensitive information, and be able to explain the lessons each project taught him/her.

Focus interviews on Storytelling and Challenge Solving – Data Science is not a checklist kind of job, so it should not be treated as one. Instead of asking a data scientist to list the tools and technologies he/she has worked with, why not ask them to explain how a project was developed and how the solution(s) was (were) found? That will help you assess the candidate's influence within a group and his/her ability to communicate results, two crucial traits in a data scientist.

Discuss current themes in Data Science and ask for their opinion – Is data privacy in the spotlight these days? Are the problems with driverless cars in the news? Do neural networks spark your curiosity? Ask candidates for their opinion on the subject and what they know about it. This helps you assess their curiosity and passion for the field, and how up to date they are in an area that can change from one day to the next.

Hire for Diversity instead of just claiming it – Data Science is the best opportunity to build a diverse team. Engineers, life scientists, physicists, chemists and more: people from all walks of life can bring a unique perspective to a project. Instead of demanding an engineering background, why not ask for an analytical one? As long as the candidate has some knowledge of and experience with statistics and linear algebra, does it really matter whether that person is an engineer or a chemist? In fact, diverse backgrounds bring different methodologies and techniques into a group, which fosters creativity.

What do you think? How would you evaluate a potential scientific mindset?

A bit of Banksy fun to remind you that office work is not mandatory for data scientists =)


In Silico Biology – What is this? Who am I?

Data is at the center of today’s world. In every transaction, in every event, in every single tiny thing happening in our universe.

This is what motivates a data scientist. Understanding this data.

If we then turn to the data found in nature, data that can help us improve healthcare, science and ecology and even protect the environment, we find computational biologists working with it.

We're talking about terabytes upon terabytes of data. But fear not, it's far simpler and more fun than you might think. And it's my responsibility to help you understand it.

This is my little corner of the web to show what a computational biologist and science communicator does. Let’s go 🙂