What Do Data Scientists Actually Do?
People who have moved into the field tell us in this interview.
Data Science is a popular area of IT that everyone is talking about right now, but not everyone understands what data scientists do in practice. In short, they process huge amounts of data (so much that it does not fit into an Excel spreadsheet) and use it to build algorithms that solve a wide range of problems: from weather forecasting and recommendation systems for music services to smart chatbots and genetic research.
Large companies have a huge demand for qualified data scientists. Interesting work, little routine and high salaries make people consider a career change, and not only those with a technical education but also those with a humanities background. However, neither group knows how to approach the profession: where to study, how to get a job and what the work will actually involve.
We talked with three graduates of a Data Science course and found out why they decided to change their lives, whether their expectations of the new profession matched reality and what difficulties they faced during work and study.
Why I chose Data Science
I didn't really like any of the professions I knew about as a child, but I was always drawn to computers. In the 6th grade I became interested in programming and began to study C++ and Python. By the 9th grade I already had a fairly deep knowledge of writing code.
Even then I realized that programming alone would not be enough if I wanted to grow in IT. Around that time I was invited to take part in a school Olympiad related to Data Science. Working with data sets appealed to me because it requires a creative approach: each task needs an original solution. This is where Data Science differs from software development, which tends to reuse roughly the same methods. But this is my subjective opinion.
There are very few training courses and little truly useful open-access material on Data Science. I decided to study data science at SkillFactory after taking their three-month Python programming course. I liked the remote format and the way the curriculum is structured.
I already knew how to code and was confident in my skills, so the only part of the course that gave me trouble was the higher mathematics section. I found it very difficult, so I sometimes turned to mentors for help. Their answers could come instantly or the next day.
Other students helped me too. The course includes many team competitions, because a data scientist almost never works alone. The contest topics are all related to Data Science; for example, there was a time series analysis competition. The Python workshop helped me a lot.
How the graduation project helped improve my data science skills
In the almost two years that I have been doing Data Science, the most difficult task has been my graduation project in Python, "Predicting Property Prices Using Machine Learning." The program I wrote took data on a specific property (location, number of storeys, apartment area and number of rooms) and used it to forecast the price of that housing.
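To give a feel for what such a program does, here is a minimal sketch. The interview does not say which model the project used, so this assumes a simple k-nearest-neighbours regression: the price of a new apartment is estimated as the average price of the most similar known apartments. The data, the function names and the encoding of "location" as distance to the city centre are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical training data, invented for illustration only:
# (distance_to_centre_km, storeys, area_m2, rooms) -> price
TRAIN = [
    ((2.0, 9, 45.0, 2), 120_000),
    ((2.5, 9, 60.0, 3), 150_000),
    ((10.0, 5, 45.0, 2), 80_000),
    ((12.0, 5, 70.0, 3), 100_000),
    ((1.0, 16, 35.0, 1), 110_000),
]


def scale_ranges(rows):
    """Return (min, max) per feature so features can be put on one scale."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def normalise(features, ranges):
    """Map each feature into [0, 1] using the ranges seen in training data."""
    return [
        (value - lo) / (hi - lo) if hi > lo else 0.0
        for value, (lo, hi) in zip(features, ranges)
    ]


def predict_price(features, train=TRAIN, k=2):
    """Average the prices of the k most similar training objects."""
    ranges = scale_ranges([f for f, _ in train])
    target = normalise(features, ranges)
    # Sort known apartments by Euclidean distance to the new one.
    by_distance = sorted(
        train,
        key=lambda item: math.dist(target, normalise(item[0], ranges)),
    )
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / len(nearest)
```

Calling `predict_price((2.2, 9, 50.0, 2))` averages the two closest apartments in the toy data. A real project would use far more data and usually a stronger model, but the shape of the task is the same: features in, price estimate out.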
The most difficult, but also the most interesting, part of the project was the complex data format. Information in a uniform format is easy to work with, for example when numbers are neatly collected in a table. But if the values come with captions or stray symbols, they need to be cleaned, and this is very difficult. Essentially, I was faced with a huge array of unstructured data. The graduation project took a very long time, but it was the project that developed the skills I had been lacking. The task forced me to apply sophisticated solutions that I would hardly have thought of before.
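As an illustration of the kind of cleaning described here, the sketch below pulls a number out of a messy text value. It is not the code from the project: the function name and the sample strings are hypothetical, and it assumes that a comma is a decimal separator and that spaces between digits are thousands separators.

```python
import re

# Matches the first number in a string; comma or dot may act as decimal separator.
_NUMBER = re.compile(r"-?\d+(?:[.,]\d+)?")


def parse_number(raw):
    """Extract the first numeric value from a messy string, or None if absent."""
    if raw is None:
        return None
    # Remove ordinary and non-breaking spaces between digits ("1 250 000").
    compact = re.sub(r"(?<=\d)[ \u00a0](?=\d)", "", str(raw))
    match = _NUMBER.search(compact)
    if match is None:
        return None
    return float(match.group().replace(",", "."))
```

For example, `parse_number("45,5 sq. m")` gives `45.5`, while a value with no digits at all gives `None`, so it can be handled as missing data. Under these assumptions a string such as "1,250,000" would be misread, which is exactly why cleaning rules have to be checked against the real data.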
I dug deeper into the finer points of Data Science and mastered new tools, for example hyperopt for automatic hyperparameter selection and spellchecker for correcting spelling. I also consolidated material that had not been entirely clear to me during the course.
The diploma format was new to me, so the mentors mostly helped with the presentation design. At each stage of the work I received a list of errors and shortcomings that needed to be corrected. The same went for the code. I could always ask for help, but I wanted to work things out myself, at least where possible.
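Automatic hyperparameter selection means letting a program try many model settings and keep the ones that score best. The sketch below shows the idea using plain random search from the standard library. This is a deliberately simpler stand-in, not hyperopt's own API or its search algorithm, and the objective function, parameter names and ranges are hypothetical.

```python
import random


def validation_error(params):
    """Hypothetical stand-in for 'train a model and measure validation error'.

    A real objective would fit a model with these hyperparameters; this toy
    function simply has its minimum at learning_rate=0.1, depth=6.
    """
    return (params["learning_rate"] - 0.1) ** 2 + 0.01 * (params["depth"] - 6) ** 2


def random_search(objective, n_trials=200, seed=0):
    """Sample hyperparameters at random and keep the best-scoring set."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        candidate = {
            "learning_rate": rng.uniform(0.001, 0.5),
            "depth": rng.randint(2, 12),
        }
        score = objective(candidate)
        if score < best_score:
            best_params, best_score = candidate, score
    return best_params, best_score
```

Calling `random_search(validation_error)` returns the best settings found and their score. Tools like hyperopt follow the same loop of proposing settings and scoring them, but choose the next candidates more cleverly than pure chance.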