What I Learned from Writing a Data Science Article Every Week for a Year

On the surface, this article seems to have nothing to do with the types of writing most indie authors pursue, indie publishing, etc., but PG thinks the mental attitude about learning new ways of thinking and different approaches applies equally to tasks involved in being an indie author.

SF authors will almost certainly know that data science and the creation of artificial intelligence are intimately intertwined.

From Towards Data Science:

There ought to be a law limiting people to one use of the term “life-changing” to describe a life event. Had a life-changing cup of coffee this morning? Well, hope it was good because that’s the one use you get! If this legislation came to pass, then I would use my allotment on my decision to write about data science. This writing has led directly to 2 data science jobs, altered my career plans, moved me across the country, and ultimately made me more satisfied than when I was a miserable mechanical engineering university student.

In 2018, I made a commitment to write on data science and published at least one article per week for a total of 98 posts. It was a year of change for me: a college graduation, 4 jobs, 5 different cities, but the one constant was data science writing. As a culture, we are obsessed by streaks and convinced those who complete them must have gained profound knowledge. Unlike other infatuations, this one may make sense: to do something consistently for an extended period of time, whether that is coding, writing, or staying married, requires impressive commitment. Doing a new thing is easy because our brains crave novelty, but doing the same task over and over once the newness has worn off requires a different level of devotion. Now, to continue the grand tradition of streak completers writing about the wisdom they gained, I’ll describe the lessons learned in “The Year of Data Science Writing.”

The five takeaways from a year of weekly data science writing are:

  1. You can learn everything you need to know to be successful in data science without formal instruction
  2. Data science is driven by curiosity
  3. Consistency is the most critical factor for improvement in any pursuit
  4. Data science is empirical: instead of relying on proven best methods, you have to experiment to figure out what works
  5. Writing about data science (or anything) is mutually beneficial: it helps both you and the entire community

. . . .

1. Everything in data science can be learned without going to school

Mechanical engineering, which I unfortunately studied in college, has to be taught at an institution. It’s just not possible for an individual (at least one with normal resources) to gather the equipment (labs, prototyping machines, wind tunnels, a manufacturing shop) needed for a “mech-e” education. Fortunately, data science is not similarly constrained: no topic in the field, no matter how state-of-the-art, is off-limits to anyone in the world with an Internet connection and a willingness to learn.

While I did take a few useful stats classes in college (note: everything in these classes is covered by the free Introduction to Statistical Learning) the data science courses at my college were woefully out-of-date. We were taught tools and techniques that fell out of favor years ago. In several cases, I showed the professor evidence of this only to be told: “well I’m going to teach what I know because it worked for me.” What’s more, these classes were geared toward research which means writing inefficient, messy code that runs once to get results for a paper. Nothing was ever mentioned about writing code for production: things like unit tests, reusable functions, or even code standards.
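To make the contrast with one-off research scripts concrete, here is a minimal sketch of a small reusable function with unit tests. The function name `standardize` and the test cases are hypothetical illustrations, not taken from the original article; only the Python standard library is assumed.

```python
import math
import unittest


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    if variance == 0:
        raise ValueError("values must not all be identical")
    std = math.sqrt(variance)
    return [(v - mean) / std for v in values]


class StandardizeTest(unittest.TestCase):
    """Hypothetical tests documenting the behavior callers can rely on."""

    def test_zero_mean_and_unit_variance(self):
        result = standardize([1.0, 2.0, 3.0, 4.0])
        self.assertAlmostEqual(sum(result) / len(result), 0.0)
        self.assertAlmostEqual(sum(r * r for r in result) / len(result), 1.0)

    def test_empty_input_is_rejected(self):
        with self.assertRaises(ValueError):
            standardize([])
```

Saved as a module, tests like these can be run with `python -m unittest`, so the function can be reused and changed later with some confidence that its behavior has not drifted.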

Instead of relying on college classes, I taught myself (and continue learning) data science/machine learning from books and online courses/articles. I select resources that teach by example and focus on what is actually used in data science in practice today. (By these standards, the best classes are from Udacity and the best book is Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.) You don’t have to pay for material: fast.ai has the most cutting-edge course available on deep learning for free; Kaggle gives you opportunities to work on real-world data and learn from thousands of data scientists; and books like The Python Data Science Handbook don’t cost anything! (Towards Data Science is also useful.)

. . . .

Few people know what they are talking about when it comes to data science, and if you’ve studied the most recent material available online, you’ll be ahead of most everyone else. In fact, I would argue you are better off learning from online sources/courses, which are constantly updated, than from educational institutions that revise curriculum at most once per year.

. . . .

Curiosity is also helpful when you’re actually doing data science: exploratory data analysis is driven by the goal of finding interesting patterns in the data. On a somewhat related tangent, Richard Feynman, arguably the smartest man of the 20th century, might be the best proponent for the benefits of a curious mindset. A theoretical physicist, he was as well known for picking up skills (like safe-cracking) and playing practical jokes as he was for his work on quantum mechanics. According to his own writings, this curiosity was integral to his work as a scientist and made his life more enjoyable.

Feynman was driven not by a desire for glory or wealth but by a genuine wish to figure things out! This is the same attitude I adopt in my data science projects: I’m doing these projects not because they are a required chore, but because I want to find answers to hard problems hidden within data. This curiosity-based attitude also makes my job enjoyable: every time I get to do some data analysis, I approach it as a satisfying task.

Link to the rest at Towards Data Science

Until he read the OP, PG didn’t know about Project Jupyter and Jupyter Notebooks, which are another cool online thing.

2 thoughts on “What I Learned from Writing a Data Science Article Every Week for a Year”

  1. Will Koehrsen, the author of the OP, may have learned “data science”, but he did not learn humility. His reference to Richard Feynman is an insidiously disingenuous way of drawing unearned glory to himself.

    What is “data science” anyway? Is it any different from statistics in the modern world?

    IMO Koehrsen knows enough to be dangerous and no more. I am not persuaded that he understands the foundations and history of statistics well enough to know the limits or the true power of the math he uses.

    As for “data visualization”, histograms and scatter plots are basic tools I used to tell me if the data would support parametric techniques. The reason a university mathematics department requires integral calculus as a prerequisite to probability and probability as a prerequisite to statistics is to build the foundation for parametric statistics and acquaint the student with their limits. Koehrsen’s unstructured approach makes ignorance a virtue. A false virtue but a virtue nonetheless. It leads to political polls with an error of ±4%, a margin that tells me that the poll used too small a sample and is not robust enough to support any conclusion or justify any decision. Perhaps Koehrsen’s chaotic study would discover this truth, but it would come by guess and by golly and not by design.
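For readers who want to see what sample size a ±4% margin implies, here is a rough sketch using the standard normal-approximation formula for a proportion. It assumes simple random sampling, a 95% confidence level (z = 1.96), and the worst-case proportion of 0.5. None of these figures come from the comment or the article, and the function names are hypothetical.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)


def required_sample_size(margin, proportion=0.5, z=1.96):
    """Smallest simple random sample whose margin of error is at most `margin`."""
    return math.ceil(z ** 2 * proportion * (1 - proportion) / margin ** 2)


# Under the assumptions above, a +/-4% margin corresponds to roughly 600 respondents.
print(required_sample_size(0.04))  # 601
```

Because the margin shrinks with the square root of the sample size, halving it takes roughly four times as many respondents. Whether ±4% is adequate depends on how close the quantities being compared are.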

    In my experience, data is what we collected. Information was what we produced. In between was a lot of collation and analyses. But production of useful results required collection of the right data. And getting that was a battle.

    I believe that Mr Koehrsen is a dangerous fool.

    • He might have a bright future in politics.
      Bigger fools seem to be doing okay. All the way to Congress.
