The data scientist may well be the “sexiest job of the 21st century,” say Davenport and Patil in a recent HBR article. They define the data scientist as an information technologist who can “coax treasure out of messy, unstructured data.” But that too easily leads us to think in IT categories.
Davenport and Patil are correct that the data scientist may well be the “sexiest job of the 21st century,” but they need a different, more visual way to understand the profession.
Jessica Chastain’s Maya, the CIA’s pro in Zero Dark Thirty, is not an obvious data person, but she’s a near perfect model of the data scientist. The movie narrative refers to the origination and development of data, but focuses primarily on data interpretation and organizational communication. The movie is a piece of luck for understanding a “hot” profession.
History of the career
Data science as a career is not new, and it is not fundamentally digital. The CIA has been developing and using data scientists since WWII. The true big data professionals can come from a current, existing career model.
The competencies
The competencies of the data scientist can be sorted into three buckets: big data, interpretation and communication. That’s one somewhat hard science (data acquisition and development) and two soft sciences.
The first bucket, big data, takes a number of forms today: digital troves, major enterprise systems, and even nanotechnologies and gene therapies. Though the “old” CIA operatives used different data tools than today’s CIA, the problems and underlying processes of data origination and development are analogous. Today’s digital tools are enablers, facilitators, accelerators and magnifiers of human capability. They don’t replace the human component that was so obvious in the early CIA professionals. In their excitement over the digital tools, many stumble over “data” and forget about the human component, much to their detriment.
So I remind you that machines don’t make the essential connections among data, and they don’t create information. That’s the job of humans. And that’s what the software architect Grady Booch had in mind when he uttered that famous phrase about fools and tools: A fool with a tool is still a fool. This blog, at the least, is a warning not to get sidetracked by a vision that ties big data to information technology alone. It was such a failure that led to Target’s faux pas of sending baby coupons to a teenager who hadn’t yet told her parents she was pregnant. And it was the same failure that brought about a 1,000 point crash of the Dow Jones in May 2010, and that also brought the financial crisis when hedge funds failed all over the world.
That second bucket, interpretation, is about theorizing and inferential and associative thinking, a major problem in nearly every academic discipline, from the humanities to the social sciences and the biological and physical sciences. It’s telling, for example, that my friends in statistics say that drawing inferences is profoundly plagued with error, and that you need graduate work and supervised experience to do even a half-assed job. It’s just as obvious in social psychology, rhetoric and even physics.
Furthermore, interpretation requires a knowledge base about a given subject, such as health care or consumer products, to succeed in associative thinking. A labor economist, for example, has to draw inferences and conclusions from his knowledge of the labor context as well as from the economic data. Ever wonder why economists disagree so much? It’s rarely about data origination and development, but almost inevitably about the inferences, associations and conclusions drawn from the data. A social scientist has to draw his conclusions from the various, almost infinite forms of social data applied to human behavior. Thus, as a social scientist and executive coach, I’m always warning managers about this problem. My warnings usually surface nicely as “How else can you interpret John’s behavior?” And I force them to dig, warning them that even when we’re through, our interpretation of the motivational data may be quite wrong. In short, the gap between data and interpretation is a huge black hole not readily demystified.
Interpretation is unique not only because of its thinking requirements, but also because of the required understanding of a given context. Inferential thinking is a really tough discipline and intelligent execs won’t want the conclusions of a mere info technologist impacting the company’s business development, marketing, strategy or sales.
The third bucket, communication, may well be the most difficult and most misunderstood of the three competencies. Candidly, it’s the Achilles heel of far too many business people. It shouldn’t be a surprise that there’s a lot of data (here goes my interpretation) suggesting that as a result of our digital infatuation, communication competencies are getting worse, not better. There’s no doubt that there is a vast scope for data scientists to boost productivity in health and education if only those sectors were more open to change. There’s extensive data with solid policy implications if you could get legislators to accept and act upon the conclusions. When you watch Zero Dark Thirty, observe the hoops that Maya had to go through to get governmental executives to believe in her conclusions. Her campaign of communication hit pothole after pothole before she had success. The technological demands of interpersonal and organizational communication are at a level rarely experienced in human history.
So pay close attention to a correct caveat added by Davenport and Patil’s article in which they write that though hacking seems to be having its heyday, this may be less true in five years’ time, when many more people will have the title “data scientist” on their business cards. More enduring will be the need for data scientists to communicate in language that all their stakeholders understand—and to demonstrate the special skills involved in storytelling with data, whether verbally, visually, or—ideally—both. So yeah, the gap between interpretation and communication is problematic. And frustratingly, the relationship between communication and organizational buy-in is still another huge black hole.
The education of a data scientist
It’ll take smarts, a well-rounded education and experience to develop the competencies. No two-year wonder from the local tech school can muster the kind of education needed. Thus the CIA, which has a storied data science history dating back to WWII, is looking for the following types of people:
- potential data scientists with background in economic, mathematical and modeling methodologies
- economists who can track foreign financial activities
- intelligence collectors who can provide colleagues the data to understand international and cultural behaviors and actions
- medical and health analysts (physicians) who analyze and assess global health issues
- military analysts who stay on top of threats from foreign military and technical developments
- psychiatric analysts who can tap their expertise to study the health of foreign officials and assess the psychological and social factors that influence world events
The reason that the CIA has such a diversity of need is that each discipline has its own substance, but especially its own way of thinking. IT people, for example, tend to be very concrete, linear thinkers. That’s what IT wants and needs. Marketing people tend to be concrete, but random, creative thinkers. Economists tend to be abstract, but fairly sequential thinkers. And the military, in contrast, is filled with people who are concrete and sequential but not especially creative. One reason General Petraeus, for example, had so much success and got so much attention is that his thinking styles are atypical for the military, making possible creative strategies for the management of Middle Eastern conflict.
The data scientist, as its own discipline, requires a highly adaptable person with superb skills in analysis, concrete and sequential thinking, yet also oriented to the big-picture abstract and creatively random patterns. Even more importantly, the profession requires phenomenal interpersonal and presentational communication skills. Creativity and innovation go nowhere without the ability to sell and get buy-in.
Davenport and Patil got it right: the data scientist is the sexiest job of the 21st century. But visualize Maya if you really want to understand the profession and its glamor.
Relevant reading:
- Thomas H. Davenport and D.J. Patil, Data Scientist: The Sexiest Job of the 21st Century. Harvard Business Review, October 2012.
- For the perils of big data: Emanuel Derman, Models.Behaving.Badly.: Why Confusing Illusion with Reality Can Lead to Disaster, on Wall Street and in Life. (New York: Free Press), 2011.
- Many academic disciplines address inferential thinking, but this is one of the best: Nisbett and Ross, Human Inference: Strategies and Shortcomings of Social Judgment. (Englewood Cliffs: Prentice-Hall), 1980.
- Communication and rhetorical writers have failed to keep up with the needs of technology and organizational life. Sadly, even the new book by Groysberg and Slind, Talk, Inc. (Cambridge: Harvard Business School Publishing, 2012) does little more than put traditional communication categories into the work setting. It lacks an understanding of how language works, the use of forms, or an understanding of how to create rhetorical impacts upon individual mindsets, all needed for today’s technologies.