Confessions of a Reformed Dataphobe
This article originally appeared in Inside Higher Ed on January 23, 2017.
As a higher ed faculty member, my cognitive dissonance toward the term “big data” was palpable. My body would stiffen and my arms would fold when discussing the use of student data to increase enrollment or support academic performance.
Although my background in research instilled in me great respect for inquiry and data -- as did my childhood affinity for master observer Sherlock Holmes -- I was never at home with the idea of data with the term “big” thrown in front of it. Research, to me, was a very human endeavor, while “big data” represented something cold, calculated and ethically gray. Big data was the Moriarty to my Sherlock.
Within higher ed, I wasn’t in the minority with my cynicism. In a 2013 Educause report titled “Building Organizational Capacity for Analytics,” the authors identified a “substantial need for raising professional development, capacity building and the analytics IQ of institutional leadership and practitioners, at all levels.” Additionally, a recent Inside Higher Ed article by John Warner expressed skepticism about using aggregated data to predict individual behavior in education.
While I identify with the sentiment behind Warner’s data skepticism, I have come to understand in much greater detail the capabilities of data science and big data. And now that I’ve seen how data can empower institutions to match students to the right programs and provide individualized support through graduation, I am a huge proponent of increasing the use of big data in higher education.
My mind-set shift came courtesy of a career change. After spending a few years working in higher ed, I moved to the private sector to help higher ed institutions increase enrollments, thereby helping students at scale. That move caused an immediate collision between my research-loving self and my dataphobic self, as my work now revolves around the insights of big data.
To reconcile these two seemingly disparate mentalities, I had to do some soul-searching. I quickly realized that my resistance to big data hinged on three things: my fixation on big data’s disreputable public face, my ignorance of the breadth of big data’s capabilities and my misconception that big data could only create fixed, unchanging portraits of students.
Big Data Has a Bad Reputation
Big data’s public profile leaves much to be desired. As business executive Jonathan H. King and law professor Neil M. Richards pointed out in a 2014 Forbes article, “While there’s nothing particularly new about the analytics conducted in big data, the scale and ease with which it can all be done today changes the ethical framework of data analysis.” And we’ve seen this ethical dilemma play out for the worse in predatory marketing practices in a number of sectors, including higher ed. Once again: Moriarty.
Familiar questions about privacy, ownership and transparency of data are particularly salient in post-Edward Snowden America. Many privacy clauses lie buried in pages and pages of legal text, most of which consumers never read. With online activity so ingrained in our daily lives, it is unreasonable to think that we can either a) discontinue our online activities due to privacy concerns or b) fully attend to the myriad legal agreements we enter into online on a daily basis. If one wants to remain (or become) a contributing member of society, neither of these options is plausible.
Yet a simple truth underpins this ethical debate: big data itself is ethically neutral. As Debra Humphreys, vice president of strategic engagement at the Lumina Foundation, points out in Game Changers: Education and Information Technologies, “People define how technology is deployed, not the technologies that people invent.” Because people are at the crux of all ethical gray areas in big data, higher education institutions are confronted with the responsibility -- and opportunity -- to set the ethical standard for the utilization of data science.
Big Data Is Just That: Big
I remained unaware of the breadth of big data’s capabilities for quite some time. To revisit the examples put forth in Warner’s article, I too had caught word of studies that correlated things like first-semester credit loads and pre-emptive access to courses with student success. Yet we all know that correlation is not causation, and Warner rightly pointed out that “by focusing on questions of what (take 15 hours/access course early), we allow ourselves to keep from confronting the much more important questions of why.” This assertion lies at the crux of the misunderstanding of data science in education. Through data, we do have the capability to get at the why behind student success, and to do so at a much more individual level.
The truth is that data analytics capabilities have grown exponentially. Now millions of data points can be assessed in relation to all others. As Vernon Smith posited in Game Changers,
“A growing body of best practices and interventions that remove barriers to student progress and success exists, but those interventions would be better informed if they were based on what the research and actual behaviors indicate, rather than on anecdotal notions or experiences alone.”
In research terms, notions remain anecdotal when they are founded on too little data. While sweeping interventions often hinge on single data points like early access to online course materials, big data makes it possible to concurrently assess millions of data points -- demographics, test scores, previous academic performance, employment, family size and learning styles, to name a few -- and potentially identify “at-risk” students. Holmes was right: “The world is full of obvious things which nobody by any chance ever observes.” Leveraging big data to reveal those obvious things can help institutions paint a predictive picture of a student’s likelihood of success.
Data Analysis Is a Living Process
All that said, I wholeheartedly agree that predictive modeling alone is not the panacea to end all student failure. Nor is collecting real-time student data the only answer. The problem with many real-time indicators of student struggle is that once they surface, it’s often too late to help. Many students simply won’t raise their hands and say “help!” and by the time red-flag indicators appear, like not showing up to class or not logging onto a learning management system, the window for effective assistance may have closed. Yet when we combine that predictive model with real-time indicators of student performance, we have a living, individualized and iterative foundation for student support.
To maximize impact, we must view data in terms of iteration and interaction. By merging predictive models of student success with real-time indicators of student performance, we home in on a more individualized foundation for student success. Predictive models establish a baseline understanding of each student. Data on each student’s continuing exchanges with and performance at an institution can then inform interactions throughout the entirety of the student lifecycle. Only then can interventions hinge on a more holistic story than log-ins or credit hours alone can tell.
I am passionate about the mission of higher education, which is why I’m now doing what I do. Higher ed institutions fill a vital role in society and place value on information and high ethical standards. On the surface, higher education’s commitment to research, teaching and serving the public seems in opposition to the unethical applications of big data to simply maximize profits. Yet if we focus on transparency, customization and innovation, we can employ big data to more fully pursue our mission and goals. In this way, we’ll say to our students, prospects and stakeholders, “You know my methods, Watson.”