The Most Egregious Affront to K-12 Public Education: A National Database Built by the Gates Foundation and Rupert Murdoch’s News Corporation!

Dear Commons Community,

Reuters News Service reported yesterday that a $100 million database, built to chart the academic paths of public school students from kindergarten through high school, is now in operation.

In operation just three months, the database already holds files on millions of children identified by name, address and sometimes Social Security number. Learning disabilities are documented, test scores recorded, attendance noted. In some cases, the database tracks student hobbies, career goals, attitudes toward school, and even homework completion.

The database is a joint project of the Bill & Melinda Gates Foundation, which provided most of the funding, the Carnegie Corporation of New York and school officials from several states. Amplify Education, a division of Rupert Murdoch’s News Corp, built the infrastructure over the past 18 months. When it was ready, the Gates Foundation turned the database over to a newly created nonprofit, inBloom Inc, which will run it.

Local education officials retain legal control over their students’ information. But federal law allows them to share files in their portion of the database with private companies selling educational products and services.

Reuters commented:

“Entrepreneurs can’t wait.

“This is going to be a huge win for us,” said Jeffrey Olen, a product manager at CompassLearning, which sells education software.

CompassLearning will join two dozen technology companies at this week’s SXSWedu conference in demonstrating how they might mine the database to create custom products – educational games for students, lesson plans for teachers, progress reports for principals.

States and school districts can choose whether they want to input their student records into the system; the service is free for now, though inBloom officials say they will likely start to charge fees in 2015. So far, seven states – Colorado, Delaware, Georgia, Illinois, Kentucky, North Carolina, and Massachusetts – have committed to enter data from select school districts. Louisiana and New York will be entering nearly all student records statewide.

“We look at personalized learning as the next big leap forward in education,” said Brandon Williams, a director at the Illinois State Board of Education…

That’s hardly reassuring to many parents.

“Once this information gets out there, it’s going to be abused. There’s no doubt in my mind,” said Jason France, a father of two in Louisiana.

While inBloom pledges to guard the data tightly, its own privacy policy states that it “cannot guarantee the security of the information stored … or that the information will not be intercepted when it is being transmitted.”

Parents from New York and Louisiana have written state officials in protest. So have the Massachusetts chapters of the American Civil Liberties Union and Parent-Teacher Association. If student records leak, are hacked or abused, “What are the remedies for parents?” asked Norman Siegel, a civil liberties attorney in New York who has been working with the protestors. “It’s very troubling.”

There is no reason for a database of this kind other than to provide for-profit education companies with a mailing list for selling their products. Furthermore, why should every detail of our children’s school records sit in a national database where our children can be tracked throughout their lives?

Lastly, how can American parents trust a database developed by Rupert Murdoch’s News Corporation? Murdoch and his company have been at the center of widespread hacking scandals in the United Kingdom. Key managers in his company have been charged with hacking the private messages of ordinary citizens, including children, and with bribing public officials. What an affront to the American people, and what a danger to our children.

I thank Erik Bennett, a student in our program in Urban Education here at the Graduate Center, for drawing my attention to this article.

Tony


Spurious Correlations Everywhere: The Tragedy of Big Data!

Dear Commons Community,

Many of us who follow or engage in quantitative analysis have been watching the rise of interest in “big data.” A major issue is separating real findings from spurious ones that turn up simply because the datasets are so large. The statistician Nate Silver called these the “signal” and the “noise” in his recent best seller, The Signal and the Noise. Geoffrey Pullum, a professor of general linguistics at the University of Edinburgh, writing in the Chronicle of Higher Education’s Lingua Franca blog, cautions linguistics researchers about the subtleties of big data. He specifically calls out recent work by Keith Chen, a professor of economics at Yale University:

“The results (see this blog post for an informal account) were jaw-dropping. He found that dozens of linguistic variables were better predictors of prudence than future marking: whether the language has uvular consonants; verbal agreement of particular types; relative clauses following nouns; double-accusative constructions; preposed interrogative phrases; and so on—a motley collection of factors that no one could plausibly connect to 401(k) contributions or junk-food consumption.

The implication is that Chen may have underestimated the myriads of meaningless correlations that can be found in large volumes of data about human affairs.

Roberts and a colleague recently published a paper on this topic (“Social Structure and Language Structure: the New Nomothetic Approach” by Sean Roberts and James Winters, Psychology of Language and Communication 16.2 [2012], 89-112). They noted several zany positive correlations of language with behavior; for example, people who speak a subject-object-verb language (like Japanese, Turkish, or Hindi) have more children on average than do people who speak a subject-verb-object language (like English, Indonesian, or Swahili).

Nassim Taleb’s Antifragile (2012, Page 417, quoted by James Winters in a blog comment) contains a relevant remark about why such things might be: “In large data sets, large deviations are vastly more attributable to noise (or variance) than to information (or signal). … The more variables, the more correlations that can show significance. … Falsity grows faster than information.”

We should expect correlations that are statistically significant but ultimately meaningless to pop up all over the place once large quantities of data are available—especially with regard to something like language, given the difficulty of controlling adequately for cultural diffusion, geographical proximity, shared origins, and intervariable linkage.

I suspect that Chen’s correlations mean nothing at all: There is no causal link, and we do not need an explanatory story. In the kind of world we live in, you wrestle every day with a swirling mass of inexplicable correlations, and then you die.”
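Taleb’s point that “the more variables, the more correlations that can show significance” is easy to see in a small simulation. The Python sketch below is purely illustrative and uses no data from Chen, Roberts, or Pullum. It invents one random “outcome” and a batch of random “predictors” that, by construction, have nothing to do with it, then counts how many correlations clear the conventional 5% significance bar. The sample size of 200, the function names, and the use of the Fisher z approximation for the significance test are my own assumptions for the example.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def is_significant(r, n, z_crit=1.96):
    """Approximate two-sided test at alpha = 0.05 via the Fisher z-transform.

    Assumption: the normal approximation is adequate for the sample sizes used here.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return abs(z) > z_crit


def count_spurious(n_obs, n_vars, seed=0):
    """Correlate one pure-noise outcome with n_vars pure-noise predictors.

    Every predictor is independent of the outcome by construction, so every
    'significant' correlation counted here is spurious.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_obs)]
    hits = 0
    for _ in range(n_vars):
        predictor = [rng.gauss(0, 1) for _ in range(n_obs)]
        if is_significant(pearson_r(predictor, outcome), n_obs):
            hits += 1
    return hits


if __name__ == "__main__":
    # More variables tested means more chance "discoveries", all of them noise.
    for n_vars in (10, 100, 1000):
        hits = count_spurious(n_obs=200, n_vars=n_vars)
        print(f"{n_vars:5d} noise variables -> {hits} 'significant' correlations")
```

Because each test carries roughly a 5% chance of a false alarm, one should expect about 5 spurious hits among 100 variables and about 50 among 1,000, even though none of them means anything. Corrections such as Bonferroni, which divides the significance threshold by the number of tests, are one standard way to rein this in.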

Pullum’s analysis is worth heeding as we work out how big data can be used in a range of applications, including our own research.

Tony