Is “data” singular or plural? The Wall Street Journal wants it both ways.
If you are writing about data, you have a choice to make: singular or plural.
Singular: The data is so persuasive that only one conclusion is possible.
Plural: The data are so persuasive that only one conclusion is possible.
Both are correct, depending on the audience
Colloquially, most people speak about “data” as a singular noun, for example: “The data is ambiguous. I don’t trust it.”
However, many statisticians and other experts who deal with data adopt the plural to be technically correct. “These data are ambiguous. I don’t trust them.”
To most readers, the singular sounds correct and the plural, stilted or wrong. But for academics, the plural is common.
You can make the argument that the word “data” is the plural of “datum” (a datum is a bit of information). While this is a fair rationalization based on the Latin origin of the word, it’s not definitive. For example, even though “media” originated as the plural of “medium,” we use the word “media” as a singular to refer to entities that create content (“The media has a persistent bias towards newness”). The pedantic Latin-followers would say “The media have a persistent bias,” but that’s not how we write or talk. The English word “media” can be a singular, even if it is plural in Latin.
There is no one best answer here: your choice depends on your audience. What you can’t do is flip back and forth in the same piece of writing. Earlier this week, the Wall Street Journal posted an article called “Data Show the Economy Is Booming. Wall Street Thinks Otherwise.” The article thinks that “data” is plural — mostly. Examine these sentences.
Data suggesting the U.S. economy is too hot for comfort are getting a cool reception . . .
Though not proof, recent data have suggested . . .
Questions about other data also have focused on oddities related to the month of January.
The catch in the data was that 353,000 was a seasonally adjusted number.
Discounting the importance of January data extended to the one notable exception . . .
The Commerce Department, which produces both reports, officially considers GDP “more reliable because it’s based on timelier, more expansive data.”
“We still think that real output is at most growing modestly above potential, despite the much stronger GDP data,” analysts at Goldman Sachs wrote . . .
Analysts have hardly dismissed recent economic data entirely.
The data, on its face, were about the last thing investors wanted.
Both the article title and the lead treat the data as plural. So does the next sentence I included, which comes from the end of the article.
I’ll get to the last sentence in a moment.
But the rest of the sentences are interesting: they would read the same regardless of the singular or plural nature of “data.” They solve the problem by embedding it in phrases. “Questions about other data” is plural. “The catch in the data” is singular. “Discounting the importance of January data” is singular.
And sentences that use data as the object rather than the subject of the sentence can be read either way. The phrases “based on timelier, more expansive data,” “despite the much stronger GDP data,” and “Analysts have hardly dismissed recent economic data” will make both data singularists and data pluralists happy.
It’s clear that the Journal’s style sheet treats “data” as plural — but it breaks its own rule all the time, for example in the lead of this article (italics added): “Scientists are looking at animal data to improve the speed of some medical test results. They’re taking that data, building artificial intelligence tools and then using those tools to train diagnostic algorithms for humans.”
But one sentence in the original article can’t avoid the issue: “The data, on its face, were about the last thing investors wanted.” If “the data” is plural, you’d have to say, “the data, on their faces,” which is disturbing, since data don’t have faces. But “on its face” (singular) followed by “were” (plural) is trying to have it both ways.
There must be a power struggle going on at the paper’s copy desk.
Make a choice. But rewrite to avoid confusion.
I’m doing some contract writing about data right now, and the client has required me to write about data in the plural, because that’s the correct choice for the audience. But when I can, I sidestep the problem by rewriting the sentences.
If I say “Data and statistics require close study,” the subject is plural regardless of whether “data” is singular or plural.
And if I write, “The challenge for any collection of data is to judge its veracity based on the source,” then the subject is singular regardless.
The writer (or copy editor) of the Journal article could have rewritten the problematic last sentence to read: “On its face, this new set of data was about the last thing investors wanted.” Problem solved.
If you’re the writer, follow your publication’s house style for data (and if you have a choice, I think the singular will be readable to more people).
But when you can, rewrite sentences to avoid the problem. This will make your prose more readable regardless of the way your audience thinks about data.
I’m a translator, and I’ve had this discussion with innumerable clients and editors. I’ve decided that whatever works for them works for me as well, as long as my name isn’t on it.
This debate is one of my favorites. Like the Oxford English Dictionary, I consider “data” to be a “mass noun” like “information,” and thus singular. But to Josh’s point, the argument is a losing battle in academia, where it will forever remain plural.
The WSJ should pick a lane. I’ve seen similar debates about data set vs. dataset. I prefer the latter.
As for restructuring sentences, I suppose that you could also use the passive voice to mask the inconsistency. For example, “It is suggested by the data that…”
But you and I would never tolerate the passive voice here. In fact, I now need to take a shower for even writing that horrible fragment.
I had a similar conversation with a colleague. When we discussed various “mediums” in which to consume content, she said, “Don’t you mean media?” I sided with Josh on this one and offered, “Well, some cities have multiple stadia for different sports.”
She nodded, though we agreed to disagree. We went with “mediums” for ease of consumption.