This is an excellent effort that essentially computerizes the entire
language database and its internal linkages. It should ultimately
be inclusive and make translation a far easier task. We do have
ancient texts whose translations are very problematic; a good
example is the four-thousand-year-old texts of Chinese medicine.
It really helps to know what was intended as particular words
emerged, even as they also evolved.
Language is sometimes a poor medium for transmitting ideas, yet it is
the main tool we all use. The mind absorbs meaning visually for the
most part, which means that word groups need to trigger images to
work well.
Perhaps there is merit in re-imagining the visual meaning of our
proto-languages. After all, water is easy, but rippled water, fast
water, and cold water are all plausible one-word derivatives that can
easily emerge. From that to analogy is a deep breath.
Language 'time machine': a Rosetta stone for lost tongues
In an attempt to preserve endangered languages, UC Berkeley scientists create a computer program that reconstructs ancient tongues using their modern descendants.
by Leslie Katz
February 12, 2013
One of my favorite
things about watching old movies is hearing how people might have
spoken in eras past -- the expressions they used, their old-school
smack talk. But what did the languages from thousands of years back
sound like? Hollywood, as far as I know, has yet to make a movie in
which characters talk in authentic proto-Austronesian.
The language nerd in
me was, therefore, excited to discover that scientists from UC
Berkeley and the University of British Columbia have created a
computer program to rapidly reconstruct vocabularies of ancient
languages using only their modern language descendants.
The program, a
linguistic time machine of sorts, can quickly crunch data on some of
the earliest-known "proto-languages" that gave rise to
modern languages such as Hawaiian, Javanese, Malay, Tagalog, and
others spoken in Southeast Asia, parts of continental Asia,
Australasia, and the Pacific.
The speed and scale of
the work are key here, as proto-languages are typically reconstructed
through a slow and painstaking manual process that involves
comparing two or more languages that hail from a shared ancestor.
"What excites me
about this system is that it takes so many of the great ideas that
linguists have had about historical reconstruction, and it automates
them at a new scale: more data, more words, more languages, but less
time," says Dan Klein, an associate professor of computer
science at UC Berkeley and co-author of a paper on the system
published this week in the journal Proceedings of the National
Academy of Sciences.
The program relies on
an algorithm known as the Markov chain Monte Carlo sampler to
examine hundreds of modern Austronesian languages for words
of common ancestry from thousands of years back. The computational
model is based on the established linguistic theory that words evolve
as if along the branches of a genealogical tree.
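The paper's actual model is far richer, but the core idea of a Markov chain Monte Carlo sampler can be sketched in miniature: propose small changes to a hypothesized ancestral word and accept or reject them based on how well the result explains the modern descendants. The sketch below is a toy Metropolis-Hastings sampler over invented example words ("watu", "vatu"), with a made-up per-character mutation probability; it is illustrative only and not the researchers' system.

```python
import math
import random

random.seed(0)

# Toy data: two hypothetical modern "descendant" words (not real data).
descendants = ["watu", "vatu"]
alphabet = sorted(set("".join(descendants)))

def log_likelihood(ancestor):
    """Log-probability of the descendants given an ancestral form, under a
    toy model where each character independently mutates with probability
    0.1 along the branch to each descendant."""
    ll = 0.0
    for word in descendants:
        for a, d in zip(ancestor, word):
            ll += math.log(0.9 if a == d else 0.1 / (len(alphabet) - 1))
    return ll

def mcmc(n_steps=5000):
    """Metropolis-Hastings over fixed-length ancestral strings:
    resample one position at a time, accept with the Metropolis ratio."""
    current = descendants[0]  # start the chain from one observed word
    counts = {}
    for _ in range(n_steps):
        pos = random.randrange(len(current))
        proposal = current[:pos] + random.choice(alphabet) + current[pos + 1:]
        # Symmetric proposal, so the acceptance test uses only likelihoods.
        if math.log(random.random()) < log_likelihood(proposal) - log_likelihood(current):
            current = proposal
        counts[current] = counts.get(current, 0) + 1
    return max(counts, key=counts.get)  # most-visited ancestral form
```

Run on this toy data, the chain concentrates on "watu" or "vatu", since any other ancestor implies strictly more mutations. The real system samples entire trees of such reconstructions at once, over hundreds of languages.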
In their paper (if you
want more specifics on how the researchers did things like associate
a context vector to each phoneme, read this PDF), the team
details the process of reconstructing more than 600
proto-Austronesian languages from an existing database of more than
140,000 words.
They say more than 85
percent of the reconstructions come within a character of those done
manually by a linguist.
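"Within a character" of a manual reconstruction is naturally measured by edit distance. As an assumption for illustration only (the paper's exact metric isn't described here), a standard Levenshtein distance computation looks like this, applied to hypothetical reconstructed forms:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance: the minimum number of
    insertions, deletions, and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

# Hypothetical automatic vs. manual reconstructions of the same word:
print(levenshtein("bitu", "bituq"))  # differs by one character -> 1
```

A reconstruction "within a character" of the linguist's is one where this distance is at most 1.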
Linguists needn't
worry about competition from the computer, however. "This really
is not an attempt to put historical linguists out of a job,"
Klein tells CNET. "It's not like chess where we're seeing if we
can one-up the humans.
"It would take a
very long time for people to be able to cross-reference every
language and every word that we feed into the program," Klein
adds. "Of course in principle a human could do that, but in
practice they just won't. Humans are not well-suited to crunching all
of this data. Humans are well-suited to other things."
They can, for example,
examine linguistic influences a computer can't yet grok: the
sociological and geographical factors that lead to morphological
changes that turn, say, "cat" into "kitty cat,"
and the way spelling errors reveal sound confusions in languages that
are also written.
"It's the same way that astronomers haven't been replaced by computerized telescopes," Klein says. "We hope that linguists can use tools like ours to see further into the past -- and more of it, too."
Clues from the past,
yes. But the work may also be able to extrapolate how language might
change in the future, according to Tom Griffiths, associate
professor of psychology, director of UC Berkeley's Computational
Cognitive Science Lab, and another co-author of the paper.
Which of course raises
the question: how will Twitter-speak look 6,000 years from now?