Million Books to the Web: An Example of Indo-US Collaboration, Lessons Learnt & The Road Ahead
Prof N. Balakrishnan
Lessons from the past
fires of Alexandria
irrevocably severed our access to any of the works of the ancients.
introduction of printing technology
much Indian and Chinese knowledge, disseminated by word of mouth and on palm leaves, virtually disappeared or became inaccessible
New cultural revolutions
edifices built by destroying the past irrevocably
later revolutions seek solace in attempting to preserve what was destroyed
we need to preserve our heritage independent of the political and social ups and downs
Lessons from Reality
In a thousand years:
only a few of the paper documents we have today will survive the ravages of deterioration, loss, and outright destruction.
Among existing paper archives, many works still in existence today are rare
- only accessible to a small population of scholars and collectors at specific geographic locations
Contrary to popular belief, libraries, museums, and publishers do not routinely maintain broadly comprehensive archives of the considered works of man
No one can afford to do this, unless the archive is digital
The Approach
Technology Driven Vision
Decide on the stakeholders
Never make it exclusive
Pilot Projects to perfect technology
Bring in advanced management concepts
like People Maturity Models
Quality assurance
automate wherever possible
The Approach
Lessons from the past
Too many Digital Library Projects
with half-life of less than 2 years from the date of “Launch” or a long incubation time
Follow Nike – JUST DO IT
Digital Library must have two ingredients
A knowledge Amplifier
Free access, giving avenues for everyone to gain economic benefit
while still contributing to the multiplication of knowledge by circulation
In India, it should be a test bed for our Language Technology Research
A billion Transistors at 10 to 20 GHz Clock rates by 2010
128 G Bytes of Main Memory
Terabyte of Disk Storage, maybe Holographic
Speech input/output (ASR)
Multilingual
Terabit connectivity at the PC
The DL plans of today must be sensitive to this
The Road Ahead
The future trends:
Browser will be the only medium of communication.
It will be active- with voice and video, language independent.
Mobility will be the key.
Small form factor devices such as Palms, PDAs and Tablets would be the future.
We would soon see TVPCT at the cost of a TV
We will witness major convergence between ICT, Nano Technologies and Biological Sciences
Electronic Resources and the Library of the Future
E-mags; E-books; E-music;
E-Movies
Dedicated E-book Readers
Dedicated readers – about 20,000
Palm devices – 6,000,000
PCs – hundreds of millions
“For people accustomed to reading text on a computer for hours at a time, e-book screen clarity is a non-issue.”
A low-cost E-Book reader design in India
http://www.eink.com/technology/index.htm
E Ink is made up of millions of microcapsules
each the diameter of a human hair
Each microcapsule contains
positively charged white particles &
negatively charged black particles
that float in a clear fluid
A film of transistors supplies the voltage to the capsules
A negative electric field makes the white particles move to the top of the microcapsule, where they become visible
at the same time, an opposite electric field pulls the black particles to the bottom of the microcapsules, mimicking the effect of print.
Electronic ink is a real power miser
E-ink/e-paper (Lucent)
The technology has been identified and development is well under way
By the year 2003, we envision electronic books
that can display volumes of information as easily as flipping a page,
permanent newspapers that update themselves daily via wireless broadcast
Just as today's books give people easy access to everyday information, tomorrow's books will provide the same easy access to the dynamic data of the information age
Indian Institute of Science’s Simputer
A hand held Linux Box at around US$ 200
Has the state of the art browser
Color screen
very good speech synthesizer
In English and many Indian Languages
A very powerful tool for access with wireless
Soon to be modified as an E-book
www.simputer.org
www.picopeta.com
www.ncoretech.com
The Challenges in Computing
Tomorrow’s computing needs are not in Mflops and Gflops
The computer must process information, perform recognition and DM like a human
Small, inexpensive robots and swarms will be a reality
Ray Kurzweil: The Age of Spiritual Machines
“A $1,000 PC (in 1999-dollars)…
2009 = trillion calculations/second
2019 = 20 million billion calculations/second (the human brain)
2029 = 2 × 10^19 calculations/second (1,000 human brains)
Ray Kurzweil: The Age of Spiritual Machines
2009: “Computer displays have all the display qualities of paper- high resolution, high contrast, large viewing angle, and no flicker. Books, magazines, and newspapers are now routinely read on displays that are the size of small books.”
2009: “At least half of all (business) transactions are conducted online.”
2009: “There is effective convergence of all media, which exist as digital objects (that is, files) distributed by the ever-present high-bandwidth, wireless information web. Users can instantly download books, magazines, newspapers, television, radio, movies, and other forms of software to their highly portable personal communication devices.”
2009
A $1,000 PC delivers Terahertz speeds
PCs with high resolution visual displays come in a range of sizes
from those small enough to be embedded in clothing and jewelry
to the size of a thin book
Cables are disappearing
Communication between components uses wireless technology, as does access to the Web
The majority of text is created using continuous speech recognition
Also ubiquitous are language user interfaces.
Most routine business transactions (purchases, travel, etc.) take place between a human and a virtual personality
Often the virtual personality includes an animated visual presence that looks like a human face
2019: “Reading books, magazines, newspapers, and other Web documents; listening to music; watching three-dimensional moving images (for example, television, movies); engaging in three-dimensional visual phone calls; entering virtual environments (by yourself, or with others who may be geographically remote); and various combinations of these activities are all done through the ever-present communications Web and do not require any equipment, devices, or objects that are not worn or implanted.”
2029: “The ever learning Society”
Learning now constitutes the primary focus of the human species.
Human learning is accomplished using virtual teachers (and virtual libraries?).
Learning is enhanced by widely available neural implants, which improve memory and perception but cannot yet download knowledge directly.
Automated agents are learning, on their own without human assistance. Machines can now create significant new knowledge with little or no human intervention; unlike humans, machines easily share knowledge structures with one another.
And Then There Was Music
RealJukeBox
Winamp
MP3
Napster
The Growth rates
The processor performance doubles every 18 Months
The Network bandwidth doubles every year
The storage capacity doubles every nine months
Soon the processor will be the bottleneck
1000 times growth in storage in 10 years: I already have 250 GB on a single disk
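The doubling periods quoted above can be turned into ten-year growth factors with a line of arithmetic. The sketch below is illustrative only: the function name `growth_factor` and the ten-year horizon are assumptions chosen to match the slide, not something taken from the talk.

```python
# Doubling periods in months, as quoted on the slide.
DOUBLING_MONTHS = {"processor": 18, "network": 12, "storage": 9}


def growth_factor(doubling_months: float, years: float) -> float:
    """Multiplicative growth after `years` if capacity doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


# Compare the three trends over the same ten-year window.
for name, months in DOUBLING_MONTHS.items():
    print(f"{name}: about {growth_factor(months, 10):,.0f}x in 10 years")
```

Under these assumptions the processor grows roughly 100-fold per decade, yearly doubling gives about 1,000-fold, and a strict nine-month doubling gives roughly 10,000-fold. The 1000 times storage figure is therefore on the conservative side, and the widening gap between storage and processing is what the processor bottleneck remark points to.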
Recognition versus Recall
Recognition is like seeing your friend’s face in a sea of faces
even if he has changed since you last saw him
storage intensive and fast
Recall is like figuring out how to repair your car’s carburetor from a manual when you have never done it before: applying knowledge to a new situation; processor intensive, needing less storage
The brain works on recognition
Present day computers prefer recall – remember the Y2K
Future computers would work like the brain: by recognition
Recognition versus Recall: what it does to our DL
We will move away from quantitative search (key word match) to “aboutness” and content based retrieval
In future, documents will be read more by computers than by humans. Will it change the way we write? Would we think in HTML or in XML?
From mere text data to 3D objects, voice and video
Multilingual
Every conceivable form of knowledge expression
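The move from key word match to graded, content-oriented retrieval can be hinted at with a toy comparison. This is only a sketch under loud assumptions: bag-of-words cosine similarity is a swapped-in stand-in that is far simpler than real "aboutness" retrieval, and the query, the documents, and the function names are all hypothetical.

```python
import math
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lower-case whitespace tokenisation (deliberately naive)."""
    return text.lower().split()


def keyword_match(query: str, doc: str) -> bool:
    """Yes/no search: True only if every query word occurs verbatim in the document."""
    doc_words = set(tokens(doc))
    return all(word in doc_words for word in tokens(query))


def cosine_similarity(a: str, b: str) -> float:
    """Graded score in [0, 1] from word-count vectors; 0.0 if either text is empty."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


# Hypothetical query and documents, for illustration only.
query = "palm leaf manuscripts preservation"
docs = [
    "preservation of palm leaf manuscripts in india",
    "digitising old manuscripts written on palm leaf",
    "cricket scores today",
]
for doc in sorted(docs, key=lambda d: cosine_similarity(query, d), reverse=True):
    print(f"{cosine_similarity(query, doc):.2f}  exact={keyword_match(query, doc)}  {doc}")
```

The second document fails the exact key word test yet still ranks well above the unrelated one, which is the kind of graded behaviour content based retrieval is after.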
Technology Driven vision for The Digital Library
We can store everything
all the knowledge of the human race
in all forms
that is the Universal Digital Library
Cost of Selection is stationary but storage cost is plummeting
Education
Universal Library Vision
All recorded information online
instantly available
To Anyone
Anywhere in the world
In any language
searchable, browsable, navigable by humans and machines
Digital Library Contents
Books
Periodicals (journals, newspapers)
Art, photographs
Databases, software
Movies, video
Music, opera, dance
Suppose all of this were on the Web
Digital Library of the future
Digital library
Digital museum
Digital tour guide
Research assistant
Knowledge amplifier
Can we store all human knowledge in digital form?
There are about 100 Million books written by the human race
Multiply by 10 for all other forms of knowledge
1 book = 500 pp. = 1 MB uncompressed
10^9 books = 10^15 bytes = 1 petabyte
140 million computers on the Internet
At 20 GB free space each: about 2.8 exabytes now
1 GB of disk costs ~$1
1 petabyte < $1 million
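The storage estimate above is plain arithmetic and can be checked in a few lines. The sketch assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and a flat price of exactly $1 per GB; the variable names are illustrative.

```python
# Figures quoted on the slide.
BOOKS = 100_000_000            # about 100 million books
OTHER_FORMS_MULTIPLIER = 10    # multiply by 10 for all other forms of knowledge
BYTES_PER_BOOK = 10**6         # 1 book = 500 pp. = 1 MB uncompressed
COST_PER_GB_USD = 1.0          # 1 GB of disk costs about $1
COMPUTERS_ON_NET = 140_000_000
FREE_BYTES_PER_COMPUTER = 20 * 10**9  # 20 GB free space each

# Total size of "everything", in book-equivalents and in bytes.
total_items = BOOKS * OTHER_FORMS_MULTIPLIER
total_bytes = total_items * BYTES_PER_BOOK
cost_usd = total_bytes / 10**9 * COST_PER_GB_USD

# Free disk space already sitting on networked machines.
peer_bytes = COMPUTERS_ON_NET * FREE_BYTES_PER_COMPUTER

print(f"items: {total_items:.0e}, bytes: {total_bytes:.0e} ({total_bytes / 10**15:.0f} PB)")
print(f"disk cost at $1/GB: ${cost_usd:,.0f}")
print(f"free space on the net: {peer_bytes / 10**18:.1f} EB, "
      f"{peer_bytes // total_bytes}x the library")
```

At exactly $1 per GB a petabyte comes to about $1 million, so the cost drops below that as soon as the per-GB price falls under a dollar. The free space on networked machines works out to about 2.8 × 10^18 bytes, thousands of times the size of the whole library.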
Our Peta Byte server Initiative
Storage is not the limitation but creation and coordination are
Avoiding Duplication and connectivity are
Universal Digital Library
More than 120 million PCs on the net
Each having at least 20 GB of free space
Peer to peer Communication
Can we store all Human Knowledge in the computers?
Technology Driven Vision for the Universal Digital Library
A vision to store everything that the human race ever produced
Maharashtra Industrial Development Corporation, Maharashtra
University of Pune, Pune
Kanchi University, Kanchi, Tamil Nadu
Indian Institute of Astrophysics, Karnataka
Scanner Operation at Hubs
Progress of Various Centres in Scanning
Number of Pages Scanned
Category of Books
Cumulative Status
More Centres and Initiatives: already 61 scanners in operation, plus 39 in the pipeline
Rashtrapathi Bhavan
Punjab Technical University
IIIT Hyderabad and University of Hyderabad
MCIT’s Initiatives
Mobile Van with VSAT for the Book Mobile
ERNET providing connectivity to all centres
Many Centres supported with funds for computers and for scanning operations
Total spending from Government support and from Scanning Centre’s resources is ten times more than the Scanning equipment cost and effectively 100 times more
Support from all quarters of the government, religious leaders, academia and private agencies
Universal Digital Library of India to be launched
Some Observations and the Road ahead
More than 5 million pages have been scanned
The highest average rate of sustained scanning was about 4,000 pages per day at Hyderabad during February.
Our goal is to establish best practices to reach 6000 pages a day
3 years – 1 M Books
By 2020 – 20 Million Books, 2 Million Songs, 200,000 Movies
The most enviable content creation
Road Ahead
Establishing the Digital Library of India on the same lines as the E-Governance Initiative
Under the MCIT
Headquartered in AP
A think tank for content selection, delivery, technology and policy directions for the country
Creation of special funds for 4C
Criteria for Selecting Mega Centres: 5 of them planned
Geographical Distribution
Availability of contents of interest to larger user base
Local enthusiasm to support and sustain this activity
Budget of US$ 200,000 Initially and around 0.5 cent per page of output
One single scanner can produce 2 Million pages a year
We will have 300 scanners – a Million books a year
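The throughput target can be sanity-checked against the other figures in the talk. The sketch below assumes 500 pages per book (the figure used in the storage estimate), applies the 0.5 cent per page output cost to every scanned page, and uses the 6,000 pages a day goal; the variable names are illustrative.

```python
# Figures quoted in the talk.
PAGES_PER_SCANNER_PER_YEAR = 2_000_000
SCANNERS = 300
PAGES_PER_BOOK = 500           # assumption carried over from "1 book = 500 pp."
COST_PER_PAGE_CENTS = 0.5      # "around 0.5 cent per page of output"
TARGET_PAGES_PER_DAY = 6_000   # best-practice goal per scanner

# Yearly output of the whole fleet, in pages and in books.
pages_per_year = PAGES_PER_SCANNER_PER_YEAR * SCANNERS
books_per_year = pages_per_year // PAGES_PER_BOOK

# Output cost for one year, and the scanning days each machine needs.
cost_usd = pages_per_year * COST_PER_PAGE_CENTS / 100
days_needed = PAGES_PER_SCANNER_PER_YEAR / TARGET_PAGES_PER_DAY

print(f"{pages_per_year:,} pages/year = {books_per_year:,} books/year")
print(f"per-page output cost for one year: ${cost_usd:,.0f}")
print(f"scanning days per scanner at the target rate: {days_needed:.0f}")
```

Under these assumptions 300 scanners comfortably cover a million books a year, but each scanner has to sustain the 6,000 pages a day goal on about 333 days of the year to reach 2 million pages.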
Change in the nature of Information creation and use
Philosophy of Copyright Laws
Protect the Inventor so that private investments in R & D would flow
Disseminate the information so that society grows
Protect fair use
Ensure you get what you paid for
What can be copyrighted?
Must be tangible, e.g. a lecture can’t be copyrighted, a transcript of it can
Work must be original
Work must be creative - even minimal efforts usually count as creative
Fair use doctrine
Authorizes any person to make fair use of a published or unpublished copyrighted work (including the making of unauthorized copies) in these contexts:
In connection with criticism of or comment on the work
In the course of news reporting
For teaching purposes or
As part of scholarship or research activity
Four basic Factors:
The purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes
The nature of the copyrighted work
The amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
The effect of the use upon the potential market for or value of the copyrighted work
www.library.org principles
Scholarly and government information and knowledge is a public good
that should be available, maintaining the balance of the rights of the individual creator vs. the needs of the public
The Library is the intellectual crossroads of the community.
Librarians will conceptualize and ensure
implementation of innovative new systems
for the creation and dissemination of information for succeeding generations.
“This rule provides that the first sale of a copy of a work to a member of the public ‘exhausts’ the rights holder’s ability to control further distribution of that copy. A library is thus free to lend, or even rent or sell, its copies of books to patrons”
How does this work in the Digital World?
Music, Movie and Entertainment Industry
Much larger part of most of the economies
Large production costs
Need to protect business interest
Need for technology to protect
NAPSTER – peer to peer communication
DeCSS
NAPSTER for video?
Consumer is different from the creator
New paradigms in the Digital Library
Should the laws used for protecting commercially attractive enterprises such as patents, music, and entertainment be applied to the DL?
Unlike in music etc., the dissemination of information multiplies knowledge
Shorter life cycles for the information
Copyright: Conflicting requirements
Need to protect the financial interests of creators in order to encourage private investments in the economy
Need to create a framework for every human being to create
The 2nd principle should dominate in DL
The 1st principle should dominate the others
The Concept of FourC
The scientific community is the only one that is creator and consumer of information
Can we do it in scholarly communication, text books etc.?
The Concept of FourC
In the 20th Century, in the interest of public good, Governments created BBC, PBS, AIR and also the Public Library System, which provided compensation for artists and writers while providing free access to the public
Total Global Expenditure on public broadcasting and public libraries exceeds 100 B$
Look at our kings who supported all the poets and scholars
We need to find the 21st Century equivalent of BBC, AIR and PBS.
The Concept of FourC
Learn from NAPSTER: will we have a video equivalent of NAPSTER?
It is impossible to police and protect IP Rights at gigabit rate connections
Some countries and WIPO, under pressure from lobbying groups, frame draconian copyright laws
Remember the FAIR USE Doctrine- and what the creators want- recognition and compensation
The Solution -FourC
Consortium for Compensation of Creative Contents- FourC
Set aside 25% of the current national expenditure on public broadcasting and public libraries (PLs)
Authors are encouraged to put the work on the web after a few years of commercial exploitation; many models are possible, e.g. in return they get tax exemptions etc.
India showing the way IASc and INSA
Books out of print
Titanic effect
Authors can take back the copyright
The Solution -FourC
Authors’ compensation based on hits
Future versions of text books may be FAQs and XMLised
Many economic models
Can work for Courseware as well
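One of the many possible economic models is a simple proportional split of a compensation pool by hit counts. The sketch below is purely hypothetical: the talk does not specify a formula, and the function name `allocate_by_hits`, the pool size, and the hit counts are invented for illustration.

```python
def allocate_by_hits(pool_usd: float, hits: dict[str, int]) -> dict[str, float]:
    """Split a compensation pool among authors in proportion to their hit counts.

    Hypothetical model: the source only says compensation is based on hits.
    """
    total_hits = sum(hits.values())
    if total_hits == 0:
        # Nobody was read, so nothing is paid out.
        return {author: 0.0 for author in hits}
    return {author: pool_usd * count / total_hits for author, count in hits.items()}


# Invented example data: three authors sharing a pool of $10,000.
example_hits = {"author_a": 1_200, "author_b": 600, "author_c": 200}
for author, amount in allocate_by_hits(10_000.0, example_hits).items():
    print(f"{author}: ${amount:,.2f}")
```

The same split would apply unchanged to courseware modules, with hits counted per module instead of per book.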
The Solution -FourC
The changing trend in publications: we want the documents to be readable by machines as well as humans
Born digital documents
Can we compensate those who create contents for the web?
Can we compensate those who create music and movies for the web, for really small form factors and small screens?
Conclusion
Knowledge multiplies whenever bits are circulated on the web
Technology has a habit of creating a problem (by knowledge explosion) and spending the rest of its time trying to solve it, here through the Digital Library
The Universal Digital Library with 20 Million Books by 2020, the year by which our President dreams of India becoming a developed nation
A FourC Policy and a Digital Library Act are on the anvil in India to meet this mission
If a billion people sneeze together, we can create a hurricane
With the technology of the two nations we will convert this hurricane into useful energy and light up the world of knowledge
If you are creating a digital library, it should be for access by anyone, anytime and from any place
If Your Digital Library Is For Exclusive Use, Let Us Talk About Weather