Copyright search
In mid-August, Andrew Burt, chair of the epiracy committee at SFWA (the Science Fiction and Fantasy Writers of America), sent a notice to Scribd listing documents that SFWA alleged were being hosted in violation of copyright. Unfortunately the list was poorly researched and little more than a list of documents that contained the words “Asimov” or “Silverberg”. After a couple of email exchanges, Scribd removed the documents.
As a result, a number of legally published works were replaced with a page displaying “The document … has been removed from Scribd. This content has been removed at the request of copyright agent Science Fiction and Fantasy Writers of America.” One of the authors affected was Cory Doctorow, who wrote an article on Boing Boing denouncing SFWA’s abuse of the DMCA. SFWA has acknowledged the error and contacted the affected authors to apologize. As others weighed in, the issue seemed to spiral into a freedom vs. author’s rights battle, with neither Scribd nor SFWA looking good.
While reading about this, I was reminded of the issues some publishers and authors had with Google’s plans to build a full-text index of books, their main concern being that it was not a fair use of copyrighted material.
Put these two stories together, and I think there is an opportunity for Google to provide a valuable service that authors would support. By adding copyright information in a standard format to the book index, Google would let anyone check the copyright status of text it has indexed. Add an API that returns results in a machine-readable format, and sites such as Scribd could easily verify whether text is copyrighted, and under what terms, before publishing it.
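To make the idea a little more concrete, here is a rough sketch of how a hosting site might act on such a result. No such API exists as far as I know, so everything here is an assumption made up for illustration: the JSON layout, the field names (`matches`, `rights_holder`, `status`, `license`), the license strings, and the site's own `ALLOWED_LICENSES` policy. The sample response is hard-coded rather than fetched from anywhere.

```python
import json
from dataclasses import dataclass
from typing import List

# Hypothetical response body. This schema is invented for illustration;
# it is not a real Google API format.
SAMPLE_RESPONSE = """
{
  "matches": [
    {"title": "Example Novel", "rights_holder": "A. Author",
     "status": "copyrighted", "license": "all-rights-reserved"},
    {"title": "Example Story", "rights_holder": "B. Writer",
     "status": "copyrighted", "license": "CC-BY-NC-SA"}
  ]
}
"""

# Licenses this imaginary hosting site is willing to accept (an assumption).
ALLOWED_LICENSES = {"public-domain", "CC-BY", "CC-BY-NC-SA"}


@dataclass
class CopyrightMatch:
    title: str
    rights_holder: str
    status: str
    license: str


def parse_matches(body: str) -> List[CopyrightMatch]:
    """Turn the (hypothetical) JSON response into typed records."""
    data = json.loads(body)
    return [CopyrightMatch(**m) for m in data.get("matches", [])]


def blocking_matches(matches: List[CopyrightMatch]) -> List[CopyrightMatch]:
    """Return the matches whose license the site does not accept."""
    return [m for m in matches if m.license not in ALLOWED_LICENSES]


matches = parse_matches(SAMPLE_RESPONSE)
blocked = blocking_matches(matches)
for m in blocked:
    print(f"Hold for review: {m.title} ({m.rights_holder}, {m.license})")
```

The point of checking the terms rather than just the copyright status is the Scribd case above: a work can be copyrighted and still be legally shared, so a bare "is this copyrighted?" answer would flag exactly the documents that were wrongly removed.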
Google and the authors would each benefit from a copyright search. Google gets support from authors, and could potentially avoid scanning and verifying books by having authors or publishers submit the text for indexing. Its search results would likely also contain advertising for places to purchase copyrighted works. The authors get a single place to maintain copyright information that is easily referenced by others.
This does little to combat blatant piracy, but legitimate sites hosting content would have an easy way to verify copyright information.