Data-Mining Reveals That 80% of Books Published 1924-63 Never Had Their Copyrights Renewed and Are Now in the Public Domain

This content has been archived. It may no longer be accurate or relevant.

From BoingBoing:

[T]here’s another source of public domain works: until the 1976 Copyright Act, US works were not copyrighted unless they were registered, and then they quickly became public domain unless that registration was renewed. The problem has been to figure out which of these works were in the public domain, because the US Copyright Office’s records were not organized in a way that made it possible to easily cross-check a work with its registration and renewal.

. . . .

Enter the New York Public Library, which employed a group of people to encode all these records in XML, making them amenable to automated data-mining.

Now, Leonard Richardson has done the magic data-mining work to affirmatively determine which of the 1924-63 books are in the public domain, which turns out to be 80% of those books; what’s more, many of these books have already been scanned by the Hathi Trust (which uses a limitation in copyright to scan university library holdings for use by educational institutions, regardless of copyright status).

Link to the rest at BoingBoing and thanks to HM for the tip.

PG notes that BoingBoing has a less-than-sterling reputation for accuracy in reporting, but he thought the possibility that this item might be correct was interesting.
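For readers curious what "cross-checking a work with its registration and renewal" might look like once the records are machine-readable, here is a minimal Python sketch. It is not Richardson's actual method. It assumes hypothetical, already-parsed records with title, author, and year fields, matches them on a normalized title and author, and assumes a renewal counts only if it was filed roughly 27 to 28 years after the original registration, i.e. near the end of the 28-year first term under the 1909 Act. The names `Registration`, `Renewal`, `normalize`, and `find_unrenewed` are illustrative inventions.

```python
import re
from collections.abc import Iterable
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    # Hypothetical parsed registration record (not the NYPL XML schema).
    title: str
    author: str
    year: int  # year of the original registration


@dataclass(frozen=True)
class Renewal:
    # Hypothetical parsed renewal record.
    title: str
    author: str
    year: int  # year the renewal was filed


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace so small
    transcription differences between the two datasets still match."""
    text = re.sub(r"[^a-z0-9 ]", "", text.lower())
    return " ".join(text.split())


def find_unrenewed(registrations: Iterable[Registration],
                   renewals: Iterable[Renewal]) -> list[Registration]:
    """Return registrations with no matching renewal inside the assumed
    renewal window (27 to 28 years after the registration year)."""
    renewal_years: dict[tuple[str, str], list[int]] = {}
    for ren in renewals:
        key = (normalize(ren.title), normalize(ren.author))
        renewal_years.setdefault(key, []).append(ren.year)

    unrenewed = []
    for reg in registrations:
        key = (normalize(reg.title), normalize(reg.author))
        years = renewal_years.get(key, [])
        if not any(reg.year + 27 <= y <= reg.year + 28 for y in years):
            unrenewed.append(reg)
    return unrenewed


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    regs = [
        Registration("The Example Novel", "Doe, Jane", 1930),
        Registration("Another Book", "Roe, Richard", 1941),
    ]
    rens = [Renewal("The example novel.", "DOE, JANE", 1957)]
    print([r.title for r in find_unrenewed(regs, rens)])  # prints ['Another Book']
```

Exact-key matching like this will miss renewals filed under variant titles or author names, and a missing renewal alone doesn't settle a work's status: as the comments below point out, restored foreign works and earlier magazine serializations can complicate the picture.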

11 thoughts on “Data-Mining Reveals That 80% of Books Published 1924-63 Never Had Their Copyrights Renewed and Are Now in the Public Domain”

  1. There is a business model here: invest time and money in researching the hoard of copyright-free titles and ebook them.

    One outfit selling through Amazon seems to be doing it with classic SF from older magazines and novels: anthologies and single-author collections at indie prices. Name authors, too: Norton, Piper, and Harrison, among others.

    Halcyon Press, Ltd.

    https://www.amazon.com/s?k=halcyon+classics&i=digital-text&crid=2870QC6J21513&sprefix=Halcyon%2Cdigital-text%2C277&ref=nb_sb_ss_i_1_7

    SF is ripe for this because of its reverence for, and support of, vintage stories. It helps that, pre-1960, magazines were the core of the genre.

    • Felix, you’re right in general. Too bad the publishers who do this sort of thing (and Halcyon is neither the largest nor the best known) tend to be… lax in their copyright searches and hyperformal in what they’re publishing. (There’s a reason so many of them are in the Sixth Circuit: Michigan, Ohio, Kentucky, and Tennessee! One of them is even partially owned by a big library-services vendor of e-books.)

      Frequently, these republications are repackaged omnibus editions of serialized novels that lack the author’s corrections as they appeared in roughly contemporaneous book editions. Even the typos may well be preserved! This is because, under a hyperformal reading of what “registration” meant under the 1909 Act, only the exact form and format deposited at the Copyright Office, and identified on the registration certificate, was “in copyright.” This was a particular problem when a novel was serialized in, say, July, August, and September of year X and published in book form in December of year X, because the book would bear a copyright date of year X+1. To the uninitiated (or the ill-willed), that makes it look as if the registration was untimely as to the serialized version, and therefore as if the serialized version is in the public domain.

      It’s a problem. It’s defensible because the Sixth Circuit has never repudiated some musical-composition cases from the early 1960s, which were criticized at the time both as bad decisions and as limited to musical compositions in any event; that gives overreaching republications at least an arguable basis.

    • There are already companies selling poor OCR and PDF copies of public-domain items available on Project Gutenberg and elsewhere, including in print-on-demand format in both hardcover and softcover. It’s very much buyer beware when looking for old books in electronic format on Amazon. I’d happily pay for a typo-free .mobi version created from a scanned 1609 chivalric romance, but that’s not what I’d get.

      And as Petit points out, the public-domain SF, fantasy, and mystery works reprinted from the original pulps are often poorly edited. It’s worth paying extra for modern press editions of the works of H.P. Lovecraft and C.A. Smith that scholars have prepared from the original manuscripts rather than from the pulp appearances, as the stories were sometimes dumbed down for the pulps.

      • That assumes the stuff is available.

        A lot of magazine stuff from the 30s and 40s never made it to book form.
        And the better publishers of the stuff do clean it up.
        Most of the quick-buck guys are gone by now, too. Remember, Amazon did a sweep ages ago, in 2011.

        Most of the remaining PD vendors are value-add folks like Delphi, and they adhere to Amazon’s guidelines:

        https://kdp.amazon.com/en_US/help/topic/G200743940

        • Felix, I wish that I had been referring to value-added folks like Delphi; I was not. Delphi, in particular, concentrates on either long-out-of-copyright material (such as a decent edition of the works of “George Eliot”) or works originating outside the US, so it’s really not a good example of what Doctorow’s article was discussing.

          The first letter in that publisher’s name is correct, though, and that company is generating lotsandlotsandLOTS of complaints for its sloppiness from the community of librarians, or at least the part of that community that actually reads e-books purchased for libraries through consortia. Unfortunately, because that company is affiliated with the 500kg gorilla of library electronic editions, there’s often little or no alternative.

          And as far as works not making it to book form: a surprising proportion did, albeit from small presses and/or two to five years after serialized publication in the pulps. Another subset (especially shorter works that hadn’t been serialized) appeared in the authors’ own collected-works editions, again with corrections. I recall a couple of 1930s authors whose works from the pulps were republished, and properly registered, this way (perhaps Weinbaum was one of them?).

          In any event, this is a different dataset from the one Doctorow’s article appeared to rely upon. That, unfortunately, is one of the favorite tricks of the IWTBF (information wants to be free) crowd: choose a dataset that allows a particular conclusion, and then generalize that conclusion to datasets that the chosen dataset does not represent well.

      • > sometimes dumbed down for the pulps.

        And sometimes the pulp version was dumbed down for the novel. Or at least chopped down to some arbitrary shorter length.

        I’ve been spending a lot of time in the SF magazine collection at archive.org, sometimes comparing their OCR’d texts to some of my ebook versions, ripped down to plain ASCII. I was surprised to see that perhaps one in four of the pulp versions was longer than the novel version; Heinlein’s “The Moon Is a Harsh Mistress” is a good example.

        • I was reading Analog in the 70s when it serialised new novels before they were published as books. The serialised stories were often different from the book versions, and often longer. I doubt material was added for the magazine.

          • Remember, that was back in the days before computerized castoffs, when even “electronic typesetting” was still a by-hand process… and the rigid physical requirements of the presses used to print mass-market paperbacks often dictated substantial cuts to avoid adding another 32-page signature.

            Talk about form determining function…

  2. The problem here is that they didn’t take the next step and ask why, because the answer isn’t consistent with the general meme at BoingBoing or with the ideological preference of its columnists. (Here, Mr Doctorow is usually much more willing than many of his colleagues to recognize that not everyone shares his views.)

    For many of these works (and as will become apparent, it’s impossible to determine the proportion), the failure to initially register is a breach of the publishing contract by the publisher, which is usually no longer in business or buried under so many layers of acquisition that even if the rights have been fully returned, it’s impossible to determine who breached. And that ignores both laches and the statute of limitations on contract actions. It’s a relic of the 1909 Act’s indivisibility of copyright… because at least at the moment of publication, the publisher had to own the copyright. (Remember those “return the copyright in 90 days” provisions in contracts from that era? Same issue… with the added bonus that under the Copyright Office’s administrative procedures up to the late 1950s, registration could be refused if applied for more than 90 days after publication!)

    So the point is not, as both the article and prevalent IWTBF memes imply, that authors didn’t value their works, with their failure to register as the proof. Instead, it was economically infeasible, or actually impossible, to police the parties who had an often-contractual obligation to perform. (N.B. One of the major publishers in Mr Doctorow’s field was notoriously awful about this, applying for less than 20% of the registrations it “should have” until it was fully assimilated into Viking Penguin, and, conversely, putting out its own copyright-infringing edition of The Lord of the Rings in the 1960s because more than 50 copies had been imported fully bound, thus forfeiting the copyright under the 1909 Act. But that’s just irony.)

    What this actually means, therefore, is a matter for debate, not for ideological polemic.

  3. Project Gutenberg did its own analysis a few years ago. They did try to exclude non-US works, though I don’t know how successfully. I think they found that about 70% of novels weren’t renewed. The gotcha was that they didn’t try to exclude works first published as serials in magazines, which catches a surprising number of items.

  4. I wonder whether they made any attempt to exclude foreign works. As I understand it, the Uruguay Round Agreements Act restored copyright for works that were still under copyright in their source country (save for the simultaneous-publication exceptions), even if they had never been copyrighted in the US or had never been renewed.

    Determining what is covered by copyright is a mess, particularly in the USA.
