copyright

The Verge
That’s because there is no actual precedent for saying that scraping data to train an AI is fair use; all of these companies are relying on ancient internet law cases that allowed search engines and social media platforms to exist in the first place. It’s messy, and it feels like all of those decisions are up for grabs in what promises to be a decade of litigation.
The current round of language and image model speculation is based on the premise that using any public data for training is fair use, not a massive copyright violation.
Washington Post
The Post’s analysis suggests more legal challenges may be on the way: The copyright symbol — which denotes a work registered as intellectual property — appears more than 200 million times in the C4 data set.
This humble website is included in the C4 corpus. You can use this tool to see if your copyright has also been violated.
si.edu
"…where you can download, share, and reuse millions of the Smithsonian’s images—right now, without asking. With new platforms and tools, you have easier access to more than 4.4 million 2D and 3D digital items from our collections—with many more to come."
Fun collection to browse through. I can even post this image of a Bell X-1 cockpit without attribution!

View of flight instrument panel in the cockpit of the Bell X-1
stablediffusionlitigation.com
"At minimum, Stable Diffusion’s ability to flood the market with an essentially unlimited number of infringing images will inflict permanent damage on the market for art and artists."
Describing image models as sophisticated collage tools takes some of the mystery out of AI and makes it clear that work is being used without consent. This essay has a clear description of the diffusion process.
Flickr Blog
"What a strange, unexpected delight to be asked to return with the express goal of researching what the Commons has become and understanding how cultural institutions around the world have evolved through being a part of it. We want to design a stronger future for the program, with enduring longevity at its heart."
Great to hear this! The new Flickr owners are investing in its Flickr Commons program.
crummy.com
Most books published before 1964 are in the public domain, even though copyright was extended by default to cover works published after 1923. The catch is that books from that era had to have their copyrights renewed, and most never were. This article explains things well. Here's another take with more background: Where to Download the Millions of Free eBooks that Secretly Entered the Public Domain.

Almost completely unrelated: I enjoyed Top 5 bits of advice for first-time readers of Moby-Dick, which I found via Austin Kleon, though now I can't find a direct link to that mention. Moby-Dick is out of copyright, so it's easy to track down. Har har.
law.duke.edu
Some art from 1923 is finally entering the US public domain after a 20-year extension passed by Congress in 1998. In addition to a partial list of works here, check out the What Could Have Been section to feel the impact of that 1998 decision. They also have a good page about Why the Public Domain Matters.
  • This is Andy Baio's annual round-up of online movie piracy. This year: HD video makes leaked screeners irrelevant. "Already, with a month to go before the ceremony, 89% of this year’s nominated films have already leaked in high quality online, more than last year."
  • I had no idea that Oregon tried to keep minorities out in its early years. It's a depressing article but it's important to understand our history.
  • "Remix culture is the new Prohibition, with massive media companies as the lone voices calling for temperance. You can criminalize commonplace activities from law-abiding people, but eventually, something has to give."
  • "Like a service? Make them charge you or show you ads. If they won't do it, clone them and do it yourself. Soon you'll be the only game in town!" This is both absolutely true and heartbreaking.