The good news for search engines like Google is that a proposed German copyright law won't require them to pay to show short summaries of news content. However, uncertainty remains about how much might be "too much" and require a license. The new law is expected to pass on Friday. Der Spiegel explains more about the change: Google will still be permitted to use "snippets" of content from publishers' web sites in its search results.… What the new draft does not stipulate, however, is the precise definition of the length permitted.
The draft bill introducing an ancillary copyright for press publishers in Germany (Leistungsschutzrecht or LSR) goes to a final vote at 10 a.m. German time on Friday. Below is my background report on the hearings that took place this week, which in part led to the snippets change.
Besides the procedural and constitutional objections to the Leistungsschutz bill, there are also a number of technical and political ones. Critics (and there are plenty of them) are concerned that the collateral damage from this change in copyright will hurt search engines, innovation in general and smaller press publishers in particular. They point to ambiguous language in the bill that will cause legal uncertainty and lawsuits that will take years to settle.
The German government and supporters of the bill have done little to address these objections. On Saturday, I published an advance copy of the government's answers to a letter of inquiry from the opposition Left Party. The answers show a consistent pattern: open questions are either referred to the courts for settlement or simply ignored.
One of the last opportunities to discuss the mechanisms of this ancillary right in parliament was a 90-minute expert hearing on Wednesday before the subcommittee on New Media (Unterausschuss Neue Medien, UANM) of the German Parliament.
Public invitations for this hearing were sent out only a couple of days in advance, after two weeks of behind-the-scenes negotiations between the governing factions in parliament (Christian Democrats (CDU/CSU) and Liberal Democrats (FDP)) and the opposition factions (Social Democrats, Left Party and Green Party).
CDU/CSU and FDP had previously refused to schedule another hearing in addition to the judiciary committee hearing in January, arguing that all questions could be addressed there. As it turned out, a couple of technical questions could not be addressed, because none of the experts invited to the judiciary committee hearing were experts in technology. How could anyone have known that there are at least two kinds of experts out there!
The invited experts were:
Dr. Wieland Holfelder, engineer at Google (the committee members agreed by consensus that he could pass non-technical questions to Google's legal counsel Arnd Haller, who was sitting behind him)
Dr. Thomas Höppner, representative of the press publishers' association BDZV
Prof. Dirk Lewandowski, University of Applied Sciences, Hamburg
Michael Steidl, International Press Telecommunications Council (IPTC), London
Two experts were invited by the majority factions (Höppner and Steidl), and two were invited by the opposition (Holfelder and Lewandowski). The hearing followed the usual procedure: there were three rounds of questions from members of parliament, with each faction asking either two questions of one expert or one question of each of two experts. There was no opportunity for introductory statements by the experts and no strictly enforced time limit on answers.
So an expert may only speak when a member of parliament puts a question to him. An expert is not allowed to ask questions or to rebut other experts directly. This leads to a strategy in which each side gives softball questions to its own experts and potentially compromising questions to the experts from the other side. At many hearings it must be assumed that questions were exchanged before the meeting and that there is some expectation of what the answer will be. This is especially true for partisan experts whose employers directly benefit or suffer from the outcome of the legislative process.
Some of the softball questions gave the experts the opportunity to explain how robots.txt works (Holfelder) or to explain the shortcomings of robots.txt (Steidl and Höppner).
Holfelder introduced himself as an engineer who had implemented his own web crawler 14 years ago. He distributed printouts of robots.txt examples and the resulting snippets on the search engine results pages. He explained the additional meta tags that Google uses to add content to or remove content from the Google index (or that of any other leading search engine). To some extent, his presentation felt both verbose and strangely elementary. In an ideal world, none of this information would have been new to a subcommittee that focuses specifically on such topics.
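For readers who have not seen such printouts, here is a minimal Python sketch of how a crawler might combine the two opt-out layers Holfelder described. robots.txt decides whether a URL may be fetched at all. A robots meta tag on the page then decides whether the page is indexed and whether a snippet is shown. `noindex` and `nosnippet` are real directives documented by Google. The crawler name `ExampleBot`, the sample robots.txt and the `snippet_policy` helper are hypothetical, and real search engines apply far more rules than this.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve (illustrative only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow".
        for token in (attributes.get("content") or "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def snippet_policy(robots_txt, url, html, agent="ExampleBot"):
    """Hypothetical helper: decide how a page may appear in search results."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    # Layer 1: robots.txt controls whether the crawler may fetch the URL.
    if not rules.can_fetch(agent, url):
        return "not crawled"
    # Layer 2: page-level meta tags control indexing and snippets.
    meta = RobotsMetaParser()
    meta.feed(html)
    meta.close()
    if "noindex" in meta.directives:
        return "not indexed"
    if "nosnippet" in meta.directives:
        return "indexed, no snippet"
    return "indexed with snippet"
```

For example, `snippet_policy(ROBOTS_TXT, "https://example.com/news/story.html", '<meta name="robots" content="nosnippet">')` returns `"indexed, no snippet"`. Under this sketch, a publisher can already refuse snippets page by page without any new law.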
Petra Sitte (Left Party) had asked Holfelder to comment on ACAP, a protocol that was proposed by a few publishers and has failed to gain any meaningful acceptance in the market. Holfelder gave a few examples of how implementing ACAP would invite abuse by spammers, since it mandates how publisher-provided descriptions have to be displayed.
Konstantin von Notz (Green Party) asked Holfelder whether a search engine provider could detect whether specific content on a web site is covered by the LSR. In my opinion, this is one of the most important questions about this bill, because it highlights the potential for huge collateral damage and legal uncertainty over the coming years.
The ancillary copyright is awarded to a press publisher (a press publisher is defined as anyone who does what the press usually does) for his press product (a product of what a press publisher usually does). It exists alongside the copyright of the author, who can license his or her content to anyone else. This means that the text itself does not determine whether content is covered by the LSR.
Here is an example: a freelance journalist maintains a personal web site to advertise his services. He publishes half a dozen of his articles there to show potential clients his journalistic skills. These articles are of course protected by copyright. They will not, however, be covered by the ancillary copyright, because he is not a press publisher. The very same texts on a magazine's web site will be covered by the LSR. How can a search engine determine whether text on a web site is subject to both copyright *and* the LSR?
Holfelder replied that Google has a couple of heuristics to determine whether a given page is provided by a press publisher. However, the law makes no provision for "honest mistakes". If Google fails to detect LSR content and has not received prior permission to index it, Google faces legal consequences. There is no such thing as a "warning shot", nor any obligation for a press publisher to proactively inform a search engine that it considers a certain page to be covered by the LSR. This is the legal equivalent of a minefield.
Holfelder stated that a search engine would in this scenario tend towards overblocking in order to avoid a lawsuit for violating the LSR.
Höppner, the press publishers' expert, spent his time mocking a comparison involving taxis and restaurants that critics have made about this bill. He then argued that services such as Google News substitute for visiting the original pages, with some rambling about a Google service called "Google Knowledge". It was hard to tell whether he meant the failed Google Knol project or the Knowledge Graph in standard Google search.
His main argument on robots.txt was a passive-aggressive one. Publishers do not like robots.txt per se; they merely use it to fight for the last crumbs that search behemoths like Google have left them. In other words, if a press publisher provides meta description text or Twitter cards, this should not be seen as any kind of agreement that a search engine may actually use this text to build snippets. I seriously doubt that this position would hold up in court, or that it reflects the actual motivation of press publishers.
Prof. Lewandowski's contribution to the hearing was an interesting one, as he is the first expert in a long time who does not seem to have an agenda with respect to the LSR. His views were balanced and nuanced, highlighting both the broad acceptance of robots.txt and some of its shortcomings. He pointed out that, at least for Google News, the limited number of sources and the opt-in mechanism (yes, it's more complicated than that) would allow such a service to keep running in an LSR world.
Steidl used his time to explain IPTC's contributions to the world of standards and to mention the RightsML project, which is in active development. He criticised robots.txt for lacking a governing organisation and for failing to express rights at a sub-article level.
Neither Google nor the press publishers were very eager to present actual numbers on Google News usage or on how many visitors are directed to third-party web sites.
In round two, Google's legal counsel Haller was asked how Google would react to the bill if it were enacted. He replied that Google does not know the final version of the bill and has not yet decided how to implement it. He pointed out that his company would have to deal not only with publishers from Germany but with publishers from the entire European Economic Area, who could exercise their own LSR rights against Google.