OUR WORLD - Perspectives on Artificial Intelligence

PERSPECTIVES ON ARTIFICIAL INTELLIGENCE

an additional, more complex task, such as labelling objects in the photographs. Again, the value of additional data for the task of labelling objects flattens out past a certain threshold. Once again, as the volume of data continues to grow, the algorithm continues to learn. At some point, it can perform an additional, even more difficult task, such as understanding the nature of the actions in the photographs. Therefore, the value of data grows to the extent that harder problems need more data.

Weyl and Posner’s argument is, implicitly, that data exhibits economies of scope. As Figure 3 illustrates, we end up with a picture that challenges Varian’s decreasing returns to scale hypothesis. Weyl and Posner note that the value of data may not increase forever: we could see a future where ML has “learned everything”. But until then, they claim, returns to the volume of data are increasing.

Figure 3. Value of data as a function of the number of observations in a typical ML
Source: Weyl and Posner (2018)

Data as a barrier to entry

Economic theory teaches us that, where there are increasing returns to scale, monopolies naturally emerge, the largest firm being the most economically efficient.[19] Likewise, where they materialise, increasing returns to scale to data could be expected to work to concentrate economic gains in the hands of data-rich firms.

Monopolisation, however, is not necessarily synonymous with monopolistic behaviour, i.e. supra-competitive pricing, reduced quality, and/or hampered innovation. So long as the market leaders are challenged by the prospect of competitive entry by other firms, monopolised markets can be competitive. Increasing returns to scale and scope for data, however, might work in some cases to entrench the incumbent’s position by creating very high barriers to entry for prospective entrants. This could be a problem in markets where there is no substitute for a dataset that is essential, i.e. where there is only one dataset and it cannot be replicated, dispensed or purchased.[20] As US FTC Commissioner Terrell McSweeny has noted, “it may be that an incumbent has significant advantages over new entrants when a firm has a database that would be difficult, costly, or time-consuming for a new firm to match or replicate.”[21]

Incumbency advantage could be particularly pronounced in the digital space. Leading ML users were once leading data collectors, but increasingly are leading ML providers. Amazon, for instance, originally sold books as a way to gather personal data on affluent, educated shoppers (Ezrachi et al. 2016). It is now a leading provider of ML services on the cloud.

‘Data scarcity’ can seem oxymoronic. The world overflows with information, right up to toilet seats, the new data hotbed. The volume of data created and copied each year is expected to reach 44 × 10^21 bytes in 2020, 40 times more bytes than there are stars in the observable universe.[22] Yet the vast amounts of data collected online form unique sets of behavioural data that may be harder and harder to replicate in light of the self-reinforcing dynamics described above.

Why not let prospective entrants buy the necessary data in one of the many dedicated marketplaces (such as datapace.io)? A concern is that data incumbents have strong reasons not to sell their data: increasing economic returns to data tend to create perverse incentives for firms to establish a data advantage and erect barriers to entry thereafter (Cockburn et al., 2018). Incumbents might prefer hoarding the data they collect in order to, first, gain an advantage over their competitors and, later, curtail market entry (Jones et al., 2018).

Two factors could tend to compound these challenges. First, as previously noted, data-rich firms could benefit not only from economies of scale, but also from economies of scope.
Data acquired for a particular purpose may be valuable in other contexts, granting incumbent firms an advantage over new entrants in adjacent markets (Goldfarb et al., 2018). For instance, data collected in the context of search queries can be used to inform a shopping-recommendation algorithm. Economies of scope could leave little room for potential entrants looking to grow outside the incumbents’ market segments. Second, even if a firm successfully enters a new ML application market, incumbents may be in a position to use their rich data to detect the competitive threat, and acquire the new entrant firm before the incumbent’s position is challenged (so-called ‘killer acquisitions’).

Implications for competition policy[23]

Absent intervention, a possible market outcome could be high concentration and low contestability in data-reliant markets. This implies the need for competition policy scrutiny around data access. Should unique and non-substitutable datasets be considered an ‘essential facility’, on par with local loops for fixed telephony? Forced sharing can create inefficiencies, e.g. in the form of investment disincentives, but is well-established EU policy in many of the network industries. Under what conditions would the benefits of forced sharing outweigh the costs?

The emergence of ML as a general-purpose technology raises difficult empirical and normative questions. Does the relationship between data accumulation and economic returns give data-rich incumbents a significant and self-reinforcing advantage? Are competition authorities equipped to discern and analyse data-driven monopolistic returns?

These questions are high on the new European Commission’s agenda,[24] and for good reasons. Monopolistic behaviour by ML providers could slow the adoption of technology critical for EU competitiveness, especially hitting those smaller firms that lack the knowledge and resources to build alternative capacity in-house. If technological revolutions are distributional earthquakes, competition authorities should work to ensure that everyone lands on their feet.

[1] This post focuses on ML applications.
[2] See https://medium.economist.com/will-big-data-create-a-new-untouchable-business-elite-8dc23bcaa7cb
[3] DG COMP 2019, citing https://medium.com/machine-intelligence-report/data-not-algorithms-is-key-to-machine-learning-success-69c6c4b79f33, https://www.edge.org/response-detail/26587, and http://www.spacemachine.net/views/2016/3/datasets-over-algorithms.
[4] While this post focuses on issues pertaining to the volume of data, other characteristics of data are just as important for generating value. These include the other so-called ‘4Vs’ of data: volume, but also velocity (i.e. frequency), variety (e.g. administrative data, social media data, pictures, etc.), and veracity (i.e. representative of the target population, free of bias, etc.). A competitive advantage in these other characteristics can also generate important economic benefits.
For the purpose of this blog, I note that securing a sufficient volume of data appears to be necessary but not sufficient for a competitive AI/ML business.
[5] Varian (2018) and Bajari et al. (2018).
[6] In particular, Bajari et al. (2018) find that the length of histories is robustly helpful in improving the demand forecast quality, but at a diminishing rate; whereas the number of products in the same category is not (with a few exceptions where it exhibits diminishing returns to scale).
[7] Agrawal, Ajay, Joshua Gans, and Avi Goldfarb. 2018a. Prediction Machines: The Simple Economics of Artificial Intelligence. Cambridge, MA: Harvard Business Review Press.
[8] As related in Goldfarb et al. (2018).
[9] See the deal’s press release: https://news.microsoft.com/2009/07/29/microsoft-yahoo-change-search-landscape/
[10] See https://www.cnet.com/news/googles-varian-search-scale-is-bogus/
[11] See Hal Varian in a CNET interview: “the scale arguments are pretty bogus in our view” (https://www.cnet.com/news/googles-varian-search-scale-is-bogus/)
[12] “the amount of traffic that Yahoo, say, has now is about what Google had two years ago” and “when we do improvements at Google, everything we do essentially is tested on a 1 percent or 0.5 percent experiment to see whether it’s really offering an improvement. So, if you’re half the size, well, you run a 2 percent experiment.” Source: ibid.
[13] i.e. in performing tasks such as crawling, indexing, or ranking.
[14] The European Commission’s DG COMP made similar claims in the context of the Google Shopping case. DG COMP claimed that a general search service has to receive at least a certain minimum volume of queries in order to improve the relevance of its results for uncommon queries because users evaluate the relevance of a general search service on the basis of both common and uncommon queries. See para. 288 of the EC decision (https://ec.europa.eu/competition/antitrust/cases/dec_docs/39740/39740_14996_3.pdf)
[15] Cockburn et al. (2019).
[16] A podcast episode from the Economist brings this point to life (https://www.economist.com/podcasts/2019/10/09/the-promise-and-peril-of-ai)
[17] See Schaefler et al. (2018): “In perhaps no other market has the question of the role of data stirred such a vivid discussion among industry participants, academic experts, and policy advocates than in general internet search.”
[18] Glen Weyl and Eric Posner, 2018. Radical Markets.
[19] https://cs.stanford.edu/people/eroberts/cs181/projects/1997-98/microsoft-vs-doj/economics/returns.html
[20] See Calvano et al. (2020) for a survey of the literature around these issues in digital markets.
[21] Commissioner Terrell McSweeny, Opening Remarks for a Panel Discussion, “Why Regulate Online Platforms?: Transparency, Fairness, Competition, or Innovation?” at the CRA Conference in Brussels, Belgium, at 5 (Dec. 9, 2015), https://www.ftc.gov/system/files/documents/public_statements/903953/mcsweeny_-_cra_conference_remarks_9-12-15.pdf.
[22] Includes data generated online and by IoT and connected devices. Source: World Economic Forum citing Raconteur (https://www.weforum.org/agenda/2019/04/how-much-data-is-generated-each-day-cf4bddf29f/)
[23] Note that a range of issues lie at the intersection of privacy and competition, including data ownership, reuse, transparency, and sharing. These issues are beyond the scope of this post and will not be explored here.
[24] See the mission statement of European Commission President Ursula von der Leyen, which instructs Margrethe Vestager: “In the first 100 days of our mandate, you will coordinate the work on a European approach on artificial intelligence, including its human and ethical implications. This should also look at how we can use and share non-personalised big data to develop new technologies and business models that create wealth for our societies and our businesses.” (https://ec.europa.eu/commission/sites/beta-political/files/mission-letter-margrethe-vestager_2019_en.pdf)

Anderson, J. (2020), ‘The dynamics of data accumulation’, Bruegel Blog, 11 February. https://bruegel.org/2020/02/the-dynamics-of-data-accumulation/

OUR WORLD | January 2021