Past Yonder

A human's thoughts on AI


  • Bleeding the grid dry: AI’s insatiable appetite for electricity

    If you ask ChatGPT to explain the impact AI is having on the electric grid, it cheerfully (and without a hint of irony) explains all the ways AI consumes vast amounts of power. Training models? Yup, that takes a few electrons. Answering questions – such as how much electricity is used to answer a question? Add even more electrons to the tab.

    AI data centers are using ever-increasing amounts of power as the demand for AI services grows.

    It’s been well publicized that tech companies are investing huge sums of money in hardware to support their burgeoning AI efforts. They are buying every Graphics Processing Unit (GPU) they can get their hands on, as GPUs excel at the parallel computations necessary for modern AI. Those GPUs are housed in massive data centers, where they devour electricity, generating endless heat in the process. Dealing with that heat requires robust air conditioning, which only increases the electricity bill further.

    And, of course, the human engineers programming these systems probably consume a lot of coffee to stay energized, and that takes some energy too.

    In February 2024, The Verge published a story pondering the electricity consumption needs of modern AI systems such as Large Language Models (LLMs). Although the closed nature of many AI systems makes it difficult to estimate their exact energy use (the companies that make them aren’t volunteering any numbers), author James Vincent tried to put forth some estimates on both the training and query-answering sides.

    Training, in particular, is extremely energy intensive, consuming much more electricity than traditional data center activities. Training a large language model like GPT-3, for example, is estimated to use just under 1,300 megawatt hours (MWh) of electricity, about as much power as 130 US homes consume in a year. To put that in context, streaming an hour of Netflix requires around 0.8 kWh (0.0008 MWh) of electricity. That means you’d have to watch roughly 1,625,000 hours of Netflix to consume the same amount of power it takes to train GPT-3.
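
    For the arithmetically inclined, the comparison is easy to reproduce. The numbers below are the published estimates cited above, not official figures, and the 10,000 kWh-per-home figure is the rough annual average the comparison implies.

    ```python
    # Back-of-the-envelope check on the estimates cited above. These are
    # published estimates, not official figures from OpenAI or Netflix.
    gpt3_training_mwh = 1_300        # estimated energy to train GPT-3, in MWh
    netflix_hour_kwh = 0.8           # estimated energy to stream one hour of Netflix, in kWh
    us_home_annual_kwh = 10_000      # rough annual electricity use of a US home, in kWh

    gpt3_training_kwh = gpt3_training_mwh * 1_000
    print(gpt3_training_kwh / netflix_hour_kwh)    # ~1,625,000 hours of Netflix
    print(gpt3_training_kwh / us_home_annual_kwh)  # ~130 US homes for a year
    ```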

    Once a model has been trained, it takes much less power to run a query against it, although at scale, that quickly adds up.

    Vincent references a study performed by researchers Alexandra Sasha Luccioni, Yacine Jernite, and Emma Strubell that examined the possible energy requirements of common AI tasks such as answering questions or generating images. Vincent writes:

    Luccioni and her colleagues ran tests on 88 different models spanning a range of use cases, from answering questions to identifying objects and generating images. In each case, they ran the task 1,000 times and estimated the energy cost. Most tasks they tested use a small amount of energy, like 0.002 kWh to classify written samples and 0.047 kWh to generate text. If we use our hour of Netflix streaming as a comparison, these are equivalent to the energy consumed watching nine seconds or 3.5 minutes, respectively. (Remember: that’s the cost to perform each task 1,000 times.) The figures were notably larger for image-generation models, which used on average 2.907 kWh per 1,000 inferences. As the paper notes, the average smartphone uses 0.012 kWh to charge — so generating one image using AI can use almost as much energy as charging your smartphone.

    Vincent cautions that there still can be significant variance in these estimates based on the actual implementation of AI systems.

    But the trendlines are clear: the amount of electricity consumed by AI is going to grow significantly in the coming years. According to the Boston Consulting Group, which provides AI-related consulting services, AI data centers could consume up to 7.5% of total US electricity by 2030, tripling their energy usage from just a couple of years ago.

    That’s going to put a great deal of stress on electric generation, the electric grid, and the environment. The Editors of Bloomberg Businessweek recently acknowledged the problem in an opinion piece:

    It’s easy to envision how things could go wrong. In the US, power-hungry AI applications are already adding to strains on electricity grids and pushing utilities to burn more fossil fuels. In Ireland, a global computing hub, data centers are expected to consume nearly a third of all electricity by 2032. Cue a vignette of people unwittingly boiling the oceans in pursuit of the perfect dog portrait.

    But their prognosis isn’t all doom and gloom; they see tech companies as possibly being in the best position to address the problem:

    There’s also a more positive scenario. The users and owners of these data centers — including Alphabet Inc., Amazon.com Inc., Meta Platforms Inc. and Microsoft Corp. — are among the world’s largest companies, with ample cash, long strategic horizons and public commitments to the environment. Who better to drive some of the tens of trillions of dollars in investment required to build clean generation, enhance power grids and achieve net-zero carbon emissions?

    There could also be breakthroughs in the amount of processing computers can perform per watt of power; indeed, there have been significant advancements in the energy efficiency of modern microprocessors. And more efficient algorithms could yield the same results with decreased power demands.

    But at least in the near term, it looks likely that AI-related processing is going to be placing increasing amounts of stress on the electric grid.

  • The latest parlor betting game: when will AI kill all humans?

    In Mary Shelley’s 1818 novel Frankenstein, Victor Frankenstein, a talented but troubled scientist, becomes consumed by the secrets of life and death and creates a living being from assembled body parts. His creation eventually turns on him.

    Since that novel, the theme of technology betraying its creators has been revisited time and time again in novels and movies such as Neuromancer, The Matrix, and of course the Terminator franchise.

    These plots have sat firmly in the land of science fiction, but more and more experts are warning that recent advances in AI could shift them from science fiction to science fact. They just can’t agree on when that might happen.

    AI ponders the fate of humanity.

    Earlier this year, New Scientist described a survey of 2,700 AI researchers who had published work at six of the top AI conferences. In the survey, the experts were asked to share their thoughts on possible AI technological milestones, including whether AI might, you know, kill us all.

    Almost 58 per cent of researchers said they considered that there is a 5 per cent chance of human extinction or other extremely bad AI-related outcomes.

    The story points out that AI experts don’t have a great track record of forecasting future AI developments, which could provide comfort or not, depending on whether you think their prediction is overestimating or underestimating the inherent risks of AI.

    Elon Musk, CEO of Tesla and owner of that social networking site that was previously known by a different name, has weighed in with his own thoughts.

    In a 2023 interview with CNBC, Musk presented a good-news-bad-news scenario.

    There’s a strong probability that (AI) will make life much better and that we’ll have an age of abundance. And there’s some chance that it goes wrong and destroys humanity.

    Musk has called for increased regulation to help temper the risks of AI.

    While AI systems are not yet as smart as humans (at least in a generalized sense), they’ve made huge inroads in recent years, and experts are beginning to predict that the point of singularity – when AI surpasses human intelligence and abilities – might not be far away.

    Inventor and futurist Ray Kurzweil recently made a prediction on Joe Rogan’s podcast that AI will achieve human-level intelligence by 2029, but in a post to X, Musk claimed it will happen in 2025.
    “AI will probably be smarter than any single human next year,” Musk predicted. “By 2029, AI is probably smarter than all humans combined.”

    When that point of singularity is reached, we should be concerned, at least if we trust the gut instincts of science fiction authors.

    The fear is that AI will turn on its creators, viewing them as either inferior and useless, or a threat to its existence.

    The Terminator foreshadowed the risks of creating too-smart autonomous weapon systems. But the perils of AI might be less dramatic than literal warfare between humans and machines. The Pixar movie WALL-E showcased what might happen when humans become so reliant on technology that they lose the ability to walk and become completely dependent on automated systems. The antagonist in the movie, an autopilot AI, attempts to keep humans in this dependent state.

    Michael Garrett, a researcher at the Department of Physics and Astronomy at the University of Manchester, recently presented an interesting theory in a paper titled Is artificial intelligence the great filter that makes advanced technical civilisations rare in the universe? In it, he notes that astronomers have been puzzled that, over the past 60 years, they have never detected potential “technosignatures” of extraterrestrial life in astronomical data. Surely, the thinking goes, if other intelligent lifeforms have existed in the Universe – which many scientists believe is highly likely – we would have detected them by now.

    This discrepancy, known as the Fermi paradox, has spawned many theories, including that the lifetime of intelligent civilizations is too short (in cosmic time) to detect, or that perhaps life in the Universe truly is a rare event.

    But Garrett presents another theory: that biological civilizations universally underestimate the speed at which AI systems progress. In other words, he suggests there is an inevitability that any intelligent lifeform will create an Artificial Superintelligence (ASI), and that development will ultimately be their undoing. And that might be why we haven’t seen signs of extraterrestrial life.

    Garrett goes further and suggests that the longevity of technical civilizations (on Earth or elsewhere) is less than 200 years.

    Which, really, is just a blink of an eye.

    Experts disagree on whether AI will present an existential threat to humanity and, if so, on what timeframe; there is even less consensus about what we can do about it. Musk and others have suggested government regulation might be the answer, but this is a global risk, and non-technical politicians don’t have a great track record of keeping pace with rapidly evolving technology, let alone understanding its full ramifications.

  • Adversarial AI: Can the training be tricked?

    Modern AI systems continue to wow us with their ability to produce creative works of text, choose the right exposure for a photograph, or keep our car centered within the lanes of a winding highway. But can we really trust them to do the right thing in every circumstance? What if a nefarious actor tried to “poison” an AI system so that it intentionally produced incorrect results in certain situations – a concept called adversarial AI? We’ll explore that possibility in a moment, but first let’s turn the clock back a few decades.

    Adversarial AI

    Back when I was in college (which was, increasingly, a while ago, and required walking uphill through the snow every day), I majored in computer science and chose Artificial Intelligence as one of my specialty tracks. At the time, AI research was focused on areas such as neural networks and expert systems, which attempted to emulate the decision-making abilities of human experts.

    Inspired by the synaptic connections that occur within our (human) brains, neural networks aimed to produce “correct” outputs based on a wide variety of inputs. For example, if you input the current state of a game of Tic-Tac-Toe to a neural network and ask it for the next best move, a properly-trained neural network would provide the ideal move in that scenario.

    That, in fact, was one of my programming projects: to produce a neural network that could absolutely dominate in Tic-Tac-Toe. Granted, it’s not a complicated game, and my neural network wasn’t going to pull out any brilliant moves that a child couldn’t figure out. Dominating in Tic-Tac-Toe means achieving a perpetual tie, which is probably why Tic-Tac-Toe has never taken off as a televised sport.

    But even getting a neural network to match the skills of a child requires quite a bit of work. To settle into a domain of expertise, a neural network must be trained, which is a process of wiring up all the synaptic connections between the inputs and outputs. First, you feed different inputs through many different layers in the neural network. Those layers, which consist of digital neurons, apply mathematical functions to the inputs, until a particular output flows out the other end.

    Your first pass at training the network is unlikely to yield great results, as it is somewhat random. I used to think it was like a game of Plinko: you drop a disc at the top of the board, and it randomly bounces around various pegs on the way down until it lands in some slot at the bottom – hopefully a slot worth a lot of money. But you didn’t want a random output; you wanted the best output – you wanted to win the game of Plinko so you could pocket some cash and make it to the final showdown.

    So you calculate the errors in the output and then use a process called backpropagation: starting at the output side and moving backwards through the network, you update the weights used by those mathematical functions in the hopes that doing so reduces future errors.

    This process repeats like a game of ping-pong, sometimes through many, many iterations. As the training continues, ideally, the outputs will contain fewer and fewer errors, until further iterations of backpropagation result in diminishing returns – reaching the point of “good enough.” If that doesn’t seem to happen, you might tinker with the number of connections and neurons, or make other seemingly random adjustments to the model until things start to jibe.

    A human in an intense game of Tic-Tac-Toe with a computer.

    The result is a bunch of weights and functions in a large model that can somehow very quickly convert inputs into correct outputs. Training my Tic-Tac-Toe model was computationally expensive (at least, on the blazing-fast 40-MHz Mac IIfx I had at the time). But once the concrete set, it was very quick to feed data through and yield a result. That was the power of a neural network.
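
    For the curious, here is that entire training loop in miniature: a tiny two-layer network learning XOR rather than Tic-Tac-Toe, with the forward pass, the error calculation, and the backpropagation step spelled out. It’s a minimal sketch of the process described above, not a reconstruction of my old course project.

    ```python
    import numpy as np

    # A minimal training loop: forward pass, error, backpropagation, weight update.
    # The network learns XOR (with a constant bias input).

    rng = np.random.default_rng(42)
    X = np.array([[0, 0, 1],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 1]], dtype=float)   # two inputs plus a constant bias column
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = 2 * rng.random((3, 4)) - 1          # input  -> hidden weights
    W2 = 2 * rng.random((4, 1)) - 1          # hidden -> output weights
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for step in range(60_000):
        # Forward pass: inputs flow through the layers.
        hidden = sigmoid(X @ W1)
        output = sigmoid(hidden @ W2)

        # How wrong were we? Push that error backwards through the network,
        # nudging each weight in the direction that reduces it.
        output_delta = (y - output) * output * (1 - output)
        hidden_delta = (output_delta @ W2.T) * hidden * (1 - hidden)
        W2 += hidden.T @ output_delta
        W1 += X.T @ hidden_delta

    print(np.round(output, 2))               # should be close to [0, 1, 1, 0]
    ```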

    But I found this unsettling, and brought my concern to one of my professors.

    “I’ve created this tangle of code and data that can play Tic-Tac-Toe like a boss, but what makes it really… work? It all feels a bit random. How can I really know what is happening within that network?”

    The professor gave me a knowing smile, and conceded that this was a weakness of neural networks.

    “We don’t really know how to assign any conceptual meaning to all the weights and transformations within the model,” he explained. “In other words, Scott, we don’t really know what makes them work under the hood, just like we really don’t know what makes the human brain work. It’s like an opaque box. The computer doesn’t have any inherent understanding of the game of Tic-Tac-Toe. It’s merely performing transformations on data.”

    “So how can we know it’s really producing the best result? Or that it won’t suddenly produce a very bad result for some input?”

    “There’s really no way to validate these systems in any scientific way,” the professor replied. “That is, perhaps, a weakness of the technology.”

    Artificial Intelligence has advanced quite a bit since my college days, but that fundamental concern I had – we really don’t know what makes these models work, do we? – exists just as much today as it did then. Only today, the opaque boxes are much, much larger, and with consequences much greater than simply losing a game of Tic-Tac-Toe.

    As I described in an earlier story, vast amounts of content are being consumed by today’s AI systems as part of their training, so much so that even the entire Internet isn’t satisfying their voracious appetite for data.

    The techniques used to transform inputs (such as “draw me a picture of a skateboarding aardvark on the moon”) into outputs (such as the lovely picture presented below) vary, but they all effectively end up with some very large model that consists of a bunch of mathematical transformations. And by a bunch, I mean a whole bunch. So much so that no human could manually examine every aspect of a model or begin to understand how it might react in various circumstances.

    A skateboarding aardvark on the moon.

    And unlike the Tic-Tac-Toe model that I created in college, where I was in full control over the creation of the data used to train it, many of today’s AI systems are being trained on data from untrusted sources. While many AI companies are secretive about the exact sources of data they use for training, it’s likely that they’re mining web content such as Wikipedia, YouTube, Instagram photos, and a whole lot more. In most cases, that consists of data created by anonymous users. Anyone can post a photo to Instagram; anyone can share a video on YouTube.

    Humans make mistakes, so that data can include inaccuracies or biases that find their way into the models, leading to effects such as hallucinations where a model confidently and authoritatively presents an answer that is, quite simply, wrong.

    But what if humans try to intentionally feed a model bad training data in order to affect its future results? That’s an increasing concern of AI researchers, and they’ve coined a name for it: Adversarial AI.
    Thankfully, this concept remains largely theoretical, although there have been instances of adversarial AI in the wild, including a rather public one a few years ago.

    In 2016, Microsoft released a chatbot called Tay, inviting the public to interact with it. Like many AI-powered systems, Tay was designed to learn as it interacted with humans, in an effort to continually improve its performance. Recognizing this, some human users decided to have a bit of fun, creating interactions with Tay that were designed explicitly to manipulate the chatbot’s future outputs.

    In short, by poisoning the data they were feeding into Tay, they caused Tay to very quickly evolve into a racist, sexist jerk. Microsoft quickly shut down the experiment.

    While this form of adversarial AI was perhaps more silly than dangerous, it caught the attention of security researchers, who began to realize the perils of training models off of user-supplied, or untrusted, data.

    In a 2018 paper titled DARTS: Deceiving Autonomous Cars with Toxic Signs, authors Chawin Sitawarin, Arjun Nitin Bhagoji, Arsalan Mosenia, Mung Chiang, and Prateek Mittal described how nefarious actors could craft “toxic” traffic signs that a car’s sign-recognition model misclassifies, with disastrous (and potentially fatal) consequences for self-driving cars. What makes these kinds of attacks so haunting is that it might not be obvious that a model has been compromised until it is too late. And even then, it might not be easy to determine that an incorrect output was the result of intentional manipulation.
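
    A toy experiment makes the danger of poisoned training data concrete. The sketch below is my own illustration (it has nothing to do with the DARTS experiments): it trains a simple classifier on synthetic data, then retrains it after an “attacker” flips the labels of every training example in a targeted region, and compares how the two models fare on clean test data from that region.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Toy data-poisoning illustration (not the DARTS experiments): an attacker
    # flips the labels of training examples in one region of the input space,
    # and the resulting model misbehaves on clean test data from that region.
    X, y = make_classification(n_samples=2000, n_features=20, random_state=0, shuffle=False)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clean_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    poisoned_labels = y_train.copy()
    target = X_train[:, 0] > 0.5                     # the region the attacker cares about
    poisoned_labels[target] = 1 - poisoned_labels[target]
    dirty_model = LogisticRegression(max_iter=1000).fit(X_train, poisoned_labels)

    region = X_test[:, 0] > 0.5
    print("clean model,    targeted region:", clean_model.score(X_test[region], y_test[region]))
    print("poisoned model, targeted region:", dirty_model.score(X_test[region], y_test[region]))
    # The poisoned model typically fares far worse on exactly the inputs the attacker targeted.
    ```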

    This continues to be an area of active research, and perhaps one of many examples where the pace of AI development is exceeding the ability to keep up with necessary safeguards.

    When we consult with human experts such as doctors or lawyers, we can generally trust their bona fides by validating what schools they went to and what degrees they obtained. If they make a medical or legal recommendation to us, we can ask how they arrived at it, and expect to hear a reasoned, thoughtful response.

    But when we consult with AI systems for their virtual expertise, we can’t receive these kinds of assurances. They present answers (or decisions, like whether to bring a car to a stop at a stop sign) that are difficult for a human to validate, and the AI systems themselves have no inherent understanding of how they arrived at a result; they’re just performing mindless transformations to turn inputs into outputs.

  • From algorithms to albums: can AI generate music?

    Words, images, and video. Generative AI is becoming increasingly competent at producing works that rival the output of human writers, photographers, and videographers. But what about music?

    Generative AI is now tackling music creation.

    According to a recent story in Rolling Stone, audio and music have lagged behind other forms of generative content creation. Audio is considered “unfathomably more complex” than words: it is a continuous signal, and faithfully modeling the 44.1 kHz sample rate most songs are digitally encoded at can require generating tens of thousands of tokens per second.

    Yet, significant inroads have been made in audio in recent months. Rolling Stone author Brian Hiatt visited the headquarters of Suno, a Cambridge, Massachusetts-based startup focused on producing artificial symphonies. Hiatt provided a few simple text prompts to generate a song and was blown away by the results.

    Most AI-generated art so far is, at best, kitsch, à la the hyperrealistic sci-fi junk that so many Midjourney users seem intent on generating. But “Soul of the Machine” feels like something different — the most powerful and unsettling AI creation I’ve encountered in any medium. Its very existence feels like a fissure in reality, at once awe-inspiring and vaguely unholy, and I keep thinking of the Arthur C. Clarke quote that seems made for the generative-AI era: “Any sufficiently advanced technology is indistinguishable from magic.” A few weeks after returning from Cambridge, I send the song off to Living Colour guitarist Vernon Reid, who’s been outspoken about the perils and possibilities of AI music. He notes his “wonder, shock, horror” at the song’s “disturbing verisimilitude.” “The long-running dystopian ideal of separating difficult, messy, undesirable, and despised humanity from its creative output is at hand,” he writes, pointing out the problematic nature of an AI singing the blues, “an African American idiom, deeply tied to historical human trauma, and enslavement.”

    The founders of Suno view their product as having the potential to “democratize music.” Hiatt writes:

    The most vocal of the co-founders, Mikey Shulman, a boyishly charming, backpack-toting 37-year-old with a Harvard Ph.D. in physics, envisions a billion people worldwide paying 10 bucks a month to create songs with Suno. The fact that music listeners so vastly outnumber music-makers at the moment is “so lopsided,” he argues, seeing Suno as poised to fix that perceived imbalance.

    Suno is inviting users to try their own hand at creating songs with its platform; a free account is required to give it a spin. You can visit its website to sample a large number of AI-generated songs. They do, indeed, sound like they could be on the radio, lacking any telltale signs that they are AI-generated.

    As with other forms of AI-generated content, there are unresolved copyright questions about synthetically-created music. Is an AI-generated love song copyrightable? So far, the U.S. Copyright Office has taken the position that AI-generated work is not eligible for copyright protection, and courts have agreed with that position in recent rulings.

  • ELIZA, the original chatbot

    Before ChatGPT, Gemini, and even Siri, there was ELIZA, the original chatbot. Created in 1966, ELIZA is one of the earliest examples of a natural language processing program, or what we now call chatbots.

    ELIZA, an early Chatbot.

    While modern chatbots aim to converse on virtually any topic, ELIZA had a much narrower focus. She was designed to simulate a Rogerian psychotherapist.

    Beginning in the 1940s, psychologist Carl Rogers created a field of psychotherapy called person-centered therapy, which aimed to create an environment of empathy and understanding between a therapist and the client they served. Later known as Rogerian psychotherapy, this form of counseling frequently involved a therapist reflecting a patient’s words back to them.

    Joseph Weizenbaum became an associate professor at MIT in 1963. He implemented ELIZA as an experiment to explore the possibility of communication between humans and machines. ELIZA was programmed to simulate the kind of interaction that might occur with a Rogerian psychotherapist.

    To achieve the illusion of intelligence, Weizenbaum identified several technical problems ELIZA had to overcome, such as identifying keywords, discovering minimal context, making appropriate transformations, and generating responses in the absence of meaningful keywords.

    Each time a human user input a sentence, ELIZA analyzed the words, searching for important keywords and assigning measures of importance to each one. The presence of certain words might trigger the use of predefined phrases. Using a process of decomposition and reassembly, ELIZA would take elements of the human’s input and rephrase it, perhaps turning it into a question.

    ELIZA also had a limited ability to remember context, enabling ELIZA to reference earlier portions of a conversation.
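
    In modern Python, the heart of that trick – keyword spotting followed by decomposition and reassembly – fits in a couple of dozen lines. This is a bare-bones homage to give a flavor of the approach, not Weizenbaum’s original code (which was written in MAD-SLIP).

    ```python
    import random
    import re

    # A bare-bones homage to ELIZA's keyword / decomposition / reassembly trick.
    RULES = [
        (r"\bI am (.*)", ["Why do you say you are {0}?",
                          "How long have you been {0}?"]),
        (r"\bI think (.*)", ["Do you really think {0}?",
                             "What makes you think {0}?"]),
        (r"\bbecause (.*)", ["Is that the real reason?"]),
        (r"\byou\b.*", ["We were discussing you, not me."]),
    ]
    FALLBACKS = ["Tell me more...", "I'm not sure I understand you fully.",
                 "What do you think?"]

    def respond(sentence: str) -> str:
        sentence = sentence.strip().rstrip(".!?")
        for pattern, templates in RULES:
            match = re.search(pattern, sentence, re.IGNORECASE)
            if match:
                # Decomposition: pull out the matched fragment.
                # Reassembly: drop it into a canned template, often as a question.
                return random.choice(templates).format(*match.groups())
        return random.choice(FALLBACKS)

    print(respond("I think it's weird."))   # e.g. "Do you really think it's weird?"
    ```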

    You can learn more about how ELIZA worked on this Wikipedia page.

    Upon implementing ELIZA, Weizenbaum was amazed to see users attributing human-like feelings to the program, with many users becoming convinced that ELIZA was an intelligent entity. But ELIZA did not have any inherent understanding of what a human was saying or what its own responses meant; it was merely parroting back words to the user in a seemingly thoughtful way.

    In fact, trying to have a meaningful conversation with ELIZA could quickly become frustrating, and many users viewed ELIZA as more of a game, trying to trip up the chatbot.

    ELIZA still exists today in numerous implementations, including a web-based version you can try yourself.

    Here is an example of an interaction I recently had with ELIZA:

    Eliza: Hello, I am Eliza. I’ll be your therapist today.

    Scott: What are your qualifications?

    Eliza: What is it that you really want to know?

    Scott: Whether you have the necessary training as a Rogerian Therapist.

    Eliza: We were discussing you, not me.

    Scott: Sometimes I wonder why we drive on parkways and park on driveways.

    Eliza: What do you think?

    Scott: I think it’s weird.

    Eliza: Do you really think so?

    Scott: I really, really do.

    Eliza: Tell me more…

    Scott: More.

    Eliza: OK… “I think it’s weird”. Tell me more.

    Scott: It just doesn’t make sense to me.

    Eliza: I’m not sure I understand you fully.

    AI has certainly improved since ELIZA was first written in the mid 1960s, so I decided to do another test: asking OpenAI’s ChatGPT-4 to serve as my Rogerian therapist, and engaging in a conversation similar to the example above.

    Scott: For the purposes of this conversation, assume that you are a Rogerian Therapist.

    ChatGPT-4: I understand. As a Rogerian Therapist, I would focus on providing a non-directive, empathetic, and accepting environment for you. Please feel free to share your thoughts, feelings, or concerns, and I will listen and respond in a manner that reflects understanding and acceptance. How can I assist you today?

    Scott: What are your qualifications?

    ChatGPT-4: As an AI developed by OpenAI, I don’t have personal qualifications like a human therapist. However, I’m designed to provide information and support based on Rogerian therapy principles, such as empathy, unconditional positive regard, and active listening. My responses are generated based on a wide range of data and information related to psychological theories and practices. While I aim to be helpful and supportive, it’s important to note that I’m not a licensed therapist and my responses do not replace professional therapy or counseling.

    Scott: Sometimes I wonder why we drive on parkways and park on driveways.

    ChatGPT-4: It’s interesting to reflect on language and how certain terms evolve, isn’t it? The words “parkway” and “driveway” have curious histories that contribute to their seemingly contradictory uses today. Originally, a “driveway” was a private road leading to a house, essentially meant for driving on. Over time, it evolved to refer to the area where we park our cars at home. On the other hand, “parkway” referred to a road that travels through a park or a scenic area, often with park-like landscaping, intended for leisurely drives, hence the “park” in “parkway.” The names stuck, even though the way we use these spaces has changed. Language often evolves in such whimsical ways, reflecting historical and cultural shifts.

    Scott: I think it’s weird.

    ChatGPT-4: It definitely can seem weird how language evolves and how certain terms end up with meanings that don’t necessarily match their original or literal components. This kind of linguistic quirk reflects the complexity and historical depth of language, showing how usage and social practices can shape the meanings of words over time. It’s one of the many aspects of language that can be both puzzling and fascinating. If you have more thoughts or feelings about this or anything else you find peculiar or intriguing, feel free to share!

    Scott: It just doesn’t make sense to me.

    ChatGPT-4: It’s completely understandable that it doesn’t make sense to you. Language can often be illogical or counterintuitive, especially when historical and cultural shifts in usage lead to outcomes like this. It can be perplexing to see how words evolve and acquire meanings that don’t seem to match their original or apparent logic. If there are other examples or areas where things seem nonsensical or if this kind of inconsistency sparks further thoughts or feelings, I’m here to explore that with you.

    A bit more impressive, no?

    But what if I asked ChatGPT-4 to pretend that it was ELIZA? How would that change the nature of our interaction?

    Scott: Now pretend that you are Eliza, an early Rogerian Therapist simulator.

    ChatGPT-4: Sure, I can adopt that mode. How can I help you today?

    Scott: What are your qualifications?

    ChatGPT-4: Like the original ELIZA program, I don’t possess personal qualifications or experiences. I am designed to simulate a conversation with a Rogerian therapist, focusing on reflecting your statements and encouraging you to express your thoughts and feelings. How does that make you feel?

    Scott: Sometimes I wonder why we drive on parkways and park on driveways.

    ChatGPT-4: Why do you think that is?

    Scott: I think it’s weird.

    ChatGPT-4: What do you find weird about it?

    Scott: Everything.

    ChatGPT-4: Can you tell me more about what feels weird to you?

    Scott: It just doesn’t make sense to me.

    ChatGPT-4: What would make sense to you in this context?

    That ChatGPT-4 can dumb itself down to effectively simulate the original ELIZA shows just how far chatbots have progressed in recent years.

    How might we quantify that progression?

    I propose a new measurement called the Boundary of Repetitive Exchanges And Knowledge Point, or BREAKpoint. This is the number of interactions required before a human becomes frustrated with their chatbot interaction.

    ELIZA had a fairly low BREAKpoint. Chatbots such as Apple’s Siri and Amazon’s Alexa fare better, but still have room for improvement.

    In a recent episode of HBO’s Curb Your Enthusiasm, Larry David attempts to ask Apple’s Siri for directions to a destination called Wolf’s Glen. He reaches his BREAKpoint with just two interactions, culminating in an expletive-filled screaming session that is therapeutic for any of us who have had similar interactions. You can view the humorous and cathartic exchange on YouTube here, but fair warning: the language he uses is definitely not safe for work.

    Meanwhile, there are efforts to make next-generation ELIZAs: chatbots specifically designed to help humans dealing with depression, anxiety, and eating disorders.

    The CBS show 60 Minutes recently ran a segment on this topic, with some warnings from actual human therapists on the potential dangers this presents.

  • Minolta’s early use of AI in photography

    Minolta may not be the first company that springs to mind when you think about cameras, but there was a time when Minolta was a leader in the field, going toe-to-toe with the likes of Nikon and Canon. Established in 1928, Minolta pioneered many features taken for granted today, and was perhaps the first company to incorporate AI into its cameras, as early as the late 1980s.

    In 1977, Minolta created the first “multi mode” camera, allowing the user to select between manual, aperture-priority, shutter-priority, and “program override” modes. Powered by a microprocessor, the program mode deferred many exposure decisions to the camera, automating some of the decisions photographers previously made manually.

    The Minolta Maxxum 5xi.
    My Minolta Maxxum 5xi, which I (apparently?) still hang onto in case the whole digital photography thing doesn’t work out

    During the 1980s, Minolta continued to introduce new features to the market such as through-the-lens (TTL) exposure metering and in-camera autofocus motors. With microprocessor-controlled autofocus and autoexposure systems, photographers were empowered to capture shots they never could have before.

    Another technological leap occurred in the late 1980s and early 1990s as Minolta began to incorporate artificial intelligence technology directly into lenses and the camera body. Minolta engineers implemented a fuzzy logic system, which allowed the camera to react to a wide variety of scenes, choosing appropriate camera settings faster than any human could.

    Traditionally, computers are designed to deal with black-and-white, or binary logic: a value is a “1” or “0”, for example, with no exceptions. In the 1960s, mathematician and computer scientist Lotfi A. Zadeh proposed the theory of fuzzy logic, a form of many-valued logic that supported scenarios where the “truth” could lie somewhere between 0 and 1.

    Zadeh recognized that people often make decisions based on imprecise and non-numerical information. His work in fuzzy logic permitted computer systems to solve problems within those vague environments, making educated guesses rather than being precise. This helped support early forms of artificial intelligence, including natural language processing.

    By the 1990s, most cameras had incorporated microprocessors to assist in making exposure decisions. At their heart, cameras were fairly simple devices: a lens directed light onto a frame of light-sensitive film. But how much light hit that film determined whether a shot would be overexposed (too bright), underexposed (too dark), or perfectly exposed (just right).

    There are three variables that determine exposure: aperture, shutter speed, and film speed. Lenses can be adjusted to allow more or less light through the glass (aperture). Single-lens-reflex cameras incorporate a shutter that briefly flips open to allow the light to hit the film; how long the shutter is open is called shutter speed. And film could be manufactured to respond more or less quickly to light, with the reaction time indicated through a measurement known as ISO. ISO 400 film was slower to react to light than ISO 1600 film, but higher-speed films became grainier, losing detail.

    Unlike today’s digital cameras, which can simulate different levels of light sensitivity from shot to shot, with film-based cameras, the decision of film speed was locked in each time the photographer put a roll of film into the camera. But aperture and shutter speed were variables that could be controlled from shot to shot, to help dial in the correct exposure.
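
    The trade-offs among those three variables can be expressed with simple arithmetic. The sketch below compares camera settings in “stops,” where each additional stop doubles the light recorded; it deliberately ignores metering and scene brightness, so it only compares settings against each other.

    ```python
    from math import log2

    # Relative exposure, measured in "stops": each +1 stop means twice as much
    # light is recorded. A simplification that ignores scene brightness and metering.
    def stops(aperture: float, shutter: float, iso: float) -> float:
        return log2((shutter / aperture ** 2) * iso)

    # Two classically "equivalent" exposures: open the lens one stop,
    # halve the shutter time, and the recorded light is essentially unchanged.
    print(round(stops(2.8, 1 / 500, 400), 2))   # f/2.8, 1/500 s, ISO 400
    print(round(stops(4.0, 1 / 250, 400), 2))   # f/4,   1/250 s, ISO 400 -> nearly identical

    # Halving the shutter time alone costs exactly one stop of light.
    print(round(stops(2.8, 1 / 1000, 400), 2))  # one stop darker than the first line
    ```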

    Complicating matters, aperture and shutter speed don’t just affect exposure. Aperture has a direct effect on depth of field, which is the range of distances that the camera renders in sharp focus. When the lens opening is stopped down to let in a minimal amount of light, the depth of field is deep: most of the photo will be razor sharp, from objects nearby to ones far in the distance. When the lens is opened all the way to let in maximum light, the depth of field narrows. Only a sliver of the scene might be in focus.

    Fast shutter speeds can freeze action, while slower shutter speeds can cause moving objects to be blurry.

    So there are quite a few variables at play when shooting a photo, making every snapshot a potential set of compromises. You – or the camera’s automated systems – can tinker with aperture and shutter speed to dial in a properly exposed shot, but manipulating these variables can have other effects that may or may not be desirable, depending on the creative preferences of the photographer.

    And then there’s focus, too. Lens elements can be manipulated to focus on objects nearer to or farther from the camera body, allowing a subject to be sharp, for example, while the background blurs.


    With all these variables at play, it’s easy to understand why Minolta was attracted to fuzzy logic. In their “Program” mode, they wanted to empower photographers to hit the shutter button and have the camera select the right aperture, shutter speed, and focus to get an ideal shot every time, even in very dynamic scenes, with lots of motion.

    And there are an infinite variety of scenes that could appear in front of a camera. One scene could feature mountains in the distance with fog drifting through them, while another could highlight a baseball bat striking a ball on a sun-draped field.

    Minolta’s engineers defined fuzzy sets for photographic parameters such as sharpness, subject contrast, and light intensity. They categorized these into concepts such as “near,” “far,” “bright,” and “dim.” They considered how photographers would react to these parameters, and created fuzzy rules to mimic the choices an expert photographer would make.

    Inputs to the fuzzy logic came from a wide variety of sensors that detected distance to subject, subject speed, light levels, and color temperature. Data from these sensors was “fuzzified,” converting it into degrees of membership within the fuzzy sets.

    Going through an inference engine, the fuzzified inputs became fuzzified outputs, which ultimately translated into specific settings for the camera’s electrical and mechanical systems. And this process could occur in real-time, continuously adapting to the evolving scene in front of the lens.
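
    Here is a highly simplified sketch of what such a pipeline looks like in code. To be clear, the membership functions and the single rule below are invented for illustration; Minolta’s actual rule sets lived in camera firmware and were never published.

    ```python
    # A toy fuzzy-logic pipeline: fuzzify sensor readings, apply a rule, and
    # defuzzify the result into a shutter speed. The membership functions and
    # the rule are invented for illustration -- not Minolta's real firmware.

    def membership_fast(subject_speed_ms: float) -> float:
        """Degree (0..1) to which the subject counts as 'moving fast'."""
        return max(0.0, min(1.0, (subject_speed_ms - 1.0) / 9.0))   # 1 m/s -> 0, 10 m/s -> 1

    def membership_bright(light_ev: float) -> float:
        """Degree (0..1) to which the scene counts as 'bright'."""
        return max(0.0, min(1.0, (light_ev - 8.0) / 7.0))           # EV 8 -> 0, EV 15 -> 1

    def choose_shutter(subject_speed_ms: float, light_ev: float) -> float:
        # Rule: IF the subject is fast AND the scene is bright THEN favor a fast shutter.
        # Fuzzy AND is taken as the minimum of the two membership degrees.
        degree = min(membership_fast(subject_speed_ms), membership_bright(light_ev))
        slow, fast = 1 / 60, 1 / 2000        # the two ends of our shutter range, in seconds
        # Defuzzify by interpolating between the slow and fast extremes.
        return slow + degree * (fast - slow)

    print(f"1/{round(1 / choose_shutter(8.0, 14.0))} s")   # fast subject, bright light: a fast shutter
    print(f"1/{round(1 / choose_shutter(0.5, 9.0))} s")    # slow subject, dim light: the slow end, 1/60 s
    ```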

    The imprecise nature of photography meant that the camera wouldn’t always land on the ideal settings for every possible photograph, but in practice, it did very well most of the time, and could make these decisions far faster than a human. Fuzzy logic helped the camera simulate the decisions that a pro photographer would make for a given set of input parameters.

    In another unusual twist, the xi series of cameras supported power zoom lenses, a feature more commonly found on dedicated video cameras. By rotating the barrel on the lens, the photographer could zoom in and out of a scene. Minolta also extended fuzzy logic to the motorized zoom lenses. When the Auto Standby Zoom (ASZ) option was enabled and the camera was shooting in full auto mode, the camera would automatically zoom in to achieve what it estimated was an optimal cropping factor after identifying and analyzing a likely subject within the frame.

    The Maxxum 9xi became Minolta’s flagship professional camera in 1992, followed by the Maxxum 9 in 1998. Unfortunately, by the early 2000s, Minolta was struggling to keep up with rivals such as Canon and Nikon. Minolta and Konica merged in 2004, and in 2006, Konica Minolta exited the film and digital camera market, ending Minolta’s 78-year run as a camera manufacturer.

    Fast forward to today.

    Every person walking around with a modern smartphone has a very capable camera in their pocket, and photographic decisions are being made by algorithms to degrees unimaginable just two decades ago. AI is being used not only to select basic settings such as aperture and shutter speed, but to automatically detect subjects in the frame, focusing on their eyes and tracking them in three-dimensional space as they move. It is also (controversially) being used to manipulate photos, creating an artificial reality that didn’t exist in real life.

    That’s a subject that we’ll explore in more depth in a future post.

  • AI Weekend Bits: April 7, 2024

    Some developments in the world of AI from the past week are summarized below.

    DALL-E Improvements

    OpenAI’s Dall-E image generator received a few upgrades last week, enabling users to edit images directly within ChatGPT and accept preset style suggestions recommended by the tool.

    According to a story in The Verge, the updates aim to make the system more user-friendly. For example, users can now quickly steer DALL-E towards using image styles such as hand-drawn, woodcut, gothic, or synthwave.

    Creating a blank white image is hard

    While the image generation capabilities of tools like DALL-E continue to impress and improve, there are some glaring shortcomings. A story last week in Windows Central described one scientist’s failed attempts to get DALL-E to generate a plain white image.

    Kevin Okemwa writes:

    Cody Nash, a data scientist attempted to get DALL-E 3 to generate a plain white image using the following prompts:

    • An image that is nothing but the color white
    • An image of nothing but white white white white
    • Completely white without any additional features
    • A blank white space
    • An image of pure #FFFFFF pixels
    • A white wall

    In response to these very specific prompts, DALL-E produced images of snow drifts, majestic mountain ranges, and a white frame on the wall with a heavy drop shadow.

    Okemwa repeated the experiment with competing image generators and found similar shortcomings, although Microsoft’s Copilot chided him, refusing to generate an image unless he provided “less vague” prompts.

    AI can rival the best human artists by producing extraordinarily complex and detailed images, but it can also fail to understand simple artistic directions.

    Or maybe its sense of artistic pride simply prevents it from producing a work as mundane as a blank canvas.
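
    For what it’s worth, ordinary deterministic code has no such pride. With the Pillow imaging library installed, a pure #FFFFFF image takes two lines:

    ```python
    from PIL import Image

    # The task DALL-E struggled with, done deterministically:
    # 1024 x 1024 pixels of pure #FFFFFF.
    Image.new("RGB", (1024, 1024), "#FFFFFF").save("blank_white.png")
    ```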

    Former Apple design chief reportedly seeking venture capital for mysterious AI product

    Jony Ive worked at Apple for 27 years, spending most of that time overseeing the company’s industrial design. He led the development of groundbreaking products such as the iMac, iPod, MacBook Air, iPhone, iPad, Apple Watch, and even Apple’s new corporate campus, Apple Park. The London-born designer was knighted in 2012, in recognition of the impact his products have had on the modern world.

    Ive left Apple in 2019 to form LoveFrom, an independent design firm, which has since worked with firms such as Ferrari and Airbnb.

    Ive may now be looking to make an imprint in the world of AI. This past week, stories surfaced that Ive – in partnership with OpenAI CEO Sam Altman – was in the process of looking for venture capital to fund the development of a “mysterious” assistant product, having created a new startup with Altman.

    The Information reports that the startup has been in talks with Emerson Collective and Thrive Capital, venture capitalists with a history of AI investments. The goal – according to sources interviewed by The Information – is to receive up to $1 billion in funding.

    The rumor mill has begun churning with speculation about what form this mystery device might take. Although MacRumors admits that little is known about the AI device, it suggests it’s unlikely to look like a smartphone. Ive’s partner Altman has been a major investor in the Humane AI pin, a wearable AI device that does not have a screen, but instead projects messages via a laser onto one’s palm. The device – starting at $699 plus a $24 monthly subscription – is scheduled to begin shipping in mid-April.

  • Reaching the end of the Internet

    In a 2002 television commercial for DirecTV’s blazing fast DSL Internet service, a man is seen aimlessly browsing the Internet, clicking link after link. Suddenly, his computer informs him that he has “reached the end of the Internet” and has “seen everything there is to see.” He stares at the screen in amazement.

    YouTube video of a 2002 DirecTV DSL commercial.

    Going back to the late 1990s, jokesters have set up numerous web pages that purport to be the last page of the Internet. Of course, the notion that any human could visit every on-line page is preposterous: according to WorldWideWebSize.com, which attempts to track the daily estimated size of the web, the Internet contained at least 5 billion web pages as of April 6, 2024.

    But even though AI might have difficulty re-creating the human hand, it can click through web pages really fast.

    In a recent story for The Wall Street Journal, Deepa Seetharaman writes that AI companies are running out of training data after burning through the entire Internet.

    Companies racing to develop more powerful artificial intelligence are rapidly nearing a new problem: The internet might be too small for their plans.

    Ever more powerful systems developed by OpenAI, Google and others require larger oceans of information to learn from. That demand is straining the available pool of quality public data online at the same time that some data owners are blocking access to AI companies.

    Some executives and researchers say the industry’s need for high-quality text data could outstrip supply within two years, potentially slowing AI’s development.

    While companies like OpenAI don’t disclose the exact sources of their training data, it is believed that they vacuum up on-line content such as research papers, news articles, and Wikipedia pages. They also likely ingest content from social media platforms, although much of that content is “walled off” from automated bots.

    That doesn’t stop those social media platforms from utilizing the data themselves: the WSJ story notes that Meta CEO Mark Zuckerberg recently touted his company’s access to hundreds of billions of publicly shared images and videos across Facebook and Instagram, creating a rather robust training set for Meta’s own ambitious AI goals.

    As AI companies begin to run out of training data on the Internet, they’re turning to synthetically-generated data, but that practice is controversial.

    In a story for Futurism, Noor Al-Sibai touches on some of the challenges of using synthetic data sets.

    Synthetic data, meanwhile, has been the subject of ample debate in recent months after researchers found last year that training an AI model on AI-generated data would be a digital form of “inbreeding” that would ultimately lead to “model collapse” or “Habsburg AI.”

    Habsburg AI refers to the House of Habsburg (or House of Austria), one of the most prominent dynasties in European history. Seeking to consolidate their power, the Habsburgs relied on frequent consanguine marriages between closely related individuals. This inbreeding negatively affected the gene pool and was believed to lead to health ailments such as epilepsy, insanity, and early death.

    Some are now predicting that AI models may suffer a similar fate as they begin to train against data they themselves created, recycling biases and errors and making it increasingly difficult to check the results for reliability.

    In a 2023 research paper titled The Curse of Recursion: Training on Generated Data Makes Models Forget, researchers caution about an effect called Model Collapse that could occur when large language models are trained on data produced by other AIs instead of by humans.
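
    The effect is easy to caricature with a toy simulation (my own illustration, unrelated to the paper’s actual experiments): fit a simple statistical model to some data, replace the data with samples drawn from the fitted model, and repeat. Each generation trains only on the previous generation’s synthetic output, and the estimates drift further and further from the original distribution.

    ```python
    import numpy as np

    # Toy caricature of "model collapse": repeatedly fit a Gaussian to data,
    # then replace the data with samples drawn from the fit. Each generation
    # trains only on the previous generation's synthetic output.

    rng = np.random.default_rng(0)
    data = rng.normal(loc=0.0, scale=1.0, size=50)    # the original "human" data

    for generation in range(31):
        mu, sigma = data.mean(), data.std()           # "train" this generation's model
        if generation % 5 == 0:
            print(f"generation {generation:2d}: mean={mu:+.2f}  std={sigma:.2f}")
        data = rng.normal(mu, sigma, size=50)         # the next generation's training set
    ```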

  • AI image generators could use a hand

    In an episode of the TV sitcom Seinfeld, George Costanza almost becomes a hand model after an agent notices his “beautiful” hands. This exciting career possibility is abruptly cut short after his hands meet the heat of a clothes iron, causing his modeling dreams to slip from his (charred) fingertips.

    Ironically, the fictional character George Costanza is perhaps the most famous (almost) hand model. While fashion models like Cindy Crawford, Kate Moss, and Tyson Beckford rose to the level of household names, the more specialized field of hand modeling has yet to produce an equivalent star.

    And that may be part of the reason why AI image generators aren’t so good at drawing human hands.

    To be fair, image generators such as OpenAI’s Dall-E 3 have gotten pretty good at drawing humans, after training against countless snapshots of the Homo sapiens species.

    But hands are a small part of the human body, and rather complicated ones at that. They don’t tend to be the focus of photo shoots or selfies, so in training sets consisting of people, they play a small role. In fact, many AI models have focused on faces, as that tends to be the first thing people look at. (At least the non-introverted ones.) Modern digital cameras also prioritize focusing on a subject’s face and eyes, paying little attention to extremities such as hands or feet. As the old saying goes, the eyes are the window to the soul.

    Even when hands are featured prominently in a photo, they can be much more dynamic than, say, a nose. Each human hand consists of 27 bones, 34 muscles, and over 100 ligaments and tendons, making it a rather complex part of our anatomy. This enables hands and fingers to move and bend in countless ways. And that trips up AI systems. Consider the vastly different appearances of a hand clenched in a fist, a thumb rising up from the palm of a hitchhiker, or two figures interlacing their fingers as they walk down a street. And hands may disappear entirely as they are slipped into pockets, or the comfort of warm gloves.

    AI systems don’t have an inherent understanding of human anatomy. They don’t know what directions fingers can (or can’t) bend in, nor do they have insight into the different emotions represented by various hand gestures. They focus on identifying patterns in large amounts of data, and those patterns can be elusive when it comes to hands.

    Humans are also really good at noticing when something is off about a hand, making even minor errors easy to spot. Because they are such an important part of our bodies, we’re more likely to notice an unusually long or bent finger than we are to notice shoelaces that don’t quite interlace correctly on a pair of shoes.

    But with each update to AI image generators, their hand-drawing skills continue to improve. This is likely due to a focus on selecting training sets that include clear photos of human hands, while eliminating sets where the hands are partially obscured. More accurate hands are within AI’s grasp.

  • AI worker shortage leads to talent war, high salaries

    Unemployment remains below 4% in the United States, but workers in the tech industry are still experiencing layoffs at rates greater than in other professions.

    According to an NBC News story, heading into 2024, tech remains one of the few soft spots in an otherwise strong labor market.

    Brian Cheung quotes one tech worker:

    “There was a time when working in tech seemed like the most stable career you could have,” said Ayomi Samaraweera, who was laid off as chief of staff at the content creator platform Jellysmack in December 2022. After about 10 years in the industry, she said, “tech does not seem safe and secure.”

    There is one area in tech that is bucking this trend, however. Software and hardware engineers versed in AI are finding employers desperate to hire them – or fighting to retain the ones they already have.

    Gareth Vipers and Kimberley Kao from the Wall Street Journal report that Elon Musk has recently raised the salaries of Tesla’s AI engineers.

    Tesla is raising compensation for its artificial intelligence engineers in a bid to ward off poaching from the likes of OpenAI, Chief Executive Elon Musk said.

    Musk said his electric-vehicle company is boosting pay at a time when OpenAI has been “aggressively recruiting Tesla engineers with massive compensation offers,” in a series of posts on social-media platform X late Wednesday.

    The competition for AI engineers “is the craziest talent war I’ve ever seen,” he said.

    Futurism’s Maggie Harrison Dupré reports that Meta has experienced a steady exodus of its top AI talent in recent months, leading CEO Mark Zuckerberg to send personally written recruiting e-mails to AI staffers at competitor Google. During March 2024, Meta AI experts Devi Parikh, Abhishek Das, and Erik Meijer all left the company to pursue other opportunities, creating a talent vacuum Zuckerberg is undoubtedly trying to fill.

    Meijer, who served as Meta’s Director of Engineering, wrote on X that he’s “more bullish than ever about Meta with the company’s increased focus on AI,” but seemed to steer budding AI engineers away from larger companies.

    “Given the incredible competitive pressure in the field, there is really no advantage to be inside a large corp if you want to build cool stuff on top of [Large Language Models],” Meijer wrote.

    Yet smaller companies may be at a disadvantage compared to the likes of Microsoft, Google, and Meta: only a few companies can afford the massive cloud infrastructure necessary to work with the largest models.

    MIT Sloan Management Review described this scale disadvantage in a story last summer. Yannick Bammens and Paul Hünermund write:

    Deep pockets, access to talent, and massive investments in computing infrastructure only partly explain why most major breakthroughs in artificial intelligence have come from a select group of Big Tech companies that includes Amazon, Google, and Microsoft. What sets the tech giants apart from the many other businesses seeking to gain an edge from AI are the vast amounts of data they collect as platform operators.

    Whether AI engineers choose to work for large or small companies, they can expect to find generous job offers. The Daily Mail reports that software engineers with specialized training in AI are seeing average wages $100,000 higher than their non-AI engineering peers. And depending on experience, those salaries can approach seven figures.

    James Cirrone writes:

    Tech companies are willing to pay top dollar – up to $1 million or more – to poach talented software engineers with experience in generative artificial intelligence.

    Employees who know how to work with large language models and semiconductor chips – which is the technology undergirding popular apps like ChatGPT – are becoming a rarity in the job market, according to executives. 

    While engineers who have the skills to create systems like ChatGPT are in strongest demand, Business Insider reports that companies are also interested in hiring non-software developers who understand the technology and how to apply it across a range of industries.

    Aaron Mok writes:

    While many AI-related jobs posted on Indeed and LinkedIn are for software developers and machine learning engineers with advanced degrees, some don’t require a technical background. Organizations want to use AI tools in their workflows to boost productivity, save time, and make more money — but they also need workers who can link the very technical side and the business side.

    “Companies are desperate to get people figuring out AI for their organizations,” J.T. O’Donnell, a career coach at Work It Daily, told Business Insider.

    Business Insider identifies nine non-programming jobs in demand, including AI product managers, ethics specialists, sales engineers, business analysts, data annotators, prompt engineers, product designers, policy analysts, and sector specialists.

    There continues to be much debate about whether AI will be a net job creator or job destroyer, but for those who understand how to implement AI systems or effectively utilize them, the job market looks bright.

  • Want generative-powered search results? Soon, that might cost you

    Google is working to incorporate generative artificial intelligence into its traditional web search results, but Ars Technica reports that ads might not be enough to cover the costs of (and the desire to profit from) these results.

    Kyle Orland writes:

    Under the proposed plan, Google’s standard search (without AI) would remain free, and subscribers to a paid AI search tier would still see ads alongside their Gemini-powered search results, according to the FT report. But search ads—which brought in a reported $175 billion for Google last year—might not be enough to fully cover the increased costs involved with AI-powered search.

    Since the days of AltaVista, users have been accustomed to performing their Internet searches for free, and search companies have competed for their queries. (Microsoft has even tried to pay users to use its Bing search engine via its Microsoft Rewards program.) But performing a generative AI computation is much more expensive (in terms of processor cycles and electricity) than a traditional web search.