Mom

Nine minutes of unbelievable violence changed everything. You – the kindest, gentlest person imaginable – were robbed of peace forever.

Four years later and there is still evidence of how you were beaten. But deeper than this is what you experienced.

You have faced the horrors of PTSD and depression and fought an ongoing daily battle after a series of breakdowns. The medications that have helped you survive drown you in fog and bring debilitating side effects. Your physical crises and injuries have forced multiple restarts.

You have faced increased isolation as people moved on – especially after that first year, when even I had to return to Joburg to try to rebuild my business for myself and my team. As you pulled back from technology, or were unable to manage it with your physical limitations. As people wrote your condition off as dementia.

Medical professionals failed you with misdiagnoses.

Medicines that should have worked didn’t or stopped working.

In all of this, your kindness, gentleness and character shone through. Your unending love for me, your smiles when you see me. Your concern and support.

Nurses fight to be allocated to care for you – you are unfailingly polite and kind and never fail to preface any request with a “please” and then ensure you thank people.

You have really tough days. And then days like yesterday when you found inspiration in the book “Mao’s Last Dancer” which you devoured in no time and told me about with such admiration.

I wish I could have done more and been more effective. I see mistakes I made or things I didn’t do and know there are things I do not know about – what I should have done or not done.

I try to learn from you and everything I confront.

Through all of the last four years I learnt about the life challenges that almost every person faces – be those aging parents, illness, mental health, relationship difficulties, divorce, cancer – and how to empathise more.

I learnt about angels – as you were to so many family members and me, as the few doctors who cared were, as my team members and assistant were. Clients and team members who believed. Friends. And even just the kind and thoughtful comments from a few wise people.

I have learnt about loss – especially of those who are close – and lack of control and influence. Again and again.

As I see you every four months, there are fewer and fewer times with you in my future.

So much of what is good in me is because of you.

Toxic chat

I wrote about losing everything.

Nothing hurts more than losing people you are close to.

During the challenges I’ve faced over the past few years, some of those losses happened because of toxic chat. I know, because I was told about it or given the clues.

When things are tough and people are having a tough time – especially in a business team – it is natural for people to talk with one another about the tough time they are facing and how another person may be contributing to that, being unfair, etc.

With that chatter, perspective and history are usually lost. But more critically, incorrect views can go unchallenged and misrepresentations can grow a life of their own.

When one of my team expressed unhappiness and referenced some data points last year, I followed up and checked their assertions. The “facts” they threw into an emotional argument were wrong. I never got to correct them before the team member moved on. From there, those “facts” might easily have been repeated (and seemingly were) and poisoned others’ views. All previous virtues get forgotten in moments like that and the poison spreads.

Griping, white-anting and the like destroy teams. But more than that, they destroy relationships. The poison revises the past and remains the memory.

Rebuilding a business is hard. But friends and team members who were lost are gone forever. Beyond that, it can result in losing people you love and care about. And it did.

The AI Agentic Workflow Explosion

I am Awe-Inspired and Terrified (to quote a friend)

I have said that I had an AI moment a few weeks ago (here). It was like I got hit by a lightning bolt.

It was such a simple thing – I saw Matt Wolfe embed a very targeted GPT request into a workflow. It’s a five-month-old video.

That was it. My world changed. I could see that using a GPT iteratively for inference-based micro tasks would change everything.

Since then I have watched what must be nearing a hundred hours of video, read voraciously, and experimented with deploying AI in Python and enabling local AI in large parts of our business platform. It has felt like I am living in science fiction.

The Current Dominant Consumer State of AI

I think many people have been using ChatGPT as a kind of advanced search (hallucinations and all). As they become more sophisticated they might start using it as an advisor and coder, and use that for software configuration. The combination of the two can then be used for diagnosis and correction.

This extended use can shortcut work or resolve unresolved errors. That is a big impact on productivity.

AI as a Workflow Step

With my new interest, my Google Assistant newsfeed showed me Sam Altman discussing how GPT-5 will include the ability to launch agents. I think this is where productivity really begins to explode, where multiple tasks in multiple apps on multiple servers can be chained together in a free-form fashion, as opposed to a deterministic workflow.

Altman says GPT-5, or something like a 4.5, will release in the US summer. So that is soon.

I think that will cement OpenAI not just as a chatbot and LLM but as the front end to an ecosystem of models, orchestration and agents. The ChatGPT web and mobile interfaces already appear to be moving towards this.

While interacting with ChatGPT and watching various videos about AI, it is clear that ChatGPT is an interface to more than the large language model. It is already calling agents for certain tasks. For example, when we ask ChatGPT to draw a graph, it spawns a Python shell and generates the graph using that shell – a combination of the LLM, which is good at inference, and a Python agent, which is good at deterministic output. Perhaps the future will combine these capabilities in a way that is imperceptible to us, where they leverage and check one another.
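
The division of labour above can be sketched in a few lines. This is a toy illustration, not how ChatGPT is actually wired: the routing rule, the `calc:` prefix and the stub functions are all assumptions, but they show the shape – inference decides, a deterministic tool executes.

```python
# Toy sketch: the model handles inference, while deterministic work is
# delegated to a tool (here, a tiny Python evaluator). All names here are
# illustrative assumptions, not a real API.

def python_tool(expression: str) -> str:
    """Deterministic step: evaluate arithmetic exactly, as a spawned shell would."""
    # Toy only: eval with no builtins; never evaluate untrusted input this way.
    return str(eval(expression, {"__builtins__": {}}))

def llm_stub(prompt: str) -> str:
    """Stand-in for the LLM's inferential step (a real model call would go here)."""
    return f"[model prose about: {prompt}]"

def answer(prompt: str) -> str:
    # Inference decides *whether* to call a tool; the tool does the exact work.
    if prompt.startswith("calc:"):
        return python_tool(prompt.removeprefix("calc:"))
    return llm_stub(prompt)

print(answer("calc: 6 * 7"))           # deterministic path, prints 42
print(answer("summarise the report"))  # inferential path
```

The interesting property is that the exact answer never passes through the probabilistic model at all – which is precisely why the combination can check itself.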

Agent Visionaries

I am a bit late to this. Andrew Ng has been talking about agentic workflows for a while. At Sequoia’s recent AI open day (excellent – watch the whole day here), Andrew showed the impact of agentic workflows on the performance of underlying LLMs. It is staggering. It elevates lower-capability LLMs to higher-level performance – e.g. GPT-3.5 in an agentic workflow outperforming GPT-4 on its own.

Source: Andrew Ng – Sequoia Capital AI Ascent Day

I think what we’re going to see is the emergence of an AI agent economy and ecosystem.

In what seemed almost simultaneous with my epiphany, Devin launched. It shook the software development industry. Suddenly macro-level tasks that were outside the realm of GPT prompts, due to their limited context window and lack of real-time content, were accessible. Devin can independently execute an entire software development project. Within days, OpenDevin launched – an open source version. And then the open source Devika.

Devin is already working on tasks listed on Upwork. It solves them and then earns the bounty.

Devin represents a system using an overall orchestration platform for software development together with the ability to spawn agents and iteratively interact with LLMs.

I think that concept could be extended into other domains like research.

David Ondrej is a big fan of Crew.ai, and Andrew Ng drew attention to the Python LangChain library (another presenter at the Sequoia open day). Both allow the use of agents in Python. This is foundational to the explosion of further agentic applications like Devin.
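
Stripped of the libraries, the agent loop these frameworks provide is surprisingly small. The sketch below is plain Python, not the Crew.ai or LangChain API; the `DONE:` convention and function names are my own assumptions, purely to show the iterate-observe-act shape.

```python
# A minimal agent loop, sketched in plain Python. The model repeatedly picks
# the next action; tools execute it; observations feed back into the context.

def run_agent(goal, llm, tools, max_steps=5):
    """Iteratively ask the model for the next action until it declares DONE."""
    history = [f"GOAL: {goal}"]
    for _ in range(max_steps):
        action = llm("\n".join(history))        # inference: choose next step
        if action.startswith("DONE:"):
            return action.removeprefix("DONE:").strip()
        name, _, arg = action.partition(" ")
        observation = tools[name](arg)           # deterministic tool call
        history.append(f"ACTION: {action}")
        history.append(f"OBSERVATION: {observation}")
    return "gave up"

# Scripted stand-in for the model: first call a tool, then finish.
steps = iter(["reverse hello", "DONE: olleh"])
result = run_agent("reverse 'hello'", lambda _ctx: next(steps),
                   {"reverse": lambda s: s[::-1]})
```

Everything Devin-like sits on top of a loop like this: better planning, more tools, and far better models choosing the actions.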

Things accelerated. Yesterday Microsoft launched AutoDev – its agentic software development tool.

This will move exponentially faster now. I believe we will see agentic tools for provisioning IT servers and end-users in the next few weeks – it is an obvious candidate.

I think the above also will result in the atomization and extension of AI at multiple levels.

Agents as a Solution to Yann LeCun’s Hierarchical Planning Challenge

I wrote about Yann LeCun’s (Meta Chief AI scientist) criticism of the current AI models as a path to AGI here. He highlighted the inability of current LLMs to cross from the language domain and the complexity of hierarchical planning – his example: a trip from New York to London.

I think that the use of LLMs for discrete inferential tasks within broader workflows challenges that. Perhaps what we will see is a disaggregated model of AI performing these complex tasks and freedom from the context window limitations of current LLMs.

In my naive thinking, I imagine this to be like the specialist areas of the brain (e.g. the visual cortex) processing small discrete specialist tasks within a broader process.

Software Development Project Execution as a Model for Other Domains

Software development is similar to strategy consulting in that it requires a mix of hierarchical planning, with lots of complexity, and then execution. AI replacing a consultant is perhaps a long way off; the opportunity to execute discrete tasks exceptionally well, however, is here now. The combination of AI within consulting workflows, and perhaps agents in GPT-5, is very exciting.

Researchers working with BCG (here) found that application of AI in task completion led to the following improvements:

  1. Increased Productivity: Consultants using AI completed significantly more tasks compared to those without AI. Specifically, they completed 12.2% more tasks on average.
  2. Enhanced Quality of Work: The quality of work, as measured by human graders, was significantly higher for consultants who used AI. The improvement in quality was more than 40% higher compared to the control group without AI.
  3. Faster Task Completion: Consultants using AI were able to complete tasks 25.1% more quickly, indicating a substantial increase in efficiency.
  4. Benefit Across Skill Levels: The study found that AI augmentation benefited consultants across the skills distribution. Consultants below the average performance threshold experienced a 43% increase in their scores, while those above the threshold saw a 17% increase compared to their own baseline scores without AI.

These are early days. However, let’s consider that summary again:

“For each one of a set of 18 realistic consulting tasks within the frontier of AI capabilities, consultants using AI were significantly more productive (they completed 12.2% more tasks on average, and completed tasks 25.1% more quickly), and produced significantly higher quality results (more than 40% higher quality compared to a control group).”

I think the challenge will be to structure consulting tasks into defined workflows so that discrete tasks suitable for AI, and for the application of agents, can be identified and automated. Then again, few would have thought software development, hardly a less complex domain, could be challenged so quickly.

The Atomization of AI

We are already seeing “mixture of experts” LLMs – like the recent open source release of DBRX. This splits an LLM into expert domains and then chooses the combination best able to answer a query. The benefits are smaller, less costly LLMs and very fast tokens-per-second output. This is critical for agentic workflows that need to rapidly and iteratively execute calls to LLMs.
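
The routing idea is easy to caricature in code. In a real mixture-of-experts model the gate is a learned network scoring experts per token; the keyword gate below is purely an illustrative assumption, there to show why only a fraction of the model runs per query.

```python
# Toy shape of a mixture-of-experts step: a gate scores each expert and only
# the top-scoring experts run, so most of the model stays idle per query.
# Real gates are learned; this keyword gate is only illustrative.

EXPERTS = {
    "maths":  lambda q: f"maths expert answers: {q}",
    "law":    lambda q: f"law expert answers: {q}",
    "coding": lambda q: f"coding expert answers: {q}",
}

def gate(query):
    """Score experts for this query (stand-in for a learned router)."""
    keywords = {"maths": ["sum", "integral"],
                "law": ["contract"],
                "coding": ["python", "bug"]}
    return {name: sum(word in query.lower() for word in kws)
            for name, kws in keywords.items()}

def mixture(query, top_k=1):
    scores = gate(query)
    chosen = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [EXPERTS[name](query) for name in chosen]

out = mixture("fix this Python bug")
```

The cost saving comes from `top_k`: with dozens of experts and `top_k` of two or four, each token activates only a sliver of the total parameters.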

Perplexity.ai provides a free and paid-for front end to multiple LLMs (e.g. OpenAI, Mistral, etc). This allows some of the atomization via aggregation at the front end. In the last few days, we now have an open source app that accomplishes this too.

Workflow, integration and agent tools like WSO2, Zapier, Make, Huginn and n8n clearly become infinitely more powerful using AI agents for embedded inferential tasks. Scanning unstructured text and returning a structured JSON string is trivial for an inferential agent.
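
To make that last claim concrete, here is roughly what the embedded inferential step looks like. The prompt wording, the contact schema and the helper names are assumptions of mine; the canned reply stands in for a real model call so the sketch is self-contained.

```python
import json

# Hypothetical sketch: asking an LLM to turn unstructured text into a
# structured JSON record that a workflow tool can consume downstream.

CONTACT_SCHEMA = {"name", "email", "phone"}

def build_extraction_prompt(text: str) -> str:
    """Prompt instructing the model to emit only JSON with fixed keys."""
    return (
        "Extract the contact details from the text below. "
        'Reply with only a JSON object with keys "name", "email", "phone" '
        "(use null when a field is absent).\n\n" + text
    )

def parse_contact_reply(reply: str) -> dict:
    """Validate the model's reply against the expected schema."""
    contact = json.loads(reply)
    missing = CONTACT_SCHEMA - contact.keys()
    if missing:
        raise ValueError(f"model omitted fields: {missing}")
    return contact

# A canned reply stands in for the real model call in this sketch.
reply = '{"name": "Jane Doe", "email": "jane@example.com", "phone": null}'
contact = parse_contact_reply(reply)
```

The validation step matters: the workflow engine gets either a well-formed record or an explicit failure, never free-form prose.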

We saw the combination of tools a long time ago – AlphaGo in 2016. Demis Hassabis believes that LLMs plus tree search is the fastest way to AGI.

There is quite a lot of talk that ultimately this will breed an AI OS – an OS of AI with agents and orchestration. And that will sit on top of AI optimised hardware (for native neural networking capability). This is already the direction Nvidia is going.

The Rise of Agent Ecosystems

I think that we’ll see agent ecosystems around particular software and tasks. For example, take the WordPress plugin library. Perhaps instead of marketplaces of plugins, we will see functionality requests satisfied through AI.

In areas such as R and Python, the library ecosystems will become dynamic, with functionality extended by agents and AI. Libraries might represent a static view of functionality at a point in time before being further improved by agents – much as they represent a static version today before further human improvement.

The End of Application Marketplaces as a Monetization Mechanism?

Applications such as WordPress, SuiteCRM or others combine open-source software with marketplaces for extended functionality. This could be a method to monetize functionalities that are in high demand but not part of the free core system. It also presents an opportunity to bring in more developers.

On-demand agent-driven software development could destroy a current means of monetizing core open-source software.

In many cases, the existing plugin libraries attract huge subscription fees for extended functionality, particularly in business niches.

Data as an AI Building Block

Agent ecosystems could lead to interaction with broader application areas like finance, and underpinning those application areas will be data – not just training data, but real-time data.

Perhaps instead of LLMs being trained on custom data sets, we will move to an interaction between capable LLMs and dynamic data sets – document libraries, knowledge bases, financial data, etc. LLMs and agents will then be used in real time to interact with up-to-date data. In a very basic way, I think we are already seeing something like this in ChatGPT’s ability to search the web and feed results into its responses. Retrieval Augmented Generation (RAG) is already part of PrivateGPT – a means of implementing LLMs locally to interact with private data.
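
The RAG pattern itself is simple enough to sketch. Real systems use trained embedding models and vector databases; the word-overlap "embedding" below is a deliberate toy, and the document set is invented, but the retrieve-then-prompt shape is the same.

```python
# Bare-bones sketch of RAG: retrieve the documents nearest the question,
# then hand only those to the model as context. The embedding here is a toy
# word-overlap measure, standing in for a real embedding model.

def embed(text):
    return set(text.lower().split())

def similarity(a, b):
    return len(a & b) / max(len(a | b), 1)   # Jaccard overlap as a stand-in

def retrieve(question, documents, k=1):
    q = embed(question)
    ranked = sorted(documents, key=lambda d: similarity(q, embed(d)),
                    reverse=True)
    return ranked[:k]

docs = [
    "Invoices are stored in the finance share.",
    "The VPN certificate is renewed every March.",
]
context = retrieve("where are invoices stored?", docs)
prompt = f"Answer using only this context: {context}\n\nQ: where are invoices stored?"
```

The point of the pattern is that the model never needs retraining: keeping the document store current keeps the answers current.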

Winners and Losers

The medium-term AI short trade is surely obvious now.

I can see the losers in the new world – those performing repetitive tasks with unstructured data. For example (at a very discrete level), isolating contact details from an email is a surprisingly difficult deterministic task via parsing rules. It becomes trivial with AI. There are hosts of admin intensive tasks in this kind of domain. Repetitive tasks such as Search Engine Optimisation change completely.

IT system administration, helpdesks, etc – they are a huge target. Software development as indicated above. System transformation and integration. AI can untangle logic in legacy systems and rebuild it better in modern platforms.

Call centers are already under threat. Klarna released figures showing that customers received better (or no worse) service from automated chatbots. Those chatbots handled two-thirds of customer service chats in their first month.

The creator economy is hugely impacted by AI. I have seen people claiming to generate ten or more videos via AI within an hour, launch them on YouTube and monetise them, using AI tuning to chase exponential (viral) growth. Given that AI’s strength right now is generative, an explosion of content seems likely. At some point this makes stored content overwhelming and meaningless (the “dead internet”). Ultimately any content can be generated in real time. That blows up the creator model and its associated monetization. That change feels imminent.

What happens to the Google goldmine – search with advertising? I think search is crucial in combination with AI: LLMs are static, and search provides context and currency. Most people misinterpret AI as a replacement for search. That is poor use – but what people actually do will determine the outcome. A mitigating step is the AI summaries already present in Google and Bing. Perhaps the broader implication is the rise of subscriptions, or on-demand paid-for opt-outs from advertising-laden results. Currently top-end AI is only meaningfully available through paid subscription (e.g. GPT-4) or behind a paid service (e.g. Microsoft Copilot). So hybridisation is probably the outcome.

Software firms could be major losers as functionality is generated and customised for end users. Microsoft is moving exceptionally quickly. An AI operating system with ecosystem integration must be a medium to long-term prospect.

Law is changing immediately. Combining LLMs with libraries utilising embeddings and knowledge vectors will allow reliable searching for precedents and referencing. I think we will see AI with the ability to present referenced arguments in the near term.

There are lots of opportunities in manufacturing, supply chain and logistics. This was an early target for Andrew Ng – but he does talk of the difficulties in using AI in fields like quality control, etc. Not a slam dunk and probably lots of opportunities on small tasks, particularly related to integration with IoT.

The potential disruption in security, law enforcement, etc is enormous. AI is already being used for monitoring camera footage for alarm conditions – this happened a long time ago. China has built a surveillance state on the back of AI-driven surveillance. I believe most of these were based on custom and tailored development. Agentic workflow extends customised opportunities to a huge market.

Foundational layers win for the foreseeable future – computing power and energy – so yes, obviously NVIDIA.

However, AI could obsolete IP – i.e. it might design better software, drugs or even a GPU. Hinton and others are talking of a cure for cancer. An optimal GPU to rival NVIDIA’s seems trivial by comparison. Consider the scale: it takes a PhD student five years to map one protein; DeepMind mapped 200 million proteins in comparatively no time. AI potentially improves exponentially as it is applied to the full engineering stack – from materials, to hardware, to software.

There are already players who are redesigning for AI from the silicon – Groq designed their LPU chips for inference from the ground up – allowing 500 tokens per second with Mixtral. This means that multiple LLM responses can be processed with no delay to the user. Beyond AI agents performing quicker, this allows LLM agents to check one another in real time to stop hallucinations being presented to the user.

Elon Musk has mentioned a few times that beyond chips, the constraint becomes as simple as the availability of other components, such as step-down transformers for computing processors, in the near term (within the next year).

If you go more foundational than that, then silicon and other materials win no matter what.

But there must be a cleverer play than that. Probably it lies in which fields take off because they are changed completely – health and pharmaceuticals is obviously one area.

One of the most hopeful outcomes (beyond improved healthcare) is improved education. It is a short step to a chatbot for every poverty-stricken child on a smartphone (probably with an LLM on the phone itself) – with the whole of the internet as training data. That kid gets a (soon to be) perfect teacher instead of a poorly trained one in a badly resourced classroom. That also speaks to the extension of AI to a huge target base, as opposed to Google search only being available to a more affluent segment of the population. Monetisation becomes a big issue – but you could redirect badly spent money, e.g. education budgets, into subscriptions.

Larry Summers believes AI is coming for the cognitive class. Fields like asset management and wealth management will surely be transformed. AI supports algorithmic trading and investing, removing effects of bias and performing arbitrage to the extent it still exists. Perhaps it scales to far bigger effects than this.

I think we can realistically expect to be able to analyse all standard financial measures via AI – e.g. dump investment due diligence data into AI and get an analytical output. Then that can be scaled to all publicly available data. Anthropic’s Claude is already better than ChatGPT at this. Private data access becomes a game changer.

Managing AI hallucinations and testing will be absolutely critical. Perhaps just like how transformers in LLM model training turned out to be one of the biggest AI breakthroughs, so will checking in an atomized AI architecture be, with agents performing this task. This is already happening in the software development tools mentioned above.
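
The checking idea can be sketched as a two-agent pipeline: one agent drafts, a second verifies the draft against the source, and only verified answers reach the user. Both "agents" below are stubs (in practice each would be an LLM call), and the grounding test is a deliberately crude substring check.

```python
# Sketch of agent-based checking: a drafter proposes, a checker verifies
# against the source material, and unverified answers are withheld.

def drafter(question, source):
    # Stub: real drafts come from an LLM and may hallucinate.
    return "Paris" if "capital of France" in question else "unsupported claim"

def checker(answer, source):
    """Approve only answers actually grounded in the source text."""
    return answer.lower() in source.lower()

def answer_with_check(question, source):
    draft = drafter(question, source)
    return draft if checker(draft, source) else "I could not verify an answer."

source = "Paris is the capital of France."
ok = answer_with_check("What is the capital of France?", source)
bad = answer_with_check("Who wrote it?", source)
```

With fast inference of the Groq kind, a real checker call adds little latency, which is what makes this architecture practical rather than merely correct.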

There is surely an anti-AI reaction – a bit like people going back to LPs and film photography. This exists at many levels, and perhaps face-to-face and relationships become more sought after.

The old big spender is what it always was – defence. Hinton mentions US defence now and again. AI clearly has immediate application to misinformation bots or monitoring. China has an enormous AI-driven surveillance state.

There are lots of conspiracy theories about OpenAI cracking encryption (as the reason Altman was temporarily fired). Apparently Ilya Sutskever is now focused solely on AI ethics and security. I am not sure inference can break encryption. Maybe I am just scared of the consequence. And AI engines are becoming better at maths. JP Morgan has had a team working on defence against AI encryption cracking for a few years. If AI were close, the genie would be out of the bottle. I imagine Russia and China will be racing to leverage all the open-domain and spied-upon AI progress to get ahead. Clearly the world will pay anything to defend against this – if it is defendable.

Broader Implications

There must be room for game changing combinations of technology. Clearly energy and AI supercomputing is one area. Microsoft and Amazon have both pursued adjacency to nuclear power sources to cater for the massive energy needs of current AI. Sam Altman talks of the potential of nuclear fusion and perhaps fission. The combination of AI with quantum computing is an exciting or scary prospect. These combinations become nearer term when perhaps AI is deployed to solve the road-blocks to progress in those domains.

For me, the most scary part of the pace of development is that while the dominant actors such as OpenAI, Anthropic, Amazon, Microsoft and Google might be scrambling to manage the potential for a bad AI outcome, this genie is well and truly out of the bottle. While rigorous testing, LLM ablation and other techniques might mitigate bad outcomes, jailbreaks are still a problem. The bigger problem is surely that developers are feeding off one another at a rate of knots. Previously assumed roadblocks to AI development, such as diminishing returns on LLM scaling, have proven false and encouraged others to rapidly intensify their efforts. The scope for bad actors such as hacker groups to use localised open source LLMs as the inference engine in a hacking agentic workflow seems obvious and immediate. Or to increase the effectiveness of spam and cyberscams. Or to launch massive paid misinformation campaigns. At a nation-state level, restrictions baked into the massive LLMs are irrelevant if researchers build their own.

But, more immediate to most of us, is that in the current term, application of AI to discrete tasks for inference is happening now and agentic workflows change the world today.

AI Life Hacks – Transcription

Headline summary:
1. Transcription to phone and then AI improvement helps me get work done on the move.

2. Speech-to-speech interaction with ChatGPT on my phone makes for an interactive learning process.

Transcription as a Life Hack

Transcription has become essential for us with the rise of voice notes, recorded meetings, video calls, etc. Although transcription services are expensive, the rapid pace of development and the productivity boost they offer make them a worthwhile investment.

In our ongoing research into self-hosted transcription services, I’ve evaluated using Whisper, subscription services like Otter.ai, and conducted further research into GPT models.

Should we choose the self-hosted option, we’d require a server equipped with GPU technology for audio-to-text conversion. I’ve considered refurbished AI servers, but options are limited in South Africa.

I’ve also looked into old mining rigs, but due to the AI and crypto boom, prices are inflated. A high-spec server from the US can cost anything from $500 to $7,500 before huge shipping costs – with GPUs not included at the lower end, and NVIDIA Tesla GPU prices being particularly high. This could easily amount to anything from R200,000 to a million rand for an AI server, making subscription services a more feasible option.

So, why not just start with what’s available?

I set Google’s voice-to-text as my default keyboard on Android to transcribe my thoughts into notes. The initial output was quite rough, with many incorrect word choices. Therefore, I activated Microsoft Copilot from my Android keyboard and asked it to clean up the transcription from my note, which it did impressively. This process can be frequently repeated.

One of the benefits is that I can convert ideas, thoughts, and instructions into written form much quicker and with less effort than before. This not only frees up my time but also allows me to increase my output, having a cumulative effect on productivity.

Previously, many of these notes and ideas might have remained in my head and never been shared, impacting delegation and leaving execution solely to me. Now, as I have numerous concurrent thoughts about clients and projects, I can quickly transcribe them into notes, refine them through GPT, transform them into tasks, and share them with team members. It’s quite remarkable.

I believe the process of using live voice-to-text transcription, then chaining it into CoPilot for improvement and then back into a notepad, feels stilted and not as productive as it could be, although it is obviously a significant gain in itself.

I think that recording voice notes on an ad hoc basis and then uploading and transcribing them for improvement is probably a better route.

However, long form transcriptions break ChatGPT’s context window and iterating backwards and forwards with sections makes for a difficult process.

Using an Iterative AI Approach to Improve Long Form Transcriptions or Essays

ChatGPT recommends this iterative approach to transcriptions, or even to the improvement of long-form essays (situational analysis, or SA):

  1. Break down the content: If you have a lengthy SA document, it’s useful to break it down into smaller sections or questions. This approach makes it easier for the model to provide relevant and focused responses.
  2. Be specific with your queries: When asking questions related to the SA content, be as specific as possible. Specific questions lead to more precise answers, which can be more useful for your analysis.
  3. Use clear and concise language: While complex concepts might be involved, try to phrase your questions in clear and concise language to avoid misunderstandings.
  4. Sequential querying: If you’re exploring a complex topic, consider using a sequential querying approach. Start with more general questions and gradually drill down to more specific aspects based on the responses you receive.
  5. Highlight key information: If you’re inputting a section of your SA for analysis, highlight or summarize key points you want the model to consider. This helps in focusing the response on the most relevant aspects.
  6. Iterative refinement: Based on the initial responses, refine your questions or provide additional context to get more detailed insights. It’s often an iterative process to zero in on the most valuable information.
  7. Summarization requests: For comprehensive sections of the SA, you can request summaries to capture the essence of the content. This can be a good starting point for deeper exploration.
  8. Cross-referencing: If your SA involves multiple sources or documents, reference them in your queries to understand how they interrelate or to compare their perspectives.

Is it effective to pass a section of a transcription to ChatGPT along with the previously optimized sections as context? The iteration can be tedious, but it is effective, particularly for maintaining coherence and relevance in the conversation. Here’s why this approach can work well:

  1. Context Preservation: ChatGPT can generate more accurate and relevant responses when it has context. By providing optimized sections of the transcription as context, you help the model understand the broader conversation or document theme, leading to more informed and appropriate responses.
  2. Coherence: Including previous sections helps maintain coherence across the conversation. ChatGPT can refer back to earlier points, ensuring that the discussion flows logically and consistently.
  3. Efficiency: Optimizing the context by selecting the most relevant sections can make the interaction more efficient. It prevents information overload and helps ChatGPT focus on the most pertinent information, improving response quality.
  4. Customization: This approach allows for a degree of customization in how you interact with ChatGPT. You can tailor the context you provide based on what you’ve found to be most effective for your specific use case, whether it’s a detailed analysis, brainstorming session, or any other application.

However, it’s important to keep a few things in mind:

  1. Context Limit: ChatGPT has a limit to the amount of context it can consider for each response (the token limit). Be mindful of this when selecting which sections to include as context.
  2. Relevance: Ensure that the sections you’re passing as context are directly relevant to the current section you’re inquiring about. Irrelevant context can lead to less accurate responses.
  3. Iteration: You might need to experiment with different approaches to optimizing context to find what works best for your specific needs.

The context limit for ChatGPT is measured in “tokens.” A token can be as short as one character or as long as one word. On average, a token is about four characters in English. The limit for GPT-4 is around 8192 tokens, which includes both the prompt and the generated response. This limit encompasses all the text that ChatGPT can consider at one time when generating a response, so it’s important to manage the amount of context provided to ensure that the most relevant information is included within this constraint.
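
The chunk-and-carry-context process above can be sketched in code. The four-characters-per-token rule of thumb comes from the paragraph above; the helper names, the sentence-based splitting and the summary budget are my own assumptions, and the `improve`/`summarise` functions stand in for LLM calls.

```python
# Sketch: split a long transcription into chunks that fit the context window,
# and prepend a short summary of previously cleaned text to each call so the
# model keeps coherence across chunks.

CHARS_PER_TOKEN = 4  # rough English average, per the rule of thumb above

def chunk(text, budget_tokens=1500):
    """Greedy split on sentence ends, capped by an approximate token budget."""
    budget = budget_tokens * CHARS_PER_TOKEN
    chunks, current = [], ""
    for sentence in text.replace("\n", " ").split(". "):
        candidate = (current + ". " if current else "") + sentence
        if len(candidate) > budget and current:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

def clean_transcription(text, improve, summarise):
    """improve() and summarise() stand in for the two LLM calls per chunk."""
    cleaned, running_summary = [], ""
    for part in chunk(text):
        cleaned.append(improve(f"Context so far: {running_summary}\n\n{part}"))
        running_summary = summarise(" ".join(cleaned))
    return " ".join(cleaned)
```

Keeping the carried context as a short summary, rather than the full cleaned text, is what stops the process from breaching the token limit again as the document grows.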

ChatGPT Speech-to-Speech

A simple life hack that moves beyond transcription is live speech-to-speech interaction with ChatGPT. This transcribes the interaction as you go.

It feels like the above makes for a ripe opportunity to develop a workflow front end for the iterative process of optimising long-form text.

Thoughts on (artificial) intelligence


AI’s Got the Whole World in Its Hands

The excitement surrounding AI is exploding, and with good reason. Yet, there’s a fine line between enthusiasm and losing sight of the immediate, tangible benefits it brings. The power of this moment in AI’s evolution is hard to overstate, a sentiment echoed by Geoffrey Hinton, a pioneer in the field. He noted that advancements in AI, particularly through platforms like OpenAI and ChatGPT, have effectively unlocked the entirety of human knowledge (as represented by the textual information on the Internet) for us. Our natural human limitations in memory, processing, and communication stand in stark contrast to AI’s capabilities, which can handle vast amounts of information and deliver insights almost instantaneously.

OpenAI’s development, often linked to the concept of Artificial General Intelligence (AGI), promises to extend AI’s reach even further, leveraging the expanse of data available online. The potential gap between our cognitive abilities and AI’s prowess in managing and interpreting data is both awe-inspiring and somewhat daunting. Could we embed such expansive knowledge into a compact form, like a chip encapsulating a neural network model? This could mean having access to an immense compression of the internet’s data, up to a recent snapshot, right at our fingertips. The implications of such technology stretch far beyond our current understanding, hinting at a future where AI’s integration into our lives could redefine what it means to access and utilize information.

AI as the Ultimate Savant

Geoffrey Hinton referred to large language models, or LLMs, like GPT, as “idiot savants” – essentially, AI systems that excel remarkably in specific tasks but can’t explain the processes behind their outcomes. This analogy struck a chord with me, particularly when considering the parallels to certain aspects of human intelligence, such as the focused capabilities often observed in individuals with autism, where there’s a pronounced proficiency in one area, potentially at the expense of broader cognitive functions.

This comparison led me to wonder about the nature of AI’s current trajectory, especially in the realm of LLMs. They demonstrate unparalleled skill in specific domains, such as language processing, yet their understanding and applicability in broader, more integrated contexts remain limited. It raises intriguing questions about the underpinnings of both AI and human intelligence. Could exploring these parallels between AI’s focused expertise and human cognitive diversity offer us deeper insights into making AI more holistic, or even enhance our comprehension of the human brain’s versatility?

Bill Gates Getting Real About AI Taking Over

It’s a bit of a wake-up call when someone like Bill Gates openly reflects on the future of AI surpassing human abilities in areas we excel in. Gates pondered the implications of AI outperforming him in initiatives he’s passionate about, such as eradicating malaria. This moment of realization from a tech visionary underscores the dual nature of AI’s advancement – it’s both exciting and somewhat daunting. The thought that this level of AI proficiency isn’t reserved for the elite but is accessible to everyone, regardless of their skill level, really opens up a world of possibilities. It prompts a reevaluation of our roles and contributions in a world where AI capabilities are becoming increasingly integral.

Yann LeCun’s Reality Check on AI Hype

In a conversation with Lex Fridman, Yann LeCun, who leads AI research at Meta, provided a much-needed perspective amidst the ongoing excitement about AI. LeCun’s insights are particularly sobering regarding the current state and future of Artificial General Intelligence (AGI). He highlighted that while AI has shown remarkable progress in areas like large language models (LLMs), this success is not as widespread across other AI domains. For example, LeCun pointed out the stark difference in learning rates between humans and AI by noting that a teenager could learn to drive in a matter of days—a task AI systems are still far from mastering. This serves to underline that a universally competent AI, akin to human intelligence, remains a distant goal.

LeCun criticized the narrow focus on LLMs for their autoregressive properties, where they excel in generating sequences based on preceding input but lack a holistic understanding or the ability to apply knowledge in varied contexts. He argued that this specialization does not equate to the broader, more nuanced intelligence that encompasses learning, reasoning, and applying knowledge across diverse scenarios, including physical tasks and decision-making. According to LeCun, the excitement surrounding AGI’s imminent arrival might be premature and suggests a need to reassess our approach if we are to make significant strides toward truly comprehensive AI intelligence.

AI Planning: Not Just There Yet

Yann LeCun brought up a fascinating aspect of AI that’s often overlooked: hierarchical planning and training. He illustrated this with the complex example of planning a trip from New York to London. At a high level, it involves several major steps, but when you zoom in, each step breaks down into countless smaller actions, right down to the minute muscle movements needed to walk towards a door. LeCun pointed out that AI hasn’t quite mastered this level of detailed, layered planning and execution. The challenge lies in breaking down and then effectively executing a plan with potentially millions of micro-steps, something humans do quite naturally but AI struggles with. This gap highlights a significant area ripe for exploration and advancement in AI research.

When AI Starts Thinking Outside the Text Box

The dialogue between Fridman and LeCun really stirred up my thoughts, particularly about how AI, through training large language models, might unlock patterns that delve into the essence of human intelligence. It’s compelling to ponder whether AI can unearth a form of understanding that mirrors the intricate workings of our minds.

The notion of emergent properties in AI, like Google Bard’s spontaneous grasp of Bengali, underscores the adaptability and potential of AI to venture beyond its programmed boundaries. This phenomenon not only showcases AI’s learning capabilities but also poses questions about its parallels with human cognitive flexibility.

What’s equally remarkable is the idea of cross-domain learning in AI, suggesting that proficiency in one area, such as language, could catalyze advancements in others, including vision and movement. This concept mirrors the human brain’s ability to efficiently synthesize and apply knowledge across various domains, a trait that’s been pivotal in our evolution.

The reduction of vast amounts of internet data by AI into comprehensible models, akin to the data compression seen in zip files or hash totals (those arguing that LLMs are a stochastic parrot would love this analogy – those arguing that we are approaching AGI would hate it and argue that those models are much more like a DNA for the thinking behind knowledge), hints at a pattern recognition ability that could parallel human cognitive processes. This efficiency in learning and connecting disparate pieces of information could be reflective of the higher-level cognitive functions in humans, bridging the gap between system 1 (intuitive) and system 2 (analytical) thinking.

The exploration of AI’s potential extends to experimental research where organic materials, such as brain cells in a petri dish, are utilized in training, employing stimulus-response mechanisms akin to human learning. This innovative approach not only blurs the lines between organic and artificial learning systems but also prompts a reevaluation of what constitutes life and intelligence. The adaptability and resilience demonstrated by humans through stimulus-response learning and evolution might find a counterpart in AI’s evolving landscape, challenging our perceptions of superiority and the unique aspects of human intelligence.

Our Opportunity

Navigating this era where AI’s capabilities burgeon around us is crucial for our collective journey forward. Recognizing and leveraging the unique advantages we hold as humans—our capacity for complex rationalization and emotional intelligence, where AI still lags—is pivotal.

However, the landscape of AI is rapidly evolving, moving beyond singular models or frameworks. The current methodologies, from transformers to GANs and the iterative enhancements through backpropagation, only scratch the surface of what’s possible. We’re on the cusp of a more integrated and dynamic future where diverse AI models and systems could synergize, enhancing and cross-validating each other in real-time. This collaborative AI ecosystem, augmented by task-specific agents, could redefine efficiency and innovation.

The example of Devin, an AI software-engineering agent, epitomizes this potential. By orchestrating various AI models and deploying agents for task execution, Devin offers a glimpse into a future where AI’s collaborative capabilities could dramatically amplify its effectiveness and applicability. It’s both an exciting and scary prospect, highlighting the need for us to steer this technology thoughtfully.