Headline summary:
1. Dictating to my phone and then having AI clean up the transcription helps me get work done on the move.
2. Speech-to-speech interaction with ChatGPT on my phone makes for an interactive learning process.
Transcription as a Life Hack
Transcription has become essential for us, given how much we now rely on voice notes, meetings, video calls, and so on. Although transcription services are expensive, the rapid pace of development and the productivity boost they offer make them a worthwhile investment.
As part of our ongoing research into transcription options, I’ve evaluated self-hosting Whisper, looked at subscription services like Otter.ai, and done further research into GPT models.
Should we choose the self-hosted option, we’d require a server equipped with GPU technology for audio-to-text conversion. I’ve considered refurbished AI servers, but options are limited in South Africa.
I’ve also looked into old mining rigs, but due to the AI and crypto boom, prices are inflated. A high-spec server from the US can cost anywhere between $500 and $7,500 before hefty shipping costs, with GPUs not included at the lower end and NVIDIA Tesla GPU prices being particularly high. This could easily amount to anything from R200,000 to a million rand for an AI server, making subscription services a more feasible option.
So, why not just start with what’s available?
I set Google’s voice-to-text as my default keyboard on Android to transcribe my thoughts into notes. The initial output was quite rough, with many incorrect word choices. So I activated Microsoft Copilot from my Android keyboard and asked it to clean up the transcription in my note, which it did impressively. This process can be repeated as often as needed.
One of the benefits is that I can convert ideas, thoughts, and instructions into written form much quicker and with less effort than before. This not only frees up my time but also allows me to increase my output, having a cumulative effect on productivity.
Previously, many of these notes and ideas might have remained in my head and never been shared, impacting delegation and leaving execution solely to me. Now, as I have numerous concurrent thoughts about clients and projects, I can quickly transcribe them into notes, refine them through GPT, transform them into tasks, and share them with team members. It’s quite remarkable.
That said, the process of using live voice-to-text transcription, chaining it into Copilot for improvement, and then pasting the result back into a notepad feels stilted and less productive than it could be, although it is obviously a significant gain in itself.
I think that recording voice notes on an ad hoc basis and then uploading and transcribing them for improvement is probably a better route.
However, long-form transcriptions exceed ChatGPT’s context window, and iterating back and forth over sections makes for a difficult process.
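One way around the context window problem is to split the transcription into smaller pieces before sending it anywhere. The following is a minimal sketch, not a definitive implementation: the function name `split_into_chunks`, the default budget of 1500 tokens, and the rule of thumb of roughly four characters per token are all assumptions chosen for illustration.

```python
def split_into_chunks(text: str, max_tokens: int = 1500, chars_per_token: int = 4) -> list[str]:
    """Split text on paragraph boundaries into chunks under an approximate token budget.

    Assumption: one token is roughly `chars_per_token` characters of English text.
    """
    max_chars = max_tokens * chars_per_token
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        # A single paragraph that is too long on its own gets hard-split.
        while len(paragraph) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(paragraph[:max_chars])
            paragraph = paragraph[max_chars:]
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars:
            current = candidate  # paragraph still fits in the current chunk
        else:
            chunks.append(current)  # close the current chunk and start a new one
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Splitting on blank lines keeps paragraphs intact where possible, so each chunk still reads as a complete thought when it is passed on for improvement.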
Using an Iterative AI Approach to Improve Long Form Transcriptions or Essays
ChatGPT recommends the following iterative approach to transcriptions, or even to improving long-form essays such as a situational analysis (SA):
- Break down the content: If you have a lengthy SA document, it’s useful to break it down into smaller sections or questions. This approach makes it easier for the model to provide relevant and focused responses.
- Be specific with your queries: When asking questions related to the SA content, be as specific as possible. Specific questions lead to more precise answers, which can be more useful for your analysis.
- Use clear and concise language: While complex concepts might be involved, try to phrase your questions in clear and concise language to avoid misunderstandings.
- Sequential querying: If you’re exploring a complex topic, consider using a sequential querying approach. Start with more general questions and gradually drill down to more specific aspects based on the responses you receive.
- Highlight key information: If you’re inputting a section of your SA for analysis, highlight or summarize key points you want the model to consider. This helps in focusing the response on the most relevant aspects.
- Iterative refinement: Based on the initial responses, refine your questions or provide additional context to get more detailed insights. It’s often an iterative process to zero in on the most valuable information.
- Summarization requests: For comprehensive sections of the SA, you can request summaries to capture the essence of the content. This can be a good starting point for deeper exploration.
- Cross-referencing: If your SA involves multiple sources or documents, reference them in your queries to understand how they interrelate or to compare their perspectives.
Is it effective to pass a section of a transcription to ChatGPT along with the previously optimized sections for context?
This iteration can be tedious, but passing a section of a transcription to ChatGPT, along with previously optimized sections for context, can be effective, particularly for maintaining coherence and relevance in the conversation. Here’s why this approach can work well:
- Context Preservation: ChatGPT can generate more accurate and relevant responses when it has context. By providing optimized sections of the transcription as context, you help the model understand the broader conversation or document theme, leading to more informed and appropriate responses.
- Coherence: Including previous sections helps maintain coherence across the conversation. ChatGPT can refer back to earlier points, ensuring that the discussion flows logically and consistently.
- Efficiency: Optimizing the context by selecting the most relevant sections can make the interaction more efficient. It prevents information overload and helps ChatGPT focus on the most pertinent information, improving response quality.
- Customization: This approach allows for a degree of customization in how you interact with ChatGPT. You can tailor the context you provide based on what you’ve found to be most effective for your specific use case, whether it’s a detailed analysis, brainstorming session, or any other application.
However, it’s important to keep a few things in mind:
- Context Limit: ChatGPT has a limit to the amount of context it can consider for each response (the token limit). Be mindful of this when selecting which sections to include as context.
- Relevance: Ensure that the sections you’re passing as context are directly relevant to the current section you’re inquiring about. Irrelevant context can lead to less accurate responses.
- Iteration: You might need to experiment with different approaches to optimizing context to find what works best for your specific needs.
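The idea of passing previously optimized sections as context, while staying within a limit, can be sketched as a small prompt builder. This is an illustration under stated assumptions: `build_prompt`, the character budget, and the prompt wording are hypothetical and not part of any ChatGPT API. The most recent sections are kept first, on the assumption that they are the most relevant to the section being cleaned up.

```python
def build_prompt(
    current_section: str,
    improved_sections: list[str],
    context_budget_chars: int = 8000,
) -> str:
    """Assemble a prompt from as many recent improved sections as fit, plus the new section."""
    selected: list[str] = []
    used = 0
    # Walk backwards so the most recent sections are considered first.
    for section in reversed(improved_sections):
        if used + len(section) > context_budget_chars:
            break
        selected.append(section)
        used += len(section)
    selected.reverse()  # restore the original reading order
    context = "\n\n".join(selected) if selected else "(none)"
    return (
        "Previously improved sections (context only, do not rewrite):\n"
        f"{context}\n\n"
        "Section to clean up:\n"
        f"{current_section}"
    )
```

The resulting string can be pasted into ChatGPT or Copilot by hand, which keeps the sketch independent of any particular service.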
The context limit for ChatGPT is measured in “tokens.” A token can be as short as one character or as long as one word. On average, a token is about four characters in English. The limit for the original GPT-4 model is 8,192 tokens (newer models allow more), and it covers both the prompt and the generated response. This is all the text that ChatGPT can consider at one time when generating a response, so it’s important to manage the amount of context provided and make sure the most relevant information fits within the constraint.
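The arithmetic can be sketched in a few lines. This is a rough estimate only, using the four-characters-per-token average and the 8,192-token figure quoted here. The function names and the 1000 tokens reserved for the response are assumptions for illustration, and an exact count would need the model’s own tokenizer.

```python
import math

CHARS_PER_TOKEN = 4    # rough English average; real tokenizers vary
CONTEXT_LIMIT = 8192   # GPT-4 figure quoted in the text; prompt plus response

def estimate_tokens(text: str) -> int:
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

def fits_in_context(prompt: str, reserved_for_response: int = 1000) -> bool:
    """Check whether the prompt leaves room for the response within the limit."""
    return estimate_tokens(prompt) + reserved_for_response <= CONTEXT_LIMIT
```

By this estimate, a prompt of about 28,000 characters is the most that fits once 1000 tokens are set aside for the reply.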
ChatGPT Speech-to-Speech
A simple life hack that moves beyond transcription is live speech-to-speech interaction with ChatGPT. The app transcribes the interaction as you go.
It feels like the above makes for a ripe opportunity to develop a workflow front end for the iterative process of optimising long-form text.
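As a starting point, the core loop of such a workflow might look like the sketch below. Everything here is hypothetical: `improve_document` and its `improve` callback are names invented for illustration, and the callback stands in for whatever actually does the rewriting, whether that is a call to a model or a person pasting text into a chat window.

```python
from typing import Callable

def improve_document(
    sections: list[str],
    improve: Callable[[str, list[str]], str],
    context_size: int = 2,
) -> list[str]:
    """Improve sections in order, passing the last few improved sections as context."""
    improved: list[str] = []
    for section in sections:
        # Guard against context_size == 0, where a [-0:] slice would return everything.
        context = improved[-context_size:] if context_size > 0 else []
        improved.append(improve(section, context))
    return improved
```

Keeping the rewriting step behind a callback means the same loop works with any service, and a front end would only need to supply the sections and display the results.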