What is uniquely human? AI impacts on the workforce.
By Tela Gallagher Mathias
We held our first-ever public-private sector AI exchange yesterday, and we opened with a question – what is it that makes humans uniquely human? Answers ranged from “procreation” to “cooking” to “empathy”. This question is relevant because, as a refresher, AI is a field of computer science focused on enabling computers to perform tasks that typically require human intelligence. Human intelligence is our ability to sense, understand, and create. So, the question of “what is humanness” matters more acutely now than ever, because these innately human qualities are what will be most valuable in the workforce of the future.
The Good News
In doing research about the historical impacts of disruptive technology on the workforce, I was very encouraged and posted about that earlier this week. The bottom line is that with every major disruption since the 1750s, including both industrial revolutions, significant numbers and types of new jobs were created. For example, in the first industrial revolution, water-powered spinning machines ignited the shift to factory textile production. This created new labor classes concentrated in mills and sparked the modern wage labor system.
In a recent MIT study that evaluated 80 years of Census data, researchers found that literally 60% of the jobs we have today did not exist in 1940. Not only that, but many of these jobs wouldn’t have even made sense at that time (I’m looking at you “content creator”). For context, tattooer became a job recognized by the US Census in 1950, software engineer in 1970, conference planner in 1990, and solar photovoltaic electrician in 2018.
This is really encouraging to me but does rest somewhat on hope as a strategy. Every single time technology has disrupted our lives, jobs have been created, and we as a people have survived. Therefore, it is likely we will survive this one. I also know that overall, if gross domestic product (GDP) is a measure of prosperity, we are a wildly more prosperous country than we were – from $3B in 1790 to $23T today.
Green bars represent positive GDP growth (expansion years); red bars represent negative GDP growth.
Adjusted for inflation, U.S. output is roughly 6,400 times its 1790 level, a long-run trend of just under four percent real growth per year sustained despite wars, panics, and recessions. Per-capita prosperity, meanwhile, multiplied about 35 times: real GDP per person rose from about $2,000 (in today’s dollars) in 1820 to more than $70,000 today, illustrating how sustained innovation converts into living-standard gains when paired with education and market dynamism.
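A quick back-of-the-envelope check of those multiples, using the article’s round numbers (all values are approximations):

```python
# Back-of-the-envelope check of the growth figures above, using the
# article's round numbers -- all values are approximate.

def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by start value, end value, and span."""
    return (end / start) ** (1 / years) - 1

# Real output: roughly 6,400x its 1790 level over ~235 years.
print(f"implied real GDP growth: {cagr(1, 6_400, 235):.1%}/yr")        # ~3.8%

# Per capita: ~$2,000 (today's dollars) in 1820 to ~$70,000 today.
print(f"implied per-capita growth: {cagr(2_000, 70_000, 205):.1%}/yr")  # ~1.7%
```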
What I don’t know, however, is what negative consequences this introduced for the people whose jobs became obsolete. And I don’t know what happened to the young people attempting to enter the job market just after those disruptions.
The Bad News
We will see significant impact across virtually all major economic sectors. No matter the sugar coating by the major technology companies and the Silicon Valley attitude, jobs will be eliminated. We will continue to see job loss due to robotics. We will continue to see job loss due to generative AI in creative fields. We will continue to see job loss due to the ever-accelerating pace of automation.
Education will be very slow to adapt curriculums and learning approaches, with public education trailing far behind independent schools. The elementary school kids matriculating now are especially at risk of falling behind: they had their learning radically disrupted by COVID, and they are the last kids to be born before ChatGPT. Preparing this generation will require an even more significant investment by parents, because so much of the responsibility will have to happen in the home.
I think Jensen Huang puts it best. At the Milken Institute Global Conference, Huang explained that the disruption from AI is “not simply about outright job loss through automation but about a growing divide between those who harness AI as a tool and those who do not,” highlighting the risk of inequality between the so-called “AI-skilled elite” and everyone else. Out of the global population of about eight billion, only about 30 million people are proficient in programming and advanced AI technologies. That is less than 0.4% of the global population. This small tech-fluent cohort wields disproportionate power with AI, while many others could be left behind.
So, I think the message for the current workforce is to adapt or fall behind. Many of the big tech companies at least claim they are investing significantly in reskilling and training their current workforces as reliance shifts toward machines and away from human labor.
Workforce Trends for 2025
Gartner had an eclectic, if at times silly and self-promoting, perspective on future work trends in 2025.
The expertise gap intensifies as retirements surge and technology disrupts.
There is no doubt about this one. Add to it the mass exit of federal employees and we are exacerbating this phenomenon.
Organizations redesign to prepare for technological innovation.
We have definitely done this at our company, and I have advised companies on what an AI-first, or at least AI-ready, organization looks like. Job titles are changing and new roles are being created: at my company we have “value engineers” and “evaluation specialists”. And, of course, my title has changed – I call myself the chief nerd and mad scientist, as that seems to fit what I do best.
Nudgetech experiments bridge the widening communication gap.
For me this one is silly and self-promoting on Gartner’s part. “Nudge theory” suggests that subtle, indirect suggestions or environmental changes can influence people’s choices and behaviors without restricting their freedom of choice – for example, placing healthy foods in the pantry at kids’ eye level. From an employer perspective, this is the idea that we consider when a text is better than an email, or when we should just call. Certainly, Tanya Brennan is the master at this. I find the idea that we will have technology tell us when and how to communicate offensive, and I’m not into it.
Employees embrace bots over bosses in the pursuit of fairness.
I had to look this one up. This trend reflects a growing workplace dynamic where workers increasingly trust AI systems more than human managers, especially in areas related to fairness, transparency, and objectivity. I think the alliteration is a bit cheesy, but I can see where this is coming from. My brother, Gil Gallagher, who is the middle school director at The Field School, told me about a trend toward (as well as resistance to) algorithm-based grading. I wonder if this could be fairer than humans’ subjective evaluation. We see this in my industry as well, although there is a lot of skepticism toward probabilistic, algorithm-based decisions.
Organizations must define fraud v. fair play when it comes to AI.
This one highlights the growing need for companies to establish clear, principled boundaries between acceptable AI use and deceptive or unethical behavior. Obviously, this will play out in education. What is considered cheating now will just be the way things are done in a few years, maybe sooner. I saw this myself in thinking about corporate testing. I recently had to take a test to validate my cyber awareness – is it cheating to use ChatGPT to confirm my answers to some of the questions? Is it cheating if one employee uses genAI to help them at work, but another doesn’t?
Organizations shift focus to inclusion and belonging with unexpected benefits.
This one is interesting. If we look at traditional diversity, equity, and inclusion (DEI) efforts, with their focus on the numbers, we saw many organizations dissatisfied with the results, and, of course, the major anti-DEI backlash that is now playing out. I myself have wondered about the effectiveness of our DEI program, and of efforts I have provided significant financial support for in the recovery community. What if instead we focused more on how we made people feel, and less on how many of them there were? Harder to measure, certainly, but better? I think so.
AI-first organizations will destroy productivity in their search for it.
I have seen this, and, in fact, embraced it. We put a serious premium on experimenting and failing at our company. We have a team of value engineers, and we ask them to try out all the new stuff and struggle for a while, even if this means failing a few times. We do, however, encourage asking for help. Yes, the struggle is part of the process, but so is learning from those who have gone before. I am seeing this show up in the industry as using a chainsaw to cut a tissue. Not everything needs an agent. In fact, there are many, many mortgage use cases that really shouldn’t use an agent at all. Do you need high precision? Do you need complete transparency? Do you need it to work 100% of the time on 100% of the cases? Yeah, maybe not an agent right now.
Loneliness becomes a business risk, not just a well-being challenge.
This one reframes employee isolation as more than a personal concern. It is a strategic and operational liability that can erode team performance, innovation, and retention if left unaddressed. I found a study recently indicating that about 26% of employees in 2024 reported being happy with collaboration at work, DOWN from 31% in 2021 (I can’t find it now or I would link to it; I promise it exists, and this isn’t a ChatGPT hallucination). That’s fascinating. That’s worse than during COVID. I do occasionally get lonely at work, and I have a great partnership with Tom Westerlind and Tanya Brennan, and of course my teams. But when I do get lonely and there is no one around, and no one to call, it can be very isolating. Those are the times when I think about maybe going to work somewhere else, and I own the company. (Unabashed Jensen Huang superfan, just make me an offer).
Employee activism drives adoption and norms for responsible AI.
I suppose this one is about an employee uprising over the use of genAI at work. Those companies that have still banned it (yes, they are out there) will see serious dissatisfaction from growing numbers of employees. Those employees are effectively being forced by their organizations to fall behind. In this job market, no one can afford to fall behind, and this failure to get with the genAI times is going to be a real talent drag.
Preparing for the Future
So where does this leave us? I think it leads us back to what makes us uniquely human. That which makes us uniquely human is the differentiator in the AI future, which is really just the AI now. Many of the big tech companies talk about hiring not for technical skill, but for their more human talents. Microsoft, for example, has said very recently that “Microsoft still plans to hire more software engineers than it has today, but it cares more about what makes them human and less about their technical abilities.”
And what does that mean? What are those qualities? They are, and I see this validated time and time again, the ability to lead in uncertainty, creativity, judgment in difficult situations, and the ability to connect the dots. Steve Jobs once said in a now-famous commencement speech that “you can only connect the dots looking backwards”, and I have definitely found this to be true.
We make the best decisions we can, with the information we have, relying heavily on intuition and experience. We hope they are the right ones, we hope we made them at the right time, and we pray that we made them with the right people. Then, in the fullness of time, more is revealed, and we see the dots that we connected. I don’t (yet?) see that in ChatGPT. We made a brutally hard business decision in 2024 that affected a lot of people. I made it together with my partners, and it really seemed like the right one. It fundamentally changed our company, initially for the worse and in the long term for the better. I wasn’t sure it was right at the time, and only after a lot of pain, a lot of time, and a lot of new data has it been revealed unambiguously to be the right one. Those are really scary, gut-wrenching decisions; I don’t think I would leave them to ChatGPT.
At PhoenixTeam, we live by a simple mantra: “Pivot or Die.” This bold approach drives how we tackle complex, overwhelming projects. For the mortgage industry, a directive such as HUD’s latest 251-page Mortgagee Letter (due by October) presents a massive implementation challenge – traditionally requiring months of analysis, coordination, and development. PhoenixTeam’s generative AI-powered data products turn this compliance hurdle into an opportunity, delivering results within hours instead of months.
The Challenge: Overwhelming Compliance Workloads
Regulatory changes demand rapid action. A 251-page update isn’t just lengthy – it’s dense with new rules, impacts on processes, documentation updates, and testing requirements. Compliance and business executives know that deciphering such a tome under tight deadlines strains resources and budgets. Teams must manually identify each change, assess its impact, write requirements and user stories, update test cases, and ensure nothing falls through the cracks – a monumental effort prone to error. With conventional methods, by the time you’ve parsed the entire document and assigned tasks, weeks or months have flown by and compliance risks have multiplied.
The Solution: GenAI-Powered Compliance Automation
Phoenix Burst – PhoenixTeam’s AI-driven compliance fulfillment platform – does the heavy lifting for you. It ingests complex regulatory, compliance, and policy documents (like that 251-page Mortgagee Letter) and automatically generates clear, actionable outputs in a fraction of the time. What once took an army of organizationally distributed practitioners months to accomplish, Phoenix Burst delivers within hours. The platform uses generative AI to identify each compliance change and requirement hidden in the text, producing plain-language change statements and procedure impact analyses for each one. It then transforms these into ready-to-build business requirements, user stories, acceptance criteria, and test cases – essentially an entire implementation plan at the click of a button.
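For the technically curious, the underlying pattern is easy to picture. The sketch below is emphatically not Phoenix Burst’s implementation – the model choice, prompt wording, and JSON shape are my illustrative assumptions – but it shows the “regulation in, structured artifacts out” idea at its smallest:

```python
# A minimal sketch of the "regulation in, structured artifacts out" pattern.
# NOT Phoenix Burst's implementation: model, prompt, and JSON shape are
# illustrative assumptions only.
import json
from openai import OpenAI  # assumes the OpenAI Python SDK; any LLM client works

client = OpenAI()

def extract_changes(section_text: str) -> list[dict]:
    """Ask the model to pull discrete compliance changes out of one section."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model choice
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "You are a mortgage compliance analyst."},
            {"role": "user", "content": (
                'Return a JSON object with key "changes": an array of objects, '
                'each with "change_statement" and "impacted_process", one per '
                "distinct policy change in the text below.\n\n" + section_text
            )},
        ],
    )
    return json.loads(response.choices[0].message.content)["changes"]

# Downstream, each change would fan out into requirements, user stories,
# acceptance criteria, and test cases, then land in an audit-ready
# spreadsheet for human review.
```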
Phoenix Burst doesn’t stop at documentation. It assigns each change to the appropriate team and output format for action, providing the results in a structured, audit-ready spreadsheet or other standard formats. Instead of juggling email threads and manual trackers, you get an organized, end-to-end view of the compliance update that’s ready to execute and traceable for regulators. A human is always in the loop for quality control – our platform serves up the work to your experts (or ours) for curation, helping ensure AI-generated artifacts meet your unique standards. Need extra assurance? We offer an optional review by seasoned compliance attorneys as an add-on.
The Phoenix Burst Impact is Immediate
By making compliance seamless, PhoenixTeam’s genAI solution clears the way for real innovation. Your teams reclaim time to focus on strategic work instead of drudging through paperwork. In fact, one major mortgage servicer proved the power of Phoenix Burst in a pilot project – transforming a complex regulation into plain-language requirements, user stories, and test cases within 24 hours of ingestion. What used to be a labor-intensive, error-prone slog is now a swift, accurate process. Compliance becomes a catalyst for improvement rather than a bottleneck.
Key Benefits at a Glance
Months of work in hours: Automatically decomposes complex compliance documents (e.g. hundreds of pages of new regulations) into actionable components within hours – not months.
Built-in project alignment: Assigns each change to the appropriate team and provides structured, audit-ready outputs (e.g. Excel or dashboard reports) for easy tracking and oversight.
Ready-to-build outputs: Generates complete impact assessments, requirements, user stories, acceptance criteria, and test cases for your implementation teams, saving countless hours of manual drafting.
Expert oversight on demand: Includes a human-in-the-loop for validation, with an optional add-on for curated review by PhoenixTeam’s curation specialists and attorneys to insert additional human control steps.
Beyond HUD – universal application: Not just for HUD mortgage letters. PhoenixTeam’s genAI platform adapts to any federal or state regulatory change – from Fannie Mae policy changes to state banking laws – making it a one-stop solution for compliance across the board.
The Competitive Edge
By cutting implementation timelines from months to hours, you stay ahead of regulatory deadlines and free your talent to drive innovation, not tedious manual effort. Phoenix Burst’s award-winning technology is transforming compliance from a costly barrier into a strategic advantage.
Ready to Transform Compliance?
It’s time to leave tedious compliance workflows in the past. Experience the PhoenixTeam difference: visit https://www.phoenixoutcomes.com/phoenix-burst to learn more about our generative AI-powered data products, or contact us to schedule a demo. Accelerate your compliance implementation, create audit-ready results, and give your teams the gift of time – all while confidently meeting regulatory demands. Let PhoenixTeam help you turn your compliance processes into an engine of progress.
The Medley of Misfits – Reflections from Day 2 at the AI Engineer World’s Fair
By Tela Mathias
I love being at events like this, I feel like I have met “my people”. We are this weird, eclectic, smart, funny, and super enthusiastic bunch of nerds. Just really nerdy. And I love it. We are just a medley of misfits. So many great things at Day 2 – but the two major highlights were at the beginning and the end. Simon Willison is a hilariously competent and compelling speaker, and definitely part of our medley of misfits. And closing the day with Greg Brockman was an absolute inspiration. The theme of yesterday was “the power of optimism”, but maybe that’s just because I’m an optimistic person.
Spark to System: Building the Open Agentic Web with Asha Sharma
Wish I knew his name, but the demo guy at Microsoft was ON POINT. He showed what you can do with GitHub Copilot and I have to say – wow. The intersection of spaces, Jira, agents, and agent task assignment really told a good story. Imagine that an agent is just another team member, logged in and working like you or me. Imagine that you could assign a task to an agent – README file generation was the example they used – and then the agent does the work and updates the work item. Now imagine that you need a team member to build a machine learning model; yeah, you can assign that to a coworker agent too.
This definitely made me want to make sure that we are making maximum use of the full set of Microsoft capabilities at the team level. It was not, however, enough to make me move from AWS Bedrock to Azure AI Foundry. Maybe I’ll regret this decision at some point, but I’m sticking with Bedrock for now. You’re welcome, Amazon Web Services.
State of Startups and AI 2025 with Sarah Guo
Talk about overcoming adversity, Sarah Guo is a presentation boss. None of the technology was working, AV was a hot mess, and honestly, they barely figured it out. She used the time with mastery, and I was riveted by her take.
Sarah is the founder of Conviction, an AI venture capital company. She was speaking on the state of startups in 2025 and providing practical advice. One of the things they are very interested in, and encouraged us to think about (you know how VCs love their analogies), is “Cursor for [X]”. In our case it would be “Cursor for Compliance” – that’s not where we are yet, but we will be.
One of the reasons Cursor has been so successful is because it was built by engineers, for engineers. And engineers know engineers. She put the cherry on top of what we have been hearing for the past six months: content is king. Knowing your customer, knowing your domain space, really building for what you know and the market you know – that continues to be the moat.
Domain is king. Needs no further explanation.
Show up informed. Have a product that has an opinion. Have a product that reflects what we know, what our customers know.
Requiring a prompt is a bug, not a feature. Loved this one, and it validated what we have done. The idea that a user has to prompt the system to do what they need is a bug – the system should just do what you need, and present thoughtful outputs at the appropriate times to the right people, in an excellent UX. I mean, it’s easy really.
The moat is execution. Just out execute everybody. Move fast. Continue to move fast. Get to market. (I’m here for it, sister!)
Copilots are still underrated and viable solutions. This was kind of a relief, honestly. I straddle federal, commercial mortgage, and Silicon Valley. I see so many different stages on the adoption curve, and different stages of technology delivery maturity. It is really hard to go from the AI future to the mortgage now. I struggle with what we can/should actually do with all this light speed tech, and this was a helpful sentiment.
BE IRONMAN. Think of your solution as a supercharged companion. Some things Tony Stark has to do, some things the suit does autonomously. Over time the suit does more and more and Tony does less, but also more different. Be Ironman.
I loved the idea that building the Ironman suit is the path of least frustration. Start with what you know; you can always make it better. Sarah Guo is a BOSS. Loved her.
2025 in LLMs so Far with Simon Willison
I had, sadly, never heard of Simon Willison. He was fall-out-of-your-chair funny. I love his personal eval, “produce an SVG of a pelican riding a bike”. This reminded me of Ethan Mollick and his “otter taking a plane ride” eval. So Simon was there to talk about the past year in LLMs, but there was too much, so he skinnied his scope down to the past six months.
The reason he uses the pelican riding the bike is because (a) he’s tired of the other benchmarks and has lost trust in them, and (b) it’s a great test: it requires technical prowess to produce the SVG, the pelican has anatomical structures that are fundamentally incompatible with riding a bicycle, and the bicycle seems simple but is actually a challenge for humans to illustrate due to its interesting geometry.
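For anyone who wants to run the eval at home, it is only a few lines against any model API. This is my reconstruction, not Simon’s actual harness – the model name and prompt wording are assumptions:

```python
# Simon Willison's informal pelican benchmark, roughly as anyone can run it.
# Model choice and exact prompt wording are assumptions; the idea is his.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # swap in whatever model you want to vibe-test
    messages=[{"role": "user",
               "content": "Generate an SVG of a pelican riding a bicycle."}],
)

# Save whatever the model produced and eyeball it in a browser.
with open("pelican.svg", "w") as f:
    f.write(response.choices[0].message.content)
```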
Some of the key points made clear in the past six months:
Local is good now.
Prices of good models have absolutely plummeted, which is a good thing for us. We will continue to see a crushing pace of releases of new models and model upgrades. The basic message here is that there has been so much improvement that you really do have to pay attention.
Humorous discussion of the infamous OpenAI sycophancy bug. Evidently the source prompts that were used to fix it leaked, so you can see the actual before-and-after documentation. Fascinating – that one was hilarious.
Somber noting of the Grok White Genocide horror show. Enough said there. I just can’t with Elon.
Evidently Claude 4 will “rat you out to the feds” for certain prompts and content generation. I really had no idea, but I guess it makes sense. I’m not sure how I feel about this.
Impact of AI on Consulting
This one was near and dear to my consulting roots. I had never heard of the company, but I really resonated with what they were talking about. They talk about the staffing models – traditional pyramid v. inverted pyramid (relying on junior staff to do most of the work v. relying on senior staff to do most of the work). And their hypothesis on the future for professional services is the inverted pyramid in the center, with traditional pyramids of agents at each side. This makes a lot of sense to me. Not sure I would have illustrated it this way but intuitively, it’s the right move.
I was surprised that they did not discuss voice agents more specifically; I think the opportunity there is massive. Imagine if you could interview an entire company in, like, two hours. Yeah, voice agents. I’m here for it.
Windsurf Everywhere, Doing Everything, All at Once
I’m so glad that we pivoted away from automating software development because, man, Windsurf has pretty much crushed that. This one was personal. When we started this AI journey in December of 2023, I was hell-bent on “push button, get software”. Many of our early research meetings were about this idea of automating software development. As we learned more, and really listened to our industry feedback, we realized we needed to be a lot more specific and much closer to the market – hence mortgage compliance change management, of which software development is a key part.
And that was a really good pivot. Windsurf is going to absolutely crush this space. Their vision is ridiculously bold – to be everywhere, doing everything, all at once. And I believe them when they talk about how they intend to do it. I think they will crush everyone, they certainly would have crushed my original product concept. Phew – dodged a bullet there.
Reflections from Greg Brockman, President OpenAI
Greg really gave Jensen a run for his money on being my idol and personal hero (#jensenisstillthegoat). I’m slightly embarrassed to admit I did not know him before this closing keynote. Well, I certainly won’t forget him. He was absolutely inspiring. This will be the subject of a separate article. Too short on time to do it justice.
The key theme of day one at the AI Engineer World’s Fair was evaluations (and agents, of course), which I am up to my eyeballs in as we prepare for enterprise adoption of Phoenix Burst so the timing was apropos. As a refresher, evaluations are:
The systematic assessment and measurement of the performance of LLMs and their applications.
A series of tests and metrics meticulously crafted to judge the “production-readiness” of your application.
Crucial instruments that offer deep insight into how your application interacts with user inputs and real-world data.
Robust evaluation means ensuring that your application not only adheres to technical specifications but also resonates with user expectations and proves its worth in practical scenarios.
Think of evals as the way we test genAI systems to ensure they are actually good. Many of you have heard me say this before – but every genAI demo is amazing. As long as the vibe is good, you can generally have a great demo. But the real value is in the content, and if the content is not “better” then really you don’t have much. Evals are how you measure better. I’ve written before about evals as the new SLAs, and that continues to be true. It’s not real until you understand the evals.
Beyond Benchmarking – Strategies for Evaluating LLMs in Production (Taylor Smith, AI Developer Advocate, Red Hat)
Great session with Taylor Smith. I’m already at least an amateur when it comes to evals, certainly no Eugene Yan, but I can hold my own. This 80-minute workshop included at least 45 minutes of a quasi-doomed-from-the-start hands-on activity. As usual, as soon as the Jupyter notebooks come out, I know it’s time for me to step out. But the content and presenter were great. I hadn’t thought about it this way before, but she placed benchmarking within the context of the superset of model evaluations – meaning model benchmarks are just a specialized form of evaluation.
We homed in on two major forms of evaluation – system performance and model performance, both equally important. The latter is primarily focused on content, and the former on the AI-flavored versions of traditional system performance metrics (latency, throughput, cost, scalability). She placed these within an “evaluation pyramid”.
Tela’s advice – it’s easy to get stuck in eval purgatory, going round and round and round forever and getting nowhere. Just start. It’s easy to start (and much harder to scale), but there’s a framework, and a minimal code sketch of it follows the list below. Here I am talking specifically about domain-specific content evals; these are the differentiated aspects of your application – your moat.
Vibe check – everyone starts here, this is why most genAI demos are great. You get a good vibe from the content.
Human evaluations – this is where the hard work really sets in, and you need content subject matter experts for this. This is a painstakingly precise activity to create or acquire “known good” baseline content and compare model results to the baseline. We use these results to identify bugs, prompt optimization needs, and any fundamental flaws (which, of course, you hope you won’t find).
System evaluations – once you have your human evals, you move to automating them in the system so they can be rerun anytime you make a change. This is really the new way of performing regression testing.
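Here is that progression from curated baselines to automated regression in miniature. The token-overlap scorer is a deliberate placeholder, my assumption and not anyone’s production method – real content evals typically use domain rubrics or an LLM-as-judge:

```python
# Minimal sketch: human-curated "known good" baselines become an automated
# regression suite. The token-overlap scorer is a placeholder assumption;
# real content evals usually use domain rubrics or an LLM-as-judge.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str    # the input your application receives
    baseline: str  # SME-curated "known good" output

def score(candidate: str, baseline: str) -> float:
    """Toy scorer: token overlap with the baseline. Swap in a real rubric."""
    cand, base = set(candidate.lower().split()), set(baseline.lower().split())
    return len(cand & base) / max(len(base), 1)

def run_suite(cases: list[EvalCase], generate, threshold: float = 0.8) -> None:
    """Rerun on every prompt or model change -- regression testing, the new way."""
    for case in cases:
        result = score(generate(case.prompt), case.baseline)
        status = "PASS" if result >= threshold else "FAIL"
        print(f"{status} ({result:.2f}): {case.prompt[:40]}")
```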
Trust me, you need evals. Evals are the only reliable way to get to production. And we are up to our eyeballs in them. So much gratitude for Vicki Lowe Withrow and her amazing curation team. But I digress. Two great examples of why you need evals – Stable Diffusion and the infamous Google AI glue-on-pizza incident.
Bloomberg found that “the world, according to Stable Diffusion is run by white male CEOs, women are rarely doctors, lawyers, or judges, men with dark skin commit crimes, while women with dark skin flip burgers.” Yikes. Come on guys, we can do better.
Why does this happen?
arXiv (pronounced “archive”) is where most AI papers are published first, sometimes months before they get peer reviewed and fully vetted. A paper published by researchers at Rice University (Professor Richard Baraniuk) called Self-Consuming Generative Models Go MAD pointed out that “our primary conclusion across all scenarios is that without enough fresh real data in each generation of an autophagous loop, future generative models are doomed to have their quality (precision) or diversity (recall) progressively decrease. We term this condition Model Autophagy Disorder (MAD), making analogy to mad cow disease.”
An autophagous loop (also called a self-consuming loop) is when AI models are trained on data generated by previous AI models, creating a feedback cycle where the model essentially "eats its own tail".
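A toy simulation makes the intuition concrete. This is a cartoon of the effect under my own assumptions, not a reproduction of the paper’s experiments: each “generation” fits a Gaussian to samples drawn from the previous generation’s model, with no fresh real data, and diversity (the standard deviation) tends to collapse:

```python
# Toy autophagous loop: each generation fits a Gaussian to samples drawn
# from the previous generation's model, with no fresh real data mixed in.
# A cartoon of the MAD effect, not a reproduction of the paper's experiments.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 0.0, 1.0  # the original "real data" distribution
for generation in range(1, 16):
    samples = rng.normal(mu, sigma, size=20)   # "train" on the prior model's output
    mu, sigma = samples.mean(), samples.std()  # fit the next model to it
    print(f"gen {generation:2d}: sigma = {sigma:.3f}")
# sigma tends to drift toward zero over generations; mixing fresh real data
# into each round is what prevents the collapse.
```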
Building Multimodal AI Agents from Scratch
I randomly met Papanii Okai standing in the hallway waiting for this one to start, what a small world. I mean “of all the gin joints in all the towns in all the world…”. Was great to see a fellow industry zealot out in the wild. Jupyter notebooks also made a significant appearance at this one, which is where I made my exit. The opening was a lot of primer material on agents, but we did get into patterns for creating multimodal agents, which we have done, but I hadn’t put it together that that’s what we did.
Quick refresher – there are four main components of an AI agent: perception (how it gets information), planning and reasoning, tools (external interfaces, actions), and memory. A minimal code sketch follows the list below.
Perception – the mechanism used to gather information from its environment. Think text, images, speech, a multimodal mix, and physical sensor data.
Planning and reasoning – the process of figuring out how to solve the problem and then creating tasks based on that understanding. Plans can be made without feedback, using a zero-shot or few-shot approach, or with feedback, using frameworks like ReAct (Reasoning and Acting), a prompting paradigm that combines reasoning and action-taking in language models, creating a loop.
Tools – these are typically functions, which have two types of instructions – when they should be called and the arguments for their use. Each tool needs to be defined in the tool schema.
Memory – enables the agent to remember, reason, and learn from past interactions. Memory is a complex topic; for agents it takes two forms – short-term memory (think one conversation) and long-term memory (think multiple conversations over time). It is memory that enables personalization.
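To make the four components concrete, here is a minimal skeleton in code. It is a teaching sketch under my own assumptions – no particular framework, a stubbed-out planner standing in for a real LLM call, and a deliberately tiny tool schema:

```python
# Minimal agent skeleton mapping the four components onto code. A teaching
# sketch only: the planner is a stub where a real agent would call an LLM
# in a ReAct-style thought -> action -> observation loop.

def perceive(raw_input: str) -> str:
    """Perception: just text here; could be images, speech, or sensor data."""
    return raw_input.strip()

TOOLS = {
    # Tools: each entry records when to call it and what arguments it takes.
    "lookup_rate": {
        "when": "the user asks about current mortgage rates",
        "args": ["product_type"],
        "fn": lambda product_type: f"Sample rate for {product_type}: 6.5%",
    },
}

memory: list[str] = []  # short-term memory: one conversation's history

def plan(observation: str) -> tuple[str, dict]:
    """Planning/reasoning: stand-in for the LLM's decision about what to do."""
    if "rate" in observation.lower():
        return "lookup_rate", {"product_type": "30-year fixed"}
    return "respond", {}

def step(user_input: str) -> str:
    observation = perceive(user_input)
    memory.append(observation)  # remember the interaction
    action, args = plan(observation)
    if action in TOOLS:
        return TOOLS[action]["fn"](**args)
    return "I don't have a tool for that yet."

print(step("What's the rate on a 30-year fixed?"))
```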
Multimodality refers to the ability of machine learning models to process, understand, and generate data in different forms. I didn’t realize there were different embedding models for different media, which, of course, makes sense. The really big takeaway from this session was the emerging alternatives to typical RAG approaches to parsing, chunking, and vectorizing – which many of us know is a major pain in the ass. The familiar pain points:
Context loss at chunk boundaries.
Complex element extraction pipelines.
Parallel embedding models like CLIP (OpenAI), where images and text go down separate paths and you end up with unrelated things whose vector embeddings are inappropriately near each other.
There is a new type of transformer available, VLM-based, where “screenshots are all you need” (see what they did there?). Preparing mixed-modality data for retrieval can require data transformers, vision transformers, and possibly table-to-text converters. This alternative has the document snapped one page at a time and fed into the VLM. The amazing benefits of this were espoused, but the audience was skeptical and brought up many valid questions that had so-so answers. Worth looking at for sure, but the silver bullet we all want is not yet out there.
This is actually NOT the pattern we discussed, but it was the best I could find. You basically just take a screen snap, page by page, and feed that into the VLM.
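Under my assumptions (pdf2image for page rendering, and the OpenAI SDK standing in for whichever vision model you prefer – neither was named in the session), the pattern really is just a few lines: render each page, send the screenshot, skip parsing and chunking entirely:

```python
# "Screenshots are all you need": render each page as an image and hand it
# to a vision-capable model, skipping the parse/chunk/embed pipeline.
# pdf2image (which wraps poppler) and the model choice are my assumptions,
# as is the file name.
import base64
import io

from pdf2image import convert_from_path
from openai import OpenAI

client = OpenAI()

for page_number, page in enumerate(convert_from_path("mortgagee_letter.pdf"), 1):
    buffer = io.BytesIO()
    page.save(buffer, format="PNG")  # one screenshot per page
    image_b64 = base64.b64encode(buffer.getvalue()).decode()

    response = client.chat.completions.create(
        model="gpt-4o",  # any VLM with image input works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Summarize the policy changes on this page."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    print(f"page {page_number}: {response.choices[0].message.content[:80]}")
```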
Model Maxxing with OpenAI (Ilan Bigio, Developer Experience Engineer, OpenAI)
Ilan Bigio was a great presenter. There was a lot of good content covered here, even though it wandered a bit at the end. The basic message that was reinforced up front is stick with prompt engineering/tuning as long as possible and until you really know that you might be able to do better – then consider fine tuning. Meaning, you have good command of your evals, you know you can do better, and you have exhausted what can be done with prompting. This was validating.
As a refresher for some of us, prompting is like a bunch of general-purpose tools that you can use to do a wide variety of things. Fine tuning is like a precision laser-guided table saw: you can do a smaller number of things incredibly well. Prompting has a low barrier, low(er) cost (relative to fine tuning), and is generally enough for most problems. Fine tuning incurs a higher up-front cost, takes longer to implement, and is good for specialized performance gains of a particular type.
Three types of fine tuning were covered – supervised fine tuning (SFT), direct preference optimization (DPO), and reinforcement fine tuning (RFT). With SFT, think “imitation”; DPO is “more like this and less like that”; and RFT is the epic “learn to figure it out”. I’m not fully grasping how RFT works but…wait for it… I’ll figure it out. (See what I did there?)
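To make “imitation” concrete, here is what SFT training data looks like in practice. This uses OpenAI’s chat fine-tuning JSONL shape; the mortgage example content is invented for illustration:

```python
# What "imitation" (SFT) training data looks like: show the model the
# answers you want it to imitate. This uses OpenAI's chat fine-tuning
# JSONL shape; the mortgage example content is invented.
import json

examples = [
    {"messages": [
        {"role": "system",
         "content": "You write mortgage compliance user stories."},
        {"role": "user",
         "content": "HUD now requires X in borrower disclosures."},
        {"role": "assistant",
         "content": "As a loan processor, I need the disclosure package "
                    "updated to include X so that..."},
    ]},
    # ...hundreds more curated examples; quality matters more than volume
]

with open("sft_training.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```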
I hope Ilan won't mind I borrowed his diagram.
AWS and Anthropic Networking
I had high hopes for the AWS and Anthropic event, especially because I had to effectively submit a proposal for why I should be accepted to attend this event. I figured if there was an application process, surely this would be top notch. Two highlights here, hearing directly from Anthropic about their new products and vision for the future, and, of course, finding the non-men at the event. Yes, as per usual, it’s a sea of men at these events (nothing against men), with just a light dusting of non-men. I did manage to find a few of my people.
I was not aware of what could be done with Claude Code until seeing the demo at this event, and it was powerful. It has the feel of a command line application, but I have no doubt that I will be able to use it effectively based on what I saw. I suspect it will be able to do a lot of things that the team currently does in Replit, perhaps more effectively.
The "to-do" capability promises to be next level and apparently you can interrupt the flow and alter the to-dos. Whoa.
MongoDB Networking
One of the most interesting parts of the day came at the very end. After leaving the okish AWS/Anthropic event early, I decided to head over to the host hotel for the MongoDB welcome reception. I met the very interesting Mark Myshatyn, who is literally “the AI guy” for Los Alamos Labs. I asked him, “So what exactly does Los Alamos Labs do?” and he goes, “Did you see Oppenheimer? We do that.” Whoa. He’ll be speaking at the AWS Summit in DC; if you are attending, you won’t want to miss it.
Tela’s Parting Thoughts on Day 1
It was an amazing day, and we are just getting started. The people, the content, the experience – so worth it. I do these events so I can continue to find out what’s going on and adapt it to the work I do for my mortgage clients. I intend to:
Continue work we are doing with multi-agent solutions, especially expand our use of multi-modal agents and possible alternatives to traditional RAG pipelines.
Work through our extensive eval optimization via prompt tuning and consider fine tuning if needed. I’m not convinced we’ll need it, but we might, and I won’t be afraid of it.
Explore the use of Claude Code as an adjunct to our current tooling for product development acceleration.
Continue to feel the enormous gratitude I have for the opportunity to attend events like these, especially in difficult times.