Paige Roberts

Consultant at Strigid Insight

Hamilton, United States

Paige Roberts (@RobertsPaige) has worked as an engineer, trainer, support technician, technical writer, marketer, product manager, and consultant over the last 25 years. She contributed to "97 Things Every Data Engineer Should Know" and co-wrote "Accelerate Machine Learning with a Unified Analytics Architecture" and "Up and Running with Aerospike," all published by O'Reilly Media. She's spoken at conferences like Big Data London, Strata, DBTA Data Summit, Data Connect, ODSC, and Big Data Conference Europe. She's worked for companies like Data Junction, Pervasive, Bloor Group, Hortonworks, Syncsort, Vertica, and GridGain. Now, she promotes understanding of realtime distributed data processing, high-scale data engineering architecture, and how the analytics revolution is changing the world.

Available For: Advising, Authoring, Consulting, Speaking
Travels From: Texas
Speaking Topics: AI, Analytics, Machine Learning, Data Architecture, Event Stream Processing, Data Pipelines, Data Management

Speaking Fee $5,000 (In-Person), $2,000 (Virtual)

Paige Roberts	Points
Academic	31
Author	213
Influencer	82
Speaker	85
Entrepreneur	0
Total	411

Points based upon Thinkers360 patent-pending algorithm.

Thought Leader Profile

Portfolio Mix

Strigid Insight

Consultant

https://medium.com/@paigeonthewing

Company Type: Individual

Theatre: North America, EMEA, global

Minimum Project Size: N/A

Average Hourly Rate: $200-$300

Number of Employees: N/A

Company Founded Date: 2025

Media Experience: 25 years

Areas of Expertise

5G 30.05

Agentic AI 30.22

AI 32.54

Analytics 44.35

Big Data 35.32

Climate Change 30.11

Cloud 30.69

Cybersecurity 30.27

Design Thinking 30.36

DevOps 65

Digital Transformation

Emerging Technology 30.05

IoT 30.34

IT Operations 30.70

Management 30.09

Marketing

Predictive Analytics 30.50

Risk Management 30.13

Industry Experience

Other

Publications

3 Academic Certifications

Hazelcast Platform Essentials
Credly
June 14, 2025

Basics of Hazelcast structure, purpose, and operation

See credential

See publication

Tags: AI, Analytics, Big Data

GridGain Developer Essentials: Apache Ignite
GridGain
November 26, 2024

The owner of this badge attended an Apache Ignite Essentials training session for developers and architects and learned how to increase the speed and scale of applications by leveraging Apache Ignite's essential capabilities (data partitioning, affinity co-location, and co-located processing)

See publication

Tags: AI, Analytics, Big Data

IASA CITA-Foundation
IASA Global
May 29, 2023

Earners of the IASA CITA-Foundation badge have demonstrated the ability to interact with Architects based on an awareness of the IT Architecture Body of Knowledge, with an understanding across the five pillars of Architecture.

See publication

Tags: AI, Analytics, Big Data

1 Academic Course

Women in Architecture
IASA Global
March 14, 2024

We drive the expansion of women in architecture roles towards an equal balance and enable women to thrive and contribute with impact. Through the increased and equal contributions of women in architecture, we will leverage our own diversity as the way forward to build unity and harness the full power of architecture to make a difference for our organizations, societies, and world.

See publication

Tags: Big Data

32 Article/Blogs

Streaming vs Batch, Lakehouse, and Data Architectures of the Future
Medium
August 14, 2025

My old friend Jesse Anderson has a video podcast series called Unapologetically Technical. He interviews prominent folks in the data management space, discussing their careers. Then the guest interviewee dives into a technical topic on a whiteboard. The latest guest on Unapologetically Technical was ME! And I know this is shocking to all who know me, but the technical subject I dove into was data architectures, past, present, and future.

See publication

Tags: AI, Analytics, Big Data

https://medium.com/@paigeonthewing/gen-ai-writing-theft-and-llms-eating-their-own-tails-7789a16a04d5
Medium
May 27, 2025

Not using AI to generate my content doesn't make me "anti-AI." You can't spell Paige without "AI." Generative AI and the Large Language Models (LLMs) behind it are powerful, with a lot of great uses. Still, as a writer with decades of experience and a data management industry pundit, I have no need to use generative AI to write my content for me. But the opposite isn't true. To train the next generation of more capable gen AI models, Meta apparently needs to train them on my copyrighted content.

See publication

Tags: Agentic AI, AI, Analytics

Gen AI, Writing, Theft, and LLMs Eating Their Own Tails
Medium
May 27, 2025

The other day, I posted a rant on LinkedIn about someone referring to me as “anti-AI” because I don’t use Gen AI to create my content. For the record, I am the opposite of anti-AI. I am doing my best to help people make AI successful in their organizations by feeding AI more and better data faster. I’ve made my living for years talking about better data architectures for AI. Fred Lardaro commented on that rant

See publication

Tags: AI, Analytics, Big Data

Agentic AI is the New Internet
Medium
October 17, 2024

It’s obvious that agentic AI is the next logical step in the AI evolution.

Agentic AI is the hottest new buzzword. It takes the one shot — prompt it and get a response — of generative AI, and puts that as a single step in a more complex workflow which at the end DOES something. The steps in the workflow are things like planning, revision, checking or testing, using tools to do more things like searching the net for more information, using multiple AI agents to do different parts of a job so it can be done in parallel. There are a lot of concepts in Agentic AI designed to make Gen AI more useful, more like a complete application.

See publication

Tags: Agentic AI, AI, Emerging Technology

APTs are Everywhere — Why my piddly site keeps getting Cyrillic junk comments and attacks from Singapore
Medium
September 20, 2024

Anyone with their resume online, a blog, or any kind of personal website is a target of nation-state sponsored advanced persistent threat attacks. An old friend, Eric Kavanagh, recently lamented on LinkedIn that his analytics showed multiple attacks on his data management journalism site from Singapore and Bulgaria.

See publication

Tags: AI, Analytics, Big Data

When Linear Scaling is Too Slow—Build for Fast Hardware
Medium
February 09, 2024

The main question the talk I gave at Data Day Texas 2024 sought to answer was: What strategies do cutting edge database technologies use to get eye-popping performance at petabyte scale?

All of the strategies I’ve talked about in previous posts have been spinning disk focused, mainly strategies used by Vertica, an analytical database built for commodity hardware. For many use cases, anything that takes longer than a microsecond is too slow. The strategies I discussed earlier involve optimizing spinning disk I/O, but in the end, spinning disk can only go so fast. But Aerospike needed a way to take transactional databases to the next level, and flash drives held the key. So, they built an entire database on top of this new SSD technology.

Build your software to run on the fastest available hardware.

See publication

Tags: Analytics, Big Data, Design Thinking

When Linear Scaling is Too Slow - Compress Your Data
Medium
February 08, 2024

The main question the talk I did at Data Day Texas 2024 sought to answer was: What strategies do cutting edge database technologies use to get eye-popping performance at petabyte scale?

The strategy that really takes an application past linear scaling and into reverse linear scaling is one that Vertica (or whatever OpenText is going to call it now) has mastered — aggressive data compression.

Data compression can take your application way beyond linear scaling to reverse linear scaling.

See publication

Tags: Analytics, Big Data, Design Thinking

When Linear Scaling is Too Slow - Share Nothing
Medium
February 07, 2024

The main question the talk sought to answer overall was: What strategies do cutting edge database technologies use to get eye-popping performance at petabyte scale?

The next strategy I recommended at Data Day when designing software for high performance at extreme scale is a shared nothing architecture. Both Vertica and Aerospike have it, and frankly, any distributed data processing system built since Hadoop should have this foundational strategy. They don’t all have it, but they should. Hadoop taught us that edge, master, leader, whatever you want to call it nodes were not helpful. I have been surprised over the years to find otherwise smart distributed data processing systems like Presto still making that mistake.

One lesson learned from the last 15 years was that differentiated nodes are bottlenecks that can limit scaling.

See publication

Tags: Analytics, Big Data, Design Thinking

When Linear Scaling is Too Slow: Isolate Your Workloads
Medium
February 06, 2024

I used knowledge from many years in this industry, including 5 years working at Vertica. Plus, a fair amount of knowledge gained from working on an O’Reilly book for Aerospike alongside the architect, and a lead database engineer from The TradeDesk, a company that uses both Vertica and Aerospike.

The main question the talk sought to answer was: What strategies do cutting edge database technologies use to get eye-popping performance at petabyte scale?

The first good strategy I presented was workload isolation.

See publication

Tags: Analytics, Big Data, Design Thinking

When Linear Scaling is Too Slow — The last strategy you should use.
Medium
February 05, 2024

How do you design for data processing performance at extreme scale?

Sometimes linear scaling just isn’t good enough. What strategies do cutting-edge technologies use to get eye-popping performance in petabyte scale databases?
Which scaling strategy, the one everyone thinks of first, is actually the last strategy you should use?

See publication

Tags: Analytics, Big Data, Design Thinking

What’s New in OpenText Analytics Database
OpenText
January 17, 2024

The newest version of OpenText Vertica 24.1 (representing the first quarter of 2024) is all about saving operating costs while boosting value. The star in this release is an extraordinary new capability – workload routing. It makes each job more efficient and performant, decreasing spending and energy usage for each type of work by directing it to ideal hardware automatically. Read on to learn more or request a demo of the OpenText Analytics & AI platform today.

See publication

Tags: AI, Analytics, Big Data

Busting Cloud Myths and Avoiding Cloud Pitfalls
Cloud Computing Magazine
September 22, 2023

Cloud computing is one of the greatest IT revolutions of our time, but many of the things that “everyone knows” about cloud are actually myths. Let’s explore the most common concepts surrounding cloud computing to determine which have been proven out by time, and which myths have been busted.

See publication

Tags: Analytics, Big Data, Cloud

AIOps: 4 Common Challenges and 3 Key Considerations for Using AI in IT Operations
CDO Magazine
August 29, 2023

AIOps projects hold tremendous potential in IT operations management. However, only about half of all AI projects see production. Author Paige Roberts sheds light on the critical factors that add to that difficulty along with the important considerations that can help.

See publication

Tags: AI, Analytics, IT Operations

What Does Real-time Really Mean In Data Analytics?
InsideBigData
August 25, 2023

Demystifying Real-time Data Analytics: Understanding Definitions, Categories, and Strategies for Unlocking Value in the Data-driven Era

Is an analytical response within 300 milliseconds on data generated yesterday considered realtime? In today’s fast-paced digital landscape the concept of realtime data analysis has become increasingly prevalent and essential to business success. Yet, there’s a lot of confusion about what “realtime” really means.

Understanding the definitions when discussing real-time data analysis is crucial to unlocking the potential of realtime analytics to propel business growth in this data-driven era.

One refinement I propose is the need to differentiate between end-to-end realtime data analysis and fast response from already prepared data.

See publication

Tags: Analytics, AI, IoT

Consider Performance, Growth & Budget When Buying Data Analytics
DevOps.com
May 09, 2023

So, your business needs to invest in data analytics technology to improve efficiencies, competitive advantage or business outcomes. There are a ton of things to consider, but underlying each decision are three driving factors:

Performance – The need to meet service level agreements.
Growth – The need to accommodate success, as well as deal with inevitable industry changes.
Budget – The need to do both of the above while keeping costs low enough to not erode the profit gain from analytics.
These three forces underpin nearly every technology selection decision, but what frequently isn’t noticed is how much these three drivers interact.

See publication

Tags: Analytics, Cloud, DevOps

Save the Planet with Better Software — Vertica Analytics for ESG
Medium
April 21, 2023

Whether you call it Green IT or ESG (Environmental, Social, Governance), every responsible company in every industry is striving to change the way they do things to do their part to save this planet. The compute industry has been focusing on this concern for a long time. Co-location data centers have strived for years to go from giant energy hogs to zero carbon impact.

A lot of people don’t think that software like Vertica, now a part of OpenText Analytics and AI, has any contribution to make. But they’d be wrong. I’m going to make a wild claim. And then I’m going to back it up with some facts.

If every company that does big data analytics now switched to Vertica, we could cut energy usage for analytics globally in half.

See publication

Tags: Analytics, Big Data, Climate Change

Is Cloud Repatriation a Big Lie Server Vendors Are Shilling?
Spiceworks
April 03, 2023

Cloud repatriation spotlights the need to avoid letting hype determine your analytics strategy. Paige Roberts, Open Source relations manager at Vertica, gets to the bottom of what reverse cloud migration is all about and the purpose it may serve.

See publication

Tags: Analytics, Cloud, Design Thinking

Why Can’t We Have AI-Driven Database Design?
Medium
February 21, 2023

One of the most time-consuming, expensive, expertise-intensive aspects of getting a new analytics database up and running is data modeling and overall database design. There’s this huge, months long project that involves interviewing all the folks wanting to do analytics — trying to understand their requirements, and then understand the data in relation to those requirements. How much data do they need to analyze, how fast is that data coming in, what kind of analyses do they do on it? Will they need to power a dashboard, run an application, analyze locations, deal with the avalanche of IoT data? Usage patterns? Constraints? Primary keys? Naming conventions? Indexes?

So many questions, and so much time and effort by smart people to determine how a database’s tables, indexes, etc. should be set up.

A few companies are pioneering a better way.

See publication

Tags: AI, Analytics, Big Data

Five Reasons Why In-Database Machine Learning Makes Sense
IASA Architecture and Governance Magazine
February 12, 2023

Of the few ML projects that make it to production, most take 3 months to a year. Organizations don’t receive benefits from data science until it’s in production. Does that mean adding another technology to bloated stacks? Vertica’s presentation is about getting machine learning into production faster with something nearly every company already has: a good analytics database.

See publication

Tags: Analytics, AI, Big Data

New O’Reilly Book: Accelerate Machine Learning with a Unified Analytics Architecture
Medium
February 16, 2022

Between 40 and 60% of machine learning projects fail, most at the point in the workflow between proof of concept and production. One day, it may be as easy for an organization to put an ML model into production as it is to put a new visualization in a BI report. The right data architecture design can be the key.

See publication

Tags: AI, Analytics, Big Data

What’s the Difference Between a Data Lakehouse and a Unified Analytics Platform?
Architecture and Governance Magazine
November 12, 2021

I’ve been doing a bunch of speeches at various conferences on the merging of the data warehouse and data lake into a single unified analytics platform. I inevitably get one question, “How is this different from a lakehouse?” There are two answers, a short one that’s glib and easy, and a longer one that really dives into things. Short answer, “They’re extremely similar architectural concepts.” The rest of this article is the long answer.

See publication

Tags: Analytics, Big Data, Predictive Analytics

It’s a Trap! — Cloud Financial Incentive for Badly Optimized Analytics Software
Medium
October 15, 2021

For all the years I’ve been working with data management and analytics software, there’s always been a powerful motivation to be as efficient as possible. The smarter your software is about using available computer resources — hardware, disk, memory, CPU… — the bigger your edge over the competition. The happier your customers are, the more money your company makes. The financial incentive to be more and more performant on less and less compute has always been enough to motivate endless tweaks to eke out just a little more speed, or figure out ways to do just a little bit more with the same hardware.

This benefits the customer, who constantly gets better and better software.

Then the cloud came along, and things seemed the same, for the most part. You could no longer say “hardware” to mean the storage and compute infrastructure, but I still assumed everyone in the data management and analytics software industry was in that same race, to be more and more performant on less and less compute “infrastructure.”

See publication

Tags: Analytics, Big Data, Cloud

What Do People Mean by “Cloud-Native?”
Medium
July 28, 2021

Cloud-native is an important buzz word in the data storage and analytics space these days. The way we hear folks use it to advertise their software, it sounds like it must be something wonderful, a data analytics superhighway. But it seems like the meaning shifts depending on who is saying it. It’s a big red flag to me when a phrase means whatever people want it to mean at that moment, mainly to convince you that their software is superior to other software in some nebulous, undefined way, so you’ll buy it. The next time you hear someone using cloud-native in a sentence, consider what they might actually mean.

See publication

Tags: Analytics, Big Data, Cloud

Container Boom: Should Databases Be Containerized?
Rtinsights
June 11, 2021

Several years back, the application technology industry had this concept of breaking big applications up into smaller independent components, microservices, and deploying each in its own container. The container idea has some pretty cool advantages it turns out:

See publication

Tags: Analytics, Big Data, Cloud

Why is Cloud Repatriation Happening?
https://www.rtinsights.com/
March 16, 2021

More and more organizations who went all-in on cloud early are now finding that some analytics workloads are better on-premises and are pulling those workloads back.

See publication

Tags: Analytics, Big Data, Cloud


1 Author Newsletter

Moving to real-time stream processing-some basics
LinkedIn
September 03, 2025

Welcome to Streaming Intelligence News, the newsletter designed to provide you with the latest information in streaming and graph data processing. This edition discusses streaming data processing, data pipelines, and how they fit together in an analytics architecture or real-time application architecture.

See publication

Tags: AI, Analytics, Big Data

3 Books

Aerospike Up and Running
O'Reilly
August 25, 2023

Early Release:
If you're a developer looking to build a distributed, resilient, scalable, high-performance application, you may be evaluating distributed SQL and NoSQL solutions. Perhaps you're considering the Aerospike database.

This practical book shows developers, architects, and engineers how to get the highly scalable and extremely low-latency Aerospike database up and running. You will learn how to power your globally distributed applications and take advantage of Aerospike's hybrid memory architecture with the real-time performance of in-memory plus dependable persistence. After reading this book, you'll be able to build applications that can process up to tens of millions of transactions per second for millions of concurrent users on any scale of data.

This practical guide provides:

Step-by-step instructions on installing and connecting to Aerospike
A clear explanation of the programming models available
All the advice you need to develop your Aerospike application
Coverage of issues such as administration, connectors, consistency, and security
Code examples and tutorials to get you up and running quickly
And more

See publication

Tags: AI, Analytics, Big Data

Accelerate Machine Learning with a Unified Analytics Architecture
O'Reilly
February 12, 2022

Unification of data warehouse and data lake architectures into something new - whether you call it a unified analytics architecture, a data lakehouse, or something else - is a trend that nearly every company seems to be moving toward over the last five years. This new architecture combined with in place machine learning on whole data sets is revolutionizing how data analysis at scale gets done. Read this book to learn how you can get machine learning models into production in minutes, not months.

See publication

Tags: AI, Analytics, Big Data

97 Things Every Data Engineer Should Know
O'Reilly
July 06, 2021

From the Preface
Data engineering as a distinct role is relatively new, but the responsibilities have existed for decades. Broadly speaking, a data engineer makes data available for use in analytics, machine learning, business intelligence, etc. The introduction of big data technologies, data science, distributed computing, and the cloud have all contributed to making the work of the data engineer more necessary, more complex, and (paradoxically) more possible. It is an impossible task to write a single book that encompasses everything that you will need to know to be effective as a data engineer, but there are still a number of core principles that will help you in your journey.

This book is a collection of advice from a wide range of individuals who have learned valuable lessons about working with data the hard way.

To save you the work of making their same mistakes, we have collected their advice to give you a set of building blocks that can be used to lay your own foundation for a successful career in data engineering. In these pages you will find career tips for working in data teams, engineering advice for how to think about your tools, and fundamental principles of distributed systems.

There are many paths into data engineering, and no two people will use the same set of tools, but we hope that you will find the inspiration that will guide you on your journey. So regardless of whether this is your first step on the road, or you have been walking it for years we wish you the best of luck in your adventures.

See publication

Tags: Analytics, Big Data, DevOps

1 Keynote

Strategies to Modernize Your Data & Analytics Architecture
Camp IT Education
June 30, 2020

Data warehouses were analytics workhorses for decades, but couldn’t handle modern data volumes, types, and advanced analyses like machine learning. Big Hadoop promises about the data lake didn’t pan out. Learn how successful past, current and future architectures combine strengths of data lakes and data warehouses to make something better than both.

See publication

Tags: Analytics, Big Data, Predictive Analytics

2 Media Interviews

Unapologetically Technical Paige Roberts - Independent Ep.23
YouTube
August 12, 2025

In this episode of Unapologetically Technical, Jesse interviews Paige Roberts, a 30-year veteran of the data industry who describes herself as an author, presenter, and "data nerd." Paige shares her multi-path career journey, revealing how she went from teaching deaf students and learning programming on the fly to becoming a programmer, tech writer, consultant, marketing manager, and data architect, often wearing multiple hats at once.

See publication

Tags: AI, Big Data

Faster Time-to-Value with In-Database Machine Learning
https://techhq.com/
February 03, 2022

We spoke recently to Paige Roberts, the Open Source Relations Manager at Vertica, about how organizations solve some of the problems of getting advanced analytics projects into production, reducing the time taken to have ML models start producing practical and useful results for businesses using in-database machine learning.

See publication

Tags: AI, Analytics, Predictive Analytics

17 Speaking Engagements

Architecting for Speed & Scale - Get Consistency & Real-Time Latency With a DIH
DBTA Data Summit
May 15, 2025

Systems of record (SORs) are scattered across large enterprises, each individually fit for a specific purpose. If you want to use that data to digitally transform business, you need to access all your data to drive applications and analytics. A data integration hub (DIH) isn’t another database. It’s an architectural concept that fits in between SORs and front-end applications. Necessary data is provided at real-time speed, and long-term data is reconciled across sources and persisted dependably, regardless of source format. Come to this talk to see some real-world implementations in financial, telecom, transportation, and logistics industries of a DIH. Learn the concepts, tips, tricks, and gotchas.

See publication

Tags: AI, Analytics, Big Data

Unlocking the Power of Real-Time Data and Analytics
DBTA
November 14, 2024

DBTA round table webinar with multiple presenters. Each did a short presentation, then a panel discussion on real-time data processing challenges and important tips and tricks.

See publication

Tags: Analytics, Big Data, Predictive Analytics

Streaming Graph Processing on Categorical Data Enables Real-time Risk Calculation
Women in Analytics - Data Connect 2024
July 12, 2024

Failures like that of Silicon Valley Bank in 2023 are the extreme result of not accurately calculating risk in a timely manner. Nearly every financial institution has a focus on minimizing risk, but the way we calculate that inherently requires close analysis of categorical data and relationships. Yet the majority of our algorithms only work on static, numeric data. That means persisting the data, converting it using something like one-hot encoding into numerical data that is bloated, sparse, and slow to analyze, then after analysis, often having to convert again to figure out the original categories. This is painfully slow, with the state of the art being measured in hours. If we could shift that analysis left, process the original categorical data as it streams in, without modification, that could cut mean time to insight down to seconds, and possibly save financial institutions some large dollar signs. That could also enable many other options, such as using graph NLP on flowing data, finding novel behavior, detecting anomalies such as cyber-attacks before they affect systems. The speed of an in-line data processing engine like Flink or ksqlDB combined with graph algorithms and categorical analysis is uniquely powerful. Come learn about a new open source streaming intelligence system that changes the game for risk analysis and other fast categorical data processing.

See publication

Tags: Analytics, Cybersecurity, Risk Management

Get Better Analytics by Putting Less Data in Your Database
DBTA Data Summit 2024
May 08, 2024

A recent survey showed that 67% of companies had their software budgets cut during 2023. SaaS databases are easy to use and powerful, but they put a strain on budgets. Still, no one can afford to skimp on smart data analytics. How do you get more analytics out of your SaaS data warehouse/lakehouse, without spending more money? Treat incoming data streams as a graph. Relationships and categories of data can immediately be seen and acted upon. Duplicate entities can be resolved. Key pattern signals in noisy data streams can be pinpointed and the noise that you don’t need tossed out. By putting only relevant and clean data into analytical repositories, tons of useless data never have to be stored in pay-per-use systems, vastly reducing costs. You get smarter answers on clean, pre-filtered data in real time.

See publication

Tags: Analytics, Cybersecurity, IoT

Shift Difficult Problems Left with Graph Analysis on Streaming Data
The Bloor Group - Inside Analysis
April 29, 2024

Host @eric_kavanagh will interview former Gartner Analyst Sanjeev Mohan, ExxonMobil Senior Technical Engineer H. Alexander Huskey, and Paige Roberts, Director of Product Innovation at thatDot, who will explain the value of shifting tough analysis to earlier in the process. She'll discuss how a graph analysis of flowing data can benefit your business.

Key business applications include:
• Real-time Risk analysis and Fraud Detection: Uncover intricate fraudulent activity patterns across transactions, user behaviors, and device data that traditional rules-based systems would miss.
• Cybersecurity and APT Detection: Identify anomalous patterns in network traffic, user logins and device logs to proactively prevent security breaches with no time window limitations.
• IoT Edge Smart Filtering: Monitor sensor data streams from industrial and other devices, understand which data is useful so you don’t flood downstream systems, and act immediately when problems arise.

See publication

Tags: Analytics, Cybersecurity, IoT

When linear scaling is too slow – strategies for high scale data processing
Data Day Texas
January 27, 2024

How does the TradeDesk handle 10 million ad auctions per second and generate 40 thousand reports in less than 6 hours on 15 petabytes of data? If you want to crunch all the data to train an LLM AI model, or handle real-time machine scale IoT problems for AIOps, or juggle millions of transactions per second, linear scaling is far too slow.
Is the answer a 1000-node database with a ton of memory on every node? If it was, companies like the TradeDesk would have to declare bankruptcy. Throwing more nodes or serverless executors at the problem either on cloud or on-premises is neither the only, nor even a good solution. You will rapidly hit both performance and cost limitations, providing diminishing returns.
So, how do extreme high scale databases keep up? What strategies in both open source and proprietary data processing systems leave linear scaling in the dust, without eating up corporate ROI? In this talk, you’ll learn some of the strategies that provide affordable reverse linear scaling for multiple modern databases, and which direction the future of data processing is going.

See publication

Tags: Analytics, Big Data, Design Thinking

Reduce Risk and Avoid Lock-in When Moving from On-Prem to Cloud or Hybrid
DBTA
May 10, 2023

There are a lot of advantages to moving to the cloud. But the more various organizations move to the cloud, the more we see them tripping over hidden land mines like out-of-control costs, platform lock-in that restricts future options, regulations that restrict data location, etc. How can a company shift data analysis workloads into the cloud, while minimizing their exposure to those risks?
In this session, you will learn how to:
• transition to the cloud while keeping some workloads on-prem
• move to multiple clouds without hopelessly complicating analytics
• maintain high throughput and low latency in cloud analytics
• keep platform options open for the future

See publication

Tags: AI, Analytics, Cloud

Use AIOps to Improve Uptime and Reduce Mean Time to Repair
IWCE
March 27, 2023

Hardware breaks. While a lot of industries talk about going to the cloud – aka someone else’s hardware – to solve uptime issues, this doesn’t work so well when you ARE someone’s hardware provider. Whether you’re a telecom company with towers and networks, a computer infrastructure company, an energy utility, a manufacturing company, or even a car company – anyone who provides solid metal hardware needs to make sure it keeps running.
AIOps is the smart application of advanced analytics, including machine learning, to monitor and, in many cases, fix IT-related problems before the end user is even aware of them. A good example is HPE InfoSight, which embeds analysis capability in every computer HPE sells. For them, it improves the productivity of support technicians by reducing support calls by as much as 80%, and boosts customer satisfaction by increasing uptime proportionately. A wide variety of other industries are getting similar benefits by embedding the ability to analyze IoT data.
Telecom is another industry with a constant need to monitor traffic, automatically identify overloaded areas, and reroute calls to prevent outages or dropped calls. Where should your next tower go? How can the flood of 5G data be captured, managed, and put to use, without overloading staff or technology?
In every industry, the key question is: How can I find and fix issues before customers are aware of the problem, or at least vastly reduce the mean time to repair those issues that do occur? AIOps holds the key. Harness the flood of IoT data to catch and prevent incidents before they happen, or fix them as soon as possible after they happen.
This sounds amazing as a concept, but how do companies implement this? What is involved? What gotchas hide in the details? With examples from leaders in multiple industries, this presentation will help you learn:
• What is AIOps and what can you expect from a solid implementation
• Benefits AIOps provides in various use cases: predictive maintenance, performance optimization, network bottleneck identification and remediation, MTTR reduction, etc.
• Example implementations - problems you are likely to run into, requirements to make it work, tradeoffs you need to consider, etc.

See publication

Tags: 5G, Analytics, IoT

Build analytics for performance and growth on a budget
Data Day Texas
January 28, 2023

When building or changing an enterprise analytics architecture, there are a lot of things to consider: cloud or on-prem, hybrid or multi-cloud, this cloud or that cloud, containerized or not, build tech or buy tech, use the skills in house or train new skills, and so on. Of all the considerations that go into those decisions, the main three are performance, cost, and planning for the future, including future growth in analytics demand.
In this session, get some solid data on how to build a data analytics architecture for performance and rapid growth, without breaking the budget, by focusing on what is important and looking at some examples of architectures at companies that are tackling some of the toughest analytics use cases. Learn from others’ mistakes and successes. Learn how real companies like Index Exchange, Simpli.fi, and The Trade Desk analyze data up to petabyte ranges, track millions of real-time actions, generate tens of thousands of reports a day, keep thousands of machine learning models in production and performing, and still keep budgets under control.

See publication

Tags: Analytics, Big Data, Predictive Analytics

Get Projects into Production Faster with In-DB ML
Big Data Europe
November 24, 2022

MLOps has rocketed to prominence based on one clear problem: many machine learning projects never make it into production. According to a couple of recent surveys, even among the small percentage of projects that do make it, between 30 and 60% take 3 months to a year to be put to work. Since data science is a cost center for organizations until those models are deployed, shortening, organizing, and streamlining the process from ideation to production is essential.
Data science is not simply for the broadening of human knowledge; data science teams get paid to find ways to shave costs and boost revenues. That can mean preventative maintenance that keeps machines online, churn reduction, customer experience improvements, targeted marketing that earns and keeps good customers, fraud prevention or cybersecurity that keeps assets safe and prevents loss, or AIOps that optimizes IT to get maximum hardware uptime for minimum costs.
To get those benefits, do you need to add yet another piece of technology to already bloated stacks? There may be a way for organizations to get machine learning into production faster with something nearly every company already has: a good analytics database.
Learn how to:
• Enable data science teams to use their preferred tools – Python, R, Jupyter – on multi-terabyte data sets
• Provide dozens of data types and formats at high scale to data science teams, without duplicating data pipeline efforts
• Make new machine learning projects just as straightforward as enabling BI teams to create a new dashboard
• Get machine learning projects from finished model to production money-maker in minutes, not months

See publication

Tags: AI, Big Data, Predictive Analytics

Reduce Risk and Avoid Lock-in When Moving from On-Prem to Cloud or Hybrid
Big Data and AI Toronto
October 06, 2022

There are a lot of advantages to moving to the cloud. But the more various organizations move to the cloud, the more we see them tripping over hidden land mines like out-of-control costs, platform lock-in that restricts future options, regulations that restrict data location, etc. How can a company shift data analysis workloads into the cloud, while minimizing their exposure to those risks?
In this session, you will learn how to:
• transition to the cloud while keeping some workloads on-prem
• move to multiple clouds without hopelessly complicating analytics
• maintain high throughput and low latency in cloud analytics
• keep platform options open for the future

See publication

Tags: Analytics, Big Data, Cloud

Achieving Unified Analytics
DBTA Data Summit
May 17, 2022

The data warehouse has been an analytics workhorse for decades for business intelligence teams. But unprecedented volumes and new types of data, plus the need for advanced analyses, brought on the age of the data lake. Now, many companies have a data lake for data science, a data warehouse for BI, or a mishmash of both—possibly combined with a mandate to go to the cloud. Find out how technical and spiritual unification of the two camps can have a powerful impact on the effectiveness of analytics for the business overall.

See publication

Tags: AI, Analytics, Big Data

Data Con LA 2021 - In-Database Machine Learning with Jupyter
DataCon LA
September 29, 2021

Jupyter with Python code is a productive way to prepare models, but putting machine learning models into production at scale may require re-building the entire workflow. Using the same interactive tools, but letting a distributed database do the work could get ML models into production in minutes, not months.

See publication

Tags: Analytics, AI, Big Data

Making Production Data Accessible for Data Science at Scale
Big Data London
September 22, 2021

The data warehouse has been an analytics workhorse for decades for business intelligence teams. Unprecedented volumes of data, new types of data, and the need for advanced analyses like machine learning brought on the age of the data lake. Now, many companies have a data lake for data science, a data warehouse for BI, or a mishmash of both, possibly combined with a mandate to go to the cloud. The end result can be a sprawling mess, a lot of duplicated effort, a lot of missed opportunities, a lot of projects that never made it into production, and a lot of financial investment without return. Technical and spiritual unification of the two opposed camps can make a powerful impact on the effectiveness of analytics for the business overall.

- Look at successful data architectures from companies like Philips, The Trade Desk, Climate Corporation, …
- Learn to eliminate duplication of effort between data science and BI data engineering teams
- See a variety of ways companies are getting AI and ML projects into production where they have real impact, without bogging down essential BI
- Study analytics architectures that work, why and how they work, and where they’re going from here

See publication

Tags: Analytics, Big Data, IoT

Python + MPP Database = Large Scale AI/ML Projects in Production Faster
ODSC East
April 28, 2021

Getting Python data science work into large scale production at companies like Uber, Twitter or Etsy requires a whole new level of data engineering. Economies of scale, concurrency, data manipulation and performance are the bread and butter of MPP analytics databases. Learn how to take advantage of MPP scalability and performance to get your Python work into production where it can make an impact.

See publication

Tags: AI, Big Data, Predictive Analytics

Unifying Analytics - Production Analytics Architecture Evolution
Big Data Virtual Masterclass
July 22, 2020

The data warehouse has been an analytics workhorse for decades. Unprecedented volumes of data, new types of data, and the need for advanced analyses like machine learning brought on the age of the data lake. But Hadoop by itself doesn’t really live up to the hype. Now, many companies have a data lake, a data warehouse, or a mishmash of both, possibly combined with a mandate to go to the cloud. The end result can be a sprawling mess, a lot of duplicated effort, a lot of missed opportunities, a lot of projects that never made it into production, and a lot of financial investment without return.
Technical and spiritual unification of the two opposed camps can make a powerful impact on the effectiveness of analytics for the business overall.
Over time, different organizations with massive IoT workloads have found practical ways to bridge the artificial gap between these two data management strategies. Look under the hood at how companies have gotten IoT ML projects working, and how their data architectures have changed over time. Learn about new architectures that successfully supply the needs of both business analysts and data scientists. Get a peek at the future. In this area, no one likes surprises.
- Look at successful data architectures from companies like Philips, Anritsu, Uber, …
- Learn to eliminate duplication of effort between data science and BI data engineering teams
- Avoid some of the traps that have caused so many big data analytics implementations to fail
- Get AI and ML projects into production where they have real impact, without bogging down essential BI
- Study analytics architectures that work, why and how they work, and where they’re going from here

See publication

Tags: Analytics, Big Data, IoT

Unifying Analytics: Architecting Production IOT Analytics
Pulsar Summit
January 24, 2020

Analyzing Internet of Things data has broad applications in a variety of industries from smart buildings to smart farming, from network optimization for telecoms to preventative maintenance on expensive medical machines or factory robots. When you look at technology and data engineering choices, even in companies with wildly different use cases and requirements, you see something surprising: Successful production IoT architectures show a remarkable number of similarities.
Join us as we drill into the data architectures in a selection of companies like Philips, Anritsu, and Optimal+. Each company, regardless of industry or use case, has one thing in common: highly successful IoT analytics programs in large scale enterprise production deployments.
By comparing the architectures of these companies, you’ll see the commonalities, and gain a deep understanding of why certain architectural choices make sense in a variety of IoT applications.

See publication

Tags: Analytics, Big Data, IoT

1 Video

Unapologetically Technical
YouTube
August 13, 2025

My old friend Jesse Anderson has a video podcast series called Unapologetically Technical. He interviews prominent folks in the data management space, discussing their careers. Then the guest interviewee dives into a technical topic on a whiteboard. The latest guest on Unapologetically Technical was ME! And I know this is shocking to all who know me, but the technical subject I dove into was data architectures, past, present, and future.

See publication

Tags: Analytics, Big Data, IoT

7 Webinars

InsightJam - Where Will Agentic AI Advance the Most in the Next 1-3 Years?
Solutions Review
March 31, 2025

Will agentic AI replace traditional enterprise applications?
What role will humans play in future AI agent-driven systems?

Our industry experts explore the near-future evolution of agentic AI, emphasizing its shift from handling tasks to reshaping entire applications and workflows. The panel discusses how AI will create personalized interfaces that respond to natural language rather than requiring complex button sequences. They highlight AI's emerging role as an orchestration layer working alongside existing systems while enabling digital twins that handle routine interactions on our behalf.

See publication

Tags: Agentic AI, AI, Analytics

The Race to Unified Analytics: Next-Gen Data Platforms and Architectures
DBTA
July 13, 2023

Despite ongoing investment in cloud platforms and analytics tools, many companies are still struggling with truly unlocking the business value of all their data. While the ways that users want and need to work with data continue to evolve, the increasing complexity of data and analytics systems and proliferation of data silos often pose significant hurdles. This is especially true when data is managed across a patchwork of legacy systems and new technologies adopted tactically without full consideration of broader data management imperatives and future needs. Ultimately, organizations want to empower their users with fast, easy access to actionable, reliable information and insights.

See publication

Tags: AI, Analytics, Big Data

Unlock the Value in Data: Rise of Hybrid Cloud, Multi-Cloud Platforms
Vertica
April 21, 2022

Being limited to analyzing data on-premises is a known problem. But analytics limited to cloud, or just a single cloud vendor, can also reduce the return on your data investment. To unlock the value of data, companies must embrace the reality of a hybrid world.

In this webcast, we’ll dive into solid Eckerson Group research on how companies across industries are getting their arms around data in multiple clouds and on-prem systems. ThinkData Works is an example of a successful technology company at the center of this important trend. We invite you to learn how ThinkData Works is helping customers pull in new sources and manage external data at scale to reduce risk, boost efficiency, and drive innovation.

See publication

Tags: Analytics, Big Data, Cloud

Cloud Without Compromises: Crucial Analytical Data Platform Requirements
Vertica
March 15, 2022

Most organizations are moving their analytical data platforms, whether based on data warehouses, data lakes, or both, into the cloud. But how do you choose the right platform to fit your organizational realities, your technology strategy and direction, and important product requirements? What are the compromises in choosing a platform that is only available as a cloud service or only available in one cloud? And what are the capabilities you should look for beyond support for business intelligence and analytics, particularly when it comes to supporting machine learning and data science?

Join Doug Henschen, VP and principal analyst at Constellation Research, and author of “What to Consider When Choosing a Cloud-Centric Analytical Data Platform,” for this informative web event on March 10 at 8 am PT/11 am ET. He’ll be joined by Paige Roberts, Open Source Relations Manager at Vertica, and by Bert Corderman, Senior Manager of Engineering at The Trade Desk.

See publication

Tags: Analytics, Big Data, Cloud

Find the Balance Between MPP Databases and Spark for Analytical Processing
Vertica
August 25, 2021

Both Apache Spark and massively parallel processing (MPP) databases are designed for the demands of analytical workloads. Each has strengths related to the full data science workflow, from consolidating data from many silos to deploying and managing machine learning models. Understanding the power of each technology, and the cost and performance trade-offs between them, can help you optimize your analytics architecture to get the best of both. Learn when using Spark accelerates data processing, and when it spreads far beyond what you want to maintain. Learn when an MPP database can provide blazing fast analytics, and when it can fail to meet your needs. Most of all, learn how these two powerful technologies can combine to create a perfect balance of power, cost, and performance.

See publication

Tags: Analytics, Big Data, Predictive Analytics

Thought Leadership: Modernize Data Warehousing – Beyond Performance
Vertica
March 15, 2021

Configuration, management, tuning and other tasks can take away from valuable time spent on business analytics. If a platform leads to coding workarounds, non-intuitive implementations and other problems, it can make a big impact on long-term resource usage and cost. A lot of enterprise analytics platform evaluations focus on query price-performance to the exclusion of other features that can have a huge impact on business value, and can cause major headaches if you don’t take them into consideration.

In this webinar, we’ll go beyond price-performance, and focus on everything else needed to modernize your data warehouse.

See publication

Tags: Analytics, Big Data, Predictive Analytics

Natural Language Processing Augmented Analytics
Vertica
November 17, 2020

The goal of data analytics, whether business intelligence or advanced analytics like machine learning, has always been to guide organizations with solid data, rather than feelings. While every company strives to be data-driven, this requires making analytics accessible to more people. What could be more accessible than asking your data a question in your own language? Tune in to learn about natural language processing, the challenges and benefits of this exciting technology, and how it can democratize data analytics and bring business results to the next level.

See publication

Tags: AI, Analytics, Predictive Analytics

Thinkers360 Credentials

10 Badges

Radar

1 Technology

Agentic AI coding - Specifications as Code

Date: June 21, 2025

As agentic AI workflows become more capable of writing code, testing it, tweaking it, and producing nearly complete applications from specifications, software engineering will be radically accelerated and, to some extent, democratized. The main risk is that a lot of organizations will fire engineers, thinking AI can now replace them. This is a seriously bad idea. In the end, they'll either hire them back at higher salaries or see a major drop in code quality. We're already seeing that rehiring happen. Eventually, this specs-as-code technique could make good software something that many more people can create or modify. But the AI itself IS code. Good developers will be needed to make it work, and agentic workflows will become an extremely valuable area of expertise.

See Radar

Blog

Opportunities

1 SEO & Content Marketing

Content Targeting, Management, and Creation for Data Management, Analytics, and AI

Location: Middle of Nowhere, TX Fees: $200 - $300/hour, by the word, or f

Service Type: Service Offered

Do you need white papers, web content, or someone to manage, edit, and contribute to your company blog? I have over 25 years of experience in the data management, analytics, and AI space. My specialties are matching your software's strengths to the market's needs, and creating content that is perfectly targeted for the audience you want to reach.

