Thinkers360

Paige Roberts

Open Source Relations Manager at Vertica

Hamilton, United States

6429 Followers

Paige Roberts (@RobertsPaige) has worked as an engineer, trainer, support technician, technical writer, marketer, product manager, and consultant over the last 24 years. She contributed to "97 Things Every Data Engineer Should Know" and co-wrote "Accelerate Machine Learning with a Unified Analytics Architecture - Get ML Models Into Production in Minutes, Not Months", both published by O'Reilly Media. She has worked for companies like Data Junction, Pervasive, Bloor Group, Hortonworks, Syncsort, and Vertica. Now, she promotes understanding of Vertica, distributed data processing, open source, high-scale data engineering architecture, and how the analytics revolution is changing the world.

Available For: Speaking
Travels From: Texas
Speaking Topics: Analytics, Machine Learning, Data Architecture

Paige Roberts Points
Academic 0
Author 131
Influencer 79
Speaker 43
Entrepreneur 0
Total 253

Points based upon Thinkers360 patent-pending algorithm.

Thought Leader Profile

Company Information

Company Type: Company

Areas of Expertise

AI 31.02
Analytics 44.47
Big Data 46.23
Cloud 31.30
Digital Transformation
IoT 31.08
Marketing
Predictive Analytics 30.40
DevOps 76.67

Industry Experience

Other

Publications

11 Article/Blogs
New O’Reilly Book: Accelerate Machine Learning with a Unified Analytics Architecture
Medium
February 16, 2022
Between 40 and 60% of machine learning projects fail, most at the point in the workflow between proof of concept and production. One day, it may be as easy for an organization to put an ML model into production as it is to put a new visualization in a BI report. The right data architecture design can be the key.

See publication

Tags: AI, Analytics, Big Data

What’s the Difference Between a Data Lakehouse and a Unified Analytics Platform?
Architecture and Governance Magazine
November 12, 2021
I’ve been doing a bunch of speeches at various conferences on the merging of the data warehouse and data lake into a single unified analytics platform. I inevitably get one question, “How is this different from a lakehouse?” There are two answers, a short one that’s glib and easy, and a longer one that really dives into things. Short answer, “They’re extremely similar architectural concepts.” The rest of this article is the long answer.

See publication

Tags: Analytics, Big Data, Predictive Analytics

It’s a Trap! — Cloud Financial Incentive for Badly Optimized Analytics Software
Medium
October 15, 2021
For all the years I’ve been working with data management and analytics software, there’s always been a powerful motivation to be as efficient as possible. The smarter your software is about using available computer resources — hardware, disk, memory, CPU… — the bigger your edge over the competition. The happier your customers are, the more money your company makes. The financial incentive to be more and more performant on less and less compute has always been enough to motivate endless tweaks to eke out just a little more speed, or figure out ways to do just a little bit more with the same hardware.

This benefits the customer, who constantly gets better and better software.

Then the cloud came along, and things seemed the same, for the most part. You could no longer say “hardware” to mean the storage and compute infrastructure, but I still assumed everyone in the data management and analytics software industry was in that same race, to be more and more performant on less and less compute “infrastructure.”

See publication

Tags: Analytics, Big Data, Cloud

What Do People Mean by “Cloud-Native?”
Medium
July 28, 2021
Cloud-native is an important buzzword in the data storage and analytics space these days. The way we hear folks use it to advertise their software, it sounds like it must be something wonderful, a data analytics superhighway. But it seems like the meaning shifts depending on who is saying it. It’s a big red flag to me when a phrase means whatever people want it to mean at that moment, mainly to convince you that their software is superior to other software in some nebulous, undefined way, so you’ll buy it. The next time you hear someone using cloud-native in a sentence, consider what they might actually mean.

See publication

Tags: Analytics, Big Data, Cloud

Container Boom: Should Databases Be Containerized?
RTInsights
June 11, 2021
Several years back, the application technology industry had this concept of breaking big applications up into smaller independent components, microservices, and deploying each in its own container. The container idea has some pretty cool advantages, it turns out:

See publication

Tags: Analytics, Big Data, Cloud

Why is Cloud Repatriation Happening?
https://www.rtinsights.com/
March 16, 2021
More and more organizations who went all-in on cloud early are now finding that some analytics workloads are better on-premises and are pulling those workloads back.

See publication

Tags: Analytics, Big Data, Cloud

Natural Language Processing Augmented Analytics
https://www.vertica.com/blog/
February 03, 2021
It’s Like Your Data Saying, “Ask Me Anything”

Analytics only makes an impact when it’s put to work to do a job automatically, or more often, help people do their jobs. The more people who can use analytics, the more valuable it becomes. And nearly every role could benefit from answers their company’s data could provide. What stops analytics from becoming part of everyone’s daily routine? It isn’t a slacking data engineering team, or an imperfect data architecture, it’s the interface. If I need to know something for my job, instead of learning complex SQL queries, or interpreting a bunch of graphs, why can’t I just ask?

See publication

Tags: AI, Analytics, Big Data

Deliver Analytics Like Amazon Delivers Packages
https://www.rtinsights.com/
August 31, 2020
Instead of focusing on where the data lives, focus on making the analytics experience as smooth as possible for everyone in your organization.

See publication

Tags: Analytics, Cloud, Predictive Analytics

Evolution of the Modern Data Warehouse
Medium
July 24, 2020
There are a lot of definitions of the data warehouse. I grabbed a random definition off the web. It fits the general understanding in the data management industry of what a data warehouse is, and what it isn’t.

It’s also wrong.

“Data warehousing is a technology that aggregates structured data from one or more sources so that it can be compared and analyzed for greater business intelligence.”

If you’re looking at that definition and thinking, “That looks right to me,” then read on. Once upon a time, I probably would have agreed with this definition as well. But times have changed.

See publication

Tags: Analytics, Cloud, Predictive Analytics

Can Presto SQL on Hadoop Replace Your Data Warehouse?
http://bigdatapage.com/
July 06, 2020
Presto is the best of the SQL on Hadoop open source bunch. Why not just use it and ditch your analytical database? Uber knows why …

See publication

Tags: Analytics, Big Data, Predictive Analytics

What is the Best Hadoop Alternative
Medium
April 28, 2020
Apache Hadoop took the world by storm and looked like it was going to own the data analytics and data management industries for a while there. But now, the hype machine, and the weaknesses of Hadoop — complexity, lack of security and governance, slow performance, poor concurrency, etc. — have everyone looking for a good Hadoop alternative.

Let’s look at some of the options that are being touted for doing Hadoop data analytics, and their pros and cons as Hadoop alternatives.

See publication

Tags: Analytics, Big Data, Predictive Analytics

2 Books
Accelerate Machine Learning with a Unified Analytics Architecture
O'Reilly
February 12, 2022
Unification of data warehouse and data lake architectures into something new, whether you call it a unified analytics architecture, a data lakehouse, or something else, is a trend that nearly every company seems to have been moving toward over the last five years. This new architecture, combined with in-place machine learning on whole data sets, is revolutionizing how data analysis at scale gets done. Read this book to learn how you can get machine learning models into production in minutes, not months.

See publication

Tags: AI, Analytics, Big Data

97 Things Every Data Engineer Should Know
O'Reilly
July 06, 2021
From the Preface
Data engineering as a distinct role is relatively new, but the responsibilities have existed for decades. Broadly speaking, a data engineer makes data available for use in analytics, machine learning, business intelligence, etc. The introduction of big data technologies, data science, distributed computing, and the cloud have all contributed to making the work of the data engineer more necessary, more complex, and (paradoxically) more possible. It is an impossible task to write a single book that encompasses everything that you will need to know to be effective as a data engineer, but there are still a number of core principles that will help you in your journey.

This book is a collection of advice from a wide range of individuals who have learned valuable lessons about working with data the hard way.

To save you the work of making their same mistakes, we have collected their advice to give you a set of building blocks that can be used to lay your own foundation for a successful career in data engineering. In these pages you will find career tips for working in data teams, engineering advice for how to think about your tools, and fundamental principles of distributed systems.

There are many paths into data engineering, and no two people will use the same set of tools, but we hope that you will find the inspiration that will guide you on your journey. So regardless of whether this is your first step on the road, or you have been walking it for years we wish you the best of luck in your adventures.

See publication

Tags: Analytics, Big Data, DevOps

1 Keynote
Strategies to Modernize Your Data & Analytics Architecture
Camp IT Education
June 30, 2020
Data warehouses were analytics workhorses for decades, but couldn’t handle modern data volumes, types, and advanced analyses like machine learning. Big Hadoop promises about the data lake didn’t pan out. Learn how successful past, current and future architectures combine strengths of data lakes and data warehouses to make something better than both.

See publication

Tags: Analytics, Big Data, Predictive Analytics

1 Media Interview
Faster Time-to-Value with In-Database Machine Learning
https://techhq.com/
February 03, 2022
We spoke recently to Paige Roberts, the Open Source Relations Manager at Vertica, about how organizations solve some of the problems of getting advanced analytics projects into production, reducing the time taken to have ML models start producing practical and useful results for businesses using in-database machine learning.

See publication

Tags: AI, Analytics, Predictive Analytics

6 Speaking Engagements
Achieving Unified Analytics
DBTA Data Summit
May 17, 2022
The data warehouse has been an analytics workhorse for decades for business intelligence teams. But unprecedented volumes and new types of data, plus the need for advanced analyses, brought on the age of the data lake. Now, many companies have a data lake for data science, a data warehouse for BI, or a mishmash of both—possibly combined with a mandate to go to the cloud. Find out how technical and spiritual unification of the two camps can have a powerful impact on the effectiveness of analytics for the business overall.

See publication

Tags: AI, Analytics, Big Data

Data Con LA 2021 - In-Database Machine Learning with Jupyter
DataCon LA
September 29, 2021
Jupyter with Python code is a productive way to prepare models, but putting machine learning models into production at scale may require re-building the entire workflow. Using the same interactive tools, but letting a distributed database do the work could get ML models into production in minutes, not months.

See publication

Tags: Analytics, AI, Big Data

Making Production Data Accessible for Data Science at Scale
Big Data London
September 22, 2021
The data warehouse has been an analytics workhorse for decades for business intelligence teams. Unprecedented volumes of data, new types of data, and the need for advanced analyses like machine learning brought on the age of the data lake. Now, many companies have a data lake for data science, a data warehouse for BI, or a mishmash of both, possibly combined with a mandate to go to the cloud. The end result can be a sprawling mess, a lot of duplicated effort, a lot of missed opportunities, a lot of projects that never made it into production, and a lot of financial investment without return. Technical and spiritual unification of the two opposed camps can make a powerful impact on the effectiveness of analytics for the business overall.

- Look at successful data architectures from companies like Philips, The TradeDesk, Climate Corporation, …
- Learn to eliminate duplication of effort between data science and BI data engineering teams
- See a variety of ways companies are getting AI and ML projects into production where they have real impact, without bogging down essential BI
- Study analytics architectures that work, why and how they work, and where they’re going from here

See publication

Tags: Analytics, Big Data, IoT

Python + MPP Database = Large Scale AI/ML Projects in Production Faster
ODSC East
April 28, 2021
Getting Python data science work into large scale production at companies like Uber, Twitter or Etsy requires a whole new level of data engineering. Economies of scale, concurrency, data manipulation and performance are the bread and butter of MPP analytics databases. Learn how to take advantage of MPP scalability and performance to get your Python work into production where it can make an impact.

See publication

Tags: AI, Big Data, Predictive Analytics

Unifying Analytics - Production Analytics Architecture Evolution
Big Data Virtual Masterclass
July 22, 2020
The data warehouse has been an analytics workhorse for decades. Unprecedented volumes of data, new types of data, and the need for advanced analyses like machine learning brought on the age of the data lake. But Hadoop by itself doesn’t really live up to the hype. Now, many companies have a data lake, a data warehouse, or a mishmash of both, possibly combined with a mandate to go to the cloud. The end result can be a sprawling mess, a lot of duplicated effort, a lot of missed opportunities, a lot of projects that never made it into production, and a lot of financial investment without return.
Technical and spiritual unification of the two opposed camps can make a powerful impact on the effectiveness of analytics for the business overall.
Over time, different organizations with massive IoT workloads have found practical ways to bridge the artificial gap between these two data management strategies. Look under the hood at how companies have gotten IoT ML projects working, and how their data architectures have changed over time. Learn about new architectures that successfully supply the needs of both business analysts and data scientists. Get a peek at the future. In this area, no one likes surprises.
- Look at successful data architectures from companies like Philips, Anritsu, Uber, …
- Learn to eliminate duplication of effort between data science and BI data engineering teams
- Avoid some of the traps that have caused so many big data analytics implementations to fail
- Get AI and ML projects into production where they have real impact, without bogging down essential BI
- Study analytics architectures that work, why and how they work, and where they’re going from here

See publication

Tags: Analytics, Big Data, IoT

Unifying Analytics: Architecting Production IOT Analytics
Pulsar Summit
January 24, 2020
Analyzing Internet of Things data has broad applications in a variety of industries from smart buildings to smart farming, from network optimization for telecoms to preventative maintenance on expensive medical machines or factory robots. When you look at technology and data engineering choices, even in companies with wildly different use cases and requirements, you see something surprising: Successful production IoT architectures show a remarkable number of similarities.
Join us as we drill into the data architectures in a selection of companies like Philips, Anritsu, and Optimal+. Each company, regardless of industry or use case, has one thing in common: highly successful IoT analytics programs in large scale enterprise production deployments.
By comparing the architectures of these companies, you’ll see the commonalities, and gain a deep understanding of why certain architectural choices make sense in a variety of IoT applications.

See publication

Tags: Analytics, Big Data, IoT

5 Webinars
Unlock the Value in Data: Rise of Hybrid Cloud, Multi-Cloud Platforms
Vertica
April 21, 2022
Being limited to analyzing data on-premises is a known problem. But analytics limited to cloud, or just a single cloud vendor, can also reduce the return on your data investment. To unlock the value of data, companies must embrace the reality of a hybrid world.

In this webcast, we’ll dive into solid Eckerson Group research on how companies across industries are getting their arms around data in multiple clouds and on-prem systems. ThinkData Works is an example of a successful technology company at the center of this important trend. We invite you to learn how ThinkData Works is helping customers pull in new sources and manage external data at scale to reduce risk, boost efficiency, and drive innovation.

See publication

Tags: Analytics, Big Data, Cloud

Cloud Without Compromises: Crucial Analytical Data Platform Requirements
Vertica
March 15, 2022
Most organizations are moving their analytical data platforms, whether based on data warehouses, data lakes, or both, into the cloud. But how do you choose the right platform to fit your organizational realities, your technology strategy and direction, and important product requirements? What are the compromises in choosing a platform that is only available as a cloud service or only available in one cloud? And what are the capabilities you should look for beyond support for business intelligence and analytics, particularly when it comes to supporting machine learning and data science?

Join Doug Henschen, VP and principal analyst at Constellation Research, and author of “What to Consider When Choosing a Cloud-Centric Analytical Data Platform,” for this informative web event on March 10 at 8 am PT/11 am ET. He’ll be joined by Paige Roberts, Open Source Relations Manager at Vertica, and by Bert Corderman, Senior Manager of Engineering at The Trade Desk.

See publication

Tags: Analytics, Big Data, Cloud

Find the Balance Between MPP Databases and Spark for Analytical Processing
Vertica
August 25, 2021
Both Apache Spark and massively parallel processing (MPP) databases are designed for the demands of analytical workloads. Each has strengths related to the full data science workflow, from consolidating data from many silos, to deploying and managing machine learning models. Understanding the power of each technology, and the cost and performance trade-offs between them, can help you optimize your analytics architecture to get the best of both. Learn when using Spark accelerates data processing, and when it spreads far beyond what you want to maintain. Learn when an MPP database can provide blazing fast analytics, and when it can fail to meet your needs. Most of all, learn how these two powerful technologies can combine to create a perfect balance of power, cost, and performance.

See publication

Tags: Analytics, Big Data, Predictive Analytics

Thought Leadership: Modernize Data Warehousing – Beyond Performance
Vertica
March 15, 2021
Configuration, management, tuning and other tasks can take away from valuable time spent on business analytics. If a platform leads to coding workarounds, non-intuitive implementations and other problems, it can make a big impact on long-term resource usage and cost. A lot of enterprise analytics platform evaluations focus on query price-performance to the exclusion of other features that can have a huge impact on business value, and can cause major headaches if you don’t take them into consideration.

In this webinar, we’ll go beyond price-performance, and focus on everything else needed to modernize your data warehouse.

See publication

Tags: Analytics, Big Data, Predictive Analytics

Natural Language Processing Augmented Analytics
Vertica
November 17, 2020
The goal of data analytics, whether business intelligence or advanced analytics like machine learning, has always been to guide organizations with solid data, rather than feelings. While every company strives to be data-driven, this requires making analytics accessible to more people. What could be more accessible than asking your data a question in your own language? Tune in to learn about natural language processing, the challenges and benefits of this exciting technology, and how it can democratize data analytics, and bring business results to the next level.

See publication

Tags: AI, Analytics, Predictive Analytics
