Navigating the Challenges of Mission-Critical AI With DDN

When NASA needed to land a probe on Mars, the landing sequence was the riskiest part of the multi-year, multi-billion-dollar mission-critical program. Prior to launch, NASA simulated the process under all potential weather and environmental conditions. Leaving nothing to chance, they used DDN storage solutions to fuel the artificial intelligence (AI) and advanced machine learning applications. After the successful landing on Mars, DDN was further entrusted with the data that made the 91-million-mile journey back to Earth for analysis and other simulations.

Whether for ensuring the success of missions on Mars, investigating the impacts of coastal erosion, or other, more terrestrial missions, DDN’s AI-optimized storage solution is at the heart of many critical Federal programs.

MeriTalk recently sat down with Rob Genkinger, vice president of program and strategy at DDN, for an in-depth discussion on the importance of mission-critical storage infrastructure decisions in successfully deploying and growing AI projects.

MeriTalk: What are the biggest drivers for AI in the Federal government today?

Rob Genkinger: Federal agencies are pushing to adopt AI technology for lots of different reasons.

Agencies are realizing they can achieve and enhance their missions by using a data-centric approach to tackle ever more complex problems. Adopting AI techniques allows them to accelerate analysis and develop more accurate, data-driven insights to support strategic decision-making. With commercial organizations driving AI innovation, Federal agencies can and should leverage those technologies and bring strategic investment to ensure America’s AI leadership position.

Federal AI programs are supported by the recent bipartisan appropriations bill, which includes a further $1 billion in funding to help pay for investments in AI and machine learning. DDN is partnering with Federal agencies on defense, intelligence, energy, public health, and environmental programs to ensure that they can meet their data security and data management goals for the modernization of high-performance computing, physics research, and AI initiatives.

MeriTalk: What are the foundational elements of an AI-ready infrastructure?

Genkinger: The three foundational elements for a successful AI project are infrastructure (compute, networks, software, and storage), data, and people. You need all three, and they all need to be ready to do their job in order to have a successful AI project.

We are seeing the technologies and the processes maturing side by side. Early AI programs were built as monolithic, specialized systems, but increasingly, agencies need to build systems that can become more agile and evolve and expand to support more projects.

At DDN, we believe that a fully integrated AI platform, with full orchestration of compute, network, and storage, is the best way to support these evolving disciplines, to provide the tools for continuous delivery and optimization of AI services, and to allow users to innovate and focus on mission-critical outcomes.

MeriTalk: What should agencies that are starting to modernize legacy technology keep in mind as they work through their AI roadmap?

Genkinger: There are a few primary considerations. A common statistic is that nine out of 10 AI projects fail to reach production for one reason or another. We find that it is essential to set clear mission goals and objectives for AI programs.

We also find that data can overwhelm AI systems when they scale into production, so we recommend that agencies adopt a data-first strategy to plan for how they will collect, process, and archive that data at scale, ensuring AI programs can accelerate and achieve their objectives.

A common question is whether to host in the cloud or in an on-premises facility. Certain workloads will thrive in cloud-based systems. In other cases, the amount of data, and data throughput, may overwhelm the interconnect resources available in the cloud, and in those cases, an on-premises platform is going to be more effective in achieving mission goals. Performance and latency can be additional concerns when operating in cloud-based environments. Certain data-intensive workloads will demand an on-premises solution.

Agencies should draw on validated reference architectures to simplify integration and leverage experience and expertise from their technology partners. By collaborating and sharing experiences and best practices, agencies can build an AI Center of Excellence as a focus for expertise and resources.

By building an AI practice, agencies can learn and share with other experts who have seen projects from concept to architecture to execution and focus on data-engineering aspects of the mission to optimize data collection, processing, and governance across the end-to-end data lifecycle.

MeriTalk: What is the data lifecycle in an AI environment, and how does infrastructure affect the lifecycle?

Genkinger: The AI data lifecycle starts with data acquisition. There’s a ton of data out there, some usable, some not. It’s important to pick data that matters to give yourself a reasonable shot at solving your problem.

Next, you’re preparing the data. Once you’ve pulled the data, you’ve got to label it, clean it, and transform it. Then you train and score the model, and apply it to live data to develop insights and recommendations and deliver outcomes.

It’s a continuous improvement process and requires a lot of volume and throughput, so the data must be running nonstop. The final step is storing the data for review, archive, and analysis. Properly designed infrastructure will not only support these operational aspects, but also allow for collaboration, cooperation, and continuity. With DDN, customers minimize data movement between each of these stages – which can be very costly in both time and money – by deploying a central data repository.

MeriTalk: What common missteps do you see in AI planning, and how can they be overcome?

Genkinger: We often see a pattern where an agency invests in a strategic AI program and starts using traditional IT infrastructure or the cloud to get started quickly. They skip the planning and design, hoping to see quick results.

Then, as their AI models grow, data management becomes more complex, and the system becomes slow and unmanageable. They try to add even more storage, but it’s not helping because the problem isn’t capacity alone. It’s the combination of capacity, management, and governance of the data that has become untenable. It is harder to bring in new data, and the data scientists and analysts can’t get their jobs done.

The most important part of the project is the people. Data engineers, data scientists, storage engineers, and solution architects are all involved in driving the systems and data to deliver outcomes from an AI program. Few things are worse than having data scientists feel like they can’t do their job because of poor infrastructure that was simply designed for a quick start.

The easiest way to avoid these issues is to invest in time with experts. You can talk to other agencies that had successful AI deployments and get best practices and ideas, so you’re not reinventing the wheel.

MeriTalk: How do agencies balance the need to architect to solve a problem today and plan for future projects that they haven’t envisioned?

Genkinger: You can’t always predict the future, and not all existing storage solutions have the ability to support this type of growth. But with DDN, it’s easy to design an architecture to be consumable and composable, doing the small things first to explore the problem space, and then scale out as you gain confidence.

We have built a variety of solutions capable of addressing a wide range of requirements, ranging from individual systems to global-scale workloads. Many of our customers have systems that started with a couple hundred terabytes before needing to scale to petabytes, even hundreds of petabytes at full production. We encourage customers to start small, then scale later when they understand the size of the data required to deliver the outcomes they need.

MeriTalk: Let’s talk about DDN. What sets your company apart in terms of your approach to AI in the Federal government?

Genkinger: Trust. DDN is the trusted global leader in AI, with 20 years’ experience working with Federal agencies on high-performance analytics and big data management.

We’ve used that experience and invested in R&D to make data a lot easier to consume, manage, and monitor, so agencies can get started quickly with AI in a pilot program and then scale up and out dramatically – without needing specialist skills to run the infrastructure.

And with close collaborative partnerships with other AI technology providers, we bring together our seasoned experts with leading Federal systems integrators to build exascale systems for mission-critical AI.
