Trifacta expands platform to deliver a data engineering cloud
Trifacta announced a major expansion to its platform to deliver the data engineering cloud.
In keeping with its mission to create radical productivity for people who work with data, Trifacta’s expanded capabilities now fully support modern data engineers, who apply software development and DevOps practices to build curated, accessible data products for advanced data insights and analytics.
With expectations for these data products rising and cycle times falling, the key to unlocking enterprise value is to enable a broader set of users to collaborate, experiment, and iterate quickly on data science and analytics projects.
“Trifacta is addressing the needs of modern data workers by providing a collaborative, cloud environment where users of all skill levels can come together to improve data quality and streamline data operations as they on-board, assess, and refine raw data,” said Trifacta CEO Adam Wilson.
“Accelerating data preparation and democratizing ETL for these users and their cloud data warehousing projects requires an enterprise-grade data engineering platform that is open, intelligent, and self-service. We’ve created the Trifacta Data Engineering Cloud to meet these needs.”
Open: A data engineering environment needs to be flexible to operate seamlessly within end-to-end analytic workflows and integrate fully into modern tool chains.
By design, the environment should be extensible enough to allow users to bring their own magic to fill in the gaps and free them from the tyranny of monolithic stacks and vendor lock-in.
An open data engineering environment sustains independence and loose coupling while fostering composability of services and the interoperability that enables users to embrace the tools that can help solve complex data problems faster.
To support this, Trifacta now offers:
- Multi-cloud support: The Trifacta data engineering cloud offers native solutions optimized for each platform: Google, AWS, Azure. Multi-cloud support means freedom of choice and freedom from code. Users can change their minds about which platform they prefer or run different workloads in different environments without re-writing any code.
- Flexible execution: Users can choose between ETL or ELT, or an optimal combination of the two based on cost. Flexible execution also means users have the freedom to generate SQL, Spark, Dataflow/Beam, or Python.
- Universal connectivity: Users can connect any application with data from more than 180 enterprise data sources, both on-premises and in the cloud, and publish refined data to spreadsheets, BI and reporting tools, as well as to data science notebooks. The connectivity of the Trifacta data engineering cloud is extended with Trifacta’s REST, XML, and JDBC frameworks.
- API-driven: The Trifacta cloud integrates with any and all tool chains. Through SDKs and OpenAPI standards available in a multitude of languages, users can integrate Trifacta into existing workflows or use Trifacta to orchestrate across third-party applications, from source control, ingestion, and replication tools to catalogs and business glossaries.
Intelligent: Modern data engineering tools should be intelligent and learn from the data itself and from user interactions to automate the most complex and time-consuming parts of data cleaning and transformation, improving the user experience and accelerating data-driven innovation.
This intelligence should also apply to improving ongoing data operations, providing self-monitoring and, in some cases, automated remediation of issues that would normally disrupt data pipelines.
To support this, Trifacta now offers:
- Predictive transformation: The Trifacta data engineering cloud features a visual “guide and decide” interface that leverages machine learning to make understanding and resolving data transformation challenges intuitive to users of all backgrounds, regardless of their technical acumen. Predictive transformation capabilities include: automatically detecting and applying format to unstructured and semi-structured data sets; using examples to infer transformation logic; synthesizing data models from source data; and auto-mapping data to a predefined target.
- Adaptive data quality: Trifacta’s active data profiling extends beyond traditional data quality rules. Users can now more easily discover and validate data quality issues. Statistical data profiles are used to identify complex patterns, automatically suggesting possible quality rules such as integrity constraints, formatting patterns, and column dependencies. Users are offered transformations to consider based on classifiers for probabilistic data quality rules and can more easily standardize data with support for sophisticated clustering.
- Smart data pipelines: Trifacta empowers users to model data flows while managing relationships across data sets and recipes. They can operationalize and automate data flows through plans that enable parallel and conditional execution, as well as pre- and post-processing. In addition, monitoring point-in-time and historical data quality trends provides the context for proactive alerting (via email, Slack, PagerDuty, and other platforms) about changes in schema and data distributions that may affect the data’s fitness for use downstream.
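As a rough illustration of the schema and distribution monitoring described under smart data pipelines, the Python sketch below compares a baseline snapshot of a data set with a current one and collects alert messages. It is not Trifacta’s implementation: the function names, the schema representation (a column-to-type mapping), and the three-standard-deviation threshold are all assumptions chosen for the example.

```python
from statistics import mean, pstdev


def schema_changes(baseline: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column: type} mappings (a hypothetical schema format)."""
    shared = set(baseline) & set(current)
    return {
        "added": sorted(set(current) - set(baseline)),
        "removed": sorted(set(baseline) - set(current)),
        "retyped": sorted(c for c in shared if baseline[c] != current[c]),
    }


def mean_shift(baseline: list[float], current: list[float]) -> float:
    """Distance between the two means, in baseline standard deviations."""
    spread = pstdev(baseline)
    gap = abs(mean(current) - mean(baseline))
    if spread == 0:
        # A constant baseline: any movement at all counts as an unbounded shift.
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def drift_alerts(
    base_schema: dict[str, str],
    cur_schema: dict[str, str],
    base_values: dict[str, list[float]],
    cur_values: dict[str, list[float]],
    threshold: float = 3.0,  # assumed cutoff, not a product default
) -> list[str]:
    """Collect human-readable alerts for schema and distribution changes."""
    alerts = []
    for kind, columns in schema_changes(base_schema, cur_schema).items():
        for column in columns:
            alerts.append(f"schema: column {column!r} {kind}")
    for column, values in base_values.items():
        new_values = cur_values.get(column)
        if not values or not new_values:
            continue  # nothing to compare for this column
        shift = mean_shift(values, new_values)
        if shift > threshold:
            alerts.append(f"distribution: column {column!r} mean shifted")
    return alerts
```

In practice, the returned messages would be handed to whatever notification channel a team already uses; a production check would also look at more than the mean, for example null rates or value ranges.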
Self-service: “See for yourself, help yourself” has always been Trifacta’s mantra, but this means different things to different users, depending on their technical know-how.
The Trifacta data engineering platform expands the number and variety of people who can participate in the data engineering process. It removes bottlenecks and leverages the collective wisdom of the organization to create new and interesting data products.
It provides modern data workers and analytics innovators with the tools they need to collaborate and share knowledge. It caters to less technical subject matter experts as well as to more technical developers who want to operate at differing levels of abstraction from the raw data.
To support this, Trifacta is offering:
- Low-code/no-code authoring: Whether building data engineering logic visually or from the command line with SQL, Python or JavaScript, the Trifacta data engineering cloud has users covered. It supports a mix and match of both approaches based on users’ skills and needs. Users can also author with popular frameworks such as dbt and Google Dataform. With live feedback and continuous validation, users always have eyes on the data and can see the immediate impact of their work.
- Macros, templates, sharing: The Trifacta data engineering cloud features macros, shareable data flows, recipes, and templates that reduce repeated tasks, increase consistency of implementation, and enable new projects to be onboarded quickly. These reusable assets increase knowledge sharing within and across organizations, standardizing the approach to common data engineering problems.
- Knowledge sharing: Best practices, “how-tos”, discussions, and certifications all come together in the new Trifacta Community. The Community extends Trifacta’s commitment to self-service by empowering users to build on each other’s innovation. More than just a place to learn, this forum is where users can easily contribute their best ideas and assets, and clone and customize what other users have done in a wide range of ETL, data quality and automation scenarios. Instead of always starting from scratch, users can tap into the Trifacta Community to benefit from each other’s hard work, creative thinking, and problem-solving.
- Usage-based pricing: It’s not enough for technology to encourage democratization; its pricing and packaging must also enable broad adoption. Trifacta lets users start on its data engineering cloud for free and only pay as they see value. Convenient online credit card purchasing and “pay-as-you-go” options bring the power of data prep, ETL, and data quality to individuals, teams, and organizations of all sizes.
Enterprise-grade: Analytics innovators increasingly rely on data engineering as mission-critical. Questions of scalability and governance must be considered up front.
To support this, Trifacta is offering:
- Unlimited scalability: The Trifacta data engineering cloud is completely serverless and completely elastic. It handles it all, from individual spreadsheets to petabytes of data, from smart samples to entire data sets.
- Built-in governance: With full audit trails and lineage, versioning and SDLC support, the Trifacta data engineering cloud tracks and manages change automatically across projects and environments.
- Support and reliability: With follow-the-sun support, high availability, and built-in redundancy, the Trifacta data engineering cloud ensures full resilience and eliminates DevOps costs. There’s nothing to deploy. An in-product “advisor” chat bot also makes Trifacta experts and expertise available at the moment they’re needed.
- Security: Authentication, authorization and encryption, combined with VPC support and a host of certifications with leading security standards (SOC2 Type 2, GDPR, etc.), ensure data is fully protected at all times.