Azure: Announcing New Real-time Data Streaming and Data Factory Services

The last three weeks have been busy ones for Azure. Today we are releasing new services that make it even easier to work with data at any scale: Event Hubs, Stream Analytics, and Data Factory.

Event Hubs: Log millions of events per second in near real time

Once an Event Hub is created, events can be sent to it with either a strongly-typed API (e.g. the .NET or Java client library) or by sending a raw HTTP or AMQP message to the service. Below is a simple example of how easy it is to log an IoT event to an Event Hub using just a standard HTTP POST request. Notice the Authorization header in the HTTP POST – you can use this to optionally enable flexible authentication/authorization for your devices:

    POST https://your-namespace.servicebus.windows.net/your-event-hub/messages?timeout=60&api-version=2014-01 HTTP/1.1
    Authorization: SharedAccessSignature sr=your-namespace.servicebus.windows.net&sig=tYu8qdH563Pc96Lky0SFs5PhbGnljF7mLYQwCZmk9M0%3d&se=1403736877&skn=RootManageSharedAccessKey
    Content-Type: application/atom+xml;type=entry;charset=utf-8
    Host: your-namespace.servicebus.windows.net
    Content-Length: 42
    Expect: 100-continue

    { "DeviceId":"dev-01", "Temperature":"37.0" }

Your Event Hub can collect up to millions of messages per second like this, each carrying whatever data schema you want, and the Event Hubs service will store them in order for you to read/consume later.
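To make the request above concrete, here is a minimal Python sketch of the same call. Treat it as an illustration rather than official client code: the namespace, hub name and key are placeholders, and it assumes the widely used requests library. The SharedAccessSignature token is simply an HMAC-SHA256 signature over the URL-encoded resource URI plus an expiry timestamp:

    import base64
    import hashlib
    import hmac
    import time
    import urllib.parse

    import requests  # pip install requests

    # Placeholder values - substitute your own namespace, Event Hub and key.
    NAMESPACE = "your-namespace"
    EVENT_HUB = "your-event-hub"
    KEY_NAME = "RootManageSharedAccessKey"
    KEY = "your-shared-access-key"

    def make_sas_token(uri, key_name, key, ttl_seconds=3600):
        # Sign the URL-encoded resource URI plus an expiry timestamp with
        # HMAC-SHA256, then base64- and URL-encode the resulting signature.
        expiry = str(int(time.time()) + ttl_seconds)
        encoded_uri = urllib.parse.quote_plus(uri)
        digest = hmac.new(key.encode("utf-8"),
                          (encoded_uri + "\n" + expiry).encode("utf-8"),
                          hashlib.sha256).digest()
        signature = urllib.parse.quote_plus(base64.b64encode(digest))
        return "SharedAccessSignature sr={}&sig={}&se={}&skn={}".format(
            encoded_uri, signature, expiry, key_name)

    uri = "https://{}.servicebus.windows.net/{}".format(NAMESPACE, EVENT_HUB)
    response = requests.post(
        uri + "/messages?timeout=60&api-version=2014-01",
        headers={"Authorization": make_sas_token(uri, KEY_NAME, KEY),
                 "Content-Type": "application/atom+xml;type=entry;charset=utf-8"},
        data='{"DeviceId":"dev-01","Temperature":"37.0"}')
    response.raise_for_status()  # expect 201 Created when the event is accepted

If you use one of the strongly-typed client libraries instead, this token generation is handled for you.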
Downstream Event Processing

Once you collect events, you will no doubt want to do something with them. Event Hubs includes an intelligent processing agent that provides automatic partition management and load distribution across readers. You can implement any logic you want within readers, and the data sent to the readers is delivered in the order it was sent to the Event Hub.

In addition to supporting custom Event Readers, we also provide two easy ways to work with pre-built stream processing systems: our new Azure Stream Analytics service and Apache Storm. Azure Stream Analytics supports stream processing directly from Event Hubs, and Microsoft has created an Event Hubs Storm Spout for use with Apache Storm clusters.

The diagram below expresses some of the many rich ways you can use Event Hubs to collect and then hand off events/data for processing. Event Hubs provides a flexible, cost-effective building block that you can use to collect and process any events or data you can stream to the cloud, with the scalability to meet any need.

Learning More about Event Hubs

For more information, please review the Azure Event Hubs documentation.

Stream Analytics: Distributed stream processing service

Setup Streaming Data Input

Once you have created a Stream Analytics job, your first step will be to add a Streaming Data Input. This indicates where the data you want to perform stream processing on comes from. From within the portal you can choose Inputs -> Add An Input to launch a wizard that enables you to specify this.

We can use the Azure Event Hubs service to deliver a stream of data to perform processing on. If you already have an Event Hub created, you can choose it from a list populated in the wizard. You will also be asked to specify the format used to serialize incoming events in the Event Hub (e.g. JSON, CSV or Avro).

Setup Output Location

The next step in developing our Stream Analytics job is to add a Streaming Output Location. This configures where we want the output results of our stream processing pipeline to go. We can choose to output the results to Blob Storage, another Event Hub, or a SQL Database. Note that being able to use another Event Hub as a target provides a powerful way to connect multiple streams into an overall pipeline with multiple steps.

Write Streaming Queries

Now that we have our input and output sources configured, we can write SQL queries to transform, aggregate and/or correlate the incoming input (or set of inputs when there are multiple sources) and send the results to our output target. We can do this within the portal by selecting the QUERY tab at the top.

There are a number of interesting queries you can write to process the incoming stream of data. For example, in the Event Hubs section above I showed how you can use an HTTP POST command to submit JSON-based temperature data from an IoT device to an Event Hub, with data like:

    { "DeviceId":"dev-01", "Temperature":"37.0" }

When multiple devices stream events into our Event Hub simultaneously, they feed into our Stream Analytics job as a continuous sequence of data events. Wouldn’t it be interesting to analyze this data from a time-window perspective instead? For example, it would be useful to calculate, in real time, the average temperature of each device over the last 5 seconds of readings. With the Stream Analytics service we can calculate this over our incoming live stream of data just by writing a SQL query like so:

    SELECT DateAdd(second, -5, System.TimeStamp) AS WinStartTime,
           System.TimeStamp AS WinEndTime,
           DeviceId,
           Avg(Temperature) AS AvgTemperature,
           Count(*) AS EventCount
    FROM input
    GROUP BY TumblingWindow(second, 5), DeviceId

Running this query in our Stream Analytics job will aggregate/transform our incoming stream of data events and output data like the below into the output location we configured for our job (e.g. a Blob Storage file or a SQL Database):

The great thing about this approach is that the data is aggregated/transformed in real time as events are streamed to us, and it scales to handle literally gigabytes of event data streamed per second.
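If it helps to see the window semantics spelled out, here is a small illustrative Python sketch – not Stream Analytics code, and using made-up readings – that computes the same 5-second tumbling-window aggregates over a batch of events:

    from collections import defaultdict

    # Made-up sample events: (epoch seconds, device id, temperature).
    events = [
        (0.5, "dev-01", 37.0), (1.2, "dev-02", 36.5),
        (3.9, "dev-01", 37.4), (5.1, "dev-01", 37.8),
        (6.0, "dev-02", 36.9), (9.7, "dev-01", 38.1),
    ]

    WINDOW = 5  # seconds, matching TumblingWindow(second, 5)

    # Assign each event to the non-overlapping 5-second window it falls in,
    # keyed by (window start, device id) - the same grouping the SQL query uses.
    windows = defaultdict(list)
    for ts, device, temp in events:
        win_start = int(ts // WINDOW) * WINDOW
        windows[(win_start, device)].append(temp)

    # Emit one aggregated row per window per device, like the query output.
    for (win_start, device), temps in sorted(windows.items()):
        print(f"[{win_start}s..{win_start + WINDOW}s) {device}: "
              f"AvgTemperature={sum(temps) / len(temps):.2f} EventCount={len(temps)}")

The real service does this continuously over an unbounded stream, emitting a row for each window as it closes, rather than over an in-memory list.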
Scaling your Stream Analytics Job
Once defined, you can easily monitor the activity of your Stream Analytics Jobs in the Azure Portal:

You can use the SCALE tab to dynamically increase or decrease scale capacity for your stream processing – allowing you to pay only for the compute capacity you need, and enabling you to handle jobs with gigabytes/sec of streamed data. 
Learning More about Stream Analytics Service
For more information about Stream Analytics, please review the following resources:
Stream Analytics Home Page
Blog Post Announcing Stream Analytics Preview
Getting Started with Stream Analytics Tutorial
Stream Analytics Documentation

Data Factory: Fully managed service to build and manage information production pipelines
Organizations are increasingly looking to fully leverage all of the data available to their business. As they do, the data processing landscape is becoming more diverse than ever: data is processed across geographic locations, on-premises and in the cloud, across a wide variety of data types and sources (SQL, NoSQL, Hadoop, etc.), and the volume of data needing to be processed is increasing exponentially. Developers today are often left writing large amounts of custom logic to deliver an information production system that can manage and coordinate all of this data and processing work.
To help make this process simpler, I’m excited to announce the preview of our new Azure Data Factory service – a fully managed service that makes it easy to compose data storage, processing, and data movement services into streamlined, scalable & reliable data production pipelines. Once a pipeline is deployed, Data Factory enables easy monitoring and management of it, greatly reducing operational costs.

Easy to Get Started
Getting started with Data Factory is simple. With a few clicks in the Azure Preview Portal you can create a new data factory and link it to your data and processing resources.
Orchestrating Information Production Pipelines across multiple data sources
Data Factory makes it easy to coordinate and manage data sources from a variety of locations – including ones both in the cloud and on-premises.  Support for working with data on-premises inside SQL Server, as well as Azure Blob, Tables, HDInsight Hadoop systems and SQL Databases is included in this week’s preview release.  Access to on-premises data is supported through a data management gateway that allows for easy configuration and management of secure connections to your on-premises SQL Servers.  Data Factory balances the scale & agility provided by the cloud, Hadoop and non-relational platforms, with the management & monitoring that enterprise systems require to enable information production in a hybrid environment.
Custom Data Processing Activities using Hive, Pig and C#
This week’s preview enables data processing using Hive, Pig and custom C# code activities.  Data Factory activities can be used to clean data, anonymize/mask critical data fields, and transform the data in a wide variety of complex ways.
The Hive and Pig activities can be run on an HDInsight cluster you create, or alternatively you can allow Data Factory to fully manage the Hadoop cluster lifecycle on your behalf.  Simply author your activities, combine them into a pipeline, set an execution schedule and you’re done – no manual Hadoop cluster setup or management required. 
Built-in Information Production Monitoring and Dashboarding
Data Factory also offers an up-to-the-moment monitoring dashboard: you can deploy your data pipelines and immediately begin monitoring them. Once you have created and deployed pipelines to your Data Factory you can quickly assess end-to-end data pipeline health, pinpoint issues, and take corrective action as needed.
Within the Azure Preview Portal, you get a visual layout of all of your pipelines and data inputs and outputs. You can see all the relationships and dependencies of your data pipelines across all of your sources so you always know where data is coming from and where it is going at a glance. We also provide you with a historical accounting of job execution, data production status, and system health in a single monitoring dashboard:

Learning More about Data Factory
For more information about Data Factory, please review the following resources:
Getting Started Tutorial for Data Factory
More Data Factory Tutorials
Install the PowerShell SDK
Access our code sample repository

Other Great Data Improvements
Today’s releases make it even easier for customers to stream, process and manage the movement of data in the cloud. Over the last few months we’ve released a number of other great data updates as well that make Azure a great platform for any data need. Since August:

We released a major update of our SQL Database service, our relational database-as-a-service offering. The new SQL DB editions (Basic/Standard/Premium) support a 99.99% SLA, larger database sizes, dedicated performance guarantees, point-in-time recovery, new auditing features, and the ability to easily set up active geo-DR support.
We released a preview of our new DocumentDB service, a fully-managed, highly-scalable NoSQL document database service that supports saving and querying JSON-based data. It enables you to linearly scale your document store to any application size. Microsoft’s MSN portal was recently rewritten to use it – and stores more than 20TB of data within it.
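As a rough sketch of what this looks like from code – using the open-source pydocumentdb Python client, with a placeholder account endpoint, key and database/collection names – saving and querying a JSON document takes just a few lines:

    from pydocumentdb import document_client  # pip install pydocumentdb

    # Placeholder endpoint and key - copy yours from the Azure portal.
    client = document_client.DocumentClient(
        "https://your-account.documents.azure.com:443/",
        {"masterKey": "your-account-key"})

    # Save a JSON document into an existing database/collection
    # (link format: dbs/<database>/colls/<collection>).
    collection_link = "dbs/telemetry/colls/readings"
    client.CreateDocument(collection_link,
                          {"DeviceId": "dev-01", "Temperature": 37.0})

    # Query it back using DocumentDB's SQL-over-JSON grammar.
    for doc in client.QueryDocuments(collection_link,
                                     "SELECT * FROM c WHERE c.DeviceId = 'dev-01'"):
        print(doc["Temperature"])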
We released our new Redis Cache service, a secure, dedicated Redis cache offering managed as a service by Microsoft. Redis is a popular open-source solution that provides high-performance data types, and our Redis Cache service enables you to stand up an in-memory cache that can make any application much faster.
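For example, here is a minimal sketch of connecting to an Azure Redis Cache from Python with the open-source redis-py client; the cache hostname and access key are placeholders, and the connection uses the cache’s SSL endpoint on port 6380:

    import redis  # pip install redis

    # Placeholder values - use your cache name and access key from the portal.
    cache = redis.StrictRedis(
        host="your-cache.redis.cache.windows.net",
        port=6380,  # the cache's SSL port
        password="your-access-key",
        ssl=True)

    # Cache a value with a 60-second expiry, then read it back.
    cache.set("greeting", "Hello from Azure Redis Cache", ex=60)
    print(cache.get("greeting").decode("utf-8"))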
We released major updates to our HDInsight Hadoop service, which is a 100% Apache Hadoop-based service in the cloud. We have also added built-in support for using two popular frameworks in the Hadoop ecosystem: Apache HBase and Apache Storm.
We released a preview of our new Search-As-A-Service offering, which provides a managed search offering based on ElasticSearch that you can easily integrate into any Web or Mobile Application.  It enables you to build search experiences over any data your application uses (including data in SQLDB, DocDB, Hadoop and more).
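As an illustrative sketch – with a placeholder service name, index and query key, and assuming the preview REST API version – querying an existing index is a single authenticated HTTP GET:

    import requests  # pip install requests

    # Placeholder service name, index name and query key.
    SERVICE = "your-search-service"
    INDEX = "products"
    API_KEY = "your-query-key"

    # Full-text search against the index via the REST API.
    response = requests.get(
        "https://{}.search.windows.net/indexes/{}/docs".format(SERVICE, INDEX),
        params={"search": "red shoes", "api-version": "2014-07-31-Preview"},
        headers={"api-key": API_KEY})
    response.raise_for_status()

    for doc in response.json()["value"]:  # matching documents
        print(doc)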
And we have released a preview of our Machine Learning service, which provides a powerful cloud-based predictive analytics service. It is designed for both new and experienced data scientists, includes hundreds of algorithms from both the open-source world and Microsoft Research, and supports writing ML solutions in the popular open-source R language.
You’ll continue to see major data improvements in the months ahead – we have an exciting roadmap of improvements planned.
Summary
Today’s Microsoft Azure release enables some great new data scenarios, and makes building applications that work with data in the cloud even easier.
If you don’t already have an Azure account, you can sign up for a free trial and start using all of the above features today. Then visit the Microsoft Azure Developer Center to learn more about how to build apps with it.
Hope this helps,
Scott
P.S. In addition to blogging, I am also now using Twitter for quick updates and to share links. Follow me at: twitter.com/scottgu
