Alibaba Cloud: MaxCompute at a glance

Introduction

For one of our crowdsourced IoT apps, which targets potentially more than 100,000 devices to begin with, we ended up using Alibaba MaxCompute. Our analysis suggested that it could run computations on these massive datasets both cost-effectively and efficiently.

The app in question is a hybrid mobile application that is expected to reach a few million devices quickly. It feeds in temperature, humidity and traffic-related data. We all know how much data such devices can generate these days.

The team had budget constraints and targets, while also wanting the advantage of the latest technology and a robust platform. With many existing cloud providers falling short on our cost and technology criteria, we turned to MaxCompute from Alibaba.

MaxCompute can be used to analyse massive amounts of structured data in batch mode. Over time, many enterprises have accumulated GBs to PBs of data that can't easily be processed with traditional solutions. MaxCompute is also used for data warehouses with massive storage needs, and it provides tools for setting up an effective large-scale analytical modelling service on top of this data. Above all, we found it more cost-effective than the alternatives.

Benefits

Let's look at the benefits of MaxCompute:

  1. Large-scale computing and storage
  2. Multiple computation models
  3. Strong data security
  4. Low-cost

For single-server big data analysis, the limitations are clear and do not warrant a detailed discussion here. Data analysts often try to build their own distributed solutions. However, these can be costly and complex to design and maintain. With MaxCompute we get a ready-made solution without any upfront cost or the worries of running distributed computing ourselves.

MaxCompute doesn't use complex jargon when it comes to storing such data. Let's have a quick look at some basic terms by mapping familiar database terms to their MaxCompute equivalents:

Main Structures

  1. Database -> Project [MaxCompute]
  2. Table -> Table [MaxCompute]
  3. Partition -> Partition [MaxCompute]
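To make the mapping a little more concrete, here is a minimal sketch in plain Python of how a project, a table and its partitions relate to each other. It does not use any MaxCompute SDK; the class `Table`, the project name `iot_demo`, the table name `sensor_readings` and the partition key `dt` are all hypothetical names chosen for illustration. The point it tries to show is that a partition lets a query touch only one slice of a table.

```python
from collections import defaultdict


class Table:
    """Toy stand-in for a partitioned table: rows are grouped by a partition value."""

    def __init__(self, name, partition_key):
        self.name = name
        self.partition_key = partition_key
        self.partitions = defaultdict(list)  # partition value -> list of rows

    def insert(self, partition_value, row):
        self.partitions[partition_value].append(row)

    def scan(self, partition_value=None):
        """Return rows from one partition, or from all partitions if none is given."""
        if partition_value is not None:
            return list(self.partitions.get(partition_value, []))
        return [row for rows in self.partitions.values() for row in rows]


# A "project" plays the role a database plays elsewhere: a namespace for tables.
project = {"name": "iot_demo", "tables": {}}  # hypothetical project name
readings = Table("sensor_readings", partition_key="dt")  # hypothetical table
project["tables"][readings.name] = readings

readings.insert("2017-11-11", {"device_id": "d1", "temperature": 21.5})
readings.insert("2017-11-11", {"device_id": "d2", "temperature": 19.0})
readings.insert("2017-11-12", {"device_id": "d1", "temperature": 22.0})

one_day = readings.scan("2017-11-11")  # reads a single partition only
everything = readings.scan()           # reads every partition
print(len(one_day), len(everything))
```

Partitioning by date is a common choice for device data like ours, because most analysis jobs only look at a recent window of readings.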

 

Data Types

Type | New in MaxCompute 2.0 | Constant | Description
--- | --- | --- | ---
TINYINT | Yes | 1Y, -127Y | 8-bit signed integer, range -128 to 127
SMALLINT | Yes | 32767S, -100S | 16-bit signed integer, range -32768 to 32767
INT | Yes | 1000, -15645787 (Note1) | 32-bit signed integer, range -2^31 to 2^31 - 1
BIGINT | No | 100000000000L, -1L | 64-bit signed integer, range -2^63 + 1 to 2^63 - 1
FLOAT | Yes | None | 32-bit binary floating point
DOUBLE | No | 3.1415926, 1E+7 | 64-bit binary floating point
DECIMAL | No | 3.5BD, 99999999999.9999999BD | Base-10 exact numeric type; integer part ranges from -10^36 + 1 to 10^36 - 1, fractional part accurate to 10^-18
VARCHAR | Yes | None (Note2) | Variable-length character type; n is the length, ranging from 1 to 65535
STRING | No | "abc", 'bcd', "alibaba" ' inc' (Note3) | A single string can be up to 8M long
BINARY | Yes | None | Binary data type; a single value can be up to 8M long
DATETIME | No | DATETIME '2017-11-11 00:00:00' | Date type, range 0001-01-01 00:00:00 ~ 9999-12-31 23:59:59; uses UTC+8 as the standard time system
TIMESTAMP | Yes | TIMESTAMP '2017-11-11 00:00:00.123456789' | Independent of the time zone; ranges from January 1, 0000 to December 31, 9999 23:59:59.999999999, accurate to the nanosecond
BOOLEAN | No | TRUE, FALSE | True/False, Boolean type
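As a quick way to get a feel for the integer ranges listed above, the following Python sketch picks the narrowest integer type that can hold a given value. It is an illustration only: the helper `smallest_integer_type` is a hypothetical name, and the bounds are copied from the table (including the BIGINT lower bound of -2^63 + 1 given there), not read from MaxCompute itself.

```python
# Ranges as listed in the data type table; ordered from narrowest to widest.
INTEGER_RANGES = {
    "TINYINT": (-128, 127),
    "SMALLINT": (-32768, 32767),
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63 + 1, 2**63 - 1),
}


def smallest_integer_type(value):
    """Return the narrowest listed integer type that can hold value, or None."""
    for type_name, (low, high) in INTEGER_RANGES.items():
        if low <= value <= high:
            return type_name
    return None  # value does not fit any of the listed integer types


for sample in (100, -32768, 100000000000, 2**63):
    print(sample, "->", smallest_integer_type(sample))
```

A check like this can be handy on the device or ingestion side, before a reading is written to a column that is narrower than the value expects.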

 

For computing and analysis tasks MaxCompute provides multiple computing models:

  1. SQL – as data is stored in tables, SQL queries are at your disposal. There is a slight learning curve here because MaxCompute SQL lacks transactions, indexes and UPDATE/DELETE. We will see in a later article why that is so and how to work around it.

    Job scheduling is also supported.

  2. UDF – user-defined functions, which you can write in addition to the numerous built-in functions.
  3. MapReduce – a basic understanding of the Java MapReduce programming model should be enough to get you started. Java APIs are provided.
  4. Graph – a processing framework for iterative graph-based computing.
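For readers new to the MapReduce model mentioned in the list, here is a small sketch of the idea in plain Python. It is not the MaxCompute MapReduce API (which is Java) and makes no claim about how MaxCompute runs jobs internally; the function names `map_reading`, `shuffle` and `reduce_average` and the sample readings are hypothetical. It computes the average temperature per device, the kind of aggregation our app needs.

```python
from collections import defaultdict


def map_reading(reading):
    """Map step: emit one (key, value) pair for one input record."""
    return reading["device_id"], reading["temperature"]


def shuffle(pairs):
    """Group values by key, as a framework would do between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_average(key, values):
    """Reduce step: collapse all values for one key into a single result."""
    return key, sum(values) / len(values)


# Made-up sample data for illustration.
readings = [
    {"device_id": "d1", "temperature": 20.0},
    {"device_id": "d2", "temperature": 18.0},
    {"device_id": "d1", "temperature": 22.0},
]

mapped = [map_reading(r) for r in readings]
averages = dict(reduce_average(k, v) for k, v in shuffle(mapped).items())
print(averages)
```

In a real distributed job the map and reduce steps run in parallel on many machines, and the grouping between them is handled by the platform; only the two user-written steps carry over from this sketch.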

Another important thing to note is that there are two kinds of tables in MaxCompute:

  1. Internal – Stored in MaxCompute
  2. External – Stored on Object Storage Service [OSS] or Table Store [OTS]

We will delve deeper into OSS and OTS in other articles.
