In one of our crowdsourced IoT apps, which targets more than 100,000 devices to begin with, we ended up using Alibaba MaxCompute. Our analysis suggested it could run computations on these massive datasets both efficiently and cost-effectively.
The app in question is a hybrid mobile application that is expected to reach a few million devices quickly. It feeds in temperature, humidity, and traffic-related data, and we all know how much data such devices can generate these days.
The team had budget constraints but also wanted the advantages of the latest technology and a robust platform. With many existing cloud providers falling short on cost or technology, we turned to MaxCompute from Alibaba.
MaxCompute makes it easy to analyse massive amounts of structured data in batch. Over time, many enterprises have accumulated GBs/PBs of data that traditional solutions cannot process easily. MaxCompute is also used for data warehouses with massive storage needs, and it provides straightforward tools for setting up an effective large-scale analytical modelling service on top of this data. Above all, we found it more cost-effective than the alternatives.
Let's look at the benefits of MaxCompute:
- Large-scale computing and storage
- Multiple computation models
- Strong data security
For big data analysis on a single server, the limitations are clear enough not to warrant a detailed discussion here. Data analysts often try to build their own distributed solutions, but these can be costly and complex to design and maintain. With MaxCompute we get a ready-made solution without any upfront cost or the worries of running distributed computing ourselves.
MaxCompute doesn't use complex jargon for storing such data. Let's take a quick look at some basic terms by mapping the usual database terms to their MaxCompute equivalents:
- Database -> Project [MaxCompute]
- Table -> Table [MaxCompute]
- Partition -> Partition [MaxCompute]
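To make the mapping concrete, here is a minimal sketch in Python that assembles a MaxCompute SQL `CREATE TABLE` statement for a partitioned table. The table name `sensor_readings`, its columns, the `ds` partition column, and the helper `build_create_table` are all hypothetical, chosen only to match the kind of temperature and humidity data described above. The statement would be run inside a project, which plays the role of the database.

```python
def build_create_table(table, columns, partitions):
    """Return a CREATE TABLE statement for a partitioned table.

    columns and partitions are lists of (name, type) pairs.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partitions)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY ({parts});"
    )


# Hypothetical schema for the sensor data (an assumption, not the real app schema).
ddl = build_create_table(
    "sensor_readings",
    [("device_id", "STRING"), ("temperature", "DOUBLE"), ("humidity", "DOUBLE")],
    [("ds", "STRING")],
)
print(ddl)
```

Partitioning by a date-like column such as `ds` is a common choice for device data, since each day's readings can then be loaded and queried separately.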
MaxCompute supports the following data types:
|Type|New type|Constant example|Description|
|---|---|---|---|
|TINYINT|Yes|1Y, -127Y|8-bit signed integer, range -128 to 127|
|SMALLINT|Yes|32767S, -100S|16-bit signed integer, range -32768 to 32767|
|INT| | |32-bit signed integer, range -2^31 to 2^31 - 1|
|BIGINT|No|100000000000L, -1L|64-bit signed integer, range -2^63 + 1 to 2^63 - 1|
|FLOAT|Yes|None|32-bit binary floating point|
|DOUBLE|No|3.1415926, 1E+7|64-bit binary floating point|
|DECIMAL| | |Decimal exact numeric type, integer part range -10^36 + 1 to 10^36 - 1, fractional portion accurate to 10^-18|
|VARCHAR|Yes|None (Note2)|Variable-length character type, n is the length, range 1 to 65535|
|STRING|No|"abc", 'bcd', "alibaba" 'inc' (Note3)|A single string can be up to 8 MB long|
|BINARY|Yes|None|Binary data type, a single value can be up to 8 MB long|
|DATETIME| | |Date type, uses UTC+8 as the standard time system, range up to 9999-12-31 23:59:59|
|TIMESTAMP| | |Independent of the time zone, ranges from January 1, 0000 to December 31, 9999 23:59:59.999999999, accurate to the nanosecond|
|BOOLEAN|No|TRUE, FALSE|Boolean type, true or false|
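As a small worked example of the integer ranges, the sketch below picks the narrowest integer type that can hold a given value. It is plain Python, not a MaxCompute API. The helper name `smallest_integer_type` is hypothetical, and the BIGINT upper bound is assumed to be 2^63 - 1.

```python
# Integer ranges for the four integer types; the BIGINT upper bound
# of 2**63 - 1 is an assumption.
INTEGER_RANGES = [
    ("TINYINT", -128, 127),
    ("SMALLINT", -32768, 32767),
    ("INT", -2**31, 2**31 - 1),
    ("BIGINT", -2**63 + 1, 2**63 - 1),
]


def smallest_integer_type(value):
    """Return the narrowest integer type whose range contains value."""
    for name, low, high in INTEGER_RANGES:
        if low <= value <= high:
            return name
    raise ValueError(f"{value} does not fit in any integer type listed")


for sample in (100, -32768, 100000000000):
    print(sample, smallest_integer_type(sample))
```

A check like this can help when deciding column types for device counters or timestamps stored as numbers.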
For computing and analysis tasks MaxCompute provides multiple computing models:
- SQL – as data is stored in tables, SQL queries are at your disposal. There is a slight learning curve, because MaxCompute SQL has no transactions, indexes, or UPDATE/DELETE. We will see in a later article why that is and how to work around it.
Job scheduling is also supported.
- UDF – user-defined functions, which you can write in addition to the numerous built-in functions.
- MapReduce – a basic understanding of the Java MapReduce programming model should be enough to get you started. Java APIs are provided.
- Graph – a processing framework for iterative graph-based computing.
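For readers new to the MapReduce model, the following sketch shows its three phases (map, shuffle, reduce) in plain Python, computing the average temperature per device. This is not the MaxCompute Java API. The function names, record layout, and sample readings are all made up for illustration.

```python
from collections import defaultdict


def map_reading(reading):
    """Map step: emit a (device_id, temperature) pair for one record."""
    return reading["device_id"], reading["temperature"]


def shuffle(pairs):
    """Group the emitted values by key, as the framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_average(values):
    """Reduce step: collapse all values for one key into their mean."""
    return sum(values) / len(values)


def average_temperature(readings):
    """Run map, shuffle, and reduce over a list of reading records."""
    groups = shuffle(map_reading(r) for r in readings)
    return {device: reduce_average(temps) for device, temps in groups.items()}


# Made-up sample records (hypothetical data).
readings = [
    {"device_id": "a", "temperature": 20.0},
    {"device_id": "b", "temperature": 30.0},
    {"device_id": "a", "temperature": 22.0},
]
print(average_temperature(readings))
```

In a real distributed job the map and reduce steps run on many machines and the framework handles the shuffle, but the division of work is the same.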
Another important thing to note is that there are two kinds of tables in MaxCompute:
- Internal – Stored in MaxCompute
- External – Stored on Object Storage Service [OSS] or Table Store [OTS]
We will delve deeper into OSS and OTS in other articles.