[Ubuntu] Install Hadoop 3.0.0 & Hive on Ubuntu 16.04

What’s Hadoop?

Hadoop is an open-source framework for storing and processing big data in a distributed environment, across clusters of computers, using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

The project includes these modules:

Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
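
To make the MapReduce programming model a little more concrete, here is a minimal pure-Python sketch of a word count split into map, shuffle, and reduce phases. It does not use any Hadoop API and runs on a single machine; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and only illustrate the idea.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the values collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in order on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

In a real cluster the map and reduce steps run in parallel on many machines and the grouping happens over the network, but the shape of the computation is the same.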

What’s Hive?

The Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.


[Ubuntu] Install Spark 2.2.1 on Ubuntu 16.04

What is Spark?

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

You can download Spark from here, or follow the instructions below to download and install Spark 2.2.1.


[Python] Use Python boto3 and crontab to back up MySQL to AWS S3

I wrote a Python script that backs up a MySQL database and uploads the dump to AWS S3. I will walk through the steps below. First of all, you need to create a user in AWS IAM.
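
As a rough idea of what such a script can look like, here is a minimal sketch. It is not the script from the post. It assumes that `mysqldump` is on the PATH, that MySQL credentials live in `~/.my.cnf`, and that the IAM user's access keys are configured where boto3 can find them. The function names, the `mysql-backups/` key prefix, and the bucket name are all hypothetical.

```python
import datetime
import subprocess
from typing import List


def build_dump_command(user: str, database: str) -> List[str]:
    """Build the mysqldump command line (the password is read from ~/.my.cnf)."""
    # --single-transaction takes a consistent InnoDB snapshot without locking tables.
    return ["mysqldump", "--single-transaction", f"--user={user}", database]


def backup_key(database: str, when: datetime.datetime) -> str:
    """Build a timestamped S3 object key for one backup (hypothetical naming scheme)."""
    return f"mysql-backups/{database}-{when:%Y%m%d-%H%M%S}.sql"


def dump_database(user: str, database: str, out_path: str) -> None:
    """Write the SQL dump to out_path; raises CalledProcessError if mysqldump fails."""
    with open(out_path, "wb") as out_file:
        subprocess.run(build_dump_command(user, database), stdout=out_file, check=True)


def upload_to_s3(path: str, bucket: str, key: str) -> None:
    """Upload the dump file to S3 using credentials from the boto3 default chain."""
    import boto3  # imported here so the helpers above work without boto3 installed

    boto3.client("s3").upload_file(path, bucket, key)


def run_backup(user: str, database: str, bucket: str, out_path: str) -> str:
    """Dump the database, upload the file, and return the S3 key that was used."""
    key = backup_key(database, datetime.datetime.now())
    dump_database(user, database, out_path)
    upload_to_s3(out_path, bucket, key)
    return key
```

To schedule it, a script that calls `run_backup(...)` at the bottom could be registered with `crontab -e` using a line such as `0 3 * * * /usr/bin/python3 /path/to/backup.py`, which runs it every day at 03:00; the script path here is a placeholder.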


[Ubuntu] Using Amazon RDS with WordPress

To improve the efficiency and elasticity of an EC2 instance, the database should run on a separate machine. AWS RDS is a good solution for this. Before creating an RDS database instance, we need to dump the existing database on our WordPress server.

The instructions are listed below.


[Ubuntu] Install WordPress on AWS EC2 Ubuntu 14.04

I recently moved my blog from Google Blogger to WordPress. First, I got an Amazon Web Services account; then I created two instances (EC2 and RDS); finally, I installed WordPress on EC2. The steps are listed below.

1. Create an AWS EC2 instance
2. Install the Linux, Apache, MySQL, PHP (LAMP) stack on Ubuntu 14.04
3. Install WordPress on Ubuntu 14.04
4. Set up an FTP server for WordPress
5. (Optional) Create an AWS RDS instance and connect it to WordPress
