Hadoop is an open-source framework that allows to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
The project includes these modules:
Hadoop Common: The common utilities that support the other Hadoop modules. Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data. Hadoop YARN: A framework for job scheduling and cluster resource management. Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
The Hive data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive. …
Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
You can download Spark from here, or follow the below instructions to download and install Spark2.2.1.
Chef is a systems and cloud infrastructure automation framework that makes it easy to deploy servers and applications to any physical, virtual, or cloud location, no matter the size of the infrastructure. Each organization is comprised of one (or more) workstations, a single server, and every node that will be configured and maintained by the chef-client. Cookbooks (and recipes) are used to tell the chef-client how each node in your organization should be configured. The chef-client (which is installed on every node) does the actual configuration.
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via the OpenStack API. If you’ve used Amazon Web Service, OpenStack is just a bit like AWS.
In order to improve efficiency and elasticity of EC2 instance, we need to use a database that is installed on a separate machine. A good solution is using AWS RDS. Before we create a database RDS instance, we need to make a dump of the existing database on our WordPress server.
I change my blog from google blogger to wordpress recently. First of all, I got an Amazon Web Service account, then I created 2 instances(EC2 and RDS), finally, I installed wordpress on EC2. I will list instructions in the following steps.