Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
You can download Spark from here, or follow the below command to download and install Spark1.6.0.
Chef is a systems and cloud infrastructure automation framework that makes it easy to deploy servers and applications to any physical, virtual, or cloud location, no matter the size of the infrastructure. Each organization is comprised of one (or more) workstations, a single server, and every node that will be configured and maintained by the chef-client. Cookbooks (and recipes) are used to tell the chef-client how each node in your organization should be configured. The chef-client (which is installed on every node) does the actual configuration.
OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, managed through a dashboard or via the OpenStack API. If you’ve used Amazon Web Service, OpenStack is just a bit like AWS.
About PostgreSQL is a powerful, open source object-relational database system. It has more than 15 years of active development and a proven architecture that has earned it a strong reputation for reliability, data integrity, and correctness. It runs on all major operating systems, including Linux, UNIX (AIX, BSD, HP-UX, SGI IRIX, Mac OS X, Solaris, Tru64), and Windows.
Jaspersoft Studio is an Eclipse-based report designer for JasperReports Library and JasperReports Server; it’s available as an Eclipse plug-in or as a stand-alone application. Jaspersoft Studio allows you to create sophisticated layouts containing charts, images, subreports, crosstabs, and more. You can access your data through a variety of sources including JDBC, TableModels, JavaBeans, XML, Hibernate, Big Data (such as Hive), CSV, XML/A, as well as custom sources, then publish your reports as PDF, RTF, XML, XLS, CSV, HTML, XHTML, text, DOCX, or OpenOffice. But my focus is on how to create a simple report and simple code.
In order to improve efficiency and elasticity of EC2 instance, we need to use a database that is installed on a separate machine. A good solution is using AWS RDS. Before we create a database RDS instance, we need to make a dump of the existing database on our WordPress server.