HDFS – Hadoop Distributed File System

HDFS definition: HDFS (Hadoop Distributed File System) is Hadoop’s primary file storage system. It works well with large volumes of data, reduces I/O, and offers high scalability, availability, and fault tolerance thanks to data replication. The Hadoop file system is typically...
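To make the cost of replication concrete, here is a minimal sketch of the arithmetic. It assumes a 128 MiB block size and a replication factor of 3, which are common HDFS defaults but are not stated in the post. The function name `hdfs_footprint` is hypothetical and is not part of any Hadoop API.

```python
def hdfs_footprint(file_size_bytes, block_size_bytes=128 * 1024**2, replication=3):
    """Estimate how a file is laid out under block replication.

    Assumed defaults (not taken from the post): 128 MiB blocks, 3 replicas.
    Returns (logical blocks, stored block replicas, raw bytes on disk).
    """
    # Ceiling division with integers: the last block may be only partly filled.
    blocks = -(-file_size_bytes // block_size_bytes)
    # Every block is stored `replication` times across the cluster.
    stored_replicas = blocks * replication
    # A partly filled last block only occupies its real size, so raw usage
    # is the file size multiplied by the replication factor.
    raw_bytes = file_size_bytes * replication
    return blocks, stored_replicas, raw_bytes


if __name__ == "__main__":
    one_gib = 1024**3
    blocks, replicas, raw = hdfs_footprint(one_gib)
    print(f"1 GiB file: {blocks} blocks, {replicas} stored replicas, {raw / one_gib:.0f} GiB raw")
```

Under these assumptions each block survives the loss of up to two of its replicas, and the price is three times the raw storage.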

Apache Spark Components

Components: Spark Core. Spark Core is the foundation that supports the whole architecture. It provides task distribution, scheduling, and input/output operations, exposed through Java, Python, Scala and R programming interfaces centered on the RDD abstraction. It establishes a...

Virtual Environment in Python

Create a virtual environment from the command line:
> python -m venv develop_virtual_enviroment
Activate the environment:
> ..\develop_virtual_enviroment\Scripts\activate.bat (for Windows)
> source ../develop_virtual_enviroment/bin/activate (for Linux)
Disable the...
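The same environment can also be created from Python with the standard library `venv` module. This is a minimal sketch, not a replacement for the commands above. The helper name `create_env` is hypothetical, and `with_pip=False` is an assumption chosen to keep the example fast and offline.

```python
import tempfile
import venv
from pathlib import Path


def create_env(parent: Path, name: str = "develop_virtual_enviroment") -> Path:
    """Create a virtual environment called `name` inside `parent`.

    Roughly equivalent to `python -m venv <name>`, except that pip is
    skipped here (with_pip=False), which is an assumption of this sketch.
    """
    env_dir = parent / name
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    # Use a throwaway directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_env(Path(tmp))
        # pyvenv.cfg marks the directory as a virtual environment.
        print("created:", (env_dir / "pyvenv.cfg").is_file())
```

Activation still happens in the shell, because it changes the environment of the current terminal session rather than that of a Python process.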