CSP554 – Creating an EMR Cluster Solved

Running TestDataGen.class
Magic Number: 123236

Copying the files created by TestDataGen.class into the newly created HDFS directory csp554 (hdfs:///user/csp554) using the copyFromLocal command
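The copy step can be scripted. Below is a minimal, hypothetical Python helper that builds the hadoop fs -mkdir -p and hadoop fs -copyFromLocal commands. It assumes the generated files sit in the current local directory under the names foodratings.txt and foodplaces.txt (the names read back in the exercises); the files TestDataGen actually writes may be named differently. By default it only prints the commands.

```python
import shlex
import subprocess

# Assumed names: the local files written by TestDataGen may be named differently.
LOCAL_FILES = ["foodratings.txt", "foodplaces.txt"]
HDFS_DIR = "/user/csp554"


def hdfs_commands(local_files, hdfs_dir):
    """Return argv lists: create hdfs_dir, then copy each local file into it."""
    commands = [["hadoop", "fs", "-mkdir", "-p", hdfs_dir]]
    for name in local_files:
        commands.append(["hadoop", "fs", "-copyFromLocal", name, hdfs_dir + "/"])
    return commands


def run(commands, dry_run=True):
    """Print every command; only execute them when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run(hdfs_commands(LOCAL_FILES, HDFS_DIR))
```

On the EMR master node, run(hdfs_commands(LOCAL_FILES, HDFS_DIR), dry_run=False) would perform the copies; hdfs dfs is an equivalent spelling of hadoop fs for these subcommands.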

Exercise 1:

Loading the foodratings.txt file into the foodratings DataFrame

Command Used:

from pyspark.sql.types import *

# Schema: a name column, four integer rating columns, and an integer place id
tab1 = (StructType().add("name", StringType(), True)
        .add("food1", IntegerType(), True).add("food2", IntegerType(), True)
        .add("food3", IntegerType(), True).add("food4", IntegerType(), True)
        .add("Placeid", IntegerType(), True))

foodratings = spark.read.schema(tab1).csv('hdfs:///user/csp554/foodratings.txt')

Showing the first 5 rows
Command Used: foodratings.show(5)
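Applying tab1 to the file means each comma-separated line is split into six fields and cast to the declared types. The plain-Python sketch below models that on hypothetical input; it is a simplified stand-in for Spark's CSV reader, not its implementation. In Spark's default PERMISSIVE mode, fields that cannot be parsed and columns missing from a short line come back as null, which the sketch represents as None.

```python
import csv
import io

# Column names and types taken from tab1. Casting failures become None, a
# simplified stand-in for the nulls Spark's default (PERMISSIVE) CSV mode produces.
FOODRATINGS_SCHEMA = [
    ("name", str), ("food1", int), ("food2", int),
    ("food3", int), ("food4", int), ("Placeid", int),
]


def cast(value, typ):
    """Convert one field, returning None when it is missing or unparseable."""
    if value is None or value == "":
        return None
    try:
        return typ(value)
    except ValueError:
        return None


def read_with_schema(text, schema):
    """Parse CSV text into one dict per line, keyed by the schema's column names."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:  # skip blank lines
            continue
        # Pad short lines so every row has a value (possibly None) for each column.
        fields = fields + [None] * (len(schema) - len(fields))
        rows.append({col: cast(val, typ) for (col, typ), val in zip(schema, fields)})
    return rows


# Hypothetical input lines; the real file is produced by TestDataGen.
sample = "Mel,10,20,30,45,1\nJoy,5,x,7,50,2\n"
rows = read_with_schema(sample, FOODRATINGS_SCHEMA)
print(rows[:5])  # roughly what foodratings.show(5) displays
```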

Exercise 2:

Loading the foodplaces.txt file into the foodplaces DataFrame

Command Used:
tab2 = StructType().add("placeid", IntegerType(), True).add("Placename", StringType(), True)

foodplaces = spark.read.schema(tab2).csv('hdfs:///user/csp554/foodplaces.txt')

foodplaces.printSchema()

Showing the first 5 rows
Command Used: foodplaces.show(5)
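foodplaces.printSchema() prints the schema as a tree, one line per field with its type and nullability. As a rough guide to what to expect, the helper below reproduces that layout for a flat schema such as tab2. It is an illustration written in plain Python, not Spark code, and it does not handle nested types.

```python
# Spark's short type names for the two types used in tab2 (flat schemas only).
TYPE_NAMES = {"IntegerType": "integer", "StringType": "string"}


def render_schema(fields):
    """Render (name, type, nullable) tuples in the tree layout printSchema uses."""
    lines = ["root"]
    for name, type_name, nullable in fields:
        lines.append(f" |-- {name}: {TYPE_NAMES[type_name]} (nullable = {str(nullable).lower()})")
    return "\n".join(lines)


# The fields passed to StructType().add(...) when building tab2.
TAB2 = [("placeid", "IntegerType", True), ("Placename", "StringType", True)]
print(render_schema(TAB2))
```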

Exercise 3:
a) Registering both DataFrames as temporary views so they can be queried with SQL
Command Used:
foodratings.createOrReplaceTempView("foodratingsT")
foodplaces.createOrReplaceTempView("foodplacesT")

b) Creating a new DataFrame by querying the foodratingsT view created in the step above
Command Used: foodratings_ex3a = spark.sql("select * from foodratingsT where food2 < 25 and food4 > 40")

Showing the first 5 rows
Command Used: foodratings_ex3a.show(5)
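The query in step b can be tried outside Spark. The sketch below runs the same WHERE clause against an in-memory SQLite table loaded with a few hypothetical rows. SQLite only stands in for Spark SQL here; both treat a comparison with NULL as not true, so a row with a missing food4 is filtered out.

```python
import sqlite3

# Hypothetical rows (name, food1, food2, food3, food4, Placeid), not TestDataGen output.
SAMPLE = [
    ("Mel", 10, 20, 30, 45, 1),
    ("Joy", 5, 30, 7, 50, 2),    # food2 is not < 25
    ("Sam", 1, 10, 2, None, 3),  # food4 is NULL, so food4 > 40 is not true
    ("Jill", 8, 24, 9, 41, 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE foodratingsT (name TEXT, food1 INTEGER, food2 INTEGER, "
    "food3 INTEGER, food4 INTEGER, Placeid INTEGER)"
)
conn.executemany("INSERT INTO foodratingsT VALUES (?, ?, ?, ?, ?, ?)", SAMPLE)

# Same predicate as the step b query.
result = conn.execute(
    "select * from foodratingsT where food2 < 25 and food4 > 40"
).fetchall()
print(result)
```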

c) Creating a new DataFrame by querying the foodplacesT view created in step a
Command Used:
foodplaces_ex3b = spark.sql("select * from foodplacesT where placeid > 3")
foodplaces_ex3b.printSchema()

Showing the first 5 rows
Command Used: foodplaces_ex3b.show(5)
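A Spark temporary view is a named query over a DataFrame whose lifetime is tied to the SparkSession that created it. A loose analogy, sketched below with SQLite and hypothetical place rows, is a TEMP view, which lasts only as long as the database connection; querying it with the step c predicate keeps only places with placeid above 3.

```python
import sqlite3

# Hypothetical (placeid, Placename) rows.
PLACES = [(1, "Place A"), (2, "Place B"), (3, "Place C"), (4, "Place D"), (5, "Place E")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foodplaces (placeid INTEGER, Placename TEXT)")
conn.executemany("INSERT INTO foodplaces VALUES (?, ?)", PLACES)

# A TEMP view exists only for this connection, loosely like a Spark temp view,
# which exists only for the SparkSession that created it.
conn.execute("CREATE TEMP VIEW foodplacesT AS SELECT * FROM foodplaces")

foodplaces_ex3b = conn.execute("select * from foodplacesT where placeid > 3").fetchall()
print(foodplaces_ex3b)
```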

Exercise 4:
Creating a new DataFrame that keeps only the rows where name is 'Mel' and food3 is less than 25
Command Used:
foodratings_ex4 = foodratings.filter((foodratings.name == 'Mel') & (foodratings.food3 < 25))

foodratings_ex4.printSchema()

Showing the first 5 rows
Command Used: foodratings_ex4.show(5)
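The parentheses around each comparison in this filter matter. Python's & operator binds more tightly than == and <, so without them the expression is parsed as a chained comparison around 'Mel' & foodratings.food3. The sketch below uses only the standard ast module, with plain names standing in for the columns, to show the two parses. With real PySpark Columns, the unparenthesized form typically fails with a "Cannot convert column into bool" error instead of filtering.

```python
import ast


def parse(expr):
    """Return the top-level expression node Python builds for expr."""
    return ast.parse(expr, mode="eval").body


# Plain names stand in for the foodratings.name and foodratings.food3 columns.
bad = parse("name == 'Mel' & food3 < 25")
good = parse("(name == 'Mel') & (food3 < 25)")

print(type(bad).__name__)   # Compare: a chained comparison
print(type(good).__name__)  # BinOp: an & of two comparisons
```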

Exercise 5: Creating a new DataFrame containing only the name and Placeid columns
Command Used:
foodratings_ex5 = foodratings.select(foodratings.name, foodratings.Placeid)

Showing the first 5 rows
Command Used: foodratings_ex5.show(5)
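select is a projection: every row is kept, but only the listed columns survive, in the listed order. The plain-Python sketch below models that on hypothetical dict rows shaped like the foodratings schema; it illustrates the result only and is not how Spark executes the operation.

```python
# Hypothetical rows shaped like the foodratings schema.
rows = [
    {"name": "Mel", "food1": 10, "food2": 20, "food3": 30, "food4": 45, "Placeid": 1},
    {"name": "Joy", "food1": 5, "food2": 30, "food3": 7, "food4": 50, "Placeid": 2},
]


def select(rows, *columns):
    """Keep only the named columns, in the order given, for every row."""
    return [{col: row[col] for col in columns} for row in rows]


foodratings_ex5 = select(rows, "name", "Placeid")
print(foodratings_ex5)
```

In PySpark the same projection can also be written with column-name strings, as foodratings.select("name", "Placeid").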

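Exercise 6: Creating a new DataFrame by inner-joining foodratings and foodplaces on the place id, then dropping the duplicate placeid column that comes from foodplaces
Command Used:
ex6 = foodratings.join(foodplaces, foodratings.Placeid == foodplaces.placeid, 'inner').drop(foodplaces.placeid)
ex6.printSchema()

Showing the first 5 rows
Command Used: ex6.show(5)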
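An inner join keeps only the rating rows whose Placeid has a matching placeid in foodplaces, and the drop removes the redundant key column from the foodplaces side. Spark chooses its own physical strategy (for example a broadcast hash join or a sort-merge join); the sketch below uses a simple in-memory hash join on hypothetical rows, trimmed to a few columns, purely to show which rows and columns the result contains.

```python
from collections import defaultdict


def inner_join(left, right, left_key, right_key):
    """Inner hash join of two lists of dicts. right_key is dropped from the output,
    mirroring .drop(foodplaces.placeid). None keys never match, as with SQL NULLs."""
    index = defaultdict(list)
    for rrow in right:
        if rrow[right_key] is not None:
            index[rrow[right_key]].append(rrow)
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            merged = dict(lrow)
            merged.update({k: v for k, v in rrow.items() if k != right_key})
            joined.append(merged)
    return joined


# Hypothetical data, trimmed to a few columns.
ratings = [
    {"name": "Mel", "food1": 10, "Placeid": 1},
    {"name": "Joy", "food1": 5, "Placeid": 2},
    {"name": "Sam", "food1": 1, "Placeid": 9},  # no matching place, so an inner join drops it
]
places = [{"placeid": 1, "Placename": "Place A"}, {"placeid": 2, "Placename": "Place B"}]

ex6 = inner_join(ratings, places, "Placeid", "placeid")
for row in ex6[:5]:
    print(row)
```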
