Spark SQL JSON Datasets

Author: Anonymous. Source: collected from the web. Published: 2019-1-19 4:50:47

Created by ligaihe, last modified by 路飞 on 2016-02-24.

Spark SQL can automatically infer the schema of a JSON dataset and load it as a SchemaRDD. This conversion can be done with either of the following two methods:

jsonFile: loads data from a directory of JSON files, where each line of a file is a JSON object.
jsonRDD: loads data from an existing RDD, where each element of the RDD is a string containing a JSON object.

Note that a file passed to jsonFile is not a typical JSON file: each line must be self-contained and hold one valid JSON object. As a consequence, a regular multi-line JSON file will usually fail to load.

    // sc is an existing SparkContext.
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)

    // A JSON dataset is pointed to by path.
    // The path can be either a single text file or a directory storing text files.
    val path = "examples/src/main/resources/people.json"
    // Create a SchemaRDD from the file(s) pointed to by path
    val people = sqlContext.jsonFile(path)

    // The inferred schema can be visualized using the printSchema() method.
    people.printSchema()
    // root
    //  |-- age: integer (nullable = true)
    //  |-- name: string (nullable = true)

    // Register this SchemaRDD as a table.
    people.registerTempTable("people")

    // SQL statements can be run by using the sql methods provided by sqlContext.
    val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")

    // Alternatively, a SchemaRDD can be created for a JSON dataset represented by
    // an RDD[String] storing one JSON object per string.
    val anotherPeopleRDD = sc.parallelize(
      """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
    val anotherPeople = sqlContext.jsonRDD(anotherPeopleRDD)
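The one-object-per-line requirement can be checked before a file is handed to jsonFile. The following is a minimal sketch in plain Scala, without Spark, of such a check. The object name JsonLinesCheck, the method looksLikeSingleObject and the sample lines are hypothetical and are not part of the Spark API. The check is only a brace-counting heuristic, not a JSON parser.

```scala
object JsonLinesCheck {
  // Heuristic only (hypothetical helper, not a Spark API): returns true when
  // the trimmed line starts with '{', ends with '}', and the outermost brace
  // closes exactly at the last character. Brackets inside string literals
  // are ignored. This does not validate JSON syntax in general.
  def looksLikeSingleObject(line: String): Boolean = {
    val t = line.trim
    var depth = 0         // current nesting depth of {} and []
    var inString = false  // currently inside a "..." literal
    var escaped = false   // previous character was a backslash in a string
    var ok = t.startsWith("{") && t.endsWith("}")
    var i = 0
    while (ok && i < t.length) {
      val c = t.charAt(i)
      if (inString) {
        if (escaped) escaped = false
        else if (c == '\\') escaped = true
        else if (c == '"') inString = false
      } else c match {
        case '"' => inString = true
        case '{' | '[' => depth += 1
        case '}' | ']' =>
          depth -= 1
          // The outermost object must not close before the end of the line.
          if (depth < 0 || (depth == 0 && i < t.length - 1)) ok = false
        case _ => ()
      }
      i += 1
    }
    ok && depth == 0 && !inString
  }

  def main(args: Array[String]): Unit = {
    // Illustrative sample data: one complete object per line.
    val lineDelimited = Seq(
      """{"name":"Michael"}""",
      """{"name":"Andy", "age":30}"""
    )
    // The same kind of record pretty-printed across several lines.
    val prettyPrinted = Seq(
      "{",
      """  "name": "Michael",""",
      """  "age": 29""",
      "}"
    )
    println(lineDelimited.forall(looksLikeSingleObject))
    println(prettyPrinted.forall(looksLikeSingleObject))
  }
}
```

In this sketch the line-delimited input passes because every line is a complete object, while the pretty-printed input does not, since no single line of it is a complete object. That mirrors why a multi-line JSON file usually fails with jsonFile.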
