[{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"overview","dir":"Articles","previous_headings":"","what":"Overview","title":"SparkR - Practical Guide","text":"SparkR R package provides light-weight frontend use Apache Spark R. Spark 3.3.4, SparkR provides distributed data frame implementation supports data processing operations like selection, filtering, aggregation etc. distributed machine learning using MLlib.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"getting-started","dir":"Articles","previous_headings":"","what":"Getting Started","title":"SparkR - Practical Guide","text":"begin example running local machine provide overview use SparkR: data ingestion, data processing machine learning. First, let’s load attach package. SparkSession entry point SparkR connects R program Spark cluster. can create SparkSession using sparkR.session pass options application name, Spark packages depended , etc. use default settings runs local mode. auto downloads Spark package background previous installation found. details setup, see Spark Session. operations SparkR centered around R class called SparkDataFrame. distributed collection data organized named columns, conceptually equivalent table relational database data frame R, richer optimizations hood. SparkDataFrame can constructed wide array sources : structured data files, tables Hive, external databases, existing local R data frames. example, create SparkDataFrame local R data frame, can view first rows SparkDataFrame head showDF function. Common data processing operations filter select supported SparkDataFrame. SparkR can use many common aggregation functions grouping. results carsDF carsSubDF SparkDataFrame objects. convert back R data.frame, can use collect. Caution: can cause interactive environment run memory, though, collect() fetches entire distributed DataFrame client, acting Spark driver. SparkR supports number commonly used machine learning algorithms. hood, SparkR uses MLlib train model. Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models. SparkR supports subset R formula operators model fitting, including ‘~’, ‘.’, ‘:’, ‘+’, ‘-‘. use linear regression example. result matches returned R glm function applied corresponding data.frame mtcars carsDF. fact, Generalized Linear Model, specifically expose glm SparkDataFrame well equivalent model <- glm(mpg ~ wt + cyl, data = carsDF). model can saved write.ml loaded back using read.ml. 
In the end, we can stop the Spark Session by running","code":"library(SparkR) sparkR.session() ## Java ref type org.apache.spark.sql.SparkSession id 1 cars <- cbind(model = rownames(mtcars), mtcars) carsDF <- createDataFrame(cars) head(carsDF) ## model mpg cyl disp hp drat wt qsec vs am gear carb ## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4 ## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 ## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 ## 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 ## 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 ## 6 Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1 carsSubDF <- select(carsDF, \"model\", \"mpg\", \"hp\") carsSubDF <- filter(carsSubDF, carsSubDF$hp >= 200) head(carsSubDF) ## model mpg hp ## 1 Duster 360 14.3 245 ## 2 Cadillac Fleetwood 10.4 205 ## 3 Lincoln Continental 10.4 215 ## 4 Chrysler Imperial 14.7 230 ## 5 Camaro Z28 13.3 245 ## 6 Ford Pantera L 15.8 264 carsGPDF <- summarize(groupBy(carsDF, carsDF$gear), count = n(carsDF$gear)) head(carsGPDF) ## gear count ## 1 4 12 ## 2 3 15 ## 3 5 5 carsGP <- collect(carsGPDF) class(carsGP) ## [1] \"data.frame\" model <- spark.glm(carsDF, mpg ~ wt + cyl) summary(model) ## ## Deviance Residuals: ## (Note: These are approximate quantiles with relative error <= 0.01) ## Min 1Q Median 3Q Max ## -4.2893 -1.7085 -0.4713 1.5729 6.1004 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 39.6863 1.71498 23.1409 0.00000000 ## wt -3.1910 0.75691 -4.2158 0.00022202 ## cyl -1.5078 0.41469 -3.6360 0.00106428 ## ## (Dispersion parameter for gaussian family taken to be 6.592137) ## ## Null deviance: 1126.05 on 31 degrees of freedom ## Residual deviance: 191.17 on 29 degrees of freedom ## AIC: 156 ## ## Number of Fisher Scoring iterations: 1 write.ml(model, path = \"/HOME/tmp/mlModel/glmModel\") sparkR.session.stop()"},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"installation","dir":"Articles","previous_headings":"Setup","what":"Installation","title":"SparkR - Practical Guide","text":"Unlike many other R packages, using SparkR requires an additional installation of Apache Spark. The Spark installation is used to run the backend process that compiles and executes SparkR programs. After installing the SparkR package, you can call sparkR.session as explained in the previous section to start it, and it will check for a Spark installation. If you are working with SparkR from an interactive shell (e.g. R, RStudio), Spark is downloaded and cached automatically if it is not found. Alternatively, we provide an easy-to-use function install.spark for running this manually. If you don’t have Spark installed on the computer, you may download it from the Apache Spark Website. If you already have Spark installed, you don’t have to install it again and can pass the sparkHome argument to sparkR.session to let SparkR know where the existing Spark installation is.","code":"install.spark() sparkR.session(sparkHome = \"/HOME/spark\")"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"SetupSparkSession","dir":"Articles","previous_headings":"Setup","what":"Spark Session","title":"SparkR - Practical Guide","text":"In addition to sparkHome, many other options can be specified in sparkR.session. For a complete list, see Starting up: SparkSession and the SparkR API doc. In particular, the following Spark driver properties can be set in sparkConfig.
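For illustration, a minimal sketch of passing a driver property through sparkConfig (the memory value is a placeholder, not a recommendation):

sparkR.session(master = "local[*]",
               sparkConfig = list(spark.driver.memory = "2g"))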
For Windows users: due to the different file prefixes across operating systems, to avoid the issue of a potentially wrong prefix, the current workaround is to specify spark.sql.warehouse.dir when starting the SparkSession.","code":"spark_warehouse_path <- file.path(path.expand('~'), \"spark-warehouse\") sparkR.session(spark.sql.warehouse.dir = spark_warehouse_path)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"cluster-mode","dir":"Articles","previous_headings":"Setup > Spark Session","what":"Cluster Mode","title":"SparkR - Practical Guide","text":"SparkR can connect to remote Spark clusters. The Cluster Mode Overview is a good introduction to the different Spark cluster modes. When connecting SparkR to a remote Spark cluster, make sure that the Spark version and the Hadoop version on the machine match the corresponding versions on the cluster. The current SparkR package is compatible with the Spark version shown below and should be used with both the local computer and the remote cluster. To connect, pass the URL of the master node to sparkR.session. A complete list of the accepted formats can be seen in Spark Master URLs. For example, to connect to a local standalone Spark master, we can call the first command below. For a YARN cluster, SparkR supports the client mode with the master set to “yarn”. YARN cluster mode is not supported in the current version.","code":"## [1] \"Spark 3.3.4\" sparkR.session(master = \"spark://local:7077\") sparkR.session(master = \"yarn\")"},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"local-data-frame","dir":"Articles","previous_headings":"Data Import","what":"Local Data Frame","title":"SparkR - Practical Guide","text":"The simplest way is to convert a local R data frame into a SparkDataFrame. Specifically, we can use as.DataFrame or createDataFrame and pass in the local R data frame to create a SparkDataFrame. As an example, the following creates a SparkDataFrame based on the faithful dataset from R.","code":"df <- as.DataFrame(faithful) head(df) ## eruptions waiting ## 1 3.600 79 ## 2 1.800 54 ## 3 3.333 74 ## 4 2.283 62 ## 5 4.533 85 ## 6 2.883 55"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"data-sources","dir":"Articles","previous_headings":"Data Import","what":"Data Sources","title":"SparkR - Practical Guide","text":"SparkR supports operating on a variety of data sources through the SparkDataFrame interface. You can check the Spark SQL Programming Guide for the specific options available for the built-in data sources. The general method for creating SparkDataFrames from data sources is read.df. This method takes the path of the file to load and the type of data source; the currently active Spark Session is used automatically. SparkR supports reading CSV, JSON and Parquet files natively, and through Spark Packages you can find data source connectors for popular file formats like Avro. These packages can be added with the sparkPackages parameter when initializing the SparkSession using sparkR.session. We can see how to use data sources using an example CSV input file. For more information please refer to the SparkR read.df API documentation. The data sources API natively supports JSON formatted input files. Note that the file used here is not a typical JSON file: each line in the file must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail. Let’s take a look at the first two lines of the raw JSON file used here. We use read.df to read it into a SparkDataFrame; SparkR automatically infers the schema from the JSON file. If we want to read multiple JSON files, read.json can be used. The data sources API can also be used to save out SparkDataFrames into multiple file formats. For
example can save SparkDataFrame previous example Parquet file using write.df.","code":"sparkR.session(sparkPackages = \"com.databricks:spark-avro_2.12:3.0.0\") df <- read.df(csvPath, \"csv\", header = \"true\", inferSchema = \"true\", na.strings = \"NA\") filePath <- paste0(sparkR.conf(\"spark.home\"), \"/examples/src/main/resources/people.json\") readLines(filePath, n = 2L) ## [1] \"{\\\"name\\\":\\\"Michael\\\"}\" \"{\\\"name\\\":\\\"Andy\\\", \\\"age\\\":30}\" people <- read.df(filePath, \"json\") count(people) ## [1] 3 head(people) ## age name ## 1 NA Michael ## 2 30 Andy ## 3 19 Justin printSchema(people) ## root ## |-- age: long (nullable = true) ## |-- name: string (nullable = true) people <- read.json(paste0(Sys.getenv(\"SPARK_HOME\"), c(\"/examples/src/main/resources/people.json\", \"/examples/src/main/resources/people.json\"))) count(people) ## [1] 6 write.df(people, path = \"people.parquet\", source = \"parquet\", mode = \"overwrite\")"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"hive-tables","dir":"Articles","previous_headings":"Data Import","what":"Hive Tables","title":"SparkR - Practical Guide","text":"can also create SparkDataFrames Hive tables. need create SparkSession Hive support can access tables Hive MetaStore. Note Spark built Hive support details can found SQL Programming Guide. SparkR, default attempt create SparkSession Hive support enabled (enableHiveSupport = TRUE).","code":"sql(\"CREATE TABLE IF NOT EXISTS src (key INT, value STRING)\") txtPath <- paste0(sparkR.conf(\"spark.home\"), \"/examples/src/main/resources/kv1.txt\") sqlCMD <- sprintf(\"LOAD DATA LOCAL INPATH '%s' INTO TABLE src\", txtPath) sql(sqlCMD) results <- sql(\"FROM src SELECT key, value\") # results is now a SparkDataFrame head(results)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"data-processing","dir":"Articles","previous_headings":"","what":"Data Processing","title":"SparkR - Practical Guide","text":"dplyr users: SparkR similar interface dplyr data processing. However, noticeable differences worth mentioning first place. use df represent SparkDataFrame col represent name column . indicate columns. SparkR uses either character string column name Column object constructed $ indicate column. example, select col df, can write select(df, \"col\") select(df, df$col). describe conditions. SparkR, Column object representation can inserted condition directly, can use character string describe condition, without referring SparkDataFrame used. example, select rows value > 1, can write filter(df, df$col > 1) filter(df, \"col > 1\"). concrete examples. differences mentioned specific methods. use SparkDataFrame carsDF created . can get basic information SparkDataFrame. 
Print schema tree format.","code":"carsDF ## SparkDataFrame[model:string, mpg:double, cyl:double, disp:double, hp:double, drat:double, wt:double, qsec:double, vs:double, am:double, gear:double, carb:double] printSchema(carsDF) ## root ## |-- model: string (nullable = true) ## |-- mpg: double (nullable = true) ## |-- cyl: double (nullable = true) ## |-- disp: double (nullable = true) ## |-- hp: double (nullable = true) ## |-- drat: double (nullable = true) ## |-- wt: double (nullable = true) ## |-- qsec: double (nullable = true) ## |-- vs: double (nullable = true) ## |-- am: double (nullable = true) ## |-- gear: double (nullable = true) ## |-- carb: double (nullable = true)"},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"selecting-rows-columns","dir":"Articles","previous_headings":"Data Processing > SparkDataFrame Operations","what":"Selecting rows, columns","title":"SparkR - Practical Guide","text":"SparkDataFrames support number functions structured data processing. include basic examples complete list can found API docs: can also pass column name strings. Filter SparkDataFrame retain rows mpg less 20 miles/gallon.","code":"head(select(carsDF, \"mpg\")) ## mpg ## 1 21.0 ## 2 21.0 ## 3 22.8 ## 4 21.4 ## 5 18.7 ## 6 18.1 head(filter(carsDF, carsDF$mpg < 20)) ## model mpg cyl disp hp drat wt qsec vs am gear carb ## 1 Hornet Sportabout 18.7 8 360.0 175 3.15 3.44 17.02 0 0 3 2 ## 2 Valiant 18.1 6 225.0 105 2.76 3.46 20.22 1 0 3 1 ## 3 Duster 360 14.3 8 360.0 245 3.21 3.57 15.84 0 0 3 4 ## 4 Merc 280 19.2 6 167.6 123 3.92 3.44 18.30 1 0 4 4 ## 5 Merc 280C 17.8 6 167.6 123 3.92 3.44 18.90 1 0 4 4 ## 6 Merc 450SE 16.4 8 275.8 180 3.07 4.07 17.40 0 0 3 3"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"grouping-aggregation","dir":"Articles","previous_headings":"Data Processing > SparkDataFrame Operations","what":"Grouping, Aggregation","title":"SparkR - Practical Guide","text":"common flow grouping aggregation Use groupBy group_by respect grouping variables create GroupedData object Feed GroupedData object agg summarize functions, provided aggregation functions compute number within group. number widely used functions supported aggregate data grouping, including avg, count_distinct, count, first, kurtosis, last, max, mean, min, sd, skewness, stddev_pop, stddev_samp, sum_distinct, sum, var_pop, var_samp, var. See API doc aggregate functions linked . example can compute histogram number cylinders mtcars dataset shown . Use cube rollup compute subtotals across multiple dimensions. generates groupings {(cyl, gear, ), (cyl, gear), (cyl), ()}, generates groupings possible combinations grouping columns.","code":"numCyl <- summarize(groupBy(carsDF, carsDF$cyl), count = n(carsDF$cyl)) head(numCyl) ## cyl count ## 1 8 14 ## 2 4 11 ## 3 6 7 mean(cube(carsDF, \"cyl\", \"gear\", \"am\"), \"mpg\") ## SparkDataFrame[cyl:double, gear:double, am:double, avg(mpg):double] mean(rollup(carsDF, \"cyl\", \"gear\", \"am\"), \"mpg\") ## SparkDataFrame[cyl:double, gear:double, am:double, avg(mpg):double]"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"operating-on-columns","dir":"Articles","previous_headings":"Data Processing > SparkDataFrame Operations","what":"Operating on Columns","title":"SparkR - Practical Guide","text":"SparkR also provides number functions can directly applied columns data processing aggregation. 
example shows use basic arithmetic functions.","code":"carsDF_km <- carsDF carsDF_km$kmpg <- carsDF_km$mpg * 1.61 head(select(carsDF_km, \"model\", \"mpg\", \"kmpg\")) ## model mpg kmpg ## 1 Mazda RX4 21.0 33.810 ## 2 Mazda RX4 Wag 21.0 33.810 ## 3 Datsun 710 22.8 36.708 ## 4 Hornet 4 Drive 21.4 34.454 ## 5 Hornet Sportabout 18.7 30.107 ## 6 Valiant 18.1 29.141"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"window-functions","dir":"Articles","previous_headings":"Data Processing","what":"Window Functions","title":"SparkR - Practical Guide","text":"window function variation aggregation function. simple words, aggregation function: n 1 mapping - returns single value group entries. Examples include sum, count, max. window function: n n mapping - returns one value entry group, value may depend entries group. Examples include rank, lead, lag. Formally, group mentioned called frame. Every input row can unique frame associated output window function row based rows confined frame. Window functions often used conjunction following functions: windowPartitionBy, windowOrderBy, partitionBy, orderBy, . illustrate next look example. still use mtcars dataset. corresponding SparkDataFrame carsDF. Suppose number cylinders, want calculate rank car mpg within group. explain detail steps. windowPartitionBy creates window specification object WindowSpec defines partition. controls rows partition given row. case, rows value cyl put partition. orderBy defines ordering - position given row partition. resulting WindowSpec returned ws. window specification methods include rangeBetween, can define boundaries frame value, rowsBetween, can define boundaries row indices. withColumn appends Column called rank SparkDataFrame. returns windowing column. first argument usually Column returned window function(s) rank(), lead(carsDF$wt). calculates corresponding values according partitioned--ordered table.","code":"carsSubDF <- select(carsDF, \"model\", \"mpg\", \"cyl\") ws <- orderBy(windowPartitionBy(\"cyl\"), \"mpg\") carsRank <- withColumn(carsSubDF, \"rank\", over(rank(), ws)) head(carsRank, n = 20L) ## model mpg cyl rank ## 1 Volvo 142E 21.4 4 1 ## 2 Toyota Corona 21.5 4 2 ## 3 Datsun 710 22.8 4 3 ## 4 Merc 230 22.8 4 3 ## 5 Merc 240D 24.4 4 5 ## 6 Porsche 914-2 26.0 4 6 ## 7 Fiat X1-9 27.3 4 7 ## 8 Honda Civic 30.4 4 8 ## 9 Lotus Europa 30.4 4 8 ## 10 Fiat 128 32.4 4 10 ## 11 Toyota Corolla 33.9 4 11 ## 12 Merc 280C 17.8 6 1 ## 13 Valiant 18.1 6 2 ## 14 Merc 280 19.2 6 3 ## 15 Ferrari Dino 19.7 6 4 ## 16 Mazda RX4 21.0 6 5 ## 17 Mazda RX4 Wag 21.0 6 5 ## 18 Hornet 4 Drive 21.4 6 7 ## 19 Cadillac Fleetwood 10.4 8 1 ## 20 Lincoln Continental 10.4 8 1"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"user-defined-function","dir":"Articles","previous_headings":"Data Processing","what":"User-Defined Function","title":"SparkR - Practical Guide","text":"SparkR, support several kinds user-defined functions (UDFs).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"apply-by-partition","dir":"Articles","previous_headings":"Data Processing > User-Defined Function","what":"Apply by Partition","title":"SparkR - Practical Guide","text":"dapply can apply function partition SparkDataFrame. function applied partition SparkDataFrame one parameter, data.frame corresponding partition, output data.frame well. Schema specifies row format resulting SparkDataFrame. must match data types returned value. 
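As an aside (a minimal sketch, not part of the original example), the schema can be written either as a DDL-formatted string, as in the dapply example that follows, or as an equivalent structType built from structField entries:

# equivalent to the DDL string "model STRING, mpg DOUBLE, kmpg DOUBLE" used below
schema <- structType(structField("model", "string"),
                     structField("mpg", "double"),
                     structField("kmpg", "double"))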
See mapping R Spark. convert mpg kmpg (kilometers per gallon). carsSubDF SparkDataFrame subset carsDF columns. Like dapply, dapplyCollect can apply function partition SparkDataFrame collect result back. output function data.frame, schema required case. Note dapplyCollect can fail output UDF partitions pulled driver’s memory.","code":"carsSubDF <- select(carsDF, \"model\", \"mpg\") schema <- \"model STRING, mpg DOUBLE, kmpg DOUBLE\" out <- dapply(carsSubDF, function(x) { x <- cbind(x, x$mpg * 1.61) }, schema) head(collect(out)) ## model mpg kmpg ## 1 Mazda RX4 21.0 33.810 ## 2 Mazda RX4 Wag 21.0 33.810 ## 3 Datsun 710 22.8 36.708 ## 4 Hornet 4 Drive 21.4 34.454 ## 5 Hornet Sportabout 18.7 30.107 ## 6 Valiant 18.1 29.141 out <- dapplyCollect( carsSubDF, function(x) { x <- cbind(x, \"kmpg\" = x$mpg * 1.61) }) head(out, 3) ## model mpg kmpg ## 1 Mazda RX4 21.0 33.810 ## 2 Mazda RX4 Wag 21.0 33.810 ## 3 Datsun 710 22.8 36.708"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"apply-by-group","dir":"Articles","previous_headings":"Data Processing > User-Defined Function","what":"Apply by Group","title":"SparkR - Practical Guide","text":"gapply can apply function group SparkDataFrame. function applied group SparkDataFrame two parameters: grouping key R data.frame corresponding key. groups chosen SparkDataFrames column(s). output function data.frame. Schema specifies row format resulting SparkDataFrame. must represent R function’s output schema basis Spark data types. column names returned data.frame set user. See mapping R Spark. Like gapply, gapplyCollect can apply function partition SparkDataFrame collect result back R data.frame. output function data.frame schema required case. Note gapplyCollect can fail output UDF partitions pulled driver’s memory.","code":"schema <- structType(structField(\"cyl\", \"double\"), structField(\"max_mpg\", \"double\")) result <- gapply( carsDF, \"cyl\", function(key, x) { y <- data.frame(key, max(x$mpg)) }, schema) head(arrange(result, \"max_mpg\", decreasing = TRUE)) ## cyl max_mpg ## 1 4 33.9 ## 2 6 21.4 ## 3 8 19.2 result <- gapplyCollect( carsDF, \"cyl\", function(key, x) { y <- data.frame(key, max(x$mpg)) colnames(y) <- c(\"cyl\", \"max_mpg\") y }) head(result[order(result$max_mpg, decreasing = TRUE), ]) ## cyl max_mpg ## 1 4 33.9 ## 2 6 21.4 ## 3 8 19.2"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"distribute-local-functions","dir":"Articles","previous_headings":"Data Processing > User-Defined Function","what":"Distribute Local Functions","title":"SparkR - Practical Guide","text":"Similar lapply native R, spark.lapply runs function list elements distributes computations Spark. spark.lapply works manner similar doParallel lapply elements list. results computations fit single machine. case can something like df <- createDataFrame(list) use dapply. use svm package e1071 example. use default settings except varying costs constraints violation. spark.lapply can train different models parallel. Return list model’s summaries. avoid lengthy display, present partial result second fitted model. 
free inspect models well.","code":"costs <- exp(seq(from = log(1), to = log(1000), length.out = 5)) train <- function(cost) { stopifnot(requireNamespace(\"e1071\", quietly = TRUE)) model <- e1071::svm(Species ~ ., data = iris, cost = cost) summary(model) } model.summaries <- spark.lapply(costs, train) class(model.summaries) ## [1] \"list\" print(model.summaries[[2]]) ## $call ## svm(formula = Species ~ ., data = iris, cost = cost) ## ## $type ## [1] 0 ## ## $kernel ## [1] 2 ## ## $cost ## [1] 5.623413 ## ## $degree ## [1] 3 ## ## $gamma ## [1] 0.25 ## ## $coef0 ## [1] 0 ## ## $nu ## [1] 0.5 ## ## $epsilon ## [1] 0.1 ## ## $sparse ## [1] FALSE ## ## $scaled ## [1] TRUE TRUE TRUE TRUE ## ## $x.scale ## $x.scale$`scaled:center` ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 5.843333 3.057333 3.758000 1.199333 ## ## $x.scale$`scaled:scale` ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 0.8280661 0.4358663 1.7652982 0.7622377 ## ## ## $y.scale ## NULL ## ## $nclasses ## [1] 3 ## ## $levels ## [1] \"setosa\" \"versicolor\" \"virginica\" ## ## $tot.nSV ## [1] 35 ## ## $nSV ## [1] 6 15 14 ## ## $labels ## [1] 1 2 3 ## ## $SV ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## 14 -1.86378030 -0.13153881 -1.5056946 -1.4422448 ## 16 -0.17309407 3.08045544 -1.2791040 -1.0486668 ## 21 -0.53538397 0.78617383 -1.1658087 -1.3110521 ## 23 -1.50149039 1.24503015 -1.5623422 -1.3110521 ## 24 -0.89767388 0.55674567 -1.1658087 -0.9174741 ## 42 -1.62225369 -1.73753594 -1.3923993 -1.1798595 ## 51 1.39682886 0.32731751 0.5336209 0.2632600 ## 53 1.27606556 0.09788935 0.6469162 0.3944526 ## 54 -0.41462067 -1.73753594 0.1370873 0.1320673 ## 55 0.79301235 -0.59039513 0.4769732 0.3944526 ## [ reached getOption(\"max.print\") -- omitted 25 rows ] ## ## $index ## [1] 14 16 21 23 24 42 51 53 54 55 58 61 69 71 73 78 79 84 85 ## [20] 86 99 107 111 119 120 124 127 128 130 132 134 135 139 149 150 ## ## $rho ## [1] -0.10346530 0.12160294 -0.09540346 ## ## $compprob ## [1] FALSE ## ## $probA ## NULL ## ## $probB ## NULL ## ## $sigma ## NULL ## ## $coefs ## [,1] [,2] ## [1,] 0.00000000 0.06561739 ## [2,] 0.76813720 0.93378721 ## [3,] 0.00000000 0.12123270 ## [4,] 0.00000000 0.31170741 ## [5,] 1.11614066 0.46397392 ## [6,] 1.88141600 1.10392128 ## [7,] -0.55872622 0.00000000 ## [8,] 0.00000000 5.62341325 ## [9,] 0.00000000 0.27711792 ## [10,] 0.00000000 5.28440007 ## [11,] -1.06596713 0.00000000 ## [12,] -0.57076709 1.09019756 ## [13,] -0.03365904 5.62341325 ## [14,] 0.00000000 5.62341325 ## [15,] 0.00000000 5.62341325 ## [16,] 0.00000000 5.62341325 ## [17,] 0.00000000 4.70398738 ## [18,] 0.00000000 5.62341325 ## [19,] 0.00000000 4.97981371 ## [20,] -0.77497987 0.00000000 ## [ reached getOption(\"max.print\") -- omitted 15 rows ] ## ## $na.action ## NULL ## ## $xlevels ## named list() ## ## $fitted ## 1 2 3 4 5 6 7 8 9 10 11 ## setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa ## 12 13 14 15 16 17 18 19 20 21 22 ## setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa ## 23 24 25 26 27 28 29 30 31 32 33 ## setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa ## 34 35 36 37 38 39 40 ## setosa setosa setosa setosa setosa setosa setosa ## [ reached getOption(\"max.print\") -- omitted 110 entries ] ## Levels: setosa versicolor virginica ## ## $decision.values ## setosa/versicolor setosa/virginica versicolor/virginica ## 1 1.1911739 1.0908424 1.1275805 ## 2 1.1336557 1.0619543 1.3260964 ## 3 1.2085065 1.0698101 1.0511345 ## 4 1.1646153 
1.0505915 1.0806874 ## 5 1.1880814 1.0950348 0.9542815 ## 6 1.0990761 1.0984626 0.9326361 ## 7 1.1573474 1.0343287 0.9726843 ## 8 1.1851598 1.0815750 1.2206802 ## 9 1.1673499 1.0406734 0.8837945 ## 10 1.1629911 1.0560925 1.2430067 ## 11 1.1339282 1.0803946 1.0338357 ## 12 1.1724182 1.0641469 1.1190423 ## 13 1.1827355 1.0667956 1.1414844 ## [ reached getOption(\"max.print\") -- omitted 137 rows ] ## ## $terms ## Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width ## attr(,\"variables\") ## list(Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ## attr(,\"factors\") ## Sepal.Length Sepal.Width Petal.Length Petal.Width ## Species 0 0 0 0 ## Sepal.Length 1 0 0 0 ## Sepal.Width 0 1 0 0 ## Petal.Length 0 0 1 0 ## Petal.Width 0 0 0 1 ## attr(,\"term.labels\") ## [1] \"Sepal.Length\" \"Sepal.Width\" \"Petal.Length\" \"Petal.Width\" ## attr(,\"order\") ## [1] 1 1 1 1 ## attr(,\"intercept\") ## [1] 0 ## attr(,\"response\") ## [1] 1 ## attr(,\".Environment\") ## ## attr(,\"predvars\") ## list(Species, Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) ## attr(,\"dataClasses\") ## Species Sepal.Length Sepal.Width Petal.Length Petal.Width ## \"factor\" \"numeric\" \"numeric\" \"numeric\" \"numeric\" ## ## attr(,\"class\") ## [1] \"summary.svm\""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"sql-queries","dir":"Articles","previous_headings":"Data Processing","what":"SQL Queries","title":"SparkR - Practical Guide","text":"SparkDataFrame can also registered temporary view Spark SQL one can run SQL queries data. sql function enables applications run SQL queries programmatically returns result SparkDataFrame. Register SparkDataFrame temporary view. SQL statements can run using sql method.","code":"people <- read.df(paste0(sparkR.conf(\"spark.home\"), \"/examples/src/main/resources/people.json\"), \"json\") createOrReplaceTempView(people, \"people\") teenagers <- sql(\"SELECT name FROM people WHERE age >= 13 AND age <= 19\") head(teenagers) ## name ## 1 Justin"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"machine-learning","dir":"Articles","previous_headings":"","what":"Machine Learning","title":"SparkR - Practical Guide","text":"SparkR supports following machine learning models algorithms.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"classification","dir":"Articles","previous_headings":"Machine Learning","what":"Classification","title":"SparkR - Practical Guide","text":"Linear Support Vector Machine (SVM) Classifier Logistic Regression Multilayer Perceptron (MLP) Naive Bayes Factorization Machines (FM) Classifier","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"regression","dir":"Articles","previous_headings":"Machine Learning","what":"Regression","title":"SparkR - Practical Guide","text":"Accelerated Failure Time (AFT) Survival Model Generalized Linear Model (GLM) Isotonic Regression Linear Regression Factorization Machines (FM) Regressor","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"tree---classification-and-regression","dir":"Articles","previous_headings":"Machine Learning","what":"Tree - Classification and Regression","title":"SparkR - Practical Guide","text":"Decision Tree Gradient-Boosted Trees (GBT) Random 
Forest","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"clustering","dir":"Articles","previous_headings":"Machine Learning","what":"Clustering","title":"SparkR - Practical Guide","text":"Bisecting \\(k\\)-means Gaussian Mixture Model (GMM) \\(k\\)-means Clustering Latent Dirichlet Allocation (LDA) Power Iteration Clustering (PIC)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"collaborative-filtering","dir":"Articles","previous_headings":"Machine Learning","what":"Collaborative Filtering","title":"SparkR - Practical Guide","text":"Alternating Least Squares (ALS)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"frequent-pattern-mining","dir":"Articles","previous_headings":"Machine Learning","what":"Frequent Pattern Mining","title":"SparkR - Practical Guide","text":"FP-growth PrefixSpan","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"statistics","dir":"Articles","previous_headings":"Machine Learning","what":"Statistics","title":"SparkR - Practical Guide","text":"Kolmogorov-Smirnov Test","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"r-formula","dir":"Articles","previous_headings":"Machine Learning","what":"R Formula","title":"SparkR - Practical Guide","text":", SparkR supports R formula operators, including ~, ., :, + - model fitting. makes similar experience using R functions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"training-and-test-sets","dir":"Articles","previous_headings":"Machine Learning","what":"Training and Test Sets","title":"SparkR - Practical Guide","text":"can easily split SparkDataFrame random training test sets randomSplit function. returns list split SparkDataFrames provided weights. use carsDF example want \\(70%\\) training data \\(30%\\) test data.","code":"splitDF_list <- randomSplit(carsDF, c(0.7, 0.3), seed = 0) carsDF_train <- splitDF_list[[1]] carsDF_test <- splitDF_list[[2]] count(carsDF_train) ## [1] 24 head(carsDF_train) ## model mpg cyl disp hp drat wt qsec vs am gear carb ## 1 Cadillac Fleetwood 10.4 8 472 205 2.93 5.250 17.98 0 0 3 4 ## 2 Camaro Z28 13.3 8 350 245 3.73 3.840 15.41 0 0 3 4 ## 3 Chrysler Imperial 14.7 8 440 230 3.23 5.345 17.42 0 0 3 4 ## 4 Dodge Challenger 15.5 8 318 150 2.76 3.520 16.87 0 0 3 2 ## 5 Duster 360 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4 ## 6 Ferrari Dino 19.7 6 145 175 3.62 2.770 15.50 0 1 5 6 count(carsDF_test) ## [1] 8 head(carsDF_test) ## model mpg cyl disp hp drat wt qsec vs am gear carb ## 1 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 ## 2 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 ## 3 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 ## 4 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 ## 5 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 ## 6 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1"},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"linear-support-vector-machine-svm-classifier","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Linear Support Vector Machine (SVM) Classifier","title":"SparkR - Practical Guide","text":"Linear Support Vector Machine (SVM) classifier SVM classifier linear kernels. binary classifier. use simple example show use spark.svmLinear binary classification. 
Predict values training data","code":"# load training data and create a DataFrame t <- as.data.frame(Titanic) training <- createDataFrame(t) # fit a Linear SVM classifier model model <- spark.svmLinear(training, Survived ~ ., regParam = 0.01, maxIter = 10) summary(model) ## $coefficients ## Estimate ## (Intercept) 0.993131388 ## Class_1st -0.386500359 ## Class_2nd -0.622627816 ## Class_3rd -0.204446602 ## Sex_Female -0.589950309 ## Age_Adult 0.741676902 ## Freq -0.006582887 ## ## $numClasses ## [1] 2 ## ## $numFeatures ## [1] 6 prediction <- predict(model, training) head(select(prediction, \"Class\", \"Sex\", \"Age\", \"Freq\", \"Survived\", \"prediction\")) ## Class Sex Age Freq Survived prediction ## 1 1st Male Child 0 No Yes ## 2 2nd Male Child 0 No Yes ## 3 3rd Male Child 35 No Yes ## 4 Crew Male Child 0 No Yes ## 5 1st Female Child 0 No Yes ## 6 2nd Female Child 0 No No"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"logistic-regression","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Logistic Regression","title":"SparkR - Practical Guide","text":"Logistic regression widely-used model response categorical. can seen special case Generalized Linear Predictive Model. provide spark.logit top spark.glm support logistic regression advanced hyper-parameters. supports binary multiclass classification elastic-net regularization feature standardization, similar glmnet. use simple example demonstrate spark.logit usage. general, three steps using spark.logit: 1). Create dataframe proper data source; 2). Fit logistic regression model using spark.logit proper parameter setting; 3). Obtain coefficient matrix fitted model using summary use model prediction predict. Binomial logistic regression Predict values training data Multinomial logistic regression three classes","code":"t <- as.data.frame(Titanic) training <- createDataFrame(t) model <- spark.logit(training, Survived ~ ., regParam = 0.04741301) summary(model) ## $coefficients ## Estimate ## (Intercept) 0.2255014282 ## Class_1st -0.1338856652 ## Class_2nd -0.1479826947 ## Class_3rd 0.0005674937 ## Sex_Female -0.2011183871 ## Age_Adult 0.3263186885 ## Freq -0.0033111157 fitted <- predict(model, training) head(select(fitted, \"Class\", \"Sex\", \"Age\", \"Freq\", \"Survived\", \"prediction\")) ## Class Sex Age Freq Survived prediction ## 1 1st Male Child 0 No Yes ## 2 2nd Male Child 0 No Yes ## 3 3rd Male Child 35 No Yes ## 4 Crew Male Child 0 No Yes ## 5 1st Female Child 0 No No ## 6 2nd Female Child 0 No No t <- as.data.frame(Titanic) training <- createDataFrame(t) # Note in this case, Spark infers it is multinomial logistic regression, so family = \"multinomial\" is optional. model <- spark.logit(training, Class ~ ., regParam = 0.07815179) summary(model) ## $coefficients ## 1st 2nd 3rd Crew ## (Intercept) 0.051662845 0.062998145 -0.039083689 -0.075577300 ## Sex_Female -0.088030587 -0.102528148 0.059233106 0.131325629 ## Age_Adult 0.141935316 0.169492058 -0.102562719 -0.208864654 ## Survived_No 0.052721020 0.057980057 -0.029408423 -0.081292653 ## Freq -0.001555912 -0.001970377 0.001303836 0.002222453"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"multilayer-perceptron","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Multilayer Perceptron","title":"SparkR - Practical Guide","text":"Multilayer perceptron classifier (MLPC) classifier based feedforward artificial neural network. 
MLPC consists multiple layers nodes. layer fully connected next layer network. Nodes input layer represent input data. nodes map inputs outputs linear combination inputs node’s weights \\(w\\) bias \\(b\\) applying activation function. can written matrix form MLPC \\(K+1\\) layers follows: \\[ y(x)=f_K(\\ldots f_2(w_2^T f_1(w_1^T x + b_1) + b_2) \\ldots + b_K). \\] Nodes intermediate layers use sigmoid (logistic) function: \\[ f(z_i) = \\frac{1}{1+e^{-z_i}}. \\] Nodes output layer use softmax function: \\[ f(z_i) = \\frac{e^{z_i}}{\\sum_{k=1}^N e^{z_k}}. \\] number nodes \\(N\\) output layer corresponds number classes. MLPC employs backpropagation learning model. use logistic loss function optimization L-BFGS optimization routine. spark.mlp requires least two columns data: one named \"label\" one \"features\". \"features\" column libSVM-format. use Titanic data set show use spark.mlp classification. avoid lengthy display, present partial results model summary. can check full result sparkR shell.","code":"t <- as.data.frame(Titanic) training <- createDataFrame(t) # fit a Multilayer Perceptron Classification Model model <- spark.mlp(training, Survived ~ Age + Sex, blockSize = 128, layers = c(2, 2), solver = \"l-bfgs\", maxIter = 100, tol = 0.5, stepSize = 1, seed = 1, initialWeights = c( 0, 0, 5, 5, 9, 9)) # check the summary of the fitted model summary(model) ## $numOfInputs ## [1] 2 ## ## $numOfOutputs ## [1] 2 ## ## $layers ## [1] 2 2 ## ## $weights ## $weights[[1]] ## [1] 0 ## ## $weights[[2]] ## [1] 0 ## ## $weights[[3]] ## [1] 5 ## ## $weights[[4]] ## [1] 5 ## ## $weights[[5]] ## [1] 9 ## ## $weights[[6]] ## [1] 9 # make predictions use the fitted model predictions <- predict(model, training) head(select(predictions, predictions$prediction)) ## prediction ## 1 No ## 2 No ## 3 No ## 4 No ## 5 No ## 6 No"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"naive-bayes","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Naive Bayes","title":"SparkR - Practical Guide","text":"Naive Bayes model assumes independence among features. spark.naiveBayes fits Bernoulli naive Bayes model SparkDataFrame. data categorical. models often used document classification.","code":"titanic <- as.data.frame(Titanic) titanicDF <- createDataFrame(titanic[titanic$Freq > 0, -5]) naiveBayesModel <- spark.naiveBayes(titanicDF, Survived ~ Class + Sex + Age) summary(naiveBayesModel) ## $apriori ## Yes No ## [1,] 0.5769231 0.4230769 ## ## $tables ## Class_3rd Class_1st Class_2nd Sex_Female Age_Adult ## Yes 0.3125 0.3125 0.3125 0.5 0.5625 ## No 0.4166667 0.25 0.25 0.5 0.75 naiveBayesPrediction <- predict(naiveBayesModel, titanicDF) head(select(naiveBayesPrediction, \"Class\", \"Sex\", \"Age\", \"Survived\", \"prediction\")) ## Class Sex Age Survived prediction ## 1 3rd Male Child No Yes ## 2 3rd Female Child No Yes ## 3 1st Male Adult No Yes ## 4 2nd Male Adult No Yes ## 5 3rd Male Adult No No ## 6 Crew Male Adult No Yes"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"factorization-machines-classifier","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Factorization Machines Classifier","title":"SparkR - Practical Guide","text":"Factorization Machines classification problems. 
background details implementation factorization machines, refer Factorization Machines section.","code":"t <- as.data.frame(Titanic) training <- createDataFrame(t) model <- spark.fmClassifier(training, Survived ~ Age + Sex) summary(model) ## $coefficients ## Estimate ## (Intercept) 0.0064275991 ## Age_Adult 0.0001294448 ## Sex_Female 0.0001294448 ## ## $factors ## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] -0.3256224 0.11912568 0.1460235 0.1620567 0.13153516 0.06403695 ## [2,] -0.1382155 -0.03658261 0.1717808 -0.1602241 -0.08446129 -0.19287098 ## [,7] [,8] ## [1,] -0.03292446 -0.05166818 ## [2,] 0.19252571 0.06237194 ## ## $numClasses ## [1] 2 ## ## $numFeatures ## [1] 2 ## ## $factorSize ## [1] 8 predictions <- predict(model, training) head(select(predictions, predictions$prediction)) ## prediction ## 1 Yes ## 2 Yes ## 3 Yes ## 4 Yes ## 5 Yes ## 6 Yes"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"accelerated-failure-time-survival-model","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Accelerated Failure Time Survival Model","title":"SparkR - Practical Guide","text":"Survival analysis studies expected duration time event happens, often relationship risk factors treatment taken subject. contrast standard regression analysis, survival modeling deal special characteristics data including non-negative survival time censoring. Accelerated Failure Time (AFT) model parametric survival model censored data assumes effect covariate accelerate decelerate life course event constant. information, refer Wikipedia page AFT Model references . Different Proportional Hazards Model designed purpose, AFT model easier parallelize instance contributes objective function independently.","code":"library(survival) ovarianDF <- createDataFrame(ovarian) aftModel <- spark.survreg(ovarianDF, Surv(futime, fustat) ~ ecog_ps + rx) summary(aftModel) ## $coefficients ## Value ## (Intercept) 6.8966910 ## ecog_ps -0.3850414 ## rx 0.5286455 ## Log(scale) -0.1234429 aftPredictions <- predict(aftModel, ovarianDF) head(aftPredictions) ## futime fustat age resid_ds rx ecog_ps label prediction ## 1 59 1 72.3315 2 1 1 59 1141.724 ## 2 115 1 74.4932 2 1 1 115 1141.724 ## 3 156 1 66.4658 2 1 2 156 776.855 ## 4 421 0 53.3644 2 2 1 421 1937.087 ## 5 431 1 50.3397 2 1 1 431 1141.724 ## 6 448 0 56.4301 1 1 2 448 776.855"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"generalized-linear-model","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Generalized Linear Model","title":"SparkR - Practical Guide","text":"main function spark.glm. following families link functions supported. default gaussian. three ways specify family argument. Family name character string, e.g. family = \"gaussian\". Family function, e.g. family = binomial. Result returned family function, e.g. family = poisson(link = log). Set family = \"tweedie\" specify var.power link.power package statmod loaded, tweedie family specified using family definition therein, .e., tweedie(). information regarding families link functions, see Wikipedia page Generalized Linear Model. use mtcars dataset illustration. corresponding SparkDataFrame carsDF. fitting model, print summary see fitted values making predictions original dataset. can also pass new SparkDataFrame schema predict new data. prediction, new column called prediction appended. Let’s look subset columns . 
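As an aside (a minimal sketch, not taken from the original page), the three ways of specifying the family described above are interchangeable; for the gaussian fit shown in the code they might look like:

m1 <- spark.glm(carsDF, mpg ~ wt + hp, family = "gaussian")                    # family name as a string
m2 <- spark.glm(carsDF, mpg ~ wt + hp, family = gaussian)                      # family function
m3 <- spark.glm(carsDF, mpg ~ wt + hp, family = gaussian(link = "identity"))   # result returned by a family function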
following fit using tweedie family: can try distributions tweedie family, example, compound Poisson distribution log link:","code":"gaussianGLM <- spark.glm(carsDF, mpg ~ wt + hp) summary(gaussianGLM) ## ## Deviance Residuals: ## (Note: These are approximate quantiles with relative error <= 0.01) ## Min 1Q Median 3Q Max ## -3.9410 -1.6499 -0.3267 1.0373 5.8538 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 37.227270 1.5987875 23.2847 0.0000e+00 ## wt -3.877831 0.6327335 -6.1287 1.1196e-06 ## hp -0.031773 0.0090297 -3.5187 1.4512e-03 ## ## (Dispersion parameter for gaussian family taken to be 6.725785) ## ## Null deviance: 1126.05 on 31 degrees of freedom ## Residual deviance: 195.05 on 29 degrees of freedom ## AIC: 156.7 ## ## Number of Fisher Scoring iterations: 1 gaussianFitted <- predict(gaussianGLM, carsDF) head(select(gaussianFitted, \"model\", \"prediction\", \"mpg\", \"wt\", \"hp\")) ## model prediction mpg wt hp ## 1 Mazda RX4 23.57233 21.0 2.620 110 ## 2 Mazda RX4 Wag 22.58348 21.0 2.875 110 ## 3 Datsun 710 25.27582 22.8 2.320 93 ## 4 Hornet 4 Drive 21.26502 21.4 3.215 110 ## 5 Hornet Sportabout 18.32727 18.7 3.440 175 ## 6 Valiant 20.47382 18.1 3.460 105 tweedieGLM1 <- spark.glm(carsDF, mpg ~ wt + hp, family = \"tweedie\", var.power = 0.0) summary(tweedieGLM1) ## ## Deviance Residuals: ## (Note: These are approximate quantiles with relative error <= 0.01) ## Min 1Q Median 3Q Max ## -3.9410 -1.6499 -0.3267 1.0373 5.8538 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 37.227270 1.5987875 23.2847 0.0000e+00 ## wt -3.877831 0.6327335 -6.1287 1.1196e-06 ## hp -0.031773 0.0090297 -3.5187 1.4512e-03 ## ## (Dispersion parameter for tweedie family taken to be 6.725785) ## ## Null deviance: 1126.05 on 31 degrees of freedom ## Residual deviance: 195.05 on 29 degrees of freedom ## AIC: 156.7 ## ## Number of Fisher Scoring iterations: 1 tweedieGLM2 <- spark.glm(carsDF, mpg ~ wt + hp, family = \"tweedie\", var.power = 1.2, link.power = 0.0) summary(tweedieGLM2) ## ## Deviance Residuals: ## (Note: These are approximate quantiles with relative error <= 0.01) ## Min 1Q Median 3Q Max ## -0.58074 -0.25335 -0.09892 0.18608 0.82717 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 3.8500849 0.06698272 57.4788 0.0000e+00 ## wt -0.2018426 0.02897283 -6.9666 1.1691e-07 ## hp -0.0016248 0.00041603 -3.9054 5.1697e-04 ## ## (Dispersion parameter for tweedie family taken to be 0.1340111) ## ## Null deviance: 29.8820 on 31 degrees of freedom ## Residual deviance: 3.7739 on 29 degrees of freedom ## AIC: NA ## ## Number of Fisher Scoring iterations: 4"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"isotonic-regression","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Isotonic Regression","title":"SparkR - Practical Guide","text":"spark.isoreg fits Isotonic Regression model SparkDataFrame. solves weighted univariate regression problem complete order constraint. Specifically, given set real observed responses \\(y_1, \\ldots, y_n\\), corresponding real features \\(x_1, \\ldots, x_n\\), optionally positive weights \\(w_1, \\ldots, w_n\\), want find monotone (piecewise linear) function \\(f\\) minimize \\[ \\ell(f) = \\sum_{=1}^n w_i (y_i - f(x_i))^2. \\] arguments may useful. weightCol: character string specifying weight column. isotonic: logical value indicating whether output sequence isotonic/increasing (TRUE) antitonic/decreasing (FALSE). 
featureIndex: index feature right hand side formula vector column (default: 0), effect otherwise. use artificial example show use. prediction stage, based fitted monotone piecewise function, rules : prediction input exactly matches training feature associated prediction returned. case multiple predictions feature one returned. one undefined. prediction input lower higher training features prediction lowest highest feature returned respectively. case multiple predictions feature lowest highest returned respectively. prediction input falls two training features prediction treated piecewise linear function interpolated value calculated predictions two closest features. case multiple values feature rules previous point used. example, input \\(3.2\\), two closest feature values \\(3.0\\) \\(3.5\\), predicted value linear interpolation predicted values \\(3.0\\) \\(3.5\\).","code":"y <- c(3.0, 6.0, 8.0, 5.0, 7.0) x <- c(1.0, 2.0, 3.5, 3.0, 4.0) w <- rep(1.0, 5) data <- data.frame(y = y, x = x, w = w) df <- createDataFrame(data) isoregModel <- spark.isoreg(df, y ~ x, weightCol = \"w\") isoregFitted <- predict(isoregModel, df) head(select(isoregFitted, \"x\", \"y\", \"prediction\")) ## x y prediction ## 1 1.0 3 3.0 ## 2 2.0 6 5.5 ## 3 3.5 8 7.5 ## 4 3.0 5 5.5 ## 5 4.0 7 7.5 newDF <- createDataFrame(data.frame(x = c(1.5, 3.2))) head(predict(isoregModel, newDF)) ## x prediction ## 1 1.5 4.25 ## 2 3.2 6.30"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"linear-regression","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Linear Regression","title":"SparkR - Practical Guide","text":"Linear regression model.","code":"model <- spark.lm(carsDF, mpg ~ wt + hp) summary(model) ## $coefficients ## Estimate ## (Intercept) 37.22727012 ## wt -3.87783074 ## hp -0.03177295 ## ## $numFeatures ## [1] 2 predictions <- predict(model, carsDF) head(select(predictions, predictions$prediction)) ## prediction ## 1 23.57233 ## 2 22.58348 ## 3 25.27582 ## 4 21.26502 ## 5 18.32727 ## 6 20.47382"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"factorization-machines-regressor","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Factorization Machines Regressor","title":"SparkR - Practical Guide","text":"Factorization Machines regression problems. background details implementation factorization machines, refer Factorization Machines section.","code":"model <- spark.fmRegressor(carsDF, mpg ~ wt + hp) summary(model) ## $coefficients ## Estimate ## (Intercept) 0.1518559 ## wt 3.6472555 ## hp 2.8026828 ## ## $factors ## [,1] [,2] [,3] [,4] [,5] [,6] ## [1,] 0.1424420 -0.1178110 -0.3970272 -0.4696695 0.400288 0.3690930 ## [2,] -0.1626185 0.1512138 0.3690435 0.4076975 -0.625752 -0.3715109 ## [,7] [,8] ## [1,] 0.03472468 -0.1703219 ## [2,] -0.02109148 -0.2006249 ## ## $numFeatures ## [1] 2 ## ## $factorSize ## [1] 8 predictions <- predict(model, carsDF) head(select(predictions, predictions$prediction)) ## prediction ## 1 106.70996 ## 2 87.07526 ## 3 111.07931 ## 4 60.89565 ## 5 61.81374 ## 6 40.70095"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"decision-tree","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Decision Tree","title":"SparkR - Practical Guide","text":"spark.decisionTree fits decision tree classification regression model SparkDataFrame. 
Users can call summary get summary fitted model, predict make predictions, write.ml/read.ml save/load fitted models. use Titanic dataset train decision tree make predictions:","code":"t <- as.data.frame(Titanic) df <- createDataFrame(t) dtModel <- spark.decisionTree(df, Survived ~ ., type = \"classification\", maxDepth = 2) summary(dtModel) ## Formula: Survived ~ . ## Number of features: 6 ## Features: Class_1st Class_2nd Class_3rd Sex_Female Age_Adult Freq ## Feature importances: (6,[5],[1.0]) ## Max Depth: 2 ## DecisionTreeClassificationModel: uid=dtc_9a622896f329, depth=2, numNodes=5, numClasses=2, numFeatures=6 ## If (feature 5 <= 4.5) ## Predict: 0.0 ## Else (feature 5 > 4.5) ## If (feature 5 <= 84.5) ## Predict: 1.0 ## Else (feature 5 > 84.5) ## Predict: 0.0 ## predictions <- predict(dtModel, df) head(select(predictions, \"Class\", \"Sex\", \"Age\", \"Freq\", \"Survived\", \"prediction\")) ## Class Sex Age Freq Survived prediction ## 1 1st Male Child 0 No No ## 2 2nd Male Child 0 No No ## 3 3rd Male Child 35 No Yes ## 4 Crew Male Child 0 No No ## 5 1st Female Child 0 No No ## 6 2nd Female Child 0 No No"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"gradient-boosted-trees","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Gradient-Boosted Trees","title":"SparkR - Practical Guide","text":"spark.gbt fits gradient-boosted tree classification regression model SparkDataFrame. Users can call summary get summary fitted model, predict make predictions, write.ml/read.ml save/load fitted models. use Titanic dataset train gradient-boosted tree make predictions:","code":"t <- as.data.frame(Titanic) df <- createDataFrame(t) gbtModel <- spark.gbt(df, Survived ~ ., type = \"classification\", maxDepth = 2, maxIter = 2) summary(gbtModel) ## Formula: Survived ~ . ## Number of features: 6 ## Features: Class_1st Class_2nd Class_3rd Sex_Female Age_Adult Freq ## Feature importances: (6,[1,2,5],[0.03336902858878361,0.16099525743106016,0.8056357139801562]) ## Max Depth: 2 ## Number of trees: 2 ## Tree weights: 1 0.1 ## GBTClassificationModel: uid = gbtc_1164a08461b9, numTrees=2, numClasses=2, numFeatures=6 ## Tree 0 (weight 1.0): ## If (feature 5 <= 4.5) ## If (feature 1 in {1.0}) ## Predict: -1.0 ## Else (feature 1 not in {1.0}) ## Predict: -0.3333333333333333 ## Else (feature 5 > 4.5) ## If (feature 5 <= 84.5) ## Predict: 0.5714285714285714 ## Else (feature 5 > 84.5) ## Predict: -0.42857142857142855 ## Tree 1 (weight 0.1): ## If (feature 2 in {1.0}) ## If (feature 5 <= 15.5) ## Predict: 0.9671846896296403 ## Else (feature 5 > 15.5) ## Predict: -1.0857923804083338 ## Else (feature 2 not in {1.0}) ## If (feature 5 <= 13.5) ## Predict: -0.08651035613926407 ## Else (feature 5 > 13.5) ## Predict: 0.6566673506774614 ## predictions <- predict(gbtModel, df) head(select(predictions, \"Class\", \"Sex\", \"Age\", \"Freq\", \"Survived\", \"prediction\")) ## Class Sex Age Freq Survived prediction ## 1 1st Male Child 0 No No ## 2 2nd Male Child 0 No No ## 3 3rd Male Child 35 No Yes ## 4 Crew Male Child 0 No No ## 5 1st Female Child 0 No No ## 6 2nd Female Child 0 No No"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"random-forest","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Random Forest","title":"SparkR - Practical Guide","text":"spark.randomForest fits random forest classification regression model SparkDataFrame. 
Users can call summary get summary fitted model, predict make predictions, write.ml/read.ml save/load fitted models. following example, use Titanic dataset train random forest make predictions:","code":"t <- as.data.frame(Titanic) df <- createDataFrame(t) rfModel <- spark.randomForest(df, Survived ~ ., type = \"classification\", maxDepth = 2, numTrees = 2) summary(rfModel) ## Formula: Survived ~ . ## Number of features: 6 ## Features: Class_1st Class_2nd Class_3rd Sex_Female Age_Adult Freq ## Feature importances: (6,[3,4,5],[0.17058779274099098,0.09676977311565654,0.7326424341433525]) ## Max Depth: 2 ## Number of trees: 2 ## Tree weights: 1 1 ## RandomForestClassificationModel: uid=rfc_0078c6dda0b5, numTrees=2, numClasses=2, numFeatures=6 ## Tree 0 (weight 1.0): ## If (feature 4 in {0.0}) ## If (feature 3 in {0.0}) ## Predict: 0.0 ## Else (feature 3 not in {0.0}) ## Predict: 1.0 ## Else (feature 4 not in {0.0}) ## If (feature 5 <= 13.5) ## Predict: 0.0 ## Else (feature 5 > 13.5) ## Predict: 1.0 ## Tree 1 (weight 1.0): ## If (feature 5 <= 84.5) ## If (feature 5 <= 4.5) ## Predict: 0.0 ## Else (feature 5 > 4.5) ## Predict: 1.0 ## Else (feature 5 > 84.5) ## Predict: 0.0 ## predictions <- predict(rfModel, df) head(select(predictions, \"Class\", \"Sex\", \"Age\", \"Freq\", \"Survived\", \"prediction\")) ## Class Sex Age Freq Survived prediction ## 1 1st Male Child 0 No No ## 2 2nd Male Child 0 No No ## 3 3rd Male Child 35 No Yes ## 4 Crew Male Child 0 No No ## 5 1st Female Child 0 No No ## 6 2nd Female Child 0 No No"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"bisecting-k-means","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Bisecting k-Means","title":"SparkR - Practical Guide","text":"spark.bisectingKmeans kind hierarchical clustering using divisive (“top-”) approach: observations start one cluster, splits performed recursively one moves hierarchy.","code":"t <- as.data.frame(Titanic) training <- createDataFrame(t) model <- spark.bisectingKmeans(training, Class ~ Survived, k = 4) summary(model) ## $k ## [1] 4 ## ## $coefficients ## Survived_No ## 1 0 ## 2 1 ## 3 0 ## 4 1 ## ## $size ## $size[[1]] ## [1] 16 ## ## $size[[2]] ## [1] 16 ## ## $size[[3]] ## [1] 0 ## ## $size[[4]] ## [1] 0 ## ## ## $cluster ## SparkDataFrame[prediction:int] ## ## $is.loaded ## [1] FALSE fitted <- predict(model, training) head(select(fitted, \"Class\", \"prediction\")) ## Class prediction ## 1 1st 1 ## 2 2nd 1 ## 3 3rd 1 ## 4 Crew 1 ## 5 1st 1 ## 6 2nd 1"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"gaussian-mixture-model","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Gaussian Mixture Model","title":"SparkR - Practical Guide","text":"spark.gaussianMixture fits multivariate Gaussian Mixture Model (GMM) SparkDataFrame. Expectation-Maximization (EM) used approximate maximum likelihood estimator (MLE) model. 
use simulated example demonstrate usage.","code":"X1 <- data.frame(V1 = rnorm(4), V2 = rnorm(4)) X2 <- data.frame(V1 = rnorm(6, 3), V2 = rnorm(6, 4)) data <- rbind(X1, X2) df <- createDataFrame(data) gmmModel <- spark.gaussianMixture(df, ~ V1 + V2, k = 2) summary(gmmModel) ## $lambda ## [1] 0.4 0.6 ## ## $mu ## $mu[[1]] ## [1] 0.1116975 -0.0364584 ## ## $mu[[2]] ## [1] 2.850285 4.511970 ## ## ## $sigma ## $sigma[[1]] ## [,1] [,2] ## [1,] 0.3526357 0.1528754 ## [2,] 0.1528754 0.07738071 ## ## $sigma[[2]] ## [,1] [,2] ## [1,] 1.639517 0.717046 ## [2,] 0.717046 0.6648726 ## ## ## $loglik ## [1] -22.36829 ## ## $posterior ## SparkDataFrame[posterior:array] ## ## $is.loaded ## [1] FALSE gmmFitted <- predict(gmmModel, df) head(select(gmmFitted, \"V1\", \"V2\", \"prediction\")) ## V1 V2 prediction ## 1 -0.3229633 -0.1773441 0 ## 2 1.0644834 0.4448537 0 ## 3 -0.4470192 -0.2124869 0 ## 4 0.1522892 -0.2008563 0 ## 5 2.6114404 3.8304970 1 ## 6 5.2803801 5.6987158 1"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"k-means-clustering","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"k-Means Clustering","title":"SparkR - Practical Guide","text":"spark.kmeans fits \\(k\\)-means clustering model SparkDataFrame. unsupervised learning method, don’t need response variable. Hence, left hand side R formula left blank. clustering based variables right hand side.","code":"kmeansModel <- spark.kmeans(carsDF, ~ mpg + hp + wt, k = 3) summary(kmeansModel) ## $k ## [1] 3 ## ## $coefficients ## mpg hp wt ## 1 24.22353 93.52941 2.599588 ## 2 15.80000 178.50000 3.926400 ## 3 14.62000 263.80000 3.899000 ## ## $size ## $size[[1]] ## [1] 17 ## ## $size[[2]] ## [1] 10 ## ## $size[[3]] ## [1] 5 ## ## ## $cluster ## SparkDataFrame[prediction:int] ## ## $is.loaded ## [1] FALSE ## ## $clusterSize ## [1] 3 kmeansPredictions <- predict(kmeansModel, carsDF) head(select(kmeansPredictions, \"model\", \"mpg\", \"hp\", \"wt\", \"prediction\"), n = 20L) ## model mpg hp wt prediction ## 1 Mazda RX4 21.0 110 2.620 0 ## 2 Mazda RX4 Wag 21.0 110 2.875 0 ## 3 Datsun 710 22.8 93 2.320 0 ## 4 Hornet 4 Drive 21.4 110 3.215 0 ## 5 Hornet Sportabout 18.7 175 3.440 1 ## 6 Valiant 18.1 105 3.460 0 ## 7 Duster 360 14.3 245 3.570 2 ## 8 Merc 240D 24.4 62 3.190 0 ## 9 Merc 230 22.8 95 3.150 0 ## 10 Merc 280 19.2 123 3.440 0 ## 11 Merc 280C 17.8 123 3.440 0 ## 12 Merc 450SE 16.4 180 4.070 1 ## 13 Merc 450SL 17.3 180 3.730 1 ## 14 Merc 450SLC 15.2 180 3.780 1 ## 15 Cadillac Fleetwood 10.4 205 5.250 1 ## 16 Lincoln Continental 10.4 215 5.424 1 ## 17 Chrysler Imperial 14.7 230 5.345 2 ## 18 Fiat 128 32.4 66 2.200 0 ## 19 Honda Civic 30.4 52 1.615 0 ## 20 Toyota Corolla 33.9 65 1.835 0"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"latent-dirichlet-allocation","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Latent Dirichlet Allocation","title":"SparkR - Practical Guide","text":"spark.lda fits Latent Dirichlet Allocation model SparkDataFrame. often used topic modeling topics inferred collection text documents. LDA can thought clustering algorithm follows: Topics correspond cluster centers, documents correspond examples (rows) dataset. Topics documents exist feature space, feature vectors vectors word counts (bag words). Rather clustering using traditional distance, LDA uses function based statistical model text documents generated. 
use LDA, need specify features column data entry represents document. two options column: character string: can string whole document. parsed automatically. Additional stop words can added customizedStopWords. libSVM: entry collection words processed directly. Two functions provided fitted model. spark.posterior returns SparkDataFrame containing column posterior probabilities vectors named “topicDistribution”. spark.perplexity returns log perplexity given SparkDataFrame, log perplexity training data missing argument data. information, see help document ?spark.lda. Let’s look artificial example.","code":"corpus <- data.frame(features = c( \"1 2 6 0 2 3 1 1 0 0 3\", \"1 3 0 1 3 0 0 2 0 0 1\", \"1 4 1 0 0 4 9 0 1 2 0\", \"2 1 0 3 0 0 5 0 2 3 9\", \"3 1 1 9 3 0 2 0 0 1 3\", \"4 2 0 3 4 5 1 1 1 4 0\", \"2 1 0 3 0 0 5 0 2 2 9\", \"1 1 1 9 2 1 2 0 0 1 3\", \"4 4 0 3 4 2 1 3 0 0 0\", \"2 8 2 0 3 0 2 0 2 7 2\", \"1 1 1 9 0 2 2 0 0 3 3\", \"4 1 0 0 4 5 1 3 0 1 0\")) corpusDF <- createDataFrame(corpus) model <- spark.lda(data = corpusDF, k = 5, optimizer = \"em\") summary(model) ## $docConcentration ## [1] 11 11 11 11 11 ## ## $topicConcentration ## [1] 1.1 ## ## $logLikelihood ## [1] -353.2948 ## ## $logPerplexity ## [1] 2.676476 ## ## $isDistributed ## [1] TRUE ## ## $vocabSize ## [1] 10 ## ## $topics ## SparkDataFrame[topic:int, term:array, termWeights:array] ## ## $vocabulary ## [1] \"0\" \"1\" \"2\" \"3\" \"4\" \"9\" \"5\" \"8\" \"7\" \"6\" ## ## $trainingLogLikelihood ## [1] -239.5629 ## ## $logPrior ## [1] -980.2974 posterior <- spark.posterior(model, corpusDF) head(posterior) ## features topicDistribution ## 1 1 2 6 0 2 3 1 1 0 0 3 0.1972169, 0.1986611, 0.2022021, 0.2006635, 0.2012564 ## 2 1 3 0 1 3 0 0 2 0 0 1 0.1989972, 0.1988703, 0.2015973, 0.2006437, 0.1998915 ## 3 1 4 1 0 0 4 9 0 1 2 0 0.2020603, 0.2026033, 0.1968850, 0.1987348, 0.1997165 ## 4 2 1 0 3 0 0 5 0 2 3 9 0.2004063, 0.1981903, 0.2013020, 0.2006371, 0.1994644 ## 5 3 1 1 9 3 0 2 0 0 1 3 0.1971473, 0.1983922, 0.2023582, 0.2011645, 0.2009377 ## 6 4 2 0 3 4 5 1 1 1 4 0 0.2020235, 0.2041760, 0.1955404, 0.1997292, 0.1985309 perplexity <- spark.perplexity(model, corpusDF) perplexity ## [1] 2.676476"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"alternating-least-squares","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Alternating Least Squares","title":"SparkR - Practical Guide","text":"spark.als learns latent factors collaborative filtering via alternating least squares. multiple options can configured spark.als, including rank, reg, nonnegative. complete list, refer help file. Extract latent factors. Make predictions.","code":"ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), list(2, 1, 1.0), list(2, 2, 5.0)) df <- createDataFrame(ratings, c(\"user\", \"item\", \"rating\")) model <- spark.als(df, \"rating\", \"user\", \"item\", rank = 10, reg = 0.1, nonnegative = TRUE) stats <- summary(model) userFactors <- stats$userFactors itemFactors <- stats$itemFactors head(userFactors) head(itemFactors) predicted <- predict(model, df) head(predicted)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"power-iteration-clustering","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Power Iteration Clustering","title":"SparkR - Practical Guide","text":"Power Iteration Clustering (PIC) scalable graph clustering algorithm. 
spark.assignClusters method runs PIC algorithm returns cluster assignment input vertex.","code":"df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), list(1L, 2L, 1.0), list(3L, 4L, 1.0), list(4L, 0L, 0.1)), schema = c(\"src\", \"dst\", \"weight\")) head(spark.assignClusters(df, initMode = \"degree\", weightCol = \"weight\")) ## id cluster ## 1 4 1 ## 2 0 0 ## 3 1 0 ## 4 3 1 ## 5 2 0"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"fp-growth","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"FP-growth","title":"SparkR - Practical Guide","text":"spark.fpGrowth executes FP-growth algorithm mine frequent itemsets SparkDataFrame. itemsCol array values. spark.freqItemsets method can used retrieve SparkDataFrame frequent itemsets. spark.associationRules returns SparkDataFrame association rules. can make predictions based antecedent.","code":"df <- selectExpr(createDataFrame(data.frame(rawItems = c( \"T,R,U\", \"T,S\", \"V,R\", \"R,U,T,V\", \"R,S\", \"V,S,U\", \"U,R\", \"S,T\", \"V,R\", \"V,U,S\", \"T,V,U\", \"R,V\", \"T,S\", \"T,S\", \"S,T\", \"S,U\", \"T,R\", \"V,R\", \"S,V\", \"T,S,U\" ))), \"split(rawItems, ',') AS items\") fpm <- spark.fpGrowth(df, minSupport = 0.2, minConfidence = 0.5) head(spark.freqItemsets(fpm)) ## items freq ## 1 R 9 ## 2 U 8 ## 3 U, T 4 ## 4 U, V 4 ## 5 U, S 4 ## 6 T 10 head(spark.associationRules(fpm)) ## antecedent consequent confidence lift support ## 1 V R 0.5555556 1.234568 0.25 ## 2 S T 0.5454545 1.090909 0.30 ## 3 T S 0.6000000 1.090909 0.30 ## 4 R V 0.5555556 1.234568 0.25 ## 5 U T 0.5000000 1.000000 0.20 ## 6 U V 0.5000000 1.111111 0.20 head(predict(fpm, df)) ## items prediction ## 1 T, R, U S, V ## 2 T, S NULL ## 3 V, R NULL ## 4 R, U, T, V S ## 5 R, S T, V ## 6 V, S, U R, T"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"prefixspan","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"PrefixSpan","title":"SparkR - Practical Guide","text":"spark.findFrequentSequentialPatterns method can used find complete set frequent sequential patterns input sequences itemsets.","code":"df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))), list(list(list(1L), list(3L, 2L), list(1L, 2L))), list(list(list(1L, 2L), list(5L))), list(list(list(6L)))), schema = c(\"sequence\")) head(spark.findFrequentSequentialPatterns(df, minSupport = 0.5, maxPatternLength = 5L)) ## sequence freq ## 1 1 3 ## 2 3 2 ## 3 2 3 ## 4 1, 2 3 ## 5 1, 3 2"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"kolmogorov-smirnov-test","dir":"Articles","previous_headings":"Machine Learning > Models and Algorithms","what":"Kolmogorov-Smirnov Test","title":"SparkR - Practical Guide","text":"spark.kstest runs two-sided, one-sample Kolmogorov-Smirnov (KS) test. Given SparkDataFrame, test compares continuous data given column testCol theoretical distribution specified parameter nullHypothesis. Users can call summary get summary test results. following example, test whether Titanic dataset’s Freq column follows normal distribution. 
set parameters normal distribution using mean standard deviation sample.","code":"t <- as.data.frame(Titanic) df <- createDataFrame(t) freqStats <- head(select(df, mean(df$Freq), sd(df$Freq))) freqMean <- freqStats[1] freqStd <- freqStats[2] test <- spark.kstest(df, \"Freq\", \"norm\", c(freqMean, freqStd)) testSummary <- summary(test) testSummary ## Kolmogorov-Smirnov test summary: ## degrees of freedom = 0 ## statistic = 0.3065126710255011 ## pValue = 0.0036336792155329256 ## Very strong presumption against null hypothesis: Sample follows theoretical distribution."},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"model-persistence","dir":"Articles","previous_headings":"Machine Learning","what":"Model Persistence","title":"SparkR - Practical Guide","text":"following example shows save/load ML model SparkR.","code":"t <- as.data.frame(Titanic) training <- createDataFrame(t) gaussianGLM <- spark.glm(training, Freq ~ Sex + Age, family = \"gaussian\") # Save and then load a fitted MLlib model modelPath <- tempfile(pattern = \"ml\", fileext = \".tmp\") write.ml(gaussianGLM, modelPath) gaussianGLM2 <- read.ml(modelPath) # Check model summary summary(gaussianGLM2) ## ## Saved-loaded model does not support output 'Deviance Residuals'. ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 46.219 35.994 1.2841 0.2092846 ## Sex_Female -78.812 41.562 -1.8962 0.0679311 ## Age_Adult 123.938 41.562 2.9820 0.0057522 ## ## (Dispersion parameter for gaussian family taken to be 13819.52) ## ## Null deviance: 573341 on 31 degrees of freedom ## Residual deviance: 400766 on 29 degrees of freedom ## AIC: 400.7 ## ## Number of Fisher Scoring iterations: 1 # Check model prediction gaussianPredictions <- predict(gaussianGLM2, training) head(gaussianPredictions) ## Class Sex Age Survived Freq label prediction ## 1 1st Male Child No 0 0 46.21875 ## 2 2nd Male Child No 0 0 46.21875 ## 3 3rd Male Child No 35 35 46.21875 ## 4 Crew Male Child No 0 0 46.21875 ## 5 1st Female Child No 0 0 -32.59375 ## 6 2nd Female Child No 0 0 -32.59375 unlink(modelPath)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"structured-streaming","dir":"Articles","previous_headings":"","what":"Structured Streaming","title":"SparkR - Practical Guide","text":"SparkR supports Structured Streaming API. can check Structured Streaming Programming Guide introduction programming model basic concepts.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"simple-source-and-sink","dir":"Articles","previous_headings":"Structured Streaming","what":"Simple Source and Sink","title":"SparkR - Practical Guide","text":"Spark built-input sources. 
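Besides the socket and Kafka sources shown below, file-based sources can be read as a stream when a schema is supplied; a minimal sketch, with a hypothetical directory of JSON files and an illustrative schema: # stream new JSON files as they arrive in a directory (path and schema are hypothetical) jsonSchema <- structType(structField("name", "string"), structField("value", "double")) jsonStream <- read.stream("json", path = "path/to/streaming/json/dir", schema = jsonSchema) 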
example, test socket source reading text words displaying computed word counts:","code":"# Create DataFrame representing the stream of input lines from connection lines <- read.stream(\"socket\", host = hostname, port = port) # Split the lines into words words <- selectExpr(lines, \"explode(split(value, ' ')) as word\") # Generate running word count wordCounts <- count(groupBy(words, \"word\")) # Start running the query that prints the running counts to the console query <- write.stream(wordCounts, \"console\", outputMode = \"complete\")"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"kafka-source","dir":"Articles","previous_headings":"Structured Streaming","what":"Kafka Source","title":"SparkR - Practical Guide","text":"simple read data Kafka. information, see Input Sources supported Structured Streaming.","code":"topic <- read.stream(\"kafka\", kafka.bootstrap.servers = \"host1:port1,host2:port2\", subscribe = \"topic1\") keyvalue <- selectExpr(topic, \"CAST(key AS STRING)\", \"CAST(value AS STRING)\")"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"operations-and-sinks","dir":"Articles","previous_headings":"Structured Streaming","what":"Operations and Sinks","title":"SparkR - Practical Guide","text":"common operations SparkDataFrame supported streaming, including selection, projection, aggregation. defined final result, start streaming computation, call write.stream method setting sink outputMode. streaming SparkDataFrame can written debugging console, temporary -memory table, processing fault-tolerant manner File Sink different formats.","code":"noAggDF <- select(where(deviceDataStreamingDf, \"signal > 10\"), \"device\") # Print new data to console write.stream(noAggDF, \"console\") # Write new data to Parquet files write.stream(noAggDF, \"parquet\", path = \"path/to/destination/dir\", checkpointLocation = \"path/to/checkpoint/dir\") # Aggregate aggDF <- count(groupBy(noAggDF, \"device\")) # Print updated aggregations to console write.stream(aggDF, \"console\", outputMode = \"complete\") # Have all the aggregates in an in memory table. The query name will be the table name write.stream(aggDF, \"memory\", queryName = \"aggregates\", outputMode = \"complete\") head(sql(\"select * from aggregates\"))"},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"sparkr-object-classes","dir":"Articles","previous_headings":"Advanced Topics","what":"SparkR Object Classes","title":"SparkR - Practical Guide","text":"three main object classes SparkR may working . sdf stores reference corresponding Spark Dataset Spark JVM backend. env saves meta-information object isCached. can created data import methods transforming existing SparkDataFrame. can manipulate SparkDataFrame numerous data processing functions feed machine learning algorithms. Column: S4 class representing column SparkDataFrame. slot jc saves reference corresponding Column object Spark JVM backend. can obtained SparkDataFrame $ operator, e.g., df$col. often, used together functions, example, select select particular columns, filter constructed conditions select rows, aggregation functions compute aggregate statistics group. GroupedData: S4 class representing grouped data created groupBy transforming GroupedData. sgd slot saves reference RelationalGroupedDataset object backend. 
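For example, a minimal sketch (assuming the carsDF SparkDataFrame created earlier in this guide) shows how the three classes arise: class(carsDF) # "SparkDataFrame" mpgCol <- carsDF$mpg # the $ operator yields a Column class(mpgCol) # "Column" gearGroups <- groupBy(carsDF, carsDF$gear) # grouping yields a GroupedData class(gearGroups) # "GroupedData" head(agg(gearGroups, avg(carsDF$mpg))) # aggregation returns a SparkDataFrame again 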
often intermediate object group information followed aggregation operations.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"architecture","dir":"Articles","previous_headings":"Advanced Topics","what":"Architecture","title":"SparkR - Practical Guide","text":"complete description architecture can seen references, particular paper SparkR: Scaling R Programs Spark. hood SparkR Spark SQL engine. avoids overheads running interpreted R code, optimized SQL execution engine Spark uses structural information data computation flow perform bunch optimizations speed computation. main method calls actual computation happen Spark JVM driver. socket-based SparkR API allows us invoke functions JVM R. use SparkR JVM backend listens Netty-based socket server. Two kinds RPCs supported SparkR JVM backend: method invocation creating new objects. Method invocation can done two ways. sparkR.callJMethod takes reference existing Java object list arguments passed method. sparkR.callJStatic takes class name static method list arguments passed method. arguments serialized using custom wire format deserialized JVM side. use Java reflection invoke appropriate method. create objects, sparkR.newJObject used similarly appropriate constructor invoked provided arguments. Finally, use new R class jobj refers Java object existing backend. references tracked Java side automatically garbage collected go scope R side.","code":""},{"path":[]},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/articles/sparkr-vignettes.html","id":"references","dir":"Articles","previous_headings":"","what":"References","title":"SparkR - Practical Guide","text":"Spark Cluster Mode Overview Submitting Spark Applications Machine Learning Library Guide (MLlib) SparkR: Scaling R Programs Spark, Shivaram Venkataraman, Zongheng Yang, Davies Liu, Eric Liang, Hossein Falaki, Xiangrui Meng, Reynold Xin, Ali Ghodsi, Michael Franklin, Ion Stoica, Matei Zaharia. SIGMOD 2016. June 2016.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Apache Software Foundation. Author, maintainer, copyright holder.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Apache Software Foundation (2023). SparkR: R Front End 'Apache Spark'. R package version 3.3.4https://www.apache.org https://spark.apache.org, https://www.apache.org https://spark.apache.org.","code":"@Manual{, title = {SparkR: R Front End for 'Apache Spark'}, author = {{The Apache Software Foundation}}, year = {2023}, note = {R package version 3.3.4https://www.apache.org https://spark.apache.org}, url = {https://www.apache.org https://spark.apache.org}, }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"r-on-spark","dir":"","previous_headings":"","what":"R Front End for Apache Spark","title":"R Front End for Apache Spark","text":"SparkR R package provides light-weight frontend use Spark R.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"installing-sparkr","dir":"","previous_headings":"","what":"Installing sparkR","title":"R Front End for Apache Spark","text":"Libraries sparkR need created $SPARK_HOME/R/lib. can done running script $SPARK_HOME/R/install-dev.sh. default script uses system wide installation R. 
However, can changed user installed location R setting environment variable R_HOME full path base directory R installed, running install-dev.sh script. Example:","code":"# where /home/username/R is where R is installed and /home/username/R/bin contains the files R and RScript export R_HOME=/home/username/R ./install-dev.sh"},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"build-spark","dir":"","previous_headings":"SparkR development","what":"Build Spark","title":"R Front End for Apache Spark","text":"Build Spark Maven SBT, include -Psparkr profile build R package. example use default Hadoop versions can run","code":"# Maven ./build/mvn -DskipTests -Psparkr package # SBT ./build/sbt -Psparkr package"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"running-sparkr","dir":"","previous_headings":"SparkR development","what":"Running sparkR","title":"R Front End for Apache Spark","text":"can start using SparkR launching SparkR shell sparkR script automatically creates SparkContext Spark default local mode. specify Spark master cluster automatically created SparkContext, can run set options like driver memory, executor memory etc. can pass spark-submit arguments ./bin/sparkR","code":"./bin/sparkR ./bin/sparkR --master \"local[2]\""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"using-sparkr-from-rstudio","dir":"","previous_headings":"SparkR development","what":"Using SparkR from RStudio","title":"R Front End for Apache Spark","text":"wish use SparkR RStudio, please refer SparkR documentation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"making-changes-to-sparkr","dir":"","previous_headings":"SparkR development","what":"Making changes to SparkR","title":"R Front End for Apache Spark","text":"instructions making contributions Spark also apply SparkR. make R file changes (.e. Scala changes) can just re-install R package using R/install-dev.sh test changes. made changes, please include unit tests run existing unit tests using R/run-tests.sh script described .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"generating-documentation","dir":"","previous_headings":"SparkR development","what":"Generating documentation","title":"R Front End for Apache Spark","text":"SparkR documentation (Rd files HTML files) part source repository. generate can run script R/create-docs.sh. script uses devtools knitr generate docs packages need installed machine using script. Also, may need install prerequisites. See also, R/DOCUMENTATION.md","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"examples-unit-tests","dir":"","previous_headings":"","what":"Examples, Unit tests","title":"R Front End for Apache Spark","text":"SparkR comes several sample programs examples/src/main/r directory. run one , use ./bin/spark-submit . example: can run R unit tests following instructions Running R Tests.","code":"./bin/spark-submit examples/src/main/r/dataframe.R"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/index.html","id":"running-on-yarn","dir":"","previous_headings":"","what":"Running on YARN","title":"R Front End for Apache Spark","text":"./bin/spark-submit can also used submit jobs YARN clusters. need set YARN conf dir . 
example CDH can run","code":"export YARN_CONF_DIR=/etc/hadoop/conf ./bin/spark-submit --master yarn examples/src/main/r/dataframe.R"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/AFTSurvivalRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a AFTSurvivalRegressionModel — AFTSurvivalRegressionModel-class","title":"S4 class that represents a AFTSurvivalRegressionModel — AFTSurvivalRegressionModel-class","text":"S4 class represents AFTSurvivalRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/AFTSurvivalRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a AFTSurvivalRegressionModel — AFTSurvivalRegressionModel-class","text":"jobj Java object reference backing Scala AFTSurvivalRegressionWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/AFTSurvivalRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a AFTSurvivalRegressionModel — AFTSurvivalRegressionModel-class","text":"AFTSurvivalRegressionModel since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ALSModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents an ALSModel — ALSModel-class","title":"S4 class that represents an ALSModel — ALSModel-class","text":"S4 class represents ALSModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ALSModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents an ALSModel — ALSModel-class","text":"jobj Java object reference backing Scala ALSWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ALSModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents an ALSModel — ALSModel-class","text":"ALSModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/BisectingKMeansModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a BisectingKMeansModel — BisectingKMeansModel-class","title":"S4 class that represents a BisectingKMeansModel — BisectingKMeansModel-class","text":"S4 class represents BisectingKMeansModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/BisectingKMeansModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a BisectingKMeansModel — BisectingKMeansModel-class","text":"jobj Java object reference backing Scala BisectingKMeansModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/BisectingKMeansModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a BisectingKMeansModel — BisectingKMeansModel-class","text":"BisectingKMeansModel since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/DecisionTreeClassificationModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a DecisionTreeClassificationModel — DecisionTreeClassificationModel-class","title":"S4 class that represents a DecisionTreeClassificationModel — DecisionTreeClassificationModel-class","text":"S4 class represents 
DecisionTreeClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/DecisionTreeClassificationModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a DecisionTreeClassificationModel — DecisionTreeClassificationModel-class","text":"jobj Java object reference backing Scala DecisionTreeClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/DecisionTreeClassificationModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a DecisionTreeClassificationModel — DecisionTreeClassificationModel-class","text":"DecisionTreeClassificationModel since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/DecisionTreeRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a DecisionTreeRegressionModel — DecisionTreeRegressionModel-class","title":"S4 class that represents a DecisionTreeRegressionModel — DecisionTreeRegressionModel-class","text":"S4 class represents DecisionTreeRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/DecisionTreeRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a DecisionTreeRegressionModel — DecisionTreeRegressionModel-class","text":"jobj Java object reference backing Scala DecisionTreeRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/DecisionTreeRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a DecisionTreeRegressionModel — DecisionTreeRegressionModel-class","text":"DecisionTreeRegressionModel since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FMClassificationModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a FMClassificationModel — FMClassificationModel-class","title":"S4 class that represents a FMClassificationModel — FMClassificationModel-class","text":"S4 class represents FMClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FMClassificationModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a FMClassificationModel — FMClassificationModel-class","text":"jobj Java object reference backing Scala FMClassifierWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FMClassificationModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a FMClassificationModel — FMClassificationModel-class","text":"FMClassificationModel since 3.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FMRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a FMRegressionModel — FMRegressionModel-class","title":"S4 class that represents a FMRegressionModel — FMRegressionModel-class","text":"S4 class represents FMRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FMRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a FMRegressionModel — FMRegressionModel-class","text":"jobj Java 
object reference backing Scala FMRegressorWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FMRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a FMRegressionModel — FMRegressionModel-class","text":"FMRegressionModel since 3.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FPGrowthModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a FPGrowthModel — FPGrowthModel-class","title":"S4 class that represents a FPGrowthModel — FPGrowthModel-class","text":"S4 class represents FPGrowthModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FPGrowthModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a FPGrowthModel — FPGrowthModel-class","text":"jobj Java object reference backing Scala FPGrowthModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/FPGrowthModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a FPGrowthModel — FPGrowthModel-class","text":"FPGrowthModel since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GBTClassificationModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a GBTClassificationModel — GBTClassificationModel-class","title":"S4 class that represents a GBTClassificationModel — GBTClassificationModel-class","text":"S4 class represents GBTClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GBTClassificationModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a GBTClassificationModel — GBTClassificationModel-class","text":"jobj Java object reference backing Scala GBTClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GBTClassificationModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a GBTClassificationModel — GBTClassificationModel-class","text":"GBTClassificationModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GBTRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a GBTRegressionModel — GBTRegressionModel-class","title":"S4 class that represents a GBTRegressionModel — GBTRegressionModel-class","text":"S4 class represents GBTRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GBTRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a GBTRegressionModel — GBTRegressionModel-class","text":"jobj Java object reference backing Scala GBTRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GBTRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a GBTRegressionModel — GBTRegressionModel-class","text":"GBTRegressionModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GaussianMixtureModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a GaussianMixtureModel — 
GaussianMixtureModel-class","title":"S4 class that represents a GaussianMixtureModel — GaussianMixtureModel-class","text":"S4 class represents GaussianMixtureModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GaussianMixtureModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a GaussianMixtureModel — GaussianMixtureModel-class","text":"jobj Java object reference backing Scala GaussianMixtureModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GaussianMixtureModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a GaussianMixtureModel — GaussianMixtureModel-class","text":"GaussianMixtureModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GeneralizedLinearRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a generalized linear model — GeneralizedLinearRegressionModel-class","title":"S4 class that represents a generalized linear model — GeneralizedLinearRegressionModel-class","text":"S4 class represents generalized linear model","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GeneralizedLinearRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a generalized linear model — GeneralizedLinearRegressionModel-class","text":"jobj Java object reference backing Scala GeneralizedLinearRegressionWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GeneralizedLinearRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a generalized linear model — GeneralizedLinearRegressionModel-class","text":"GeneralizedLinearRegressionModel since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GroupedData.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a GroupedData — GroupedData-class","title":"S4 class that represents a GroupedData — GroupedData-class","text":"GroupedDatas can created using groupBy() SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GroupedData.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"S4 class that represents a GroupedData — GroupedData-class","text":"","code":"groupedData(sgd)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GroupedData.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a GroupedData — GroupedData-class","text":"sgd Java object reference backing Scala GroupedData","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/GroupedData.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a GroupedData — GroupedData-class","text":"GroupedData since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/IsotonicRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents an IsotonicRegressionModel — IsotonicRegressionModel-class","title":"S4 class that represents an IsotonicRegressionModel — IsotonicRegressionModel-class","text":"S4 class represents 
IsotonicRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/IsotonicRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents an IsotonicRegressionModel — IsotonicRegressionModel-class","text":"jobj Java object reference backing Scala IsotonicRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/IsotonicRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents an IsotonicRegressionModel — IsotonicRegressionModel-class","text":"IsotonicRegressionModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/KMeansModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a KMeansModel — KMeansModel-class","title":"S4 class that represents a KMeansModel — KMeansModel-class","text":"S4 class represents KMeansModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/KMeansModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a KMeansModel — KMeansModel-class","text":"jobj Java object reference backing Scala KMeansModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/KMeansModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a KMeansModel — KMeansModel-class","text":"KMeansModel since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/KSTest-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents an KSTest — KSTest-class","title":"S4 class that represents an KSTest — KSTest-class","text":"S4 class represents KSTest","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/KSTest-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents an KSTest — KSTest-class","text":"jobj Java object reference backing Scala KSTestWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/KSTest-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents an KSTest — KSTest-class","text":"KSTest since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LDAModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents an LDAModel — LDAModel-class","title":"S4 class that represents an LDAModel — LDAModel-class","text":"S4 class represents LDAModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LDAModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents an LDAModel — LDAModel-class","text":"jobj Java object reference backing Scala LDAWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LDAModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents an LDAModel — LDAModel-class","text":"LDAModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LinearRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a LinearRegressionModel — LinearRegressionModel-class","title":"S4 class that 
represents a LinearRegressionModel — LinearRegressionModel-class","text":"S4 class represents LinearRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LinearRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a LinearRegressionModel — LinearRegressionModel-class","text":"jobj Java object reference backing Scala LinearRegressionWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LinearRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a LinearRegressionModel — LinearRegressionModel-class","text":"LinearRegressionModel since 3.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LinearSVCModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents an LinearSVCModel — LinearSVCModel-class","title":"S4 class that represents an LinearSVCModel — LinearSVCModel-class","text":"S4 class represents LinearSVCModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LinearSVCModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents an LinearSVCModel — LinearSVCModel-class","text":"jobj Java object reference backing Scala LinearSVCModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LinearSVCModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents an LinearSVCModel — LinearSVCModel-class","text":"LinearSVCModel since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LogisticRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents an LogisticRegressionModel — LogisticRegressionModel-class","title":"S4 class that represents an LogisticRegressionModel — LogisticRegressionModel-class","text":"S4 class represents LogisticRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LogisticRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents an LogisticRegressionModel — LogisticRegressionModel-class","text":"jobj Java object reference backing Scala LogisticRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/LogisticRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents an LogisticRegressionModel — LogisticRegressionModel-class","text":"LogisticRegressionModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/MultilayerPerceptronClassificationModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a MultilayerPerceptronClassificationModel — MultilayerPerceptronClassificationModel-class","title":"S4 class that represents a MultilayerPerceptronClassificationModel — MultilayerPerceptronClassificationModel-class","text":"S4 class represents MultilayerPerceptronClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/MultilayerPerceptronClassificationModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a MultilayerPerceptronClassificationModel — 
MultilayerPerceptronClassificationModel-class","text":"jobj Java object reference backing Scala MultilayerPerceptronClassifierWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/MultilayerPerceptronClassificationModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a MultilayerPerceptronClassificationModel — MultilayerPerceptronClassificationModel-class","text":"MultilayerPerceptronClassificationModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/NaiveBayesModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a NaiveBayesModel — NaiveBayesModel-class","title":"S4 class that represents a NaiveBayesModel — NaiveBayesModel-class","text":"S4 class represents NaiveBayesModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/NaiveBayesModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a NaiveBayesModel — NaiveBayesModel-class","text":"jobj Java object reference backing Scala NaiveBayesWrapper","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/NaiveBayesModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a NaiveBayesModel — NaiveBayesModel-class","text":"NaiveBayesModel since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/PowerIterationClustering-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a PowerIterationClustering — PowerIterationClustering-class","title":"S4 class that represents a PowerIterationClustering — PowerIterationClustering-class","text":"S4 class represents PowerIterationClustering","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/PowerIterationClustering-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a PowerIterationClustering — PowerIterationClustering-class","text":"jobj Java object reference backing Scala PowerIterationClustering","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/PowerIterationClustering-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a PowerIterationClustering — PowerIterationClustering-class","text":"PowerIterationClustering since 3.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/PrefixSpan-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a PrefixSpan — PrefixSpan-class","title":"S4 class that represents a PrefixSpan — PrefixSpan-class","text":"S4 class represents PrefixSpan","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/PrefixSpan-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a PrefixSpan — PrefixSpan-class","text":"jobj Java object reference backing Scala PrefixSpan","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/PrefixSpan-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a PrefixSpan — PrefixSpan-class","text":"PrefixSpan since 
3.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/RandomForestClassificationModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a RandomForestClassificationModel — RandomForestClassificationModel-class","title":"S4 class that represents a RandomForestClassificationModel — RandomForestClassificationModel-class","text":"S4 class represents RandomForestClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/RandomForestClassificationModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a RandomForestClassificationModel — RandomForestClassificationModel-class","text":"jobj Java object reference backing Scala RandomForestClassificationModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/RandomForestClassificationModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a RandomForestClassificationModel — RandomForestClassificationModel-class","text":"RandomForestClassificationModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/RandomForestRegressionModel-class.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a RandomForestRegressionModel — RandomForestRegressionModel-class","title":"S4 class that represents a RandomForestRegressionModel — RandomForestRegressionModel-class","text":"S4 class represents RandomForestRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/RandomForestRegressionModel-class.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a RandomForestRegressionModel — RandomForestRegressionModel-class","text":"jobj Java object reference backing Scala RandomForestRegressionModel","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/RandomForestRegressionModel-class.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a RandomForestRegressionModel — RandomForestRegressionModel-class","text":"RandomForestRegressionModel since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/SparkDataFrame.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a SparkDataFrame — SparkDataFrame-class","title":"S4 class that represents a SparkDataFrame — SparkDataFrame-class","text":"SparkDataFrames can created using functions like createDataFrame, read.json, table etc.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/SparkDataFrame.html","id":"slots","dir":"Reference","previous_headings":"","what":"Slots","title":"S4 class that represents a SparkDataFrame — SparkDataFrame-class","text":"env R environment stores bookkeeping states SparkDataFrame sdf Java object reference backing Scala DataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/SparkDataFrame.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a SparkDataFrame — SparkDataFrame-class","text":"SparkDataFrame since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/SparkDataFrame.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"S4 class that represents a SparkDataFrame — 
SparkDataFrame-class","text":"","code":"if (FALSE) { sparkR.session() df <- createDataFrame(faithful) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/StreamingQuery.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a StreamingQuery — StreamingQuery-class","title":"S4 class that represents a StreamingQuery — StreamingQuery-class","text":"StreamingQuery can created using read.stream() write.stream()","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/StreamingQuery.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a StreamingQuery — StreamingQuery-class","text":"ssq Java object reference backing Scala StreamingQuery","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/StreamingQuery.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a StreamingQuery — StreamingQuery-class","text":"StreamingQuery since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/WindowSpec.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a WindowSpec — WindowSpec-class","title":"S4 class that represents a WindowSpec — WindowSpec-class","text":"WindowSpec can created using windowPartitionBy() windowOrderBy()","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/WindowSpec.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a WindowSpec — WindowSpec-class","text":"sws Java object reference backing Scala WindowSpec","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/WindowSpec.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a WindowSpec — WindowSpec-class","text":"WindowSpec since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/alias.html","id":null,"dir":"Reference","previous_headings":"","what":"alias — alias","title":"alias — alias","text":"Returns new SparkDataFrame Column alias set. 
Equivalent SQL \"\" keyword.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/alias.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"alias — alias","text":"","code":"# S4 method for Column alias(object, data) # S4 method for SparkDataFrame alias(object, data)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/alias.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"alias — alias","text":"object x SparkDataFrame Column data new name use","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/alias.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"alias — alias","text":"SparkDataFrame Column","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/alias.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"alias — alias","text":"alias(Column) since 1.4.0 alias(SparkDataFrame) since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/alias.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"alias — alias","text":"","code":"if (FALSE) { df <- createDataFrame(iris) head(select( df, alias(df$Sepal_Length, \"slength\"), alias(df$Petal_Length, \"plength\") )) } if (FALSE) { df <- alias(createDataFrame(mtcars), \"mtcars\") avg_mpg <- alias(agg(groupBy(df, df$cyl), avg(df$mpg)), \"avg_mpg\") head(select(df, column(\"mtcars.mpg\"))) head(join(df, avg_mpg, column(\"mtcars.cyl\") == column(\"avg_mpg.cyl\"))) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/approxQuantile.html","id":null,"dir":"Reference","previous_headings":"","what":"Calculates the approximate quantiles of numerical columns of a SparkDataFrame — approxQuantile","title":"Calculates the approximate quantiles of numerical columns of a SparkDataFrame — approxQuantile","text":"Calculates approximate quantiles numerical columns SparkDataFrame. result algorithm following deterministic bound: SparkDataFrame N elements request quantile probability p error err, algorithm return sample x SparkDataFrame *exact* rank x close (p * N). precisely, floor((p - err) * N) <= rank(x) <= ceil((p + err) * N). method implements variation Greenwald-Khanna algorithm (speed optimizations). algorithm first present [[https://doi.org/10.1145/375663.375670 Space-efficient Online Computation Quantile Summaries]] Greenwald Khanna. Note NA values ignored numerical columns calculation. columns containing NA values, empty list returned.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/approxQuantile.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Calculates the approximate quantiles of numerical columns of a SparkDataFrame — approxQuantile","text":"","code":"# S4 method for SparkDataFrame,character,numeric,numeric approxQuantile(x, cols, probabilities, relativeError)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/approxQuantile.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Calculates the approximate quantiles of numerical columns of a SparkDataFrame — approxQuantile","text":"x SparkDataFrame. cols single column name, list names multiple columns. probabilities list quantile probabilities. number must belong [0, 1]. example 0 minimum, 0.5 median, 1 maximum. relativeError relative target precision achieve (>= 0). 
set zero, exact quantiles computed, expensive. Note values greater 1 accepted give result 1.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/approxQuantile.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Calculates the approximate quantiles of numerical columns of a SparkDataFrame — approxQuantile","text":"approximate quantiles given probabilities. input single column name, output list approximate quantiles column; input multiple column names, output list, element list numeric values represents approximate quantiles corresponding column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/approxQuantile.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Calculates the approximate quantiles of numerical columns of a SparkDataFrame — approxQuantile","text":"approxQuantile since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/approxQuantile.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Calculates the approximate quantiles of numerical columns of a SparkDataFrame — approxQuantile","text":"","code":"if (FALSE) { df <- read.json(\"/path/to/file.json\") quantiles <- approxQuantile(df, \"key\", c(0.5, 0.8), 0.0) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/arrange.html","id":null,"dir":"Reference","previous_headings":"","what":"Arrange Rows by Variables — arrange","title":"Arrange Rows by Variables — arrange","text":"Sort SparkDataFrame specified column(s).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/arrange.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Arrange Rows by Variables — arrange","text":"","code":"arrange(x, col, ...) # S4 method for SparkDataFrame,Column arrange(x, col, ..., withinPartitions = FALSE) # S4 method for SparkDataFrame,character arrange(x, col, ..., decreasing = FALSE, withinPartitions = FALSE) # S4 method for SparkDataFrame,characterOrColumn orderBy(x, col, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/arrange.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Arrange Rows by Variables — arrange","text":"x SparkDataFrame sorted. col character Column object indicating fields sort ... 
additional sorting fields withinPartitions logical argument indicating whether sort within partition decreasing logical argument indicating sorting order columns character vector specified col","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/arrange.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Arrange Rows by Variables — arrange","text":"SparkDataFrame elements sorted.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/arrange.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Arrange Rows by Variables — arrange","text":"arrange(SparkDataFrame, Column) since 1.4.0 arrange(SparkDataFrame, character) since 1.4.0 orderBy(SparkDataFrame, characterOrColumn) since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/arrange.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Arrange Rows by Variables — arrange","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) arrange(df, df$col1) arrange(df, asc(df$col1), desc(abs(df$col2))) arrange(df, \"col1\", decreasing = TRUE) arrange(df, \"col1\", \"col2\", decreasing = c(TRUE, FALSE)) arrange(df, \"col1\", \"col2\", withinPartitions = TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/as.data.frame.html","id":null,"dir":"Reference","previous_headings":"","what":"Download data from a SparkDataFrame into a R data.frame — as.data.frame","title":"Download data from a SparkDataFrame into a R data.frame — as.data.frame","text":"function downloads contents SparkDataFrame R's data.frame. Since data.frames held memory, ensure enough memory system accommodate contents.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/as.data.frame.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download data from a SparkDataFrame into a R data.frame — as.data.frame","text":"","code":"as.data.frame(x, row.names = NULL, optional = FALSE, ...) # S4 method for SparkDataFrame as.data.frame(x, row.names = NULL, optional = FALSE, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/as.data.frame.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download data from a SparkDataFrame into a R data.frame — as.data.frame","text":"x SparkDataFrame. row.names NULL character vector giving row names data frame. optional TRUE, converting column names optional. ... 
additional arguments pass base::.data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/as.data.frame.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Download data from a SparkDataFrame into a R data.frame — as.data.frame","text":"data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/as.data.frame.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Download data from a SparkDataFrame into a R data.frame — as.data.frame","text":".data.frame since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/as.data.frame.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download data from a SparkDataFrame into a R data.frame — as.data.frame","text":"","code":"if (FALSE) { irisDF <- createDataFrame(iris) df <- as.data.frame(irisDF[irisDF$Species == \"setosa\", ]) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/attach.html","id":null,"dir":"Reference","previous_headings":"","what":"Attach SparkDataFrame to R search path — attach,SparkDataFrame-method","title":"Attach SparkDataFrame to R search path — attach,SparkDataFrame-method","text":"specified SparkDataFrame attached R search path. means SparkDataFrame searched R evaluating variable, columns SparkDataFrame can accessed simply giving names.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/attach.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Attach SparkDataFrame to R search path — attach,SparkDataFrame-method","text":"","code":"# S4 method for SparkDataFrame attach( what, pos = 2L, name = paste(deparse(substitute(what), backtick = FALSE), collapse = \" \"), warn.conflicts = TRUE )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/attach.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Attach SparkDataFrame to R search path — attach,SparkDataFrame-method","text":"(SparkDataFrame) SparkDataFrame attach pos (integer) Specify position search() attach. name (character) Name use attached SparkDataFrame. Names starting package: reserved library. warn.conflicts (logical) TRUE, warnings printed conflicts attaching database, unless SparkDataFrame contains object","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/attach.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Attach SparkDataFrame to R search path — attach,SparkDataFrame-method","text":"attach since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/attach.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Attach SparkDataFrame to R search path — attach,SparkDataFrame-method","text":"","code":"if (FALSE) { attach(irisDf) summary(Sepal_Width) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/avg.html","id":null,"dir":"Reference","previous_headings":"","what":"avg — avg","title":"avg — avg","text":"Aggregate function: returns average values group.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/avg.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"avg — avg","text":"","code":"avg(x, ...) 
# S4 method for Column avg(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/avg.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"avg — avg","text":"x Column compute GroupedData object. ... additional argument(s) x GroupedData object.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/avg.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"avg — avg","text":"avg since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/avg.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"avg — avg","text":"","code":"if (FALSE) avg(df$c)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/awaitTermination.html","id":null,"dir":"Reference","previous_headings":"","what":"awaitTermination — awaitTermination","title":"awaitTermination — awaitTermination","text":"Waits termination query, either stopQuery error.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/awaitTermination.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"awaitTermination — awaitTermination","text":"","code":"awaitTermination(x, timeout = NULL) # S4 method for StreamingQuery awaitTermination(x, timeout = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/awaitTermination.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"awaitTermination — awaitTermination","text":"x StreamingQuery. timeout time wait milliseconds, omitted, wait indefinitely stopQuery called error occurred.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/awaitTermination.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"awaitTermination — awaitTermination","text":"TRUE query terminated within timeout period; nothing timeout specified.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/awaitTermination.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"awaitTermination — awaitTermination","text":"query terminated, subsequent calls method return TRUE immediately.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/awaitTermination.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"awaitTermination — awaitTermination","text":"awaitTermination(StreamingQuery) since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/awaitTermination.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"awaitTermination — awaitTermination","text":"","code":"if (FALSE) awaitTermination(sq, 10000)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/between.html","id":null,"dir":"Reference","previous_headings":"","what":"between — between","title":"between — between","text":"Test column lower bound upper bound, inclusive.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/between.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"between — between","text":"","code":"between(x, bounds) # S4 method for Column between(x, bounds)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/between.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"between — between","text":"x Column bounds lower upper 
bounds","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/between.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"between — between","text":"since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/broadcast.html","id":null,"dir":"Reference","previous_headings":"","what":"broadcast — broadcast","title":"broadcast — broadcast","text":"Return new SparkDataFrame marked small enough use broadcast joins.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/broadcast.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"broadcast — broadcast","text":"","code":"broadcast(x) # S4 method for SparkDataFrame broadcast(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/broadcast.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"broadcast — broadcast","text":"x SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/broadcast.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"broadcast — broadcast","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/broadcast.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"broadcast — broadcast","text":"Equivalent hint(x, \"broadcast\").","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/broadcast.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"broadcast — broadcast","text":"broadcast since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/broadcast.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"broadcast — broadcast","text":"","code":"if (FALSE) { df <- createDataFrame(mtcars) avg_mpg <- mean(groupBy(createDataFrame(mtcars), \"cyl\"), \"mpg\") head(join(df, broadcast(avg_mpg), df$cyl == avg_mpg$cyl)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cache.html","id":null,"dir":"Reference","previous_headings":"","what":"Cache — cache","title":"Cache — cache","text":"Persist default storage level (MEMORY_ONLY).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cache — cache","text":"","code":"cache(x) # S4 method for SparkDataFrame cache(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cache.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cache — cache","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cache.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Cache — cache","text":"cache since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cache.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Cache — cache","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) cache(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cacheTable.html","id":null,"dir":"Reference","previous_headings":"","what":"Cache Table — cacheTable","title":"Cache Table — cacheTable","text":"Caches specified table 
-memory.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cacheTable.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cache Table — cacheTable","text":"","code":"cacheTable(tableName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cacheTable.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cache Table — cacheTable","text":"tableName qualified unqualified name designates table. database identifier provided, refers table current database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cacheTable.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Cache Table — cacheTable","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cacheTable.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Cache Table — cacheTable","text":"cacheTable since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cacheTable.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Cache Table — cacheTable","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) createOrReplaceTempView(df, \"table\") cacheTable(\"table\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cancelJobGroup.html","id":null,"dir":"Reference","previous_headings":"","what":"Cancel active jobs for the specified group — cancelJobGroup","title":"Cancel active jobs for the specified group — cancelJobGroup","text":"Cancel active jobs specified group","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cancelJobGroup.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Cancel active jobs for the specified group — cancelJobGroup","text":"","code":"cancelJobGroup(groupId)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cancelJobGroup.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Cancel active jobs for the specified group — cancelJobGroup","text":"groupId ID job group cancelled","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cancelJobGroup.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Cancel active jobs for the specified group — cancelJobGroup","text":"cancelJobGroup since 1.5.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cancelJobGroup.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Cancel active jobs for the specified group — cancelJobGroup","text":"","code":"if (FALSE) { sparkR.session() cancelJobGroup(\"myJobGroup\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cast.html","id":null,"dir":"Reference","previous_headings":"","what":"Casts the column to a different data type. — cast","title":"Casts the column to a different data type. — cast","text":"Casts column different data type.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cast.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Casts the column to a different data type. 
— cast","text":"","code":"cast(x, dataType) # S4 method for Column cast(x, dataType)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cast.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Casts the column to a different data type. — cast","text":"x Column. dataType character object describing target data type. See Spark Data Types available data types.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cast.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Casts the column to a different data type. — cast","text":"cast since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cast.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Casts the column to a different data type. — cast","text":"","code":"if (FALSE) { cast(df$age, \"string\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/checkpoint.html","id":null,"dir":"Reference","previous_headings":"","what":"checkpoint — checkpoint","title":"checkpoint — checkpoint","text":"Returns checkpointed version SparkDataFrame. Checkpointing can used truncate logical plan, especially useful iterative algorithms plan may grow exponentially. saved files inside checkpoint directory set setCheckpointDir","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/checkpoint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"checkpoint — checkpoint","text":"","code":"checkpoint(x, eager = TRUE) # S4 method for SparkDataFrame checkpoint(x, eager = TRUE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/checkpoint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"checkpoint — checkpoint","text":"x SparkDataFrame eager whether checkpoint SparkDataFrame immediately","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/checkpoint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"checkpoint — checkpoint","text":"new checkpointed SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/checkpoint.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"checkpoint — checkpoint","text":"checkpoint since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/checkpoint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"checkpoint — checkpoint","text":"","code":"if (FALSE) { setCheckpointDir(\"/checkpoint\") df <- checkpoint(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearCache.html","id":null,"dir":"Reference","previous_headings":"","what":"Clear Cache — clearCache","title":"Clear Cache — clearCache","text":"Removes cached tables -memory cache.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearCache.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clear Cache — clearCache","text":"","code":"clearCache()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearCache.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Clear Cache — clearCache","text":"clearCache since 
1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearCache.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Clear Cache — clearCache","text":"","code":"if (FALSE) { clearCache() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearJobGroup.html","id":null,"dir":"Reference","previous_headings":"","what":"Clear current job group ID and its description — clearJobGroup","title":"Clear current job group ID and its description — clearJobGroup","text":"Clear current job group ID description","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearJobGroup.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Clear current job group ID and its description — clearJobGroup","text":"","code":"clearJobGroup()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearJobGroup.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Clear current job group ID and its description — clearJobGroup","text":"clearJobGroup since 1.5.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/clearJobGroup.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Clear current job group ID and its description — clearJobGroup","text":"","code":"if (FALSE) { sparkR.session() clearJobGroup() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coalesce.html","id":null,"dir":"Reference","previous_headings":"","what":"Coalesce — coalesce","title":"Coalesce — coalesce","text":"Returns new SparkDataFrame exactly numPartitions partitions. operation results narrow dependency, e.g. go 1000 partitions 100 partitions, shuffle, instead 100 new partitions claim 10 current partitions. larger number partitions requested, stay current number partitions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coalesce.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Coalesce — coalesce","text":"","code":"coalesce(x, ...) # S4 method for SparkDataFrame coalesce(x, numPartitions)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coalesce.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Coalesce — coalesce","text":"x SparkDataFrame. ... additional argument(s). numPartitions number partitions use.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coalesce.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Coalesce — coalesce","text":"However, drastic coalesce SparkDataFrame, e.g. numPartitions = 1, may result computation taking place fewer nodes like (e.g. one node case numPartitions = 1). avoid , call repartition. 
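A minimal sketch of that trade-off, assuming df is an existing SparkDataFrame with many partitions (repartition has its own reference entry):
if (FALSE) {
  # coalesce: narrow dependency, no shuffle; data may end up on very few nodes
  one_part <- coalesce(df, 1L)
  # repartition: full shuffle, so upstream work keeps its parallelism
  balanced <- repartition(df, numPartitions = 10L)
}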
add shuffle step, means current upstream partitions executed parallel (per whatever current partitioning ).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coalesce.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Coalesce — coalesce","text":"coalesce(SparkDataFrame) since 2.1.1","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coalesce.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Coalesce — coalesce","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) newDF <- coalesce(df, 1L) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/collect.html","id":null,"dir":"Reference","previous_headings":"","what":"Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. — collect","title":"Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. — collect","text":"Collects elements SparkDataFrame coerces R data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/collect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. — collect","text":"","code":"collect(x, ...) # S4 method for SparkDataFrame collect(x, stringsAsFactors = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/collect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. — collect","text":"x SparkDataFrame. ... arguments passed methods. stringsAsFactors (Optional) logical indicating whether string columns converted factors. FALSE default.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/collect.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. — collect","text":"collect since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/collect.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Collects all the elements of a SparkDataFrame and coerces them into an R data.frame. — collect","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) collected <- collect(df) class(collected) firstName <- names(collected)[1] }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coltypes.html","id":null,"dir":"Reference","previous_headings":"","what":"coltypes — coltypes","title":"coltypes — coltypes","text":"Get column types SparkDataFrame Set column types SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coltypes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"coltypes — coltypes","text":"","code":"coltypes(x) coltypes(x) <- value # S4 method for SparkDataFrame coltypes(x) # S4 method for SparkDataFrame,character coltypes(x) <- value"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coltypes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"coltypes — coltypes","text":"x SparkDataFrame value character vector target column types given SparkDataFrame. 
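For the stringsAsFactors argument of collect shown above, a minimal sketch (df and its columns are hypothetical):
if (FALSE) {
  df <- createDataFrame(data.frame(name = c("a", "b"), value = c(1, 2)))
  str(collect(df))                           # name comes back as character
  str(collect(df, stringsAsFactors = TRUE))  # name comes back as factor
}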
Column types can one integer, numeric/double, character, logical, NA keep column -.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coltypes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"coltypes — coltypes","text":"value character vector column types given SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coltypes.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"coltypes — coltypes","text":"coltypes since 1.6.0 coltypes<- since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/coltypes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"coltypes — coltypes","text":"","code":"if (FALSE) { irisDF <- createDataFrame(iris) coltypes(irisDF) # get column types } if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) coltypes(df) <- c(\"character\", \"integer\") # set column types coltypes(df) <- c(NA, \"numeric\") # set column types }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column.html","id":null,"dir":"Reference","previous_headings":"","what":"S4 class that represents a SparkDataFrame column — column","title":"S4 class that represents a SparkDataFrame column — column","text":"column class supports unary, binary operations SparkDataFrame columns Returns Column based given column name.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"S4 class that represents a SparkDataFrame column — column","text":"","code":"column(x) # S4 method for jobj column(x) # S4 method for character column(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"S4 class that represents a SparkDataFrame column — column","text":"x Character column name.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column.html","id":"slots","dir":"Reference","previous_headings":"","what":"Slots","title":"S4 class that represents a SparkDataFrame column — column","text":"jc reference JVM SparkDataFrame column","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"S4 class that represents a SparkDataFrame column — column","text":"Column since 1.4.0 column since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"S4 class that represents a SparkDataFrame column — column","text":"","code":"if (FALSE) column(\"name\")"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_aggregate_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Aggregate functions for Column operations — column_aggregate_functions","title":"Aggregate functions for Column operations — column_aggregate_functions","text":"Aggregate functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_aggregate_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Aggregate functions for Column operations — column_aggregate_functions","text":"","code":"approx_count_distinct(x, ...) 
approxCountDistinct(x, ...) collect_list(x) collect_set(x) count_distinct(x, ...) countDistinct(x, ...) grouping_bit(x) grouping_id(x, ...) kurtosis(x) max_by(x, y) min_by(x, y) n_distinct(x, ...) percentile_approx(x, percentage, ...) product(x) sd(x, na.rm = FALSE) skewness(x) stddev(x) stddev_pop(x) stddev_samp(x) sum_distinct(x) sumDistinct(x) var(x, y = NULL, na.rm = FALSE, use) variance(x) var_pop(x) var_samp(x) # S4 method for Column approx_count_distinct(x, rsd = 0.05) # S4 method for Column approxCountDistinct(x, rsd = 0.05) # S4 method for Column kurtosis(x) # S4 method for Column max(x) # S4 method for Column,Column max_by(x, y) # S4 method for Column mean(x) # S4 method for Column min(x) # S4 method for Column,Column min_by(x, y) # S4 method for Column product(x) # S4 method for characterOrColumn,numericOrColumn percentile_approx(x, percentage, accuracy = 10000) # S4 method for Column sd(x) # S4 method for Column skewness(x) # S4 method for Column stddev(x) # S4 method for Column stddev_pop(x) # S4 method for Column stddev_samp(x) # S4 method for Column sum(x) # S4 method for Column sum_distinct(x) # S4 method for Column sumDistinct(x) # S4 method for Column var(x) # S4 method for Column variance(x) # S4 method for Column var_pop(x) # S4 method for Column var_samp(x) # S4 method for Column approx_count_distinct(x, rsd = 0.05) # S4 method for Column approxCountDistinct(x, rsd = 0.05) # S4 method for Column count_distinct(x, ...) # S4 method for Column countDistinct(x, ...) # S4 method for Column n_distinct(x, ...) # S4 method for Column collect_list(x) # S4 method for Column collect_set(x) # S4 method for Column grouping_bit(x) # S4 method for Column grouping_id(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_aggregate_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Aggregate functions for Column operations — column_aggregate_functions","text":"x Column compute . ... additional argument(s). example, used pass additional Columns. y, na.rm, use currently used. percentage Numeric percentage percentile computed values 0 1. length equals 1 resulting column type double, otherwise, array type double. rsd maximum relative standard deviation allowed (default = 0.05). accuracy positive numeric literal (default: 10000) controls approximation accuracy cost memory. Higher value accuracy yields better accuracy, 1.0/accuracy relative error approximation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_aggregate_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Aggregate functions for Column operations — column_aggregate_functions","text":"approx_count_distinct: Returns approximate number distinct items group. approxCountDistinct: Returns approximate number distinct items group. kurtosis: Returns kurtosis values group. max: Returns maximum value expression group. max_by: Returns value associated maximum value ord. mean: Returns average values group. Alias avg. min: Returns minimum value expression group. min_by: Returns value associated minimum value ord. product: Returns product values group. percentile_approx Returns approximate percentile numeric column col smallest value ordered col values (sorted least greatest) percentage col values less value equal value. sd: Alias stddev_samp. skewness: Returns skewness values group. stddev: Alias std_dev. stddev_pop: Returns population standard deviation expression group. 
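percentile_approx is not covered in the Examples below, so here is a minimal sketch reusing the mtcars-based df from those examples:
if (FALSE) {
  df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
  # one probability gives a double column; a vector gives an array of doubles
  head(select(df, percentile_approx(df$mpg, 0.5)))
  head(select(df, percentile_approx("mpg", c(0.25, 0.5, 0.75), accuracy = 100)))
}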
stddev_samp: Returns unbiased sample standard deviation expression group. sum: Returns sum values expression. sum_distinct: Returns sum distinct values expression. sumDistinct: Returns sum distinct values expression. var: Alias var_samp. var_pop: Returns population variance values group. var_samp: Returns unbiased variance values group. count_distinct: Returns number distinct items group. countDistinct: Returns number distinct items group. alias count_distinct, encouraged use count_distinct directly. n_distinct: Returns number distinct items group. collect_list: Creates list objects duplicates. Note: function non-deterministic order collected results depends order rows may non-deterministic shuffle. collect_set: Creates list objects duplicate elements eliminated. Note: function non-deterministic order collected results depends order rows may non-deterministic shuffle. grouping_bit: Indicates whether specified column GROUP list aggregated , returns 1 aggregated 0 aggregated result set. GROUPING SQL grouping function Scala. grouping_id: Returns level grouping. Equals grouping_bit(c1) * 2^(n - 1) + grouping_bit(c2) * 2^(n - 2) + ... + grouping_bit(cn) .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_aggregate_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Aggregate functions for Column operations — column_aggregate_functions","text":"approx_count_distinct(Column) since 3.0.0 approxCountDistinct(Column) since 1.4.0 kurtosis since 1.6.0 max since 1.5.0 max_by since 3.3.0 mean since 1.5.0 min since 1.5.0 min_by since 3.3.0 product since 3.2.0 percentile_approx since 3.1.0 sd since 1.6.0 skewness since 1.6.0 stddev since 1.6.0 stddev_pop since 1.6.0 stddev_samp since 1.6.0 sum since 1.5.0 sum_distinct since 3.2.0 sumDistinct since 1.4.0 var since 1.6.0 variance since 1.6.0 var_pop since 1.5.0 var_samp since 1.6.0 approx_count_distinct(Column, numeric) since 3.0.0 approxCountDistinct(Column, numeric) since 1.4.0 count_distinct since 3.2.0 countDistinct since 1.4.0 n_distinct since 1.4.0 collect_list since 2.3.0 collect_set since 2.3.0 grouping_bit since 2.3.0 grouping_id since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_aggregate_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Aggregate functions for Column operations — column_aggregate_functions","text":"","code":"if (FALSE) { # Dataframe used throughout this doc df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))} if (FALSE) { head(select(df, approx_count_distinct(df$gear))) head(select(df, approx_count_distinct(df$gear, 0.02))) head(select(df, count_distinct(df$gear, df$cyl))) head(select(df, n_distinct(df$gear))) head(distinct(select(df, \"gear\")))} if (FALSE) { head(select(df, mean(df$mpg), sd(df$mpg), skewness(df$mpg), kurtosis(df$mpg)))} if (FALSE) { df <- createDataFrame( list(list(\"Java\", 2012, 20000), list(\"dotNET\", 2012, 5000), list(\"dotNET\", 2013, 48000), list(\"Java\", 2013, 30000)), list(\"course\", \"year\", \"earnings\") ) tmp <- agg(groupBy(df, df$\"course\"), \"max_by\" = max_by(df$\"year\", df$\"earnings\")) head(tmp)} if (FALSE) { head(select(df, avg(df$mpg), mean(df$mpg), sum(df$mpg), min(df$wt), max(df$qsec))) # metrics by num of cylinders tmp <- agg(groupBy(df, \"cyl\"), avg(df$mpg), avg(df$hp), avg(df$wt), avg(df$qsec)) head(orderBy(tmp, \"cyl\")) # car with the max mpg mpg_max <- as.numeric(collect(agg(df, max(df$mpg)))) 
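  # collect() pulls the 1x1 aggregate down as a local data.frame and as.numeric()
  # coerces it to a plain scalar, so where() can compare it against df$mpg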
head(where(df, df$mpg == mpg_max))} if (FALSE) { df <- createDataFrame( list(list(\"Java\", 2012, 20000), list(\"dotNET\", 2012, 5000), list(\"dotNET\", 2013, 48000), list(\"Java\", 2013, 30000)), list(\"course\", \"year\", \"earnings\") ) tmp <- agg(groupBy(df, df$\"course\"), \"min_by\" = min_by(df$\"year\", df$\"earnings\")) head(tmp)} if (FALSE) { head(select(df, sd(df$mpg), stddev(df$mpg), stddev_pop(df$wt), stddev_samp(df$qsec)))} if (FALSE) { head(select(df, sum_distinct(df$gear))) head(distinct(select(df, \"gear\")))} if (FALSE) { head(agg(df, var(df$mpg), variance(df$mpg), var_pop(df$mpg), var_samp(df$mpg)))} if (FALSE) { df2 = df[df$mpg > 20, ] collect(select(df2, collect_list(df2$gear))) collect(select(df2, collect_set(df2$gear)))} if (FALSE) { # With cube agg( cube(df, \"cyl\", \"gear\", \"am\"), mean(df$mpg), grouping_bit(df$cyl), grouping_bit(df$gear), grouping_bit(df$am) ) # With rollup agg( rollup(df, \"cyl\", \"gear\", \"am\"), mean(df$mpg), grouping_bit(df$cyl), grouping_bit(df$gear), grouping_bit(df$am) )} if (FALSE) { # With cube agg( cube(df, \"cyl\", \"gear\", \"am\"), mean(df$mpg), grouping_id(df$cyl, df$gear, df$am) ) # With rollup agg( rollup(df, \"cyl\", \"gear\", \"am\"), mean(df$mpg), grouping_id(df$cyl, df$gear, df$am) )}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_avro_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Avro processing functions for Column operations — column_avro_functions","title":"Avro processing functions for Column operations — column_avro_functions","text":"Avro processing functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_avro_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Avro processing functions for Column operations — column_avro_functions","text":"","code":"from_avro(x, ...) to_avro(x, ...) # S4 method for characterOrColumn from_avro(x, jsonFormatSchema, ...) # S4 method for characterOrColumn to_avro(x, jsonFormatSchema = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_avro_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Avro processing functions for Column operations — column_avro_functions","text":"x Column compute . ... additional argument(s) passed parser options. jsonFormatSchema character Avro schema JSON string format","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_avro_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Avro processing functions for Column operations — column_avro_functions","text":"from_avro Converts binary column Avro format corresponding catalyst value. specified schema must match read data, otherwise behavior undefined: may fail return arbitrary result. deserialize data compatible evolved schema, expected Avro schema can set via option avroSchema. to_avro Converts column binary Avro format.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_avro_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Avro processing functions for Column operations — column_avro_functions","text":"Avro built-external data source module since Spark 2.4. Please deploy application per deployment section \"Apache Avro Data Source Guide\". 
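One way to make that module available from a SparkR session (the package coordinate below is an assumption; pick the one matching your Spark and Scala build as described in the guide):
if (FALSE) {
  # hypothetical coordinate for Spark 3.3.x built against Scala 2.12
  sparkR.session(sparkPackages = "org.apache.spark:spark-avro_2.12:3.3.4")
}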
from_avro since 3.1.0 to_avro since 3.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_avro_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Avro processing functions for Column operations — column_avro_functions","text":"","code":"if (FALSE) { df <- createDataFrame(iris) schema <- paste( c( '{\"type\": \"record\", \"namespace\": \"example.avro\", \"name\": \"Iris\", \"fields\": [', '{\"type\": [\"double\", \"null\"], \"name\": \"Sepal_Length\"},', '{\"type\": [\"double\", \"null\"], \"name\": \"Sepal_Width\"},', '{\"type\": [\"double\", \"null\"], \"name\": \"Petal_Length\"},', '{\"type\": [\"double\", \"null\"], \"name\": \"Petal_Width\"},', '{\"type\": [\"string\", \"null\"], \"name\": \"Species\"}]}' ), collapse=\"\\\\n\" ) df_serialized <- select( df, alias(to_avro(alias(struct(column(\"*\")), \"fields\")), \"payload\") ) df_deserialized <- select( df_serialized, from_avro(df_serialized$payload, schema) ) head(df_deserialized) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_collection_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Collection functions for Column operations — column_collection_functions","title":"Collection functions for Column operations — column_collection_functions","text":"Collection functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_collection_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Collection functions for Column operations — column_collection_functions","text":"","code":"array_aggregate(x, initialValue, merge, ...) array_contains(x, value) array_distinct(x) array_except(x, y) array_exists(x, f) array_forall(x, f) array_filter(x, f) array_intersect(x, y) array_join(x, delimiter, ...) array_max(x) array_min(x) array_position(x, value) array_remove(x, value) array_repeat(x, count) array_sort(x) array_transform(x, f) arrays_overlap(x, y) array_union(x, y) arrays_zip(x, ...) arrays_zip_with(x, y, f) concat(x, ...) element_at(x, extraction) explode(x) explode_outer(x) flatten(x) from_json(x, schema, ...) from_csv(x, schema, ...) map_concat(x, ...) map_entries(x) map_filter(x, f) map_from_arrays(x, y) map_from_entries(x) map_keys(x) map_values(x) map_zip_with(x, y, f) posexplode(x) posexplode_outer(x) reverse(x) schema_of_csv(x, ...) schema_of_json(x, ...) shuffle(x) size(x) slice(x, start, length) sort_array(x, asc = TRUE) transform_keys(x, f) transform_values(x, f) to_json(x, ...) to_csv(x, ...) # S4 method for Column reverse(x) # S4 method for Column to_json(x, ...) # S4 method for Column to_csv(x, ...) # S4 method for Column concat(x, ...) # S4 method for Column,characterOrstructTypeOrColumn from_json(x, schema, as.json.array = FALSE, ...) # S4 method for characterOrColumn schema_of_json(x, ...) # S4 method for Column,characterOrstructTypeOrColumn from_csv(x, schema, ...) # S4 method for characterOrColumn schema_of_csv(x, ...) 
# S4 method for characterOrColumn,Column,`function` array_aggregate(x, initialValue, merge, finish = NULL) # S4 method for Column array_contains(x, value) # S4 method for Column array_distinct(x) # S4 method for Column,Column array_except(x, y) # S4 method for characterOrColumn,`function` array_exists(x, f) # S4 method for characterOrColumn,`function` array_filter(x, f) # S4 method for characterOrColumn,`function` array_forall(x, f) # S4 method for Column,Column array_intersect(x, y) # S4 method for Column,character array_join(x, delimiter, nullReplacement = NULL) # S4 method for Column array_max(x) # S4 method for Column array_min(x) # S4 method for Column array_position(x, value) # S4 method for Column array_remove(x, value) # S4 method for Column,numericOrColumn array_repeat(x, count) # S4 method for Column array_sort(x) # S4 method for characterOrColumn,`function` array_transform(x, f) # S4 method for Column,Column arrays_overlap(x, y) # S4 method for Column,Column array_union(x, y) # S4 method for Column arrays_zip(x, ...) # S4 method for characterOrColumn,characterOrColumn,`function` arrays_zip_with(x, y, f) # S4 method for Column shuffle(x) # S4 method for Column flatten(x) # S4 method for Column map_concat(x, ...) # S4 method for Column map_entries(x) # S4 method for characterOrColumn,`function` map_filter(x, f) # S4 method for Column,Column map_from_arrays(x, y) # S4 method for Column map_from_entries(x) # S4 method for Column map_keys(x) # S4 method for characterOrColumn,`function` transform_keys(x, f) # S4 method for characterOrColumn,`function` transform_values(x, f) # S4 method for Column map_values(x) # S4 method for characterOrColumn,characterOrColumn,`function` map_zip_with(x, y, f) # S4 method for Column element_at(x, extraction) # S4 method for Column explode(x) # S4 method for Column size(x) # S4 method for Column slice(x, start, length) # S4 method for Column sort_array(x, asc = TRUE) # S4 method for Column posexplode(x) # S4 method for Column explode_outer(x) # S4 method for Column posexplode_outer(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_collection_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Collection functions for Column operations — column_collection_functions","text":"x Column compute . Note difference following methods: to_json: column containing struct, array structs, map array maps. to_csv: column containing struct. from_json: column containing JSON string. from_csv: column containing CSV string. initialValue Column used initial value array_aggregate merge function binary function (Column, Column) -> Column used array_aggregateto merge values (second argument) accumulator (first argument). ... additional argument(s). to_json, from_json schema_of_json: contains additional named properties control converted accepts options JSON data source. can find JSON-specific options reading/writing JSON files https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-optionData Source Option version use. to_json: supports \"pretty\" option enables pretty JSON generation. to_csv, from_csv schema_of_csv: contains additional named properties control converted accepts options CSV data source. can find CSV-specific options reading/writing CSV files https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-optionData Source Option version use. arrays_zip, contains additional Columns arrays merged. map_concat, contains additional Columns maps unioned. 
value value compute . array_contains: value checked contained column. array_position: value locate given array. array_remove: value remove given array. y Column compute . f function mapping Column(s) Column. array_exists array_filter Boolean function used filter data. Either unary binary. latter case second argument index array (0-based). array_forall Boolean unary function used filter data. array_transform function used transform data. Either unary binary. latter case second argument index array (0-based). arrays_zip_with map_zip_with map_filter Boolean binary function used filter data. first argument key, second argument value. transform_keys binary function used transform data. first argument key, second argument value. transform_values binary function used transform data. first argument key, second argument value. delimiter character string used concatenate elements column. count Column constant determining number repetitions. extraction index check array key check map schema from_json: structType object use schema use parsing JSON string. Since Spark 2.3, DDL-formatted string also supported schema. Since Spark 3.0, schema_of_json DDL-formatted string literal can also accepted. from_csv: structType object, DDL-formatted string schema_of_csv start starting index length length slice asc logical flag indicating sorting order. TRUE, sorting ascending order. FALSE, sorting descending order. .json.array indicating input string JSON array objects single object. finish unary function (Column) -> Column used apply final transformation accumulated data array_aggregate. nullReplacement optional character string used replace Null values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_collection_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Collection functions for Column operations — column_collection_functions","text":"reverse: Returns reversed string array reverse order elements. to_json: Converts column containing structType, mapType arrayType Column JSON string. Resolving Column can fail unsupported type encountered. to_csv: Converts column containing structType Column CSV string. Resolving Column can fail unsupported type encountered. concat: Concatenates multiple input columns together single column. function works strings, binary compatible array columns. from_json: Parses column containing JSON string Column structType specified schema array structType .json.array set TRUE. string unparseable, Column contain value NA. schema_of_json: Parses JSON string infers schema DDL format. from_csv: Parses column containing CSV string Column structType specified schema. string unparseable, Column contain value NA. schema_of_csv: Parses CSV string infers schema DDL format. array_aggregate Applies binary operator initial state elements array, reduces single state. final state converted final result applying finish function. array_contains: Returns null array null, true array contains value, false otherwise. array_distinct: Removes duplicate values array. array_except: Returns array elements first array second array, without duplicates. order elements result determined. array_exists Returns whether predicate holds one elements array. array_filter Returns array elements predicate holds given array. array_forall Returns whether predicate holds every element array. array_intersect: Returns array elements intersection given two arrays, without duplicates. array_join: Concatenates elements column using delimiter. 
Null values replaced nullReplacement set, otherwise ignored. array_max: Returns maximum value array. array_min: Returns minimum value array. array_position: Locates position first occurrence given value given array. Returns NA either arguments NA. Note: position zero based, 1 based index. Returns 0 given value found array. array_remove: Removes elements equal element given array. array_repeat: Creates array containing x repeated number times given count. array_sort: Sorts input array ascending order. elements input array must orderable. NA elements placed end returned array. array_transform Returns array elements applying transformation element input array. arrays_overlap: Returns true input arrays least one non-null element common. arrays non-empty contains null, returns null. returns false otherwise. array_union: Returns array elements union given two arrays, without duplicates. arrays_zip: Returns merged array structs N-th struct contains N-th values input arrays. arrays_zip_with Merge two given arrays, element-wise, single array using function. one array shorter, nulls appended end match length longer array, applying function. shuffle: Returns random permutation given array. flatten: Creates single array array arrays. structure nested arrays deeper two levels, one level nesting removed. map_concat: Returns union given maps. map_entries: Returns unordered array entries given map. map_filter Returns map whose key-value pairs satisfy predicate. map_from_arrays: Creates new map column. array first column used keys. array second column used values. elements array key null. map_from_entries: Returns map created given array entries. map_keys: Returns unordered array containing keys map. transform_keys Applies function every key-value pair map returns map results applications new keys pairs. transform_values Applies function every key-value pair map returns map results applications new values pairs. map_values: Returns unordered array containing values map. map_zip Merge two given maps, key-wise single map using function. element_at: Returns element array given index extraction x array. Returns value given key extraction x map. Note: position zero based, 1 based index. explode: Creates new row element given array map column. Uses default column name col elements array key value elements map unless specified otherwise. size: Returns length array map. slice: Returns array containing elements x index start (array indices start 1, end start negative) specified length. sort_array: Sorts input array ascending descending order according natural ordering array elements. NA elements placed beginning returned array ascending order end returned array descending order. posexplode: Creates new row element position given array map column. Uses default column name pos position, col elements array key value elements map unless specified otherwise. explode: Creates new row element given array map column. Unlike explode, array/map null empty null produced. Uses default column name col elements array key value elements map unless specified otherwise. posexplode_outer: Creates new row element position given array map column. Unlike posexplode, array/map null empty row (null, null) produced. 
Uses default column name pos position, col elements array key value elements map unless specified otherwise.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_collection_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Collection functions for Column operations — column_collection_functions","text":"reverse since 1.5.0 to_json since 2.2.0 to_csv since 3.0.0 concat since 1.5.0 from_json since 2.2.0 schema_of_json since 3.0.0 from_csv since 3.0.0 schema_of_csv since 3.0.0 array_aggregate since 3.1.0 array_contains since 1.6.0 array_distinct since 2.4.0 array_except since 2.4.0 array_exists since 3.1.0 array_filter since 3.1.0 array_forall since 3.1.0 array_intersect since 2.4.0 array_join since 2.4.0 array_max since 2.4.0 array_min since 2.4.0 array_position since 2.4.0 array_remove since 2.4.0 array_repeat since 2.4.0 array_sort since 2.4.0 array_transform since 3.1.0 arrays_overlap since 2.4.0 array_union since 2.4.0 arrays_zip since 2.4.0 zip_with since 3.1.0 shuffle since 2.4.0 flatten since 2.4.0 map_concat since 3.0.0 map_entries since 3.0.0 map_filter since 3.1.0 map_from_arrays since 2.4.0 map_from_entries since 3.0.0 map_keys since 2.3.0 transform_keys since 3.1.0 transform_values since 3.1.0 map_values since 2.3.0 map_zip_with since 3.1.0 element_at since 2.4.0 explode since 1.5.0 size since 1.5.0 slice since 2.4.0 sort_array since 1.6.0 posexplode since 2.1.0 explode_outer since 2.3.0 posexplode_outer since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_collection_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Collection functions for Column operations — column_collection_functions","text":"","code":"if (FALSE) { # Dataframe used throughout this doc df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) tmp <- mutate(df, v1 = create_array(df$mpg, df$cyl, df$hp)) head(select(tmp, array_contains(tmp$v1, 21), size(tmp$v1), shuffle(tmp$v1))) head(select(tmp, array_max(tmp$v1), array_min(tmp$v1), array_distinct(tmp$v1))) head(select(tmp, array_position(tmp$v1, 21), array_repeat(df$mpg, 3), array_sort(tmp$v1))) head(select(tmp, reverse(tmp$v1), array_remove(tmp$v1, 21))) head(select(tmp, array_transform(\"v1\", function(x) x * 10))) head(select(tmp, array_exists(\"v1\", function(x) x > 120))) head(select(tmp, array_forall(\"v1\", function(x) x >= 8.0))) head(select(tmp, array_filter(\"v1\", function(x) x < 10))) head(select(tmp, array_aggregate(\"v1\", lit(0), function(acc, y) acc + y))) head(select( tmp, array_aggregate(\"v1\", lit(0), function(acc, y) acc + y, function(acc) acc / 10))) tmp2 <- mutate(tmp, v2 = explode(tmp$v1)) head(tmp2) head(select(tmp, posexplode(tmp$v1))) head(select(tmp, slice(tmp$v1, 2L, 2L))) head(select(tmp, sort_array(tmp$v1))) head(select(tmp, sort_array(tmp$v1, asc = FALSE))) tmp3 <- mutate(df, v3 = create_map(df$model, df$cyl)) head(select(tmp3, map_entries(tmp3$v3), map_keys(tmp3$v3), map_values(tmp3$v3))) head(select(tmp3, element_at(tmp3$v3, \"Valiant\"), map_concat(tmp3$v3, tmp3$v3))) head(select(tmp3, transform_keys(\"v3\", function(k, v) upper(k)))) head(select(tmp3, transform_values(\"v3\", function(k, v) v * 10))) head(select(tmp3, map_filter(\"v3\", function(k, v) v < 42))) tmp4 <- mutate(df, v4 = create_array(df$mpg, df$cyl), v5 = create_array(df$cyl, df$hp)) head(select(tmp4, concat(tmp4$v4, tmp4$v5), arrays_overlap(tmp4$v4, tmp4$v5))) head(select(tmp4, 
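  # array_except keeps elements of v4 not present in v5; array_intersect keeps the shared ones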
array_except(tmp4$v4, tmp4$v5), array_intersect(tmp4$v4, tmp4$v5))) head(select(tmp4, array_union(tmp4$v4, tmp4$v5))) head(select(tmp4, arrays_zip(tmp4$v4, tmp4$v5))) head(select(tmp, concat(df$mpg, df$cyl, df$hp))) head(select(tmp4, arrays_zip_with(tmp4$v4, tmp4$v5, function(x, y) x * y))) tmp5 <- mutate(df, v6 = create_array(df$model, df$model)) head(select(tmp5, array_join(tmp5$v6, \"#\"), array_join(tmp5$v6, \"#\", \"NULL\"))) tmp6 <- mutate(df, v7 = create_array(create_array(df$model, df$model))) head(select(tmp6, flatten(tmp6$v7))) tmp7 <- mutate(df, v8 = create_array(df$model, df$cyl), v9 = create_array(df$model, df$hp)) head(select(tmp7, arrays_zip_with(\"v8\", \"v9\", function(x, y) (x * y) %% 3))) head(select(tmp7, map_from_arrays(tmp7$v8, tmp7$v9))) tmp8 <- mutate(df, v10 = create_array(struct(df$model, df$cyl))) head(select(tmp8, map_from_entries(tmp8$v10)))} if (FALSE) { # Converts a struct into a JSON object df2 <- sql(\"SELECT named_struct('date', cast('2000-01-01' as date)) as d\") select(df2, to_json(df2$d, dateFormat = 'dd/MM/yyyy')) # Converts an array of structs into a JSON array df2 <- sql(\"SELECT array(named_struct('name', 'Bob'), named_struct('name', 'Alice')) as people\") df2 <- mutate(df2, people_json = to_json(df2$people)) # Converts a map into a JSON object df2 <- sql(\"SELECT map('name', 'Bob') as people\") df2 <- mutate(df2, people_json = to_json(df2$people)) # Converts an array of maps into a JSON array df2 <- sql(\"SELECT array(map('name', 'Bob'), map('name', 'Alice')) as people\") df2 <- mutate(df2, people_json = to_json(df2$people)) # Converts a map into a pretty JSON object df2 <- sql(\"SELECT map('name', 'Bob') as people\") df2 <- mutate(df2, people_json = to_json(df2$people, pretty = TRUE))} if (FALSE) { # Converts a struct into a CSV string df2 <- sql(\"SELECT named_struct('date', cast('2000-01-01' as date)) as d\") select(df2, to_csv(df2$d, dateFormat = 'dd/MM/yyyy'))} if (FALSE) { df2 <- sql(\"SELECT named_struct('date', cast('2000-01-01' as date)) as d\") df2 <- mutate(df2, d2 = to_json(df2$d, dateFormat = 'dd/MM/yyyy')) schema <- structType(structField(\"date\", \"string\")) head(select(df2, from_json(df2$d2, schema, dateFormat = 'dd/MM/yyyy'))) df2 <- sql(\"SELECT named_struct('name', 'Bob') as people\") df2 <- mutate(df2, people_json = to_json(df2$people)) schema <- structType(structField(\"name\", \"string\")) head(select(df2, from_json(df2$people_json, schema))) head(select(df2, from_json(df2$people_json, \"name STRING\"))) head(select(df2, from_json(df2$people_json, schema_of_json(head(df2)$people_json))))} if (FALSE) { json <- \"{\\\"name\\\":\\\"Bob\\\"}\" df <- sql(\"SELECT * FROM range(1)\") head(select(df, schema_of_json(json)))} if (FALSE) { csv <- \"Amsterdam,2018\" df <- sql(paste0(\"SELECT '\", csv, \"' as csv\")) schema <- \"city STRING, year INT\" head(select(df, from_csv(df$csv, schema))) head(select(df, from_csv(df$csv, structType(schema)))) head(select(df, from_csv(df$csv, schema_of_csv(csv))))} if (FALSE) { csv <- \"Amsterdam,2018\" df <- sql(\"SELECT * FROM range(1)\") head(select(df, schema_of_csv(csv)))} if (FALSE) { df2 <- createDataFrame(data.frame( id = c(1, 2, 3), text = c(\"a,b,c\", NA, \"d,e\") )) head(select(df2, df2$id, explode_outer(split_string(df2$text, \",\")))) head(select(df2, df2$id, posexplode_outer(split_string(df2$text, \",\"))))}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_diff_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Date time arithmetic 
functions for Column operations — column_datetime_diff_functions","title":"Date time arithmetic functions for Column operations — column_datetime_diff_functions","text":"Date time arithmetic functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_diff_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Date time arithmetic functions for Column operations — column_datetime_diff_functions","text":"","code":"add_months(y, x) datediff(y, x) date_add(y, x) date_format(y, x) date_sub(y, x) from_utc_timestamp(y, x) months_between(y, x, ...) next_day(y, x) to_utc_timestamp(y, x) # S4 method for Column datediff(y, x) # S4 method for Column months_between(y, x, roundOff = NULL) # S4 method for Column,character date_format(y, x) # S4 method for Column,character from_utc_timestamp(y, x) # S4 method for Column,character next_day(y, x) # S4 method for Column,character to_utc_timestamp(y, x) # S4 method for Column,numeric add_months(y, x) # S4 method for Column,numeric date_add(y, x) # S4 method for Column,numeric date_sub(y, x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_diff_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Date time arithmetic functions for Column operations — column_datetime_diff_functions","text":"y Column compute . x class Column, column used perform arithmetic operations column y. class numeric, number months days added subtracted y. class character, date_format: date format specification. from_utc_timestamp, to_utc_timestamp: string detailing time zone ID input adjusted . format either region-based zone IDs zone offsets. Region IDs must form 'area/city', 'America/Los_Angeles'. Zone offsets must format (+|-)HH:mm', example '-08:00' '+01:00'. Also 'UTC' 'Z' supported aliases '+00:00'. short names recommended use can ambiguous. next_day: day week string. ... additional argument(s). months_between, contains optional parameter specify result rounded 8 digits. roundOff optional parameter specify result rounded 8 digits","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_diff_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Date time arithmetic functions for Column operations — column_datetime_diff_functions","text":"datediff: Returns number days y x. y later x result positive. months_between: Returns number months dates y x. y later x, result positive. y x day month, last day month, time day ignored. Otherwise, difference calculated based 31 days per month, rounded 8 digits. date_format: Converts date/timestamp/string value string format specified date format given second argument. pattern instance dd.MM.yyyy return string like '18.03.1993'. pattern letters java.time.format.DateTimeFormatter can used. Note: Use ever possible specialized functions like year. benefit specialized implementation. from_utc_timestamp: common function databases supporting TIMESTAMP WITHOUT TIMEZONE. function takes timestamp timezone-agnostic, interprets timestamp UTC, renders timestamp timestamp given time zone. However, timestamp Spark represents number microseconds Unix epoch, timezone-agnostic. Spark function just shift timestamp value UTC timezone given timezone. function may return confusing result input string timezone, e.g. (2018-03-13T06:18:23+00:00). 
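A minimal usage sketch (an added illustration, not part of the reference text, assuming an active SparkR session; the literal timestamp and the region-based zone ID "America/Los_Angeles" are arbitrary choices, with region IDs preferred over short names as noted above):
# Shift a timezone-agnostic timestamp between UTC and a region-based zone
ts <- sql("SELECT cast('2018-03-13 06:18:23' AS timestamp) AS t")
head(select(ts,
            from_utc_timestamp(ts$t, "America/Los_Angeles"),  # treat t as UTC, render it in LA time
            to_utc_timestamp(ts$t, "America/Los_Angeles")))   # treat t as LA time, render it in UTC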
reason , Spark firstly cast string timestamp according timezone string, finally display result converting timestamp string according session local timezone. next_day: Given date column, returns first date later value date column specified day week. example, next_day(\"2015-07-27\", \"Sunday\") returns 2015-08-02 first Sunday 2015-07-27. Day week parameter case insensitive, accepts first three two characters: \"Mon\", \"Tue\", \"Wed\", \"Thu\", \"Fri\", \"Sat\", \"Sun\". to_utc_timestamp: common function databases supporting TIMESTAMP WITHOUT TIMEZONE. function takes timestamp timezone-agnostic, interprets timestamp given timezone, renders timestamp timestamp UTC. However, timestamp Spark represents number microseconds Unix epoch, timezone-agnostic. Spark function just shift timestamp value given timezone UTC timezone. function may return confusing result input string timezone, e.g. (2018-03-13T06:18:23+00:00). reason , Spark firstly cast string timestamp according timezone string, finally display result converting timestamp string according session local timezone. add_months: Returns date numMonths (x) startDate (y). date_add: Returns date x days . date_sub: Returns date x days .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_diff_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Date time arithmetic functions for Column operations — column_datetime_diff_functions","text":"datediff since 1.5.0 months_between since 1.5.0 date_format since 1.5.0 from_utc_timestamp since 1.5.0 next_day since 1.5.0 to_utc_timestamp since 1.5.0 add_months since 1.5.0 date_add since 1.5.0 date_sub since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_diff_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Date time arithmetic functions for Column operations — column_datetime_diff_functions","text":"","code":"if (FALSE) { dts <- c(\"2005-01-02 18:47:22\", \"2005-12-24 16:30:58\", \"2005-10-28 07:30:05\", \"2005-12-28 07:01:05\", \"2006-01-24 00:01:10\") y <- c(2.0, 2.2, 3.4, 2.5, 1.8) df <- createDataFrame(data.frame(time = as.POSIXct(dts), y = y))} if (FALSE) { tmp <- createDataFrame(data.frame(time_string1 = as.POSIXct(dts), time_string2 = as.POSIXct(dts[order(runif(length(dts)))]))) tmp2 <- mutate(tmp, datediff = datediff(tmp$time_string1, tmp$time_string2), monthdiff = months_between(tmp$time_string1, tmp$time_string2)) head(tmp2)} if (FALSE) { tmp <- mutate(df, from_utc = from_utc_timestamp(df$time, \"PST\"), to_utc = to_utc_timestamp(df$time, \"PST\")) head(tmp)} if (FALSE) { tmp <- mutate(df, t1 = add_months(df$time, 1), t2 = date_add(df$time, 2), t3 = date_sub(df$time, 3), t4 = next_day(df$time, \"Sun\")) head(tmp)}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Date time functions for Column operations — timestamp_seconds","title":"Date time functions for Column operations — timestamp_seconds","text":"Date time functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Date time functions for Column operations — timestamp_seconds","text":"","code":"current_date(x = \"missing\") current_timestamp(x = \"missing\") date_trunc(format, x) dayofmonth(x) 
dayofweek(x) dayofyear(x) from_unixtime(x, ...) hour(x) last_day(x) make_date(x, y, z) minute(x) month(x) quarter(x) second(x) timestamp_seconds(x) to_date(x, format) to_timestamp(x, format) unix_timestamp(x, format) weekofyear(x) window(x, ...) year(x) # S4 method for Column dayofmonth(x) # S4 method for Column dayofweek(x) # S4 method for Column dayofyear(x) # S4 method for Column hour(x) # S4 method for Column last_day(x) # S4 method for Column,Column,Column make_date(x, y, z) # S4 method for Column minute(x) # S4 method for Column month(x) # S4 method for Column quarter(x) # S4 method for Column second(x) # S4 method for Column,missing to_date(x, format) # S4 method for Column,character to_date(x, format) # S4 method for Column,missing to_timestamp(x, format) # S4 method for Column,character to_timestamp(x, format) # S4 method for Column weekofyear(x) # S4 method for Column year(x) # S4 method for Column from_unixtime(x, format = \"yyyy-MM-dd HH:mm:ss\") # S4 method for Column window(x, windowDuration, slideDuration = NULL, startTime = NULL) # S4 method for missing,missing unix_timestamp(x, format) # S4 method for Column,missing unix_timestamp(x, format) # S4 method for Column,character unix_timestamp(x, format = \"yyyy-MM-dd HH:mm:ss\") # S4 method for Column trunc(x, format) # S4 method for character,Column date_trunc(format, x) # S4 method for missing current_date() # S4 method for missing current_timestamp() # S4 method for Column timestamp_seconds(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Date time functions for Column operations — timestamp_seconds","text":"x Column compute . window, must time Column TimestampType. used current_date current_timestamp format format given dates timestamps Column x. See format used following methods: to_date to_timestamp: string use parse Column x DateType TimestampType. trunc: string use specify truncation method. 'year', 'yyyy', 'yy' truncate year, 'month', 'mon', 'mm' truncate month options : 'week', 'quarter' date_trunc: similar trunc's additionally supports 'day', 'dd' truncate day, 'microsecond', 'millisecond', 'second', 'minute' 'hour' ... additional argument(s). y Column compute . z Column compute . windowDuration string specifying width window, e.g. '1 second', '1 day 12 hours', '2 minutes'. Valid interval strings 'week', 'day', 'hour', 'minute', 'second', 'millisecond', 'microsecond'. Note duration fixed length time, vary time according calendar. example, '1 day' always means 86,400,000 milliseconds, calendar day. slideDuration string specifying sliding interval window. format windowDuration. new window generated every slideDuration. Must less equal windowDuration. duration likewise absolute, vary according calendar. startTime offset respect 1970-01-01 00:00:00 UTC start window intervals. example, order hourly tumbling windows start 15 minutes past hour, e.g. 12:15-13:15, 13:15-14:15... provide startTime \"15 minutes\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Date time functions for Column operations — timestamp_seconds","text":"dayofmonth: Extracts day month integer given date/timestamp/string. dayofweek: Extracts day week integer given date/timestamp/string. Ranges 1 Sunday 7 Saturday dayofyear: Extracts day year integer given date/timestamp/string. 
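For instance, a hedged sketch (assuming the SparkDataFrame df with a TimestampType column time and a numeric column y that is created in the examples further below):
# Average y per day of week (1 = Sunday, ..., 7 = Saturday)
head(agg(groupBy(df, dayofweek(df$time)), avg(df$y)))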
hour: Extracts hour integer given date/timestamp/string. last_day: Given date column, returns last day month given date belongs . example, input \"2015-07-27\" returns \"2015-07-31\" since July 31 last day month July 2015. make_date: Create date year, month day fields. minute: Extracts minute integer given date/timestamp/string. month: Extracts month integer given date/timestamp/string. quarter: Extracts quarter integer given date/timestamp/string. second: Extracts second integer given date/timestamp/string. to_date: Converts column DateType. may optionally specify format according rules : Datetime Pattern string parsed according specified format (default), value column null. default, follows casting rules DateType format omitted (equivalent cast(df$x, \"date\")). to_timestamp: Converts column TimestampType. may optionally specify format according rules : Datetime Pattern string parsed according specified format (default), value column null. default, follows casting rules TimestampType format omitted (equivalent cast(df$x, \"timestamp\")). weekofyear: Extracts week number integer given date/timestamp/string. week considered start Monday week 1 first week 3 days, defined ISO 8601 year: Extracts year integer given date/timestamp/string. from_unixtime: Converts number seconds unix epoch (1970-01-01 00:00:00 UTC) string representing timestamp moment current system time zone JVM given format. See Datetime Pattern available options. window: Bucketizes rows one time windows given timestamp specifying column. Window starts inclusive window ends exclusive, e.g. 12:05 window [12:05,12:10) [12:00,12:05). Windows can support microsecond precision. Windows order months supported. returns output column struct called 'window' default nested columns 'start' 'end' unix_timestamp: Gets current Unix timestamp seconds. trunc: Returns date truncated unit specified format. date_trunc: Returns timestamp truncated unit specified format. current_date: Returns current date start query evaluation date column. calls current_date within query return value. current_timestamp: Returns current timestamp start query evaluation timestamp column. calls current_timestamp within query return value. 
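As a usage sketch for window-based grouping (an added illustration, assuming the same df with columns time and y as in the examples below; the durations are arbitrary):
# Average y over 1-hour tumbling windows
head(agg(groupBy(df, window(df$time, "1 hour")), avg(df$y)))
# Average y over 1-hour windows that slide every 15 minutes
head(agg(groupBy(df, window(df$time, "1 hour", "15 minutes")), avg(df$y)))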
timestamp_seconds: Creates timestamp number seconds since UTC epoch.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Date time functions for Column operations — timestamp_seconds","text":"dayofmonth since 1.5.0 dayofweek since 2.3.0 dayofyear since 1.5.0 hour since 1.5.0 last_day since 1.5.0 make_date since 3.3.0 minute since 1.5.0 month since 1.5.0 quarter since 1.5.0 second since 1.5.0 to_date(Column) since 1.5.0 to_date(Column, character) since 2.2.0 to_timestamp(Column) since 2.2.0 to_timestamp(Column, character) since 2.2.0 weekofyear since 1.5.0 year since 1.5.0 from_unixtime since 1.5.0 window since 2.0.0 unix_timestamp since 1.5.0 unix_timestamp(Column) since 1.5.0 unix_timestamp(Column, character) since 1.5.0 trunc since 2.3.0 date_trunc since 2.3.0 current_date since 2.3.0 current_timestamp since 2.3.0 timestamp_seconds since 3.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_datetime_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Date time functions for Column operations — timestamp_seconds","text":"","code":"if (FALSE) { dts <- c(\"2005-01-02 18:47:22\", \"2005-12-24 16:30:58\", \"2005-10-28 07:30:05\", \"2005-12-28 07:01:05\", \"2006-01-24 00:01:10\") y <- c(2.0, 2.2, 3.4, 2.5, 1.8) df <- createDataFrame(data.frame(time = as.POSIXct(dts), y = y))} if (FALSE) { head(select(df, df$time, year(df$time), quarter(df$time), month(df$time), dayofmonth(df$time), dayofweek(df$time), dayofyear(df$time), weekofyear(df$time))) head(agg(groupBy(df, year(df$time)), count(df$y), avg(df$y))) head(agg(groupBy(df, month(df$time)), avg(df$y)))} if (FALSE) { head(select(df, hour(df$time), minute(df$time), second(df$time))) head(agg(groupBy(df, dayofmonth(df$time)), avg(df$y))) head(agg(groupBy(df, hour(df$time)), avg(df$y))) head(agg(groupBy(df, minute(df$time)), avg(df$y)))} if (FALSE) { head(select(df, df$time, last_day(df$time), month(df$time)))} if (FALSE) { df <- createDataFrame( list(list(2021, 10, 22), list(2021, 13, 1), list(2021, 2, 29), list(2020, 2, 29)), list(\"year\", \"month\", \"day\") ) tmp <- head(select(df, make_date(df$year, df$month, df$day))) head(tmp)} if (FALSE) { tmp <- createDataFrame(data.frame(time_string = dts)) tmp2 <- mutate(tmp, date1 = to_date(tmp$time_string), date2 = to_date(tmp$time_string, \"yyyy-MM-dd\"), date3 = date_format(tmp$time_string, \"MM/dd/yyy\"), time1 = to_timestamp(tmp$time_string), time2 = to_timestamp(tmp$time_string, \"yyyy-MM-dd\")) head(tmp2)} if (FALSE) { tmp <- mutate(df, to_unix = unix_timestamp(df$time), to_unix2 = unix_timestamp(df$time, 'yyyy-MM-dd HH'), from_unix = from_unixtime(unix_timestamp(df$time)), from_unix2 = from_unixtime(unix_timestamp(df$time), 'yyyy-MM-dd HH:mm')) head(tmp)} if (FALSE) { # One minute windows every 15 seconds 10 seconds after the minute, e.g. 09:00:10-09:01:10, # 09:00:25-09:01:25, 09:00:40-09:01:40, ... window(df$time, \"1 minute\", \"15 seconds\", \"10 seconds\") # One minute tumbling windows 15 seconds after the minute, e.g. 09:00:15-09:01:15, # 09:01:15-09:02:15... window(df$time, \"1 minute\", startTime = \"15 seconds\") # Thirty-second windows every 10 seconds, e.g. 09:00:00-09:00:30, 09:00:10-09:00:40, ... 
window(df$time, \"30 seconds\", \"10 seconds\")} if (FALSE) { head(select(df, df$time, trunc(df$time, \"year\"), trunc(df$time, \"yy\"), trunc(df$time, \"month\"), trunc(df$time, \"mon\")))} if (FALSE) { head(select(df, df$time, date_trunc(\"hour\", df$time), date_trunc(\"minute\", df$time), date_trunc(\"week\", df$time), date_trunc(\"quarter\", df$time)))} if (FALSE) { head(select(df, current_date(), current_timestamp()))}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_math_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Math functions for Column operations — column_math_functions","title":"Math functions for Column operations — column_math_functions","text":"Math functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_math_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Math functions for Column operations — column_math_functions","text":"","code":"bin(x) bround(x, ...) cbrt(x) ceil(x) conv(x, fromBase, toBase) cot(x) csc(x) hex(x) hypot(y, x) pmod(y, x) rint(x) sec(x) shiftLeft(y, x) shiftleft(y, x) shiftRight(y, x) shiftright(y, x) shiftRightUnsigned(y, x) shiftrightunsigned(y, x) signum(x) degrees(x) toDegrees(x) radians(x) toRadians(x) unhex(x) # S4 method for Column abs(x) # S4 method for Column acos(x) # S4 method for Column acosh(x) # S4 method for Column asin(x) # S4 method for Column asinh(x) # S4 method for Column atan(x) # S4 method for Column atanh(x) # S4 method for Column bin(x) # S4 method for Column cbrt(x) # S4 method for Column ceil(x) # S4 method for Column ceiling(x) # S4 method for Column cos(x) # S4 method for Column cosh(x) # S4 method for Column cot(x) # S4 method for Column csc(x) # S4 method for Column exp(x) # S4 method for Column expm1(x) # S4 method for Column factorial(x) # S4 method for Column floor(x) # S4 method for Column hex(x) # S4 method for Column log(x) # S4 method for Column log10(x) # S4 method for Column log1p(x) # S4 method for Column log2(x) # S4 method for Column rint(x) # S4 method for Column round(x) # S4 method for Column bround(x, scale = 0) # S4 method for Column signum(x) # S4 method for Column sign(x) # S4 method for Column sin(x) # S4 method for Column sinh(x) # S4 method for Column sqrt(x) # S4 method for Column tan(x) # S4 method for Column tanh(x) # S4 method for Column toDegrees(x) # S4 method for Column degrees(x) # S4 method for Column toRadians(x) # S4 method for Column radians(x) # S4 method for Column unhex(x) # S4 method for Column atan2(y, x) # S4 method for Column hypot(y, x) # S4 method for Column pmod(y, x) # S4 method for Column,numeric shiftleft(y, x) # S4 method for Column,numeric shiftLeft(y, x) # S4 method for Column,numeric shiftright(y, x) # S4 method for Column,numeric shiftRight(y, x) # S4 method for Column,numeric shiftrightunsigned(y, x) # S4 method for Column,numeric shiftRightUnsigned(y, x) # S4 method for Column,numeric,numeric conv(x, fromBase, toBase) # S4 method for Column sec(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_math_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Math functions for Column operations — column_math_functions","text":"x Column compute . shiftLeft, shiftRight shiftRightUnsigned, number bits shift. ... additional argument(s). fromBase base convert . toBase base convert . y Column compute . 
scale round scale digits right decimal point scale > 0, nearest even number scale = 0, scale digits left decimal point scale < 0.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_math_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Math functions for Column operations — column_math_functions","text":"abs: Computes absolute value. acos: Returns inverse cosine given value, computed java.lang.Math.acos() acosh: Computes inverse hyperbolic cosine input column. asin: Returns inverse sine given value, computed java.lang.Math.asin() asinh: Computes inverse hyperbolic sine input column. atan: Returns inverse tangent given value, computed java.lang.Math.atan() atanh: Computes inverse hyperbolic tangent input column. bin: Returns string representation binary value given long column. example, bin(\"12\") returns \"1100\". cbrt: Computes cube-root given value. ceil: Computes ceiling given value. ceiling: Alias ceil. cos: Returns cosine given value, computed java.lang.Math.cos(). Units radians. cosh: Returns hyperbolic cosine given value, computed java.lang.Math.cosh(). cot: Returns cotangent given value. csc: Returns cosecant given value. exp: Computes exponential given value. expm1: Computes exponential given value minus one. factorial: Computes factorial given value. floor: Computes floor given value. hex: Computes hex value given column. log: Computes natural logarithm given value. log10: Computes logarithm given value base 10. log1p: Computes natural logarithm given value plus one. log2: Computes logarithm given column base 2. rint: Returns double value closest value argument equal mathematical integer. round: Returns value column rounded 0 decimal places using HALF_UP rounding mode. bround: Returns value column e rounded scale decimal places using HALF_EVEN rounding mode scale >= 0 integer part scale < 0. Also known Gaussian rounding bankers' rounding rounds nearest even number. bround(2.5, 0) = 2, bround(3.5, 0) = 4. signum: Computes signum given value. sign: Alias signum. sin: Returns sine given value, computed java.lang.Math.sin(). Units radians. sinh: Returns hyperbolic sine given value, computed java.lang.Math.sinh(). sqrt: Computes square root specified float value. tan: Returns tangent given value, computed java.lang.Math.tan(). Units radians. tanh: Returns hyperbolic tangent given value, computed java.lang.Math.tanh(). toDegrees: Converts angle measured radians approximately equivalent angle measured degrees. degrees: Converts angle measured radians approximately equivalent angle measured degrees. toRadians: Converts angle measured degrees approximately equivalent angle measured radians. radians: Converts angle measured degrees approximately equivalent angle measured radians. unhex: Inverse hex. Interprets pair characters hexadecimal number converts byte representation number. atan2: Returns angle theta conversion rectangular coordinates (x, y) polar coordinates (r, theta), computed java.lang.Math.atan2(). Units radians. hypot: Computes \"sqrt(^2 + b^2)\" without intermediate overflow underflow. pmod: Returns positive value dividend mod divisor. Column x divisor column, column y dividend column. shiftleft: Shifts given value numBits left. given value long value, function return long value else return integer value. shiftLeft: Shifts given value numBits left. given value long value, function return long value else return integer value. shiftright: (Signed) shifts given value numBits right. 
given value long value, return long value else return integer value. shiftRight: (Signed) shifts given value numBits right. given value long value, return long value else return integer value. shiftrightunsigned: (Unsigned) shifts given value numBits right. given value long value, return long value else return integer value. shiftRightUnsigned: (Unsigned) shifts given value numBits right. given value long value, return long value else return integer value. conv: Converts number string column one base another. sec: Returns secant given value.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_math_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Math functions for Column operations — column_math_functions","text":"abs since 1.5.0 acos since 1.5.0 acosh since 3.1.0 asin since 1.5.0 asinh since 3.1.0 atan since 1.5.0 atanh since 3.1.0 bin since 1.5.0 cbrt since 1.4.0 ceil since 1.5.0 ceiling since 1.5.0 cos since 1.5.0 cosh since 1.5.0 cot since 3.3.0 csc since 3.3.0 exp since 1.5.0 expm1 since 1.5.0 factorial since 1.5.0 floor since 1.5.0 hex since 1.5.0 log since 1.5.0 log10 since 1.5.0 log1p since 1.5.0 log2 since 1.5.0 rint since 1.5.0 round since 1.5.0 bround since 2.0.0 signum since 1.5.0 sign since 1.5.0 sin since 1.5.0 sinh since 1.5.0 sqrt since 1.5.0 tan since 1.5.0 tanh since 1.5.0 toDegrees since 1.4.0 degrees since 3.0.0 toRadians since 1.4.0 radians since 3.0.0 unhex since 1.5.0 atan2 since 1.5.0 hypot since 1.4.0 pmod since 1.5.0 shiftleft since 3.2.0 shiftLeft since 1.5.0 shiftright since 3.2.0 shiftRight since 1.5.0 shiftrightunsigned since 3.2.0 shiftRightUnsigned since 1.5.0 conv since 1.5.0 sec since 3.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_math_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Math functions for Column operations — column_math_functions","text":"","code":"if (FALSE) { # Dataframe used throughout this doc df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) tmp <- mutate(df, v1 = log(df$mpg), v2 = cbrt(df$disp), v3 = bround(df$wt, 1), v4 = bin(df$cyl), v5 = hex(df$wt), v6 = degrees(df$gear), v7 = atan2(df$cyl, df$am), v8 = hypot(df$cyl, df$am), v9 = pmod(df$hp, df$cyl), v10 = shiftLeft(df$disp, 1), v11 = conv(df$hp, 10, 16), v12 = sign(df$vs - 0.5), v13 = sqrt(df$disp), v14 = ceil(df$wt)) head(tmp)}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_misc_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Miscellaneous functions for Column operations — column_misc_functions","title":"Miscellaneous functions for Column operations — column_misc_functions","text":"Miscellaneous functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_misc_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Miscellaneous functions for Column operations — column_misc_functions","text":"","code":"assert_true(x, errMsg = NULL) crc32(x) hash(x, ...) md5(x) raise_error(x) sha1(x) sha2(y, x) xxhash64(x, ...) # S4 method for Column crc32(x) # S4 method for Column hash(x, ...) # S4 method for Column xxhash64(x, ...) 
# S4 method for Column assert_true(x, errMsg = NULL) # S4 method for characterOrColumn raise_error(x) # S4 method for Column md5(x) # S4 method for Column sha1(x) # S4 method for Column,numeric sha2(y, x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_misc_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Miscellaneous functions for Column operations — column_misc_functions","text":"x Column compute . sha2, one 224, 256, 384, 512. errMsg (optional) error message thrown. ... additional Columns. y Column compute .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_misc_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Miscellaneous functions for Column operations — column_misc_functions","text":"crc32: Calculates cyclic redundancy check value (CRC32) binary column returns value bigint. hash: Calculates hash code given columns, returns result int column. xxhash64: Calculates hash code given columns using 64-bit variant xxHash algorithm, returns result long column. assert_true: Returns null input column true; throws exception provided error message otherwise. raise_error: Throws exception provided error message. md5: Calculates MD5 digest binary column returns value 32 character hex string. sha1: Calculates SHA-1 digest binary column returns value 40 character hex string. sha2: Calculates SHA-2 family hash functions binary column returns value hex string. second argument x specifies number bits, one 224, 256, 384, 512.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_misc_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Miscellaneous functions for Column operations — column_misc_functions","text":"crc32 since 1.5.0 hash since 2.0.0 xxhash64 since 3.0.0 assert_true since 3.1.0 raise_error since 3.1.0 md5 since 1.5.0 sha1 since 1.5.0 sha2 since 1.5.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_misc_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Miscellaneous functions for Column operations — column_misc_functions","text":"","code":"if (FALSE) { # Dataframe used throughout this doc df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)[, 1:2]) tmp <- mutate(df, v1 = crc32(df$model), v2 = hash(df$model), v3 = hash(df$model, df$mpg), v4 = md5(df$model), v5 = sha1(df$model), v6 = sha2(df$model, 256)) head(tmp)} if (FALSE) { tmp <- mutate(df, v1 = assert_true(df$vs < 2), v2 = assert_true(df$vs < 2, \"custom error message\"), v3 = assert_true(df$vs < 2, df$vs)) head(tmp)} if (FALSE) { tmp <- mutate(df, v1 = raise_error(\"error message\")) head(tmp)}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_ml_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"ML functions for Column operations — column_ml_functions","title":"ML functions for Column operations — column_ml_functions","text":"ML functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_ml_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"ML functions for Column operations — column_ml_functions","text":"","code":"array_to_vector(x) vector_to_array(x, ...) 
# S4 method for Column array_to_vector(x) # S4 method for Column vector_to_array(x, dtype = c(\"float64\", \"float32\"))"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_ml_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"ML functions for Column operations — column_ml_functions","text":"x Column compute . ... additional argument(s). dtype data type output array. Valid values: \"float64\" \"float32\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_ml_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"ML functions for Column operations — column_ml_functions","text":"array_to_vector Converts column array numeric type column dense vectors MLlib vector_to_array Converts column MLlib sparse/dense vectors column dense arrays.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_ml_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"ML functions for Column operations — column_ml_functions","text":"array_to_vector since 3.1.0 vector_to_array since 3.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_ml_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"ML functions for Column operations — column_ml_functions","text":"","code":"if (FALSE) { df <- read.df(\"data/mllib/sample_libsvm_data.txt\", source = \"libsvm\") head( withColumn( withColumn(df, \"array\", vector_to_array(df$features)), \"vector\", array_to_vector(column(\"array\")) ) ) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_nonaggregate_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Non-aggregate functions for Column operations — column_nonaggregate_functions","title":"Non-aggregate functions for Column operations — column_nonaggregate_functions","text":"Non-aggregate functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_nonaggregate_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Non-aggregate functions for Column operations — column_nonaggregate_functions","text":"","code":"when(condition, value) bitwise_not(x) bitwiseNOT(x) create_array(x, ...) create_map(x, ...) expr(x) greatest(x, ...) input_file_name(x = \"missing\") isnan(x) least(x, ...) lit(x) monotonically_increasing_id(x = \"missing\") nanvl(y, x) negate(x) rand(seed) randn(seed) spark_partition_id(x = \"missing\") struct(x, ...) # S4 method for ANY lit(x) # S4 method for Column bitwise_not(x) # S4 method for Column bitwiseNOT(x) # S4 method for Column coalesce(x, ...) # S4 method for Column isnan(x) # S4 method for Column is.nan(x) # S4 method for missing monotonically_increasing_id() # S4 method for Column negate(x) # S4 method for missing spark_partition_id() # S4 method for characterOrColumn struct(x, ...) # S4 method for Column nanvl(y, x) # S4 method for Column greatest(x, ...) # S4 method for Column least(x, ...) # S4 method for character expr(x) # S4 method for missing rand(seed) # S4 method for numeric rand(seed) # S4 method for missing randn(seed) # S4 method for numeric randn(seed) # S4 method for Column when(condition, value) # S4 method for Column ifelse(test, yes, no) # S4 method for Column create_array(x, ...) # S4 method for Column create_map(x, ...) 
# S4 method for missing input_file_name()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_nonaggregate_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Non-aggregate functions for Column operations — column_nonaggregate_functions","text":"condition condition test . Must Column expression. value result expression. x Column compute . lit, literal value Column. expr, contains expression character object parsed. ... additional Columns. y Column compute . seed random seed. Can missing. test Column expression describes condition. yes return values TRUE elements test. return values FALSE elements test.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_nonaggregate_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Non-aggregate functions for Column operations — column_nonaggregate_functions","text":"lit: new Column created represent literal value. parameter Column, returned unchanged. bitwise_not: Computes bitwise . bitwiseNOT: Computes bitwise . coalesce: Returns first column NA, NA inputs . isnan: Returns true column NaN. .nan: Alias isnan. monotonically_increasing_id: Returns column generates monotonically increasing 64-bit integers. generated ID guaranteed monotonically increasing unique, consecutive. current implementation puts partition ID upper 31 bits, record number within partition lower 33 bits. assumption SparkDataFrame less 1 billion partitions, partition less 8 billion records. example, consider SparkDataFrame two partitions, 3 records. expression return following IDs: 0, 1, 2, 8589934592 (1L << 33), 8589934593, 8589934594. equivalent MONOTONICALLY_INCREASING_ID function SQL. method used argument. Note: function non-deterministic result depends partition IDs. negate: Unary minus, .e. negate expression. spark_partition_id: Returns partition ID SparkDataFrame column. Note nondeterministic depends data partitioning task scheduling. equivalent SPARK_PARTITION_ID function SQL. struct: Creates new struct column composes multiple input columns. nanvl: Returns first column (y) NaN, second column (x) first column NaN. inputs floating point columns (DoubleType FloatType). greatest: Returns greatest value list column names, skipping null values. function takes least 2 parameters. return null parameters null. least: Returns least value list column names, skipping null values. function takes least 2 parameters. return null parameters null. expr: Parses expression string column represents, similar SparkDataFrame.selectExpr rand: Generates random column independent identically distributed (..d.) samples uniformly distributed [0.0, 1.0). Note: function non-deterministic general case. randn: Generates column independent identically distributed (..d.) samples standard normal distribution. Note: function non-deterministic general case. : Evaluates list conditions returns one multiple possible result expressions. unmatched expressions null returned. ifelse: Evaluates list conditions returns yes conditions satisfied. Otherwise returned unmatched conditions. create_array: Creates new array column. input columns must data type. create_map: Creates new map column. input columns must grouped key-value pairs, e.g. (key1, value1, key2, value2, ...). key columns must data type, null. value columns must data type. input_file_name: Creates string column input file name given row. 
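To make the ID layout described above concrete, a minimal added sketch (assuming the mtcars-based df from the examples below; repartition() is used here only to force more than one partition):
# IDs increase within and across partitions but are not consecutive:
# the partition ID occupies the upper 31 bits, the row number within the partition the lower 33 bits
df2p <- repartition(df, 2L)
head(select(df2p, df2p$model, monotonically_increasing_id()))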
method used argument.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_nonaggregate_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Non-aggregate functions for Column operations — column_nonaggregate_functions","text":"lit since 1.5.0 bitwise_not since 3.2.0 bitwiseNOT since 1.5.0 coalesce(Column) since 2.1.1 isnan since 2.0.0 .nan since 2.0.0 negate since 1.5.0 spark_partition_id since 2.0.0 struct since 1.6.0 nanvl since 1.5.0 greatest since 1.5.0 least since 1.5.0 expr since 1.5.0 rand since 1.5.0 rand(numeric) since 1.5.0 randn since 1.5.0 randn(numeric) since 1.5.0 since 1.5.0 ifelse since 1.5.0 create_array since 2.3.0 create_map since 2.3.0 input_file_name since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_nonaggregate_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Non-aggregate functions for Column operations — column_nonaggregate_functions","text":"","code":"if (FALSE) { # Dataframe used throughout this doc df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))} if (FALSE) { tmp <- mutate(df, v1 = lit(df$mpg), v2 = lit(\"x\"), v3 = lit(\"2015-01-01\"), v4 = negate(df$mpg), v5 = expr('length(model)'), v6 = greatest(df$vs, df$am), v7 = least(df$vs, df$am), v8 = column(\"mpg\")) head(tmp)} if (FALSE) { head(select(df, bitwise_not(cast(df$vs, \"int\"))))} if (FALSE) head(select(df, monotonically_increasing_id())) if (FALSE) head(select(df, spark_partition_id())) if (FALSE) { tmp <- mutate(df, v1 = struct(df$mpg, df$cyl), v2 = struct(\"hp\", \"wt\", \"vs\"), v3 = create_array(df$mpg, df$cyl, df$hp), v4 = create_map(lit(\"x\"), lit(1.0), lit(\"y\"), lit(-1.0))) head(tmp)} if (FALSE) { tmp <- mutate(df, r1 = rand(), r2 = rand(10), r3 = randn(), r4 = randn(10)) head(tmp)} if (FALSE) { tmp <- mutate(df, mpg_na = otherwise(when(df$mpg > 20, df$mpg), lit(NaN)), mpg2 = ifelse(df$mpg > 20 & df$am > 0, 0, 1), mpg3 = ifelse(df$mpg > 20, df$mpg, 20.0)) head(tmp) tmp <- mutate(tmp, ind_na1 = is.nan(tmp$mpg_na), ind_na2 = isnan(tmp$mpg_na)) head(select(tmp, coalesce(tmp$mpg_na, tmp$mpg))) head(select(tmp, nanvl(tmp$mpg_na, tmp$hp)))} if (FALSE) { tmp <- read.text(\"README.md\") head(select(tmp, input_file_name()))}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_string_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"String functions for Column operations — column_string_functions","title":"String functions for Column operations — column_string_functions","text":"String functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_string_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"String functions for Column operations — column_string_functions","text":"","code":"ascii(x) base64(x) bit_length(x, ...) concat_ws(sep, x, ...) decode(x, charset) encode(x, charset) format_number(y, x) format_string(format, x, ...) initcap(x) instr(y, x) levenshtein(y, x) locate(substr, str, ...) lower(x) lpad(x, len, pad) ltrim(x, trimString) octet_length(x, ...) overlay(x, replace, pos, ...) regexp_extract(x, pattern, idx) regexp_replace(x, pattern, replacement) repeat_string(x, n) rpad(x, len, pad) rtrim(x, trimString) split_string(x, pattern, ...) 
soundex(x) substring_index(x, delim, count) translate(x, matchingString, replaceString) trim(x, trimString) unbase64(x) upper(x) # S4 method for Column ascii(x) # S4 method for Column base64(x) # S4 method for Column bit_length(x) # S4 method for Column,character decode(x, charset) # S4 method for Column,character encode(x, charset) # S4 method for Column initcap(x) # S4 method for Column length(x) # S4 method for Column lower(x) # S4 method for Column,missing ltrim(x, trimString) # S4 method for Column,character ltrim(x, trimString) # S4 method for Column octet_length(x) # S4 method for Column,Column,numericOrColumn overlay(x, replace, pos, len = -1) # S4 method for Column,missing rtrim(x, trimString) # S4 method for Column,character rtrim(x, trimString) # S4 method for Column soundex(x) # S4 method for Column,missing trim(x, trimString) # S4 method for Column,character trim(x, trimString) # S4 method for Column unbase64(x) # S4 method for Column upper(x) # S4 method for Column levenshtein(y, x) # S4 method for Column,character instr(y, x) # S4 method for Column,numeric format_number(y, x) # S4 method for character,Column concat_ws(sep, x, ...) # S4 method for character,Column format_string(format, x, ...) # S4 method for character,Column locate(substr, str, pos = 1) # S4 method for Column,numeric,character lpad(x, len, pad) # S4 method for Column,character,numeric regexp_extract(x, pattern, idx) # S4 method for Column,character,character regexp_replace(x, pattern, replacement) # S4 method for Column,numeric,character rpad(x, len, pad) # S4 method for Column,character,numeric substring_index(x, delim, count) # S4 method for Column,character,character translate(x, matchingString, replaceString) # S4 method for Column,character split_string(x, pattern, limit = -1) # S4 method for Column,numeric repeat_string(x, n)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_string_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"String functions for Column operations — column_string_functions","text":"x Column compute except following methods: instr: character, substring check. See 'Details'. format_number: numeric, number decimal place format . See 'Details'. ... additional Columns. sep separator use. charset character set use (one \"US-ASCII\", \"ISO-8859-1\", \"UTF-8\", \"UTF-16BE\", \"UTF-16LE\", \"UTF-16\"). y Column compute . format character object format strings. substr character string matched. str Column matches sought entry. len lpad maximum length output result. overlay number bytes replace. pad character string padded . trimString character string trim replace Column replacement. pos locate: start position search. overlay: start position replacement. pattern regular expression. idx group index. replacement character string matched pattern replaced . n number repetitions. delim delimiter string. count number occurrences delim substring returned. positive number means counting left, negative means counting right. matchingString source string character translated. replaceString target string matchingString character replaced character replaceString location, . limit determines length returned array. 
limit > 0: length array limit limit <= 0: returned array can length","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_string_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"String functions for Column operations — column_string_functions","text":"ascii: Computes numeric value first character string column, returns result int column. base64: Computes BASE64 encoding binary column returns string column. reverse unbase64. bit_length: Calculates bit length specified string column. decode: Computes first argument string binary using provided character set. encode: Computes first argument binary string using provided character set. initcap: Returns new string column converting first letter word uppercase. Words delimited whitespace. example, \"hello world\" become \"Hello World\". length: Computes character length string data number bytes binary data. length string data includes trailing spaces. length binary data includes binary zeros. lower: Converts string column lower case. ltrim: Trims spaces left end specified string value. Optionally trimString can specified. octet_length: Calculates byte length specified string column. overlay: Overlay specified portion x replace, starting byte position pos src proceeding len bytes. rtrim: Trims spaces right end specified string value. Optionally trimString can specified. soundex: Returns soundex code specified expression. trim: Trims spaces ends specified string column. Optionally trimString can specified. unbase64: Decodes BASE64 encoded string column returns binary column. reverse base64. upper: Converts string column upper case. levenshtein: Computes Levenshtein distance two given string columns. instr: Locates position first occurrence substring (x) given string column (y). Returns null either arguments null. Note: position zero based, 1 based index. Returns 0 substring found string column. format_number: Formats numeric column y format like '#,###,###.##', rounded x decimal places HALF_EVEN round mode, returns result string column. x 0, result decimal point fractional part. x < 0, result null. concat_ws: Concatenates multiple input string columns together single string column, using given separator. format_string: Formats arguments printf-style returns result string column. locate: Locates position first occurrence substr. Note: position zero based, 1 based index. Returns 0 substr found str. lpad: Left-padded pad length len. regexp_extract: Extracts specific idx group identified Java regex, specified string column. regex match, specified group match, empty string returned. regexp_replace: Replaces substrings specified string value match regexp rep. rpad: Right-padded pad length len. substring_index: Returns substring string (x) count occurrences delimiter (delim). count positive, everything left final delimiter (counting left) returned. count negative, every right final delimiter (counting right) returned. substring_index performs case-sensitive match searching delimiter. translate: Translates character src character replaceString. characters replaceString corresponding characters matchingString. translate happen character string matching character matchingString. split_string: Splits string regular expression. Equivalent split SQL function. Optionally limit can specified repeat_string: Repeats string n times. 
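Since overlay does not appear in the examples below, a minimal added sketch (assuming an active SparkR session; the string literals are arbitrary):
# Overlay 'CORE' onto 'Spark SQL' starting at position 7 -> "Spark CORE"
ov <- sql("SELECT 'Spark SQL' AS src, 'CORE' AS rep")
head(select(ov, overlay(ov$src, ov$rep, 7)))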
Equivalent repeat SQL function.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_string_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"String functions for Column operations — column_string_functions","text":"ascii since 1.5.0 base64 since 1.5.0 length since 3.3.0 decode since 1.6.0 encode since 1.6.0 initcap since 1.5.0 length since 1.5.0 lower since 1.4.0 ltrim since 1.5.0 ltrim(Column, character) since 2.3.0 length since 3.3.0 overlay since 3.0.0 rtrim since 1.5.0 rtrim(Column, character) since 2.3.0 soundex since 1.5.0 trim since 1.5.0 trim(Column, character) since 2.3.0 unbase64 since 1.5.0 upper since 1.4.0 levenshtein since 1.5.0 instr since 1.5.0 format_number since 1.5.0 concat_ws since 1.5.0 format_string since 1.5.0 locate since 1.5.0 lpad since 1.5.0 regexp_extract since 1.5.0 regexp_replace since 1.5.0 rpad since 1.5.0 substring_index since 1.5.0 translate since 1.5.0 split_string 2.3.0 repeat_string since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_string_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"String functions for Column operations — column_string_functions","text":"","code":"if (FALSE) { # Dataframe used throughout this doc df <- createDataFrame(as.data.frame(Titanic, stringsAsFactors = FALSE))} if (FALSE) { head(select(df, ascii(df$Class), ascii(df$Sex)))} if (FALSE) { tmp <- mutate(df, s1 = encode(df$Class, \"UTF-8\")) str(tmp) tmp2 <- mutate(tmp, s2 = base64(tmp$s1), s3 = decode(tmp$s1, \"UTF-8\"), s4 = soundex(tmp$Sex)) head(tmp2) head(select(tmp2, unbase64(tmp2$s2)))} if (FALSE) { tmp <- mutate(df, sex_lower = lower(df$Sex), age_upper = upper(df$age), sex_age = concat_ws(\" \", lower(df$sex), lower(df$age))) head(tmp) tmp2 <- mutate(tmp, s1 = initcap(tmp$sex_lower), s2 = initcap(tmp$sex_age), s3 = reverse(df$Sex)) head(tmp2)} if (FALSE) { tmp <- mutate(df, SexLpad = lpad(df$Sex, 6, \" \"), SexRpad = rpad(df$Sex, 7, \" \")) head(select(tmp, length(tmp$Sex), length(tmp$SexLpad), length(tmp$SexRpad))) tmp2 <- mutate(tmp, SexLtrim = ltrim(tmp$SexLpad), SexRtrim = rtrim(tmp$SexRpad), SexTrim = trim(tmp$SexLpad)) head(select(tmp2, length(tmp2$Sex), length(tmp2$SexLtrim), length(tmp2$SexRtrim), length(tmp2$SexTrim))) tmp <- mutate(df, SexLpad = lpad(df$Sex, 6, \"xx\"), SexRpad = rpad(df$Sex, 7, \"xx\")) head(tmp)} if (FALSE) { tmp <- mutate(df, d1 = levenshtein(df$Class, df$Sex), d2 = levenshtein(df$Age, df$Sex), d3 = levenshtein(df$Age, df$Age)) head(tmp)} if (FALSE) { tmp <- mutate(df, s1 = instr(df$Sex, \"m\"), s2 = instr(df$Sex, \"M\"), s3 = locate(\"m\", df$Sex), s4 = locate(\"m\", df$Sex, pos = 4)) head(tmp)} if (FALSE) { tmp <- mutate(df, v1 = df$Freq/3) head(select(tmp, format_number(tmp$v1, 0), format_number(tmp$v1, 2), format_string(\"%4.2f %s\", tmp$v1, tmp$Sex)), 10)} if (FALSE) { # concatenate strings tmp <- mutate(df, s1 = concat_ws(\"_\", df$Class, df$Sex), s2 = concat_ws(\"+\", df$Class, df$Sex, df$Age, df$Survived)) head(tmp)} if (FALSE) { tmp <- mutate(df, s1 = regexp_extract(df$Class, \"(\\\\d+)\\\\w+\", 1), s2 = regexp_extract(df$Sex, \"^(\\\\w)\\\\w+\", 1), s3 = regexp_replace(df$Class, \"\\\\D+\", \"\"), s4 = substring_index(df$Sex, \"a\", 1), s5 = substring_index(df$Sex, \"a\", -1), s6 = translate(df$Sex, \"ale\", \"\"), s7 = translate(df$Sex, \"a\", \"-\")) head(tmp)} if (FALSE) { head(select(df, split_string(df$Class, \"\\\\d\", 2))) head(select(df, split_string(df$Sex, 
\"a\"))) head(select(df, split_string(df$Class, \"\\\\d\"))) # This is equivalent to the following SQL expression head(selectExpr(df, \"split(Class, '\\\\\\\\d')\"))} if (FALSE) { head(select(df, repeat_string(df$Class, 3))) # This is equivalent to the following SQL expression head(selectExpr(df, \"repeat(Class, 3)\"))}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_window_functions.html","id":null,"dir":"Reference","previous_headings":"","what":"Window functions for Column operations — column_window_functions","title":"Window functions for Column operations — column_window_functions","text":"Window functions defined Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_window_functions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Window functions for Column operations — column_window_functions","text":"","code":"cume_dist(x = \"missing\") dense_rank(x = \"missing\") lag(x, ...) lead(x, offset, defaultValue = NULL) nth_value(x, offset, ...) ntile(x) percent_rank(x = \"missing\") rank(x, ...) row_number(x = \"missing\") # S4 method for missing cume_dist() # S4 method for missing dense_rank() # S4 method for characterOrColumn lag(x, offset = 1, defaultValue = NULL) # S4 method for characterOrColumn,numeric lead(x, offset = 1, defaultValue = NULL) # S4 method for characterOrColumn,numeric nth_value(x, offset, na.rm = FALSE) # S4 method for numeric ntile(x) # S4 method for missing percent_rank() # S4 method for missing rank() # S4 method for ANY rank(x, ...) # S4 method for missing row_number()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_window_functions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Window functions for Column operations — column_window_functions","text":"x lag lead, column character string Column compute . ntile, number ntile groups. ... additional argument(s). offset numeric indicating number row use value defaultValue (optional) default use offset row exist. na.rm logical indicates Nth value skip null determination row use","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_window_functions.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Window functions for Column operations — column_window_functions","text":"cume_dist: Returns cumulative distribution values within window partition, .e. fraction rows current row: (number values including x) / (total number rows partition). equivalent CUME_DIST function SQL. method used argument. dense_rank: Returns rank rows within window partition, without gaps. difference rank dense_rank dense_rank leaves gaps ranking sequence ties. , ranking competition using dense_rank three people tie second place, say three second place next person came third. Rank give sequential numbers, making person came third place (ties) register coming fifth. equivalent DENSE_RANK function SQL. method used argument. lag: Returns value offset rows current row, defaultValue less offset rows current row. example, offset one return previous row given point window partition. equivalent LAG function SQL. lead: Returns value offset rows current row, defaultValue less offset rows current row. example, offset one return next row given point window partition. equivalent LEAD function SQL. nth_value: Window function: returns value offsetth row window frame# (counting 1), null size window frame less offset rows. 
ntile: Returns ntile group id (1 n inclusive) ordered window partition. example, n 4, first quarter rows get value 1, second quarter get 2, third quarter get 3, last quarter get 4. equivalent NTILE function SQL. percent_rank: Returns relative rank (.e. percentile) rows within window partition. computed : (rank row partition - 1) / (number rows partition - 1). equivalent PERCENT_RANK function SQL. method used argument. rank: Returns rank rows within window partition. difference rank dense_rank dense_rank leaves gaps ranking sequence ties. , ranking competition using dense_rank three people tie second place, say three second place next person came third. Rank give sequential numbers, making person came third place (ties) register coming fifth. equivalent RANK function SQL. method used argument. row_number: Returns sequential number starting 1 within window partition. equivalent ROW_NUMBER function SQL. method used argument.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_window_functions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Window functions for Column operations — column_window_functions","text":"cume_dist since 1.6.0 dense_rank since 1.6.0 lag since 1.6.0 lead since 1.6.0 nth_value since 3.1.0 ntile since 1.6.0 percent_rank since 1.6.0 rank since 1.6.0 row_number since 1.6.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/column_window_functions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Window functions for Column operations — column_window_functions","text":"","code":"if (FALSE) { # Dataframe used throughout this doc df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) ws <- orderBy(windowPartitionBy(\"am\"), \"hp\") tmp <- mutate(df, dist = over(cume_dist(), ws), dense_rank = over(dense_rank(), ws), lag = over(lag(df$mpg), ws), lead = over(lead(df$mpg, 1), ws), percent_rank = over(percent_rank(), ws), rank = over(rank(), ws), row_number = over(row_number(), ws), nth_value = over(nth_value(df$mpg, 3), ws)) # Get ntile group id (1-4) for hp tmp <- mutate(tmp, ntile = over(ntile(4), ws)) head(tmp)}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columnfunctions.html","id":null,"dir":"Reference","previous_headings":"","what":"A set of operations working with SparkDataFrame columns — asc","title":"A set of operations working with SparkDataFrame columns — asc","text":"set operations working SparkDataFrame columns","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columnfunctions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"A set of operations working with SparkDataFrame columns — asc","text":"","code":"asc(x) asc_nulls_first(x) asc_nulls_last(x) contains(x, ...) desc(x) desc_nulls_first(x) desc_nulls_last(x) getField(x, ...) getItem(x, ...) isNaN(x) isNull(x) isNotNull(x) like(x, ...) rlike(x, ...) ilike(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columnfunctions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"A set of operations working with SparkDataFrame columns — asc","text":"x Column object. ... 
additional argument(s).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columns.html","id":null,"dir":"Reference","previous_headings":"","what":"Column Names of SparkDataFrame — colnames","title":"Column Names of SparkDataFrame — colnames","text":"Return vector column names.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columns.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Column Names of SparkDataFrame — colnames","text":"","code":"colnames(x, do.NULL = TRUE, prefix = \"col\") colnames(x) <- value columns(x) # S4 method for SparkDataFrame columns(x) # S4 method for SparkDataFrame names(x) # S4 method for SparkDataFrame names(x) <- value # S4 method for SparkDataFrame colnames(x) # S4 method for SparkDataFrame colnames(x) <- value"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columns.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Column Names of SparkDataFrame — colnames","text":"x SparkDataFrame. .NULL currently used. prefix currently used. value character vector. Must length number columns renamed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columns.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Column Names of SparkDataFrame — colnames","text":"columns since 1.4.0 names since 1.5.0 names<- since 1.5.0 colnames since 1.6.0 colnames<- since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/columns.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Column Names of SparkDataFrame — colnames","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) columns(df) colnames(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/corr.html","id":null,"dir":"Reference","previous_headings":"","what":"corr — corr","title":"corr — corr","text":"Computes Pearson Correlation Coefficient two Columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/corr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"corr — corr","text":"","code":"corr(x, ...) # S4 method for Column corr(x, col2) # S4 method for SparkDataFrame corr(x, colName1, colName2, method = \"pearson\")"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/corr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"corr — corr","text":"x Column SparkDataFrame. ... additional argument(s). x Column, Column provided. x SparkDataFrame, two column names provided. col2 (second) Column. colName1 name first column colName2 name second column method Optional. character specifying method calculating correlation. 
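A hedged sketch of the Column helpers listed above (asc, isNotNull, like, rlike) together with colnames; the data and patterns are illustrative only and assume an active session:

df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
colnames(df)                                                     # column names of the SparkDataFrame
head(arrange(df, asc(df$mpg)))                                   # sort ascending by mpg
head(filter(df, isNotNull(df$model) & like(df$model, "Merc%")))  # SQL LIKE pattern
head(filter(df, rlike(df$model, "^(Fiat|Honda)")))               # regular-expression match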
\"pearson\" allowed now.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/corr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"corr — corr","text":"Pearson Correlation Coefficient Double.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/corr.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"corr — corr","text":"corr since 1.6.0 corr since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/corr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"corr — corr","text":"","code":"if (FALSE) { df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) head(select(df, corr(df$mpg, df$hp)))} if (FALSE) { corr(df, \"mpg\", \"hp\") corr(df, \"mpg\", \"hp\", method = \"pearson\")}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/count.html","id":null,"dir":"Reference","previous_headings":"","what":"Count — count","title":"Count — count","text":"Count number rows group GroupedData input. resulting SparkDataFrame also contain grouping columns. can used column aggregate function Column input, returns number items group.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/count.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Count — count","text":"","code":"count(x) n(x) # S4 method for GroupedData count(x) # S4 method for Column count(x) # S4 method for Column n(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/count.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Count — count","text":"x GroupedData Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/count.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Count — count","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/count.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Count — count","text":"count since 1.4.0 count since 1.4.0 n since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/count.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Count — count","text":"","code":"if (FALSE) { count(groupBy(df, \"name\")) } if (FALSE) count(df$c) if (FALSE) n(df$c)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cov.html","id":null,"dir":"Reference","previous_headings":"","what":"cov — cov","title":"cov — cov","text":"Compute covariance two expressions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cov.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"cov — cov","text":"","code":"cov(x, ...) covar_samp(col1, col2) covar_pop(col1, col2) # S4 method for characterOrColumn cov(x, col2) # S4 method for characterOrColumn,characterOrColumn covar_samp(col1, col2) # S4 method for characterOrColumn,characterOrColumn covar_pop(col1, col2) # S4 method for SparkDataFrame cov(x, colName1, colName2)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cov.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"cov — cov","text":"x Column SparkDataFrame. ... additional argument(s). x Column, Column provided. x SparkDataFrame, two column names provided. col1 first Column. col2 second Column. 
colName1 name first column colName2 name second column","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cov.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"cov — cov","text":"covariance two columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cov.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"cov — cov","text":"cov: Compute sample covariance two expressions. covar_sample: Alias cov. covar_pop: Computes population covariance two expressions. cov: applied SparkDataFrame, calculates sample covariance two numerical columns one SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cov.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"cov — cov","text":"cov since 1.6.0 covar_samp since 2.0.0 covar_pop since 2.0.0 cov since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cov.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"cov — cov","text":"","code":"if (FALSE) { df <- createDataFrame(cbind(model = rownames(mtcars), mtcars)) head(select(df, cov(df$mpg, df$hp), cov(\"mpg\", \"hp\"), covar_samp(df$mpg, df$hp), covar_samp(\"mpg\", \"hp\"), covar_pop(df$mpg, df$hp), covar_pop(\"mpg\", \"hp\")))} if (FALSE) { cov(df, \"mpg\", \"hp\") cov(df, df$mpg, df$hp)}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createDataFrame.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a SparkDataFrame — createDataFrame","title":"Create a SparkDataFrame — createDataFrame","text":"Converts R data.frame list SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createDataFrame.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a SparkDataFrame — createDataFrame","text":"","code":"createDataFrame(data, schema = NULL, samplingRatio = 1, numPartitions = NULL) as.DataFrame(data, schema = NULL, samplingRatio = 1, numPartitions = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createDataFrame.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a SparkDataFrame — createDataFrame","text":"data list data.frame. schema list column names named list (StructType), optional. samplingRatio Currently used. numPartitions number partitions SparkDataFrame. 
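A brief, hedged illustration of corr, cov, covar_samp and covar_pop on two numeric columns of mtcars:

df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(select(df, corr(df$mpg, df$hp), covar_samp(df$mpg, df$hp), covar_pop(df$mpg, df$hp)))
corr(df, "mpg", "hp", method = "pearson")   # SparkDataFrame method: returns a single numeric
cov(df, "mpg", "hp")                        # sample covariance of the two columns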
Defaults 1, limited length list number rows data.frame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createDataFrame.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a SparkDataFrame — createDataFrame","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createDataFrame.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create a SparkDataFrame — createDataFrame","text":"createDataFrame since 1.4.0 .DataFrame since 1.6.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createDataFrame.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a SparkDataFrame — createDataFrame","text":"","code":"if (FALSE) { sparkR.session() df1 <- as.DataFrame(iris) df2 <- as.DataFrame(list(3,4,5,6)) df3 <- createDataFrame(iris) df4 <- createDataFrame(cars, numPartitions = 2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createExternalTable-deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"(Deprecated) Create an external table — createExternalTable","title":"(Deprecated) Create an external table — createExternalTable","text":"Creates external table based dataset data source, Returns SparkDataFrame associated external table.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createExternalTable-deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Deprecated) Create an external table — createExternalTable","text":"","code":"createExternalTable(tableName, path = NULL, source = NULL, schema = NULL, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createExternalTable-deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Deprecated) Create an external table — createExternalTable","text":"tableName name table. path path files load. source name external data source. schema schema data required data sources. ... additional argument(s) passed method.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createExternalTable-deprecated.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"(Deprecated) Create an external table — createExternalTable","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createExternalTable-deprecated.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"(Deprecated) Create an external table — createExternalTable","text":"data source specified source set options(...). 
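A minimal sketch of createDataFrame/as.DataFrame; the partition count of 4 is an arbitrary choice for illustration:

sparkR.session()
df1 <- createDataFrame(mtcars, numPartitions = 4)   # control partitioning up front
getNumPartitions(df1)                               # 4
df2 <- as.DataFrame(iris)                           # as.DataFrame is an alias
head(df2)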
source specified, default data source configured \"spark.sql.sources.default\" used.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createExternalTable-deprecated.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"(Deprecated) Create an external table — createExternalTable","text":"createExternalTable since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createExternalTable-deprecated.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"(Deprecated) Create an external table — createExternalTable","text":"","code":"if (FALSE) { sparkR.session() df <- createExternalTable(\"myjson\", path=\"path/to/json\", source=\"json\", schema) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createOrReplaceTempView.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates a temporary view using the given name. — createOrReplaceTempView","title":"Creates a temporary view using the given name. — createOrReplaceTempView","text":"Creates new temporary view using SparkDataFrame Spark Session. temporary view name already exists, replaces .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createOrReplaceTempView.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates a temporary view using the given name. — createOrReplaceTempView","text":"","code":"createOrReplaceTempView(x, viewName) # S4 method for SparkDataFrame,character createOrReplaceTempView(x, viewName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createOrReplaceTempView.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates a temporary view using the given name. — createOrReplaceTempView","text":"x SparkDataFrame viewName character vector containing name table","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createOrReplaceTempView.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Creates a temporary view using the given name. — createOrReplaceTempView","text":"createOrReplaceTempView since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createOrReplaceTempView.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Creates a temporary view using the given name. — createOrReplaceTempView","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) createOrReplaceTempView(df, \"json_df\") new_df <- sql(\"SELECT * FROM json_df\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createTable.html","id":null,"dir":"Reference","previous_headings":"","what":"Creates a table based on the dataset in a data source — createTable","title":"Creates a table based on the dataset in a data source — createTable","text":"Creates table based dataset data source. 
Returns SparkDataFrame associated table.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createTable.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Creates a table based on the dataset in a data source — createTable","text":"","code":"createTable(tableName, path = NULL, source = NULL, schema = NULL, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createTable.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Creates a table based on the dataset in a data source — createTable","text":"tableName qualified unqualified name designates table. database identifier provided, refers table current database. path (optional) path files load. source (optional) name data source. schema (optional) schema data required data sources. ... additional named parameters options data source.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createTable.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Creates a table based on the dataset in a data source — createTable","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createTable.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Creates a table based on the dataset in a data source — createTable","text":"data source specified source set options(...). source specified, default data source configured \"spark.sql.sources.default\" used. path specified, external table created data given path. Otherwise managed table created.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createTable.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Creates a table based on the dataset in a data source — createTable","text":"createTable since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/createTable.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Creates a table based on the dataset in a data source — createTable","text":"","code":"if (FALSE) { sparkR.session() df <- createTable(\"myjson\", path=\"path/to/json\", source=\"json\", schema) createTable(\"people\", source = \"json\", schema = schema) insertInto(df, \"people\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/create_lambda.html","id":null,"dir":"Reference","previous_headings":"","what":"Create o.a.s.sql.expressions.LambdaFunction corresponding\nto transformation described by func.\nUsed by higher order functions. — create_lambda","title":"Create o.a.s.sql.expressions.LambdaFunction corresponding\nto transformation described by func.\nUsed by higher order functions. — create_lambda","text":"Create o..s.sql.expressions.LambdaFunction corresponding transformation described func. Used higher order functions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/create_lambda.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create o.a.s.sql.expressions.LambdaFunction corresponding\nto transformation described by func.\nUsed by higher order functions. 
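A hedged sketch of createTable; the table name and path are hypothetical, and omitting path would create a managed table instead of an external one:

sparkR.session()
df <- createTable("people", path = "path/to/people_json", source = "json")
head(sql("SELECT * FROM people"))   # the table is also queryable through SQL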
— create_lambda","text":"","code":"create_lambda(fun)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/create_lambda.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create o.a.s.sql.expressions.LambdaFunction corresponding\nto transformation described by func.\nUsed by higher order functions. — create_lambda","text":"fun R function (unary, binary ternary) transforms Columns Column","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/create_lambda.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create o.a.s.sql.expressions.LambdaFunction corresponding\nto transformation described by func.\nUsed by higher order functions. — create_lambda","text":"JVM LambdaFunction object","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crossJoin.html","id":null,"dir":"Reference","previous_headings":"","what":"CrossJoin — crossJoin","title":"CrossJoin — crossJoin","text":"Returns Cartesian Product two SparkDataFrames.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crossJoin.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"CrossJoin — crossJoin","text":"","code":"# S4 method for SparkDataFrame,SparkDataFrame crossJoin(x, y)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crossJoin.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"CrossJoin — crossJoin","text":"x SparkDataFrame y SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crossJoin.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"CrossJoin — crossJoin","text":"SparkDataFrame containing result join operation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crossJoin.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"CrossJoin — crossJoin","text":"crossJoin since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crossJoin.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"CrossJoin — crossJoin","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) crossJoin(df1, df2) # Performs a Cartesian }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crosstab.html","id":null,"dir":"Reference","previous_headings":"","what":"Computes a pair-wise frequency table of the given columns — crosstab","title":"Computes a pair-wise frequency table of the given columns — crosstab","text":"Computes pair-wise frequency table given columns. Also known contingency table. number distinct values column less 1e4. 1e6 non-zero pair frequencies returned.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crosstab.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Computes a pair-wise frequency table of the given columns — crosstab","text":"","code":"# S4 method for SparkDataFrame,character,character crosstab(x, col1, col2)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crosstab.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Computes a pair-wise frequency table of the given columns — crosstab","text":"x SparkDataFrame col1 name first column. Distinct items make first item row. col2 name second column. 
Distinct items make column names output.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crosstab.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Computes a pair-wise frequency table of the given columns — crosstab","text":"local R data.frame representing contingency table. first column row distinct values col1 column names distinct values col2. name first column \"col1_col2\". Pairs occurrences zero counts.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crosstab.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Computes a pair-wise frequency table of the given columns — crosstab","text":"crosstab since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/crosstab.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Computes a pair-wise frequency table of the given columns — crosstab","text":"","code":"if (FALSE) { df <- read.json(\"/path/to/file.json\") ct <- crosstab(df, \"title\", \"gender\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cube.html","id":null,"dir":"Reference","previous_headings":"","what":"cube — cube","title":"cube — cube","text":"Create multi-dimensional cube SparkDataFrame using specified columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cube.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"cube — cube","text":"","code":"cube(x, ...) # S4 method for SparkDataFrame cube(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cube.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"cube — cube","text":"x SparkDataFrame. ... 
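A short sketch of crosstab on two low-cardinality mtcars columns; note that the result is a local R data.frame, not a SparkDataFrame:

df <- createDataFrame(mtcars)
ct <- crosstab(df, "cyl", "gear")   # first column is named "cyl_gear"
ct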
character name(s) Column(s) group .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cube.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"cube — cube","text":"GroupedData.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cube.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"cube — cube","text":"grouping expression missing cube creates single global aggregate equivalent direct application agg.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cube.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"cube — cube","text":"cube since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/cube.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"cube — cube","text":"","code":"if (FALSE) { df <- createDataFrame(mtcars) mean(cube(df, \"cyl\", \"gear\", \"am\"), \"mpg\") # Following calls are equivalent agg(cube(df), mean(df$mpg)) agg(df, mean(df$mpg)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/currentDatabase.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns the current default database — currentDatabase","title":"Returns the current default database — currentDatabase","text":"Returns current default database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/currentDatabase.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns the current default database — currentDatabase","text":"","code":"currentDatabase()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/currentDatabase.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns the current default database — currentDatabase","text":"name current default database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/currentDatabase.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns the current default database — currentDatabase","text":"since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/currentDatabase.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns the current default database — currentDatabase","text":"","code":"if (FALSE) { sparkR.session() currentDatabase() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapply.html","id":null,"dir":"Reference","previous_headings":"","what":"dapply — dapply","title":"dapply — dapply","text":"Apply function partition SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapply.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"dapply — dapply","text":"","code":"dapply(x, func, schema) # S4 method for SparkDataFrame,`function`,characterOrstructType dapply(x, func, schema)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapply.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"dapply — dapply","text":"x SparkDataFrame func function applied partition SparkDataFrame. func one parameter, R data.frame corresponds partition passed. output func R data.frame. schema schema resulting SparkDataFrame function applied. must match output func. 
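A hedged sketch of cube, showing that it adds subtotal rows (NULL in the grouped columns) on top of the plain group-by result:

df <- createDataFrame(mtcars)
res <- agg(cube(df, "cyl", "am"), avg(df$mpg))
head(arrange(res, "cyl", "am"), 20)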
Since Spark 2.3, DDL-formatted string also supported schema.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapply.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"dapply — dapply","text":"dapply since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapply.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"dapply — dapply","text":"","code":"if (FALSE) { df <- createDataFrame(iris) df1 <- dapply(df, function(x) { x }, schema(df)) collect(df1) # filter and add a column df <- createDataFrame( list(list(1L, 1, \"1\"), list(2L, 2, \"2\"), list(3L, 3, \"3\")), c(\"a\", \"b\", \"c\")) schema <- structType(structField(\"a\", \"integer\"), structField(\"b\", \"double\"), structField(\"c\", \"string\"), structField(\"d\", \"integer\")) df1 <- dapply( df, function(x) { y <- x[x[1] > 1, ] y <- cbind(y, y[1] + 1L) }, schema) # The schema also can be specified in a DDL-formatted string. schema <- \"a INT, d DOUBLE, c STRING, d INT\" df1 <- dapply( df, function(x) { y <- x[x[1] > 1, ] y <- cbind(y, y[1] + 1L) }, schema) collect(df1) # the result # a b c d # 1 2 2 2 3 # 2 3 3 3 4 }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapplyCollect.html","id":null,"dir":"Reference","previous_headings":"","what":"dapplyCollect — dapplyCollect","title":"dapplyCollect — dapplyCollect","text":"Apply function partition SparkDataFrame collect result back R data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapplyCollect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"dapplyCollect — dapplyCollect","text":"","code":"dapplyCollect(x, func) # S4 method for SparkDataFrame,`function` dapplyCollect(x, func)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapplyCollect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"dapplyCollect — dapplyCollect","text":"x SparkDataFrame func function applied partition SparkDataFrame. func one parameter, R data.frame corresponds partition passed. output func R data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapplyCollect.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"dapplyCollect — dapplyCollect","text":"dapplyCollect since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dapplyCollect.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"dapplyCollect — dapplyCollect","text":"","code":"if (FALSE) { df <- createDataFrame(iris) ldf <- dapplyCollect(df, function(x) { x }) # filter and add a column df <- createDataFrame( list(list(1L, 1, \"1\"), list(2L, 2, \"2\"), list(3L, 3, \"3\")), c(\"a\", \"b\", \"c\")) ldf <- dapplyCollect( df, function(x) { y <- x[x[1] > 1, ] y <- cbind(y, y[1] + 1L) }) # the result # a b c d # 2 2 2 3 # 3 3 3 4 }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/describe.html","id":null,"dir":"Reference","previous_headings":"","what":"describe — describe","title":"describe — describe","text":"Computes statistics numeric string columns. 
columns given, function computes statistics numerical string columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/describe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"describe — describe","text":"","code":"describe(x, col, ...) # S4 method for SparkDataFrame,character describe(x, col, ...) # S4 method for SparkDataFrame,ANY describe(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/describe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"describe — describe","text":"x SparkDataFrame computed. col string name. ... additional expressions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/describe.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"describe — describe","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/describe.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"describe — describe","text":"describe(SparkDataFrame, character) since 1.4.0 describe(SparkDataFrame) since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/describe.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"describe — describe","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) describe(df) describe(df, \"col1\") describe(df, \"col1\", \"col2\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dim.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns the dimensions of SparkDataFrame — dim","title":"Returns the dimensions of SparkDataFrame — dim","text":"Returns dimensions (number rows columns) SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dim.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns the dimensions of SparkDataFrame — dim","text":"","code":"# S4 method for SparkDataFrame dim(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dim.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns the dimensions of SparkDataFrame — dim","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dim.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns the dimensions of SparkDataFrame — dim","text":"dim since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dim.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns the dimensions of SparkDataFrame — dim","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) dim(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/distinct.html","id":null,"dir":"Reference","previous_headings":"","what":"Distinct — distinct","title":"Distinct — distinct","text":"Return new SparkDataFrame containing distinct rows SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/distinct.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Distinct — distinct","text":"","code":"distinct(x) # S4 method for SparkDataFrame distinct(x) # S4 method for SparkDataFrame 
unique(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/distinct.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Distinct — distinct","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/distinct.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Distinct — distinct","text":"distinct since 1.4.0 unique since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/distinct.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Distinct — distinct","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) distinctDF <- distinct(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/drop.html","id":null,"dir":"Reference","previous_headings":"","what":"drop — drop","title":"drop — drop","text":"Returns new SparkDataFrame columns dropped. -op schema contain column name(s).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/drop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"drop — drop","text":"","code":"drop(x, ...) # S4 method for SparkDataFrame drop(x, col) # S4 method for ANY drop(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/drop.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"drop — drop","text":"x SparkDataFrame. ... arguments passed methods. col character vector column names Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/drop.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"drop — drop","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/drop.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"drop — drop","text":"drop since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/drop.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"drop — drop","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) drop(df, \"col1\") drop(df, c(\"col1\", \"col2\")) drop(df, df$col1) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropDuplicates.html","id":null,"dir":"Reference","previous_headings":"","what":"dropDuplicates — dropDuplicates","title":"dropDuplicates — dropDuplicates","text":"Returns new SparkDataFrame duplicate rows removed, considering subset columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropDuplicates.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"dropDuplicates — dropDuplicates","text":"","code":"dropDuplicates(x, ...) # S4 method for SparkDataFrame dropDuplicates(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropDuplicates.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"dropDuplicates — dropDuplicates","text":"x SparkDataFrame. ... character vector column names string column names. 
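A minimal sketch of distinct and drop, assuming the usual mtcars-based SparkDataFrame:

df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
nrow(distinct(select(df, "cyl", "gear")))   # number of distinct (cyl, gear) combinations
head(drop(df, c("qsec", "vs")))             # drop two columns by name
head(drop(df, df$carb))                     # or by Column reference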
first argument contains character vector, followings ignored.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropDuplicates.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"dropDuplicates — dropDuplicates","text":"SparkDataFrame duplicate rows removed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropDuplicates.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"dropDuplicates — dropDuplicates","text":"dropDuplicates since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropDuplicates.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"dropDuplicates — dropDuplicates","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) dropDuplicates(df) dropDuplicates(df, \"col1\", \"col2\") dropDuplicates(df, c(\"col1\", \"col2\")) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropFields.html","id":null,"dir":"Reference","previous_headings":"","what":"dropFields — dropFields","title":"dropFields — dropFields","text":"Drops fields struct Column name.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropFields.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"dropFields — dropFields","text":"","code":"dropFields(x, ...) # S4 method for Column dropFields(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropFields.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"dropFields — dropFields","text":"x Column ... names fields dropped.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropFields.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"dropFields — dropFields","text":"dropFields since 3.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropFields.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"dropFields — dropFields","text":"","code":"if (FALSE) { df <- select( createDataFrame(iris), alias( struct( column(\"Sepal_Width\"), column(\"Sepal_Length\"), alias( struct( column(\"Petal_Width\"), column(\"Petal_Length\"), alias( column(\"Petal_Width\") * column(\"Petal_Length\"), \"Petal_Product\" ) ), \"Petal\" ) ), \"dimensions\" ) ) head(withColumn(df, \"dimensions\", dropFields(df$dimensions, \"Petal\"))) head( withColumn( df, \"dimensions\", dropFields(df$dimensions, \"Sepal_Width\", \"Sepal_Length\") ) ) # This method supports dropping multiple nested fields directly e.g. head( withColumn( df, \"dimensions\", dropFields(df$dimensions, \"Petal.Petal_Width\", \"Petal.Petal_Length\") ) ) # However, if you are going to add/replace multiple nested fields, # it is preferred to extract out the nested struct before # adding/replacing multiple fields e.g. head( withColumn( df, \"dimensions\", withField( column(\"dimensions\"), \"Petal\", dropFields(column(\"dimensions.Petal\"), \"Petal_Width\", \"Petal_Length\") ) ) ) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempTable-deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"(Deprecated) Drop Temporary Table — dropTempTable","title":"(Deprecated) Drop Temporary Table — dropTempTable","text":"Drops temporary table given table name catalog. 
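A short sketch of dropDuplicates restricted to a subset of columns; which row survives within each pair is not guaranteed:

df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(dropDuplicates(df, "cyl", "gear"))      # one row kept per (cyl, gear) pair
head(dropDuplicates(df, c("cyl", "gear")))   # equivalent: a single character vector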
table cached/persisted , also unpersisted.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempTable-deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Deprecated) Drop Temporary Table — dropTempTable","text":"","code":"dropTempTable(tableName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempTable-deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Deprecated) Drop Temporary Table — dropTempTable","text":"tableName name SparkSQL table dropped.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempTable-deprecated.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"(Deprecated) Drop Temporary Table — dropTempTable","text":"dropTempTable since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempTable-deprecated.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"(Deprecated) Drop Temporary Table — dropTempTable","text":"","code":"if (FALSE) { sparkR.session() df <- read.df(path, \"parquet\") createOrReplaceTempView(df, \"table\") dropTempTable(\"table\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempView.html","id":null,"dir":"Reference","previous_headings":"","what":"Drops the temporary view with the given view name in the catalog. — dropTempView","title":"Drops the temporary view with the given view name in the catalog. — dropTempView","text":"Drops temporary view given view name catalog. view cached , also uncached.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempView.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Drops the temporary view with the given view name in the catalog. — dropTempView","text":"","code":"dropTempView(viewName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempView.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Drops the temporary view with the given view name in the catalog. — dropTempView","text":"viewName name temporary view dropped.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempView.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Drops the temporary view with the given view name in the catalog. — dropTempView","text":"TRUE view dropped successfully, FALSE otherwise.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempView.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Drops the temporary view with the given view name in the catalog. — dropTempView","text":"since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dropTempView.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Drops the temporary view with the given view name in the catalog. 
— dropTempView","text":"","code":"if (FALSE) { sparkR.session() df <- read.df(path, \"parquet\") createOrReplaceTempView(df, \"table\") dropTempView(\"table\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dtypes.html","id":null,"dir":"Reference","previous_headings":"","what":"DataTypes — dtypes","title":"DataTypes — dtypes","text":"Return column names data types list","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dtypes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"DataTypes — dtypes","text":"","code":"dtypes(x) # S4 method for SparkDataFrame dtypes(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dtypes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"DataTypes — dtypes","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dtypes.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"DataTypes — dtypes","text":"dtypes since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/dtypes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"DataTypes — dtypes","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) dtypes(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/endsWith.html","id":null,"dir":"Reference","previous_headings":"","what":"endsWith — endsWith","title":"endsWith — endsWith","text":"Determines entries x end string (entries ) suffix respectively, strings recycled common lengths.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/endsWith.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"endsWith — endsWith","text":"","code":"endsWith(x, suffix) # S4 method for Column endsWith(x, suffix)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/endsWith.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"endsWith — endsWith","text":"x vector character string whose \"ends\" considered suffix character vector (often length one)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/endsWith.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"endsWith — endsWith","text":"endsWith since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/eq_null_safe.html","id":null,"dir":"Reference","previous_headings":"","what":"%<=>% — %<=>%","title":"%<=>% — %<=>%","text":"Equality test safe null values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/eq_null_safe.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"%<=>% — %<=>%","text":"","code":"x %<=>% value # S4 method for Column %<=>%(x, value)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/eq_null_safe.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"%<=>% — %<=>%","text":"x Column value value compare","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/eq_null_safe.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"%<=>% — %<=>%","text":"Can used, unlike standard equality operator, perform null-safe joins. 
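A brief sketch of dtypes and endsWith (the endsWith entry above carries no example of its own); the suffix is arbitrary:

df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
dtypes(df)                                  # list of (column, type) pairs
head(filter(df, endsWith(df$model, "4")))   # models whose name ends in "4"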
Equivalent Scala Column.<=> Column.eqNullSafe.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/eq_null_safe.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"%<=>% — %<=>%","text":"%<=>% since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/eq_null_safe.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"%<=>% — %<=>%","text":"","code":"if (FALSE) { df1 <- createDataFrame(data.frame( x = c(1, NA, 3, NA), y = c(2, 6, 3, NA) )) head(select(df1, df1$x == df1$y, df1$x %<=>% df1$y)) df2 <- createDataFrame(data.frame(y = c(3, NA))) count(join(df1, df2, df1$y == df2$y)) count(join(df1, df2, df1$y %<=>% df2$y)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/except.html","id":null,"dir":"Reference","previous_headings":"","what":"except — except","title":"except — except","text":"Return new SparkDataFrame containing rows SparkDataFrame another SparkDataFrame. equivalent EXCEPT DISTINCT SQL.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/except.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"except — except","text":"","code":"except(x, y) # S4 method for SparkDataFrame,SparkDataFrame except(x, y)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/except.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"except — except","text":"x SparkDataFrame. y SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/except.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"except — except","text":"SparkDataFrame containing result except operation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/except.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"except — except","text":"except since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/except.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"except — except","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) exceptDF <- except(df, df2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/exceptAll.html","id":null,"dir":"Reference","previous_headings":"","what":"exceptAll — exceptAll","title":"exceptAll — exceptAll","text":"Return new SparkDataFrame containing rows SparkDataFrame another SparkDataFrame preserving duplicates. equivalent EXCEPT SQL. Also standard SQL, function resolves columns position (name).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/exceptAll.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"exceptAll — exceptAll","text":"","code":"exceptAll(x, y) # S4 method for SparkDataFrame,SparkDataFrame exceptAll(x, y)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/exceptAll.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"exceptAll — exceptAll","text":"x SparkDataFrame. 
y SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/exceptAll.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"exceptAll — exceptAll","text":"SparkDataFrame containing result except operation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/exceptAll.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"exceptAll — exceptAll","text":"exceptAll since 2.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/exceptAll.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"exceptAll — exceptAll","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) exceptAllDF <- exceptAll(df1, df2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/explain.html","id":null,"dir":"Reference","previous_headings":"","what":"Explain — explain","title":"Explain — explain","text":"Print logical physical Catalyst plans console debugging.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/explain.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Explain — explain","text":"","code":"explain(x, ...) # S4 method for SparkDataFrame explain(x, extended = FALSE) # S4 method for StreamingQuery explain(x, extended = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/explain.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Explain — explain","text":"x SparkDataFrame StreamingQuery. ... arguments passed methods. extended Logical. extended FALSE, prints physical plan.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/explain.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Explain — explain","text":"explain since 1.4.0 explain(StreamingQuery) since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/explain.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Explain — explain","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) explain(df, TRUE) } if (FALSE) explain(sq)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/filter.html","id":null,"dir":"Reference","previous_headings":"","what":"Filter — filter","title":"Filter — filter","text":"Filter rows SparkDataFrame according given condition.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/filter.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Filter — filter","text":"","code":"filter(x, condition) where(x, condition) # S4 method for SparkDataFrame,characterOrColumn filter(x, condition) # S4 method for SparkDataFrame,characterOrColumn where(x, condition)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/filter.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Filter — filter","text":"x SparkDataFrame sorted. condition condition filter . 
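A small sketch contrasting except (EXCEPT DISTINCT) with exceptAll (EXCEPT ALL); the toy data is chosen to show how duplicates are handled:

df1 <- createDataFrame(data.frame(x = c(1, 1, 2, 3)))
df2 <- createDataFrame(data.frame(x = 1))
count(except(df1, df2))      # 2: the distinct rows {2, 3}
count(exceptAll(df1, df2))   # 3: only one matching duplicate of 1 is removed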
may either Column expression string containing SQL statement","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/filter.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Filter — filter","text":"SparkDataFrame containing rows meet condition.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/filter.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Filter — filter","text":"filter since 1.4.0 since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/filter.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Filter — filter","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) filter(df, \"col1 > 0\") filter(df, df$col2 != \"abcdefg\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/first.html","id":null,"dir":"Reference","previous_headings":"","what":"Return the first row of a SparkDataFrame — first","title":"Return the first row of a SparkDataFrame — first","text":"Aggregate function: returns first value group.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/first.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return the first row of a SparkDataFrame — first","text":"","code":"first(x, ...) # S4 method for SparkDataFrame first(x) # S4 method for characterOrColumn first(x, na.rm = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/first.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return the first row of a SparkDataFrame — first","text":"x SparkDataFrame column used aggregation function. ... arguments passed methods. na.rm logical value indicating whether NA values stripped computation proceeds.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/first.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Return the first row of a SparkDataFrame — first","text":"function default returns first values sees. return first non-missing value sees na.rm set true. values missing, NA returned. Note: function non-deterministic results depends order rows may non-deterministic shuffle.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/first.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Return the first row of a SparkDataFrame — first","text":"first(SparkDataFrame) since 1.4.0 first(characterOrColumn) since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/first.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Return the first row of a SparkDataFrame — first","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) first(df) } if (FALSE) { first(df$c) first(df$c, TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/fitted.html","id":null,"dir":"Reference","previous_headings":"","what":"Get fitted result from a k-means model — fitted","title":"Get fitted result from a k-means model — fitted","text":"Get fitted result k-means model, similarly R's fitted(). 
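A hedged sketch of filter/where with both condition styles, plus first used as a group aggregate (its result depends on row order and is non-deterministic after a shuffle):

df <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
head(filter(df, "hp > 200 AND am = 1"))       # condition as a SQL string
head(where(df, df$hp > 200 & df$am == 1))     # equivalent Column expression
head(agg(groupBy(df, "gear"), first(df$hp)))  # first value seen per gear group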
Note: saved-loaded model does not support method.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/fitted.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get fitted result from a k-means model — fitted","text":"","code":"fitted(object, ...) # S4 method for KMeansModel fitted(object, method = c(\"centers\", \"classes\"))"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/fitted.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get fitted result from a k-means model — fitted","text":"object fitted k-means model. ... additional argument(s) passed method. method type fitted results, \"centers\" cluster centers \"classes\" assigned classes.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/fitted.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get fitted result from a k-means model — fitted","text":"fitted returns SparkDataFrame containing fitted values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/fitted.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get fitted result from a k-means model — fitted","text":"fitted since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/fitted.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get fitted result from a k-means model — fitted","text":"","code":"if (FALSE) { model <- spark.kmeans(trainingData, ~ ., 2) fitted.model <- fitted(model) showDF(fitted.model) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/freqItems.html","id":null,"dir":"Reference","previous_headings":"","what":"Finding frequent items for columns, possibly with false positives — freqItems","title":"Finding frequent items for columns, possibly with false positives — freqItems","text":"Finding frequent items columns, possibly false positives. Using frequent element count algorithm described https://dl.acm.org/doi/10.1145/762471.762473, proposed Karp, Schenker, Papadimitriou.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/freqItems.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Finding frequent items for columns, possibly with false positives — freqItems","text":"","code":"# S4 method for SparkDataFrame,character freqItems(x, cols, support = 0.01)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/freqItems.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Finding frequent items for columns, possibly with false positives — freqItems","text":"x SparkDataFrame. cols vector column names search frequent items . support (Optional) minimum frequency item considered frequent. greater 1e-4. 
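A minimal sketch of freqItems with an explicit support threshold; 0.3 is an arbitrary illustrative value, and the result may contain false positives as noted above:

df <- createDataFrame(mtcars)
freqItems(df, c("cyl", "gear"), support = 0.3)   # items appearing in roughly 30% of rows or more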
Default support = 0.01.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/freqItems.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Finding frequent items for columns, possibly with false positives — freqItems","text":"local R data.frame frequent items column","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/freqItems.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Finding frequent items for columns, possibly with false positives — freqItems","text":"freqItems since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/freqItems.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Finding frequent items for columns, possibly with false positives — freqItems","text":"","code":"if (FALSE) { df <- read.json(\"/path/to/file.json\") fi = freqItems(df, c(\"title\", \"gender\")) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapply.html","id":null,"dir":"Reference","previous_headings":"","what":"gapply — gapply","title":"gapply — gapply","text":"Groups SparkDataFrame using specified columns applies R function group.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapply.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"gapply — gapply","text":"","code":"gapply(x, ...) # S4 method for GroupedData gapply(x, func, schema) # S4 method for SparkDataFrame gapply(x, cols, func, schema)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapply.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"gapply — gapply","text":"x SparkDataFrame GroupedData. ... additional argument(s) passed method. func function applied group partition specified grouping column SparkDataFrame. See Details. schema schema resulting SparkDataFrame function applied. schema must match output func. defined output column preferred output column name corresponding data type. Since Spark 2.3, DDL-formatted string also supported schema. cols grouping columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapply.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"gapply — gapply","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapply.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"gapply — gapply","text":"func function two arguments. first, usually named key (though enforced) corresponds grouping key, unnamed list length(cols) length-one objects corresponding grouping columns' values current group. second, herein x, local data.frame columns input cols rows corresponding key. output func must data.frame matching schema -- particular means names output data.frame irrelevant","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapply.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"gapply — gapply","text":"gapply(GroupedData) since 2.0.0 gapply(SparkDataFrame) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapply.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"gapply — gapply","text":"","code":"if (FALSE) { # Computes the arithmetic mean of the second column by grouping # on the first and third columns. 
Output the grouping values and the average. df <- createDataFrame ( list(list(1L, 1, \"1\", 0.1), list(1L, 2, \"1\", 0.2), list(3L, 3, \"3\", 0.3)), c(\"a\", \"b\", \"c\", \"d\")) # Here our output contains three columns, the key which is a combination of two # columns with data types integer and string and the mean which is a double. schema <- structType(structField(\"a\", \"integer\"), structField(\"c\", \"string\"), structField(\"avg\", \"double\")) result <- gapply( df, c(\"a\", \"c\"), function(key, x) { # key will either be list(1L, '1') (for the group where a=1L,c='1') or # list(3L, '3') (for the group where a=3L,c='3') y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, schema) # The schema also can be specified in a DDL-formatted string. schema <- \"a INT, c STRING, avg DOUBLE\" result <- gapply( df, c(\"a\", \"c\"), function(key, x) { y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, schema) # We can also group the data and afterwards call gapply on GroupedData. # For example: gdf <- group_by(df, \"a\", \"c\") result <- gapply( gdf, function(key, x) { y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, schema) collect(result) # Result # ------ # a c avg # 3 3 3.0 # 1 1 1.5 # Fits linear models on iris dataset by grouping on the 'Species' column and # using 'Sepal_Length' as a target variable, 'Sepal_Width', 'Petal_Length' # and 'Petal_Width' as training features. df <- createDataFrame (iris) schema <- structType(structField(\"(Intercept)\", \"double\"), structField(\"Sepal_Width\", \"double\"),structField(\"Petal_Length\", \"double\"), structField(\"Petal_Width\", \"double\")) df1 <- gapply( df, df$\"Species\", function(key, x) { m <- suppressWarnings(lm(Sepal_Length ~ Sepal_Width + Petal_Length + Petal_Width, x)) data.frame(t(coef(m))) }, schema) collect(df1) # Result # --------- # Model (Intercept) Sepal_Width Petal_Length Petal_Width # 1 0.699883 0.3303370 0.9455356 -0.1697527 # 2 1.895540 0.3868576 0.9083370 -0.6792238 # 3 2.351890 0.6548350 0.2375602 0.2521257 }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapplyCollect.html","id":null,"dir":"Reference","previous_headings":"","what":"gapplyCollect — gapplyCollect","title":"gapplyCollect — gapplyCollect","text":"Groups SparkDataFrame using specified columns, applies R function group collects result back R data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapplyCollect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"gapplyCollect — gapplyCollect","text":"","code":"gapplyCollect(x, ...) # S4 method for GroupedData gapplyCollect(x, func) # S4 method for SparkDataFrame gapplyCollect(x, cols, func)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapplyCollect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"gapplyCollect — gapplyCollect","text":"x SparkDataFrame GroupedData. ... additional argument(s) passed method. func function applied group partition specified grouping column SparkDataFrame. See Details. 
cols grouping columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapplyCollect.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"gapplyCollect — gapplyCollect","text":"data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapplyCollect.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"gapplyCollect — gapplyCollect","text":"func function two arguments. first, usually named key (though enforced) corresponds grouping key, unnamed list length(cols) length-one objects corresponding grouping columns' values current group. second, herein x, local data.frame columns input cols rows corresponding key. output func must data.frame matching schema -- particular means names output data.frame irrelevant","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapplyCollect.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"gapplyCollect — gapplyCollect","text":"gapplyCollect(GroupedData) since 2.0.0 gapplyCollect(SparkDataFrame) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/gapplyCollect.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"gapplyCollect — gapplyCollect","text":"","code":"if (FALSE) { # Computes the arithmetic mean of the second column by grouping # on the first and third columns. Output the grouping values and the average. df <- createDataFrame ( list(list(1L, 1, \"1\", 0.1), list(1L, 2, \"1\", 0.2), list(3L, 3, \"3\", 0.3)), c(\"a\", \"b\", \"c\", \"d\")) result <- gapplyCollect( df, c(\"a\", \"c\"), function(key, x) { y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) colnames(y) <- c(\"key_a\", \"key_c\", \"mean_b\") y }) # We can also group the data and afterwards call gapply on GroupedData. # For example: gdf <- group_by(df, \"a\", \"c\") result <- gapplyCollect( gdf, function(key, x) { y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) colnames(y) <- c(\"key_a\", \"key_c\", \"mean_b\") y }) # Result # ------ # key_a key_c mean_b # 3 3 3.0 # 1 1 1.5 # Fits linear models on iris dataset by grouping on the 'Species' column and # using 'Sepal_Length' as a target variable, 'Sepal_Width', 'Petal_Length' # and 'Petal_Width' as training features. df <- createDataFrame (iris) result <- gapplyCollect( df, df$\"Species\", function(key, x) { m <- suppressWarnings(lm(Sepal_Length ~ Sepal_Width + Petal_Length + Petal_Width, x)) data.frame(t(coef(m))) }) # Result # --------- # Model X.Intercept. Sepal_Width Petal_Length Petal_Width # 1 0.699883 0.3303370 0.9455356 -0.1697527 # 2 1.895540 0.3868576 0.9083370 -0.6792238 # 3 2.351890 0.6548350 0.2375602 0.2521257 }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getLocalProperty.html","id":null,"dir":"Reference","previous_headings":"","what":"Get a local property set in this thread, or NULL if it is missing. See\nsetLocalProperty. — getLocalProperty","title":"Get a local property set in this thread, or NULL if it is missing. See\nsetLocalProperty. — getLocalProperty","text":"Get local property set thread, NULL missing. See setLocalProperty.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getLocalProperty.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get a local property set in this thread, or NULL if it is missing. See\nsetLocalProperty. 
— getLocalProperty","text":"","code":"getLocalProperty(key)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getLocalProperty.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get a local property set in this thread, or NULL if it is missing. See\nsetLocalProperty. — getLocalProperty","text":"key key local property.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getLocalProperty.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get a local property set in this thread, or NULL if it is missing. See\nsetLocalProperty. — getLocalProperty","text":"getLocalProperty since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getLocalProperty.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get a local property set in this thread, or NULL if it is missing. See\nsetLocalProperty. — getLocalProperty","text":"","code":"if (FALSE) { getLocalProperty(\"spark.scheduler.pool\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getNumPartitions.html","id":null,"dir":"Reference","previous_headings":"","what":"getNumPartitions — getNumPartitions","title":"getNumPartitions — getNumPartitions","text":"Return number partitions","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getNumPartitions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"getNumPartitions — getNumPartitions","text":"","code":"# S4 method for SparkDataFrame getNumPartitions(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getNumPartitions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"getNumPartitions — getNumPartitions","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getNumPartitions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"getNumPartitions — getNumPartitions","text":"getNumPartitions since 2.1.1","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/getNumPartitions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"getNumPartitions — getNumPartitions","text":"","code":"if (FALSE) { sparkR.session() df <- createDataFrame(cars, numPartitions = 2) getNumPartitions(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/glm.html","id":null,"dir":"Reference","previous_headings":"","what":"Generalized Linear Models (R-compliant) — glm,formula,ANY,SparkDataFrame-method","title":"Generalized Linear Models (R-compliant) — glm,formula,ANY,SparkDataFrame-method","text":"Fits generalized linear model, similarly R's glm().","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/glm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generalized Linear Models (R-compliant) — glm,formula,ANY,SparkDataFrame-method","text":"","code":"# S4 method for formula,ANY,SparkDataFrame glm( formula, family = gaussian, data, epsilon = 1e-06, maxit = 25, weightCol = NULL, var.power = 0, link.power = 1 - var.power, stringIndexerOrderType = c(\"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\"), offsetCol = NULL )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/glm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generalized Linear Models 
(R-compliant) — glm,formula,ANY,SparkDataFrame-method","text":"formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. family description error distribution link function used model. can character string naming family function, family function result call family function. Refer R family https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html. Currently families supported: binomial, gaussian, poisson, Gamma, tweedie. data SparkDataFrame R's glm data training. epsilon positive convergence tolerance iterations. maxit integer giving maximal number IRLS iterations. weightCol weight column name. set NULL, treat instance weights 1.0. var.power index power variance function Tweedie family. link.power index power link function Tweedie family. stringIndexerOrderType order categories string feature column. used decide base level string feature last category ordering dropped encoding strings. Supported options \"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\". default value \"frequencyDesc\". ordering set \"alphabetDesc\", drops category R encoding strings. offsetCol offset column name. set empty, treat instance offsets 0.0. feature specified offset constant coefficient 1.0.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/glm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generalized Linear Models (R-compliant) — glm,formula,ANY,SparkDataFrame-method","text":"glm returns fitted generalized linear model.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/glm.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Generalized Linear Models (R-compliant) — glm,formula,ANY,SparkDataFrame-method","text":"glm since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/glm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generalized Linear Models (R-compliant) — glm,formula,ANY,SparkDataFrame-method","text":"","code":"if (FALSE) { sparkR.session() t <- as.data.frame(Titanic) df <- createDataFrame(t) model <- glm(Freq ~ Sex + Age, df, family = \"gaussian\") summary(model) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/groupBy.html","id":null,"dir":"Reference","previous_headings":"","what":"GroupBy — group_by","title":"GroupBy — group_by","text":"Groups SparkDataFrame using specified columns, can run aggregation .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/groupBy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"GroupBy — group_by","text":"","code":"group_by(x, ...) groupBy(x, ...) # S4 method for SparkDataFrame groupBy(x, ...) # S4 method for SparkDataFrame group_by(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/groupBy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"GroupBy — group_by","text":"x SparkDataFrame. ... 
character name(s) Column(s) group .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/groupBy.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"GroupBy — group_by","text":"GroupedData.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/groupBy.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"GroupBy — group_by","text":"groupBy since 1.4.0 group_by since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/groupBy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"GroupBy — group_by","text":"","code":"if (FALSE) { # Compute the average for all numeric columns grouped by department. avg(groupBy(df, \"department\")) # Compute the max age and average salary, grouped by department and gender. agg(groupBy(df, \"department\", \"gender\"), salary=\"avg\", \"age\" -> \"max\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hashCode.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute the hashCode of an object — hashCode","title":"Compute the hashCode of an object — hashCode","text":"Java-style function compute hashCode given object. Returns integer value.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hashCode.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute the hashCode of an object — hashCode","text":"","code":"hashCode(key)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hashCode.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute the hashCode of an object — hashCode","text":"key object hashed","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hashCode.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute the hashCode of an object — hashCode","text":"hash code integer","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hashCode.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Compute the hashCode of an object — hashCode","text":"works integer, numeric character types right now.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hashCode.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Compute the hashCode of an object — hashCode","text":"hashCode since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hashCode.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute the hashCode of an object — hashCode","text":"","code":"if (FALSE) { hashCode(1L) # 1 hashCode(1.0) # 1072693248 hashCode(\"1\") # 49 }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/head.html","id":null,"dir":"Reference","previous_headings":"","what":"Head — head","title":"Head — head","text":"Return first num rows SparkDataFrame R data.frame. 
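To show how the returned GroupedData is typically consumed, here is a minimal sketch, assuming a SparkDataFrame df with hypothetical "department" and "salary" columns:
if (FALSE) {
  gdf <- groupBy(df, df$department)
  head(agg(gdf, avg_salary = avg(df$salary), n_rows = n(df$salary)))
}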
num specified, head() returns first 6 rows R data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/head.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Head — head","text":"","code":"# S4 method for SparkDataFrame head(x, num = 6L)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/head.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Head — head","text":"x SparkDataFrame. num number rows return. Default 6.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/head.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Head — head","text":"data.frame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/head.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Head — head","text":"head since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/head.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Head — head","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) head(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hint.html","id":null,"dir":"Reference","previous_headings":"","what":"hint — hint","title":"hint — hint","text":"Specifies execution plan hint return new SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"hint — hint","text":"","code":"hint(x, name, ...) # S4 method for SparkDataFrame,character hint(x, name, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"hint — hint","text":"x SparkDataFrame. name name hint. ... 
optional parameters hint.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"hint — hint","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hint.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"hint — hint","text":"hint since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/hint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"hint — hint","text":"","code":"if (FALSE) { df <- createDataFrame(mtcars) avg_mpg <- mean(groupBy(createDataFrame(mtcars), \"cyl\"), \"mpg\") head(join(df, hint(avg_mpg, \"broadcast\"), df$cyl == avg_mpg$cyl)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/histogram.html","id":null,"dir":"Reference","previous_headings":"","what":"Compute histogram statistics for given column — histogram","title":"Compute histogram statistics for given column — histogram","text":"function computes histogram given SparkR Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/histogram.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compute histogram statistics for given column — histogram","text":"","code":"# S4 method for SparkDataFrame,characterOrColumn histogram(df, col, nbins = 10)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/histogram.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compute histogram statistics for given column — histogram","text":"df SparkDataFrame containing Column build histogram . col column Character string Column build histogram . nbins number bins (optional). 
Default value 10.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/histogram.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Compute histogram statistics for given column — histogram","text":"data.frame histogram statistics, .e., counts centroids.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/histogram.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Compute histogram statistics for given column — histogram","text":"histogram since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/histogram.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compute histogram statistics for given column — histogram","text":"","code":"if (FALSE) { # Create a SparkDataFrame from the Iris dataset irisDF <- createDataFrame(iris) # Compute histogram statistics histStats <- histogram(irisDF, irisDF$Sepal_Length, nbins = 12) # Once SparkR has computed the histogram statistics, the histogram can be # rendered using the ggplot2 library: require(ggplot2) plot <- ggplot(histStats, aes(x = centroids, y = counts)) + geom_bar(stat = \"identity\") + xlab(\"Sepal_Length\") + ylab(\"Frequency\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/insertInto.html","id":null,"dir":"Reference","previous_headings":"","what":"insertInto — insertInto","title":"insertInto — insertInto","text":"Insert contents SparkDataFrame table registered current SparkSession.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/insertInto.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"insertInto — insertInto","text":"","code":"insertInto(x, tableName, ...) # S4 method for SparkDataFrame,character insertInto(x, tableName, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/insertInto.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"insertInto — insertInto","text":"x SparkDataFrame. tableName character vector containing name table. ... arguments passed methods. existing rows table. overwrite logical argument indicating whether overwrite.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/insertInto.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"insertInto — insertInto","text":"insertInto since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/insertInto.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"insertInto — insertInto","text":"","code":"if (FALSE) { sparkR.session() df <- read.df(path, \"parquet\") df2 <- read.df(path2, \"parquet\") saveAsTable(df, \"table1\") insertInto(df2, \"table1\", overwrite = TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/install.spark.html","id":null,"dir":"Reference","previous_headings":"","what":"Download and Install Apache Spark to a Local Directory — install.spark","title":"Download and Install Apache Spark to a Local Directory — install.spark","text":"install.spark downloads installs Spark local directory found. SPARK_HOME set environment, directory found, returned. Spark version use SparkR version. 
Users can specify desired Hadoop version, remote mirror site, directory package installed locally.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/install.spark.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download and Install Apache Spark to a Local Directory — install.spark","text":"","code":"install.spark( hadoopVersion = \"3\", mirrorUrl = NULL, localDir = NULL, overwrite = FALSE )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/install.spark.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download and Install Apache Spark to a Local Directory — install.spark","text":"hadoopVersion Version Hadoop install. Default \"3\". hadoopVersion = \"without\", \"Hadoop free\" build installed. See \"Hadoop Free\" Build information. patched version names can also used. mirrorUrl base URL repositories use. directory layout follow Apache mirrors. localDir local directory Spark installed. directory contains version-specific folders Spark packages. Default path cache directory: Mac OS X: ~/Library/Caches/spark Unix: $XDG_CACHE_HOME defined, otherwise ~/.cache/spark Windows: %LOCALAPPDATA%\\Apache\\Spark\\Cache. overwrite TRUE, download overwrite existing tar file localDir force re-install Spark (case local directory file corrupted)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/install.spark.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Download and Install Apache Spark to a Local Directory — install.spark","text":"(invisible) local directory Spark found installed","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/install.spark.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Download and Install Apache Spark to a Local Directory — install.spark","text":"full url remote file inferred mirrorUrl hadoopVersion. mirrorUrl specifies remote path Spark folder. followed subfolder named Spark version (corresponds SparkR), tar filename. filename composed four parts, .e. [Spark version]-bin-[Hadoop version].tgz. example, full path Spark 3.3.1 package https://archive.apache.org path: http://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz. hadoopVersion = \"without\", [Hadoop version] filename without-hadoop.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/install.spark.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Download and Install Apache Spark to a Local Directory — install.spark","text":"install.spark since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/install.spark.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download and Install Apache Spark to a Local Directory — install.spark","text":"","code":"if (FALSE) { install.spark() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersect.html","id":null,"dir":"Reference","previous_headings":"","what":"Intersect — intersect","title":"Intersect — intersect","text":"Return new SparkDataFrame containing rows SparkDataFrame another SparkDataFrame. 
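A minimal sketch of install.spark() using the documented arguments; the local directory below is a placeholder, not the default cache path:
if (FALSE) {
  install.spark()                                  # default: Hadoop 3 build, cached per OS
  install.spark(hadoopVersion = "3",
                localDir = "/opt/spark-cache",     # placeholder directory
                overwrite = FALSE)
}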
equivalent INTERSECT SQL.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersect.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Intersect — intersect","text":"","code":"intersect(x, y) # S4 method for SparkDataFrame,SparkDataFrame intersect(x, y)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersect.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Intersect — intersect","text":"x SparkDataFrame y SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersect.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Intersect — intersect","text":"SparkDataFrame containing result intersect.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersect.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Intersect — intersect","text":"intersect since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersect.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Intersect — intersect","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) intersectDF <- intersect(df, df2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersectAll.html","id":null,"dir":"Reference","previous_headings":"","what":"intersectAll — intersectAll","title":"intersectAll — intersectAll","text":"Return new SparkDataFrame containing rows SparkDataFrame another SparkDataFrame preserving duplicates. equivalent INTERSECT SQL. Also standard SQL, function resolves columns position (name).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersectAll.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"intersectAll — intersectAll","text":"","code":"intersectAll(x, y) # S4 method for SparkDataFrame,SparkDataFrame intersectAll(x, y)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersectAll.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"intersectAll — intersectAll","text":"x SparkDataFrame. 
y SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersectAll.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"intersectAll — intersectAll","text":"SparkDataFrame containing result intersect operation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersectAll.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"intersectAll — intersectAll","text":"intersectAll since 2.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/intersectAll.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"intersectAll — intersectAll","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) intersectAllDF <- intersectAll(df1, df2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/invoke_higher_order_function.html","id":null,"dir":"Reference","previous_headings":"","what":"Invokes higher order function expression identified by name,\n(relative to o.a.s.sql.catalyst.expressions) — invoke_higher_order_function","title":"Invokes higher order function expression identified by name,\n(relative to o.a.s.sql.catalyst.expressions) — invoke_higher_order_function","text":"Invokes higher order function expression identified name, (relative o..s.sql.catalyst.expressions)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/invoke_higher_order_function.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Invokes higher order function expression identified by name,\n(relative to o.a.s.sql.catalyst.expressions) — invoke_higher_order_function","text":"","code":"invoke_higher_order_function(name, cols, funs)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/invoke_higher_order_function.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Invokes higher order function expression identified by name,\n(relative to o.a.s.sql.catalyst.expressions) — invoke_higher_order_function","text":"name character cols list character Column objects funs list named list(fun = ..., expected_narg = ...)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/invoke_higher_order_function.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Invokes higher order function expression identified by name,\n(relative to o.a.s.sql.catalyst.expressions) — invoke_higher_order_function","text":"Column representing name applied cols funs","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isActive.html","id":null,"dir":"Reference","previous_headings":"","what":"isActive — isActive","title":"isActive — isActive","text":"Returns TRUE query actively running.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isActive.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"isActive — isActive","text":"","code":"isActive(x) # S4 method for StreamingQuery isActive(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isActive.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"isActive — isActive","text":"x StreamingQuery.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isActive.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"isActive — isActive","text":"TRUE query 
actively running, FALSE stopped.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isActive.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"isActive — isActive","text":"isActive(StreamingQuery) since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isActive.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"isActive — isActive","text":"","code":"if (FALSE) isActive(sq)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isLocal.html","id":null,"dir":"Reference","previous_headings":"","what":"isLocal — isLocal","title":"isLocal — isLocal","text":"Returns True collect take methods can run locally (without Spark executors).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isLocal.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"isLocal — isLocal","text":"","code":"isLocal(x) # S4 method for SparkDataFrame isLocal(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isLocal.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"isLocal — isLocal","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isLocal.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"isLocal — isLocal","text":"isLocal since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isLocal.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"isLocal — isLocal","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) isLocal(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isStreaming.html","id":null,"dir":"Reference","previous_headings":"","what":"isStreaming — isStreaming","title":"isStreaming — isStreaming","text":"Returns TRUE SparkDataFrame contains one sources continuously return data arrives. 
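A minimal streaming sketch tying isStreaming, write.stream and isActive together, assuming a socket source on localhost:9999 as in the examples; the console sink and the stopQuery() call are illustrative:
if (FALSE) {
  sparkR.session()
  lines <- read.stream("socket", host = "localhost", port = 9999)
  isStreaming(lines)                    # TRUE: the source is unbounded
  sq <- write.stream(lines, "console")  # start a StreamingQuery
  isActive(sq)                          # TRUE while the query runs
  stopQuery(sq)
}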
dataset reads data streaming source must executed StreamingQuery using write.stream.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isStreaming.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"isStreaming — isStreaming","text":"","code":"isStreaming(x) # S4 method for SparkDataFrame isStreaming(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isStreaming.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"isStreaming — isStreaming","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isStreaming.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"isStreaming — isStreaming","text":"TRUE SparkDataFrame streaming source","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isStreaming.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"isStreaming — isStreaming","text":"isStreaming since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/isStreaming.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"isStreaming — isStreaming","text":"","code":"if (FALSE) { sparkR.session() df <- read.stream(\"socket\", host = \"localhost\", port = 9999) isStreaming(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/join.html","id":null,"dir":"Reference","previous_headings":"","what":"Join — join","title":"Join — join","text":"Joins two SparkDataFrames based given join expression.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/join.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Join — join","text":"","code":"# S4 method for SparkDataFrame,SparkDataFrame join(x, y, joinExpr = NULL, joinType = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/join.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Join — join","text":"x SparkDataFrame y SparkDataFrame joinExpr (Optional) expression used perform join. joinExpr must Column expression. joinExpr omitted, default, inner join attempted error thrown Cartesian Product. Cartesian join, use crossJoin instead. joinType type join perform, default 'inner'. 
Must one : 'inner', 'cross', 'outer', 'full', 'fullouter', 'full_outer', 'left', 'leftouter', 'left_outer', 'right', 'rightouter', 'right_outer', 'semi', 'leftsemi', 'left_semi', 'anti', 'leftanti', 'left_anti'.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/join.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Join — join","text":"SparkDataFrame containing result join operation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/join.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Join — join","text":"join since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/join.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Join — join","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) join(df1, df2, df1$col1 == df2$col2) # Performs an inner join based on expression join(df1, df2, df1$col1 == df2$col2, \"right_outer\") join(df1, df2) # Attempts an inner join }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/last.html","id":null,"dir":"Reference","previous_headings":"","what":"last — last","title":"last — last","text":"Aggregate function: returns last value group.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/last.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"last — last","text":"","code":"last(x, ...) # S4 method for characterOrColumn last(x, na.rm = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/last.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"last — last","text":"x column compute . ... arguments passed methods. na.rm logical value indicating whether NA values stripped computation proceeds.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/last.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"last — last","text":"function default returns last values sees. return last non-missing value sees na.rm set true. values missing, NA returned. 
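A minimal sketch of explicit join types, assuming df1 and df2 share a key column named "id" (a hypothetical name):
if (FALSE) {
  inner <- join(df1, df2, df1$id == df2$id)                  # inner join by default
  left  <- join(df1, df2, df1$id == df2$id, "left_outer")    # keep all rows of df1
  anti  <- join(df1, df2, df1$id == df2$id, "left_anti")     # rows of df1 with no match in df2
}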
Note: function non-deterministic results depends order rows may non-deterministic shuffle.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/last.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"last — last","text":"last since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/last.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"last — last","text":"","code":"if (FALSE) { last(df$c) last(df$c, TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/lastProgress.html","id":null,"dir":"Reference","previous_headings":"","what":"lastProgress — lastProgress","title":"lastProgress — lastProgress","text":"Prints recent progress update streaming query JSON format.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/lastProgress.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"lastProgress — lastProgress","text":"","code":"lastProgress(x) # S4 method for StreamingQuery lastProgress(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/lastProgress.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"lastProgress — lastProgress","text":"x StreamingQuery.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/lastProgress.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"lastProgress — lastProgress","text":"lastProgress(StreamingQuery) since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/lastProgress.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"lastProgress — lastProgress","text":"","code":"if (FALSE) lastProgress(sq)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/limit.html","id":null,"dir":"Reference","previous_headings":"","what":"Limit — limit","title":"Limit — limit","text":"Limit resulting SparkDataFrame number rows specified.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/limit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Limit — limit","text":"","code":"limit(x, num) # S4 method for SparkDataFrame,numeric limit(x, num)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/limit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Limit — limit","text":"x SparkDataFrame num number rows return","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/limit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Limit — limit","text":"new SparkDataFrame containing number rows specified.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/limit.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Limit — limit","text":"limit since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/limit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Limit — limit","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) limitedDF <- limit(df, 10) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listColumns.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns a list of columns for the given table/view in the 
specified database — listColumns","title":"Returns a list of columns for the given table/view in the specified database — listColumns","text":"Returns list columns given table/view specified database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listColumns.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns a list of columns for the given table/view in the specified database — listColumns","text":"","code":"listColumns(tableName, databaseName = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listColumns.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns a list of columns for the given table/view in the specified database — listColumns","text":"tableName qualified unqualified name designates table/view. database identifier provided, refers table/view current database. databaseName parameter specified, must unqualified name. databaseName (optional) name database","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listColumns.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns a list of columns for the given table/view in the specified database — listColumns","text":"SparkDataFrame list column descriptions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listColumns.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns a list of columns for the given table/view in the specified database — listColumns","text":"since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listColumns.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns a list of columns for the given table/view in the specified database — listColumns","text":"","code":"if (FALSE) { sparkR.session() listColumns(\"mytable\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listDatabases.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns a list of databases available — listDatabases","title":"Returns a list of databases available — listDatabases","text":"Returns list databases available.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listDatabases.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns a list of databases available — listDatabases","text":"","code":"listDatabases()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listDatabases.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns a list of databases available — listDatabases","text":"SparkDataFrame list databases.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listDatabases.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns a list of databases available — listDatabases","text":"since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listDatabases.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns a list of databases available — listDatabases","text":"","code":"if (FALSE) { sparkR.session() listDatabases() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listFunctions.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns a list of functions registered in the specified database — 
listFunctions","title":"Returns a list of functions registered in the specified database — listFunctions","text":"Returns list functions registered specified database. includes temporary functions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listFunctions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns a list of functions registered in the specified database — listFunctions","text":"","code":"listFunctions(databaseName = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listFunctions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns a list of functions registered in the specified database — listFunctions","text":"databaseName (optional) name database","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listFunctions.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns a list of functions registered in the specified database — listFunctions","text":"SparkDataFrame list function descriptions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listFunctions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns a list of functions registered in the specified database — listFunctions","text":"since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listFunctions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns a list of functions registered in the specified database — listFunctions","text":"","code":"if (FALSE) { sparkR.session() listFunctions() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listTables.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns a list of tables or views in the specified database — listTables","title":"Returns a list of tables or views in the specified database — listTables","text":"Returns list tables views specified database. 
includes temporary views.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listTables.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns a list of tables or views in the specified database — listTables","text":"","code":"listTables(databaseName = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listTables.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns a list of tables or views in the specified database — listTables","text":"databaseName (optional) name database","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listTables.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns a list of tables or views in the specified database — listTables","text":"SparkDataFrame list tables.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listTables.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns a list of tables or views in the specified database — listTables","text":"since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/listTables.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns a list of tables or views in the specified database — listTables","text":"","code":"if (FALSE) { sparkR.session() listTables() listTables(\"default\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/localCheckpoint.html","id":null,"dir":"Reference","previous_headings":"","what":"localCheckpoint — localCheckpoint","title":"localCheckpoint — localCheckpoint","text":"Returns locally checkpointed version SparkDataFrame. Checkpointing can used truncate logical plan, especially useful iterative algorithms plan may grow exponentially. Local checkpoints stored executors using caching subsystem therefore reliable.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/localCheckpoint.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"localCheckpoint — localCheckpoint","text":"","code":"localCheckpoint(x, eager = TRUE) # S4 method for SparkDataFrame localCheckpoint(x, eager = TRUE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/localCheckpoint.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"localCheckpoint — localCheckpoint","text":"x SparkDataFrame eager whether locally checkpoint SparkDataFrame immediately","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/localCheckpoint.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"localCheckpoint — localCheckpoint","text":"new locally checkpointed SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/localCheckpoint.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"localCheckpoint — localCheckpoint","text":"localCheckpoint since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/localCheckpoint.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"localCheckpoint — localCheckpoint","text":"","code":"if (FALSE) { df <- localCheckpoint(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/match.html","id":null,"dir":"Reference","previous_headings":"","what":"Match a column with given values. 
— %in%","title":"Match a column with given values. — %in%","text":"Match column given values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/match.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Match a column with given values. — %in%","text":"","code":"# S4 method for Column %in%(x, table)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/match.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Match a column with given values. — %in%","text":"x Column. table collection values (coercible list) compare .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/match.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Match a column with given values. — %in%","text":"matched values result comparing given values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/match.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Match a column with given values. — %in%","text":"%% since 1.5.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/match.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Match a column with given values. — %in%","text":"","code":"if (FALSE) { filter(df, \"age in (10, 30)\") where(df, df$age %in% c(10, 30)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/merge.html","id":null,"dir":"Reference","previous_headings":"","what":"Merges two data frames — merge","title":"Merges two data frames — merge","text":"Merges two data frames","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/merge.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Merges two data frames — merge","text":"","code":"merge(x, y, ...) # S4 method for SparkDataFrame,SparkDataFrame merge( x, y, by = intersect(names(x), names(y)), by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all, sort = TRUE, suffixes = c(\"_x\", \"_y\"), ... )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/merge.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Merges two data frames — merge","text":"x first data frame joined. y second data frame joined. ... additional argument(s) passed method. character vector specifying join columns. specified, common column names x y used. .x .y explicitly set NULL length 0, Cartesian Product x y returned. .x character vector specifying joining columns x. .y character vector specifying joining columns y. boolean value setting .x .y unset. .x boolean value indicating whether rows x including join. .y boolean value indicating whether rows y including join. sort logical argument indicating whether resulting columns sorted. suffixes string vector length 2 used make colnames x y unique. first element appended colname x. second element appended colname y.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/merge.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Merges two data frames — merge","text":".x .y set FALSE, natural join returned. .x set TRUE .y set FALSE, left outer join returned. .x set FALSE .y set TRUE, right outer join returned. 
all.x all.y set TRUE, full outer join returned.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/merge.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Merges two data frames — merge","text":"merge since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/merge.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Merges two data frames — merge","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) merge(df1, df2) # Performs an inner join by common columns merge(df1, df2, by = \"col1\") # Performs an inner join based on expression merge(df1, df2, by.x = \"col1\", by.y = \"col2\", all.y = TRUE) merge(df1, df2, by.x = \"col1\", by.y = \"col2\", all.x = TRUE) merge(df1, df2, by.x = \"col1\", by.y = \"col2\", all.x = TRUE, all.y = TRUE) merge(df1, df2, by.x = \"col1\", by.y = \"col2\", all = TRUE, sort = FALSE) merge(df1, df2, by = \"col1\", all = TRUE, suffixes = c(\"-X\", \"-Y\")) merge(df1, df2, by = NULL) # Performs a Cartesian join }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/mutate.html","id":null,"dir":"Reference","previous_headings":"","what":"Mutate — mutate","title":"Mutate — mutate","text":"Return new SparkDataFrame specified columns added replaced.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/mutate.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Mutate — mutate","text":"","code":"mutate(.data, ...) transform(`_data`, ...) # S4 method for SparkDataFrame mutate(.data, ...) # S4 method for SparkDataFrame transform(`_data`, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/mutate.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Mutate — mutate","text":".data SparkDataFrame. ... additional column argument(s) form name = col. 
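A small sketch tying the all.x / all.y flags above to the usual join types, using two illustrative in-memory frames that are not taken from the reference page:

library(SparkR)
sparkR.session()

left  <- createDataFrame(data.frame(id = c(1, 2, 3), l = c("a", "b", "c")))
right <- createDataFrame(data.frame(id = c(2, 3, 4), r = c("x", "y", "z")))

inner <- merge(left, right, by = "id")                # all.x = FALSE, all.y = FALSE
lft   <- merge(left, right, by = "id", all.x = TRUE)  # keep unmatched rows of left
rgt   <- merge(left, right, by = "id", all.y = TRUE)  # keep unmatched rows of right
full  <- merge(left, right, by = "id", all = TRUE)    # full outer join
head(full)                                            # join column appears with _x / _y suffixes

sparkR.session.stop()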
_data SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/mutate.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Mutate — mutate","text":"new SparkDataFrame new columns added replaced.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/mutate.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Mutate — mutate","text":"mutate since 1.4.0 transform since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/mutate.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Mutate — mutate","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) newDF <- mutate(df, newCol = df$col1 * 5, newCol2 = df$col1 * 2) names(newDF) # Will contain newCol, newCol2 newDF2 <- transform(df, newCol = df$col1 / 5, newCol2 = df$col1 * 2) df <- createDataFrame(list(list(\"Andy\", 30L), list(\"Justin\", 19L)), c(\"name\", \"age\")) # Replace the \"age\" column df1 <- mutate(df, age = df$age + 1L) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nafunctions.html","id":null,"dir":"Reference","previous_headings":"","what":"A set of SparkDataFrame functions working with NA values — dropna","title":"A set of SparkDataFrame functions working with NA values — dropna","text":"dropna, na.omit - Returns new SparkDataFrame omitting rows null values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nafunctions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"A set of SparkDataFrame functions working with NA values — dropna","text":"","code":"dropna(x, how = c(\"any\", \"all\"), minNonNulls = NULL, cols = NULL) na.omit(object, ...) fillna(x, value, cols = NULL) # S4 method for SparkDataFrame dropna(x, how = c(\"any\", \"all\"), minNonNulls = NULL, cols = NULL) # S4 method for SparkDataFrame na.omit(object, how = c(\"any\", \"all\"), minNonNulls = NULL, cols = NULL) # S4 method for SparkDataFrame fillna(x, value, cols = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nafunctions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"A set of SparkDataFrame functions working with NA values — dropna","text":"x SparkDataFrame. how \"any\" \"all\". \"any\", drop row contains nulls. \"all\", drop row values null. minNonNulls specified, how ignored. minNonNulls specified, drop rows less minNonNulls non-null values. overwrites how parameter. cols optional list column names consider. fillna, columns specified cols matching data type ignored. example, value character, subset contains non-character column, non-character column simply ignored. object SparkDataFrame. ... arguments passed methods. value value replace null values . integer, numeric, character named list. value named list, cols ignored value must mapping column name (character) replacement value. 
replacement value must integer, numeric character.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nafunctions.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"A set of SparkDataFrame functions working with NA values — dropna","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nafunctions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"A set of SparkDataFrame functions working with NA values — dropna","text":"dropna since 1.4.0 na.omit since 1.5.0 fillna since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nafunctions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"A set of SparkDataFrame functions working with NA values — dropna","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) dropna(df) } if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) fillna(df, 1) fillna(df, list(\"age\" = 20, \"name\" = \"unknown\")) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ncol.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns the number of columns in a SparkDataFrame — ncol","title":"Returns the number of columns in a SparkDataFrame — ncol","text":"Returns number columns SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ncol.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns the number of columns in a SparkDataFrame — ncol","text":"","code":"# S4 method for SparkDataFrame ncol(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ncol.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns the number of columns in a SparkDataFrame — ncol","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ncol.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns the number of columns in a SparkDataFrame — ncol","text":"ncol since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/ncol.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns the number of columns in a SparkDataFrame — ncol","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) ncol(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/not.html","id":null,"dir":"Reference","previous_headings":"","what":"! — not","title":"! — not","text":"Inversion boolean expression. Inversion boolean expression.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/not.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"! — not","text":"","code":"not(x) # S4 method for Column !(x) # S4 method for Column not(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/not.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"! — not","text":"x Column compute ","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/not.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"! — not","text":"! cannot be applied directly to numerical column. 
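A short sketch of the how, minNonNulls and cols arguments described above, on illustrative data (the column names are made up for the example):

library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(
  name   = c("Andy", NA, "Justin"),
  age    = c(30, NA, NA),
  height = c(180, 175, NA)
))

head(dropna(df, how = "any"))                    # drop rows containing any null
head(dropna(df, how = "all"))                    # drop rows only if every value is null
head(dropna(df, minNonNulls = 2))                # keep rows with at least 2 non-null values
head(fillna(df, 0, cols = c("age", "height")))   # numeric default, only for the listed columns
head(fillna(df, list("age" = 20, "name" = "unknown")))  # named list; cols is ignored

sparkR.session.stop()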
achieve R-like truthiness column casted BooleanType.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/not.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"! — not","text":"! since 2.3.0 since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/not.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"! — not","text":"","code":"if (FALSE) { df <- createDataFrame(data.frame(x = c(-1, 0, 1))) head(select(df, !column(\"x\") > 0)) } if (FALSE) { df <- createDataFrame(data.frame( is_true = c(TRUE, FALSE, NA), flag = c(1, 0, 1) )) head(select(df, not(df$is_true))) # Explicit cast is required when working with numeric column head(select(df, not(cast(df$flag, \"boolean\")))) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nrow.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns the number of rows in a SparkDataFrame — nrow","title":"Returns the number of rows in a SparkDataFrame — nrow","text":"Returns number rows SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nrow.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns the number of rows in a SparkDataFrame — nrow","text":"","code":"# S4 method for SparkDataFrame count(x) # S4 method for SparkDataFrame nrow(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nrow.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns the number of rows in a SparkDataFrame — nrow","text":"x SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nrow.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns the number of rows in a SparkDataFrame — nrow","text":"count since 1.4.0 nrow since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/nrow.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns the number of rows in a SparkDataFrame — nrow","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) count(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/orderBy.html","id":null,"dir":"Reference","previous_headings":"","what":"Ordering Columns in a WindowSpec — orderBy","title":"Ordering Columns in a WindowSpec — orderBy","text":"Defines ordering columns WindowSpec.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/orderBy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Ordering Columns in a WindowSpec — orderBy","text":"","code":"orderBy(x, col, ...) # S4 method for WindowSpec,character orderBy(x, col, ...) # S4 method for WindowSpec,Column orderBy(x, col, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/orderBy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Ordering Columns in a WindowSpec — orderBy","text":"x WindowSpec col character Column indicating ordering column ... 
additional sorting fields","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/orderBy.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Ordering Columns in a WindowSpec — orderBy","text":"WindowSpec.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/orderBy.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Ordering Columns in a WindowSpec — orderBy","text":"orderBy(WindowSpec, character) since 2.0.0 orderBy(WindowSpec, Column) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/orderBy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Ordering Columns in a WindowSpec — orderBy","text":"","code":"if (FALSE) { orderBy(ws, \"col1\", \"col2\") orderBy(ws, df$col1, df$col2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/otherwise.html","id":null,"dir":"Reference","previous_headings":"","what":"otherwise — otherwise","title":"otherwise — otherwise","text":"values specified column null, returns value. Can used conjunction specify default value expressions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/otherwise.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"otherwise — otherwise","text":"","code":"otherwise(x, value) # S4 method for Column otherwise(x, value)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/otherwise.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"otherwise — otherwise","text":"x Column. value value replace corresponding entry x NA. Can single value Column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/otherwise.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"otherwise — otherwise","text":"otherwise since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/over.html","id":null,"dir":"Reference","previous_headings":"","what":"over — over","title":"over — over","text":"Define windowing column.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/over.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"over — over","text":"","code":"over(x, window) # S4 method for Column,WindowSpec over(x, window)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/over.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"over — over","text":"x Column, usually one returned window function(s). window WindowSpec object. 
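The otherwise entry above has no example; a minimal sketch, assuming an illustrative age column, of using it with when() to supply a default value:

library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(age = c(12, 25, NA)))

# when() yields null where the condition is not met (or the input is null);
# otherwise() fills those entries with a default value.
head(select(df, when(df$age >= 18, "adult")))
head(select(df, otherwise(when(df$age >= 18, "adult"), "minor_or_unknown")))

sparkR.session.stop()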
Can created windowPartitionBy windowOrderBy configured WindowSpec methods.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/over.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"over — over","text":"since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/over.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"over — over","text":"","code":"if (FALSE) { df <- createDataFrame(mtcars) # Partition by am (transmission) and order by hp (horsepower) ws <- orderBy(windowPartitionBy(\"am\"), \"hp\") # Rank on hp within each partition out <- select(df, over(rank(), ws), df$hp, df$am) # Lag mpg values by 1 row on the partition-and-ordered table out <- select(df, over(lead(df$mpg), ws), df$mpg, df$hp, df$am) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/partitionBy.html","id":null,"dir":"Reference","previous_headings":"","what":"partitionBy — partitionBy","title":"partitionBy — partitionBy","text":"Defines partitioning columns WindowSpec.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/partitionBy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"partitionBy — partitionBy","text":"","code":"partitionBy(x, ...) # S4 method for WindowSpec partitionBy(x, col, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/partitionBy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"partitionBy — partitionBy","text":"x WindowSpec. ... additional column(s) partition . col column partition (described name Column).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/partitionBy.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"partitionBy — partitionBy","text":"WindowSpec.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/partitionBy.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"partitionBy — partitionBy","text":"partitionBy(WindowSpec) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/partitionBy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"partitionBy — partitionBy","text":"","code":"if (FALSE) { partitionBy(ws, \"col1\", \"col2\") partitionBy(ws, df$col1, df$col2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/persist.html","id":null,"dir":"Reference","previous_headings":"","what":"Persist — persist","title":"Persist — persist","text":"Persist SparkDataFrame specified storage level. details supported storage levels, refer https://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/persist.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Persist — persist","text":"","code":"persist(x, newLevel) # S4 method for SparkDataFrame,character persist(x, newLevel)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/persist.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Persist — persist","text":"x SparkDataFrame persist. newLevel storage level chosen persistence. 
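A compact sketch combining windowPartitionBy, orderBy and over into a running aggregate; it reuses mtcars as in the over example, while the alias name is purely illustrative:

library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)

# Running sum of hp within each transmission type (am), ordered by hp.
ws  <- orderBy(windowPartitionBy("am"), "hp")
out <- select(df, df$am, df$hp, alias(over(sum(df$hp), ws), "running_hp"))
head(out)

sparkR.session.stop()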
See available options description.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/persist.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Persist — persist","text":"persist since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/persist.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Persist — persist","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) persist(df, \"MEMORY_AND_DISK\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/pivot.html","id":null,"dir":"Reference","previous_headings":"","what":"Pivot a column of the GroupedData and perform the specified aggregation. — pivot","title":"Pivot a column of the GroupedData and perform the specified aggregation. — pivot","text":"Pivot column GroupedData perform specified aggregation. two versions pivot function: one requires caller specify list distinct values pivot , one . latter concise less efficient, Spark needs first compute list distinct values internally.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/pivot.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Pivot a column of the GroupedData and perform the specified aggregation. — pivot","text":"","code":"# S4 method for GroupedData,character pivot(x, colname, values = list())"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/pivot.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Pivot a column of the GroupedData and perform the specified aggregation. — pivot","text":"x GroupedData object colname column name values value list/vector distinct values output columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/pivot.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Pivot a column of the GroupedData and perform the specified aggregation. — pivot","text":"GroupedData object","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/pivot.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Pivot a column of the GroupedData and perform the specified aggregation. — pivot","text":"pivot since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/pivot.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Pivot a column of the GroupedData and perform the specified aggregation. 
— pivot","text":"","code":"if (FALSE) { df <- createDataFrame(data.frame( earnings = c(10000, 10000, 11000, 15000, 12000, 20000, 21000, 22000), course = c(\"R\", \"Python\", \"R\", \"Python\", \"R\", \"Python\", \"R\", \"Python\"), period = c(\"1H\", \"1H\", \"2H\", \"2H\", \"1H\", \"1H\", \"2H\", \"2H\"), year = c(2015, 2015, 2015, 2015, 2016, 2016, 2016, 2016) )) group_sum <- sum(pivot(groupBy(df, \"year\"), \"course\"), \"earnings\") group_min <- min(pivot(groupBy(df, \"year\"), \"course\", \"R\"), \"earnings\") group_max <- max(pivot(groupBy(df, \"year\"), \"course\", c(\"Python\", \"R\")), \"earnings\") group_mean <- mean(pivot(groupBy(df, \"year\"), \"course\", list(\"Python\", \"R\")), \"earnings\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/predict.html","id":null,"dir":"Reference","previous_headings":"","what":"Makes predictions from a MLlib model — predict","title":"Makes predictions from a MLlib model — predict","text":"Makes predictions MLlib model. information, see specific MLlib model .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/predict.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Makes predictions from a MLlib model — predict","text":"","code":"predict(object, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/predict.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Makes predictions from a MLlib model — predict","text":"object fitted ML model object. ... additional argument(s) passed method.","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.jobj.html","id":null,"dir":"Reference","previous_headings":"","what":"Print a JVM object reference. — print.jobj","title":"Print a JVM object reference. — print.jobj","text":"function prints type id object stored SparkR JVM backend.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.jobj.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print a JVM object reference. — print.jobj","text":"","code":"# S3 method for jobj print(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.jobj.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print a JVM object reference. — print.jobj","text":"x JVM object reference ... arguments passed methods","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.jobj.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Print a JVM object reference. — print.jobj","text":"print.jobj since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structField.html","id":null,"dir":"Reference","previous_headings":"","what":"Print a Spark StructField. — print.structField","title":"Print a Spark StructField. — print.structField","text":"function prints contents StructField returned SparkR JVM backend.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structField.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print a Spark StructField. — print.structField","text":"","code":"# S3 method for structField print(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structField.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print a Spark StructField. 
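The predict entry above has no example; a minimal sketch, assuming a spark.glm model fit on mtcars (the formula is chosen only for illustration):

library(SparkR)
sparkR.session()

carsDF <- createDataFrame(mtcars)
model  <- spark.glm(carsDF, hp ~ wt + qsec)

# predict() returns a SparkDataFrame with a "prediction" column appended.
preds <- predict(model, carsDF)
head(select(preds, "hp", "prediction"))

sparkR.session.stop()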
— print.structField","text":"x StructField object ... arguments passed methods","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structField.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Print a Spark StructField. — print.structField","text":"print.structField since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structType.html","id":null,"dir":"Reference","previous_headings":"","what":"Print a Spark StructType. — print.structType","title":"Print a Spark StructType. — print.structType","text":"function prints contents StructType returned SparkR JVM backend.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structType.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print a Spark StructType. — print.structType","text":"","code":"# S3 method for structType print(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structType.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print a Spark StructType. — print.structType","text":"x StructType object ... arguments passed methods","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/print.structType.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Print a Spark StructType. — print.structType","text":"print.structType since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/printSchema.html","id":null,"dir":"Reference","previous_headings":"","what":"Print Schema of a SparkDataFrame — printSchema","title":"Print Schema of a SparkDataFrame — printSchema","text":"Prints schema tree format","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/printSchema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Print Schema of a SparkDataFrame — printSchema","text":"","code":"printSchema(x) # S4 method for SparkDataFrame printSchema(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/printSchema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Print Schema of a SparkDataFrame — printSchema","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/printSchema.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Print Schema of a SparkDataFrame — printSchema","text":"printSchema since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/printSchema.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Print Schema of a SparkDataFrame — printSchema","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) printSchema(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/queryName.html","id":null,"dir":"Reference","previous_headings":"","what":"queryName — queryName","title":"queryName — queryName","text":"Returns user-specified name query. specified write.stream(df, queryName = \"query\"). 
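A hedged sketch of naming a streaming query and reading the name back with queryName; it assumes something is writing lines to localhost:9999 (for example nc -lk 9999), which is not part of the reference page:

library(SparkR)
sparkR.session()

lines <- read.stream("socket", host = "localhost", port = 9999)
q <- write.stream(lines, "console", queryName = "lines_to_console")
queryName(q)   # returns "lines_to_console"
stopQuery(q)

sparkR.session.stop()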
name, set, must unique across active queries.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/queryName.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"queryName — queryName","text":"","code":"queryName(x) # S4 method for StreamingQuery queryName(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/queryName.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"queryName — queryName","text":"x StreamingQuery.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/queryName.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"queryName — queryName","text":"name query, NULL specified.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/queryName.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"queryName — queryName","text":"queryName(StreamingQuery) since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/queryName.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"queryName — queryName","text":"","code":"if (FALSE) queryName(sq)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/randomSplit.html","id":null,"dir":"Reference","previous_headings":"","what":"randomSplit — randomSplit","title":"randomSplit — randomSplit","text":"Return list randomly split dataframes provided weights.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/randomSplit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"randomSplit — randomSplit","text":"","code":"randomSplit(x, weights, seed) # S4 method for SparkDataFrame,numeric randomSplit(x, weights, seed)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/randomSplit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"randomSplit — randomSplit","text":"x SparkDataFrame weights vector weights splits, normalized sum 1 seed seed use random split","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/randomSplit.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"randomSplit — randomSplit","text":"randomSplit since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/randomSplit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"randomSplit — randomSplit","text":"","code":"if (FALSE) { sparkR.session() df <- createDataFrame(data.frame(id = 1:1000)) df_list <- randomSplit(df, c(2, 3, 5), 0) # df_list contains 3 SparkDataFrames with each having about 200, 300 and 500 rows respectively sapply(df_list, count) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rangeBetween.html","id":null,"dir":"Reference","previous_headings":"","what":"rangeBetween — rangeBetween","title":"rangeBetween — rangeBetween","text":"Defines frame boundaries, start (inclusive) end (inclusive).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rangeBetween.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"rangeBetween — rangeBetween","text":"","code":"rangeBetween(x, start, end) # S4 method for WindowSpec,numeric,numeric rangeBetween(x, start, 
end)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rangeBetween.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"rangeBetween — rangeBetween","text":"x WindowSpec start boundary start, inclusive. frame unbounded minimum long value. end boundary end, inclusive. frame unbounded maximum long value.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rangeBetween.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"rangeBetween — rangeBetween","text":"WindowSpec","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rangeBetween.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"rangeBetween — rangeBetween","text":"start end relative current row. example, \"0\" means \"current row\", \"-1\" means one current row, \"5\" means five current row. recommend users use Window.unboundedPreceding, Window.unboundedFollowing, Window.currentRow specify special boundary values, rather using long values directly. range-based boundary based actual value ORDER expression(s). offset used alter value ORDER expression, instance current ORDER expression value 10 lower bound offset -3, resulting lower bound current row 10 - 3 = 7. however puts number constraints ORDER expressions: can one expression expression must numerical data type. exception can made offset unbounded, value modification needed, case multiple non-numeric ORDER expression allowed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rangeBetween.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"rangeBetween — rangeBetween","text":"rangeBetween since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rangeBetween.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"rangeBetween — rangeBetween","text":"","code":"if (FALSE) { id <- c(rep(1, 3), rep(2, 3), 3) desc <- c('New', 'New', 'Good', 'New', 'Good', 'Good', 'New') df <- data.frame(id, desc) df <- createDataFrame(df) w1 <- orderBy(windowPartitionBy('desc'), df$id) w2 <- rangeBetween(w1, 0, 3) df1 <- withColumn(df, \"sum\", over(sum(df$id), w2)) head(df1) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rbind.html","id":null,"dir":"Reference","previous_headings":"","what":"Union two or more SparkDataFrames — rbind","title":"Union two or more SparkDataFrames — rbind","text":"Union two SparkDataFrames row. R's rbind, method requires input SparkDataFrames column names.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rbind.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Union two or more SparkDataFrames — rbind","text":"","code":"rbind(..., deparse.level = 1) # S4 method for SparkDataFrame rbind(x, ..., deparse.level = 1)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rbind.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Union two or more SparkDataFrames — rbind","text":"... additional SparkDataFrame(s). deparse.level currently used (put match signature base implementation). 
x SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rbind.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Union two or more SparkDataFrames — rbind","text":"SparkDataFrame containing result union.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rbind.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Union two or more SparkDataFrames — rbind","text":"Note: remove duplicate rows across two SparkDataFrames.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rbind.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Union two or more SparkDataFrames — rbind","text":"rbind since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rbind.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Union two or more SparkDataFrames — rbind","text":"","code":"if (FALSE) { sparkR.session() unions <- rbind(df, df2, df3, df4) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.df.html","id":null,"dir":"Reference","previous_headings":"","what":"Load a SparkDataFrame — read.df","title":"Load a SparkDataFrame — read.df","text":"Returns dataset data source SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Load a SparkDataFrame — read.df","text":"","code":"read.df(path = NULL, source = NULL, schema = NULL, na.strings = \"NA\", ...) loadDF(path = NULL, source = NULL, schema = NULL, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Load a SparkDataFrame — read.df","text":"path path files load source name external data source schema data schema defined structType DDL-formatted string. na.strings Default string value NA source \"csv\" ... additional external data source specific named properties.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.df.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Load a SparkDataFrame — read.df","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.df.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Load a SparkDataFrame — read.df","text":"data source specified source set options(...). source specified, default data source configured \"spark.sql.sources.default\" used. 
Similar R read.csv, source \"csv\", default, value \"NA\" interpreted NA.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.df.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Load a SparkDataFrame — read.df","text":"read.df since 1.4.0 loadDF since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Load a SparkDataFrame — read.df","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.df(\"path/to/file.json\", source = \"json\") schema <- structType(structField(\"name\", \"string\"), structField(\"info\", \"map\")) df2 <- read.df(mapTypeJsonPath, \"json\", schema, multiLine = TRUE) df3 <- loadDF(\"data/test_table\", \"parquet\", mergeSchema = \"true\") stringSchema <- \"name STRING, info MAP\" df4 <- read.df(mapTypeJsonPath, \"json\", stringSchema, multiLine = TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.jdbc.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","title":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","text":"Additional JDBC database connection properties can set (...) can find JDBC-specific option parameter documentation reading tables via JDBC https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-optionData Source Option version use.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.jdbc.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","text":"","code":"read.jdbc( url, tableName, partitionColumn = NULL, lowerBound = NULL, upperBound = NULL, numPartitions = 0L, predicates = list(), ... )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.jdbc.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","text":"url JDBC database url form jdbc:subprotocol:subname tableName name table external database partitionColumn name column numeric, date, timestamp type used partitioning. lowerBound minimum value partitionColumn used decide partition stride upperBound maximum value partitionColumn used decide partition stride numPartitions number partitions, , along lowerBound (inclusive), upperBound (exclusive), form partition strides generated clause expressions used split column partitionColumn evenly. defaults SparkContext.defaultParallelism unset. predicates list conditions clause; one defines one partition ... additional JDBC database connection named properties.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.jdbc.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.jdbc.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","text":"one partitionColumn predicates set. 
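A small sketch of read.df with the "csv" source, where na.strings and the extra header / inferSchema options apply; the file path is hypothetical:

library(SparkR)
sparkR.session()

df <- read.df("path/to/people.csv", source = "csv",
              header = "true", inferSchema = "true", na.strings = "NA")
printSchema(df)

sparkR.session.stop()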
Partitions table retrieved parallel based numPartitions predicates. Don't create too many partitions parallel large cluster; otherwise Spark might crash external database systems.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.jdbc.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","text":"read.jdbc since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.jdbc.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a SparkDataFrame representing the database table accessible via JDBC URL — read.jdbc","text":"","code":"if (FALSE) { sparkR.session() jdbcUrl <- \"jdbc:mysql://localhost:3306/databasename\" df <- read.jdbc(jdbcUrl, \"table\", predicates = list(\"field<=123\"), user = \"username\") df2 <- read.jdbc(jdbcUrl, \"table2\", partitionColumn = \"index\", lowerBound = 0, upperBound = 10000, user = \"username\", password = \"password\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.json.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a SparkDataFrame from a JSON file. — read.json","title":"Create a SparkDataFrame from a JSON file. — read.json","text":"Loads JSON file, returning result SparkDataFrame default, (JSON Lines text format newline-delimited JSON ) supported. JSON (one record per file), set named property multiLine TRUE. goes entire dataset determine schema.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.json.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a SparkDataFrame from a JSON file. — read.json","text":"","code":"read.json(path, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.json.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a SparkDataFrame from a JSON file. — read.json","text":"path Path file read. vector multiple paths allowed. ... additional external data source specific named properties. can find JSON-specific options reading JSON files https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-optionData Source Option version use.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.json.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a SparkDataFrame from a JSON file. — read.json","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.json.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create a SparkDataFrame from a JSON file. — read.json","text":"read.json since 1.6.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.json.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a SparkDataFrame from a JSON file. — read.json","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) df <- read.json(path, multiLine = TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.ml.html","id":null,"dir":"Reference","previous_headings":"","what":"Load a fitted MLlib model from the input path. — read.ml","title":"Load a fitted MLlib model from the input path. 
— read.ml","text":"Load fitted MLlib model input path.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.ml.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Load a fitted MLlib model from the input path. — read.ml","text":"","code":"read.ml(path)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.ml.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Load a fitted MLlib model from the input path. — read.ml","text":"path path model read.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.ml.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Load a fitted MLlib model from the input path. — read.ml","text":"fitted MLlib model.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.ml.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Load a fitted MLlib model from the input path. — read.ml","text":"read.ml since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.ml.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Load a fitted MLlib model from the input path. — read.ml","text":"","code":"if (FALSE) { path <- \"path/to/model\" model <- read.ml(path) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.orc.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a SparkDataFrame from an ORC file. — read.orc","title":"Create a SparkDataFrame from an ORC file. — read.orc","text":"Loads ORC file, returning result SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.orc.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a SparkDataFrame from an ORC file. — read.orc","text":"","code":"read.orc(path, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.orc.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a SparkDataFrame from an ORC file. — read.orc","text":"path Path file read. ... additional external data source specific named properties. can find ORC-specific options reading ORC files https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-optionData Source Option version use.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.orc.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a SparkDataFrame from an ORC file. — read.orc","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.orc.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create a SparkDataFrame from an ORC file. — read.orc","text":"read.orc since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a SparkDataFrame from a Parquet file. — read.parquet","title":"Create a SparkDataFrame from a Parquet file. — read.parquet","text":"Loads Parquet file, returning result SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a SparkDataFrame from a Parquet file. 
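The read.orc entry above has no example; a minimal round-trip sketch using write.orc to produce an ORC file at an illustrative temporary path:

library(SparkR)
sparkR.session()

df <- createDataFrame(mtcars)
orcPath <- tempfile(fileext = ".orc")   # illustrative local path
write.orc(df, orcPath)
df2 <- read.orc(orcPath)
head(df2)

sparkR.session.stop()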
— read.parquet","text":"","code":"read.parquet(path, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a SparkDataFrame from a Parquet file. — read.parquet","text":"path path file read. vector multiple paths allowed. ... additional data source specific named properties. can find Parquet-specific options reading Parquet files https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-optionData Source Option version use.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.parquet.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a SparkDataFrame from a Parquet file. — read.parquet","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.parquet.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create a SparkDataFrame from a Parquet file. — read.parquet","text":"read.parquet since 1.6.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.stream.html","id":null,"dir":"Reference","previous_headings":"","what":"Load a streaming SparkDataFrame — read.stream","title":"Load a streaming SparkDataFrame — read.stream","text":"Returns dataset data source SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.stream.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Load a streaming SparkDataFrame — read.stream","text":"","code":"read.stream(source = NULL, schema = NULL, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.stream.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Load a streaming SparkDataFrame — read.stream","text":"source name external data source schema data schema defined structType DDL-formatted string, required file-based streaming data source ... additional external data source specific named options, instance path file-based streaming data source. timeZone indicate timezone used parse timestamps JSON/CSV data sources partition values; set, uses default value, session local timezone.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.stream.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Load a streaming SparkDataFrame — read.stream","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.stream.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Load a streaming SparkDataFrame — read.stream","text":"data source specified source set options(...). 
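The read.parquet entry above likewise has no example; a matching round-trip sketch with write.parquet at an illustrative temporary path:

library(SparkR)
sparkR.session()

df <- createDataFrame(data.frame(id = 1:10, group = rep(c("a", "b"), 5)))
pqPath <- tempfile(fileext = ".parquet")   # illustrative local path
write.parquet(df, pqPath)
df2 <- read.parquet(pqPath)
count(df2)

sparkR.session.stop()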
source specified, default data source configured \"spark.sql.sources.default\" used.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.stream.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Load a streaming SparkDataFrame — read.stream","text":"read.stream since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.stream.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Load a streaming SparkDataFrame — read.stream","text":"","code":"if (FALSE) { sparkR.session() df <- read.stream(\"socket\", host = \"localhost\", port = 9999) q <- write.stream(df, \"text\", path = \"/home/user/out\", checkpointLocation = \"/home/user/cp\") df <- read.stream(\"json\", path = jsonDir, schema = schema, maxFilesPerTrigger = 1) stringSchema <- \"name STRING, info MAP\" df1 <- read.stream(\"json\", path = jsonDir, schema = stringSchema, maxFilesPerTrigger = 1) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.text.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a SparkDataFrame from a text file. — read.text","title":"Create a SparkDataFrame from a text file. — read.text","text":"Loads text files returns SparkDataFrame whose schema starts string column named \"value\", followed partitioned columns . text files must encoded UTF-8.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.text.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a SparkDataFrame from a text file. — read.text","text":"","code":"read.text(path, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.text.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a SparkDataFrame from a text file. — read.text","text":"path Path file read. vector multiple paths allowed. ... additional external data source specific named properties. can find text-specific options reading text files https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-optionData Source Option version use.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.text.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a SparkDataFrame from a text file. — read.text","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.text.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create a SparkDataFrame from a text file. — read.text","text":"line text file new row resulting SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.text.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create a SparkDataFrame from a text file. — read.text","text":"read.text since 1.6.1","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/read.text.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a SparkDataFrame from a text file. 
— read.text","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.txt\" df <- read.text(path) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/recoverPartitions.html","id":null,"dir":"Reference","previous_headings":"","what":"Recovers all the partitions in the directory of a table and update the catalog — recoverPartitions","title":"Recovers all the partitions in the directory of a table and update the catalog — recoverPartitions","text":"Recovers partitions directory table update catalog. name reference partitioned table, view.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/recoverPartitions.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Recovers all the partitions in the directory of a table and update the catalog — recoverPartitions","text":"","code":"recoverPartitions(tableName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/recoverPartitions.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Recovers all the partitions in the directory of a table and update the catalog — recoverPartitions","text":"tableName qualified unqualified name designates table. database identifier provided, refers table current database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/recoverPartitions.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Recovers all the partitions in the directory of a table and update the catalog — recoverPartitions","text":"since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/recoverPartitions.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Recovers all the partitions in the directory of a table and update the catalog — recoverPartitions","text":"","code":"if (FALSE) { sparkR.session() recoverPartitions(\"myTable\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshByPath.html","id":null,"dir":"Reference","previous_headings":"","what":"Invalidates and refreshes all the cached data and metadata for SparkDataFrame containing path — refreshByPath","title":"Invalidates and refreshes all the cached data and metadata for SparkDataFrame containing path — refreshByPath","text":"Invalidates refreshes cached data (associated metadata) SparkDataFrame contains given data source path. Path matching prefix, .e. 
\"/\" invalidate everything cached.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshByPath.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Invalidates and refreshes all the cached data and metadata for SparkDataFrame containing path — refreshByPath","text":"","code":"refreshByPath(path)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshByPath.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Invalidates and refreshes all the cached data and metadata for SparkDataFrame containing path — refreshByPath","text":"path path data source.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshByPath.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Invalidates and refreshes all the cached data and metadata for SparkDataFrame containing path — refreshByPath","text":"since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshByPath.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Invalidates and refreshes all the cached data and metadata for SparkDataFrame containing path — refreshByPath","text":"","code":"if (FALSE) { sparkR.session() refreshByPath(\"/path\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshTable.html","id":null,"dir":"Reference","previous_headings":"","what":"Invalidates and refreshes all the cached data and metadata of the given table — refreshTable","title":"Invalidates and refreshes all the cached data and metadata of the given table — refreshTable","text":"Invalidates refreshes cached data metadata given table. performance reasons, Spark SQL external data source library uses might cache certain metadata table, location blocks. change outside Spark SQL, users call function invalidate cache.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshTable.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Invalidates and refreshes all the cached data and metadata of the given table — refreshTable","text":"","code":"refreshTable(tableName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshTable.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Invalidates and refreshes all the cached data and metadata of the given table — refreshTable","text":"tableName qualified unqualified name designates table. 
database identifier provided, refers table current database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshTable.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Invalidates and refreshes all the cached data and metadata of the given table — refreshTable","text":"table cached InMemoryRelation, drop original cached version make new version cached lazily.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshTable.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Invalidates and refreshes all the cached data and metadata of the given table — refreshTable","text":"since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/refreshTable.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Invalidates and refreshes all the cached data and metadata of the given table — refreshTable","text":"","code":"if (FALSE) { sparkR.session() refreshTable(\"myTable\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/registerTempTable-deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"(Deprecated) Register Temporary Table — registerTempTable","title":"(Deprecated) Register Temporary Table — registerTempTable","text":"Registers SparkDataFrame Temporary Table SparkSession","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/registerTempTable-deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Deprecated) Register Temporary Table — registerTempTable","text":"","code":"registerTempTable(x, tableName) # S4 method for SparkDataFrame,character registerTempTable(x, tableName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/registerTempTable-deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Deprecated) Register Temporary Table — registerTempTable","text":"x SparkDataFrame tableName character vector containing name table","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/registerTempTable-deprecated.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"(Deprecated) Register Temporary Table — registerTempTable","text":"registerTempTable since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/registerTempTable-deprecated.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"(Deprecated) Register Temporary Table — registerTempTable","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) registerTempTable(df, \"json_df\") new_df <- sql(\"SELECT * FROM json_df\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rename.html","id":null,"dir":"Reference","previous_headings":"","what":"rename — rename","title":"rename — rename","text":"Rename existing column SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rename.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"rename — rename","text":"","code":"rename(x, ...) 
withColumnRenamed(x, existingCol, newCol) # S4 method for SparkDataFrame,character,character withColumnRenamed(x, existingCol, newCol) # S4 method for SparkDataFrame rename(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rename.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"rename — rename","text":"x SparkDataFrame ... named pair form new_column_name = existing_column existingCol name column want change. newCol new column name.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rename.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"rename — rename","text":"SparkDataFrame column name changed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rename.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"rename — rename","text":"withColumnRenamed since 1.4.0 rename since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rename.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"rename — rename","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) newDF <- withColumnRenamed(df, \"col1\", \"newCol1\") } if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) newDF <- rename(df, col1 = df$newCol1) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartition.html","id":null,"dir":"Reference","previous_headings":"","what":"Repartition — repartition","title":"Repartition — repartition","text":"following options repartition possible: 1. Return new SparkDataFrame exactly numPartitions. 2. Return new SparkDataFrame hash partitioned given columns numPartitions. 3. Return new SparkDataFrame hash partitioned given column(s), using spark.sql.shuffle.partitions number partitions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartition.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Repartition — repartition","text":"","code":"repartition(x, ...) # S4 method for SparkDataFrame repartition(x, numPartitions = NULL, col = NULL, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartition.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Repartition — repartition","text":"x SparkDataFrame. ... additional column(s) used partitioning. numPartitions number partitions use. 
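To illustrate the two renaming forms documented above, a short sketch using the mtcars data; the named-pair form of rename maps new_column_name = existing_column:

df <- createDataFrame(mtcars)
df2 <- withColumnRenamed(df, "wt", "weight")   # existing name and new name as strings
df3 <- rename(df, weight = df$wt)              # named pair: new_column_name = existing_column
columns(df3)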
col column partitioning performed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartition.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Repartition — repartition","text":"repartition since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartition.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Repartition — repartition","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) newDF <- repartition(df, 2L) newDF <- repartition(df, numPartitions = 2L) newDF <- repartition(df, col = df$\"col1\", df$\"col2\") newDF <- repartition(df, 3L, col = df$\"col1\", df$\"col2\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartitionByRange.html","id":null,"dir":"Reference","previous_headings":"","what":"Repartition by range — repartitionByRange","title":"Repartition by range — repartitionByRange","text":"following options repartition range possible: 1. Return new SparkDataFrame range partitioned given columns numPartitions. 2. Return new SparkDataFrame range partitioned given column(s), using spark.sql.shuffle.partitions number partitions. least one partition-expression must specified. explicit sort order specified, \"ascending nulls first\" assumed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartitionByRange.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Repartition by range — repartitionByRange","text":"","code":"repartitionByRange(x, ...) # S4 method for SparkDataFrame repartitionByRange(x, numPartitions = NULL, col = NULL, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartitionByRange.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Repartition by range — repartitionByRange","text":"x SparkDataFrame. ... additional column(s) used range partitioning. numPartitions number partitions use. col column range partitioning performed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartitionByRange.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Repartition by range — repartitionByRange","text":"Note due performance reasons method uses sampling estimate ranges. Hence, output may consistent, since sampling can return different values. 
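A brief sketch of the repartitioning variants above, using mtcars; getNumPartitions is used only to inspect the result:

df <- createDataFrame(mtcars)
byCount <- repartition(df, numPartitions = 8L)        # exactly 8 partitions
byCol   <- repartition(df, col = df$cyl)              # hash partitioned, spark.sql.shuffle.partitions partitions
both    <- repartition(df, 4L, col = df$cyl, df$gear) # both a partition count and columns
byRange <- repartitionByRange(df, col = df$mpg)       # range partitioned, "ascending nulls first"
getNumPartitions(byCount)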
sample size can controlled config spark.sql.execution.rangeExchange.sampleSizePerPartition.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartitionByRange.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Repartition by range — repartitionByRange","text":"repartitionByRange since 2.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/repartitionByRange.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Repartition by range — repartitionByRange","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) newDF <- repartitionByRange(df, col = df$col1, df$col2) newDF <- repartitionByRange(df, 3L, col = df$col1, df$col2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rollup.html","id":null,"dir":"Reference","previous_headings":"","what":"rollup — rollup","title":"rollup — rollup","text":"Create multi-dimensional rollup SparkDataFrame using specified columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rollup.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"rollup — rollup","text":"","code":"rollup(x, ...) # S4 method for SparkDataFrame rollup(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rollup.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"rollup — rollup","text":"x SparkDataFrame. ... character name(s) Column(s) group .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rollup.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"rollup — rollup","text":"GroupedData.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rollup.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"rollup — rollup","text":"grouping expression missing rollup creates single global aggregate equivalent direct application agg.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rollup.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"rollup — rollup","text":"rollup since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rollup.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"rollup — rollup","text":"","code":"if (FALSE) { df <- createDataFrame(mtcars) mean(rollup(df, \"cyl\", \"gear\", \"am\"), \"mpg\") # Following calls are equivalent agg(rollup(df), mean(df$mpg)) agg(df, mean(df$mpg)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rowsBetween.html","id":null,"dir":"Reference","previous_headings":"","what":"rowsBetween — rowsBetween","title":"rowsBetween — rowsBetween","text":"Defines frame boundaries, start (inclusive) end (inclusive).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rowsBetween.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"rowsBetween — rowsBetween","text":"","code":"rowsBetween(x, start, end) # S4 method for WindowSpec,numeric,numeric rowsBetween(x, start, end)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rowsBetween.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"rowsBetween — rowsBetween","text":"x WindowSpec start boundary start, inclusive. frame unbounded minimum long value. 
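A small sketch of rollup: the result contains aggregates for (cyl, gear), for (cyl) alone, and the grand total, all in a single pass:

df <- createDataFrame(mtcars)
head(agg(rollup(df, "cyl", "gear"), mean(df$mpg), count(df$mpg)))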
end boundary end, inclusive. frame unbounded maximum long value.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rowsBetween.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"rowsBetween — rowsBetween","text":"WindowSpec","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rowsBetween.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"rowsBetween — rowsBetween","text":"start end relative positions current row. example, \"0\" means \"current row\", \"-1\" means row current row, \"5\" means fifth row current row. recommend users use Window.unboundedPreceding, Window.unboundedFollowing, Window.currentRow specify special boundary values, rather using long values directly. row based boundary based position row within partition. offset indicates number rows current row, frame current row starts ends. instance, given row based sliding frame lower bound offset -1 upper bound offset +2. frame row index 5 range index 4 index 6.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rowsBetween.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"rowsBetween — rowsBetween","text":"rowsBetween since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/rowsBetween.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"rowsBetween — rowsBetween","text":"","code":"if (FALSE) { id <- c(rep(1, 3), rep(2, 3), 3) desc <- c('New', 'New', 'Good', 'New', 'Good', 'Good', 'New') df <- data.frame(id, desc) df <- createDataFrame(df) w1 <- orderBy(windowPartitionBy('desc'), df$id) w2 <- rowsBetween(w1, 0, 3) df1 <- withColumn(df, \"sum\", over(sum(df$id), w2)) head(df1) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sample.html","id":null,"dir":"Reference","previous_headings":"","what":"Sample — sample","title":"Sample — sample","text":"Return sampled subset SparkDataFrame using random seed. Note: guaranteed provide exactly fraction specified total count given SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sample.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sample — sample","text":"","code":"sample(x, withReplacement = FALSE, fraction, seed) sample_frac(x, withReplacement = FALSE, fraction, seed) # S4 method for SparkDataFrame sample(x, withReplacement = FALSE, fraction, seed) # S4 method for SparkDataFrame sample_frac(x, withReplacement = FALSE, fraction, seed)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sample.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sample — sample","text":"x SparkDataFrame withReplacement Sampling replacement fraction (rough) sample target fraction seed Randomness seed value. 
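A sliding-window sketch built from the pieces above: a frame covering the previous, current and next row within each partition:

df <- createDataFrame(data.frame(id = c(1, 1, 1, 2, 2, 3),
                                 desc = c("New", "New", "Good", "New", "Good", "New")))
w <- rowsBetween(orderBy(windowPartitionBy("desc"), df$id), -1, 1)
head(withColumn(df, "slidingSum", over(sum(df$id), w)))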
Default random seed.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sample.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Sample — sample","text":"sample since 1.4.0 sample_frac since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sample.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Sample — sample","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) collect(sample(df, fraction = 0.5)) collect(sample(df, FALSE, 0.5)) collect(sample(df, TRUE, 0.5, seed = 3)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sampleBy.html","id":null,"dir":"Reference","previous_headings":"","what":"Returns a stratified sample without replacement — sampleBy","title":"Returns a stratified sample without replacement — sampleBy","text":"Returns stratified sample without replacement based fraction given stratum.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sampleBy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Returns a stratified sample without replacement — sampleBy","text":"","code":"sampleBy(x, col, fractions, seed) # S4 method for SparkDataFrame,character,list,numeric sampleBy(x, col, fractions, seed)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sampleBy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Returns a stratified sample without replacement — sampleBy","text":"x SparkDataFrame col column defines strata fractions named list giving sampling fraction stratum. stratum specified, treat fraction zero. seed random seed","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sampleBy.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Returns a stratified sample without replacement — sampleBy","text":"new SparkDataFrame represents stratified sample","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sampleBy.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Returns a stratified sample without replacement — sampleBy","text":"sampleBy since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sampleBy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Returns a stratified sample without replacement — sampleBy","text":"","code":"if (FALSE) { df <- read.json(\"/path/to/file.json\") sample <- sampleBy(df, \"key\", fractions, 36) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/saveAsTable.html","id":null,"dir":"Reference","previous_headings":"","what":"Save the contents of the SparkDataFrame to a data source as a table — saveAsTable","title":"Save the contents of the SparkDataFrame to a data source as a table — saveAsTable","text":"data source specified source set options (...). source specified, default data source configured spark.sql.sources.default used.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/saveAsTable.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Save the contents of the SparkDataFrame to a data source as a table — saveAsTable","text":"","code":"saveAsTable(df, tableName, source = NULL, mode = \"error\", ...) 
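# Illustrative sketch (not part of the reference text above): stratified sampling with sampleBy,
# assuming a small hypothetical SparkDataFrame with a string column "key".
df <- createDataFrame(data.frame(key = c("a", "a", "a", "b", "b"), value = 1:5))
fractions <- list(a = 0.8, b = 0.4)                # per-stratum sampling fraction; unspecified strata get 0
strat <- sampleBy(df, "key", fractions, seed = 36)
head(strat)
# saveAsTable(strat, "strata_sample", mode = "overwrite")   # hypothetical table name; persists via the default data source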
# S4 method for SparkDataFrame,character saveAsTable(df, tableName, source = NULL, mode = \"error\", ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/saveAsTable.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Save the contents of the SparkDataFrame to a data source as a table — saveAsTable","text":"df SparkDataFrame. tableName name table. source name external data source. mode one 'append', 'overwrite', 'error', 'errorifexists', 'ignore' save mode ('error' default) ... additional option(s) passed method.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/saveAsTable.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Save the contents of the SparkDataFrame to a data source as a table — saveAsTable","text":"Additionally, mode used specify behavior save operation data already exists data source. four modes: 'append': Contents SparkDataFrame expected appended existing data. 'overwrite': Existing data expected overwritten contents SparkDataFrame. 'error' 'errorifexists': exception expected thrown. 'ignore': save operation expected save contents SparkDataFrame change existing data.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/saveAsTable.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Save the contents of the SparkDataFrame to a data source as a table — saveAsTable","text":"saveAsTable since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/saveAsTable.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Save the contents of the SparkDataFrame to a data source as a table — saveAsTable","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) saveAsTable(df, \"myfile\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/schema.html","id":null,"dir":"Reference","previous_headings":"","what":"Get schema object — schema","title":"Get schema object — schema","text":"Returns schema SparkDataFrame structType object.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/schema.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get schema object — schema","text":"","code":"schema(x) # S4 method for SparkDataFrame schema(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/schema.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get schema object — schema","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/schema.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get schema object — schema","text":"schema since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/schema.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get schema object — schema","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) dfSchema <- schema(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/select.html","id":null,"dir":"Reference","previous_headings":"","what":"Select — select","title":"Select — select","text":"Selects set columns names Column 
expressions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/select.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Select — select","text":"","code":"select(x, col, ...) # S4 method for SparkDataFrame $(x, name) # S4 method for SparkDataFrame $(x, name) <- value # S4 method for SparkDataFrame,character select(x, col, ...) # S4 method for SparkDataFrame,Column select(x, col, ...) # S4 method for SparkDataFrame,list select(x, col)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/select.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Select — select","text":"x SparkDataFrame. col list columns single Column name. ... additional column(s) one column specified col. one column assigned col, ... left empty. name name Column (without wrapped \"\"). value Column atomic vector length 1 literal value, NULL. NULL, specified Column dropped.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/select.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Select — select","text":"new SparkDataFrame selected columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/select.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Select — select","text":"$ since 1.4.0 $<- since 1.4.0 select(SparkDataFrame, character) since 1.4.0 select(SparkDataFrame, Column) since 1.4.0 select(SparkDataFrame, list) since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/select.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Select — select","text":"","code":"if (FALSE) { select(df, \"*\") select(df, \"col1\", \"col2\") select(df, df$name, df$age + 1) select(df, c(\"col1\", \"col2\")) select(df, list(df$name, df$age + 1)) # Similar to R data frames columns can also be selected using $ df[,df$age] }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/selectExpr.html","id":null,"dir":"Reference","previous_headings":"","what":"SelectExpr — selectExpr","title":"SelectExpr — selectExpr","text":"Select SparkDataFrame using set SQL expressions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/selectExpr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"SelectExpr — selectExpr","text":"","code":"selectExpr(x, expr, ...) # S4 method for SparkDataFrame,character selectExpr(x, expr, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/selectExpr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"SelectExpr — selectExpr","text":"x SparkDataFrame selected . expr string containing SQL expression ... 
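A compact sketch of the selection and $ assignment forms listed above, using mtcars:

df <- createDataFrame(mtcars)
head(select(df, df$mpg, df$wt * 0.4536))   # Column expressions are allowed
df$wtKg <- df$wt * 0.4536                  # $<- with a Column adds or replaces a column
df$wtKg <- NULL                            # assigning NULL drops the column again
head(select(df, c("mpg", "cyl")))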
Additional expressions","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/selectExpr.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"SelectExpr — selectExpr","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/selectExpr.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"SelectExpr — selectExpr","text":"selectExpr since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/selectExpr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"SelectExpr — selectExpr","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) selectExpr(df, \"col1\", \"(col2 * 5) as newCol\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCheckpointDir.html","id":null,"dir":"Reference","previous_headings":"","what":"Set checkpoint directory — setCheckpointDir","title":"Set checkpoint directory — setCheckpointDir","text":"Set directory SparkDataFrame going checkpointed. directory must HDFS path running cluster.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCheckpointDir.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set checkpoint directory — setCheckpointDir","text":"","code":"setCheckpointDir(directory)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCheckpointDir.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set checkpoint directory — setCheckpointDir","text":"directory Directory path checkpoint ","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCheckpointDir.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Set checkpoint directory — setCheckpointDir","text":"setCheckpointDir since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCheckpointDir.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Set checkpoint directory — setCheckpointDir","text":"","code":"if (FALSE) { setCheckpointDir(\"/checkpoint\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCurrentDatabase.html","id":null,"dir":"Reference","previous_headings":"","what":"Sets the current default database — setCurrentDatabase","title":"Sets the current default database — setCurrentDatabase","text":"Sets current default database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCurrentDatabase.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Sets the current default database — setCurrentDatabase","text":"","code":"setCurrentDatabase(databaseName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCurrentDatabase.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Sets the current default database — setCurrentDatabase","text":"databaseName name database","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCurrentDatabase.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Sets the current default database — setCurrentDatabase","text":"since 
2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setCurrentDatabase.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Sets the current default database — setCurrentDatabase","text":"","code":"if (FALSE) { sparkR.session() setCurrentDatabase(\"default\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobDescription.html","id":null,"dir":"Reference","previous_headings":"","what":"Set a human readable description of the current job. — setJobDescription","title":"Set a human readable description of the current job. — setJobDescription","text":"Set description shown job description UI.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobDescription.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set a human readable description of the current job. — setJobDescription","text":"","code":"setJobDescription(value)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobDescription.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set a human readable description of the current job. — setJobDescription","text":"value job description current job.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobDescription.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Set a human readable description of the current job. — setJobDescription","text":"setJobDescription since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobDescription.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Set a human readable description of the current job. — setJobDescription","text":"","code":"if (FALSE) { setJobDescription(\"This is an example job.\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobGroup.html","id":null,"dir":"Reference","previous_headings":"","what":"Assigns a group ID to all the jobs started by this thread until the group ID is set to a\ndifferent value or cleared. — setJobGroup","title":"Assigns a group ID to all the jobs started by this thread until the group ID is set to a\ndifferent value or cleared. — setJobGroup","text":"Assigns group ID jobs started thread group ID set different value cleared.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobGroup.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Assigns a group ID to all the jobs started by this thread until the group ID is set to a\ndifferent value or cleared. — setJobGroup","text":"","code":"setJobGroup(groupId, description, interruptOnCancel)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobGroup.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Assigns a group ID to all the jobs started by this thread until the group ID is set to a\ndifferent value or cleared. — setJobGroup","text":"groupId ID assigned job groups. description description job group ID. interruptOnCancel flag indicate job interrupted job cancellation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobGroup.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Assigns a group ID to all the jobs started by this thread until the group ID is set to a\ndifferent value or cleared. 
— setJobGroup","text":"setJobGroup since 1.5.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setJobGroup.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Assigns a group ID to all the jobs started by this thread until the group ID is set to a\ndifferent value or cleared. — setJobGroup","text":"","code":"if (FALSE) { sparkR.session() setJobGroup(\"myJobGroup\", \"My job group description\", TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLocalProperty.html","id":null,"dir":"Reference","previous_headings":"","what":"Set a local property that affects jobs submitted from this thread, such as the\nSpark fair scheduler pool. — setLocalProperty","title":"Set a local property that affects jobs submitted from this thread, such as the\nSpark fair scheduler pool. — setLocalProperty","text":"Set local property affects jobs submitted thread, Spark fair scheduler pool.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLocalProperty.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set a local property that affects jobs submitted from this thread, such as the\nSpark fair scheduler pool. — setLocalProperty","text":"","code":"setLocalProperty(key, value)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLocalProperty.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set a local property that affects jobs submitted from this thread, such as the\nSpark fair scheduler pool. — setLocalProperty","text":"key key local property. value value local property.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLocalProperty.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Set a local property that affects jobs submitted from this thread, such as the\nSpark fair scheduler pool. — setLocalProperty","text":"setLocalProperty since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLocalProperty.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Set a local property that affects jobs submitted from this thread, such as the\nSpark fair scheduler pool. 
— setLocalProperty","text":"","code":"if (FALSE) { setLocalProperty(\"spark.scheduler.pool\", \"poolA\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLogLevel.html","id":null,"dir":"Reference","previous_headings":"","what":"Set new log level — setLogLevel","title":"Set new log level — setLogLevel","text":"Set new log level: \"\", \"DEBUG\", \"ERROR\", \"FATAL\", \"INFO\", \"\", \"TRACE\", \"WARN\"","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLogLevel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Set new log level — setLogLevel","text":"","code":"setLogLevel(level)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLogLevel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Set new log level — setLogLevel","text":"level New log level","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLogLevel.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Set new log level — setLogLevel","text":"setLogLevel since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/setLogLevel.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Set new log level — setLogLevel","text":"","code":"if (FALSE) { setLogLevel(\"ERROR\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/show.html","id":null,"dir":"Reference","previous_headings":"","what":"show — show","title":"show — show","text":"eager evaluation enabled Spark object SparkDataFrame, evaluate SparkDataFrame print top rows SparkDataFrame, otherwise, print class type information Spark object.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/show.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"show — show","text":"","code":"# S4 method for Column show(object) # S4 method for GroupedData show(object) # S4 method for SparkDataFrame show(object) # S4 method for WindowSpec show(object) # S4 method for StreamingQuery show(object)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/show.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"show — show","text":"object Spark object. Can SparkDataFrame, Column, GroupedData, WindowSpec.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/show.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"show — show","text":"show(Column) since 1.4.0 show(GroupedData) since 1.4.0 show(SparkDataFrame) since 1.4.0 show(WindowSpec) since 2.0.0 show(StreamingQuery) since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/show.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"show — show","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) show(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/showDF.html","id":null,"dir":"Reference","previous_headings":"","what":"showDF — showDF","title":"showDF — showDF","text":"Print first numRows rows SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/showDF.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"showDF — showDF","text":"","code":"showDF(x, ...) 
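# Illustrative sketch combining the session- and thread-level helpers documented above.
# The pool name "batch" assumes a fair-scheduler pool with that name has been configured.
sparkR.session()
setLogLevel("WARN")                                   # quieter console output
setJobDescription("nightly aggregation")              # label shown in the Spark UI
setJobGroup("nightly", "nightly batch jobs", TRUE)    # group subsequent jobs; allow interruption on cancel
setLocalProperty("spark.scheduler.pool", "batch")     # route this thread's jobs to that pool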
# S4 method for SparkDataFrame showDF(x, numRows = 20, truncate = TRUE, vertical = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/showDF.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"showDF — showDF","text":"x SparkDataFrame. ... arguments passed methods. numRows number rows print. Defaults 20. truncate whether truncate long strings. TRUE, strings 20 characters truncated. However, set greater zero, truncates strings longer truncate characters cells aligned right. vertical whether print output rows vertically (one line per column value).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/showDF.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"showDF — showDF","text":"showDF since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/showDF.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"showDF — showDF","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) showDF(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.addFile.html","id":null,"dir":"Reference","previous_headings":"","what":"Add a file or directory to be downloaded with this Spark job on every node. — spark.addFile","title":"Add a file or directory to be downloaded with this Spark job on every node. — spark.addFile","text":"path passed can either local file, file HDFS (Hadoop-supported filesystems), HTTP, HTTPS FTP URI. access file Spark jobs, use spark.getSparkFiles(fileName) find download location.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.addFile.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Add a file or directory to be downloaded with this Spark job on every node. — spark.addFile","text":"","code":"spark.addFile(path, recursive = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.addFile.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Add a file or directory to be downloaded with this Spark job on every node. — spark.addFile","text":"path path file added recursive Whether add files recursively path. Default FALSE.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.addFile.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Add a file or directory to be downloaded with this Spark job on every node. — spark.addFile","text":"directory can given recursive option set true. Currently directories supported Hadoop-supported filesystems. Refer Hadoop-supported filesystems https://cwiki.apache.org/confluence/display/HADOOP2/HCFS. Note: path can added . Subsequent additions path ignored.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.addFile.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Add a file or directory to be downloaded with this Spark job on every node. — spark.addFile","text":"spark.addFile since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.addFile.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Add a file or directory to be downloaded with this Spark job on every node. 
— spark.addFile","text":"","code":"if (FALSE) { spark.addFile(\"~/myfile\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.als.html","id":null,"dir":"Reference","previous_headings":"","what":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","title":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","text":"spark.als learns latent factors collaborative filtering via alternating least squares. Users can call summary obtain fitted latent factors, predict make predictions new data, write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.als.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","text":"","code":"spark.als(data, ...) # S4 method for SparkDataFrame spark.als( data, ratingCol = \"rating\", userCol = \"user\", itemCol = \"item\", rank = 10, regParam = 0.1, maxIter = 10, nonnegative = FALSE, implicitPrefs = FALSE, alpha = 1, numUserBlocks = 10, numItemBlocks = 10, checkpointInterval = 10, seed = 0 ) # S4 method for ALSModel summary(object) # S4 method for ALSModel predict(object, newData) # S4 method for ALSModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.als.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","text":"data SparkDataFrame training. ... additional argument(s) passed method. ratingCol column name ratings. userCol column name user ids. Ids must (can coerced ) integers. itemCol column name item ids. Ids must (can coerced ) integers. rank rank matrix factorization (> 0). regParam regularization parameter (>= 0). maxIter maximum number iterations (>= 0). nonnegative logical value indicating whether apply nonnegativity constraints. implicitPrefs logical value indicating whether use implicit preference. alpha alpha parameter implicit preference formulation (>= 0). numUserBlocks number user blocks used parallelize computation (> 0). numItemBlocks number item blocks used parallelize computation (> 0). checkpointInterval number checkpoint intervals (>= 1) disable checkpoint (-1). Note: setting ignored checkpoint directory set. seed integer seed random number generation. object fitted ALS model. newData SparkDataFrame testing. path directory model saved. overwrite logical value indicating whether overwrite output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.als.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","text":"spark.als returns fitted ALS model. summary returns summary information fitted model, list. list includes user (names user column), item (item column), rating (rating column), userFactors (estimated user factors), itemFactors (estimated item factors), rank (rank matrix factorization model). 
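Tying back to the spark.addFile entry above, a minimal sketch that distributes a file and then resolves its download location on a node; the file path is hypothetical:

spark.addFile("/tmp/lookup.csv")
spark.getSparkFiles("lookup.csv")   # absolute path where the file can be read inside Spark jobs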
predict returns SparkDataFrame containing predicted values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.als.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","text":"details, see MLlib: Collaborative Filtering.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.als.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","text":"spark.als since 2.1.0 input rating dataframe ALS implementation deterministic. Nondeterministic data can cause failure fitting ALS model. example, order-sensitive operation like sampling repartition makes dataframe output nondeterministic, like sample(repartition(df, 2L), FALSE, 0.5, 1618L). Checkpointing sampled dataframe adding sort sampling can help make dataframe deterministic. summary(ALSModel) since 2.1.0 predict(ALSModel) since 2.1.0 write.ml(ALSModel, character) since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.als.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Alternating Least Squares (ALS) for Collaborative Filtering — spark.als","text":"","code":"if (FALSE) { ratings <- list(list(0, 0, 4.0), list(0, 1, 2.0), list(1, 1, 3.0), list(1, 2, 4.0), list(2, 1, 1.0), list(2, 2, 5.0)) df <- createDataFrame(ratings, c(\"user\", \"item\", \"rating\")) model <- spark.als(df, \"rating\", \"user\", \"item\") # extract latent factors stats <- summary(model) userFactors <- stats$userFactors itemFactors <- stats$itemFactors # make predictions predicted <- predict(model, df) showDF(predicted) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) # set other arguments modelS <- spark.als(df, \"rating\", \"user\", \"item\", rank = 20, regParam = 0.1, nonnegative = TRUE) statsS <- summary(modelS) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.bisectingKmeans.html","id":null,"dir":"Reference","previous_headings":"","what":"Bisecting K-Means Clustering Model — spark.bisectingKmeans","title":"Bisecting K-Means Clustering Model — spark.bisectingKmeans","text":"Fits bisecting k-means clustering model SparkDataFrame. Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models. Get fitted result bisecting k-means model. Note: saved-loaded model support method.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.bisectingKmeans.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Bisecting K-Means Clustering Model — spark.bisectingKmeans","text":"","code":"spark.bisectingKmeans(data, formula, ...) 
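# Illustrative sketch for the determinism note in the spark.als entry above: materialize a sampled
# ratings SparkDataFrame (df, as built in the spark.als examples) before fitting, so repeated fits
# see identical rows. "/checkpoint" is a placeholder checkpoint directory.
setCheckpointDir("/checkpoint")
stableDF <- checkpoint(sample(repartition(df, 2L), FALSE, 0.5, 1618L))
model <- spark.als(stableDF, "rating", "user", "item")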
# S4 method for SparkDataFrame,formula spark.bisectingKmeans( data, formula, k = 4, maxIter = 20, seed = NULL, minDivisibleClusterSize = 1 ) # S4 method for BisectingKMeansModel summary(object) # S4 method for BisectingKMeansModel predict(object, newData) # S4 method for BisectingKMeansModel fitted(object, method = c(\"centers\", \"classes\")) # S4 method for BisectingKMeansModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.bisectingKmeans.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Bisecting K-Means Clustering Model — spark.bisectingKmeans","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-', '*', '^'. Note response variable formula empty spark.bisectingKmeans. ... additional argument(s) passed method. k desired number leaf clusters. Must > 1. actual number smaller divisible leaf clusters. maxIter maximum iteration number. seed random seed. minDivisibleClusterSize minimum number points (greater equal 1.0) minimum proportion points (less 1.0) divisible cluster. Note expert parameter. default value good enough cases. object fitted bisecting k-means model. newData SparkDataFrame testing. method type fitted results, \"centers\" cluster centers \"classes\" assigned classes. path directory model saved. overwrite overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.bisectingKmeans.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Bisecting K-Means Clustering Model — spark.bisectingKmeans","text":"spark.bisectingKmeans returns fitted bisecting k-means model. summary returns summary information fitted model, list. list includes model's k (number cluster centers), coefficients (model cluster centers), size (number data points cluster), cluster (cluster centers transformed data; cluster NULL .loaded TRUE), .loaded (whether model loaded saved file). predict returns predicted values based bisecting k-means model. 
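A short sketch of the two fitted() result types, reusing the Titanic-based model from the Examples below:

t <- as.data.frame(Titanic)
df <- createDataFrame(t)
model <- spark.bisectingKmeans(df, Class ~ Survived, k = 4)
head(fitted(model, "centers"))   # cluster centers for the training rows
head(fitted(model, "classes"))   # assigned cluster index per row instead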
fitted returns SparkDataFrame containing fitted values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.bisectingKmeans.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Bisecting K-Means Clustering Model — spark.bisectingKmeans","text":"spark.bisectingKmeans since 2.2.0 summary(BisectingKMeansModel) since 2.2.0 predict(BisectingKMeansModel) since 2.2.0 fitted since 2.2.0 write.ml(BisectingKMeansModel, character) since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.bisectingKmeans.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Bisecting K-Means Clustering Model — spark.bisectingKmeans","text":"","code":"if (FALSE) { sparkR.session() t <- as.data.frame(Titanic) df <- createDataFrame(t) model <- spark.bisectingKmeans(df, Class ~ Survived, k = 4) summary(model) # get fitted result from a bisecting k-means model fitted.model <- fitted(model, \"centers\") showDF(fitted.model) # fitted values on training data fitted <- predict(model, df) head(select(fitted, \"Class\", \"prediction\")) # save fitted model to input path path <- \"path/to/model\" write.ml(model, path) # can also read back the saved model and print savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.decisionTree.html","id":null,"dir":"Reference","previous_headings":"","what":"Decision Tree Model for Regression and Classification — spark.decisionTree","title":"Decision Tree Model for Regression and Classification — spark.decisionTree","text":"spark.decisionTree fits Decision Tree Regression model Classification model SparkDataFrame. Users can call summary get summary fitted Decision Tree model, predict make predictions new data, write.ml/read.ml save/load fitted models. details, see Decision Tree Regression Decision Tree Classification","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.decisionTree.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Decision Tree Model for Regression and Classification — spark.decisionTree","text":"","code":"spark.decisionTree(data, formula, ...) # S4 method for SparkDataFrame,formula spark.decisionTree( data, formula, type = c(\"regression\", \"classification\"), maxDepth = 5, maxBins = 32, impurity = NULL, seed = NULL, minInstancesPerNode = 1, minInfoGain = 0, checkpointInterval = 10, maxMemoryInMB = 256, cacheNodeIds = FALSE, handleInvalid = c(\"error\", \"keep\", \"skip\") ) # S4 method for DecisionTreeRegressionModel summary(object) # S3 method for summary.DecisionTreeRegressionModel print(x, ...) # S4 method for DecisionTreeClassificationModel summary(object) # S3 method for summary.DecisionTreeClassificationModel print(x, ...) # S4 method for DecisionTreeRegressionModel predict(object, newData) # S4 method for DecisionTreeClassificationModel predict(object, newData) # S4 method for DecisionTreeRegressionModel,character write.ml(object, path, overwrite = FALSE) # S4 method for DecisionTreeClassificationModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.decisionTree.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Decision Tree Model for Regression and Classification — spark.decisionTree","text":"data SparkDataFrame training. formula symbolic description model fitted. 
Currently formula operators supported, including '~', ':', '+', '-'. ... additional arguments passed method. type type model, one \"regression\" \"classification\", fit maxDepth Maximum depth tree (>= 0). maxBins Maximum number bins used discretizing continuous features choosing split features node. bins give higher granularity. Must >= 2 >= number categories categorical feature. impurity Criterion used information gain calculation. regression, must \"variance\". classification, must one \"entropy\" \"gini\", default \"gini\". seed integer seed random number generation. minInstancesPerNode Minimum number instances child must split. minInfoGain Minimum information gain split considered tree node. checkpointInterval Param set checkpoint interval (>= 1) disable checkpoint (-1). Note: setting ignored checkpoint directory set. maxMemoryInMB Maximum memory MiB allocated histogram aggregation. cacheNodeIds FALSE, algorithm pass trees executors match instances nodes. TRUE, algorithm cache node IDs instance. Caching can speed training deeper trees. Users can set often cache checkpointed disable setting checkpointInterval. handleInvalid handle invalid data (unseen labels NULL values) features label column string type classification model. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". object fitted Decision Tree regression model classification model. x summary object Decision Tree regression model classification model returned summary. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.decisionTree.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Decision Tree Model for Regression and Classification — spark.decisionTree","text":"spark.decisionTree returns fitted Decision Tree model. summary returns summary information fitted model, list. list components includes formula (formula), numFeatures (number features), features (list features), featureImportances (feature importances), maxDepth (max depth trees). 
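A brief sketch of reading those summary components back from a fitted regression tree, using the longley data that also appears in the Examples below:

df <- createDataFrame(longley)
model <- spark.decisionTree(df, Employed ~ ., type = "regression", maxDepth = 4)
s <- summary(model)
s$featureImportances   # per-feature importances
s$maxDepth             # maximum tree depth
head(predict(model, df))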
predict returns SparkDataFrame containing predicted labeled column named \"prediction\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.decisionTree.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Decision Tree Model for Regression and Classification — spark.decisionTree","text":"spark.decisionTree since 2.3.0 summary(DecisionTreeRegressionModel) since 2.3.0 print.summary.DecisionTreeRegressionModel since 2.3.0 summary(DecisionTreeClassificationModel) since 2.3.0 print.summary.DecisionTreeClassificationModel since 2.3.0 predict(DecisionTreeRegressionModel) since 2.3.0 predict(DecisionTreeClassificationModel) since 2.3.0 write.ml(DecisionTreeRegressionModel, character) since 2.3.0 write.ml(DecisionTreeClassificationModel, character) since 2.3.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.decisionTree.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Decision Tree Model for Regression and Classification — spark.decisionTree","text":"","code":"if (FALSE) { # fit a Decision Tree Regression Model df <- createDataFrame(longley) model <- spark.decisionTree(df, Employed ~ ., type = \"regression\", maxDepth = 5, maxBins = 16) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) # fit a Decision Tree Classification Model t <- as.data.frame(Titanic) df <- createDataFrame(t) model <- spark.decisionTree(df, Survived ~ Freq + Age, \"classification\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmClassifier.html","id":null,"dir":"Reference","previous_headings":"","what":"Factorization Machines Classification Model — spark.fmClassifier","title":"Factorization Machines Classification Model — spark.fmClassifier","text":"spark.fmClassifier fits factorization classification model SparkDataFrame. Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models. categorical data supported.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmClassifier.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Factorization Machines Classification Model — spark.fmClassifier","text":"","code":"spark.fmClassifier(data, formula, ...) # S4 method for SparkDataFrame,formula spark.fmClassifier( data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0, miniBatchFraction = 1, initStd = 0.01, maxIter = 100, stepSize = 1, tol = 1e-06, solver = c(\"adamW\", \"gd\"), thresholds = NULL, seed = NULL, handleInvalid = c(\"error\", \"keep\", \"skip\") ) # S4 method for FMClassificationModel summary(object) # S4 method for FMClassificationModel predict(object, newData) # S4 method for FMClassificationModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmClassifier.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Factorization Machines Classification Model — spark.fmClassifier","text":"data SparkDataFrame observations labels model fitting. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. ... additional arguments passed method. factorSize dimensionality factors. 
fitLinear whether fit linear term. # TODO Can express formula? regParam regularization parameter. miniBatchFraction mini-batch fraction parameter. initStd standard deviation initial coefficients. maxIter maximum iteration number. stepSize stepSize parameter. tol convergence tolerance iterations. solver solver parameter, supported options: \"gd\" (minibatch gradient descent) \"adamW\". thresholds binary classification, range [0, 1]. estimated probability class label 1 > threshold, predict 1, else 0. high threshold encourages model predict 0 often; low threshold encourages model predict 1 often. Note: Setting threshold p equivalent setting thresholds c(1-p, p). seed seed parameter weights initialization. handleInvalid handle invalid data (unseen labels NULL values) features label column string type. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". object FM Classification model fitted spark.fmClassifier. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmClassifier.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Factorization Machines Classification Model — spark.fmClassifier","text":"spark.fmClassifier returns fitted Factorization Machines Classification Model. summary returns summary information fitted model, list. predict returns predicted values based FM Classification model.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmClassifier.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Factorization Machines Classification Model — spark.fmClassifier","text":"spark.fmClassifier since 3.1.0 summary(FMClassificationModel) since 3.1.0 predict(FMClassificationModel) since 3.1.0 write.ml(FMClassificationModel, character) since 3.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmClassifier.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Factorization Machines Classification Model — spark.fmClassifier","text":"","code":"if (FALSE) { df <- read.df(\"data/mllib/sample_binary_classification_data.txt\", source = \"libsvm\") # fit Factorization Machines Classification Model model <- spark.fmClassifier( df, label ~ features, regParam = 0.01, maxIter = 10, fitLinear = TRUE ) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmRegressor.html","id":null,"dir":"Reference","previous_headings":"","what":"Factorization Machines Regression Model — spark.fmRegressor","title":"Factorization Machines Regression Model — spark.fmRegressor","text":"spark.fmRegressor fits factorization regression model SparkDataFrame. 
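An editor's sketch of the thresholds argument of spark.fmClassifier described above (not part of the official reference; the libsvm sample path is the one used in the example and is assumed to be available). Per the note above, a scalar threshold p corresponds to thresholds = c(1 - p, p):

df <- read.df("data/mllib/sample_binary_classification_data.txt", source = "libsvm")
# A high threshold for class 1 makes the model predict 0 more often ...
strict <- spark.fmClassifier(df, label ~ features, thresholds = c(0.1, 0.9))
# ... and a low threshold makes it predict 1 more often.
lenient <- spark.fmClassifier(df, label ~ features, thresholds = c(0.9, 0.1))
head(predict(strict, df))
head(predict(lenient, df))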
Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmRegressor.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Factorization Machines Regression Model — spark.fmRegressor","text":"","code":"spark.fmRegressor(data, formula, ...) # S4 method for SparkDataFrame,formula spark.fmRegressor( data, formula, factorSize = 8, fitLinear = TRUE, regParam = 0, miniBatchFraction = 1, initStd = 0.01, maxIter = 100, stepSize = 1, tol = 1e-06, solver = c(\"adamW\", \"gd\"), seed = NULL, stringIndexerOrderType = c(\"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\") ) # S4 method for FMRegressionModel summary(object) # S4 method for FMRegressionModel predict(object, newData) # S4 method for FMRegressionModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmRegressor.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Factorization Machines Regression Model — spark.fmRegressor","text":"data SparkDataFrame observations labels model fitting. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. ... additional arguments passed method. factorSize dimensionality factors. fitLinear whether fit linear term. # TODO Can express formula? regParam regularization parameter. miniBatchFraction mini-batch fraction parameter. initStd standard deviation initial coefficients. maxIter maximum iteration number. stepSize stepSize parameter. tol convergence tolerance iterations. solver solver parameter, supported options: \"gd\" (minibatch gradient descent) \"adamW\". seed seed parameter weights initialization. stringIndexerOrderType order categories string feature column. used decide base level string feature last category ordering dropped encoding strings. Supported options \"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\". default value \"frequencyDesc\". ordering set \"alphabetDesc\", drops category R encoding strings. object FM Regression Model model fitted spark.fmRegressor. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmRegressor.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Factorization Machines Regression Model — spark.fmRegressor","text":"spark.fmRegressor returns fitted Factorization Machines Regression Model. summary returns summary information fitted model, list. 
predict returns predicted values based FMRegressionModel.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmRegressor.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Factorization Machines Regression Model — spark.fmRegressor","text":"spark.fmRegressor since 3.1.0 summary(FMRegressionModel) since 3.1.0 predict(FMRegressionModel) since 3.1.0 write.ml(FMRegressionModel, character) since 3.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fmRegressor.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Factorization Machines Regression Model — spark.fmRegressor","text":"","code":"if (FALSE) { df <- read.df(\"data/mllib/sample_linear_regression_data.txt\", source = \"libsvm\") # fit Factorization Machines Regression Model model <- spark.fmRegressor( df, label ~ features, regParam = 0.01, maxIter = 10, fitLinear = TRUE ) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fpGrowth.html","id":null,"dir":"Reference","previous_headings":"","what":"FP-growth — spark.fpGrowth","title":"FP-growth — spark.fpGrowth","text":"parallel FP-growth algorithm mine frequent itemsets. spark.fpGrowth fits FP-growth model SparkDataFrame. Users can spark.freqItemsets get frequent itemsets, spark.associationRules get association rules, predict make predictions new data based generated association rules, write.ml/read.ml save/load fitted models. details, see FP-growth.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fpGrowth.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"FP-growth — spark.fpGrowth","text":"","code":"spark.fpGrowth(data, ...) spark.freqItemsets(object) spark.associationRules(object) # S4 method for SparkDataFrame spark.fpGrowth( data, minSupport = 0.3, minConfidence = 0.8, itemsCol = \"items\", numPartitions = NULL ) # S4 method for FPGrowthModel spark.freqItemsets(object) # S4 method for FPGrowthModel spark.associationRules(object) # S4 method for FPGrowthModel predict(object, newData) # S4 method for FPGrowthModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fpGrowth.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"FP-growth — spark.fpGrowth","text":"data SparkDataFrame training. ... additional argument(s) passed method. object fitted FPGrowth model. minSupport Minimal support level. minConfidence Minimal confidence level. itemsCol Features column name. numPartitions Number partitions used fitting. newData SparkDataFrame testing. path directory model saved. overwrite logical value indicating whether overwrite output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fpGrowth.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"FP-growth — spark.fpGrowth","text":"spark.fpGrowth returns fitted FPGrowth model. SparkDataFrame frequent itemsets. SparkDataFrame contains two columns: items (array type input column) freq (frequency itemset). SparkDataFrame association rules. 
SparkDataFrame contains five columns: antecedent (array type input column), consequent (array type input column), confidence (confidence rule) lift (lift rule) support (support rule) predict returns SparkDataFrame containing predicted values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fpGrowth.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"FP-growth — spark.fpGrowth","text":"spark.fpGrowth since 2.2.0 spark.freqItemsets(FPGrowthModel) since 2.2.0 spark.associationRules(FPGrowthModel) since 2.2.0 predict(FPGrowthModel) since 2.2.0 write.ml(FPGrowthModel, character) since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.fpGrowth.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"FP-growth — spark.fpGrowth","text":"","code":"if (FALSE) { raw_data <- read.df( \"data/mllib/sample_fpgrowth.txt\", source = \"csv\", schema = structType(structField(\"raw_items\", \"string\"))) data <- selectExpr(raw_data, \"split(raw_items, ' ') as items\") model <- spark.fpGrowth(data) # Show frequent itemsets frequent_itemsets <- spark.freqItemsets(model) showDF(frequent_itemsets) # Show association rules association_rules <- spark.associationRules(model) showDF(association_rules) # Predict on new data new_itemsets <- data.frame(items = c(\"t\", \"t,s\")) new_data <- selectExpr(createDataFrame(new_itemsets), \"split(items, ',') as items\") predict(model, new_data) # Save and load model path <- \"/path/to/model\" write.ml(model, path) read.ml(path) # Optional arguments baskets_data <- selectExpr(createDataFrame(itemsets), \"split(items, ',') as baskets\") another_model <- spark.fpGrowth(data, minSupport = 0.1, minConfidence = 0.5, itemsCol = \"baskets\", numPartitions = 10) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gaussianMixture.html","id":null,"dir":"Reference","previous_headings":"","what":"Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture","title":"Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture","text":"Fits multivariate gaussian mixture model SparkDataFrame, similarly R's mvnormalmixEM(). Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gaussianMixture.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture","text":"","code":"spark.gaussianMixture(data, formula, ...) # S4 method for SparkDataFrame,formula spark.gaussianMixture(data, formula, k = 2, maxIter = 100, tol = 0.01) # S4 method for GaussianMixtureModel summary(object) # S4 method for GaussianMixtureModel predict(object, newData) # S4 method for GaussianMixtureModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gaussianMixture.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. Note response variable formula empty spark.gaussianMixture. ... additional arguments passed method. k number independent Gaussians mixture model. 
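An editor's sketch (not part of the official reference) of reading the association-rule columns named above (antecedent, consequent, confidence, lift, support); it assumes the `data` SparkDataFrame of item arrays built in the spark.fpGrowth example:

# `data` is the SparkDataFrame of item arrays built in the spark.fpGrowth example.
model <- spark.fpGrowth(data, minSupport = 0.2, minConfidence = 0.5)
rules <- spark.associationRules(model)
# One row per rule: baskets containing `antecedent` tend to also contain
# `consequent`; confidence, lift and support quantify how strongly.
head(select(rules, "antecedent", "consequent", "confidence", "lift", "support"))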
maxIter maximum iteration number. tol convergence tolerance. object fitted gaussian mixture model. newData SparkDataFrame testing. path directory model saved. overwrite overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gaussianMixture.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture","text":"spark.gaussianMixture returns fitted multivariate gaussian mixture model. summary returns summary fitted model, list. list includes model's lambda (lambda), mu (mu), sigma (sigma), loglik (loglik), posterior (posterior). predict returns SparkDataFrame containing predicted labels column named \"prediction\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gaussianMixture.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture","text":"spark.gaussianMixture since 2.1.0 summary(GaussianMixtureModel) since 2.1.0 predict(GaussianMixtureModel) since 2.1.0 write.ml(GaussianMixtureModel, character) since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gaussianMixture.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Multivariate Gaussian Mixture Model (GMM) — spark.gaussianMixture","text":"","code":"if (FALSE) { sparkR.session() library(mvtnorm) set.seed(100) a <- rmvnorm(4, c(0, 0)) b <- rmvnorm(6, c(3, 4)) data <- rbind(a, b) df <- createDataFrame(as.data.frame(data)) model <- spark.gaussianMixture(df, ~ V1 + V2, k = 2) summary(model) # fitted values on training data fitted <- predict(model, df) head(select(fitted, \"V1\", \"prediction\")) # save fitted model to input path path <- \"path/to/model\" write.ml(model, path) # can also read back the saved model and print savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gbt.html","id":null,"dir":"Reference","previous_headings":"","what":"Gradient Boosted Tree Model for Regression and Classification — spark.gbt","title":"Gradient Boosted Tree Model for Regression and Classification — spark.gbt","text":"spark.gbt fits Gradient Boosted Tree Regression model Classification model SparkDataFrame. Users can call summary get summary fitted Gradient Boosted Tree model, predict make predictions new data, write.ml/read.ml save/load fitted models. details, see GBT Regression GBT Classification","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gbt.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Gradient Boosted Tree Model for Regression and Classification — spark.gbt","text":"","code":"spark.gbt(data, formula, ...) # S4 method for SparkDataFrame,formula spark.gbt( data, formula, type = c(\"regression\", \"classification\"), maxDepth = 5, maxBins = 32, maxIter = 20, stepSize = 0.1, lossType = NULL, seed = NULL, subsamplingRate = 1, minInstancesPerNode = 1, minInfoGain = 0, checkpointInterval = 10, maxMemoryInMB = 256, cacheNodeIds = FALSE, handleInvalid = c(\"error\", \"keep\", \"skip\") ) # S4 method for GBTRegressionModel summary(object) # S3 method for summary.GBTRegressionModel print(x, ...) # S4 method for GBTClassificationModel summary(object) # S3 method for summary.GBTClassificationModel print(x, ...) 
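# Editor's aside on spark.gaussianMixture above; a sketch, not official reference
# code, and written as R code because it sits inside the spark.gbt usage listing.
# Reusing the simulated `df` from the Gaussian mixture example, the summary list
# exposes the mixture parameters by name:
gmm <- spark.gaussianMixture(df, ~ V1 + V2, k = 2)
s <- summary(gmm)
s$lambda           # mixing weights of the two components
s$mu               # component means
s$sigma            # component covariance matrices
s$loglik           # log-likelihood of the fitted mixture
head(s$posterior)  # per-row membership probabilities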
# S4 method for GBTRegressionModel predict(object, newData) # S4 method for GBTClassificationModel predict(object, newData) # S4 method for GBTRegressionModel,character write.ml(object, path, overwrite = FALSE) # S4 method for GBTClassificationModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gbt.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Gradient Boosted Tree Model for Regression and Classification — spark.gbt","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', ':', '+', '-', '*', '^'. ... additional arguments passed method. type type model, one \"regression\" \"classification\", fit maxDepth Maximum depth tree (>= 0). maxBins Maximum number bins used discretizing continuous features choosing split features node. bins give higher granularity. Must >= 2 >= number categories categorical feature. maxIter Param maximum number iterations (>= 0). stepSize Param Step size used iteration optimization. lossType Loss function GBT tries minimize. classification, must \"logistic\". regression, must one \"squared\" (L2) \"absolute\" (L1), default \"squared\". seed integer seed random number generation. subsamplingRate Fraction training data used learning decision tree, range (0, 1]. minInstancesPerNode Minimum number instances child must split. split causes left right child fewer minInstancesPerNode, split discarded invalid. >= 1. minInfoGain Minimum information gain split considered tree node. checkpointInterval Param set checkpoint interval (>= 1) disable checkpoint (-1). Note: setting ignored checkpoint directory set. maxMemoryInMB Maximum memory MiB allocated histogram aggregation. cacheNodeIds FALSE, algorithm pass trees executors match instances nodes. TRUE, algorithm cache node IDs instance. Caching can speed training deeper trees. Users can set often cache checkpointed disable setting checkpointInterval. handleInvalid handle invalid data (unseen labels NULL values) features label column string type classification model. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". object fitted Gradient Boosted Tree regression model classification model. x summary object Gradient Boosted Tree regression model classification model returned summary. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gbt.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Gradient Boosted Tree Model for Regression and Classification — spark.gbt","text":"spark.gbt returns fitted Gradient Boosted Tree model. summary returns summary information fitted model, list. list components includes formula (formula), numFeatures (number features), features (list features), featureImportances (feature importances), maxDepth (max depth trees), numTrees (number trees), treeWeights (tree weights). 
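An editor's sketch (not part of the official reference; assumes a running SparkSession) of reading those spark.gbt summary components from a fitted model:

df <- createDataFrame(longley)
model <- spark.gbt(df, Employed ~ ., type = "regression", maxDepth = 3, maxIter = 10)
s <- summary(model)
s$numTrees            # number of trees in the ensemble (here, one per boosting iteration)
s$treeWeights         # weight applied to each tree
s$featureImportances  # relative importance of each feature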
predict returns SparkDataFrame containing predicted labeled column named \"prediction\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gbt.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Gradient Boosted Tree Model for Regression and Classification — spark.gbt","text":"spark.gbt since 2.1.0 summary(GBTRegressionModel) since 2.1.0 print.summary.GBTRegressionModel since 2.1.0 summary(GBTClassificationModel) since 2.1.0 print.summary.GBTClassificationModel since 2.1.0 predict(GBTRegressionModel) since 2.1.0 predict(GBTClassificationModel) since 2.1.0 write.ml(GBTRegressionModel, character) since 2.1.0 write.ml(GBTClassificationModel, character) since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.gbt.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Gradient Boosted Tree Model for Regression and Classification — spark.gbt","text":"","code":"if (FALSE) { # fit a Gradient Boosted Tree Regression Model df <- createDataFrame(longley) model <- spark.gbt(df, Employed ~ ., type = \"regression\", maxDepth = 5, maxBins = 16) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) # fit a Gradient Boosted Tree Classification Model # label must be binary - Only binary classification is supported for GBT. t <- as.data.frame(Titanic) df <- createDataFrame(t) model <- spark.gbt(df, Survived ~ Age + Freq, \"classification\") # numeric label is also supported t2 <- as.data.frame(Titanic) t2$NumericGender <- ifelse(t2$Sex == \"Male\", 0, 1) df <- createDataFrame(t2) model <- spark.gbt(df, NumericGender ~ ., type = \"classification\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFiles.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the absolute path of a file added through spark.addFile. — spark.getSparkFiles","title":"Get the absolute path of a file added through spark.addFile. — spark.getSparkFiles","text":"Get absolute path file added spark.addFile.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFiles.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the absolute path of a file added through spark.addFile. — spark.getSparkFiles","text":"","code":"spark.getSparkFiles(fileName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFiles.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the absolute path of a file added through spark.addFile. — spark.getSparkFiles","text":"fileName name file added spark.addFile","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFiles.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the absolute path of a file added through spark.addFile. — spark.getSparkFiles","text":"absolute path file added spark.addFile.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFiles.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get the absolute path of a file added through spark.addFile. 
— spark.getSparkFiles","text":"spark.getSparkFiles since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFiles.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the absolute path of a file added through spark.addFile. — spark.getSparkFiles","text":"","code":"if (FALSE) { spark.getSparkFiles(\"myfile\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFilesRootDirectory.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the root directory that contains files added through spark.addFile. — spark.getSparkFilesRootDirectory","title":"Get the root directory that contains files added through spark.addFile. — spark.getSparkFilesRootDirectory","text":"Get root directory contains files added spark.addFile.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFilesRootDirectory.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the root directory that contains files added through spark.addFile. — spark.getSparkFilesRootDirectory","text":"","code":"spark.getSparkFilesRootDirectory()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFilesRootDirectory.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the root directory that contains files added through spark.addFile. — spark.getSparkFilesRootDirectory","text":"root directory contains files added spark.addFile","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFilesRootDirectory.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get the root directory that contains files added through spark.addFile. — spark.getSparkFilesRootDirectory","text":"spark.getSparkFilesRootDirectory since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.getSparkFilesRootDirectory.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the root directory that contains files added through spark.addFile. — spark.getSparkFilesRootDirectory","text":"","code":"if (FALSE) { spark.getSparkFilesRootDirectory() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.glm.html","id":null,"dir":"Reference","previous_headings":"","what":"Generalized Linear Models — spark.glm","title":"Generalized Linear Models — spark.glm","text":"Fits generalized linear model SparkDataFrame. Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.glm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Generalized Linear Models — spark.glm","text":"","code":"spark.glm(data, formula, ...) # S4 method for SparkDataFrame,formula spark.glm( data, formula, family = gaussian, tol = 1e-06, maxIter = 25, weightCol = NULL, regParam = 0, var.power = 0, link.power = 1 - var.power, stringIndexerOrderType = c(\"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\"), offsetCol = NULL ) # S4 method for GeneralizedLinearRegressionModel summary(object) # S3 method for summary.GeneralizedLinearRegressionModel print(x, ...) 
# S4 method for GeneralizedLinearRegressionModel predict(object, newData) # S4 method for GeneralizedLinearRegressionModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.glm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Generalized Linear Models — spark.glm","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-', '*', '^'. ... additional arguments passed method. family description error distribution link function used model. can character string naming family function, family function result call family function. Refer R family https://stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html. Currently families supported: binomial, gaussian, Gamma, poisson tweedie. Note two ways specify tweedie family. Set family = \"tweedie\" specify var.power link.power; package statmod loaded, tweedie family specified using family definition therein, .e., tweedie(var.power, link.power). tol positive convergence tolerance iterations. maxIter integer giving maximal number IRLS iterations. weightCol weight column name. set NULL, treat instance weights 1.0. regParam regularization parameter L2 regularization. var.power power variance function Tweedie distribution provides relationship variance mean distribution. applicable Tweedie family. link.power index power link function. applicable Tweedie family. stringIndexerOrderType order categories string feature column. used decide base level string feature last category ordering dropped encoding strings. Supported options \"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\". default value \"frequencyDesc\". ordering set \"alphabetDesc\", drops category R encoding strings. offsetCol offset column name. set empty, treat instance offsets 0.0. feature specified offset constant coefficient 1.0. object fitted generalized linear model. x summary object fitted generalized linear model returned summary function. newData SparkDataFrame testing. path directory model saved. overwrite overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.glm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Generalized Linear Models — spark.glm","text":"spark.glm returns fitted generalized linear model. summary returns summary information fitted model, list. list components includes least coefficients (coefficients matrix, includes coefficients, standard error coefficients, t value p value), null.deviance (null/residual degrees freedom), aic (AIC) iter (number iterations IRLS takes). collinear columns data, coefficients matrix provides coefficients. 
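An editor's sketch (not part of the official reference; assumes a running SparkSession) of pulling the named components out of a spark.glm summary, using the same Titanic data as the example:

t <- as.data.frame(Titanic)
df <- createDataFrame(t)
model <- spark.glm(df, Freq ~ Sex + Age, family = "gaussian")
s <- summary(model)
s$coefficients   # matrix of estimates, standard errors, t values and p values
s$aic            # Akaike information criterion
s$iter           # number of IRLS iterations used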
predict returns SparkDataFrame containing predicted labels column named \"prediction\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.glm.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Generalized Linear Models — spark.glm","text":"spark.glm since 2.0.0 summary(GeneralizedLinearRegressionModel) since 2.0.0 print.summary.GeneralizedLinearRegressionModel since 2.0.0 predict(GeneralizedLinearRegressionModel) since 1.5.0 write.ml(GeneralizedLinearRegressionModel, character) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.glm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Generalized Linear Models — spark.glm","text":"","code":"if (FALSE) { sparkR.session() t <- as.data.frame(Titanic, stringsAsFactors = FALSE) df <- createDataFrame(t) model <- spark.glm(df, Freq ~ Sex + Age, family = \"gaussian\") summary(model) # fitted values on training data fitted <- predict(model, df) head(select(fitted, \"Freq\", \"prediction\")) # save fitted model to input path path <- \"path/to/model\" write.ml(model, path) # can also read back the saved model and print savedModel <- read.ml(path) summary(savedModel) # note that the default string encoding is different from R's glm model2 <- glm(Freq ~ Sex + Age, family = \"gaussian\", data = t) summary(model2) # use stringIndexerOrderType = \"alphabetDesc\" to force string encoding # to be consistent with R model3 <- spark.glm(df, Freq ~ Sex + Age, family = \"gaussian\", stringIndexerOrderType = \"alphabetDesc\") summary(model3) # fit tweedie model model <- spark.glm(df, Freq ~ Sex + Age, family = \"tweedie\", var.power = 1.2, link.power = 0) summary(model) # use the tweedie family from statmod library(statmod) model <- spark.glm(df, Freq ~ Sex + Age, family = tweedie(1.2, 0)) summary(model) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.isoreg.html","id":null,"dir":"Reference","previous_headings":"","what":"Isotonic Regression Model — spark.isoreg","title":"Isotonic Regression Model — spark.isoreg","text":"Fits Isotonic Regression model SparkDataFrame, similarly R's isoreg(). Users can print, make predictions produced model save model input path.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.isoreg.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Isotonic Regression Model — spark.isoreg","text":"","code":"spark.isoreg(data, formula, ...) # S4 method for SparkDataFrame,formula spark.isoreg( data, formula, isotonic = TRUE, featureIndex = 0, weightCol = NULL ) # S4 method for IsotonicRegressionModel summary(object) # S4 method for IsotonicRegressionModel predict(object, newData) # S4 method for IsotonicRegressionModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.isoreg.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Isotonic Regression Model — spark.isoreg","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. ... additional arguments passed method. isotonic Whether output sequence isotonic/increasing (TRUE) antitonic/decreasing (FALSE). featureIndex index feature featuresCol vector column (default: 0), effect otherwise. weightCol weight column name. object fitted IsotonicRegressionModel. 
newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.isoreg.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Isotonic Regression Model — spark.isoreg","text":"spark.isoreg returns fitted Isotonic Regression model. summary returns summary information fitted model, list. list includes model's boundaries (boundaries increasing order) predictions (predictions associated boundaries index). predict returns SparkDataFrame containing predicted values.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.isoreg.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Isotonic Regression Model — spark.isoreg","text":"spark.isoreg since 2.1.0 summary(IsotonicRegressionModel) since 2.1.0 predict(IsotonicRegressionModel) since 2.1.0 write.ml(IsotonicRegression, character) since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.isoreg.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Isotonic Regression Model — spark.isoreg","text":"","code":"if (FALSE) { sparkR.session() data <- list(list(7.0, 0.0), list(5.0, 1.0), list(3.0, 2.0), list(5.0, 3.0), list(1.0, 4.0)) df <- createDataFrame(data, c(\"label\", \"feature\")) model <- spark.isoreg(df, label ~ feature, isotonic = FALSE) # return model boundaries and prediction as lists result <- summary(model, df) # prediction based on fitted model predict_data <- list(list(-2.0), list(-1.0), list(0.5), list(0.75), list(1.0), list(2.0), list(9.0)) predict_df <- createDataFrame(predict_data, c(\"feature\")) # get prediction column predict_result <- collect(select(predict(model, predict_df), \"prediction\")) # save fitted model to input path path <- \"path/to/model\" write.ml(model, path) # can also read back the saved model and print savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kmeans.html","id":null,"dir":"Reference","previous_headings":"","what":"K-Means Clustering Model — spark.kmeans","title":"K-Means Clustering Model — spark.kmeans","text":"Fits k-means clustering model SparkDataFrame, similarly R's kmeans(). Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kmeans.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"K-Means Clustering Model — spark.kmeans","text":"","code":"spark.kmeans(data, formula, ...) # S4 method for SparkDataFrame,formula spark.kmeans( data, formula, k = 2, maxIter = 20, initMode = c(\"k-means||\", \"random\"), seed = NULL, initSteps = 2, tol = 1e-04 ) # S4 method for KMeansModel summary(object) # S4 method for KMeansModel predict(object, newData) # S4 method for KMeansModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kmeans.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"K-Means Clustering Model — spark.kmeans","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. 
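An editor's sketch (not part of the official reference) relating the boundaries and predictions returned by summary for spark.isoreg; it assumes the (label, feature) SparkDataFrame `df` built in the isotonic regression example above:

# `df` is the (label, feature) SparkDataFrame from the spark.isoreg example above.
model <- spark.isoreg(df, label ~ feature, isotonic = TRUE)
s <- summary(model)
# Paired vectors: prediction[i] is the fitted value at boundary[i]; between
# boundaries the model interpolates linearly.
data.frame(boundary = s$boundaries, prediction = s$predictions)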
Note response variable formula empty spark.kmeans. ... additional argument(s) passed method. k number centers. maxIter maximum iteration number. initMode initialization algorithm chosen fit model. seed random seed cluster initialization. initSteps number steps k-means|| initialization mode. advanced setting, default 2 almost always enough. Must > 0. tol convergence tolerance iterations. object fitted k-means model. newData SparkDataFrame testing. path directory model saved. overwrite overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kmeans.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"K-Means Clustering Model — spark.kmeans","text":"spark.kmeans returns fitted k-means model. summary returns summary information fitted model, list. list includes model's k (configured number cluster centers), coefficients (model cluster centers), size (number data points cluster), cluster (cluster centers transformed data), .loaded (whether model loaded saved file), clusterSize (actual number cluster centers. using initMode = \"random\", clusterSize may equal k). predict returns predicted values based k-means model.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kmeans.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"K-Means Clustering Model — spark.kmeans","text":"spark.kmeans since 2.0.0 summary(KMeansModel) since 2.0.0 predict(KMeansModel) since 2.0.0 write.ml(KMeansModel, character) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kmeans.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"K-Means Clustering Model — spark.kmeans","text":"","code":"if (FALSE) { sparkR.session() t <- as.data.frame(Titanic) df <- createDataFrame(t) model <- spark.kmeans(df, Class ~ Survived, k = 4, initMode = \"random\") summary(model) # fitted values on training data fitted <- predict(model, df) head(select(fitted, \"Class\", \"prediction\")) # save fitted model to input path path <- \"path/to/model\" write.ml(model, path) # can also read back the saved model and print savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kstest.html","id":null,"dir":"Reference","previous_headings":"","what":"(One-Sample) Kolmogorov-Smirnov Test — spark.kstest","title":"(One-Sample) Kolmogorov-Smirnov Test — spark.kstest","text":"spark.kstest Conduct two-sided Kolmogorov-Smirnov (KS) test data sampled continuous distribution. comparing largest difference empirical cumulative distribution sample data theoretical distribution can provide test null hypothesis sample data comes theoretical distribution. Users can call summary obtain summary test, print.summary.KSTest print summary result.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kstest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(One-Sample) Kolmogorov-Smirnov Test — spark.kstest","text":"","code":"spark.kstest(data, ...) 
# S4 method for SparkDataFrame spark.kstest( data, testCol = \"test\", nullHypothesis = c(\"norm\"), distParams = c(0, 1) ) # S4 method for KSTest summary(object) # S3 method for summary.KSTest print(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kstest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(One-Sample) Kolmogorov-Smirnov Test — spark.kstest","text":"data SparkDataFrame user data. ... additional argument(s) passed method. testCol column name test data . column double type. nullHypothesis name theoretical distribution tested . Currently \"norm\" normal distribution supported. distParams parameters(s) distribution. nullHypothesis = \"norm\", can provide vector mean standard deviation distribution. none provided, standard normal used. one provided, standard deviation set one. object test result object KSTest spark.kstest. x summary object KSTest returned summary.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kstest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"(One-Sample) Kolmogorov-Smirnov Test — spark.kstest","text":"spark.kstest returns test result object. summary returns summary information KSTest object, list. list includes p.value (p-value), statistic (test statistic computed test), nullHypothesis (null hypothesis parameters tested ) degreesOfFreedom (degrees freedom test).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kstest.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"(One-Sample) Kolmogorov-Smirnov Test — spark.kstest","text":"spark.kstest since 2.1.0 summary(KSTest) since 2.1.0 print.summary.KSTest since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.kstest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"(One-Sample) Kolmogorov-Smirnov Test — spark.kstest","text":"","code":"if (FALSE) { data <- data.frame(test = c(0.1, 0.15, 0.2, 0.3, 0.25)) df <- createDataFrame(data) test <- spark.kstest(df, \"test\", \"norm\", c(0, 1)) # get a summary of the test result testSummary <- summary(test) testSummary # print out the summary in an organized way print.summary.KSTest(testSummary) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lapply.html","id":null,"dir":"Reference","previous_headings":"","what":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","title":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","text":"Run function list elements, distributing computations Spark. Applies function manner similar doParallel lapply elements list. computations distributed using Spark. 
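An editor's sketch (not part of the official reference; assumes a running SparkSession) of how the distParams argument of spark.kstest selects the theoretical normal distribution being tested against:

df <- createDataFrame(data.frame(test = rnorm(100, mean = 3, sd = 2)))
# Test against N(0, 1): the null hypothesis will typically be rejected.
summary(spark.kstest(df, "test", "norm", c(0, 1)))
# Test against the matching N(3, 2): a large p-value is expected.
summary(spark.kstest(df, "test", "norm", c(3, 2)))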
conceptually following code: lapply(list, func)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lapply.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","text":"","code":"spark.lapply(list, func)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lapply.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","text":"list list elements func function takes one argument.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lapply.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","text":"list results (exact type determined function)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lapply.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","text":"Known limitations: variable scoping capture: compared R's rich support variable resolutions, distributed nature SparkR limits variables resolved runtime. variables available lexical scoping embedded closure function available read-variables within function. environment variables stored temporary variables outside function, directly accessed within function. loading external packages: order use package, need load inside closure. example, rely MASS module, use :","code":"train <- function(hyperparam) { library(MASS) lm.ridge(\"y ~ x+z\", data, lambda=hyperparam) model }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lapply.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","text":"spark.lapply since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lapply.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Run a function over a list of elements, distributing the computations with Spark — spark.lapply","text":"","code":"if (FALSE) { sparkR.session() doubled <- spark.lapply(1:10, function(x) {2 * x}) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lda.html","id":null,"dir":"Reference","previous_headings":"","what":"Latent Dirichlet Allocation — spark.lda","title":"Latent Dirichlet Allocation — spark.lda","text":"spark.lda fits Latent Dirichlet Allocation model SparkDataFrame. Users can call summary get summary fitted LDA model, spark.posterior compute posterior probabilities new data, spark.perplexity compute log perplexity new data write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lda.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Latent Dirichlet Allocation — spark.lda","text":"","code":"spark.lda(data, ...) 
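# Editor's aside on spark.lapply above; a sketch, not official reference code, and
# written as R code because it sits inside the spark.lda usage listing. A common use
# is distributing a small hyperparameter sweep, loading packages inside the closure:
costs <- exp(seq(from = log(1), to = log(1000), length.out = 5))
train <- function(lambda) {
  library(MASS)                      # dependencies must be loaded inside the closure
  MASS::lm.ridge(mpg ~ wt + cyl, data = mtcars, lambda = lambda)
}
models <- spark.lapply(costs, train) # a local list with one fitted model per lambda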
spark.posterior(object, newData) spark.perplexity(object, data) # S4 method for SparkDataFrame spark.lda( data, features = \"features\", k = 10, maxIter = 20, optimizer = c(\"online\", \"em\"), subsamplingRate = 0.05, topicConcentration = -1, docConcentration = -1, customizedStopWords = \"\", maxVocabSize = bitwShiftL(1, 18) ) # S4 method for LDAModel summary(object, maxTermsPerTopic) # S4 method for LDAModel,SparkDataFrame spark.perplexity(object, data) # S4 method for LDAModel,SparkDataFrame spark.posterior(object, newData) # S4 method for LDAModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lda.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Latent Dirichlet Allocation — spark.lda","text":"data SparkDataFrame training. ... additional argument(s) passed method. object Latent Dirichlet Allocation model fitted spark.lda. newData SparkDataFrame testing. features Features column name. Either libSVM-format column character-format column valid. k Number topics. maxIter Maximum iterations. optimizer Optimizer train LDA model, \"online\" \"em\", default \"online\". subsamplingRate (online optimizer) Fraction corpus sampled used iteration mini-batch gradient descent, range (0, 1]. topicConcentration concentration parameter (commonly named beta eta) prior placed topic distributions terms, default -1 set automatically Spark side. Use summary retrieve effective topicConcentration. 1-size numeric accepted. docConcentration concentration parameter (commonly named alpha) prior placed documents distributions topics (theta), default -1 set automatically Spark side. Use summary retrieve effective docConcentration. 1-size k-size numeric accepted. customizedStopWords stopwords need removed given corpus. Ignore parameter libSVM-format column used features column. maxVocabSize maximum vocabulary size, default 1 << 18 maxTermsPerTopic Maximum number terms collect topic. Default value 10. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lda.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Latent Dirichlet Allocation — spark.lda","text":"spark.lda returns fitted Latent Dirichlet Allocation model. summary returns summary information fitted model, list. list includes docConcentration concentration parameter commonly named alpha prior placed documents distributions topics theta topicConcentration concentration parameter commonly named beta eta prior placed topic distributions terms logLikelihood log likelihood entire corpus logPerplexity log perplexity isDistributed TRUE distributed model FALSE local model vocabSize number terms corpus topics top 10 terms weights topics vocabulary whole terms training corpus, NULL libsvm format file used training set trainingLogLikelihood Log likelihood observed tokens training set, given current parameter estimates: log P(docs | topics, topic distributions docs, Dirichlet hyperparameters) distributed LDA model (.e., optimizer = \"em\") logPrior Log probability current parameter estimate: log P(topics, topic distributions docs | Dirichlet hyperparameters) distributed LDA model (.e., optimizer = \"em\") spark.perplexity returns log perplexity given SparkDataFrame, log perplexity training data missing argument \"data\". 
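An editor's sketch (not part of the official reference) of inspecting fitted topics through the maxTermsPerTopic argument of summary for spark.lda; the libsvm corpus path is the one used in the example and is assumed to be available:

text <- read.df("data/mllib/sample_lda_libsvm_data.txt", source = "libsvm")
model <- spark.lda(data = text, k = 5, maxIter = 20)
s <- summary(model, maxTermsPerTopic = 5)
s$topics          # top 5 terms (indices for a libsvm corpus) and weights per topic
s$logPerplexity   # log perplexity on the training corpus
s$vocabSize       # number of terms in the corpus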
spark.posterior returns SparkDataFrame containing posterior probabilities vectors named \"topicDistribution\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lda.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Latent Dirichlet Allocation — spark.lda","text":"spark.lda since 2.1.0 summary(LDAModel) since 2.1.0 spark.perplexity(LDAModel) since 2.1.0 spark.posterior(LDAModel) since 2.1.0 write.ml(LDAModel, character) since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lda.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Latent Dirichlet Allocation — spark.lda","text":"","code":"if (FALSE) { text <- read.df(\"data/mllib/sample_lda_libsvm_data.txt\", source = \"libsvm\") model <- spark.lda(data = text, optimizer = \"em\") # get a summary of the model summary(model) # compute posterior probabilities posterior <- spark.posterior(model, text) showDF(posterior) # compute perplexity perplexity <- spark.perplexity(model, text) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lm.html","id":null,"dir":"Reference","previous_headings":"","what":"Linear Regression Model — spark.lm","title":"Linear Regression Model — spark.lm","text":"spark.lm fits linear regression model SparkDataFrame. Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lm.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Linear Regression Model — spark.lm","text":"","code":"spark.lm(data, formula, ...) # S4 method for SparkDataFrame,formula spark.lm( data, formula, maxIter = 100L, regParam = 0, elasticNetParam = 0, tol = 1e-06, standardization = TRUE, solver = c(\"auto\", \"l-bfgs\", \"normal\"), weightCol = NULL, aggregationDepth = 2L, loss = c(\"squaredError\", \"huber\"), epsilon = 1.35, stringIndexerOrderType = c(\"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\") ) # S4 method for LinearRegressionModel summary(object) # S4 method for LinearRegressionModel predict(object, newData) # S4 method for LinearRegressionModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lm.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Linear Regression Model — spark.lm","text":"data SparkDataFrame observations labels model fitting. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. ... additional arguments passed method. maxIter maximum iteration number. regParam regularization parameter. elasticNetParam ElasticNet mixing parameter, range [0, 1]. alpha = 0, penalty L2 penalty. alpha = 1, L1 penalty. tol convergence tolerance iterations. standardization whether standardize training features fitting model. solver solver algorithm optimization. Supported options: \"l-bfgs\", \"normal\" \"auto\". weightCol weight column name. aggregationDepth suggested depth treeAggregate (>= 2). loss loss function optimized. Supported options: \"squaredError\" \"huber\". epsilon shape parameter control amount robustness. stringIndexerOrderType order categories string feature column. 
used decide base level string feature last category ordering dropped encoding strings. Supported options \"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\". default value \"frequencyDesc\". ordering set \"alphabetDesc\", drops category R encoding strings. object Linear Regression Model model fitted spark.lm. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lm.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Linear Regression Model — spark.lm","text":"spark.lm returns fitted Linear Regression Model. summary returns summary information fitted model, list. predict returns predicted values based LinearRegressionModel.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lm.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Linear Regression Model — spark.lm","text":"spark.lm since 3.1.0 summary(LinearRegressionModel) since 3.1.0 predict(LinearRegressionModel) since 3.1.0 write.ml(LinearRegressionModel, character) since 3.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.lm.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Linear Regression Model — spark.lm","text":"","code":"if (FALSE) { df <- read.df(\"data/mllib/sample_linear_regression_data.txt\", source = \"libsvm\") # fit Linear Regression Model model <- spark.lm(df, label ~ features, regParam = 0.01, maxIter = 1) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.logit.html","id":null,"dir":"Reference","previous_headings":"","what":"Logistic Regression Model — spark.logit","title":"Logistic Regression Model — spark.logit","text":"Fits logistic regression model SparkDataFrame. supports \"binomial\": Binary logistic regression pivoting; \"multinomial\": Multinomial logistic (softmax) regression without pivoting, similar glmnet. Users can print, make predictions produced model save model input path.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.logit.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Logistic Regression Model — spark.logit","text":"","code":"spark.logit(data, formula, ...) # S4 method for SparkDataFrame,formula spark.logit( data, formula, regParam = 0, elasticNetParam = 0, maxIter = 100, tol = 1e-06, family = \"auto\", standardization = TRUE, thresholds = 0.5, weightCol = NULL, aggregationDepth = 2, lowerBoundsOnCoefficients = NULL, upperBoundsOnCoefficients = NULL, lowerBoundsOnIntercepts = NULL, upperBoundsOnIntercepts = NULL, handleInvalid = c(\"error\", \"keep\", \"skip\") ) # S4 method for LogisticRegressionModel summary(object) # S4 method for LogisticRegressionModel predict(object, newData) # S4 method for LogisticRegressionModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.logit.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Logistic Regression Model — spark.logit","text":"data SparkDataFrame training. 
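An editor's sketch (not part of the official reference) of the elasticNetParam and loss arguments of spark.lm; the sample data path is the one used in the example and is assumed to be available:

df <- read.df("data/mllib/sample_linear_regression_data.txt", source = "libsvm")
# elasticNetParam mixes the penalties: 0 is pure L2 (ridge), 1 is pure L1 (lasso).
lasso <- spark.lm(df, label ~ features, regParam = 0.1, elasticNetParam = 1)
# Huber loss with its robustness shape parameter epsilon.
robust <- spark.lm(df, label ~ features, loss = "huber", epsilon = 1.35)
summary(lasso)
summary(robust)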
formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. ... additional arguments passed method. regParam regularization parameter. elasticNetParam ElasticNet mixing parameter. alpha = 0.0, penalty L2 penalty. alpha = 1.0, L1 penalty. 0.0 < alpha < 1.0, penalty combination L1 L2. Default 0.0 L2 penalty. maxIter maximum iteration number. tol convergence tolerance iterations. family name family description label distribution used model. Supported options: \"auto\": Automatically select family based number classes: number classes == 1 || number classes == 2, set \"binomial\". Else, set \"multinomial\". \"binomial\": Binary logistic regression pivoting. \"multinomial\": Multinomial logistic (softmax) regression without pivoting. standardization whether standardize training features fitting model. coefficients models always returned original scale, transparent users. Note /without standardization, models always converged solution regularization applied. Default TRUE, glmnet. thresholds binary classification, range [0, 1]. estimated probability class label 1 > threshold, predict 1, else 0. high threshold encourages model predict 0 often; low threshold encourages model predict 1 often. Note: Setting threshold p equivalent setting thresholds c(1-p, p). multiclass (binary) classification adjust probability predicting class. Array must length equal number classes, values > 0, excepting one value may 0. class largest value p/t predicted, p original probability class t class's threshold. weightCol weight column name. aggregationDepth depth treeAggregate (greater equal 2). dimensions features number partitions large, param adjusted larger size. expert parameter. Default value good cases. lowerBoundsOnCoefficients lower bounds coefficients fitting bound constrained optimization. bound matrix must compatible shape (1, number features) binomial regression, (number classes, number features) multinomial regression. R matrix. upperBoundsOnCoefficients upper bounds coefficients fitting bound constrained optimization. bound matrix must compatible shape (1, number features) binomial regression, (number classes, number features) multinomial regression. R matrix. lowerBoundsOnIntercepts lower bounds intercepts fitting bound constrained optimization. bounds vector size must equal 1 binomial regression, number classes multinomial regression. upperBoundsOnIntercepts upper bounds intercepts fitting bound constrained optimization. bound vector size must equal 1 binomial regression, number classes multinomial regression. handleInvalid handle invalid data (unseen labels NULL values) features label column string type. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". object LogisticRegressionModel fitted spark.logit. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.logit.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Logistic Regression Model — spark.logit","text":"spark.logit returns fitted logistic regression model. summary returns summary information fitted model, list. list includes coefficients (coefficients matrix fitted model). 
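An editor's sketch (not part of the official reference) of the scalar thresholds argument of spark.logit for binary classification, reusing the Titanic data from the example; the chosen threshold values are illustrative only:

t <- as.data.frame(Titanic)
training <- createDataFrame(t)
# The estimated probability of class 1 must exceed the threshold to predict 1,
# so a high threshold predicts 1 less often and a low threshold predicts 1 more often.
conservative <- spark.logit(training, Survived ~ Class + Sex + Age, thresholds = 0.9)
liberal <- spark.logit(training, Survived ~ Class + Sex + Age, thresholds = 0.1)
head(predict(conservative, training))
head(predict(liberal, training))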
predict returns predicted values based LogisticRegressionModel.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.logit.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Logistic Regression Model — spark.logit","text":"spark.logit since 2.1.0 summary(LogisticRegressionModel) since 2.1.0 predict(LogisticRegressionModel) since 2.1.0 write.ml(LogisticRegressionModel, character) since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.logit.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Logistic Regression Model — spark.logit","text":"","code":"if (FALSE) { sparkR.session() # binary logistic regression t <- as.data.frame(Titanic) training <- createDataFrame(t) model <- spark.logit(training, Survived ~ ., regParam = 0.5) summary <- summary(model) # fitted values on training data fitted <- predict(model, training) # save fitted model to input path path <- \"path/to/model\" write.ml(model, path) # can also read back the saved model and predict # Note that summary does not work on loaded model savedModel <- read.ml(path) summary(savedModel) # binary logistic regression against two classes with # upperBoundsOnCoefficients and upperBoundsOnIntercepts ubc <- matrix(c(1.0, 0.0, 1.0, 0.0), nrow = 1, ncol = 4) model <- spark.logit(training, Species ~ ., upperBoundsOnCoefficients = ubc, upperBoundsOnIntercepts = 1.0) # multinomial logistic regression model <- spark.logit(training, Class ~ ., regParam = 0.5) summary <- summary(model) # multinomial logistic regression with # lowerBoundsOnCoefficients and lowerBoundsOnIntercepts lbc <- matrix(c(0.0, -1.0, 0.0, -1.0, 0.0, -1.0, 0.0, -1.0), nrow = 2, ncol = 4) lbi <- as.array(c(0.0, 0.0)) model <- spark.logit(training, Species ~ ., family = \"multinomial\", lowerBoundsOnCoefficients = lbc, lowerBoundsOnIntercepts = lbi) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.mlp.html","id":null,"dir":"Reference","previous_headings":"","what":"Multilayer Perceptron Classification Model — spark.mlp","title":"Multilayer Perceptron Classification Model — spark.mlp","text":"spark.mlp fits multi-layer perceptron neural network model SparkDataFrame. Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models. categorical data supported. details, see Multilayer Perceptron","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.mlp.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Multilayer Perceptron Classification Model — spark.mlp","text":"","code":"spark.mlp(data, formula, ...) # S4 method for SparkDataFrame,formula spark.mlp( data, formula, layers, blockSize = 128, solver = \"l-bfgs\", maxIter = 100, tol = 1e-06, stepSize = 0.03, seed = NULL, initialWeights = NULL, handleInvalid = c(\"error\", \"keep\", \"skip\") ) # S4 method for MultilayerPerceptronClassificationModel summary(object) # S4 method for MultilayerPerceptronClassificationModel predict(object, newData) # S4 method for MultilayerPerceptronClassificationModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.mlp.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Multilayer Perceptron Classification Model — spark.mlp","text":"data SparkDataFrame observations labels model fitting. 
formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. ... additional arguments passed method. layers integer vector containing number nodes layer. blockSize blockSize parameter. solver solver parameter, supported options: \"gd\" (minibatch gradient descent) \"l-bfgs\". maxIter maximum iteration number. tol convergence tolerance iterations. stepSize stepSize parameter. seed seed parameter weights initialization. initialWeights initialWeights parameter weights initialization, numeric vector. handleInvalid handle invalid data (unseen labels NULL values) features label column string type. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". object Multilayer Perceptron Classification Model fitted spark.mlp newData SparkDataFrame testing. path directory model saved. overwrite overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.mlp.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Multilayer Perceptron Classification Model — spark.mlp","text":"spark.mlp returns fitted Multilayer Perceptron Classification Model. summary returns summary information fitted model, list. list includes numOfInputs (number inputs), numOfOutputs (number outputs), layers (array layer sizes including input output layers), weights (weights layers). weights, numeric vector length equal expected given architecture (.e., 8-10-2 network, 112 connection weights). predict returns SparkDataFrame containing predicted labeled column named \"prediction\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.mlp.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Multilayer Perceptron Classification Model — spark.mlp","text":"spark.mlp since 2.1.0 summary(MultilayerPerceptronClassificationModel) since 2.1.0 predict(MultilayerPerceptronClassificationModel) since 2.1.0 write.ml(MultilayerPerceptronClassificationModel, character) since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.mlp.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Multilayer Perceptron Classification Model — spark.mlp","text":"","code":"if (FALSE) { df <- read.df(\"data/mllib/sample_multiclass_classification_data.txt\", source = \"libsvm\") # fit a Multilayer Perceptron Classification Model model <- spark.mlp(df, label ~ features, blockSize = 128, layers = c(4, 3), solver = \"l-bfgs\", maxIter = 100, tol = 0.5, stepSize = 1, seed = 1, initialWeights = c(0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 9, 9, 9, 9, 9)) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.naiveBayes.html","id":null,"dir":"Reference","previous_headings":"","what":"Naive Bayes Models — spark.naiveBayes","title":"Naive Bayes Models — spark.naiveBayes","text":"spark.naiveBayes fits Bernoulli naive Bayes model SparkDataFrame. Users can call summary print summary fitted model, predict make predictions new data, write.ml/read.ml save/load fitted models. 
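A quick sanity check of the weight count quoted above for spark.mlp: every non-input layer carries one bias term per node, so an 8-10-2 network has (8 + 1) * 10 + (10 + 1) * 2 = 90 + 22 = 112 connection weights. The same arithmetic in R:
layers <- c(8, 10, 2)
sum((head(layers, -1) + 1) * tail(layers, -1))   # 112, matching the text above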
categorical data supported.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.naiveBayes.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Naive Bayes Models — spark.naiveBayes","text":"","code":"spark.naiveBayes(data, formula, ...) # S4 method for SparkDataFrame,formula spark.naiveBayes( data, formula, smoothing = 1, handleInvalid = c(\"error\", \"keep\", \"skip\") ) # S4 method for NaiveBayesModel summary(object) # S4 method for NaiveBayesModel predict(object, newData) # S4 method for NaiveBayesModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.naiveBayes.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Naive Bayes Models — spark.naiveBayes","text":"data SparkDataFrame observations labels model fitting. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-'. ... additional argument(s) passed method. Currently smoothing. smoothing smoothing parameter. handleInvalid handle invalid data (unseen labels NULL values) features label column string type. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". object naive Bayes model fitted spark.naiveBayes. newData SparkDataFrame testing. path directory model saved. overwrite overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.naiveBayes.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Naive Bayes Models — spark.naiveBayes","text":"spark.naiveBayes returns fitted naive Bayes model. summary returns summary information fitted model, list. list includes apriori (label distribution) tables (conditional probabilities given target label). predict returns SparkDataFrame containing predicted labeled column named \"prediction\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.naiveBayes.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Naive Bayes Models — spark.naiveBayes","text":"spark.naiveBayes since 2.0.0 summary(NaiveBayesModel) since 2.0.0 predict(NaiveBayesModel) since 2.0.0 write.ml(NaiveBayesModel, character) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.naiveBayes.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Naive Bayes Models — spark.naiveBayes","text":"","code":"if (FALSE) { data <- as.data.frame(UCBAdmissions) df <- createDataFrame(data) # fit a Bernoulli naive Bayes model model <- spark.naiveBayes(df, Admit ~ Gender + Dept, smoothing = 0) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.powerIterationClustering.html","id":null,"dir":"Reference","previous_headings":"","what":"PowerIterationClustering — spark.assignClusters","title":"PowerIterationClustering — spark.assignClusters","text":"scalable graph clustering algorithm. Users can call spark.assignClusters return cluster assignment input vertex. 
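As a hedged sketch of the summary components listed above for spark.naiveBayes (assuming the UCBAdmissions-based model fitted in the example just shown), the returned list can be inspected directly:
nbSummary <- summary(model)
nbSummary$apriori   # label distribution
nbSummary$tables    # conditional probabilities given the target label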
Run PIC algorithm returns cluster assignment input vertex.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.powerIterationClustering.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"PowerIterationClustering — spark.assignClusters","text":"","code":"spark.assignClusters(data, ...) # S4 method for SparkDataFrame spark.assignClusters( data, k = 2L, initMode = c(\"random\", \"degree\"), maxIter = 20L, sourceCol = \"src\", destinationCol = \"dst\", weightCol = NULL )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.powerIterationClustering.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"PowerIterationClustering — spark.assignClusters","text":"data SparkDataFrame. ... additional argument(s) passed method. k number clusters create. initMode initialization algorithm; \"random\" \"degree\" maxIter maximum number iterations. sourceCol name input column source vertex IDs. destinationCol name input column destination vertex IDs weightCol weight column name. set NULL, treat instance weights 1.0.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.powerIterationClustering.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"PowerIterationClustering — spark.assignClusters","text":"dataset contains columns vertex id corresponding cluster id. schema : id: integer, cluster: integer","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.powerIterationClustering.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"PowerIterationClustering — spark.assignClusters","text":"spark.assignClusters(SparkDataFrame) since 3.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.powerIterationClustering.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"PowerIterationClustering — spark.assignClusters","text":"","code":"if (FALSE) { df <- createDataFrame(list(list(0L, 1L, 1.0), list(0L, 2L, 1.0), list(1L, 2L, 1.0), list(3L, 4L, 1.0), list(4L, 0L, 0.1)), schema = c(\"src\", \"dst\", \"weight\")) clusters <- spark.assignClusters(df, initMode = \"degree\", weightCol = \"weight\") showDF(clusters) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.prefixSpan.html","id":null,"dir":"Reference","previous_headings":"","what":"PrefixSpan — spark.findFrequentSequentialPatterns","title":"PrefixSpan — spark.findFrequentSequentialPatterns","text":"parallel PrefixSpan algorithm mine frequent sequential patterns. spark.findFrequentSequentialPatterns returns complete set frequent sequential patterns. details, see PrefixSpan.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.prefixSpan.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"PrefixSpan — spark.findFrequentSequentialPatterns","text":"","code":"spark.findFrequentSequentialPatterns(data, ...) # S4 method for SparkDataFrame spark.findFrequentSequentialPatterns( data, minSupport = 0.1, maxPatternLength = 10L, maxLocalProjDBSize = 32000000L, sequenceCol = \"sequence\" )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.prefixSpan.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"PrefixSpan — spark.findFrequentSequentialPatterns","text":"data SparkDataFrame. ... additional argument(s) passed method. 
minSupport Minimal support level. maxPatternLength Maximal pattern length. maxLocalProjDBSize Maximum number items (including delimiters used internal storage format) allowed projected database local processing. sequenceCol name sequence column dataset.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.prefixSpan.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"PrefixSpan — spark.findFrequentSequentialPatterns","text":"complete set frequent sequential patterns input sequences itemsets. returned SparkDataFrame contains columns sequence corresponding frequency. schema : sequence: ArrayType(ArrayType(T)), freq: integer T item type","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.prefixSpan.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"PrefixSpan — spark.findFrequentSequentialPatterns","text":"spark.findFrequentSequentialPatterns(SparkDataFrame) since 3.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.prefixSpan.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"PrefixSpan — spark.findFrequentSequentialPatterns","text":"","code":"if (FALSE) { df <- createDataFrame(list(list(list(list(1L, 2L), list(3L))), list(list(list(1L), list(3L, 2L), list(1L, 2L))), list(list(list(1L, 2L), list(5L))), list(list(list(6L)))), schema = c(\"sequence\")) frequency <- spark.findFrequentSequentialPatterns(df, minSupport = 0.5, maxPatternLength = 5L, maxLocalProjDBSize = 32000000L) showDF(frequency) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.randomForest.html","id":null,"dir":"Reference","previous_headings":"","what":"Random Forest Model for Regression and Classification — spark.randomForest","title":"Random Forest Model for Regression and Classification — spark.randomForest","text":"spark.randomForest fits Random Forest Regression model Classification model SparkDataFrame. Users can call summary get summary fitted Random Forest model, predict make predictions new data, write.ml/read.ml save/load fitted models. details, see Random Forest Regression Random Forest Classification","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.randomForest.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Random Forest Model for Regression and Classification — spark.randomForest","text":"","code":"spark.randomForest(data, formula, ...) # S4 method for SparkDataFrame,formula spark.randomForest( data, formula, type = c(\"regression\", \"classification\"), maxDepth = 5, maxBins = 32, numTrees = 20, impurity = NULL, featureSubsetStrategy = \"auto\", seed = NULL, subsamplingRate = 1, minInstancesPerNode = 1, minInfoGain = 0, checkpointInterval = 10, maxMemoryInMB = 256, cacheNodeIds = FALSE, handleInvalid = c(\"error\", \"keep\", \"skip\"), bootstrap = TRUE ) # S4 method for RandomForestRegressionModel summary(object) # S3 method for summary.RandomForestRegressionModel print(x, ...) # S4 method for RandomForestClassificationModel summary(object) # S3 method for summary.RandomForestClassificationModel print(x, ...) 
# S4 method for RandomForestRegressionModel predict(object, newData) # S4 method for RandomForestClassificationModel predict(object, newData) # S4 method for RandomForestRegressionModel,character write.ml(object, path, overwrite = FALSE) # S4 method for RandomForestClassificationModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.randomForest.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Random Forest Model for Regression and Classification — spark.randomForest","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', ':', '+', '-'. ... additional arguments passed method. type type model, one \"regression\" \"classification\", fit maxDepth Maximum depth tree (>= 0). maxBins Maximum number bins used discretizing continuous features choosing split features node. bins give higher granularity. Must >= 2 >= number categories categorical feature. numTrees Number trees train (>= 1). impurity Criterion used information gain calculation. regression, must \"variance\". classification, must one \"entropy\" \"gini\", default \"gini\". featureSubsetStrategy number features consider splits tree node. Supported options: \"auto\" (choose automatically task: numTrees == 1, set \".\" numTrees > 1 (forest), set \"sqrt\" classification \"onethird\" regression), \"\" (use features), \"onethird\" (use 1/3 features), \"sqrt\" (use sqrt(number features)), \"log2\" (use log2(number features)), \"n\": (n range (0, 1.0], use n * number features. n range (1, number features), use n features). Default \"auto\". seed integer seed random number generation. subsamplingRate Fraction training data used learning decision tree, range (0, 1]. minInstancesPerNode Minimum number instances child must split. minInfoGain Minimum information gain split considered tree node. checkpointInterval Param set checkpoint interval (>= 1) disable checkpoint (-1). Note: setting ignored checkpoint directory set. maxMemoryInMB Maximum memory MiB allocated histogram aggregation. cacheNodeIds FALSE, algorithm pass trees executors match instances nodes. TRUE, algorithm cache node IDs instance. Caching can speed training deeper trees. Users can set often cache checkpointed disable setting checkpointInterval. handleInvalid handle invalid data (unseen labels NULL values) features label column string type classification model. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". bootstrap Whether bootstrap samples used building trees. object fitted Random Forest regression model classification model. x summary object Random Forest regression model classification model returned summary. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.randomForest.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Random Forest Model for Regression and Classification — spark.randomForest","text":"spark.randomForest returns fitted Random Forest model. summary returns summary information fitted model, list. 
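To illustrate the featureSubsetStrategy options described above, a hedged sketch (assuming the longley-based df created in the regression example further below; the values are illustrative only): "onethird" is what "auto" selects for regression when numTrees > 1, while a numeric string such as "0.5" samples half of the features at each split.
rf_third <- spark.randomForest(df, Employed ~ ., type = "regression",
                               numTrees = 10, featureSubsetStrategy = "onethird")
rf_half  <- spark.randomForest(df, Employed ~ ., type = "regression",
                               numTrees = 10, featureSubsetStrategy = "0.5")
summary(rf_third)$featureImportances   # compare with summary(rf_half)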
list components includes formula (formula), numFeatures (number features), features (list features), featureImportances (feature importances), maxDepth (max depth trees), numTrees (number trees), treeWeights (tree weights). predict returns SparkDataFrame containing predicted labeled column named \"prediction\".","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.randomForest.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Random Forest Model for Regression and Classification — spark.randomForest","text":"spark.randomForest since 2.1.0 summary(RandomForestRegressionModel) since 2.1.0 print.summary.RandomForestRegressionModel since 2.1.0 summary(RandomForestClassificationModel) since 2.1.0 print.summary.RandomForestClassificationModel since 2.1.0 predict(RandomForestRegressionModel) since 2.1.0 predict(RandomForestClassificationModel) since 2.1.0 write.ml(RandomForestRegressionModel, character) since 2.1.0 write.ml(RandomForestClassificationModel, character) since 2.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.randomForest.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Random Forest Model for Regression and Classification — spark.randomForest","text":"","code":"if (FALSE) { # fit a Random Forest Regression Model df <- createDataFrame(longley) model <- spark.randomForest(df, Employed ~ ., type = \"regression\", maxDepth = 5, maxBins = 16) # get the summary of the model summary(model) # make predictions predictions <- predict(model, df) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) # fit a Random Forest Classification Model t <- as.data.frame(Titanic) df <- createDataFrame(t) model <- spark.randomForest(df, Survived ~ Freq + Age, \"classification\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.survreg.html","id":null,"dir":"Reference","previous_headings":"","what":"Accelerated Failure Time (AFT) Survival Regression Model — spark.survreg","title":"Accelerated Failure Time (AFT) Survival Regression Model — spark.survreg","text":"spark.survreg fits accelerated failure time (AFT) survival regression model SparkDataFrame. Users can call summary get summary fitted AFT model, predict make predictions new data, write.ml/read.ml save/load fitted models.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.survreg.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Accelerated Failure Time (AFT) Survival Regression Model — spark.survreg","text":"","code":"spark.survreg(data, formula, ...) # S4 method for SparkDataFrame,formula spark.survreg( data, formula, aggregationDepth = 2, stringIndexerOrderType = c(\"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\") ) # S4 method for AFTSurvivalRegressionModel summary(object) # S4 method for AFTSurvivalRegressionModel predict(object, newData) # S4 method for AFTSurvivalRegressionModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.survreg.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Accelerated Failure Time (AFT) Survival Regression Model — spark.survreg","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', ':', '+', '-'. Note operator '.' 
supported currently. ... additional arguments passed method. aggregationDepth depth treeAggregate (greater equal 2). dimensions features number partitions large, param adjusted larger size. expert parameter. Default value good cases. stringIndexerOrderType order categories string feature column. used decide base level string feature last category ordering dropped encoding strings. Supported options \"frequencyDesc\", \"frequencyAsc\", \"alphabetDesc\", \"alphabetAsc\". default value \"frequencyDesc\". ordering set \"alphabetDesc\", drops category R encoding strings. object fitted AFT survival regression model. newData SparkDataFrame testing. path directory model saved. overwrite overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.survreg.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Accelerated Failure Time (AFT) Survival Regression Model — spark.survreg","text":"spark.survreg returns fitted AFT survival regression model. summary returns summary information fitted model, list. list includes model's coefficients (features, coefficients, intercept log(scale)). predict returns SparkDataFrame containing predicted values original scale data (mean predicted value scale = 1.0).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.survreg.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Accelerated Failure Time (AFT) Survival Regression Model — spark.survreg","text":"spark.survreg since 2.0.0 summary(AFTSurvivalRegressionModel) since 2.0.0 predict(AFTSurvivalRegressionModel) since 2.0.0 write.ml(AFTSurvivalRegressionModel, character) since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.survreg.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Accelerated Failure Time (AFT) Survival Regression Model — spark.survreg","text":"","code":"if (FALSE) { df <- createDataFrame(ovarian) model <- spark.survreg(df, Surv(futime, fustat) ~ ecog_ps + rx) # get a summary of the model summary(model) # make predictions predicted <- predict(model, df) showDF(predicted) # save and load the model path <- \"path/to/model\" write.ml(model, path) savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.svmLinear.html","id":null,"dir":"Reference","previous_headings":"","what":"Linear SVM Model — spark.svmLinear","title":"Linear SVM Model — spark.svmLinear","text":"Fits linear SVM model SparkDataFrame, similar svm e1071 package. Currently supports binary classification model linear kernel. Users can print, make predictions produced model save model input path.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.svmLinear.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Linear SVM Model — spark.svmLinear","text":"","code":"spark.svmLinear(data, formula, ...) 
# S4 method for SparkDataFrame,formula spark.svmLinear( data, formula, regParam = 0, maxIter = 100, tol = 1e-06, standardization = TRUE, threshold = 0, weightCol = NULL, aggregationDepth = 2, handleInvalid = c(\"error\", \"keep\", \"skip\") ) # S4 method for LinearSVCModel predict(object, newData) # S4 method for LinearSVCModel summary(object) # S4 method for LinearSVCModel,character write.ml(object, path, overwrite = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.svmLinear.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Linear SVM Model — spark.svmLinear","text":"data SparkDataFrame training. formula symbolic description model fitted. Currently formula operators supported, including '~', '.', ':', '+', '-', '*', '^'. ... additional arguments passed method. regParam regularization parameter. supports L2 regularization currently. maxIter Maximum iteration number. tol Convergence tolerance iterations. standardization Whether standardize training features fitting model. coefficients models always returned original scale, transparent users. Note /without standardization, models always converged solution regularization applied. threshold threshold binary classification applied linear model prediction. threshold can real number, Inf make predictions 0.0 -Inf make predictions 1.0. weightCol weight column name. aggregationDepth depth treeAggregate (greater equal 2). dimensions features number partitions large, param adjusted larger size. expert parameter. Default value good cases. handleInvalid handle invalid data (unseen labels NULL values) features label column string type. Supported options: \"skip\" (filter rows invalid data), \"error\" (throw error), \"keep\" (put invalid data special additional bucket, index numLabels). Default \"error\". object LinearSVCModel fitted spark.svmLinear. newData SparkDataFrame testing. path directory model saved. overwrite Overwrites output path already exists. Default FALSE means throw exception output path exists.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.svmLinear.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Linear SVM Model — spark.svmLinear","text":"spark.svmLinear returns fitted linear SVM model. predict returns predicted values based LinearSVCModel. summary returns summary information fitted model, list. 
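As a hedged sketch of the threshold argument described above (assuming the Titanic-based training SparkDataFrame built in the examples that follow), raising the threshold makes the 0.0 prediction more likely, with Inf and -Inf as the documented extremes:
svm_default <- spark.svmLinear(training, Survived ~ Freq, regParam = 0.1)   # threshold = 0
svm_shifted <- spark.svmLinear(training, Survived ~ Freq, regParam = 0.1, threshold = 1)
head(predict(svm_shifted, training))   # compare the "prediction" column with svm_default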
list includes coefficients (coefficients fitted model), numClasses (number classes), numFeatures (number features).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.svmLinear.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Linear SVM Model — spark.svmLinear","text":"spark.svmLinear since 2.2.0 predict(LinearSVCModel) since 2.2.0 summary(LinearSVCModel) since 2.2.0 write.ml(LinearSVCModel, character) since 2.2.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/spark.svmLinear.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Linear SVM Model — spark.svmLinear","text":"","code":"if (FALSE) { sparkR.session() t <- as.data.frame(Titanic) training <- createDataFrame(t) model <- spark.svmLinear(training, Survived ~ ., regParam = 0.5) summary <- summary(model) # fitted values on training data fitted <- predict(model, training) # save fitted model to input path path <- \"path/to/model\" write.ml(model, path) # can also read back the saved model and predict # Note that summary does not work on loaded model savedModel <- read.ml(path) summary(savedModel) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJMethod.html","id":null,"dir":"Reference","previous_headings":"","what":"Call Java Methods — sparkR.callJMethod","title":"Call Java Methods — sparkR.callJMethod","text":"Call Java method JVM running Spark driver. return values automatically converted R objects simple objects. values returned \"jobj\" references objects JVM.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJMethod.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call Java Methods — sparkR.callJMethod","text":"","code":"sparkR.callJMethod(x, methodName, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJMethod.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Call Java Methods — sparkR.callJMethod","text":"x object invoke method . \"jobj\" created newJObject. methodName method name call. ... parameters pass Java method.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJMethod.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Call Java Methods — sparkR.callJMethod","text":"return value Java method. Either returned R object can deserialized returned \"jobj\". See details section .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJMethod.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Call Java Methods — sparkR.callJMethod","text":"low level function access JVM directly used advanced use cases. arguments return values primitive R types (like integer, numeric, character, lists) automatically translated /Java types (like Integer, Double, String, Array). 
full list can found serialize.R deserialize.R Apache Spark code base.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJMethod.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Call Java Methods — sparkR.callJMethod","text":"sparkR.callJMethod since 2.0.1","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJMethod.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Call Java Methods — sparkR.callJMethod","text":"","code":"if (FALSE) { sparkR.session() # Need to have a Spark JVM running before calling newJObject # Create a Java ArrayList and populate it jarray <- sparkR.newJObject(\"java.util.ArrayList\") sparkR.callJMethod(jarray, \"add\", 42L) sparkR.callJMethod(jarray, \"get\", 0L) # Will print 42 }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJStatic.html","id":null,"dir":"Reference","previous_headings":"","what":"Call Static Java Methods — sparkR.callJStatic","title":"Call Static Java Methods — sparkR.callJStatic","text":"Call static method JVM running Spark driver. return value automatically converted R objects simple objects. values returned \"jobj\" references objects JVM.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJStatic.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Call Static Java Methods — sparkR.callJStatic","text":"","code":"sparkR.callJStatic(x, methodName, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJStatic.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Call Static Java Methods — sparkR.callJStatic","text":"x fully qualified Java class name contains static method invoke. methodName name static method invoke. ... parameters pass Java method.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJStatic.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Call Static Java Methods — sparkR.callJStatic","text":"return value Java method. Either returned R object can deserialized returned \"jobj\". See details section .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJStatic.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Call Static Java Methods — sparkR.callJStatic","text":"low level function access JVM directly used advanced use cases. arguments return values primitive R types (like integer, numeric, character, lists) automatically translated /Java types (like Integer, Double, String, Array). 
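A hedged editorial sketch of the translation rules described above (assuming an active Spark session): primitive results come back as plain R values, while non-primitive results are returned as "jobj" references that can be used in further calls.
sparkR.session()
props <- sparkR.callJStatic("java.lang.System", "getProperties")   # java.util.Properties -> "jobj"
class(props)                                                       # "jobj"
sparkR.callJMethod(props, "getProperty", "java.version")           # String -> R character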
full list can found serialize.R deserialize.R Apache Spark code base.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJStatic.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Call Static Java Methods — sparkR.callJStatic","text":"sparkR.callJStatic since 2.0.1","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.callJStatic.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Call Static Java Methods — sparkR.callJStatic","text":"","code":"if (FALSE) { sparkR.session() # Need to have a Spark JVM running before calling callJStatic sparkR.callJStatic(\"java.lang.System\", \"currentTimeMillis\") sparkR.callJStatic(\"java.lang.System\", \"getProperty\", \"java.home\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.conf.html","id":null,"dir":"Reference","previous_headings":"","what":"Get Runtime Config from the current active SparkSession — sparkR.conf","title":"Get Runtime Config from the current active SparkSession — sparkR.conf","text":"Get Runtime Config current active SparkSession. change SparkSession Runtime Config, please see sparkR.session().","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.conf.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get Runtime Config from the current active SparkSession — sparkR.conf","text":"","code":"sparkR.conf(key, defaultValue)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.conf.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get Runtime Config from the current active SparkSession — sparkR.conf","text":"key (optional) key config get, omitted, config returned defaultValue (optional) default value config return config set, omitted, call fails config key set","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.conf.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get Runtime Config from the current active SparkSession — sparkR.conf","text":"list config values keys names","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.conf.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get Runtime Config from the current active SparkSession — sparkR.conf","text":"sparkR.conf since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.conf.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get Runtime Config from the current active SparkSession — sparkR.conf","text":"","code":"if (FALSE) { sparkR.session() allConfigs <- sparkR.conf() masterValue <- unlist(sparkR.conf(\"spark.master\")) namedConfig <- sparkR.conf(\"spark.executor.memory\", \"0g\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.init-deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"(Deprecated) Initialize a new Spark Context — sparkR.init","title":"(Deprecated) Initialize a new Spark Context — sparkR.init","text":"function initializes new SparkContext.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.init-deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Deprecated) Initialize a new Spark Context — sparkR.init","text":"","code":"sparkR.init( master = \"\", appName = 
\"SparkR\", sparkHome = Sys.getenv(\"SPARK_HOME\"), sparkEnvir = list(), sparkExecutorEnv = list(), sparkJars = \"\", sparkPackages = \"\" )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.init-deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Deprecated) Initialize a new Spark Context — sparkR.init","text":"master Spark master URL appName Application name register cluster manager sparkHome Spark Home directory sparkEnvir Named list environment variables set worker nodes sparkExecutorEnv Named list environment variables used launching executors sparkJars Character vector jar files pass worker nodes sparkPackages Character vector package coordinates","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.init-deprecated.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"(Deprecated) Initialize a new Spark Context — sparkR.init","text":"sparkR.init since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.init-deprecated.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"(Deprecated) Initialize a new Spark Context — sparkR.init","text":"","code":"if (FALSE) { sc <- sparkR.init(\"local[2]\", \"SparkR\", \"/home/spark\") sc <- sparkR.init(\"local[2]\", \"SparkR\", \"/home/spark\", list(spark.executor.memory=\"1g\")) sc <- sparkR.init(\"yarn-client\", \"SparkR\", \"/home/spark\", list(spark.executor.memory=\"4g\"), list(LD_LIBRARY_PATH=\"/directory of JVM libraries (libjvm.so) on workers/\"), c(\"one.jar\", \"two.jar\", \"three.jar\"), c(\"com.databricks:spark-avro_2.11:2.0.1\")) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.newJObject.html","id":null,"dir":"Reference","previous_headings":"","what":"Create Java Objects — sparkR.newJObject","title":"Create Java Objects — sparkR.newJObject","text":"Create new Java object JVM running Spark driver. return value automatically converted R object simple objects. values returned \"jobj\" reference object JVM.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.newJObject.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create Java Objects — sparkR.newJObject","text":"","code":"sparkR.newJObject(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.newJObject.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create Java Objects — sparkR.newJObject","text":"x fully qualified Java class name. ... arguments passed constructor.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.newJObject.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create Java Objects — sparkR.newJObject","text":"object created. Either returned R object can deserialized returned \"jobj\". See details section .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.newJObject.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Create Java Objects — sparkR.newJObject","text":"low level function access JVM directly used advanced use cases. arguments return values primitive R types (like integer, numeric, character, lists) automatically translated /Java types (like Integer, Double, String, Array). 
full list can found serialize.R deserialize.R Apache Spark code base.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.newJObject.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create Java Objects — sparkR.newJObject","text":"sparkR.newJObject since 2.0.1","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.newJObject.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create Java Objects — sparkR.newJObject","text":"","code":"if (FALSE) { sparkR.session() # Need to have a Spark JVM running before calling newJObject # Create a Java ArrayList and populate it jarray <- sparkR.newJObject(\"java.util.ArrayList\") sparkR.callJMethod(jarray, \"add\", 42L) sparkR.callJMethod(jarray, \"get\", 0L) # Will print 42 }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the existing SparkSession or initialize a new SparkSession. — sparkR.session","title":"Get the existing SparkSession or initialize a new SparkSession. — sparkR.session","text":"SparkSession entry point SparkR. sparkR.session gets existing SparkSession initializes new SparkSession. Additional Spark properties can set ..., named parameters take priority values master, appName, named lists sparkConfig.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the existing SparkSession or initialize a new SparkSession. — sparkR.session","text":"","code":"sparkR.session( master = \"\", appName = \"SparkR\", sparkHome = Sys.getenv(\"SPARK_HOME\"), sparkConfig = list(), sparkJars = \"\", sparkPackages = \"\", enableHiveSupport = TRUE, ... )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get the existing SparkSession or initialize a new SparkSession. — sparkR.session","text":"master Spark master URL. appName application name register cluster manager. sparkHome Spark Home directory. sparkConfig named list Spark configuration set worker nodes. sparkJars character vector jar files pass worker nodes. sparkPackages character vector package coordinates enableHiveSupport enable support Hive, fallback built Hive support; set, turned existing session ... named Spark properties passed method.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Get the existing SparkSession or initialize a new SparkSession. — sparkR.session","text":"called interactive session, method checks Spark installation, , found, downloaded cached automatically. Alternatively, install.spark can called manually. default warehouse created automatically current directory managed table created via sql statement CREATE TABLE, example. change location warehouse, set named parameter spark.sql.warehouse.dir SparkSession. Along warehouse, accompanied metastore may also automatically created current directory new SparkSession initialized enableHiveSupport set TRUE, default. details, refer Hive configuration https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables. 
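As a hedged sketch of the warehouse note above (the directory is illustrative only): spark.sql.warehouse.dir can be supplied as a named Spark property when the session is created, and the effective value can then be read back with sparkR.conf.
sparkR.session(spark.sql.warehouse.dir = "/tmp/spark-warehouse",
               enableHiveSupport = FALSE)
sparkR.conf("spark.sql.warehouse.dir")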
details initialize use SparkR, refer SparkR programming guide https://spark.apache.org/docs/latest/sparkr.html#starting--sparksession.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get the existing SparkSession or initialize a new SparkSession. — sparkR.session","text":"sparkR.session since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the existing SparkSession or initialize a new SparkSession. — sparkR.session","text":"","code":"if (FALSE) { sparkR.session() df <- read.json(path) sparkR.session(\"local[2]\", \"SparkR\", \"/home/spark\") sparkR.session(\"yarn\", \"SparkR\", \"/home/spark\", list(spark.executor.memory=\"4g\", spark.submit.deployMode=\"client\"), c(\"one.jar\", \"two.jar\", \"three.jar\"), c(\"com.databricks:spark-avro_2.12:2.0.1\")) sparkR.session(spark.master = \"yarn\", spark.submit.deployMode = \"client\", spark.executor.memory = \"4g\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.stop.html","id":null,"dir":"Reference","previous_headings":"","what":"Stop the Spark Session and Spark Context — sparkR.session.stop","title":"Stop the Spark Session and Spark Context — sparkR.session.stop","text":"Stop Spark Session Spark Context.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.stop.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Stop the Spark Session and Spark Context — sparkR.session.stop","text":"","code":"sparkR.session.stop() sparkR.stop()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.stop.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Stop the Spark Session and Spark Context — sparkR.session.stop","text":"Also terminates backend R session connected .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.session.stop.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Stop the Spark Session and Spark Context — sparkR.session.stop","text":"sparkR.session.stop since 2.0.0 sparkR.stop since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.uiWebUrl.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the URL of the SparkUI instance for the current active SparkSession — sparkR.uiWebUrl","title":"Get the URL of the SparkUI instance for the current active SparkSession — sparkR.uiWebUrl","text":"Get URL SparkUI instance current active SparkSession.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.uiWebUrl.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the URL of the SparkUI instance for the current active SparkSession — sparkR.uiWebUrl","text":"","code":"sparkR.uiWebUrl()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.uiWebUrl.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the URL of the SparkUI instance for the current active SparkSession — sparkR.uiWebUrl","text":"SparkUI URL, NA disabled, started.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.uiWebUrl.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get the URL of 
the SparkUI instance for the current active SparkSession — sparkR.uiWebUrl","text":"sparkR.uiWebUrl since 2.1.1","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.uiWebUrl.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the URL of the SparkUI instance for the current active SparkSession — sparkR.uiWebUrl","text":"","code":"if (FALSE) { sparkR.session() url <- sparkR.uiWebUrl() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.version.html","id":null,"dir":"Reference","previous_headings":"","what":"Get version of Spark on which this application is running — sparkR.version","title":"Get version of Spark on which this application is running — sparkR.version","text":"Get version Spark application running.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.version.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get version of Spark on which this application is running — sparkR.version","text":"","code":"sparkR.version()"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.version.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get version of Spark on which this application is running — sparkR.version","text":"character string Spark version","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.version.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Get version of Spark on which this application is running — sparkR.version","text":"sparkR.version since 2.0.1","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkR.version.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get version of Spark on which this application is running — sparkR.version","text":"","code":"if (FALSE) { sparkR.session() version <- sparkR.version() }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRHive.init-deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"(Deprecated) Initialize a new HiveContext — sparkRHive.init","title":"(Deprecated) Initialize a new HiveContext — sparkRHive.init","text":"function creates HiveContext existing JavaSparkContext","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRHive.init-deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Deprecated) Initialize a new HiveContext — sparkRHive.init","text":"","code":"sparkRHive.init(jsc = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRHive.init-deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Deprecated) Initialize a new HiveContext — sparkRHive.init","text":"jsc existing JavaSparkContext created SparkR.init()","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRHive.init-deprecated.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"(Deprecated) Initialize a new HiveContext — sparkRHive.init","text":"Starting SparkR 2.0, SparkSession initialized returned instead. 
API deprecated kept backward compatibility .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRHive.init-deprecated.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"(Deprecated) Initialize a new HiveContext — sparkRHive.init","text":"sparkRHive.init since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRHive.init-deprecated.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"(Deprecated) Initialize a new HiveContext — sparkRHive.init","text":"","code":"if (FALSE) { sc <- sparkR.init() sqlContext <- sparkRHive.init(sc) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRSQL.init-deprecated.html","id":null,"dir":"Reference","previous_headings":"","what":"(Deprecated) Initialize a new SQLContext — sparkRSQL.init","title":"(Deprecated) Initialize a new SQLContext — sparkRSQL.init","text":"function creates SparkContext existing JavaSparkContext uses initialize new SQLContext","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRSQL.init-deprecated.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"(Deprecated) Initialize a new SQLContext — sparkRSQL.init","text":"","code":"sparkRSQL.init(jsc = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRSQL.init-deprecated.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"(Deprecated) Initialize a new SQLContext — sparkRSQL.init","text":"jsc existing JavaSparkContext created SparkR.init()","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRSQL.init-deprecated.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"(Deprecated) Initialize a new SQLContext — sparkRSQL.init","text":"Starting SparkR 2.0, SparkSession initialized returned instead. 
API deprecated kept backward compatibility .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRSQL.init-deprecated.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"(Deprecated) Initialize a new SQLContext — sparkRSQL.init","text":"sparkRSQL.init since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sparkRSQL.init-deprecated.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"(Deprecated) Initialize a new SQLContext — sparkRSQL.init","text":"","code":"if (FALSE) { sc <- sparkR.init() sqlContext <- sparkRSQL.init(sc) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sql.html","id":null,"dir":"Reference","previous_headings":"","what":"SQL Query — sql","title":"SQL Query — sql","text":"Executes SQL query using Spark, returning result SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sql.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"SQL Query — sql","text":"","code":"sql(sqlQuery)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sql.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"SQL Query — sql","text":"sqlQuery character vector containing SQL query","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sql.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"SQL Query — sql","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sql.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"SQL Query — sql","text":"sql since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/sql.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"SQL Query — sql","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) createOrReplaceTempView(df, \"table\") new_df <- sql(\"SELECT * FROM table\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/startsWith.html","id":null,"dir":"Reference","previous_headings":"","what":"startsWith — startsWith","title":"startsWith — startsWith","text":"Determines entries x start string (entries ) prefix respectively, strings recycled common lengths.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/startsWith.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"startsWith — startsWith","text":"","code":"startsWith(x, prefix) # S4 method for Column startsWith(x, prefix)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/startsWith.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"startsWith — startsWith","text":"x vector character string whose \"starts\" considered prefix character vector (often length one)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/startsWith.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"startsWith — startsWith","text":"startsWith since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/status.html","id":null,"dir":"Reference","previous_headings":"","what":"status — status","title":"status — status","text":"Prints current status query JSON 
format.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/status.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"status — status","text":"","code":"status(x) # S4 method for StreamingQuery status(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/status.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"status — status","text":"x StreamingQuery.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/status.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"status — status","text":"status(StreamingQuery) since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/status.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"status — status","text":"","code":"if (FALSE) status(sq)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/stopQuery.html","id":null,"dir":"Reference","previous_headings":"","what":"stopQuery — stopQuery","title":"stopQuery — stopQuery","text":"Stops execution query running. method blocks execution stopped.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/stopQuery.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"stopQuery — stopQuery","text":"","code":"stopQuery(x) # S4 method for StreamingQuery stopQuery(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/stopQuery.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"stopQuery — stopQuery","text":"x StreamingQuery.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/stopQuery.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"stopQuery — stopQuery","text":"stopQuery(StreamingQuery) since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/stopQuery.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"stopQuery — stopQuery","text":"","code":"if (FALSE) stopQuery(sq)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/storageLevel.html","id":null,"dir":"Reference","previous_headings":"","what":"StorageLevel — storageLevel","title":"StorageLevel — storageLevel","text":"Get storagelevel SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/storageLevel.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"StorageLevel — storageLevel","text":"","code":"# S4 method for SparkDataFrame storageLevel(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/storageLevel.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"StorageLevel — storageLevel","text":"x SparkDataFrame get storageLevel.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/storageLevel.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"StorageLevel — storageLevel","text":"storageLevel since 2.1.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/storageLevel.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"StorageLevel — storageLevel","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) persist(df, \"MEMORY_AND_DISK\") storageLevel(df) 
}"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/str.html","id":null,"dir":"Reference","previous_headings":"","what":"Compactly display the structure of a dataset — str","title":"Compactly display the structure of a dataset — str","text":"Display structure SparkDataFrame, including column names, column types, well small sample rows.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/str.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Compactly display the structure of a dataset — str","text":"","code":"# S4 method for SparkDataFrame str(object)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/str.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Compactly display the structure of a dataset — str","text":"object SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/str.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Compactly display the structure of a dataset — str","text":"str since 1.6.1","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/str.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Compactly display the structure of a dataset — str","text":"","code":"if (FALSE) { # Create a SparkDataFrame from the Iris dataset irisDF <- createDataFrame(iris) # Show the structure of the SparkDataFrame str(irisDF) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structField.html","id":null,"dir":"Reference","previous_headings":"","what":"structField — structField","title":"structField — structField","text":"Create structField object contains metadata single field schema.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structField.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"structField — structField","text":"","code":"structField(x, ...) # S3 method for jobj structField(x, ...) # S3 method for character structField(x, type, nullable = TRUE, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structField.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"structField — structField","text":"x name field. ... additional argument(s) passed method. 
type data type field nullable logical vector indicating whether field nullable","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structField.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"structField — structField","text":"structField object.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structField.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"structField — structField","text":"structField since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structField.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"structField — structField","text":"","code":"if (FALSE) { field1 <- structField(\"a\", \"integer\") field2 <- structField(\"c\", \"string\") field3 <- structField(\"avg\", \"double\") schema <- structType(field1, field2, field3) df1 <- gapply(df, list(\"a\", \"c\"), function(key, x) { y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, schema) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structType.html","id":null,"dir":"Reference","previous_headings":"","what":"structType — structType","title":"structType — structType","text":"Create structType object contains metadata SparkDataFrame. Intended use createDataFrame toDF.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structType.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"structType — structType","text":"","code":"structType(x, ...) # S3 method for jobj structType(x, ...) # S3 method for structField structType(x, ...) # S3 method for character structType(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structType.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"structType — structType","text":"x structField object (created structField method). Since Spark 2.3, can DDL-formatted string, comma separated list field definitions, e.g., \"INT, b STRING\". ... 
additional structField objects","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structType.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"structType — structType","text":"structType object","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structType.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"structType — structType","text":"structType since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/structType.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"structType — structType","text":"","code":"if (FALSE) { schema <- structType(structField(\"a\", \"integer\"), structField(\"c\", \"string\"), structField(\"avg\", \"double\")) df1 <- gapply(df, list(\"a\", \"c\"), function(key, x) { y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, schema) schema <- structType(\"a INT, c STRING, avg DOUBLE\") df1 <- gapply(df, list(\"a\", \"c\"), function(key, x) { y <- data.frame(key, mean(x$b), stringsAsFactors = FALSE) }, schema) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/subset.html","id":null,"dir":"Reference","previous_headings":"","what":"Subset — subset","title":"Subset — subset","text":"Return subsets SparkDataFrame according given conditions","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/subset.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Subset — subset","text":"","code":"subset(x, ...) # S4 method for SparkDataFrame,numericOrcharacter [[(x, i) # S4 method for SparkDataFrame,numericOrcharacter [[(x, i) <- value # S4 method for SparkDataFrame [(x, i, j, ..., drop = F) # S4 method for SparkDataFrame subset(x, subset, select, drop = F, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/subset.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Subset — subset","text":"x SparkDataFrame. ... currently used. , subset (Optional) logical expression filter rows. extract operator [[ replacement operator [[<-, indexing parameter single Column. value Column atomic vector length 1 literal value, NULL. NULL, specified Column dropped. j, select expression single Column list columns select SparkDataFrame. drop TRUE, Column returned resulting dataset one column. 
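Since structType is intended for use with createDataFrame and toDF, a hedged sketch of that pairing (the local 'people' data frame is an assumption for illustration):
# build an explicit schema and apply it when creating a SparkDataFrame
schema <- structType(structField("name", "string"), structField("age", "integer"))
people <- data.frame(name = c("Alice", "Bob"), age = c(30L, 25L))
df <- createDataFrame(people, schema)
printSchema(df)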
Otherwise, SparkDataFrame always returned.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/subset.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Subset — subset","text":"new SparkDataFrame containing rows meet condition selected columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/subset.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Subset — subset","text":"[[ since 1.4.0 [[<- since 2.1.1 [ since 1.4.0 subset since 1.5.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/subset.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Subset — subset","text":"","code":"if (FALSE) { # Columns can be selected using [[ and [ df[[2]] == df[[\"age\"]] df[,2] == df[,\"age\"] df[,c(\"name\", \"age\")] # Or to filter rows df[df$age > 20,] # SparkDataFrame can be subset on both rows and Columns df[df$name == \"Smith\", c(1,2)] df[df$age %in% c(19, 30), 1:2] subset(df, df$age %in% c(19, 30), 1:2) subset(df, df$age %in% c(19), select = c(1,2)) subset(df, select = c(1,2)) # Columns can be selected and set df[[\"age\"]] <- 23 df[[1]] <- df$age df[[2]] <- NULL # drop column }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/substr.html","id":null,"dir":"Reference","previous_headings":"","what":"substr — substr","title":"substr — substr","text":"expression returns substring.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/substr.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"substr — substr","text":"","code":"# S4 method for Column substr(x, start, stop)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/substr.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"substr — substr","text":"x Column. start starting position. 1-base. stop ending position.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/substr.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"substr — substr","text":"substr since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/substr.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"substr — substr","text":"","code":"if (FALSE) { df <- createDataFrame(list(list(a=\"abcdef\"))) collect(select(df, substr(df$a, 1, 4))) # the result is `abcd`. collect(select(df, substr(df$a, 2, 4))) # the result is `bcd`. }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summarize.html","id":null,"dir":"Reference","previous_headings":"","what":"summarize — agg","title":"summarize — agg","text":"Aggregates entire SparkDataFrame without groups. resulting SparkDataFrame also contain grouping columns. Compute aggregates specifying list columns","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summarize.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"summarize — agg","text":"","code":"agg(x, ...) summarize(x, ...) # S4 method for GroupedData agg(x, ...) # S4 method for GroupedData summarize(x, ...) # S4 method for SparkDataFrame agg(x, ...) 
# S4 method for SparkDataFrame summarize(x, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summarize.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"summarize — agg","text":"x SparkDataFrame GroupedData. ... arguments passed methods.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summarize.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"summarize — agg","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summarize.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"summarize — agg","text":"df2 <- agg(df, = ) df2 <- agg(df, newColName = aggFunction(column))","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summarize.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"summarize — agg","text":"agg since 1.4.0 summarize since 1.4.0 agg since 1.4.0 summarize since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summarize.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"summarize — agg","text":"","code":"if (FALSE) { df2 <- agg(df, age = \"sum\") # new column name will be created as 'SUM(age#0)' df3 <- agg(df, ageSum = sum(df$age)) # Creates a new column named ageSum df4 <- summarize(df, ageSum = max(df$age)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summary.html","id":null,"dir":"Reference","previous_headings":"","what":"summary — summary","title":"summary — summary","text":"Computes specified statistics numeric string columns. Available statistics : count mean stddev min max arbitrary approximate percentiles specified percentage (e.g., \"75%\") statistics given, function computes count, mean, stddev, min, approximate quartiles (percentiles 25%, 50%, 75%), max. function meant exploratory data analysis, make guarantee backward compatibility schema resulting Dataset. want programmatically compute summary statistics, use agg function instead.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summary.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"summary — summary","text":"","code":"summary(object, ...) # S4 method for SparkDataFrame summary(object, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summary.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"summary — summary","text":"object SparkDataFrame summarized. ... 
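A hedged sketch of the two agg/summarize calling styles on grouped and ungrouped data; irisDF is built here purely for illustration:
irisDF <- createDataFrame(iris)   # column names become Sepal_Length, ..., Species
# named-expression style: each argument name becomes an output column
head(agg(groupBy(irisDF, irisDF$Species),
         avgLen = mean(irisDF$Sepal_Length),
         n = n(irisDF$Species)))
# column = "function" style on the whole frame, as in agg(df, age = "sum")
head(agg(irisDF, Sepal_Length = "sum"))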
(optional) statistics computed columns.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summary.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"summary — summary","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summary.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"summary — summary","text":"summary(SparkDataFrame) since 1.5.0 statistics provided summary change 2.3.0 use describe previous defaults.","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/summary.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"summary — summary","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) summary(df) summary(df, \"min\", \"25%\", \"75%\", \"max\") summary(select(df, \"age\", \"height\")) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableNames.html","id":null,"dir":"Reference","previous_headings":"","what":"Table Names — tableNames","title":"Table Names — tableNames","text":"Returns names tables given database array.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableNames.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Table Names — tableNames","text":"","code":"tableNames(databaseName = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableNames.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Table Names — tableNames","text":"databaseName (optional) name database","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableNames.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Table Names — tableNames","text":"list table names","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableNames.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Table Names — tableNames","text":"tableNames since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableNames.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Table Names — tableNames","text":"","code":"if (FALSE) { sparkR.session() tableNames(\"hive\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableToDF.html","id":null,"dir":"Reference","previous_headings":"","what":"Create a SparkDataFrame from a SparkSQL table or view — tableToDF","title":"Create a SparkDataFrame from a SparkSQL table or view — tableToDF","text":"Returns specified table view SparkDataFrame. table view must already exist already registered SparkSession.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableToDF.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create a SparkDataFrame from a SparkSQL table or view — tableToDF","text":"","code":"tableToDF(tableName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableToDF.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create a SparkDataFrame from a SparkSQL table or view — tableToDF","text":"tableName qualified unqualified name designates table view. database specified, identifies table/view database. 
Otherwise, first attempts find temporary view given name match table/view current database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableToDF.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create a SparkDataFrame from a SparkSQL table or view — tableToDF","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableToDF.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Create a SparkDataFrame from a SparkSQL table or view — tableToDF","text":"tableToDF since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tableToDF.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create a SparkDataFrame from a SparkSQL table or view — tableToDF","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) createOrReplaceTempView(df, \"table\") new_df <- tableToDF(\"table\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tables.html","id":null,"dir":"Reference","previous_headings":"","what":"Tables — tables","title":"Tables — tables","text":"Returns SparkDataFrame containing names tables given database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tables.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Tables — tables","text":"","code":"tables(databaseName = NULL)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tables.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Tables — tables","text":"databaseName (optional) name database","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tables.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Tables — tables","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tables.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Tables — tables","text":"tables since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/tables.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Tables — tables","text":"","code":"if (FALSE) { sparkR.session() tables(\"hive\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/take.html","id":null,"dir":"Reference","previous_headings":"","what":"Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame — take","title":"Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame — take","text":"Take first NUM rows SparkDataFrame return results R data.frame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/take.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame — take","text":"","code":"take(x, num) # S4 method for SparkDataFrame,numeric take(x, num)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/take.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame — take","text":"x SparkDataFrame. 
num number rows take.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/take.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame — take","text":"take since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/take.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Take the first NUM rows of a SparkDataFrame and return the results as a R data.frame — take","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) take(df, 2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/toJSON.html","id":null,"dir":"Reference","previous_headings":"","what":"toJSON — toJSON","title":"toJSON — toJSON","text":"Converts SparkDataFrame SparkDataFrame JSON string. row turned JSON document columns different fields. returned SparkDataFrame single character column name value","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/toJSON.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"toJSON — toJSON","text":"","code":"# S4 method for SparkDataFrame toJSON(x)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/toJSON.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"toJSON — toJSON","text":"x SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/toJSON.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"toJSON — toJSON","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/toJSON.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"toJSON — toJSON","text":"toJSON since 2.2.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/toJSON.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"toJSON — toJSON","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.parquet\" df <- read.parquet(path) df_json <- toJSON(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/uncacheTable.html","id":null,"dir":"Reference","previous_headings":"","what":"Uncache Table — uncacheTable","title":"Uncache Table — uncacheTable","text":"Removes specified table -memory cache.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/uncacheTable.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Uncache Table — uncacheTable","text":"","code":"uncacheTable(tableName)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/uncacheTable.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Uncache Table — uncacheTable","text":"tableName qualified unqualified name designates table. 
database identifier provided, refers table current database.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/uncacheTable.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Uncache Table — uncacheTable","text":"SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/uncacheTable.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Uncache Table — uncacheTable","text":"uncacheTable since 1.4.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/uncacheTable.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Uncache Table — uncacheTable","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) createOrReplaceTempView(df, \"table\") uncacheTable(\"table\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/union.html","id":null,"dir":"Reference","previous_headings":"","what":"Return a new SparkDataFrame containing the union of rows — union","title":"Return a new SparkDataFrame containing the union of rows — union","text":"Return new SparkDataFrame containing union rows SparkDataFrame another SparkDataFrame. equivalent UNION SQL. Input SparkDataFrames can different schemas (names data types).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/union.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return a new SparkDataFrame containing the union of rows — union","text":"","code":"union(x, y) # S4 method for SparkDataFrame,SparkDataFrame union(x, y)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/union.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return a new SparkDataFrame containing the union of rows — union","text":"x SparkDataFrame y SparkDataFrame","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/union.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Return a new SparkDataFrame containing the union of rows — union","text":"SparkDataFrame containing result union.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/union.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Return a new SparkDataFrame containing the union of rows — union","text":"Note: remove duplicate rows across two SparkDataFrames. Also standard SQL, function resolves columns position (name).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/union.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Return a new SparkDataFrame containing the union of rows — union","text":"union since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/union.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Return a new SparkDataFrame containing the union of rows — union","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) unioned <- union(df, df2) unions <- rbind(df, df2, df3, df4) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionAll.html","id":null,"dir":"Reference","previous_headings":"","what":"Return a new SparkDataFrame containing the union of rows. — unionAll","title":"Return a new SparkDataFrame containing the union of rows. 
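Because union resolves columns by position while unionByName resolves them by name, a hedged sketch contrasting the two (df1 and df2 are illustrative assumptions):
df1 <- createDataFrame(data.frame(a = c(1, 2), b = c(10, 20)))
df2 <- createDataFrame(data.frame(b = c(30), a = c(3)))
head(union(df1, df2))        # matched by position: df2$b feeds column 'a'
head(unionByName(df1, df2))  # matched by name, regardless of column order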
— unionAll","text":"alias union.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionAll.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return a new SparkDataFrame containing the union of rows. — unionAll","text":"","code":"unionAll(x, y) # S4 method for SparkDataFrame,SparkDataFrame unionAll(x, y)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionAll.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return a new SparkDataFrame containing the union of rows. — unionAll","text":"x SparkDataFrame. y SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionAll.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Return a new SparkDataFrame containing the union of rows. — unionAll","text":"SparkDataFrame containing result unionAll operation.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionAll.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Return a new SparkDataFrame containing the union of rows. — unionAll","text":"unionAll since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionAll.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Return a new SparkDataFrame containing the union of rows. — unionAll","text":"","code":"if (FALSE) { sparkR.session() df1 <- read.json(path) df2 <- read.json(path2) unionAllDF <- unionAll(df1, df2) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionByName.html","id":null,"dir":"Reference","previous_headings":"","what":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","title":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","text":"Return new SparkDataFrame containing union rows SparkDataFrame another SparkDataFrame. different union function, UNION UNION DISTINCT SQL column positions taken account. Input SparkDataFrames can different data types schema.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionByName.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","text":"","code":"unionByName(x, y, ...) # S4 method for SparkDataFrame,SparkDataFrame unionByName(x, y, allowMissingColumns = FALSE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionByName.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","text":"x SparkDataFrame y SparkDataFrame ... arguments passed methods. 
allowMissingColumns logical","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionByName.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","text":"SparkDataFrame containing result union.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionByName.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","text":"parameter allowMissingColumns `TRUE`, set column names x y can differ; missing columns filled null. , missing columns x added end schema union result. Note: remove duplicate rows across two SparkDataFrames. function resolves columns name (position).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionByName.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","text":"unionByName since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unionByName.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Return a new SparkDataFrame containing the union of rows, matched by column names — unionByName","text":"","code":"if (FALSE) { sparkR.session() df1 <- select(createDataFrame(mtcars), \"carb\", \"am\", \"gear\") df2 <- select(createDataFrame(mtcars), \"am\", \"gear\", \"carb\") head(unionByName(df1, df2)) df3 <- select(createDataFrame(mtcars), \"carb\") head(unionByName(df1, df3, allowMissingColumns = TRUE)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unpersist.html","id":null,"dir":"Reference","previous_headings":"","what":"Unpersist — unpersist","title":"Unpersist — unpersist","text":"Mark SparkDataFrame non-persistent, remove blocks memory disk.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unpersist.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Unpersist — unpersist","text":"","code":"unpersist(x, ...) # S4 method for SparkDataFrame unpersist(x, blocking = TRUE)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unpersist.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Unpersist — unpersist","text":"x SparkDataFrame unpersist. ... arguments passed methods. blocking whether block blocks deleted.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unpersist.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Unpersist — unpersist","text":"unpersist since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unpersist.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Unpersist — unpersist","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) persist(df, \"MEMORY_AND_DISK\") unpersist(df) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unresolved_named_lambda_var.html","id":null,"dir":"Reference","previous_headings":"","what":"Create o.a.s.sql.expressions.UnresolvedNamedLambdaVariable,\nconvert it to o.s.sql.Column and wrap with R Column.\nUsed by higher order functions. 
— unresolved_named_lambda_var","title":"Create o.a.s.sql.expressions.UnresolvedNamedLambdaVariable,\nconvert it to o.s.sql.Column and wrap with R Column.\nUsed by higher order functions. — unresolved_named_lambda_var","text":"Create o..s.sql.expressions.UnresolvedNamedLambdaVariable, convert o.s.sql.Column wrap R Column. Used higher order functions.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unresolved_named_lambda_var.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create o.a.s.sql.expressions.UnresolvedNamedLambdaVariable,\nconvert it to o.s.sql.Column and wrap with R Column.\nUsed by higher order functions. — unresolved_named_lambda_var","text":"","code":"unresolved_named_lambda_var(...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unresolved_named_lambda_var.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create o.a.s.sql.expressions.UnresolvedNamedLambdaVariable,\nconvert it to o.s.sql.Column and wrap with R Column.\nUsed by higher order functions. — unresolved_named_lambda_var","text":"... character length = 1 length(...) > 1 argument interpreted nested Column, example unresolved_named_lambda_var(\"\", \"b\", \"c\") yields unresolved .b.c","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/unresolved_named_lambda_var.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create o.a.s.sql.expressions.UnresolvedNamedLambdaVariable,\nconvert it to o.s.sql.Column and wrap with R Column.\nUsed by higher order functions. — unresolved_named_lambda_var","text":"Column object wrapping JVM UnresolvedNamedLambdaVariable","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowOrderBy.html","id":null,"dir":"Reference","previous_headings":"","what":"windowOrderBy — windowOrderBy","title":"windowOrderBy — windowOrderBy","text":"Creates WindowSpec ordering defined.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowOrderBy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"windowOrderBy — windowOrderBy","text":"","code":"windowOrderBy(col, ...) # S4 method for character windowOrderBy(col, ...) # S4 method for Column windowOrderBy(col, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowOrderBy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"windowOrderBy — windowOrderBy","text":"col column name Column rows ordered within windows. ... 
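unresolved_named_lambda_var is internal; users normally reach it only indirectly through higher-order functions that accept an R function. A hedged sketch under that assumption (the single-row data frame is illustrative):
df <- createDataFrame(data.frame(id = 1))
df <- withColumn(df, "xs", create_array(lit(1), lit(2), lit(3)))
# the R function is translated into a Spark lambda; its argument arrives as a
# Column wrapping an UnresolvedNamedLambdaVariable
head(select(df, array_transform(df$xs, function(x) x * 10)))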
Optional column names Columns addition col, rows ordered within windows.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowOrderBy.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"windowOrderBy — windowOrderBy","text":"windowOrderBy(character) since 2.0.0 windowOrderBy(Column) since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowOrderBy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"windowOrderBy — windowOrderBy","text":"","code":"if (FALSE) { ws <- windowOrderBy(\"key1\", \"key2\") df1 <- select(df, over(lead(\"value\", 1), ws)) ws <- windowOrderBy(df$key1, df$key2) df1 <- select(df, over(lead(\"value\", 1), ws)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowPartitionBy.html","id":null,"dir":"Reference","previous_headings":"","what":"windowPartitionBy — windowPartitionBy","title":"windowPartitionBy — windowPartitionBy","text":"Creates WindowSpec partitioning defined.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowPartitionBy.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"windowPartitionBy — windowPartitionBy","text":"","code":"windowPartitionBy(col, ...) # S4 method for character windowPartitionBy(col, ...) # S4 method for Column windowPartitionBy(col, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowPartitionBy.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"windowPartitionBy — windowPartitionBy","text":"col column name Column rows partitioned windows. ... Optional column names Columns addition col, rows partitioned windows.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowPartitionBy.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"windowPartitionBy — windowPartitionBy","text":"windowPartitionBy(character) since 2.0.0 windowPartitionBy(Column) since 2.0.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/windowPartitionBy.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"windowPartitionBy — windowPartitionBy","text":"","code":"if (FALSE) { ws <- orderBy(windowPartitionBy(\"key1\", \"key2\"), \"key3\") df1 <- select(df, over(lead(\"value\", 1), ws)) ws <- orderBy(windowPartitionBy(df$key1, df$key2), df$key3) df1 <- select(df, over(lead(\"value\", 1), ws)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/with.html","id":null,"dir":"Reference","previous_headings":"","what":"Evaluate a R expression in an environment constructed from a SparkDataFrame — with","title":"Evaluate a R expression in an environment constructed from a SparkDataFrame — with","text":"Evaluate R expression environment constructed SparkDataFrame () allows access columns SparkDataFrame simply referring name. appends every column SparkDataFrame new environment. , given expression evaluated new environment.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/with.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Evaluate a R expression in an environment constructed from a SparkDataFrame — with","text":"","code":"with(data, expr, ...) 
# S4 method for SparkDataFrame with(data, expr, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/with.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Evaluate a R expression in an environment constructed from a SparkDataFrame — with","text":"data (SparkDataFrame) SparkDataFrame use constructing environment. expr (expression) Expression evaluate. ... arguments passed future methods.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/with.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Evaluate a R expression in an environment constructed from a SparkDataFrame — with","text":"since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/with.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Evaluate a R expression in an environment constructed from a SparkDataFrame — with","text":"","code":"if (FALSE) { with(irisDf, nrow(Sepal_Width)) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withColumn.html","id":null,"dir":"Reference","previous_headings":"","what":"WithColumn — withColumn","title":"WithColumn — withColumn","text":"Return new SparkDataFrame adding column replacing existing column name.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withColumn.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"WithColumn — withColumn","text":"","code":"withColumn(x, colName, col) # S4 method for SparkDataFrame,character withColumn(x, colName, col)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withColumn.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"WithColumn — withColumn","text":"x SparkDataFrame. colName column name. col Column expression (must refer SparkDataFrame), atomic vector length 1 literal value.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withColumn.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"WithColumn — withColumn","text":"SparkDataFrame new column added existing column replaced.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withColumn.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"WithColumn — withColumn","text":"Note: method introduces projection internally. Therefore, calling multiple times, instance, via loops order add multiple columns can generate big plans can cause performance issues even StackOverflowException. 
avoid , use select multiple columns .","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withColumn.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"WithColumn — withColumn","text":"withColumn since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withColumn.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"WithColumn — withColumn","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) newDF <- withColumn(df, \"newCol\", df$col1 * 5) # Replace an existing column newDF2 <- withColumn(newDF, \"newCol\", newDF$col1) newDF3 <- withColumn(newDF, \"newCol\", 42) # Use extract operator to set an existing or new column df[[\"age\"]] <- 23 df[[2]] <- df$col1 df[[2]] <- NULL # drop column }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withField.html","id":null,"dir":"Reference","previous_headings":"","what":"withField — withField","title":"withField — withField","text":"Adds/replaces field struct Column name.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withField.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"withField — withField","text":"","code":"withField(x, fieldName, col) # S4 method for Column,character,Column withField(x, fieldName, col)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withField.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"withField — withField","text":"x Column fieldName character col Column expression","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withField.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"withField — withField","text":"withField since 3.1.0","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withField.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"withField — withField","text":"","code":"if (FALSE) { df <- withColumn( createDataFrame(iris), \"sepal\", struct(column(\"Sepal_Width\"), column(\"Sepal_Length\")) ) head(select( df, withField(df$sepal, \"product\", df$Sepal_Length * df$Sepal_Width) )) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withWatermark.html","id":null,"dir":"Reference","previous_headings":"","what":"withWatermark — withWatermark","title":"withWatermark — withWatermark","text":"Defines event time watermark streaming SparkDataFrame. watermark tracks point time assume late data going arrive.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withWatermark.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"withWatermark — withWatermark","text":"","code":"withWatermark(x, eventTime, delayThreshold) # S4 method for SparkDataFrame,character,character withWatermark(x, eventTime, delayThreshold)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withWatermark.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"withWatermark — withWatermark","text":"x streaming SparkDataFrame eventTime string specifying name Column contains event time row. delayThreshold string specifying minimum delay wait data arrive late, relative latest record processed form interval (e.g. \"1 minute\" \"5 hours\"). 
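Following the withColumn Details advice to prefer a single select over a chain of withColumn calls when adding many columns, a hedged sketch (df, col1 and col2 are assumptions):
# one projection instead of repeated withColumn() calls in a loop
newDF <- select(df,
                df$col1,
                df$col2,
                alias(df$col1 * 5, "col1Times5"),
                alias(df$col1 + df$col2, "colSum"))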
NOTE: negative.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withWatermark.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"withWatermark — withWatermark","text":"SparkDataFrame.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withWatermark.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"withWatermark — withWatermark","text":"Spark use watermark several purposes: know given time window aggregation can finalized thus can emitted using output modes allow updates. minimize amount state need keep -going aggregations. current watermark computed looking MAX(eventTime) seen across partitions query minus user specified delayThreshold. Due cost coordinating value across partitions, actual watermark used guaranteed least delayThreshold behind actual event time. cases may still process records arrive delayThreshold late.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withWatermark.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"withWatermark — withWatermark","text":"withWatermark since 2.3.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/withWatermark.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"withWatermark — withWatermark","text":"","code":"if (FALSE) { sparkR.session() schema <- structType(structField(\"time\", \"timestamp\"), structField(\"value\", \"double\")) df <- read.stream(\"json\", path = jsonDir, schema = schema, maxFilesPerTrigger = 1) df <- withWatermark(df, \"time\", \"10 minutes\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.df.html","id":null,"dir":"Reference","previous_headings":"","what":"Save the contents of SparkDataFrame to a data source. — write.df","title":"Save the contents of SparkDataFrame to a data source. — write.df","text":"data source specified source set options (...). source specified, default data source configured spark.sql.sources.default used.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.df.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Save the contents of SparkDataFrame to a data source. — write.df","text":"","code":"write.df(df, path = NULL, ...) saveDF(df, path, source = NULL, mode = \"error\", ...) write.df(df, path = NULL, ...) # S4 method for SparkDataFrame write.df( df, path = NULL, source = NULL, mode = \"error\", partitionBy = NULL, ... ) # S4 method for SparkDataFrame,character saveDF(df, path, source = NULL, mode = \"error\", ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.df.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Save the contents of SparkDataFrame to a data source. — write.df","text":"df SparkDataFrame. path name table. ... additional argument(s) passed method. source name external data source. mode one 'append', 'overwrite', 'error', 'errorifexists', 'ignore' save mode ('error' default) partitionBy name list names columns partition output file system. specified, output laid file system similar Hive's partitioning scheme.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.df.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Save the contents of SparkDataFrame to a data source. 
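To connect the watermark description with an output mode, a hedged streaming sketch (jsonDir, the schema and the console sink are illustrative assumptions):
schema <- structType(structField("time", "timestamp"), structField("value", "double"))
sdf <- read.stream("json", path = jsonDir, schema = schema)
sdf <- withWatermark(sdf, "time", "10 minutes")
# a windowed count can be finalized and emitted once the watermark passes the window
counts <- count(groupBy(sdf, window(sdf$time, "5 minutes")))
q <- write.stream(counts, "console", outputMode = "append")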
— write.df","text":"Additionally, mode used specify behavior save operation data already exists data source. four modes: 'append': Contents SparkDataFrame expected appended existing data. 'overwrite': Existing data expected overwritten contents SparkDataFrame. 'error' 'errorifexists': exception expected thrown. 'ignore': save operation expected save contents SparkDataFrame change existing data.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.df.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Save the contents of SparkDataFrame to a data source. — write.df","text":"write.df since 1.4.0 saveDF since 1.4.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.df.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Save the contents of SparkDataFrame to a data source. — write.df","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) write.df(df, \"myfile\", \"parquet\", \"overwrite\", partitionBy = c(\"col1\", \"col2\")) saveDF(df, parquetPath2, \"parquet\", mode = \"append\", mergeSchema = TRUE) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.jdbc.html","id":null,"dir":"Reference","previous_headings":"","what":"Save the content of SparkDataFrame to an external database table via JDBC. — write.jdbc","title":"Save the content of SparkDataFrame to an external database table via JDBC. — write.jdbc","text":"Save content SparkDataFrame external database table via JDBC. Additional JDBC database connection properties can set (...) can find JDBC-specific option parameter documentation writing tables via JDBC https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html#data-source-option Data Source Option version use.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.jdbc.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Save the content of SparkDataFrame to an external database table via JDBC. — write.jdbc","text":"","code":"write.jdbc(x, url, tableName, mode = \"error\", ...) # S4 method for SparkDataFrame,character,character write.jdbc(x, url, tableName, mode = \"error\", ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.jdbc.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Save the content of SparkDataFrame to an external database table via JDBC. — write.jdbc","text":"x SparkDataFrame. url JDBC database url form jdbc:subprotocol:subname. tableName name table external database. mode one 'append', 'overwrite', 'error', 'errorifexists', 'ignore' save mode ('error' default) ... additional JDBC database connection properties.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.jdbc.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Save the content of SparkDataFrame to an external database table via JDBC. — write.jdbc","text":"Also, mode used specify behavior save operation data already exists data source. four modes: 'append': Contents SparkDataFrame expected appended existing data. 'overwrite': Existing data expected overwritten contents SparkDataFrame. 'error' 'errorifexists': exception expected thrown. 
'ignore': save operation expected save contents SparkDataFrame change existing data.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.jdbc.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Save the content of SparkDataFrame to an external database table via JDBC. — write.jdbc","text":"write.jdbc since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.jdbc.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Save the content of SparkDataFrame to an external database table via JDBC. — write.jdbc","text":"","code":"if (FALSE) { sparkR.session() jdbcUrl <- \"jdbc:mysql://localhost:3306/databasename\" write.jdbc(df, jdbcUrl, \"table\", user = \"username\", password = \"password\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.json.html","id":null,"dir":"Reference","previous_headings":"","what":"Save the contents of SparkDataFrame as a JSON file — write.json","title":"Save the contents of SparkDataFrame as a JSON file — write.json","text":"Save contents SparkDataFrame JSON file ( JSON Lines text format newline-delimited JSON). Files written method can read back SparkDataFrame using read.json().","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.json.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Save the contents of SparkDataFrame as a JSON file — write.json","text":"","code":"write.json(x, path, ...) # S4 method for SparkDataFrame,character write.json(x, path, mode = \"error\", ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.json.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Save the contents of SparkDataFrame as a JSON file — write.json","text":"x SparkDataFrame path directory file saved ... additional argument(s) passed method. can find JSON-specific options writing JSON files https://spark.apache.org/docs/latest/sql-data-sources-json.html#data-source-optionData Source Option version use. mode one 'append', 'overwrite', 'error', 'errorifexists', 'ignore' save mode ('error' default)","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.json.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Save the contents of SparkDataFrame as a JSON file — write.json","text":"write.json since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.json.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Save the contents of SparkDataFrame as a JSON file — write.json","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) write.json(df, \"/tmp/sparkr-tmp/\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.ml.html","id":null,"dir":"Reference","previous_headings":"","what":"Saves the MLlib model to the input path — write.ml","title":"Saves the MLlib model to the input path — write.ml","text":"Saves MLlib model input path. 
{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.ml.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Saves the MLlib model to the input path — write.ml","text":"","code":"write.ml(object, path, ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.ml.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Saves the MLlib model to the input path — write.ml","text":"object a fitted ML model object. path the directory where the model is saved. ... additional argument(s) passed to the method.","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.orc.html","id":null,"dir":"Reference","previous_headings":"","what":"Save the contents of SparkDataFrame as an ORC file, preserving the schema. — write.orc","title":"Save the contents of SparkDataFrame as an ORC file, preserving the schema. — write.orc","text":"Save the contents of a SparkDataFrame as an ORC file, preserving the schema. Files written out with this method can be read back in as a SparkDataFrame using read.orc().","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.orc.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Save the contents of SparkDataFrame as an ORC file, preserving the schema. — write.orc","text":"","code":"write.orc(x, path, ...) # S4 method for SparkDataFrame,character write.orc(x, path, mode = \"error\", ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.orc.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Save the contents of SparkDataFrame as an ORC file, preserving the schema. — write.orc","text":"x a SparkDataFrame. path the directory where the file is saved. ... additional argument(s) passed to the method. You can find the ORC-specific options for writing ORC files in Data Source Option (https://spark.apache.org/docs/latest/sql-data-sources-orc.html#data-source-option) in the version you use. mode one of 'append', 'overwrite', 'error', 'errorifexists', 'ignore'; the save mode (it is 'error' by default).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.orc.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Save the contents of SparkDataFrame as an ORC file, preserving the schema. — write.orc","text":"write.orc since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.orc.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Save the contents of SparkDataFrame as an ORC file, preserving the schema. — write.orc","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) write.orc(df, \"/tmp/sparkr-tmp1/\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.parquet.html","id":null,"dir":"Reference","previous_headings":"","what":"Save the contents of SparkDataFrame as a Parquet file, preserving the schema. — write.parquet","title":"Save the contents of SparkDataFrame as a Parquet file, preserving the schema. — write.parquet","text":"Save the contents of a SparkDataFrame as a Parquet file, preserving the schema. Files written out with this method can be read back in as a SparkDataFrame using read.parquet().","code":""},
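{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.parquet.html","id":"ref-examples-sketch","dir":"Reference","previous_headings":"","what":"Example (sketch)","title":"Save the contents of SparkDataFrame as a Parquet file, preserving the schema. — write.parquet","text":"Not part of the upstream reference: a minimal sketch of the write/read round trip described above, assuming a local Spark session; the /tmp output directory is illustrative.","code":"if (FALSE) {
  sparkR.session()
  df <- createDataFrame(mtcars)
  parquetPath <- \"/tmp/sparkr-parquet-mtcars/\"
  write.parquet(df, parquetPath)
  # Reading the files back yields a SparkDataFrame with the same schema.
  df2 <- read.parquet(parquetPath)
  printSchema(df2)
}"},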
{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.parquet.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Save the contents of SparkDataFrame as a Parquet file, preserving the schema. — write.parquet","text":"","code":"write.parquet(x, path, ...) # S4 method for SparkDataFrame,character write.parquet(x, path, mode = \"error\", ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.parquet.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Save the contents of SparkDataFrame as a Parquet file, preserving the schema. — write.parquet","text":"x a SparkDataFrame. path the directory where the file is saved. ... additional argument(s) passed to the method. You can find the Parquet-specific options for writing Parquet files in Data Source Option (https://spark.apache.org/docs/latest/sql-data-sources-parquet.html#data-source-option) in the version you use. mode one of 'append', 'overwrite', 'error', 'errorifexists', 'ignore'; the save mode (it is 'error' by default).","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.parquet.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Save the contents of SparkDataFrame as a Parquet file, preserving the schema. — write.parquet","text":"write.parquet since 1.6.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.parquet.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Save the contents of SparkDataFrame as a Parquet file, preserving the schema. — write.parquet","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.json\" df <- read.json(path) write.parquet(df, \"/tmp/sparkr-tmp1/\") }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.stream.html","id":null,"dir":"Reference","previous_headings":"","what":"Write the streaming SparkDataFrame to a data source. — write.stream","title":"Write the streaming SparkDataFrame to a data source. — write.stream","text":"The data source is specified by the source and a set of options (...). If source is not specified, the default data source configured by spark.sql.sources.default will be used.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.stream.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Write the streaming SparkDataFrame to a data source. — write.stream","text":"","code":"write.stream(df, source = NULL, outputMode = NULL, ...) # S4 method for SparkDataFrame write.stream( df, source = NULL, outputMode = NULL, partitionBy = NULL, trigger.processingTime = NULL, trigger.once = NULL, ... )"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.stream.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Write the streaming SparkDataFrame to a data source. — write.stream","text":"df a streaming SparkDataFrame. source a name for the external data source. outputMode one of 'append', 'complete', 'update'. ... additional external data source specific named options. partitionBy a name or a list of names of columns to partition the output by on the file system. If specified, the output is laid out on the file system similar to Hive's partitioning scheme. trigger.processingTime a processing time interval as a string, e.g. '5 seconds', '1 minute'. This is a trigger that runs the query periodically based on the processing time. If the value is '0 seconds', the query will run as fast as possible; this is the default. Only one trigger can be set. trigger.once a logical, must be set to TRUE. This is a trigger that processes only one batch of data in a streaming query and then terminates the query. Only one trigger can be set.","code":""},
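{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.stream.html","id":"arguments-sketch","dir":"Reference","previous_headings":"","what":"Example (sketch)","title":"Write the streaming SparkDataFrame to a data source. — write.stream","text":"Not part of the upstream reference: a minimal sketch of the trigger.once argument, which the Examples section below does not cover. It assumes a socket source on localhost:9999 as in the upstream example; the query name once_out is illustrative.","code":"if (FALSE) {
  sparkR.session()
  df <- read.stream(\"socket\", host = \"localhost\", port = 9999)
  wordCounts <- count(group_by(df, \"value\"))
  # Process a single batch and then stop, instead of running continuously.
  q <- write.stream(wordCounts, \"memory\", queryName = \"once_out\",
                    outputMode = \"complete\", trigger.once = TRUE)
  awaitTermination(q)
  head(sql(\"SELECT * FROM once_out\"))
}"},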
{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.stream.html","id":"details","dir":"Reference","previous_headings":"","what":"Details","title":"Write the streaming SparkDataFrame to a data source. — write.stream","text":"Additionally, outputMode specifies how data of a streaming SparkDataFrame is written to an output data source. There are three modes: append: only the new rows in the streaming SparkDataFrame will be written out. This output mode can only be used in queries that do not contain any aggregation. complete: all the rows in the streaming SparkDataFrame will be written out every time there are some updates. This output mode can only be used in queries that contain aggregations. update: only the rows that were updated in the streaming SparkDataFrame will be written out every time there are some updates. If the query doesn't contain aggregations, it will be equivalent to append mode.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.stream.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Write the streaming SparkDataFrame to a data source. — write.stream","text":"write.stream since 2.2.0 experimental","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.stream.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Write the streaming SparkDataFrame to a data source. — write.stream","text":"","code":"if (FALSE) { sparkR.session() df <- read.stream(\"socket\", host = \"localhost\", port = 9999) isStreaming(df) wordCounts <- count(group_by(df, \"value\")) # console q <- write.stream(wordCounts, \"console\", outputMode = \"complete\") # text stream q <- write.stream(df, \"text\", path = \"/home/user/out\", checkpointLocation = \"/home/user/cp\", partitionBy = c(\"year\", \"month\"), trigger.processingTime = \"30 seconds\") # memory stream q <- write.stream(wordCounts, \"memory\", queryName = \"outs\", outputMode = \"complete\") head(sql(\"SELECT * from outs\")) queryName(q) stopQuery(q) }"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.text.html","id":null,"dir":"Reference","previous_headings":"","what":"Save the content of SparkDataFrame in a text file at the specified path. — write.text","title":"Save the content of SparkDataFrame in a text file at the specified path. — write.text","text":"Save the content of the SparkDataFrame in a text file at the specified path. The SparkDataFrame must have only one column of string type with the name \"value\". Each row becomes a new line in the output file. The text files will be encoded as UTF-8.","code":""},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.text.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Save the content of SparkDataFrame in a text file at the specified path. — write.text","text":"","code":"write.text(x, path, ...) # S4 method for SparkDataFrame,character write.text(x, path, mode = \"error\", ...)"},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.text.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Save the content of SparkDataFrame in a text file at the specified path. — write.text","text":"x a SparkDataFrame. path the directory where the file is saved. ... additional argument(s) passed to the method. You can find the text-specific options for writing text files in Data Source Option (https://spark.apache.org/docs/latest/sql-data-sources-text.html#data-source-option) in the version you use. mode one of 'append', 'overwrite', 'error', 'errorifexists', 'ignore'; the save mode (it is 'error' by default).","code":""},
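{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.text.html","id":"arguments-sketch","dir":"Reference","previous_headings":"","what":"Example (sketch)","title":"Save the content of SparkDataFrame in a text file at the specified path. — write.text","text":"Not part of the upstream reference: a minimal sketch of preparing the single string column named value that write.text requires, reusing the mtcars-based data frame from the vignette; the /tmp output directory is illustrative.","code":"if (FALSE) {
  sparkR.session()
  carsDF <- createDataFrame(cbind(model = rownames(mtcars), mtcars))
  # write.text needs exactly one string column named \"value\".
  textDF <- selectExpr(carsDF, \"model AS value\")
  write.text(textDF, \"/tmp/sparkr-text/\")
}"},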
{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.text.html","id":"note","dir":"Reference","previous_headings":"","what":"Note","title":"Save the content of SparkDataFrame in a text file at the specified path. — write.text","text":"write.text since 2.0.0","code":""},{"path":[]},{"path":"https://spark.apache.org/docs/3.3.4/api/R/reference/write.text.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Save the content of SparkDataFrame in a text file at the specified path. — write.text","text":"","code":"if (FALSE) { sparkR.session() path <- \"path/to/file.txt\" df <- read.text(path) write.text(df, \"/tmp/sparkr-tmp/\") }"}]