public class

DataFrame

extends Object
implements Iterable<T>

java.lang.Object
↳	joinery.DataFrame<V>

Class Overview

A data frame implementation in the spirit of Pandas or R data frames.

Below is a simple motivating example. When working in Java, data operations like the following should be easy. The code below retrieves the S&P 500 daily market data for 2008 from Yahoo! Finance and returns the average monthly close for the top three months of the year.

 > DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("gspc.csv"))
 >     .retain("Date", "Close")
 >     .groupBy(row -> Date.class.cast(row.get(0)).getMonth())
 >     .mean()
 >     .sortBy("Close")
 >     .tail(3)
 >     .apply(value -> Number.class.cast(value).intValue())
 >     .col("Close");
 [1370, 1378, 1403]

Taking each step in turn:

readCsv(String) reads csv data from files and urls
retain(Object) is used to eliminate columns that are not needed
groupBy(KeyFunction) with a key function is used to group the rows by month
mean() calculates the average close for each month
sortBy(Object) orders the rows according to average closing price
tail(int) returns the last three rows (alternatively, sort in descending order and use head)
apply(Function) is used to convert the closing prices to integers (this is purely to ease comparisons when verifying the results)
finally, col(Object) is used to extract the values as a list
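
For readers who want to see what the chain above computes, here is a minimal plain-Java sketch of the same steps (group by month, mean, ascending sort, tail, integer conversion). It does not use joinery and is not joinery's implementation; the class `TopMonthsSketch`, the method `topMonthlyMeans`, and the `{month, close}` row layout are hypothetical names chosen for illustration, and the numbers in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of joinery.
class TopMonthsSketch {
    // Each row is {month, close}. Returns the n highest monthly means in ascending order, truncated to ints.
    static List<Integer> topMonthlyMeans(List<double[]> rows, int n) {
        // groupBy(month) + mean(): accumulate {sum, count} per month
        Map<Integer, double[]> totals = new TreeMap<>();
        for (double[] row : rows) {
            double[] t = totals.computeIfAbsent((int) row[0], k -> new double[2]);
            t[0] += row[1];
            t[1] += 1;
        }
        List<Double> means = new ArrayList<>();
        for (double[] t : totals.values()) {
            means.add(t[0] / t[1]);
        }
        // sortBy("Close"): ascending order
        Collections.sort(means);
        // tail(n) + apply(intValue): keep the last n entries as ints
        List<Integer> result = new ArrayList<>();
        for (double mean : means.subList(Math.max(0, means.size() - n), means.size())) {
            result.add((int) mean);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data, not real market data
        List<double[]> rows = Arrays.asList(
            new double[] {1, 10}, new double[] {1, 20},
            new double[] {2, 100}, new double[] {3, 50}, new double[] {4, 5});
        System.out.println(topMonthlyMeans(rows, 3)); // prints [15, 50, 100]
    }
}
```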

Find more details on the GitHub project page.

Summary

Nested Classes
interface	DataFrame.Aggregate<I, O>	A function that converts lists of `data frame` values to aggregate results.
enum	DataFrame.Axis	An enumeration of data frame axes.
interface	DataFrame.Function<I, O>	A function that is applied to objects (rows or values) in a `data frame`.
enum	DataFrame.JoinType	An enumeration of join types for joining data frames together.
interface	DataFrame.KeyFunction<I>	A function that converts `data frame` rows to index or group keys.
enum	DataFrame.NumberDefault
enum	DataFrame.PlotType	An enumeration of plot types for displaying data frames with charts.
interface	DataFrame.Predicate<I>	An interface used to filter a `data frame`.
interface	DataFrame.RowFunction<I, O>
enum	DataFrame.SortDirection

Public Constructors
	DataFrame() Construct an empty data frame.
	DataFrame(String... columns) Construct an empty data frame with the specified columns.
	DataFrame(Collection<?> columns) Construct an empty data frame with the specified columns.
	DataFrame(Collection<?> index, Collection<?> columns) Construct a data frame containing the specified rows and columns.
	DataFrame(List<? extends List<? extends V>> data) Construct a data frame from the specified list of columns.
	DataFrame(Collection<?> index, Collection<?> columns, List<? extends List<? extends V>> data) Construct a new data frame using the specified data and indices.

Public Methods
DataFrame<V>	add(Object column, List<V> values) Add a new column to the data frame containing the values provided.
DataFrame<V>	add(Object column, Function<List<V>, V> function) Add the results of applying a row-wise function to the data frame as a new column.
DataFrame<V>	add(List<V> values) Add the list of values as a new column.
DataFrame<V>	add(Object... columns) Add new columns to the data frame.
<U> DataFrame<V>	aggregate(Aggregate<V, U> function) Apply an aggregate function to each group or the entire data frame if the data is not grouped.
DataFrame<V>	append(Object name, List<? extends V> row) Append a row indexed by the specified name to the data frame.
DataFrame<V>	append(Object name, V[] row)
DataFrame<V>	append(List<? extends V> row) Append a row to the data frame.
<U> DataFrame<U>	apply(Function<V, U> function) Apply a function to each value in the data frame.
<T> DataFrame<T>	cast(Class<T> cls) Cast this data frame to the specified type.
final DataFrame<V>	coalesce(DataFrame<? extends V>... others) Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.
List<V>	col(Object column) Return a data frame column as a list.
List<V>	col(Integer column) Return a data frame column as a list.
DataFrame<V>	collapse()
Set<Object>	columns() Return the column names for the data frame.
final static <V> DataFrame<String>	compare(DataFrame<V> df1, DataFrame<V> df2)
final DataFrame<V>	concat(DataFrame<? extends V>... others) Concatenate the specified data frames with this data frame and return the result.
final DataFrame<V>	convert(Class<? extends V>... columnTypes) Convert columns based on the requested types.
DataFrame<V>	convert(DataFrame.NumberDefault numDefault, String naString)
DataFrame<V>	convert() Attempt to infer better types for object columns.
DataFrame<V>	count()
DataFrame<Number>	cov()
DataFrame<V>	cummax()
DataFrame<V>	cummin()
DataFrame<V>	cumprod()
DataFrame<V>	cumsum()
DataFrame<V>	describe()
DataFrame<V>	diff(int period)
DataFrame<V>	diff()
final void	draw(Container container, DataFrame.PlotType type) Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.
final void	draw(Container container) Draw the numeric columns of this data frame as a chart in the specified Container.
DataFrame<V>	drop(Integer... cols) Create a new data frame by leaving out the specified columns.
DataFrame<V>	drop(Object... cols) Create a new data frame by leaving out the specified columns.
DataFrame<V>	dropna()
DataFrame<V>	dropna(DataFrame.Axis direction)
Map<Object, DataFrame<V>>	explode() Return a map of group names to data frame for grouped data frames.
DataFrame<V>	fillna(V fill) Returns a view of the data frame with NA values replaced with `fill`.
List<V>	flatten() Return the values of the data frame as a flat list.
V	get(Integer row, Integer col) Return the value located by the (row, column) coordinates.
V	get(Object row, Object col) Return the value located by the (row, column) names.
DataFrame<V>	groupBy(Object... cols) Group the data frame rows by the specified column names.
DataFrame<V>	groupBy(Integer... cols) Group the data frame rows by the specified columns.
DataFrame<V>	groupBy(KeyFunction<V> function) Group the data frame rows using the specified key function.
Grouping	groups()
DataFrame<V>	head(int limit) Return a data frame containing the first `limit` rows of this data frame.
DataFrame<V>	head() Return a data frame containing the first ten rows of this data frame.
Set<Object>	index() Return the index names for the data frame.
boolean	isEmpty() Return `true` if the data frame contains no data.
DataFrame<Boolean>	isnull() Create a new data frame containing boolean values such that `null` object references in the original data frame yield `true` and valid references yield `false`.
ListIterator<List<V>>	iterator() Return an iterator over the rows of the data frame.
ListIterator<List<V>>	itercols()
ListIterator<Map<Object, V>>	itermap()
ListIterator<List<V>>	iterrows()
ListIterator<V>	itervalues()
final DataFrame<V>	join(DataFrame<V> other) Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.
final DataFrame<V>	join(DataFrame<V> other, KeyFunction<V> on) Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.
final DataFrame<V>	join(DataFrame<V> other, DataFrame.JoinType join) Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.
final DataFrame<V>	join(DataFrame<V> other, DataFrame.JoinType join, KeyFunction<V> on) Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.
final DataFrame<V>	joinOn(DataFrame<V> other, DataFrame.JoinType join, Integer... cols) Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
final DataFrame<V>	joinOn(DataFrame<V> other, Integer... cols) Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
final DataFrame<V>	joinOn(DataFrame<V> other, Object... cols) Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
final DataFrame<V>	joinOn(DataFrame<V> other, DataFrame.JoinType join, Object... cols) Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
DataFrame<V>	kurt()
int	length() Return the length (number of rows) of the data frame.
final static void	main(String[] args) Entry point to joinery as a command line tool.
Map<V, List<V>>	map(Object key, Object value)
Map<Object, List<V>>	map() Return a map of index names to rows.
Map<V, List<V>>	map(Integer key, Integer value)
DataFrame<V>	max()
DataFrame<V>	mean() Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V>	median()
final DataFrame<V>	merge(DataFrame<V> other, DataFrame.JoinType join) Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.
final DataFrame<V>	merge(DataFrame<V> other) Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.
DataFrame<V>	min()
DataFrame<V>	mode()
DataFrame<V>	nonnumeric() Return a data frame containing only columns with non-numeric data.
DataFrame<Boolean>	notnull() Create a new data frame containing boolean values such that valid object references in the original data frame yield `true` and `null` references yield `false`.
DataFrame<Number>	numeric() Return a data frame containing only columns with numeric data.
DataFrame<V>	percentChange()
DataFrame<V>	percentChange(int period)
DataFrame<V>	percentile(double quantile) Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V>	pivot(Object row, Object col, Object... values)
DataFrame<V>	pivot(Integer row, Integer col, Integer... values)
DataFrame<V>	pivot(Integer[] rows, Integer[] cols, Integer[] values)
<U> DataFrame<U>	pivot(KeyFunction<V> rows, KeyFunction<V> cols, Map<Integer, Aggregate<V, U>> values)
DataFrame<V>	pivot(List<Object> rows, List<Object> cols, List<Object> values)
final void	plot(DataFrame.PlotType type) Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.
final void	plot() Display the numeric columns of this data frame as a line chart in a new swing frame.
DataFrame<V>	prod() Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.
final static DataFrame<Object>	readCsv(String file) Read the specified csv file and return the data as a data frame.
final static DataFrame<Object>	readCsv(InputStream input, String separator, DataFrame.NumberDefault longDefault)
final static DataFrame<Object>	readCsv(InputStream input, String separator)
final static DataFrame<Object>	readCsv(String file, String separator, DataFrame.NumberDefault longDefault, String naString)
final static DataFrame<Object>	readCsv(InputStream input) Read csv records from an input stream and return the data as a data frame.
final static DataFrame<Object>	readCsv(String file, String separator, DataFrame.NumberDefault longDefault)
final static DataFrame<Object>	readCsv(String file, String separator, DataFrame.NumberDefault numberDefault, String naString, boolean hasHeader)
final static DataFrame<Object>	readCsv(InputStream input, String separator, String naString, boolean hasHeader)
final static DataFrame<Object>	readCsv(String file, String separator)
final static DataFrame<Object>	readCsv(String file, String separator, String naString, boolean hasHeader)
final static DataFrame<Object>	readCsv(InputStream input, String separator, String naString)
final static DataFrame<Object>	readSql(ResultSet rs) Read data from the provided query results into a new data frame.
final static DataFrame<Object>	readSql(Connection c, String sql) Execute the SQL query and return the results as a new data frame.
final static DataFrame<Object>	readXls(String file) Read data from the specified excel workbook into a new data frame.
final static DataFrame<Object>	readXls(InputStream input) Read data from the input stream as an excel workbook into a new data frame.
DataFrame<V>	reindex(Integer col, boolean drop) Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.
DataFrame<V>	reindex(Integer... cols) Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.
DataFrame<V>	reindex(Integer[] cols, boolean drop) Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.
DataFrame<V>	reindex(Object... cols) Re-index the rows of the data frame using the specified column names and removing the columns from the data.
DataFrame<V>	reindex(Object[] cols, boolean drop) Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.
DataFrame<V>	reindex(Object col, boolean drop) Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.
DataFrame<V>	rename(Map<Object, Object> names)
DataFrame<V>	rename(Object old, Object name)
DataFrame<V>	resetIndex() Return a new data frame with the default index; row names are reset to the string value of their integer index.
DataFrame<V>	reshape(Integer rows, Integer cols) Reshape a data frame to the specified dimensions.
DataFrame<V>	reshape(Collection<?> rows, Collection<?> cols) Reshape a data frame to the specified indices.
DataFrame<V>	retain(Integer... cols) Create a new data frame containing only the specified columns.
DataFrame<V>	retain(Object... cols) Create a new data frame containing only the specified columns.
DataFrame<V>	rollapply(Function<List<V>, V> function)
DataFrame<V>	rollapply(Function<List<V>, V> function, int period)
List<V>	row(Object row) Return a data frame row as a list.
List<V>	row(Integer row) Return a data frame row as a list.
DataFrame<V>	select(Predicate<V> predicate) Select a subset of the data frame using a predicate function.
void	set(Object row, Object col, V value) Set the value located by the names (row, column).
void	set(Integer row, Integer col, V value) Set the value located by the coordinates (row, column).
final void	show()
int	size() Return the size (number of columns) of the data frame.
DataFrame<V>	skew()
DataFrame<V>	slice(Integer rowStart, Integer rowEnd, Integer colStart, Integer colEnd)
DataFrame<V>	slice(Integer rowStart, Integer rowEnd)
DataFrame<V>	slice(Object rowStart, Object rowEnd)
DataFrame<V>	slice(Object rowStart, Object rowEnd, Object colStart, Object colEnd)
DataFrame<V>	sortBy(Comparator<List<V>> comparator)
DataFrame<V>	sortBy(Integer... cols)
DataFrame<V>	sortBy(Object... cols)
DataFrame<V>	stddev() Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V>	sum() Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V>	tail() Return a data frame containing the last ten rows of this data frame.
DataFrame<V>	tail(int limit) Return a data frame containing the last `limit` rows of this data frame.
Object[]	toArray() Copy the values contained in the data frame into a flat array of length `#size()` * `#length()`.
<U> U[][]	toArray(U[][] array)
<U> U	toArray(Class<U> cls) Copy the values contained in the data frame into an array of the specified type.
<U> U[]	toArray(U[] array) Copy the values contained in the data frame into the specified array.
double[][]	toModelMatrix(double fillValue) Encodes the DataFrame as a model matrix, converting nominal values to dummy variables but does not add an intercept column.
DataFrame<Number>	toModelMatrixDataFrame() Encodes the DataFrame as a model matrix, converting nominal values to dummy variables but does not add an intercept column.
String	toString()
final String	toString(int limit)
<U> DataFrame<U>	transform(RowFunction<V, U> transform)
DataFrame<V>	transpose() Transpose the rows and columns of the data frame.
List<Class<?>>	types() Return the types for each of the data frame columns.
DataFrame<V>	unique(Integer... cols)
DataFrame<V>	unique()
DataFrame<V>	unique(Object... cols)
final DataFrame<V>	update(DataFrame<? extends V>... others) Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.
DataFrame<V>	var()
final void	writeCsv(String file) Write the data from this data frame to the specified file as comma separated values.
final void	writeCsv(OutputStream output) Write the data from this data frame to the provided output stream as comma separated values.
final void	writeSql(PreparedStatement stmt) Write the data from the data frame to a database by executing the provided prepared SQL statement.
final void	writeSql(Connection c, String sql) Write the data from the data frame to a database by executing the specified SQL statement.
final void	writeXls(OutputStream output) Write the data from the data frame to the provided output stream as an excel workbook.
final void	writeXls(String file) Write the data from the data frame to the specified file as an excel workbook.

Inherited Methods

From class java.lang.Object

From interface java.lang.Iterable

Public Constructors

public DataFrame ()

Construct an empty data frame.

 > DataFrame<Object> df = new DataFrame<>();
 > df.isEmpty();
 true

public DataFrame (String... columns)

Construct an empty data frame with the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.columns();
 [name, value]

Parameters

columns	the data frame column names.

public DataFrame (Collection<?> columns)

Construct an empty data frame with the specified columns.

 > List<String> columns = new ArrayList<>();
 > columns.add("name");
 > columns.add("value");
 > DataFrame<Object> df = new DataFrame<>(columns);
 > df.columns();
 [name, value]

Parameters

columns	the data frame column names.

public DataFrame (Collection<?> index, Collection<?> columns)

Construct a data frame containing the specified rows and columns.

 > List<String> rows = Arrays.asList("row1", "row2", "row3");
 > List<String> columns = Arrays.asList("col1", "col2");
 > DataFrame<Object> df = new DataFrame<>(rows, columns);
 > df.get("row1", "col1");
 null

Parameters

index	the row names
columns	the column names

public DataFrame (List<? extends List<? extends V>> data)

Construct a data frame from the specified list of columns.

 > List<List<Object>> data = Arrays.asList(
 >       Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >       Arrays.<Object>asList(1, 2, 3)
 > );
 > DataFrame<Object> df = new DataFrame<>(data);
 > df.row(0);
 [alpha, 1]

Parameters

data	a list of columns containing the data elements.

public DataFrame (Collection<?> index, Collection<?> columns, List<? extends List<? extends V>> data)

Construct a new data frame using the specified data and indices.

Parameters

index	the row names
columns	the column names
data	the data

Public Methods

public DataFrame<V> add (Object column, List<V> values)

Add a new column to the data frame containing the values provided. Any existing rows with indices greater than the size of the specified column data will have null values for the new column.

 > DataFrame<Object> df = new DataFrame<>();
 > df.add("value", Arrays.<Object>asList(1));
 > df.columns();
 [value]

Parameters

column	the new column name
values	the new column values

Returns

the data frame with the column added
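
The null-padding rule described for this method can be pictured with a small plain-Java sketch. This is an illustration of the stated behavior only, not joinery's code; `ColumnPaddingSketch` and `padColumn` are hypothetical names, and the case where more values than rows are supplied is deliberately left out because the description above does not cover it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of joinery.
class ColumnPaddingSketch {
    // Returns a copy of values extended with nulls until it has rowCount entries.
    static <V> List<V> padColumn(List<V> values, int rowCount) {
        List<V> column = new ArrayList<>(values);
        // Rows whose index is beyond the supplied values receive null
        while (column.size() < rowCount) {
            column.add(null);
        }
        return column;
    }

    public static void main(String[] args) {
        // Three existing rows, but only one value supplied for the new column
        System.out.println(padColumn(Arrays.asList("a"), 3)); // prints [a, null, null]
    }
}
```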

public DataFrame<V> add (Object column, Function<List<V>, V> function)

Add the results of applying a row-wise function to the data frame as a new column.

Parameters

column	the new column name
function	the function to compute the new column values

Returns

the data frame with the column added
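
As a rough picture of what "row-wise" means here, the sketch below computes one new value per row from that row's existing values. It is plain Java rather than joinery code: `java.util.function.Function` stands in for `DataFrame.Function`, and `RowFunctionColumnSketch` and `computeColumn` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; not part of joinery.
class RowFunctionColumnSketch {
    // Applies fn to every row and collects the results as the values of a new column.
    static <V> List<V> computeColumn(List<List<V>> rows, Function<List<V>, V> fn) {
        List<V> column = new ArrayList<>();
        for (List<V> row : rows) {
            column.add(fn.apply(row));
        }
        return column;
    }

    public static void main(String[] args) {
        List<List<Integer>> rows = Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3, 4));
        // New column holding the sum of the first two values of each row
        System.out.println(computeColumn(rows, r -> r.get(0) + r.get(1))); // prints [3, 7]
    }
}
```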

public DataFrame<V> add (List<V> values)

Add the list of values as a new column.

Parameters

values	the new column values

Returns

the data frame with the column added

public DataFrame<V> add (Object... columns)

Add new columns to the data frame. Any existing rows will have null values for the new columns.

 > DataFrame<Object> df = new DataFrame<>();
 > df.add("value");
 > df.columns();
 [value]

Parameters

columns	the new column names

Returns

the data frame with the columns added

public DataFrame<V> aggregate (Aggregate<V, U> function)

Apply an aggregate function to each group or the entire data frame if the data is not grouped.

Parameters

function	the aggregate function

Returns

the new data frame

public DataFrame<V> append (Object name, List<? extends V> row)

Append a row indexed by the specified name to the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append("row1", Arrays.asList("alpha", 1));
 > df.append("row2", Arrays.asList("bravo", 2));
 > df.index();
 [row1, row2]

Parameters

name	the row name to add to the index
row	the row to append

Returns

the data frame with the new data appended

public DataFrame<V> append (Object name, V[] row)

public DataFrame<V> append (List<? extends V> row)

Append a row to the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("alpha", 1));
 > df.append(Arrays.asList("bravo", 2));
 > df.length();
 2

Parameters

row	the row to append

Returns

the data frame with the new data appended

public DataFrame<U> apply (Function<V, U> function)

Apply a function to each value in the data frame.

 > DataFrame<Number> df = new DataFrame<>(
 >         Arrays.<List<Number>>asList(
 >                 Arrays.<Number>asList(1, 2),
 >                 Arrays.<Number>asList(3, 4)
 >             )
 >     );
 > df = df.apply(new Function<Number, Number>() {
 >         public Number apply(Number value) {
 >             return value.intValue() * value.intValue();
 >         }
 >     });
 > df.flatten();
 [1, 4, 9, 16]

Parameters

function	the function to apply

Returns

a new data frame with the function results

public DataFrame<T> cast (Class<T> cls)

Cast this data frame to the specified type.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", "1"));
 > DataFrame<String> dfs = df.cast(String.class);
 > dfs.get(0, 0).getClass().getName();
 java.lang.String

Returns

the data frame cast to the specified type

public final DataFrame<V> coalesce (DataFrame<? extends V>... others)

Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.

Parameters

others	the other data frames

Returns

this data frame with the overwritten values
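
The sketch below shows the general idea of coalescing on a single list of values: only null cells are touched, and they are filled from the other sources. It is plain Java, not joinery's implementation. `CoalesceSketch` is a hypothetical name, and the choice that the first non-null candidate wins is an assumption; the description above does not say which argument takes precedence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of joinery.
class CoalesceSketch {
    // Fills null cells of target in place with the first non-null value
    // found at the same position in others (precedence order is an assumption).
    @SafeVarargs
    static <V> void coalesce(List<V> target, List<? extends V>... others) {
        for (int i = 0; i < target.size(); i++) {
            if (target.get(i) != null) {
                continue; // existing non-null values are never overwritten
            }
            for (List<? extends V> other : others) {
                if (i < other.size() && other.get(i) != null) {
                    target.set(i, other.get(i));
                    break;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> target = new ArrayList<>(Arrays.asList(1, null, null));
        coalesce(target, Arrays.asList(9, 2, null), Arrays.asList(8, 7, 3));
        System.out.println(target); // prints [1, 2, 3]
    }
}
```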

public List<V> col (Object column)

Return a data frame column as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.col("value");
 [1, 2, 3]

Parameters

column	the column name

Returns

the list of values

public List<V> col (Integer column)

Return a data frame column as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.col(1);
 [1, 2, 3]

Parameters

column	the column index

Returns

the list of values

public DataFrame<V> collapse ()

public Set<Object> columns ()

Return the column names for the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.columns();
 [name, value]

Returns

the column names

public static final DataFrame<String> compare (DataFrame<V> df1, DataFrame<V> df2)

public final DataFrame<V> concat (DataFrame<? extends V>... others)

Concatenate the specified data frames with this data frame and return the result.

 > DataFrame<Object> left = new DataFrame<>("a", "b", "c");
 > left.append("one", Arrays.asList(1, 2, 3));
 > left.append("two", Arrays.asList(4, 5, 6));
 > left.append("three", Arrays.asList(7, 8, 9));
 > DataFrame<Object> right = new DataFrame<>("a", "b", "d");
 > right.append("one", Arrays.asList(10, 20, 30));
 > right.append("two", Arrays.asList(40, 50, 60));
 > right.append("four", Arrays.asList(70, 80, 90));
 > left.concat(right).length();
 6

Parameters

others	the other data frames

Returns

the data frame containing all the values

public final DataFrame<V> convert (Class<? extends V>... columnTypes)

Convert columns based on the requested types.

Note, the conversion process replaces existing values with values of the converted type.

 > DataFrame<Object> df = new DataFrame<>("a", "b", "c");
 > df.append(Arrays.asList("one", 1, 1.0));
 > df.append(Arrays.asList("two", 2, 2.0));
 > df.convert(
 >     null,         // leave column "a" as is
 >     Long.class,   // convert column "b" to Long
 >     Number.class  // convert column "c" to Double
 > );
 > df.types();
 [class java.lang.String, class java.lang.Long, class java.lang.Double]

Returns

the data frame with the converted values

public DataFrame<V> convert (DataFrame.NumberDefault numDefault, String naString)

public DataFrame<V> convert ()

Attempt to infer better types for object columns.

The following conversions are performed where applicable:

Floating point numbers are converted to Double values
Whole numbers are converted to Long values
True, false, yes, and no are converted to Boolean values
Date strings in the following formats are converted to Date values:
2000-01-01T00:00:00+1, 2000-01-01T00:00:00EST, 2000-01-01
Time strings in the following formats are converted to Date values:
2000/01/01, 1/01/2000, 12:01:01 AM, 23:01:01, 12:01 AM, 23:01

Note, the conversion process replaces existing values with values of the converted type.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "date");
 > df.append(Arrays.asList("one", "1", new Date()));
 > df.convert();
 > df.types();
 [class java.lang.String, class java.lang.Long, class java.util.Date]

Returns

the data frame with the converted values
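
To make the inference rules above more concrete, here is a simplified plain-Java sketch that applies the numeric and boolean rules to a single string. It is an approximation under stated assumptions, not joinery's implementation: `TypeInferenceSketch` and `infer` are hypothetical names, date and time parsing is omitted, and the sketch works per value, whereas the description above speaks of converting columns.

```java
import java.util.Locale;

// Hypothetical illustration only; not part of joinery.
class TypeInferenceSketch {
    // Returns a Long, Double, or Boolean when the text matches; otherwise the original string.
    static Object infer(String text) {
        if (text == null) {
            return null;
        }
        try {
            return Long.valueOf(text); // whole numbers
        } catch (NumberFormatException e) {
            // not a whole number; try the next rule
        }
        try {
            return Double.valueOf(text); // floating point numbers
        } catch (NumberFormatException e) {
            // not a number; try the next rule
        }
        switch (text.toLowerCase(Locale.ROOT)) {
            case "true":
            case "yes":
                return Boolean.TRUE;
            case "false":
            case "no":
                return Boolean.FALSE;
            default:
                return text; // leave unrecognized values as they are
        }
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1", "1.5", "yes", "alpha"}) {
            System.out.println(s + " -> " + infer(s).getClass().getSimpleName());
        }
    }
}
```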

public DataFrame<V> count ()

public DataFrame<Number> cov ()

public DataFrame<V> cummax ()

public DataFrame<V> cummin ()

public DataFrame<V> cumprod ()

public DataFrame<V> cumsum ()

public DataFrame<V> describe ()

public DataFrame<V> diff (int period)

public DataFrame<V> diff ()

public final void draw (Container container, DataFrame.PlotType type)

Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.

Parameters

container	the container to use for the chart
type	the type of plot to draw

public final void draw (Container container)

Draw the numeric columns of this data frame as a chart in the specified Container.

Parameters

container	the container to use for the chart

public DataFrame<V> drop (Integer... cols)

Create a new data frame by leaving out the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.drop(2).columns();
 [name, value]

Parameters

cols	the indices of the columns to be removed

Returns

a shallow copy of the data frame with the columns removed

public DataFrame<V> drop (Object... cols)

Create a new data frame by leaving out the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.drop("category").columns();
 [name, value]

Parameters

cols	the names of columns to be removed

Returns

a shallow copy of the data frame with the columns removed

public DataFrame<V> dropna ()

public DataFrame<V> dropna (DataFrame.Axis direction)

public Map<Object, DataFrame<V>> explode ()

Return a map of group names to data frame for grouped data frames. Observe that for this method to have any effect a groupBy call must have been done before.

Returns

a map of group names to data frames

public DataFrame<V> fillna (V fill)

Returns a view of the data frame with NA values replaced with fill.

Parameters

fill	the value used to replace missing values

Returns

the new data frame
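
The replacement rule can be illustrated on a single list of values. The sketch below is plain Java with hypothetical names (`FillnaSketch`, `fillna`) and is not joinery's code; in particular, it returns a copy, whereas the method above is described as returning a view.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of joinery.
class FillnaSketch {
    // Returns a copy of values in which every null is replaced by fill.
    static <V> List<V> fillna(List<V> values, V fill) {
        List<V> result = new ArrayList<>(values.size());
        for (V value : values) {
            result.add(value == null ? fill : value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(fillna(Arrays.asList(1, null, 3), 0)); // prints [1, 0, 3]
    }
}
```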

public List<V> flatten ()

Return the values of the data frame as a flat list.

 > DataFrame<String> df = new DataFrame<>(
 >         Arrays.asList(
 >                 Arrays.asList("one", "two"),
 >                 Arrays.asList("alpha", "bravo")
 >             )
 >     );
 > df.flatten();
 [one, two, alpha, bravo]

Returns

the list of values

public V get (Integer row, Integer col)

Return the value located by the (row, column) coordinates.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.get(1, 0);
 bravo

Parameters

row	the row index
col	the column index

Returns

the value

public V get (Object row, Object col)

Return the value located by the (row, column) names.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Arrays.asList("row1", "row2", "row3"),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.get("row2", "name");
 bravo

Parameters

row	the row name
col	the column name

Returns

the value

public DataFrame<V> groupBy (Object... cols)

Group the data frame rows by the specified column names.

Parameters

cols	the column names

Returns

the grouped data frame

public DataFrame<V> groupBy (Integer... cols)

Group the data frame rows by the specified columns.

Parameters

cols	the column indices

Returns

the grouped data frame

public DataFrame<V> groupBy (KeyFunction<V> function)

Group the data frame rows using the specified key function.

Parameters

function	the function to reduce rows to grouping keys

Returns

the grouped data frame
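
A key function reduces each row to a grouping key, and rows that share a key land in the same group. The plain-Java sketch below shows that idea; it is not joinery's implementation. `java.util.function.Function` stands in for `KeyFunction`, and `KeyFunctionGroupingSketch` and `group` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; not part of joinery.
class KeyFunctionGroupingSketch {
    // Groups rows by the key computed from each row, keeping first-seen key order.
    static <V> Map<Object, List<List<V>>> group(List<List<V>> rows, Function<List<V>, Object> key) {
        Map<Object, List<List<V>>> groups = new LinkedHashMap<>();
        for (List<V> row : rows) {
            groups.computeIfAbsent(key.apply(row), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList("a", 1),
            Arrays.<Object>asList("b", 2),
            Arrays.<Object>asList("a", 3));
        // Group by the value in the first column
        System.out.println(group(rows, r -> r.get(0)).keySet()); // prints [a, b]
    }
}
```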

public Grouping groups ()

public DataFrame<V> head (int limit)

Return a data frame containing the first limit rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.head(3)
 >   .col("value");
 [0, 1, 2]

Parameters

limit	the number of rows to include in the result

Returns

the new data frame

public DataFrame<V> head ()

Return a data frame containing the first ten rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.head()
 >   .col("value");
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Returns

the new data frame

public Set<Object> index ()

Return the index names for the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append("row1", Arrays.asList("one", 1));
 > df.index();
 [row1]

Returns

the index names

public boolean isEmpty ()

Return true if the data frame contains no data.

 > DataFrame<Object> df = new DataFrame<>();
 > df.isEmpty();
 true

Returns

true if the data frame contains no data, false otherwise

public DataFrame<Boolean> isnull ()

Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", null),
 >         Arrays.asList(null, 2, 3)
 >     )
 > );
 > df.isnull().row(0);
 [false, true]

Returns

the new boolean data frame
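
A null mask of this kind can be sketched in plain Java as follows. The class `NullMaskSketch` is a hypothetical illustration, not joinery code, and it assumes the data is held as a list of columns, as in the example above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of joinery: builds a boolean mask of nulls.
final class NullMaskSketch {
    // Each cell of the result is true where the input cell is null.
    static List<List<Boolean>> isNull(List<List<Object>> columns) {
        List<List<Boolean>> mask = new ArrayList<>();
        for (List<Object> column : columns) {
            List<Boolean> flags = new ArrayList<>();
            for (Object value : column) {
                flags.add(value == null);
            }
            mask.add(flags);
        }
        return mask;
    }
}
```

Negating each flag gives the complementary mask described for notnull().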

public ListIterator<List<V>> iterator ()

Return an iterator over the rows of the data frame. Also used implicitly with foreach loops.

 > DataFrame<Integer> df = new DataFrame<>(
 >         Arrays.asList(
 >             Arrays.asList(1, 2),
 >             Arrays.asList(3, 4)
 >         )
 >     );
 > List<Integer> results = new ArrayList<>();
 > for (List<Integer> row : df)
 >     results.add(row.get(0));
 > results;
 [1, 2]

Returns

an iterator over the rows of the data frame.

public ListIterator<List<V>> itercols ()

public ListIterator<Map<Object, V>> itermap ()

public ListIterator<List<V>> iterrows ()

public ListIterator<V> itervalues ()

public final DataFrame<V> join (DataFrame<V> other)

Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.

 > DataFrame<Object> left = new DataFrame<>("a", "b");
 > left.append("one", Arrays.asList(1, 2));
 > left.append("two", Arrays.asList(3, 4));
 > left.append("three", Arrays.asList(5, 6));
 > DataFrame<Object> right = new DataFrame<>("c", "d");
 > right.append("one", Arrays.asList(10, 20));
 > right.append("two", Arrays.asList(30, 40));
 > right.append("four", Arrays.asList(50, 60));
 > left.join(right)
 >     .index();
 [one, two, three]

Parameters

other	the other data frame

Returns

the result of the join operation as a new data frame
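
The semantics of a left outer join on row indices can be sketched with ordinary maps. This is an assumption-laden illustration rather than the library's code: `LeftJoinSketch` is a hypothetical class, each frame is modeled as a map from index name to row values, and unmatched rows are padded with nulls.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery: left outer join keyed on index names.
final class LeftJoinSketch {
    // Every left key is kept. When the right side has no row for a key,
    // the joined row is padded with `rightWidth` nulls instead.
    static Map<Object, List<Object>> leftJoin(
            Map<Object, List<Object>> left,
            Map<Object, List<Object>> right,
            int rightWidth) {
        Map<Object, List<Object>> joined = new LinkedHashMap<>();
        for (Map.Entry<Object, List<Object>> entry : left.entrySet()) {
            List<Object> row = new ArrayList<>(entry.getValue());
            List<Object> match = right.get(entry.getKey());
            if (match != null) {
                row.addAll(match);
            } else {
                row.addAll(Collections.<Object>nCopies(rightWidth, null));
            }
            joined.put(entry.getKey(), row);
        }
        return joined;
    }
}
```

With the data from the example above, the keys of the result are one, two and three; the right-only key four is dropped, and the row for three ends in two nulls.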

public final DataFrame<V> join (DataFrame<V> other, KeyFunction<V> on)

Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.

Parameters

other	the other data frame
on	the function to generate the join keys

Returns

the result of the join operation as a new data frame

public final DataFrame<V> join (DataFrame<V> other, DataFrame.JoinType join)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.

Parameters

other	the other data frame
join	the join type

Returns

the result of the join operation as a new data frame

public final DataFrame<V> join (DataFrame<V> other, DataFrame.JoinType join, KeyFunction<V> on)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.

Parameters

other	the other data frame
join	the join type
on	the function to generate the join keys

Returns

the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, DataFrame.JoinType join, Integer... cols)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.

Parameters

other	the other data frame
join	the join type
cols	the indices of the columns to use as the join key

Returns

the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, Integer... cols)

Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.

Parameters

other	the other data frame
cols	the indices of the columns to use as the join key

Returns

the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, Object... cols)

Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.

Parameters

other	the other data frame
cols	the names of the columns to use as the join key

Returns

the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, DataFrame.JoinType join, Object... cols)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.

Parameters

other	the other data frame
join	the join type
cols	the names of the columns to use as the join key

Returns

the result of the join operation as a new data frame

public DataFrame<V> kurt ()

public int length ()

Return the length (number of rows) of the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("alpha", 1));
 > df.append(Arrays.asList("bravo", 2));
 > df.append(Arrays.asList("charlie", 3));
 > df.length();
 3

Returns

the number of rows

public static final void main (String[] args)

Entry point to joinery as a command line tool. The available commands are:

show: display the specified data frame as a swing table
plot: display the specified data frame as a chart
compare: merge the specified data frames and output the result
shell: launch an interactive javascript shell for exploring data

Parameters

args	file paths or urls of csv input data

Throws

IOException	if an error occurs reading input

public Map<V, List<V>> map (Object key, Object value)

public Map<Object, List<V>> map ()

Return a map of index names to rows.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("alpha", Arrays.asList(1));
 > df.append("bravo", Arrays.asList(2));
 > df.map();
 {alpha=[1], bravo=[2]}

Returns

a map of index names to rows.

public Map<V, List<V>> map (Integer key, Integer value)

public DataFrame<V> max ()

public DataFrame<V> mean ()

Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("one", Arrays.asList(1));
 > df.append("two", Arrays.asList(5));
 > df.append("three", Arrays.asList(3));
 > df.append("four",  Arrays.asList(7));
 > df.mean().col(0);
 [4.0]

Returns

the new data frame
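
For reference, the arithmetic behind the example is the ordinary mean of the column values. The sketch below uses a hypothetical `MeanSketch` class in plain Java; it is not how joinery computes the result internally.

```java
import java.util.List;

// Hypothetical helper, not part of joinery: arithmetic mean of one column.
final class MeanSketch {
    // Sums the values as doubles and divides by the count.
    // An empty list yields NaN (0.0 / 0).
    static double mean(List<? extends Number> values) {
        double sum = 0.0;
        for (Number value : values) {
            sum += value.doubleValue();
        }
        return sum / values.size();
    }
}
```

For the values 1, 5, 3 and 7 this gives (1 + 5 + 3 + 7) / 4 = 4.0, matching the output shown above.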

public DataFrame<V> median ()

public final DataFrame<V> merge (DataFrame<V> other, DataFrame.JoinType join)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.

Parameters

other	the other data frame
join	the join type

Returns

the result of the merge operation as a new data frame

public final DataFrame<V> merge (DataFrame<V> other)

Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.

Parameters

other	the other data frame

Returns

the result of the merge operation as a new data frame

public DataFrame<V> min ()

public DataFrame<V> mode ()

public DataFrame<V> nonnumeric ()

Return a data frame containing only columns with non-numeric data.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", 1));
 > df.append(Arrays.asList("two", 2));
 > df.nonnumeric().columns();
 [name]

Returns

a data frame containing only the non-numeric columns

public DataFrame<Boolean> notnull ()

Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false.

 > DataFrame<Object> df = new DataFrame<>(
 >     Arrays.asList(
 >         Arrays.<Object>asList("alpha", "bravo", null),
 >         Arrays.<Object>asList(null, 2, 3)
 >     )
 > );
 > df.notnull().row(0);
 [true, false]

Returns

the new boolean data frame

public DataFrame<Number> numeric ()

Return a data frame containing only columns with numeric data.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", 1));
 > df.append(Arrays.asList("two", 2));
 > df.numeric().columns();
 [value]

Returns

a data frame containing only the numeric columns

public DataFrame<V> percentChange ()

public DataFrame<V> percentChange (int period)

public DataFrame<V> percentile (double quantile)

Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("one", Arrays.asList(1));
 > df.append("two", Arrays.asList(5));
 > df.append("three", Arrays.asList(3));
 > df.append("four",  Arrays.asList(7));
 > df.mean().col(0);
 [4.0]

Returns

the new data frame

public DataFrame<V> pivot (Object row, Object col, Object... values)

public DataFrame<V> pivot (Integer row, Integer col, Integer... values)

public DataFrame<V> pivot (Integer[] rows, Integer[] cols, Integer[] values)

public DataFrame<U> pivot (KeyFunction<V> rows, KeyFunction<V> cols, Map<Integer, Aggregate<V, U>> values)

public DataFrame<V> pivot (List<Object> rows, List<Object> cols, List<Object> values)

public final void plot (DataFrame.PlotType type)

Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.plot(PlotType.AREA);

Parameters

type	the type of plot to display

public final void plot ()

Display the numeric columns of this data frame as a line chart in a new swing frame.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.plot();

public DataFrame<V> prod ()

Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 5)
 >             )
 >     );
 > df.groupBy("name")
 >   .prod()
 >   .col("value");
 [6.0, 20.0]

Returns

the new data frame

public static final DataFrame<Object> readCsv (String file)

Read the specified csv file and return the data as a data frame.

Parameters

file	the csv file

Returns

a new data frame

Throws

IOException	if an error reading the file occurs

public static final DataFrame<Object> readCsv (InputStream input, String separator, DataFrame.NumberDefault longDefault)

Throws

IOException

public static final DataFrame<Object> readCsv (InputStream input, String separator)

Throws

IOException

public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault longDefault, String naString)

Throws

IOException

public static final DataFrame<Object> readCsv (InputStream input)

Read csv records from an input stream and return the data as a data frame.

Parameters

input	the input stream

Returns

a new data frame

Throws

IOException	if an error reading the stream occurs

public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault longDefault)

Throws

IOException

public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault numberDefault, String naString, boolean hasHeader)

Throws

IOException

public static final DataFrame<Object> readCsv (InputStream input, String separator, String naString, boolean hasHeader)

Throws

IOException

public static final DataFrame<Object> readCsv (String file, String separator)

Throws

IOException

public static final DataFrame<Object> readCsv (String file, String separator, String naString, boolean hasHeader)

Throws

IOException

public static final DataFrame<Object> readCsv (InputStream input, String separator, String naString)

Throws

IOException

public static final DataFrame<Object> readSql (ResultSet rs)

Read data from the provided query results into a new data frame.

Parameters

rs	the query results

Returns

a new data frame

Throws

SQLException	if an error occurs reading the results

public static final DataFrame<Object> readSql (Connection c, String sql)

Execute the SQL query and return the results as a new data frame.

 > Connection c = DriverManager.getConnection("jdbc:derby:memory:testdb;create=true");
 > c.createStatement().executeUpdate("create table data (a varchar(8), b int)");
 > c.createStatement().executeUpdate("insert into data values ('test', 1)");
 > DataFrame.readSql(c, "select * from data").flatten();
 [test, 1]

Parameters

c	the database connection
sql	the SQL query

Returns

a new data frame

Throws

SQLException	if an error occurs executing the query

public static final DataFrame<Object> readXls (String file)

Read data from the specified excel workbook into a new data frame.

Parameters

file	the excel workbook

Returns

a new data frame

Throws

IOException	if an error occurs reading the workbook

public static final DataFrame<Object> readXls (InputStream input)

Read data from the input stream as an excel workbook into a new data frame.

Parameters

input	the input stream

Returns

a new data frame

Throws

IOException	if an error occurs reading the input stream

public DataFrame<V> reindex (Integer col, boolean drop)

Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex(0, true)
 >   .index();
 [alpha, bravo]

Parameters

col	the column to use as the new index
drop	true to remove the index column from the data, false otherwise

Returns

a new data frame with index specified
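
The effect of the drop flag can be sketched in plain Java. `ReindexSketch` is a hypothetical illustration, not joinery code; it models the frame as a list of rows and assumes the values in the chosen column are unique, since duplicate keys would overwrite each other in the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery: re-keys rows by one column.
final class ReindexSketch {
    // Maps each row's value in column `col` to the row's values.
    // With drop == true the key column is removed from the row data.
    static Map<Object, List<Object>> reindex(
            List<List<Object>> rows, int col, boolean drop) {
        Map<Object, List<Object>> result = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            List<Object> values = new ArrayList<>(row);
            Object key = drop ? values.remove(col) : values.get(col);
            result.put(key, values);
        }
        return result;
    }
}
```

For the rows [alpha, 1] and [bravo, 2], re-indexing on column 0 with drop set to true leaves the keys alpha and bravo mapped to [1] and [2]; with drop set to false the rows keep both values.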

public DataFrame<V> reindex (Integer... cols)

Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex(0)
 >   .index();
 [alpha, bravo]

Parameters

cols	the columns to use as the new index

Returns

a new data frame with index specified

public DataFrame<V> reindex (Integer[] cols, boolean drop)

Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two", "three");
 > df.append("a", Arrays.asList("alpha", 1, 10));
 > df.append("b", Arrays.asList("bravo", 2, 20));
 > df.reindex(new Integer[] { 0, 1 }, true)
 >   .index();
 [[alpha, 1], [bravo, 2]]

Parameters

cols	the columns to use as the new index
drop	true to remove the index columns from the data, false otherwise

Returns

a new data frame with index specified

public DataFrame<V> reindex (Object... cols)

Re-index the rows of the data frame using the specified column names and removing the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex("one")
 >   .index();
 [alpha, bravo]

Parameters

cols	the columns to use as the new index

Returns

a new data frame with index specified

public DataFrame<V> reindex (Object[] cols, boolean drop)

Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two", "three");
 > df.append("a", Arrays.asList("alpha", 1, 10));
 > df.append("b", Arrays.asList("bravo", 2, 20));
 > df.reindex(new String[] { "one", "two" }, true)
 >   .index();
 [[alpha, 1], [bravo, 2]]

Parameters

cols	the columns to use as the new index
drop	true to remove the index columns from the data, false otherwise

Returns

a new data frame with index specified

public DataFrame<V> reindex (Object col, boolean drop)

Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex("one", true)
 >   .index();
 [alpha, bravo]

Parameters

col	the column to use as the new index
drop	true to remove the index column from the data, false otherwise

Returns

a new data frame with index specified

public DataFrame<V> rename (Map<Object, Object> names)

public DataFrame<V> rename (Object old, Object name)

public DataFrame<V> resetIndex ()

Return a new data frame with the default index. Row names are reset to the string value of their integer index.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.resetIndex()
 >   .index();
 [0, 1]

Returns

a new data frame with the default index.

public DataFrame<V> reshape (Integer rows, Integer cols)

Reshape a data frame to the specified dimensions.

 > DataFrame<Object> df = new DataFrame<>("0", "1", "2");
 > df.append("0", Arrays.asList(10, 20, 30));
 > df.append("1", Arrays.asList(40, 50, 60));
 > df.reshape(3, 2)
 >   .length();
 3

Parameters

rows	the number of rows the new data frame will contain
cols	the number of columns the new data frame will contain

Returns

a new data frame with the specified dimensions

public DataFrame<V> reshape (Collection<?> rows, Collection<?> cols)

Reshape a data frame to the specified indices.

 > DataFrame<Object> df = new DataFrame<>("0", "1", "2");
 > df.append("0", Arrays.asList(10, 20, 30));
 > df.append("1", Arrays.asList(40, 50, 60));
 > df.reshape(Arrays.asList("0", "1", "2"), Arrays.asList("0", "1"))
 >   .length();
 3

Parameters

rows	the names of rows the new data frame will contain
cols	the names of columns the new data frame will contain

Returns

a new data frame with the specified indices

public DataFrame<V> retain (Integer... cols)

Create a new data frame containing only the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.retain(0, 2).columns();
 [name, category]

Parameters

cols	the columns to include in the new data frame

Returns

a new data frame containing only the specified columns

public DataFrame<V> retain (Object... cols)

Create a new data frame containing only the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.retain("name", "category").columns();
 [name, category]

Parameters

cols	the columns to include in the new data frame

Returns

a new data frame containing only the specified columns

public DataFrame<V> rollapply (Function<List<V>, V> function)

public DataFrame<V> rollapply (Function<List<V>, V> function, int period)

public List<V> row (Object row)

Return a data frame row as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Arrays.asList("row1", "row2", "row3"),
 >         Collections.emptyList(),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.row("row2");
 [bravo, 2]

Parameters

row	the row name

Returns

the list of values

public List<V> row (Integer row)

Return a data frame row as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Collections.emptyList(),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.row(1);
 [bravo, 2]

Parameters

row	the row index

Returns

the list of values

public DataFrame<V> select (Predicate<V> predicate)

Select a subset of the data frame using a predicate function.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > for (int i = 0; i < 10; i++)
 >     df.append(Arrays.asList("name" + i, i));
 > df.select(new Predicate<Object>() {
 >         @Override
 >         public Boolean apply(List<Object> values) {
 >             return Integer.class.cast(values.get(1)).intValue() % 2 == 0;
 >         }
 >     })
 >   .col(1);
 [0, 2, 4, 6, 8]

Parameters

predicate	a function returning true for rows to be included in the subset

Returns

a subset of the data frame
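
Row selection by predicate can be sketched in plain Java as below. This is an illustration only: `SelectSketch` is a hypothetical class, and `java.util.function.Predicate` is used here in place of joinery's own `Predicate` interface.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of joinery: keeps rows accepted by a predicate.
final class SelectSketch {
    // Returns the rows for which the predicate returns true, in original order.
    static <V> List<List<V>> select(List<List<V>> rows, Predicate<List<V>> predicate) {
        List<List<V>> kept = new ArrayList<>();
        for (List<V> row : rows) {
            if (predicate.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

A lambda such as `row -> ((Integer) row.get(1)) % 2 == 0` then plays the role of the anonymous class in the example above.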

public void set (Object row, Object col, V value)

Set the value located by the names (row, column).

 > DataFrame<Object> df = new DataFrame<>(
 >        Arrays.asList("row1", "row2"),
 >        Arrays.asList("col1", "col2")
 >     );
 > df.set("row1", "col2", Integer.valueOf(7));
 > df.col(1);
 [7, null]

Parameters

row	the row name
col	the column name
value	the new value

public void set (Integer row, Integer col, V value)

Set the value located by the coordinates (row, column).

 > DataFrame<Object> df = new DataFrame<>(
 >        Arrays.asList("row1", "row2"),
 >        Arrays.asList("col1", "col2")
 >     );
 > df.set(1, 0, Integer.valueOf(7));
 > df.col(0);
 [null, 7]

Parameters

row	the row index
col	the column index
value	the new value

public final void show ()

public int size ()

Return the size (number of columns) of the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.size();
 2

Returns

the number of columns

public DataFrame<V> skew ()

public DataFrame<V> slice (Integer rowStart, Integer rowEnd, Integer colStart, Integer colEnd)

public DataFrame<V> slice (Integer rowStart, Integer rowEnd)

public DataFrame<V> slice (Object rowStart, Object rowEnd)

public DataFrame<V> slice (Object rowStart, Object rowEnd, Object colStart, Object colEnd)

public DataFrame<V> sortBy (Comparator<List<V>> comparator)

public DataFrame<V> sortBy (Integer... cols)

public DataFrame<V> sortBy (Object... cols)

public DataFrame<V> stddev ()

Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 6, 8)
 >             )
 >     );
 > df.groupBy("name")
 >   .stddev()
 >   .col("value");
 [1.0, 2.0]

Returns

the new data frame
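
The values in the example are consistent with the sample standard deviation (dividing by n - 1 rather than n). The following plain Java sketch reproduces that arithmetic; `StdDevSketch` is a hypothetical class and not the library's implementation.

```java
import java.util.List;

// Hypothetical helper, not part of joinery: sample standard deviation.
final class StdDevSketch {
    // Square root of the sum of squared deviations from the mean,
    // divided by (n - 1).
    static double sampleStdDev(List<? extends Number> values) {
        int n = values.size();
        double sum = 0.0;
        for (Number value : values) {
            sum += value.doubleValue();
        }
        double mean = sum / n;
        double squares = 0.0;
        for (Number value : values) {
            double deviation = value.doubleValue() - mean;
            squares += deviation * deviation;
        }
        return Math.sqrt(squares / (n - 1));
    }
}
```

For the alpha group (1, 2, 3) the squared deviations sum to 2, so the result is sqrt(2 / 2) = 1.0; for the bravo group (4, 6, 8) they sum to 8, giving sqrt(8 / 2) = 2.0.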

public DataFrame<V> sum ()

Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 5)
 >             )
 >     );
 > df.groupBy("name")
 >   .sum()
 >   .col("value");
 [6.0, 9.0]

Returns

the new data frame

public DataFrame<V> tail ()

Return a data frame containing the last ten rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.tail()
 >   .col("value");
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Returns

the new data frame

public DataFrame<V> tail (int limit)

Return a data frame containing the last limit rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.tail(3)
 >   .col("value");
 [17, 18, 19]

Parameters

limit	the number of rows to include in the result

Returns

the new data frame
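
The row windows returned by head and tail can be sketched with `List.subList`. `HeadTailSketch` is a hypothetical class in plain Java; clamping the limit to the number of rows is an assumption of this sketch, and the library's behavior for oversized limits is not stated here.

```java
import java.util.List;

// Hypothetical helper, not part of joinery: first and last `limit` rows.
final class HeadTailSketch {
    // View of the first `limit` rows (fewer if the list is shorter).
    static <V> List<V> head(List<V> rows, int limit) {
        return rows.subList(0, Math.min(limit, rows.size()));
    }

    // View of the last `limit` rows (fewer if the list is shorter).
    static <V> List<V> tail(List<V> rows, int limit) {
        return rows.subList(Math.max(0, rows.size() - limit), rows.size());
    }
}
```

For the values 0 through 19, `head(rows, 3)` yields [0, 1, 2] and `tail(rows, 3)` yields [17, 18, 19].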

public Object[] toArray ()

Copy the values contained in the data frame into a flat array of length size() * length().

Returns

the array

public U[][] toArray (U[][] array)

public U toArray (Class<U> cls)

Copy the values contained in the data frame into an array of the specified type. If the type specified is a two-dimensional array, for example double[][].class, a row-wise copy will be made.

Returns

the array

Throws

IllegalArgumentException	if the values are not assignable to the specified component type

public U[] toArray (U[] array)

Copy the values contained in the data frame into the specified array. If the length of the provided array is less than size() * length(), a new array will be created.

Returns

the array

public double[][] toModelMatrix (double fillValue)

Encode the data frame as a model matrix, converting nominal values to dummy variables without adding an intercept column. More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.

Parameters

fillValue	value to replace NA's with

Returns

a model matrix
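
To illustrate what converting nominal values to dummy variables means, here is a plain Java sketch that one-hot encodes a single nominal column. `DummySketch` is a hypothetical class. It emits one 0/1 column per distinct level, in first-seen order, which is an assumption of this sketch: the library's actual column layout, and whether it drops a reference level, are not described in this section.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of joinery: one-hot encodes a nominal column.
final class DummySketch {
    // Returns a matrix with one row per input value and one column per
    // distinct level; the cell for a row's own level is 1.0, others 0.0.
    static double[][] dummies(List<String> column) {
        List<String> levels = new ArrayList<>(new LinkedHashSet<>(column));
        double[][] matrix = new double[column.size()][levels.size()];
        for (int row = 0; row < column.size(); row++) {
            matrix[row][levels.indexOf(column.get(row))] = 1.0;
        }
        return matrix;
    }
}
```

For the column [a, b, a] the levels are a and b, so the rows encode as [1.0, 0.0], [0.0, 1.0] and [1.0, 0.0].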

public DataFrame<Number> toModelMatrixDataFrame ()

Encode the data frame as a model matrix, converting nominal values to dummy variables without adding an intercept column. More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.

Returns

a model matrix

public String toString ()

public final String toString (int limit)

public DataFrame<U> transform (RowFunction<V, U> transform)

public DataFrame<V> transpose ()

Transpose the rows and columns of the data frame.

 > DataFrame<String> df = new DataFrame<>(
 >         Arrays.asList(
 >                 Arrays.asList("one", "two"),
 >                 Arrays.asList("alpha", "bravo")
 >             )
 >     );
 > df.transpose().flatten();
 [one, alpha, two, bravo]

Returns

a new data frame with the rows and columns transposed
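
The transposition itself can be sketched on nested lists in plain Java. `TransposeSketch` is a hypothetical class, not joinery code, and it assumes rectangular input (every inner list has the same length).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of joinery: transposes a list of lists.
final class TransposeSketch {
    // Element j of inner list i becomes element i of inner list j.
    static <V> List<List<V>> transpose(List<List<V>> data) {
        List<List<V>> result = new ArrayList<>();
        if (data.isEmpty()) {
            return result;
        }
        int innerSize = data.get(0).size();
        for (int j = 0; j < innerSize; j++) {
            List<V> line = new ArrayList<>();
            for (List<V> outer : data) {
                line.add(outer.get(j));
            }
            result.add(line);
        }
        return result;
    }
}
```

Transposing [[one, two], [alpha, bravo]] gives [[one, alpha], [two, bravo]], which flattens to the order shown in the example above.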

public List<Class<?>> types ()

Return the types for each of the data frame columns.

Returns

the list of column types

public DataFrame<V> unique (Integer... cols)

public DataFrame<V> unique ()

public DataFrame<V> unique (Object... cols)

public final DataFrame<V> update (DataFrame<? extends V>... others)

Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.

Parameters

others	the other data frames

Returns

this data frame with the overwritten values
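
The overwrite rule (only non-null values replace existing ones) can be sketched in plain Java. `UpdateSketch` is a hypothetical class; it aligns cells purely by position and handles a single source, which are simplifying assumptions of this sketch rather than documented behavior of the library.

```java
import java.util.List;

// Hypothetical helper, not part of joinery: in-place update from non-null values.
final class UpdateSketch {
    // For each cell present in both inputs, copies the source value into
    // the target when the source value is not null. The target lists must
    // support set().
    static <V> void update(List<List<V>> target, List<List<V>> source) {
        int columns = Math.min(target.size(), source.size());
        for (int c = 0; c < columns; c++) {
            List<V> targetColumn = target.get(c);
            List<V> sourceColumn = source.get(c);
            int rows = Math.min(targetColumn.size(), sourceColumn.size());
            for (int r = 0; r < rows; r++) {
                V value = sourceColumn.get(r);
                if (value != null) {
                    targetColumn.set(r, value);
                }
            }
        }
    }
}
```

Updating the column [1, null, 3] from [null, 2, 30] leaves [1, 2, 30]: the null in the source does not erase the existing 1.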

public DataFrame<V> var ()

public final void writeCsv (String file)

Write the data from this data frame to the specified file as comma separated values.

Parameters

file	the file to write

Throws

IOException	if an error occurs writing the file

public final void writeCsv (OutputStream output)

Write the data from this data frame to the provided output stream as comma separated values.

Throws

IOException

public final void writeSql (PreparedStatement stmt)

Write the data from the data frame to a database by executing the provided prepared SQL statement.

Parameters

stmt	a prepared insert statement

Throws

SQLException	if an error occurs executing the statement

public final void writeSql (Connection c, String sql)

Write the data from the data frame to a database by executing the specified SQL statement.

Parameters

c	the database connection
sql	the SQL statement

Throws

SQLException	if an error occurs executing the statement

public final void writeXls (OutputStream output)

Write the data from the data frame to the provided output stream as an excel workbook.

Throws

IOException	if an error occurs writing to the output stream

public final void writeXls (String file)

Write the data from the data frame to the specified file as an excel workbook.

Parameters

file	the file to write

Throws

IOException	if an error occurs writing the file