public class DataFrame<V>
extends Object
implements Iterable<T>
   ↳ joinery.DataFrame<V>

Class Overview

A data frame implementation in the spirit of Pandas or R data frames.

Below is a simple motivating example. When working in Java, data operations like the following should be easy. The code below retrieves the S&P 500 daily market data for 2008 from Yahoo! Finance and returns the average monthly close for the three top months of the year.

 > DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("gspc.csv"))
 >     .retain("Date", "Close")
 >     .groupBy(row -> Date.class.cast(row.get(0)).getMonth())
 >     .mean()
 >     .sortBy("Close")
 >     .tail(3)
 >     .apply(value -> Number.class.cast(value).intValue())
 >     .col("Close");
 [1370, 1378, 1403] 

Taking each step in turn:

  1. readCsv(String) reads CSV data from files and URLs
  2. retain(Object) is used to eliminate columns that are not needed
  3. groupBy(KeyFunction) with a key function is used to group the rows by month
  4. mean() calculates the average close for each month
  5. sortBy(Object) orders the rows according to average closing price
  6. tail(int) returns the last three rows (alternatively, sort in descending order and use head)
  7. apply(Function) is used to convert the closing prices to integers (this is purely to ease comparisons when verifying the results)
  8. finally, col(Object) is used to extract the values as a list
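
For readers who want to see what the group, mean, sort, and tail steps amount to, here is a minimal sketch using only the JDK. It is not joinery code; the class MonthlyCloseSketch, the Quote type, and the method topMonthlyMeans are hypothetical names introduced purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Plain-JDK sketch of steps 3 to 7 above; all names here are hypothetical.
final class MonthlyCloseSketch {

    // Hypothetical row type: a month number and a closing price.
    static final class Quote {
        final int month;
        final double close;

        Quote(int month, double close) {
            this.month = month;
            this.close = close;
        }
    }

    // Returns the `limit` largest monthly mean closes in ascending order, truncated to int.
    static List<Integer> topMonthlyMeans(List<Quote> quotes, int limit) {
        // groupBy(month) followed by mean()
        Map<Integer, Double> meanByMonth = quotes.stream()
            .collect(Collectors.groupingBy(q -> q.month,
                     Collectors.averagingDouble(q -> q.close)));

        // sortBy("Close"): ascending order of the mean close
        List<Double> sorted = new ArrayList<>(meanByMonth.values());
        Collections.sort(sorted);

        // tail(limit): keep the last `limit` entries
        List<Double> tail = sorted.subList(Math.max(0, sorted.size() - limit), sorted.size());

        // apply(intValue): convert each mean to an integer
        return tail.stream().map(Double::intValue).collect(Collectors.toList());
    }
}
```

The sketch returns a plain list rather than a data frame, which corresponds to the final col("Close") call in the example.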

Find more details on the GitHub project page.


Nested Classes
interface DataFrame.Aggregate<I, O> A function that converts lists of data frame values to aggregate results. 
enum DataFrame.Axis An enumeration of data frame axes. 
interface DataFrame.Function<I, O> A function that is applied to objects (rows or values) in a data frame
enum DataFrame.JoinType An enumeration of join types for joining data frames together. 
interface DataFrame.KeyFunction<I> A function that converts data frame rows to index or group keys. 
enum DataFrame.NumberDefault  
enum DataFrame.PlotType An enumeration of plot types for displaying data frames with charts. 
interface DataFrame.Predicate<I> An interface used to filter a data frame
interface DataFrame.RowFunction<I, O>  
enum DataFrame.SortDirection  
Public Constructors
DataFrame()
Construct an empty data frame.
DataFrame(String... columns)
Construct an empty data frame with the specified columns.
DataFrame(Collection<?> columns)
Construct an empty data frame with the specified columns.
DataFrame(Collection<?> index, Collection<?> columns)
Construct a data frame containing the specified rows and columns.
DataFrame(List<? extends List<? extends V>> data)
Construct a data frame from the specified list of columns.
DataFrame(Collection<?> index, Collection<?> columns, List<? extends List<? extends V>> data)
Construct a new data frame using the specified data and indices.
Public Methods
DataFrame<V> add(Object column, List<V> values)
Add a new column to the data frame containing the values provided.
DataFrame<V> add(Object column, Function<List<V>, V> function)
Add the results of applying a row-wise function to the data frame as a new column.
DataFrame<V> add(List<V> values)
Add the list of values as a new column.
DataFrame<V> add(Object... columns)
Add new columns to the data frame.
<U> DataFrame<V> aggregate(Aggregate<V, U> function)
Apply an aggregate function to each group or the entire data frame if the data is not grouped.
DataFrame<V> append(Object name, List<? extends V> row)
Append rows indexed by the specified name to the data frame.
DataFrame<V> append(Object name, V[] row)
DataFrame<V> append(List<? extends V> row)
Append rows to the data frame.
<U> DataFrame<U> apply(Function<V, U> function)
Apply a function to each value in the data frame.
<T> DataFrame<T> cast(Class<T> cls)
Cast this data frame to the specified type.
final DataFrame<V> coalesce(DataFrame...<? extends V> others)
Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.
List<V> col(Object column)
Return a data frame column as a list.
List<V> col(Integer column)
Return a data frame column as a list.
DataFrame<V> collapse()
Set<Object> columns()
Return the column names for the data frame.
final static <V> DataFrame<String> compare(DataFrame<V> df1, DataFrame<V> df2)
final DataFrame<V> concat(DataFrame...<? extends V> others)
Concatenate the specified data frames with this data frame and return the result.
final DataFrame<V> convert(Class...<? extends V> columnTypes)
Convert columns based on the requested types.
DataFrame<V> convert(DataFrame.NumberDefault numDefault, String naString)
DataFrame<V> convert()
Attempt to infer better types for object columns.
DataFrame<V> count()
DataFrame<Number> cov()
DataFrame<V> cummax()
DataFrame<V> cummin()
DataFrame<V> cumprod()
DataFrame<V> cumsum()
DataFrame<V> describe()
DataFrame<V> diff(int period)
DataFrame<V> diff()
final void draw(Container container, DataFrame.PlotType type)
Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.
final void draw(Container container)
Draw the numeric columns of this data frame as a chart in the specified Container.
DataFrame<V> drop(Integer... cols)
Create a new data frame by leaving out the specified columns.
DataFrame<V> drop(Object... cols)
Create a new data frame by leaving out the specified columns.
DataFrame<V> dropna()
DataFrame<V> dropna(DataFrame.Axis direction)
Map<Object, DataFrame<V>> explode()
Return a map of group names to data frame for grouped data frames.
DataFrame<V> fillna(V fill)
Return a view of the data frame with NA values replaced with fill.
List<V> flatten()
Return the values of the data frame as a flat list.
V get(Integer row, Integer col)
Return the value located by the (row, column) coordinates.
V get(Object row, Object col)
Return the value located by the (row, column) names.
DataFrame<V> groupBy(Object... cols)
Group the data frame rows by the specified column names.
DataFrame<V> groupBy(Integer... cols)
Group the data frame rows by the specified columns.
DataFrame<V> groupBy(KeyFunction<V> function)
Group the data frame rows using the specified key function.
Grouping groups()
DataFrame<V> head(int limit)
Return a data frame containing the first limit rows of this data frame.
DataFrame<V> head()
Return a data frame containing the first ten rows of this data frame.
Set<Object> index()
Return the index names for the data frame.
boolean isEmpty()
Return true if the data frame contains no data.
DataFrame<Boolean> isnull()
Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false.
ListIterator<List<V>> iterator()
Return an iterator over the rows of the data frame.
ListIterator<List<V>> itercols()
ListIterator<Map<Object, V>> itermap()
ListIterator<List<V>> iterrows()
ListIterator<V> itervalues()
final DataFrame<V> join(DataFrame<V> other)
Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.
final DataFrame<V> join(DataFrame<V> other, KeyFunction<V> on)
Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.
final DataFrame<V> join(DataFrame<V> other, DataFrame.JoinType join)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.
final DataFrame<V> join(DataFrame<V> other, DataFrame.JoinType join, KeyFunction<V> on)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.
final DataFrame<V> joinOn(DataFrame<V> other, DataFrame.JoinType join, Integer... cols)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
final DataFrame<V> joinOn(DataFrame<V> other, Integer... cols)
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
final DataFrame<V> joinOn(DataFrame<V> other, Object... cols)
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
final DataFrame<V> joinOn(DataFrame<V> other, DataFrame.JoinType join, Object... cols)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
DataFrame<V> kurt()
int length()
Return the length (number of rows) of the data frame.
final static void main(String[] args)
Entry point to joinery as a command line tool.
Map<V, List<V>> map(Object key, Object value)
Map<Object, List<V>> map()
Return a map of index names to rows.
Map<V, List<V>> map(Integer key, Integer value)
DataFrame<V> max()
DataFrame<V> mean()
Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> median()
final DataFrame<V> merge(DataFrame<V> other, DataFrame.JoinType join)
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.
final DataFrame<V> merge(DataFrame<V> other)
Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.
DataFrame<V> min()
DataFrame<V> mode()
DataFrame<V> nonnumeric()
Return a data frame containing only columns with non-numeric data.
DataFrame<Boolean> notnull()
Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false.
DataFrame<Number> numeric()
Return a data frame containing only columns with numeric data.
DataFrame<V> percentChange()
DataFrame<V> percentChange(int period)
DataFrame<V> percentile(double quantile)
Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> pivot(Object row, Object col, Object... values)
DataFrame<V> pivot(Integer row, Integer col, Integer... values)
DataFrame<V> pivot(Integer[] rows, Integer[] cols, Integer[] values)
<U> DataFrame<U> pivot(KeyFunction<V> rows, KeyFunction<V> cols, Map<Integer, Aggregate<V, U>> values)
DataFrame<V> pivot(List<Object> rows, List<Object> cols, List<Object> values)
final void plot(DataFrame.PlotType type)
Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.
final void plot()
Display the numeric columns of this data frame as a line chart in a new swing frame.
DataFrame<V> prod()
Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.
final static DataFrame<Object> readCsv(String file)
Read the specified csv file and return the data as a data frame.
final static DataFrame<Object> readCsv(InputStream input, String separator, DataFrame.NumberDefault longDefault)
final static DataFrame<Object> readCsv(InputStream input, String separator)
final static DataFrame<Object> readCsv(String file, String separator, DataFrame.NumberDefault longDefault, String naString)
final static DataFrame<Object> readCsv(InputStream input)
Read csv records from an input stream and return the data as a data frame.
final static DataFrame<Object> readCsv(String file, String separator, DataFrame.NumberDefault longDefault)
final static DataFrame<Object> readCsv(String file, String separator, DataFrame.NumberDefault numberDefault, String naString, boolean hasHeader)
final static DataFrame<Object> readCsv(InputStream input, String separator, String naString, boolean hasHeader)
final static DataFrame<Object> readCsv(String file, String separator)
final static DataFrame<Object> readCsv(String file, String separator, String naString, boolean hasHeader)
final static DataFrame<Object> readCsv(InputStream input, String separator, String naString)
final static DataFrame<Object> readSql(ResultSet rs)
Read data from the provided query results into a new data frame.
final static DataFrame<Object> readSql(Connection c, String sql)
Execute the SQL query and return the results as a new data frame.
final static DataFrame<Object> readXls(String file)
Read data from the specified excel workbook into a new data frame.
final static DataFrame<Object> readXls(InputStream input)
Read data from the input stream as an excel workbook into a new data frame.
DataFrame<V> reindex(Integer col, boolean drop)
Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.
DataFrame<V> reindex(Integer... cols)
Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.
DataFrame<V> reindex(Integer[] cols, boolean drop)
Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.
DataFrame<V> reindex(Object... cols)
Re-index the rows of the data frame using the specified column names and removing the columns from the data.
DataFrame<V> reindex(Object[] cols, boolean drop)
Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.
DataFrame<V> reindex(Object col, boolean drop)
Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.
DataFrame<V> rename(Map<Object, Object> names)
DataFrame<V> rename(Object old, Object name)
DataFrame<V> resetIndex()
Return a new data frame with the default index; row names will be reset to the string value of their integer index.
DataFrame<V> reshape(Integer rows, Integer cols)
Reshape a data frame to the specified dimensions.
DataFrame<V> reshape(Collection<?> rows, Collection<?> cols)
Reshape a data frame to the specified indices.
DataFrame<V> retain(Integer... cols)
Create a new data frame containing only the specified columns.
DataFrame<V> retain(Object... cols)
Create a new data frame containing only the specified columns.
DataFrame<V> rollapply(Function<List<V>, V> function)
DataFrame<V> rollapply(Function<List<V>, V> function, int period)
List<V> row(Object row)
Return a data frame row as a list.
List<V> row(Integer row)
Return a data frame row as a list.
DataFrame<V> select(Predicate<V> predicate)
Select a subset of the data frame using a predicate function.
void set(Object row, Object col, V value)
Set the value located by the names (row, column).
void set(Integer row, Integer col, V value)
Set the value located by the coordinates (row, column).
final void show()
int size()
Return the size (number of columns) of the data frame.
DataFrame<V> skew()
DataFrame<V> slice(Integer rowStart, Integer rowEnd, Integer colStart, Integer colEnd)
DataFrame<V> slice(Integer rowStart, Integer rowEnd)
DataFrame<V> slice(Object rowStart, Object rowEnd)
DataFrame<V> slice(Object rowStart, Object rowEnd, Object colStart, Object colEnd)
DataFrame<V> sortBy(Comparator<List<V>> comparator)
DataFrame<V> sortBy(Integer... cols)
DataFrame<V> sortBy(Object... cols)
DataFrame<V> stddev()
Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> sum()
Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.
DataFrame<V> tail()
Return a data frame containing the last ten rows of this data frame.
DataFrame<V> tail(int limit)
Return a data frame containing the last limit rows of this data frame.
Object[] toArray()
Copy the values contained in the data frame into a flat array of length size() * length().
<U> U[][] toArray(U[][] array)
<U> U toArray(Class<U> cls)
Copy the values contained in the data frame into an array of the specified type.
<U> U[] toArray(U[] array)
Copy the values contained in the data frame into the specified array.
double[][] toModelMatrix(double fillValue)
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables but does not add an intercept column.
DataFrame<Number> toModelMatrixDataFrame()
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables but does not add an intercept column.
String toString()
final String toString(int limit)
<U> DataFrame<U> transform(RowFunction<V, U> transform)
DataFrame<V> transpose()
Transpose the rows and columns of the data frame.
List<Class<?>> types()
Return the types for each of the data frame columns.
DataFrame<V> unique(Integer... cols)
DataFrame<V> unique()
DataFrame<V> unique(Object... cols)
final DataFrame<V> update(DataFrame...<? extends V> others)
Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.
DataFrame<V> var()
final void writeCsv(String file)
Write the data from this data frame to the specified file as comma separated values.
final void writeCsv(OutputStream output)
Write the data from this data frame to the provided output stream as comma separated values.
final void writeSql(PreparedStatement stmt)
Write the data from the data frame to a database by executing the provided prepared SQL statement.
final void writeSql(Connection c, String sql)
Write the data from the data frame to a database by executing the specified SQL statement.
final void writeXls(OutputStream output)
Write the data from the data frame to the provided output stream as an excel workbook.
final void writeXls(String file)
Write the data from the data frame to the specified file as an excel workbook.
Inherited Methods
From class java.lang.Object
From interface java.lang.Iterable

Public Constructors

public DataFrame ()

Construct an empty data frame.

 > DataFrame<Object> df = new DataFrame<>();
 > df.isEmpty();

public DataFrame (String... columns)

Construct an empty data frame with the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.columns();
 [name, value] 

columns the data frame column names.

public DataFrame (Collection<?> columns)

Construct an empty data frame with the specified columns.

 > List<String> columns = new ArrayList<>();
 > columns.add("name");
 > columns.add("value");
 > DataFrame<Object> df = new DataFrame<>(columns);
 > df.columns();
 [name, value] 

columns the data frame column names.

public DataFrame (Collection<?> index, Collection<?> columns)

Construct a data frame containing the specified rows and columns.

 > List<String> rows = Arrays.asList("row1", "row2", "row3");
 > List<String> columns = Arrays.asList("col1", "col2");
 > DataFrame<Object> df = new DataFrame<>(rows, columns);
 > df.get("row1", "col1");

index the row names
columns the column names

public DataFrame (List<? extends List<? extends V>> data)

Construct a data frame from the specified list of columns.

 > List<List<Object>> data = Arrays.asList(
 >       Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >       Arrays.<Object>asList(1, 2, 3)
 > );
 > DataFrame<Object> df = new DataFrame<>(data);
 > df.row(0);
 [alpha, 1] 

data a list of columns containing the data elements.

public DataFrame (Collection<?> index, Collection<?> columns, List<? extends List<? extends V>> data)

Construct a new data frame using the specified data and indices.

index the row names
columns the column names
data the data

Public Methods

public DataFrame<V> add (Object column, List<V> values)

Add a new column to the data frame containing the values provided. Any existing rows with indices greater than the size of the specified column data will have null values for the new column.

 > DataFrame<Object> df = new DataFrame<>();
 > df.add("value", Arrays.<Object>asList(1));
 > df.columns();

column the new column name
values the new column values
  • the data frame with the column added
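
The null-padding rule described above can be pictured with a small JDK-only sketch. This is not joinery's implementation; PadColumnSketch and padToLength are hypothetical names, and the sketch assumes the data frame already has rowCount rows.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating how a short column is padded with nulls.
final class PadColumnSketch {

    // Returns a copy of `values` extended with nulls until it has `rowCount` entries.
    static <V> List<V> padToLength(List<V> values, int rowCount) {
        List<V> padded = new ArrayList<>(values);
        while (padded.size() < rowCount) {
            padded.add(null);  // rows beyond the supplied values get null
        }
        return padded;
    }
}
```

With three existing rows and a single supplied value, the new column would hold that value followed by two nulls.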

public DataFrame<V> add (Object column, Function<List<V>, V> function)

Add the results of applying a row-wise function to the data frame as a new column.

column the new column name
function the function to compute the new column values
  • the data frame with the column added
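
As a rough illustration of a row-wise function producing a column, the following JDK-only sketch applies a function to each row and collects the results. It uses java.util.function.Function as a stand-in for DataFrame.Function, and the names RowWiseColumnSketch and computeColumn are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: derive one new value per row.
final class RowWiseColumnSketch {

    // Applies `function` to every row and collects the results as the values of a new column.
    static <V> List<V> computeColumn(List<List<V>> rows, Function<List<V>, V> function) {
        List<V> column = new ArrayList<>(rows.size());
        for (List<V> row : rows) {
            column.add(function.apply(row));
        }
        return column;
    }
}
```

For example, passing row -> row.get(0) + row.get(1) over rows of integers would yield a column of row sums.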

public DataFrame<V> add (List<V> values)

Add the list of values as a new column.

values the new column values
  • the data frame with the column added

public DataFrame<V> add (Object... columns)

Add new columns to the data frame. Any existing rows will have null values for the new columns.

 > DataFrame<Object> df = new DataFrame<>();
 > df.add("value");
 > df.columns();

columns the new column names
  • the data frame with the columns added

public DataFrame<V> aggregate (Aggregate<V, U> function)

Apply an aggregate function to each group or the entire data frame if the data is not grouped.

function the aggregate function
  • the new data frame

public DataFrame<V> append (Object name, List<? extends V> row)

Append rows indexed by the specified name to the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append("row1", Arrays.asList("alpha", 1));
 > df.append("row2", Arrays.asList("bravo", 2));
 > df.index();
 [row1, row2] 

name the row name to add to the index
row the row to append
  • the data frame with the new data appended

public DataFrame<V> append (Object name, V[] row)

public DataFrame<V> append (List<? extends V> row)

Append rows to the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("alpha", 1));
 > df.append(Arrays.asList("bravo", 2));
 > df.length();

row the row to append
  • the data frame with the new data appended

public DataFrame<U> apply (Function<V, U> function)

Apply a function to each value in the data frame.

 > DataFrame<Number> df = new DataFrame<>(
 >         Arrays.<List<Number>>asList(
 >                 Arrays.<Number>asList(1, 2),
 >                 Arrays.<Number>asList(3, 4)
 >             )
 >     );
 > df = df.apply(new Function<Number, Number>() {
 >         public Number apply(Number value) {
 >             return value.intValue() * value.intValue();
 >         }
 >     });
 > df.flatten();
 [1, 4, 9, 16] 

function the function to apply
  • a new data frame with the function results

public DataFrame<T> cast (Class<T> cls)

Cast this data frame to the specified type.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", "1"));
 > DataFrame<String> dfs = df.cast(String.class);
 > dfs.get(0, 0).getClass().getName();

  • the data frame cast to the specified type

public final DataFrame<V> coalesce (DataFrame...<? extends V> others)

Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.

others the other data frames
  • this data frame with the overwritten values
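
The idea of filling nulls in place from other sources can be sketched on a single column with plain JDK lists. This is an illustration only: CoalesceSketch is a hypothetical name, and taking the first non-null candidate in argument order is an assumption, not documented joinery behavior.

```java
import java.util.List;

// Hypothetical single-column illustration of coalescing null values.
final class CoalesceSketch {

    // Fills null entries of `target` in place with the first non-null value
    // found at the same position in `others` (argument order is an assumption).
    @SafeVarargs
    static <V> List<V> coalesce(List<V> target, List<? extends V>... others) {
        for (int i = 0; i < target.size(); i++) {
            if (target.get(i) != null) {
                continue;  // existing non-null values are never overwritten
            }
            for (List<? extends V> other : others) {
                if (i < other.size() && other.get(i) != null) {
                    target.set(i, other.get(i));
                    break;
                }
            }
        }
        return target;
    }
}
```

The target list must support set, as lists created with Arrays.asList or ArrayList do.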

public List<V> col (Object column)

Return a data frame column as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.col("value");
 [1, 2, 3] 

column the column name
  • the list of values

public List<V> col (Integer column)

Return a data frame column as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.col(1);
 [1, 2, 3] 

column the column index
  • the list of values

public DataFrame<V> collapse ()

public Set<Object> columns ()

Return the column names for the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.columns();
 [name, value] 

  • the column names

public static final DataFrame<String> compare (DataFrame<V> df1, DataFrame<V> df2)

public final DataFrame<V> concat (DataFrame...<? extends V> others)

Concatenate the specified data frames with this data frame and return the result.

 > DataFrame<Object> left = new DataFrame<>("a", "b", "c");
 > left.append("one", Arrays.asList(1, 2, 3));
 > left.append("two", Arrays.asList(4, 5, 6));
 > left.append("three", Arrays.asList(7, 8, 9));
 > DataFrame<Object> right = new DataFrame<>("a", "b", "d");
 > right.append("one", Arrays.asList(10, 20, 30));
 > right.append("two", Arrays.asList(40, 50, 60));
 > right.append("four", Arrays.asList(70, 80, 90));
 > left.concat(right).length();

others the other data frames
  • the data frame containing all the values

public final DataFrame<V> convert (Class...<? extends V> columnTypes)

Convert columns based on the requested types.

Note, the conversion process replaces existing values with values of the converted type.

 > DataFrame<Object> df = new DataFrame<>("a", "b", "c");
 > df.append(Arrays.asList("one", 1, 1.0));
 > df.append(Arrays.asList("two", 2, 2.0));
 > df.convert(
 >     null,         // leave column "a" as is
 >     Long.class,   // convert column "b" to Long
 >     Number.class  // convert column "c" to Double
 > );
 > df.types();
 [class java.lang.String, class java.lang.Long, class java.lang.Double] 

  • the data frame with the converted values

public DataFrame<V> convert (DataFrame.NumberDefault numDefault, String naString)

public DataFrame<V> convert ()

Attempt to infer better types for object columns.

The following conversions are performed where applicable:

  • Floating point numbers are converted to Double values
  • Whole numbers are converted to Long values
  • True, false, yes, and no are converted to Boolean values
  • Date strings in the following formats are converted to Date values:
    2000-01-01T00:00:00+1, 2000-01-01T00:00:00EST, 2000-01-01
  • Time strings in the following formats are converted to Date values:
    2000/01/01, 1/01/2000, 12:01:01 AM, 23:01:01, 12:01 AM, 23:01

Note, the conversion process replaces existing values with values of the converted type.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "date");
 > df.append(Arrays.asList("one", "1", new Date()));
 > df.convert();
 > df.types();
 [class java.lang.String, class java.lang.Long, class java.util.Date] 

  • the data frame with the converted values
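
A simplified, JDK-only sketch of this kind of type inference follows. It covers only the numeric and boolean rules from the list above and leaves out date and time parsing; InferTypeSketch and infer are hypothetical names, and the order in which the rules are tried is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration of inferring a better type for a string value.
final class InferTypeSketch {
    private static final List<String> TRUE_WORDS = Arrays.asList("true", "yes");
    private static final List<String> FALSE_WORDS = Arrays.asList("false", "no");

    // Converts a string to Boolean, Long, or Double where possible; otherwise returns it unchanged.
    static Object infer(String text) {
        String trimmed = text.trim();
        String lower = trimmed.toLowerCase(Locale.ROOT);
        if (TRUE_WORDS.contains(lower)) {
            return Boolean.TRUE;
        }
        if (FALSE_WORDS.contains(lower)) {
            return Boolean.FALSE;
        }
        try {
            return Long.valueOf(trimmed);      // whole numbers
        } catch (NumberFormatException notWhole) {
            // not a whole number, try floating point next
        }
        try {
            return Double.valueOf(trimmed);    // floating point numbers
        } catch (NumberFormatException notNumeric) {
            return text;                       // leave anything else as is
        }
    }
}
```

Trying Long before Double keeps whole numbers from being widened to floating point, which matches the Long result shown for "1" in the example above.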

public DataFrame<V> count ()

public DataFrame<Number> cov ()

public DataFrame<V> cummax ()

public DataFrame<V> cummin ()

public DataFrame<V> cumprod ()

public DataFrame<V> cumsum ()

public DataFrame<V> describe ()

public DataFrame<V> diff (int period)

public DataFrame<V> diff ()

public final void draw (Container container, DataFrame.PlotType type)

Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.

container the container to use for the chart
type the type of plot to draw

public final void draw (Container container)

Draw the numeric columns of this data frame as a chart in the specified Container.

container the container to use for the chart

public DataFrame<V> drop (Integer... cols)

Create a new data frame by leaving out the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.drop(2).columns();
 [name, value] 

cols the indices of the columns to be removed
  • a shallow copy of the data frame with the columns removed

public DataFrame<V> drop (Object... cols)

Create a new data frame by leaving out the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.drop("category").columns();
 [name, value] 

cols the names of columns to be removed
  • a shallow copy of the data frame with the columns removed

public DataFrame<V> dropna ()

public DataFrame<V> dropna (DataFrame.Axis direction)

public Map<Object, DataFrame<V>> explode ()

Return a map of group names to data frames for grouped data frames. Note that this method only has an effect if groupBy has been called first.

  • a map of group names to data frames

public DataFrame<V> fillna (V fill)

Return a view of the data frame with NA values replaced with fill.

fill the value used to replace missing values
  • the new data frame
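
The replacement rule can be illustrated on a single column with the JDK alone. Note that this hypothetical sketch (FillNaSketch, fillNa) builds a copy, whereas the method above is described as returning a view.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-column illustration of replacing missing values.
final class FillNaSketch {

    // Returns a new list in which every null entry is replaced with `fill`.
    static <V> List<V> fillNa(List<V> values, V fill) {
        List<V> result = new ArrayList<>(values.size());
        for (V value : values) {
            result.add(value == null ? fill : value);
        }
        return result;
    }
}
```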

public List<V> flatten ()

Return the values of the data frame as a flat list.

 > DataFrame<String> df = new DataFrame<>(
 >         Arrays.asList(
 >                 Arrays.asList("one", "two"),
 >                 Arrays.asList("alpha", "bravo")
 >             )
 >     );
 > df.flatten();
 [one, two, alpha, bravo] 

  • the list of values

public V get (Integer row, Integer col)

Return the value located by the (row, column) coordinates.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.get(1, 0);

row the row index
col the column index
  • the value

public V get (Object row, Object col)

Return the value located by the (row, column) names.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Arrays.asList("row1", "row2", "row3"),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.get("row2", "name");

row the row name
col the column name
  • the value

public DataFrame<V> groupBy (Object... cols)

Group the data frame rows by the specified column names.

cols the column names
  • the grouped data frame

public DataFrame<V> groupBy (Integer... cols)

Group the data frame rows by the specified columns.

cols the column indices
  • the grouped data frame

public DataFrame<V> groupBy (KeyFunction<V> function)

Group the data frame rows using the specified key function.

function the function to reduce rows to grouping keys
  • the grouped data frame
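
Conceptually, a key function maps each row to a grouping key, and rows with equal keys end up in the same group. The sketch below shows that idea with plain JDK collections. It uses java.util.function.Function in place of KeyFunction, the names GroupBySketch and groupBy are hypothetical, and keeping groups in first-seen order is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration of grouping rows by a computed key.
final class GroupBySketch {

    // Groups rows by the key computed from each row, keeping groups in first-seen order.
    static <V> Map<Object, List<List<V>>> groupBy(List<List<V>> rows,
                                                  Function<List<V>, Object> keyFunction) {
        Map<Object, List<List<V>>> groups = new LinkedHashMap<>();
        for (List<V> row : rows) {
            Object key = keyFunction.apply(row);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

Grouping by row -> row.get(0), for instance, collects rows that share the same first value.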

public Grouping groups ()

public DataFrame<V> head (int limit)

Return a data frame containing the first limit rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.head(3)
 >   .col("value");
 [0, 1, 2] 

limit the number of rows to include in the result
  • the new data frame

public DataFrame<V> head ()

Return a data frame containing the first ten rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.head()
 >   .col("value");
 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 

  • the new data frame

public Set<Object> index ()

Return the index names for the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append("row1", Arrays.asList("one", 1));
 > df.index();

  • the index names

public boolean isEmpty ()

Return true if the data frame contains no data.

 > DataFrame<Object> df = new DataFrame<>();
 > df.isEmpty();

  • true if the data frame contains no data

public DataFrame<Boolean> isnull ()

Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", null),
 >         Arrays.asList(null, 2, 3)
 >     )
 > );
 > df.isnull().row(0);
 [false, true] 

  • the new boolean data frame
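
The mapping from values to a boolean mask can be shown for a single row using only the JDK. NullMaskSketch and isNull are hypothetical names used for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of building a null mask for one row or column.
final class NullMaskSketch {

    // Returns true for each null entry and false for each non-null entry.
    static List<Boolean> isNull(List<?> values) {
        List<Boolean> mask = new ArrayList<>(values.size());
        for (Object value : values) {
            mask.add(value == null);
        }
        return mask;
    }
}
```

Applied to the first row of the example above, which holds "alpha" and null, the mask is false followed by true.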

public ListIterator<List<V>> iterator ()

Return an iterator over the rows of the data frame. Also used implicitly with foreach loops.

 > DataFrame<Integer> df = new DataFrame<>(
 >         Arrays.asList(
 >             Arrays.asList(1, 2),
 >             Arrays.asList(3, 4)
 >         )
 >     );
 > List<Integer> results = new ArrayList<>();
 > for (List<Integer> row : df)
 >     results.add(row.get(0));
 > results;
 [1, 2] 

  • an iterator over the rows of the data frame.

public ListIterator<List<V>> itercols ()

public ListIterator<Map<Object, V>> itermap ()

public ListIterator<List<V>> iterrows ()

public ListIterator<V> itervalues ()

public final DataFrame<V> join (DataFrame<V> other)

Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.

 > DataFrame<Object> left = new DataFrame<>("a", "b");
 > left.append("one", Arrays.asList(1, 2));
 > left.append("two", Arrays.asList(3, 4));
 > left.append("three", Arrays.asList(5, 6));
 > DataFrame<Object> right = new DataFrame<>("c", "d");
 > right.append("one", Arrays.asList(10, 20));
 > right.append("two", Arrays.asList(30, 40));
 > right.append("four", Arrays.asList(50, 60));
 > left.join(right)
 >     .index();
 [one, two, three] 

other the other data frame
  • the result of the join operation as a new data frame
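
To make the left outer join semantics concrete, here is a minimal sketch that does not use joinery. It assumes rows are held in insertion-ordered maps keyed by index name; LeftJoinSketch, leftJoin and the rightWidth parameter are hypothetical names introduced for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery: left outer join of two row maps
// using the map keys (row index names) as the join key.
final class LeftJoinSketch {
    static Map<String, List<Object>> leftJoin(
            Map<String, List<Object>> left,
            Map<String, List<Object>> right,
            int rightWidth) {
        Map<String, List<Object>> joined = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> entry : left.entrySet()) {
            List<Object> row = new ArrayList<>(entry.getValue());
            List<Object> match = right.get(entry.getKey());
            if (match != null) {
                row.addAll(match);
            } else {
                // every left row is kept; unmatched rows are padded with nulls
                row.addAll(Collections.<Object>nCopies(rightWidth, null));
            }
            joined.put(entry.getKey(), row);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> left = new LinkedHashMap<>();
        left.put("one", Arrays.<Object>asList(1, 2));
        left.put("two", Arrays.<Object>asList(3, 4));
        left.put("three", Arrays.<Object>asList(5, 6));
        Map<String, List<Object>> right = new LinkedHashMap<>();
        right.put("one", Arrays.<Object>asList(10, 20));
        right.put("two", Arrays.<Object>asList(30, 40));
        right.put("four", Arrays.<Object>asList(50, 60));

        Map<String, List<Object>> joined = leftJoin(left, right, 2);
        System.out.println(joined.keySet());     // [one, two, three]
        System.out.println(joined.get("three")); // [5, 6, null, null]
    }
}
```

The right-only key "four" does not appear in the result, which matches the index shown in the example above.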

public final DataFrame<V> join (DataFrame<V> other, KeyFunction<V> on)

Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.

other the other data frame
on the function to generate the join keys
  • the result of the join operation as a new data frame

public final DataFrame<V> join (DataFrame<V> other, DataFrame.JoinType join)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.

other the other data frame
join the join type
  • the result of the join operation as a new data frame

public final DataFrame<V> join (DataFrame<V> other, DataFrame.JoinType join, KeyFunction<V> on)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.

other the other data frame
join the join type
on the function to generate the join keys
  • the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, DataFrame.JoinType join, Integer... cols)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.

other the other data frame
join the join type
cols the indices of the columns to use as the join key
  • the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, Integer... cols)

Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.

other the other data frame
cols the indices of the columns to use as the join key
  • the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, Object... cols)

Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.

other the other data frame
cols the names of the columns to use as the join key
  • the result of the join operation as a new data frame

public final DataFrame<V> joinOn (DataFrame<V> other, DataFrame.JoinType join, Object... cols)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.

other the other data frame
join the join type
cols the names of the columns to use as the join key
  • the result of the join operation as a new data frame

public DataFrame<V> kurt ()

public int length ()

Return the length (number of rows) of the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("alpha", 1));
 > df.append(Arrays.asList("bravo", 2));
 > df.append(Arrays.asList("charlie", 3));
 > df.length();

  • the number of rows

public static final void main (String[] args)

Entry point to joinery as a command line tool. The available commands are:

display the specified data frame as a swing table
display the specified data frame as a chart
merge the specified data frames and output the result
launch an interactive javascript shell for exploring data

args file paths or urls of csv input data
IOException if an error occurs reading input

public Map<V, List<V>> map (Object key, Object value)

public Map<Object, List<V>> map ()

Return a map of index names to rows.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("alpha", Arrays.asList(1));
 > df.append("bravo", Arrays.asList(2));
 > df.map();
 {alpha=[1], bravo=[2]}

  • a map of index names to rows.

public Map<V, List<V>> map (Integer key, Integer value)

public DataFrame<V> max ()

public DataFrame<V> mean ()

Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("one", Arrays.asList(1));
 > df.append("two", Arrays.asList(5));
 > df.append("three", Arrays.asList(3));
 > df.append("four",  Arrays.asList(7));
 > df.mean().col(0);

  • the new data frame

public DataFrame<V> median ()

public final DataFrame<V> merge (DataFrame<V> other, DataFrame.JoinType join)

Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.

other the other data frame
join the join type
  • the result of the merge operation as a new data frame

public final DataFrame<V> merge (DataFrame<V> other)

Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.

other the other data frame
  • the result of the merge operation as a new data frame

public DataFrame<V> min ()

public DataFrame<V> mode ()

public DataFrame<V> nonnumeric ()

Return a data frame containing only columns with non-numeric data.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", 1));
 > df.append(Arrays.asList("two", 2));
 > df.nonnumeric().columns();

  • a data frame containing only the non-numeric columns

public DataFrame<Boolean> notnull ()

Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false.

 > DataFrame<Object> df = new DataFrame<>(
 >     Arrays.asList(
 >         Arrays.<Object>asList("alpha", "bravo", null),
 >         Arrays.<Object>asList(null, 2, 3)
 >     )
 > );
 > df.notnull().row(0);
 [true, false] 

  • the new boolean data frame

public DataFrame<Number> numeric ()

Return a data frame containing only columns with numeric data.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.append(Arrays.asList("one", 1));
 > df.append(Arrays.asList("two", 2));
 > df.numeric().columns();

  • a data frame containing only the numeric columns
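
One way to picture the column filter is sketched below in plain Java. How joinery itself decides that a column is numeric is not stated here, so the rule used in this sketch (every non-null value is a java.lang.Number) is an assumption, and NumericColumnsSketch and numericColumns are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery: picks the names of columns whose
// non-null values are all instances of Number (an assumed rule).
final class NumericColumnsSketch {
    static List<String> numericColumns(Map<String, List<Object>> columns) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, List<Object>> entry : columns.entrySet()) {
            boolean numeric = true;
            for (Object value : entry.getValue()) {
                if (value != null && !(value instanceof Number)) {
                    numeric = false;
                    break;
                }
            }
            // note: a column with no non-null values counts as numeric here
            if (numeric) {
                names.add(entry.getKey());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> columns = new LinkedHashMap<>();
        columns.put("name", Arrays.<Object>asList("one", "two"));
        columns.put("value", Arrays.<Object>asList(1, 2));
        System.out.println(numericColumns(columns)); // [value]
    }
}
```

The complement of this selection corresponds to what nonnumeric() describes.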

public DataFrame<V> percentChange ()

public DataFrame<V> percentChange (int period)

public DataFrame<V> percentile (double quantile)

Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > df.append("one", Arrays.asList(1));
 > df.append("two", Arrays.asList(5));
 > df.append("three", Arrays.asList(3));
 > df.append("four",  Arrays.asList(7));
 > df.mean().col(0);

  • the new data frame

public DataFrame<V> pivot (Object row, Object col, Object... values)

public DataFrame<V> pivot (Integer row, Integer col, Integer... values)

public DataFrame<V> pivot (Integer[] rows, Integer[] cols, Integer[] values)

public DataFrame<U> pivot (KeyFunction<V> rows, KeyFunction<V> cols, Map<Integer, Aggregate<V, U>> values)

public DataFrame<V> pivot (List<Object> rows, List<Object> cols, List<Object> values)

public final void plot (DataFrame.PlotType type)

Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.plot(PlotType.AREA);

type the type of plot to display

public final void plot ()

Display the numeric columns of this data frame as a line chart in a new swing frame.

 > DataFrame<Object> df = new DataFrame<Object>(
 >     Collections.emptyList(),
 >     Arrays.asList("name", "value"),
 >     Arrays.asList(
 >         Arrays.asList("alpha", "bravo", "charlie"),
 >         Arrays.asList(10, 20, 30)
 >     )
 > );
 > df.plot();

public DataFrame<V> prod ()

Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 5)
 >             )
 >     );
 > df.groupBy("name")
 >   .prod()
 >   .col("value");
 [6.0, 20.0] 

  • the new data frame

public static final DataFrame<Object> readCsv (String file)

Read the specified csv file and return the data as a data frame.

file the csv file
  • a new data frame
IOException if an error reading the file occurs

public static final DataFrame<Object> readCsv (InputStream input, String separator, DataFrame.NumberDefault longDefault)


public static final DataFrame<Object> readCsv (InputStream input, String separator)


public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault longDefault, String naString)


public static final DataFrame<Object> readCsv (InputStream input)

Read csv records from an input stream and return the data as a data frame.

input the input stream
  • a new data frame
IOException if an error reading the stream occurs

public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault longDefault)


public static final DataFrame<Object> readCsv (String file, String separator, DataFrame.NumberDefault numberDefault, String naString, boolean hasHeader)


public static final DataFrame<Object> readCsv (InputStream input, String separator, String naString, boolean hasHeader)


public static final DataFrame<Object> readCsv (String file, String separator)


public static final DataFrame<Object> readCsv (String file, String separator, String naString, boolean hasHeader)


public static final DataFrame<Object> readCsv (InputStream input, String separator, String naString)


public static final DataFrame<Object> readSql (ResultSet rs)

Read data from the provided query results into a new data frame.

rs the query results
  • a new data frame
SQLException if an error occurs reading the results

public static final DataFrame<Object> readSql (Connection c, String sql)

Execute the SQL query and return the results as a new data frame.

 > Connection c = DriverManager.getConnection("jdbc:derby:memory:testdb;create=true");
 > c.createStatement().executeUpdate("create table data (a varchar(8), b int)");
 > c.createStatement().executeUpdate("insert into data values ('test', 1)");
 > DataFrame.readSql(c, "select * from data").flatten();
 [test, 1] 

c the database connection
sql the SQL query
  • a new data frame
SQLException if an error occurs executing the query

public static final DataFrame<Object> readXls (String file)

Read data from the specified excel workbook into a new data frame.

file the excel workbook
  • a new data frame
IOException if an error occurs reading the workbook

public static final DataFrame<Object> readXls (InputStream input)

Read data from the input stream as an excel workbook into a new data frame.

input the input stream
  • a new data frame
IOException if an error occurs reading the input stream

public DataFrame<V> reindex (Integer col, boolean drop)

Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex(0, true)
 >   .index();
 [alpha, bravo] 

col the column to use as the new index
drop true to remove the index column from the data, false otherwise
  • a new data frame with index specified
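
The effect of the drop flag is easiest to see on plain row lists. This is a minimal sketch that does not call joinery; ReindexSketch and its reindex method are hypothetical, and the handling of duplicate keys (later rows overwrite earlier ones) is a simplification of the sketch, not a statement about the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery: re-keys rows by the value found
// in one column, optionally removing that column from the row data.
final class ReindexSketch {
    static Map<Object, List<Object>> reindex(List<List<Object>> rows, int col, boolean drop) {
        Map<Object, List<Object>> result = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            List<Object> copy = new ArrayList<>(row);
            // remove(int) returns the removed element, so it doubles as the key
            Object key = drop ? copy.remove(col) : copy.get(col);
            result.put(key, copy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = Arrays.asList(
            Arrays.<Object>asList("alpha", 1),
            Arrays.<Object>asList("bravo", 2)
        );
        System.out.println(reindex(rows, 0, true));  // {alpha=[1], bravo=[2]}
        System.out.println(reindex(rows, 0, false)); // {alpha=[alpha, 1], bravo=[bravo, 2]}
    }
}
```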

public DataFrame<V> reindex (Integer... cols)

Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex(0)
 >   .index();
 [alpha, bravo] 

cols the columns to use as the new index
  • a new data frame with index specified

public DataFrame<V> reindex (Integer[] cols, boolean drop)

Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two", "three");
 > df.append("a", Arrays.asList("alpha", 1, 10));
 > df.append("b", Arrays.asList("bravo", 2, 20));
 > df.reindex(new Integer[] { 0, 1 }, true)
 >   .index();
 [[alpha, 1], [bravo, 2]] 

cols the columns to use as the new index
drop true to remove the index columns from the data, false otherwise
  • a new data frame with index specified

public DataFrame<V> reindex (Object... cols)

Re-index the rows of the data frame using the specified column names and removing the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex("one")
 >   .index();
 [alpha, bravo] 

cols the columns to use as the new index
  • a new data frame with index specified

public DataFrame<V> reindex (Object[] cols, boolean drop)

Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two", "three");
 > df.append("a", Arrays.asList("alpha", 1, 10));
 > df.append("b", Arrays.asList("bravo", 2, 20));
 > df.reindex(new String[] { "one", "two" }, true)
 >   .index();
 [[alpha, 1], [bravo, 2]] 

cols the columns to use as the new index
drop true to remove the index columns from the data, false otherwise
  • a new data frame with index specified

public DataFrame<V> reindex (Object col, boolean drop)

Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.reindex("one", true)
 >   .index();
 [alpha, bravo] 

col the column to use as the new index
drop true to remove the index column from the data, false otherwise
  • a new data frame with index specified

public DataFrame<V> rename (Map<Object, Object> names)

public DataFrame<V> rename (Object old, Object name)

public DataFrame<V> resetIndex ()

Return a new data frame with the default index; row names are reset to the string value of their integer index.

 > DataFrame<Object> df = new DataFrame<>("one", "two");
 > df.append("a", Arrays.asList("alpha", 1));
 > df.append("b", Arrays.asList("bravo", 2));
 > df.resetIndex()
 >   .index();
 [0, 1] 

  • a new data frame with the default index.

public DataFrame<V> reshape (Integer rows, Integer cols)

Reshape a data frame to the specified dimensions.

 > DataFrame<Object> df = new DataFrame<>("0", "1", "2");
 > df.append("0", Arrays.asList(10, 20, 30));
 > df.append("1", Arrays.asList(40, 50, 60));
 > df.reshape(3, 2)
 >   .length();

rows the number of rows the new data frame will contain
cols the number of columns the new data frame will contain
  • a new data frame with the specified dimensions

public DataFrame<V> reshape (Collection<?> rows, Collection<?> cols)

Reshape a data frame to the specified indices.

 > DataFrame<Object> df = new DataFrame<>("0", "1", "2");
 > df.append("0", Arrays.asList(10, 20, 30));
 > df.append("1", Arrays.asList(40, 50, 60));
 > df.reshape(Arrays.asList("0", "1", "2"), Arrays.asList("0", "1"))
 >   .length();

rows the names of rows the new data frame will contain
cols the names of columns the new data frame will contain
  • a new data frame with the specified indices

public DataFrame<V> retain (Integer... cols)

Create a new data frame containing only the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.retain(0, 2).columns();
 [name, category] 

cols the columns to include in the new data frame
  • a new data frame containing only the specified columns

public DataFrame<V> retain (Object... cols)

Create a new data frame containing only the specified columns.

 > DataFrame<Object> df = new DataFrame<>("name", "value", "category");
 > df.retain("name", "category").columns();
 [name, category] 

cols the columns to include in the new data frame
  • a new data frame containing only the specified columns

public DataFrame<V> rollapply (Function<List<V>, V> function)

public DataFrame<V> rollapply (Function<List<V>, V> function, int period)

public List<V> row (Object row)

Return a data frame row as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Arrays.asList("row1", "row2", "row3"),
 >         Collections.emptyList(),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.row("row2");
 [bravo, 2] 

row the row name
  • the list of values

public List<V> row (Integer row)

Return a data frame row as a list.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Collections.emptyList(),
 >         Arrays.asList(
 >             Arrays.<Object>asList("alpha", "bravo", "charlie"),
 >             Arrays.<Object>asList(1, 2, 3)
 >         )
 >     );
 > df.row(1);
 [bravo, 2] 

row the row index
  • the list of values

public DataFrame<V> select (Predicate<V> predicate)

Select a subset of the data frame using a predicate function.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > for (int i = 0; i < 10; i++)
 >     df.append(Arrays.asList("name" + i, i));
 > df.select(new Predicate<Object>() {
 >         @Override
 >         public Boolean apply(List<Object> values) {
 >             return Integer.class.cast(values.get(1)).intValue() % 2 == 0;
 >         }
 >     })
 >   .col(1);
 [0, 2, 4, 6, 8] 

predicate a function returning true for rows to be included in the subset
  • a subset of the data frame
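
The same row filter can be sketched with the standard library alone. Note that this sketch uses java.util.function.Predicate, which is not the same type as joinery's predicate interface; SelectSketch and its select method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper, not part of joinery: keeps only the rows for which
// the predicate returns true, preserving the original row order.
final class SelectSketch {
    static List<List<Object>> select(List<List<Object>> rows, Predicate<List<Object>> predicate) {
        return rows.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(Arrays.<Object>asList("name" + i, i));
        }
        // keep rows whose second value is even, then read that value back out
        List<Object> values = select(rows, row -> ((Integer) row.get(1)) % 2 == 0)
            .stream()
            .map(row -> row.get(1))
            .collect(Collectors.toList());
        System.out.println(values); // [0, 2, 4, 6, 8]
    }
}
```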

public void set (Object row, Object col, V value)

Set the value located by the names (row, column).

 > DataFrame<Object> df = new DataFrame<>(
 >        Arrays.asList("row1", "row2"),
 >        Arrays.asList("col1", "col2")
 >     );
 > df.set("row1", "col2", Integer.valueOf(7));
 > df.col(1);
 [7, null] 

row the row name
col the column name
value the new value

public void set (Integer row, Integer col, V value)

Set the value located by the coordinates (row, column).

 > DataFrame<Object> df = new DataFrame<>(
 >        Arrays.asList("row1", "row2"),
 >        Arrays.asList("col1", "col2")
 >     );
 > df.set(1, 0, Integer.valueOf(7));
 > df.col(0);
 [null, 7] 

row the row index
col the column index
value the new value

public final void show ()

public int size ()

Return the size (number of columns) of the data frame.

 > DataFrame<Object> df = new DataFrame<>("name", "value");
 > df.size();

  • the number of columns

public DataFrame<V> skew ()

public DataFrame<V> slice (Integer rowStart, Integer rowEnd, Integer colStart, Integer colEnd)

public DataFrame<V> slice (Integer rowStart, Integer rowEnd)

public DataFrame<V> slice (Object rowStart, Object rowEnd)

public DataFrame<V> slice (Object rowStart, Object rowEnd, Object colStart, Object colEnd)

public DataFrame<V> sortBy (Comparator<List<V>> comparator)

public DataFrame<V> sortBy (Integer... cols)

public DataFrame<V> sortBy (Object... cols)

public DataFrame<V> stddev ()

Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 6, 8)
 >             )
 >     );
 > df.groupBy("name")
 >   .stddev()
 >   .col("value");
 [1.0, 2.0] 

  • the new data frame
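
The expected output above, [1.0, 2.0], is consistent with the sample standard deviation (dividing by n - 1). The sketch below reproduces those numbers in plain Java under that assumption; it does not call joinery, and GroupedStddevSketch and stddevByGroup are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of joinery: groups values by key and computes
// the sample standard deviation (n - 1 denominator) of each group.
final class GroupedStddevSketch {
    static Map<String, Double> stddevByGroup(List<String> keys, List<? extends Number> values) {
        Map<String, List<Double>> groups = new LinkedHashMap<>();
        for (int i = 0; i < keys.size(); i++) {
            groups.computeIfAbsent(keys.get(i), k -> new ArrayList<>())
                  .add(values.get(i).doubleValue());
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<Double>> entry : groups.entrySet()) {
            List<Double> group = entry.getValue();
            double mean = 0;
            for (double v : group) {
                mean += v;
            }
            mean /= group.size();
            double sumOfSquares = 0;
            for (double v : group) {
                sumOfSquares += (v - mean) * (v - mean);
            }
            // a single-element group yields NaN (0.0 / 0) in this sketch
            result.put(entry.getKey(), Math.sqrt(sumOfSquares / (group.size() - 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("alpha", "alpha", "alpha", "bravo", "bravo", "bravo");
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 6, 8);
        System.out.println(stddevByGroup(names, values).values()); // [1.0, 2.0]
    }
}
```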

public DataFrame<V> sum ()

Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.

 > DataFrame<Object> df = new DataFrame<>(
 >         Collections.emptyList(),
 >         Arrays.asList("name", "value"),
 >         Arrays.asList(
 >                 Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
 >                 Arrays.<Object>asList(1, 2, 3, 4, 5)
 >             )
 >     );
 > df.groupBy("name")
 >   .sum()
 >   .col("value");
 [6.0, 9.0] 

  • the new data frame

public DataFrame<V> tail ()

Return a data frame containing the last ten rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.tail()
 >   .col("value");
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19] 

  • the new data frame

public DataFrame<V> tail (int limit)

Return a data frame containing the last limit rows of this data frame.

 > DataFrame<Integer> df = new DataFrame<>("value");
 > for (int i = 0; i < 20; i++)
 >     df.append(Arrays.asList(i));
 > df.tail(3)
 >   .col("value");
 [17, 18, 19] 

limit the number of rows to include in the result
  • the new data frame
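
The row windows returned by head and tail can be sketched with List.subList. This is an illustration in plain Java, not joinery code; HeadTailSketch is a hypothetical name, and clamping the limit when the frame has fewer rows than requested is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of joinery: first and last `limit` rows of a
// row list. Limits are assumed to be non-negative.
final class HeadTailSketch {
    static <T> List<T> head(List<T> rows, int limit) {
        // clamp so that short inputs return everything they have
        return new ArrayList<>(rows.subList(0, Math.min(limit, rows.size())));
    }

    static <T> List<T> tail(List<T> rows, int limit) {
        return new ArrayList<>(rows.subList(Math.max(0, rows.size() - limit), rows.size()));
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            rows.add(i);
        }
        System.out.println(head(rows, 3)); // [0, 1, 2]
        System.out.println(tail(rows, 3)); // [17, 18, 19]
    }
}
```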

public Object[] toArray ()

Copy the values contained in the data frame into a flat array of length size() * length().

  • the array

public U[][] toArray (U[][] array)

public U toArray (Class<U> cls)

Copy the values contained in the data frame into an array of the specified type. If the type specified is a two-dimensional array, for example double[][].class, a row-wise copy will be made.

  • the array
IllegalArgumentException if the values are not assignable to the specified component type

public U[] toArray (U[] array)

Copy the values contained in the data frame into the specified array. If the length of the provided array is less than size() * length(), a new array will be created.

  • the array

public double[][] toModelMatrix (double fillValue)

Encode the data frame as a model matrix, converting nominal values to dummy variables without adding an intercept column. More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.

fillValue value to replace NA's with
  • a model matrix

public DataFrame<Number> toModelMatrixDataFrame ()

Encode the data frame as a model matrix, converting nominal values to dummy variables without adding an intercept column. More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.

  • a model matrix

public String toString ()

public final String toString (int limit)

public DataFrame<U> transform (RowFunction<V, U> transform)

public DataFrame<V> transpose ()

Transpose the rows and columns of the data frame.

 > DataFrame<String> df = new DataFrame<>(
 >         Arrays.asList(
 >                 Arrays.asList("one", "two"),
 >                 Arrays.asList("alpha", "bravo")
 >             )
 >     );
 > df.transpose().flatten();
 [one, alpha, two, bravo] 

  • a new data frame with the rows and columns transposed
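
Transposition can be pictured on column-oriented lists: row r of the input becomes column r of the output. The sketch below is plain Java and does not call joinery; TransposeSketch is a hypothetical name, and it assumes all columns have the same length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of joinery: transposes column-oriented data.
// All columns are assumed to have the same number of rows.
final class TransposeSketch {
    static <T> List<List<T>> transpose(List<List<T>> columns) {
        List<List<T>> result = new ArrayList<>();
        int rowCount = columns.isEmpty() ? 0 : columns.get(0).size();
        for (int r = 0; r < rowCount; r++) {
            // collect row r across every column; it becomes a new column
            List<T> newColumn = new ArrayList<>();
            for (List<T> column : columns) {
                newColumn.add(column.get(r));
            }
            result.add(newColumn);
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> columns = Arrays.asList(
            Arrays.asList("one", "two"),
            Arrays.asList("alpha", "bravo")
        );
        System.out.println(transpose(columns)); // [[one, alpha], [two, bravo]]
    }
}
```

Reading the result column by column gives one, alpha, two, bravo, the same order as the flattened output in the example above.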

public List<Class<?>> types ()

Return the types for each of the data frame columns.

  • the list of column types

public DataFrame<V> unique (Integer... cols)

public DataFrame<V> unique ()

public DataFrame<V> unique (Object... cols)

public final DataFrame<V> update (DataFrame<? extends V>... others)

Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.

others the other data frames
  • this data frame with the overwritten values
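
The overwrite rule (only non-null values replace existing ones) can be sketched on column-oriented lists. This is plain Java rather than joinery code; UpdateSketch is a hypothetical name, it takes a single other frame for brevity, and matching cells by position is an assumption of the sketch rather than a documented behaviour.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of joinery: overwrites cells of `target` in
// place with the non-null cells of `other`, matched by position (assumed).
final class UpdateSketch {
    static void update(List<List<Object>> target, List<List<Object>> other) {
        int cols = Math.min(target.size(), other.size());
        for (int c = 0; c < cols; c++) {
            List<Object> dst = target.get(c);
            List<Object> src = other.get(c);
            int rows = Math.min(dst.size(), src.size());
            for (int r = 0; r < rows; r++) {
                // null in the other frame means "leave the current value alone"
                if (src.get(r) != null) {
                    dst.set(r, src.get(r));
                }
            }
        }
    }

    public static void main(String[] args) {
        List<List<Object>> target = Arrays.asList(
            Arrays.<Object>asList(1, null),
            Arrays.<Object>asList(null, 4)
        );
        List<List<Object>> other = Arrays.asList(
            Arrays.<Object>asList(null, 2),
            Arrays.<Object>asList(3, 40)
        );
        update(target, other);
        System.out.println(target); // [[1, 2], [3, 40]]
    }
}
```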

public DataFrame<V> var ()

public final void writeCsv (String file)

Write the data from this data frame to the specified file as comma separated values.

file the file to write
IOException if an error occurs writing the file

public final void writeCsv (OutputStream output)

Write the data from this data frame to the provided output stream as comma separated values.


public final void writeSql (PreparedStatement stmt)

Write the data from the data frame to a database by executing the provided prepared SQL statement.

stmt a prepared insert statement
SQLException if an error occurs executing the statement

public final void writeSql (Connection c, String sql)

Write the data from the data frame to a database by executing the specified SQL statement.

c the database connection
sql the SQL statement
SQLException if an error occurs executing the statement

public final void writeXls (OutputStream output)

Write the data from the data frame to the provided output stream as an excel workbook.

IOException if an error occurs writing the file

public final void writeXls (String file)

Write the data from the data frame to the specified file as an excel workbook.

file the file to write
IOException if an error occurs writing the file