java.lang.Object | |
↳ | joinery.DataFrame<V> |
A data frame implementation in the spirit of Pandas or R data frames.
Below is a simple motivating example. When working in Java, data operations like the following should be easy. The code below retrieves the S&P 500 daily market data for 2008 from Yahoo! Finance and returns the average monthly close for the three top months of the year.
> DataFrame.readCsv(ClassLoader.getSystemResourceAsStream("gspc.csv"))
> .retain("Date", "Close")
> .groupBy(row -> Date.class.cast(row.get(0)).getMonth())
> .mean()
> .sortBy("Close")
> .tail(3)
> .apply(value -> Number.class.cast(value).intValue())
> .col("Close");
[1370, 1378, 1403]
Taking each step in turn:
- readCsv(String) reads csv data from files and urls
- retain(Object) is used to eliminate columns that are not needed
- groupBy(KeyFunction) with a key function is used to group the rows by month
- mean() calculates the average close for each month
- sortBy(Object) orders the rows according to average closing price
- tail(int) returns the last three rows (alternatively, sort in descending order and use head)
- apply(Function) is used to convert the closing prices to integers (this is purely to ease comparisons for verifying the results)
- col(Object) is used to extract the values as a list

Find more details on the GitHub project page.
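To make the data flow of the steps above concrete, here is a rough plain-Java equivalent that uses only java.util collections instead of joinery. The class name, method name, and array-based inputs are hypothetical and exist only for illustration. It is a sketch of what the pipeline computes (group by month, mean, ascending sort, last three rows, truncate to integers), not a description of how joinery implements it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopMonthsSketch {
    /** Returns the three highest monthly mean closes in ascending order, truncated to ints. */
    static List<Integer> topThreeMonthlyMeans(int[] months, double[] closes) {
        // groupBy(month) + mean(): accumulate {sum, count} per month key
        Map<Integer, double[]> acc = new TreeMap<>();
        for (int i = 0; i < months.length; i++) {
            double[] sumAndCount = acc.computeIfAbsent(months[i], k -> new double[2]);
            sumAndCount[0] += closes[i];
            sumAndCount[1] += 1;
        }
        List<Double> means = new ArrayList<>();
        for (double[] sumAndCount : acc.values()) {
            means.add(sumAndCount[0] / sumAndCount[1]);
        }

        // sortBy("Close"): order the per-month means ascending
        Collections.sort(means);

        // tail(3) + apply(intValue): keep the last three rows, truncate to ints
        List<Integer> result = new ArrayList<>();
        int from = Math.max(0, means.size() - 3);
        for (Double mean : means.subList(from, means.size())) {
            result.add(mean.intValue());
        }
        return result;
    }
}
```

Sorting ascending and taking the tail yields the largest values last, which is why the example output above is listed from smallest to largest.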
Nested Classes | | |
---|---|---|
interface | DataFrame.Aggregate<I, O> | A function that converts lists of data frame values to aggregate results. |
enum | DataFrame.Axis | An enumeration of data frame axes. |
interface | DataFrame.Function<I, O> | A function that is applied to objects (rows or values) in a data frame. |
enum | DataFrame.JoinType | An enumeration of join types for joining data frames together. |
interface | DataFrame.KeyFunction<I> | A function that converts data frame rows to index or group keys. |
enum | DataFrame.NumberDefault | |
enum | DataFrame.PlotType | An enumeration of plot types for displaying data frames with charts. |
interface | DataFrame.Predicate<I> | An interface used to filter a data frame. |
interface | DataFrame.RowFunction<I, O> | |
enum | DataFrame.SortDirection | |
Public Constructors | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
DataFrame()
Construct an empty data frame.
| |||||||||||
DataFrame(String... columns)
Construct an empty data frame with the specified columns.
| |||||||||||
DataFrame(Collection<?> columns)
Construct an empty data frame with the specified columns.
| |||||||||||
DataFrame(Collection<?> index, Collection<?> columns)
Construct a data frame containing the specified rows and columns.
| |||||||||||
DataFrame(List<? extends List<? extends V>> data)
Construct a data frame from the specified list of columns.
| |||||||||||
DataFrame(Collection<?> index, Collection<?> columns, List<? extends List<? extends V>> data)
Construct a new data frame using the specified data and indices.
|
Public Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
DataFrame<V> |
add(Object column, List<V> values)
Add a new column to the data frame containing the values provided.
| ||||||||||
DataFrame<V> |
add(Object column, Function<List<V>, V> function)
Add the results of applying a row-wise
function to the data frame as a new column.
| ||||||||||
DataFrame<V> |
add(List<V> values)
Add the list of values as a new column.
| ||||||||||
DataFrame<V> |
add(Object... columns)
Add new columns to the data frame.
| ||||||||||
<U> DataFrame<V> |
aggregate(Aggregate<V, U> function)
Apply an aggregate function to each group or the entire
data frame if the data is not grouped.
| ||||||||||
DataFrame<V> |
append(Object name, List<? extends V> row)
Append rows indexed by the specified name to the data frame.
| ||||||||||
DataFrame<V> | append(Object name, V[] row) | ||||||||||
DataFrame<V> |
append(List<? extends V> row)
Append rows to the data frame.
| ||||||||||
<U> DataFrame<U> |
apply(Function<V, U> function)
Apply a function to each value in the data frame.
| ||||||||||
<T> DataFrame<T> |
cast(Class<T> cls)
Cast this data frame to the specified type.
| ||||||||||
final DataFrame<V> |
coalesce(DataFrame...<? extends V> others)
Update the data frame in place by overwriting any null values with
any non-null values provided by the data frame arguments.
| ||||||||||
List<V> |
col(Object column)
Return a data frame column as a list.
| ||||||||||
List<V> |
col(Integer column)
Return a data frame column as a list.
| ||||||||||
DataFrame<V> | collapse() | ||||||||||
Set<Object> |
columns()
Return the column names for the data frame.
| ||||||||||
final static <V> DataFrame<String> | compare(DataFrame<V> df1, DataFrame<V> df2) | ||||||||||
final DataFrame<V> |
concat(DataFrame...<? extends V> others)
Concatenate the specified data frames with this data frame
and return the result.
| ||||||||||
final DataFrame<V> |
convert(Class...<? extends V> columnTypes)
Convert columns based on the requested types.
| ||||||||||
DataFrame<V> | convert(DataFrame.NumberDefault numDefault, String naString) | ||||||||||
DataFrame<V> |
convert()
Attempt to infer better types for object columns.
| ||||||||||
DataFrame<V> | count() | ||||||||||
DataFrame<Number> | cov() | ||||||||||
DataFrame<V> | cummax() | ||||||||||
DataFrame<V> | cummin() | ||||||||||
DataFrame<V> | cumprod() | ||||||||||
DataFrame<V> | cumsum() | ||||||||||
DataFrame<V> | describe() | ||||||||||
DataFrame<V> | diff(int period) | ||||||||||
DataFrame<V> | diff() | ||||||||||
final void |
draw(Container container, DataFrame.PlotType type)
Draw the numeric columns of this data frame as a chart
in the specified Container using the specified type.
| ||||||||||
final void |
draw(Container container)
Draw the numeric columns of this data frame
as a chart in the specified Container.
| ||||||||||
DataFrame<V> |
drop(Integer... cols)
Create a new data frame by leaving out the specified columns.
| ||||||||||
DataFrame<V> |
drop(Object... cols)
Create a new data frame by leaving out the specified columns.
| ||||||||||
DataFrame<V> | dropna() | ||||||||||
DataFrame<V> | dropna(DataFrame.Axis direction) | ||||||||||
Map<Object, DataFrame<V>> |
explode()
Return a map of group names to data frame for grouped
data frames.
| ||||||||||
DataFrame<V> |
fillna(V fill)
Return a view of the data frame with NA's replaced with
fill. | ||||||||||
List<V> |
flatten()
Return the values of the data frame as a flat list.
| ||||||||||
V |
get(Integer row, Integer col)
Return the value located by the (row, column) coordinates.
| ||||||||||
V |
get(Object row, Object col)
Return the value located by the (row, column) names.
| ||||||||||
DataFrame<V> |
groupBy(Object... cols)
Group the data frame rows by the specified column names.
| ||||||||||
DataFrame<V> |
groupBy(Integer... cols)
Group the data frame rows by the specified columns.
| ||||||||||
DataFrame<V> |
groupBy(KeyFunction<V> function)
Group the data frame rows using the specified key function.
| ||||||||||
Grouping | groups() | ||||||||||
DataFrame<V> |
head(int limit)
Return a data frame containing the first
limit rows of this data frame. | ||||||||||
DataFrame<V> |
head()
Return a data frame containing the first ten rows of this data frame.
| ||||||||||
Set<Object> |
index()
Return the index names for the data frame.
| ||||||||||
boolean |
isEmpty()
Return
true if the data frame contains no data. | ||||||||||
DataFrame<Boolean> |
isnull()
Create a new data frame containing boolean values such that
null object references in the original data frame
yield true and valid references yield false . | ||||||||||
ListIterator<List<V>> |
iterator()
Return an iterator over the rows of the data frame.
| ||||||||||
ListIterator<List<V>> | itercols() | ||||||||||
ListIterator<Map<Object, V>> | itermap() | ||||||||||
ListIterator<List<V>> | iterrows() | ||||||||||
ListIterator<V> | itervalues() | ||||||||||
final DataFrame<V> |
join(DataFrame<V> other)
Return a new data frame created by performing a left outer join
of this data frame with the argument and using the row indices
as the join key.
| ||||||||||
final DataFrame<V> |
join(DataFrame<V> other, KeyFunction<V> on)
Return a new data frame created by performing a left outer join of this
data frame with the argument using the specified key function.
| ||||||||||
final DataFrame<V> |
join(DataFrame<V> other, DataFrame.JoinType join)
Return a new data frame created by performing a join of this
data frame with the argument using the specified join type and
using the row indices as the join key.
| ||||||||||
final DataFrame<V> |
join(DataFrame<V> other, DataFrame.JoinType join, KeyFunction<V> on)
Return a new data frame created by performing a join of this
data frame with the argument using the specified join type and
the specified key function.
| ||||||||||
final DataFrame<V> |
joinOn(DataFrame<V> other, DataFrame.JoinType join, Integer... cols)
Return a new data frame created by performing a join of this
data frame with the argument using the specified join type and
the column values as the join key.
| ||||||||||
final DataFrame<V> |
joinOn(DataFrame<V> other, Integer... cols)
Return a new data frame created by performing a left outer join of
this data frame with the argument using the column values as the join key.
| ||||||||||
final DataFrame<V> |
joinOn(DataFrame<V> other, Object... cols)
Return a new data frame created by performing a left outer join of
this data frame with the argument using the column values as the join key.
| ||||||||||
final DataFrame<V> |
joinOn(DataFrame<V> other, DataFrame.JoinType join, Object... cols)
Return a new data frame created by performing a join of this
data frame with the argument using the specified join type and
the column values as the join key.
| ||||||||||
DataFrame<V> | kurt() | ||||||||||
int |
length()
Return the length (number of rows) of the data frame.
| ||||||||||
final static void |
main(String[] args)
Entry point to joinery as a command line tool.
| ||||||||||
Map<V, List<V>> | map(Object key, Object value) | ||||||||||
Map<Object, List<V>> |
map()
Return a map of index names to rows.
| ||||||||||
Map<V, List<V>> | map(Integer key, Integer value) | ||||||||||
DataFrame<V> | max() | ||||||||||
DataFrame<V> |
mean()
Compute the mean of the numeric columns for each group
or the entire data frame if the data is not grouped.
| ||||||||||
DataFrame<V> | median() | ||||||||||
final DataFrame<V> |
merge(DataFrame<V> other, DataFrame.JoinType join)
Return a new data frame created by performing a join of this
data frame with the argument using the specified join type and
the common, non-numeric columns from each data frame as the join key.
| ||||||||||
final DataFrame<V> |
merge(DataFrame<V> other)
Return a new data frame created by performing a left outer join of this
data frame with the argument using the common, non-numeric columns
from each data frame as the join key.
| ||||||||||
DataFrame<V> | min() | ||||||||||
DataFrame<V> | mode() | ||||||||||
DataFrame<V> |
nonnumeric()
Return a data frame containing only columns with non-numeric data.
| ||||||||||
DataFrame<Boolean> |
notnull()
Create a new data frame containing boolean values such that
valid object references in the original data frame yield
true
and null references yield false . | ||||||||||
DataFrame<Number> |
numeric()
Return a data frame containing only columns with numeric data.
| ||||||||||
DataFrame<V> | percentChange() | ||||||||||
DataFrame<V> | percentChange(int period) | ||||||||||
DataFrame<V> |
percentile(double quantile)
Compute the percentile of the numeric columns for each group
or the entire data frame if the data is not grouped.
| ||||||||||
DataFrame<V> | pivot(Object row, Object col, Object... values) | ||||||||||
DataFrame<V> | pivot(Integer row, Integer col, Integer... values) | ||||||||||
DataFrame<V> | pivot(Integer[] rows, Integer[] cols, Integer[] values) | ||||||||||
<U> DataFrame<U> | pivot(KeyFunction<V> rows, KeyFunction<V> cols, Map<Integer, Aggregate<V, U>> values) | ||||||||||
DataFrame<V> | pivot(List<Object> rows, List<Object> cols, List<Object> values) | ||||||||||
final void |
plot(DataFrame.PlotType type)
Display the numeric columns of this data frame
as a chart in a new swing frame using the specified type.
| ||||||||||
final void |
plot()
Display the numeric columns of this data frame
as a line chart in a new swing frame.
| ||||||||||
DataFrame<V> |
prod()
Compute the product of the numeric columns for each group
or the entire data frame if the data is not grouped.
| ||||||||||
final static DataFrame<Object> |
readCsv(String file)
Read the specified csv file and
return the data as a data frame.
| ||||||||||
final static DataFrame<Object> | readCsv(InputStream input, String separator, DataFrame.NumberDefault longDefault) | ||||||||||
final static DataFrame<Object> | readCsv(InputStream input, String separator) | ||||||||||
final static DataFrame<Object> | readCsv(String file, String separator, DataFrame.NumberDefault longDefault, String naString) | ||||||||||
final static DataFrame<Object> |
readCsv(InputStream input)
Read csv records from an input stream
and return the data as a data frame.
| ||||||||||
final static DataFrame<Object> | readCsv(String file, String separator, DataFrame.NumberDefault longDefault) | ||||||||||
final static DataFrame<Object> | readCsv(String file, String separator, DataFrame.NumberDefault numberDefault, String naString, boolean hasHeader) | ||||||||||
final static DataFrame<Object> | readCsv(InputStream input, String separator, String naString, boolean hasHeader) | ||||||||||
final static DataFrame<Object> | readCsv(String file, String separator) | ||||||||||
final static DataFrame<Object> | readCsv(String file, String separator, String naString, boolean hasHeader) | ||||||||||
final static DataFrame<Object> | readCsv(InputStream input, String separator, String naString) | ||||||||||
final static DataFrame<Object> |
readSql(ResultSet rs)
Read data from the provided query results into a new data frame.
| ||||||||||
final static DataFrame<Object> |
readSql(Connection c, String sql)
Execute the SQL query and return the results as a new data frame.
| ||||||||||
final static DataFrame<Object> |
readXls(String file)
Read data from the specified excel
workbook into a new data frame.
| ||||||||||
final static DataFrame<Object> |
readXls(InputStream input)
Read data from the input stream as an
excel workbook into a new data frame.
| ||||||||||
DataFrame<V> |
reindex(Integer col, boolean drop)
Re-index the rows of the data frame using the specified column index,
optionally dropping the column from the data.
| ||||||||||
DataFrame<V> |
reindex(Integer... cols)
Re-index the rows of the data frame using the specified column indices
and dropping the columns from the data.
| ||||||||||
DataFrame<V> |
reindex(Integer[] cols, boolean drop)
Re-index the rows of the data frame using the specified column indices,
optionally dropping the columns from the data.
| ||||||||||
DataFrame<V> |
reindex(Object... cols)
Re-index the rows of the data frame using the specified column names
and removing the columns from the data.
| ||||||||||
DataFrame<V> |
reindex(Object[] cols, boolean drop)
Re-index the rows of the data frame using the specified column names,
optionally dropping the columns from the data.
| ||||||||||
DataFrame<V> |
reindex(Object col, boolean drop)
Re-index the rows of the data frame using the specified column name,
optionally dropping the column from the data.
| ||||||||||
DataFrame<V> | rename(Map<Object, Object> names) | ||||||||||
DataFrame<V> | rename(Object old, Object name) | ||||||||||
DataFrame<V> |
resetIndex()
Return a new data frame with the default index; row names will
be reset to the string value of their integer index.
| ||||||||||
DataFrame<V> |
reshape(Integer rows, Integer cols)
Reshape a data frame to the specified dimensions.
| ||||||||||
DataFrame<V> |
reshape(Collection<?> rows, Collection<?> cols)
Reshape a data frame to the specified indices.
| ||||||||||
DataFrame<V> |
retain(Integer... cols)
Create a new data frame containing only the specified columns.
| ||||||||||
DataFrame<V> |
retain(Object... cols)
Create a new data frame containing only the specified columns.
| ||||||||||
DataFrame<V> | rollapply(Function<List<V>, V> function) | ||||||||||
DataFrame<V> | rollapply(Function<List<V>, V> function, int period) | ||||||||||
List<V> |
row(Object row)
Return a data frame row as a list.
| ||||||||||
List<V> |
row(Integer row)
Return a data frame row as a list.
| ||||||||||
DataFrame<V> |
select(Predicate<V> predicate)
Select a subset of the data frame using a predicate function.
| ||||||||||
void |
set(Object row, Object col, V value)
Set the value located by the names (row, column).
| ||||||||||
void |
set(Integer row, Integer col, V value)
Set the value located by the coordinates (row, column).
| ||||||||||
final void | show() | ||||||||||
int |
size()
Return the size (number of columns) of the data frame.
| ||||||||||
DataFrame<V> | skew() | ||||||||||
DataFrame<V> | slice(Integer rowStart, Integer rowEnd, Integer colStart, Integer colEnd) | ||||||||||
DataFrame<V> | slice(Integer rowStart, Integer rowEnd) | ||||||||||
DataFrame<V> | slice(Object rowStart, Object rowEnd) | ||||||||||
DataFrame<V> | slice(Object rowStart, Object rowEnd, Object colStart, Object colEnd) | ||||||||||
DataFrame<V> | sortBy(Comparator<List<V>> comparator) | ||||||||||
DataFrame<V> | sortBy(Integer... cols) | ||||||||||
DataFrame<V> | sortBy(Object... cols) | ||||||||||
DataFrame<V> |
stddev()
Compute the standard deviation of the numeric columns for each group
or the entire data frame if the data is not grouped.
| ||||||||||
DataFrame<V> |
sum()
Compute the sum of the numeric columns for each group
or the entire data frame if the data is not grouped.
| ||||||||||
DataFrame<V> |
tail()
Return a data frame containing the last ten rows of this data frame.
| ||||||||||
DataFrame<V> |
tail(int limit)
Return a data frame containing the last
limit rows of this data frame. | ||||||||||
Object[] |
toArray()
Copy the values contained in the data frame into a
flat array of length
size() * length(). | ||||||||||
<U> U[][] | toArray(U[][] array) | ||||||||||
<U> U |
toArray(Class<U> cls)
Copy the values contained in the data frame into an
array of the specified type.
| ||||||||||
<U> U[] |
toArray(U[] array)
Copy the values contained in the data frame into the
specified array.
| ||||||||||
double[][] |
toModelMatrix(double fillValue)
Encodes the DataFrame as a model matrix, converting nominal values
to dummy variables but does not add an intercept column.
| ||||||||||
DataFrame<Number> |
toModelMatrixDataFrame()
Encodes the DataFrame as a model matrix, converting nominal values
to dummy variables but does not add an intercept column.
| ||||||||||
String | toString() | ||||||||||
final String | toString(int limit) | ||||||||||
<U> DataFrame<U> | transform(RowFunction<V, U> transform) | ||||||||||
DataFrame<V> |
transpose()
Transpose the rows and columns of the data frame.
| ||||||||||
List<Class<?>> |
types()
Return the types for each of the data frame columns.
| ||||||||||
DataFrame<V> | unique(Integer... cols) | ||||||||||
DataFrame<V> | unique() | ||||||||||
DataFrame<V> | unique(Object... cols) | ||||||||||
final DataFrame<V> |
update(DataFrame...<? extends V> others)
Update the data frame in place by overwriting any values
with the non-null values provided by the data frame arguments.
| ||||||||||
DataFrame<V> | var() | ||||||||||
final void |
writeCsv(String file)
Write the data from this data frame to
the specified file as comma separated values.
| ||||||||||
final void |
writeCsv(OutputStream output)
Write the data from this data frame to
the provided output stream as comma separated values.
| ||||||||||
final void |
writeSql(PreparedStatement stmt)
Write the data from the data frame to a database by
executing the provided prepared SQL statement.
| ||||||||||
final void |
writeSql(Connection c, String sql)
Write the data from the data frame to a database by
executing the specified SQL statement.
| ||||||||||
final void |
writeXls(OutputStream output)
Write the data from the data frame
to the provided output stream as an excel workbook.
| ||||||||||
final void |
writeXls(String file)
Write the data from the data frame
to the specified file as an excel workbook.
|
Inherited Methods | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
From class
java.lang.Object
| |||||||||||
From interface
java.lang.Iterable
|
Construct an empty data frame.
> DataFrame<Object> df = new DataFrame<>();
> df.isEmpty();
true
Construct an empty data frame with the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.columns();
[name, value]
columns | the data frame column names. |
---|
Construct an empty data frame with the specified columns.
> List<String> columns = new ArrayList<>();
> columns.add("name");
> columns.add("value");
> DataFrame<Object> df = new DataFrame<>(columns);
> df.columns();
[name, value]
columns | the data frame column names. |
---|
Construct a data frame containing the specified rows and columns.
> List<String> rows = Arrays.asList("row1", "row2", "row3");
> List<String> columns = Arrays.asList("col1", "col2");
> DataFrame<Object> df = new DataFrame<>(rows, columns);
> df.get("row1", "col1");
null
index | the row names |
---|---|
columns | the column names |
Construct a data frame from the specified list of columns.
> List<List<Object>> data = Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> );
> DataFrame<Object> df = new DataFrame<>(data);
> df.row(0);
[alpha, 1]
data | a list of columns containing the data elements. |
---|
Construct a new data frame using the specified data and indices.
index | the row names |
---|---|
columns | the column names |
data | the data |
Add a new column to the data frame containing the values provided.
Any existing rows with indices greater than the size of the specified column data will have null values for the new column.
> DataFrame<Object> df = new DataFrame<>();
> df.add("value", Arrays.<Object>asList(1));
> df.columns();
[value]
column | the new column name |
---|---|
values | the new column values |
Add the results of applying a row-wise function to the data frame as a new column.
column | the new column name |
---|---|
function | the function to compute the new column values |
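As an illustration of what "row-wise" means here, the following plain-Java sketch computes one new value per row and appends it to that row. It does not use joinery; the class and method names are hypothetical, and the rows are modeled as a simple list of lists purely for demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class AddColumnSketch {
    /** Returns copies of the rows, each extended with function(row) as a new last column. */
    static <V> List<List<V>> addColumn(List<List<V>> rows, Function<List<V>, V> function) {
        List<List<V>> result = new ArrayList<>();
        for (List<V> row : rows) {
            List<V> extended = new ArrayList<>(row);
            // The function sees the whole row and produces the value for the new column
            extended.add(function.apply(row));
            result.add(extended);
        }
        return result;
    }
}
```

For example, passing `row -> row.get(0) + row.get(1)` over integer rows would add a column holding the sum of the first two columns.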
Add the list of values as a new column.
values | the new column values |
---|
Add new columns to the data frame.
Any existing rows will have null values for the new columns.
> DataFrame<Object> df = new DataFrame<>();
> df.add("value");
> df.columns();
[value]
columns | the new column names |
---|
Apply an aggregate function to each group or the entire data frame if the data is not grouped.
function | the aggregate function |
---|
Append rows indexed by the specified name to the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append("row1", Arrays.asList("alpha", 1));
> df.append("row2", Arrays.asList("bravo", 2));
> df.index();
[row1, row2]
name | the row name to add to the index |
---|---|
row | the row to append |
Append rows to the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("alpha", 1));
> df.append(Arrays.asList("bravo", 2));
> df.length();
2
row | the row to append |
---|
Apply a function to each value in the data frame.
> DataFrame<Number> df = new DataFrame<>(
> Arrays.<List<Number>>asList(
> Arrays.<Number>asList(1, 2),
> Arrays.<Number>asList(3, 4)
> )
> );
> df = df.apply(new Function<Number, Number>() {
> public Number apply(Number value) {
> return value.intValue() * value.intValue();
> }
> });
> df.flatten();
[1, 4, 9, 16]
function | the function to apply |
---|
Cast this data frame to the specified type.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("one", "1"));
> DataFrame<String> dfs = df.cast(String.class);
> dfs.get(0, 0).getClass().getName();
java.lang.String
Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.
others | the other data frames |
---|
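The null-filling behavior described above can be sketched on a single column in plain Java. This is not joinery code: the class and method names are hypothetical, and the sketch assumes values are aligned by position and that the first non-null candidate wins, neither of which is stated in the description above.

```java
import java.util.List;

final class CoalesceSketch {
    /** Overwrites null entries of target in place with the first non-null value at the same position. */
    @SafeVarargs
    static <V> void coalesce(List<V> target, List<? extends V>... others) {
        for (int i = 0; i < target.size(); i++) {
            if (target.get(i) != null) {
                continue; // existing non-null values are never overwritten
            }
            for (List<? extends V> other : others) {
                if (i < other.size() && other.get(i) != null) {
                    target.set(i, other.get(i));
                    break; // assumption: the first non-null candidate wins
                }
            }
        }
    }
}
```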
Return a data frame column as a list.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.col("value");
[1, 2, 3]
column | the column name |
---|
Return a data frame column as a list.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.col(1);
[1, 2, 3]
column | the column index |
---|
Return the column names for the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.columns();
[name, value]
Concatenate the specified data frames with this data frame and return the result.
> DataFrame<Object> left = new DataFrame<>("a", "b", "c");
> left.append("one", Arrays.asList(1, 2, 3));
> left.append("two", Arrays.asList(4, 5, 6));
> left.append("three", Arrays.asList(7, 8, 9));
> DataFrame<Object> right = new DataFrame<>("a", "b", "d");
> right.append("one", Arrays.asList(10, 20, 30));
> right.append("two", Arrays.asList(40, 50, 60));
> right.append("four", Arrays.asList(70, 80, 90));
> left.concat(right).length();
6
others | the other data frames |
---|
Convert columns based on the requested types.
Note, the conversion process replaces existing values with values of the converted type.
> DataFrame<Object> df = new DataFrame<>("a", "b", "c");
> df.append(Arrays.asList("one", 1, 1.0));
> df.append(Arrays.asList("two", 2, 2.0));
> df.convert(
> null, // leave column "a" as is
> Long.class, // convert column "b" to Long
> Number.class // convert column "c" to Double
> );
> df.types();
[class java.lang.String, class java.lang.Long, class java.lang.Double]
Attempt to infer better types for object columns.
The following conversions are performed where applicable:
- Double values
- Long values
- Boolean values
- Date values

Note, the conversion process replaces existing values with values of the converted type.
> DataFrame<Object> df = new DataFrame<>("name", "value", "date");
> df.append(Arrays.asList("one", "1", new Date()));
> df.convert();
> df.types();
[class java.lang.String, class java.lang.Long, class java.util.Date]
Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.
container | the container to use for the chart |
---|---|
type | the type of plot to draw |
Draw the numeric columns of this data frame as a chart in the specified Container.
container | the container to use for the chart |
---|
Create a new data frame by leaving out the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.drop(2).columns();
[name, value]
cols | the indices of the columns to be removed |
---|
Create a new data frame by leaving out the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.drop("category").columns();
[name, value]
cols | the names of columns to be removed |
---|
Return a map of group names to data frames for grouped data frames.
Note that for this method to have any effect, a groupBy call must have been made beforehand.
Return a view of the data frame with NA's replaced with fill.
fill | the value used to replace missing values |
---|
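A minimal plain-Java sketch of the replacement rule for one column follows. The class and method names are hypothetical and it does not use joinery. Note that it returns a copy for simplicity, whereas the description above speaks of a view.

```java
import java.util.ArrayList;
import java.util.List;

final class FillNaSketch {
    /** Returns a copy of the column with every null entry replaced by fill. */
    static <V> List<V> fillna(List<V> column, V fill) {
        List<V> filled = new ArrayList<>(column.size());
        for (V value : column) {
            // Missing values are modeled as null references in this sketch
            filled.add(value == null ? fill : value);
        }
        return filled;
    }
}
```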
Return the values of the data frame as a flat list.
> DataFrame<String> df = new DataFrame<>(
> Arrays.asList(
> Arrays.asList("one", "two"),
> Arrays.asList("alpha", "bravo")
> )
> );
> df.flatten();
[one, two, alpha, bravo]
Return the value located by the (row, column) coordinates.
> DataFrame<Object> df = new DataFrame<Object>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.get(1, 0);
bravo
row | the row index |
---|---|
col | the column index |
Return the value located by the (row, column) names.
> DataFrame<Object> df = new DataFrame<Object>(
> Arrays.asList("row1", "row2", "row3"),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.get("row2", "name");
bravo
row | the row name |
---|---|
col | the column name |
Group the data frame rows by the specified column names.
cols | the column names |
---|
Group the data frame rows by the specified columns.
cols | the column indices |
---|
Group the data frame rows using the specified key function.
function | the function to reduce rows to grouping keys |
---|
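To show how a key function reduces rows to grouping keys, here is a plain-Java sketch that buckets rows by the key each row produces. It is independent of joinery: the class and method names are hypothetical, and java.util.function.Function stands in for the library's KeyFunction interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class GroupBySketch {
    /** Buckets rows by the key computed for each row, keeping first-seen key order. */
    static <V> Map<Object, List<List<V>>> groupRows(
            List<List<V>> rows, Function<List<V>, Object> keyFunction) {
        Map<Object, List<List<V>>> groups = new LinkedHashMap<>();
        for (List<V> row : rows) {
            // Rows that yield equal keys land in the same group
            groups.computeIfAbsent(keyFunction.apply(row), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }
}
```

An aggregate such as a mean or sum would then be computed once per bucket rather than once for the whole data set.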
Return a data frame containing the first limit rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.head(3)
> .col("value");
[0, 1, 2]
limit | the number of rows to include in the result |
---|
Return a data frame containing the first ten rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.head()
> .col("value");
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Return the index names for the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append("row1", Arrays.asList("one", 1));
> df.index();
[row1]
Return true if the data frame contains no data.
> DataFrame<Object> df = new DataFrame<>();
> df.isEmpty();
true
Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false.
> DataFrame<Object> df = new DataFrame<Object>(
> Arrays.asList(
> Arrays.asList("alpha", "bravo", null),
> Arrays.asList(null, 2, 3)
> )
> );
> df.isnull().row(0);
[false, true]
Return an iterator over the rows of the data frame. Also used implicitly with foreach loops.
> DataFrame<Integer> df = new DataFrame<>(
> Arrays.asList(
> Arrays.asList(1, 2),
> Arrays.asList(3, 4)
> )
> );
> List<Integer> results = new ArrayList<>();
> for (List<Integer> row : df)
> results.add(row.get(0));
> results;
[1, 2]
Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.
> DataFrame<Object> left = new DataFrame<>("a", "b");
> left.append("one", Arrays.asList(1, 2));
> left.append("two", Arrays.asList(3, 4));
> left.append("three", Arrays.asList(5, 6));
> DataFrame<Object> right = new DataFrame<>("c", "d");
> right.append("one", Arrays.asList(10, 20));
> right.append("two", Arrays.asList(30, 40));
> right.append("four", Arrays.asList(50, 60));
> left.join(right)
> .index();
[one, two, three]
other | the other data frame |
---|
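The example above returns only the left-hand index because a left outer join keeps every left row and discards right rows with no match. The plain-Java sketch below illustrates that rule with maps of row name to row values. It is not joinery code: the class, the method, and the `rightWidth` parameter are hypothetical, and padding unmatched rows with null is the conventional left-outer-join behavior assumed here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class LeftJoinSketch {
    /** Joins rows on their index names, keeping all left rows and padding misses with nulls. */
    static <V> Map<Object, List<V>> leftJoin(
            Map<Object, List<V>> left, Map<Object, List<V>> right, int rightWidth) {
        Map<Object, List<V>> joined = new LinkedHashMap<>();
        for (Map.Entry<Object, List<V>> entry : left.entrySet()) {
            List<V> row = new ArrayList<>(entry.getValue());
            List<V> match = right.get(entry.getKey());
            if (match != null) {
                row.addAll(match); // matching index name: append the right-hand values
            } else {
                row.addAll(Collections.<V>nCopies(rightWidth, null)); // no match: pad with nulls
            }
            joined.put(entry.getKey(), row);
        }
        return joined; // right-only index names (such as "four" above) never appear
    }
}
```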
Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.
other | the other data frame |
---|---|
on | the function to generate the join keys |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.
other | the other data frame |
---|---|
join | the join type |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.
other | the other data frame |
---|---|
join | the join type |
on | the function to generate the join keys |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
other | the other data frame |
---|---|
join | the join type |
cols | the indices of the columns to use as the join key |
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
other | the other data frame |
---|---|
cols | the indices of the columns to use as the join key |
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
other | the other data frame |
---|---|
cols | the names of the columns to use as the join key |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
other | the other data frame |
---|---|
join | the join type |
cols | the names of the columns to use as the join key |
Return the length (number of rows) of the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("alpha", 1));
> df.append(Arrays.asList("bravo", 2));
> df.append(Arrays.asList("charlie", 3));
> df.length();
3
Entry point to joinery as a command line tool. The available commands are:
args | file paths or urls of csv input data |
---|
IOException | if an error occurs reading input |
---|
Return a map of index names to rows.
> DataFrame<Integer> df = new DataFrame<>("value");
> df.append("alpha", Arrays.asList(1));
> df.append("bravo", Arrays.asList(2));
> df.map();
{alpha=[1], bravo=[2]}
Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Integer> df = new DataFrame<>("value");
> df.append("one", Arrays.asList(1));
> df.append("two", Arrays.asList(5));
> df.append("three", Arrays.asList(3));
> df.append("four", Arrays.asList(7));
> df.mean().col(0);
[4.0]
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.
other | the other data frame |
---|
Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.
other | the other data frame |
---|
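To make "common, non-numeric columns" concrete, the sketch below picks the join key columns from two column-name-to-type maps. It is plain Java and only illustrates the idea; `MergeKeySketch` and `commonNonNumeric` are hypothetical names, and treating a column as numeric when its type is a `Number` subclass is an assumption of this sketch, not a statement about how joinery classifies columns.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration of choosing merge keys: columns that appear in
// both frames and are non-numeric in both.
final class MergeKeySketch {

    static List<String> commonNonNumeric(
            Map<String, Class<?>> left, Map<String, Class<?>> right) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Class<?>> entry : left.entrySet()) {
            Class<?> other = right.get(entry.getKey());
            if (other == null) {
                continue; // column is not shared by both sides
            }
            // Assumption: "numeric" means the column type is a Number subclass.
            boolean numeric = Number.class.isAssignableFrom(entry.getValue())
                || Number.class.isAssignableFrom(other);
            if (!numeric) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, Class<?>> left = new LinkedHashMap<>();
        left.put("name", String.class);
        left.put("category", String.class);
        left.put("value", Integer.class);
        Map<String, Class<?>> right = new LinkedHashMap<>();
        right.put("name", String.class);
        right.put("value", Integer.class);
        right.put("score", Double.class);
        System.out.println(commonNonNumeric(left, right));
    }
}
```

Here only `name` qualifies: `category` and `score` are not shared, and `value` is shared but numeric.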
Return a data frame containing only columns with non-numeric data.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("one", 1));
> df.append(Arrays.asList("two", 2));
> df.nonnumeric().columns();
[name]
Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false.
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", null),
> Arrays.<Object>asList(null, 2, 3)
> )
> );
> df.notnull().row(0);
[true, false]
Return a data frame containing only columns with numeric data.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("one", 1));
> df.append(Arrays.asList("two", 2));
> df.numeric().columns();
[value]
Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Integer> df = new DataFrame<>("value");
> df.append("one", Arrays.asList(1));
> df.append("two", Arrays.asList(5));
> df.append("three", Arrays.asList(3));
> df.append("four", Arrays.asList(7));
> df.percentile(50).col(0);
[4.0]
Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.
> DataFrame<Object> df = new DataFrame<Object>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.plot(PlotType.AREA);
type | the type of plot to display |
---|
Display the numeric columns of this data frame as a line chart in a new swing frame.
> DataFrame<Object> df = new DataFrame<Object>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.plot();
Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
> Arrays.<Object>asList(1, 2, 3, 4, 5)
> )
> );
> df.groupBy("name")
> .prod()
> .col("value");
[6.0, 20.0]
Read the specified csv file and return the data as a data frame.
file | the csv file |
---|
IOException | if an error occurs reading the file |
---|
IOException |
---|
IOException |
---|
IOException |
---|
Read csv records from an input stream and return the data as a data frame.
input | the input stream |
---|
IOException | if an error occurs reading the stream |
---|
IOException |
---|
IOException |
---|
IOException |
---|
IOException |
---|
IOException |
---|
Read data from the provided query results into a new data frame.
rs | the query results |
---|
SQLException | if an error occurs reading the results |
---|
Execute the SQL query and return the results as a new data frame.
> Connection c = DriverManager.getConnection("jdbc:derby:memory:testdb;create=true");
> c.createStatement().executeUpdate("create table data (a varchar(8), b int)");
> c.createStatement().executeUpdate("insert into data values ('test', 1)");
> DataFrame.readSql(c, "select * from data").flatten();
[test, 1]
c | the database connection |
---|---|
sql | the SQL query |
SQLException | if an error occurs executing the query |
---|
Read data from the specified excel workbook into a new data frame.
file | the excel workbook |
---|
IOException | if an error occurs reading the workbook |
---|
Read data from the input stream as an excel workbook into a new data frame.
input | the input stream |
---|
IOException | if an error occurs reading the input stream |
---|
Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex(0, true)
> .index();
[alpha, bravo]
col | the column to use as the new index |
---|---|
drop | true to remove the index column from the data, false otherwise |
Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex(0)
> .index();
[alpha, bravo]
cols | the columns to use as the new index |
---|
Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two", "three");
> df.append("a", Arrays.asList("alpha", 1, 10));
> df.append("b", Arrays.asList("bravo", 2, 20));
> df.reindex(new Integer[] { 0, 1 }, true)
> .index();
[[alpha, 1], [bravo, 2]]
cols | the columns to use as the new index |
---|---|
drop | true to remove the index columns from the data, false otherwise |
Re-index the rows of the data frame using the specified column names and removing the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex("one")
> .index();
[alpha, bravo]
cols | the columns to use as the new index |
---|
Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two", "three");
> df.append("a", Arrays.asList("alpha", 1, 10));
> df.append("b", Arrays.asList("bravo", 2, 20));
> df.reindex(new String[] { "one", "two" }, true)
> .index();
[[alpha, 1], [bravo, 2]]
cols | the columns to use as the new index |
---|---|
drop | true to remove the index columns from the data, false otherwise |
Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex("one", true)
> .index();
[alpha, bravo]
col | the column to use as the new index |
---|---|
drop | true to remove the index column from the data, false otherwise |
Return a new data frame with the default index; row names will be reset to the string value of their integer index.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.resetIndex()
> .index();
[0, 1]
Reshape a data frame to the specified dimensions.
> DataFrame<Object> df = new DataFrame<>("0", "1", "2");
> df.append("0", Arrays.asList(10, 20, 30));
> df.append("1", Arrays.asList(40, 50, 60));
> df.reshape(3, 2)
> .length();
3
rows | the number of rows the new data frame will contain |
---|---|
cols | the number of columns the new data frame will contain |
Reshape a data frame to the specified indices.
> DataFrame<Object> df = new DataFrame<>("0", "1", "2");
> df.append("0", Arrays.asList(10, 20, 30));
> df.append("1", Arrays.asList(40, 50, 60));
> df.reshape(Arrays.asList("0", "1", "2"), Arrays.asList("0", "1"))
> .length();
3
rows | the names of rows the new data frame will contain |
---|---|
cols | the names of columns the new data frame will contain |
Create a new data frame containing only the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.retain(0, 2).columns();
[name, category]
cols | the columns to include in the new data frame |
---|
Create a new data frame containing only the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.retain("name", "category").columns();
[name, category]
cols | the columns to include in the new data frame |
---|
Return a data frame row as a list.
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList("row1", "row2", "row3"),
> Collections.emptyList(),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.row("row2");
[bravo, 2]
row | the row name |
---|
Return a data frame row as a list.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Collections.emptyList(),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.row(1);
[bravo, 2]
row | the row index |
---|
Select a subset of the data frame using a predicate function.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> for (int i = 0; i < 10; i++)
> df.append(Arrays.asList("name" + i, i));
> df.select(new Predicate<Object>() {
> @Override
> public Boolean apply(List<Object> values) {
> return Integer.class.cast(values.get(1)).intValue() % 2 == 0;
> }
> })
> .col(1);
[0, 2, 4, 6, 8]
predicate | a function returning true for rows to be included in the subset |
---|
Set the value located by the names (row, column).
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList("row1", "row2"),
> Arrays.asList("col1", "col2")
> );
> df.set("row1", "col2", Integer.valueOf(7));
> df.col(1);
[7, null]
row | the row name |
---|---|
col | the column name |
value | the new value |
Set the value located by the coordinates (row, column).
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList("row1", "row2"),
> Arrays.asList("col1", "col2")
> );
> df.set(1, 0, Integer.valueOf(7));
> df.col(0);
[null, 7]
row | the row index |
---|---|
col | the column index |
value | the new value |
Return the size (number of columns) of the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.size();
2
Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo", "bravo"),
> Arrays.<Object>asList(1, 2, 3, 4, 6, 8)
> )
> );
> df.groupBy("name")
> .stddev()
> .col("value");
[1.0, 2.0]
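The outputs in the example above are consistent with the sample standard deviation (dividing by n - 1): for 1, 2, 3 the population formula would give roughly 0.816 rather than 1.0. The following plain-Java sketch reproduces the arithmetic for the two groups; `SampleStdDevSketch` and `sampleStdDev` are hypothetical names and this is not joinery's implementation.

```java
// Plain-Java illustration of the sample standard deviation (n - 1 divisor).
final class SampleStdDevSketch {

    static double sampleStdDev(double... values) {
        // Mean of the values.
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;

        // Sum of squared deviations from the mean.
        double squares = 0.0;
        for (double v : values) {
            squares += (v - mean) * (v - mean);
        }

        // Divide by n - 1 (sample variance), then take the square root.
        return Math.sqrt(squares / (values.length - 1));
    }

    public static void main(String[] args) {
        System.out.println(sampleStdDev(1, 2, 3)); // group "alpha"
        System.out.println(sampleStdDev(4, 6, 8)); // group "bravo"
    }
}
```

For the "alpha" group the squared deviations sum to 2, so the result is sqrt(2 / 2) = 1.0; for "bravo" they sum to 8, giving sqrt(8 / 2) = 2.0.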
Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
> Arrays.<Object>asList(1, 2, 3, 4, 5)
> )
> );
> df.groupBy("name")
> .sum()
> .col("value");
[6.0, 9.0]
Return a data frame containing the last ten rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.tail()
> .col("value");
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Return a data frame containing the last limit rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.tail(3)
> .col("value");
[17, 18, 19]
limit | the number of rows to include in the result |
---|
Copy the values contained in the data frame into a flat array of length size() * length().
Copy the values contained in the data frame into an array of the specified type. If the type specified is a two dimensional array, for example double[][].class, a row-wise copy will be made.
IllegalArgumentException | if the values are not assignable to the specified component type |
---|
Copy the values contained in the data frame into the specified array. If the length of the provided array is less than size() * length(), a new array will be created.
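As an illustration of what a row-wise copy into a two dimensional array means, the sketch below converts column-oriented data (a list of columns, as in the constructor examples on this page) into a `double[][]` indexed by row and then column. It is plain Java, not joinery code; `RowWiseCopySketch` and `toRows` are hypothetical names, and the sketch assumes all columns have the same length.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of a row-wise copy from column-oriented data.
final class RowWiseCopySketch {

    static double[][] toRows(List<List<Double>> columns) {
        int cols = columns.size();
        // Assumption: every column has the same number of rows.
        int rows = cols == 0 ? 0 : columns.get(0).size();
        double[][] out = new double[rows][cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                // out[r] holds one complete row of the frame.
                out[r][c] = columns.get(c).get(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Double>> columns = Arrays.asList(
            Arrays.asList(1.0, 2.0),
            Arrays.asList(3.0, 4.0));
        System.out.println(Arrays.deepToString(toRows(columns)));
    }
}
```

With two columns of two values each, the flat form would hold size() * length() = 4 values, while the two dimensional form holds two rows of two values.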
Encodes the DataFrame as a model matrix, converting nominal values
to dummy variables but does not add an intercept column.
More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.
fillValue | value to replace NA's with |
---|
Encodes the DataFrame as a model matrix, converting nominal values
to dummy variables but does not add an intercept column.
More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.
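Converting nominal values to dummy variables can be sketched in plain Java as follows. This is only an illustration of the general idea and makes several assumptions that this page does not confirm for joinery: one indicator column is produced per distinct level (no reference level is dropped), levels are ordered alphabetically, and the input contains no nulls. `DummyEncodingSketch` and `encode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Plain-Java illustration of dummy (indicator) encoding for one nominal column.
final class DummyEncodingSketch {

    static double[][] encode(List<String> nominal) {
        // Distinct levels in sorted order; each level gets one indicator column.
        List<String> levels = new ArrayList<>(new TreeSet<>(nominal));
        double[][] out = new double[nominal.size()][levels.size()];
        for (int r = 0; r < nominal.size(); r++) {
            // Mark the column matching this row's level; other cells stay 0.0.
            out[r][levels.indexOf(nominal.get(r))] = 1.0;
        }
        // No intercept column is added.
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.deepToString(encode(Arrays.asList("a", "b", "a"))));
    }
}
```

Each row contains a single 1.0 in the column for its level, and no constant intercept column is present.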
Transpose the rows and columns of the data frame.
> DataFrame<String> df = new DataFrame<>(
> Arrays.asList(
> Arrays.asList("one", "two"),
> Arrays.asList("alpha", "bravo")
> )
> );
> df.transpose().flatten();
[one, alpha, two, bravo]
Return the types for each of the data frame columns.
Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.
others | the other data frames |
---|
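The update rule (non-null values from the argument overwrite existing values, nulls leave them alone) can be sketched with plain lists of columns. This is an illustration only, not joinery's implementation; `UpdateSketch` and `update` are hypothetical names, and matching cells by position is an assumption of the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of an in-place update with non-null values.
final class UpdateSketch {

    static void update(List<List<Object>> target, List<List<Object>> other) {
        // Assumption: cells are matched by column and row position.
        int cols = Math.min(target.size(), other.size());
        for (int c = 0; c < cols; c++) {
            List<Object> dst = target.get(c);
            List<Object> src = other.get(c);
            int rows = Math.min(dst.size(), src.size());
            for (int r = 0; r < rows; r++) {
                Object value = src.get(r);
                if (value != null) {
                    // Only non-null values overwrite the existing cell.
                    dst.set(r, value);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<List<Object>> target = Arrays.asList(
            Arrays.<Object>asList(1, null),
            Arrays.<Object>asList("a", "b"));
        List<List<Object>> other = Arrays.asList(
            Arrays.<Object>asList(null, 2),
            Arrays.<Object>asList("x", null));
        update(target, other);
        System.out.println(target);
    }
}
```

In the usage above, the null in the first column is filled with 2, "a" is replaced by "x", and the cells facing a null in the argument keep their original values.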
Write the data from this data frame to the specified file as comma separated values.
file | the file to write |
---|
IOException | if an error occurs writing the file |
---|
Write the data from this data frame to the provided output stream as comma separated values.
IOException |
---|
Write the data from the data frame to a database by executing the provided prepared SQL statement.
stmt | a prepared insert statement |
---|
SQLException | if an error occurs executing the statement |
---|
Write the data from the data frame to a database by executing the specified SQL statement.
c | the database connection |
---|---|
sql | the SQL statement |
SQLException | if an error occurs executing the statement |
---|
Write the data from the data frame to the provided output stream as an excel workbook.
IOException | if an error occurs writing the file |
---|
Write the data from the data frame to the specified file as an excel workbook.
file | the file to write |
---|
IOException | if an error occurs writing the file |
---|