java.lang.Object | |
↳ | joinery.DataFrame<V> |
A data frame implementation in the spirit of Pandas or R data frames.
Below is a simple motivating example. When working in Java, data operations like the following should be easy. The code below retrieves the S&P 500 daily market data for 2008 from Yahoo! Finance and returns the average monthly close for the three top months of the year.
> DataFrame.readCsv(String.format(
> "%s?s=%s&a=%d&b=%d&c=%d&d=%d&e=%d&f=%d",
> "http://real-chart.finance.yahoo.com/table.csv",
> "^GSPC", // symbol for S&P 500
> 0, 2, 2008, // start date
> 11, 31, 2008 // end date
> ))
> .retain("Date", "Close")
> .groupBy(new KeyFunction<Object>() {
> public Object apply(List<Object> row) {
> return Date.class.cast(row.get(0)).getMonth();
> }
> })
> .mean()
> .sortBy("Close")
> .tail(3)
> .apply(new Function
Taking each step in turn:
- readCsv(String) reads CSV data from files and URLs
- retain(Object) is used to eliminate columns that are not needed
- groupBy(KeyFunction) with a key function is used to group the rows by month
- mean() calculates the average close for each month
- sortBy(Object) orders the rows according to average closing price
- tail(int) returns the last three rows (alternatively, sort in descending order and use head)
- apply(Function) is used to convert the closing prices to integers (this is purely to ease comparisons for verifying the results)
- col(Object) is used to extract the values as a list

Find more details on the GitHub project page.
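The group, average, sort, and tail sequence described above can be sketched with plain Java collections, which may help when reasoning about what each step produces. This is a minimal illustration that does not use joinery; the class name `TopMonthsSketch`, the method `topMonths`, and the sample figures are invented for the example.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TopMonthsSketch {
    // Each row is {month, close}. Rows are grouped by month, each group is
    // averaged, the averages are sorted ascending, and the last n are kept.
    static List<Map.Entry<Integer, Double>> topMonths(List<double[]> rows, int n) {
        Map<Integer, List<Double>> groups = new TreeMap<>();
        for (double[] r : rows) {
            // key function: the month stored in position 0
            groups.computeIfAbsent((int) r[0], k -> new ArrayList<>()).add(r[1]);
        }
        List<Map.Entry<Integer, Double>> means = new ArrayList<>();
        for (Map.Entry<Integer, List<Double>> g : groups.entrySet()) {
            double sum = 0;
            for (double v : g.getValue()) {
                sum += v;
            }
            means.add(new AbstractMap.SimpleEntry<>(g.getKey(), sum / g.getValue().size()));
        }
        means.sort(Map.Entry.comparingByValue());      // comparable to sorting by close
        int from = Math.max(0, means.size() - n);      // comparable to taking the tail
        return new ArrayList<>(means.subList(from, means.size()));
    }

    public static void main(String[] args) {
        List<double[]> rows = Arrays.asList(
            new double[] {0, 10}, new double[] {0, 20},
            new double[] {1, 5},
            new double[] {2, 40}, new double[] {2, 50},
            new double[] {3, 30});
        System.out.println(topMonths(rows, 3)); // prints [0=15.0, 3=30.0, 2=45.0]
    }
}
```

Sorting ascending and taking the tail returns the largest averages in increasing order, which matches the alternative noted above of sorting in descending order and taking the head, apart from the order of the results.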
Nested Classes | |
---|---|
DataFrame.Aggregate<I, O> | A function that converts lists of data frame values to aggregate results. |
DataFrame.Axis | An enumeration of data frame axes. |
DataFrame.Function<I, O> | A function that is applied to objects (rows or values) in a data frame. |
DataFrame.JoinType | An enumeration of join types for joining data frames together. |
DataFrame.KeyFunction<I> | A function that converts data frame rows to index or group keys. |
DataFrame.NumberDefault | |
DataFrame.PlotType | An enumeration of plot types for displaying data frames with charts. |
DataFrame.Predicate<I> | An interface used to filter a data frame. |
DataFrame.RowFunction<I, O> | |
DataFrame.SortDirection | |
Public Constructors |
---|
Construct an empty data frame. |
Construct an empty data frame with the specified columns. |
Construct an empty data frame with the specified columns. |
Construct a data frame containing the specified rows and columns. |
Construct a data frame from the specified list of columns. |
Construct a new data frame using the specified data and indices. |
Public Methods |
---|
Add a new column to the data frame containing the value provided. |
Add new columns to the data frame. |
Apply an aggregate function to each group or the entire data frame if the data is not grouped. |
Append rows indexed by the specified name to the data frame. |
Append rows to the data frame. |
Apply a function to each value in the data frame. |
Cast this data frame to the specified type. |
Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments. |
Return a data frame column as a list. |
Return a data frame column as a list. |
Return the column names for the data frame. |
Convert columns based on the requested types. |
Attempt to infer better types for object columns. |
Draw the numeric columns of this data frame as a chart in the specified Container using the specified type. |
Draw the numeric columns of this data frame as a chart in the specified Container. |
Create a new data frame by leaving out the specified columns. |
Create a new data frame by leaving out the specified columns. |
Return a map of group names to data frame for grouped data frames. |
Returns a view of the data frame with NA's replaced with fill. |
Return the values of the data frame as a flat list. |
Return the value located by the (row, column) coordinates. |
Return the value located by the (row, column) names. |
Group the data frame rows by the specified column names. |
Group the data frame rows by the specified columns. |
Group the data frame rows using the specified key function. |
Return a data frame containing the first limit rows of this data frame. |
Return a data frame containing the first ten rows of this data frame. |
Return the index names for the data frame. |
Return true if the data frame contains no data. |
Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false. |
Return an iterator over the rows of the data frame. |
Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key. |
Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function. |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key. |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function. |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key. |
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key. |
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key. |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key. |
Return the length (number of rows) of the data frame. |
Entry point to joinery as a command line tool. |
Return a map of index names to rows. |
Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped. |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key. |
Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key. |
Return a data frame containing only columns with non-numeric data. |
Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false. |
Return a data frame containing only columns with numeric data. |
Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped. |
Display the numeric columns of this data frame as a chart in a new swing frame using the specified type. |
Display the numeric columns of this data frame as a line chart in a new swing frame. |
Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped. |
Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data. |
Re-index the rows of the data frame using the specified column indices and dropping the columns from the data. |
Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data. |
Re-index the rows of the data frame using the specified column names and removing the columns from the data. |
Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data. |
Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data. |
Return a new data frame with the default index; row names are reset to the string value of their integer index. |
Reshape a data frame to the specified dimensions. |
Reshape a data frame to the specified indices. |
Create a new data frame containing only the specified columns. |
Create a new data frame containing only the specified columns. |
Return a data frame row as a list. |
Return a data frame row as a list. |
Select a subset of the data frame using a predicate function. |
Set the value located by the names (row, column). |
Set the value located by the coordinates (row, column). |
Return the size (number of columns) of the data frame. |
Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped. |
Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped. |
Return a data frame containing the last ten rows of this data frame. |
Return a data frame containing the last limit rows of this data frame. |
Copy the values contained in the data frame into a flat array of length size() * length(). |
Copy the values contained in the data frame into an array of the specified type. |
Copy the values contained in the data frame into the specified array. |
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables, but does not add an intercept column. |
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables, but does not add an intercept column. |
Transpose the rows and columns of the data frame. |
Return the types for each of the data frame columns. |
Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments. |
Construct an empty data frame.
> DataFrame<Object> df = new DataFrame<>();
> df.isEmpty();
true
Construct an empty data frame with the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.columns();
[name, value]
columns | the data frame column names. |
---|
Construct an empty data frame with the specified columns.
> List<String> columns = new ArrayList<>();
> columns.add("name");
> columns.add("value");
> DataFrame<Object> df = new DataFrame<>(columns);
> df.columns();
[name, value]
columns | the data frame column names. |
---|
Construct a data frame containing the specified rows and columns.
> List<String> rows = Arrays.asList("row1", "row2", "row3");
> List<String> columns = Arrays.asList("col1", "col2");
> DataFrame<Object> df = new DataFrame<>(rows, columns);
> df.get("row1", "col1");
null
index | the row names |
---|---|
columns | the column names |
Construct a data frame from the specified list of columns.
> List<List<Object>> data = Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> );
> DataFrame<Object> df = new DataFrame<>(data);
> df.row(0);
[alpha, 1]
data | a list of columns containing the data elements. |
---|
Construct a new data frame using the specified data and indices.
index | the row names |
---|---|
columns | the column names |
data | the data |
Add a new column to the data frame containing the value provided.
Any existing rows with indices greater than the size of the specified column data will have null values for the new column.
> DataFrame<Object> df = new DataFrame<>();
> df.add("value", Arrays.<Object>asList(1));
> df.columns();
[value]
column | the new column name |
---|---|
values | the new column values |
Add new columns to the data frame.
Any existing rows will have null values for the new columns.
> DataFrame<Object> df = new DataFrame<>();
> df.add("value");
> df.columns();
[value]
columns | the new column names |
---|
Apply an aggregate function to each group or the entire data frame if the data is not grouped.
function | the aggregate function |
---|
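As an illustration of the idea, the sketch below applies a function from a list of values to a single result, once per group or once over the whole column when no grouping is given. It uses plain Java collections rather than joinery; `AggregateSketch`, its `aggregate` method, and the `"all"` key for ungrouped data are assumptions made for the example and are not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class AggregateSketch {
    // Applies fn to the values of each group. When groupKeys is null the
    // data is treated as ungrouped and fn sees the entire column.
    static <V, R> Map<Object, R> aggregate(List<V> column, List<?> groupKeys,
                                           Function<List<V>, R> fn) {
        Map<Object, List<V>> groups = new LinkedHashMap<>();
        for (int i = 0; i < column.size(); i++) {
            Object key = groupKeys == null ? "all" : groupKeys.get(i);
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(column.get(i));
        }
        Map<Object, R> result = new LinkedHashMap<>();
        for (Map.Entry<Object, List<V>> e : groups.entrySet()) {
            result.put(e.getKey(), fn.apply(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4, 5);
        List<String> names = Arrays.asList("alpha", "alpha", "alpha", "bravo", "bravo");
        Function<List<Integer>, Integer> max = vs -> Collections.max(vs);
        System.out.println(aggregate(values, names, max)); // prints {alpha=3, bravo=5}
        System.out.println(aggregate(values, null, max));  // prints {all=5}
    }
}
```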
Append rows indexed by the specified name to the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append("row1", Arrays.asList("alpha", 1));
> df.append("row2", Arrays.asList("bravo", 2));
> df.index();
[row1, row2]
name | the row name to add to the index |
---|---|
row | the row to append |
Append rows to the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("alpha", 1));
> df.append(Arrays.asList("bravo", 2));
> df.length();
2
row | the row to append |
---|
Apply a function to each value in the data frame.
> DataFrame<Number> df = new DataFrame<>(
> Arrays.<List<Number>>asList(
> Arrays.<Number>asList(1, 2),
> Arrays.<Number>asList(3, 4)
> )
> );
> df = df.apply(new Function<Number, Number>() {
> public Number apply(Number value) {
> return value.intValue() * value.intValue();
> }
> });
> df.flatten();
[1, 4, 9, 16]
function | the function to apply |
---|
Cast this data frame to the specified type.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("one", "1"));
> DataFrame<String> dfs = df.cast(String.class);
> dfs.get(0, 0).getClass().getName();
java.lang.String
Update the data frame in place by overwriting any null values with any non-null values provided by the data frame arguments.
others | the other data frames |
---|
Return a data frame column as a list.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.col("value");
[1, 2, 3]
column | the column name |
---|
Return a data frame column as a list.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.col(1);
[1, 2, 3]
column | the column index |
---|
Return the column names for the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.columns();
[name, value]
Convert columns based on the requested types.
Note, the conversion process replaces existing values with values of the converted type.
> DataFrame<Object> df = new DataFrame<>("a", "b", "c");
> df.append(Arrays.asList("one", 1, 1.0));
> df.append(Arrays.asList("two", 2, 2.0));
> df.convert(
> null, // leave column "a" as is
> Long.class, // convert column "b" to Long
> Number.class // convert column "c" to Double
> );
> df.types();
[class java.lang.String, class java.lang.Long, class java.lang.Double]
Attempt to infer better types for object columns.
The following conversions are performed where applicable:
- to Double values
- to Long values
- to Boolean values
- to Date values

Note, the conversion process replaces existing values with values of the converted type.
> DataFrame<Object> df = new DataFrame<>("name", "value", "date");
> df.append(Arrays.asList("one", "1", new Date()));
> df.convert();
> df.types();
[class java.lang.String, class java.lang.Long, class java.util.Date]
Draw the numeric columns of this data frame as a chart in the specified Container using the specified type.
container | the container to use for the chart |
---|---|
type | the type of plot to draw |
Draw the numeric columns of this data frame as a chart in the specified Container.
container | the container to use for the chart |
---|
Create a new data frame by leaving out the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.drop(2).columns();
[name, value]
cols | the indices of the columns to be removed |
---|
Create a new data frame by leaving out the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.drop("category").columns();
[name, value]
cols | the names of columns to be removed |
---|
Return a map of group names to data frames for a grouped data frame. This method only has an effect if a groupBy call has been made beforehand.
Returns a view of the data frame with NA's replaced with fill.
fill | the value used to replace missing values |
---|
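One way to picture a view with missing values replaced is a read-only wrapper that substitutes the fill value on access and leaves the underlying data untouched. The sketch below shows this for a single list of values using `java.util.AbstractList`; `FillNaSketch` and its `fillna` method are hypothetical names, and the library's own view may be implemented differently.

```java
import java.util.AbstractList;
import java.util.Arrays;
import java.util.List;

class FillNaSketch {
    // Returns a read-only view over values in which nulls read as fill.
    // The source list is not copied or modified.
    static <V> List<V> fillna(final List<V> values, final V fill) {
        return new AbstractList<V>() {
            @Override
            public V get(int index) {
                V value = values.get(index);
                return value == null ? fill : value;
            }

            @Override
            public int size() {
                return values.size();
            }
        };
    }

    public static void main(String[] args) {
        List<Integer> source = Arrays.asList(1, null, 3);
        List<Integer> view = fillna(source, 0);
        System.out.println(view);   // prints [1, 0, 3]
        System.out.println(source); // prints [1, null, 3]
    }
}
```

Because the wrapper reads through to the source on every access, later changes to the source are visible in the view.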
Return the values of the data frame as a flat list.
> DataFrame<String> df = new DataFrame<>(
> Arrays.asList(
> Arrays.asList("one", "two"),
> Arrays.asList("alpha", "bravo")
> )
> );
> df.flatten();
[one, two, alpha, bravo]
Return the value located by the (row, column) coordinates.
> DataFrame<Object> df = new DataFrame<Object>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.get(1, 0);
bravo
row | the row index |
---|---|
col | the column index |
Return the value located by the (row, column) names.
> DataFrame<Object> df = new DataFrame<Object>(
> Arrays.asList("row1", "row2", "row3"),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.get("row2", "name");
bravo
row | the row name |
---|---|
col | the column name |
Group the data frame rows by the specified column names.
cols | the column names |
---|
Group the data frame rows by the specified columns.
cols | the column indices |
---|
Group the data frame rows using the specified key function.
function | the function to reduce rows to grouping keys |
---|
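The three grouping variants can be thought of as the same operation with different key functions: grouping by a column amounts to a key function that returns the value in that column. The following sketch shows the idea with plain Java collections; `GroupBySketch` and its `groupBy` method are invented for illustration and do not reflect how joinery stores its groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class GroupBySketch {
    // Reduces each row to a key and collects rows that share a key,
    // keeping keys in the order in which they first appear.
    static Map<Object, List<List<Object>>> groupBy(
            List<List<Object>> rows, Function<List<Object>, Object> keyFunction) {
        Map<Object, List<List<Object>>> groups = new LinkedHashMap<>();
        for (List<Object> row : rows) {
            groups.computeIfAbsent(keyFunction.apply(row), k -> new ArrayList<>()).add(row);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = new ArrayList<>();
        rows.add(Arrays.<Object>asList("alpha", 1));
        rows.add(Arrays.<Object>asList("bravo", 2));
        rows.add(Arrays.<Object>asList("alpha", 3));
        // Grouping by column 0 expressed as a key function.
        Function<List<Object>, Object> byFirstColumn = row -> row.get(0);
        System.out.println(groupBy(rows, byFirstColumn));
        // prints {alpha=[[alpha, 1], [alpha, 3]], bravo=[[bravo, 2]]}
    }
}
```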
Return a data frame containing the first limit rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.head(3)
> .col("value");
[0, 1, 2]
limit | the number of rows to include in the result |
---|
Return a data frame containing the first ten rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.head()
> .col("value");
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Return the index names for the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append("row1", Arrays.asList("one", 1));
> df.index();
[row1]
Return true if the data frame contains no data.
> DataFrame<Object> df = new DataFrame<>();
> df.isEmpty();
true
Create a new data frame containing boolean values such that null object references in the original data frame yield true and valid references yield false.
> DataFrame<Object> df = new DataFrame<Object>(
> Arrays.asList(
> Arrays.asList("alpha", "bravo", null),
> Arrays.asList(null, 2, 3)
> )
> );
> df.isnull().row(0);
[false, true]
Return an iterator over the rows of the data frame. Also used implicitly with foreach loops.
> DataFrame<Integer> df = new DataFrame<>(
> Arrays.asList(
> Arrays.asList(1, 2),
> Arrays.asList(3, 4)
> )
> );
> List<Integer> results = new ArrayList<>();
> for (List<Integer> row : df)
> results.add(row.get(0));
> results;
[1, 2]
Return a new data frame created by performing a left outer join of this data frame with the argument and using the row indices as the join key.
> DataFrame<Object> left = new DataFrame<>("a", "b");
> left.append("one", Arrays.asList(1, 2));
> left.append("two", Arrays.asList(3, 4));
> left.append("three", Arrays.asList(5, 6));
> DataFrame<Object> right = new DataFrame<>("c", "d");
> right.append("one", Arrays.asList(10, 20));
> right.append("two", Arrays.asList(30, 40));
> right.append("four", Arrays.asList(50, 60));
> left.join(right)
> .index();
[one, two, three]
other | the other data frame |
---|
Return a new data frame created by performing a left outer join of this data frame with the argument using the specified key function.
other | the other data frame |
---|---|
on | the function to generate the join keys |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and using the row indices as the join key.
other | the other data frame |
---|---|
join | the join type |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the specified key function.
other | the other data frame |
---|---|
join | the join type |
on | the function to generate the join keys |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
other | the other data frame |
---|---|
join | the join type |
cols | the indices of the columns to use as the join key |
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
other | the other data frame |
---|---|
cols | the indices of the columns to use as the join key |
Return a new data frame created by performing a left outer join of this data frame with the argument using the column values as the join key.
other | the other data frame |
---|---|
cols | the names of the columns to use as the join key |
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the column values as the join key.
other | the other data frame |
---|---|
join | the join type |
cols | the names of the columns to use as the join key |
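To make the difference between join types concrete, the sketch below joins two small tables keyed by row name, once as a left outer join and once as an inner join. It is written with plain Java maps and reuses the data from the row-index join example earlier on this page; the `JoinSketch` class and its `Kind` enum are hypothetical stand-ins, not joinery's `JoinType`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class JoinSketch {
    // Hypothetical stand-in for a join type; not joinery's JoinType enum.
    enum Kind { INNER, LEFT }

    // Joins two tables whose map keys act as the join key.
    static Map<String, List<Object>> join(Map<String, List<Object>> left,
                                          Map<String, List<Object>> right,
                                          int rightWidth, Kind kind) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Map.Entry<String, List<Object>> e : left.entrySet()) {
            List<Object> match = right.get(e.getKey());
            if (match == null && kind == Kind.INNER) {
                continue; // inner join: rows without a match are dropped
            }
            List<Object> row = new ArrayList<>(e.getValue());
            if (match != null) {
                row.addAll(match);
            } else {
                // left outer join: keep the row and pad the right side with nulls
                row.addAll(Collections.<Object>nCopies(rightWidth, null));
            }
            out.put(e.getKey(), row);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> left = new LinkedHashMap<>();
        left.put("one", Arrays.<Object>asList(1, 2));
        left.put("two", Arrays.<Object>asList(3, 4));
        left.put("three", Arrays.<Object>asList(5, 6));
        Map<String, List<Object>> right = new LinkedHashMap<>();
        right.put("one", Arrays.<Object>asList(10, 20));
        right.put("two", Arrays.<Object>asList(30, 40));
        right.put("four", Arrays.<Object>asList(50, 60));
        System.out.println(join(left, right, 2, Kind.LEFT).keySet()); // prints [one, two, three]
        System.out.println(join(left, right, 2, Kind.INNER).keySet()); // prints [one, two]
    }
}
```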
Return the length (number of rows) of the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("alpha", 1));
> df.append(Arrays.asList("bravo", 2));
> df.append(Arrays.asList("charlie", 3));
> df.length();
3
Entry point to joinery as a command line tool. The available commands are:
args | file paths or urls of csv input data |
---|
IOException | if an error occurs reading input |
---|
Return a map of index names to rows.
> DataFrame<Integer> df = new DataFrame<>("value");
> df.append("alpha", Arrays.asList(1));
> df.append("bravo", Arrays.asList(2));
> df.map();
{alpha=[1], bravo=[2]}
Compute the mean of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Integer> df = new DataFrame<>("value");
> df.append("one", Arrays.asList(1));
> df.append("two", Arrays.asList(5));
> df.append("three", Arrays.asList(3));
> df.append("four", Arrays.asList(7));
> df.mean().col(0);
[4.0]
Return a new data frame created by performing a join of this data frame with the argument using the specified join type and the common, non-numeric columns from each data frame as the join key.
other | the other data frame |
---|
Return a new data frame created by performing a left outer join of this data frame with the argument using the common, non-numeric columns from each data frame as the join key.
other | the other data frame |
---|
Return a data frame containing only columns with non-numeric data.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("one", 1));
> df.append(Arrays.asList("two", 2));
> df.nonnumeric().columns();
[name]
Create a new data frame containing boolean values such that valid object references in the original data frame yield true and null references yield false.
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", null),
> Arrays.<Object>asList(null, 2, 3)
> )
> );
> df.notnull().row(0);
[true, false]
Return a data frame containing only columns with numeric data.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.append(Arrays.asList("one", 1));
> df.append(Arrays.asList("two", 2));
> df.numeric().columns();
[value]
Compute the percentile of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Integer> df = new DataFrame<>("value");
> df.append("one", Arrays.asList(1));
> df.append("two", Arrays.asList(5));
> df.append("three", Arrays.asList(3));
> df.append("four", Arrays.asList(7));
> df.mean().col(0);
[4.0]
Display the numeric columns of this data frame as a chart in a new swing frame using the specified type.
> DataFrame<Object> df = new DataFrame<Object>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.plot(PlotType.AREA);
type | the type of plot to display |
---|
Display the numeric columns of this data frame as a line chart in a new swing frame.
> DataFrame<Object> df = new DataFrame<Object>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.asList("alpha", "bravo", "charlie"),
> Arrays.asList(10, 20, 30)
> )
> );
> df.plot();
Compute the product of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
> Arrays.<Object>asList(1, 2, 3, 4, 5)
> )
> );
> df.groupBy("name")
> .prod()
> .col("value");
[6.0, 20.0]
IOException |
---|
Re-index the rows of the data frame using the specified column index, optionally dropping the column from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex(0, true)
> .index();
[alpha, bravo]
col | the column to use as the new index |
---|---|
drop | true to remove the index column from the data, false otherwise |
Re-index the rows of the data frame using the specified column indices and dropping the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex(0)
> .index();
[alpha, bravo]
cols | the columns to use as the new index |
---|
Re-index the rows of the data frame using the specified column indices, optionally dropping the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two", "three");
> df.append("a", Arrays.asList("alpha", 1, 10));
> df.append("b", Arrays.asList("bravo", 2, 20));
> df.reindex(new Integer[] { 0, 1 }, true)
> .index();
[[alpha, 1], [bravo, 2]]
cols | the columns to use as the new index |
---|---|
drop | true to remove the index column from the data, false otherwise |
Re-index the rows of the data frame using the specified column names and removing the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex("one", true)
> .index();
[alpha, bravo]
cols | the columns to use as the new index |
---|
Re-index the rows of the data frame using the specified column names, optionally dropping the columns from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two", "three");
> df.append("a", Arrays.asList("alpha", 1, 10));
> df.append("b", Arrays.asList("bravo", 2, 20));
> df.reindex(new String[] { "one", "two" }, true)
> .index();
[[alpha, 1], [bravo, 2]]
cols | the columns to use as the new index |
---|---|
drop | true to remove the index column from the data, false otherwise |
Re-index the rows of the data frame using the specified column name, optionally dropping the column from the data.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.reindex("one", true)
> .index();
[alpha, bravo]
col | the column to use as the new index |
---|---|
drop | true to remove the index column from the data, false otherwise |
Return a new data frame with the default index; row names are reset to the string value of their integer index.
> DataFrame<Object> df = new DataFrame<>("one", "two");
> df.append("a", Arrays.asList("alpha", 1));
> df.append("b", Arrays.asList("bravo", 2));
> df.resetIndex()
> .index();
[0, 1]
Reshape a data frame to the specified dimensions.
> DataFrame<Object> df = new DataFrame<>("0", "1", "2");
> df.append("0", Arrays.asList(10, 20, 30));
> df.append("1", Arrays.asList(40, 50, 60));
> df.reshape(3, 2)
> .length();
3
rows | the number of rows the new data frame will contain |
---|---|
cols | the number of columns the new data frame will contain |
Reshape a data frame to the specified indices.
> DataFrame<Object> df = new DataFrame<>("0", "1", "2");
> df.append("0", Arrays.asList(10, 20, 30));
> df.append("1", Arrays.asList(40, 50, 60));
> df.reshape(Arrays.asList("0", "1", "2"), Arrays.asList("0", "1"))
> .length();
3
rows | the names of rows the new data frame will contain |
---|---|
cols | the names of columns the new data frame will contain |
Create a new data frame containing only the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.retain(0, 2).columns();
[name, category]
cols | the columns to include in the new data frame |
---|
Create a new data frame containing only the specified columns.
> DataFrame<Object> df = new DataFrame<>("name", "value", "category");
> df.retain("name", "category").columns();
[name, category]
cols | the columns to include in the new data frame |
---|
Return a data frame row as a list.
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList("row1", "row2", "row3"),
> Collections.emptyList(),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.row("row2");
[bravo, 2]
row | the row name |
---|
Return a data frame row as a list.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Collections.emptyList(),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "bravo", "charlie"),
> Arrays.<Object>asList(1, 2, 3)
> )
> );
> df.row(1);
[bravo, 2]
row | the row index |
---|
Select a subset of the data frame using a predicate function.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> for (int i = 0; i < 10; i++)
> df.append(Arrays.asList("name" + i, i));
> df.select(new Predicate<Object>() {
> @Override
> public Boolean apply(List<Object> values) {
> return Integer.class.cast(values.get(1)).intValue() % 2 == 0;
> }
> })
> .col(1);
[0, 2, 4, 6, 8]
predicate | a function returning true for rows to be included in the subset |
---|
Set the value located by the names (row, column).
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList("row1", "row2"),
> Arrays.asList("col1", "col2")
> );
> df.set("row1", "col2", Integer.valueOf(7));
> df.col(1);
[7, null]
row | the row name |
---|---|
col | the column name |
value | the new value |
Set the value located by the coordinates (row, column).
> DataFrame<Object> df = new DataFrame<>(
> Arrays.asList("row1", "row2"),
> Arrays.asList("col1", "col2")
> );
> df.set(1, 0, Integer.valueOf(7));
> df.col(0);
[null, 7]
row | the row index |
---|---|
col | the column index |
value | the new value |
Return the size (number of columns) of the data frame.
> DataFrame<Object> df = new DataFrame<>("name", "value");
> df.size();
2
Compute the standard deviation of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo", "bravo"),
> Arrays.<Object>asList(1, 2, 3, 4, 6, 8)
> )
> );
> df.groupBy("name")
> .stddev()
> .col("value");
[1.0, 2.0]
Compute the sum of the numeric columns for each group or the entire data frame if the data is not grouped.
> DataFrame<Object> df = new DataFrame<>(
> Collections.emptyList(),
> Arrays.asList("name", "value"),
> Arrays.asList(
> Arrays.<Object>asList("alpha", "alpha", "alpha", "bravo", "bravo"),
> Arrays.<Object>asList(1, 2, 3, 4, 5)
> )
> );
> df.groupBy("name")
> .sum()
> .col("value");
[6.0, 9.0]
Return a data frame containing the last ten rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.tail()
> .col("value");
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
Return a data frame containing the last limit rows of this data frame.
> DataFrame<Integer> df = new DataFrame<>("value");
> for (int i = 0; i < 20; i++)
> df.append(Arrays.asList(i));
> df.tail(3)
> .col("value");
[17, 18, 19]
limit | the number of rows to include in the result |
---|
Copy the values contained in the data frame into a flat array of length size() * length().
Copy the values contained in the data frame into an array of the specified type. If the type specified is a two dimensional array, for example double[][].class, a row-wise copy will be made.
IllegalArgumentException | if the values are not assignable to the specified component type |
---|
Copy the values contained in the data frame into the specified array. If the length of the provided array is less than size() * length(), a new array will be created.
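The reuse-or-allocate behavior described here resembles `Collection.toArray(T[])`. The sketch below shows it for data held as a list of columns, copying column by column in the same order as the flatten example earlier on this page. `ToArraySketch` and its `toArray` method are illustrative assumptions and say nothing about how joinery lays out its data internally.

```java
import java.util.Arrays;
import java.util.List;

class ToArraySketch {
    // Copies the values into the supplied array when it is large enough,
    // otherwise into a new array of length columns * rows.
    static Object[] toArray(List<List<Object>> columns, Object[] array) {
        int rows = columns.isEmpty() ? 0 : columns.get(0).size();
        int needed = columns.size() * rows;
        Object[] out = array.length >= needed ? array : new Object[needed];
        int i = 0;
        for (List<Object> column : columns) {
            for (Object value : column) {
                out[i++] = value;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> columns = Arrays.asList(
            Arrays.<Object>asList("one", "two"),
            Arrays.<Object>asList("alpha", "bravo"));
        Object[] tooSmall = new Object[0];
        Object[] result = toArray(columns, tooSmall);
        System.out.println(Arrays.toString(result)); // prints [one, two, alpha, bravo]
        System.out.println(result == tooSmall);      // prints false, a new array was created
    }
}
```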
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables, but does not add an intercept column.
More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.
fillValue | value to replace NA's with |
---|
Encodes the DataFrame as a model matrix, converting nominal values to dummy variables, but does not add an intercept column.
More methods with additional parameters to control the conversion to the model matrix are available in the Conversion class.
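For readers unfamiliar with dummy variables, the sketch below encodes one nominal column as 0/1 indicator columns, one per distinct value, with no intercept column. This is only one common convention, chosen here for simplicity; the library may, for example, omit a reference level or order the columns differently. `DummySketch` and `encode` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DummySketch {
    // Produces one indicator column per distinct value, in order of first
    // appearance. Each row has a single 1.0 marking its value.
    static double[][] encode(List<String> column) {
        List<String> levels = new ArrayList<>(new LinkedHashSet<>(column));
        double[][] matrix = new double[column.size()][levels.size()];
        for (int r = 0; r < column.size(); r++) {
            matrix[r][levels.indexOf(column.get(r))] = 1.0;
        }
        return matrix;
    }

    public static void main(String[] args) {
        double[][] m = encode(Arrays.asList("a", "b", "a"));
        System.out.println(Arrays.deepToString(m));
        // prints [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    }
}
```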
Transpose the rows and columns of the data frame.
> DataFrame<String> df = new DataFrame<>(
> Arrays.asList(
> Arrays.asList("one", "two"),
> Arrays.asList("alpha", "bravo")
> )
> );
> df.transpose().flatten();
[one, alpha, two, bravo]
Return the types for each of the data frame columns.
Update the data frame in place by overwriting any values with the non-null values provided by the data frame arguments.
others | the other data frames |
---|
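This method differs from the null-filling variant described earlier only in which existing values may be overwritten. The sketch below contrasts the two rules on a single list of values, using plain Java; `UpdateSketch` and its two methods are named for the comparison only and are assumptions, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class UpdateSketch {
    // Update rule: every non-null value in other replaces the target value.
    static void update(List<Object> target, List<Object> other) {
        for (int i = 0; i < target.size() && i < other.size(); i++) {
            if (other.get(i) != null) {
                target.set(i, other.get(i));
            }
        }
    }

    // Null-filling rule: only null target values are replaced.
    static void coalesce(List<Object> target, List<Object> other) {
        for (int i = 0; i < target.size() && i < other.size(); i++) {
            if (target.get(i) == null && other.get(i) != null) {
                target.set(i, other.get(i));
            }
        }
    }

    public static void main(String[] args) {
        List<Object> other = Arrays.<Object>asList(10, null, 30);
        List<Object> a = new ArrayList<>(Arrays.<Object>asList(1, null, null));
        List<Object> b = new ArrayList<>(a);
        update(a, other);
        coalesce(b, other);
        System.out.println(a); // prints [10, null, 30]
        System.out.println(b); // prints [1, null, 30]
    }
}
```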
IOException |
---|