Java: read all files in a folder and write only unique rows

This program reads all files in a folder, filters out every row shorter than 6 characters, and saves the remaining unique rows in one file.

Program’s steps:

  • Read all files from “/home/username/folder”. Change this to the folder that contains your files. Note that Files.walk also descends into subfolders
  • filter(Files::isRegularFile) keeps only regular files (directories are skipped)
  • map(WordListMerger::readFile) calls the readFile method for each file. It keeps the rows that are at least 6 characters long and collects them in a set, so the rows from one file are unique
  • collect(Collectors.toList()) collects the per-file sets into a list. The program then iterates over that list and adds every set to one final set, which filters out the rows that are duplicated between files
  • saveFile(finalResult) saves the final set. System.lineSeparator() is appended to every row so that each row stays on its own line in the new file. The output path is “/home/username/OutputFile.txt”. Change this to your desired destination
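
The steps above can also be condensed into a single stream pipeline. The following is only a minimal sketch, not the program from this post: the class name UniqueRowsSketch, the methods mergeUniqueRows, rows and demo, the minLength parameter and the sample names are all assumptions made for illustration. It uses flatMap to merge the rows of all files directly into one set instead of first collecting a list of sets, and it swaps in a TreeSet so that the result is sorted. The demo writes two small files to a temporary folder, so it can be tried without changing any paths.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class UniqueRowsSketch {

    // Hypothetical helper: reads every regular file under 'folder' and returns
    // the unique rows that are at least 'minLength' characters long.
    static Set<String> mergeUniqueRows(Path folder, int minLength) {
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths
                    .filter(Files::isRegularFile)              // skip directories
                    .flatMap(UniqueRowsSketch::rows)           // all rows of all files
                    .filter(row -> row.length() >= minLength)  // drop short rows
                    .collect(Collectors.toCollection(TreeSet::new)); // unique + sorted
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads one file completely (as UTF-8) and returns its rows as a stream.
    static Stream<String> rows(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8).stream();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Creates two sample files with made-up names in a temporary folder
    // and merges them with a minimum row length of 6.
    static Set<String> demo() {
        try {
            Path folder = Files.createTempDirectory("unique-rows");
            Files.write(folder.resolve("text.txt"),
                    Arrays.asList("Alexander", "Maria", "Jonathan", "Stephanie"));
            Files.write(folder.resolve("text_2.txt"),
                    Arrays.asList("Jonathan", "Ivan", "Stephanie", "Victoria"));
            return mergeUniqueRows(folder, 6);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints [Alexander, Jonathan, Stephanie, Victoria]
        System.out.println(demo());
    }
}
```

In this sample data "Maria" and "Ivan" are dropped because they are shorter than 6 characters, and "Jonathan" and "Stephanie" appear only once although both files contain them.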

Test files, which should be placed in one folder. These files contain names, some of which are duplicated:

Output file produced from these two input files. It contains only names that are at least 6 characters long, each of them exactly once:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordListMerger {
    public static void main(String[] args) throws IOException {

        System.out.println("Program start");
        Set<String> finalResult = new HashSet<>();
        // get all the paths from a specific folder (Files.walk also visits subfolders)
        Path mainFolderPath = Paths.get("/home/username/folder");
        try (Stream<Path> paths = Files.walk(mainFolderPath)) {
            // filter to get only the files,
            // then execute readFile method
            // and collect the result
            List<Set<String>> result = paths
                    .filter(Files::isRegularFile)
                    .map(WordListMerger::readFile)
                    .collect(Collectors.toList());
            System.out.println("All files are read");
            // Flatten the list of sets into one set;
            // rows duplicated between files are dropped.
            for (Set<String> oneFileSet : result) {
                finalResult.addAll(oneFileSet);
            }
            System.out.println("Final set is created");
        }

        saveFile(finalResult);
        System.out.println("New Files is saved. Program exit");
    }

    public static void saveFile(Set<String> set) {
        System.out.println("Final Set for saving row size: " + set.size());
        // StringBuilder is sufficient here: the buffer is used by one thread only
        StringBuilder sb = new StringBuilder();
        for (String s : set) {
            // terminate every row with the platform line separator
            sb.append(s).append(System.lineSeparator());
        }
        final Path path = Paths.get("/home/username/OutputFile.txt");

        // TRUNCATE_EXISTING empties an existing output file first; with CREATE
        // alone, leftovers of a longer previous file would remain at the end
        try (final BufferedWriter writer = Files.newBufferedWriter(path,
                StandardCharsets.UTF_8, StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            writer.write(sb.toString());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static Set<String> readFile(Path path) {
        Set<String> set;
        System.out.println("Current file for reading: " + path);
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            // keep only the rows that have 6 or more characters and
            // collect them in a set, so every row occurs only once
            set = reader.lines()
                    .filter(l -> l.length() >= 6)
                    .collect(Collectors.toSet());
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
        System.out.println("Set row size: "+ set.size());
        return set;
    }
}

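Output (the two files contribute 12 and 10 rows and the final set has 17, so 5 rows occurred in both files):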

Program start
Current file for reading: /home/username/folder/text.txt
Set row size: 12
Current file for reading: /home/username/folder/text_2.txt
Set row size: 10
All files are read
Final set is created
Final Set for saving row size: 17
New Files is saved. Program exit
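
The saveFile method in the listing above builds the complete output string in memory before writing it. A possible alternative, sketched here under assumptions, is Files.write with a collection of lines: it appends the line separator to every row itself and, when no open options are given, creates the file or truncates an existing one. The class name SaveRowsSketch, the methods saveRows and demo, the temporary output file and the sample names are hypothetical. Copying the rows into a TreeSet is an addition that is not in the original program; it makes the order of the output predictable, which a HashSet does not guarantee.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SaveRowsSketch {

    // Hypothetical helper: writes one row per line to 'target' in sorted order.
    // Without open options Files.write creates the file or truncates it.
    static void saveRows(Set<String> rows, Path target) {
        Set<String> sorted = new TreeSet<>(rows);
        try {
            Files.write(target, sorted, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Saves twice to the same temporary file and reads the result back,
    // to show that the second save replaces the first one completely.
    static List<String> demo() {
        try {
            Path target = Files.createTempFile("OutputFile", ".txt");
            saveRows(new HashSet<>(Arrays.asList("Stephanie", "Alexander", "Jonathan")), target);
            saveRows(new HashSet<>(Arrays.asList("Victoria")), target);
            return Files.readAllLines(target, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints [Victoria]
        System.out.println(demo());
    }
}
```

After the second call the file holds only the rows of the second set, so no rows from an earlier, longer output are left behind.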
