This program reads all files in a folder, filters out every row shorter than 6 symbols, and saves the remaining unique rows in a single file.
Program’s steps:
- Read all files from “/home/username/folder”. Change this to the folder that contains your files. Note that Files.walk also descends into subfolders.
- filter(Files::isRegularFile) keeps only the regular files and drops the folders.
- map(WordListMerger::readFile) calls the readFile method for each file. It keeps the rows that are at least 6 symbols long and collects them in a set, so the rows from one file are unique.
- collect(Collectors.toList()) collects the results into a list of sets. We then go over that list and add every set to one final set. This way, rows that are duplicated between files are filtered out.
- saveFile(finalResult) saves the final set. System.lineSeparator() is appended to every row, so each row stays on its own line in the new file. The new file path is “/home/username/OutputFile.txt”. Change this to your desired destination.
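The steps above can also be written more compactly. The following is only a sketch, not the program from this post: it assumes the same folder and output paths, and the names FlatMergeSketch, mergeFolder, longRows and saveRows are hypothetical. It uses flatMap to collect the rows of all files directly into one set, which skips the intermediate list of sets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch class; not part of the original program.
class FlatMergeSketch {

    // Same threshold as the length check in readFile.
    static final int MIN_LENGTH = 6;

    // Walks the folder, streams the long rows of every regular file,
    // and collects them straight into one Set, so duplicates collapse.
    static Set<String> mergeFolder(Path folder) throws IOException {
        try (Stream<Path> paths = Files.walk(folder)) {
            return paths
                    .filter(Files::isRegularFile)
                    .flatMap(FlatMergeSketch::longRows)
                    .collect(Collectors.toSet());
        }
    }

    // Reads one file as UTF-8 and keeps rows with at least MIN_LENGTH symbols.
    static Stream<String> longRows(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8).stream()
                    .filter(line -> line.length() >= MIN_LENGTH);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one row per line. Files.write ends each row with the platform
    // line separator and, with no options given, creates or truncates the file.
    static void saveRows(Set<String> rows, Path target) throws IOException {
        Files.write(target, rows, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default paths, taken from the post; pass your own as arguments.
        Path folder = Paths.get(args.length > 0 ? args[0] : "/home/username/folder");
        Path target = Paths.get(args.length > 1 ? args[1] : "/home/username/OutputFile.txt");
        Set<String> merged = mergeFolder(folder);
        saveRows(merged, target);
        System.out.println("Rows saved: " + merged.size());
    }
}
```

The trade-off is that the per-file row counts are no longer available for logging, because the rows are never grouped by file.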
Test files, which should be in the same folder. These files contain names that may be duplicated:
Output file produced from these two input files. It contains only unique names that are at least 6 symbols long:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class WordListMerger {

    public static void main(String[] args) throws IOException {
        System.out.println("Program start");
        Set<String> finalResult = new HashSet<>();

        // get all the paths from a specific folder
        Path mainFolderPath = Paths.get("/home/username/folder");
        try (Stream<Path> paths = Files.walk(mainFolderPath)) {
            // filter to get only the files,
            // then execute the readFile method
            // and collect the result
            List<Set<String>> result = paths
                    .filter(Files::isRegularFile)
                    .map(WordListMerger::readFile)
                    .collect(Collectors.toList());
            System.out.println("All files are read");

            // Flatten the list of sets into one set, which removes
            // the rows that are duplicated between files.
            for (Set<String> oneFileSet : result) {
                finalResult.addAll(oneFileSet);
            }
            System.out.println("Final set is created");
        }

        saveFile(finalResult);
        System.out.println("New Files is saved. Program exit");
    }

    public static void saveFile(Set<String> set) {
        System.out.println("Final Set for saving row size: " + set.size());
        StringBuilder sb = new StringBuilder();
        for (String s : set) {
            // add a line separator to the end of every row
            sb.append(s).append(System.lineSeparator());
        }

        final Path path = Paths.get("/home/username/OutputFile.txt");
        // TRUNCATE_EXISTING clears an existing output file first; with CREATE
        // alone, leftovers of a longer previous run would stay at the end
        try (final BufferedWriter writer = Files.newBufferedWriter(path,
                StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            writer.write(sb.toString());
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static Set<String> readFile(Path path) {
        Set<String> set;
        System.out.println("Current file for reading: " + path);
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            // keep only the rows that have 6 or more symbols and
            // collect them in a set so every row is unique
            set = reader.lines()
                    .filter(l -> l.length() >= 6)
                    .collect(Collectors.toSet());
        } catch (IOException ex) {
            throw new RuntimeException(ex);
        }
        System.out.println("Set row size: " + set.size());
        return set;
    }
}
Output:
Program start
Current file for reading: /home/username/folder/text.txt
Set row size: 12
Current file for reading: /home/username/folder/text_2.txt
Set row size: 10
All files are read
Final set is created
Final Set for saving row size: 17
New Files is saved. Program exit
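
The numbers in this output fit together: the two files contribute 12 and 10 unique rows, but the final set has 17 rows, so 12 + 10 - 17 = 5 rows must appear in both files. The sketch below shows the same arithmetic on a small scale. The names in it are hypothetical sample rows, not the contents of the test files, and the class SetUnionSketch and its methods are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch class; not part of the original program.
class SetUnionSketch {

    // Counts the rows that appear in both sets.
    static int sharedRows(Set<String> first, Set<String> second) {
        Set<String> shared = new HashSet<>(first);
        shared.retainAll(second);
        return shared.size();
    }

    // Merges both sets the same way the program does, with addAll.
    static Set<String> union(Set<String> first, Set<String> second) {
        Set<String> merged = new HashSet<>(first);
        merged.addAll(second);
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows standing in for two input files.
        Set<String> fileOne = new HashSet<>(
                Arrays.asList("Alexander", "Charlotte", "Dominic"));
        Set<String> fileTwo = new HashSet<>(
                Arrays.asList("Charlotte", "Dominic", "Elizabeth", "Frederick"));

        Set<String> merged = union(fileOne, fileTwo);
        // prints "3 + 4 - 2 = 5"
        System.out.println(fileOne.size() + " + " + fileTwo.size()
                + " - " + sharedRows(fileOne, fileTwo) + " = " + merged.size());
    }
}
```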