Text Analytics Assistant (Filtering, Sorting, Duplicate Removal)
How to use
This tool runs in the browser and processes text line by line to help with analyzing logs and other data. It supports line counting, filtering, sorting, duplicate-line removal, and more.
- The output updates automatically whenever you enter text or change a setting.
- Keywords: You can enter multiple keywords, one per line. Lines are then filtered with OR, so a line matches if it contains any of the keywords.
- Remove blank lines: Removes lines that contain nothing but whitespace and a line break.
- Remove duplicate lines: Removes repeated lines, keeping only the first occurrence of each.
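The blank-line and duplicate-line options can be sketched in a few lines of JavaScript. This is a minimal sketch, not the tool's source: the helper names are hypothetical, and it assumes the input may use CRLF, CR, or LF line breaks and that the result is rejoined with "\n".

```javascript
// Minimal sketch; these helpers are hypothetical, not the tool's actual code.

// Split on any common line break (CRLF, CR, or LF).
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

// Drop lines that are empty or contain only whitespace.
// trim() strips Unicode whitespace, including the full-width space U+3000.
function removeBlankLines(text) {
  return splitLines(text)
    .filter((line) => line.trim() !== "")
    .join("\n");
}

// Keep the first occurrence of each line and drop later repeats.
function removeDuplicateLines(text) {
  const seen = new Set();
  return splitLines(text)
    .filter((line) => {
      if (seen.has(line)) return false;
      seen.add(line);
      return true;
    })
    .join("\n");
}

const noBlanks = removeBlankLines("a\n  \nb\n\nc"); // "a\nb\nc"
const noDupes = removeDuplicateLines("x\ny\nx\nz\ny"); // "x\ny\nz"
```

A `Set` keeps duplicate detection linear in the number of lines, and iterating in input order is what guarantees that the first occurrence is the one kept.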
Filtering by Keyword
This is useful for narrowing down logs and other data before analysis. For example, suppose you have the following log and want to see only the lines for process1:
INFO 2019-01-31 15:00:00.000 1234/process1 message
ERROR 2019-01-31 15:00:00.000 1234/process1 message
INFO 2019-01-31 15:00:00.000 4321/process2 message
INFO 2019-01-31 15:00:00.000 1234/process1 message
INFO 2019-01-31 15:00:00.000 4321/process2 message
INFO 2019-01-31 15:00:00.000 4321/process3 message
INFO 2019-01-31 15:00:00.000 1234/process1 message
Select “contain” and enter “process1” as the keyword. The result is:
INFO 2019-01-31 15:00:00.000 1234/process1 message
ERROR 2019-01-31 15:00:00.000 1234/process1 message
INFO 2019-01-31 15:00:00.000 1234/process1 message
INFO 2019-01-31 15:00:00.000 1234/process1 message
Every line that does not belong to process1 is gone. Yay!
By the way, filtering is implemented with regular expressions. For the “contain” option (keep only lines that contain a keyword), the lines that do not contain any keyword are removed with the following regular expression:
/^(?!.*(keyword1|keyword2)).*$(\r\n|\r|\n)?/gm
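Here is a minimal sketch of how a keyword field could be turned into that regular expression and applied with `String.prototype.replace`. Several parts are assumptions rather than confirmed tool behavior: the function names, the keyword escaping, and the hypothetical "exclude" mode with its pattern `^.*(keyword1|keyword2).*$(\r\n|\r|\n)?`.

```javascript
// Minimal sketch of keyword filtering built on the regular expression above.
// Hypothetical: the function names, the "exclude" mode and its pattern, and
// the keyword escaping are assumptions, not the tool's confirmed behavior.

// Escape regex metacharacters so a keyword such as "a.b" matches literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// keywordField: the keyword input, one keyword per line (combined with OR).
// mode: "contain" keeps matching lines; "exclude" removes them.
function filterLines(text, keywordField, mode) {
  const keywords = keywordField
    .split(/\r\n|\r|\n/)
    .filter((k) => k !== "")
    .map(escapeRegExp);
  if (keywords.length === 0) return text; // nothing to filter by

  const alt = keywords.join("|");
  const eol = "(\\r\\n|\\r|\\n)?"; // also consume the line break
  const source =
    mode === "contain"
      ? "^(?!.*(" + alt + ")).*$" + eol // lines WITHOUT any keyword
      : "^.*(" + alt + ").*$" + eol; // lines WITH a keyword
  // "m" makes ^ and $ work per line; "g" removes every matching line.
  return text.replace(new RegExp(source, "gm"), "");
}
```

For example, `filterLines(log, "process1", "contain")` on the log above keeps only the process1 lines. Because `.` never matches a line break, the lookahead only inspects the current line.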
Regular expressions like this one are explained in the following article:
Regex: Remove lines that match the conditions
Sort order
- Ascending/Descending: Sorts lines in character-code (UTF-16) order.
- Character count (Asc/Desc): Sorts lines by their number of characters (not bytes).
- Reverse: Reverses the current order of the lines in the input field.
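The sort options can be approximated in JavaScript as below. This is a sketch, not the tool's code: the function names are hypothetical, and it assumes "characters" means Unicode code points. The tool might instead use `String.length`, which counts UTF-16 code units and would count an emoji as 2.

```javascript
// Minimal sketch of the sort options above; the function names are
// hypothetical and this is not the tool's actual source.

// Ascending: with no comparator, Array.prototype.sort compares strings
// by UTF-16 code units, which matches "character code (UTF-16) order".
function sortAscending(lines) {
  return [...lines].sort();
}

// Descending: the ascending result, reversed.
function sortDescending(lines) {
  return sortAscending(lines).reverse();
}

// Assumption: a "character" is a Unicode code point. Spreading a string
// iterates by code point, so "\u{1F600}" counts as 1 (its .length is 2).
function charCount(line) {
  return [...line].length;
}

// Sort by character count. sort() is stable (ES2019+), so lines of equal
// length keep their current relative order.
function sortByCharCount(lines, descending = false) {
  const sign = descending ? -1 : 1;
  return [...lines].sort((a, b) => sign * (charCount(a) - charCount(b)));
}

// Reverse: flip the current order without comparing anything.
function reverseLines(lines) {
  return [...lines].reverse();
}
```

Each function copies its input with `[...lines]` first, because `sort()` and `reverse()` modify arrays in place.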
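In ascending character-code order, characters roughly fall into this sequence: half-width symbols, half-width digits, half-width Latin letters, full-width Japanese characters (hiragana, katakana, kanji), full-width symbols, full-width digits, full-width Latin letters. Not every symbol comes before the digits, though. For example, “@” (U+0040) sorts after “9” (U+0039).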
+ (half)
- (half)
1 (half)
2 (half)
A (half)
B (half)
あ (full)
い (full)
ア (full)
イ (full)
亜 (full)
腕 (full)
＋ (full)
－ (full)
１ (full)
２ (full)
Ａ (full)
Ｂ (full)
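To check this order yourself, you can shuffle the sample characters, sort them with JavaScript's default string comparison (UTF-16 code units), and print each code point. This is an illustrative sketch. It assumes the full-width entries are the Halfwidth and Fullwidth Forms characters (＋ U+FF0B, － U+FF0D, and so on).

```javascript
// Sketch: sort the sample characters above (deliberately shuffled here)
// in UTF-16 order and print each with its code point.
const samples = ["腕", "Ｂ", "イ", "+", "２", "あ", "B", "－", "1",
                 "ア", "亜", "Ａ", "-", "い", "１", "A", "＋", "2"];

// With no comparator, sort() compares UTF-16 code units.
const sorted = [...samples].sort();

for (const ch of sorted) {
  // Format the code point as U+XXXX.
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`${ch} U+${hex}`);
}
```

The printed code points show why the groups line up as listed. ASCII comes first, hiragana and katakana (U+3040 to U+30FF) and kanji follow, and the full-width forms (U+FF00 and up) come last.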