Text Analytics Assistant (Filtering, Sort, Remove Duplicate) / Tool

How to use

This tool runs in the browser and manipulates text line by line to assist with log and data analysis. It supports line counting, filtering, sorting, duplicate line removal, and more.

  • The output updates automatically when you enter text or change a setting.
  • Keywords: You can enter multiple keywords, one per line. In that case, a line matches if it contains any of them (OR).
  • Remove blank lines: Removes empty lines and lines that contain only whitespace.
  • Remove duplicate lines: Removes duplicate lines, keeping only the first occurrence.
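
The two removal options above can be sketched in a few lines of JavaScript. This is only an illustration of the described behavior, not the tool's actual source; the function names splitLines, removeBlankLines, and removeDuplicateLines are hypothetical.

```javascript
// Split text into lines, accepting CRLF, CR, or LF line endings.
function splitLines(text) {
  return text.split(/\r\n|\r|\n/);
}

// Drop lines that are empty or contain only whitespace.
function removeBlankLines(lines) {
  return lines.filter((line) => line.trim() !== "");
}

// Keep the first occurrence of each line and drop later repeats.
function removeDuplicateLines(lines) {
  const seen = new Set();
  return lines.filter((line) => {
    if (seen.has(line)) return false;
    seen.add(line);
    return true;
  });
}

const lines = splitLines("a\n\n  \nb\na");
console.log(removeDuplicateLines(removeBlankLines(lines)));
```

Using a Set keeps the original line order, which matches the "keep the first occurrence" rule without requiring a sort.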

Filtering by Keyword

This is useful when you want to narrow down logs and other data for analysis. For example, suppose you have log text like this, and you want to see only the lines for process1.

INFO  2019-01-31 15:00:00.000 1234/process1 message
ERROR 2019-01-31 15:00:00.000 1234/process1 message
INFO  2019-01-31 15:00:00.000 4321/process2 message
INFO  2019-01-31 15:00:00.000 1234/process1 message
INFO  2019-01-31 15:00:00.000 4321/process2 message
INFO  2019-01-31 15:00:00.000 4321/process3 message
INFO  2019-01-31 15:00:00.000 1234/process1 message

Select “contain” and enter “process1” as a keyword:

INFO  2019-01-31 15:00:00.000 1234/process1 message
ERROR 2019-01-31 15:00:00.000 1234/process1 message
INFO  2019-01-31 15:00:00.000 1234/process1 message
INFO  2019-01-31 15:00:00.000 1234/process1 message

All lines except those for process1 have disappeared. Yay!

By the way, filtering is implemented with regular expressions. In the “only lines that contain the keywords” mode, every line that does not contain any of the keywords is removed using the following regular expression.

/^(?!.*(keyword1|keyword2)).*$(\r\n|\r|\n)?/gm
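
As a rough sketch of how such a pattern could be built from a keyword list and applied, assuming the keywords are meant to match literally: the helper names escapeRegExp and keepLinesContaining are hypothetical, and the escaping step is an assumption, since the text does not say how the tool handles special characters.

```javascript
// Hypothetical helper: escape regex metacharacters so a keyword matches literally.
function escapeRegExp(keyword) {
  return keyword.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Remove every line that contains none of the keywords (OR filter).
function keepLinesContaining(text, keywords) {
  const alternation = keywords.map(escapeRegExp).join("|");
  // Same shape as the pattern above: a negative lookahead selects lines
  // without any keyword, and the optional group also consumes the line ending.
  const pattern = new RegExp(
    "^(?!.*(" + alternation + ")).*$(\\r\\n|\\r|\\n)?",
    "gm"
  );
  return text.replace(pattern, "");
}

const log = [
  "INFO  2019-01-31 15:00:00.000 1234/process1 message",
  "ERROR 2019-01-31 15:00:00.000 1234/process1 message",
  "INFO  2019-01-31 15:00:00.000 4321/process2 message",
  "INFO  2019-01-31 15:00:00.000 1234/process1 message",
].join("\n");

console.log(keepLinesContaining(log, ["process1"]));
```

The example assumes the text uses "\n" line endings, which is what a browser textarea normally provides.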

This regular expression is explained in the following article.
Regex: Remove lines that match the conditions

Sort order

  • Ascending/Descending: Sort according to character code (UTF-16) order.
  • Character count (Asc/Desc): Sort by number of characters (not bytes).
  • Reverse: Reverses the order of the lines in the current input.
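
The sort modes above might look roughly like this in JavaScript. The function sortLines and its mode names are hypothetical, and counting characters by code point (rather than by UTF-16 code unit) is an assumption; the text only says that characters, not bytes, are counted.

```javascript
// Hypothetical mode names; the tool's real option values are not shown in the text.
function sortLines(lines, mode) {
  const copy = [...lines]; // leave the caller's array untouched
  // Count code points, so a surrogate pair counts as one character (assumption).
  const count = (s) => [...s].length;
  switch (mode) {
    case "asc":
      // With no comparator, sort() compares strings by UTF-16 code units.
      return copy.sort();
    case "desc":
      return copy.sort().reverse();
    case "countAsc":
      return copy.sort((a, b) => count(a) - count(b));
    case "countDesc":
      return copy.sort((a, b) => count(b) - count(a));
    case "reverse":
      return copy.reverse();
    default:
      throw new Error("unknown mode: " + mode);
  }
}

console.log(sortLines(["bb", "a", "ccc"], "countDesc"));
```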

In ascending character code order, lines sort roughly as follows: half-width symbols, half-width digits, half-width letters, full-width characters (hiragana, katakana, kanji), full-width symbols, full-width digits, full-width letters.

+ (half)
- (half)
1 (half)
2 (half)
A (half)
B (half)
あ (full)
い (full)
ア (full)
イ (full)
亜 (full)
腕 (full)
＋ (full)
－ (full)
１ (full)
２ (full)
Ａ (full)
Ｂ (full)
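
To see why the list falls in this order, the code points can be printed next to a sorted sample. This is a small sketch, not part of the tool; the full-width characters are written as escapes so they cannot be confused with their half-width forms.

```javascript
// A shuffled sample of the characters listed above.
const samples = [
  "\uFF21", // full-width A
  "あ",
  "B",
  "亜",
  "1",
  "\uFF11", // full-width 1
  "+",
  "ア",
  "\uFF0B", // full-width +
];

// sort() without a comparator orders strings by UTF-16 code units.
const sorted = [...samples].sort();

// Pair each character with its code point in hexadecimal.
const table = sorted.map(
  (ch) => ch + " U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
);
console.log(table.join("\n"));
```

The half-width characters sit in the ASCII range, hiragana and katakana in the U+3040 to U+30FF range, kanji from U+4E00 upward, and the full-width forms in the U+FF00 block, which is why they come last.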