Sunday, May 10, 2015

[Notes] Troubleshooting a regular expression problem in Objective-C



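A reader pointed out that the regular expression contained redundant rules, so I removed them and ran a test.
In practice I had already rewritten the original method in C, and that version never took anywhere near as long, so I tested both together. As for the test unit and output report, I won't bother describing them ^^"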

Let the numbers speak for themselves; take a look ~

test unit :
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExObjC1
{
    NSPredicate * predicate;
    predicate = [NSPredicate predicateWithFormat: @"SELF MATCHES %@", @"([^*|:\"<>?]|[ ]|\\w)+@[1-9][0-9]*[xX]$"];
    [predicate evaluateWithObject: @"1234567890123456789012345"];
}
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExObjC2
{
    NSPredicate * predicate;
    predicate = [NSPredicate predicateWithFormat: @"SELF MATCHES %@", @"([^*|:\"<>?])+@[1-9][0-9]*[xX]$"];
    [predicate evaluateWithObject: @"1234567890123456789012345"];
}
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExObjC3
{
    NSPredicate * predicate;
    predicate = [NSPredicate predicateWithFormat: @"SELF MATCHES %@", @"([^*|:\"<>?])+@[1-9][0-9]*[xX]$"];
    [predicate evaluateWithObject: @"123456789012345678901234567890"];
}
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExObjC4
{
    NSPredicate * predicate;
    predicate = [NSPredicate predicateWithFormat: @"SELF MATCHES %@", @"([^*|:\"<>?])+@[1-9][0-9]*[xX]$"];
    [predicate evaluateWithObject: @"123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890"];
}
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExC1
{
    NSString * parseString;
    parseString = @"1234567890123456789012345";
    [parseString compareByRegularExpression: @"([^*|:\"<>?]|[ ]|\\w)+@[1-9][0-9]*[xX]$"];
}
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExC2
{
    NSString * parseString;
    parseString = @"1234567890123456789012345";
    [parseString compareByRegularExpression: @"([^*|:\"<>?])+@[1-9][0-9]*[xX]$"];
}
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExC3
{
    NSString * parseString;
    parseString = @"123456789012345678901234567890";
    [parseString compareByRegularExpression: @"([^*|:\"<>?])+@[1-9][0-9]*[xX]$"];
}
// ------------------------------------------------------------------------------------------------
- ( void ) testRegExC4
{
    NSString * parseString;
    parseString = @"123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890";
    [parseString compareByRegularExpression: @"([^*|:\"<>?])+@[1-9][0-9]*[xX]$"];
}
// ------------------------------------------------------------------------------------------------

output report :
Test Case '-[TechDNSStringTest testRegExC1]' started.
Test Case '-[TechDNSStringTest testRegExC1]' passed (0.001 seconds).
Test Case '-[TechDNSStringTest testRegExC2]' started.
Test Case '-[TechDNSStringTest testRegExC2]' passed (0.000 seconds).
Test Case '-[TechDNSStringTest testRegExC3]' started.
Test Case '-[TechDNSStringTest testRegExC3]' passed (0.000 seconds).
Test Case '-[TechDNSStringTest testRegExC4]' started.
Test Case '-[TechDNSStringTest testRegExC4]' passed (0.000 seconds).
Test Case '-[TechDNSStringTest testRegExObjC1]' started.
Test Case '-[TechDNSStringTest testRegExObjC1]' passed (17.567 seconds).
Test Case '-[TechDNSStringTest testRegExObjC2]' started.
Test Case '-[TechDNSStringTest testRegExObjC2]' passed (0.001 seconds).
Test Case '-[TechDNSStringTest testRegExObjC3]' started.
Test Case '-[TechDNSStringTest testRegExObjC3]' passed (0.001 seconds).
Test Case '-[TechDNSStringTest testRegExObjC4]' started.
Test Case '-[TechDNSStringTest testRegExObjC4]' passed (0.001 seconds).

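The comparison shows that when the regular expression has not been tuned properly, Objective-C takes far longer than C: 17.567 seconds for testRegExObjC1 versus 0.001 seconds for testRegExC1, with the same pattern and input. Once the redundant alternatives are removed, both versions finish in about a millisecond.
( The checks in the C version have also been simplified to some degree. )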
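
The slow test is the only one whose group has three alternatives, and the last two are already covered by the first: a space or a word character is also "not one of * | : " < > ?". The sketch below checks that claim in plain C with POSIX regex.h. It is only an illustration: the helper names are made up, and `[[:alnum:]_]` stands in for `\w`, which is not part of standard POSIX extended syntax. Overlapping alternatives inside a repeated group are a well-known trigger for heavy backtracking in backtracking engines, which is a plausible explanation for the 17.567 seconds, although the tests here do not measure that directly.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the one-character string made of c
   matches the POSIX extended pattern, 0 otherwise (or on compile error). */
static int matches_char(const char *pattern, char c)
{
    regex_t re;
    char s[2] = { c, '\0' };
    int ok;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return 0;
    ok = (regexec(&re, s, 0, NULL, 0) == 0);
    regfree(&re);
    return ok;
}

/* Returns 1 if every printable ASCII character accepted by the second or
   third alternative ([ ] or a word character) is also accepted by the
   first alternative, i.e. the extra alternatives add nothing. */
int alternatives_are_redundant(void)
{
    int c;

    for (c = 0x20; c < 0x7F; ++c) {
        if (matches_char("^([ ]|[[:alnum:]_])$", (char)c) &&
            !matches_char("^[^*|:\"<>?]$", (char)c))
            return 0;
    }
    return 1;
}
```

If `alternatives_are_redundant()` returns 1, dropping `[ ]` and `\w` from the group does not change which ASCII strings the pattern accepts.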


The code that rewrites the regular expression check in C:
// ------------------------------------------------------------------------------------------------
// Requires #include <regex.h>.
- ( BOOL ) compareByRegularExpression:(NSString *)regularExpression
{
    // Previous NSPredicate-based implementation, kept for reference:
    //
    //  NSParameterAssert( regularExpression );
    //
    //  NSPredicate * predicate;
    //
    //  predicate = [NSPredicate predicateWithFormat: @"SELF MATCHES %@", regularExpression];
    //  NSParameterAssert( predicate );
    //  return [predicate evaluateWithObject: self];

    NSParameterAssert( regularExpression );

    regex_t     regular;
    int         result;
    regmatch_t  matches[1];
    char        errorMsg[BUFSIZ];

    memset( &matches, 0, sizeof( matches ) );
    memset( &errorMsg, 0, sizeof( errorMsg ) );

    // cStringUsingEncoding: returns NULL when the string cannot be represented in ASCII.
    if ( ( NULL == [regularExpression cStringUsingEncoding: NSASCIIStringEncoding] ) ||
         ( NULL == [self cStringUsingEncoding: NSASCIIStringEncoding] ) )
    {
        return NO;
    }

    result = regcomp( &regular, [regularExpression cStringUsingEncoding: NSASCIIStringEncoding], REG_EXTENDED );
    if ( 0 != result )
    {
        // regcomp failed, so `regular` holds no compiled pattern to free.
        regerror( result, &regular, errorMsg, sizeof( errorMsg ) );
        return NO;
    }

    result = regexec( &regular, [self cStringUsingEncoding: NSASCIIStringEncoding], 1, matches, 0 );
    if ( 0 != result )  // REG_NOMATCH or any other failure.
    {
        regerror( result, &regular, errorMsg, sizeof( errorMsg ) );
        regfree( &regular );
        return NO;
    }

    // the match must cover the whole string.
    if ( 0 != matches[0].rm_so )  // check the start offset.
    {
        regfree( &regular );
        return NO;
    }
    if ( (regoff_t)[self length] != matches[0].rm_eo )  // check that the match ends at the end of the string.
    {
        regfree( &regular );
        return NO;
    }

    regfree( &regular );
    return YES;
}
// ------------------------------------------------------------------------------------------------


※ About this part:

    regmatch_t                      matches[1];
    // ...
    result                          = regexec( &regular, [self cStringUsingEncoding: NSASCIIStringEncoding], 1, matches, 0 );

Because this function only checks whether the string matches the regular expression, the whole flow needs to run just once; there is no need to call regexec repeatedly.
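
The same single-call idea can be sketched as a standalone C function, outside of any NSString category. This is a minimal sketch, not the method above: `full_match` is a made-up name, and it compares the end offset against `strlen` of the input rather than an NSString length.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: returns 1 only if the whole of text is matched by
   pattern (POSIX extended syntax); 0 on no match, partial match, or error. */
int full_match(const char *pattern, const char *text)
{
    regex_t re;
    regmatch_t m[1];   /* slot 0 holds the span of the overall match */
    int rc;

    if (pattern == NULL || text == NULL)
        return 0;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    rc = regexec(&re, text, 1, m, 0);   /* one call, one match slot */
    regfree(&re);
    if (rc != 0)
        return 0;

    /* rm_so and rm_eo are byte offsets into text. */
    return m[0].rm_so == 0 && (size_t)m[0].rm_eo == strlen(text);
}
```

With the simplified pattern from the tests, `full_match(pattern, "icon@2x")` returns 1, while the 25-digit test string returns 0 because it contains no `@`.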


PS: After I changed the character set used in the function to UTF8, the C processing time went up slightly, by 0.001 seconds XD
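
One thing to watch after a switch to UTF-8: regexec reports rm_so and rm_eo as byte offsets, while -[NSString length] counts UTF-16 code units, so the two only agree for pure ASCII input. If the end-of-match check still compares against [self length], non-ASCII strings could be rejected. The sketch below shows the mismatch; the helper name is made up and it assumes well-formed UTF-8. Comparing rm_eo against `strlen` of the same C string passed to regexec would avoid the problem.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of UTF-16 code units needed for the
   well-formed UTF-8 string s (what -[NSString length] would report). */
size_t utf16_units(const char *s)
{
    size_t n = 0;
    const unsigned char *p = (const unsigned char *)s;

    for (; *p != '\0'; ++p) {
        if ((*p & 0xC0) == 0x80)
            continue;               /* continuation byte: already counted */
        n += 1;                     /* lead byte starts a code point */
        if ((*p & 0xF8) == 0xF0)
            n += 1;                 /* 4-byte sequence: surrogate pair */
    }
    return n;
}
```

For a name such as "圖@2x" encoded as UTF-8, `strlen` gives 6 bytes while `utf16_units` gives 4, so a full match ending at byte offset 6 would not equal the NSString length.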


